Phoenix EMBL Parser
Introduction
Phoenix is a C++ parser for EMBL-Bank flat files.
EMBL-Bank constitutes Europe's primary nucleotide sequence resource. The main sources of DNA and RNA sequences are direct submissions from individual researchers, genome sequencing projects, and patent applications.
The database is produced in an international collaboration with GenBank (USA) and the DNA Database of Japan (DDBJ). Each of the three groups collects a portion of the total sequence data reported worldwide, and all new and updated database entries are exchanged between the groups on a daily basis.
The format of EMBL-Bank flat files is described in the EMBL User Manual, while the format of the EMBL Feature Table element is described in the Feature Table Definition document.
The Parser
Phoenix recognizes all public EMBL-Bank line types, including those used in TPA (Third Party Annotation) and CON (constructed) entries.
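To make the notion of a line type concrete, here is a minimal sketch of how a single flat-file line can be split into its line code and its data. It relies on the layout given in the EMBL User Manual (a two-character line code in columns 1-2, data starting at column 6). The function name splitEmblLine is hypothetical and is not part of the Phoenix API.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical helper for illustration only; not a Phoenix function.
// Splits one flat-file line into {line code, data}.
// Layout assumed from the EMBL User Manual: code in columns 1-2,
// data from column 6 onward.
std::pair<std::string, std::string> splitEmblLine(const std::string& line) {
    // Precondition: the caller passes exactly one line, without a newline.
    assert(line.find('\n') == std::string::npos);

    // substr clamps to the string length, so short lines such as "//" are safe.
    std::string code = line.substr(0, 2);
    std::string data = line.size() > 5 ? line.substr(5) : std::string();
    return {code, data};
}
```

A real parser would then dispatch on the returned code (for example "ID", "FT", or the "//" entry terminator) to a handler for that line type.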
Phoenix also parses EMBL Feature Table location strings completely and reliably, and it correctly distinguishes among the various recognized publication types.
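To illustrate what parsing a location string involves, the following is a small recursive-descent sketch for a subset of the Feature Table location syntax: single bases, simple ranges (a..b), and the join, order, and complement operators. It deliberately ignores partial markers (< and >), between-base positions (^), and remote accession prefixes, all of which a complete parser must handle. The Location and LocationParser types are invented for this example and do not reflect Phoenix's own classes. The recursive std::vector member assumes C++17.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Illustrative types only; these are not Phoenix classes.
struct Location {
    std::string op;               // "range", "join", "order", or "complement"
    long begin = 0;               // first base (meaningful when op == "range")
    long end = 0;                 // last base  (meaningful when op == "range")
    std::vector<Location> parts;  // operands of join/order/complement
};

class LocationParser {
public:
    explicit LocationParser(const std::string& text) : s_(text) {}

    // Parses the whole string; throws std::runtime_error on malformed input.
    Location parse() {
        Location loc = parseLocation();
        if (pos_ != s_.size()) throw std::runtime_error("trailing characters");
        return loc;
    }

private:
    char peek() const {
        assert(pos_ <= s_.size());  // the cursor never runs past the end
        return pos_ < s_.size() ? s_[pos_] : '\0';
    }

    void expect(char c) {
        if (peek() != c) throw std::runtime_error(std::string("expected '") + c + "'");
        ++pos_;
    }

    long parseNumber() {
        if (!std::isdigit(static_cast<unsigned char>(peek())))
            throw std::runtime_error("expected a base position");
        long n = 0;
        while (std::isdigit(static_cast<unsigned char>(peek())))
            n = n * 10 + (s_[pos_++] - '0');
        return n;
    }

    Location parseLocation() {
        static const char* const kOps[] = {"join", "order", "complement"};
        for (const char* name : kOps) {
            const std::string prefix = std::string(name) + "(";
            if (s_.compare(pos_, prefix.size(), prefix) == 0) {
                pos_ += prefix.size();
                Location loc;
                loc.op = name;
                loc.parts.push_back(parseLocation());
                // complement takes one operand; join and order take a list.
                while (loc.op != "complement" && peek() == ',') {
                    ++pos_;
                    loc.parts.push_back(parseLocation());
                }
                expect(')');
                return loc;
            }
        }
        // Otherwise: a single base "n" or a range "n..m".
        Location loc;
        loc.op = "range";
        loc.begin = parseNumber();
        loc.end = loc.begin;
        if (s_.compare(pos_, 2, "..") == 0) {
            pos_ += 2;
            loc.end = parseNumber();
        }
        return loc;
    }

    std::string s_;        // own copy, so temporaries are safe to pass in
    std::size_t pos_ = 0;  // index of the next unread character
};
```

For example, LocationParser("complement(join(12..78,134..202))").parse() yields a complement node whose single operand is a join of two ranges.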
Using Phoenix with non-C++ programs
Phoenix is a C++ parser, so you cannot use it directly from programs written in other languages such as Perl or Java.
You can, however, use Phoenix to convert EMBL flat files into a suitable XML format and then read the resulting XML files with whatever XML parser is available in your language.
Two of the Phoenix sample applications show how Phoenix can be used to convert EMBL-Bank flat files into XML.
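As a rough idea of what such a conversion looks like, the sketch below turns flat-file text into a very simple XML document, one element per line. It does not use Phoenix and does not reproduce the output of the Phoenix sample applications: the function names and the element names (entries, entry, line) are made up for this example, and the line layout (code in columns 1-2, data from column 6, "//" as entry terminator, "XX" as spacer) is taken from the EMBL User Manual.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>  // handy for trying the function on in-memory text
#include <string>

// Escapes characters that may not appear verbatim in XML text or attributes.
std::string xmlEscape(const std::string& text) {
    std::string out;
    for (char c : text) {
        switch (c) {
            case '&': out += "&amp;";  break;
            case '<': out += "&lt;";   break;
            case '>': out += "&gt;";   break;
            case '"': out += "&quot;"; break;
            default:  out += c;
        }
    }
    return out;
}

// Hypothetical converter; the XML layout is invented for this example.
void emblToXml(std::istream& in, std::ostream& out) {
    out << "<entries>\n";
    bool inEntry = false;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF
        const std::string code = line.substr(0, 2);
        assert(code.size() <= 2);  // line codes are at most two characters

        if (code == "//") {  // entry terminator
            if (inEntry) out << "  </entry>\n";
            inEntry = false;
            continue;
        }
        if (line.empty() || code == "XX") continue;  // spacer lines carry no data

        if (!inEntry) {
            out << "  <entry>\n";
            inEntry = true;
        }
        const std::string data = line.size() > 5 ? line.substr(5) : std::string();
        out << "    <line code=\"" << xmlEscape(code) << "\">"
            << xmlEscape(data) << "</line>\n";
    }
    if (inEntry) out << "  </entry>\n";  // close an unterminated final entry
    out << "</entries>\n";
}
```

The function works on any pair of streams, so it can be tried with std::istringstream and std::ostringstream before being pointed at real files. A converter built on Phoenix would emit structured elements (features, locations, references) rather than raw lines, which is what makes the XML useful to a Perl or Java consumer.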