Accessing GB flat files by GI

From BioPerl

(see bioperl-l thread here.)

Jarod Pardon (云 何) asks:

I have some sequence databases, such as RefSeq, in flat GenPept/GenBank format. I also have a list of GI numbers, and I want to extract the corresponding sequences from the database by GI number.


Mark Jensen replies:

Bio::DB::Flat is nicely generalized to allow different 'namespaces' for the different identifiers used on different sequences. You can choose the type of identifier you want (GI, in your case) by passing the namespace name as the first argument to get_Seq_by_acc(), as follows (this actually works on my machine):

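 use strict;
 use warnings;
 use Bio::DB::Flat;
 # Open (or create) a BerkeleyDB-indexed flat-file database in ~/scratch
 my $db = Bio::DB::Flat->new(-directory  => "$ENV{HOME}/scratch",
                             -dbname     => 'mydb',
                             -format     => 'genbank',
                             -index      => 'bdb',
                             -write_flag => 1);
 # Build the index from the GenBank flat file
 $db->build_index("$ENV{HOME}/scratch/plastid1.rna.gbff");
 # Look up a record in the 'GI' namespace
 my $seq = $db->get_Seq_by_acc('GI' => 71025988);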

To retrieve by accession number instead, use get_Seq_by_acc('ACC' => $accno); other namespaces work the same way.
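
Since the original question involves a whole list of GI numbers, the single lookup above can be wrapped in a loop. What follows is only a minimal sketch: fetch_by_gi is a hypothetical helper name, not part of BioPerl, and it assumes $db is a Bio::DB::Flat handle opened and indexed as in the snippet above. The eval is there because it is assumed here, not verified, that a failed lookup may either return undef or throw.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): look up each GI in the
# 'GI' namespace of an already-indexed database handle.
# Returns two array refs: the sequence objects found, and the GIs that
# could not be retrieved.
sub fetch_by_gi {
    my ($db, @gis) = @_;
    my (@found, @missing);
    for my $gi (@gis) {
        # eval catches lookups that throw instead of returning undef
        my $seq = eval { $db->get_Seq_by_acc('GI' => $gi) };
        if (defined $seq) {
            push @found, $seq;
        }
        else {
            push @missing, $gi;
        }
    }
    return (\@found, \@missing);
}

# Usage sketch, with $db opened as in the snippet above:
#   my ($found, $missing) = fetch_by_gi($db, 71025988);
#   warn "Not found: @$missing\n" if @$missing;
```

The sequences in @$found can then be written out with Bio::SeqIO, for example via its write_seq() method on a FASTA output stream.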
