Species names from accession numbers

From BioPerl

Bhakti Dwivedi wonders:

Does anyone know how to retrieve the "Source" or the "Species name" given the accession number?


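The following scrap (with portions suspiciously reminiscent of HOWTO:EUtilities) demonstrates how you might do this. It first uses elink to map each protein id to a taxonomy id, then uses esummary to look up the scientific name for each taxonomy id: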

use strict;
use warnings;
use Bio::DB::EUtilities;
 
my (%taxa, @taxa);
my (%names, %idmap);
 
# these are protein ids; nucleotide ids will probably work too if you
# change -dbfrom to 'nucleotide'
 
my @ids = qw(1621261 89318838 68536103 20807972 730439);
 
my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
                                       -db => 'taxonomy',
                                       -dbfrom => 'protein',
                                       -correspondence => 1,
                                       -id => \@ids);
 
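# iterate through the LinkSet objects; with -correspondence => 1 each
# LinkSet corresponds to a single submitted id
while (my $ds = $factory->next_LinkSet) {
    $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0];
}
 
# keep only ids that mapped to a taxon (removed records yield undef)
@taxa = grep { defined } @taxa{@ids};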
 
$factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
        -db    => 'taxonomy',
        -id    => \@taxa );
 
# map each taxonomy id to its scientific name
while (my $docsum = $factory->next_DocSum) {
    my ($taxid) = $docsum->get_contents_by_name('TaxId');
    my ($name)  = $docsum->get_contents_by_name('ScientificName');
    $names{$taxid} = $name;
}
 
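# join the two maps: protein id => scientific name (undef if no taxon)
foreach my $id (@ids) {
    my $taxid = $taxa{$id};
    $idmap{$id} = defined $taxid ? $names{$taxid} : undef;
}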
 
# %idmap is
#    1621261 => 'Mycobacterium tuberculosis H37Rv'
#    20807972 => 'Thermoanaerobacter tengcongensis MB4'
#    68536103 => 'Corynebacterium jeikeium K411'
#    730439 => 'Bacillus caldolyticus'
#    89318838 => undef    (this record has been removed from the db)
 
1;

--maj
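
Because removed records come back as undef, it may help to mark them explicitly when reporting the result. The following is only a sketch: format_idmap is a hypothetical helper, not part of BioPerl, and the 'NOT FOUND' label is an arbitrary choice. The sample data is copied from the %idmap listing above, so no network access is needed to try it.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): turn an id => species map
# into tab-separated lines, labelling ids that have no name.
sub format_idmap {
    my ($idmap) = @_;
    my @lines;
    # sort numerically so the output order is stable
    for my $id (sort { $a <=> $b } keys %$idmap) {
        my $name = defined $idmap->{$id} ? $idmap->{$id} : 'NOT FOUND';
        push @lines, "$id\t$name";
    }
    return @lines;
}

# sample data copied from the %idmap listing in the scrap
my %idmap = (
    1621261  => 'Mycobacterium tuberculosis H37Rv',
    20807972 => 'Thermoanaerobacter tengcongensis MB4',
    68536103 => 'Corynebacterium jeikeium K411',
    730439   => 'Bacillus caldolyticus',
    89318838 => undef,
);

print "$_\n" for format_idmap(\%idmap);
```

In a real run you would pass a reference to the %idmap built by the scrap instead of the hard-coded sample.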
