HOWTO:EUtilities

From BioPerl

NOTE: THIS IS A STUB FOR A WORK IN PROGRESS. This will not be added to the official HOWTO until it is mostly complete.


Entrez Utilities (EUtilities)

These are a group of methods, with an unusual API, that give access to most of the information in NCBI's Entrez databases. There are several types of EUtilities (detailed below), each with its own function.

For some quick recipes with EUtilities, read the cookbook.

einfo

esearch

efetch

This EUtility fetches records, such as sequences, from the different Entrez databases in a variety of formats. See the efetch documentation on the NCBI site.

Note: a sequence file does not necessarily contain the actual DNA/RNA/protein sequence. It may contain only the annotations and sequence features (SeqFeatures).

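use strict;
use warnings;
use Bio::DB::EUtilities;

# Example values; replace with your own accession, range and strand
my ($acc, $start, $stop, $strand) = ('NM_002105.2', 1, 500, 1);

my $fetcher = Bio::DB::EUtilities->new(
                                       -eutil      => 'efetch',
                                       -db         => 'nucleotide',   # database to fetch from (gene/nucleotide/protein/etc.)
                                       -rettype    => 'gb',           # output type (GenBank flat file)
                                       -retmode    => 'text',         # output format (plain text)
                                       -id         => $acc,           # the accession, such as NM_002105.2
                                       -seq_start  => $start,         # first base to retrieve (1-based)
                                       -seq_stop   => $stop,          # last base to retrieve (inclusive)
                                       -strand     => $strand,        # 2 for the complement (minus) strand, 1 otherwise
                                       -email      => 'you@example.com', # your contact address (NCBI asks for one)
                                       );

# Print the raw GenBank text returned by NCBI
print $fetcher->get_Response->content;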

efetch parameters

db

This option selects the Entrez database to retrieve records from, for example nucleotide, protein or gene.

rettype and retmode

These options control the output type and format. The possible values depend on the database from which data is being retrieved.

rettype is the output type, such as fasta, gb or native.

retmode is the output format, such as text, xml, html or asn.1. To feed the output of this EUtility to Bio::SeqIO without any pre-parsing, use 'text'.
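As a sketch of the 'text' route into Bio::SeqIO, the block below reads plain-text efetch output through an in-memory filehandle. The literal string stands in for what $fetcher->get_Response->content would return for -rettype 'fasta'. The seqs_from_text sub and the %SEQIO_FORMAT rettype-to-format table are hypothetical helpers written for this example, not part of BioPerl.

```perl
use strict;
use warnings;
use Bio::SeqIO;

# Hypothetical mapping from efetch rettype to the matching Bio::SeqIO
# format name (only the two rettypes used on this page).
my %SEQIO_FORMAT = (fasta => 'fasta', gb => 'genbank');

# Hypothetical helper: parse plain-text (retmode 'text') efetch output
# into Bio::Seq objects by reading the string as a filehandle.
sub seqs_from_text {
    my ($text, $rettype) = @_;
    my $format = $SEQIO_FORMAT{$rettype}
        or die "no Bio::SeqIO format known for rettype '$rettype'\n";
    open(my $fh, '<', \$text) or die "cannot open in-memory handle: $!\n";
    my $in = Bio::SeqIO->new(-fh => $fh, -format => $format);
    my @seqs;
    while (my $seq = $in->next_seq) {
        push @seqs, $seq;
    }
    return @seqs;
}

# Stand-in for $fetcher->get_Response->content with -rettype 'fasta'
my $text = ">seq1 example\nACGTACGT\n>seq2\nTTGG\n";
for my $seq (seqs_from_text($text, 'fasta')) {
    printf "%s\t%d bp\n", $seq->display_id, $seq->length;
}
```

With a real fetcher the call would be seqs_from_text($fetcher->get_Response->content, 'gb'). For large downloads, get_Response(-file => $file) writes the response to disk instead, and Bio::SeqIO can then read that file directly.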

seq_start and seq_stop

When retrieving a sequence with efetch, these control the range of bases retrieved: seq_start is the first base and seq_stop the last, counting from 1. If they are not specified, the whole sequence is downloaded, which can be huge in the case of contigs.
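A minimal sketch of how the range parameters fit into the argument list, assuming the 1-based, inclusive coordinates described above. The efetch_region_args and range_length subs are hypothetical helpers for illustration, not BioPerl methods; they only build arguments and make no network request.

```perl
use strict;
use warnings;

# Hypothetical helper: build efetch arguments for one accession,
# optionally restricted to a 1-based, inclusive base range.
sub efetch_region_args {
    my ($acc, $start, $stop) = @_;
    my @args = (
        -eutil   => 'efetch',
        -db      => 'nucleotide',
        -rettype => 'fasta',
        -retmode => 'text',
        -id      => $acc,
    );
    # Without a range, efetch returns the whole record (possibly huge)
    return @args unless defined $start && defined $stop;
    die "seq_start must be >= 1\n"        if $start < 1;
    die "seq_stop must be >= seq_start\n" if $stop < $start;
    return (@args, -seq_start => $start, -seq_stop => $stop);
}

# Number of bases covered by a 1-based, inclusive range
sub range_length {
    my ($start, $stop) = @_;
    return $stop - $start + 1;
}

my %args = efetch_region_args('NM_002105.2', 101, 200);
printf "%d..%d: %d bp\n", $args{-seq_start}, $args{-seq_stop}, range_length(101, 200);
```

The returned list could be passed straight to Bio::DB::EUtilities->new, together with -email, to perform the actual request.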

strand

When retrieving a sequence with efetch, you can also specify the strand. The value can be 1 or 2: a strand value of 1 selects the plus strand, while a value of 2 selects the minus (complement) strand.
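BioPerl objects describe strand as 1 (plus) or -1 (minus), while efetch expects 1 or 2, so a conversion is needed when the coordinates come from a feature. The sketch below assumes a Bio::SeqFeature::Generic with a known strand; efetch_strand and feature_efetch_args are hypothetical helpers, not part of BioPerl.

```perl
use strict;
use warnings;
use Bio::SeqFeature::Generic;

# Hypothetical helper: map BioPerl strand (1 / -1) to efetch strand (1 / 2)
sub efetch_strand {
    my ($bioperl_strand) = @_;
    die "strand is undefined\n" unless defined $bioperl_strand;
    return 1 if $bioperl_strand == 1;
    return 2 if $bioperl_strand == -1;
    die "strand must be 1 or -1, got $bioperl_strand\n";
}

# Hypothetical helper: efetch range and strand arguments covering a feature
sub feature_efetch_args {
    my ($feature) = @_;
    return (
        -seq_start => $feature->start,
        -seq_stop  => $feature->end,
        -strand    => efetch_strand($feature->strand),
    );
}

my $feat = Bio::SeqFeature::Generic->new(-start => 200, -end => 350, -strand => -1);
my %args = feature_efetch_args($feat);
printf "%d..%d, strand %d\n", $args{-seq_start}, $args{-seq_stop}, $args{-strand};
```

Features with an unknown strand (0 or undefined) are rejected here rather than silently fetched as plus strand; that is a design choice of this sketch.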

esummary

epost

elink

BioPerl and eutils

Taking advantage of History (aka Cookies)
