Annotation is the process of adding "extra information" to a data set.

For example, consider the freshly determined DNA sequence of a bacterial plasmid. It is mostly unannotated, except perhaps for quality information for each base. Identifying the coordinates of the probable open reading frames (ORFs) is a form of annotation. This would usually be done by ORF-finding software, which makes it automatic annotation. Ideally this would be followed by some manual annotation (or curation) by a suitably qualified bioinformatician to fix any mis-predicted ORFs.
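
To make the ORF example concrete, here is a minimal sketch of automatic annotation in Python. It is an illustration only, not how any particular ORF finder works: it scans the forward strand only, assumes the standard start codon ATG and the stop codons TAA, TAG and TGA, and the names `Orf` and `find_orfs` are hypothetical. Real tools also scan the reverse strand, handle alternative start codons, and score their predictions.

```python
from typing import List, NamedTuple

# Assumption: standard start/stop codons; real ORF finders are configurable.
START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}


class Orf(NamedTuple):
    """Coordinates of one predicted ORF (hypothetical record type)."""
    start: int  # 0-based index of the first base of the start codon
    end: int    # 0-based index just past the last base of the stop codon
    frame: int  # reading frame: 0, 1 or 2


def find_orfs(seq: str, min_codons: int = 2) -> List[Orf]:
    """Return forward-strand ORFs; min_codons counts start and stop codons."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step through the sequence one codon at a time in this frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START_CODON:
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append(Orf(start, i + 3, frame))
                start = None
    return orfs


# The coordinates returned here are the annotation; the sequence is unchanged.
predicted = find_orfs("CCATGAAATAGGG")
```

The list of coordinates is exactly the kind of "extra information" described above. A curator could then review each predicted ORF and correct or discard it.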

The sheer volume of genomic data has made automatic annotation far more important. Usually, new sequence data is passed through a chain of software tools to predict and annotate features on (and of) the sequence. This chain is called an annotation pipeline.

However, manual annotation is still very important, and many well-respected data sets, such as Swiss-Prot and PDB, are manually curated by teams of microbiologists and biochemists.

In BioPerl, the term has a more precise meaning: it refers to an attribute of an entire sequence, as opposed to a Feature, which is an attribute of a sub-sequence that has a defined location. See the Feature-Annotation HOWTO and Bio::AnnotationI for more.
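
The distinction can be sketched with a small data model. The Python below is a language-neutral illustration and not BioPerl's API: the class names `Feature` and `AnnotatedSequence`, their fields, and the 1-based inclusive coordinates are all assumptions made for this example. An annotation is attached to the sequence as a whole, while a feature carries a location.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Feature:
    """An attribute of a sub-sequence: it always has a location."""
    primary_tag: str
    start: int  # assumption: 1-based, inclusive
    end: int    # assumption: 1-based, inclusive


@dataclass
class AnnotatedSequence:
    """A sequence with whole-sequence annotations and located features."""
    seq_id: str
    residues: str
    annotations: Dict[str, List[str]] = field(default_factory=dict)
    features: List[Feature] = field(default_factory=list)

    def add_annotation(self, key: str, value: str) -> None:
        # Annotations have no coordinates; they describe the entire sequence.
        self.annotations.setdefault(key, []).append(value)

    def add_feature(self, feature: Feature) -> None:
        # Features must lie within the sequence.
        if not 1 <= feature.start <= feature.end <= len(self.residues):
            raise ValueError("feature location is outside the sequence")
        self.features.append(feature)

    def feature_residues(self, feature: Feature) -> str:
        # Convert 1-based inclusive coordinates to a Python slice.
        return self.residues[feature.start - 1:feature.end]


plasmid = AnnotatedSequence("plasmid1", "CCATGAAATAGGG")
plasmid.add_annotation("comment", "ORFs predicted automatically")
plasmid.add_feature(Feature("CDS", 3, 11))
```

In this sketch the comment applies to `plasmid1` as a whole, whereas the `CDS` entry is meaningful only together with its start and end positions.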

