Site entropy in an alignment

From BioPerl
Jump to: navigation, search

(see bioperl-l discussion here, and see also Site frequencies in an alignment)

Cláudio Sampaio asks

How to get the entropy of each nucleotide of an alignment?

from Mark Jensen:

If you have a Bio::SimpleAlign object prepared -- maybe from

$alnio = new Bio::AlignIO(-format=>'fasta', -file=>'your.fas');
$aln = $alnio->next_aln;

try the following function, as

 $entropies = entropy_by_column( $aln )
 =head2 entropy_by_column
 Title   : entropy_by_column
 Usage   : entropy_by_column( $aln )
 Function: returns the Shannon entropy for each column in an alignment
 Example :
 Returns : hashref of the form { $column_number => $entropy, ... }
 Args    : a Bio::SimpleAlign object
 sub entropy_by_column {
    my ($aln) = @_;
    my (%ent);
    foreach my $col (1..$aln->length) {
      my %res;
      foreach my $seq ($aln->each_seq) {
          my $loc = $seq->location_from_column($col);
          next if (!defined($loc) || $loc->location_type eq 'IN-BETWEEN');
      $ent{$col} = entropy(values %res);
   return [%ent];
 =head2 entropy
 Title   : entropy
 Usage   : entropy( @numbers )
 Function: returns the Shannon entropy of an array of numbers,
           each number represents the count of a unique category
           in a collection of items
 Example : entropy ( 1, 1, 1 )  # returns 1.09861228866811 = log(1/3)
 Returns : Shannon entropy or undef if entropy undefined;
 Args    : an array
 use List::Util qw(sum);
 sub entropy {
    @a = grep {$_} @_;
    return undef unless grep {$_>0} @a;
    return undef if grep {$_<0} @a;
    my $t = sum(@a);
    $_ /= $t foreach @a;
    return sum(map { -$_*log($_) } grep {$_} @a);
[back to top]

Personal tools
Main Links