Parsing BLAST HSPs

From BioPerl
Jump to: navigation, search

(see bioperl-l thread here)

The key to parsing any external program report is knowing where (i.e., in what BioPerl object) the stuff you want lives. This info can be tricky to find, though walking through the code to find it is an excellent exercise for learning how BioPerl is structured.

Below is some code to get BLAST high-scoring pairs from a BLAST report. Here is the basic BP-object structure:

$report (is-a Bio::SearchIO::blast, which contains
 $result (is-a Bio::Search::Result::BlastResult, which contains
   $hit (is-a Bio::Search::Hit::BlastHit, which contains
     $hsp (is-a Bio::Search::HSP::GenericHSP)


As a professor (David Wollkind) of mine once said (while teaching the chain rule), you have to peel away the players until you find the one with the ball.

Note there are many accessors on the Bio::Search::HSP::GenericHSP object though which the interesting stuff can be obtained.

The scrap uses blastn to align two sequences. Get the standalone BLAST programs from NCBI (they don't come with BioPerl!).

--MAJ


 #!/usr/bin/perl -w
 
 use strict;
 use Bio::SeqIO;
 use Bio::Tools::Run::StandAloneBlast;
 my $query1_in  = Bio::SeqIO->newFh ( -file   => "mus-betaglobin-bh0.fas",
  				    -format => 'fasta' );
 my $query1 = <$query1_in>; 
 my $query2_in  = Bio::SeqIO->newFh ( -file   => "mus-betaglobin-bh3.fas", 
  				      -format => 'fasta' );
 my $query2 = <$query2_in>; 
 
 $factory = Bio::Tools::Run::StandAloneBlast->new('program'  => 'blastn');
 $report = $factory->bl2seq($query1, $query2);
 
 while (my $result = $report->next_result) {
    print "Query: ".$result->query_name."\n";
    while (my $hit = $result->next_hit) {
 	while ($hsp = $hit->next_hsp) {
 	    print $hsp->algorithm, ": identity ", 100*$hsp->frac_identical, "\%, rank ", $hsp->rank, " (E:", $hsp->evalue, ")\n";
 	    printf("%7s: %s\n", "subj", $hsp->query_string);
 	    printf("%7s: %s\n", "", $hsp->homology_string);
  	    printf("%7s: %s\n", "hom", $hsp->hit_string);
 	    print "\n";
 	}
    print "\n";
    }
 }
[back to top]


Personal tools
Namespaces
Variants
Actions
Main Links
documentation
community
development
Toolbox