HOWTO Discussion:Tiling

From BioPerl
Jump to: navigation, search

Fixed a bug in the example

http://www.bioperl.org/w/index.php?title=HOWTO%3ATiling&diff=11968&oldid=11961

The previous version was giving the following error:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Denominator selection must be one of ('total', 'aligned'), not 'exact'
STACK: Error::throw
STACK: Bio::Root::Root::throw ~/perl5/lib/perl5/Bio/Root/Root.pm:368
STACK: Bio::Search::Tiling::MapTiling::frac ~/perl5/lib/perl5/Bio/Search/Tiling/MapTiling.pm:378
STACK: Bio::Search::Tiling::MapTiling::frac_identical ~/perl5/lib/perl5/Bio/Search/Tiling/MapTiling.pm:442
STACK: ./procBlast.plx:30
-----------------------------------------------------------


Fixed using the docs here: http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Search/Tiling/MapTiling.html#POD8

Excellent-- thanks Dan! --MAJ 18:00, 3 July 2009 (UTC)

When is a hit not a hit?

Sorry for the basic question, but when are two HSPs so 'off diagonal' that they are put into separate hits? i.e. if sequence A and B share two domains, but A has an interleaving domain, the HSPs between A and B will have a big gap between them... would this none the less be a single hit between A and B?

In pictures:

A: --/--- Domain x ---/--- Domain y ---/--- Domain z ---/--
B: --/--- Domain x ---/...   GAP    .../--- Domain z ---/--


I guess this is the heart of tiling, but I don't see it clearly described (although I'll go and have a proper look). In particular I'd like to know how this is handled in Blast output, and how that decision interacts with this Perl module. --DanBolser 15:44, 20 July 2009 (UTC)

  • Hey Dan- Hmm. I talk myself through it here, and maybe the answer will appear-

"Hits" as such are part of the organization of the Blast report, and MapTiling counts on SearchIO to provide them; MapTiling stays well clear of making such decisions. But I would also say that SearchIO doesn't make these decisions either-- the hits are parsed out of the Blast report as sets of HSPs delimited by lines looking like

>gb|AY052359.1| Arabidopsis thaliana At2g17400 mRNA, complete cds
...
>gb|AC002329.2|AC002329 Arabidopsis thaliana chromosome II section 100 of 255 of the complete
...

These are the queries, and the subjects (or 'hits', which unfortunately are different from the Hit objects) are those seqs that are aligned to in the HSPs. In the Blast report, all the Hits (query sequences with significant alignment within the subject sequences/subject database) are listed up front under the lines

                                                                 Score    E
Sequences producing significant alignments:                      (bits) Value

So, I would say that the answer to this question is pushed back to the algorithm that generated the report in the first place. Do you have a proposal? We could post-process the Hits as returned by SearchIO, looking for this particular situation, either flagging them or doing something else more interesting, if there's a commonly-required (or even not-so-commonly-required ) necessity for doing so.

Here's something maybe more useful:

If you are really trying to collate all the query sequences that cover a subject sequence -- i.e., in your example, you want to get A and B out by tiling on x, y, and z, I think you should be able to do that by doing your tiling from the subject perspective, rather than the query (i.e., judiciously using -type => 'subject' in places), and checking the coverage map associated, say, with x, for overlaps with query sequences--you'd get A and B for x, only A for y, and A and B again for z. I'd have to think this through, but maybe this helps a bit. WARNING: You'd probably have to loop over the Hit objects, since each tiling (from either perspective), deals only with the HSPs contained in a single parsed Hit as described above.

--maj 16:18, 20 July 2009 (UTC)

  • Thanks maj ... I'll have to read that a few times before I understand it! I guess I'm getting confused by the difference that blast -m7 and -m8 would produce, given the same input... (-m7 has hit groups, but -m8 only has HSPs... is that right?). Actually what I'm trying todo is find large regions of synteny between two genomes built up using HSPs... so probably I'm using a hammer when I need a screwdriver! As you suggested (I think) I'll look for HSPs between the same two sequences that are not included in the same HIT by blast and see how they look (if I can find any!).

I guess this all got prompted when I decided to plot some summary statistics from my blast report:

--DanBolser 16:59, 20 July 2009 (UTC)

Foo.01.png Foo.02.png

#!/usr/bin/perl -w
 
use strict;
 
use Bio::SearchIO;
use Bio::Search::Tiling::MapTiling;
 
 
 
my $resFile = "blast_results.xml";
 
my $SearchIO = Bio::SearchIO->
  new( -file   => $resFile,
       -format => "blastxml"
     );
 
 
 
while( my $result = $SearchIO->next_result ){
  # These are 'Module:Bio::Search::Result::ResultI compliant objects'
 
  ##warn $result->query_name, "\n";
 
  while( my $hit = $result->next_hit ) {
    # These are 'Bio::Search::Hit::HitI compliant objects'
 
    ##warn $hit->name, "\n";
    ##warn $hit->num_hsps, "\n";
 
    ## Try tiling HSPs in hit...
    my $tiling =
      Bio::Search::Tiling::MapTiling->new($hit);
 
    printf("%10s\t%6d\t%5d\t%3.0f\n",
	   $result->query_name,
	   $result->query_length,
	   #$hit->name,
	   #$hit->length,
	   $hit->num_hsps,
	   $tiling->frac_aligned(-type=> 'query') * 100
	  );
  }
  ##exit;
}

--DanBolser 15:13, 22 July 2009 (UTC)

  • Awesome! Look at that cliff at ~500bp, that looks like a paper. (And MapTiling works--yipee!) --Majensen 15:20, 22 July 2009 (UTC)
Personal tools
Namespaces
Variants
Actions
Main Links
documentation
community
development
Toolbox