Module:Bio::Coordinate::Pair

From BioPerl
Jump to: navigation, search
Bio::Coordinate::Pair
PDoc Bio::Coordinate::Pair
CPAN Bio::Coordinate::Pair
metaCPAN Bio::Coordinate::Pair


See also:

Example usage

Here is an example of how to use the code to 'lift over' features aligned to a contig into a chromosomal coordinate space (i.e. using an AGP file).

#!/usr/bin/perl -w
 
use strict;
use Bio::Location::Simple;
use Bio::Coordinate::Pair;
 
# Define the contig
my $ctg = Bio::Location::Simple->
  new( -seq_id => 'ctg', -start =>    1, -end =>  999, -strand => +1 );
 
# Define the contig's position in the chromosome
my $ctg_on_chr = Bio::Location::Simple->
  new( -seq_id => 'chr', -start => 2001, -end => 2999, -strand => -1 );
 
# Define the mapping between the two coordinate systems (as an object)
my $agp = Bio::Coordinate::Pair->
  new( -in  => $ctg,
       -out => $ctg_on_chr
     );
 
# Create an example 'match' that has contig coordinates
my $match_on_ctg = Bio::Location::Simple->
  new( -seq_id => 'hit', -start =>   25, -end =>  125, -strand => +1 );
 
# Convert the match into chromosomal coordinates
my $match_on_chr = $agp->map( $match_on_ctg );
 
# (and print it)
print
  join("\t",
       $match_on_chr->seq_id,
       $match_on_chr->start,
       $match_on_chr->end,
       $match_on_chr->strand
      ), "\n";

which gives:

chr     2875    2975    -1

'Edge' cases

See the test script (and its results) on the talk page. Here is some commentary on that script:

What we see is that, that part of the feature to 'map' that extends beyond the range of the '-in' sequence is (arbitrarily?) mapped to a 'gap' (a Bio::Coordinate::Result::Gap) on *that* sequence in the result (a Bio::Coordinate::Result). This gap has the seq_id and strand of *that* sequence, and that's that.

i.e. The Bio::Coordinate::Pair may define the position of (part of) a contig in a scaffold, and we may wish to 'map' some features that are in 'contig coordinates' onto the scaffold to get the same features in 'scaffold coordinates'. Any feature falling fully within the contig (or fully within that part of the contig that we specify when creating the Bio::Coordinate::Pair) is 'mapped' as a single 'Match' (Bio::Coordinate::Result::Match) within the result.

If, however, feature (in 'contig coordinates') lies outside or across that part of the contig specified when creating the Bio::Coordinate::Pair, then that (part of) the feature is 'mapped' as a single 'Gap' (Bio::Coordinate::Result::Gap) in the result. The gap is given in terms of the contig (i.e. it has the contig seq_id and the contig strand).

I'm really not sure of the semantics of this... It makes sense that, if we put half of one contig into one scaffold, for example, that we don't want features from the other half of the contig to be mapped onto the scaffold, but I don't know why we would want those features to turn up as gaps, but there you have it.

Personal tools
Namespaces
Variants
Actions
Main Links
documentation
community
development
Toolbox