Merge gapped sequences across a common region

From BioPerl

(see thread)

Albert Vilella sez: I basically want to start with something like this:

seq1.123     QWERTYUIOPASDFGHJKLZXCVBNM
seq2.234     QWERTYU-------------------
seq2.345     ----------ASDFGH----------
seq2.456     -------------------ZXCVBNM

and end with something like this:

seq1.123     QWERTYUIOPASDFGHJKLZXCVBNM
seq2.mrg     QWERTYU---ASDFGH---ZXCVBNM
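For reference, the result described above can be spelled out column by column. For each alignment column, keep the one residue present among the gapped rows, or a gap if there is none. The sketch below does this in plain Perl. It is a straightforward per-column loop, not the XOR trick discussed next. `merge_columns` is a hypothetical helper name. The sketch assumes every row has the same length, that `-` is the only gap symbol, and that no two rows have a residue in the same column; it dies if either assumption is violated.

```perl
use strict;
use warnings;

# Hypothetical helper: merge equal-length gapped strings column by column.
# Dies if the rows differ in length or if two rows put a residue in one column.
sub merge_columns {
    my @rows = @_;
    my $len  = length $rows[0];
    die "rows differ in length\n" if grep { length($_) != $len } @rows;

    my $merged = '-' x $len;    # start with an all-gap row
    for my $row (@rows) {
        for my $i ( 0 .. $len - 1 ) {
            my $c = substr $row, $i, 1;
            next if $c eq '-';    # gap: this row contributes nothing here
            die "overlap at column ", $i + 1, "\n"
                if substr( $merged, $i, 1 ) ne '-';
            substr( $merged, $i, 1 ) = $c;    # take this row's residue
        }
    }
    return $merged;
}

print merge_columns(
    'QWERTYU-------------------',
    '----------ASDFGH----------',
    '-------------------ZXCVBNM',
), "\n";    # prints QWERTYU---ASDFGH---ZXCVBNM
```

This version is slower than a whole-string operation, but it states the intended result explicitly and reports overlapping regions instead of silently combining them.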




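Here's one of my favorite tricks for this: an XOR mask on the gap symbol. XOR-ing each sequence with a string of "-" characters turns every gap into a NUL byte and every residue into a non-zero byte. XOR-ing the masked sequences together then keeps the single residue in each column. XOR-ing the result with the mask once more restores the residues and turns all-gap columns back into "-". It is fast because Perl's built-in string ^ operator does the work instead of a per-character loop. It only works if all sequences have the same length and no two of them have a residue in the same column, because overlapping residues get garbled. --ed.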

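use strict;
use warnings;
use Bio::SeqIO;
use Bio::Seq;

# Read the gapped, equal-length FASTA records from the __DATA__ section.
my $seqio = Bio::SeqIO->new( -fh => \*DATA, -format => 'fasta' );

my $first = $seqio->next_seq->seq;

# The mask must be as long as the sequences: string ^ pads the shorter
# operand with "\0" bytes, so a bare '-' would mask only the first column
# (and the merge would then break for an even number of sequences).
my $mask = '-' x length $first;

# Masking turns gaps into "\0" and residues into non-zero bytes;
# XOR-ing the masked sequences keeps each column's single residue.
my $acc = $first ^ $mask;
while ( my $seq = $seqio->next_seq ) {
    $acc ^= ( $seq->seq ^ $mask );
}

# Unmask: residues come back, all-gap columns become '-' again.
my $mrg = Bio::Seq->new( -id => 'merged', -seq => $acc ^ $mask );
print $mrg->id, "\t", $mrg->seq, "\n";    # merged<TAB>QWERTYU---ASDFGH---ZXCVBNM

__DATA__
>seq2.234
QWERTYU-------------------
>seq2.345
----------ASDFGH----------
>seq2.456
-------------------ZXCVBNM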
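The XOR trick can also be packaged without BioPerl, which makes it easy to test on plain strings. The sketch below is a minimal pure-Perl version. `xor_merge` is a hypothetical name, and it assumes equal-length rows, `-` as the only gap symbol, and non-overlapping residues. An overlapping column is garbled silently under XOR, so the sketch adds a cheap guard. A correctly merged row has exactly as many residues as all input rows combined, while any column covered by two or more rows yields at most one. A count mismatch therefore flags an overlap.

```perl
use strict;
use warnings;

# Hypothetical helper: merge equal-length gapped strings with the XOR mask trick.
sub xor_merge {
    my @rows = @_;
    my $len  = length $rows[0];
    die "rows differ in length\n" if grep { length($_) != $len } @rows;

    # Full-length mask: string ^ pads a shorter operand with "\0",
    # so a bare '-' would only mask the first column.
    my $mask = '-' x $len;

    # Masking turns gaps into "\0"; XOR-ing the masked rows leaves each
    # column's single residue (still masked), or "\0" for all-gap columns.
    my $acc = "\0" x $len;
    $acc ^= ( $_ ^ $mask ) for @rows;

    # Unmasking restores residues and turns "\0" columns back into '-'.
    my $merged = $acc ^ $mask;

    # Overlap guard: the merged residue count must equal the input total.
    my $expected = 0;
    $expected += tr/-//c for @rows;    # tr/-//c counts non-gap characters
    die "overlapping residues\n" if ( $merged =~ tr/-//c ) != $expected;

    return $merged;
}

# Two rows (an even count), which the single-character mask would garble.
print xor_merge( 'QWERTYU---------', '-------ASDFGHJKL' ), "\n";    # prints QWERTYUASDFGHJKL
```

The guard costs one extra pass over each string with tr, so the merge stays a handful of whole-string operations.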