Random sequence generation

From BioPerl

(see thread)

  • ...or "Sequence Surprise" --Ed.

Roger Hall asks:

Is there a random generator for creating nucleotides (of length l with composition frequencies a, c, g, and t) in there somewhere?

Bruno Vecchi actually supplies some code (with a little cheat from List::Util):

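use strict;
use warnings;
use List::Util qw(shuffle);
use Bio::Seq;
use Bio::SeqIO;

my ($seqfile, $number) = @ARGV;
# Read the first sequence from the input file
my $in = Bio::SeqIO->new(-file => $seqfile);
# Tied filehandle that writes FASTA records to STDOUT
my $fh = Bio::SeqIO->newFh(-format => 'fasta');
my $seq = $in->next_seq;
my @chars = split '', $seq->seq;
for my $i (1 .. $number) {
    # Each pass reshuffles the residues, so composition is preserved exactly
    @chars = shuffle @chars;
    my $new_seq = Bio::Seq->new(-id => $i, -seq => join '', @chars);
    print $fh $new_seq;
}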

You can use it like this from the command line (assuming you want 20 output sequences):

$ perl shuffle.pl input_sequence.fasta 20 > random_sequences.fasta
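
Note that the script above shuffles an existing sequence, so it needs an input with the desired composition. For the case in Roger's question (a sequence of length l drawn from given a, c, g and t frequencies), a minimal sketch in plain Perl might look like the following. The random_sequence helper is hypothetical and not part of BioPerl; it assumes the frequencies are non-negative and simply samples each position independently, so the realised composition only approximates the requested one.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): build a sequence of $length
# residues, drawing each position independently with the weights in %$freq.
sub random_sequence {
    my ($length, $freq) = @_;

    # Keep only letters with a positive weight, in a fixed order
    my @letters = grep { $freq->{$_} > 0 } sort keys %$freq;
    my $total = 0;
    $total += $freq->{$_} for @letters;
    die "frequencies must sum to a positive number\n" unless $total > 0;

    my $seq = '';
    for (1 .. $length) {
        my $r    = rand($total);    # uniform in [0, $total)
        my $pick = $letters[-1];    # default if no earlier letter is selected
        my $cum  = 0;
        for my $letter (@letters) {
            $cum += $freq->{$letter};
            if ($r < $cum) { $pick = $letter; last }
        }
        $seq .= $pick;
    }
    return $seq;
}

# Example: 60 nucleotides, slightly AT-rich
print random_sequence(60, { a => 0.3, c => 0.2, g => 0.2, t => 0.3 }), "\n";
```

The returned string can be passed to Bio::Seq->new(-seq => ...) and printed through Bio::SeqIO in the same way as in the script above. Weights do not have to sum to 1, since they are normalised by their total.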

And a couple of -shudder- non-BioPerl (or at least, impure-BioPerl) solutions:

from Aidan Budd:

  • Use something external like seq-gen or similar: tools designed for outputting "random" sequences simulated over a tree. One could simply sample a single simulated sequence at random from the output alignment.

from "Big Dave" Messina:

  • The Bioperl solution piggybacks on EMBOSS. See Chris Fields' comment on this post from Neil Saunders' blog. You can also do this outside of BioPerl using shuffle from Sean Eddy's SQUID package, available here.