Regular expressions and Repeats

From BioPerl

By Brian --Ed.

How do I find repeats of any sequence of a specific length?

So /(QA)+/ will match one or more copies of QA, but what if you want to match any repeated unit of length 2?

Try
/(..)\1+/

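Then $1 holds the repeated unit and length($&)/2 gives the number of copies. For units of another length, change the number of dots (e.g. /(...)\1+/ for length 3) and divide by that length instead.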
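A sketch that generalizes this to any unit length $k and reports every repeat in a sequence, not just the first. The helper name find_tandem_repeats and the demo sequence are made up for illustration. It reads the match boundaries from the @- and @+ arrays rather than from $&, because in Perls older than 5.20 any use of $& slows down every regular expression in the program.

```perl
use strict;
use warnings;

# Hypothetical helper: find tandem repeats whose unit is exactly $k residues.
# Returns a list of [unit, copies, start] array refs (start is 0-based).
sub find_tandem_repeats {
    my ($seq, $k) = @_;
    my @hits;
    # (.{$k}) captures one unit; \1+ requires at least one more copy of it.
    while ($seq =~ /(.{$k})\1+/g) {
        my $len = $+[0] - $-[0];    # length of the whole match, without $&
        push @hits, [ $1, $len / $k, $-[0] ];
    }
    return @hits;
}

# Made-up demo sequence containing QA x 3 and SP x 2.
for my $hit (find_tandem_repeats('MKQAQAQAQLLSPSPT', 2)) {
    my ($unit, $copies, $start) = @$hit;
    print "$unit x $copies at position $start\n";
}
```

This prints "QA x 3 at position 2" and "SP x 2 at position 11". Note that a homopolymer is also reported: with $k = 2, QQQQ counts as two copies of QQ.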

How do I find some sequence flanked by homopolymers of a given length?

For example, to find FAFCRCFCFAFAFCRF flanked by at least n Qs on each side, as in:

  AGTWRWDFDQQQQQQQQFAFCRCFCFAFAFCRFQQQQQQQQQQQQQ
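The regular expression would be something like the following, where $n is the minimum number of Qs on each side and $x is the minimum length of the non-Q segment between them:
/(Q{$n,})([^Q]{$x,})(Q{$n,})/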

Example:

perl -e '$n=5; $x=9; $_= "AGTWRWDFDQQQQQQQQFAFCRCFCFAFAFCRFQQQQQQQQQQQQQ"; print "$1|$2|$3\n" if /(Q{$n,})([^Q]{$x,})(Q{$n,})/;'
QQQQQQQQ|FAFCRCFCFAFAFCRF|QQQQQQQQQQQQQ
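The one-liner stops at the first match. As a sketch (the helper name find_q_flanked is hypothetical), the same pattern can run in a /g loop to collect every flanked segment together with its 0-based start position, taken from @-. The right-hand flank is placed in a lookahead (?=...) so it is not consumed; one Q run can then serve as the right flank of one segment and the left flank of the next.

```perl
use strict;
use warnings;

# Hypothetical helper: return every segment of at least $x non-Q residues
# that has at least $n Qs on each side, as [segment, start] array refs.
sub find_q_flanked {
    my ($seq, $n, $x) = @_;
    my @hits;
    # Left flank is consumed; right flank is only checked by the lookahead,
    # so the next iteration can start matching at that same Q run.
    while ($seq =~ /Q{$n,}([^Q]{$x,})(?=Q{$n,})/g) {
        push @hits, [ $1, $-[1] ];    # $-[1] = start offset of group 1
    }
    return @hits;
}

my $seq = "AGTWRWDFDQQQQQQQQFAFCRCFCFAFAFCRFQQQQQQQQQQQQQ";
for my $hit (find_q_flanked($seq, 5, 9)) {
    print "$hit->[0] at position $hit->[1]\n";
}
```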


How do I find any homopolymer flanked on both sides by the same amino acid?

For example, HTTTTTTTTTTH or TGGGGGGGGGGGT.

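A backreference does not work inside a character class: there \1 is an octal escape, so [^\1] means "any character except chr(1)", and /(.)[^\1]+\1/ also matches mixed runs such as HTGH. Use a negative lookahead for the first residue of the run and a backreference for the rest:

/(.)((?!\1).)\2*\1/

In action:

perl -e '$_ = "HTTH"; print "|$1|\n" if /((.)((?!\2).)\3*\2)/;'
|HTTH|

Note that the "homopolymer" can have a length of 1 (e.g. HTH). Change \2* to \2+ in the first pattern to require at least two residues.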
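To check the behaviour on several inputs, here is a sketch of a hypothetical helper, flanked_homopolymer, that requires the run residue to differ from the flank (via a negative lookahead) and every residue of the run to be the same (via a backreference). It assumes Perl 5.10 or later for the // operator. Mixed runs such as HTGH, and stretches with no distinct run such as HHHH, are rejected.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first stretch in $seq that is a run of one
# residue flanked on both sides by a different residue (e.g. HTTTH),
# or undef if there is none.
#   (.)        $2: the flanking residue
#   ((?!\2).)  $3: first residue of the run; the lookahead forbids the flank
#   \3*        the rest of the run: further copies of the same residue
#   \2         the closing flank, identical to the opening one
sub flanked_homopolymer {
    my ($seq) = @_;
    return $seq =~ /((.)((?!\2).)\3*\2)/ ? $1 : undef;
}

for my $s (qw(HTTTTTTTTTTH TGGGGGGGGGGGT HTH HTGH)) {
    my $hit = flanked_homopolymer($s) // 'no match';
    print "$s: $hit\n";
}
```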


