MrBayes

From BioPerl

MrBayes is software for the Bayesian estimation of phylogeny. The program can be obtained from the MrBayes web site.

Running MrBayes

The program takes as input a multiple alignment in NEXUS format, followed by a block of commands for MrBayes. The following example is a slice of data from Jason Stajich's research: an alignment of 50 positions from proteins of 47 taxa. The outgroup is set to atha_gbk, and the prior on the protein model is set to mixed, so that several amino-acid rate matrices are considered rather than a single fixed one. The MCMC search runs for up to 1,000,000 generations, but stoprule=yes lets it stop early if the runs converge before then. Two independent runs (nrun=2) of two chains each (nchain=2) give four chains in total, so four CPUs are needed to run one chain per CPU.

#NEXUS
begin data;
dimensions ntax=47 nchar=50;
format interleave datatype=protein gap=- ;

matrix
sklu_AUG            RLKYALNGRE VKAIMMQRHV KVDGKVRTDT TYPAGFMDVI TLEATNENFR 
sscl_GLEAN          RLKYALNSRE TKAILMQRLI KVDGKVRTDA TYPAGFMDVI GIEKTSENFR 
rory_SNAP           RLKYALNGRE VQSILMQRLV KVDGKVRTDS TFPAGFMDVI SVEKTGENFR 
ddis_gbk            RLKYALTKKE VTLILMQRLV KVDGKVRTDP NYPAGFMDVI SIEKTKENFR 
calb_AUG            RLKYALNGRE VKAIMMQQHV QVDGKVRTDT TYPAGFMDVI TLEATNEHFR 
sbay_AUG            RLKYALNGRE VKAILMQRHV KVDGKVRTDT TYPAGFMDVI TLDATNENFR 
scer_yjm78          RLKYALNGRE VKAILMQRHV KVDGKVRTDT TYPAGFMDVI TLDATNENFR 
scas_AUG            RLKYALNGRE VKAILMQRHV KVDGKVRTDT TYPTGFMDVI TLDATNENFR 
cimm_AUG            RLKYALNGRE TNAILMQRLV KVDGKVRTDA TYPAGFMDVI SIEKTGENFR 
cneo_R265           RLKYALTGRE VTAIVKQRLI KVDGKVRTDE TFPAGFMDVI SIERSGEHFR 
fver_GLEAN          RLKYALNYRE TKAILMQRLV KVDGKVRTDS TYPSGFMDVI TIEKTGENFR 
ylip_GENO           RLKYALNGRE VNAILMQRLV KVDGKVRTDS TFPAGFMDVI QLEKTGENFR 
skud_AUG            RLKYALNGRE VKAILMQRHV KVDGKVRTDT TYPAGFMDVI TLDATNENFR 
scer_rm11           RLKYALNGRE VKAILMQRHV KVDGKVRTDT TYPAGFMDVI TLDATNENFR 
pchr_GLEAN          RLKYALTGKE VLSIVMQRLI KVDNKVRTDP TYPAGFMDVI TIEKSGEHFR 
umay_BRD            RLKYALTGRE VNAITAQRLI KIDGKVRTDP TYPTGFQDVV SIEKSGEHFR 
ater_GLEAN          RLKYALNGRE TKAIMMQRLI KVDGKVRTDP TYPAGFMDVI GIEKTGENFR 
cneo_WM276          RLKYALTGRE VTAIVKQRLI KVDGKVRTDE TFPAGFMDVI SIERSGEHFR 
cneo_H99            RLKYALTGRE VTAIVKQRLI KVDGKVRTDE TFPAGFMDVI SIERSGEHFR 
ctro_AUG            RLKYALNGRE VKAIMMQQHV QVDGKVRTDS TYPAGFMDVI TLEATNEHFR 
spom_SANG           RLKYALNGRE VKAILMQRLI KVDGKVRTDS TFPTGFMDVI SVEKTGEHFR 
cdub_AUG            RLKYALNGRE VKAIMMQQHV QVDGKVRTDT TYPAGFMDVI TLEATNEHFR 
hsap_ens            RLKYALTGDE VKKICMQRFI KIDGKVRTDI TYPAGFMDVI SIDKTGENFR 
snod_BRD            RLKYALNARE VNAILMQRLV KVDGKVRTDS TFPSGLMDVI SIEKTGENFR 
klac_GENO           RLKYALNGRE VKAILMQRHV KVDGKVRTDT TFPAGFMDVI TLEATNENFR 
scer_s288c          RLKYALNGRE VKAILMQRHV KVDGKVRTDT TYPAGFMDVI TLDATNENFR 
cgla_GENO           RLKYALNGRE VKAIMMQRHV KVDGKVRTDA TYPAGFMDVI TLEATNENFR 
uree_GLEAN          RLKYALNGRE TNAILMQRLV KVDGKVRTDS TFPTGFMDVI SIEKTGENFR 
crei_jgi            RLKYALTGKE VQSILMQRLV KVDGKVRTDH TYPTGFMDVI SMEKTDENFR 
kwal_AUG            RLKYALNGRE VRAIMMQRHV KVDGKVRTDI TYPAGFMDVI TLEATNENFR 
ncra_BRD            RLKYALNYRE TKAIMMQRLI KVDGKVRTDI TYPAGFMDVI TIEKTGENFR 
smik_AUG            RLKYALNGRE VKAILMQRHV KVDGKVRTDT TYPAGFMDVI TLDATNENFR 
spar_AUG            RLKYALNGRE VKAILMQRHV KVDGKVRTDT TYPAGFMDVI TLDATNENFR 
hcap_186R           RLKYALNARE TNAILMQRLV KVDGKVRTDS TYPTGFMDVI TIDKTGENFR 
anid_BRD            RLKYALNGRE TKAIMMQRLI QVDGKVRTDP TYPAGFMDVI TIEKTGENFR 
fgra_BRD            RLKYALNYRE VKAILMQRLV KVDGKVRTDS TFPSGFMDVI TIEKTGENFR 
dhan_GENO           RLKYALNGRE VKAILMQEHV KVDGKVRTDA TFPAGFMDVI TLEATNEHFR 
cgui_AUG            RLKYALNGRE VKAILMQEHV KVDGKVRTDS TFPAGFMDVI TLEATNEHFR 
pans_AUG            RLKYALNFRE TRAILMQRLV KVDGKVRTDM TYPAGFMDVI SIEKTGENFR 
tree_jgi            RLKYALNYRE TKAIMMQRLV KVDAKVRTDI TYPAGFMDVI TIEKTGENFR 
cneo_JEC21          RLKYALTGRE VTAIVKQRLI KVDGKVRTDE TFPAGFMDVI SIERSGEHFR 
cglo_BRD            RLKYALNYRE TKAIMMQRLV KVDGKVRTDV TYPAGFMDVI TIEKTGENFR 
mgri_BRD            RLKYALNGRE TKAILMQRLV KVDGKVRTDS TYPAGFMDVV SIEKTGENFR 
agos_GBK            RLKYALNGRE VKAILMQRHV KVDGKVRTDT TYPAGFMDVI TLEATNENFR 
clus_AUG            RLKYALNGRE VKAILMQEHV KVDGKVRTDS TYPAGFMDVI TLEATNENFR 
atha_gbk            RLKYALTYRE VISILMQRHI QVDGKVRTDK TYPAGFMDVV SIPKTNENFR 
afum_GLEAN          RLKYALNGRE TKAIMMQRLI KVDGKVRTDP TYPAGFMDVI SIEKTGENFR 
;
end;
begin mrbayes;
 outgroup atha_gbk;
 prset aamodelpr=mixed;
 mcmc ngen=1000000 stoprule=yes nchain=2 nrun=2;
end;
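The MrBayes block above is small enough to generate from a handful of parameters, which is the first thing a wrapper would need to do. The following Python sketch rebuilds the block used in this example and counts the chains. The function names `mrbayes_block` and `total_chains` are hypothetical; they are not part of BioPerl or MrBayes. The check that the stop rule needs at least two runs reflects the assumption that MrBayes bases its stop rule on comparing split frequencies across independent runs.

```python
def mrbayes_block(outgroup, aamodelpr="mixed", ngen=1_000_000,
                  stoprule=True, nchain=2, nrun=2):
    """Return a MrBayes command block as text (hypothetical helper)."""
    # Assumption: the stop rule compares split frequencies between
    # independent runs, so it is only meaningful with two or more runs.
    if stoprule and nrun < 2:
        raise ValueError("stoprule=yes needs at least two runs (nrun >= 2)")
    lines = [
        "begin mrbayes;",
        f" outgroup {outgroup};",
        f" prset aamodelpr={aamodelpr};",
        f" mcmc ngen={ngen} stoprule={'yes' if stoprule else 'no'} "
        f"nchain={nchain} nrun={nrun};",
        "end;",
    ]
    return "\n".join(lines) + "\n"


def total_chains(nchain, nrun):
    """Each of the nrun independent runs has nchain chains."""
    return nchain * nrun


if __name__ == "__main__":
    # Reproduces the block shown above for the 47-taxon example.
    print(mrbayes_block("atha_gbk"), end="")
    print("chains in total:", total_chains(2, 2))
```

Run as a script, this prints the same five command lines as the example, followed by `chains in total: 4`. This matches the four CPUs mentioned above.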

Supporting this in a Run wrapper

BioPerl does not currently support MrBayes directly. It can write out suitable NEXUS alignment files as input, but, just as with PAUP, the user still has to add the program block. The Run package does not support this application either. In practice MrBayes is easy to pipeline on a cluster: all it needs is the input NEXUS file containing the alignment and the program block. A run wrapper would therefore only have to generate this file and start the program with

mb -i FILENAME.nex
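The wrapper described above has only two jobs: join the alignment and the program block into one file, then start `mb` on it. The following Python sketch shows one way to do this. It assumes the NEXUS data block already exists as text (for example, written out by BioPerl's Bio::AlignIO with the nexus format) and that an `mb` executable is on the PATH. The function names are hypothetical.

```python
import subprocess
import tempfile
from pathlib import Path


def write_mrbayes_input(alignment_nexus, mrbayes_block, out_path):
    """Append a MrBayes command block to NEXUS alignment text and save it.

    Both arguments are assumed to be well-formed NEXUS text already.
    """
    text = alignment_nexus.rstrip("\n") + "\n" + mrbayes_block
    path = Path(out_path)
    path.write_text(text)
    return path


def mrbayes_command(nexus_path, executable="mb"):
    """Build the command line from this page: mb -i FILENAME.nex"""
    return [executable, "-i", str(nexus_path)]


def run_mrbayes(nexus_path, executable="mb"):
    """Start MrBayes, wait for it, and raise CalledProcessError on failure."""
    # stdin is closed so a batch wrapper is never left waiting on keyboard input.
    return subprocess.run(mrbayes_command(nexus_path, executable),
                          stdin=subprocess.DEVNULL, check=True)


if __name__ == "__main__":
    data = "#NEXUS\nbegin data;\n dimensions ntax=2 nchar=4;\nend;\n"
    block = "begin mrbayes;\n outgroup a;\nend;\n"
    with tempfile.TemporaryDirectory() as tmp:
        nex = write_mrbayes_input(data, block, Path(tmp) / "example.nex")
        print(mrbayes_command(nex))  # run_mrbayes(nex) would start MrBayes
```

Building the command separately from running it lets the same wrapper print or log the command, or hand it to a scheduler, without starting MrBayes locally.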

Things get more complicated when you want to submit MPI jobs on a cluster, so it sometimes makes more sense to write a script that generates the job files for a set of input alignments. This makes it hard to write a single generic solution that handles cluster jobs (LSF, PBS, SGE) as well as single-CPU jobs.
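The job-file approach described above can be sketched in Python as a generator that writes one batch script per input alignment. The script uses `mpirun -np N` when more than one CPU is requested, and a plain `mb` call otherwise.

Several parts of this sketch are assumptions to adapt to a particular site:

- The directive templates below are common PBS, SGE and LSF forms, but resource syntax varies between sites. For example, the SGE parallel-environment name `mpi` is site-specific.
- The MPI-enabled MrBayes binary is assumed to be called `mb`.
- For unattended runs, the MrBayes manual suggests commands such as `set autoclose=yes nowarn=yes` in the MrBayes block. Check the manual for your version.

```python
import shlex
import tempfile
from pathlib import Path

# Illustrative scheduler headers; adjust resource requests for your site.
HEADERS = {
    "pbs": "#PBS -N {name}\n#PBS -l nodes=1:ppn={ncpu}\ncd \"$PBS_O_WORKDIR\"\n",
    "sge": "#$ -N {name}\n#$ -pe mpi {ncpu}\n#$ -cwd\n",
    "lsf": "#BSUB -J {name}\n#BSUB -n {ncpu}\n",
}


def job_script(nexus_file, scheduler="pbs", ncpu=4, mb="mb"):
    """Return the text of one batch job that runs MrBayes on nexus_file."""
    name = Path(nexus_file).stem
    header = HEADERS[scheduler].format(name=name, ncpu=ncpu)
    target = shlex.quote(str(nexus_file))
    if ncpu > 1:
        # One MPI process per chain, e.g. 4 for nrun=2 nchain=2.
        run = f"mpirun -np {ncpu} {mb} {target}\n"
    else:
        run = f"{mb} {target}\n"
    return "#!/bin/sh\n" + header + run


def write_job_files(nexus_files, outdir, **kwargs):
    """Write one <alignment>.job file per NEXUS input; return their paths."""
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    written = []
    for nex in nexus_files:
        job = outdir / (Path(nex).stem + ".job")
        job.write_text(job_script(nex, **kwargs))
        written.append(job)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for job in write_job_files(["gene1.nex", "gene2.nex"], tmp,
                                   scheduler="sge"):
            print(job.name)
            print(job.read_text())
```

Keeping the scheduler-specific lines in a small table is one way to cover LSF, PBS and SGE with the same code. A single-CPU job is just the `ncpu=1` case of the same generator.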
