ListSummary:April 1-11,2006

From BioPerl


April 1 to April 11

It started out as a quiet beginning to the Bioperl list summaries. I actually wanted to use the image of tumbleweeds rolling across the desert plains in Texas, my home state, and no, before you ask, I’m not a fan of our current president. But my hopes were dashed; it ended with a flurry of activity. In case people ask, ’YT’ is "yours truly"; I feel a bit odd inserting my name in these summaries, even occasionally. Low self-esteem? I don’t know; I’m a biologist, not a shrink. I’ll let you be the judge. Anyway, I’ll probably make these little reports biweekly (that’s every other week, as opposed to ’semiweekly,’ but I digress). That’s to avoid upsetting the PI and to allow myself some time to do a few other things, like get out of the lab, talk to the wife, grab a beer, etc. Oh yeah, and write some Perl. If anyone wants to trade off every other week, let me know. First up…


BOSC 2006

Okay, okay. Technically this was relayed on the list in late March, but it IS big enough to be included regardless. Darin London has posted the official announcement for BOSC 2006.

BOSC 2006 will be held by the Open Bioinformatics Foundation on August 4-5 in Fortaleza, Brasil as a Special Interest Group (SIG) meeting at the 14th International Conference on Intelligent Systems for Molecular Biology. Consult The Official BOSC 2006 Website for more information:

The BOSC weblog:

And the EventDB calendar (for ICAL-compatible calendars) is here:

And now on to the list traffic…


Bio::SeqIO::genbank confusions

Scott Markel reported that tag values of zero in Bio::Annotation::Simple objects were not parsed correctly and were written out as an empty string. He then added a test case and script to prove his point. The confusion started when YT decided to try the script on Windows (oops #1), found that it gave a different error (oops #2), and, thinking the two were linked (oops #3), committed Scott’s recommended fix. The test case data (a small GenBank file) was off by one space in the feature line, which caused one error, while a fix made in Bio::Annotation::Simple by Heikki around December 2005 had already taken care of Scott’s original error (aha! someone didn’t update from CVS). Everything was right with the world again, except that Scott’s reported (and now redundant) fix had been committed to CVS. YT corrected the ’correction’ (thanks Hilmar) and there was much scratching of heads… Confused? Me too.

The fun starts here:

Bio::SearchIO::psl issues

Albert Smith reported via Bugzilla and the main list that SearchIO had an issue parsing PSL-formatted files from WebBlat. We (Albert and YT) were finally able to replicate the error, but not without some perseverance. This ended up being a case where inserting the ’-w’ flag helped to spot a not-so-obvious error. Fixes were made and people were joyous (or at least Albert was…)

PAML 3.15 now works

Jason updated Bio::Tools::Phylo::PAML so that it now parses PAML 3.15 output. Seems the programmers of PAML like changing output almost as much as NCBI does with BLAST…

Lesson: Don’t try parsing ’very large’ BLAST output at home…it can kill (processes)

A report was made that parsing a standalone BLAST report (the file was ’very large’) caused processor usage to spike to 99.9% before ending in a killed process message. Jason pointed out that the file was likely just too large. It turns out that ’very large’ is approximately 215 MB. Didn’t know that…

Getting sequences by ID

Yuval asked how to pull only a small group of IDs out of a large set of sequences (20,000) and write them in fasta format in a reasonable amount of time. Torsten and Ryan chipped in with their suggestions (flat DB indexing sounded good); Yuval came up with his own. Amir Karger also chipped in with a Perl one-liner that’s not Bioperl-related (blasphemy!!!)…
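For anyone who wants to see the general idea, here is a minimal sketch of one way to do it: stream through the file once and keep only the records whose ID is in a lookup set. This is not any of the solutions posted on the list, and it's plain Python with no Bioperl in sight (more blasphemy). The function name filter_fasta and the demo data are made up for illustration, and it assumes the ID is the first whitespace-delimited word of the header line.

```python
import io


def filter_fasta(handle, wanted_ids):
    """Yield (header, sequence) for FASTA records whose ID is in wanted_ids.

    Hypothetical helper: the ID is assumed to be the first
    whitespace-delimited token on the '>' header line.
    """
    wanted = set(wanted_ids)          # set membership tests are O(1) on average
    header, chunks, keep = None, [], False
    for line in handle:
        line = line.rstrip()
        if line.startswith(">"):
            # A new record starts: emit the previous one if we were keeping it.
            if keep:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
            fields = header.split()
            keep = bool(fields) and fields[0] in wanted
        elif keep:
            # Sequence lines are only stored for records we want.
            chunks.append(line)
    # Emit the final record, if wanted.
    if keep:
        yield header, "".join(chunks)


# Made-up demo data standing in for a real 20,000-sequence file.
demo = io.StringIO(">seq1 first\nACGT\nAC\n>seq2\nGGGG\n>seq3 third\nTTTT\n")
hits = list(filter_fasta(demo, ["seq1", "seq3"]))
for header, seq in hits:
    print(f">{header}\n{seq}")
```

A single pass like this is fine for a one-off extraction; if the same file gets queried over and over, building an index first (the flat DB indexing route) is probably the better trade.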

Bio::Tools::RestrictionEnzyme and Bio::DB::Fasta issues

Nick Staffa wanted to know what’s up with the cut_seq method in Bio::Tools::RestrictionEnzyme. Turns out that module is deprecated. Brian O. fixed that in the documentation and YT pointed this out. Then Nick found that someone forgot to implement the is_circular method, which Brian fixed in Bio::DB::Fasta…

Adding GAPOPEN to Muscle (that just sounds bad…)

Jordan Swanson suggested adding GAPOPEN to the bioperl-run wrapper module for the muscle multiple alignment program, which Albert Vilella gladly added as a parameter…

Use SearchIO::blast, not Bio::Tools::BPLite!

Sonmitra Mondal wanted to know why he got a ’bad gateway’ error using his script and what was going on with hsp->sbjctseq. YT pointed out that the ’bad gateway’ error sometimes happens at NCBI during peak hours, but that a bigger problem was his use of the sbjctseq method, which is from the deprecated BPLite module. Brian corrected the current documentation to reflect that…

How to trap warnings with eval{} blocks

Albert Vilella had a question about how to trap a Bioperlish warning in an eval{} block. YT and Heikki had a few suggestions on how to do this…

How do you retrieve RefSeq seqfeatures?

A question was posed on how to retrieve sequence features from RefSeq sequences. YT gives his thoughts: basically prevent redirection from Bio::DB::GenBank objects and hope for the best…

An Interpretation of Percentage_Identity

Jason relays what the differences are between percentage identity, average percentage identity, and overall percentage identity. Don’t worry, there’s not a quiz on this…

GFF3 validator(s)

Lincoln points out the existence of a GFF3 validator; Andrew Dalke (BioPython) points out his own…using Python? Blasphemy!!!

Coloring with GFF2

Marco Blanchette wants to know how to display alternatively-spliced exons in a different color, and Scott Cain gives some pointers.

Marco then asked another question, how to display binding sites colored according to their score, thus further proving he’s the reincarnation of Bob "Happy Little Trees" Ross. Okay, if you’re not American or didn’t watch PBS in the ’80s and ’90s, that probably flew over your head. The saga continues…

Humanely Slicing Alignments

Iain Wallace tries to figure out if there is a way to take a slice out of an alignment when one of the sequences doesn’t have a residue in that position. He keeps getting an error. Brian O. offers an undocumented solution, then documents it…

Primers and Sequences

Kevin Victor wants to know if there is a way to search for a primer sequence pair for a long sequence in batch mode. As rightly pointed out by Donald Jackson, there are probably better ways of doing this, namely EPCR…

Orphans and Leftovers

’Orphans’ are those questions which haven’t been answered but probably should be addressed. ’Leftovers’ are bits which probably don’t have anything to do with Bioperl but may be of interest.

Orphan #1 : Nathan Haigh wonders if there is a way to calculate nt frequencies of 4-fold degenerate sites in coding sequences in Bioperl, but says it’s not an emergency. Therefore, we treat it like it’s not…

Orphan #2 : Georg Otto has problems using standalone blast with the preformatted GenBank database…

Leftover #1 : Sunil posted a link to a new tool, MeltDNA, which is a Perl-based tool for predicting DNA duplex hybridization and thermodynamics. It was wondered aloud where a publication describing this might be found, and Torsten had the answer…


Tumbleweeds flowing across the plains (I had to use this somewhere)… no posts.

Bioperl-guts (for the die-hards)

Lincoln Stein has posted a slew of new and updated modules in relation to GFF3 databases, along with updated tests and a script.

New modules:

Updated modules:

New scripts:

  • bp_seqfeature_load.PLS

Hilmar Lapp also added a couple of extra Bio::SeqIO modules for comma-delimited tables and Excel workbooks, along with tests:

Other updates and bug fixes

Minor updates:


For suggestions, errors, gripes, etc., please post on the Talk page here.

See you on the 25th...
