Bugs

Bug submission

BioPerl has switched (as of mid-April, 2014) to GitHub Issues. We still have access to the Redmine-based tracking system, but this will be effectively read-only. We no longer support access to the original Bugzilla instance.

Please submit bugs or enhancement requests to BioPerl GitHub Issues. The older BioPerl Redmine tracking system remains available but will no longer be used, and the oldest, Bugzilla-based system is unsupported; adding new bugs there has been disabled.

We really do want you to submit bugs, even though it means more work for us! A bug means there is something we didn't think of or test in a particular module, and we won't know about it unless you tell us. The Redmine system requires you to register, both to avoid spam and to allow us to contact you when the bug is fixed or when we need to clarify the problem and its solutions.

It is important that you record the version of BioPerl you are running (if you don't know how to find it, see the FAQ, where the question is addressed). You can also include the version of Perl you are using and the operating system you are running.
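
One way to gather these details in a form you can paste into a report is sketched below. It assumes the installed BioPerl provides Bio::Root::Version; the variable names and fallback messages are only illustrative.

```shell
#!/bin/sh
# Collect the details a BioPerl bug report should mention.
# Each value falls back to a placeholder when the tool or module is missing.

# BioPerl version, read from Bio::Root::Version (assumed to be installed).
bioperl_version=$(perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION' 2>/dev/null) \
    || bioperl_version="unknown (Bio::Root::Version not found)"

# Perl interpreter version, printed in dotted form.
perl_version=$(perl -e 'printf "%vd", $^V' 2>/dev/null) \
    || perl_version="unknown"

# Operating system name and release.
os_info=$(uname -sr 2>/dev/null) \
    || os_info="unknown"

printf 'BioPerl: %s\nPerl:    %s\nOS:      %s\n' \
    "$bioperl_version" "$perl_version" "$os_info"
```

The three printed lines can be pasted at the top of the issue description.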

Simon Tatham has a great resource on how to effectively report bugs.

Submitting Bugs

When submitting a new bug to BioPerl on GitHub, enter a brief description and other general information. You can use Markdown to add links, syntax highlighting, and so on; see the GitHub Markdown docs for more.

You can paste example code into the description, but we suggest submitting it as a GitHub Gist. If you have example fixes, we strongly suggest using the tools GitHub has in place, namely the ability to fork the code and create a pull request with the relevant fix. The pull request will show up as an issue automatically, so there is no need to file one separately.

Note that attachments on GitHub issues only work for images. If the example is a text file then use a GitHub Gist; alternatively, if the file is something available publicly then please provide a link to the file.

Submitting Patches

We gladly welcome patches. Patches for BioPerl code should be created as described in the SubmitPatch HOWTO. Try to ensure the patch is made against the latest code checked out from Git, particularly if the patch is large.

Briefly, you can generate the patch using the following command:

diff -u old new

For best results, follow this example:

cd $YOUR_WORKING_DIST_DIRECTORY/Bio/Frobnicator
git pull
git diff GrokFrobnicator.pm > my-patch.dif
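
If you have not produced a unified diff before, the throwaway sketch below shows what `diff -u` emits for a one-line change. The file names and contents are hypothetical and exist only for the demonstration.

```shell
#!/bin/sh
# Throwaway demonstration of a unified diff; all file names are hypothetical.
tmpdir=$(mktemp -d)

# An "old" and a "new" version of the same file, differing in one line.
printf 'sub frobnicate {\n    return 1;\n}\n' > "$tmpdir/Frob.pm.orig"
printf 'sub frobnicate {\n    return 2;\n}\n' > "$tmpdir/Frob.pm"

# diff exits with status 1 when the files differ, which is the expected
# case here; a status above 1 would indicate a real failure.
status=0
patch_text=$(diff -u "$tmpdir/Frob.pm.orig" "$tmpdir/Frob.pm") || status=$?

if [ "$status" -gt 1 ]; then
    echo "diff failed with status $status" >&2
else
    # Removed lines start with "-", added lines with "+".
    printf '%s\n' "$patch_text"
fi

rm -rf "$tmpdir"
```

The header lines naming the two files are followed by a hunk in which the old line is prefixed with `-` and the new line with `+`; this is the format reviewers expect in a patch.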

We also accept patches as an issue on GitHub; submit the patch as a GitHub Gist. Even better, submit a pull request on GitHub. It's also worth discussing these on the mailing list.

Submitting New Modules and Code Snippets

We also accept new code, either as full-fledged modules or as snippets of code (snippets work better as a patch). New code must include documentation, example code (typically listed in a SYNOPSIS section), and tests with decent test coverage (following the testing standards in our Writing_BioPerl_Tests HOWTO). Because we are moving to a more modular scheme for future BioPerl installations, we strongly suggest submitting modules individually to CPAN, primarily to help lower the barrier to submitting bug fixes.

Etiquette

Good bug reports provide a small amount of code and the test files necessary to reproduce your bug. By doing this work up front, you ensure the developer spends most of his or her time actually working on the problem. Pasting your entire 600-line program into the comment buffer is probably not going to get an enthusiastic response. In addition, isolating the problem down to a small amount of your code will help ensure that the bug is not on your end before we dive in and start working on it.

Open Issues (GitHub)

This list details the open bugs on GitHub for BioPerl.

Issue 96: Speed up Bio::DB::Fasta using compiled regular expressions or Inline::C
* If available, use [Inline::C](https://metacpan.org/pod/Inline::C)
* As a fallback, using compiled regexps in *s///* speeds up *subseq()* by about 7% from 7.76s to 7.22s over 32358 calls on Variant Effect Prediction data. See my [gist benchmarks](https://gist.github.com/rocky/61f929d58a286189a758)
* Compile regular expression for *compound_id* in *Bio::DB::IndexedBase*

Issue 95: Speeding up DB::Fasta::subseq
This follows Pull request #94. The background is that, when running Variant Effect Prediction, a significant portion of time is spent stripping carriage returns and line feeds in the subseq method via:

$data =~ s/\n//g; $data =~ s/\r//g

Compiling the match portion of the regular expression outside of the function helps, but doing it in C is better. I hope [this commit](https://github.com/rocky/bioperl-live/commit/65e599b126d1598b594ecaca53777382686d0084) will be acceptable; if not, it should serve as a concrete basis for discussing further work. In that commit, in the [Inline-C-Fasta](https://github.com/rocky/bioperl-live/tree/Inline-C-Fasta) branch of my fork, if the module Inline::C is available, it compiles and uses a C function that overrides a pure-Perl function of the same name. See also these [benchmarks](https://gist.github.com/rocky/61f929d58a286189a758).

Issue 91: Support FTS attribute table and WITHOUT ROWID optimization for Bio::DB::SeqFeature::Store::DBI::SQLite
Several of our GBrowse instances had issues with full-text (attribute) searches timing out. Profiling revealed that the execution time of Bio::DB::SeqFeature::Store::search_attributes on our Bio::DB::SeqFeature::Store::DBI::SQLite databases contributed significantly to this problem. The proposed changes, which add support for indexing the attribute table using SQLite's FTS (full-text search) extension, resolved the issue (when used in conjunction with Scott Cain's recent commit that removed the use of CGI::Pretty in GBrowse: https://github.com/GMOD/GBrowse/commit/298cab3ece8f68b7baf2e9d085fab9055772fa3c).

To demonstrate the performance impact of an FTS attribute table on search_attributes(), consider the following script:

```
# search_attributes.pl
use strict;
use warnings;
use Bio::DB::SeqFeature::Store;

my $db = Bio::DB::SeqFeature::Store->new(-adaptor => 'DBI::SQLite', -dsn => $ARGV[0]);
my @features = $db->search_attributes($ARGV[1], ['arabidopsis_defline', 'arabidopsis_symbol', 'pfam', 'go', 'panther', 'kegg_enzyme', 'kegg_orthology', 'cog_cluster']);
print 'Features: ' . scalar(@features) . "\n";
```

Given a Bio::DB::SeqFeature::Store::DBI::SQLite database with gene models & annotation created from a GFF3 file with 692,300 features, the following were typical observed execution times in our environment:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
$ time perl search_attributes.pl track-orig.db iron
Features: 1283

real 0m1.61s
user 0m0.71s
sys 0m0.89s

$ time perl search_attributes.pl track-fts.db iron
Features: 1280

real 0m0.27s
user 0m0.20s
sys 0m0.06s
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The difference in the number of matches is due to the difference in behavior between the two methods: FTS (with the MATCH operator) searches for tokens, while LIKE '%iron%' finds substrings. The extra three results returned with "LIKE '%iron%'" consisted of a spurious match containing "iron" as a substring:

    acclimation of photosynthesis to environment

and two occurrences of "diiron", which may be relevant to a user:

    dicarboxylate diiron protein, putative (Crd1)

OTOH, if the user searches for "Fe" instead, they get "real" hits with an FTS attribute table, whereas the non-FTS search returns thousands of spurious hits where "fe" is a substring. Because of this difference in behavior (and possible portability issues on systems with old DBD::SQLite instances; see below), I thought that FTS should be opt-in rather than the default.

Also, note that FTS support depends on the version of DBD::SQLite. The current DBD::SQLite by default supports two versions: FTS3 since sometime before 1.30_04 (2010-08-25), and FTS4 since 1.36_01 (2012-01-19).

At least one design decision I made while implementing this change should be considered/debated before accepting this pull request: the -fts option is just a boolean flag. My initial implementation supported the creation of an FTS attribute table using a user-specified FTS version, but at the last minute I decided to KISS and just use the most recent version supported by the installed DBD::SQLite. This isn't a problem if someone decides to implement an FTS attribute table for MySQL, which supports only one such index type (FULLTEXT: http://dev.mysql.com/doc/refman/5.6/en/fulltext-search.html). However, it's conceivable that one might want to implement FTS for PostgreSQL and have control over whether GIN or GiST indexing is used (http://www.postgresql.org/docs/9.3/static/textsearch-indexes.html), or, with SQLite, specify FTS3 (instead of the most recent FTS4) at database creation time to allow its use on a host with an older DBD::SQLite.

Issue 87: Issues with bioperl versioning
Blocker on new 1.6/1.7 releases, see: https://github.com/andk/pause/issues/75

Issue 86: Mismatched output in bp_taxid4species
The bp_taxid4species script produces incorrect output.

```bash
$ bp_taxid4species "Arabidopsis thaliana" "Homo sapiens"
"arabidopsis thaliana[All Names]", 9606
"homo sapiens[All Names]", 3702
```

However, the taxon id of A. thaliana should be 3702 and that of humans should be 9606. The bug stems from a false assumption about Entrez output, as shown below:

```bash
$ base='http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi'
$ term='Arabidopsis+thaliana+OR+Glycine+max'
$ wget -qO /dev/stdout "${base}?db=taxonomy&term=${term}" | xmlstarlet fo
2 2 0
3847 3702
arabidopsis thaliana[All Names] All Names 1 N
glycine max[All Names] All Names 1 N
OR
arabidopsis thaliana[All Names] OR glycine max[All Names]
```

Here I am using the same Entrez query used in bp_taxid4species. The ids in IdList and the search terms in the output are not in the same order. There is simply no way to map scientific names to taxon ids using this XML output.

One option would be to query Entrez once for each species name. However, this is slow. A better option would be to extract the ids in IdList and then perform a second Entrez query using the efetch or esummary utilities against the taxonomy database.

Issue 83: GenBank parsing CONTIG issues
See: http://mailman.open-bio.org/pipermail/bioperl-l/2014-September/088945.html Basically, this works with the August patch for GenBank parsing (so the bug isn't there) but some change since 1.6.922 has caused parsing to slow dramatically. We'll need to bisect this.

Issue 79: Added script to extract DNA sequences (as well as 5' or 3' regions if specified) from a FASTA file using a BLAST output file
Added a personal script to extract a DNA sequence from a FASTA file using a BLAST output file. It expects at least two arguments: the BLAST file and the FASTA file. There are a number of optional arguments that are explained in the script. This script is especially useful when trying to extract sequences with variance (hence the BLAST search beforehand) from FASTA files. For example, say that you are trying to extract a given gene and the 2000 base pairs 5' to it from 20 different genomes, but all you have is one gene sequence. By doing a BLAST search between each of the genomes and the gene and then using this script, you can extract the sequences that you are interested in. The script also has options to extract a specified 3' or 5' sequence from the FASTA file, as well as an e-value cutoff. The final output is the extracted sequence in FASTA format. Is this useful/generic enough to be included in the scripts directory? The script is well tested and takes command-line arguments.

Issue 61: new modules Bio::Tools::Alignment::Overview and Bio::DB::NextProt
Some time ago I made the mistake of uploading to CPAN a module called Bio::Tools::Alignment::Overview. The module wasn't associated with BioPerl, but for some reason I decided to use the namespace anyway (yeah, I know...). Now I'm organizing some projects and I decided to include the module in BioPerl, so if you accept this pull request I will remove the current module from PAUSE/CPAN, so that you can upload it under BioPerl. Sorry for the noobish mistake =) Cheers.

Issue 46: Simplealign
Hi, Chris, This is my implementation of Bio::SimpleAlign. I have a detailed report describing the improvement to the code, and explaining the failed tests using t/Align/SimpleAlign.t. If you need any help, please just let me know. Cheers, Jun
