Ten recommendations for creating usable bioinformatics command line software
© Seemann; licensee BioMed Central Ltd. 2013
Received: 21 October 2013
Accepted: 11 November 2013
Published: 13 November 2013
Bioinformatics software varies greatly in quality. In terms of usability, the command line interface is the first experience a user will have of a tool. Unfortunately, this is often also the last time a tool will be used. Here I present ten recommendations for authors of command line software to follow, which I believe would greatly improve the uptake and usability of their products, waste less of users’ time, and improve the quality of scientific analyses.
Keywords: Bioinformatics software, Software quality, User interface, Unix, Tools
New bioinformatics tools are released and published every day, and most are designed for the Unix command line. Leaving aside the important issue of algorithmic correctness, the first barrier to community uptake of a bioinformatics tool is the usability of its command line interface. In this author’s experience, the majority of these tools fail basic requirements of usability. One way to address this is a list of minimum standards for command line scientific software, which would help software authors and reviewers improve the average usability of released tools.
I have used and installed a lot of bioinformatics software over the last 12 years, and I have also released a lot of my own software, which I try to make as painless to use as possible. From these experiences, I present ten recommendations for bioinformatics software, using the fictitious “BioTool” project as an example.
1. Print something if no parameters are supplied
2. Always have a “-h” or “--help” switch
3. Have a “-v” or “--version” switch
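Recommendations 1 to 3 can be illustrated together. The following is a minimal sketch in Python using the standard argparse module, not a prescribed implementation: the program name follows the fictitious BioTool example, while the version string, the `input` argument and the hint text are assumptions made up for illustration.

```python
import argparse
import sys

VERSION = "0.1.0"  # hypothetical version number for the fictitious BioTool


def build_parser():
    parser = argparse.ArgumentParser(
        prog="biotool",
        description="Fictitious example tool; all options are illustrative.",
    )
    # argparse adds -h/--help automatically; the version switch is explicit.
    parser.add_argument("-v", "--version", action="version",
                        version=f"%(prog)s {VERSION}")
    parser.add_argument("input", nargs="?", help="input file (hypothetical)")
    return parser


def main(argv):
    parser = build_parser()
    if not argv:
        # No parameters at all: print a short usage hint instead of
        # sitting silently or starting work on nothing.
        parser.print_usage(sys.stderr)
        print("Try 'biotool --help' for more information.", file=sys.stderr)
        return 2
    args = parser.parse_args(argv)
    print(f"would process {args.input}")
    return 0
```

In a real script the last line would typically be `sys.exit(main(sys.argv[1:]))`, so that the return value becomes the exit status of the process.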
4. Do not use stdout for messages and errors
5. Always raise an error if something goes wrong
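Recommendations 4 and 5 are closely related, so one sketch can cover both. The example below is an illustration under assumptions: the record-counting task, the `log` helper and the specific exit codes 1 and 2 are hypothetical choices, not taken from the article. The point is only that diagnostics go to stderr, results go to stdout, and failures produce a non-zero exit status.

```python
import sys


def log(message):
    """Write a progress or diagnostic message to stderr, keeping stdout clean."""
    print(f"[biotool] {message}", file=sys.stderr)


def count_records(path):
    """Count FASTA header lines in a file (an illustrative task only)."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def main(argv):
    if len(argv) != 1:
        log("ERROR: expected exactly one input file")
        return 2  # non-zero: usage error
    try:
        log(f"reading {argv[0]}")
        n = count_records(argv[0])
    except OSError as exc:
        log(f"ERROR: cannot read {argv[0]}: {exc}")
        return 1  # non-zero: the run failed
    print(n)  # only the result goes to stdout, so it can be piped safely
    return 0
```

Because the result is the only thing written to stdout, a call such as `biotool reads.fa > count.txt` would capture just the number, and a calling shell script or workflow engine can detect failure from the exit status alone.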
6. Validate your parameters
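One possible way to validate parameters up front is to attach checking functions to each option, so that bad values are rejected before any work starts. This is a sketch only: the `--threads` option and the `input` argument are hypothetical, and the checks shown (file exists, integer is at least 1) are examples of the kind of validation meant, not an exhaustive list.

```python
import argparse
import os


def existing_file(value):
    """Accept the value only if it names a readable regular file."""
    if not os.path.isfile(value):
        raise argparse.ArgumentTypeError(f"input file not found: {value}")
    return value


def positive_int(value):
    """Accept the value only if it is an integer of at least 1."""
    try:
        number = int(value)
    except ValueError:
        raise argparse.ArgumentTypeError(f"not an integer: {value!r}") from None
    if number < 1:
        raise argparse.ArgumentTypeError(f"must be at least 1: {number}")
    return number


def build_parser():
    parser = argparse.ArgumentParser(prog="biotool")
    # Hypothetical options; argparse reports a failed check on stderr
    # and exits with a non-zero status.
    parser.add_argument("--threads", type=positive_int, default=1)
    parser.add_argument("input", type=existing_file)
    return parser
```

With this arrangement, a call such as `biotool --threads 0 reads.fa` would stop immediately with an error message rather than failing obscurely part-way through a run.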
7. Don’t hard-code any paths
A better solution is to locate your dependent files relative to where the main tool is installed. This can be done manually, or via a helper module like Perl’s FindBin.
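The same idea can be sketched in Python, which has no FindBin but can derive the installation directory from the path of the running script. The helper names `install_dir` and `resource_path`, and the example file `db/adapters.fasta`, are hypothetical.

```python
import os
import sys


def install_dir(script_path=None):
    """Return the directory holding the running script, resolving symlinks."""
    if script_path is None:
        script_path = sys.argv[0]
    return os.path.dirname(os.path.realpath(script_path))


def resource_path(*parts, script_path=None):
    """Build a path to a bundled file relative to the installation directory."""
    return os.path.join(install_dir(script_path), *parts)
```

Inside the main script, `resource_path("db", "adapters.fasta", script_path=__file__)` would then find the bundled file wherever the tool has been unpacked, including when the script is invoked through a symbolic link.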
8. Don’t pollute the PATH
Use only one master command, which is used to invoke sub-commands. This is used effectively by popular software like SAMtools.
Prefix all your sub-tools and helper scripts with biotool-.
Ensure internal helper scripts are non-executable, so they don’t get indexed in the PATH, and instead invoke the scripts explicitly from biotool.
Place them in a separate sub-folder (e.g., auxiliary/, scripts/) and explicitly call them (but take note of rule #7 above).
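The first of these options, a single master command that dispatches to sub-commands, might look like the following in Python. This is a sketch under assumptions: the sub-command names `index` and `align` and their arguments are invented for the fictitious BioTool and do not describe any real program.

```python
import argparse


def cmd_index(args):
    return f"indexing {args.reference}"


def cmd_align(args):
    return f"aligning {args.reads} against {args.reference}"


def build_parser():
    parser = argparse.ArgumentParser(prog="biotool")
    # One executable in the PATH; everything else is a sub-command of it.
    subparsers = parser.add_subparsers(dest="command", required=True)

    index = subparsers.add_parser("index", help="build an index (hypothetical)")
    index.add_argument("reference")
    index.set_defaults(func=cmd_index)

    align = subparsers.add_parser("align", help="align reads (hypothetical)")
    align.add_argument("reference")
    align.add_argument("reads")
    align.set_defaults(func=cmd_align)
    return parser


def main(argv):
    args = build_parser().parse_args(argv)
    print(args.func(args))  # dispatch to the function chosen by the sub-command
    return 0
```

Only `biotool` needs to be on the PATH; users would run `biotool index ref.fa` or `biotool align ref.fa reads.fq`, and `biotool --help` lists the available sub-commands.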
9. Check that your dependencies are installed
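A dependency check can be as simple as looking up each required external program in the PATH before starting any work. The sketch below uses Python's standard `shutil.which`; the function names are hypothetical, and which tools a given program needs is of course specific to that program.

```python
import shutil
import sys


def missing_tools(names):
    """Return the subset of the given program names not found in the PATH."""
    return [name for name in names if shutil.which(name) is None]


def check_dependencies(names):
    """Report every missing program on stderr; return True if none are missing."""
    missing = missing_tools(names)
    for name in missing:
        print(f"ERROR: required tool '{name}' not found in PATH", file=sys.stderr)
    return not missing
```

A tool would call `check_dependencies([...])` with its own list of required programs at start-up and exit with a non-zero status if the result is false, reporting all missing programs at once rather than one per failed run.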
10. Don’t distribute bare JAR files
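One common way to avoid handing users a bare JAR file is to ship a small launcher next to it, so that the tool is run as `biotool` rather than through a long `java -jar` invocation. The following is a sketch of that idea only: the function name, the default heap size and the JAR location are assumptions, while `-Xmx` and `-jar` are standard options of the `java` launcher.

```python
import os


def java_command(jar_path, tool_args, max_heap="2g", java="java"):
    """Build the argument list that runs a JAR file through the JVM."""
    return [
        java,
        f"-Xmx{max_heap}",          # maximum heap size (assumed default)
        "-jar",
        os.path.abspath(jar_path),  # absolute path, so the caller's cwd is irrelevant
        *tool_args,                 # pass the user's arguments through untouched
    ]
```

A launcher script could pass this list to `subprocess.call` and exit with the returned status, so that errors raised by the Java program still reach the caller.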
Implementation of these recommendations would greatly improve the usability of command line bioinformatics software, and otherwise excellent ideas and tools would get the audience they deserve, rather than be ignored in frustration.
TS received a PhD in computer science in 2002, and joined the Victorian Bioinformatics Consortium as a junior scientist. He has worked in the field of microbial genomics and bioinformatics for over 10 years, and has written many open-source software tools for the analysis of genomics data. He is currently the Scientific Director of the Victorian Bioinformatics Consortium, and a senior researcher at the Life Sciences Computation Centre in Melbourne, Australia.
The author would like to thank the reviewers Pierre Lindenbaum, Robert P Davey and Daniel Swan for their feedback, which improved this manuscript.
This research was supported by the VLSCI’s Life Sciences Computation Centre - a collaboration between Melbourne, Monash and La Trobe Universities, and an initiative of the Victorian Government, Australia.
- POSIX standards for command line interfaces. http://www.gnu.org/prep/standards/html_node/Command_002dLine-Interfaces.html
- Wikipedia: getopt. http://en.wikipedia.org/wiki/Getopt
- Galaxy tool shed. http://wiki.galaxyproject.org/Tool%20Shed
- Exit and exit status. http://tldp.org/LDP/abs/html/exit-status.html
- Perl core modules: FindBin. http://perldoc.perl.org/FindBin.html
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.