This article has Open Peer Review reports available.
Ten recommendations for software engineering in research
© Hastings et al.; licensee BioMed Central 2014
Received: 25 September 2014
Accepted: 19 November 2014
Published: 4 December 2014
Research in the context of data-driven science requires a backbone of well-written software, but scientific researchers are typically not trained at length in software engineering, the principles for creating better software products. To address this gap, in particular for young researchers new to programming, we give ten recommendations to ensure the usability, sustainability and practicality of research software.
KeywordsSoftware engineering Best practices
Scientific research increasingly harnesses computing as a platform , and the size, complexity, diversity and relatively high availability of research datasets in a variety of formats is a strong driver to deliver well-designed, efficient and maintainable software and tools. As the frontier of science evolves, new tools constantly need to be written; however scientists, in particular early-career researchers, might not have received training in software engineering , thus their code is in jeopardy of being difficult and costly to maintain and re-use.
To address this gap, we have compiled ten brief software engineering recommendations.
1. Keep it simple
Every software project starts somewhere. A rule of thumb is to start as simply as you possibly can. Significantly more problems are created by over-engineering than under-engineering. Simplicity starts with design: a clean and elegant data model is a kind of simplicity that leads naturally to efficient algorithms.
Do the simplest thing that could possibly work, and then double-check it really does work.
2. Test, test, test
For objectivity, large software development efforts assign different people to test software than those who develop it. This is a luxury not available in most research labs, but there are robust testing strategies available to even the smallest project.
Unit tests are software tests which are executed automatically on a regular basis. In test driven development, the tests are written first, serving as a specification and checking every aspect of the intended functionality as it is developed . One must make sure that unit tests exhaustively simulate all possible – not only that which seems reasonable – inputs to each method.
3. Do not repeat yourself
Do not be tempted to use the copy-paste-modify coding technique when you encounter similar requirements. Even though this seems to be the simplest approach, it will not remain simple, because important lines of code will end up duplicated. When making changes, you will have to do them twice, taking twice as long, and you may forget an obscure place to which you copied that code, leaving a bug.
Automated tools, such as Simian , can help to detect and fix duplication in existing codebases. To fix duplications or bugs, consider writing a library with methods that can be called when needed.
4. Use a modular design
Modules act as building blocks that can be glued together to achieve overall system functionality. They hide the details of their implementation behind a public interface, which provides all the methods that should be used. Users should code – and test – to the interface rather than the implementation . Thus, concrete implementation details can change without impacting downstream users of the module. Application programming interfaces (APIs) can be shared between different implementation providers.
Scrutinise modules and libraries that already exist for the functionality you need. Do not rewrite what you can profitably re-use – and do not be put off if the best candidate third-party library contains more functionality than you need (now).
5. Involve your users
Users know what they need software to do. Let them try the software as early as possible, and make it easy for them to give feedback, via a mailing list or an issue tracker. In an open source software development paradigm, your users can become co-developers. In closed-source and commercial paradigms, you can offer early-access beta releases to a trusted group.
Many sophisticated methods have been developed for user experience analysis. For example, you could hold an interactive workshop .
6. Resist gold plating
Sometimes, users ask for too much, leading to feature creep or “gold plating”. Learn to tell the difference between essential features and the long list of wishes users may have. Prioritise aggressively with as broad a collection of stakeholders as possible, perhaps using “game-storming” techniques .
Gold plating is a challenge in all phases of development, not only in the early stages of requirements analysis. In its most mischievous disguise, just a little something is added in every iterative project meeting. Those little somethings add up.
7. Document everything
Comprehensive documentation helps other developers who may take over your code, and will also help you in the future. Use code comments for in-line documentation, especially for any technically challenging blocks, and public interface methods. However, there is no need for comments that mirror the exact detail of code line-by-line.
Write an easily accessible module guide for each module, explaining the higher level view: what is the purpose of this module? How does it fit together with other modules? How does one get started using it?
8. Avoid spaghetti
9. Optimise last
Beware of optimising too early. Although research applications are often performance-critical, until you truly encounter the wide range of inputs that your software will eventually run against in the production environment, it may not be possible to anticipate where the real bottlenecks will lie. Develop the correct functionality first, deploy it and then continuously improve it using repeated evaluation of the system running time as a guide (while your unit tests keep checking that the system is doing what it should).
10. Evolution, not revolution
Maintenance becomes harder as a system gets older. Take time on a regular basis to revisit the codebase, and consider whether it can be renovated and improved . However, the urge to rewrite an entire system from the beginning should be avoided, unless it is really the only option or the system is very small. Be pragmatic  – you may never finish the rewrite . This is especially true for systems that were written without following the preceding recommendations.
Effective software engineering is a challenge in any enterprise, but may be even more so in the research context. Among other reasons, the research context can encourage a rapid turnover of staff, with the result that knowledge about legacy systems is lost. There can be a shortage of software engineering-specific training, and the “publish or perish” culture may incentivise taking shortcuts.
Software Carpentry: scientific
computing skills; learn online
or in face-to-face workshops
The Software Sustainability Institute
Learn more about what makes
code easy to maintain
How to write good unit tests
What is clean code?
Introduction to refactoring
The danger of prematureoptimization
This commentary is based on a presentation given by JH at a workshop on Software Engineering held at the 2014 annual Metabolomics conference in Tsuruoka, Japan. The authors would like to thank Saravanan Dayalan for organising the workshop and giving JH the opportunity to present. We would furthermore like to thank Robert P. Davey and Chris Mungall for their careful and helpful reviews of an earlier version of this manuscript.
- Goble C: Better software, better research. IEEE Internet Comput. 2014, 18: 4-8.View ArticleGoogle Scholar
- Wilson G, Aruliah DA, Brown CT, Chue Hong NP, Davis M, Guy RT, Haddock SHD, Huff KD, Mitchell IM, Plumbley MD, Waugh B, White EP, Wilson P: Best practices for scientific computing. PLoS Biol. 2014, 12: e1001745-10.1371/journal.pbio.1001745.View ArticlePubMedPubMed CentralGoogle Scholar
- Beck K: Test Driven Development. 2002, New York: Addison-WesleyGoogle Scholar
- Simian: Similarity Analyzer. [http://www.harukizaemon.com/simian/]
- Bloch J: Effective Java. 2008, New York: Addison-WesleyGoogle Scholar
- Pavelin K, Pundir S, Cham JA: Ten simple rules for running interactive workshops. PLoS Comput Biol. 2014, 10: e1003485-10.1371/journal.pcbi.1003485.View ArticlePubMedPubMed CentralGoogle Scholar
- Gray D, Brown S, Macanufo J: Game-storming: A playbook for innovators rulebreakers and changemakers. 2010, Sebastopol, CA: O’ReillyGoogle Scholar
- Martin RC: Clean Code: A Handbook of Agile Software Craftsmanship. 2008, New York: Prentice HallGoogle Scholar
- Dijkstra EW: Letters to the editor: go to statement considered harmful. Commun ACM. 1968, 11 (3): 147-148. 10.1145/362929.362947.View ArticleGoogle Scholar
- Fowler M: Refactoring: Improving the Design of Existing Code. 1999, Reading, Massachusetts: Addison WesleyGoogle Scholar
- Hunt A, Thomas D: The Pragmatic Programmer. 2000, Reading, Massachusetts: Addison Wesley LongmanGoogle Scholar
- Brooks FP: The Mythical Man Month. 1975, New York: Addison-WesleyView ArticleGoogle Scholar
- The Git distributed version control management system. [http://git-scm.com/]
- GitHub: An online collaboration platform for Git version-controlled projects. [http://github.com/]
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.