Testing Scientific Software: A Practical Guide for Developers
Testing Scientific Software: A Practical Guide for Developers
These blog posts are intended to provide software tips, concepts, and tools geared towards helping you achieve your goals. Views expressed in the content belong to the content creators and not the organization, its affiliates, or employees. If you have any questions or suggestions for blog posts, please don’t hesitate to reach out!
Introduction
Scientific software plays a crucial role in research. When your software is used to analyze data, simulate models, or derive scientific conclusions, ensuring its correctness becomes critical. A small bug can have a significant impact on your results, potentially invalidating months of work, or worse, causing the retraction of published research (for example: see here). Fortunately, software testing can help minimize such risks, giving you more confidence in your code and a greater chance to catch issues early.
In this guide, we’ll walk through key types of software tests, practical advice for using popular testing tools like pytest
and doctest
, and how you can incorporate these into your scientific development workflow.
Why testing matters in science
Imagine you’ve written a simulation that generates data based on a complex scientific model. It works well under some conditions, but during peer review, a colleague finds a subtle bug. This bug doesn’t affect small data sets but produces significant errors on larger simulations. The consequences? You have to revise your paper, and time is lost fixing code. There’s also the possibility of bugs taking a long time to find (if ever) potentially leading to erroneous research.
Consider: how can an audience know a software creates a reproducible outcome without tests they can run and verify themselves? Testing your software upfront ensures that potential errors are caught early and your scientific conclusions remain valid and robust. This article covers how to write code which automates the process of testing your software.
You may already have tests in place which haven’t yet been automated. If this is the case, consider integrating these with automated tools like those mentioned below to help create reproducible research software!
Production code vs test code
When working with software testing principles it can be helpful to distinguish “production” or “application” code (code which provides some utility besides testing itself) from “test” code (code which will be used to test the production or application code). These are often (but not always) stored in separate directories or components within the project.
Types of tests for scientific software
In software development, tests are typically categorized into several types. Each plays a unique role in ensuring your code functions as intended.
- Unit tests: These validate small, isolated parts of your code, like functions or methods. They are one of the most basic forms of testing but extremely valuable, ensuring the correctness of atomic units in your codebase.
- Integration tests: Once your units of code are tested individually, integration tests ensure software components work together. This is especially important in scientific software where different models, algorithms, and data structures interact.
- System / End-to-End tests: These check the software from a user’s perspective. For scientific software, this often means running entire workflows or simulations to make sure that everything from data input to output runs smoothly.
There are also many other different types of tests and testing philosophies which can be found here: https://en.wikipedia.org/wiki/Software_testing#Categorization.
Testing in Python
Testing in Python is often performed using the built-in unittest
module or pytest
package.
There is also an additional built-in module, doctest
, which allows you to test whether statements run as expected within docstrings.
assert
statements are a common part of writing tests in Python.
We can use assert
to determine the truthiness of certain output (see below for an example).
# we can use the assert statements to determine
# the truthiness (or lack thereof) of values.
assert 1 == 1
# returns True
assert 1 == 0
# returns False
The following examples will be added to a demonstrational repository which can be found here: https://github.com/CU-DBMI/demo-python-software-testing
Introduction to pytest
pytest
is one of the most popular testing frameworks in Python.
An advantage to using pytest
is that it is widely used, includes many different plugins to extend functionality, and is relatively uncomplicated.
Getting started with pytest
Consider the following project tree structure representing directories and files which is common to pytest
projects.
Note the tests
directory, which includes code dedicated to testing the package_name
package module.
pytest
seeks to run test code with the prefix of test_
under the tests
directory.
example_project
├── pyproject.toml
├── src
│ └── package_name
│ └── package_module.py
└── tests
└── test_package_module.py
Just in case, make sure you install pytest
into the project’s environment management:
# use pip to install pytest into an existing environment
pip install pytest
# or, add pytest to a poetry environment
poetry add pytest
Assume we have a simple function within package_module.py
which helps us understand whether a given integer is an even number.
def is_even(number: int) -> bool:
"""
Determines if a number is even.
An even number is divisible by 2 without a remainder.
Args:
number (int):
The number to check.
Returns:
bool:
True if the number is even, False if it is odd.
"""
return number % 2 == 0
Next, we could create a simple unit test within test_package_module.py
for the is_even()
function.
def test_is_even():
# assert that 2 is detected as an even number
assert is_even(2)
Once we have the test written, we can use the pytest
command through our project environment
# run the `pytest` command through your terminal
pytest
# or, run `pytest` through a poetry environment
poetry run pytest
pytest
will automatically find all files starting with test_ and run any functions inside them that start with test_.
It also produces concise output, helping you pinpoint errors quickly.
See below for an example of what the output might look like (we can see that the single test passed).
============== test session starts ===============
platform darwin -- Python 3.11.9, pytest-8.3.3,
pluggy-1.5.0
rootdir: /example_project
configfile: pyproject.toml
collected 1 item
tests/test_package_module.py . [100%]
=============== 1 passed in 0.00s ================
Additional pytest
features
-
Using temporary directories:
pytest
allows for the creation of temporary directories where test data can be stored for each test run in isolation. This pattern can be helpful for times where you may need to generate and store test data for use among multiple tests. -
Fixtures: Use
pytest
fixtures to set up any necessary preconditions for your tests (like loading test datasets). -
Parameterization: If you need to test multiple inputs on the same function,
pytest
allows you to parameterize your tests, running the same test function with different values.
Using doctest
for documentation and testing
doctest
is another tool which can be used for testing.
It serves a dual purpose: it embeds tests directly in your documentation by using interactive examples in docstrings.
This can be a lightweight way to share testable functionality within the documentation for your code.
Be cautious however; doctests
are not generally suitable for large or complex testing.
Writing doctests
A doctest
is simply an example of using a function, placed in docstrings.
The general pattern follows Python interactive terminal input (denoted by >>>
) followed by the expected standard output (which sometimes may be truncated when dealing with large numbers or strings).
The Examples
section of the docstring below demonstrates what a doctest for our earlier function might look like.
def is_even(number: int) -> bool:
"""
Determines if a number is even.
An even number is divisible by 2 without a remainder.
Args:
number (int):
The number to check.
Returns:
bool:
True if the number is even, False if it is odd.
Examples:
>>> is_even(2)
True
>>> is_even(3)
False
"""
return number % 2 == 0
You can run always run doctests by adding the following to the same module which includes that code.
if __name__ == "__main__":
import doctest
# run doctests within module
doctest.testmod()
You also can run doctests through pytest
by using the --doctest-modules
command flag.
This can be helpful for areas where we don’t want to use the if __name__ == "__main__":
pattern.
# run the `pytest` command through your terminal
pytest --doctest-modules
# or, run `pytest` through a poetry environment
poetry run pytest --doctest-modules
The output might look like this:
============== test session starts ===============
platform darwin -- Python 3.11.9, pytest-8.3.3,
pluggy-1.5.0
rootdir: /example_project
configfile: pyproject.toml
collected 2 items
src/package_name/module.py . [ 50%]
tests/test_package_module.py . [100%]
=============== 2 passed in 0.01s ================
This ensures that the examples in your documentation are always accurate and tested as part of your development cycle.
Using Hypothesis for testing
Using Hypothesis for testing is a powerful approach for validating the correctness of scientific software. By employing the Hypothesis library, you can perform property-based testing that generates test cases based on the characteristics of your input data. This method allows you to test your functions against a broad range of inputs, ensuring that edge cases and unexpected scenarios are adequately handled.
What is property-based testing?
Property-based testing focuses on verifying that certain properties hold true for a wide range of input values, rather than checking specific outputs for predetermined inputs. This contrasts with traditional example-based testing, where you specify the exact inputs and outputs (which can take time to imagine or construct individually).
Getting started with hypothesis
To begin using Hypothesis in your project, you first need to install the library:
# use pip to install hypothesis into an existing environment
pip install hypothesis
# or, add hypothesis to a poetry environment
poetry add hypothesis
Once installed, you can write tests that utilize its capabilities. With Hypothesis, you can write a test that asserts properties about even numbers. For instance, all even numbers should return True when passed to the is_even function:
from hypothesis import given
from hypothesis.strategies import integers
@given(integers())
def test_is_even(number):
if number % 2 == 0:
assert is_even(number)
else:
assert not is_even(number)
Using the Hypothesis ghostwriter
Hypothesis also includes a “ghostwriter” CLI which can infer how to write Hypothesis tests given an object from Python code. This can help automate the process of writing your test code or provide inspiration for how to construct your Hypothesis tests. Caveat emptor: please be sure to review any code generated by the Hypothesis ghostwriter (it may not capture useful tests or edge cases).
Given the example code from the above, we could ask the Hypothesis ghostwriter to construct a test for the is_even()
function as follows:
# run the command from your activated Python environment
hypothesis write package_name.module.is_even
# or, run through a poetry environment
poetry run hypothesis write package_name.module.is_even
The output looks similar but not quite the same as the Hypothesis test we shared above. Note that the test name implies and itself employs a technique called fuzzing, or fuzz testing, which is used to help determine where software might break.
# This test code was written by the `hypothesis.extra.ghostwriter` module
# and is provided under the Creative Commons Zero public domain dedication.
import package_name.module
from hypothesis import given, strategies as st
@given(number=st.integers())
def test_fuzz_is_even(number: int) -> None:
package_name.module.is_even(number=number)
Benefits of using Hypothesis
Below are just a few of the benefits you’ll find with using Hypothesis:
- Discovering Edge Cases: Hypothesis automatically generates diverse input scenarios, including edge cases that might be overlooked in example-based tests.
- Reduced Boilerplate Code: You can focus on the properties of your functions rather than writing extensive examples for every possible case.
- Increased Confidence: By validating the behavior of your code against a broader set of inputs, you can be more confident that your scientific software will behave correctly in practice.
Best practices for scientific software testing
Now that you understand some testing tools, here are some best practices for testing scientific software:
-
Write tests early: Incorporate testing from the start. This is crucial, especially when your software are prone to evolving.
-
Test small and test often: Focus on unit tests that cover individual functions and methods. Catching small errors early prevents larger problems down the line.
-
Use realistic test data: When testing your functions, prioritize test data that reflects the real-world conditions where your software will be applied. Secondarily, use “mock” or synthetically created data when the real data are too large or complex to test quickly. For more on this topic, see “Prefer Realism Over Isolation” from the Test Doubles chapter in the book Software Engineering at Google.
-
Automate your tests: Use tools like
pytest
and Continuous Integration (CI) services (e.g., GitHub Actions, GitLab CI) to run your tests automatically on every commit. This ensures that every update is tested, and bugs are identified early. -
Combine different tests approaches to help diversify your test coverage: Both are essential in scientific software. While unit tests help you pinpoint specific issues, integration tests validate that modules work correctly when combined.
Conclusion
Testing is a vital part of developing scientific software. By using tools like pytest
, doctest
, and Hypothesis, you can automate the testing process and ensure your codebase remains robust.
Investing time in writing good tests upfront will save you countless hours in debugging and re-running experiments.
Remember, the correctness of your code is directly tied to the validity of your scientific results. By adopting a solid testing strategy, you’re taking a significant step toward ensuring reproducible, reliable, and impactful scientific research.
Now, you’re ready to ensure your scientific code is as solid as your research!
If interested, be sure to reference the related demonstrational repository with code from this blog post which can be found here: https://github.com/CU-DBMI/demo-python-software-testing
Additional material
- Eisty, N. U., & Carver, J. C. (2022). Testing Research Software: A Survey. Empirical Software Engineering, 27(6), 138. https://doi.org/10.1007/s10664-022-10184-9
- Kanewala, U., & Bieman, J. M. (2018). Testing Scientific Software: A Systematic Literature Review (arXiv:1804.01954). arXiv. http://arxiv.org/abs/1804.01954
- Bender, A. (2020) Testing Overview. Winters, T., Manshreck, T., & Wright, H. Software engineering at Google: Lessons learned from programming over time. https://abseil.io/resources/swe-book/html/ch11.html
- CU-DBMI SET Blog Post: Uncovering Code Coverage: Ensuring Software Reliability with Comprehensive Testing