Software Engineering Team CU Dept. of Biomedical Informatics

Tip of the Week: Codesgiving - Open-source Contribution Walkthrough


Each week we seek to provide a software tip of the week geared towards helping you achieve your software goals. Views expressed in the content belong to the content creators and not the organization, its affiliates, or employees. If you have any software questions or suggestions for an upcoming tip of the week, please don’t hesitate to reach out to #software-engineering on Slack or email DBMISoftwareEngineering at olucdenver.onmicrosoft.com

Introduction

What good harvests from open-source have you experienced this year?

Thanksgiving is a holiday practiced in many countries which focuses on gratitude for the good harvests of the preceding year. In the United States, we celebrate Thanksgiving on the fourth Thursday of November each year, often by eating meals we create together with others. This post channels the spirit of Thanksgiving by giving our thanks through code as a “Codesgiving”, acknowledging and creating better software together.

Giving Thanks to Open-source Harvests

Part of building software involves the use of code which others have built, maintained, and distributed for a wider audience. Using other people’s work often comes in the form of open-source “harvesting” as we find solutions to software challenges we face. Examples might include installing and depending upon Python packages from PyPI or R packages from CRAN within your software projects.

“Real generosity toward the future lies in giving all to the present.” - Albert Camus

These open-source projects have internal costs which are sometimes invisible to those who consume them. Every software project carries an implied amount of software gardening time needed to impede decay, practice continuous improvement, and evolve the work. One way to actively share our thanks for the projects we depend on is to spend some of our own time contributing code to them.

Many projects need additional people’s thinking and development time. Have you ever noticed something that needs fixing, or functionality you wish existed, in a project you use? Consider adding your contributions to open-source!

All Contributions Matter

Contributing to open-source can come in many forms, and contributions don’t need to be gigantic to make an impact. Software often involves simplifying complexity, and simplification requires many actions beyond solely writing code. For example, a short walk outside, a conversation with someone, or a nap can sometimes help us with breakthroughs when it comes to development. By the same token, open-source benefits greatly from communications on discussion boards, bug or feature descriptions, or other work that might not be strictly considered “engineering”.

An Open-source Contribution Approach

The troubleshooting process as a workflow involving looped checks for verifying an issue and validating that the solution fixes it.

It can feel overwhelming to find a way to contribute to open-source. As with other software methodologies, modularizing your approach can help you progress without being overwhelmed. Using a troubleshooting approach like the one above can help you break down big challenges into bite-sized chunks. Consider each step as a “module” or “section” which needs to be addressed sequentially.
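The looped workflow in the figure can be sketched as a small Python function. This is a minimal illustration under our own assumptions: `troubleshoot` and its parameters are hypothetical names, and each candidate fix is modeled as a plain callable rather than a real code change.

```python
from typing import Callable, Iterable, Optional


def troubleshoot(
    reproduce_issue: Callable[[], bool],
    candidate_fixes: Iterable[Callable[[], None]],
    validate: Callable[[], bool],
) -> Optional[Callable[[], None]]:
    """Return the first candidate fix that validates, or None (hypothetical sketch)."""
    # Verify the issue actually reproduces before changing anything.
    if not reproduce_issue():
        return None
    # Try one bite-sized fix at a time, validating after each attempt.
    for fix in candidate_fixes:
        fix()
        if validate():
            return fix
    # Nothing worked: loop back to planning with what was learned.
    return None


# Usage with a toy "system" whose state is a dictionary.
state = {"broken": True}

def fix_a() -> None:
    pass  # a candidate that turns out not to help

def fix_b() -> None:
    state["broken"] = False

winner = troubleshoot(lambda: state["broken"], [fix_a, fix_b], lambda: not state["broken"])
print(winner.__name__)  # fix_b
```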

Embrace a Learning Mindset

“Before you speak ask yourself if what you are going to say is true, is kind, is necessary, is helpful. If the answer is no, maybe what you are about to say should be left unsaid.” - Bernard Meltzer

Open-source contributions almost always entail learning of some kind. Many contributions happen solely in the form of code and text communications which are easily misinterpreted. Assume positive intent and accept input from others while upholding your own ideas to share successful contributions together. Prepare yourself by intentionally opening your mind to input from others, even if you’re sure you’re absolutely “right”.

Before communicating, be sure to use Bernard Meltzer’s self-checks mentioned above.

  1. Is what I’m about to say true?
    • Have I taken time to verify the claims in a way others can replicate or understand?
  2. Is what I’m about to say kind?
    • Does my intention and communication channel kindness (and not cruelty)?
  3. Is what I’m about to say necessary?
    • Do my words and actions here enable or enhance progress towards a goal (would the outcome be achieved without them)?
  4. Is what I’m about to say helpful?
    • How does my communication increase the quality or sustainability of the project (or group)?

Setting Software Scheduling Expectations

Suggested ratio of time spent by type of work for an open-source contribution.

  1. 1/3 planning (~33%)
  2. 1/6 coding (~17%)
  3. 1/4 component and system testing (25%)
  4. 1/4 code review, revisions, and post-actions (25%)

This modified rule of thumb from The Mythical Man-Month can help you structure your time for an open-source contribution. Notice the emphasis on planning and testing, and keep these in mind as you progress (the actual programming time can be small if adequate time has been spent on planning). Notably, the original time fractions are modified here so that the final quarter of the time goes to code review, revisions, and post-actions. Planning for the time expense of code review and related work helps you keep a learning mindset throughout the process (instead of treating the review as a “tack-on” or “optional / supplementary” step). A good motto to keep in mind throughout this process is Festina lente, or “Make haste, slowly” (take care to move thoughtfully and as slowly as necessary to do things correctly the first time).
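As a quick arithmetic check, the four fractions sum to exactly one (4/12 + 2/12 + 3/12 + 3/12 = 12/12). The Python sketch below applies them to a time budget; `allocate_hours` is a hypothetical helper, and the 12-hour budget is only an example.

```python
from fractions import Fraction

# Modified Mythical Man-Month ratios from the list above.
SCHEDULE = {
    "planning": Fraction(1, 3),
    "coding": Fraction(1, 6),
    "component and system testing": Fraction(1, 4),
    "code review, revisions, and post-actions": Fraction(1, 4),
}

# The four shares should cover the whole effort exactly.
assert sum(SCHEDULE.values()) == 1


def allocate_hours(total_hours: float) -> dict[str, float]:
    """Split a total time budget across the types of work."""
    budget = Fraction(total_hours)  # exact conversion avoids float rounding
    return {phase: float(share * budget) for phase, share in SCHEDULE.items()}


for phase, hours in allocate_hours(12).items():
    print(f"{phase}: {hours:g} h")
# planning: 4 h, coding: 2 h, and 3 h for each of the last two
```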

Planning an Open-source Contribution

Has the Need Already Been Reported?

Be sure to check whether the bug or feature has already been reported somewhere! In a way, this is a practice of “Don’t repeat yourself” (DRY) where we attempt to avoid repeating the same block of code (in this case, the “code” can be understood as natural language). For example, you can look on GitHub Issues or GitHub Discussions with a search query matching the rough idea of what you’re thinking about. You can also use the GitHub search bar to automatically search multiple areas (including Issues, Discussions, Pull Requests, etc.) when you enter a query from the repository homepage. If it has been reported already, take a look to see if someone has made a code contribution related to the work already.
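The same search can be scripted. The sketch below assumes GitHub’s REST search endpoint `https://api.github.com/search/issues`, which accepts the same `repo:` qualifier syntax as the web search bar and returns both issues and pull requests. The function names are hypothetical, `orgname/reponame` is a placeholder, and unauthenticated requests are subject to stricter rate limits.

```python
import json
import urllib.parse
import urllib.request

SEARCH_API = "https://api.github.com/search/issues"


def build_search_url(repo: str, terms: str) -> str:
    """Build a search URL scoped to one repository ("owner/name")."""
    query = f"repo:{repo} {terms}"
    return f"{SEARCH_API}?{urllib.parse.urlencode({'q': query})}"


def find_existing_reports(repo: str, terms: str) -> list[tuple[int, str, str]]:
    """Return (number, title, url) for matching issues and pull requests."""
    request = urllib.request.Request(
        build_search_url(repo, terms),
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return [(item["number"], item["title"], item["html_url"]) for item in payload["items"]]


# Example (requires network access; the repository name is a placeholder):
# for number, title, url in find_existing_reports("orgname/reponame", "function_x exception_z"):
#     print(f"#{number} {title} {url}")
```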

An open discussion or report of the need doesn’t guarantee someone’s already working on a solution. If there aren’t yet any code contributions and it doesn’t look like anyone is working on one, consider volunteering to take a further look into the solution and be sure to acknowledge any existing discussions. If you’re unsure, it’s always kind to mention your interest in the report and ask for more information.

Is the Need a Bug or Feature?

One way to help solidify your thinking and the approach is to consider whether what you’re proposing is a bug or a feature. A software bug is something which is broken or malfunctioning. A software feature is generally new functionality or a different way of doing things than what exists today. There’s often overlap between these, and sometimes they can inspire branching needs, but an individual need usually leans more toward one than the other. If you can’t decide whether your need is a bug or a feature, consider breaking it down into smaller sub-components which are each more clearly one or the other. Following this strategy will help you communicate the potential for contribution and also clarify the development process (for example, a critical bug might be prioritized differently than a nice-to-have new feature).

Reporting the Need for Change

# Using `function_x` with `library_y` causes `exception_z`

## Summary

As a `library_y` research software developer, I want to use `function_x`
for my data so that I can share data for research outcomes.

## Reproducing the error

This error may be seen using Python v3.x on all major OS's using
the following code snippet:
...

An example of a user story issue report with an imagined code example.

Open-source needs are often best reported through written stories captured within a bug or feature tracking system (such as GitHub Issues) which, if possible, also include example code or logs. One template for reporting issues is the “user story”. A user story typically takes the form: As a < type of user >, I want < some goal > so that < some reason >. (Mountain Goat Software: User Stories). Alongside the story, it can help to add a snippet of code which exemplifies a problem, new functionality, or a potential adjacent / similar solution. As a general principle, be as specific as you can without going overboard. Include things like the programming language version, operating system, and other system dependencies that might be related.
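A small Python helper can fill in the user story template and gather the version details mentioned above. This is a sketch under our own assumptions: `draft_issue` and its helpers are hypothetical, `library_y` is the placeholder name from the example report, and the output is plain Markdown meant to be pasted into an issue and edited by hand (including the reproduction snippet).

```python
import platform
from importlib import metadata


def user_story(user_type: str, goal: str, reason: str) -> str:
    """Fill in 'As a <type of user>, I want <some goal> so that <some reason>'."""
    return f"As a {user_type}, I want {goal} so that {reason}."


def environment_details(packages: list[str]) -> list[str]:
    """List interpreter, OS, and package versions as Markdown bullets."""
    lines = [
        f"- Python: {platform.python_version()} ({platform.python_implementation()})",
        f"- OS: {platform.platform()}",
    ]
    for name in packages:
        try:
            lines.append(f"- {name}: {metadata.version(name)}")
        except metadata.PackageNotFoundError:
            lines.append(f"- {name}: not installed")
    return lines


def draft_issue(title: str, story: str, packages: list[str]) -> str:
    """Assemble a Markdown issue body; reproduction steps are left to fill in."""
    return "\n".join([
        f"# {title}", "",
        "## Summary", "", story, "",
        "## Environment", "", *environment_details(packages), "",
        "## Reproducing the error", "", "(add a minimal code snippet here)",
    ])


print(draft_issue(
    "Using `function_x` with `library_y` causes `exception_z`",
    user_story("`library_y` research software developer",
               "to use `function_x` for my data",
               "I can share data for research outcomes"),
    packages=["library_y"],
))
```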

Once you have a good written description of the need, be sure to submit it where it can be seen by the relevant development community. For GitHub-based work, this is usually a GitHub Issue, but it can also entail discussion board posts to gather buy-in or consensus before proceeding. In addition to the specifics outlined above, recall the learning mindset and Bernard Meltzer’s self-checks: take time to acknowledge the potential challenges and any solutions already attempted, conveying kindness throughout.

What Happens After You Submit a Bug or Feature Report?

When making open-source contributions, it can also help to mention that you’re interested in resolving the issue through a related pull request and review. Open-source projects often welcome new contributors but may have specific requirements. These requirements are usually spelled out within a CONTRIBUTING.md document found somewhere in the repository or in organization-level documentation. It’s also completely okay to let other contributors build solutions for the issue (as mentioned before, all contributions matter, including the reporting of bugs or features themselves)!
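To find those requirements in a local clone, the sketch below checks the places GitHub itself looks for contributing guidelines: the repository root, `docs/`, and `.github/`. The helper name is hypothetical, and organization-level guidelines that live in a separate `.github` repository will not be found this way.

```python
from pathlib import Path

# Folders where GitHub looks for contributing guidelines.
SEARCH_DIRS = (".", "docs", ".github")


def find_contributing_guides(repo_root) -> list[Path]:
    """Return CONTRIBUTING files (any case or extension) in a local clone."""
    root = Path(repo_root)
    found = []
    for folder in SEARCH_DIRS:
        directory = root / folder
        if not directory.is_dir():
            continue
        for path in sorted(directory.iterdir()):
            # Path("CONTRIBUTING.md").stem == "CONTRIBUTING"
            if path.is_file() and path.stem.upper() == "CONTRIBUTING":
                found.append(path)
    return found


# Usage from the top of a cloned repository:
for guide in find_contributing_guides("."):
    print(guide)
```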

Developing and Testing an Open-source Contribution

Creating a Development Workspace

Once you’re ready to develop a solution for the reported need in the open-source project, you’ll need a place to version your updates. This work generally takes place through version control on focused branches named in a way that relates to the focus. On GitHub, this work also commonly takes place on a forked copy of the repository. Using these methods helps isolate your changes from other work that takes place within the project. It can also help you track your progress alongside related changes that might take place before you’re able to seek review or code merges.
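For example, a branch name can combine the issue number with a few words from the issue title. The convention and the `branch_name` helper below are hypothetical (many projects document their own naming rules), and the git commands in the comments assume a fork under your username with the original project added as an `upstream` remote.

```python
import re

# Typical fork-and-branch setup (run in a shell; names are placeholders):
#   git clone https://github.com/<your-username>/reponame.git
#   cd reponame
#   git remote add upstream https://github.com/orgname/reponame.git
#   git switch -c 123-handle-exception-z-in-function-x


def branch_name(issue_number: int, summary: str, max_words: int = 6) -> str:
    """Build a short, descriptive branch name from an issue number and title."""
    # Keep only lowercase letters and digits, so punctuation and underscores
    # become word breaks.
    words = re.findall(r"[a-z0-9]+", summary.lower())
    return "-".join([str(issue_number), *words[:max_words]])


print(branch_name(123, "Handle exception_z in function_x"))
# 123-handle-exception-z-in-function-x
```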

Bug or Feature Verification with Test-driven Development

One can use a test-driven development approach as numbered steps (Wikipedia).

  1. Add or modify a test which checks for a bug fix or feature addition
  2. Run all tests (expecting the newly added test content to fail)
  3. Write a simple version of code which allows the tests to succeed
  4. Verify that all tests now pass
  5. Return to step 3, refactoring the code as needed

If you decide to develop a solution for what you reported, one software strategy which can help you remain focused and objective is test-driven development. Using this pattern sets a “cognitive milestone” for you as you develop a solution to what was reported. Open-source projects can have many interesting components which could take time and be challenging to understand. The addition of the test and related development will help keep you goal-orientated without getting lost in the “software forest” of a project.
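The numbered steps can be made concrete with a small Python sketch. Everything here is hypothetical: `function_x` stands in for a library function whose reported bug is a `ZeroDivisionError` on all-zero input, and the tests follow the `test_` naming convention that pytest collects automatically. The new test is written first (step 1) and fails against the original code; the guard clause is the simplest change that makes it pass (step 3).

```python
# Hypothetical library function: it originally computed value / total
# unconditionally, which raised ZeroDivisionError when every value was 0.


def function_x(values: list[float]) -> list[float]:
    """Scale values so they sum to 1 (all-zero input returns all zeros)."""
    total = sum(values)
    # Step 3: the simplest change that lets the new test pass.
    if total == 0:
        return [0.0 for _ in values]
    return [value / total for value in values]


# Step 1: a new test describing the reported bug (it failed before the fix).
def test_function_x_handles_all_zero_input():
    assert function_x([0, 0]) == [0.0, 0.0]


# An existing test that must keep passing after the change.
def test_function_x_scales_to_proportions():
    assert function_x([1, 3]) == [0.25, 0.75]


if __name__ == "__main__":
    # Steps 2 and 4: run every test (normally via the `pytest` command).
    test_function_x_handles_all_zero_input()
    test_function_x_scales_to_proportions()
    print("all tests passed")
```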

Prefer Simple Over Complex Changes

… Simple is better than complex. Complex is better than complicated. … - PEP 20: The Zen of Python

Further channeling step 3 of test-driven development above, prefer simple changes over more complex ones (recognizing that the absolute simplest can take iteration and thought). Some of the best solutions are the most easily understood ones (where the code additions or changes seem obvious afterwards). A “simplest version” of the code can often be refactored and completed more quickly than a “perfect” solution devised on the first attempt. Remember, you’ll very likely have the help of a code review before the code is merged (expect to learn more and add changes during review!).

It might be tempting to address more than one bug or feature at the same time. Avoid feature creep as you build solutions: stay focused on the task at hand! Take note of things you notice on your journey to address the reported needs. These can become additional reported bugs or features which could be addressed later. Staying focused with your development will save you time, keep your tests constrained, and (theoretically) help reduce the time and complexity of code review.

Developing a Solution

Once you have a test in place for the bug fix or feature addition, it’s time to work towards developing a solution. If you’ve taken time to accomplish the prior steps, you may already have a good idea about how to approach a solution. If not, spend some time investigating the technical aspects of a solution, optionally adding this information to the report or discussion content for further review before development. Use timeboxing techniques to help make sure the time you spend in development is no more than necessary.
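Timeboxing can be as simple as a kitchen timer. For those who prefer to stay in the terminal, here is a minimal Python sketch of a hypothetical `timebox` context manager that measures elapsed time with `time.monotonic()` and flags when a block of work ran past its budget. It does not interrupt the work; it only reports afterwards, leaving the decision to pause and reassess up to you.

```python
import time
from contextlib import contextmanager


@contextmanager
def timebox(limit_minutes: float):
    """Measure a block of work and report whether it exceeded its budget."""
    start = time.monotonic()
    result = {"elapsed_minutes": 0.0, "over": False}
    try:
        yield result
    finally:
        result["elapsed_minutes"] = (time.monotonic() - start) / 60
        result["over"] = result["elapsed_minutes"] > limit_minutes
        if result["over"]:
            print(f"Timebox of {limit_minutes} min exceeded; pause and reassess.")


with timebox(limit_minutes=45) as box:
    pass  # investigate or develop here
print(f"{box['elapsed_minutes']:.2f} minutes used")
```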

Code Review, Revisions, and Post-actions

Pull Requests and Code Review

When your code and new test(s) are in a good spot it’s time to ask for a code review. It might feel tempting to perfect the code. Instead, consider whether the code is “good enough” and would benefit from someone else providing feedback. Code review takes advantage of a strength of our species: collaborative & multi-perspectival thinking. Leverage this in your open-source experience by seeking feedback when things feel “good enough”.

Demonstrating Pareto Principle “vital few” through a small number of changes to achieve 80% of the value associated with the needs.

One way to understand “good enough” is to assess whether you have reached what the Pareto Principle terms the “vital few” causes. The Pareto Principle states that roughly 80% of consequences come from 20% of causes (the “vital few”). Which 20% of changes (for example, commits) are required to achieve 80% of the intended value of your open-source contribution? When you reach those changes, consider opening a pull request to gather more insight about whether they will suffice and how the remaining effort might be spent.
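As a rough illustration, suppose you assign each planned change an estimated value score (the change names and numbers below are made up). A hypothetical `vital_few` helper picks the highest-value changes first until they cover 80% of the total, which yields the smallest set of changes reaching that threshold. Treat the result as a prompt for discussion, since the 80/20 split is a rule of thumb rather than an exact law.

```python
def vital_few(changes: dict[str, int], target_percent: int = 80) -> list[str]:
    """Return the fewest changes (largest value first) covering target_percent of the total."""
    total = sum(changes.values())
    selected, cumulative = [], 0
    for name, value in sorted(changes.items(), key=lambda item: item[1], reverse=True):
        # Integer comparison avoids floating-point rounding at the threshold.
        if cumulative * 100 >= target_percent * total:
            break
        selected.append(name)
        cumulative += value
    return selected


# Hypothetical value estimates for the commits in a planned contribution.
estimates = {
    "fix-core-logic": 50,
    "add-regression-test": 30,
    "update-docs": 10,
    "refactor-helpers": 5,
    "tidy-formatting": 5,
}
print(vital_few(estimates))  # ['fix-core-logic', 'add-regression-test']
```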

As you go through the process of opening a pull request, be sure to follow the project’s CONTRIBUTING.md documentation; each one can vary. When working on GitHub-based projects, you’ll need to open a pull request against the correct base branch (usually the upstream main branch). If you used a GitHub issue to report the need, reference it in the pull request description using # followed by the issue number (for example #123, where the issue link would look like https://github.com/orgname/reponame/issues/123) to help link the work to the reported need. This causes the pull request to show up within the issue and automatically creates a link to the issue from the pull request.
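The reference format can be sketched in a few lines of Python. The helper names are hypothetical, and `orgname/reponame` and `#123` are the placeholders used above. GitHub also recognizes closing keywords such as `Closes #123`, which close the issue automatically when the pull request is merged into the default branch; since closing reports is usually the maintainers’ call, the sketch only uses one when explicitly asked to.

```python
def issue_url(org: str, repo: str, number: int) -> str:
    """URL of a GitHub issue, e.g. https://github.com/orgname/reponame/issues/123."""
    return f"https://github.com/{org}/{repo}/issues/{number}"


def pr_description(summary: str, issue_number: int, closes: bool = False) -> str:
    """Pull request body that references the reported issue by number."""
    # "#123" links the issue; a closing keyword also closes it on merge.
    reference = f"Closes #{issue_number}" if closes else f"Related to #{issue_number}"
    return f"{summary}\n\n{reference}"


print(issue_url("orgname", "reponame", 123))
print(pr_description("Handle all-zero input in `function_x`.", 123))
```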

Code Revisions

“Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away.” - Antoine de Saint-Exupéry

You may be asked to update your code based on automated code quality checks or reviewer request. Treat these with care; embrace learning and remember that this step can take 25% of the total time for the contribution. When working on GitHub forks or branches, you can make additional commits directly on the development branch which was used for the pull request. If your reviewers requested changes, re-request their review once changes have been made to help let them know the code is ready for another look.

Post-actions and Tidying Up Afterwards

Once the code has been accepted by the reviewers and has passed any automated test suites, it is ready to be merged. This is often done by core maintainers of the project. After the code is merged, it’s usually a good idea to clean up your workspace by deleting your development branch and syncing with the upstream repository. While it’s up to core maintainers to decide when to close the report, the issue can typically be closed at this point and might benefit from a comment describing the fix. Many of these steps are considered common courtesy but also, importantly, help set you up for your next contributions!
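The clean-up steps can be scripted as a short list of git commands. This Python sketch assumes a fork workflow where `origin` is your fork and `upstream` is the original project, with `main` as the default branch; the helper names are hypothetical, and the script only prints the commands unless `dry_run=False`. Note that `git branch -d` refuses to delete a branch git does not consider merged (common after a squash merge), so confirm the merge before reaching for `git branch -D`.

```python
import subprocess


def cleanup_commands(branch: str, base: str = "main") -> list[list[str]]:
    """Git commands to tidy up after a merged pull request (fork workflow)."""
    return [
        ["git", "switch", base],
        ["git", "fetch", "upstream"],
        ["git", "merge", "--ff-only", f"upstream/{base}"],  # sync local main
        ["git", "push", "origin", base],                    # sync the fork's main
        ["git", "branch", "-d", branch],                     # delete local branch
        ["git", "push", "origin", "--delete", branch],       # delete fork branch
    ]


def run_cleanup(branch: str, base: str = "main", dry_run: bool = True) -> None:
    """Print each command and run it only when dry_run is False."""
    for command in cleanup_commands(branch, base):
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)


run_cleanup("123-handle-exception-z-in-function-x")
```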

Concluding Thoughts

Hopefully the above helps you understand the open-source contribution process better. As stated earlier, every little part helps! Best wishes on your open-source journey and happy Codesgiving!

References
