Software Engineering Team CU Dept. of Biomedical Informatics

Navigating Dependency Chaos with Lockfiles

These blog posts are intended to provide software tips, concepts, and tools geared towards helping you achieve your goals. Views expressed in the content belong to the content creators and not the organization, its affiliates, or employees. If you have any questions or suggestions for blog posts, please don’t hesitate to reach out!

Introduction

Writing software often entails using code from other people to solve common challenges and take advantage of existing work. External software used by a specific project can be called a “dependency” (the software “depends” on that external work to accomplish tasks). Collections of software are oftentimes made available as “packages” through various platforms. Package management for dependencies, the task of managing collections of dependencies for a specific project, is a specialized area of software development that can involve the use of unique tools and files. This article will cover package dependency management through special files generally referred to as “lockfiles”.

Why use dependencies?

‘Reinvent the Wheel’ comic by Randall Munroe, XKCD.

There are various advantages to using packaged dependencies in your projects. Using existing work this way practices a collective “don’t repeat yourself [or ourselves]” (DRY) principle among the global community of software developers and avoids reinventing the wheel. Using dependencies also allows us to make explicit decisions about the specific focus, or context, which the project will prioritize. While it’s oftentimes easy to include and use dependencies in a project, they come with risks that are important to consider.

See below for a rough list of reasons why one might opt to use specific dependencies in a project:

  1. Solutions which entail a lot of edge cases (particularly error prone).
  2. Solutions which need constant maintenance, i.e. “frequently moving targets”.
  3. Solutions which require special domain knowledge or training to correctly implement.

A common example which demonstrates these aspects is the family of dependencies which assist with datetimes, timezones, and time deltas.

The dependency wilderness

Dependencies are often on their own unpredictable schedule outside of your project’s control.

Using existing software package dependencies helps conserve resources but comes with unique challenges related to unpredictability (such as when those dependencies are updated). This unpredictability can sometimes result in what’s colloquially called “dependency hell” or “dependency chaos”, where, for example, multiple external dependencies conflict with one another and cannot be automatically resolved (among other issues). These challenges can be especially frustrating due to when they occur (often at times we don’t anticipate or plan for) and how long they can take to debug (finding fixes sometimes entails costly trial-and-error). It can feel like walking through a forest at night without a flashlight, constantly tripping over roots or running into stumps and branches!

Illuminating the dependency thicket

Software dependency choices may be understood by carefully weighing the cost of overwhelming internal invention against the cost of external dependency chaos.

Dependency chaos can sometimes lead to “not invented here syndrome”, where an individual or group of people places less trust in work created outside of itself. When or if this happens it can be important to understand dependencies as a scale of choices between overwhelming invention and infinite dependency chaos. For example, to accomplish a small project it may not be wise to create a brand new programming language (towards the extreme of overwhelming invention). On the other hand, if we depended upon all existing work within a certain context the solution may not be specialized, efficient, or resourceful enough to meet the goals within a reasonable amount of time.

mindmap
  root((Project))
    Data storage
      File 1
      Database 2
    Data x processing
      Package X
      Package Y
    Integration
      Solution A
      Platform B

Dependency awareness and opportunity can be grouped into concerns and documented as part of a literature review (seen here as a mind map).

It can be helpful to revisit existing knowledge on a topic area through a formal or informal literature review (understanding that code within software is a type of literature) when thinking about the scale of decisions mentioned above. Outlining existing work this way supports second-order thinking: after an initial (first-order) creative process, we may benefit from reflecting on our dependency decisions a second time. Each potential dependency discovered through this process can be organized using separation of concerns (SoC) under specific concern labels, where a concern is a general set of information which affects related code. Include dependencies which helpfully limit the code your project must produce (or its SoC sections), thereby reducing the overall number of concerns the project must maintain.

Bounded contexts along with shared or distinct components can be used to help limit the complexity of a project in helpful ways.

The concept of bounded context from domain-driven design can sometimes be used to help distinguish what is in or out of scope for a particular project as a way of reducing complexity. Bounded context can be used as a way to draw abstract lines around a certain span of control in order to align available resources (like time and people) with the focus of the project. It also can help promote loose coupling of software components in order to enable flexible design over time. Without these considerations and the use of dependencies we might face “endless” software feature creep by continually adding new bounded contexts that are outside of our span of control (or resources).

Version constraints as dependency specification control

Version constraint           Description of the version constraint
==2.1.0                      Exactly and only version 2.1.0
>=2.0.0                      Greater than or equal to version 2.0.0
>=2.0.0, <3.0.0              Greater than or equal to version 2.0.0 and less than 3.0.0
>=2.0.0, <3.0.0, !=2.5.1     Greater than or equal to version 2.0.0, less than 3.0.0, and not exactly version 2.5.1

Version constraint specifications provide code-based descriptions for dependency versions within your project (Pythonic version specification examples above).

Many aspects of dependency chaos arise from the fact that dependencies are updated at various times. We often want to make certain we use the most up-to-date version of a dependency because those updates may come with performance, corrective, security, or other benefits. To accomplish this we can use what are sometimes called dependency “version range constraints” or “compliant version specifications” to provide some flexibility in how packages are installed for our projects. Version ranges are usually preferred because they help keep software projects updated and also allow for flexible dependency resolutions (for example, when a single dependency is required by multiple other dependencies). The constraint syntax is often specific to the package management system and programming language being used. See the Python Packaging Authority’s Version Specifiers section for an example of how these version constraints work.
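To make the constraint examples above more concrete, the following is a minimal sketch of how a comma-separated constraint might be checked against a candidate version. It is a simplification for illustration only: the names parse_version and satisfies are hypothetical, it only understands plain dotted numeric versions, and it does not implement the full Python version specifier rules (pre-releases, wildcards, and so on).

```python
import operator

# Map each comparison symbol to the function that implements it.
# Two-character symbols come first so ">=" is matched before ">".
OPERATORS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
}


def parse_version(text):
    """Turn '2.5.1' into (2, 5, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def satisfies(version, constraint):
    """Return True when `version` meets every comma-separated clause."""
    candidate = parse_version(version)
    for clause in constraint.split(","):
        clause = clause.strip()
        for symbol, compare in OPERATORS.items():
            if clause.startswith(symbol):
                if not compare(candidate, parse_version(clause[len(symbol):])):
                    return False
                break
        else:
            # No known comparison symbol was found at the start of the clause.
            raise ValueError(f"Unrecognized constraint clause: {clause!r}")
    return True


print(satisfies("2.4.0", ">=2.0.0, <3.0.0, !=2.5.1"))  # True
print(satisfies("2.5.1", ">=2.0.0, <3.0.0, !=2.5.1"))  # False
```

In practice a maintained library is a better fit for this job than hand-written comparison code, which is itself a small illustration of why dependencies are attractive for edge-case-heavy problems.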

Many version specification constraints build upon ideas from semantic versioning (SemVer). Generally, SemVer uses a dotted three-number syntax which includes a major, minor, and patch version separated by periods. For example, a SemVer of 1.2.3 represents major version 1, minor version 2, patch 3. Developers may use this type of specification to help differentiate the various releases of their software and help build user confidence about expected operations. See the Semantic Versioning specification at https://semver.org/ for more information about how SemVer works.
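As a rough illustration of the major, minor, and patch components, the sketch below splits a SemVer string into its parts and names the most significant component that changed between two releases. The helper names (SemVer, parse_semver, change_type) are hypothetical, and the sketch ignores the pre-release and build metadata suffixes that the SemVer specification also allows.

```python
from typing import NamedTuple


class SemVer(NamedTuple):
    major: int  # incremented for incompatible changes
    minor: int  # incremented for backward compatible additions
    patch: int  # incremented for backward compatible fixes


def parse_semver(text):
    """Parse 'MAJOR.MINOR.PATCH' into a SemVer tuple."""
    major, minor, patch = (int(part) for part in text.split("."))
    return SemVer(major, minor, patch)


def change_type(old, new):
    """Name the most significant component that differs between two versions."""
    before, after = parse_semver(old), parse_semver(new)
    if after.major != before.major:
        return "major"
    if after.minor != before.minor:
        return "minor"
    if after.patch != before.patch:
        return "patch"
    return "none"


print(parse_semver("1.2.3"))          # SemVer(major=1, minor=2, patch=3)
print(change_type("1.2.3", "1.3.0"))  # minor
print(change_type("1.2.3", "2.0.0"))  # major
```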

Version constraints can still be chaotic

Unintentional failures can occur due to timeline variations between internal projects and external dependencies.

We sometimes require repeatable behavior to be productive with a project in addition to the flexibility of version range specifications. For example, we may want each developer and continuous integration step to have reproducible environments even if a dependency gets updated while internal development takes place. Dependency version constraints oftentimes aren’t enough on their own to prevent reproducibility issues from occurring. See the above diagram for a timeline depicting how Developer B and Developer D may have different experiences despite best efforts with version constraints (Dependency A may make a release that fits the version constraint but breaks Project C when Developer D tries to modify unrelated code).
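The scenario described above can be sketched in a few lines. This is a simplified model rather than how any particular package manager resolves dependencies: the release history for Dependency A is invented for illustration, and resolve is a hypothetical helper which simply picks the newest release inside a range.

```python
def parse_version(text):
    """Turn '2.1.0' into (2, 1, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def resolve(available, lower, upper):
    """Pick the newest release that satisfies lower <= version < upper."""
    matching = [
        release
        for release in available
        if parse_version(lower) <= parse_version(release) < parse_version(upper)
    ]
    return max(matching, key=parse_version)


# Hypothetical release history for Dependency A at two points in time.
releases_when_b_installs = ["2.0.0", "2.1.0"]
releases_when_d_installs = ["2.0.0", "2.1.0", "2.2.0"]  # 2.2.0 was published later

# Both developers use the same constraint: >=2.0.0, <3.0.0
developer_b = resolve(releases_when_b_installs, "2.0.0", "3.0.0")
developer_d = resolve(releases_when_d_installs, "2.0.0", "3.0.0")

print(developer_b)  # 2.1.0
print(developer_d)  # 2.2.0
```

The constraint never changed, yet the two developers end up with different versions simply because they installed at different times.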

Lockfiles for reproducible version constraint behavior

Version constraint lockfiles provide one way to ensure reproducible behaviors within your projects. Lockfiles are usually recommended to be included in source control, so one always has a complete snapshot (short of the literal full source code of the dependencies) of the project’s last known working configuration.

Lockfiles usually have the following characteristics (this varies by programming language and dependency type):

See the above modified timeline for Developer B and Developer D to better understand how their project will benefit from a shared lockfile and reproducible dependency installations.
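A minimal sketch of the lockfile idea follows, assuming a toy lockfile which is just a JSON mapping of package names to exact versions. Real lockfiles (for example poetry.lock) record much more information and use their own formats; the install and resolve helpers and the release lists here are hypothetical.

```python
import json


def parse_version(text):
    """Turn '2.1.0' into (2, 1, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def resolve(available, lower, upper):
    """Pick the newest release that satisfies lower <= version < upper."""
    matching = [
        release
        for release in available
        if parse_version(lower) <= parse_version(release) < parse_version(upper)
    ]
    return max(matching, key=parse_version)


def install(name, available, lower, upper, lock):
    """Use the locked version when one exists; otherwise resolve and record it."""
    if name not in lock:
        lock[name] = resolve(available, lower, upper)
    return lock[name]


# Developer B resolves first and commits the lockfile contents to source control.
lock = {}
version_b = install("dependency-a", ["2.0.0", "2.1.0"], "2.0.0", "3.0.0", lock)
committed_lockfile = json.dumps(lock)

# Developer D installs later, after 2.2.0 was released, but reads the shared lockfile.
shared_lock = json.loads(committed_lockfile)
version_d = install(
    "dependency-a", ["2.0.0", "2.1.0", "2.2.0"], "2.0.0", "3.0.0", shared_lock
)

print(version_b, version_d)  # 2.1.0 2.1.0
```

Because the exact version is read from the shared lockfile instead of being resolved again, both developers get the same installation until someone deliberately updates the lock.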

Pythonic Example

Python Poetry command used, followed by a description of what occurs
poetry add pandas
  • Adds a caret-based version constraint specification based on the latest release (for example ^2.2.1) within a pyproject.toml file. This version constraint can be understood as >= 2.2.1, < 3.0.0.
  • Creates or updates the poetry.lock lockfile with known compatible versions of Pandas based on the version constraint mentioned above.
  • Installs the version of Pandas which matches the pyproject.toml and poetry.lock specifications.
poetry install
  • Installs the version of Pandas which matches the pyproject.toml and poetry.lock specifications (for example, within a new environment or for another developer).
poetry update pandas
  • Poetry checks for available Pandas releases which are compatible with the version constraint (for example, ^2.2.1).
  • If there are new versions available which match the constraint, Poetry will update the poetry.lock lockfile and install the matching version.
poetry lock
  • Updates all dependencies referenced in the poetry.lock lockfile to the latest compatible versions based on the version constraints specified within the pyproject.toml (the default behavior in Poetry 1.x).
  • Optionally, if the --no-update flag is also used (Poetry 1.x), refreshes the dependency versions referenced within the poetry.lock lockfile based on version constraints specified within the pyproject.toml without seeking updated dependency releases. Poetry 2.0 and later changed these defaults, so check the documentation for the version you use.

Use Poetry commands to implement dependency version constraints and lockfiles for reproducible Python project environments.

Poetry is a Python packaging and dependency management tool which implements version constraints and lockfiles to help developers maintain their software projects. Using commands like poetry add ... and poetry lock automatically creates poetry.lock lockfiles based on specifications which are added either automatically or manually to pyproject.toml files. Similar to other tools, Poetry can operate with or without poetry.lock lockfiles (see here for more information). Another alternative to Poetry which makes use of lockfiles is PDM (pdm.lock files).
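Poetry’s documentation describes caret constraints as allowing updates which do not change the left-most non-zero version component, so ^2.2.1 corresponds to >=2.2.1, <3.0.0. The sketch below expands a caret constraint into that range form. The function name caret_to_range is hypothetical, the sketch only handles full MAJOR.MINOR.PATCH versions, and treating an all-zero version by incrementing the patch component is an assumption of this sketch rather than documented behavior.

```python
def caret_to_range(constraint):
    """Expand a three-part caret constraint such as '^2.2.1' into a version range."""
    if not constraint.startswith("^"):
        raise ValueError("expected a caret constraint such as '^2.2.1'")
    parts = [int(part) for part in constraint[1:].split(".")]
    if len(parts) != 3:
        raise ValueError("this sketch only handles MAJOR.MINOR.PATCH versions")
    # Find the left-most non-zero component; fall back to the patch
    # component (index 2) when every component is zero (sketch assumption).
    index = next((i for i, part in enumerate(parts) if part != 0), 2)
    # The upper bound increments that component and zeroes everything after it.
    upper = parts[:index] + [parts[index] + 1] + [0] * (2 - index)
    lower_text = ".".join(str(part) for part in parts)
    upper_text = ".".join(str(part) for part in upper)
    return f">={lower_text}, <{upper_text}"


print(caret_to_range("^2.2.1"))  # >=2.2.1, <3.0.0
print(caret_to_range("^0.2.3"))  # >=0.2.3, <0.3.0
print(caret_to_range("^0.0.3"))  # >=0.0.3, <0.0.4
```

Note how the allowed range narrows for 0.x versions, where minor or patch releases are treated as potentially breaking.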

Avoiding over-constrained dependencies

Automated dependency checking tools like Dependabot or Renovate can be used to reduce project risk through timely dependency update changes assisted by human reviewers.

Using dependency version constraints and lockfiles is helpful for reproducibility but implies a risk of over-constraint. Two important over-constraint considerations are:

Make sure to address these risks by routinely considering whether your dependencies need to be updated, either manually or through the use of automated tools like GitHub’s Dependabot or Mend Renovate. Tools like Dependabot or Renovate enable scheduled checks and updates to be applied to your project, which can provide a balance between risk reduction and productive, future-focused development.

Concluding Thoughts

This article covered why dependencies are used, what complications they come with, and some tools for addressing those challenges. Every project can vary quite a bit when it comes to dependency management decision-making and maintenance. We hope you find success with dependency management through these approaches and look forward to providing more information on this topic in the future.
