
Tip of the Week: Using Python and Anaconda with the Alpine HPC Cluster

Each week we seek to provide a software tip of the week geared towards helping you achieve your software goals. Views expressed in the content belong to the content creators and not the organization, its affiliates, or employees. If you have any software questions or suggestions for an upcoming tip of the week, please don’t hesitate to reach out to #software-engineering on Slack or email DBMISoftwareEngineering at olucdenver.onmicrosoft.com

This post is intended to help demonstrate the use of Python on Alpine, a High Performance Computing (HPC) cluster hosted by the University of Colorado Boulder’s Research Computing. We use Python here by way of Anaconda environment management to run code on Alpine. This post covers background on the related technologies and how to use the contents of an example project repository as though it were a project of your own that you wanted to run on Alpine.

Diagram showing a repository’s work as being processed on Alpine.

Table of Contents

  1. Background: here we cover the background of Alpine and related technologies.
  2. Implementation: in this section we use the contents of an example project repository on Alpine.

Background

Why would I use Alpine?

Diagram showing common benefits of Alpine and HPC clusters.

Alpine is a High Performance Computing (HPC) cluster. HPC environments provide shared computer hardware resources like memory, CPU, GPU, or others to run performance-intensive work. Reasons for using Alpine might include needing more memory, CPU, or GPU capacity than a local machine can provide, or running work that would take too long or be impractical to run on a personal computer.

How does Alpine work?

Diagram showing high-level user workflow and Alpine components.

Alpine’s compute resources are used through compute nodes managed by a system called Slurm. Slurm is a job scheduler that allows a large number of users to run jobs on a cluster of computers; the system figures out how to use all the computers in the cluster to execute all users’ jobs fairly (i.e., giving each user approximately equal time and resources on the cluster). A job is a request to run something, e.g. a bash script or a program, along with specifications about how much RAM and CPU it needs, how long it can run, and how it should be executed.

Slurm’s role in general is to take in a job (submitted via the sbatch command) and put it into a queue (also called a “partition” in Slurm). For each job in the queue, Slurm continually tries to find a computer in the cluster with enough resources to run it; when an available computer is found, Slurm runs the program the job specifies on that computer. As the program runs, Slurm records its output to files and, once the program finishes, records its exit status (e.g., completed or failed) so you can review what happened.

Importantly, jobs can be either interactive or batch. With an interactive job, the submission command waits for the job to start and then connects you to the program, so you can see its output and enter commands in real time. A batch job, on the other hand, returns control to you immediately after submission; you can check its progress using squeue, and you can typically find its output in the folder from which you ran sbatch unless you specify otherwise. Data for or from Slurm work may be stored temporarily on local storage or on user-specific external (remote) storage.
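
To make this concrete, a typical batch workflow from a login node might look like the sketch below (my_job.sh and the job ID are placeholders for your own script and Slurm’s assigned ID):

# Submit a batch job; sbatch queues it and returns immediately,
# printing the ID assigned to the new job.
sbatch my_job.sh

# Check the status of your queued or running jobs.
squeue --user=$USER

# Follow the job's log file; by default Slurm writes output to a
# slurm-<jobid>.out file in the directory where sbatch was run.
tail -f slurm-1234567.out   # replace 1234567 with your job's ID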

Wait, what are “nodes”?

A simplified way to understand the architecture of Slurm on Alpine is through login and compute “nodes” (computers). Login nodes act as a place to prepare and submit jobs, which are then completed on compute nodes. Login nodes are never used to execute Slurm jobs, whereas compute nodes are only accessed via a job. Login nodes have limited resources and are not recommended for running computationally intensive procedures.

One can interact with Slurm on Alpine by use of Slurm interfaces and directives. A quick way of accessing Alpine resources is through the acompile command, which starts an interactive job on a compute node with some typical default parameters. Since acompile requests very modest resources (1 hour and 1 CPU core at the time of writing), you’ll typically be connected to a compute node quickly. For more intensive or long-lived interactive jobs, consider using sinteractive, which allows for more customization (see the UCB RC documentation on Interactive Jobs). One can also access Slurm directly through various commands on Alpine.
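
As a rough sketch of how these might be used from a login node (the options shown for sinteractive are illustrative assumptions; confirm them with sinteractive --help and the UCB RC documentation):

# Start a quick interactive session on a compute node using
# acompile's modest default resources.
acompile

# Request a more customized interactive job; these options are
# examples and should be checked against the Alpine documentation.
sinteractive --time=01:00:00 --ntasks=2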

Many common software packages are available through the Modules package on Alpine (UCB RC documentation: The Modules System).
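
For example, finding and loading software through Modules might look like the following (module names and versions on Alpine may differ; confirm with module avail or module spider):

# List the software modules currently available to load.
module avail

# Search for a particular package, e.g. anaconda.
module spider anaconda

# Load a module so its software becomes available in your session.
module load anaconda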

How does Slurm work?

Diagram showing how Slurm generally works.

Using Alpine effectively involves knowing how to leverage Slurm. A simplified way to understand how Slurm works is through the following sequence. Please note that some steps and additional complexity are omitted here for the sake of a clear overview.

  1. Create a job script: build a script which will configure and run procedures related to the work you seek to accomplish on the HPC cluster (a minimal example job script appears after this list).
  2. Submit job to Slurm: ask Slurm to run a set of commands or procedures.
  3. Job queue: Slurm will queue the submitted job alongside others (recall that the HPC cluster is a shared resource), providing information about progress as time goes on.
  4. Job processing: Slurm will run the procedures in the job script as scheduled.
  5. Job completion or cancellation: submitted jobs eventually reach a completed, failed, or cancelled state, and Slurm saves information about what happened for later review.
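
To make step 1 above more concrete, here is a minimal sketch of a Slurm job script; the job name, partition, time, resources, and the echo command are placeholders to adjust for your own work (see the UCB RC documentation for the directives appropriate to Alpine):

#!/bin/bash
# my_job.sh: a minimal, hypothetical Slurm job script.

#SBATCH --job-name=example-job       # human-readable name for the job
#SBATCH --partition=amilan           # queue/partition to submit to (placeholder)
#SBATCH --time=00:10:00              # maximum run time (HH:MM:SS)
#SBATCH --ntasks=1                   # number of tasks (CPU cores) requested
#SBATCH --mem=1G                     # memory requested
#SBATCH --output=example-job.%j.out  # log file; %j is replaced with the job ID

# Commands below run on the compute node once the job is scheduled.
echo "Hello from $(hostname)"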

How do I store data on Alpine?

Data used or produced by your jobs on Alpine may live in a number of different storage locations. Be sure to follow the Acceptable data storage and use policies of Alpine, avoiding the use of certain sensitive information and other restricted items. These locations may be distinguished in two ways (a brief terminal sketch for exploring the local locations follows this list):

  1. Alpine local storage (sometimes temporary): Alpine provides a number of temporary data storage locations for accomplishing your work. ⚠️ Note: some of these locations may be periodically purged and are not a suitable location for long-term data hosting (see here for more information)!
    Storage locations available (see this link for full descriptions):

    • Home filesystem: 2 GB of backed up space under /home/$USER (where $USER is your RMACC or Alpine username).
    • Projects filesystem: 250 GB of backed up space under /projects/$USER (where $USER is your RMACC or Alpine username).
    • Scratch filesystem: 10 TB (10,240 GB) of space which is not backed up under /scratch/alpine/$USER (where $USER is your RMACC or Alpine username).
  2. External / remote storage: Users are encouraged to explore external data storage options for long-term hosting.
    Examples may include cloud storage providers or other remote storage services available to you or your group.
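
As a brief terminal sketch for exploring the Alpine-local locations above (the curc-quota command is provided by Research Computing for checking storage usage; its output format may change over time):

# Home: small, backed up, for configuration and small files.
ls /home/$USER

# Projects: larger, backed up, a good place for code and modest data.
ls /projects/$USER

# Scratch: largest, not backed up, and periodically purged.
ls /scratch/alpine/$USER

# Check how much of each storage allocation you are currently using.
curc-quota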

How do I send or receive data on Alpine?

Diagram showing external data storage being used to send or receive data on Alpine local storage.

Data may be sent to or gathered from Alpine using a number of different methods. These may vary depending on the external data storage being referenced, the code involved, or your group’s available resources. Please reference the following documentation from the University of Colorado Boulder’s Research Computing regarding data transfers: The Compute Environment - Data Transfer. Please note: due to the authentication configuration of Alpine, many local or SSH-key-based methods are not available for CU Anschutz users. As a result, Globus represents one of the best options available (see 3. 📂 Transfer data results below). While the Globus tutorial in this document describes how you can download data from Alpine to your computer, note that you can also use Globus to transfer data to Alpine from your computer.

Implementation

Diagram showing how an example project repository may be used within Alpine through primary steps and processing workflow.

Use the following steps to understand how Alpine may be used with an example project repository to run example Python code.

0. 🔑 Gain Alpine access

First you will need to gain access to Alpine. This access is provided to members of the University of Colorado Anschutz through RMACC and is separate from other credentials which may be provided by default in your role. Please see the following guide from the University of Colorado Boulder’s Research Computing, which covers requesting access and how this generally works for members of the University of Colorado Anschutz.

1. 🛠️ Prepare code on Alpine

[username@xsede.org@login-ciX ~]$ cd /projects/$USER
[username@xsede.org@login-ciX username@xsede.org]$ git clone https://github.com/CU-DBMI/example-hpc-alpine-python
Cloning into 'example-hpc-alpine-python'...
... git output ...
[username@xsede.org@login-ciX username@xsede.org]$ ls -l example-hpc-alpine-python
... ls output ...

An example of what this preparation section might look like in your Alpine terminal session.

Next we will prepare our code within Alpine. We do this because code is often developed and version controlled outside of Alpine and then brought onto the cluster to run. In the case of this example work, we assume git as an interface to GitHub as the source control host.

Below you’ll find the general steps associated with this process.

  1. Log in to the Alpine command line (reference this guide).
  2. Change directory into the Projects filesystem (generally we’ll assume processed data produced by this code are large enough to warrant the need for additional space):
    cd /projects/$USER
  3. Use git (built into Alpine by default) commands to clone this repo:
    git clone https://github.com/CU-DBMI/example-hpc-alpine-python
  4. Verify the contents were received as desired (this should show the contents of an example project repository):
    ls -l example-hpc-alpine-python

What if I need to authenticate with GitHub?

There are times when you may need to authenticate with GitHub in order to accomplish your work. From a GitHub perspective, you will want to use either GitHub Personal Access Tokens (PATs) (recommended by GitHub) or SSH keys associated with the git client on Alpine. Note: if git prompts you for a username and password when accessing a GitHub resource, the password is now expected to be a token such as a PAT rather than your account password (reference). See the following guide from GitHub for more information on how authentication from git to GitHub works.
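
As a brief, hedged sketch of the HTTPS + PAT route (the commands below are standard git; see GitHub’s documentation for creating a PAT):

# Optionally cache credentials in memory so git does not prompt on
# every operation (timeout is in seconds).
git config --global credential.helper 'cache --timeout=3600'

# Clone over HTTPS; when git prompts for a password, paste your
# Personal Access Token (PAT) rather than your account password.
git clone https://github.com/CU-DBMI/example-hpc-alpine-python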

2. ⚙️ Implement code on Alpine

[username@xsede.org@login-ciX ~]$ sbatch --export=CSV_FILEPATH="/projects/$USER/example_data.csv" example-hpc-alpine-python/run_script.sh
[username@xsede.org@login-ciX username@xsede.org]$ tail -f example-hpc-alpine-python.out
... tail output (ctrl/cmd + c to cancel) ...
[username@xsede.org@login-ciX username@xsede.org]$ head -n 2 example_data.csv
... data output ...

An example of what this implementation section might look like in your Alpine terminal session.

After our code is available on Alpine we’re ready to run it using Slurm and related resources. We use Anaconda to build a Python environment with specified packages for reproducibility. The main goal of the Python code related to this work is to create a CSV file with random data at a specified location. We’ll use Slurm’s sbatch command, which submits batch scripts to Slurm using various options.
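
As a simplified, hypothetical illustration (not the repository’s actual run_script.sh), the general pattern of using Anaconda inside such a batch script looks roughly like this; the module, environment name, script path, and argument names are assumptions, so treat the repository’s run_script.sh and environment.yml as the source of truth:

#!/bin/bash
#SBATCH --job-name=example-hpc-alpine-python
#SBATCH --time=00:10:00
#SBATCH --output=example-hpc-alpine-python.out

# Make Anaconda available on the compute node (module name may vary).
module load anaconda

# Build or update a conda environment from the repository's
# environment.yml, then activate it (environment name is assumed).
conda env update --file example-hpc-alpine-python/environment.yml --name example_env
conda activate example_env

# Run the Python code; the script path and flag are hypothetical, and
# CSV_FILEPATH comes from sbatch's --export option.
python example-hpc-alpine-python/code/create_csv.py --csv_filepath "$CSV_FILEPATH"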

  1. Use the sbatch command with exported variable CSV_FILEPATH.
    sbatch --export=CSV_FILEPATH="/projects/$USER/example_data.csv" example-hpc-alpine-python/run_script.sh
  2. After a short moment, use the tail command to observe the log file created by Slurm for this sbatch submission. This file can help you understand the job’s progress and whether anything went wrong.
    tail -f example-hpc-alpine-python.out
  3. Once you see that the work has completed from the log file, take a look at the top 2 lines of the data file using the head command to verify the data arrived as expected (column names with random values):
    head -n 2 example_data.csv

3. 📂 Transfer data results

Diagram showing how example_data.csv may be transferred from Alpine to a local machine using Globus solutions.

Now that the example data output from the Slurm work is available, we need to transfer that data to a local system for further use. In this example we’ll use Globus as a data transfer method from Alpine to our local machine. Please note: always be sure to check the data privacy and policy considerations which may change the methods or storage locations you can use for your data! (A command-line alternative using the Globus CLI is sketched after the steps below.)

  1. Globus local machine configuration
    1. Install Globus Connect Personal on your local machine.
    2. During installation, you will be prompted to log in to Globus. Use your ACCESS credentials to log in.
    3. During installation login, note the label you provide to Globus. This will be used later, referenced as “Globus Connect Personal label”.
    4. Ensure you add, and (importantly) provide write access to, the local directory where you’d like the data from Alpine to be received, via Globus Connect Personal - Preferences - Access.

  2. Globus web interface
    1. Use your ACCESS credentials to log in to the Globus web interface.
    2. Configure File Manager left side (source selection)
      1. Within the Globus web interface on the File Manager tab, use the Collection input box to search or select “CU Boulder Research Computing ACCESS”.
      2. Within the Globus web interface on the File Manager tab, use the Path input box to enter: /projects/your_username_here/ (replacing “your_username_here” with your username from Alpine, including the “@” symbol if it applies).
    3. Configure File Manager right side (destination selection)
      1. Within the Globus web interface on the File Manager tab, use the Collection input box to search or select the Globus Connect Personal label you provided in earlier steps.
      2. Within the Globus web interface on the File Manager tab, use the Path input box to enter the local path which you made accessible in earlier steps.
    4. Begin Globus transfer
      1. Within the Globus web interface on the File Manager tab on the left side (source selection), check the box next to the file example_data.csv.
      2. Within the Globus web interface on the File Manager tab on the left side (source selection), click the “Start ▶️” button to begin the transfer from Alpine to your local directory.
      3. After clicking the “Start ▶️” button, you may see a notification in the top right stating “Transfer request submitted successfully”. You can click the link to view the details associated with the transfer.
      4. After a short period, the file will be transferred and you should be able to verify the contents on your local machine.
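
If you prefer a command line over the web interface, the separate Globus CLI can request the same transfer. The sketch below is only illustrative; the collection UUIDs, search terms, and local path are placeholders you would look up yourself (for example via globus endpoint search), and your data policies still apply:

# Install and authenticate the Globus CLI (a separate tool from
# Globus Connect Personal).
pipx install globus-cli
globus login

# Look up the UUIDs for the Alpine collection and your Globus Connect
# Personal collection (search terms below are examples).
globus endpoint search "CU Boulder Research Computing"
globus endpoint search "your Globus Connect Personal label"

# Request the transfer of example_data.csv (UUIDs and paths are placeholders).
globus transfer SOURCE_UUID:/projects/your_username_here/example_data.csv \
  DESTINATION_UUID:/path/on/your/machine/example_data.csv \
  --label "alpine example transfer"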
