Coding environment & reproducibility

How to fail consistently

Coding environment

We are all products of our environment — C.J. Heck

The coding environment affects the result of running software

  • Operating system
  • Software version
  • Presence of dependencies
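
These factors can be recorded from within Python itself. The following is a minimal sketch using only the standard library; the helper name describe_environment is made up for this example.

```python
import platform
import sys


def describe_environment():
    """Collect basic facts about the environment this script runs in."""
    return {
        "operating_system": platform.system(),  # e.g. "Linux" or "Darwin"
        "os_release": platform.release(),
        "machine": platform.machine(),  # e.g. "x86_64" or "arm64"
        "python_version": platform.python_version(),
        "python_executable": sys.executable,
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Storing this output next to your results makes it easier to explain later why two runs differ.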

Software version

  • Software is released in different versions

  • On the command line, you can usually use --version or -V to print the version info

    bash --version
    python --version
  • Semantic versioning proposes rules for how to assign and increment version numbers

    MAJOR.MINOR.PATCH
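
A rough sketch of how MAJOR.MINOR.PATCH numbers can be compared and incremented. The helpers parse_version and bump are hypothetical names, and the sketch assumes purely numeric parts (no pre-release tags such as 1.0.0-rc.1).

```python
def parse_version(text):
    """Split a MAJOR.MINOR.PATCH string into a tuple of integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def bump(version, part):
    """Return the next version after incrementing the given part."""
    major, minor, patch = version
    if part == "major":  # incompatible API changes
        return major + 1, 0, 0
    if part == "minor":  # new, backward-compatible functionality
        return major, minor + 1, 0
    if part == "patch":  # backward-compatible bug fixes
        return major, minor, patch + 1
    raise ValueError(f"unknown part: {part!r}")


# Tuples compare element by element, so 2.10.0 counts as newer than 2.9.3.
# A plain string comparison would get this wrong.
print(parse_version("2.10.0") > parse_version("2.9.3"))
print(bump(parse_version("1.15.2"), "minor"))
```

Incrementing a part resets all parts to its right to zero, which is why the functions return full tuples.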

Dependencies

  • Software may depend on other software/libraries

  • Efficient/architecture-specific implementations

  • Reliability due to larger community

  • Dependencies are not always pre-installed:

    python3 -c "import numpy"
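
The same check can be done without triggering an ImportError. A minimal sketch, assuming you only ask for top-level modules; the function name is_available is made up for this example.

```python
import importlib.util


def is_available(module_name):
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


for name in ("json", "numpy"):
    status = "available" if is_available(name) else "missing"
    print(f"{name}: {status}")
```

json ships with Python, whereas numpy is only found if it has been installed into the current environment.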

Hands-on session

  • Find out which Python version is available on your machine
  • Find out which Python version is available on Levante
  • Do they provide NumPy?

Environments

The context in which software is created or executed

  • Local/development environment
  • Testing environment
  • Production environment

It works on my machine 🤷

It works in this environment! 🚀

Package manager

Tools to install packages and their dependencies

  • General purpose package manager (brew, apt, micromamba)

    brew install coreutils
  • Python package manager (pip, micromamba, uv, pixi)

    python3 -m pip install numpy

Package manager: pinning versions

Package managers can also install a specific version of a package

  • General purpose package manager (brew, apt, micromamba)

    brew install coreutils@9.7
  • Python package manager (pip, micromamba, uv, pixi)

    python3 -m pip install "numpy==2.0.2"
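
Whether a pinned version actually ended up in the environment can be checked from Python. A minimal sketch based on importlib.metadata from the standard library; installed_version and check_pin are hypothetical helper names.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_pin(package, expected):
    """Compare the installed version with a pinned one and describe the result."""
    found = installed_version(package)
    if found is None:
        return f"{package} is not installed"
    if found != expected:
        return f"{package} {found} is installed, but {expected} was requested"
    return f"{package} {expected} is installed as requested"


print(check_pin("numpy", "2.0.2"))
```

The message depends on your environment, which is exactly the point of pinning.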

Virtual environments (venv)

  • Contain all dependencies needed to support a project
  • Isolated from other virtual environments (and the OS)
  • Not checked into source control systems such as Git
  • Considered as disposable
  • Not considered as movable or copyable
  • See PEP 405 for more info on Python virtual environments
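
A short sketch of how a script can tell whether it runs inside a venv-style virtual environment. The function name in_venv is made up for this example.

```python
import sys


def in_venv():
    """Return True if the interpreter runs inside a venv-style virtual environment."""
    # Inside a venv, sys.prefix points to the environment, while
    # sys.base_prefix still points to the base Python installation.
    return sys.prefix != sys.base_prefix


print(f"inside a virtual environment: {in_venv()}")
print(f"packages are installed below: {sys.prefix}")
```

Note that this only covers environments created in the PEP 405 style; environments from other tools may leave both prefixes equal.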

Specifying a Python environment

  • micromamba is a widely used package manager for Python

  • Handles virtual environments, general purpose packages (and libraries), and Python packages

  • Supports YAML file to specify environments:

    environment.yaml
    name: my-environment
    channels:
      - conda-forge
    dependencies:
      - python=3.12
      - numpy>=2
      - matplotlib=3.10.*
      - scipy=1.15.2
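
To get a feeling for what the version constraints in such a file allow, here is a deliberately simplified sketch. It is not how the package manager itself resolves specifications; it only handles the forms used above, and as_tuple and satisfies are hypothetical helpers that assume purely numeric version parts.

```python
def as_tuple(version):
    """Turn '3.12.10' into (3, 12, 10); assumes purely numeric parts."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version, spec):
    """Check a version against a simplified spec such as '=3.12' or '>=2'."""
    installed = as_tuple(version)
    if spec.startswith(">="):
        return installed >= as_tuple(spec[2:])
    if spec.startswith("=="):
        return installed == as_tuple(spec[2:])
    if spec.startswith("="):
        # Treat '=3.10.*' and '=3.10' alike: the given parts must match,
        # anything after them is left open.
        wanted = as_tuple(spec[1:].removesuffix(".*"))
        return installed[: len(wanted)] == wanted
    raise ValueError(f"unsupported spec: {spec!r}")


# Specs from the environment file, checked against example versions.
checks = [
    ("python", "3.12.10", "=3.12"),
    ("numpy", "2.2.4", ">=2"),
    ("matplotlib", "3.10.1", "=3.10.*"),
    ("scipy", "1.15.2", "=1.15.2"),
]
for name, version, spec in checks:
    print(f"{name} {version} satisfies {spec}: {satisfies(version, spec)}")
```

Loose constraints such as numpy>=2 leave room for many versions, so the same file can resolve to different environments over time.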

Hands-on session

  • Install micromamba on your machine

  • Create an environment.yaml with some dependencies

  • Create the environment using

    micromamba env create -f environment.yaml
  • Activate the environment (micromamba --help)

  • Run the following code again in the terminal:

    python --version
    python -c "import numpy; print(numpy.__version__)"

Reproducibility

  • Resolving a set of dependencies is deterministic in theory

  • In practice, the result depends on when the environment is created (new package versions are released)

  • It may also depend on the machine (operating system, architecture)

  • For reproducibility, it is important to export the resolved environment, e.g., with micromamba env export:

    name: my-environment
    channels:
      - conda-forge
    dependencies:
      - brotli=1.1.0=hd74edd7_2
      - brotli-bin=1.1.0=hd74edd7_2
      - bzip2=1.0.8=h99b78c6_7
      - ca-certificates=2025.1.31=hf0a4a13_0
      - contourpy=1.3.1=py312hb23fbb9_0
      - cycler=0.12.1=pyhd8ed1ab_1
      - fonttools=4.57.0=py312h998013c_0
      - freetype=2.13.3=h1d14073_0
      - kiwisolver=1.4.8=py312h2c4a281_0
      - lcms2=2.17=h7eeda09_0
      - lerc=4.0.0=h9a09cb3_0
      - libblas=3.9.0=31_h10e41b3_openblas
      - libbrotlicommon=1.1.0=hd74edd7_2
      - libbrotlidec=1.1.0=hd74edd7_2
      - libbrotlienc=1.1.0=hd74edd7_2
      - libcblas=3.9.0=31_hb3479ef_openblas
      - libcxx=20.1.2=ha82da77_0
      - libdeflate=1.23=hec38601_0
      - libexpat=2.7.0=h286801f_0
      - libffi=3.4.6=h1da3d7d_1
      - libgfortran=14.2.0=heb5dd2a_105
      - libgfortran5=14.2.0=h2c44a93_105
      - libjpeg-turbo=3.0.0=hb547adb_1
      - liblapack=3.9.0=31_hc9a63f6_openblas
      - liblzma=5.8.1=h39f12f2_0
      - libopenblas=0.3.29=openmp_hf332438_0
      - libpng=1.6.47=h3783ad8_0
      - libsqlite=3.49.1=h3f77e49_2
      - libtiff=4.7.0=h551f018_3
      - libwebp-base=1.5.0=h2471fea_0
      - libxcb=1.17.0=hdb1d25a_0
      - libzlib=1.3.1=h8359307_2
      - llvm-openmp=20.1.2=hdb05f8b_1
      - matplotlib=3.10.1=py312h1f38498_0
      - matplotlib-base=3.10.1=py312hdbc7e53_0
      - munkres=1.1.4=pyh9f0ad1d_0
      - ncurses=6.5=h5e97a16_3
      - numpy=2.2.4=py312h7c1f314_0
      - openjpeg=2.5.3=h8a3d83b_0
      - openssl=3.5.0=h81ee809_0
      - packaging=24.2=pyhd8ed1ab_2
      - pillow=11.1.0=py312h50aef2c_0
      - pip=25.0.1=pyh8b19718_0
      - pthread-stubs=0.4=hd74edd7_1002
      - pyparsing=3.2.3=pyhd8ed1ab_1
      - python=3.12.10=hc22306f_0_cpython
      - python-dateutil=2.9.0.post0=pyhff2d567_1
      - python_abi=3.12=6_cp312
      - qhull=2020.2=h420ef59_5
      - readline=8.2=h1d1bf99_2
      - scipy=1.15.2=py312h99a188d_0
      - setuptools=78.1.0=pyhff2d567_0
      - six=1.17.0=pyhd8ed1ab_0
      - tk=8.6.13=h5083fa2_1
      - tornado=6.4.2=py312hea69d52_0
      - tzdata=2025b=h78e105d_0
      - unicodedata2=16.0.0=py312hea69d52_0
      - wheel=0.45.1=pyhd8ed1ab_1
      - xorg-libxau=1.0.12=h5505292_0
      - xorg-libxdmcp=1.1.5=hd74edd7_0
      - zstd=1.5.7=h6491c7d_2
    
    prefix: "/Users/lkluft/micromamba/envs/my-environment"

Hands-on session

  • Run micromamba env export
  • Compare your output with that of someone else in the class
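
Comparing two exports by eye is tedious. The following sketch shows one possible way to automate it with the standard library only. The helpers parse_export and compare are hypothetical, and the second export (THEIRS) is invented for illustration.

```python
MINE = """\
dependencies:
  - libcxx=20.1.2=ha82da77_0
  - numpy=2.2.4=py312h7c1f314_0
  - six=1.17.0=pyhd8ed1ab_0
"""

# Invented example of a classmate's export (hypothetical build string).
THEIRS = """\
dependencies:
  - numpy=2.2.4=hypothetical_build_0
  - six=1.17.0=pyhd8ed1ab_0
"""


def parse_export(text):
    """Map package name -> (version, build) for lines like '- name=version=build'."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("- ") or "=" not in line:
            continue  # skips keys such as 'dependencies:' and channel entries
        name, version, *build = line[2:].split("=")
        packages[name] = (version, build[0] if build else "")
    return packages


def compare(mine, theirs):
    """Return packages that are missing on one side or differ in version or build."""
    differences = {}
    for name in sorted(mine.keys() | theirs.keys()):
        if mine.get(name) != theirs.get(name):
            differences[name] = (mine.get(name), theirs.get(name))
    return differences


for name, (a, b) in compare(parse_export(MINE), parse_export(THEIRS)).items():
    print(f"{name}: mine={a} theirs={b}")
```

Differences in the build string alone often point to a different operating system or architecture.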

Consistency

  • Predictable structure makes the code easier to follow
  • Reduces cognitive load
  • Easier collaboration

Code formatting (before)

import numpy as np
def hypot(a,b):
  """Given the legs of a right triangle, return its hypotenuse."""
  return np.sqrt(a ** 2+b**2)

def is_even(a):
    '''Check if a given number is even.'''
    answer=42
    return a % 2==0

Code formatting (after)

import numpy as np


def hypot(a, b):
    """Given the legs of a right triangle, return its hypotenuse."""
    return np.sqrt(a**2 + b**2)


def is_even(a):
    """Check if a given number is even."""
    answer = 42
    return a % 2 == 0

Automated code formatting

Code formatters are used to apply style guidelines

  • They follow standards (e.g. PEP 8) but can be customized
  • They format code in a deterministic way
  • For Python: ruff or black

    ruff format test.py
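
A formatter should only change the layout, never the meaning of the code. One way to convince yourself is to compare the syntax trees of both versions. This is a sketch with the standard library ast module, not a description of how any particular formatter works; same_program is a made-up helper name.

```python
import ast

UNFORMATTED = """\
def is_even(a):
    '''Check if a given number is even.'''
    answer=42
    return a % 2==0
"""

FORMATTED = '''\
def is_even(a):
    """Check if a given number is even."""
    answer = 42
    return a % 2 == 0
'''


def same_program(first, second):
    """Return True if two sources parse to the same abstract syntax tree."""
    # ast.dump() leaves out line and column numbers by default,
    # so pure layout differences do not show up in the comparison.
    return ast.dump(ast.parse(first)) == ast.dump(ast.parse(second))


print(same_program(UNFORMATTED, FORMATTED))
print(same_program(FORMATTED, FORMATTED.replace("== 0", "== 1")))
```

Whitespace and quote style do not affect the tree, whereas changing the comparison value does.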

Static code analysis

Code linters are used to statically analyse code

  • Find unused variables, style violations, and more
  • For Python: ruff or flake8

    ruff check test.py
    F841 Local variable `answer` is assigned to but never used
      --> test.py:11:5
       |
     9 | def is_even(a):
    10 |     """Check if a given number is even."""
    11 |     answer = 42
       |     ^^^^^^
    12 |     return a % 2 == 0
       |
    help: Remove assignment to unused variable `answer`
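
Static analysis means inspecting the code without running it. The following toy check illustrates the idea behind such a warning. It is a simplistic sketch, not the implementation used by any real linter: the helper unused_locals is hypothetical and ignores nested functions, globals, and many other cases.

```python
import ast

SOURCE = '''\
def is_even(a):
    """Check if a given number is even."""
    answer = 42
    return a % 2 == 0
'''


def unused_locals(source):
    """Return (name, line) pairs for variables that are assigned but never read."""
    findings = []
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, ast.FunctionDef):
            continue
        assigned = {}  # name -> line of an assignment
        used = set()  # names that are read somewhere in the function
        for node in ast.walk(func):
            if isinstance(node, ast.Name):
                if isinstance(node.ctx, ast.Store):
                    assigned.setdefault(node.id, node.lineno)
                elif isinstance(node.ctx, ast.Load):
                    used.add(node.id)
        findings += [(name, line) for name, line in assigned.items() if name not in used]
    return findings


for name, line in unused_locals(SOURCE):
    print(f"line {line}: variable {name} is assigned but never used")
```

The function is_even is never called here; the finding comes purely from reading the source.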

Take home messages

  • The results of software depend on the coding environment
  • Virtual environments help to isolate dependencies
  • To ensure reproducibility (for you and others), it’s essential to share resolved environments

Shotgun buffet

Module system

  • DKRZ uses Environment Modules to provide different versions of various packages

  • List all available modules

    module avail
  • Load a specific module (with an implicit or explicit version)

    module load python3
    module load python3/unstable
  • Unload all currently loaded modules (useful at the beginning of scripts)

    module purge

Containerization

  • Packaging software code together with all its necessary components

    It’s basically a fully functional and portable computing environment. — RedHat

  • The best-known tools are Docker (the de facto standard) and Apptainer (an open-source alternative)

  • Can be scaled and orchestrated on larger platforms (Kubernetes)

pixi

  • Pixi is a promising package management tool for Python
  • It builds on the existing conda ecosystem, but replaces all its tools
  • Faster resolution and creation of environments
  • Built-in reproducibility through lock files (pixi.lock)

uv

  • uv is yet another promising package manager for Python
  • It builds on the existing PyPI ecosystem
  • Faster resolution and creation of environments
  • Built-in reproducibility through lock files (uv.lock)