Data structures

Invent a storage format for arrays

Without using standard tools (e.g. xarray’s or numpy’s file I/O functions), think of a way to store a numpy array on the filesystem. Store the array’s contents in a binary (i.e. non-text) format; for the metadata, you may choose either a text or a binary representation.

  • You may find it easier to store the array and the necessary metadata in separate files. That is fine; in that case, just treat a folder on your filesystem as your dataset.
  • You may want to look at TLV (type-length-value) encoding, or at representing a dictionary as a list of key-value pairs, for inspiration on how to store metadata.
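To illustrate the TLV idea, here is a minimal sketch that stores a metadata dictionary as a flat sequence of type-length-value records (key, value, key, value, ...). The type tags, the 1-byte type / 4-byte little-endian length layout and the function names encode_metadata and decode_metadata are all hypothetical choices made for this example; only str and int values round-trip.

```python
import struct

# Hypothetical type tags, chosen only for this sketch.
TYPE_STR = 1
TYPE_INT = 2

# Record header: 1-byte type + 4-byte little-endian length ("<" disables padding).
TLV_HEADER = "<BI"


def encode_tlv(tag: int, value: bytes) -> bytes:
    """Encode one record as type, length, then the raw value bytes."""
    return struct.pack(TLV_HEADER, tag, len(value)) + value


def encode_metadata(meta: dict) -> bytes:
    """Encode a dict as alternating key/value TLV records."""
    out = bytearray()
    for key, value in meta.items():
        out += encode_tlv(TYPE_STR, key.encode("utf-8"))
        if isinstance(value, int):
            out += encode_tlv(TYPE_INT, struct.pack("<q", value))  # signed 64-bit
        else:
            out += encode_tlv(TYPE_STR, str(value).encode("utf-8"))
    return bytes(out)


def decode_metadata(buf: bytes) -> dict:
    """Decode the records in order and pair them up again as key/value."""
    values = []
    offset = 0
    header_size = struct.calcsize(TLV_HEADER)  # 5 bytes
    while offset < len(buf):
        tag, length = struct.unpack_from(TLV_HEADER, buf, offset)
        offset += header_size
        raw = buf[offset:offset + length]
        offset += length
        if tag == TYPE_INT:
            values.append(struct.unpack("<q", raw)[0])
        elif tag == TYPE_STR:
            values.append(raw.decode("utf-8"))
        else:
            raise ValueError(f"unknown TLV type {tag}")
    return dict(zip(values[0::2], values[1::2]))


meta = {"dtype": "<f8", "length": 3}
assert decode_metadata(encode_metadata(meta)) == meta
```

Because every record announces its own length, a reader can skip records whose type it does not understand, which is one reason TLV is popular for extensible metadata.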

Tasks

  1. store a 1-dimensional array of floats on the filesystem and load it back into Python
  2. extend this to support arrays of any data type
  3. extend this to support multi-dimensional arrays
  4. (bonus) extend this to support multiple arrays, identified by name

Push a branch containing the scripts that perform the above-mentioned tasks. Open an MR and add Tobias Kölling as a reviewer by 2025-06-24.

Notes

To extract the underlying bytes from a numpy array, you can use bytes(). For example, with little-endian ("<") unsigned 16-bit ("u2") integers:

>>> import numpy as np
>>> data = bytes(np.array([1, 2], dtype="<u2"))
>>> data
b'\x01\x00\x02\x00'

… and to get it back into numpy, use np.frombuffer():

>>> np.frombuffer(data, dtype="<u2")
array([1, 2], dtype=uint16)
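Note that the bytes contain only the element values: neither the dtype nor the shape survives. The sketch below shows, for a simple (non-structured) dtype, which two pieces of metadata a format has to record. Using dtype.str as the dtype description is one possible choice, not the only one; it is a compact string that includes the byte order. np.ascontiguousarray is used defensively so the bytes come out in C (row-major) order.

```python
import numpy as np

# A 2-D array of little-endian 64-bit floats.
a = np.arange(6, dtype="<f8").reshape(2, 3)

# Ensure a C-contiguous buffer before taking its bytes.
raw = bytes(np.ascontiguousarray(a))
print(len(raw))  # 48  (6 elements * 8 bytes each)

# The metadata a storage format has to keep alongside the raw bytes.
dtype_str = a.dtype.str  # '<f8'
shape = a.shape          # (2, 3)

# np.frombuffer alone always yields a flat, 1-D array ...
flat = np.frombuffer(raw, dtype=dtype_str)
print(flat.shape)  # (6,)

# ... so the shape has to be restored explicitly.
restored = flat.reshape(shape)
assert np.array_equal(restored, a) and restored.dtype == a.dtype
```

The array returned by np.frombuffer shares memory with the bytes object and is therefore read-only; call .copy() on it if you need to modify the data.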

Keep in mind that when reading and writing bytes objects with Python’s file I/O, you must pass binary mode ("b") to open():

with open("datafile", "wb") as outfile:
    outfile.write(data)

# and to read the bytes back:

with open("datafile", "rb") as infile:
    data = infile.read()
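Putting the two notes together, a minimal round trip of a 1-D float array through a file could look like the sketch below. The file name array.bin and the temporary directory are assumptions for illustration only. In text mode ("w"), write() expects a str and raises a TypeError when given bytes, which is why "wb" and "rb" are needed.

```python
import os
import tempfile

import numpy as np

original = np.array([1.5, -2.0, 3.25], dtype="<f8")

# Hypothetical file name inside a throwaway directory, just for this sketch.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "array.bin")

    # "wb": write in binary mode, so write() accepts a bytes object.
    with open(path, "wb") as outfile:
        outfile.write(bytes(original))

    # "rb": read in binary mode, so read() returns bytes, not str.
    with open(path, "rb") as infile:
        data = infile.read()

print(type(data).__name__, len(data))  # bytes 24

# The reader must already know the dtype: nothing in the file records it.
loaded = np.frombuffer(data, dtype="<f8")
assert np.array_equal(loaded, original)
```

Here the dtype is hard-coded on the reading side; storing it (and later the shape) next to the data is exactly what the tasks above ask for.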

When packing multiple variables into a fixed byte layout, you might also want to check out Python’s struct module.
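As an example of such a fixed byte layout, the sketch below packs a dtype string and a shape into a binary header with struct. The layout (an 8-byte null-padded dtype field, a uint32 dimension count, then one uint64 per dimension, all little-endian) and the names pack_header and unpack_header are hypothetical; the 8-byte width is arbitrary, and longer dtype strings would be silently truncated by the "8s" format.

```python
import struct

# Hypothetical fixed part: 8-byte dtype string (null-padded) + uint32 ndim.
FIXED_HEADER = "<8sI"


def pack_header(dtype_str: str, shape: tuple) -> bytes:
    """Fixed part first, then one little-endian uint64 per dimension."""
    fixed = struct.pack(FIXED_HEADER, dtype_str.encode("ascii"), len(shape))
    return fixed + struct.pack(f"<{len(shape)}Q", *shape)


def unpack_header(buf: bytes) -> tuple:
    """Return (dtype_str, shape, offset); offset is where the array data would start."""
    raw_dtype, ndim = struct.unpack_from(FIXED_HEADER, buf, 0)
    offset = struct.calcsize(FIXED_HEADER)  # 12 bytes
    shape_format = f"<{ndim}Q"
    shape = struct.unpack_from(shape_format, buf, offset)
    offset += struct.calcsize(shape_format)
    dtype_str = raw_dtype.rstrip(b"\x00").decode("ascii")
    return dtype_str, shape, offset


header = pack_header("<f8", (2, 3))
print(len(header))            # 28
print(unpack_header(header))  # ('<f8', (2, 3), 28)
```

Returning the offset makes it easy to place the raw array bytes directly after the header in the same file, as an alternative to keeping metadata in a separate file.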