Starting with hrtfpykit.sofa

hrtfpykit.sofa is the file-level layer for reading, inspecting, validating, editing, cloning, copying, and saving .sofa files. The goal is to understand a SOFA file as a structured object (dimensions, variables, global attributes, variable attributes, convention metadata, stored acoustic arrays, and file provenance) before moving into higher-level HRTF workflows.

If the notebook environment is not ready yet, start with Set Up. The first code cell prepares one measured HUTUBS SOFA file, pp1_HRIRs_measured.sofa, with the HUTUBS dataset class. After the file is available locally, all operations stay in hrtfpykit.sofa and use load_sofa plus the SOFA object directly.

This notebook walks through the complete file-level workflow: running security and convention checks, loading a SOFA file, reading the full summary, inspecting dimensions and metadata, cloning before editing, modifying stored Data.IR, creating a structured copy when a dimension size changes, saving the edited file, and reloading it to verify the result. By the end, you should know when direct SOFA handling is the right level of abstraction and how file-level changes relate to the HRTF object workflows introduced later.

Download one HUTUBS SOFA file

A relative dataset root keeps the same code usable from any project folder. A HUTUBS dataset object prepares one measured SOFA file for pp1. The resulting file is datasets/hutubs/pp1_HRIRs_measured.sofa; every section after this one works directly with that SOFA file through hrtfpykit.sofa.

from pathlib import Path

from hrtfpykit.datasets import HUTUBS

# Define the local dataset root.
root = Path("datasets/hutubs")

# Exclude all HUTUBS subjects except pp1.
exclude_subject_ids = tuple(f"pp{i}" for i in range(2, 97))

# Download only the measured HRTF resource for pp1.
HUTUBS(
    root=root,
    download=True,
    download_resources="hrtf",
    download_hrtf_variant="measured",
    exclude_subject_ids=exclude_subject_ids,
    verify_checksum=True,
)

# Build the path to the downloaded SOFA file.
sofa_path = root / "pp1_HRIRs_measured.sofa"

# Stop early if the expected file is not available.
if not sofa_path.exists():
    raise FileNotFoundError(f"Expected SOFA file was not found: {sofa_path}")

print(sofa_path)

Check the file before loading it (optional)

check_sofa_security inspects the HDF5/netCDF safety context and looks for suspicious metadata before the file is used in the rest of the workflow. SOFA files are stored as netCDF/HDF5 containers, so the check verifies that the linked HDF5 runtime meets the minimum safety baseline used by hrtfpykit and reports possible parser-risk categories such as memory-corruption, remote-code-execution, and denial-of-service exposure at a high level. It also scans metadata for external links, domains, and suspicious file extensions.

For this HUTUBS file, the printed security report will show Security check [STANDARD]: FAILED because the metadata contains an external website/domain link. That does not automatically mean the acoustic data are unusable; it means the preflight check found metadata that should be reviewed instead of silently trusted.
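
To make the metadata part of that report concrete, here is a minimal sketch of how a scan for external links and suspicious file extensions could work. It is not hrtfpykit's implementation: the function name scan_metadata, the regular expression, the extension list, and the sample attribute values are all assumptions made for illustration.

```python
import re

# Hypothetical patterns; hrtfpykit's real rules are not shown in this tutorial.
URL_PATTERN = re.compile(r"(?:https?://|www\.)[^\s,;]+", re.IGNORECASE)
SUSPICIOUS_EXTENSIONS = (".exe", ".bat", ".sh", ".dll")


def scan_metadata(attributes):
    """Return (attribute name, reason) findings for string-valued attributes."""
    findings = []
    for name, value in attributes.items():
        if not isinstance(value, str):
            continue
        if URL_PATTERN.search(value):
            findings.append((name, "external link"))
        # Crude substring check; a real scanner would parse file names properly.
        lowered = value.lower()
        if any(extension in lowered for extension in SUSPICIOUS_EXTENSIONS):
            findings.append((name, "suspicious file extension"))
    return findings


# Made-up attribute values, not copied from the HUTUBS file.
example_attributes = {
    "Title": "HRIR measurement",
    "Origin": "See https://example.org/dataset for details",
    "Comment": "Processed with setup.exe",
}

findings = scan_metadata(example_attributes)
for name, reason in findings:
    print(f"{name}: {reason}")
```

A finding of this kind marks an attribute for review; it says nothing about the acoustic arrays themselves.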

check_sofa_against_conventions compares the file against the SOFA convention declared in its global attributes. Calling it explicitly is optional in this tutorial because load_sofa runs the same check by default, that is, whenever its check_sofa_against_conventions parameter is left at True. The separate call below only makes the resolved convention report visible before the file is loaded.

from hrtfpykit.sofa import check_sofa_against_conventions, check_sofa_security

# Run a standard SOFA security check.
security_report = check_sofa_security(
    sofa_path,
    print_report=True,
)

# Check the file against its declared SOFA convention.
convention_report = check_sofa_against_conventions(sofa_path)

# Inspect the convention resolved by the validation check.
print(convention_report["convention"])

Load the SOFA file

load_sofa opens the file and returns a SOFA object. The object keeps the netCDF4 storage handle and exposes the main SOFA surfaces through Dimensions, GlobalAttributes, Variables, and VariableAttributes.

from hrtfpykit.sofa import load_sofa

# Load the SOFA file as a hrtfpykit SOFA object.
sofa = load_sofa(sofa_path)

# Check the path attached to the loaded object.
print(sofa.path)

Inspect the full SOFA summary

summary returns a text view of the global attributes, variables, dimensions, and variable attributes. Printing the complete summary gives a first complete view of the file before moving into specific collection wrappers.

# Build the complete SOFA summary.
summary = sofa.summary()

# Print the complete SOFA summary.
print(summary)

Inspect dimensions

SOFA dimensions define the axes used by variables. In a SimpleFreeFieldHRIR file, M usually indexes measurements or source positions, R indexes receivers, and N indexes HRIR time samples.

# Access the dimension collection.
dimensions = sofa.Dimensions

# List all dimension names.
print(dimensions.get_names())

# Inspect the dimensions used by the HRIR data.
for name in ("M", "R", "N"):
    dimension = dimensions.get(name)
    print(f"{name}: size={dimension.value}, unlimited={dimension.is_unlimited}")

# Print the compact dimensions summary.
print(dimensions.summary())
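
The roles of M, R, and N are easiest to see by indexing an array with the same axis order. The sketch below uses a synthetic NumPy array in place of Data.IR; the sizes are illustrative only and are not the sizes of the HUTUBS file.

```python
import numpy as np

# Illustrative sizes only; read the real ones from sofa.Dimensions.
M, R, N = 5, 2, 16

# Synthetic stand-in for Data.IR with axes (measurement, receiver, sample).
ir = np.zeros((M, R, N))

# One measurement: both receivers, all samples.
one_measurement = ir[0]

# One receiver across all measurements (index 0 chosen arbitrarily).
one_receiver = ir[:, 0, :]

# A single impulse response: one measurement, one receiver.
single_hrir = ir[0, 0]

print(one_measurement.shape)  # (R, N)
print(one_receiver.shape)     # (M, N)
print(single_hrir.shape)      # (N,)
```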

Inspect global attributes

Global attributes describe the file-level convention and metadata. The most important attributes for interpretation are the SOFA convention name, convention version, data type, and modification metadata.

# Access the global attribute collection.
global_attributes = sofa.GlobalAttributes

# List available global attribute names.
global_attribute_names = global_attributes.get_names()
print(global_attribute_names)

# Inspect key convention attributes when they are present.
for name in ("SOFAConventions", "SOFAConventionsVersion", "DataType", "DateCreated", "DateModified"):
    if name in global_attribute_names:
        print(f"{name}: {global_attributes.get(name).value}")

Inspect variables and stored values

Variables exposes the arrays stored in the SOFA file. For this HUTUBS file, the main acoustic data are stored in Data.IR, while SourcePosition, ReceiverPosition, and Data.SamplingRate describe how to interpret those HRIR samples.

# Access the variable collection.
variables = sofa.Variables

# List all stored variable names.
print(variables.get_names())

# Read the HRIR array.
ir = variables.get("Data.IR").value

# Read the source positions.
source_positions = variables.get("SourcePosition").value

# Read the sampling rate.
sampling_rate = variables.get("Data.SamplingRate").value

# Inspect shapes and representative values.
print("Data.IR shape:", ir.shape)
print("SourcePosition shape:", source_positions.shape)
print("Data.SamplingRate (Hz):", sampling_rate)
print("First source position (azimuth degrees, elevation degrees, radius meters):", source_positions[0])
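
When source positions are spherical, as the print label above assumes, a common follow-up is to convert them to Cartesian coordinates. The sketch below assumes the usual SOFA spherical convention (azimuth in degrees counterclockwise from the positive x axis, elevation in degrees upward from the horizontal plane, radius in meters). The function name spherical_to_cartesian is hypothetical and not part of hrtfpykit; check SourcePosition:Type and SourcePosition:Units before applying it to real data.

```python
import numpy as np


def spherical_to_cartesian(positions):
    """Convert rows of (azimuth deg, elevation deg, radius m) to (x, y, z) in meters."""
    positions = np.atleast_2d(np.asarray(positions, dtype=float))
    azimuth = np.deg2rad(positions[:, 0])
    elevation = np.deg2rad(positions[:, 1])
    radius = positions[:, 2]
    x = radius * np.cos(elevation) * np.cos(azimuth)
    y = radius * np.cos(elevation) * np.sin(azimuth)
    z = radius * np.sin(elevation)
    return np.column_stack((x, y, z))


# Made-up positions: front, left, and directly above, all at 1 m.
example_positions = np.array([
    [0.0, 0.0, 1.0],
    [90.0, 0.0, 1.0],
    [0.0, 90.0, 1.0],
])

cartesian = spherical_to_cartesian(example_positions)
print(np.round(cartesian, 6))
```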

Inspect variable attributes

Variable attributes attach units, coordinate-system information, and semantic labels to individual SOFA variables. hrtfpykit exposes them with Variable:Attribute keys.

# Access the variable attribute collection.
variable_attributes = sofa.VariableAttributes

# Inspect the source position coordinate system.
print("SourcePosition:Type:", variable_attributes.get("SourcePosition:Type").value)
print("SourcePosition:Units:", variable_attributes.get("SourcePosition:Units").value)

# Inspect sampling rate units.
print("Data.SamplingRate:Units:", variable_attributes.get("Data.SamplingRate:Units").value)
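
Because SOFA variable names such as Data.SamplingRate contain dots, the colon is what separates the variable from the attribute in these keys. A small sketch of splitting such a key follows; split_attribute_key is a hypothetical helper written for this example, not a hrtfpykit function.

```python
def split_attribute_key(key):
    """Split a 'Variable:Attribute' key into (variable, attribute)."""
    variable, separator, attribute = key.partition(":")
    if not separator or not variable or not attribute:
        raise ValueError(f"Expected 'Variable:Attribute', got {key!r}")
    return variable, attribute


print(split_attribute_key("SourcePosition:Units"))
print(split_attribute_key("Data.SamplingRate:Units"))
```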

Clone before editing

The safest editing workflow is to clone the loaded SOFA object, modify the clone, and save it to a new file. The original loaded file remains available for comparison and is not changed by edits made to the clone.

# Create an independent in-memory SOFA clone.
editable = sofa.clone()

# Confirm that the clone is not attached to a file path yet.
print(editable.path)

# Confirm that the original object does not contain the tutorial note.
print("TutorialNote" in sofa.GlobalAttributes.get_names())
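
The guarantee described above, that edits to a clone never reach the original, is the same one a deep copy gives for ordinary Python data. The sketch below illustrates that property with a plain dictionary and copy.deepcopy; it is an analogy only and makes no claim about how SOFA.clone is implemented.

```python
import copy

import numpy as np

# Plain-Python stand-in for a loaded object; not a hrtfpykit SOFA object.
original = {"Data.IR": np.ones((2, 2, 4)), "TutorialNote": None}

# A deep copy duplicates the array instead of sharing it.
clone = copy.deepcopy(original)

# Edit only the clone.
clone["Data.IR"][..., :2] = 0.0
clone["TutorialNote"] = "edited"

print(original["Data.IR"][0, 0], original["TutorialNote"])
print(clone["Data.IR"][0, 0], clone["TutorialNote"])
```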

Edit metadata on the clone

Metadata edits are a good first modification because they do not alter the acoustic arrays. The SOFA API separates global attributes from variable attributes, so file-level notes and per-variable notes are edited with different methods.

# Create a file-level tutorial note.
editable.create_global_attribute(
    "TutorialNote",
    "Created in the Starting with hrtfpykit.sofa tutorial",
)

# Modify the file-level note.
editable.modify_global_attribute(
    "TutorialNote",
    "Edited in the Starting with hrtfpykit.sofa tutorial",
)

# Add a note to the HRIR variable.
editable.create_variable_attribute(
    "Data.IR:TutorialNote",
    "HRIR samples edited on an in-memory clone",
)

# Read the edited metadata back from the clone.
print(editable.GlobalAttributes.get("TutorialNote").value)
print(editable.VariableAttributes.get("Data.IR:TutorialNote").value)

Edit stored data on the clone

modify_variable replaces the stored values of an existing SOFA variable while preserving the variable definition, dimensions, dtype, and attributes. Here we zero the first eight HRIR samples only on the clone.

import numpy as np

# Read the cloned HRIR array.
editable_ir = editable.Variables.get("Data.IR").value

# Create an edited copy of the HRIR array.
edited_ir = np.array(editable_ir, copy=True)

# Zero the first eight samples for every measurement and receiver.
edited_ir[..., :8] = 0.0

# Write the edited values back to the clone.
editable.modify_variable("Data.IR", edited_ir)

# Compare original and edited values.
print("Original first samples:", sofa.Variables.get("Data.IR").value[0, 0, :8])
print("Edited first samples:", editable.Variables.get("Data.IR").value[0, 0, :8])
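
The ellipsis in edited_ir[..., :8] stands for all leading axes, so the edit applies to every measurement and receiver at once. The sketch below shows the same indexing on a synthetic array and converts eight samples into a duration. The sampling rate of 44100 Hz is an assumption for illustration; use the Data.SamplingRate value read from the file.

```python
import numpy as np

# Synthetic stand-in for Data.IR with shape (M, R, N); values are arbitrary.
ir = np.ones((3, 2, 16))

# Work on a copy so the source array stays untouched.
edited = ir.copy()

# The ellipsis expands to all leading axes: same as edited[:, :, :8].
edited[..., :8] = 0.0

# Assumed sampling rate; replace with the value stored in the file.
sampling_rate_hz = 44100.0
zeroed_duration_ms = 8 / sampling_rate_hz * 1000.0

print(edited[0, 0])
print(f"Zeroed span: {zeroed_duration_ms:.3f} ms")
```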

Add a small derived variable

The SOFA API can also create new dimensions and variables. This is useful for controlled metadata or derived values, but convention-required acoustic variables should be changed carefully. This example uses a separate clone so the derived variable can be inspected without being saved into the edited tutorial file.

# Create a separate clone for the derived variable example.
derived = sofa.clone()

# Create a custom dimension for a small tutorial vector.
derived.create_dimension("Q", 3)

# Create a custom variable using the new dimension.
derived.create_variable(
    "TutorialVector",
    [1.0, 2.0, 3.0],
    ("Q",),
    attributes={
        "Units": "1",
        "Description": "Small derived vector created in the SOFA tutorial",
    },
)

# Inspect the derived variable and one of its attributes.
print(derived.Variables.get("TutorialVector").value)
print(derived.VariableAttributes.get("TutorialVector:Units").value)
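
A variable only makes sense if its shape agrees with the sizes of the dimensions it names. The sketch below shows that consistency rule in isolation; shape_matches and the dimension sizes are hypothetical and written for this example, and hrtfpykit may perform its own checks differently.

```python
import numpy as np


def shape_matches(dimension_sizes, dimension_names, values):
    """Return True when the array shape equals the sizes of the named dimensions."""
    expected = tuple(dimension_sizes[name] for name in dimension_names)
    return np.shape(values) == expected


# Illustrative sizes, including the custom Q dimension from the example above.
dimension_sizes = {"M": 5, "R": 2, "N": 16, "Q": 3}

fits = shape_matches(dimension_sizes, ("Q",), [1.0, 2.0, 3.0])
too_short = shape_matches(dimension_sizes, ("Q",), [1.0, 2.0])

print(fits)
print(too_short)
```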

Create a structured copy with changed dimensions

copy_with is useful when an edit also changes the size of a fixed dimension. This matters because SOFA variables are backed by netCDF dimensions: if Data.IR is stored with shape (M, R, N), direct replacement with modify_variable must still fit the current (M, R, N) shape. If the new array has a different sample length, netCDF rejects the assignment because the variable dimensions no longer match.

copy_with creates a new in-memory SOFA object where selected fixed dimensions and variables are replaced together. This example creates a separate cropped copy with a shorter N dimension and a matching shorter Data.IR array.

# Read the original HRIR array.
original_ir = sofa.Variables.get("Data.IR").value

# Create a structured copy with a shorter sample dimension.
cropped = sofa.copy_with(
    dim_sizes={"N": 128},
    global_attributes={
        "TutorialNote": "Cropped copy created with SOFA.copy_with",
    },
    variables={
        "Data.IR": original_ir[..., :128],
    },
)

# Compare the original and cropped shapes.
print("Original shape:", sofa.Variables.get("Data.IR").value.shape)
print("Cropped shape:", cropped.Variables.get("Data.IR").value.shape)
print("Cropped N:", cropped.Dimensions.get("N").value)
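
The reason a shorter array cannot simply be written into the existing variable can be reproduced with plain NumPy. The sketch below is an analogy, not netCDF or hrtfpykit code: a fixed-shape buffer rejects in-place assignment of an array with a different last axis, while replacing the size and the values together works.

```python
import numpy as np

# Synthetic stand-in for a variable stored with fixed shape (M, R, N).
stored = np.ones((3, 2, 16))

# A cropped version with a shorter last axis.
shorter = stored[..., :8]

# In-place assignment into the fixed-shape buffer fails for a different N.
try:
    stored[...] = shorter
    assignment_failed = False
except ValueError as error:
    assignment_failed = True
    print("In-place assignment rejected:", error)

# Replacing the dimension size and the values together is consistent.
new_sizes = {"M": 3, "R": 2, "N": 8}
replaced = np.array(shorter, copy=True)
sizes_agree = replaced.shape == tuple(new_sizes[name] for name in ("M", "R", "N"))
print("Replacement consistent:", sizes_agree)
```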

Save the edited SOFA file

save writes the edited clone to disk. Because the clone was modified, hrtfpykit also updates DateModified and writes a hrtfpykit provenance global attribute. The tutorial saves into a separate output folder so the original downloaded file remains unchanged.

# Define the output folder for tutorial files.
output_dir = root / "tutorial_outputs"

# Create the output folder if needed.
output_dir.mkdir(parents=True, exist_ok=True)

# Define the edited SOFA output path.
edited_sofa_path = output_dir / "pp1_HRIRs_measured_sofa_api_tutorial.sofa"

# Save the edited clone without touching the original file.
saved_path = editable.save(
    edited_sofa_path,
    overwrite=True,
)

# Inspect the saved path.
print(saved_path)

Reload and confirm the saved file

A reliable editing workflow should finish by reloading the saved file with load_sofa and checking that the expected changes are present. The saved file contains tutorial metadata, so the reload below skips convention checking and focuses on confirming the saved values.

# Reload the saved SOFA file without convention checking.
reloaded = load_sofa(
    saved_path,
    check_sofa_against_conventions=False,
)

# Inspect the tutorial metadata.
print(reloaded.GlobalAttributes.get("TutorialNote").value)

# Inspect the hrtfpykit provenance metadata added on save.
print(reloaded.GlobalAttributes.get("hrtfpykit").value)

# Confirm the edited HRIR samples are present in the saved file.
print(reloaded.Variables.get("Data.IR").value[0, 0, :8])
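
Printing the first samples is a visual check; comparing arrays gives a yes or no answer. The sketch below shows the pattern of writing, reloading, and comparing, using a temporary .npy file purely as a stand-in for the SOFA file. With the tutorial objects, the same comparison would be made between the edited array and the Data.IR values of the reloaded file.

```python
import tempfile
from pathlib import Path

import numpy as np

# Synthetic stand-in for the edited HRIR array.
expected = np.ones((3, 2, 16))
expected[..., :8] = 0.0

# Write to a temporary file and read it back (stand-in for save and load_sofa).
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "edited.npy"
    np.save(path, expected)
    reloaded_values = np.load(path)

# Confirm both the specific edit and the full array contents.
leading_zeroed = bool(np.all(reloaded_values[..., :8] == 0.0))
matches = bool(np.array_equal(reloaded_values, expected))

print(leading_zeroed, matches)
```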