Starting with hrtfpykit.sofa

hrtfpykit.sofa is hrtfpykit’s direct file-level interface for .sofa containers. It exposes the SOFA structure as Python objects: dimensions, variables, global attributes, variable attributes, convention metadata, stored acoustic arrays, and the netCDF4 file handle behind them. The HRTF layer uses this SOFA layer internally when it loads SimpleFreeFieldHRIR and SimpleFreeFieldHRTF files, but the SOFA layer is also useful on its own for files that follow any of the stable SOFA conventions registered in hrtfpykit.

If the notebook environment is not ready yet, start with Set Up. The first code cell prepares one measured SONICOM SOFA file, P0001_FreeFieldComp_44kHz.sofa, with the SONICOM dataset class. After the file is available locally, the workflow stays in hrtfpykit.sofa and uses load_sofa plus the SOFA object directly.

This notebook walks through the file-level workflow used underneath the rest of hrtfpykit: running security and convention checks, loading a SOFA file, managing the open netCDF4 handle, reading summaries, inspecting dimensions and metadata, cloning before editing, modifying stored Data.IR, creating a structured copy when a dimension size changes, saving the edited file, and reloading it to verify the result. By the end, you should understand how direct SOFA handling fits into the library, why hrtfpykit.hrtf builds on it for HRTF-specific workflows, and when a standalone SOFA workflow is the more appropriate level of abstraction.

Download one SONICOM SOFA file

A relative dataset root keeps the same code usable from any project folder. A SONICOM dataset object prepares one measured SOFA file for P0001. The resulting file is datasets/sonicom/P0001/HRTF/HRTF/44kHz/P0001_FreeFieldComp_44kHz.sofa; every section after this one works directly with that SOFA file through hrtfpykit.sofa.

from pathlib import Path

from hrtfpykit.datasets import SONICOM

# Define the local dataset root.
root = Path("datasets/sonicom")

# Keep this tutorial small: use only P0001.
selected_subject_ids = ("P0001",)

# Download only the measured 44.1 kHz FreeFieldComp HRTF resource for P0001.
SONICOM(
    root=root,
    download=True,
    download_resources="hrtf",
    download_hrtf_variant={
        "type": "measured",
        "sample_rate": 44100,
        "version": "FreeFieldComp",
    },
    download_server="imperial",
    download_subject_ids=selected_subject_ids,
    subject_ids=selected_subject_ids,
    verify_checksum=True,
)

# Build the path to the downloaded SOFA file.
sofa_path = root / "P0001" / "HRTF" / "HRTF" / "44kHz" / "P0001_FreeFieldComp_44kHz.sofa"

# Stop early if the expected file is not available.
if not sofa_path.exists():
    raise FileNotFoundError(f"Expected SOFA file was not found: {sofa_path}")

print(sofa_path)

Check the file before loading it (optional)

check_sofa_security inspects the HDF5/netCDF safety context and looks for suspicious metadata before the file is used in the rest of the workflow. SOFA files are stored as netCDF/HDF5 containers, so the check verifies that the linked HDF5 runtime meets the minimum safety baseline used by hrtfpykit and reports possible parser-risk categories such as memory-corruption, remote-code-execution, and denial-of-service exposure at a high level. It also scans metadata for external links, domains, and suspicious file extensions.

For this SONICOM file, the standard security report should pass when the local HDF5 runtime meets the required baseline. If the report fails, inspect the listed checks before using the file in a workflow.

check_sofa_against_conventions compares the file against the SOFA convention declared in its global attributes. Calling it explicitly is optional in this tutorial because load_sofa runs the same check by default, that is, whenever its check_sofa_against_conventions parameter is left at True. The separate call below is included only to make the resolved convention report visible before loading the file.

from hrtfpykit.sofa import check_sofa_against_conventions, check_sofa_security

# Run a standard SOFA security check.
security_report = check_sofa_security(
    sofa_path,
    print_report=True,
)

# Check the file against its declared SOFA convention.
convention_report = check_sofa_against_conventions(sofa_path)

# Inspect the convention resolved by the validation check.
print(convention_report["convention"])
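
The metadata scan described in this section can be pictured with a small standard-library sketch. This is not how check_sofa_security is implemented: the function name find_suspicious_metadata, the URL pattern, and the extension list are assumptions made up for illustration, and the attribute dictionary is synthetic.

```python
import re

# Hypothetical patterns chosen for this sketch; not hrtfpykit's actual rules.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)
SUSPICIOUS_EXTENSIONS = (".exe", ".dll", ".bat", ".sh", ".ps1")


def find_suspicious_metadata(attributes):
    """Return (attribute name, reason, matched text) tuples for string attributes."""
    findings = []
    for name, value in attributes.items():
        if not isinstance(value, str):
            continue
        # Flag anything that looks like an external link.
        for url in URL_PATTERN.findall(value):
            findings.append((name, "external link", url))
        # Flag whitespace-separated tokens ending in a listed extension.
        for token in value.split():
            if token.lower().endswith(SUSPICIOUS_EXTENSIONS):
                findings.append((name, "suspicious extension", token))
    return findings


# Synthetic attributes, not taken from the SONICOM file.
example_attributes = {
    "Title": "HRTF measurement",
    "References": "see https://example.org/paper",
    "Comment": "run setup.exe first",
}
example_findings = find_suspicious_metadata(example_attributes)
for finding in example_findings:
    print(finding)
```

A real check would need a far more careful rule set; the point here is only that string metadata can be screened before a file is trusted.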

Load the SOFA file

load_sofa opens the file and returns a SOFA object. The object owns an open netCDF4 dataset handle and exposes the main SOFA collections through Dimensions, GlobalAttributes, Variables, and VariableAttributes. Calling load_sofa() directly keeps the workflow at the file and convention level. Calling load_hrtf for SimpleFreeFieldHRIR or SimpleFreeFieldHRTF files uses this same SOFA layer first, then builds an HRTF object with synchronized IR, TF, sources, metrics, transforms, and plots.

from hrtfpykit.sofa import load_sofa

# Load the SOFA file as a hrtfpykit SOFA object.
sofa = load_sofa(sofa_path)

# Check the path attached to the loaded object.
print(sofa.path)

Inspect the full SOFA summary

summary returns a text view of the global attributes, variables, dimensions, and variable attributes. Printing it gives a complete first overview of the file before moving into the specific collection wrappers.

# Build the complete SOFA summary.
summary = sofa.summary()

# Print the complete SOFA summary.
print(summary)

Manage the SOFA file handle

A loaded SOFA object owns an open netCDF4 handle. The collection wrappers and summary read from that handle, so they require the dataset to be open. Use is_open to check the state, close to release the handle, and open to reconnect a file-backed SOFA object from its stored path.

This matters in scripts that inspect many files or load HRTFs and no longer need direct SOFA access. Values copied into regular NumPy arrays remain available after closing, but SOFA-backed properties such as Variables, Dimensions, GlobalAttributes, VariableAttributes, and summary() raise a clear error until the dataset is opened again. In-memory clones are different: they do not have a file path until saved, so they should be saved or kept open while they are being edited.

# The SOFA object is open immediately after load_sofa.
print("Open after load:", sofa.is_open())

# Copy a value out before closing. This NumPy array does not depend on the file handle.
sample_rate_copy = sofa.Variables.get("Data.SamplingRate").value.copy()
print("Copied sample rate:", sample_rate_copy)

# Close the backing netCDF4 dataset.
sofa.close()
print("Open after close:", sofa.is_open())

# SOFA-backed access requires an open dataset.
try:
    sofa.summary()
except ValueError as exc:
    print("Closed SOFA access:", exc)

# Reopen from the stored path and continue with normal SOFA access.
sofa.open()
print("Open after reopen:", sofa.is_open())
print("Dimensions after reopen:", sofa.Dimensions.get_names())
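
For scripts that visit many files, the close-when-done pattern can be written once with contextlib.closing from the standard library, which calls close() on any object when the block exits. The sketch below uses a hypothetical StandInHandle class rather than a real SOFA object, so it runs without any files; whether a SOFA object also works as a context manager on its own is not stated in this tutorial and is not assumed here.

```python
from contextlib import closing


class StandInHandle:
    """Hypothetical stand-in for any object with is_open() and close()."""

    def __init__(self, name):
        self.name = name
        self._open = True

    def is_open(self):
        return self._open

    def close(self):
        self._open = False


handles = [StandInHandle(f"file_{index}") for index in range(3)]
copied_names = []

for handle in handles:
    # closing() calls handle.close() when the block exits, even on error.
    with closing(handle) as active:
        # Copy out what is needed while the handle is still open.
        copied_names.append(active.name)

print(copied_names, [handle.is_open() for handle in handles])
```

The copied values outlive the handles, which mirrors the rule that NumPy copies remain usable after the dataset is closed.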

Inspect dimensions

SOFA dimensions define the axes used by variables. In a SimpleFreeFieldHRIR file, M usually indexes measurements or source positions, R indexes receivers, and N indexes HRIR time samples.

# Access the dimension collection.
dimensions = sofa.Dimensions

# List all dimension names.
print(dimensions.get_names())

# Inspect the dimensions used by the HRIR data.
for name in ("M", "R", "N"):
    dimension = dimensions.get(name)
    print(f"{name}: size={dimension.value}, unlimited={dimension.is_unlimited}")

# Print the compact dimensions summary.
print(dimensions.summary())
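
To make the roles of M, R, and N concrete, the following NumPy sketch builds a synthetic HRIR-like array and indexes it along each axis. The sizes are made up for illustration and are not the sizes stored in the SONICOM file.

```python
import numpy as np

# Made-up sizes for illustration; the real file defines its own M, R, and N.
M, R, N = 4, 2, 16

# An HRIR-like array laid out as (measurements, receivers, samples).
synthetic_ir = np.zeros((M, R, N))

# One measurement holds an impulse response for every receiver.
one_measurement = synthetic_ir[0]  # shape (R, N)

# One receiver at one measurement is a single impulse response.
one_response = synthetic_ir[0, 0]  # shape (N,)

print(synthetic_ir.shape, one_measurement.shape, one_response.shape)
```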

Inspect global attributes

Global attributes describe the file-level convention and metadata. The most important attributes for interpretation are the SOFA convention name, convention version, data type, and modification metadata.

# Access the global attribute collection.
global_attributes = sofa.GlobalAttributes

# List available global attribute names.
global_attribute_names = global_attributes.get_names()
print(global_attribute_names)

# Inspect key convention attributes when they are present.
for name in ("SOFAConventions", "SOFAConventionsVersion", "DataType", "DateCreated", "DateModified"):
    if name in global_attribute_names:
        print(f"{name}: {global_attributes.get(name).value}")

Inspect variables and stored values

Variables exposes the arrays stored in the SOFA file. For this SONICOM file, the main acoustic data are stored in Data.IR, while SourcePosition, ReceiverPosition, and Data.SamplingRate describe how to interpret those HRIR samples.

# Access the variable collection.
variables = sofa.Variables

# List all stored variable names.
print(variables.get_names())

# Read the HRIR array.
ir = variables.get("Data.IR").value

# Read the source positions.
source_positions = variables.get("SourcePosition").value

# Read the sampling rate.
sampling_rate = variables.get("Data.SamplingRate").value

# Inspect shapes and representative values.
print("Data.IR shape:", ir.shape)
print("SourcePosition shape:", source_positions.shape)
print("Data.SamplingRate (Hz):", sampling_rate)
print("First source position (azimuth degrees, elevation degrees, radius meters):", source_positions[0])
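
Source positions stored as azimuth, elevation, and radius are often easier to reason about as Cartesian coordinates. The sketch below shows one way to convert them with NumPy. It assumes azimuth is measured counterclockwise from the front in the horizontal plane and elevation upward from that plane, both in degrees; confirm this against SourcePosition:Type and SourcePosition:Units before relying on it. The helper name spherical_to_cartesian and the example rows are hypothetical.

```python
import numpy as np


def spherical_to_cartesian(positions):
    """Convert (azimuth deg, elevation deg, radius m) rows to (x, y, z) in meters."""
    positions = np.atleast_2d(np.asarray(positions, dtype=float))
    azimuth = np.deg2rad(positions[:, 0])
    elevation = np.deg2rad(positions[:, 1])
    radius = positions[:, 2]
    # Assumed axes: x points to the front, y to the left, z upward.
    x = radius * np.cos(elevation) * np.cos(azimuth)
    y = radius * np.cos(elevation) * np.sin(azimuth)
    z = radius * np.sin(elevation)
    return np.column_stack((x, y, z))


# Made-up example rows: front, left, and directly above, all at 1.5 m.
example_positions = [[0.0, 0.0, 1.5], [90.0, 0.0, 1.5], [0.0, 90.0, 1.5]]
example_xyz = spherical_to_cartesian(example_positions)
print(np.round(example_xyz, 6))
```

Passing the source_positions array read above should work the same way, provided its attributes confirm the assumed units and coordinate type.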

Inspect variable attributes

Variable attributes attach units, coordinate-system information, and semantic labels to individual SOFA variables. hrtfpykit exposes them with Variable:Attribute keys.

# Access the variable attribute collection.
variable_attributes = sofa.VariableAttributes

# Inspect the source position coordinate system.
print("SourcePosition:Type:", variable_attributes.get("SourcePosition:Type").value)
print("SourcePosition:Units:", variable_attributes.get("SourcePosition:Units").value)

# Inspect sampling rate units.
print("Data.SamplingRate:Units:", variable_attributes.get("Data.SamplingRate:Units").value)
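
The Variable:Attribute key format is simple to take apart in plain Python, which can help when grouping attributes by variable. The helper below is a hypothetical illustration, not part of hrtfpykit. It splits on the first colon, so variable names that contain dots, such as Data.SamplingRate, stay intact.

```python
def split_attribute_key(key):
    """Split a 'Variable:Attribute' key into (variable name, attribute name)."""
    variable_name, separator, attribute_name = key.partition(":")
    if not separator or not variable_name or not attribute_name:
        raise ValueError(f"Expected 'Variable:Attribute', got {key!r}")
    return variable_name, attribute_name


for key in ("SourcePosition:Type", "Data.SamplingRate:Units"):
    print(split_attribute_key(key))
```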

Clone before editing

The safest editing workflow is to clone the loaded SOFA object, modify the clone, and save it to a new file. A clone is an independent in-memory SOFA object: the original loaded file remains available for comparison, and edits made to the clone are not written to disk until save is called.

# Create an independent in-memory SOFA clone.
editable = sofa.clone()

# Confirm that the clone is not attached to a file path yet.
print(editable.path)

# Confirm that the original object does not contain the tutorial note.
print("TutorialNote" in sofa.GlobalAttributes.get_names())

Edit metadata on the clone

Metadata edits are a good first modification because they do not alter the acoustic arrays. The SOFA API separates global attributes from variable attributes, so file-level notes and per-variable notes are edited with different methods.

# Create a file-level tutorial note.
editable.create_global_attribute(
    "TutorialNote",
    "Created in the Starting with hrtfpykit.sofa tutorial",
)

# Modify the file-level note.
editable.modify_global_attribute(
    "TutorialNote",
    "Edited in the Starting with hrtfpykit.sofa tutorial",
)

# Add a note to the HRIR variable.
editable.create_variable_attribute(
    "Data.IR:TutorialNote",
    "HRIR samples edited on an in-memory clone",
)

# Read the edited metadata back from the clone.
print(editable.GlobalAttributes.get("TutorialNote").value)
print(editable.VariableAttributes.get("Data.IR:TutorialNote").value)

Edit stored data on the clone

modify_variable replaces the stored values of an existing SOFA variable while preserving the variable definition, dimensions, dtype, and attributes. Here we zero the first eight HRIR samples only on the clone.

import numpy as np

# Read the cloned HRIR array.
editable_ir = editable.Variables.get("Data.IR").value

# Create an edited copy of the HRIR array.
edited_ir = np.array(editable_ir, copy=True)

# Zero the first eight samples for every measurement and receiver.
edited_ir[..., :8] = 0.0

# Write the edited values back to the clone.
editable.modify_variable("Data.IR", edited_ir)

# Compare original and edited values.
print("Original first samples:", sofa.Variables.get("Data.IR").value[0, 0, :8])
print("Edited first samples:", editable.Variables.get("Data.IR").value[0, 0, :8])
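
The array-level part of this edit can be checked in isolation with NumPy. The sketch below uses a synthetic stand-in for Data.IR with made-up sizes and confirms three things: the replacement keeps the stored shape and dtype, only the first eight samples change, and the source array is left untouched because the edit is made on a copy.

```python
import numpy as np

# Synthetic stand-in for Data.IR with made-up sizes (M=3, R=2, N=32).
rng = np.random.default_rng(0)
stored_ir = rng.standard_normal((3, 2, 32))

# Work on a copy so the stored values stay untouched.
edited = stored_ir.copy()
edited[..., :8] = 0.0

# The replacement keeps the same shape and dtype as the stored array.
same_layout = edited.shape == stored_ir.shape and edited.dtype == stored_ir.dtype

# Only the first eight samples differ; the tail is identical.
head_is_zero = bool(np.all(edited[..., :8] == 0.0))
tail_unchanged = bool(np.array_equal(edited[..., 8:], stored_ir[..., 8:]))

print(same_layout, head_is_zero, tail_unchanged)
```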

Add a small derived variable

The SOFA API can also create new dimensions and variables. This is useful for controlled metadata or derived values, but convention-required acoustic variables should be changed carefully. This example uses a separate clone so the derived variable can be inspected without being saved into the edited tutorial file.

# Create a separate clone for the derived variable example.
derived = sofa.clone()

# Create a custom dimension for a small tutorial vector.
derived.create_dimension("Q", 3)

# Create a custom variable using the new dimension.
derived.create_variable(
    "TutorialVector",
    [1.0, 2.0, 3.0],
    ("Q",),
    attributes={
        "Units": "1",
        "Description": "Small derived vector created in the SOFA tutorial",
    },
)

# Inspect the derived variable and one of its attributes.
print(derived.Variables.get("TutorialVector").value)
print(derived.VariableAttributes.get("TutorialVector:Units").value)

Create a structured copy with changed dimensions

copy_with creates an in-memory SOFA object while replacing selected dimensions, variables, or attributes in one controlled step. This is the practical option when an edit changes a fixed dimension size. For example, if Data.IR is stored with shape (M, R, N), direct replacement with modify_variable must still fit the current (M, R, N) shape. If the new HRIR array has a different sample length, the N dimension has to change with it.

copy_with keeps the rest of the SOFA structure intact and applies the requested replacements together. This example creates a separate cropped copy with a shorter N dimension and a matching shorter Data.IR array. The same dimension rule appears in HRTF workflows when selected or transformed HRTF data are written back to the backing SOFA object; the next tutorial shows that path in Synchronize the HRTF object back to SOFA.

# Read the original HRIR array.
original_ir = sofa.Variables.get("Data.IR").value

# Create a structured copy with a shorter sample dimension.
cropped = sofa.copy_with(
    dim_sizes={"N": 128},
    global_attributes={
        "TutorialNote": "Cropped copy created with SOFA.copy_with",
    },
    variables={
        "Data.IR": original_ir[..., :128],
    },
)

# Compare the original and cropped shapes.
print("Original shape:", sofa.Variables.get("Data.IR").value.shape)
print("Cropped shape:", cropped.Variables.get("Data.IR").value.shape)
print("Cropped N:", cropped.Dimensions.get("N").value)
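
The dimension rule behind this section can be stated as a small check: every axis of an array must match the size of the dimension it is declared on. The helper below is a hypothetical illustration in plain Python with made-up sizes; it does not reproduce hrtfpykit's own validation.

```python
def shape_matches_dimensions(shape, dimension_names, dimension_sizes):
    """Return True when each axis length equals the size of its named dimension."""
    if len(shape) != len(dimension_names):
        return False
    return all(
        length == dimension_sizes[name]
        for length, name in zip(shape, dimension_names)
    )


# Made-up sizes for illustration.
sizes = {"M": 4, "R": 2, "N": 256}

# A cropped array no longer fits the original N ...
fits_before = shape_matches_dimensions((4, 2, 128), ("M", "R", "N"), sizes)

# ... until the N dimension is updated in the same step.
cropped_sizes = {**sizes, "N": 128}
fits_after = shape_matches_dimensions((4, 2, 128), ("M", "R", "N"), cropped_sizes)

print(fits_before, fits_after)
```

This is why a cropped Data.IR and a new N size have to be supplied together rather than one after the other.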

Save the edited SOFA file

save writes the edited clone to disk. Because the clone was modified, hrtfpykit also updates DateModified and writes a hrtfpykit provenance global attribute. The tutorial saves into a separate output folder so the original downloaded file remains unchanged.

This is the direct SOFA save path. In HRTF workflows, the acoustic object is synchronized back to its backing SOFA object with update_sofa before saving; see Synchronize the HRTF object back to SOFA.

# Define the output folder for tutorial files.
output_dir = root / "tutorial_outputs"

# Create the output folder if needed.
output_dir.mkdir(parents=True, exist_ok=True)

# Define the edited SOFA output path.
edited_sofa_path = output_dir / "P0001_FreeFieldComp_44kHz_sofa_api_tutorial.sofa"

# Save the edited clone without touching the original file.
saved_path = editable.save(
    edited_sofa_path,
    overwrite=True,
)

# Inspect the saved path.
print(saved_path)

Reload and confirm the saved file

A reliable editing workflow should finish by reloading the saved file with load_sofa and checking that the expected changes are present. The saved file contains tutorial metadata, so the reload below skips convention checking and focuses on confirming the saved values.

The reloaded object follows the same file-handle lifecycle as the original object: it is open after loading, can be closed when direct SOFA access is no longer needed, and can be reopened from its stored path. After this SOFA-level check, the next tutorial moves to the HRTF layer, where load_hrtf builds an acoustic working object on top of the same SOFA foundation.

# Reload the saved SOFA file without convention checking.
reloaded = load_sofa(
    saved_path,
    check_sofa_against_conventions=False,
)

# Inspect the tutorial metadata.
print(reloaded.GlobalAttributes.get("TutorialNote").value)

# Inspect the hrtfpykit provenance metadata added on save.
print(reloaded.GlobalAttributes.get("hrtfpykit").value)

# Confirm the edited HRIR samples are present in the saved file.
print(reloaded.Variables.get("Data.IR").value[0, 0, :8])

# Apply the same file-handle lifecycle to the reloaded object.
print("Reloaded open after load:", reloaded.is_open())
reloaded.close()
print("Reloaded open after close:", reloaded.is_open())
reloaded.open(check_sofa_against_conventions=False)
print("Reloaded open after reopen:", reloaded.is_open())