Starting with hrtfpykit.datasets

hrtfpykit.datasets builds map-style datasets for HRTF workflows. A dataset object exposes len(dataset) and dataset[index], but it also keeps the HRTF dataset rules in one place: subject identifiers, file layouts, official resource variants, excluded subjects, deterministic splits, and resource summaries.

This tutorial uses SONICOM because it provides a reproducible public HRTF slice. The explanations apply to the dataset layer in general: the same ideas carry over to ARI, HUTUBS, and future dataset classes. The SONICOM-specific parts are the concrete subject IDs, HRTF variant, and file layout used by the downloader.

By the end, you will have downloaded a small resource slice, built an acoustic dataset with specs, inspected samples, created train/validation/test splits, batched samples for PyTorch, and run a small autoencoder-shaped training loop where the input HRTF representation is also the reconstruction target.

Download a small SONICOM resource slice

Dataset downloading and dataset construction are related, but they are not the same operation.

Downloading answers: which official files should exist under the local dataset root? Dataset construction answers: which values should each sample return when dataset[index] is called? hrtfpykit keeps these choices separate by design. download_resources selects the resource families to fetch, such as "hrtf", "mesh", or "metadata". Variant arguments such as download_hrtf_variant select the official files to download. Later, dataset arguments such as inputs, target, dataset_hrtf_variant, split, and split_seed decide how local files become indexed samples.

This example downloads only measured SONICOM HRTFs for subjects P0001 through P0010, using the 44.1 kHz FreeFieldComp variant. Keeping the slice small makes the tutorial quick while still exercising the normal downloader and dataset construction path.

from pathlib import Path

from hrtfpykit.datasets import SONICOM

# Define a local dataset root for this tutorial.
root = Path("datasets/sonicom")

# Keep only P0001 through P0010 for a small reproducible tutorial slice.
selected_subject_ids = tuple(f"P{i:04d}" for i in range(1, 11))
exclude_subject_ids = tuple(f"P{i:04d}" for i in range(11, 401))

# Select the measured 44.1 kHz FreeFieldComp HRTF files.
hrtf_variant = {
    "type": "measured",
    "sample_rate": 44100,
    "version": "FreeFieldComp",
}

# Download only the selected HRTF resource slice.
SONICOM(
    root=root,
    download=True,
    download_resources="hrtf",
    download_hrtf_variant=hrtf_variant,
    dataset_hrtf_variant=hrtf_variant,
    exclude_subject_ids=exclude_subject_ids,
    verify_checksum=True,
    verbose=True,
)

print("Selected subjects:", selected_subject_ids)

Build a map-style acoustic dataset

Samples are declared with specs. A spec is a small configuration object that says which value should appear in sample["inputs"] or sample["target"]. Acoustic specs request values derived from the loaded HRTF object, such as HRIR arrays, HRTF magnitudes, ITD, ILD, or spherical harmonic coefficients. Resource specs align subject resources such as meshes, metadata, anthropometry, images, or videos.

This first tutorial covers specs only briefly, because they deserve a dedicated tutorial of their own. The important idea here is that specs are declarative: they describe the sample values, while the dataset object handles subjects, local files, variants, splits, and indexing.

The example below builds one subject-level sample per SONICOM subject. The input and target both use the same HRTF magnitude representation. That is useful for an autoencoder-style example later, where the model tries to reconstruct the same HRTF representation and the latent vector can be treated as a compact feature representation.

from hrtfpykit.datasets import HRTFSpec

input_spec = HRTFSpec(
    domain="frequency",
    signal="tf_magnitude_db",
    ears="both",
    index_by=("subject",),
    name="input_hrtf",
)

target_spec = HRTFSpec(
    domain="frequency",
    signal="tf_magnitude_db",
    ears="both",
    index_by=("subject",),
    name="target_hrtf",
)

dataset = SONICOM(
    root=root,
    download=False,
    dataset_hrtf_variant=hrtf_variant,
    exclude_subject_ids=exclude_subject_ids,
    inputs=input_spec,
    target=target_spec,
    split="all",
    verbose=True,
)

print("Number of dataset rows:", len(dataset))

Inspect dataset samples

A hrtfpykit dataset is map-style: integer indexing returns one sample dictionary. Each sample has three top-level entries.

sample["inputs"] contains the values requested by input specs. sample["target"] contains the values requested by target specs. sample["meta"] records where the row came from, including the dataset name, subject ID, and any active row context such as position, ear, frequency, or sample index.

sample["meta"] is not model input by default; it is provenance. It is useful when debugging sample construction, checking that a split contains the expected subjects, tracing a suspicious value back to a source position or ear, and analyzing predictions after a model has produced outputs. It also matters when datasets are combined: the metadata keeps the dataset name and row context attached to each sample, so a script can still know where a sample came from after concatenation or batching.

With index_by=("subject",), each row represents one subject and the HRTF arrays keep their position, ear, and frequency axes. If a later spec uses index_by=("subject", "position"), the dataset rows expand so each row represents one subject and one source position.
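
To see why changing index_by changes len(dataset), the following sketch enumerates rows for two hypothetical subjects and three hypothetical positions. It is only a mental model built on itertools.product: build_rows and the axis values are illustrative and are not part of hrtfpykit, whose internal row construction may differ.

```python
from itertools import product


def build_rows(axis_values, index_by):
    """Return one row context per combination of the indexed axes."""
    selected_axes = [axis_values[name] for name in index_by]
    return [dict(zip(index_by, combo)) for combo in product(*selected_axes)]


# Hypothetical axis values, for illustration only.
axis_values = {
    "subject": ["P0001", "P0002"],
    "position": ["front", "left", "right"],
}

subject_rows = build_rows(axis_values, ("subject",))
position_rows = build_rows(axis_values, ("subject", "position"))

print(len(subject_rows))   # 2 rows: one per subject
print(len(position_rows))  # 6 rows: one per (subject, position) pair
print(position_rows[0])    # {'subject': 'P0001', 'position': 'front'}
```

Axes that are not listed in index_by are not enumerated as rows; in the real dataset they remain as array axes inside each sample.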

sample = dataset[0]

print("Sample keys:", sample.keys())
print("Input keys:", sample["inputs"].keys())
print("Target keys:", sample["target"].keys())
print("Meta:", sample["meta"])

input_hrtf = sample["inputs"]["input_hrtf"]
target_hrtf = sample["target"]["target_hrtf"]

print("Input type:", type(input_hrtf).__name__)
print("Input shape:", input_hrtf.shape)
print("Input dtype:", input_hrtf.dtype)
print("Target shape:", target_hrtf.shape)
print("Input and target shapes match:", input_hrtf.shape == target_hrtf.shape)

for index in range(3):
    sample = dataset[index]
    print(index, sample["meta"]["subject_id"], sample["inputs"]["input_hrtf"].shape)

Create deterministic subject splits

Splits are part of dataset construction. hrtfpykit first determines which subjects have all resources required by the selected specs, then applies the requested split. This means train, validation, and test objects can be built from the same root and the same specs without duplicating resource matching logic in user code.
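
As a rough mental model of that two-step process, the sketch below first filters subjects by resource availability and then applies a seeded split. It is not hrtfpykit's algorithm: split_subjects, the available mapping, and the rounding rule are assumptions made for illustration, and the library may assign subjects differently.

```python
import random


def split_subjects(subject_ids, ratios, seed):
    """Shuffle with a fixed seed, then cut into train/validation/test."""
    shuffled = sorted(subject_ids)
    random.Random(seed).shuffle(shuffled)
    n_train = round(ratios[0] * len(shuffled))
    n_validation = round(ratios[1] * len(shuffled))
    return {
        "train": shuffled[:n_train],
        "validation": shuffled[n_train:n_train + n_validation],
        "test": shuffled[n_train + n_validation:],
    }


# Hypothetical availability table: which resources exist per subject.
available = {f"P{i:04d}": {"hrtf"} for i in range(1, 11)}
available["P0011"] = {"mesh"}  # no HRTF, so it is dropped before splitting

required = {"hrtf"}
eligible = [sid for sid, resources in available.items() if required <= resources]

splits = split_subjects(eligible, ratios=(0.7, 0.2, 0.1), seed=42)
print({name: len(ids) for name, ids in splits.items()})
# {'train': 7, 'validation': 2, 'test': 1}
```

Calling split_subjects again with the same seed returns the same assignment, which is the property that lets separate train, validation, and test objects agree with each other.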

The split is deterministic for a fixed split_seed. The example below keeps the same ten SONICOM subjects and divides them into train, validation, and test datasets.

dataset_kwargs = {
    "root": root,
    "download": False,
    "dataset_hrtf_variant": hrtf_variant,
    "exclude_subject_ids": exclude_subject_ids,
    "inputs": input_spec,
    "target": target_spec,
    "split_ratio": (0.7, 0.2, 0.1),
    "split_seed": 42,
}

train_dataset = SONICOM(**dataset_kwargs, split="train")
validation_dataset = SONICOM(**dataset_kwargs, split="validation")
test_dataset = SONICOM(**dataset_kwargs, split="test")

print("Train samples:", len(train_dataset))
print("Validation samples:", len(validation_dataset))
print("Test samples:", len(test_dataset))
print("First train subject:", train_dataset[0]["meta"]["subject_id"])
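
Because every sample carries its subject ID in sample["meta"], a quick sanity check can confirm that no subject appears in more than one split. The helper below is a sketch, not part of hrtfpykit. It relies only on integer indexing, len, and the meta["subject_id"] key used earlier in this tutorial, and it is demonstrated on plain lists of stand-in samples so that it runs without any downloaded data.

```python
from itertools import combinations


def subject_ids(dataset):
    """Collect the subject IDs recorded in each sample's metadata."""
    return {dataset[i]["meta"]["subject_id"] for i in range(len(dataset))}


def find_overlaps(named_datasets):
    """Return the subjects shared by each pair of splits, if any."""
    ids = {name: subject_ids(ds) for name, ds in named_datasets.items()}
    return {
        (a, b): ids[a] & ids[b]
        for a, b in combinations(ids, 2)
        if ids[a] & ids[b]
    }


# Stand-in samples with the same "meta" layout as a real dataset row.
def fake_split(*ids):
    return [{"meta": {"subject_id": sid}} for sid in ids]


clean = {"train": fake_split("P0001", "P0002"), "test": fake_split("P0003")}
leaky = {"train": fake_split("P0001", "P0002"), "test": fake_split("P0002")}

print(find_overlaps(clean))  # {}
print(find_overlaps(leaky))  # {('train', 'test'): {'P0002'}}
```

With the objects built above, the call would be find_overlaps({"train": train_dataset, "validation": validation_dataset, "test": test_dataset}); an empty dictionary means the splits are disjoint.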

Apply dataset-level HRTF preprocessing

dataset_hrtf_transform applies one HRTF transform to every loaded subject before any acoustic spec extracts values. This is useful when a whole dataset should use the same preprocessing: selecting a source subset, changing the time window, padding HRIRs, changing FFT length, converting to another HRTF representation, or applying another HRTF operation.

Spec-level transforms also exist, but they are better covered in a deeper specs tutorial. For a first dataset workflow, the dataset-level transform is easier to reason about: every acoustic spec sees the same transformed HRTF object.

The next dataset keeps only three named source directions before extracting the HRTF magnitude. This makes the batched tensor small enough for a lightweight autoencoder example.

from hrtfpykit.datasets import HRTFTransform

compact_transform = HRTFTransform.select(positions=["front", "left", "right"])

compact_dataset_kwargs = {
    "root": root,
    "download": False,
    "dataset_hrtf_variant": hrtf_variant,
    "dataset_hrtf_transform": compact_transform,
    "exclude_subject_ids": exclude_subject_ids,
    "inputs": input_spec,
    "target": target_spec,
    "split_ratio": (0.7, 0.2, 0.1),
    "split_seed": 42,
}

compact_train_dataset = SONICOM(**compact_dataset_kwargs, split="train")
compact_validation_dataset = SONICOM(**compact_dataset_kwargs, split="validation")

compact_sample = compact_train_dataset[0]
print("Compact input shape:", compact_sample["inputs"]["input_hrtf"].shape)
print("Compact target shape:", compact_sample["target"]["target_hrtf"].shape)
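
The ordering is the key point: the transform runs on the loaded HRTF first, and every spec then extracts values from the transformed object. The NumPy sketch below imitates that order on random data. Everything in it is a stand-in: the array layout (position, ear, sample), the select_positions and magnitude_db helpers, and the reading of tf_magnitude_db as 20·log10 of the FFT magnitude are assumptions for illustration, not documented hrtfpykit behavior.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in HRIR set: 5 positions, 2 ears, 8 time samples.
position_names = ["front", "left", "right", "back", "up"]
hrir = rng.standard_normal((len(position_names), 2, 8))


def select_positions(hrir, names, keep):
    """Dataset-level step: keep only the requested positions."""
    rows = [names.index(name) for name in keep]
    return hrir[rows]


def magnitude_db(hrir, eps=1e-12):
    """Spec-level step: one-sided magnitude spectrum in dB."""
    spectrum = np.fft.rfft(hrir, axis=-1)
    return 20.0 * np.log10(np.abs(spectrum) + eps)


compact_hrir = select_positions(hrir, position_names, ["front", "left", "right"])
features = magnitude_db(compact_hrir)

print(compact_hrir.shape)  # (3, 2, 8)
print(features.shape)      # (3, 2, 5)
```

Selecting positions before extraction keeps the feature tensor small, which is the same reason the tutorial applies compact_transform at the dataset level.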

Concatenate compatible datasets

Generated hrtfpykit datasets can be concatenated with PyTorch’s ConcatDataset when they expose the same sample structure. This can combine several dataset objects from the same public dataset, such as different SONICOM splits, or different public dataset classes, such as SONICOM, HUTUBS, and ARI, when they have been prepared to return compatible samples.

The main constraint is compatibility. If a training loop expects batch["inputs"]["input_hrtf"] and batch["target"]["target_hrtf"], every dataset in the concatenation should return those keys. The returned values should also have compatible shapes and numeric types if they are going to be stacked by collate_samples. For acoustic HRTF values, that usually means matching domains, selected positions, ears, frequency bins or sample lengths, and preprocessing transforms. When combining different public HRTF datasets, pay special attention to source grids, coordinate conventions, sample rates, FFT lengths, available ears, HRTF variants, subject resources, and any custom transforms. If the datasets do not naturally agree, select, transform, resample, interpolate, or otherwise map them into a common representation before using them as one training dataset.
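
One way to catch mismatches early is to compare the key structure and array shapes of a probe sample from each dataset before concatenating. check_compatible below is a hypothetical helper, not an hrtfpykit function. It assumes that sample values expose a .shape attribute, it inspects only the first sample of each dataset, and it does not compare dtypes; the demo uses SimpleNamespace objects as stand-ins for arrays.

```python
from types import SimpleNamespace


def signature(sample):
    """Map each inputs/target key to the shape of its value."""
    return {
        (group, key): tuple(value.shape)
        for group in ("inputs", "target")
        for key, value in sample[group].items()
    }


def check_compatible(datasets):
    """Raise ValueError if first samples differ in keys or shapes."""
    reference = signature(datasets[0][0])
    for position, dataset in enumerate(datasets[1:], start=1):
        current = signature(dataset[0])
        if current != reference:
            raise ValueError(
                f"dataset {position} returns {current}, expected {reference}"
            )


def fake_dataset(shape):
    array = SimpleNamespace(shape=shape)  # stand-in for a NumPy array
    return [{"inputs": {"input_hrtf": array}, "target": {"target_hrtf": array}}]


# Same keys and shapes: no exception.
check_compatible([fake_dataset((3, 2, 129)), fake_dataset((3, 2, 129))])

# Different number of frequency bins: the mismatch is reported.
try:
    check_compatible([fake_dataset((3, 2, 129)), fake_dataset((3, 2, 257))])
except ValueError as error:
    print("Incompatible:", error)
```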

The example below concatenates the compact train and validation objects only to show the mechanics. In a real experiment, keep validation and test datasets separate when measuring model performance.

from torch.utils.data import ConcatDataset

combined_dataset = ConcatDataset([
    compact_train_dataset,
    compact_validation_dataset,
])

combined_sample = combined_dataset[0]

print("Combined rows:", len(combined_dataset))
print("Combined sample keys:", combined_sample.keys())
print("Combined input shape:", combined_sample["inputs"]["input_hrtf"].shape)
print("Combined target shape:", combined_sample["target"]["target_hrtf"].shape)
print("Combined sample metadata:", combined_sample["meta"])

Batch samples for PyTorch

PyTorch training code normally consumes batches, not individual samples. torch.utils.data.DataLoader is the standard PyTorch object that repeatedly calls a map-style dataset with integer indices, collects several returned samples, and yields one batch at a time.

A hrtfpykit dataset already provides the map-style part: dataset[index] returns one dictionary with inputs, target, and meta. The missing step is collation. A data loader receives a list of individual hrtfpykit samples, but it does not automatically know how every nested sample value should be stacked, preserved, or converted. That is why hrtfpykit provides collate_samples.

collate_samples is the bridge between hrtfpykit samples and PyTorch batches. It recursively collates the sample dictionaries, stacks compatible NumPy arrays and tensors along a new batch axis, converts homogeneous numeric values to PyTorch tensors, and keeps non-numeric or ragged values as Python lists. Floating-point arrays are converted to torch.float32, so normal training loops can move values to the selected device directly.

Dataset indexing stays flexible and framework-neutral. Tensor conversion happens at collation time, which is the point where PyTorch needs a batch.

from torch.utils.data import DataLoader

from hrtfpykit.datasets import collate_samples

train_loader = DataLoader(
    compact_train_dataset,
    batch_size=4,
    shuffle=True,
    collate_fn=collate_samples,
)

batch = next(iter(train_loader))

print("Batch input shape:", batch["inputs"]["input_hrtf"].shape)
print("Batch input dtype:", batch["inputs"]["input_hrtf"].dtype)
print("Batch target shape:", batch["target"]["target_hrtf"].shape)
print("Batch meta keys:", batch["meta"].keys())
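
The recursion described above can be pictured with a simplified NumPy-only sketch. collate_sketch is not hrtfpykit's collate_samples: it omits the conversion to PyTorch tensors and distinguishes only dictionaries, arrays, plain numbers, and everything else, which is enough to show how a list of nested samples becomes one nested batch.

```python
import numpy as np


def collate_sketch(values):
    """Recursively merge a list of per-sample values into one batch value."""
    first = values[0]
    if isinstance(first, dict):
        return {key: collate_sketch([v[key] for v in values]) for key in first}
    if isinstance(first, np.ndarray):
        if len({v.shape for v in values}) > 1:
            return list(values)  # ragged arrays stay a Python list
        stacked = np.stack(values)  # adds a new leading batch axis
        if np.issubdtype(stacked.dtype, np.floating):
            stacked = stacked.astype(np.float32)
        return stacked
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return np.asarray(values)
    return list(values)  # strings and other non-numeric values


# Stand-in samples with the same nested layout as a dataset row.
samples = [
    {
        "inputs": {"input_hrtf": np.zeros((3, 2, 5))},
        "meta": {"subject_id": subject_id, "row": row},
    }
    for row, subject_id in enumerate(["P0001", "P0002"])
]

batch = collate_sketch(samples)
print(batch["inputs"]["input_hrtf"].shape)  # (2, 3, 2, 5)
print(batch["inputs"]["input_hrtf"].dtype)  # float32
print(batch["meta"]["subject_id"])          # ['P0001', 'P0002']
```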

Train a small autoencoder-shaped HRTF example

This is not intended to be a serious model. It shows how a batched hrtfpykit dataset can feed a normal PyTorch training loop.

The model receives the compact HRTF magnitude tensor and tries to reconstruct the same tensor. The encoder output is a latent vector. In real experiments, that latent vector could be used as a compact HRTF representation, a feature extractor output, or a starting point for dimensionality reduction experiments. The exact architecture, loss, validation strategy, and preprocessing choices depend on the research question.

import numpy as np
import torch
from torch import nn


class HRTFAutoencoder(nn.Module):
    def __init__(self, input_shape, latent_dim=16):
        super().__init__()
        self.input_shape = tuple(input_shape)
        num_features = int(np.prod(self.input_shape))
        self.encoder = nn.Sequential(
            nn.Flatten(),
            nn.Linear(num_features, 128),
            nn.ReLU(),
            nn.Linear(128, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.ReLU(),
            nn.Linear(latent_dim, 128),
            nn.ReLU(),
            nn.Linear(128, num_features),
        )

    def forward(self, hrtf_batch):
        latent = self.encoder(hrtf_batch)
        reconstruction = self.decoder(latent).reshape(
            hrtf_batch.shape[0],
            *self.input_shape,
        )
        return reconstruction, latent


device = "cuda" if torch.cuda.is_available() else "cpu"
input_shape = batch["inputs"]["input_hrtf"].shape[1:]
model = HRTFAutoencoder(input_shape=input_shape, latent_dim=16).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

num_epochs = 10

for epoch in range(num_epochs):
    model.train()
    epoch_loss = 0.0

    for batch in train_loader:
        hrtf_input = batch["inputs"]["input_hrtf"].to(device)
        hrtf_target = batch["target"]["target_hrtf"].to(device)

        reconstruction, latent = model(hrtf_input)
        loss = loss_fn(reconstruction, hrtf_target)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        epoch_loss += loss.item()

    epoch_loss /= len(train_loader)
    print(f"Epoch {epoch + 1}/{num_epochs} - loss: {epoch_loss:.6f}")
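
A training loss alone says little about generalization, so a held-out split is usually evaluated with gradients disabled. The sketch below shows one possible evaluate helper. It is an illustration rather than part of hrtfpykit: it assumes batches with the same inputs/target keys as this tutorial and a model that returns a (reconstruction, latent) pair, and it runs on stand-in tensors with a stand-in identity model so that no downloaded data is needed.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader


def evaluate(model, loader, loss_fn, device="cpu"):
    """Return the batch-size-weighted mean loss over a loader."""
    model.eval()
    total_loss, total_rows = 0.0, 0
    with torch.no_grad():
        for batch in loader:
            hrtf_input = batch["inputs"]["input_hrtf"].to(device)
            hrtf_target = batch["target"]["target_hrtf"].to(device)
            reconstruction, _ = model(hrtf_input)
            rows = hrtf_input.shape[0]
            # Weight each batch by its size so a short last batch is not over-counted.
            total_loss += loss_fn(reconstruction, hrtf_target).item() * rows
            total_rows += rows
    return total_loss / total_rows


class IdentityAutoencoder(nn.Module):
    """Stand-in model that returns its input as the reconstruction."""

    def forward(self, hrtf_batch):
        return hrtf_batch, hrtf_batch.flatten(start_dim=1)


# Stand-in samples with the same nested layout as the tutorial batches.
torch.manual_seed(0)
fake_samples = []
for _ in range(5):
    hrtf = torch.randn(3, 2, 8)
    fake_samples.append(
        {"inputs": {"input_hrtf": hrtf}, "target": {"target_hrtf": hrtf.clone()}}
    )

loader = DataLoader(fake_samples, batch_size=2)
validation_loss = evaluate(IdentityAutoencoder(), loader, nn.MSELoss())
print(validation_loss)  # 0.0, because the stand-in reconstructs its input exactly
```

With the objects from this tutorial, a matching call would wrap compact_validation_dataset in a DataLoader with collate_fn=collate_samples and pass the trained model, loss_fn, and device.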