Mastering hrtfpykit.datasets Specs

hrtfpykit.datasets builds map-style datasets from HRTF files and subject resources. Specs are the objects that define what appears in dataset[index].

This tutorial goes deeper than "Starting with hrtfpykit.datasets". It uses HUTUBS subjects pp1 through pp10, then adds real image files for only pp1, pp2, and pp3. That smaller image resource matters because it shows how hrtfpykit removes subjects that cannot provide every requested value.

By the end, you will know how specs choose resources, build dataset samples, name sample values, align custom resources, expose sample context, and hand selected paths or values to user transforms.

Download the HUTUBS resource slice

Downloading and dataset construction are separate steps. Downloading puts official files under a local root. Dataset construction scans the local files required by the selected specs and turns them into samples.

This tutorial uses the first ten HUTUBS subjects, pp1 through pp10. It requests the official HUTUBS resources needed for the examples: HRTFs, meshes, and anthropometry. The acoustic and image examples only require HRTFs.

from pathlib import Path

from hrtfpykit.datasets import HUTUBS

root = Path("datasets/hutubs")

selected_subject_ids = tuple(f"pp{index}" for index in range(1, 11))
exclude_subject_ids = tuple(f"pp{index}" for index in range(11, 97))

HUTUBS(
    root=root,
    download=True,
    download_resources="all",
    download_hrtf_variant="measured",
    dataset_hrtf_variant="measured",
    exclude_subject_ids=exclude_subject_ids,
    verify_checksum=True,
    verbose=True,
)

print("Selected HUTUBS subjects:", selected_subject_ids)

Add a custom resource folder

Experiments often use resources that a public HRTF dataset does not provide: rendered images, photographs, videos, annotations, measurements, or files created by another tool. A resource spec can point to a custom folder when the files can be matched to dataset subject IDs.

For the concrete example, this tutorial uses real PNG files from ArielAlvarez-Martinez/hrtfpykit_tutorial_resources. The repository contains hutubs_images with three subject folders: subject_1, subject_2, and subject_3. hrtfpykit maps those aliases to HUTUBS subjects pp1, pp2, and pp3.

from urllib.request import urlretrieve
from zipfile import ZipFile

tutorial_resource_root = Path("tutorial_resources").resolve()
resource_archive = tutorial_resource_root / "hrtfpykit_tutorial_resources-main.zip"
resource_project = tutorial_resource_root / "hrtfpykit_tutorial_resources-main"
image_root = resource_project / "hutubs_images"

if not image_root.is_dir():
    tutorial_resource_root.mkdir(parents=True, exist_ok=True)
    urlretrieve(
        "https://github.com/ArielAlvarez-Martinez/hrtfpykit_tutorial_resources/archive/refs/heads/main.zip",
        resource_archive,
    )
    with ZipFile(resource_archive) as archive:
        archive.extractall(tutorial_resource_root)

for subject_folder in sorted(image_root.iterdir()):
    if subject_folder.is_dir():
        image_files = sorted(path.name for path in subject_folder.glob("*.png"))
        print(subject_folder.name, image_files)

Understand what a spec controls

A spec declares one value in a dataset sample. From the selected specs, hrtfpykit knows which resource families to scan, which subjects can remain, which axes create separate dataset samples, which key to use in the sample, and which transform prepares the value.

  • HRTFSpec, ITDSpec, ILDSpec, and SHSpec require HRTF files.

  • MeshSpec requires mesh files.

  • ImageSpec and VideoSpec require media folders grouped by subject.

  • AnthropometrySpec and MetadataSpec require records that can be matched to subject IDs.

After scanning resources, hrtfpykit intersects the available subjects across all requested specs. That is why a dataset can request ten HRTF subjects and still produce three samples when a required image resource exists for only three subjects. Each sample keeps inputs, target, and meta. Specs define model values; meta records provenance such as dataset name, subject ID, position, ear, frequency, or time sample index.
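The intersection rule can be sketched in plain Python. This is not hrtfpykit's internal code: it assumes each requested spec reports the set of subject IDs its resource family can provide, and it uses the subject sets from this tutorial (ten HRTF subjects, images for three).

```python
# A minimal sketch of subject intersection across requested resources.
# The resource names and subject sets are illustrative, not an hrtfpykit API.
hrtf_subjects = {f"pp{index}" for index in range(1, 11)}  # pp1 ... pp10
image_subjects = {"pp1", "pp2", "pp3"}                     # custom image folders

available_by_resource = {
    "hrtf": hrtf_subjects,
    "image": image_subjects,
}

# A subject survives only if every requested resource can provide it.
kept = set.intersection(*available_by_resource.values())


def subject_number(subject_id: str) -> int:
    """Sort key for IDs of the form 'pp<number>'."""
    return int(subject_id[2:])


kept_subjects = sorted(kept, key=subject_number)
print(kept_subjects)       # ['pp1', 'pp2', 'pp3']
print(len(kept_subjects))  # 3 samples when index_by=("subject",)
```

Dropping incomplete subjects up front means every sample is complete, so downstream code never has to handle a missing input.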

Build the first acoustic dataset

Start with one dataset sample per subject. HRTFSpec defines which HRTF representation is returned, including the domain, signal, selected ears, selected positions or plane, the axes that split the dataset into samples, and the sample key.

The input below is a magnitude spectrum in dB for the left ear. The target is the HRIR for both ears. Because index_by=("subject",), the dataset has one sample per selected subject, and the remaining axes stay inside the returned arrays: position, ear, and frequency for the input, and position, ear, and time sample for the target.

from hrtfpykit.datasets import HRTFSpec

magnitude_spec = HRTFSpec(
    domain="frequency",
    signal="tf_magnitude_db",
    ears="left",
    index_by=("subject",),
    name="left_magnitude_db",
)

hrir_target_spec = HRTFSpec(
    domain="time",
    signal="ir",
    ears="both",
    index_by=("subject",),
    name="target_hrir",
)

subject_dataset = HUTUBS(
    root=root,
    download=False,
    dataset_hrtf_variant="measured",
    exclude_subject_ids=exclude_subject_ids,
    inputs=magnitude_spec,
    target=hrir_target_spec,
    split="all",
    verbose=True,
)

print("Samples:", len(subject_dataset))
sample = subject_dataset[0]

print("Sample keys:", sample.keys())
print("Input keys:", sample["inputs"].keys())
print("Target keys:", sample["target"].keys())
print("Meta:", sample["meta"])
print("Input shape:", sample["inputs"]["left_magnitude_db"].shape)
print("Target shape:", sample["target"]["target_hrir"].shape)

Compare the acoustic specs

Acoustic specs all start from the subject HRTF file, but they return different values. During indexing, the dataset loads the subject as an HRTF object, applies any HRTF transform configured on the dataset, applies a transform configured on the spec when present, and then extracts or computes the requested value.
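The order of operations can be illustrated with a small sketch. The function and transform names here are hypothetical stand-ins (a plain dict plays the role of the HRTF object); the point is only that the dataset transform runs first, the spec transform second, and extraction last.

```python
from typing import Callable, Optional

# Hypothetical stand-ins: the "HRTF" is a plain dict, not hrtfpykit's HRTF class.
Transform = Callable[[dict], dict]


def build_acoustic_value(
    hrtf: dict,
    extract: Callable[[dict], list],
    dataset_transform: Optional[Transform] = None,
    spec_transform: Optional[Transform] = None,
) -> list:
    """Apply the dataset transform, then the spec transform, then extract a value."""
    if dataset_transform is not None:
        hrtf = dataset_transform(hrtf)  # shared by every acoustic spec in the dataset
    if spec_transform is not None:
        hrtf = spec_transform(hrtf)     # local to a single spec
    return extract(hrtf)


def record_step(step: str) -> Transform:
    """Return a transform that appends a step name, so the call order is visible."""
    return lambda hrtf: {"steps": hrtf["steps"] + [step]}


loaded_hrtf = {"steps": ["loaded"]}
value = build_acoustic_value(
    loaded_hrtf,
    extract=lambda hrtf: hrtf["steps"] + ["extracted"],
    dataset_transform=record_step("dataset_hrtf_transform"),
    spec_transform=record_step("spec transform"),
)
print(value)  # ['loaded', 'dataset_hrtf_transform', 'spec transform', 'extracted']
```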

  • HRTFSpec returns IR, TF, magnitude, magnitude in dB, phase, real part, imaginary part, or complex TF values.

  • ITDSpec computes interaural time difference from HRIR data.

  • ILDSpec computes broadband or frequency-dependent interaural level difference.

  • SHSpec computes spherical harmonic coefficients from the selected HRTF state.
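To make the ITD and ILD values concrete, here is one common textbook definition of each, computed with NumPy on a synthetic HRIR pair: broadband ILD as the left-to-right energy ratio in dB, and ITD as the cross-correlation lag in samples. This is an assumption for illustration; ITDSpec and ILDSpec may use different sign conventions, onset detection, windowing, or filtering.

```python
import numpy as np


def broadband_ild_db(left_ir: np.ndarray, right_ir: np.ndarray) -> float:
    """Left-to-right energy ratio in dB; positive when the left ear receives more energy."""
    return float(10.0 * np.log10(np.sum(left_ir ** 2) / np.sum(right_ir ** 2)))


def itd_in_samples(left_ir: np.ndarray, right_ir: np.ndarray) -> int:
    """Cross-correlation lag in samples; positive when the right ear lags the left ear."""
    correlation = np.correlate(right_ir, left_ir, mode="full")
    # Index 0 of the "full" output corresponds to a lag of -(len(left_ir) - 1).
    return int(np.argmax(correlation) - (len(left_ir) - 1))


# Synthetic HRIR pair: the right ear arrives 4 samples later at half the amplitude.
left_ir = np.zeros(64)
left_ir[10] = 1.0
right_ir = np.zeros(64)
right_ir[14] = 0.5

print(itd_in_samples(left_ir, right_ir))              # 4
print(round(broadband_ild_db(left_ir, right_ir), 2))  # 6.02
```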

All indexed acoustic specs in one dataset must use the same index_by. That keeps every input and target aligned to the same sample context.
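The rule can be expressed as a simple consistency check. The class and function below are hypothetical stand-ins, not hrtfpykit classes; they only show what "same index_by" means when several acoustic specs share one dataset.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class IndexedSpecStandIn:
    """Hypothetical stand-in for an indexed acoustic spec; not an hrtfpykit class."""
    name: str
    index_by: Tuple[str, ...]


def shared_index_by(specs: Iterable[IndexedSpecStandIn]) -> Tuple[str, ...]:
    """Return the common index_by tuple, or raise if the specs disagree."""
    specs = list(specs)
    layouts = {spec.index_by for spec in specs}
    if len(layouts) != 1:
        details = ", ".join(f"{spec.name}={spec.index_by}" for spec in specs)
        raise ValueError(f"indexed acoustic specs must share one index_by: {details}")
    return layouts.pop()


aligned = [
    IndexedSpecStandIn("magnitude_db", ("subject",)),
    IndexedSpecStandIn("horizontal_itd", ("subject",)),
]
print(shared_index_by(aligned))  # ('subject',)

mixed = aligned + [IndexedSpecStandIn("magnitude_by_ear", ("subject", "ear"))]
try:
    shared_index_by(mixed)
except ValueError as error:
    print("Rejected:", error)
```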

from hrtfpykit.datasets import ILDSpec, ITDSpec, SHSpec

acoustic_specs = [
    HRTFSpec(
        domain="frequency",
        signal="tf_magnitude_db",
        ears="both",
        index_by=("subject",),
        name="magnitude_db",
    ),
    ITDSpec(
        index_by=("subject",),
        plane=("horizontal", 0.0, "degrees"),
        output="samples",
        name="horizontal_itd",
    ),
    ILDSpec(
        index_by=("subject",),
        mode="broad-band",
        output="db",
        name="broadband_ild",
    ),
    SHSpec(
        sh_order=3,
        ears="left",
        index_by=("subject",),
        name="left_sh",
    ),
]

acoustic_dataset = HUTUBS(
    root=root,
    download=False,
    dataset_hrtf_variant="measured",
    exclude_subject_ids=exclude_subject_ids,
    inputs=acoustic_specs,
    split="all",
)

sample = acoustic_dataset[0]
for key, value in sample["inputs"].items():
    shape = getattr(value, "shape", None)
    print(key, type(value).__name__, shape)

Transform HRTFs before extraction

Dataset transforms run after each HRTF file is loaded and before any acoustic spec extracts values. In this example the transform keeps only the horizontal plane. Every acoustic spec in the dataset then sees that reduced HRTF, so the returned magnitude contains horizontal source positions only.

Spec transforms are more local: they affect only the value declared by that spec. Use dataset_hrtf_transform when the same HRTF selection or preparation should apply to all acoustic inputs and targets. Use a spec transform when one acoustic value needs its own version of the HRTF.

HRTFTransform.select(plane="horizontal") calls HRTF.select(...) on each subject. It receives an HRTF object and returns another HRTF object. The spec still decides the domain, signal, ear, and sample layout after the transform has reduced the source grid.
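What "keeps only the horizontal plane" means for the source grid can be sketched with NumPy. This assumes the horizontal plane is the set of positions with zero elevation and a (position, ear, frequency) array layout; both the grid and the data are hypothetical, and HRTF.select may use its own coordinate conventions and tolerances.

```python
import numpy as np

# Hypothetical source grid: one (azimuth, elevation) pair in degrees per position.
source_positions = np.array(
    [
        [0.0, 0.0],
        [90.0, 0.0],
        [0.0, 45.0],
        [180.0, 0.0],
        [270.0, -30.0],
    ]
)
# Hypothetical magnitudes with layout (position, ear, frequency).
magnitude_db = np.random.default_rng(0).normal(size=(5, 2, 129))

# Keep positions whose elevation is (numerically) zero.
horizontal_mask = np.isclose(source_positions[:, 1], 0.0)
horizontal_positions = source_positions[horizontal_mask]
horizontal_magnitude = magnitude_db[horizontal_mask]

print(horizontal_positions[:, 0].tolist())  # [0.0, 90.0, 180.0]
print(horizontal_magnitude.shape)           # (3, 2, 129)
```

Every spec that runs after this step sees only the reduced position axis, which is why the dataset transform changes the shape returned by all acoustic specs.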

from hrtfpykit.datasets import HRTFTransform

dataset_transform = HRTFTransform.select(plane="horizontal")

transformed_dataset = HUTUBS(
    root=root,
    download=False,
    dataset_hrtf_variant="measured",
    dataset_hrtf_transform=dataset_transform,
    exclude_subject_ids=exclude_subject_ids,
    inputs=HRTFSpec(
        domain="frequency",
        signal="tf_magnitude_db",
        ears="left",
        index_by=("subject",),
        name="horizontal_magnitude",
    ),
    split="all",
)

print("Samples:", len(transformed_dataset))
print("Shape:", transformed_dataset[0]["inputs"]["horizontal_magnitude"].shape)

Combine HRTFs with subject resources

Some specs do not extract acoustic arrays. They describe resources that belong to the same subject as the HRTF: mesh geometry, anthropometry values, metadata records, images, or videos. When one of these specs is requested, hrtfpykit scans that resource family, keeps only subjects present in every required resource, and places the selected value in the sample returned by dataset[index].

  • MeshSpec returns a mesh file path by default. A transform can parse that file with a mesh library chosen by the user.

  • AnthropometrySpec returns measurement values for the selected subject.

  • MetadataSpec returns metadata values for the selected subject.

  • ImageSpec returns image paths or transformed image values from folders grouped by subject.

  • VideoSpec points to video folders grouped by subject, and can also use left and right folders when ear grouping is required. A transform decides how paths become frames, clips, or tensors.

The code below builds one sample per available subject. Inputs contain the mesh path and anthropometry values. The target contains the HRTF magnitude. This is the basic pattern for datasets that combine acoustic targets with other subject resources.

from hrtfpykit.datasets import AnthropometrySpec, MeshSpec

resource_dataset = HUTUBS(
    root=root,
    download=False,
    dataset_hrtf_variant="measured",
    exclude_subject_ids=exclude_subject_ids,
    inputs=[
        MeshSpec(name="mesh_path"),
        AnthropometrySpec(name="anthropometry"),
    ],
    target=HRTFSpec(
        domain="frequency",
        signal="tf_magnitude_db",
        ears="left",
        index_by=("subject",),
        name="target_magnitude",
    ),
    split="all",
)

sample = resource_dataset[0]
print("Samples:", len(resource_dataset))
print("Mesh path:", sample["inputs"]["mesh_path"])
print("Anthropometry type:", type(sample["inputs"]["anthropometry"]).__name__)
print("First anthropometry fields:", list(sample["inputs"]["anthropometry"])[:5])

Decide which axes create samples

index_by controls how indexed acoustic specs create dataset samples. Axes included in index_by are fixed for the current sample and removed from the returned array. Axes not included in index_by stay inside the returned value.

For example, index_by=("subject",) gives one sample per subject. index_by=("subject", "position") gives one sample per subject and selected source position. index_by=("subject", "ear") gives one sample per subject and selected ear. Frequency-domain HRTF specs can also index by "frequency"; time-domain HRTF specs can index by "samples".

grouped_by describes how resource specs organize external files or subject values. For images and videos, grouped_by=("subject",) means one folder per subject. grouped_by=("subject", "ear") means each subject folder has ear folders such as left and right. The distinction matters: index_by creates dataset samples from HRTF axes, while grouped_by tells hrtfpykit where the matching resource lives for each sample.
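The effect of index_by on sample count and value shape can be sketched with NumPy. The (subject, position, ear, frequency) layout, the 129 frequency bins, and the split_by helper are all hypothetical; the sketch only shows that fixed axes multiply into the number of samples and disappear from each value, while the other axes remain.

```python
import itertools

import numpy as np

# Hypothetical array for all subjects: (subject, position, ear, frequency).
axes = ("subject", "position", "ear", "frequency")
values = np.zeros((10, 3, 1, 129))  # 10 subjects, 3 positions, left ear, 129 bins


def split_by(values, axes, index_by):
    """Yield (context, value) pairs; index_by axes are fixed, the rest stay in the array."""
    fixed = [axes.index(axis) for axis in index_by]
    kept = [i for i in range(len(axes)) if i not in fixed]
    moved = np.transpose(values, fixed + kept)  # fixed axes first
    for context in itertools.product(*(range(values.shape[i]) for i in fixed)):
        yield dict(zip(index_by, context)), moved[context]


samples = list(split_by(values, axes, ("subject", "position")))
print(len(samples))         # 30
print(samples[0][0])        # {'subject': 0, 'position': 0}
print(samples[0][1].shape)  # (1, 129)
```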

position_dataset = HUTUBS(
    root=root,
    download=False,
    dataset_hrtf_variant="measured",
    exclude_subject_ids=exclude_subject_ids,
    inputs=HRTFSpec(
        domain="frequency",
        signal="tf_magnitude_db",
        ears="left",
        positions=(0, 1, 2),
        index_by=("subject", "position"),
        position_index=True,
        position_one_hot=True,
        name="left_magnitude_at_position",
    ),
    split="all",
)

print("Expected samples: 10 subjects x 3 positions =", 10 * 3)
print("Actual samples:", len(position_dataset))

sample = position_dataset[0]
print("Meta:", sample["meta"])
print("Input keys:", sample["inputs"].keys())
print("Value shape:", sample["inputs"]["left_magnitude_at_position"].shape)
print("Position index:", sample["inputs"]["position_index"])
print("Position one hot shape:", sample["inputs"]["position_one_hot"].shape)

Match custom folders to subjects

Custom resources often live outside the official HRTF dataset: photographed ears, rendered mesh views, pinna crops, camera frames, quality check images, video captures, or annotations created for one experiment. The resource only needs a layout that hrtfpykit can align to dataset subjects.

The concrete example here uses ImageSpec, but the lesson is broader: the dataset owns subject matching and sample construction, while the custom resource owns its own storage format. For media grouped by subject, the root contains one folder per subject. hrtfpykit accepts the canonical dataset subject ID or a subject number alias:

hutubs_images/
   pp1/
      image_001.png
   pp2/
      image_001.png

or:

hutubs_images/
   subject1/
      image_001.png
   subject_2/
      image_001.png
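Alias matching can be pictured as a small normalization step from folder name to canonical ID. The sketch below assumes the three forms shown in this tutorial (pp<n>, subject<n>, subject_<n>); the pattern and function name are hypothetical, and hrtfpykit may accept other forms or reject some of these.

```python
import re
from typing import Optional

# Folder-name forms accepted in this sketch (an assumption based on the examples above).
ALIAS_PATTERN = re.compile(r"^(?:pp|subject_?)(\d+)$", re.IGNORECASE)


def canonical_hutubs_id(folder_name: str) -> Optional[str]:
    """Map a folder name to a HUTUBS-style subject ID, or None if it does not match."""
    match = ALIAS_PATTERN.match(folder_name)
    if match is None:
        return None
    return f"pp{int(match.group(1))}"


for name in ["pp1", "subject1", "subject_2", "subject_3", "notes"]:
    print(name, "->", canonical_hutubs_id(name))
```

Folders that do not match any form, such as notes, are simply not aligned to a subject.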

For ear grouping, the subject folder contains ear folders. That layout requires grouped_by=("subject", "ear") and samples indexed by ear:

hutubs_images/
   pp1/
      left/
         image_001.png
      right/
         image_001.png

The tutorial resource uses subject_1, subject_2, and subject_3, with three PNG images in each folder. The next dataset requests ten HUTUBS HRTF subjects but image resources for only three of them. The result is three samples, not ten samples with missing images.

from hrtfpykit.datasets import ImageSpec

image_dataset = HUTUBS(
    root=root,
    download=False,
    dataset_hrtf_variant="measured",
    exclude_subject_ids=exclude_subject_ids,
    inputs=ImageSpec(
        path=image_root,
        grouped_by="subject",
        name="ear_images",
    ),
    target=HRTFSpec(
        domain="frequency",
        signal="tf_magnitude_db",
        ears="left",
        index_by=("subject",),
        name="target_magnitude",
    ),
    split="all",
    verbose=True,
)

print("HRTF subjects requested:", len(selected_subject_ids))
print("Subjects with images:", len(image_dataset))

for index in range(len(image_dataset)):
    sample = image_dataset[index]
    image_value = sample["inputs"]["ear_images"]
    print(sample["meta"]["subject_id"], len(image_value), "image paths")

Convert aligned paths into model values

The previous section returned image paths. That default is intentional. hrtfpykit does not choose an image loader, color mode, resize operation, normalization, augmentation policy, tensor library, mesh parser, video decoder, or model input format for every project.

Instead, hrtfpykit resolves subject and resource alignment, then passes each selected path or resource value to the spec's transform. The transform can use PIL, torchvision, OpenCV, scikit-image, a mesh library, a video reader, a custom renderer, or any other tool. The returned object is the value placed in the sample.

When a subject has several image files, the transform is applied to each path. If concatenate=False, the sample returns a list of transformed values. If concatenate=True, hrtfpykit concatenates the transformed values along axis 0, so the transform must return array-like values with compatible shapes.
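Concatenating along axis 0 joins arrays along their existing first axis; it does not add a new one. The sketch below assumes concatenate=True behaves like np.concatenate(..., axis=0) and uses hypothetical (height, width, channels) images like those returned by load_image.

```python
import numpy as np

# Three hypothetical transformed images, each (height, width, channels).
transformed = [np.full((128, 128, 3), fill_value=i, dtype=np.float32) for i in range(3)]

as_list = transformed                           # like concatenate=False
as_array = np.concatenate(transformed, axis=0)  # like concatenate=True

print(len(as_list), as_list[0].shape)  # 3 (128, 128, 3)
print(as_array.shape)                  # (384, 128, 3): images joined along height

# Returning (1, H, W, C) from the transform gives each image its own leading entry.
batched = np.concatenate([image[np.newaxis] for image in transformed], axis=0)
print(batched.shape)                   # (3, 128, 128, 3)
```

Under the same assumption, channel-first tensors of shape (3, 128, 128) would be joined along the channel axis, so a transform that wants a stacked batch should return a leading axis of size one.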

import numpy as np
from PIL import Image

def load_image(path: str) -> np.ndarray:
    with Image.open(path) as image:
        image = image.convert("RGB").resize((128, 128))
        return np.asarray(image, dtype=np.float32) / 255.0

transformed_image_dataset = HUTUBS(
    root=root,
    download=False,
    dataset_hrtf_variant="measured",
    exclude_subject_ids=exclude_subject_ids,
    inputs=ImageSpec(
        path=image_root,
        grouped_by="subject",
        transform=load_image,
        concatenate=False,
        name="images",
    ),
    target=HRTFSpec(
        domain="frequency",
        signal="tf_magnitude_db",
        ears="left",
        index_by=("subject",),
        name="target_magnitude",
    ),
    split="all",
)

sample = transformed_image_dataset[0]
images = sample["inputs"]["images"]
print("Subject:", sample["meta"]["subject_id"])
print("Number of transformed images:", len(images))
print("First image type:", type(images[0]).__name__)
print("First image shape:", images[0].shape)
print("Target shape:", sample["target"]["target_magnitude"].shape)

Reuse external preprocessing pipelines

The same transform boundary can connect hrtfpykit to an existing preprocessing stack. hrtfpykit still handles HUTUBS subject IDs, folder aliases, resource filtering, and sample construction. The external pipeline handles image resizing, tensor conversion, normalization, augmentation, or any other preparation step the experiment needs.

The cell below first visualizes the real images selected by ImageSpec. Then it uses torchvision as one possible external pipeline and shows the tensor output converted back to RGB values only for display. The torchvision part is optional because torchvision is not required by hrtfpykit itself.

import matplotlib.pyplot as plt
import numpy as np
from PIL import Image

raw_sample = image_dataset[0]
raw_image_values = raw_sample["inputs"]["ear_images"]
if not isinstance(raw_image_values, list):
    raw_image_values = [raw_image_values]

preview_count = min(3, len(raw_image_values))
fig, axes = plt.subplots(1, preview_count, figsize=(3.2 * preview_count, 3.2))
if preview_count == 1:
    axes = [axes]

for image_index, (axis, image_path) in enumerate(
    zip(axes, raw_image_values[:preview_count]), start=1
):
    with Image.open(image_path) as image:
        axis.imshow(image.convert("RGB"))
    axis.set_title(f"{raw_sample['meta']['subject_id']} image {image_index}")
    axis.axis("off")

fig.tight_layout()
plt.show()

try:
    from torchvision import transforms
except ImportError:
    print("Install torchvision to run the tensor pipeline.")
else:
    torchvision_pipeline = transforms.Compose(
        [
            transforms.Resize((128, 128)),
            transforms.ToTensor(),
            transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)),
        ]
    )

    def load_image_tensor(path: str):
        with Image.open(path) as image:
            return torchvision_pipeline(image.convert("RGB"))

    tensor_dataset = HUTUBS(
        root=root,
        download=False,
        dataset_hrtf_variant="measured",
        exclude_subject_ids=exclude_subject_ids,
        inputs=ImageSpec(
            path=image_root,
            grouped_by="subject",
            transform=load_image_tensor,
            name="image_tensors",
        ),
        target=HRTFSpec(
            domain="frequency",
            signal="tf_magnitude_db",
            ears="left",
            index_by=("subject",),
            name="target_magnitude",
        ),
        split="all",
    )

    tensor_sample = tensor_dataset[0]
    tensor_images = tensor_sample["inputs"]["image_tensors"]
    if not isinstance(tensor_images, list):
        tensor_images = [tensor_images]

    print("Number of tensors:", len(tensor_images))
    print("First tensor shape:", tensor_images[0].shape)

    preview_count = min(3, len(tensor_images))
    fig, axes = plt.subplots(1, preview_count, figsize=(3.2 * preview_count, 3.2))
    if preview_count == 1:
        axes = [axes]

    for image_index, (axis, tensor) in enumerate(
        zip(axes, tensor_images[:preview_count]), start=1
    ):
        preview = tensor.detach().cpu().permute(1, 2, 0).numpy()
        preview = np.clip(preview * 0.5 + 0.5, 0.0, 1.0)
        axis.imshow(preview)
        axis.set_title(f"tensor {image_index}")
        axis.axis("off")

    fig.tight_layout()
    plt.show()

Add sample context indexes and one hot encodings

Sample context encodings are extra inputs created from the current sample. They are useful when the model receives a value selected at one position, ear, frequency, or time sample and also needs to know which context produced that value.

  • position_index and position_one_hot require index_by to include "position".

  • ear_index and ear_one_hot require either index_by or grouped_by to include "ear".

  • frequency_index and frequency_one_hot require index_by to include "frequency".

  • sample_index and sample_one_hot require index_by to include "samples".

These encodings are placed in sample["inputs"], not in sample["meta"], because they are intended to be model inputs when requested. The metadata still records the same sample context for provenance.
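An index and its one-hot vector encode the same context in two forms. The helper below is a hypothetical sketch: it assumes the encoding follows the order of the selected values (here left before right) and uses float32, which may differ from hrtfpykit's actual ordering and dtype.

```python
import numpy as np


def context_encoding(value, selected_values):
    """Return (index, one_hot) for a context value among the selected values."""
    index = list(selected_values).index(value)
    one_hot = np.zeros(len(selected_values), dtype=np.float32)
    one_hot[index] = 1.0
    return index, one_hot


ears = ("left", "right")  # assumed ear order
for ear in ears:
    index, one_hot = context_encoding(ear, ears)
    print(ear, index, one_hot.tolist())
# left 0 [1.0, 0.0]
# right 1 [0.0, 1.0]
```

The index is compact and works well as an embedding lookup, while the one-hot vector can be concatenated directly with other dense features.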

ear_dataset = HUTUBS(
    root=root,
    download=False,
    dataset_hrtf_variant="measured",
    exclude_subject_ids=exclude_subject_ids,
    inputs=HRTFSpec(
        domain="frequency",
        signal="tf_magnitude_db",
        ears="both",
        index_by=("subject", "ear"),
        ear_index=True,
        ear_one_hot=True,
        name="magnitude_by_ear",
    ),
    split="all",
)

sample = ear_dataset[0]
print("Expected samples: 10 subjects x 2 ears =", 10 * 2)
print("Actual samples:", len(ear_dataset))
print("Meta:", sample["meta"])
print("Input keys:", sample["inputs"].keys())
print("Ear index:", sample["inputs"]["ear_index"])
print("Ear one hot:", sample["inputs"]["ear_one_hot"])

frequency_dataset = HUTUBS(
    root=root,
    download=False,
    dataset_hrtf_variant="measured",
    exclude_subject_ids=exclude_subject_ids,
    inputs=HRTFSpec(
        domain="frequency",
        signal="tf_magnitude_db",
        positions=(0,),
        ears="left",
        index_by=("subject", "frequency"),
        frequency_index=True,
        frequency_one_hot=True,
        name="magnitude_at_frequency",
    ),
    split="all",
)

sample = frequency_dataset[0]
print("Samples:", len(frequency_dataset))
print("Meta:", sample["meta"])
print("Input keys:", sample["inputs"].keys())
print("Value shape:", sample["inputs"]["magnitude_at_frequency"].shape)
print("Frequency index:", sample["inputs"]["frequency_index"])
print("Frequency one hot shape:", sample["inputs"]["frequency_one_hot"].shape)

sample_index_dataset = HUTUBS(
    root=root,
    download=False,
    dataset_hrtf_variant="measured",
    exclude_subject_ids=exclude_subject_ids,
    inputs=HRTFSpec(
        domain="time",
        signal="ir",
        positions=(0,),
        ears="left",
        index_by=("subject", "samples"),
        sample_index=True,
        sample_one_hot=True,
        name="hrir_sample",
    ),
    split="all",
)

sample = sample_index_dataset[0]
print("Samples:", len(sample_index_dataset))
print("Meta:", sample["meta"])
print("Input keys:", sample["inputs"].keys())
print("Value shape:", sample["inputs"]["hrir_sample"].shape)
print("Sample index:", sample["inputs"]["sample_index"])
print("Sample one hot shape:", sample["inputs"]["sample_one_hot"].shape)

Spec design checklist

  • Put every model value behind a spec. This keeps resource discovery, subject matching, split planning, and sample extraction reproducible.

  • Use name whenever a dataset has more than one input or target value from the same spec class.

  • Keep all indexed acoustic specs on the same index_by tuple inside one dataset.

  • Use grouped_by=("subject", "ear") only when the resource folders or subject values really have ear groups.

  • Expect specs to affect subject availability. If one requested resource exists for only three subjects, the dataset should produce samples only for those three subjects.

  • Return paths when downstream code should decide how to load the resource. Add transform when the dataset should return arrays, tensors, parsed meshes, decoded clips, normalized values, or custom objects.

The main design point is separation of responsibilities. hrtfpykit resolves HRTF files, subject IDs, resource alignment, sample context, splits, and provenance. Specs declare what a sample needs. Transforms let user code decide how those aligned resources become values for an experiment.