hrtfpykit.datasets

Description:

hrtfpykit.datasets builds map-style datasets from public HRTF collections and aligned custom resources. It turns dataset-specific file layouts into objects that support len(dataset) and dataset[index], so the same dataset can be inspected directly, preprocessed in scripts, or passed to PyTorch-style data loaders.
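
As a rough illustration of the map-style contract, the toy class below shows the two methods that map-style consumers such as PyTorch data loaders rely on. ToyMapDataset and its rows are hypothetical and are not part of hrtfpykit; the sketch only assumes that a map-style object answers len(dataset) and dataset[index].

```python
class ToyMapDataset:
    """Hypothetical stand-in for a map-style dataset (not an hrtfpykit class)."""

    def __init__(self, rows):
        # Materialize the rows so they can be counted and indexed.
        self._rows = list(rows)

    def __len__(self):
        # Number of indexable rows.
        return len(self._rows)

    def __getitem__(self, index):
        # Return one row by integer position.
        return self._rows[index]


toy = ToyMapDataset([{"subject": "pp2"}, {"subject": "pp3"}])
first_row = toy[0]
```

Any object with this shape can be inspected by hand, iterated in a preprocessing script, or wrapped by a loader that only needs length and integer indexing.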

Dataset integrations such as ARI, HUTUBS, and SONICOM keep dataset-specific rules outside user code. They handle subject identifiers, folder layouts, downloadable resource groups, resource variants where the dataset defines them, excluded subjects, deterministic splits, and resource summaries.
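
To make the idea of excluded subjects and deterministic splits concrete, here is a minimal sketch of one common approach: hashing each subject ID with a fixed salt. This is an assumption for illustration, not necessarily how hrtfpykit assigns splits; assign_split, the salt, the subject IDs, and the excluded set are all hypothetical.

```python
import hashlib


def assign_split(subject_id, val_fraction=0.2, salt="split-v1"):
    """Map a subject ID to 'train' or 'val' using a stable hash (hypothetical helper)."""
    digest = hashlib.sha256(f"{salt}:{subject_id}".encode("utf-8")).digest()
    # Turn the first 8 bytes into a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "val" if bucket < val_fraction else "train"


subjects = [f"pp{n}" for n in range(1, 21)]  # placeholder subject IDs
excluded = {"pp7"}                           # placeholder exclusion list
splits = {s: assign_split(s) for s in subjects if s not in excluded}
```

Because the assignment depends only on the subject ID and the salt, the same subject lands in the same split on every run and on every machine, regardless of the order in which subjects are listed.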

Samples are declared with specs passed through inputs and target. Each spec names one value to return and the context used to index it. Acoustic specs such as HRTFSpec, ITDSpec, ILDSpec, and SHSpec request HRTF-derived arrays, cues, or spherical-harmonic values. Resource specs such as MeshSpec, AnthropometrySpec, MetadataSpec, ImageSpec, and VideoSpec align non-acoustic subject resources with the same sample rows.

Resource specs can use official resources when a dataset provides them, such as SONICOM meshes or metadata, but they are not limited to those files. They can also point to custom resources prepared for an experiment. AnthropometrySpec and MetadataSpec can read custom tables aligned with the dataset subject IDs. MeshSpec can use a custom mesh root while preserving the mesh naming pattern declared by the dataset configuration. ImageSpec and VideoSpec scan explicit media roots organized by subject, which makes them useful for visual pipelines such as ear images rendered from meshes or collected by another acquisition process.

For image and video resources, the media root must contain one folder per dataset subject. Subject folders can use the canonical dataset subject ID, or the aliases subjectN and subject_N based on the dataset subject number. When the resource is grouped by ("subject", "ear"), each subject folder must contain ear folders such as left and right:

ear_images/
   pp2/
      left/
         image_001.png
      right/
         image_001.png
   subject3/
      left/
         image_001.png
      right/
         image_001.png
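
The layout above can be checked with a few lines of pathlib. The sketch below is not hrtfpykit's scanner; candidate_folder_names and find_ear_files are hypothetical helpers, and both the lookup order (canonical ID first, then the aliases) and the pairing of subject ID pp3 with subject number 3 are assumptions made for the example.

```python
from pathlib import Path
import tempfile

EARS = ("left", "right")


def candidate_folder_names(subject_id, subject_number):
    """Folder names accepted for one subject: canonical ID, then the two aliases."""
    return [subject_id, f"subject{subject_number}", f"subject_{subject_number}"]


def find_ear_files(media_root, subject_id, subject_number):
    """Return {ear: sorted file paths} for one subject, or raise if the layout is incomplete."""
    root = Path(media_root)
    for name in candidate_folder_names(subject_id, subject_number):
        subject_dir = root / name
        if subject_dir.is_dir():
            break
    else:
        raise FileNotFoundError(f"no folder for subject {subject_id!r} under {root}")
    files = {}
    for ear in EARS:
        ear_dir = subject_dir / ear
        if not ear_dir.is_dir():
            raise FileNotFoundError(f"missing {ear!r} folder in {subject_dir}")
        files[ear] = sorted(p for p in ear_dir.iterdir() if p.is_file())
    return files


# Build the example tree in a temporary directory and resolve both subjects.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "ear_images"
    for folder in ("pp2", "subject3"):
        for ear in EARS:
            ear_dir = root / folder / ear
            ear_dir.mkdir(parents=True)
            (ear_dir / "image_001.png").touch()
    by_id = find_ear_files(root, "pp2", 2)     # found through the canonical ID
    by_alias = find_ear_files(root, "pp3", 3)  # found through the subject3 alias
    resolved_names = {
        "pp2": [p.name for p in by_id["left"]],
        "pp3": [p.name for p in by_alias["right"]],
    }
```

Raising on a missing subject or ear folder surfaces layout problems before training starts, rather than partway through an epoch.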

Transforms on resource specs define how selected paths or table values become the values returned by dataset[index]. Without a transform, mesh, image, and video specs return organized paths, while anthropometry and metadata specs return the selected table values. With a transform, hrtfpykit still handles subject alignment, resource inspection, split selection, and sample construction; the transform defines how the selected resource is loaded and prepared.
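
A transform in this sense is just a callable applied to the selected resource. The sketch below standardizes anthropometry-like values; it assumes the transform receives a mapping from column name to number, which is an assumption about the input shape rather than a documented hrtfpykit signature. The column names and statistics are invented for illustration.

```python
def make_scaling_transform(means, scales):
    """Build a callable that standardizes selected table values column by column."""

    def transform(values):
        # Fixed column order so every sample yields the same feature layout.
        return [(values[name] - means[name]) / scales[name] for name in sorted(means)]

    return transform


# Hypothetical column names and statistics.
scale = make_scaling_transform(
    means={"ear_height_mm": 60.0, "head_width_mm": 150.0},
    scales={"ear_height_mm": 5.0, "head_width_mm": 10.0},
)
features = scale({"head_width_mm": 160.0, "ear_height_mm": 55.0})
# features is [-1.0, 1.0]: ear_height_mm first, then head_width_mm
```

Keeping the statistics inside the closure means they are fixed once, for example from the training split, and reused unchanged for validation and test rows.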

For example, ImageSpec can locate custom ear render images for each subject, and a transform can then open those images with PIL and apply a torchvision Compose pipeline that resizes, normalizes, and augments them before returning the array or tensor expected by a model. In an HRTF individualization workflow, those image values can be used as inputs while HRTFSpec or SHSpec provides the acoustic target. The same pattern applies to videos, meshes, anthropometry, and metadata: hrtfpykit resolves the aligned resource, and the transform turns that resource into the sample value.

When dataset[index] is called, the dataset returns a dictionary with sample["inputs"], sample["target"], and sample["meta"] entries. inputs and target contain the values requested by the selected specs. meta keeps row provenance separate from model values: it records the dataset name, native subject ID, and the active row context such as position, ear, frequency, or sample index when those axes are part of index_by. This is especially useful when datasets are combined with PyTorch ConcatDataset or when one subject expands into many indexed rows.

Acoustic values are resolved through hrtfpykit.hrtf: subject SOFA files are loaded as HRTF objects, optional dataset-level transforms are applied, and specs extract the requested values from the loaded object. As a result, dataset samples use the same HRTF loading and transformation path as the rest of hrtfpykit.
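
The sample layout described above can be mimicked with plain dictionaries to show why keeping meta separate helps once rows from several datasets are mixed. The three top-level keys come from the description; the keys inside meta ("dataset", "subject", "ear"), the subject IDs, and the numeric values are placeholders chosen for this sketch, and chain stands in for a ConcatDataset-style combination.

```python
from collections import Counter
from itertools import chain


def make_sample(dataset, subject, ear, inputs, target):
    """Build a toy sample with the inputs / target / meta layout (hypothetical helper)."""
    return {
        "inputs": inputs,
        "target": target,
        "meta": {"dataset": dataset, "subject": subject, "ear": ear},
    }


# One subject expands into one row per ear; values are placeholders.
ari_like = [make_sample("ARI", "pp2", ear, [0.1], [0.2]) for ear in ("left", "right")]
sonicom_like = [make_sample("SONICOM", "P0001", "left", [0.3], [0.4])]

# Concatenate the rows, as a ConcatDataset-style wrapper would.
combined = list(chain(ari_like, sonicom_like))

# Provenance stays available per row without touching the model values.
rows_per_dataset = Counter(sample["meta"]["dataset"] for sample in combined)
model_values = [(sample["inputs"], sample["target"]) for sample in combined]
```

After concatenation the integer index alone no longer identifies the source dataset or subject, so per-row meta is what allows errors or metrics to be traced back to a specific subject and ear.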

This design is useful for HRTF individualization and related research tasks, where acoustic targets often need to be paired with subject information. By combining custom or official image, video, mesh, anthropometry, metadata, and acoustic specs in one dataset definition, users can build reproducible multimodal pipelines for deep learning experiments without maintaining separate subject-matching code for each resource family.

Content: