SONICOM
SONICOM is a public head-related transfer function dataset created within the
SONICOM project for spatial audio and immersive audio research. The public
dataset page describes measured HRTFs together with related subject resources
such as 3D models and depth images. In hrtfpykit, the
SONICOM class maps the SONICOM folder layout into the
package’s shared dataset interface.
The dataset paper was published by Isaac Engel, Rapolas Daugintis, Thibault Vicente, Aidan O. T. Hogg, Johan Pauwels, Arnaud J. Tournier, and Lorenzo Picinali in the Journal of the Audio Engineering Society. The official SONICOM pages and publications should be read as release snapshots: the original public dataset page describes HRTF data measured from 200 subjects, and the 2025 extended dataset announcement describes additional measured participants, synthetic HRTFs generated from processed 3D scans, and continued work to expand the dataset.
Implementation status.
Last updated: 2026-05-29. SONICOM is an actively developing dataset, and
new subjects or resources can appear after a hrtfpykit release. This
implementation supports the released resources indexed by subject identifiers
P0001 through P0405. To use newer SONICOM releases, hrtfpykit must first
be updated with the corresponding subject identifiers, resource paths, and
checksums.
Dataset scope.
hrtfpykit is configured for SONICOM subject identifiers P0001 through
P0405. The built-in configuration excludes P0253, P0258,
P0270, P0272, P0275, and P0396 before resource scanning and
split planning. Actual subject availability depends on the resource groups and
variants present under the local dataset root.
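The configured subject range and built-in exclusions described above can be enumerated with a short sketch. The helper name `configured_subject_ids` and the `extra_excluded` argument are hypothetical and only illustrate the arithmetic; they are not part of hrtfpykit.

```python
# Subject range and built-in exclusions as stated in the text above.
FIRST_SUBJECT, LAST_SUBJECT = 1, 405
BUILT_IN_EXCLUDED = {"P0253", "P0258", "P0270", "P0272", "P0275", "P0396"}


def configured_subject_ids(extra_excluded=()):
    """Return candidate subject IDs after removing excluded ones (hypothetical helper)."""
    excluded = BUILT_IN_EXCLUDED | set(extra_excluded)
    all_ids = [f"P{n:04d}" for n in range(FIRST_SUBJECT, LAST_SUBJECT + 1)]
    return [sid for sid in all_ids if sid not in excluded]


candidates = configured_subject_ids()
print(len(candidates))  # 405 identifiers minus 6 exclusions -> 399
```

These are only candidates: as noted above, the subjects that actually end up in a dataset still depend on which resources exist under the local root.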
The SONICOM resources used by hrtfpykit are:
HRTF/HRIR SOFA files for acoustic data.
3D scan or synthetic mesh resources.
The official metadata table.
SONICOM HRIRs are released at 96 kHz and 24 bits, with lower-rate 44.1 kHz and 48 kHz versions also available for measured HRTFs. hrtfpykit loads these SOFA files through the same HRTF workflow used by the rest of the package.
Variants and layout.
SONICOM provides measured HRTF variants with both sample-rate and processing
version selectors. hrtfpykit supports measured HRTF sample rates 44100,
48000, and 96000 with these versions:
Raw
Raw_NoITD
Windowed
Windowed_NoITD
FreeFieldComp
FreeFieldComp_NoITD
FreeFieldCompMinPhase
FreeFieldCompMinPhase_NoITD
Measured HRTF files are expected under
{subject_id}/HRTF/HRTF/{sample_rate_label}/ with names of the form
{subject_id}_{version}_{sample_rate_label}.sofa. The default HRTF
selection in hrtfpykit is type=measured, sample_rate=44100,
version=FreeFieldComp.
Synthetic HRTFs use the synthetic type, the generic version, and sample
rates 44100 or 48000. They are expected under
{subject_id}/SYNTHETIC_HRTF/ as HRIR_SONICOM_{sample_rate}.sofa.
SONICOM mesh resources are selected independently from HRTF resources. Scanned
meshes support raw, point_cloud, and watertight versions. Synthetic
meshes support preprocessed, plugged, graded_left, and
graded_right versions. The default mesh selection in hrtfpykit is
type=scanned, version=watertight.
The metadata table is expected at metadata_and_readme/metadata.csv.
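The layout rules in this section can be written down as a few path helpers. This is a minimal sketch using only `pathlib`; the function names are hypothetical, and `sample_rate_label` is passed through unchanged because the text does not spell out how the label is formatted.

```python
from pathlib import Path


def measured_hrtf_path(root, subject_id, version, sample_rate_label):
    """Expected location of a measured HRTF SOFA file (hypothetical helper)."""
    # The label format is not specified in the text, so it is used as given.
    name = f"{subject_id}_{version}_{sample_rate_label}.sofa"
    return Path(root) / subject_id / "HRTF" / "HRTF" / sample_rate_label / name


def synthetic_hrtf_path(root, subject_id, sample_rate):
    """Expected location of a synthetic HRTF SOFA file (hypothetical helper)."""
    return Path(root) / subject_id / "SYNTHETIC_HRTF" / f"HRIR_SONICOM_{sample_rate}.sofa"


def metadata_path(root):
    """Expected location of the metadata table (hypothetical helper)."""
    return Path(root) / "metadata_and_readme" / "metadata.csv"


# "LABEL" is a placeholder, not a real SONICOM sample-rate label.
measured = measured_hrtf_path("datasets/sonicom", "P0001", "FreeFieldComp", "LABEL")
synthetic = synthetic_hrtf_path("datasets/sonicom", "P0001", 48000)
```

Helpers like these can be handy for checking a local dataset root by hand before constructing the dataset.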
Downloads.
The built-in downloader uses the SONICOM transfer URL and supports the
metadata, hrtf, and mesh resource groups. Set download=True to
download resources before dataset construction, and use download_resources
to choose which resource groups to fetch.
download_hrtf_variant and download_mesh_variant control which variants
are downloaded. dataset_hrtf_variant and dataset_mesh_variant control
which local variants are scanned and used for samples. Keeping download
selection separate from dataset construction makes the selected local resources
explicit.
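Because download selection and dataset selection are separate, a variant other than the default has to be named in both places. The sketch below only assembles the keyword arguments so that it runs without hrtfpykit installed; the root path and the chosen variant are illustrative, and the commented call shows where they would be passed.

```python
# One variant reused for both the download step and dataset construction.
hrtf_variant = {"type": "measured", "sample_rate": 48000, "version": "Windowed"}

sonicom_kwargs = {
    "root": "datasets/sonicom",                   # illustrative local root
    "download": True,                             # fetch before construction
    "download_resources": ("metadata", "hrtf"),   # resource groups to fetch
    "download_hrtf_variant": dict(hrtf_variant),  # what gets downloaded
    "dataset_hrtf_variant": dict(hrtf_variant),   # what gets scanned and loaded
}

# With hrtfpykit installed, the dataset would then be built roughly like this:
# from hrtfpykit.datasets import SONICOM
# dataset = SONICOM(**sonicom_kwargs)
```

If only `dataset_hrtf_variant` were changed, the download step would still request the default download variant, and the scan might then find no matching local files.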
References.
- class hrtfpykit.datasets.SONICOM(root, dataset_hrtf_variant={'sample_rate': 44100, 'type': 'measured', 'version': 'FreeFieldComp'}, dataset_mesh_variant={'type': 'scanned', 'version': 'watertight'}, dataset_hrtf_transform=None, download=False, download_resources='hrtf', download_hrtf_variant={'sample_rate': 44100, 'type': 'measured', 'version': 'FreeFieldComp'}, download_mesh_variant={'type': 'scanned', 'version': 'watertight'}, verify_checksum=True, exclude_subject_ids=None, inputs=None, target=None, split='all', split_ratio=(0.8, 0.1, 0.1), split_seed=0, verbose=False)
Dataset interface for local or downloadable SONICOM resources.
SONICOM turns SONICOM HRTF, mesh, and metadata layouts into the shared BaseDataset API. It resolves measured and synthetic HRTF variants, scanned or synthetic mesh variants, subject metadata, subject exclusions, and split selection before exposing samples through the shared integer-indexed dataset interface.
Samples are driven by input and target specs. Acoustic specs load a subject HRTF with load_hrtf(). If dataset_hrtf_transform is provided, it is applied to that loaded HRTF first. Acoustic specs then operate on the dataset-level HRTF version, optionally apply their own HRTF transform, and finally extract time-domain values, frequency-domain values, ITD, ILD, or spherical-harmonic coefficients. Resource specs can add mesh and metadata values to the same sample. Subjects missing any required resource family are removed before row construction.
Download selection is independent from dataset construction selection. download_resources, download_hrtf_variant, and download_mesh_variant control which official files are downloaded. dataset_hrtf_variant and dataset_mesh_variant control which local files are scanned and loaded after the download step. The dataset does not infer download resources from inputs or target and does not copy dataset variants into download variants.
- Parameters:
root (str or Path) – Local SONICOM dataset root.
dataset_hrtf_variant (dict or str) – SONICOM HRTF variant used for dataset construction. Full SONICOM HRTF variants use type, sample_rate, and version keys.
dataset_mesh_variant (dict or str) – SONICOM mesh variant used for dataset construction. Full SONICOM mesh variants use type and version keys.
dataset_hrtf_transform (callable or None, default=None) – Optional transform applied to every loaded HRTF before any acoustic spec is evaluated. Spec-level HRTF transforms are applied after this dataset-level transform and before value extraction or derived cue calculation.
download (bool, default=False) – If True, downloads selected official SONICOM resources before dataset construction.
download_resources (str or sequence of str, default='hrtf') – Official resources requested for download. This value is not inferred from inputs or target.
download_hrtf_variant (dict, str, or None) – HRTF variant values requested for download. This value is independent from dataset_hrtf_variant.
download_mesh_variant (dict, str, or None) – Mesh variant values requested for download. This value is independent from dataset_mesh_variant.
verify_checksum (bool, default=True) – Whether official SHA-256 checksums are verified during resource download. Keeping this enabled is the recommended behavior. Set it to False only when you intentionally want to skip checksum verification; file existence, non-empty checks, and archive integrity checks still run.
exclude_subject_ids (str, int, sequence, or None, default=None) – SONICOM subjects excluded before scanning and splitting.
inputs (spec, sequence of specs, or None, default=None) – Specs exposed under sample inputs.
target (spec, sequence of specs, or None, default=None) – Specs exposed under sample targets.
split ({'all', 'train', 'validation', 'test'}, default='all') – Subject split used by this dataset instance.
split_ratio (tuple of float, default=(0.8, 0.1, 0.1)) – Train, validation, and test split ratios.
split_seed (int, default=0) – Random seed used for deterministic split assignment.
verbose (bool, default=False) – If True, prints resource and dataset summaries. Download summaries print whenever files are downloaded.
- Returns:
Dataset object supporting indexed sample extraction and subject HRTF loading.
- Return type:
Examples
Build a training split from measured 44.1 kHz FreeFieldComp HRTFs, scanned watertight meshes, and the SONICOM metadata table:
>>> from hrtfpykit.datasets import HRTFSpec, MeshSpec, MetadataSpec, SONICOM
>>> dataset = SONICOM(
...     root="datasets/sonicom",
...     dataset_hrtf_variant={
...         "type": "measured",
...         "sample_rate": 44100,
...         "version": "FreeFieldComp",
...     },
...     dataset_mesh_variant={
...         "type": "scanned",
...         "version": "watertight",
...     },
...     inputs=[
...         HRTFSpec(
...             domain="frequency",
...             signal="tf_magnitude_db",
...             index_by=("subject", "position", "ear"),
...             ears="both",
...             position_index=True,
...             ear_index=True,
...             name="magnitude_db",
...         ),
...         MeshSpec(name="head_mesh"),
...         MetadataSpec(name="subject_metadata"),
...     ],
...     split="train",
...     split_ratio=(0.8, 0.1, 0.1),
...     split_seed=42,
... )
>>> sample = dataset[0]
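The transform order described for acoustic specs (dataset-level transform, then spec-level transform, then value extraction) can be pictured as plain function composition. The sketch below is not hrtfpykit code: `extract_value` is a hypothetical helper, and a list of numbers stands in for an HRTF object.

```python
def extract_value(hrtf, dataset_transform=None, spec_transform=None, extractor=lambda h: h):
    """Apply transforms in the documented order, then extract a value (illustrative only)."""
    if dataset_transform is not None:
        hrtf = dataset_transform(hrtf)  # dataset-level transform runs first
    if spec_transform is not None:
        hrtf = spec_transform(hrtf)     # spec-level transform runs second
    return extractor(hrtf)              # value extraction runs last


# Stand-in "HRTF": a plain list of numbers, purely for illustration.
calls = []
result = extract_value(
    [1.0, 2.0],
    dataset_transform=lambda h: calls.append("dataset") or [x * 2 for x in h],
    spec_transform=lambda h: calls.append("spec") or [x + 1 for x in h],
    extractor=sum,
)
```

Here the stand-in is doubled to `[2.0, 4.0]`, shifted to `[3.0, 5.0]`, and then summed, which makes the ordering visible in the result.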
- __getitem__(index)
Return one sample by integer row index.
This method resolves the row context, dispatches each input and target spec through the value selector layer, and adds requested context encodings. It is the runtime path that turns dataset state into sample dictionaries for training, evaluation, or direct inspection.
Returned samples always contain inputs, target, and meta keys. inputs is None when no input specs and no context encodings were requested. target is None when no target specs were requested. meta contains dataset and row-provenance fields. When context encodings are requested by specs, keys such as position_one_hot, position_index, ear_one_hot, frequency_index, or sample_index are added to sample inputs for rows that carry the corresponding context.
- Parameters:
index (int) – Dataset row index. Negative integers follow the underlying row-list behavior. Non-integer indices are rejected.
- Returns:
Sample dictionary with inputs, target, and meta entries.
- Return type:
dict[str, object]
- Raises:
TypeError – If index is not an integer.
IndexError – If index is outside the constructed row table.
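Since every sample carries the same three top-level keys, a small inspection helper can make the None cases explicit. This is a sketch against a hand-built dictionary; `describe_sample` is hypothetical, `meta` is assumed to behave like a mapping, and the `subject` field under `meta` is invented for illustration.

```python
def describe_sample(sample):
    """Summarize which parts of a sample dictionary are populated (hypothetical helper)."""
    missing = {"inputs", "target", "meta"} - set(sample)
    if missing:
        raise KeyError(f"sample is missing keys: {sorted(missing)}")
    return {
        "has_inputs": sample["inputs"] is not None,
        "has_target": sample["target"] is not None,
        "meta_fields": sorted(sample["meta"]),
    }


# Hand-built stand-in; the real field names under "meta" are not listed in the text.
example = {"inputs": {"magnitude_db": [0.0]}, "target": None, "meta": {"subject": "P0001"}}
summary = describe_sample(example)
```

With a constructed dataset, the same helper could be applied to `dataset[0]`.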
- __len__()
Return the number of dataset rows.
Rows are created from selected subjects and any shared indexed axes such as position, ear, frequency, or samples. The result is the number of integer indices accepted by __getitem__() before normal Python list bounds checking is applied.
- Returns:
Number of samples addressable by integer indexing.
- Return type:
int
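As a rough mental model, the row count can be estimated as the number of selected subjects multiplied by the lengths of the indexed axes. The sketch below assumes a full Cartesian product over those axes, which the text does not state outright, so treat it as an estimate rather than the package's exact rule.

```python
from math import prod


def expected_row_count(n_subjects, axis_lengths=()):
    """Subjects times the product of indexed-axis lengths (assumed Cartesian product)."""
    return n_subjects * prod(axis_lengths)


# For example: 10 selected subjects, 3 selected positions, 2 ears.
rows = expected_row_count(10, (3, 2))
print(rows)  # 10 * 3 * 2 = 60
```

With no indexed axes beyond the subject, the estimate reduces to one row per selected subject.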
- property available_subjects: list[str]
Return subjects available after resource intersection.
Available subjects are the non-excluded subjects that have every resource required by the selected input and target specs. This property describes resource availability, not necessarily the final train, validation, or test split subset.
- Returns:
Canonical subject identifiers available for the selected specs.
- Return type:
list of str
- property azimuth_angles: ndarray | None
Return available dataset azimuth angles.
The angles are derived from the full dataset source grid. They report available spatial coverage independently from the subset selected by position-indexed specs.
- Returns:
Unique azimuth angles from the dataset-level source positions.
- Return type:
numpy.ndarray or None
- property dataset_hrtf_variant: str | dict[str, object] | None
Return the selected HRTF resource variant.
This value records the HRTF variant used for local resource scanning and loading. Datasets with one selector axis return a string such as measured. Datasets with multiple selector axes return a dictionary containing fields such as type, sample_rate, and version. None means no HRTF variant was selected or no HRTF resource family is configured.
- Returns:
Selected HRTF variant stored in the dataset state.
- Return type:
str, dict, or None
- property dataset_mesh_variant: str | dict[str, object] | None
Return the selected mesh resource variant.
This value records the mesh variant used for local resource scanning and loading. Datasets with one selector axis return a string. Datasets with multiple selector axes return a dictionary containing fields such as type and version. None means no mesh variant was selected or no mesh resource family is configured.
- Returns:
Selected mesh variant stored in the dataset state.
- Return type:
str, dict, or None
- dataset_summary()
Return the dataset summary created during construction.
The summary captures the final dataset state after resource intersection and split planning: root path, selected split, subject counts, normalized input and target specs, selected resource variants, row count, and acoustic context when HRTF resources are available.
- Returns:
Human-readable summary of subjects, split, specs, selected variants, row count, and acoustic metadata.
- Return type:
str
- property elevation_angles: ndarray | None
Return available dataset elevation angles.
The angles are derived from the full dataset source grid. They describe the available elevation coverage before any position subset selected by specs is applied.
- Returns:
Unique elevation angles from the dataset-level source positions.
- Return type:
numpy.ndarray or None
- property excluded_subjects: list[str]
Return subjects excluded from this dataset instance.
This list combines configuration-level exclusions and user-provided exclusions after subject-reference normalization. Excluded subjects are removed before resource intersection and split planning, so they never contribute rows.
- Returns:
Canonical subject identifiers excluded from this dataset instance.
- Return type:
list of str
- property frequency_bins: ndarray | None
Return dataset-level frequency bins.
The bins come from the selected HRTF resources when frequency-domain data are available or can be derived. They define the dataset-level frequency axis used by frequency-indexed specs and remain separate from selected_frequency_indices.
- Returns:
Frequency bins from selected HRTF resources, or None when no frequency-domain acoustic context was built.
- Return type:
numpy.ndarray or None
- get_subject_hrtf(subject_id)
Load one subject HRTF through the dataset resource map.
This method is the subject-level access point shared by concrete datasets. It applies the same subject normalization, HRTF path lookup, cache, and dataset-level HRTF transform used by indexed sample extraction, so direct inspection and indexed sample extraction use the same loading path.
- Parameters:
subject_id (str or int) – Dataset subject reference. Integer values are mapped to the configured subject order.
- Returns:
Loaded HRTF object after applying any dataset-level HRTF transform.
- Return type:
- Raises:
ValueError – If dataset state is incomplete, subject mapping fails, HRTF loading fails, or the dataset-level HRTF transform does not return an HRTF object.
KeyError – If the mapped subject does not have an available HRTF resource in the dataset scan.
FileNotFoundError – If the resolved HRTF file is missing.
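When loading several subjects for inspection, the documented exceptions can be caught per subject so that one missing file does not stop the loop. This is a sketch: `load_available_hrtfs` is a hypothetical helper, and `_StubDataset` is a stand-in used only so the example runs without hrtfpykit or local data.

```python
def load_available_hrtfs(dataset, subject_ids):
    """Load HRTFs per subject, recording the error type for any that fail."""
    loaded, skipped = {}, {}
    for subject_id in subject_ids:
        try:
            loaded[subject_id] = dataset.get_subject_hrtf(subject_id)
        except (KeyError, FileNotFoundError, ValueError) as error:
            skipped[subject_id] = type(error).__name__
    return loaded, skipped


class _StubDataset:
    """Stand-in that mimics only the get_subject_hrtf call signature."""

    def get_subject_hrtf(self, subject_id):
        if subject_id == "P0002":
            raise KeyError(subject_id)  # simulate a subject without an HRTF resource
        return f"hrtf:{subject_id}"


loaded, skipped = load_available_hrtfs(_StubDataset(), ["P0001", "P0002"])
```

With a real dataset instance, the first argument would be the constructed SONICOM object instead of the stub.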
- property inputs: tuple[HRTFSpec | ITDSpec | ILDSpec | SHSpec | MeshSpec | AnthropometrySpec | MetadataSpec | ImageSpec | VideoSpec, ...]
Return input specs used by this dataset.
The tuple contains the normalized specs that feed sample inputs. It reflects spec workflow decisions such as default names, shared index_by axes, context encodings, and dataset-specific validation.
- Returns:
Normalized input specs in extraction order.
- Return type:
tuple of specs
- property name: str
Return the dataset configuration name.
The value is copied from the active dataset configuration during construction and can be used to identify the dataset source without reading private state.
- Returns:
Dataset name stored in the dataset state.
- Return type:
str
- property positions: ndarray | None
Return dataset-level source positions.
These positions describe the full source grid resolved from the selected HRTF resources before spec-level row selection. Position-aware specs may use only a subset of this grid; that subset is exposed separately through selected_position_indices, selected_azimuth_angles, and selected_elevation_angles.
- Returns:
Source-position array from selected HRTF resources, or None when no acoustic context was built.
- Return type:
numpy.ndarray or None
- resources_summary()
Return the resource scan summary created during construction.
The summary describes resources relevant to the selected specs and variants, not every resource a dataset family can support. It reports the local resource paths considered during construction, resource counts, missing files, partial media resources, and subject removals caused by resource intersection.
- Returns:
Human-readable summary of scanned resources used by the selected specs.
- Return type:
str
- property root: Path
Return the local dataset root.
The returned path is the expanded root stored during construction and used by every resource scanner. It may point to a directory that contains only the resource families required by the selected specs.
- Returns:
Expanded local dataset root.
- Return type:
Path
- property sample_indices: ndarray | None
Return dataset-level time sample indices.
The indices describe the full HRIR sample axis from the selected HRTF resources. They support sample-indexed specs while keeping the complete time-domain acoustic context inspectable.
- Returns:
Time-sample indices from selected HRTF resources, or None when no time-domain acoustic context was built.
- Return type:
numpy.ndarray or None
- property sample_rate: float | None
Return dataset-level acoustic sample rate.
The value is derived from the selected HRTF resources after resource validation. It represents the dataset-level acoustic context and is not changed by per-spec extraction choices such as position, frequency, or sample selection. None means the constructed dataset did not require or discover HRTF resources.
- Returns:
Sample rate read from selected HRTF resources.
- Return type:
float or None
- property selected_azimuth_angles: ndarray | None
Return azimuth angles selected by position-aware specs.
The values summarize the selected position subset used for row generation. They are None when no selected spec produced a position-indexed acoustic subset.
- Returns:
Unique azimuth angles for selected positions.
- Return type:
numpy.ndarray or None
- property selected_elevation_angles: ndarray | None
Return elevation angles selected by position-aware specs.
The values summarize the selected position subset used for row generation. They help inspect plane selectors and position-indexed datasets without losing the full elevation coverage available through elevation_angles.
- Returns:
Unique elevation angles for selected positions.
- Return type:
numpy.ndarray or None
- property selected_frequency_indices: tuple[int, ...]
Return selected frequency-bin indices.
These indices are used when frequency appears in the shared dataset index_by axes. They identify the frequency bins that expand rows and determine how many frequency-indexed samples each selected subject contributes.
- Returns:
Frequency-bin indices into frequency_bins.
- Return type:
tuple of int
- property selected_position_indices: tuple[int, ...]
Return source position indices selected by specs.
This property exposes the position subset used to build indexed rows after explicit position or plane selection. It is separate from positions so selected row context does not hide the full source grid.
- Returns:
Source-position indices into positions.
- Return type:
tuple of int
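The relationship between the full grid and the selected subset can be illustrated with a toy position table. The sketch assumes each position row is ordered as (azimuth, elevation, radius), which follows the common SOFA spherical convention but is not confirmed by the text; `selected_azimuths` is a hypothetical helper and the values are invented.

```python
def selected_azimuths(positions, selected_indices):
    """Unique azimuths of the selected rows of a position grid (illustrative only)."""
    # Column 0 is assumed to hold azimuth in degrees.
    return sorted({positions[i][0] for i in selected_indices})


# Toy grid of (azimuth, elevation, radius) rows; values are illustrative only.
positions = [(0.0, 0.0, 1.5), (90.0, 0.0, 1.5), (0.0, 30.0, 1.5), (180.0, 0.0, 1.5)]
selected = (0, 2, 3)
azimuths = selected_azimuths(positions, selected)
```

The full grid still contains the 90 degree source even though the selected subset does not, which mirrors how positions and selected_position_indices stay separate.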
- property selected_sample_indices: tuple[int, ...]
Return selected time-sample indices.
These indices are used when samples appears in the shared dataset index_by axes. They identify the HRIR samples that expand rows and determine how many sample-indexed samples each selected subject contributes.
- Returns:
Time-sample indices into sample_indices.
- Return type:
tuple of int
- property selected_subjects: list[str]
Return subjects selected for the requested split.
Selected subjects are the available subjects used to build rows for this dataset instance. For split='all', this usually matches available_subjects; for train, validation, or test splits it is a deterministic subset derived from split_ratio and split_seed.
- Returns:
Canonical subject identifiers used to build dataset rows.
- Return type:
list of str
- property split: str
Return the requested dataset split name.
The split controls which available subjects become rows in this dataset instance. It is stored separately from resource availability so callers can distinguish subjects that have all required resources from the subset chosen for train, validation, or test use.
- Returns:
Split name used by this dataset instance.
- Return type:
str
- property split_ratio: tuple[float, float, float]
Return train, validation, and test split ratios.
These ratios are used by the split planner when split is train, validation, or test. They remain visible on the dataset object so split behavior can be inspected and reproduced.
- Returns:
Three split ratios in train, validation, and test order.
- Return type:
tuple of float
- property split_seed: int
Return the split random seed.
The seed controls deterministic subject shuffling before train, validation, and test partitioning. It is part of the dataset state so selected subjects can be reproduced from the same resource set.
- Returns:
Seed used for deterministic split planning.
- Return type:
int
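A seeded, subject-level split of the kind described here can be sketched with the standard library. This is not hrtfpykit's split planner: the shuffling method, the rounding of partition sizes, and the name `plan_split` are all assumptions made for illustration.

```python
import random


def plan_split(subjects, split_ratio=(0.8, 0.1, 0.1), split_seed=0):
    """Shuffle subjects with a seeded RNG and cut into train/validation/test."""
    shuffled = sorted(subjects)  # fixed starting order, independent of input order
    random.Random(split_seed).shuffle(shuffled)
    n_train = int(len(shuffled) * split_ratio[0])
    n_val = int(len(shuffled) * split_ratio[1])
    return {
        "train": shuffled[:n_train],
        "validation": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],  # remainder goes to test
    }


subjects = [f"P{n:04d}" for n in range(1, 21)]
splits = plan_split(subjects, split_seed=42)
```

Because the shuffle depends only on the sorted subject list and the seed, the same resource set and seed give the same partition, which is the reproducibility property the seed is meant to provide.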
- property target: tuple[HRTFSpec | ITDSpec | ILDSpec | SHSpec | MeshSpec | AnthropometrySpec | MetadataSpec | ImageSpec | VideoSpec, ...]
Return target specs used by this dataset.
The tuple contains the normalized specs that feed sample targets. A dataset with no target specs returns None under the target key during indexed access.
- Returns:
Normalized target specs in extraction order.
- Return type:
tuple of specs