AnthropometrySpec

class hrtfpykit.datasets.AnthropometrySpec(path=None, extensions=None, exclude_row=None, exclude_column=None, accessed_by='row', grouped_by=('subject',), subject_id=True, ear=None, ear_one_hot=False, ear_index=False, transform=None, name=None)

Define subject anthropometry values returned by a sample.

AnthropometrySpec asks a dataset to load physical measurement data, such as head, pinna, or ear measurements, and align those values to the selected subject IDs. Values can come from the anthropometry table declared by the active dataset configuration or from a custom table supplied through path. The spec controls table access direction, subject ID handling, ear grouping, row or column exclusion, and optional value transforms.

path selects the anthropometry table used by this resource family. When path is provided, it overrides the table declared by the active dataset configuration for this spec. Absolute paths are used directly and relative paths are resolved from the dataset root. When path is None, the configured anthropometry path is used if the dataset declares one. If neither location is available and an anthropometry spec is requested, dataset construction raises an error. If several anthropometry specs are requested, the first one controls the table path, accepted extensions, and table loading options used for the resource family.
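The path precedence above can be summarized in a short sketch. `resolve_anthropometry_path` is a hypothetical helper, not part of hrtfpykit. It assumes two things the text does not state outright: a configured path is also resolved relative to the dataset root, and a first spec with path=None falls back to the configured table.

```python
from pathlib import Path
from typing import Optional, Sequence, Union

PathLike = Union[str, Path]


def resolve_anthropometry_path(
    root: PathLike,
    spec_paths: Sequence[Optional[PathLike]],
    configured_path: Optional[PathLike],
) -> Path:
    """Hypothetical helper mirroring the documented precedence rules."""
    # Only the first requested spec controls the table path.
    first = spec_paths[0] if spec_paths else None
    # A spec path overrides the configured path; None falls back to it (assumption).
    candidate = first if first is not None else configured_path
    if candidate is None:
        raise ValueError("no anthropometry table: pass path= or configure the dataset")
    candidate = Path(candidate)
    # Absolute paths are used directly; relative paths are joined to the dataset root.
    return candidate if candidate.is_absolute() else Path(root) / candidate


print(resolve_anthropometry_path("datasets/hutubs", [None], "anthropometry.csv"))
```

With a spec path of None, the configured table is used, joined to the dataset root.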

If the spec is passed to inputs, its value appears under dataset[0]["inputs"][name]. If it is passed to target, its value appears under dataset[0]["target"][name]. When name is None, the default key is "anthropometry".

CSV and MAT anthropometry tables are normalized before sample selection. The generic loaded value is a dict mapping measurement names to values for the matched subject, regardless of whether subjects were stored along rows or columns in the source table. Dataset selectors can then filter or reshape that value, for example to keep fields for the selected ear. If transform is provided, the returned type is whatever the transform returns.
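The following is a minimal sketch of the row/column normalization for plain CSV text with subject_id=True. With subjects in rows, the first column holds the IDs. With subjects in columns, the header row holds them, and the first header cell is assumed to label the field-name column. `load_subject_dicts` is hypothetical and does not reflect hrtfpykit's actual loader or its MAT handling.

```python
import csv
import io
from typing import Dict, List


def load_subject_dicts(text: str, accessed_by: str = "row") -> Dict[str, Dict[str, float]]:
    """Hypothetical sketch: map subject ID -> {measurement name: value}."""
    rows: List[List[str]] = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    if accessed_by == "row":
        # Header is [ID label, field names...]; each body row is one subject.
        fields = header[1:]
        return {r[0]: {f: float(v) for f, v in zip(fields, r[1:])} for r in body}
    if accessed_by == "column":
        # Header is [field label, subject IDs...]; each body row is one field.
        subjects = header[1:]
        result: Dict[str, Dict[str, float]] = {s: {} for s in subjects}
        for r in body:
            for s, v in zip(subjects, r[1:]):
                result[s][r[0]] = float(v)
        return result
    raise ValueError(f"accessed_by must be 'row' or 'column', got {accessed_by!r}")


by_row = "subject,head_width,head_depth\nS1,15.2,19.1\nS2,14.8,18.7\n"
by_col = "field,S1,S2\nhead_width,15.2,14.8\nhead_depth,19.1,18.7\n"
# Both orientations normalize to the same per-subject dicts.
print(load_subject_dicts(by_row, "row") == load_subject_dicts(by_col, "column"))  # True
```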

During dataset construction, subjects with missing, empty, NaN, or infinite anthropometry fields are removed before samples are built. This avoids constructing datasets with empty table-derived samples, but users should account for it when choosing table fields. If only some fields are incomplete, exclude those fields instead of losing otherwise valid subjects: with accessed_by="row", subjects are rows and fields are columns, so use exclude_column; with accessed_by="column", subjects are columns and fields are rows, so use exclude_row.
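The filtering rule, and the reason to exclude a bad field rather than lose subjects, can be sketched as follows. `drop_incomplete_subjects` is hypothetical. For readability it excludes fields by name, whereas exclude_row and exclude_column take integer indices. The measurement values are toy data.

```python
import math
from typing import Dict, Iterable, Optional

Table = Dict[str, Dict[str, Optional[float]]]


def is_complete(value) -> bool:
    # Missing (None), empty strings, NaN, and infinities all count as incomplete.
    if value is None or value == "":
        return False
    return math.isfinite(float(value))


def drop_incomplete_subjects(table: Table, excluded_fields: Iterable[str] = ()) -> Table:
    """Hypothetical sketch of the documented subject-filtering rule."""
    excluded = set(excluded_fields)
    kept: Table = {}
    for subject, fields in table.items():
        # Excluded fields are removed before completeness is checked.
        selected = {k: v for k, v in fields.items() if k not in excluded}
        # Subjects with no remaining fields or any incomplete field are dropped.
        if selected and all(is_complete(v) for v in selected.values()):
            kept[subject] = selected
    return kept


table = {
    "S1": {"head_width": 15.2, "pinna_height": 6.1},
    "S2": {"head_width": 14.8, "pinna_height": float("nan")},
    "S3": {"head_width": None, "pinna_height": 6.4},
}
print(sorted(drop_incomplete_subjects(table)))  # ['S1']
print(sorted(drop_incomplete_subjects(table, ["pinna_height"])))  # ['S1', 'S2']
```

Excluding pinna_height keeps S2, whose only incomplete field was that one.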

The transform callable receives the selected anthropometry value after subject and optional ear selection. Use it when the selected value should be reshaped, filtered, normalized, or converted into the structure expected by the custom pipeline.
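Any callable that accepts the selected value can serve as a transform. The class below is a hypothetical example, not part of hrtfpykit. It keeps a chosen list of fields in a fixed order and rescales them, turning the dict into a flat list suitable for a model input. The field names and the scale factor are placeholders, since measurement names and units are dataset specific. The example assumes a table stored in centimetres and converts to millimetres.

```python
from typing import Dict, List, Sequence


class SelectAndScale:
    """Hypothetical transform: keep chosen fields in a fixed order and rescale them."""

    def __init__(self, fields: Sequence[str], scale: float = 1.0) -> None:
        self.fields = list(fields)
        self.scale = scale

    def __call__(self, value: Dict[str, float]) -> List[float]:
        # A missing field raises KeyError, which surfaces schema mismatches early.
        return [value[f] * self.scale for f in self.fields]


to_vector = SelectAndScale(["head_width", "head_depth"], scale=10.0)
print(to_vector({"head_depth": 20.0, "head_width": 16.0, "pinna_height": 6.0}))  # [160.0, 200.0]
```

An instance could then be passed to the spec as transform=SelectAndScale([...]). The field order is fixed by the constructor rather than by the table, so every sample yields the same layout.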

Notes

Measurement names, units, table schemas, and ear-specific field naming are dataset specific. Concrete dataset integrations or value selectors handle those details; this spec only describes how anthropometry resources are requested, indexed, and returned.

Parameters:

path (str, Path, or None, default=None) – Optional anthropometry table path. None uses the active dataset anthropometry configuration when available. Absolute paths are used directly. Relative paths are resolved from the dataset root.
extensions (tuple of str or None, default=None) – Optional table extensions to allow.
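exclude_row (int, sequence of int, or None, default=None) – Row indices to remove while loading the table. With accessed_by="column", rows hold fields, so this removes measurement fields.
exclude_column (int, sequence of int, or None, default=None) – Column indices to remove while loading the table. With accessed_by="row", columns hold fields, so this removes measurement fields.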
accessed_by ({"row", "column"}, default="row") – Whether subjects are represented by rows or columns.
grouped_by (str or tuple of str, default=("subject",)) – Dataset grouping used to select anthropometry values. Supported values are "subject", ("subject",), "subject-ear", and ("subject", "ear").
subject_id (bool, default=True) – Whether the table provides explicit subject IDs. CSV tables with subjects in rows use the first column, CSV tables with subjects in columns use the subject column headers, and MAT tables use a subject ID variable.
ear ({"both", "left", "right"} or None, default=None) – Optional ear selection for ear-grouped anthropometry.
ear_one_hot (bool, default=False) – Whether the ear context is exposed in sample inputs as a one-hot encoding.
ear_index (bool, default=False) – Whether the ear context is exposed in sample inputs as an integer index.
transform (callable or None, default=None) – Optional transform applied to the selected table value after subject and optional ear selection.
name (str or None, default=None) – Optional public key used in sample dictionaries. None uses "anthropometry".
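The four supported grouped_by spellings collapse to two groupings. The normalizer below is hypothetical and only illustrates that equivalence. hrtfpykit may validate the argument differently.

```python
from typing import Iterable, Tuple, Union

# Hypothetical mapping from string spellings to canonical tuples.
_GROUPINGS = {
    "subject": ("subject",),
    "subject-ear": ("subject", "ear"),
}


def normalize_grouped_by(grouped_by: Union[str, Iterable[str]]) -> Tuple[str, ...]:
    """Map the documented grouped_by spellings onto canonical tuples."""
    if isinstance(grouped_by, str):
        if grouped_by in _GROUPINGS:
            return _GROUPINGS[grouped_by]
    else:
        grouped = tuple(grouped_by)
        # Tuple (or list) forms are accepted only if they match a supported grouping.
        if grouped in _GROUPINGS.values():
            return grouped
    raise ValueError(f"unsupported grouped_by: {grouped_by!r}")


print(normalize_grouped_by("subject-ear"))  # ('subject', 'ear')
```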

Returns:

Specification object consumed by dataset construction.

Return type:

AnthropometrySpec

Examples

>>> from hrtfpykit.datasets import AnthropometrySpec, HUTUBS
>>> dataset = HUTUBS(
...     root="datasets/hutubs",
...     inputs=AnthropometrySpec(name="anthropometry"),
... )
>>> anthropometry = dataset[0]["inputs"]["anthropometry"]
>>> print(type(anthropometry).__name__)
dict
>>> print(list(anthropometry)[:3])
[...]
>>> print({key: anthropometry[key] for key in list(anthropometry)[:2]})
{...}