MetadataSpec
- class hrtfpykit.datasets.MetadataSpec(path=None, extensions=None, exclude_row=None, exclude_column=None, accessed_by='row', grouped_by=('subject',), subject_id=True, ear=None, ear_one_hot=False, ear_index=False, transform=None, name=None)
Define subject metadata values returned by a sample.
MetadataSpec asks a dataset to load general subject annotations and align them to the selected subject IDs. Metadata can come from the table declared by the active dataset configuration or from a custom table supplied through path. Metadata is kept separate from AnthropometrySpec so physical measurements and general annotations can be requested under different resource families and sample keys.
path selects the metadata table used by this resource family. When path is provided, it overrides the table declared by the active dataset configuration for this spec. Absolute paths are used directly and relative paths are resolved from the dataset root. When path is None, the configured metadata path is used if the dataset declares one. If neither location is available and a metadata spec is requested, dataset construction raises an error. If several metadata specs are requested, the first one controls the table path, accepted extensions, and table loading options used for the resource family.
If the spec is passed to inputs, its value appears under dataset[0]["inputs"][name]. If it is passed to target, its value appears under dataset[0]["target"][name]. When name is None, the default key is "metadata".
CSV and MAT metadata tables are normalized before sample selection. The generic loaded value is a dict mapping metadata field names to values for the matched subject, regardless of whether subjects were stored along rows or columns in the source table. Dataset selectors can then filter or reshape that value. If transform is provided, the returned type is whatever the transform returns.
The transform callable receives the selected metadata value after subject and optional ear selection. Use it when the selected value should be reshaped, filtered, normalized, or converted into the structure expected by the custom pipeline.
Notes
Metadata availability, field names, and semantic meaning are dataset specific. If the active dataset configuration does not declare a metadata table, provide path to a compatible custom table or use a dataset integration that declares metadata resources.
- Parameters:
path (str, Path, or None, default=None) – Optional metadata table path. None uses the active dataset metadata configuration when available. Absolute paths are used directly. Relative paths are resolved from the dataset root.
extensions (tuple of str or None, default=None) – Optional table extensions to allow.
exclude_row (int, sequence of int, or None, default=None) – Row indices to remove while loading the table.
exclude_column (int, sequence of int, or None, default=None) – Column indices to remove while loading the table.
accessed_by ({'row', 'column'}, default='row') – Whether subjects are represented by rows or columns.
grouped_by ({'subject'} or ('subject', 'ear'), default=('subject',)) – Dataset grouping used to select metadata values.
subject_id (bool, default=True) – Whether the table provides explicit subject IDs. CSV tables with subjects in rows use the first column, CSV tables with subjects in columns use the subject column headers, and MAT tables use a subject ID variable.
ear ({'both', 'left', 'right'} or None, default=None) – Optional ear selection for ear-grouped metadata.
ear_one_hot (bool, default=False) – Whether the one-hot ear context encoding is exposed in sample inputs.
ear_index (bool, default=False) – Whether the ear index context encoding is exposed in sample inputs.
transform (callable or None, default=None) – Optional transform applied to the selected metadata value after subject and optional ear selection.
name (str or None, default=None) – Optional public key used in sample dictionaries.
- Returns:
Specification object consumed by dataset construction.
- Return type:
Examples
>>> from hrtfpykit.datasets import MetadataSpec, SONICOM
>>> dataset = SONICOM(
...     root="datasets/sonicom",
...     inputs=MetadataSpec(name="metadata"),
... )
>>> metadata = dataset[0]["inputs"]["metadata"]
>>> print(type(metadata).__name__)
dict
>>> print(list(metadata)[:3])
[...]
>>> print({key: metadata[key] for key in list(metadata)[:2]})
{...}
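The transform hook can be illustrated with a small standalone callable. The following is a minimal sketch, not part of the library: the function name keep_numeric_fields and the sample field names (age, height_cm, notes, consent) are hypothetical, and it assumes the selected metadata value is the generic dict of field names to values described for this spec. Real field names are dataset specific.

```python
from typing import Any, Mapping


def keep_numeric_fields(metadata: Mapping[str, Any]) -> dict:
    """Return only the fields whose values can be read as numbers, as floats.

    Hypothetical helper for illustration; it is not provided by hrtfpykit.
    """
    numeric = {}
    for field, value in metadata.items():
        # bool is a subclass of int, so exclude flags explicitly.
        if isinstance(value, bool):
            continue
        try:
            numeric[field] = float(value)
        except (TypeError, ValueError):
            # Non-numeric annotations (for example free text) are dropped.
            continue
    return numeric


# Hypothetical metadata value for one subject.
example = {"age": "31", "height_cm": 172.5, "notes": "n/a", "consent": True}
print(keep_numeric_fields(example))  # {'age': 31.0, 'height_cm': 172.5}
```

Under the same assumption, a callable like this could be supplied as MetadataSpec(transform=keep_numeric_fields), in which case the sample value is whatever the function returns rather than the generic dict.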