Mastering HRTF transformation and visualization

hrtfpykit.hrtf and hrtfpykit.plots are easiest to understand when they are used together. A transform creates a new HRTF object with a controlled acoustic change. A metric checks that change numerically, and a plot shows how the change appears in the HRTF signal, spectrum, cue curve, or source map.

This tutorial uses two SONICOM HRTFs. P0001 is the reference object. Transformed copies of P0001 isolate one operation at a time, so the effect is controlled. P0002 is kept as a real subject comparison, so controlled changes can be compared with natural subject variation.

Each section explains what the transform changes, which metric checks the change, which plot visualizes it, and what the figure shows in this example. Some sections use one plot; others use two when the cue view and the waveform view answer different questions.

The examples assume the SOFA azimuth convention used by SONICOM and SOFA: front is 0 degrees, listener left is 90 degrees, back is 180 degrees, and listener right is 270 degrees. Plot functions that expose azimuth_range_mode can display either "0-360" or signed "-180-180" axes.

Prepare SONICOM HRTFs

The tutorial starts with two measured SONICOM files from the Imperial server. Both use the FreeFieldComp variant at 44.1 kHz. Keeping the same variant and source grid makes the metric comparisons direct: every source position in P0001 has the corresponding position in P0002.

This section does not transform data yet. It prepares stable input files so the later metric values and plots come from a known reference, one controlled transformation at a time, and one real subject comparison.

from pathlib import Path

from hrtfpykit.datasets import SONICOM

root = Path("datasets/sonicom")
selected_subject_ids = ("P0001", "P0002")
hrtf_variant = {
    "type": "measured",
    "sample_rate": 44100,
    "version": "FreeFieldComp",
}

SONICOM(
    root=root,
    download=True,
    download_resources="hrtf",
    download_hrtf_variant=hrtf_variant,
    download_server="imperial",
    dataset_hrtf_variant=hrtf_variant,
    download_subject_ids=selected_subject_ids,
    subject_ids=selected_subject_ids,
    verify_checksum=True,
)

reference_path = root / "P0001" / "HRTF" / "HRTF" / "44kHz" / "P0001_FreeFieldComp_44kHz.sofa"
subject_path = root / "P0002" / "HRTF" / "HRTF" / "44kHz" / "P0002_FreeFieldComp_44kHz.sofa"

for sofa_path in (reference_path, subject_path):
    if not sofa_path.exists():
        raise FileNotFoundError(f"Expected SONICOM SOFA file was not found: {sofa_path}")

print(reference_path)
print(subject_path)

Load the acoustic objects

This tutorial uses the loaded HRTFs as fixed comparison anchors. hrtf_reference is the object every controlled transform starts from, and hrtf_subject is kept untouched so the final section can compare a real listener difference against the artificial changes.

The printed dimensions are not a loading lesson. They are the reference frame for the rest of the tutorial: metrics reduce over positions, ears, samples, or frequency bins, and plots read the same active state. When a transform changes one of those axes, the section that introduces the transform makes that change explicit.

import numpy as np
import matplotlib.pyplot as plt

from hrtfpykit.hrtf import (
    hrtf_difference,
    ild,
    ild_difference,
    itd,
    itd_difference,
    load_hrtf,
    rms,
)
from hrtfpykit.plots import (
    compare_amplitude,
    compare_hrtf_difference,
    compare_ild,
    compare_ild_difference,
    compare_itd,
    compare_magnitude,
    plot_amplitude,
    plot_plane_grid,
    compare_absolute_itd,
)

hrtf_reference = load_hrtf(reference_path)
hrtf_subject = load_hrtf(subject_path)

for label, hrtf in (("P0001 reference", hrtf_reference), ("P0002 subject", hrtf_subject)):
    print(label)
    print("  IR shape:", hrtf.IR.values.shape)
    print("  TF shape:", hrtf.TF.values.shape)
    print("  source positions:", hrtf.Sources.get_positions().shape)
    print("  sample rate:", hrtf.IR.sample_rate)
    print("  FFT length:", hrtf.fft_length)

Select a working plane

Selection returns a new HRTF object whose active state contains only the requested subset. Here the selected subset is the horizontal plane at 0 degrees elevation, which is the clearest plane for reading azimuth-dependent ITD and ILD behavior.

The printed shapes check what selection changed: the source axis becomes smaller while ears and samples stay the same. The plane plot visualizes which measured ring is selected, so later cue curves and source-map metrics can be interpreted as measurements around ear height rather than the full sphere.

# Build a horizontal-plane copy while leaving the full reference untouched.
horizontal_reference = hrtf_reference.select(plane="horizontal", plane_angle=0.0)

print("Full reference IR:", hrtf_reference.IR.values.shape)
print("Horizontal-plane IR:", horizontal_reference.IR.values.shape)
print("Full reference unchanged:", hrtf_reference.IR.values.shape)

# Show the selected plane on the original source grid.
plot_plane_grid(hrtf_reference, plane="horizontal", show=False)
plt.show()

Window the impulse response

apply_window multiplies a sample interval by a named window and rebuilds the frequency response from the modified HRIR. Here the Hann window tapers the first 160 samples of both ears, so the waveform is changed but the IR shape stays the same.

The RMSE checks the front-left waveform difference in linear amplitude units, and the global RMS change reports the level effect in dB. In the rendered amplitude plot, the windowed front-left curve has a visibly smaller main peak than the reference: the peak magnitude drops from about 0.196 to about 0.091. That reduction is the visual counterpart of the negative global RMS change printed above the plot.

# Create a windowed copy and compare only the front-left HRIR for a readable figure.
windowed = hrtf_reference.transform.apply_window(
    "hann",
    start_sample=0,
    end_sample=160,
    ear="both",
)

window_rmse = hrtf_difference(
    hrtf_reference,
    windowed,
    metric="rmse",
    ear="left",
    positions="front",
    reduction_axis="global",
    reduction_method="rms",
)
window_rms_change = (
    rms(windowed, output="db", reduction_axis="global")
    - rms(hrtf_reference, output="db", reduction_axis="global")
)

print("Windowed front-left RMSE amplitude:", float(window_rmse))
print("Windowed global RMS change (dB):", float(window_rms_change))

compare_amplitude(
    [hrtf_reference, windowed],
    positions="front",
    ear="left",
    x_axis="time",
    legends=["reference", "Hann window"],
    line_styles=["-", "--"],
    show=False,
)
plt.show()

Remove an interval without shortening the IR

apply_crop removes samples from start_sample up to, but not including, end_sample. Later samples shift left and zeros are appended at the end, so the IR length stays compatible with the original HRTF. This example removes the first 10 samples only from the left ear.

The RMSE checks how much the front-left waveform changed after the interval removal. In the amplitude plot, the cropped curve keeps the same overall amplitude range as the reference, but the main peak moves earlier in time because the start of the signal was removed and the remaining samples shifted left.

# Remove samples 0 through 9 from the left ear and taper into the zero-padded tail.
cropped = hrtf_reference.transform.apply_crop(
    start_sample=0,
    end_sample=10,
    window="hann",
    ear="left",
)

crop_rmse = hrtf_difference(
    hrtf_reference,
    cropped,
    metric="rmse",
    ear="left",
    positions="front",
    reduction_axis="global",
    reduction_method="rms",
)

print("Cropped IR shape:", cropped.IR.values.shape)
print("Cropped front-left RMSE amplitude:", float(crop_rmse))

compare_amplitude(
    [hrtf_reference, cropped],
    positions="front",
    ear="left",
    x_axis="time",
    legends=["reference", "left crop"],
    line_styles=["-", "--"],
    show=False,
)
plt.show()

Delay one ear with start padding

apply_padding can add samples to one ear while preserving the original length. With location="start", preserve_length=True, and ear="left", zeros are inserted at the beginning of the left-ear HRIR and the same number of trailing samples is removed. That creates a controlled timing difference without changing array shape.

itd_difference checks the delay in both samples and microseconds. The absolute ITD plot reflects the new timing cue directly: the reference reaches about 680 microseconds at its largest horizontal-plane delay, while the left-padded copy reaches about 862 microseconds. The printed 8-sample MAE explains that larger curve.

# Delay the left ear by 8 samples while keeping the original IR length.
left_padded = hrtf_reference.transform.apply_padding(
    8,
    location="start",
    preserve_length=True,
    ear="left",
)

padding_itd_samples = itd_difference(
    hrtf_reference,
    left_padded,
    output="samples",
    absolute=True,
    reduction_axis="global",
    reduction_method="mean",
)
padding_itd_time = itd_difference(
    hrtf_reference,
    left_padded,
    output="time",
    absolute=True,
    reduction_axis="global",
    reduction_method="mean",
)

print("Left-padded IR shape:", left_padded.IR.values.shape)
print("ITD MAE (samples):", float(padding_itd_samples))
print("ITD MAE (microseconds):", float(padding_itd_time))

compare_absolute_itd(
    [hrtf_reference, left_padded],
    legends=["reference", "left padded"],
    line_styles=["-", "--"],
    show=False,
)
plt.show()

Low-pass filter the HRIR

apply_iir_filter filters IR.values and rebuilds TF.values. The 3 kHz low-pass filter leaves the lower band close to the reference and strongly attenuates the high-frequency band.

The two LSD metrics show the split clearly: about 0.69 dB below 3 kHz and about 72 dB from 3 kHz to 16 kHz. The magnitude plot shows the same behavior at the front-left source. The filtered curve is already lower than the reference near the low end, then drops far below the reference above the cutoff, reaching very large attenuation at high frequencies.

# Apply the same low-pass filter to both ears.
lowpass = hrtf_reference.transform.apply_iir_filter(
    filter="lowpass",
    cutoff=3000.0,
    order=6,
    ear="both",
)

low_band_lsd = hrtf_difference(
    hrtf_reference,
    lowpass,
    metric="lsd",
    ear="left",
    frequency_bands=(20.0, 3000.0),
    reduction_axis="global",
    reduction_method="rms",
)
high_band_lsd = hrtf_difference(
    hrtf_reference,
    lowpass,
    metric="lsd",
    ear="left",
    frequency_bands=(3000.0, 16000.0),
    reduction_axis="global",
    reduction_method="rms",
)

print("Low-pass LSD 20-3000 Hz (dB):", float(low_band_lsd))
print("Low-pass LSD 3000-16000 Hz (dB):", float(high_band_lsd))

compare_magnitude(
    [hrtf_reference, lowpass],
    positions="front",
    ear="left",
    x_axis="log",
    freq_max=16000.0,
    legends=["reference", "low-pass 3 kHz"],
    line_styles=["-", "--"],
    show=False,
)
plt.show()

Add gain to one ear

apply_gain changes TF magnitude while preserving phase. Applying +3 dB to the left ear changes the level balance between ears, so ILD is the cue metric that moves. NRMSE is also printed because a one-ear gain changes the reconstructed HRIR amplitudes.

This section uses two plots because they show different parts of the same change. In the ILD plot, the transformed curve is the reference curve shifted upward by 3 dB across the horizontal plane. In the front-left amplitude plot, the transformed waveform keeps the same timing as the reference, but its peak grows from about 0.196 to about 0.276 because the left ear was amplified.

# Increase only the left ear by 3 dB.
left_gain = hrtf_reference.transform.apply_gain(3.0, scale="db", ear="left")

left_gain_ild_mae = ild_difference(
    hrtf_reference,
    left_gain,
    mode="broad-band",
    absolute=True,
    reduction_axis="global",
    reduction_method="mean",
)
left_gain_nrmse = hrtf_difference(
    hrtf_reference,
    left_gain,
    metric="nrmse",
    ear="both",
    reduction_axis="global",
    reduction_method="rms",
)

print("Left-gain ILD MAE (dB):", float(left_gain_ild_mae))
print("Left-gain NRMSE (dB):", float(left_gain_nrmse))

compare_ild(
    [hrtf_reference, left_gain],
    azimuth_range_mode="0-360",
    legends=["reference", "left +3 dB"],
    line_styles=["-", "--"],
    show=False,
)

compare_amplitude(
    [hrtf_reference, left_gain],
    positions="front",
    ear="left",
    x_axis="time",
    legends=["reference", "left +3 dB"],
    line_styles=["-", "--"],
    show=False,
)

plt.show()

Shape the magnitude spectrum

modify_magnitude replaces TF magnitude and keeps the existing phase. This example applies a smooth tilt: low frequencies are lifted relative to high frequencies, and the strongest attenuation appears toward the upper part of the plotted band.

LSD is the main metric because the transform is spectral; NRMSE checks the HRIR difference after the modified TF is converted back to the time domain. In the front-left magnitude plot, the shaped curve is close to the reference at low frequencies and falls below it through the high-frequency region, with the deepest visible separation near the upper end of the spectrum.

# Build a frequency tilt in dB and apply it to the current TF magnitude.
frequency_bins = np.asarray(hrtf_reference.TF.frequency_bins, dtype=float)
reference_magnitude = np.maximum(np.abs(hrtf_reference.TF.values), 1e-12)
tilt_db = np.interp(frequency_bins, [0.0, 16000.0], [3.0, -6.0])
shaped_magnitude_db = 20.0 * np.log10(reference_magnitude) + tilt_db[np.newaxis, np.newaxis, :]
shaped = hrtf_reference.transform.modify_magnitude(shaped_magnitude_db, scale="db")

shaped_lsd = hrtf_difference(
    hrtf_reference,
    shaped,
    metric="lsd",
    ear="left",
    frequency_bands=(20.0, 16000.0),
    reduction_axis="global",
    reduction_method="rms",
)
shaped_nrmse = hrtf_difference(
    hrtf_reference,
    shaped,
    metric="nrmse",
    ear="left",
    reduction_axis="global",
    reduction_method="rms",
)

print("Spectral tilt LSD (dB):", float(shaped_lsd))
print("Spectral tilt NRMSE (dB):", float(shaped_nrmse))

compare_magnitude(
    [hrtf_reference, shaped],
    positions="front",
    ear="left",
    x_axis="log",
    freq_max=16000.0,
    legends=["reference", "spectral tilt"],
    line_styles=["-", "--"],
    show=False,
)
plt.show()

Add and remove ITD directly

add_itd applies a controlled interaural delay. Positive values delay the left ear; unit="time" uses microseconds. delete_itd estimates the current delay per source and compensates it.

itd_difference checks the added delay against the reference, while the residual absolute ITD checks how much timing cue remains after delete_itd. In the ITD plot, the +100 microsecond curve is shifted upward relative to the reference by about 90.7 microseconds on average. The delete_itd curve sits almost on zero across the horizontal plane, with only a small residual peak of about 22.7 microseconds.

# Add a 100 microsecond delay, then build a separate ITD-normalized copy.
itd_added = hrtf_reference.transform.add_itd(100.0, unit="time")
itd_removed = hrtf_reference.transform.delete_itd()

added_itd_mae = itd_difference(
    hrtf_reference,
    itd_added,
    output="time",
    absolute=True,
    reduction_axis="global",
    reduction_method="mean",
)
removed_residual_itd = np.mean(np.abs(itd(itd_removed, output="time")))

print("Added ITD MAE (microseconds):", float(added_itd_mae))
print("Residual absolute ITD after delete_itd (microseconds):", float(removed_residual_itd))

compare_itd(
    [hrtf_reference, itd_added, itd_removed],
    azimuth_range_mode="0-360",
    legends=["reference", "+100 us", "delete_itd"],
    line_styles=["-", "--", "-."],
    show=False,
)
plt.show()

Add and remove ILD directly

add_ild applies a level difference in dB by changing TF magnitude symmetrically between ears. Positive values raise the left ear relative to the right ear. delete_ild measures frequency-dependent ILD and applies the inverse correction.

ild_difference checks the added level cue, and the residual broad-band ILD checks how much level imbalance remains after delete_ild. In the ILD plot, the +6 dB curve is the reference curve shifted upward by 6 dB. The delete_ild curve is visually flat at zero; the printed residual is around numerical precision.

# Add 6 dB of ILD, then build a separate ILD-normalized copy.
ild_added = hrtf_reference.transform.add_ild(6.0)
ild_removed = hrtf_reference.transform.delete_ild()

added_ild_mae = ild_difference(
    hrtf_reference,
    ild_added,
    mode="broad-band",
    absolute=True,
    reduction_axis="global",
    reduction_method="mean",
)
removed_residual_ild = np.mean(np.abs(ild(ild_removed, mode="broad-band")))

print("Added ILD MAE (dB):", float(added_ild_mae))
print("Residual absolute ILD after delete_ild (dB):", float(removed_residual_ild))

compare_ild(
    [hrtf_reference, ild_added, ild_removed],
    azimuth_range_mode="0-360",
    legends=["reference", "+6 dB ILD", "delete_ild"],
    line_styles=["-", "--", "-."],
    show=False,
)
plt.show()

Inspect cue deletion in the waveform

delete_itd and delete_ild can be combined when you want to inspect a waveform with both timing and level cues reduced. delete_itd compensates left-right delay in the HRIR, while delete_ild equalizes the frequency-dependent level difference in the TF and rebuilds the IR.

The residual ITD and ILD metrics check the cue-reduced copy numerically. The two amplitude plots show the left-position HRIR before and after cue deletion. In the original plot, the left ear dominates the left-source response: its peak magnitude is about 0.299, while the right-ear peak is about 0.022 and occurs later. In the cue-reduced plot, the two ears become much closer in level, with peaks around 0.077 and 0.064, while the waveform still keeps source-dependent structure.

# Remove ITD first, then remove ILD from the timing-normalized copy.
without_itd_and_ild = hrtf_reference.transform.delete_itd().transform.delete_ild()

without_cues_residual_itd = np.mean(np.abs(itd(without_itd_and_ild, output="time")))
without_cues_residual_ild = np.mean(np.abs(ild(without_itd_and_ild, mode="broad-band")))

print("Residual absolute ITD after delete_itd + delete_ild (microseconds):", float(without_cues_residual_itd))
print("Residual absolute ILD after delete_itd + delete_ild (dB):", float(without_cues_residual_ild))

plot_amplitude(
    hrtf_reference,
    positions="left",
    ear="both",
    x_axis="time",
    show=False,
)
plot_amplitude(
    without_itd_and_ild,
    positions="left",
    ear="both",
    x_axis="time",
    show=False,
)
plt.show()

Map where a cue change appears

A global metric can confirm that a transform changed ILD, but it does not show where the change appears on the source grid. compare_ild_difference computes broad-band ILD differences and draws one value per source position.

The printed ILD MAE is 3 dB, and the heatmap confirms why: the plotted values are essentially uniform at 3 dB across the grid. That is the spatial signature of the one-ear gain transform used here; it adds the same broad-band level offset everywhere instead of concentrating the change in one region.

# Reuse the left-ear gain transform and map its broad-band ILD difference.
map_ild_mae = ild_difference(
    hrtf_reference,
    left_gain,
    absolute=True,
    reduction_axis="global",
    reduction_method="mean",
)

print("Mapped left-gain ILD MAE (dB):", float(map_ild_mae))

compare_ild_difference(
    hrtf_reference,
    left_gain,
    absolute=True,
    plot_type="heatmap",
    azimuth_range_mode="0-360",
    show=False,
)
plt.show()

Map spectral error and reductions

hrtf_difference returns a metric array before reduction. For LSD with one ear, the natural result is one value per source. Reducing over global collapses that array into one scalar; keeping the source axis allows compare_hrtf_difference to draw the spatial map.

This section uses the low-pass transform because its spectral effect is large and easy to localize. The printed unreduced shape is (793,), one value per source, and the global RMS LSD is about 65 dB. The heatmap stays high across the whole grid, with plotted values roughly from 56 dB to 67 dB, showing that the low-pass filter creates broad spectral error rather than a small localized region.

# Keep the source axis for the map, then reduce the same metric globally.
lsd_by_source = hrtf_difference(
    hrtf_reference,
    lowpass,
    metric="lsd",
    ear="left",
    frequency_bands=(20.0, 16000.0),
)
lsd_global_rms = hrtf_difference(
    hrtf_reference,
    lowpass,
    metric="lsd",
    ear="left",
    frequency_bands=(20.0, 16000.0),
    reduction_axis="global",
    reduction_method="rms",
)

print("Low-pass LSD by source shape:", lsd_by_source.shape)
print("Low-pass global RMS LSD (dB):", float(lsd_global_rms))

compare_hrtf_difference(
    hrtf_reference,
    lowpass,
    metric="lsd",
    ear="left",
    frequency_bands=(20.0, 16000.0),
    plot_type="heatmap",
    azimuth_range_mode="0-360",
    show=False,
)
plt.show()

Build a short transformation pipeline

Transforms return new objects, so operations can be chained. This pipeline selects the horizontal plane, crops a short interval, windows the early response, and applies a small global gain. The unprocessed horizontal plane is kept as the reference so both HRTFs have the same source grid.

LSD checks the spectral effect of the pipeline, and RMSE checks the time-domain amplitude difference. In the front-left magnitude plot, the processed curve is lower than the horizontal reference across the band and drops much more at high frequencies, reaching about -48 dB. The figure shows the combined result of the crop, window, and gain rather than a single isolated operation.

# Chain selection, crop, window, and gain while keeping the original HRTF unchanged.
processed = (
    hrtf_reference
    .select(plane="horizontal", plane_angle=0.0)
    .transform.apply_crop(start_sample=96, end_sample=128, window="hann", ear="left")
    .transform.apply_window("hann", start_sample=0, end_sample=160, ear="both")
    .transform.apply_gain(-3.0, scale="db", ear="both")
)

pipeline_lsd = hrtf_difference(
    horizontal_reference,
    processed,
    metric="lsd",
    ear="left",
    frequency_bands=(20.0, 16000.0),
    reduction_axis="global",
    reduction_method="rms",
)
pipeline_rmse = hrtf_difference(
    horizontal_reference,
    processed,
    metric="rmse",
    ear="left",
    reduction_axis="global",
    reduction_method="rms",
)

print("Horizontal reference IR:", horizontal_reference.IR.values.shape)
print("Processed horizontal IR:", processed.IR.values.shape)
print("Original transformed flag:", hrtf_reference.is_transformed())
print("Processed transformed flag:", processed.is_transformed())
print("Pipeline LSD (dB):", float(pipeline_lsd))
print("Pipeline RMSE amplitude:", float(pipeline_rmse))

compare_magnitude(
    [horizontal_reference, processed],
    positions="front",
    ear="left",
    x_axis="log",
    freq_max=16000.0,
    legends=["horizontal reference", "processed"],
    line_styles=["-", "--"],
    show=False,
)
plt.show()

Compare controlled changes with a real subject

A controlled transform answers a narrow question: what changed when one operation was applied to P0001? A real subject comparison asks how different another measured listener is under the same metric.

This section prints the same global metrics for a low-pass transform, a +6 dB ILD transform, and P0002: LSD for spectral difference, RMSE for HRIR amplitude difference, and ILD MAE for broad-band cue difference. The real-subject heatmap ranges from about 3.7 dB to 13.3 dB, with a structured spatial pattern. That is different from the uniform 3 dB ILD map and the very large low-pass spectral map, so the figure gives scale to the controlled edits.

# Compare controlled transforms and a real subject with the same scalar metrics.
comparison_cases = {
    "P0001 low-pass": lowpass,
    "P0001 +6 dB ILD": ild_added,
    "P0002 real subject": hrtf_subject,
}

for name, compared in comparison_cases.items():
    lsd_value = hrtf_difference(
        hrtf_reference,
        compared,
        metric="lsd",
        ear="left",
        frequency_bands=(20.0, 16000.0),
        reduction_axis="global",
        reduction_method="rms",
    )
    rmse_value = hrtf_difference(
        hrtf_reference,
        compared,
        metric="rmse",
        ear="left",
        reduction_axis="global",
        reduction_method="rms",
    )
    ild_value = ild_difference(
        hrtf_reference,
        compared,
        mode="broad-band",
        absolute=True,
        reduction_axis="global",
        reduction_method="mean",
    )
    print(
        f"{name}: LSD={float(lsd_value):.3f} dB, "
        f"RMSE={float(rmse_value):.5f}, ILD MAE={float(ild_value):.3f} dB"
    )

compare_hrtf_difference(
    hrtf_reference,
    hrtf_subject,
    metric="lsd",
    ear="left",
    frequency_bands=(20.0, 16000.0),
    plot_type="heatmap",
    azimuth_range_mode="0-360",
    show=False,
)
plt.show()