User guide¶

This page walks through the SDK from end to end. For a focused tour see the Quickstart on the home page; for hands-on code see the Examples.

Opening a file¶

The File class is the single entry point. It auto-detects the format from the file suffix:

from invisensing import File

f = File("acquisition.dat")    # DAT, native Rust reader
f = File("acquisition.h5")     # HDF5 via h5py
f = File("acquisition.tdms")   # TDMS via npTDMS
f = File("acquisition.sgy")    # SEG-Y via segyio

If your file has no recognised suffix, pass format= explicitly:

f = File("acquisition_data", format="dat")

Use it as a context manager — the underlying file handle is closed cleanly even when an exception is raised:

with File("acquisition.dat") as f:
    ...
# file handle released here

Inspecting metadata¶

Every File exposes the parsed header as plain Python properties — no struct unpacking, no flag bit-twiddling:

with File("acquisition.dat") as f:
    f.line_size              # samples per pulse on the wire
    f.positions_per_line     # spatial fibre positions per pulse
    f.sample_size            # bytes per sample (2, 4, 8…)
    f.sample_rate            # Hz
    f.trig_frequency         # Hz
    f.num_lines              # total trigger pulses recorded
    f.lines_left             # pulses behind the read cursor
    f.duration               # seconds (= num_lines / trig_frequency)
    f.distance               # metres of fibre covered per pulse
    f.range                  # ADC voltage range (V)
    f.timestamp              # producer-side timestamp string
    f.dtype                  # numpy dtype of the wire samples
    f.shape                  # (num_lines, line_size)
    f.spatial_shape          # (num_lines, positions_per_line)

    # Flag predicates — properties, work in if/while idioms.
    if f.is_demodulated and f.is_interleaved:
        ...

    f.is_demodulated, f.is_interleaved, f.is_float, f.is_phase
    f.is_unsigned, f.is_ac, f.is_hiz

Acquisition modes (`Mode` enum)¶

A PCIe7821 acquisition can run in one of four on-board DSP modes. Each produces a different file content. The Mode enum lets you dispatch on what's in the file with one match statement:

from invisensing import Mode

with File("acquisition.dat") as f:
    match f.mode:
        case Mode.RAW:               ...
        case Mode.IQ:                ...
        case Mode.ARCTAN_MAGNITUDE:  ...
        case Mode.PHASE:             ...

`Mode`	Wire layout per pulse	What the FPGA writes
`RAW`	`[s₀, s₁, …, s_{N-1}]`	Raw ADC codes, no DSP
`IQ`	`[I₀, Q₀, I₁, Q₁, …]`	I/Q pair after NCO + LPF, interleaved
`ARCTAN_MAGNITUDE`	`[arctan₀, √₀, arctan₁, √₁, …]`	`arctan(Q/I)` + `√(I²+Q²)`, interleaved
`PHASE`	`[φ₀, φ₁, …, φ_{N-1}]`	Unwrapped DAS phase, post fading-suppression + gauge differential + detrend

The SDK is mode-faithful, not mode-translating

The library never silently re-derives one product from another. The on-board DSP chain (fading suppression, spatial differential, detrend filter) is not reproducible in software from an earlier tap, so the channel you can extract is the one the FPGA wrote. Calling the wrong extractor raises a ValueError that names both modes:

>>> with File("phase.dat") as f:
...     f.get_i()
ValueError: get_i: file mode is 'phase', but this extractor
only applies to 'iq'.

Extracting channels¶

Every extractor has the same shape: (rows, positions_per_line), typed with the most natural dtype for the lane. Each one accepts an optional pre-read buffer so you can reuse it across extractors without double-advancing the cursor:

with File("iq.dat") as f:
    chunk = f.read_lines(1000)
    i = f.get_i(chunk)            # reuses the buffer
    q = f.get_q(chunk)            # same — no extra file read
    iq = f.get_iq(chunk)          # same again

Default extractors — wire dtype¶

Fast, no copy beyond the de-interleave, no precision loss on round-trip writes.

`Mode`	Method	dtype	Notes
`RAW`	`read_lines()`	`int16`	ADC codes
`IQ`	`get_i()` / `get_q()`	`int16`	One lane each
`IQ`	`get_iq()`	`complex64`	`I + j·Q` packed
`ARCTAN_MAGNITUDE`	`get_arctan()`	`int16`	Fixed-point: `32767 ↔ +π`
`ARCTAN_MAGNITUDE`	`get_magnitude()`	`uint16`	Bitcast from wire i16
`PHASE`	`get_phase()`	`float32`	Radians (already converted)

Physical-unit extractors — `float32`¶

When you start doing DSP, you typically want volts and radians, not fixed-point codes. The _volts / _radians family does the scaling for you:

`Mode`	Method	dtype	Unit	Scaling
`IQ`	`get_i_volts()` / `get_q_volts()`	`float32`	V	`i16 × range / 32768`
`IQ`	`get_iq_volts()`	`complex64`	V	real/imag both in volts
`ARCTAN_MAGNITUDE`	`get_arctan_radians()`	`float32`	rad	`i16 × π / 32768`
`ARCTAN_MAGNITUDE`	`get_magnitude_volts()`	`float32`	V	`u16 × range / 32768`

with File("iq.dat") as f:
    iq_v = f.get_iq_volts()       # complex64, volts
    envelope_v = np.abs(iq_v)     # volts
    phase_rad  = np.angle(iq_v)   # radians, wrapped

Memory cost of physical units

Default extractors keep the wire dtype — a 4 GB i16 capture becomes a 4 GB numpy array. The _volts / _radians family allocates a new float32 buffer (2× the size for i16/u16 inputs, 8× for the complex64 get_iq_volts). For multi-GB captures, prefer the wire dtype + an explicit .astype(np.float32) * scale inside your processing loop.

Get every channel in one call¶

For ad-hoc scripts, channels() returns a dict keyed by mode:

with File("iq.dat") as f:
    ch = f.channels()
    # {"i": int16, "q": int16, "iq": complex64}

with File("arctan_mag.dat") as f:
    ch = f.channels()
    # {"arctan": int16, "magnitude": uint16}

Streaming large files¶

read_lines(n) advances a cursor in the file. Iterate to process captures larger than memory:

with File("long_capture.h5") as f:
    while f.lines_left:
        chunk = f.read_lines(10_000)
        process(chunk)

For one-pulse-at-a-time processing, iterate the file directly:

with File("long_capture.dat") as f:
    for pulse in f:               # yields (line_size,) arrays
        process(pulse)

For small files, read_all() returns everything in one allocation:

with File("small.dat") as f:
    data = f.read_all()           # (num_lines, line_size)

rewind() resets the cursor:

with File("small.dat") as f:
    a = f.read_all()
    f.rewind()
    b = f.read_all()
    assert (a == b).all()

File formats — what each backend does¶

The File facade dispatches to a format-specific backend. All four funnel through the same Header so the channel extractors above are format-agnostic.

DAT — native Rust¶

128-byte header + raw little-endian samples. Read by the native invisensing._core.DatReader — no third-party dep, no Python overhead in the hot loop. The most common format and the fastest path.

HDF5 — via `h5py`¶

Audace HDF5 files store the samples as a (num_traces, samples_per_trace) dataset named acoustic_data, typed by bytes_per_sample and the FLOAT attribute. Every header field is mirrored as a scalar attribute. The whole dataset loads into RAM at open time.

TDMS — via `npTDMS`¶

Single channel acoustic_data under the group AudaceGroup. Header fields travel as file-level properties.

SEG-Y — via `segyio`¶

One trace per fibre position, samples_per_trace = line_size (wire-side, doubled for INTERLEAVED). Header fields not expressible in the SEG-Y binary header are encoded as discrete lines in the EBCDIC textual header (e.g. C21 HEADER FLAGS (raw bits): 0xNNNNNNNN). The SDK parses those back into a Header.

Writing files¶

export_dat() writes a 2-D numpy array out as an Audace .dat file (128-byte header + raw samples):

from invisensing import export_dat
import numpy as np

samples = np.random.randn(10_000, 512).astype("float32")
export_dat(
    "out.dat",
    samples,
    sample_rate=250_000_000,
    trig_frequency=2_000,
    range_v=1.0,
    timestamp="2026-05-25_12:34:56",
)

The FLOAT flag is set automatically when data.dtype.kind == 'f'.

Performance & safety notes¶

Performance¶

DAT reads stream through a BufReader<File>; the inner read_exact runs with the GIL released.
The typed-conversion path allocates an uninitialised Vec<T> and fills it with one copy_nonoverlapping. Skips the 4 GB of pointless DRAM writes a vec![0; n] would do on a 4 GB read.
De-interleave kernels read via as_slice() and iterate with chunks_exact(2) — LLVM auto-vectorises into SSE/AVX gather-extract instructions on x86_64.
HDF5 / TDMS / SEG-Y loading is C-backed by their respective libraries; the de-interleave step always runs through the same Rust kernels, so channel-extraction perf doesn't depend on the source format.

Safety¶

Strict header validation at open: line_size > 0, sample_size ∈ {1, 2, 4, 8}, INTERLEAVED ⇒ even line_size. Downstream maths can assume positive sizes everywhere.
Bounded unsafe: three small blocks in the Rust extension, each one documented and exercised by 1-million-sample correctness tests.
No mmap: deliberate. Memory-mapped files turn I/O errors into SIGBUS (segfault) instead of Python exceptions, which is unacceptable in a long-running lab process.
Thread safety: each File owns its own backend; sharing a single File across Python threads is not supported. Open one per thread for parallel reads — they don't compete for any internal state.

Legacy SDK compatibility¶

The original invisensing.File module is preserved unchanged:

import invisensing.File as iFile

file = iFile.File("acquisition.dat")
file.get_line_size(), file.get_trigger_frequency()
file.is_demodulated(), file.is_acquisition_ac()

while file.get_lines_left() > 0:
    data = file.get_lines(5)
    process(data)

iFile.export("out.dat", data, file.get_timestamp(),
             file.get_trigger_frequency(), file.get_sample_rate(),
             file.get_range())

The legacy class delegates to the same Rust-backed implementation as the modern File, so legacy scripts run at the new speed without any modification.