hyrax.datasets.result_dataset#
Lance-based storage for inference results.
This module provides the ResultDatasetWriter and ResultDataset classes, which write and read inference results in Lance columnar format instead of batched .npy files.
Attributes#
Classes#
ResultDatasetWriter | Writer for Lance-based inference results.
ResultDataset | Reader for Lance-based inference results.
Module Contents#
- class ResultDatasetWriter(result_dir: str | pathlib.Path)[source]#
Writer for Lance-based inference results.
Writes inference results incrementally to Lance format using table.add() for each batch, avoiding memory accumulation.
Initialize the writer.
- Parameters:
result_dir (Union[str, Path]) – Directory where Lance database will be created
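The incremental pattern described for the writer can be sketched without Lance at all. The snippet below is a minimal illustration under stated assumptions, not Hyrax's implementation: `ToyTable` is a hypothetical in-memory stand-in for a table with an append-style `add()` method, `make_batches` and `write_incrementally` are hypothetical helpers, and the `id` and `data` column names are invented for the example. The point is that each batch is handed to `add()` as soon as it is produced, so the writing loop never holds more than one batch.

```python
from typing import Dict, Iterable, Iterator, List


class ToyTable:
    """Hypothetical stand-in for a Lance table; it only mimics an append-style add()."""

    def __init__(self) -> None:
        self.rows: List[Dict] = []

    def add(self, batch: List[Dict]) -> None:
        # A real table would persist the batch to disk; this toy just appends.
        self.rows.extend(batch)


def make_batches(n_items: int, batch_size: int) -> Iterator[List[Dict]]:
    """Yield batches lazily so the full result set is never built up front."""
    for start in range(0, n_items, batch_size):
        stop = min(start + batch_size, n_items)
        # Column names "id" and "data" are assumptions made for this sketch.
        yield [{"id": str(i), "data": [float(i), float(i) * 2.0]} for i in range(start, stop)]


def write_incrementally(table: ToyTable, batches: Iterable[List[Dict]]) -> int:
    """Add each batch as it arrives and return the number of batches written."""
    n_batches = 0
    for batch in batches:
        table.add(batch)  # one add() call per batch; nothing accumulates in this loop
        n_batches += 1
    return n_batches


table = ToyTable()
n_written = write_incrementally(table, make_batches(n_items=10, batch_size=4))
```

With a real on-disk table in place of `ToyTable`, peak memory in the loop would be bounded by the batch size rather than by the total number of results.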
- class ResultDataset(config: dict, data_location: pathlib.Path | str)[source]#
Bases: hyrax.datasets.dataset_registry.HyraxDataset
Reader for Lance-based inference results.
Provides HyraxQL-compatible getters for results stored in Lance format.
Initialize the dataset.
- Parameters:
config (dict) – Hyrax configuration dictionary
data_location (Union[Path, str]) – Path to results directory containing lance_db/
- __getitem__(idx: int | numpy.ndarray)[source]#
Get data by index.
- Parameters:
idx (Union[int, np.ndarray]) – Single index or array of indices
- Returns:
Data tensor(s)
- Return type:
np.ndarray
- Raises:
IndexError – If index is out of range
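The indexing contract above (a single index or an array of indices, with IndexError for out-of-range values) can be illustrated with a small stand-in. This is a sketch only: `ToyResults` is a hypothetical class backed by an in-memory NumPy array, not the Lance-backed reader, and its treatment of negative indices as out of range is an assumption that the real class may not share.

```python
import numpy as np


class ToyResults:
    """Hypothetical stand-in for a results reader, backed by an in-memory array."""

    def __init__(self, data: np.ndarray) -> None:
        self._data = data

    def __len__(self) -> int:
        return len(self._data)

    def __getitem__(self, idx):
        # Normalize a single int or an array of ints to a 1-D index array.
        indices = np.atleast_1d(np.asarray(idx))
        # Assumption for this sketch: negative indices count as out of range.
        if indices.size and (indices.min() < 0 or indices.max() >= len(self)):
            raise IndexError(f"index out of range for dataset of length {len(self)}")
        rows = self._data[indices]
        # A scalar index yields one tensor; an index array yields stacked tensors.
        return rows[0] if np.ndim(idx) == 0 else rows


results = ToyResults(np.arange(12, dtype=np.float32).reshape(4, 3))
single = results[2]                  # one tensor of shape (3,)
several = results[np.array([0, 3])]  # stacked tensors of shape (2, 3)
```

Requesting `results[4]` on this four-row toy dataset raises IndexError, mirroring the documented behavior.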
- __get_all__()[source]#
Get all data tensors in the dataset.
This specialized method is intended for internal use (e.g. by visualize_v2). It retrieves all tensors efficiently by assuming the column names and reading the underlying array buffer directly, without creating a Python object for each row.
- Returns:
All data tensors
- Return type:
np.ndarray
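The difference between per-row access and reading the array buffer directly can be shown with plain NumPy. The sketch below assumes every row is a fixed-size float32 tensor and that all rows sit back to back in one contiguous buffer; both are assumptions made for illustration, and the actual Lance column layout used by `__get_all__` is not shown here.

```python
import numpy as np

n_rows, n_features = 4, 3

# Stand-in for a column's contiguous value buffer: all rows packed back to back.
buffer = np.arange(n_rows * n_features, dtype=np.float32).tobytes()

# Row-by-row path: slices the buffer and builds one array object per row, then stacks.
row_size = n_features * np.dtype(np.float32).itemsize
per_row = np.stack(
    [
        np.frombuffer(buffer[i * row_size:(i + 1) * row_size], dtype=np.float32)
        for i in range(n_rows)
    ]
)

# Bulk path: interpret the whole buffer once, then reshape to (rows, features).
bulk = np.frombuffer(buffer, dtype=np.float32).reshape(n_rows, n_features)
```

Both paths produce the same values, but the bulk path does a constant number of Python-level operations regardless of row count, which is the kind of saving the method description refers to.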
- get_data(idx: int)[source]#
Get data tensor at index (HyraxQL getter).
- Parameters:
idx (int) – Index of the data item
- Returns:
Data tensor
- Return type:
np.ndarray