hyrax.datasets.lancedb_dataset
Attributes
Classes
LanceDBDataset | A minimal Hyrax wrapper around a LanceDB table.
Module Contents
- class LanceDBDataset(config: dict, data_location: pathlib.Path | str | None = None)
Bases: hyrax.datasets.dataset_registry.HyraxDataset
A minimal Hyrax wrapper around a LanceDB table.
Overall initialization for all datasets, which saves the config.
Subclasses of HyraxDataset ought to call this at the end of their __init__, like:
    from hyrax.datasets import HyraxDataset

    class MyDataset(HyraxDataset):
        def __init__(self, config):
            <your code>
            super().__init__(config)
If per-tensor metadata is available, it is recommended that dataset authors create an Astropy Table of that data, in the same order as their data, and pass it as metadata_table as shown below:
    from hyrax.datasets import HyraxDataset
    from astropy.table import Table

    class MyDataset(HyraxDataset):
        def __init__(self, config):
            <your code>
            metadata_table = Table(<Your catalog data goes here>)
            super().__init__(config, metadata_table)
- Parameters:
config (dict, Optional) – The runtime configuration for hyrax
metadata_table (Optional[Table], optional) – An Astropy Table that (1) contains the metadata columns desired for visualization and (2) has its rows in the same order in which your data will be enumerated.
object_id_column_name (Optional[str], optional) – The name of the column containing object IDs. If None, uses the default from config or creates one from the ids() method.
- _get_row(idx: int)
Return the PyArrow record-batch for idx, using a small FIFO row cache.
Caching avoids redundant lance_dataset.take calls when multiple get_<field> accessors are invoked for the same sample index, which is the common pattern when DataProvider resolves all fields for a single item. The cache holds at most _ROW_CACHE_SIZE rows; the oldest entry is evicted once that limit is reached.
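The caching behavior described above can be illustrated with a minimal, self-contained sketch. It is not the Hyrax implementation: the class FifoRowCache, the helper fake_fetch, and the cache size of 2 are hypothetical names and values chosen for illustration, and fake_fetch stands in for an expensive lookup such as lance_dataset.take.

```python
from collections import OrderedDict
from typing import Any, Callable


class FifoRowCache:
    """Illustrative FIFO row cache keyed by sample index (hypothetical class)."""

    def __init__(self, fetch: Callable[[int], Any], max_size: int):
        self._fetch = fetch        # stand-in for an expensive row lookup
        self._max_size = max_size  # plays the role of _ROW_CACHE_SIZE
        self._rows = OrderedDict() # preserves insertion order

    def get_row(self, idx: int) -> Any:
        if idx in self._rows:
            # Cache hit: entries are not reordered, so eviction stays FIFO
            # (an LRU cache would move the entry to the end here).
            return self._rows[idx]
        row = self._fetch(idx)
        if len(self._rows) >= self._max_size:
            # Limit reached: drop the oldest inserted entry.
            self._rows.popitem(last=False)
        self._rows[idx] = row
        return row


fetch_calls = []  # records which indices actually triggered a fetch


def fake_fetch(idx: int) -> dict:
    """Hypothetical row source that logs each call."""
    fetch_calls.append(idx)
    return {"object_id": idx, "flux": idx * 1.5}


cache = FifoRowCache(fake_fetch, max_size=2)
cache.get_row(0)  # miss: fetched
cache.get_row(0)  # hit: served from the cache
cache.get_row(1)  # miss: fetched
cache.get_row(2)  # miss: evicts index 0, the oldest entry
cache.get_row(0)  # miss again: index 0 was evicted, so it is refetched
```

In this sketch, repeated accesses to the same index reuse the cached row, which mirrors the case where several get_<field> accessors are called for one sample. FIFO eviction keeps the bookkeeping trivial, which is reasonable when accesses cluster on the most recently requested index.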