hyrax.datasets.lancedb_dataset

Attributes

_ROW_CACHE_SIZE

Classes

LanceDBDataset

A minimal Hyrax wrapper around a LanceDB table.

Module Contents

_ROW_CACHE_SIZE = 16

class LanceDBDataset(config: dict, data_location: pathlib.Path | str | None = None)

Bases: hyrax.datasets.dataset_registry.HyraxDataset

A minimal Hyrax wrapper around a LanceDB table.

__init__()

Overall initialization for all Datasets, which saves the config.

Subclasses of HyraxDataset should call this at the end of their __init__, like so:

from hyrax.datasets import HyraxDataset

class MyDataset(HyraxDataset):
    def __init__(self, config):
        # <your code>
        super().__init__(config)

If per-tensor metadata is available, dataset authors are encouraged to create an astropy Table of that metadata, with rows in the same order as their data, and pass it as metadata_table as shown below:

from hyrax.datasets import HyraxDataset
from astropy.table import Table

class MyDataset(HyraxDataset):
    def __init__(self, config):
        # <your code>
        metadata_table = Table(<Your catalog data goes here>)
        super().__init__(config, metadata_table)

Parameters:

config (dict, Optional) – The runtime configuration for hyrax
metadata_table (Optional[Table], optional) – An Astropy Table that (1) contains the metadata columns desired for visualization and (2) has its rows in the order your data will be enumerated.
object_id_column_name (Optional[str], optional) – The name of the column containing object IDs. If None, uses the default from config or creates one from the ids() method.

data_location = ''

table_name

connect_kwargs

open_table_kwargs

db

table

lance_dataset

_row_cache: collections.OrderedDict

_all_available_fields() → list[str]

_get_row(idx: int)

Return the PyArrow record-batch for idx, using a small FIFO row cache.

Caching avoids redundant lance_dataset.take calls when multiple get_<field> accessors are invoked for the same sample index, which is the common pattern when DataProvider resolves all fields for a single item. The cache holds at most _ROW_CACHE_SIZE rows; the oldest entry is evicted once that limit is reached.
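
The eviction behavior described above can be illustrated with a minimal, self-contained sketch. This is not the LanceDBDataset implementation: FifoRowCache, its fetch callable (a stand-in for the lance_dataset.take call), and the fetch_count counter are hypothetical names introduced only for illustration. The sketch assumes the oldest entry is evicted just before a new row is inserted into a full cache.

```python
from collections import OrderedDict

ROW_CACHE_SIZE = 16  # same value as _ROW_CACHE_SIZE in this module


class FifoRowCache:
    """Hypothetical FIFO cache keyed by sample index (illustration only)."""

    def __init__(self, fetch, max_size=ROW_CACHE_SIZE):
        self._fetch = fetch        # callable idx -> row; stands in for the real row lookup
        self._max_size = max_size
        self._rows = OrderedDict() # insertion order doubles as eviction order
        self.fetch_count = 0       # number of cache misses, for demonstration

    def get(self, idx):
        if idx in self._rows:
            # Cache hit: the entry is NOT moved to the end, so eviction
            # stays first-in-first-out rather than least-recently-used.
            return self._rows[idx]
        if len(self._rows) >= self._max_size:
            # Cache is full: drop the oldest inserted entry.
            self._rows.popitem(last=False)
        self.fetch_count += 1
        row = self._fetch(idx)
        self._rows[idx] = row
        return row


# Several field accessors asking for the same index trigger a single fetch.
cache = FifoRowCache(fetch=lambda idx: {"idx": idx}, max_size=2)
for _ in range(3):
    cache.get(0)
print(cache.fetch_count)
```

Because hits do not reorder entries, a row that is read repeatedly is still evicted once enough newer rows have been inserted. That is sufficient for the access pattern described above, where all fields of one index are resolved back to back.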

_resolve_table_name(configured_table_name) → str

_register_getters() → None

__len__() → int