hyrax.datasets.lancedb_dataset

Attributes

Classes

LanceDBDataset

A minimal Hyrax wrapper around a LanceDB table.

Module Contents

_ROW_CACHE_SIZE = 16
class LanceDBDataset(config: dict, data_location: pathlib.Path | str | None = None)

Bases: hyrax.datasets.dataset_registry.HyraxDataset

A minimal Hyrax wrapper around a LanceDB table.

__init__()

Overall initialization for all datasets, which saves the config.

Subclasses of HyraxDataset should call this at the end of their __init__, like so:

from hyrax.datasets import HyraxDataset

class MyDataset(HyraxDataset):
    def __init__(self, config):
        # <your code>
        super().__init__(config)

If per-tensor metadata is available, dataset authors should create an Astropy Table of that metadata, in the same order as their data, and pass it as metadata_table, as shown below:

from hyrax.datasets import HyraxDataset
from astropy.table import Table

class MyDataset(HyraxDataset):
    def __init__(self, config):
        # <your code>
        metadata_table = Table(<your catalog data goes here>)
        super().__init__(config, metadata_table)
Parameters:
  • config (dict, Optional) – The runtime configuration for hyrax

  • metadata_table (Optional[Table], optional) – An Astropy Table that (1) contains the metadata columns desired for visualization and (2) has its rows in the same order in which your data will be enumerated.

  • object_id_column_name (Optional[str], optional) – The name of the column containing object IDs. If None, uses the default from config or creates one from the ids() method.

data_location = ''
table_name
connect_kwargs
open_table_kwargs
db
table
lance_dataset
_row_cache: collections.OrderedDict
_all_available_fields() -> list[str]
_get_row(idx: int)

Return the PyArrow record-batch for idx, using a small FIFO row cache.

Caching avoids redundant lance_dataset.take calls when multiple get_<field> accessors are invoked for the same sample index, which is the common pattern when DataProvider resolves all fields for a single item. The cache holds at most _ROW_CACHE_SIZE rows; the oldest entry is evicted once that limit is reached.
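The caching behaviour described above can be illustrated with a minimal sketch. This is not the actual LanceDBDataset implementation: the RowCache class, its fetch_row callback (standing in for a lance_dataset.take call), and the fetch_count counter are all hypothetical names introduced for illustration. It assumes only what the description states, namely an OrderedDict holding at most _ROW_CACHE_SIZE rows with first-in, first-out eviction.

```python
from collections import OrderedDict

_ROW_CACHE_SIZE = 16  # mirrors the module attribute documented above


class RowCache:
    """Hypothetical stand-in for the row-caching logic (illustration only)."""

    def __init__(self, fetch_row, max_size=_ROW_CACHE_SIZE):
        # fetch_row is an assumed callback standing in for lance_dataset.take
        self._fetch_row = fetch_row
        self._max_size = max_size
        self._rows = OrderedDict()
        self.fetch_count = 0  # counts cache misses, for demonstration

    def get_row(self, idx):
        # Cache hit: return the stored row without reordering (FIFO, not LRU).
        if idx in self._rows:
            return self._rows[idx]
        # Cache full: evict the oldest inserted entry before adding a new one.
        if len(self._rows) >= self._max_size:
            self._rows.popitem(last=False)
        self.fetch_count += 1
        row = self._fetch_row(idx)
        self._rows[idx] = row
        return row


# Usage: several field accessors asking for the same index trigger one fetch.
cache = RowCache(lambda idx: {"id": idx}, max_size=2)
cache.get_row(0)
cache.get_row(0)  # served from the cache; fetch_count is still 1
```

Because hits do not move an entry to the end of the OrderedDict, eviction order depends only on insertion order, which is what distinguishes a FIFO cache from an LRU one.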

_resolve_table_name(configured_table_name) -> str
_register_getters() -> None
__len__() -> int