hyrax.verbs

Submodules#

Classes#

`DatabaseConnection`	Verb to create a connection to a vector database with inference results.
`Umap`	Umap latent space points into 2d
`Infer`	Inference verb
`InferStream`	Streaming inference verb — loads model once, processes batches on demand.
`Train`	Train verb
`Test`	Test verb - evaluates a trained model on test data
`Visualize`	Verb to create a visualization
`VisualizeV2`	Verb to create a hexbin visualization of a 2D latent space.
`Lookup`	Look up an inference result using the ID of a data member
`SaveToDatabase`	Verb to insert inference results into a vector database index for fast similarity search.
`Model`	Resolves the model class that is defined in the config file.
`ToOnnx`	Export the model to ONNX format
`Engine`	This verb drives inference with an ONNX model in production.
`Prepare`	Prepare verb: prepares a dataset and returns it
`ReduceDimensions`	Verb to reduce the dimensionality of a dataset
`CreateSplits`	Create and persist reproducible dataset splits.
`Verb`	Base class for all hyrax verbs

Functions#

`all_class_verbs`(→ list[str])	Returns all verbs that are currently registered with a class-based implementation
`all_verbs`(→ list[str])	Returns all verbs that are currently registered
`fetch_verb_class`(→ type[Verb] \| None)	Gives the class object for the named verb
`is_verb_class`(→ bool)	Returns true if the verb has a class based implementation

Package Contents#

class DatabaseConnection(config)[source]#

Bases: hyrax.verbs.verb_registry.Verb

Verb to create a connection to a vector database with inference results.

__init__()[source]#

Overall initialization for all verbs that saves the config

cli_name = 'database_connection'#

add_parser_kwargs#

description = 'Create a connection to the vector database for interactive queries.'#

static setup_parser(parser: argparse.ArgumentParser)[source]#: Stub of parser setup

run_cli(args: argparse.Namespace | None = None)[source]#: Stub CLI implementation

run(database_dir: pathlib.Path | str | None = None)[source]#

Create a connection to the vector database for interactive queries.

Parameters:: database_dir (str or Path, Optional) – The directory containing the database that will be connected to. If None, attempt to connect to the most recently created …-vector-db-… directory. If specified, it can point to either an empty directory or a directory containing an existing vector database. If the latter, the database will be updated with the new vectors.

_get_database_type_from_config(database_dir: pathlib.Path)[source]#

Internal function that reads a config file from a directory and returns the name of the vector database from it, e.g. "chromadb" or "qdrant".

Parameters:: database_dir (Path) – The directory containing the vector database and the config file that will be used as reference.
Returns:: The config value for ["vector_db"]["name"] in the reference config.
Return type:: str

class Umap(config)[source]#

Bases: hyrax.verbs.verb_registry.Verb

Umap latent space points into 2d

__init__()[source]#

Overall initialization for all verbs that saves the config

cli_name = 'umap'#

add_parser_kwargs#

description = 'Transforms the entire dataset into a lower-dimensional space by fitting a UMAP model.'#

static setup_parser(parser: argparse.ArgumentParser)[source]#: Stub of parser setup

run_cli(args: argparse.Namespace | None = None)[source]#: Stub CLI implementation

run(input_dir: pathlib.Path | str | None = None, model_path: pathlib.Path | str | None = None)[source]#

Deprecated wrapper for reduce_dimensions running the UMAP algorithm.

This wrapper delegates execution to reduce_dimensions with algorithm='umap' so that the umap verb remains available for backward compatibility. Users are encouraged to switch to reduce_dimensions.

Parameters:

input_dir (str or Path, Optional) – The directory containing the inference results.
model_path (str or Path, Optional) – The path to a pre-existing UMAP model.

Returns:

The method does not return anything but saves the UMAP representations to disk.

Return type:

None

_run(input_dir: pathlib.Path | str | None = None, model_path: pathlib.Path | str | None = None)[source]#: See run()

class Infer(config)[source]#

Bases: hyrax.verbs.verb_registry.Verb

Inference verb

__init__()[source]#

Overall initialization for all verbs that saves the config

cli_name = 'infer'#

add_parser_kwargs#

description = 'Run inference on a model using a dataset.'#

REQUIRED_DATA_GROUPS = ('infer',)#

OPTIONAL_DATA_GROUPS = ()#

static setup_parser(parser)[source]#: We don’t need any parser setup for CLI opts

run_cli(args=None)[source]#: CLI stub for Infer verb

run()[source]#

Run inference on a model using a dataset

Parameters:: config (dict) – The parsed config file as a nested dict

class InferStream(config)[source]#

Bases: hyrax.verbs.verb_registry.Verb

Streaming inference verb — loads model once, processes batches on demand.

__init__()[source]#

Overall initialization for all verbs that saves the config

cli_name = 'infer_stream'#

add_parser_kwargs#

description = 'Run streaming inference: load model once and process batches interactively.'#

REQUIRED_DATA_GROUPS = ('infer_stream',)#

OPTIONAL_DATA_GROUPS = ()#

static setup_parser(parser)[source]#: No CLI arguments needed.

abstractmethod run_cli(args=None)[source]#: CLI stub — infer_stream is a programmatic API only.

run(sample_batch: dict | None = None) → InferStreamSession[source]#

Set up the model and return a session for streaming inference.

There are two ways to drive the session:

Data-source driven (sample_batch=None) — configure a streaming dataset under [data_request.infer_stream] (e.g. KafkaStreamDataset). The model is pre-flighted from the stream itself and a DataLoader is built, so the returned session can be iterated directly:
```
with hy.infer_stream() as session:
    for batch, results in session:
        ...
```

Manual: pass a representative sample_batch and feed the batches yourself:
```
with hy.infer_stream(sample_batch=batch) as session:
    results = session.process(batch)
```

Parameters:: sample_batch (dict | None) – A representative batch dict with "object_id" and model-specific data fields, used to pre-flight the model architecture. When None, the model is pre-flighted from a [data_request.infer_stream] streaming dataset instead.
Returns:: A context manager / session object. Iterate it (data-source driven) or call session.process(batch) (manual); call session.close() when done.
Return type:: InferStreamSession
Raises:: ValueError – If sample_batch is None and no [data_request.infer_stream] is configured.

class Train(config)[source]#

Bases: hyrax.verbs.verb_registry.Verb

Train verb

__init__()[source]#

Overall initialization for all verbs that saves the config

cli_name = 'train'#

add_parser_kwargs#

description = 'Train a model using provided data.'#

REQUIRED_DATA_GROUPS = ('train',)#

OPTIONAL_DATA_GROUPS = ('validate', 'test')#

static setup_parser(parser)[source]#: We don’t need any parser setup for CLI opts

run_cli(args=None)[source]#: CLI stub for Train verb

run()[source]#: Run the training process for the configured model and data loader. Returns the trained model.

static _training(rank, model, dataset, config, results_dir)[source]#

static _log_params(config, results_dir)[source]#

Log the various parameters to mlflow from the config file.

Parameters:

config (dict) – The main configuration dictionary
results_dir (str) – The full path to the results sub-directory

class Test(config)[source]#

Bases: hyrax.verbs.verb_registry.Verb

Test verb - evaluates a trained model on test data

__init__()[source]#

Overall initialization for all verbs that saves the config

cli_name = 'test'#

add_parser_kwargs#

description = 'Evaluate a trained model on test data.'#

REQUIRED_DATA_GROUPS = ('test',)#

OPTIONAL_DATA_GROUPS = ()#

static setup_parser(parser)[source]#: We don’t need any parser setup for CLI opts

run_cli(args=None)[source]#: CLI stub for Test verb

run()[source]#

Run the test process for the configured model on test data. This evaluates a trained model, saves outputs, and returns metrics.

Note: The configuration dictionary will be updated with the full path to the model weights file that is loaded into the model (config["test"]["model_weights_file"]).

Returns:: Dataset containing test results that can be used for further analysis
Return type:: InferenceDataset

static _log_params(config, results_dir)[source]#

Log the various parameters to mlflow from the config file.

Parameters:

config (dict) – The main configuration dictionary
results_dir (str) – The full path to the results sub-directory

class Visualize(config)[source]#

Bases: hyrax.verbs.verb_registry.Verb

Verb to create a visualization

__init__()[source]#

Overall initialization for all verbs that saves the config

cli_name = 'visualize'#

add_parser_kwargs#

description = 'Generate a visualization of a latent space created by a UMAP reduction.'#

REQUIRED_DATA_GROUPS = ('infer',)#

OPTIONAL_DATA_GROUPS = ()#

static setup_parser(parser: argparse.ArgumentParser)[source]#: CLI not implemented for this verb

run_cli(args: argparse.Namespace | None = None)[source]#: CLI not implemented for this verb

run(input_dir: pathlib.Path | str | None = None, *, return_verb: bool = False, make_lupton_rgb_opts: dict | None = None, **kwargs)[source]#

Generate an interactive notebook visualization of a latent space that has been umapped down to 2d.

The plot contains two holoviews objects, a scatter plot of the latent space, and a table of objects which can be populated by selecting from the scatter plot.

Parameters:

input_dir (Optional[Union[Path, str]], optional) – Directory holding the output from the ‘umap’ verb, by default None. When not provided, we use [results][inference_dir] from config. If that is not set, we use the most recent umap output in the current results directory.
return_verb (bool, optional) – If True, also return the underlying Visualize instance for post-hoc access to selection state. Defaults to False.
make_lupton_rgb_opts (dict, optional) – Dictionary of options to pass to astropy’s make_lupton_rgb function for RGB image creation. Default is {“stretch”: 5, “Q”: 8}. Common parameters include stretch (brightness/contrast) and Q (softening parameter for asinh transformation).
kwargs – Keyword arguments are passed through as options for the plot object as plot_pane.opts(**plot_options). It is not recommended to override the “tools” plot option, because that will break the integration between the plot selection operations and the table.

Returns:

Holoviews, if return_verb = False (default) – A collection of HoloViews panes
tuple of (pane, Visualize), if return_verb = True – Returns a 2-tuple with the pane and the verb instance.

visible_points(x_range: tuple | list, y_range: tuple | list)[source]#

Generate a hv.Points object with the points inside the bounding box passed.

This is the event handler for moving or scaling the latent space plot, and is called by Holoviews.

Parameters:

x_range (tuple or list) – min and max x values
y_range (tuple or list) – min and max y values

Returns:

Points lying inside the bounding box passed

Return type:

hv.Points

update_points(**kwargs) → None[source]#

This is the main UI event handler for selection tools on the plot. Any dynamic map in the visualizer layout that updates based on plot selection MUST call this function.

This function accepts the data values from all streams and uses the differences between the current call and prior calls to differentiate between different UI events.

The self.prev_kwargs dictionary is used to store previous calls to this function, and the _called_* helpers perform the differencing for each case.

Calling this function GUARANTEES that self.points, self.points_id, and self.points_idx are up-to-date with the user’s latest selection, regardless of the order that Holoviews evaluates the DynamicMaps in.

_called_lasso(kwargs)[source]#

_called_tap(kwargs)[source]#

_called_box_select(kwargs)[source]#

poly_select_points(geometry) → tuple[numpy.typing.ArrayLike, numpy.typing.ArrayLike, numpy.typing.ArrayLike][source]#

Select points inside a polygon.

Parameters:: geometry (list) – List of x/y points describing the vertices of the polygon
Returns:: First element is an ndarray of x/y points in latent space inside the polygon. Second element is an ndarray of the corresponding object IDs.
Return type:: Tuple
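
One common way to test which points fall inside a polygon is ray casting (the even-odd rule). The sketch below uses that technique with NumPy only; it is an assumption that this resembles what `poly_select_points` does internally, and the function name `points_in_polygon` and the sample data are hypothetical.

```python
import numpy as np


def points_in_polygon(points: np.ndarray, vertices) -> np.ndarray:
    """Boolean mask of the rows of `points` (N, 2) that lie inside the polygon.

    Even-odd ray casting: a point is inside if a ray going in the +x direction
    crosses the polygon boundary an odd number of times.
    """
    vertices = np.asarray(vertices, dtype=float)
    x, y = points[:, 0], points[:, 1]
    inside = np.zeros(len(points), dtype=bool)
    xj, yj = vertices[-1]
    for xi, yi in vertices:
        # Horizontal edges divide by zero below; they never straddle, so the
        # resulting inf/nan values are masked out and the warnings are silenced.
        with np.errstate(divide="ignore", invalid="ignore"):
            # Does the edge (j -> i) straddle the horizontal line through each point?
            straddles = (yi > y) != (yj > y)
            # x coordinate where the edge crosses that horizontal line
            x_cross = (xj - xi) * (y - yi) / (yj - yi) + xi
            inside ^= straddles & (x < x_cross)
        xj, yj = xi, yi
    return inside


points = np.array([[1.0, 1.0], [3.0, 1.0], [-1.0, 1.0]])
object_ids = np.array(["a", "b", "c"])
square = [(0, 0), (2, 0), (2, 2), (0, 2)]

mask = points_in_polygon(points, square)
selected_points, selected_ids = points[mask], object_ids[mask]
```

Points lying exactly on an edge may be classified either way with this rule, which is usually acceptable for interactive lasso selection.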

box_select_points(x_range: tuple | list, y_range: tuple | list) → tuple[numpy.typing.ArrayLike, numpy.typing.ArrayLike, numpy.typing.ArrayLike][source]#

Return the points and IDs for a box in the latent space

Parameters:

x_range (tuple or list) – min and max x values
y_range (tuple or list) – min and max y values

Returns:

First element is an ndarray of x/y points in latent space inside the box. Second element is an ndarray of the corresponding object IDs.

Return type:

Tuple

box_select_indexes(x_range: tuple | list, y_range: tuple | list)[source]#

Return the indexes inside of a particular box in the latent space

Parameters:

x_range (tuple or list) – min and max x values
y_range (tuple or list) – min and max y values

Returns:

Array of data indexes where the latent space representation falls inside the given box.

Return type:

np.ndarray
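
A box selection of this kind reduces to a boolean mask over the 2D coordinates. The sketch below shows the idea with NumPy; the function name `box_indexes` and the assumption that the bounds are inclusive are illustrative, not taken from the hyrax source.

```python
import numpy as np


def box_indexes(points: np.ndarray, x_range, y_range) -> np.ndarray:
    """Return indexes of the rows of `points` (N, 2) inside the given box.

    Bounds are treated as inclusive here; that is an assumption of this sketch.
    """
    x_min, x_max = x_range
    y_min, y_max = y_range
    mask = (
        (points[:, 0] >= x_min)
        & (points[:, 0] <= x_max)
        & (points[:, 1] >= y_min)
        & (points[:, 1] <= y_max)
    )
    # flatnonzero converts the mask into integer data indexes
    return np.flatnonzero(mask)


latent_2d = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [1.0, 3.0]])
idx = box_indexes(latent_2d, x_range=(0.5, 2.5), y_range=(0.5, 2.5))
```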

selected_objects(**kwargs)[source]#

Generate the HoloViews table for a selected set of objects based on input from the Lasso, Tap, and SelectionXY streams.

Returns:: Table with Object ID, x, y locations of the selected objects
Return type:: hv.Table

_table_from_points()[source]#

static _bounding_box(points)[source]#

_even_aspect_bounding_box()[source]#

get_selected_df()[source]#

Retrieve a pandas DataFrame containing the currently selected points and their associated metadata.

Returns:: A DataFrame with one row per selected point and columns: [“object_id”, “x”, “y”, *additional_fields].
Return type:: pd.DataFrame

_load_images(**kwargs)[source]#

_make_image_pane(total_width: int = 500, *args, **kwargs)[source]#: Sample up to 6 of the selected object_ids, load their FITS cutouts from [general][data_dir], and render as small hv.Image thumbnails in a grid.

class VisualizeV2(config)[source]#

Bases: hyrax.verbs.verb_registry.Verb

Verb to create a hexbin visualization of a 2D latent space.

__init__()[source]#

Overall initialization for all verbs that saves the config

cli_name = 'visualize_v2'#

add_parser_kwargs#

REQUIRED_DATA_GROUPS = ('visualize',)#

OPTIONAL_DATA_GROUPS = ()#

static setup_parser(parser: argparse.ArgumentParser)[source]#: CLI not implemented for this verb

run_cli(args: argparse.Namespace | None = None)[source]#: CLI not implemented for this verb

run(**kwargs)[source]#

Generate an interactive hexbin visualization of a latent space projected to 2D.

Uses HoloViews HexTiles with datashader for adaptive hexbin aggregation, box/lasso selection, a metadata table, and tabbed detail plots.

Parameters:: kwargs – Additional keyword arguments passed as HexTiles opts overrides.
Returns:: This verb instance. Use it to call restart_ui() or get_selected_df() after the UI has been displayed.
Return type:: VisualizeV2

restart_ui(**kwargs)[source]#

Rebuild and re-display the Panel UI without reloading data.

Call this after a Jupyter websocket disconnect instead of re-running the cell. The expensive data-loading step is skipped — only the widgets are rebuilt.

Parameters:: kwargs – Additional keyword arguments passed as HexTiles opts overrides.
Returns:: This verb instance. Use it to call restart_ui() or get_selected_df() after the UI has been displayed.
Return type:: VisualizeV2

_load_data()[source]#

Load dataset and build the points DataFrame.

Guards with a _data_loaded sentinel so the expensive steps only run once per verb instance. Safe to call multiple times.
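
The sentinel guard can be sketched as below. The class `LazyLoader` and its `load_count` counter are hypothetical and exist only to show that the expensive step runs once; the real `_load_data()` loads a dataset and builds a DataFrame instead.

```python
class LazyLoader:
    """Hypothetical class showing a run-once guard with a boolean sentinel."""

    def __init__(self):
        self._data_loaded = False
        self.load_count = 0
        self.data = None

    def _load_data(self) -> None:
        # Return early if a previous call already did the work
        if self._data_loaded:
            return
        self.data = [1, 2, 3]  # placeholder for the expensive loading step
        self.load_count += 1
        # Set the sentinel only after loading succeeded, so a failed
        # attempt can be retried on the next call
        self._data_loaded = True


loader = LazyLoader()
loader._load_data()
loader._load_data()
```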

_build_ui(**kwargs)[source]#: Build and display the Panel UI using data already loaded by _load_data().

get_selected_df()[source]#: Return the current selection as a DataFrame.

class Lookup(config)[source]#

Bases: hyrax.verbs.verb_registry.Verb

Look up an inference result using the ID of a data member

__init__()[source]#

Overall initialization for all verbs that saves the config

cli_name = 'lookup'#

add_parser_kwargs#

description = 'Look up an inference result using the ID of a data member.'#

static setup_parser(parser: argparse.ArgumentParser)[source]#

Set up our arguments by configuring a subparser

Parameters:: parser (ArgumentParser) – The sub-parser to configure

run_cli(args: argparse.Namespace | None = None)[source]#

Entrypoint to Lookup from the CLI.

Parameters:: args (Optional[Namespace], optional) – The parsed command line arguments

run(id: str, results_dir: pathlib.Path | str | None = None) → numpy.ndarray | None[source]#

Lookup the latent-space representation of a particular ID

Requires the relevant dataset to be configured, and for inference to have been run.

Parameters:

id (str) – The ID of the input data to look up the inference result
results_dir (str, Optional) – The directory containing the inference results.

Returns:

The output tensor of the model for the given input.

Return type:

Optional[np.ndarray]

class SaveToDatabase(config)[source]#

Bases: hyrax.verbs.verb_registry.Verb

Verb to insert inference results into a vector database index for fast similarity search.

__init__()[source]#

Overall initialization for all verbs that saves the config

cli_name = 'save_to_database'#

add_parser_kwargs#

description = 'Insert inference results into vector database.'#

static setup_parser(parser: argparse.ArgumentParser)[source]#: Stub of parser setup

run_cli(args: argparse.Namespace | None = None)[source]#: Stub CLI implementation

run(input_dir: pathlib.Path | str | None = None, output_dir: pathlib.Path | str | None = None)[source]#

Insert inference results into vector database.

Parameters:

input_dir (str or Path, Optional) – The directory containing the inference results.
output_dir (str or Path, Optional) – The directory where the vector database is stored. If None, a new directory will be created. If specified, it can point to either an empty directory or a directory containing an existing vector database. If the latter, the database will be updated with the new vectors.

class Model(config)[source]#

Bases: hyrax.verbs.verb_registry.Verb

Resolves the model class that is defined in the config file. This will return a reference to the model class.

__init__()[source]#

Overall initialization for all verbs that saves the config

cli_name = 'model'#

add_parser_kwargs#

description = 'Return a reference to the model class (not a new instance).'#

static setup_parser(parser)[source]#: Not implemented

run_cli()[source]#: Not implemented

run()[source]#: Fetch and return the model _class_. Does not create an instance of the model class.

class ToOnnx(config)[source]#

Bases: hyrax.verbs.verb_registry.Verb

Export the model to ONNX format

__init__()[source]#

Overall initialization for all verbs that saves the config

cli_name = 'to_onnx'#

add_parser_kwargs#

description = 'Export model to ONNX format.'#

static setup_parser(parser)[source]#: Setup parser for ONNX export verb

run_cli(args=None)[source]#: Run the ONNX export verb from the CLI

run(input_model_directory: str = None)[source]#: Export the model to ONNX format and save it to the specified path.

class Engine(config)[source]#

Bases: hyrax.verbs.verb_registry.Verb

This verb drives inference with an ONNX model in production.

__init__()[source]#

Overall initialization for all verbs that saves the config

cli_name = 'engine'#

add_parser_kwargs#

description = 'Run inference with an ONNX model.'#

static setup_parser(parser)[source]#: Setup parser for engine verb

run_cli(args=None)[source]#: CLI stub for Engine verb

run(model_directory: str = None)[source]#

Run inference with an ONNX model.

This method performs the following steps:

- Read in the user config
- Prepare all the datasets requested
- Implement a simple strategy for reading in batches of data samples
- Process the samples with any custom collate functions as well as a default collate function
- Pass the collated batch to the prepare_inputs function
- Send that output to the ONNX-ified model
- Persist the results of inference

Parameters:: model_directory (str, optional) – Directory containing the ONNX model. If not provided, uses the config file or finds the most recent ONNX export directory.
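
The batching, collate, prepare and model steps of that flow can be sketched with plain NumPy. This is a simplified illustration under assumptions: `fake_model` stands in for the ONNX runtime call, the `"image"` field and all function names are hypothetical, and results are returned in memory rather than persisted.

```python
import numpy as np


def iter_batches(samples: list, batch_size: int):
    """Yield consecutive slices of the sample list."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


def default_collate(batch: list) -> dict:
    """Stack each field of the per-sample dicts into one array per field."""
    return {key: np.stack([sample[key] for sample in batch]) for key in batch[0]}


def prepare_inputs(collated: dict) -> np.ndarray:
    """Pick and cast the field the model consumes (hypothetical 'image' field)."""
    return collated["image"].astype(np.float32)


def fake_model(inputs: np.ndarray) -> np.ndarray:
    """Stand-in for the ONNX session: mean over all non-batch axes."""
    return inputs.reshape(len(inputs), -1).mean(axis=1)


def run_inference(samples: list, batch_size: int = 2) -> np.ndarray:
    outputs = []
    for batch in iter_batches(samples, batch_size):
        collated = default_collate(batch)
        prepared = prepare_inputs(collated)
        outputs.append(fake_model(prepared))
    return np.concatenate(outputs)


samples = [{"image": np.full((2, 2), i)} for i in range(5)]
results = run_inference(samples)
```

The last batch is allowed to be smaller than `batch_size`, so no sample is dropped when the dataset size is not a multiple of the batch size.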

create_ort_inputs(prepared_batch)[source]#: Create the inputs array for the ONNX model using the expected inputs from the loaded ONNX model and the type and shape of the prepared batch.

run_onnx_batch(ort_inputs)[source]#

Run the batch using our onnx runtime session

Only split out because this is when data is mutated and we need to be able to trace it.

_setup_trace(prepare_inputs_fn)[source]#

class Prepare(config)[source]#

Bases: hyrax.verbs.verb_registry.Verb

Prepare verb: prepares a dataset and returns it

__init__()[source]#

Overall initialization for all verbs that saves the config

cli_name = 'prepare'#

add_parser_kwargs#

static setup_parser(parser)[source]#: We don’t need any parser setup for CLI opts

run_cli(args=None)[source]#: CLI stub for Prepare verb

run()[source]#

Prepare the dataset for a given model and data loader using the verb’s configuration.

Uses self.config to construct and return the prepared dataset.

class ReduceDimensions(config)[source]#

Bases: hyrax.verbs.verb_registry.Verb

Verb to reduce the dimensionality of a dataset

__init__()[source]#

Overall initialization for all verbs that saves the config

cli_name = 'reduce_dimensions'#

add_parser_kwargs#

description = 'Reduce the dimensionality of a dataset using provided or default reduction algorithm.'#

static setup_parser(parser: argparse.ArgumentParser)[source]#: Setup parser for reduce-dimensions verb

run_cli(args: argparse.Namespace | None = None)[source]#: CLI stub for ReduceDimensions verb

run(algorithm: str | None = None, input_dir: pathlib.Path | str | None = None, model_path: pathlib.Path | str | None = None)[source]#

Run dimensionality reduction on a dataset

This method loads the latent space representations from an inference run and applies the selected dimensionality reduction algorithm.

Algorithms that support reusable fitted models may either:

fit a new model using a sampled subset of the data, or
load an existing model if a model path is provided.

Algorithms without a separate fitting stage do not support model loading and directly transform the input data.

The full dataset is then transformed into the target lower-dimensional space, and the resulting embeddings are saved.

Parameters:

algorithm (str, Optional) – The dimensionality reduction algorithm to use. If not specified, the method will look in the config for a default algorithm.
input_dir (str or Path, Optional) – Directory containing the dataset to reduce dimensions for.
model_path (str or Path, Optional) – Path to a previously saved reducer model.

Returns:

The method does not return anything but saves the reduced representations to disk.

Return type:

None
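
The fit-on-a-sample, transform-everything pattern described above can be sketched with a simple reducer. The example below swaps in PCA computed with NumPy's SVD in place of UMAP or any reducer hyrax actually ships; the function names, the sample size and the random data are all assumptions made for illustration.

```python
import numpy as np


def fit_pca(sample: np.ndarray, n_components: int = 2):
    """Fit a PCA 'model' (mean and principal axes) on a subset of the data."""
    mean = sample.mean(axis=0)
    # Rows of vt are the principal axes, ordered by explained variance
    _, _, vt = np.linalg.svd(sample - mean, full_matrices=False)
    return mean, vt[:n_components]


def transform(data: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Project data into the lower-dimensional space of a fitted model."""
    return (data - mean) @ components.T


rng = np.random.default_rng(0)
latent = rng.normal(size=(1000, 16))  # stand-in for inference results

# Fit on a sampled subset, then transform the full dataset
sample_idx = rng.choice(len(latent), size=200, replace=False)
mean, components = fit_pca(latent[sample_idx])
embedding = transform(latent, mean, components)
```

Saving `mean` and `components` would play the role of the reusable fitted model: a later run could load them and call `transform` directly instead of fitting again.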

_run(algorithm: str | None, input_dir: pathlib.Path | str | None, model_path: pathlib.Path | str | None)[source]#: See run()

class CreateSplits(config)[source]#

Bases: hyrax.verbs.verb_registry.Verb

Create and persist reproducible dataset splits.

__init__()[source]#

Overall initialization for all verbs that saves the config

cli_name = 'create_splits'#

add_parser_kwargs#

description = 'Compute and persist dataset splits for reproducible training workflows.'#

REQUIRED_DATA_GROUPS = ()#

OPTIONAL_DATA_GROUPS = ()#

static setup_parser(parser)[source]#: No additional CLI options needed.

run_cli(args=None)[source]#: CLI stub for CreateSplits verb.

run()[source]#

Compute dataset splits and write them to a results directory.

Reads the [split] and [balance] config tables to determine how to partition each data group, then persists .npz index files and a split_config.toml under a timestamped *-splits-* results directory. Subsequent verbs (train, infer, test) can point at this directory to reuse the same split without recomputing it.

Returns:: The populated dataset providers, keyed by group name.
Return type:: dict[str, DataProvider]
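
Reproducible splits usually come down to a seeded permutation plus persisted index arrays. The sketch below shows that idea only; it does not read the [split] or [balance] tables, and the function names, the file name `split_indices.npz` and the fraction values are hypothetical.

```python
from pathlib import Path

import numpy as np


def make_splits(n_items: int, fractions: dict, seed: int) -> dict:
    """Partition range(n_items) into named index arrays, reproducibly.

    Assumes the fractions sum to 1; the last split takes whatever remains.
    """
    order = np.random.default_rng(seed).permutation(n_items)
    names = list(fractions)
    splits, start = {}, 0
    for i, name in enumerate(names):
        is_last = i == len(names) - 1
        stop = n_items if is_last else start + int(round(fractions[name] * n_items))
        splits[name] = np.sort(order[start:stop])
        start = stop
    return splits


def save_splits(splits: dict, out_dir: Path) -> Path:
    """Persist the index arrays to a single .npz file (hypothetical file name)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / "split_indices.npz"
    np.savez(path, **splits)
    return path


splits = make_splits(100, {"train": 0.8, "validate": 0.1, "test": 0.1}, seed=42)
```

Because the permutation depends only on the seed and the item count, calling `make_splits` again with the same arguments yields identical index arrays, and loading the saved file avoids recomputing them at all.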

class Verb(config)[source]#

Bases: abc.ABC

Base class for all hyrax verbs

__init__()[source]#

Overall initialization for all verbs that saves the config

add_parser_kwargs: dict[str, str]#

REQUIRED_DATA_GROUPS: tuple[str, Ellipsis] = ()#

OPTIONAL_DATA_GROUPS: tuple[str, Ellipsis] = ()#

cli_name = 'VERB'#

description = ''#

config#

classmethod information()[source]#

Returns a string describing this verb. Includes the following:

- Name of the verb
- Required Data Groups
- Optional Data Groups
- One line description of what this verb does

If a data group is empty then it will be printed as an empty tuple.

Returns:: <name>: Data Groups: Req. (<req1>, <req2>, …), Opt. (<opt1>, <opt2>, …). <Description>
Return type:: str
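
The documented template can be reproduced with a small formatter. This is a sketch of the layout only: `format_information` is a hypothetical helper, and details such as whether group names are quoted in the real output are assumptions.

```python
def format_information(name: str, required: tuple, optional: tuple, description: str) -> str:
    """Build '<name>: Data Groups: Req. (...), Opt. (...). <Description>'."""
    req = ", ".join(required)
    opt = ", ".join(optional)
    # An empty group joins to an empty string, so it prints as "()"
    return f"{name}: Data Groups: Req. ({req}), Opt. ({opt}). {description}"


train_info = format_information(
    "train", ("train",), ("validate", "test"), "Train a model using provided data."
)
splits_info = format_information(
    "create_splits", (), (), "Compute and persist dataset splits."
)
```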

validate_data_request() → None[source]#

Validate the data_request configuration for this verb’s known groups.

Reads data_request from the verb’s config and verifies that every group listed in REQUIRED_DATA_GROUPS is present. Verbs that define neither REQUIRED_DATA_GROUPS nor OPTIONAL_DATA_GROUPS skip validation entirely.

Raises:: RuntimeError – If a required group is absent from the data_request config.
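
The validation rule can be sketched as a standalone function. This is an illustration under assumptions: it treats the config as a nested dict with a top-level `data_request` key, and the function signature and error message are hypothetical rather than copied from hyrax.

```python
def validate_data_request(config: dict, required: tuple, optional: tuple = ()) -> None:
    """Raise RuntimeError if a required data group is missing from the config."""
    # Verbs that declare no groups at all skip validation entirely
    if not required and not optional:
        return
    data_request = config.get("data_request", {})
    missing = [group for group in required if group not in data_request]
    if missing:
        raise RuntimeError(f"Missing required data group(s): {', '.join(missing)}")


# Passes: the required 'train' group is present, 'validate' is optional
validate_data_request(
    {"data_request": {"train": {}}}, required=("train",), optional=("validate",)
)
```

Optional groups are deliberately not checked for presence here; they only mark the verb as one that participates in validation.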

all_class_verbs() → list[str][source]#: Returns all verbs that are currently registered with a class-based implementation

all_verbs() → list[str][source]#: Returns all verbs that are currently registered

fetch_verb_class(cli_name: str) → type[Verb] | None[source]#

Gives the class object for the named verb

Parameters:: cli_name (str) – The name of the verb on the command line interface
Returns:: The verb class or None if no such verb class exists.
Return type:: Optional[type[Verb]]

is_verb_class(cli_name: str) → bool[source]#

Returns true if the verb has a class based implementation

Parameters:: cli_name (str) – The name of the verb on the command line interface
Returns:: True if the verb has a class-based implementation
Return type:: bool