hyrax.verbs#
Submodules#
Classes#
DatabaseConnection | Verb to create a connection to a vector database with inference results.
Umap | Umap latent space points into 2d
Infer | Inference verb
Train | Train verb
Test | Test verb - evaluates a trained model on test data
Visualize | Verb to create a visualization
VisualizeV2 | Verb to create a hexbin visualization of a 2D latent space.
Lookup | Look up an inference result using the ID of a data member
SaveToDatabase | Verb to insert inference results into a vector database index for fast similarity search.
Model | Resolves the model class that is defined in the config file.
ToOnnx | Export the model to ONNX format
Engine | This verb drives inference with an ONNX model in production.
Prepare | Prepare Verb, Prepares a dataset and returns it
Verb | Base class for all hyrax verbs
Functions#
|
Returns all verbs that are currently registered with a class-based implementation |
|
Returns all verbs that are currently registered |
|
Gives the class object for the named verb |
|
Returns true if the verb has a class-based implementation
Package Contents#
- class DatabaseConnection(config)[source]#
Bases: hyrax.verbs.verb_registry.Verb
Verb to create a connection to a vector database with inference results.
Overall initialization for all verbs that saves the config
- cli_name = 'database_connection'#
- add_parser_kwargs#
- description = 'Create a connection to the vector database for interactive queries.'#
- run(database_dir: pathlib.Path | str | None = None)[source]#
Create a connection to the vector database for interactive queries.
- Parameters:
database_dir (str or Path, Optional) – The directory containing the database that will be connected to. If None, attempt to connect to the most recently created …-vector-db-… directory. If specified, it can point to either an empty directory or a directory containing an existing vector database. If the latter, the database will be updated with the new vectors.
- _get_database_type_from_config(database_dir: pathlib.Path)[source]#
Internal function that reads a config file from a directory and returns the name of the vector database from it, e.g. “chromadb” or “qdrant”.
- Parameters:
database_dir (Path) – The directory containing the vector database and the config file that will be used as reference.
- Returns:
The config value for [“vector_db”][“name”] in the reference config.
- Return type:
str
- class Umap(config)[source]#
Bases: hyrax.verbs.verb_registry.Verb
Umap latent space points into 2d
Overall initialization for all verbs that saves the config
- cli_name = 'umap'#
- add_parser_kwargs#
- description = 'Transforms the entire dataset into a lower-dimensional space by fitting a UMAP model.'#
- run(input_dir: pathlib.Path | str | None = None, model_path: pathlib.Path | str | None = None)[source]#
Create a umap of a particular inference run
This method loads the latent space representations from an inference run, samples a subset of data points, flattens them if necessary, and then fits a UMAP model. The fitted reducer is then used to transform the entire dataset into a lower-dimensional space.
- Parameters:
input_dir (str or Path, Optional) – The directory containing the inference results.
model_path (str or Path, Optional) – The path to a pre-existing UMAP model. If provided, the method will use this model instead of fitting a new one.
- Returns:
The method does not return anything but saves the UMAP representations to disk.
- Return type:
None
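The sample, flatten, fit, then transform-everything flow described above can be sketched as follows. This is not the verb's implementation: `LinearReducer` is a stand-in linear projection used so the example runs with numpy alone, and `reduce_latent_space`, `sample_size`, and `batch_size` are hypothetical names chosen for illustration.

```python
import numpy as np


class LinearReducer:
    """Stand-in for a UMAP reducer: projects onto the top principal directions."""

    def __init__(self, n_components: int = 2):
        self.n_components = n_components

    def fit(self, data: np.ndarray) -> "LinearReducer":
        self.mean_ = data.mean(axis=0)
        # Rows of vt are principal directions, ordered by singular value.
        _, _, vt = np.linalg.svd(data - self.mean_, full_matrices=False)
        self.components_ = vt[: self.n_components]
        return self

    def transform(self, data: np.ndarray) -> np.ndarray:
        return (data - self.mean_) @ self.components_.T


def reduce_latent_space(ids, latents, sample_size=100, batch_size=64, seed=0):
    """Fit on a random sample, then transform the whole dataset batch by batch."""
    rng = np.random.default_rng(seed)
    # Flatten every inference result to a 1-D vector: (n, ...) -> (n, N).
    flat = latents.reshape(len(latents), -1)
    sample_idx = rng.choice(len(flat), size=min(sample_size, len(flat)), replace=False)
    reducer = LinearReducer(n_components=2).fit(flat[sample_idx])

    out_ids, out_points = [], []
    for start in range(0, len(flat), batch_size):
        # Each batch is an (ids, vectors) pair, transformed independently.
        out_ids.append(ids[start : start + batch_size])
        out_points.append(reducer.transform(flat[start : start + batch_size]))
    return np.concatenate(out_ids), np.concatenate(out_points)
```

Because the sketch only relies on `fit` and `transform`, a reducer from the umap-learn package, which exposes methods with those names, could be substituted for the stand-in.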
- _run(input_dir: pathlib.Path | str | None = None, model_path: pathlib.Path | str | None = None)[source]#
See run()
- _load_pickle(model_path: pathlib.Path | str)[source]#
Helper function to wrap loading a pickle file from a given path for easier testing.
- Parameters:
model_path (str or Path) – The file path to the pickle file.
- Returns:
The object loaded from the pickle file.
- Return type:
object
- _transform_batch(batch_tuple: tuple)[source]#
Private helper to transform a single batch
- Parameters:
batch_tuple (tuple) – The first element is the IDs of the batch as a numpy array. The second element is the inference results to transform, as a numpy array with shape (batch_len, N), where N is the total number of dimensions in the inference result. The caller flattens all inference result axes for us.
- Returns:
The first element is the IDs of the batch as a numpy array. The second element is the results of running the umap transform on the input, as a numpy array.
- Return type:
tuple
- static _log_memory_usage(message: str = '')[source]#
Log the current resident set size (RSS) memory usage of the current process in gigabytes.
- Parameters:
message (str, optional) – A descriptive message to include in the log output for context.
Notes
This method is intended for debugging and performance monitoring.
- class Infer(config)[source]#
Bases: hyrax.verbs.verb_registry.Verb
Inference verb
Overall initialization for all verbs that saves the config
- cli_name = 'infer'#
- add_parser_kwargs#
- description = 'Run inference on a model using a dataset.'#
- REQUIRED_DATA_GROUPS = ('infer',)#
- OPTIONAL_DATA_GROUPS = ()#
- class Train(config)[source]#
Bases: hyrax.verbs.verb_registry.Verb
Train verb
Overall initialization for all verbs that saves the config
- cli_name = 'train'#
- add_parser_kwargs#
- description = 'Train a model using provided data.'#
- REQUIRED_DATA_GROUPS = ('train',)#
- OPTIONAL_DATA_GROUPS = ('validate', 'test')#
- class Test(config)[source]#
Bases: hyrax.verbs.verb_registry.Verb
Test verb - evaluates a trained model on test data
Overall initialization for all verbs that saves the config
- cli_name = 'test'#
- add_parser_kwargs#
- description = 'Evaluate a trained model on test data.'#
- REQUIRED_DATA_GROUPS = ('test',)#
- OPTIONAL_DATA_GROUPS = ()#
- run()[source]#
Run the test process for the configured model on test data. This evaluates a trained model, saves outputs, and returns metrics.
Note: The configuration dictionary will be updated with the full path to the model weights file that is loaded into the model (config[“test”][“model_weights_file”]).
- Returns:
Dataset containing test results that can be used for further analysis
- Return type:
- class Visualize(config)[source]#
Bases: hyrax.verbs.verb_registry.Verb
Verb to create a visualization
Overall initialization for all verbs that saves the config
- cli_name = 'visualize'#
- add_parser_kwargs#
- description = 'Generate a visualization of a latent space created by a UMAP reduction.'#
- REQUIRED_DATA_GROUPS = ('infer',)#
- OPTIONAL_DATA_GROUPS = ()#
- run(input_dir: pathlib.Path | str | None = None, *, return_verb: bool = False, make_lupton_rgb_opts: dict | None = None, **kwargs)[source]#
Generate an interactive notebook visualization of a latent space that has been umapped down to 2d.
The plot contains two holoviews objects, a scatter plot of the latent space, and a table of objects which can be populated by selecting from the scatter plot.
- Parameters:
input_dir (Optional[Union[Path, str]], optional) – Directory holding the output from the ‘umap’ verb, by default None. When not provided, we use [results][inference_dir] from config. If that is false, we use the most recent umap in the current results directory.
return_verb (bool, optional) – If True, also return the underlying Visualize instance for post-hoc access to selection state. Defaults to False.
make_lupton_rgb_opts (dict, optional) – Dictionary of options to pass to astropy’s make_lupton_rgb function for RGB image creation. Default is {“stretch”: 5, “Q”: 8}. Common parameters include stretch (brightness/contrast) and Q (softening parameter for asinh transformation).
kwargs – Keyword arguments are passed through as options for the plot object as plot_pane.opts(**plot_options). It is not recommended to override the “tools” plot option, because that will break the integration between the plot selection operations and the table.
- Returns:
Holoviews, if return_verb = False (default) – A collection of Holoviews panes
tuple of (pane, Visualize), if return_verb = True – Returns a 2-tuple with the pane and the verb instance.
- visible_points(x_range: tuple | list, y_range: tuple | list)[source]#
Generate a hv.Points object with the points inside the bounding box passed.
This is the event handler for moving or scaling the latent space plot, and is called by Holoviews.
- Parameters:
x_range (tuple or list) – min and max x values
y_range (tuple or list) – min and max y values
- Returns:
Points lying inside the bounding box passed
- Return type:
hv.Points
- update_points(**kwargs) None[source]#
This is the main UI event handler for selection tools on the plot. Any dynamic map in the visualizer layout that updates based on plot selection MUST call this function.
This function accepts the data values from all streams and uses the differences between the current call and prior calls to differentiate between different UI events.
The self.prev_kwargs dictionary is used to store previous calls to this function, and the _called_* helpers perform the differencing for each case.
Calling this function GUARANTEES that self.points, self.points_id, and self.points_idx are up-to-date with the user’s latest selection, regardless of the order that Holoviews evaluates the DynamicMaps in.
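The differencing idea described above, comparing the current stream values against the previous call to work out which UI event fired, can be sketched in isolation. `SelectionTracker` and `changed_keys` are hypothetical names for this example; only `prev_kwargs` comes from the description, and the real `_called_*` helpers may compare values differently.

```python
class SelectionTracker:
    """Remember the previous stream values and report which ones changed."""

    def __init__(self):
        self.prev_kwargs = {}

    def changed_keys(self, **kwargs) -> set:
        """Return the names of stream values that differ from the prior call."""
        # Values are compared with !=, so this sketch expects plain Python
        # values (tuples, lists, None) rather than numpy arrays.
        changed = {
            key
            for key, value in kwargs.items()
            if key not in self.prev_kwargs or self.prev_kwargs[key] != value
        }
        self.prev_kwargs = dict(kwargs)
        return changed
```

On the first call every key is reported as changed, since there is nothing to compare against; later calls report only the stream whose value moved, which is what lets a single handler distinguish a box selection from a lasso or tap.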
- poly_select_points(geometry) tuple[numpy.typing.ArrayLike, numpy.typing.ArrayLike, numpy.typing.ArrayLike][source]#
Select points inside a polygon.
- Parameters:
geometry (list) – List of x/y points describing the vertices of the polygon
- Returns:
The first element is an ndarray of x/y points in latent space inside the polygon. The second element is an ndarray of corresponding object ids.
- Return type:
Tuple
- box_select_points(x_range: tuple | list, y_range: tuple | list) tuple[numpy.typing.ArrayLike, numpy.typing.ArrayLike, numpy.typing.ArrayLike][source]#
Return the points and IDs for a box in the latent space
- Parameters:
x_range (tuple or list) – min and max x values
y_range (tuple or list) – min and max y values
- Returns:
The first element is an ndarray of x/y points in latent space inside the box. The second element is an ndarray of corresponding object ids.
- Return type:
Tuple
- box_select_indexes(x_range: tuple | list, y_range: tuple | list)[source]#
Return the indexes inside of a particular box in the latent space
- Parameters:
x_range (tuple or list) – min and max x values
y_range (tuple or list) – min and max y values
- Returns:
Array of data indexes where the latent space representation falls inside the given box.
- Return type:
np.ndarray
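A box selection of this kind reduces to a boolean mask over the 2D points. The sketch below is a standalone function, not the method itself: it takes the points as an explicit `(n, 2)` array (the method reads them from the verb instance), and treating the box bounds as inclusive is an assumption.

```python
import numpy as np


def box_select_indexes(points: np.ndarray, x_range, y_range) -> np.ndarray:
    """Return indexes of rows in `points` (shape (n, 2)) that fall inside the box."""
    x_min, x_max = x_range
    y_min, y_max = y_range
    inside = (
        (points[:, 0] >= x_min) & (points[:, 0] <= x_max)
        & (points[:, 1] >= y_min) & (points[:, 1] <= y_max)
    )
    # np.nonzero returns a tuple with one index array per dimension.
    return np.nonzero(inside)[0]
```

The returned indexes can then be used to slice both the point array and a parallel array of object IDs.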
- selected_objects(**kwargs)[source]#
Generate the Holoviews table for a selected set of objects based on input from the Lasso, Tap, and SelectionXY streams.
- Returns:
Table with Object ID, x, y locations of the selected objects
- Return type:
hv.Table
- class VisualizeV2(config)[source]#
Bases: hyrax.verbs.verb_registry.Verb
Verb to create a hexbin visualization of a 2D latent space.
Overall initialization for all verbs that saves the config
- cli_name = 'visualize_v2'#
- add_parser_kwargs#
- REQUIRED_DATA_GROUPS = ('visualize',)#
- OPTIONAL_DATA_GROUPS = ()#
- run(**kwargs)[source]#
Generate an interactive hexbin visualization of a latent space projected to 2D.
Uses HoloViews HexTiles with datashader for adaptive hexbin aggregation, box/lasso selection, a metadata table, and tabbed detail plots.
- Parameters:
kwargs – Additional keyword arguments passed as HexTiles opts overrides.
- Returns:
This verb instance. Use it to call restart_ui() or get_selected_df() after the UI has been displayed.
- Return type:
- restart_ui(**kwargs)[source]#
Rebuild and re-display the Panel UI without reloading data.
Call this after a Jupyter websocket disconnect instead of re-running the cell. The expensive data-loading step is skipped; only the widgets are rebuilt.
- Parameters:
kwargs – Additional keyword arguments passed as HexTiles opts overrides.
- Returns:
This verb instance. Use it to call restart_ui() or get_selected_df() after the UI has been displayed.
- Return type:
- _load_data()[source]#
Load dataset and build the points DataFrame.
Guards with a _data_loaded sentinel so the expensive steps only run once per verb instance. Safe to call multiple times.
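The sentinel guard described above is a small, general pattern. A minimal sketch follows; `LazyLoader` and its `loader` argument are hypothetical, and only the `_data_loaded` flag name is taken from the description.

```python
class LazyLoader:
    """Load data at most once per instance, guarded by a boolean sentinel."""

    def __init__(self, loader):
        self._loader = loader  # hypothetical callable that does the slow work
        self._data_loaded = False
        self.data = None

    def _load_data(self):
        if self._data_loaded:  # already loaded: return without redoing the work
            return
        self.data = self._loader()
        self._data_loaded = True
```

Setting the flag only after the load succeeds means a failed load can be retried on the next call.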
- class Lookup(config)[source]#
Bases: hyrax.verbs.verb_registry.Verb
Look up an inference result using the ID of a data member
Overall initialization for all verbs that saves the config
- cli_name = 'lookup'#
- add_parser_kwargs#
- description = 'Look up an inference result using the ID of a data member.'#
- static setup_parser(parser: argparse.ArgumentParser)[source]#
Set up our arguments by configuring a subparser
- Parameters:
parser (ArgumentParser) – The sub-parser to configure
- run_cli(args: argparse.Namespace | None = None)[source]#
Entrypoint to Lookup from the CLI.
- Parameters:
args (Optional[Namespace], optional) – The parsed command line arguments
- run(id: str, results_dir: pathlib.Path | str | None = None) numpy.ndarray | None[source]#
Look up the latent-space representation of a particular ID
Requires the relevant dataset to be configured, and for inference to have been run.
- Parameters:
id (str) – The ID of the input data to look up the inference result
results_dir (str, Optional) – The directory containing the inference results.
- Returns:
The output tensor of the model for the given input.
- Return type:
Optional[np.ndarray]
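The Optional return type suggests that an unknown ID yields None rather than an error. A minimal in-memory sketch of that contract is shown below; the `lookup` function and its explicit `ids`/`results` arrays are assumptions for illustration, whereas the verb itself reads stored inference results from a directory.

```python
from typing import Optional

import numpy as np


def lookup(object_id: str, ids: np.ndarray, results: np.ndarray) -> Optional[np.ndarray]:
    """Return the result row whose ID matches object_id, or None if absent."""
    matches = np.nonzero(ids == object_id)[0]
    if len(matches) == 0:
        return None
    # If an ID appears more than once, this sketch returns the first match.
    return results[matches[0]]
```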
- class SaveToDatabase(config)[source]#
Bases: hyrax.verbs.verb_registry.Verb
Verb to insert inference results into a vector database index for fast similarity search.
Overall initialization for all verbs that saves the config
- cli_name = 'save_to_database'#
- add_parser_kwargs#
- description = 'Insert inference results into vector database.'#
- run(input_dir: pathlib.Path | str | None = None, output_dir: pathlib.Path | str | None = None)[source]#
Insert inference results into vector database.
- Parameters:
input_dir (str or Path, Optional) – The directory containing the inference results.
output_dir (str or Path, Optional) – The directory where the vector database is stored. If None, a new directory will be created. If specified, it can point to either an empty directory or a directory containing an existing vector database. If the latter, the database will be updated with the new vectors.
- class Model(config)[source]#
Bases: hyrax.verbs.verb_registry.Verb
Resolves the model class that is defined in the config file. This will return a reference to the model class.
Overall initialization for all verbs that saves the config
- cli_name = 'model'#
- add_parser_kwargs#
- description = 'Return a reference to the model class (not a new instance).'#
- class ToOnnx(config)[source]#
Bases: hyrax.verbs.verb_registry.Verb
Export the model to ONNX format
Overall initialization for all verbs that saves the config
- cli_name = 'to_onnx'#
- add_parser_kwargs#
- description = 'Export model to ONNX format.'#
- class Engine(config)[source]#
Bases: hyrax.verbs.verb_registry.Verb
This verb drives inference with an ONNX model in production.
Overall initialization for all verbs that saves the config
- cli_name = 'engine'#
- add_parser_kwargs#
- description = 'Run inference with an ONNX model.'#
- run(model_directory: str = None)[source]#
Run inference with an ONNX model.
This method performs the following steps:
- Read in the user config
- Prepare all the datasets requested
- Implement a simple strategy for reading in batches of data samples
- Process the samples with any custom collate functions as well as a default collate function
- Pass the collated batch to the appropriate to_tensor function
- Send that output to the ONNX-ified model
- Persist the results of inference
- Parameters:
model_directory (str, optional) – Directory containing the ONNX model. If not provided, uses the config file or finds the most recent ONNX export directory.
- create_ort_inputs(prepared_batch)[source]#
Create the inputs array for the ONNX model using the expected inputs from the loaded ONNX model and the type and shape of the prepared batch.
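One plausible shape for this step is to pair each input the model expects with an array from the batch and cast it to the expected element type. The sketch below is an assumption-laden illustration, not the method's code: `InputSpec` is a hypothetical stand-in for the model's input metadata, the function takes that metadata as an explicit argument, and it returns a name-to-array dict of the kind ONNX Runtime accepts as an input feed.

```python
from dataclasses import dataclass

import numpy as np

# ONNX Runtime reports input element types as strings such as "tensor(float)".
ORT_TYPE_TO_NUMPY = {
    "tensor(float)": np.float32,
    "tensor(double)": np.float64,
    "tensor(int64)": np.int64,
    "tensor(int32)": np.int32,
}


@dataclass
class InputSpec:
    """Hypothetical stand-in for the per-input metadata of a loaded ONNX model."""

    name: str
    type: str


def create_ort_inputs(input_specs, prepared_batch):
    """Pair each expected model input with one batch array, cast to its dtype."""
    if isinstance(prepared_batch, (tuple, list)):
        arrays = list(prepared_batch)
    else:
        arrays = [prepared_batch]  # a single array feeds a single-input model
    if len(arrays) < len(input_specs):
        raise ValueError("prepared batch has fewer arrays than the model has inputs")
    return {
        spec.name: np.asarray(array, dtype=ORT_TYPE_TO_NUMPY[spec.type])
        for spec, array in zip(input_specs, arrays)
    }
```

Casting up front matters because ONNX Runtime rejects inputs whose dtype does not match the model, for example float64 data passed to a float32 input.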
- class Prepare(config)[source]#
Bases: hyrax.verbs.verb_registry.Verb
Prepare verb: prepares a dataset and returns it
Overall initialization for all verbs that saves the config
- cli_name = 'prepare'#
- add_parser_kwargs#
- class Verb(config)[source]#
Bases: abc.ABC
Base class for all hyrax verbs
Overall initialization for all verbs that saves the config
- add_parser_kwargs: dict[str, str]#
- REQUIRED_DATA_GROUPS: tuple[str, Ellipsis] = ()#
- OPTIONAL_DATA_GROUPS: tuple[str, Ellipsis] = ()#
- cli_name = 'VERB'#
- description = ''#
- config#
- classmethod information()[source]#
Returns a string describing this verb. Includes the following:
- Name of the verb
- Required Data Groups
- Optional Data Groups
- One line description of what this verb does
If a data group is empty then it will be printed as an empty tuple.
- Returns:
<name>: Data Groups: Req. (<req1>, <req2>, …), Opt. (<opt1>, <opt2>, …). <Description>
- Return type:
str
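The return format above can be approximated with a small formatting function. This is a sketch under assumptions: the standalone `information` function and its arguments are hypothetical, and whether the real output quotes group names or renders them bare is not stated in the source.

```python
def information(name, required, optional, description):
    """Build '<name>: Data Groups: Req. (...), Opt. (...). <Description>'."""

    def fmt(groups):
        # An empty group tuple is rendered as "()".
        return "(" + ", ".join(groups) + ")"

    return f"{name}: Data Groups: Req. {fmt(required)}, Opt. {fmt(optional)}. {description}"
```

For example, the values documented for the Train verb would give `train: Data Groups: Req. (train), Opt. (validate, test). Train a model using provided data.` under these assumptions.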
- validate_data_request() None[source]#
Validate the data_request configuration for this verb’s known groups.
Reads data_request from the verb’s config and checks:
- All groups listed in REQUIRED_DATA_GROUPS are present.
- Cross-group split_fraction constraints (sum ≤ 1.0, consistency) hold for the active groups only. Groups outside REQUIRED_DATA_GROUPS + OPTIONAL_DATA_GROUPS are ignored so that unrelated groups in a shared config do not cause false failures.
Verbs that define neither REQUIRED_DATA_GROUPS nor OPTIONAL_DATA_GROUPS skip validation entirely.
- Raises:
RuntimeError – If a required group is absent, or if cross-group split_fraction constraints are violated for the active groups.
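The checks described above can be sketched as a standalone function. The layout of `data_request` assumed here (a dict mapping group name to a dict with an optional "split_fraction" key) is a guess made for illustration, and only the sum constraint is modeled; the consistency check mentioned in the description is not.

```python
def validate_data_request(data_request, required_groups, optional_groups=()):
    """Raise RuntimeError if required groups are missing or splits exceed 1.0."""
    known = tuple(required_groups) + tuple(optional_groups)
    if not known:
        return  # verbs that declare no groups skip validation entirely

    missing = [group for group in required_groups if group not in data_request]
    if missing:
        raise RuntimeError(f"Missing required data groups: {missing}")

    # Only groups this verb knows about count toward the split check, so
    # unrelated groups in a shared config cannot cause a failure here.
    active = [group for group in known if group in data_request]
    total = sum(data_request[group].get("split_fraction", 0.0) for group in active)
    if total > 1.0 + 1e-9:  # small tolerance for floating-point sums
        raise RuntimeError(f"split_fraction values for {active} sum to {total} > 1.0")
```

With the Train verb's groups, a shared config that also holds an "infer" group with split_fraction 1.0 still passes in this sketch, because "infer" is outside the groups that verb declares.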
- all_class_verbs() list[str][source]#
Returns all verbs that are currently registered with a class-based implementation