hyrax.verbs.reduction_algorithms

Classes#

`ReductionAlgorithm`	Abstract base class for all reduction algorithms.
`UMAP`	UMAP reduction implementation.
`PCA`	PCA reduction implementation.
`TSNE`	TSNE reduction implementation.

Package Contents#

class ReductionAlgorithm(config: dict, reduction_results: ResultDatasetWriter | None = None)[source]#

Abstract base class for all reduction algorithms.

_config#

_reduction_results = None#

reducer = None#

property config#: Return the configuration dictionary for this reduction algorithm.

property reduction_results#: Return the result dataset writer for this reduction algorithm.

classmethod __init_subclass__()[source]#
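
The source does not say what `__init_subclass__` does here. Because the concrete classes list `algorithm_registry` in their base path, one plausible reading is that it records each subclass in a registry. The following is a minimal sketch of that pattern under that assumption; `REGISTRY`, `BaseReducer`, and `Identity` are hypothetical names, not part of hyrax.

```python
from abc import ABC, abstractmethod

# Hypothetical registry; the real hyrax registry name and key scheme
# are not documented in this section.
REGISTRY: dict[str, type] = {}


class BaseReducer(ABC):
    """Stand-in for an abstract base class; not the hyrax class."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Record every subclass under its lowercase class name.
        REGISTRY[cls.__name__.lower()] = cls

    def __init__(self, config: dict):
        self._config = config

    @abstractmethod
    def transform(self, args: dict, num_batches: int):
        ...


class Identity(BaseReducer):
    """Trivial subclass that returns its input unchanged."""

    def transform(self, args: dict, num_batches: int):
        return args


# Look the class up by name, then instantiate it with a config dict.
reducer = REGISTRY["identity"]({"verbose": False})
```

With this pattern, defining a subclass is enough to make it discoverable by name; no separate registration call is needed.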

fit(data_sample: numpy.ndarray)[source]#

Fit the reduction algorithm to the data. Set the internal state of the reducer based on the provided data sample.

Parameters:: data_sample (numpy.ndarray) – The data sample used to fit the model.

abstractmethod transform(args: dict, num_batches: int)[source]#

Transform the data with a fitted reducer.

Parameters:

args (dict) – A dictionary containing the data to be transformed.
num_batches (int) – The total number of batches that the data is split into for transformation.

save_model(model_path: pathlib.Path | str | None = None)[source]#

Save the reducer model to a pickle file.

Parameters:: model_path (Path or str) – The path to save the model to.

load_model(expected_input_dim: int, model_path: pathlib.Path | str | None = None)[source]#

Load the reducer model from a file.

Parameters:

expected_input_dim (int) – The expected number of input features for the loaded model.
model_path (Path or str, optional) – The path to the file to load the model from.

Returns:

The reduction algorithm instance with the loaded model.

Return type:

ReductionAlgorithm

_load_pickle(model_path: pathlib.Path | str)[source]#

Helper function to wrap loading a pickle file from a given path for easier testing.

Parameters:: model_path (str or Path) – The file path to the pickle file.
Returns:: The object loaded from the pickle file.
Return type:: object
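
As an illustration of the save and load steps described above, here is a minimal round trip using only the standard library. `save_pickle` and `load_pickle` are hypothetical helpers, and the dictionary stands in for a fitted reducer; this is a sketch, not the hyrax implementation.

```python
import pickle
import tempfile
from pathlib import Path


def save_pickle(obj, model_path):
    """Write obj to model_path with pickle (hypothetical helper)."""
    with open(model_path, "wb") as f:
        pickle.dump(obj, f)


def load_pickle(model_path):
    """Return whatever object was pickled at model_path."""
    with open(model_path, "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "reducer.pickle"
    save_pickle({"n_components": 2}, path)
    # Both str and Path are accepted by open().
    restored = load_pickle(str(path))
```

Keeping the load in its own small function means a test can substitute it with `unittest.mock.patch` instead of writing a real file. Note that `pickle.load` can execute arbitrary code, so model files should come from a trusted source.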

_transform_batch(batch_tuple: tuple)[source]#

Private helper to transform a single batch with the fitted reducer.

Parameters:: batch_tuple (tuple) – A pair whose first element is the batch IDs as a numpy array and whose second element is the inference results to transform, a numpy array of shape (batch_len, N), where N is the total number of dimensions in the inference result. The caller flattens all inference result axes beforehand.
Returns:: A pair whose first element is the batch IDs as a numpy array and whose second element is the result of running the transform on the input, as a numpy array.
Return type:: tuple
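
The batch contract above can be sketched as follows. `FirstTwoColumns` is a stand-in reducer and `transform_batch` is a hypothetical free function; only the tuple layout and the caller-side flattening come from the description.

```python
import numpy as np


class FirstTwoColumns:
    """Stand-in reducer: keeps the first two features of each row."""

    def transform(self, data):
        return data[:, :2]


def transform_batch(reducer, batch_tuple):
    """Return (ids, reduced) for one (ids, flattened_results) pair."""
    ids, results = batch_tuple
    return ids, reducer.transform(results)


# Caller side: flatten every axis after the first, (3, 2, 4) -> (3, 8).
raw = np.arange(24, dtype=float).reshape(3, 2, 4)
flat = raw.reshape(len(raw), -1)
ids = np.array(["a", "b", "c"])

out_ids, reduced = transform_batch(FirstTwoColumns(), (ids, flat))
```

The IDs pass through untouched, so each reduced row can still be matched to its source object after the transform.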

static _log_memory_usage(message: str = '')[source]#

Log the current resident set size (RSS) memory usage of the current process in gigabytes.

Parameters:: message (str, optional) – A descriptive message to include in the log output for context.

Notes

This method is intended for debugging and performance monitoring.
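
How hyrax measures RSS is not stated here. As a rough illustration only, the sketch below uses the standard-library `resource` module, which reports peak rather than current RSS and is available on Unix only. `peak_rss_gb` and `log_memory_usage` are hypothetical names.

```python
import logging
import resource
import sys

logger = logging.getLogger(__name__)


def peak_rss_gb() -> float:
    """Peak resident set size of this process, in GiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is in kilobytes on Linux and in bytes on macOS.
    bytes_used = peak if sys.platform == "darwin" else peak * 1024
    return bytes_used / 1024**3


def log_memory_usage(message: str = "") -> None:
    """Emit a debug log line with the message and the peak RSS."""
    logger.debug("%s peak RSS: %.2f GB", message, peak_rss_gb())


log_memory_usage("after fit")
```

Calling such a helper before and after `fit` or `transform` gives a quick view of how much memory each stage adds.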

class UMAP(config: dict, reduction_results=None)[source]#

Bases: hyrax.verbs.reduction_algorithms.algorithm_registry.ReductionAlgorithm

UMAP reduction implementation.

reducer#

save_model(results_dir: pathlib.Path)[source]#

Save the fitted UMAP model to a pickle file.

Parameters:: results_dir (Path) – The directory where the model should be saved. The model will be saved as ‘umap.pickle’ in this directory.

load_model(expected_input_dim: int, model_path: pathlib.Path | str | None = None)[source]#

Load a pre-existing UMAP model from disk.

Parameters:

expected_input_dim (int) – The expected number of input features for the loaded model.
model_path (Path or str, optional) – The path to the file to load the model from. If not specified, the method looks in the config for a default model path.

_validate_umap_model(reducer, expected_input_dim: int) → None[source]#

Validate the loaded UMAP model. Checks that the loaded object is a UMAP instance and that its input and output dimensions match the expected values.

Parameters:

reducer (object) – The loaded model object to validate.
expected_input_dim (int) – The expected number of input features for the loaded model.

Raises:

ValueError – If the loaded model is not a UMAP instance or if its input/output dimensions are incompatible.

fit(data_sample: numpy.ndarray)[source]#

Fit the UMAP model to a sample of inference data. The fitted model is stored in the instance variable self.reducer and can be used for transforming data.

Parameters:: data_sample (numpy.ndarray) – The data sample used to fit the model.

transform(args: dict, num_batches: int)[source]#

Transform data with a fitted UMAP model. Use parallel processing if specified in the config.

Parameters:

args (dict) – A dictionary containing the data to be transformed.
num_batches (int) – The total number of batches that the data is split into for transformation.
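
The optional parallelism mentioned above could look roughly like the sketch below. The source does not say which config key enables it or whether threads or processes are used; the `"parallel"` key, `transform_all`, and the choice of a thread pool are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def transform_all(batches, transform_batch, parallel, max_workers=4):
    """Apply transform_batch to every batch, preserving input order."""
    if not parallel:
        return [transform_batch(b) for b in batches]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the inputs.
        return list(pool.map(transform_batch, batches))


def double(batch):
    """Stand-in for a per-batch transform: doubles every value."""
    ids, values = batch
    return ids, [2 * v for v in values]


config = {"parallel": True}  # hypothetical config key
batches = [([0, 1], [1.0, 2.0]), ([2], [3.0])]
results = transform_all(batches, double, parallel=config["parallel"])
```

Because results come back in input order, the serial and parallel paths produce the same output and can be swapped by configuration alone.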

class PCA(config: dict, reduction_results=None)[source]#

Bases: hyrax.verbs.reduction_algorithms.algorithm_registry.ReductionAlgorithm

PCA reduction implementation.

reducer#

save_model(results_dir: pathlib.Path)[source]#

Save the fitted PCA model to a pickle file.

Parameters:: results_dir (Path) – The directory where the model should be saved. The model will be saved as ‘pca.pickle’ in this directory.

load_model(expected_input_dim: int, model_path: pathlib.Path | str | None = None)[source]#

Load a pre-existing PCA model from disk.

Parameters:

expected_input_dim (int) – The expected number of input features for the loaded model.
model_path (Path or str, optional) – The path to the file to load the model from. If not specified, the method looks in the config for a default model path.

_validate_pca_model(reducer, expected_input_dim: int) → None[source]#

Validate the loaded PCA model. Checks that the loaded object is a PCA instance and that its input and output dimensions match the expected values.

Parameters:

reducer (object) – The loaded model object to validate.
expected_input_dim (int) – The expected number of input features for the loaded model.

Raises:

ValueError – If the loaded model is not a PCA instance or if its input/output dimensions are incompatible.
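
The checks described above can be sketched as follows. `FittedModel` is a stand-in whose attribute names follow the scikit-learn fitted-estimator convention (`n_features_in_`, `n_components_`); `validate_model` and its `expected_output_dim` parameter are hypothetical, since the source does not say where the expected output dimension comes from.

```python
from dataclasses import dataclass


@dataclass
class FittedModel:
    """Stand-in for a fitted estimator loaded from a pickle file."""

    n_features_in_: int
    n_components_: int


def validate_model(reducer, expected_type, expected_input_dim,
                   expected_output_dim):
    """Raise ValueError if the loaded object is unusable."""
    if not isinstance(reducer, expected_type):
        raise ValueError(
            f"Expected {expected_type.__name__}, "
            f"got {type(reducer).__name__}"
        )
    if reducer.n_features_in_ != expected_input_dim:
        raise ValueError(
            f"Model expects {reducer.n_features_in_} input features, "
            f"but the data has {expected_input_dim}"
        )
    if reducer.n_components_ != expected_output_dim:
        raise ValueError(
            f"Model produces {reducer.n_components_} dimensions, "
            f"but {expected_output_dim} were requested"
        )


model = FittedModel(n_features_in_=64, n_components_=2)
# Matching dimensions: no exception is raised.
validate_model(model, FittedModel, 64, 2)

try:
    # Mismatched input dimension: triggers the second check.
    validate_model(model, FittedModel, 128, 2)
except ValueError as err:
    error_message = str(err)
```

Failing at load time gives a clear message up front, rather than a shape error partway through a transform.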

fit(data_sample: numpy.ndarray)[source]#

Fit the PCA model to a sample of inference data. The fitted model is stored in the instance variable self.reducer and can be used for transforming data.

Parameters:: data_sample (numpy.ndarray) – The data sample used to fit the model.

transform(args: dict, num_batches: int)[source]#

Transform the data with the fitted PCA model. Use parallel processing if specified in the config.

Parameters:

args (dict) – A dictionary containing the data to be transformed.
num_batches (int) – The total number of batches that the data is split into for transformation.

class TSNE(config: dict, reduction_results=None)[source]#

Bases: hyrax.verbs.reduction_algorithms.algorithm_registry.ReductionAlgorithm

TSNE reduction implementation.

reducer#

save_model(_)[source]#: TSNE does not support saving the model. This method is a no-op.

load_model(_)[source]#: TSNE does not support loading a pre-existing model. This method is a no-op.

fit(_)[source]#: TSNE does not support a separate fitting stage. This method is a no-op.

transform(args: dict, num_batches: int)[source]#

Fit and transform data with TSNE model.

Parameters:

args (dict) – A dictionary containing the data to be transformed.
num_batches (int) – The total number of batches that the data is split into for transformation.

_fit_transform_batch(batch_tuple: tuple)[source]#

Private helper to fit_transform a single batch.

Parameters:: batch_tuple (tuple) – A pair whose first element is the batch IDs as a numpy array and whose second element is the inference results to transform, a numpy array of shape (batch_len, N), where N is the total number of dimensions in the inference result. The caller flattens all inference result axes beforehand.
Returns:: A pair whose first element is the batch IDs as a numpy array and whose second element is the result of running the TSNE transform on the input, as a numpy array.
Return type:: tuple