hyrax.verbs.reduction_algorithms
================================

.. py:module:: hyrax.verbs.reduction_algorithms


Submodules
----------

.. toctree::
   :maxdepth: 1

   /autoapi/hyrax/verbs/reduction_algorithms/algorithm_registry/index
   /autoapi/hyrax/verbs/reduction_algorithms/pca/index
   /autoapi/hyrax/verbs/reduction_algorithms/tsne/index
   /autoapi/hyrax/verbs/reduction_algorithms/umap/index


Classes
-------

.. autoapisummary::

   hyrax.verbs.reduction_algorithms.ReductionAlgorithm
   hyrax.verbs.reduction_algorithms.UMAP
   hyrax.verbs.reduction_algorithms.PCA
   hyrax.verbs.reduction_algorithms.TSNE


Package Contents
----------------

.. py:class:: ReductionAlgorithm(config: dict, reduction_results: ResultDatasetWriter | None = None)

   Abstract base class for all reduction algorithms.


   .. py:attribute:: _config


   .. py:attribute:: _reduction_results
      :value: None


   .. py:attribute:: reducer
      :value: None


   .. py:property:: config

      Return the configuration dictionary for this reduction algorithm.


   .. py:property:: reduction_results

      Return the result dataset writer for this reduction algorithm.


   .. py:method:: __init_subclass__()
      :classmethod:
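
      The source does not describe this hook. Because the base class lives in
      ``algorithm_registry``, one plausible use is recording each subclass by name.
      The sketch below shows that general Python pattern only. ``REGISTRY``, ``Base``
      and ``ToyPCA`` are hypothetical names, and the lowercase-name key is an assumption,
      not Hyrax's actual behavior.

      .. code-block:: python

         # Hypothetical registry pattern; all names here are illustrative.
         REGISTRY: dict[str, type] = {}


         class Base:
             def __init_subclass__(cls, **kwargs):
                 super().__init_subclass__(**kwargs)
                 # Record every subclass under its lowercase class name (assumed key).
                 REGISTRY[cls.__name__.lower()] = cls


         class ToyPCA(Base):
             """Defining the class is enough to register it."""


         registered = REGISTRY["toypca"]

      With this pattern, a lookup by name returns the class itself, so a caller can
      instantiate an algorithm chosen in configuration without importing it explicitly.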


   .. py:method:: fit(data_sample: numpy.ndarray)

      Fit the reduction algorithm to the data.
      Set the internal state of the reducer based on the provided data sample.

      :param data_sample: The data sample used to fit the model.
      :type data_sample: numpy.ndarray


   .. py:method:: transform(args: dict, num_batches: int)
      :abstractmethod:


      Transform the data with a fitted reducer.

      :param args: A dictionary containing the data to be transformed.
      :type args: dict
      :param num_batches: The total number of batches that the data is split into for transformation.
      :type num_batches: int


   .. py:method:: save_model(model_path: Union[pathlib.Path, str] | None = None)

      Save the reducer model to a pickle file.

      :param model_path: The path to save the model to.
      :type model_path: Path or str, optional


   .. py:method:: load_model(expected_input_dim: int, model_path: Union[pathlib.Path, str] | None = None)

      Load the reducer model from a file.

      :param expected_input_dim: The expected number of input features for the loaded model.
      :type expected_input_dim: int
      :param model_path: The path to the file to load the model from.
      :type model_path: Path or str, optional

      :returns: The reduction algorithm instance with the loaded model.
      :rtype: ReductionAlgorithm


   .. py:method:: _load_pickle(model_path: Union[pathlib.Path, str])

      Load a pickle file from the given path. The load is wrapped in its own helper so that tests can replace it.

      :param model_path: The file path to the pickle file.
      :type model_path: Path or str

      :returns: The object loaded from the pickle file.
      :rtype: object
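
      As an illustration of why a separate loading helper eases testing, the sketch
      below patches the helper so that no file is read. ``ToyAlgorithm`` is a
      hypothetical stand-in, not the Hyrax class, and its ``load_model`` is simplified
      to a single path argument.

      .. code-block:: python

         import pickle
         from pathlib import Path
         from unittest import mock


         class ToyAlgorithm:
             """Hypothetical stand-in with the same helper split."""

             reducer = None

             def _load_pickle(self, model_path):
                 # The only place that touches the file system.
                 with open(model_path, "rb") as f:
                     return pickle.load(f)

             def load_model(self, model_path):
                 self.reducer = self._load_pickle(model_path)
                 return self


         fake_reducer = object()

         # Replace the helper for the duration of the block; the path is never opened.
         with mock.patch.object(ToyAlgorithm, "_load_pickle", return_value=fake_reducer):
             algo = ToyAlgorithm().load_model(Path("not/a/real/file.pickle"))

      Outside of tests, note that unpickling executes code from the file, so model
      files should only be loaded from trusted sources.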


   .. py:method:: _transform_batch(batch_tuple: tuple)

      Private helper to transform a single batch with the fitted reducer.

      :param batch_tuple: A two-element tuple. The first element is the IDs of the batch as a numpy array.
                          The second element is the inference results to transform, as a numpy array with
                          shape ``(batch_len, N)``, where ``N`` is the total number of dimensions in the
                          inference result. The caller flattens all inference result axes beforehand.
      :type batch_tuple: tuple

      :returns: A two-element tuple. The first element is the IDs of the batch as a numpy array.
                The second element is the result of running the transform on the input, as a numpy array.
      :rtype: tuple
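
      The input and output shapes described above can be sketched as follows.
      ``ToyReducer`` and ``transform_batch`` are hypothetical stand-ins, and the
      reducer here simply keeps the first two columns. It is not a real reduction.

      .. code-block:: python

         import numpy as np


         class ToyReducer:
             """Hypothetical fitted reducer: keeps the first two features."""

             def transform(self, data: np.ndarray) -> np.ndarray:
                 return data[:, :2]


         def transform_batch(reducer, batch_tuple):
             ids, inference_results = batch_tuple
             # IDs pass through unchanged; only the data is reduced.
             return ids, reducer.transform(inference_results)


         ids = np.array(["a", "b", "c"])
         raw = np.arange(3 * 4 * 5, dtype=float).reshape(3, 4, 5)

         # The caller flattens every axis after the batch axis: (3, 4, 5) -> (3, 20).
         flat = raw.reshape(len(raw), -1)

         out_ids, reduced = transform_batch(ToyReducer(), (ids, flat))

      Returning the IDs alongside the reduced data keeps each row traceable to its
      source item when batches are processed out of order.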


   .. py:method:: _log_memory_usage(message: str = '')
      :staticmethod:


      Log the current resident set size (RSS) memory usage of the current process in gigabytes.

      :param message: A descriptive message to include in the log output for context.
      :type message: str, optional

      .. rubric:: Notes

      This method is intended for debugging and performance monitoring.


.. py:class:: UMAP(config: dict, reduction_results=None)

   Bases: :py:obj:`hyrax.verbs.reduction_algorithms.algorithm_registry.ReductionAlgorithm`


   UMAP reduction implementation.


   .. py:attribute:: reducer


   .. py:method:: save_model(results_dir: pathlib.Path)

      Save the fitted UMAP model to a pickle file.

      :param results_dir: The directory where the model should be saved.
                          The model will be saved as 'umap.pickle' in this directory.
      :type results_dir: Path


   .. py:method:: load_model(expected_input_dim: int, model_path: Union[pathlib.Path, str] | None = None)

      Load a pre-existing UMAP model from disk.

      :param expected_input_dim: The expected number of input features for the loaded model.
      :type expected_input_dim: int
      :param model_path: The path to the file to load the model from.
                         If not specified, method will look in the config for a default model path.
      :type model_path: Path or str, optional


   .. py:method:: _validate_umap_model(reducer, expected_input_dim: int) -> None

      Validate the loaded UMAP model.
      Checks that the loaded object is a UMAP instance and that its
      input and output dimensions match the expected values.

      :param reducer: The loaded model object to validate.
      :type reducer: object
      :param expected_input_dim: The expected number of input features for the loaded model.
      :type expected_input_dim: int

      :raises ValueError: If the loaded model is not a UMAP instance or if its input/output dimensions are incompatible.


   .. py:method:: fit(data_sample: numpy.ndarray)

      Fit the UMAP model to a sample of inference data. The fitted model is stored in
      the instance variable ``self.reducer`` and can be used for transforming data.

      :param data_sample: The data sample used to fit the model.
      :type data_sample: numpy.ndarray


   .. py:method:: transform(args: dict, num_batches: int)

      Transform data with a fitted UMAP model. Use parallel processing if specified in the config.

      :param args: A dictionary containing the data to be transformed.
      :type args: dict
      :param num_batches: The total number of batches that the data is split into for transformation.
      :type num_batches: int
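
      A minimal sketch of config-controlled parallel batch transformation follows.
      The config key ``"parallel"``, the thread pool, and the column-slicing stand-in
      for the reducer are all assumptions made for illustration. The source does not
      say which key or which parallelism mechanism Hyrax uses.

      .. code-block:: python

         from concurrent.futures import ThreadPoolExecutor

         import numpy as np


         def transform_batch(batch_tuple):
             ids, data = batch_tuple
             # Stand-in for reducer.transform: keep the first two columns.
             return ids, data[:, :2]


         def transform_all(batches, config):
             if config.get("parallel", False):  # hypothetical config key
                 with ThreadPoolExecutor(max_workers=2) as pool:
                     # Executor.map yields results in input order.
                     return list(pool.map(transform_batch, batches))
             return [transform_batch(batch) for batch in batches]


         rng = np.random.default_rng(0)
         batches = [(np.arange(i * 4, (i + 1) * 4), rng.random((4, 6))) for i in range(3)]

         serial = transform_all(batches, {"parallel": False})
         parallel = transform_all(batches, {"parallel": True})

      Both paths return one ``(ids, reduced)`` tuple per batch in the same order, so
      downstream code does not need to know which path ran.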


.. py:class:: PCA(config: dict, reduction_results=None)

   Bases: :py:obj:`hyrax.verbs.reduction_algorithms.algorithm_registry.ReductionAlgorithm`


   PCA reduction implementation.


   .. py:attribute:: reducer


   .. py:method:: save_model(results_dir: pathlib.Path)

      Save the fitted PCA model to a pickle file.

      :param results_dir: The directory where the model should be saved.
                          The model will be saved as 'pca.pickle' in this directory.
      :type results_dir: Path


   .. py:method:: load_model(expected_input_dim: int, model_path: Union[pathlib.Path, str] | None = None)

      Load a pre-existing PCA model from disk.

      :param expected_input_dim: The expected number of input features for the loaded model.
      :type expected_input_dim: int
      :param model_path: The path to the file to load the model from.
                         If not specified, method will look in the config for a default model path.
      :type model_path: Path or str, optional


   .. py:method:: _validate_pca_model(reducer, expected_input_dim: int) -> None

      Validate the loaded PCA model.
      Checks that the loaded object is a PCA instance and that its
      input and output dimensions match the expected values.

      :param reducer: The loaded model object to validate.
      :type reducer: object
      :param expected_input_dim: The expected number of input features for the loaded model.
      :type expected_input_dim: int

      :raises ValueError: If the loaded model is not a PCA instance or if its input/output dimensions are incompatible.
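
      The checks described above can be sketched as follows. ``ToyPCA`` is a
      hypothetical stand-in whose attribute names mirror those of a fitted
      scikit-learn ``PCA`` (``n_features_in_`` and ``n_components_``). Passing
      ``expected_output_dim`` as an argument is an assumption made for the example;
      the documented method takes only ``expected_input_dim``.

      .. code-block:: python

         class ToyPCA:
             """Hypothetical stand-in for a fitted PCA model."""

             def __init__(self, n_features_in_: int, n_components_: int):
                 self.n_features_in_ = n_features_in_
                 self.n_components_ = n_components_


         def validate_model(reducer, expected_input_dim: int, expected_output_dim: int) -> None:
             if not isinstance(reducer, ToyPCA):
                 raise ValueError(f"Expected a ToyPCA model, got {type(reducer).__name__}.")
             if reducer.n_features_in_ != expected_input_dim:
                 raise ValueError(
                     f"Model expects {reducer.n_features_in_} input features, "
                     f"but the data has {expected_input_dim}."
                 )
             if reducer.n_components_ != expected_output_dim:
                 raise ValueError(
                     f"Model produces {reducer.n_components_} components, "
                     f"but {expected_output_dim} were requested."
                 )


         # A compatible model passes silently.
         validate_model(ToyPCA(64, 2), expected_input_dim=64, expected_output_dim=2)

         # A mismatched input width raises ValueError.
         try:
             validate_model(ToyPCA(64, 2), expected_input_dim=128, expected_output_dim=2)
         except ValueError as err:
             message = str(err)

      Failing at load time gives a clearer error than letting a shape mismatch
      surface later inside the transform.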


   .. py:method:: fit(data_sample: numpy.ndarray)

      Fit the PCA model to a sample of inference data. The fitted model is stored in
      the instance variable ``self.reducer`` and can be used for transforming data.

      :param data_sample: The data sample used to fit the model.
      :type data_sample: numpy.ndarray


   .. py:method:: transform(args: dict, num_batches: int)

      Transform the data with the fitted PCA model. Use parallel processing if specified in the config.

      :param args: A dictionary containing the data to be transformed.
      :type args: dict
      :param num_batches: The total number of batches that the data is split into for transformation.
      :type num_batches: int


.. py:class:: TSNE(config: dict, reduction_results=None)

   Bases: :py:obj:`hyrax.verbs.reduction_algorithms.algorithm_registry.ReductionAlgorithm`


   TSNE reduction implementation.


   .. py:attribute:: reducer


   .. py:method:: save_model(_)

      TSNE does not support saving the model. This method is a no-op.


   .. py:method:: load_model(_)

      TSNE does not support loading a pre-existing model. This method is a no-op.


   .. py:method:: fit(_)

      TSNE does not support a separate fitting stage. This method is a no-op.


   .. py:method:: transform(args: dict, num_batches: int)

      Fit and transform data with the TSNE model in a single step.

      :param args: A dictionary containing the data to be transformed.
      :type args: dict
      :param num_batches: The total number of batches that the data is split into for transformation.
      :type num_batches: int


   .. py:method:: _fit_transform_batch(batch_tuple: tuple)

      Private helper that runs ``fit_transform`` on a single batch.

      :param batch_tuple: A two-element tuple. The first element is the IDs of the batch as a numpy array.
                          The second element is the inference results to transform, as a numpy array with
                          shape ``(batch_len, N)``, where ``N`` is the total number of dimensions in the
                          inference result. The caller flattens all inference result axes beforehand.
      :type batch_tuple: tuple

      :returns: A two-element tuple. The first element is the IDs of the batch as a numpy array.
                The second element is the result of running the TSNE transform on the input, as a numpy array.
      :rtype: tuple