Hyrax

What is Hyrax?

Hyrax is an extensible, GPU-enabled framework that provides infrastructure for the full machine learning (ML) lifecycle in astronomy, from data acquisition and training to inference and experiment comparison. Its capabilities include multimodal dataset support, integrated vector databases for similarity search, and interactive 2D/3D latent-space exploration for unsupervised discovery.


Why Hyrax?

With current and upcoming large astronomical surveys producing data at unprecedented scale, the limiting factor for ML-driven discovery is increasingly not the data itself but the infrastructure required to work with it. Astronomers routinely spend much of their time on data wrangling, configuration management, and bespoke pipeline engineering. That effort comes directly at the expense of science, and because it is rarely reusable by other research groups, it is often duplicated across teams.

Hyrax Design Philosophy

Hyrax lets users focus on writing their ML model code (center), while it provides astronomy-aware infrastructure for everything else shown in the diagram. Figure from Ghosh, Oldag & Tauraso et al.


The Hyrax Workflow

Hyrax is built around a small set of verbs that cover the main stages of an astronomy ML workflow, from data access and training to inference, similarity search, and interactive exploration.

Hyrax ML Workflow

A typical Hyrax workflow. Retrieved or user-provided data are organized into astronomy-aware datasets, then passed through training and inference. For unsupervised workflows, Hyrax also supports vector-database search and interactive latent-space visualization.

from hyrax import Hyrax

# Load a runtime configuration that defines the dataset, model, outputs, etc.
h = Hyrax(config_file="path/to/runtime_config.toml")

h.download()              # Retrieve cutouts from LSST, HSC, or other surveys
h.train()                 # Train any PyTorch model with automatic logging & multi-GPU support
h.infer()                 # Run inference and store results
h.save_to_database()      # Index embeddings in a vector database
h.umap()                  # Reduce latent vectors to 2D/3D with UMAP
h.visualize()             # Interactively explore latent spaces in 2D or 3D
db = h.database_connection()  # Open a connection to the vector database
v = ...                   # Placeholder: a NumPy vector with the same length as the stored embeddings
db.search_by_vector(v)    # Find similar objects via integrated vector databases

Each step can be used on its own, or combined into an end-to-end workflow.
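Conceptually, a similarity search like `db.search_by_vector(v)` returns the stored objects whose embeddings lie closest to a query vector. The sketch below illustrates that idea with an exact, brute-force cosine-similarity search in NumPy. The function name `search_by_cosine` and the choice of cosine similarity are assumptions for illustration; this is not Hyrax's implementation or the behavior of any particular vector-database backend.

```python
import numpy as np


def search_by_cosine(embeddings: np.ndarray, query: np.ndarray, k: int = 5):
    """Hypothetical helper: indices and cosine similarities of the k stored
    vectors closest to `query`, most similar first.

    This is an exact brute-force scan; vector databases replace it with an
    index so queries stay fast as the number of stored embeddings grows.
    """
    # Normalize rows so that a dot product equals cosine similarity.
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    sims = emb @ q                    # one similarity score per stored object
    k = min(k, len(sims))
    top = np.argsort(-sims)[:k]       # highest similarity first
    return top, sims[top]


# Synthetic example: 1,000 objects with 64-dimensional embeddings.
rng = np.random.default_rng(seed=0)
embeddings = rng.normal(size=(1000, 64))
query = embeddings[42] + 0.01 * rng.normal(size=64)  # slightly perturbed copy of object 42
idx, scores = search_by_cosine(embeddings, query, k=3)
print(idx, scores)
```

Because the query is a near-copy of object 42, that object comes back first; the remaining hits are simply the next-closest random vectors.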


Science with Hyrax

Hyrax is science-agnostic and is designed to support a wide range of astronomy workflows, from ML-based classification/regression problems to discovery-oriented latent-space exploration. It can work on images, light curves, spectra, and combinations thereof.

Below is a partial list of science efforts using Hyrax, led by different PIs:

Extragalactic Unsupervised Discovery

Rubin DP1 | HSC | Unsupervised | Galaxies

Multi-model representation learning project to surface mergers, low-surface-brightness galaxies, and scientifically interesting outliers without any labeled training data.

Cluster-Scale Lens Searches

Rubin DP1 | Euclid | Human in the Loop | Galaxies

A hybrid workflow combining latent-space clustering and visual inspection to identify lensed arcs in cluster environments.
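The cluster-then-inspect pattern described above can be sketched in a few lines: cluster the latent vectors, then queue the members nearest each cluster center for a human to look at. This is a minimal illustration assuming plain k-means (Lloyd's algorithm) on precomputed embeddings; the helper names `kmeans` and `inspection_queue` are hypothetical, and the lens-search project may use a different clustering method and selection rule.

```python
import numpy as np


def _assign(x: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Label each point with the index of its nearest centroid."""
    # Squared distances, shape (n_points, n_clusters).
    d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)


def kmeans(x: np.ndarray, n_clusters: int, n_iter: int = 50, seed: int = 0):
    """Plain Lloyd's k-means; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=n_clusters, replace=False)]
    for _ in range(n_iter):
        labels = _assign(x, centroids)
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        new = np.array([x[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
                        for c in range(n_clusters)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return _assign(x, centroids), centroids


def inspection_queue(x, labels, centroids, per_cluster=3):
    """Hypothetical selection rule: for each cluster, the members closest to its center."""
    queue = {}
    for c, center in enumerate(centroids):
        members = np.flatnonzero(labels == c)
        dist = np.linalg.norm(x[members] - center, axis=1)
        queue[c] = members[np.argsort(dist)[:per_cluster]]
    return queue


# Synthetic latent vectors: three blobs of 50 objects each in 8 dimensions.
rng = np.random.default_rng(1)
centers = rng.normal(scale=10.0, size=(3, 8))
x = np.concatenate([c + rng.normal(size=(50, 8)) for c in centers])
labels, centroids = kmeans(x, n_clusters=3)
queue = inspection_queue(x, labels, centroids, per_cluster=3)
for cluster, members in queue.items():
    print(cluster, members)
```

Picking the most central members gives a reviewer a quick sense of what each cluster represents; a search for rare objects might instead queue the members farthest from their center.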

Multimodal Transient Classification

ZTF + Spectra | Supervised | Time Domain

An AppleCiDEr-based workflow to classify transients using a combination of light curves, spectra, cutout images, and metadata.
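A common baseline for combining inputs like these is to compute one feature vector per modality and concatenate them into a single vector per object before a classifier. The sketch below shows that feature-level fusion in NumPy. It is an assumption for illustration only, not a description of how the AppleCiDEr-based workflow combines modalities; the function name `fuse_modalities`, the modality names, and the feature sizes are hypothetical.

```python
import numpy as np


def fuse_modalities(features: dict[str, np.ndarray]) -> np.ndarray:
    """Concatenate per-object feature blocks from several modalities into one matrix.

    Each block is standardized (zero mean, unit variance per column) first, so no
    modality dominates simply because of its numeric scale.
    """
    blocks = []
    n_objects = None
    for name in sorted(features):              # fixed order keeps column layout reproducible
        block = np.asarray(features[name], dtype=float)
        if n_objects is None:
            n_objects = block.shape[0]
        elif block.shape[0] != n_objects:
            raise ValueError(f"modality {name!r} has {block.shape[0]} rows, expected {n_objects}")
        std = block.std(axis=0)
        std[std == 0] = 1.0                     # avoid dividing constant columns by zero
        blocks.append((block - block.mean(axis=0)) / std)
    return np.concatenate(blocks, axis=1)


# Synthetic features for 100 objects, one row per object in every modality.
rng = np.random.default_rng(0)
n = 100
features = {
    "light_curve": rng.normal(size=(n, 32)),
    "spectrum": rng.normal(loc=5.0, scale=100.0, size=(n, 20)),  # very different scale
    "image": rng.normal(size=(n, 16)),
    "metadata": rng.uniform(size=(n, 4)),
}
fused = fuse_modalities(features)
print(fused.shape)  # (100, 72)
```

Concatenation is the simplest fusion strategy; learned approaches instead train a separate encoder per modality and combine the encoder outputs inside the model.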

Asteroid Search Filtering

DECam | Supervised | Solar System

A deep-learning-based algorithm that filters out false positives in moving-object searches performed by KBMOD.

Detailed write-ups of each application are in preparation and will be released soon.


First Steps

Getting Started

Install Hyrax and train your first model

Science Examples

End-to-end workflows on real data

Core Concepts

Deep dives to get the most out of Hyrax

Common Workflows

Reusable recipes for common Hyrax tasks


Citing Hyrax

If you use Hyrax in your research, please cite the following paper:

Ghosh, Oldag & Tauraso et al. 2026, Hyrax: An Extensible Framework for Rapid ML Experimentation and Unsupervised Discovery in the Era of Rubin, Roman, and Euclid

@article{Ghosh_Oldag_Tauraso_2026,
author = {Aritra Ghosh and Drew Oldag and Michael Tauraso and Andrew J. Connolly and Peter Ferguson and Derek Jones and Gourav Khullar and Argyro Sasli and Samarth Venkatesh and Gracia Wang and Maxine West and Dylan Berry and Neven Caplar and Colin Orion Chandler and Tanawan Chatchadanoraset and Michael W. Coughlin and Melissa DeLucchi and Alexandra Junell and Diego Miura and Felipe Fontinele Nunes and Wilson Beebe and Doug Branton and Sandro Campos and Liam Cunningham and Mi Dai and Jeremy Kubica and Konstantin Malanchev and Rachel Mandelbaum and Sean McGuire and Imad Pasha and Dan S. Taranu and Tianqing Zhang},
journal = {arXiv e-prints},
title = {Hyrax: An Extensible Framework for Rapid ML Experimentation and Unsupervised Discovery in the Era of Rubin, Roman, and Euclid},
eprint = {2605.18959},
archivePrefix = {arXiv},
year = {2026},
}