What is Hyrax?
Hyrax is an extensible, GPU-enabled framework that provides infrastructure for the full ML lifecycle in astronomy, from data acquisition and training to inference and experiment comparison. Its capabilities include multimodal dataset support, integrated vector databases for similarity search, and interactive 2D/3D latent-space exploration for unsupervised discovery.
Why Hyrax?
With current and upcoming large astronomical surveys producing data at unprecedented scale, the limiting factor for ML-driven discovery is increasingly not the data itself but the infrastructure required to work with it. Astronomers routinely spend a significant amount of their time on data wrangling, configuration management, and bespoke pipeline engineering. This effort comes directly at the expense of science and is often not reusable by other research groups, resulting in duplicated work.
Hyrax lets users focus on writing their ML model code (center), while it provides astronomy-aware infrastructure to handle everything else shown in this diagram. Figure from Ghosh, Oldag & Tauraso et al.
The Hyrax Workflow
Hyrax is built around a small set of verbs that cover the main stages of an astronomy ML workflow, from data access and training to inference, similarity search, and interactive exploration.
A typical Hyrax workflow. Retrieved or user-provided data are organized into astronomy-aware datasets, then passed through training and inference. For unsupervised workflows, Hyrax also supports vector-database search and interactive latent-space visualization.
from hyrax import Hyrax
# Load a runtime configuration that defines the dataset, model, outputs, etc.
h = Hyrax(config_file="path/to/runtime_config.toml")
h.download() # Retrieve cutouts from LSST, HSC, or other surveys
h.train() # Train any PyTorch model with automatic logging & multi-GPU support
h.infer() # Run inference and store results
h.save_to_database() # Index embeddings in a vector database
h.umap() # Reduce latent vectors to 2D/3D with UMAP
h.visualize() # Interactively explore latent spaces in 2D or 3D
db = h.database_connection()
v = ... # numpy vector representing the object to search for
db.search_by_vector(v) # Find similar objects via integrated vector databases
Each step can be used on its own, or combined into an end-to-end workflow.
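To make the similarity-search step more concrete, the sketch below shows the general idea behind looking up nearest neighbours of a latent vector, which is what a call like `db.search_by_vector(v)` is described as doing. It is a minimal NumPy illustration under stated assumptions, not Hyrax's implementation: the function name `cosine_top_k`, the toy embeddings, and the choice of cosine similarity are all hypothetical, and a real vector database would typically use an approximate index rather than a brute-force scan.

```python
import numpy as np


def cosine_top_k(query, embeddings, k=3):
    """Return indices and scores of the k rows of `embeddings` most similar to `query`.

    Hypothetical helper for illustration only; it is not part of the Hyrax API.
    """
    # Normalize so that a dot product equals cosine similarity.
    query_unit = query / np.linalg.norm(query)
    emb_unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    scores = emb_unit @ query_unit
    # Sort by descending similarity and keep the first k entries.
    order = np.argsort(-scores)[:k]
    return order, scores[order]


# Toy "latent vectors": four objects embedded in a 3-D latent space.
embeddings = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
query = np.array([1.0, 0.05, 0.0])

indices, scores = cosine_top_k(query, embeddings, k=2)
print(indices)  # the two objects closest in direction to the query
```

A brute-force scan like this costs one dot product per stored object, which is fine for small experiments but is the reason large catalogs are usually indexed in a dedicated vector database.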
Science with Hyrax
Hyrax is science-agnostic and is designed to support a wide range of astronomy workflows, from ML-based classification/regression problems to discovery-oriented latent-space exploration. It can work on images, light curves, spectra, and combinations thereof.
Below is a non-exhaustive list of Hyrax science efforts being led by different PIs:
Rubin DP1 | HSC Unsupervised Galaxies
Multi-model representation learning project to surface mergers, low-surface-brightness galaxies, and scientifically interesting outliers without any labeled training data.
Rubin DP1 | Euclid Human in the Loop Galaxies
A hybrid workflow combining latent-space clustering and visual inspection to identify lensed arcs in cluster environments.
ZTF + Spectra Supervised Time Domain
An AppleCiDEr-based workflow to classify transients using a combination of light curves, spectra, cutout images, and metadata.
DECam Supervised Solar System
A deep-learning-based algorithm to filter out false positives in moving-object searches performed by KBMOD.
Detailed writeups for each of these applications are in preparation and will be available soon.
First Steps
Install Hyrax and train your first model
End-to-end workflows on real data
Deep dives to get the most out of Hyrax
Reusable recipes for common Hyrax tasks
Citing Hyrax
If you use Hyrax in your research, please cite the following paper:
Ghosh, Oldag & Tauraso et al. 2026, Hyrax: An Extensible Framework for Rapid ML Experimentation and Unsupervised Discovery in the Era of Rubin, Roman, and Euclid
@article{Ghosh_Oldag_Tauraso_2026,
author = {Aritra Ghosh and Drew Oldag and Michael Tauraso and Andrew J. Connolly and Peter Ferguson and Derek Jones and Gourav Khullar and Argyro Sasli and Samarth Venkatesh and Gracia Wang and Maxine West and Dylan Berry and Neven Caplar and Colin Orion Chandler and Tanawan Chatchadanoraset and Michael W. Coughlin and Melissa DeLucchi and Alexandra Junell and Diego Miura and Felipe Fontinele Nunes and Wilson Beebe and Doug Branton and Sandro Campos and Liam Cunningham and Mi Dai and Jeremy Kubica and Konstantin Malanchev and Rachel Mandelbaum and Sean McGuire and Imad Pasha and Dan S. Taranu and Tianqing Zhang},
journal = {arXiv e-prints},
title = {Hyrax: An Extensible Framework for Rapid ML Experimentation and Unsupervised Discovery in the Era of Rubin, Roman, and Euclid},
eprint = {2605.18959},
archivePrefix = {arXiv},
year = {2026},
}