What is Hyrax?
Hyrax is an extensible, GPU-enabled framework that provides infrastructure for the full ML lifecycle in astronomy, from data acquisition and training to inference and experiment comparison. Its capabilities include multimodal dataset support, integrated vector databases for similarity search, and interactive 2D/3D latent-space exploration for unsupervised discovery.
Why Hyrax?
With current and upcoming large astronomical surveys producing data at unprecedented scale, the limiting factor for ML-driven discovery is increasingly not the data itself, but the infrastructure required to work with it. Astronomers routinely spend a significant amount of their time on data wrangling, configuration management, and bespoke pipeline engineering. That effort comes directly at the expense of science and is often not reusable by other research groups, which leads to duplicated work.
Hyrax lets users focus on writing their ML model code (center), while it provides astronomy-aware infrastructure to handle everything else shown in this diagram. Figure from Ghosh, Oldag & Tauraso et al.
The Hyrax Workflow
Hyrax is built around a small set of verbs that cover the main stages of an astronomy ML workflow, from data access and training to inference, similarity search, and interactive exploration.
A typical Hyrax workflow. Retrieved or user-provided data are organized into astronomy-aware datasets, then passed through training and inference. For unsupervised workflows, Hyrax also supports vector-database search and interactive latent-space visualization.
from hyrax import Hyrax

# Load a runtime configuration that defines the dataset, model, outputs, etc.
h = Hyrax(config_file="path/to/runtime_config.toml")

h.download()  # Retrieve cutouts from LSST, HSC, or other surveys
h.train()  # Train any PyTorch model with automatic logging & multi-GPU support
h.infer()  # Run inference and store results
h.save_to_database()  # Index embeddings in a vector database
h.umap()  # Reduce latent vectors to 2D/3D with UMAP
h.visualize()  # Interactively explore latent spaces in 2D or 3D

db = h.database_connection()
v = ...  # NumPy vector representing the object to search for
db.search_by_vector(v)  # Find similar objects via integrated vector databases
Each step can be used on its own, or combined into an end-to-end workflow.
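To make the final similarity-search step more concrete, here is a minimal, standard-library-only sketch of what searching by vector means conceptually: rank stored embeddings by cosine similarity to a query vector and return the closest matches. It does not show how Hyrax or its vector database backends implement search. The function names (`cosine_similarity`, `search_by_vector_bruteforce`), the object IDs, and the toy 3-dimensional vectors are all hypothetical and exist only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Convention for this sketch: a zero vector matches nothing
    return dot / (norm_a * norm_b)


def search_by_vector_bruteforce(query, embeddings, k=3):
    """Return the k (object_id, similarity) pairs most similar to `query`.

    `embeddings` maps a hypothetical object ID to its latent vector.
    This is an exhaustive scan, not an indexed vector-database lookup.
    """
    scored = [
        (obj_id, cosine_similarity(query, vec))
        for obj_id, vec in embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy latent vectors standing in for model embeddings (illustrative only).
toy_embeddings = {
    "obj_a": [1.0, 0.0, 0.0],
    "obj_b": [0.9, 0.1, 0.0],
    "obj_c": [0.0, 1.0, 0.0],
    "obj_d": [-1.0, 0.0, 0.0],
}

neighbors = search_by_vector_bruteforce([1.0, 0.05, 0.0], toy_embeddings, k=2)
print(neighbors)
```

An exhaustive scan like this costs time proportional to the number of stored vectors per query, which is why large embedding collections are typically indexed in a vector database instead.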
Science with Hyrax
Hyrax is science-agnostic and is designed to support a wide range of astronomy workflows, from ML-based classification/regression problems to discovery-oriented latent-space exploration. It can work on images, light curves, spectra, and combinations thereof.
Below is a non-exhaustive list of Hyrax science efforts led by different PIs:
Rubin DP1 | HSC · Unsupervised · Galaxies
A multi-model representation learning project to surface mergers, low-surface-brightness galaxies, and scientifically interesting outliers without any labeled training data.
Rubin DP1 | Euclid · Human in the Loop · Galaxies
A hybrid workflow combining latent-space clustering and visual inspection to identify lensed arcs in cluster environments.
ZTF + Spectra · Supervised · Time Domain
An AppleCiDEr-based workflow to classify transients using a combination of light curves, spectra, cutout images, and metadata.
DECam · Supervised · Solar System
A deep-learning-based algorithm to filter out false positives in moving-object searches performed by KBMOD.
Detailed writeups for each of these applications are in preparation and will be released soon.
First Steps
Install Hyrax and train your first model
End-to-end workflows on real data
Deep dives to get the most out of Hyrax
Reusable recipes for common Hyrax tasks