Hyrax

What is Hyrax?

Hyrax is an extensible, GPU-enabled framework that provides infrastructure for the full machine-learning (ML) lifecycle in astronomy, from data acquisition and training to inference and experiment comparison. Its capabilities include multimodal dataset support, integrated vector databases for similarity search, and interactive 2D/3D latent-space exploration for unsupervised discovery.


Why Hyrax?

With current and upcoming large astronomical surveys producing data at unprecedented scale, the limiting factor for ML-driven discovery is increasingly not the data itself but the infrastructure required to work with it. Astronomers routinely spend much of their time on data wrangling, configuration management, and bespoke pipeline engineering. That effort comes directly at the expense of science, and because it is rarely reusable by other research groups, it is often duplicated across teams.

Hyrax Design Philosophy

Hyrax lets users focus on writing their ML model code (center), while it provides astronomy-aware infrastructure for everything else shown in the diagram. Figure from Ghosh, Oldag & Tauraso et al.


The Hyrax Workflow

Hyrax is built around a small set of verbs that cover the main stages of an astronomy ML workflow, from data access and training to inference, similarity search, and interactive exploration.

Hyrax ML Workflow

A typical Hyrax workflow. Retrieved or user-provided data are organized into astronomy-aware datasets, then passed through training and inference. For unsupervised workflows, Hyrax also supports vector-database search and interactive latent-space visualization.

from hyrax import Hyrax

# Load a runtime configuration that defines the dataset, model, outputs, etc.
h = Hyrax(config_file="path/to/runtime_config.toml")

h.download()              # Retrieve cutouts from LSST, HSC, or other surveys
h.train()                 # Train any PyTorch model with automatic logging & multi-GPU support
h.infer()                 # Run inference and store results
h.save_to_database()      # Index embeddings in a vector database
h.umap()                  # Reduce latent vectors to 2D/3D with UMAP
h.visualize()             # Interactively explore latent spaces in 2D or 3D
db = h.database_connection()
v = ...                   # numpy vector representing the object to search for
db.search_by_vector(v)    # Find similar objects via integrated vector databases

Each step can be used on its own, or combined into an end-to-end workflow.
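Conceptually, the similarity search in the last step ranks stored embeddings by how close they are to a query vector. The sketch below shows that ranking as a brute-force cosine-similarity search in NumPy. It is not Hyrax's implementation: the helper `cosine_top_k` and the random 64-dimensional latents are hypothetical, and the vector databases Hyrax integrates may use a different distance metric and an index rather than a linear scan.

```python
import numpy as np


def cosine_top_k(embeddings: np.ndarray, query: np.ndarray, k: int = 5) -> np.ndarray:
    """Hypothetical helper: indices of the k rows of `embeddings` most similar to `query`.

    Uses exact (brute-force) cosine similarity and assumes no all-zero vectors.
    """
    # Normalize each stored embedding and the query so a dot product is cosine similarity.
    emb_unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    query_unit = query / np.linalg.norm(query)
    scores = emb_unit @ query_unit
    # argsort sorts ascending; reverse it so the most similar objects come first.
    return np.argsort(scores)[::-1][:k]


# Hypothetical latent space: 1,000 objects with 64-dimensional embeddings.
rng = np.random.default_rng(seed=0)
latents = rng.normal(size=(1000, 64))

# Query with a slightly perturbed copy of object 42; it should rank itself first.
query = latents[42] + 0.01 * rng.normal(size=64)
print(cosine_top_k(latents, query, k=3))
```

A linear scan like this costs O(N·d) per query for N objects with d-dimensional embeddings, which is why an indexed vector database becomes worthwhile once a survey's catalog of embeddings grows to millions of objects.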


Science with Hyrax

Hyrax is science-agnostic and is designed to support a wide range of astronomy workflows, from ML-based classification/regression problems to discovery-oriented latent-space exploration. It can work on images, light curves, spectra, and combinations thereof.

Below is a partial list of science efforts using Hyrax, each led by a different PI:

Extragalactic Unsupervised Discovery

Rubin DP1 | HSC · Unsupervised · Galaxies

A multi-model representation-learning project that surfaces mergers, low-surface-brightness galaxies, and scientifically interesting outliers without any labeled training data.

Cluster-Scale Lens Searches

Rubin DP1 | Euclid · Human in the Loop · Galaxies

A hybrid workflow combining latent-space clustering and visual inspection to identify lensed arcs in cluster environments.

Multimodal Transient Classification

ZTF + Spectra · Supervised · Time Domain

An AppleCiDEr-based workflow to classify transients using a combination of light curves, spectra, cutout images, and metadata.

Asteroid Search Filtering

DECam · Supervised · Solar System

A deep-learning-based algorithm that filters out false positives from moving-object searches performed with KBMOD.

Detailed write-ups for each of these applications are in preparation and will be released soon.


First Steps

Getting Started

Install Hyrax and train your first model

Science Examples

End-to-end workflows on real data

Core Concepts

Deep dives to get the most out of Hyrax

Common Workflows

Reusable recipes for common Hyrax tasks
