Convert Hyrax Results to a Pandas DataFrame

After you run inference with Hyrax, the results are saved in Lance, a columnar data format optimized for ML workloads.

This notebook shows two ways to load those results into a familiar Pandas DataFrame:

  • LanceDB: provides a SQL-like query interface for easy filtering and selection.

  • PyLance: the lance Python library, which provides direct, low-level access to the data on disk.

Note: Neither approach requires Hyrax to be installed. Besides pandas itself, you only need lancedb for Option 1 or lance (installed from PyPI as pylance) for Option 2.

The example below uses results produced by the Getting Started notebook.

Setup

Set result_directory to the path of your Hyrax output directory. Here we use the location of the saved predictions from the Getting Started notebook.

[1]:
from pathlib import Path

result_directory = Path("./example_results/getting_started_results")
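Before loading anything, it can help to confirm that the directory has the layout the rest of this notebook relies on. The sketch below is only an illustration: find_results_dataset is a hypothetical helper (not part of Hyrax or Lance), and it assumes the results live at lance_db/results.lance inside the output directory, as in the examples that follow. The temporary directory merely stands in for a real Hyrax output directory.

```python
import tempfile
from pathlib import Path


def find_results_dataset(result_directory):
    """Hypothetical helper: return <result_directory>/lance_db/results.lance.

    Raises FileNotFoundError if that path is missing. The layout is an
    assumption taken from the examples in this notebook.
    """
    dataset_path = Path(result_directory) / "lance_db" / "results.lance"
    if not dataset_path.exists():
        raise FileNotFoundError(f"No Lance dataset found at {dataset_path}")
    return dataset_path


# Demo on a throwaway directory that mimics the assumed layout.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "lance_db" / "results.lance").mkdir(parents=True)
    found = find_results_dataset(tmp)
    found_parts = found.parts[-2:]
    print(found_parts)
```

With real results, calling find_results_dataset(result_directory) would either return the dataset path or fail early with a readable error.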

Option 1: LanceDB

Connect to the results directory using lancedb, open the "results" table, and convert it to a Pandas DataFrame.

[2]:
import lancedb

lance_dir = result_directory / "lance_db"
db = lancedb.connect(str(lance_dir))

table = db.open_table("results")
df = table.to_pandas()
df.head()
[2]:
object_id data
0 00000 [0.096435286, -2.6353374, 1.7344711, 2.0339143...
1 00001 [4.793885, 6.458918, 0.20510733, -2.3948255, -...
2 00002 [2.7748845, 3.6781337, 0.251015, -0.95724285, ...
3 00003 [3.8944254, 1.8255252, 0.85703826, -1.0406122,...
4 00004 [-2.8371797, -2.5587287, 2.6390426, 2.211744, ...

Option 2: Lance

The lance library reads a Lance dataset directly from disk. Note that the path points to the specific dataset, results.lance, inside lance_db/ rather than to the lance_db/ directory itself. (A .lance dataset is itself a directory on disk, not a single file.)

[3]:
import lance

dataset_path = result_directory / "lance_db" / "results.lance"
ds = lance.dataset(dataset_path)

table = ds.to_table()
df = table.to_pandas()
df.head()
[3]:
object_id data
0 00000 [0.096435286, -2.6353374, 1.7344711, 2.0339143...
1 00001 [4.793885, 6.458918, 0.20510733, -2.3948255, -...
2 00002 [2.7748845, 3.6781337, 0.251015, -0.95724285, ...
3 00003 [3.8944254, 1.8255252, 0.85703826, -1.0406122,...
4 00004 [-2.8371797, -2.5587287, 2.6390426, 2.211744, ...

Both approaches produce a standard Pandas DataFrame, so you can share and explore the results without depending on Hyrax; only lancedb or lance is required.
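Each entry in the data column is a vector, so a common next step when exploring the DataFrame is to stack that column into a single 2D NumPy array. The sketch below uses a small made-up DataFrame in place of real results and assumes every row holds a vector of the same length; np.stack raises an error otherwise.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the DataFrame produced by either option above.
df = pd.DataFrame(
    {
        "object_id": ["00000", "00001", "00002"],
        "data": [
            np.array([0.1, -2.6, 1.7, 2.0], dtype=np.float32),
            np.array([4.8, 6.5, 0.2, -2.4], dtype=np.float32),
            np.array([2.8, 3.7, 0.3, -1.0], dtype=np.float32),
        ],
    }
)

# One row per object, one column per vector element.
features = np.stack(df["data"].tolist())
print(features.shape)  # (3, 4)
```

The rows of features stay in the same order as df, so df["object_id"] can still be used to label them.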