Convert Hyrax results to Parquet
After running inference with Hyrax, results are saved in the Lance format — a columnar format optimized for ML workloads. This notebook shows how to convert those results to Parquet files on disk.
For more information about working with results in memory, see:
Note: This notebook uses pre-computed inference results from the Getting Started notebook. Update
result_directory in the next cell to point to your own results.
[1]:
from pathlib import Path
import lancedb
import pyarrow.parquet as pq
result_directory = Path("./example_results/getting_started_results")
Simple conversion to Parquet
If your dataset fits comfortably in memory, you can convert it to Parquet in a few lines of code.
[2]:
lance_dir = result_directory / "lance_db"
db = lancedb.connect(str(lance_dir))
table = db.open_table("results")
# Convert to Arrow and write as Parquet
arrow_table = table.to_arrow()
pq.write_table(arrow_table, result_directory / "output.parquet")
Larger dataset batching
If your Lance dataset is too large to fit in memory, you can write it to Parquet in batches.
[3]:
batch_size = 10_000
writer = None
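try:
    for batch in table.to_lance().to_batches(batch_size=batch_size):
        # Create the writer lazily so it can take its schema from the first batch
        if writer is None:
            writer = pq.ParquetWriter(result_directory / "batched_output.parquet", batch.schema)
        writer.write_batch(batch)
finally:
    # Close the writer even if a batch fails, so the Parquet footer is written
    if writer is not None:
        writer.close()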