Data Formats
Helio Additive exports data in various formats to support different workflows and use cases. This guide explains the different data formats you’ll encounter and how to work with them.
Parquet Files
What is Parquet?
Apache Parquet is a columnar storage file format designed for efficient data storage and retrieval. Unlike traditional row-based formats (like CSV), Parquet stores data by columns, which offers several advantages for analytical workloads.
Why We Use Parquet
Parquet is particularly well-suited for storing simulation and optimization data because:
- Efficient Compression: Columnar storage allows for better compression ratios, reducing file sizes significantly
- Fast Query Performance: Reading only the columns you need is much faster than reading entire rows
- Type Safety: Parquet preserves data types (integers, floats, timestamps, etc.), unlike CSV which treats everything as text
- Schema Evolution: The format supports adding new columns without breaking existing readers
- Cross-Platform: Works seamlessly with Python, R, Java, C++, and most data analysis tools
When You’ll Encounter Parquet Files
Helio Additive exports the following data in Parquet format:
- Thermal simulation data: Temperature profiles and thermal quality indices
- Layer-by-layer analysis: Detailed metrics for each print layer
- Mesh data: 3D geometry and quality information
- Contact data: Inter-layer bonding information
Working with Parquet Files
Python
The most common way to work with Parquet files is using Python with libraries like pandas or polars:
Using Pandas:

```python
import pandas as pd

# Read a Parquet file
df = pd.read_parquet('thermal-data.parquet')

# View the first few rows
print(df.head())

# Get column information
print(df.info())

# Filter and analyze
hot_zones = df[df['temperature'] > 250]
print(f"Found {len(hot_zones)} hot zones")
```

Using Polars (faster for large files):
```python
import polars as pl

# Read a Parquet file
df = pl.read_parquet('thermal-data.parquet')

# Lazy evaluation for better performance
df_lazy = pl.scan_parquet('thermal-data.parquet')
result = df_lazy.filter(pl.col('temperature') > 250).collect()
```

Using R (with the arrow package):

```r
library(arrow)

# Read a Parquet file
df <- read_parquet('thermal-data.parquet')

# View the data
head(df)
```

Command Line Tools
You can also use command-line tools to inspect Parquet files:
```sh
# Install parquet-tools
pip install parquet-tools

# View schema
parquet-tools schema thermal-data.parquet

# View first few rows
parquet-tools head thermal-data.parquet

# Convert to CSV
parquet-tools csv thermal-data.parquet > output.csv
```

Performance Comparison
Here’s how Parquet compares to CSV for typical simulation data:
| Metric | CSV | Parquet |
|---|---|---|
| File Size | 500 MB | 50 MB (10x smaller) |
| Read Time | 15 seconds | 2 seconds (7x faster) |
| Column Read | Must read entire file | Read only needed columns |
| Type Safety | All strings | Native types preserved |
CSV Files
For simpler use cases and better compatibility with spreadsheet software, we also provide CSV (Comma-Separated Values) exports:
- Plot data: Visualization coordinates and quality metrics
- Summary reports: High-level statistics and results
- Export flexibility: Easy to open in Excel, Google Sheets, or any text editor
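When opening a CSV export in code rather than a spreadsheet, it is worth checking what types your reader inferred, since the file itself stores everything as text. A small sketch using an inline stand-in for a plot-data file (the column names are illustrative):

```python
import io

import pandas as pd

# Inline stand-in for a small plot-data export; real files have more columns.
csv_text = "x,y,quality\n0.0,1.2,0.95\n0.5,1.4,0.91\n"

df = pd.read_csv(io.StringIO(csv_text))
print(df.dtypes)  # pandas infers numeric types here, but the file stores text
print(df['quality'].mean())
```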
When to Use CSV vs Parquet
Use CSV when:
- You need to quickly view data in Excel or Google Sheets
- File sizes are small (< 10 MB)
- You’re doing simple one-time analysis
- You need human-readable data
Use Parquet when:
- Working with large datasets (> 10 MB)
- Performing repeated analysis or queries
- Building automated data pipelines
- Memory efficiency is important
- You need to preserve exact data types
JSON Files
JSON (JavaScript Object Notation) is used for structured configuration and report data:
- Simulation reports: Summary statistics and metadata
- Configuration files: Settings and parameters
- API responses: Structured data from our API
JSON is human-readable and widely supported across all programming languages.
```python
import json

# Read a JSON report
with open('report.json', 'r') as f:
    report = json.load(f)

print(f"Simulation status: {report['status']}")
print(f"Quality score: {report['quality_score']}")
```

Choosing the Right Format
| Use Case | Recommended Format |
|---|---|
| Large-scale data analysis | Parquet |
| Quick inspection in spreadsheet | CSV |
| Configuration and metadata | JSON |
| 3D visualization | CSV (plot data) |
| Data warehousing | Parquet |
| Sharing with non-technical users | CSV |
Getting Help
If you need assistance working with any of these data formats, please:
- Check our Visualizing Data guide for code examples
- Review the Schemas documentation for data structure details
- Contact support at support@helioadditive.com