Data Formats
Helio Additive exports data in various formats to support different workflows and use cases. This guide explains the different data formats you’ll encounter and how to work with them.
Parquet Files
What is Parquet?
Apache Parquet is a columnar storage file format designed for efficient data storage and retrieval. Unlike traditional row-based formats (like CSV), Parquet stores data by columns, which offers several advantages for analytical workloads.
Why We Use Parquet
Parquet is particularly well-suited for storing simulation and optimization data because:
- Efficient Compression: Columnar storage allows for better compression ratios, reducing file sizes significantly
- Fast Query Performance: Reading only the columns you need is much faster than reading entire rows
- Type Safety: Parquet preserves data types (integers, floats, timestamps, etc.), unlike CSV which treats everything as text
- Schema Evolution: The format supports adding new columns without breaking existing readers
- Cross-Platform: Works seamlessly with Python, R, Java, C++, and most data analysis tools
When You’ll Encounter Parquet Files
Helio Additive exports the following data in Parquet format:
- Thermal simulation data: Temperature profiles and thermal quality indices
- Layer-by-layer analysis: Detailed metrics for each print layer
- Mesh data: 3D geometry and quality information
- Contact data: Inter-layer bonding information
Working with Parquet Files
Python
The most common way to work with Parquet files is using Python with libraries like pandas or polars:
Using Pandas:

```python
import pandas as pd

# Read a Parquet file
df = pd.read_parquet('thermal-data.parquet')

# View the first few rows
print(df.head())

# Get column information
print(df.info())

# Filter and analyze
hot_zones = df[df['temperature'] > 250]
print(f"Found {len(hot_zones)} hot zones")
```

Using Polars (faster for large files):
```python
import polars as pl

# Read a Parquet file
df = pl.read_parquet('thermal-data.parquet')

# Lazy evaluation for better performance
df_lazy = pl.scan_parquet('thermal-data.parquet')
result = df_lazy.filter(pl.col('temperature') > 250).collect()
```

Using R (with the arrow package):

```r
library(arrow)

# Read a Parquet file
df <- read_parquet('thermal-data.parquet')

# View the data
head(df)
```

Command Line Tools
You can also use command-line tools to inspect Parquet files:
```sh
# Install parquet-tools
pip install parquet-tools

# View schema
parquet-tools schema thermal-data.parquet

# View first few rows
parquet-tools head thermal-data.parquet

# Convert to CSV
parquet-tools csv thermal-data.parquet > output.csv
```

Performance Comparison
Here’s how Parquet compares to CSV for typical simulation data:
| Metric | CSV | Parquet |
|---|---|---|
| File Size | 500 MB | 50 MB (10x smaller) |
| Read Time | 15 seconds | 2 seconds (7x faster) |
| Column Read | Must read entire file | Read only needed columns |
| Type Safety | All strings | Native types preserved |
CSV Files
For simpler use cases and better compatibility with spreadsheet software, we also provide CSV (Comma-Separated Values) exports:
- Plot data: Visualization coordinates and quality metrics
- Summary reports: High-level statistics and results
- Export flexibility: Easy to open in Excel, Google Sheets, or any text editor
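When opening a CSV export in code rather than a spreadsheet, it is worth checking what types your reader inferred, since the file itself stores everything as text. A small sketch using an inline stand-in for a plot-data file (the column names are illustrative):

```python
import io

import pandas as pd

# Inline stand-in for a small plot-data export; real files have more columns.
csv_text = "x,y,quality\n0.0,1.2,0.95\n0.5,1.4,0.91\n"

df = pd.read_csv(io.StringIO(csv_text))
print(df.dtypes)  # pandas infers numeric types here, but the file stores text
print(df['quality'].mean())
```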
When to Use CSV vs Parquet
Use CSV when:
- You need to quickly view data in Excel or Google Sheets
- File sizes are small (< 10 MB)
- You’re doing simple one-time analysis
- You need human-readable data
Use Parquet when:
- Working with large datasets (> 10 MB)
- Performing repeated analysis or queries
- Building automated data pipelines
- Memory efficiency is important
- You need to preserve exact data types
JSON Files
JSON (JavaScript Object Notation) is used for structured configuration and report data:
- Simulation reports: Summary statistics and metadata
- Configuration files: Settings and parameters
- API responses: Structured data from our API
JSON is human-readable and widely supported across all programming languages.
```python
import json

# Read a JSON report
with open('report.json', 'r') as f:
    report = json.load(f)

print(f"Simulation status: {report['status']}")
print(f"Quality score: {report['quality_score']}")
```

Choosing the Right Format
| Use Case | Recommended Format |
|---|---|
| Large-scale data analysis | Parquet |
| Quick inspection in spreadsheet | CSV |
| Configuration and metadata | JSON |
| 3D visualization | CSV (plot data) |
| Data warehousing | Parquet |
| Sharing with non-technical users | CSV |
Getting Help
If you need assistance working with any of these data formats, please:
- Check our Visualizing Data guide for code examples
- Review the Schemas documentation for data structure details
- Contact support at support@helioadditive.com