Elsheeto Docs
elsheeto is a Python library for parsing NGS sample sheets from Illumina and Element Biosciences Aviti platforms. It provides type-annotated models validated with Pydantic and parses each sheet in three stages, from raw CSV to platform-specific models.
Features
Multi-platform support: Parse both Illumina v1 and Aviti sample sheets
Type safety: Full type annotations with Pydantic validation
Three-stage parsing: Raw CSV → Structured data → Platform-specific models
Robust error handling: Comprehensive validation and error reporting
Flexible column consistency: Automatic padding or strict validation modes
Easy-to-use API: Simple facade functions for common use cases
Quick Start
Install elsheeto:
pip install elsheeto
Parse an Illumina v1 sample sheet:
from elsheeto import parse_illumina_v1
# Parse from file
sheet = parse_illumina_v1("path/to/samplesheet.csv")
# Access parsed data
print(f"Experiment: {sheet.header.experiment_name}")
print(f"Samples: {len(sheet.data)}")
Parse an Aviti sample sheet:
from elsheeto import parse_aviti
# Parse from file
sheet = parse_aviti("path/to/samplesheet.csv")
# Access parsed data
print(f"Samples: {len(sheet.samples)}")
if sheet.settings:
    print(f"Settings: {sheet.settings.data}")
Architecture
elsheeto uses a three-stage parsing architecture:
Stage 1 (Raw CSV): Parse the raw CSV file into sectioned data
Stage 2 (Structured): Convert sectioned data into key-value and tabular structures
Stage 3 (Platform-specific): Transform into validated platform-specific models
This architecture ensures robust parsing and allows for easy extension to new platforms.
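The three stages can be illustrated with a minimal standard-library sketch. This is not elsheeto's implementation: the function names (`stage1_sections`, `stage2_structure`, `stage3_model`), the `SheetSketch` class, and the sample sheet content are all hypothetical, and a plain dataclass stands in for the Pydantic models the library uses.

```python
import csv
import io
from dataclasses import dataclass

# Hypothetical, heavily simplified sample sheet for illustration only.
SAMPLE = """\
[Header]
Experiment Name,demo_run
Date,2024-01-01
[Data]
Sample_ID,index
S1,ACGT
S2,TTGA
"""


def stage1_sections(text):
    """Stage 1: split raw CSV rows into named sections."""
    sections = {}
    current = None
    for row in csv.reader(io.StringIO(text)):
        if not any(cell.strip() for cell in row):
            continue  # skip blank lines
        first = row[0].strip()
        if first.startswith("[") and first.endswith("]"):
            current = first[1:-1]
            sections[current] = []
        elif current is not None:
            sections[current].append(row)
    return sections


def stage2_structure(sections):
    """Stage 2: key-value pairs for Header, list of dicts for Data."""
    header = {
        row[0]: (row[1] if len(row) > 1 else "")
        for row in sections.get("Header", [])
    }
    data_rows = sections.get("Data", [])
    data = []
    if data_rows:
        columns, *body = data_rows
        data = [dict(zip(columns, row)) for row in body]
    return header, data


@dataclass
class SheetSketch:
    """Stand-in for a platform-specific model (hypothetical)."""
    experiment_name: str
    sample_ids: list


def stage3_model(header, data):
    """Stage 3: build a typed, platform-specific object."""
    return SheetSketch(
        experiment_name=header["Experiment Name"],
        sample_ids=[row["Sample_ID"] for row in data],
    )


sections = stage1_sections(SAMPLE)
header, data = stage2_structure(sections)
sheet = stage3_model(header, data)
print(sheet)
```

Keeping the stages separate means only the last step has to know about a given platform, which is what makes adding a new platform a matter of writing a new stage 3 transformation.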
Column Consistency Modes
elsheeto provides flexible handling of CSV files with inconsistent column counts:
WARN_AND_PAD (default): Automatically pads missing cells with empty strings and issues warnings
PAD: Silently pads missing cells without warnings
STRICT_SECTIONED: Requires a consistent column count within each section and raises an exception otherwise
STRICT_GLOBAL: Requires the same column count across all sections and raises an exception otherwise
LOOSE: No consistency requirements
The default behavior changed from strict validation to warning-based padding to improve usability with real-world sample sheets that may have formatting inconsistencies.
from elsheeto import parse_illumina_v1
from elsheeto.parser.common import ColumnConsistency, ParserConfiguration
# Default: warnings for inconsistencies, automatic padding
sheet = parse_illumina_v1("samplesheet.csv")
# Silent padding (no warnings)
config = ParserConfiguration(column_consistency=ColumnConsistency.PAD)
sheet = parse_illumina_v1("samplesheet.csv", config=config)
# Strict validation (old behavior)
config = ParserConfiguration(column_consistency=ColumnConsistency.STRICT_SECTIONED)
sheet = parse_illumina_v1("samplesheet.csv", config=config)
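To make the difference between the modes concrete, the following standard-library sketch mimics padding, warning, and strict behavior for the rows of a single section. It is an illustration under stated assumptions, not elsheeto's code: `apply_column_consistency` is a hypothetical helper, the modes are passed as plain strings rather than `ColumnConsistency` members, `ValueError` is a placeholder for whatever exception the library raises, and STRICT_GLOBAL is omitted because it compares column counts across sections.

```python
import warnings


def apply_column_consistency(rows, mode="WARN_AND_PAD"):
    """Hypothetical per-section check; returns a (possibly padded) copy of rows."""
    if mode not in {"WARN_AND_PAD", "PAD", "STRICT_SECTIONED", "LOOSE"}:
        raise ValueError(f"unknown mode: {mode}")

    width = max((len(row) for row in rows), default=0)
    short = [i for i, row in enumerate(rows) if len(row) < width]

    # Nothing to do if all rows match, or if the mode does not care.
    if not short or mode == "LOOSE":
        return [list(row) for row in rows]

    if mode == "STRICT_SECTIONED":
        # Placeholder exception type; the library's own type may differ.
        raise ValueError(f"rows {short} have fewer than {width} columns")

    if mode == "WARN_AND_PAD":
        warnings.warn(f"padding rows {short} to {width} columns")

    # PAD and WARN_AND_PAD both fill missing cells with empty strings.
    return [list(row) + [""] * (width - len(row)) for row in rows]


rows = [["Sample_ID", "index", "index2"], ["S1", "ACGT"]]
padded = apply_column_consistency(rows, mode="PAD")
loose = apply_column_consistency(rows, mode="LOOSE")
print(padded)
print(loose)
```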