Elsheeto Docs

elsheeto is a Python library for parsing next-generation sequencing (NGS) sample sheets from Illumina and Element Biosciences Aviti platforms. It provides type-annotated, validated Pydantic models and uses a three-stage parsing architecture for robust data processing.

Features

Multi-platform support: Parse both Illumina v1 and Aviti sample sheets
Type safety: Full type annotations with Pydantic validation
Three-stage parsing: Raw CSV → Structured data → Platform-specific models
Robust error handling: Comprehensive validation and error reporting
Flexible column consistency: Automatic padding or strict validation modes
Easy-to-use API: Simple facade functions for common use cases

Quick Start

Install elsheeto:

pip install elsheeto

Parse an Illumina v1 sample sheet:

from elsheeto import parse_illumina_v1

# Parse from file
sheet = parse_illumina_v1("path/to/samplesheet.csv")

# Access parsed data
print(f"Experiment: {sheet.header.experiment_name}")
print(f"Samples: {len(sheet.data)}")

Parse an Aviti sample sheet:

from elsheeto import parse_aviti

# Parse from file
sheet = parse_aviti("path/to/samplesheet.csv")

# Access parsed data
print(f"Samples: {len(sheet.samples)}")
if sheet.settings:
    print(f"Settings: {sheet.settings.data}")

Architecture

elsheeto uses a three-stage parsing architecture:

Stage 1 (Raw CSV): Parse the raw CSV file into sectioned data
Stage 2 (Structured): Convert sectioned data into key-value and tabular structures
Stage 3 (Platform-specific): Transform into validated platform-specific models

This architecture ensures robust parsing and allows for easy extension to new platforms.
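To make the three stages concrete, here is a minimal, self-contained sketch that uses only the standard library. It is not elsheeto's implementation: the function names (`stage1_sections`, `stage2_structured`, `stage3_model`), the `DemoSample` and `DemoSheet` classes, and the tiny sample sheet are all hypothetical, and plain dataclasses stand in for the Pydantic models the library uses.

```python
from __future__ import annotations

import csv
import io
from dataclasses import dataclass

# Hypothetical sample sheet, used only for this illustration.
SAMPLE = """[Header]
Experiment Name,Demo Run
[Data]
Sample_ID,index
S1,ACGT
S2,TGCA
"""


def stage1_sections(text: str) -> dict[str, list[list[str]]]:
    """Stage 1: split raw CSV rows into named sections."""
    sections: dict[str, list[list[str]]] = {}
    current = None
    for row in csv.reader(io.StringIO(text)):
        if not any(cell.strip() for cell in row):
            continue  # skip blank lines
        first = row[0].strip()
        if first.startswith("[") and first.endswith("]"):
            current = first[1:-1].lower()
            sections[current] = []
        elif current is not None:
            sections[current].append([cell.strip() for cell in row])
    return sections


def stage2_structured(sections):
    """Stage 2: key-value pairs for the header, a list of dicts for the data table."""
    header = {row[0]: row[1] for row in sections.get("header", []) if len(row) >= 2}
    data_rows = sections.get("data", [])
    columns, body = (data_rows[0], data_rows[1:]) if data_rows else ([], [])
    table = [dict(zip(columns, row)) for row in body]
    return header, table


@dataclass(frozen=True)
class DemoSample:
    sample_id: str
    index: str

    def __post_init__(self):
        # Stage 3 validation: reject empty IDs and non-nucleotide index sequences.
        if not self.sample_id:
            raise ValueError("sample_id must not be empty")
        if set(self.index) - set("ACGTN"):
            raise ValueError(f"invalid index sequence: {self.index!r}")


@dataclass(frozen=True)
class DemoSheet:
    experiment_name: str
    samples: list[DemoSample]


def stage3_model(header, table) -> DemoSheet:
    """Stage 3: build a validated, platform-specific model."""
    samples = [DemoSample(sample_id=r["Sample_ID"], index=r["index"]) for r in table]
    return DemoSheet(experiment_name=header.get("Experiment Name", ""), samples=samples)


sheet = stage3_model(*stage2_structured(stage1_sections(SAMPLE)))
print(sheet.experiment_name, len(sheet.samples))
```

Because each stage only consumes the output of the previous one, supporting another platform in a design like this would mean writing a new stage 3 while reusing stages 1 and 2.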

Column Consistency Modes

elsheeto offers five modes for handling CSV files whose rows have inconsistent column counts:

WARN_AND_PAD (default): Automatically pads missing cells with empty strings and issues warnings
PAD: Silently pads missing cells without warnings
STRICT_SECTIONED: Requires consistent columns within each section (raises exceptions)
STRICT_GLOBAL: Requires the same column count across all sections (raises exceptions)
LOOSE: No consistency requirements

The default changed from strict validation to warning-based padding so that real-world sample sheets with minor formatting inconsistencies still parse.

from elsheeto import parse_illumina_v1
from elsheeto.parser.common import ColumnConsistency, ParserConfiguration

# Default: warnings for inconsistencies, automatic padding
sheet = parse_illumina_v1("samplesheet.csv")

# Silent padding (no warnings)
config = ParserConfiguration(column_consistency=ColumnConsistency.PAD)
sheet = parse_illumina_v1("samplesheet.csv", config=config)

# Strict validation (old behavior)
config = ParserConfiguration(column_consistency=ColumnConsistency.STRICT_SECTIONED)
sheet = parse_illumina_v1("samplesheet.csv", config=config)
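The difference between padding, warning, and strict modes can be pictured with a small standalone sketch. It does not use elsheeto and makes several assumptions: `DemoConsistency` and `normalize_rows` are hypothetical names, only three of the five modes are modeled, rows are padded to the widest row of a single section, and warnings go through Python's `warnings` module. The library's actual rules may differ.

```python
import warnings
from enum import Enum


class DemoConsistency(Enum):
    """Hypothetical stand-in for three of the modes described above."""

    WARN_AND_PAD = "warn_and_pad"
    PAD = "pad"
    STRICT = "strict"


def normalize_rows(rows, mode=DemoConsistency.WARN_AND_PAD):
    """Pad short rows to the widest row, or raise in strict mode."""
    width = max((len(row) for row in rows), default=0)
    result = []
    for number, row in enumerate(rows, start=1):
        missing = width - len(row)
        if missing and mode is DemoConsistency.STRICT:
            raise ValueError(f"row {number} has {len(row)} cells, expected {width}")
        if missing and mode is DemoConsistency.WARN_AND_PAD:
            warnings.warn(f"row {number} padded with {missing} empty cell(s)")
        # Pad with empty strings so every row has the same number of cells.
        result.append(list(row) + [""] * missing)
    return result


rows = [["Sample_ID", "index", "project"], ["S1", "ACGT"], ["S2", "TGCA", "demo"]]
padded = normalize_rows(rows, DemoConsistency.PAD)
print(padded[1])  # ['S1', 'ACGT', '']
```

In this sketch the strict mode fails on the first short row, while both padding modes return the same data and differ only in whether a warning is emitted.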

Table of Contents

Examples

API Docs

API Reference