pipelines-docs

Single-Nucleotide Variant Calling

The SMaHT SNV pipeline detects somatic single-nucleotide variants (SNVs) across multiple sequencing technologies. It integrates four somatic SNV callers: three short-read callers (Strelka2, Mutect2, and RUFUS) and one long-read caller (longcallD).

Raw calls generated by the individual tools are merged and then processed through hierarchical filtering and cross-evidence validation to produce high-confidence SNV calls. The pipeline is designed for per-tissue sample execution while leveraging donor-level information to validate and refine candidate variants.

Key Pipeline Steps

  1. Short-Read Variant Calling: Detection of candidate SNVs using Strelka2, Mutect2, and RUFUS from short-read sequencing data.
  2. Long-Read Variant Calling: Detection of candidate SNVs using longcallD from long-read sequencing data.
  3. Call Merging and Normalization: Consolidation and normalization of raw calls from all variant callers into a unified representation.
  4. Hierarchical Filtering: Removal of low-confidence calls using variant annotations, genomic context filters, and population allele frequency data.
  5. Cross-Technology Validation: Integration of evidence across sequencing technologies and tissues to validate SNV candidates.
  6. Donor-Level Refinement: Evaluation of candidate variants across samples from the same donor to distinguish somatic events from germline variation and sequencing artifacts.
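
As a rough illustration of steps 3 to 5, the sketch below merges raw calls from the four callers into one record per site and allele, applies a single population-frequency filter, and then keeps only candidates reported by both a short-read and a long-read caller. It is a minimal sketch, not the pipeline's implementation: the `Call` record, the function names, the 0.001 allele-frequency cutoff, and the "both technologies" rule are all assumptions made for the example, and the input calls are invented.

```python
from collections import defaultdict
from typing import NamedTuple


class Call(NamedTuple):
    """Hypothetical minimal record for one raw SNV call."""
    chrom: str
    pos: int      # 1-based position
    ref: str
    alt: str
    caller: str   # "Strelka2", "Mutect2", "RUFUS", or "longcallD"


SHORT_READ_CALLERS = {"Strelka2", "Mutect2", "RUFUS"}
LONG_READ_CALLERS = {"longcallD"}


def merge_calls(calls):
    """Group raw calls by site and allele, recording which callers reported each."""
    merged = defaultdict(set)
    for c in calls:
        # Normalize allele case so identical variants share one key.
        key = (c.chrom, c.pos, c.ref.upper(), c.alt.upper())
        merged[key].add(c.caller)
    return dict(merged)


def passes_population_filter(key, population_af, max_af=0.001):
    """Assumed filter: drop variants that are common in the population.

    The 0.001 cutoff is illustrative only, not a documented pipeline value.
    """
    return population_af.get(key, 0.0) <= max_af


def has_cross_technology_support(callers):
    """Assumed rule: at least one short-read and one long-read caller agree."""
    return bool(callers & SHORT_READ_CALLERS) and bool(callers & LONG_READ_CALLERS)


# Invented example input: three distinct variants from four callers.
raw_calls = [
    Call("chr1", 1000, "a", "G", "Strelka2"),
    Call("chr1", 1000, "A", "G", "Mutect2"),
    Call("chr1", 1000, "A", "G", "longcallD"),
    Call("chr2", 500, "C", "T", "RUFUS"),
    Call("chr3", 42, "G", "A", "Mutect2"),
    Call("chr3", 42, "G", "A", "longcallD"),
]
# Invented population allele frequencies keyed like the merged calls.
population_af = {("chr3", 42, "G", "A"): 0.05}

merged = merge_calls(raw_calls)
kept = {k: v for k, v in merged.items() if passes_population_filter(k, population_af)}
validated = sorted(k for k, v in kept.items() if has_cross_technology_support(v))
print(validated)
```

In this toy input, the chr2 candidate is dropped because only a short-read caller reports it, and the chr3 candidate is dropped by the frequency filter, so only the chr1 variant remains. Donor-level refinement (step 6) would need the same comparison repeated across tissues from one donor and is not shown here.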

Data

For each sample, all sequencing libraries generated by the multiple Genome Characterization Centers (GCCs) are merged prior to analysis and provided to the variant callers as a single high-depth input (~300X short-read coverage).

PacBio HiFi data is used for long-read variant calling when available.

