After call merging and normalization, candidate variants undergo hierarchical filtering to remove sequencing artifacts, alignment errors, and common germline variants.
This stage reduces false positives by combining variant annotations, genomic context filters, and population allele frequency data. Only variants passing these filters proceed to downstream validation steps.
Filtering is applied sequentially to progressively remove low-confidence candidate variants.
Variants present in donor-level germline call sets are removed.
Germline variants are called using the Sentieon DNAscope Hybrid algorithm. Please refer to the relevant page for more information.
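The germline removal step can be sketched as an exact-match lookup. This is a minimal illustration, not the pipeline's actual implementation: it assumes that candidate and germline calls are both normalized and are matched on the full (chromosome, position, REF, ALT) key. The `Variant` type and `remove_germline` function are hypothetical names introduced here.

```python
from typing import Iterable, List, NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record (normalized representation assumed)."""
    chrom: str
    pos: int  # 1-based position, as in VCF
    ref: str
    alt: str


def remove_germline(candidates: Iterable[Variant],
                    germline: Iterable[Variant]) -> List[Variant]:
    """Drop candidates whose (chrom, pos, ref, alt) key is in the donor's germline calls."""
    germline_keys = set(germline)
    return [v for v in candidates if v not in germline_keys]


candidates = [
    Variant("chr1", 1000, "A", "G"),
    Variant("chr1", 2000, "C", "T"),
    Variant("chr2", 500, "G", "GA"),
]
germline = [Variant("chr1", 2000, "C", "T")]

kept = remove_germline(candidates, germline)
print(kept)
```

Exact-key matching only works if both call sets use the same normalization (left alignment, split multi-allelic sites); otherwise the same indel can appear under two different keys.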
Variants located within ±50 bp of another candidate variant are excluded to reduce clustering artifacts.
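A minimal sketch of the proximity filter follows. It makes several assumptions that the text above does not specify: distance is measured between VCF start positions on the same chromosome, the 50 bp bound is inclusive, and every member of a close pair is removed. The function name `remove_clustered` and the tuple layout are hypothetical.

```python
from collections import defaultdict
from typing import List, Tuple

# Hypothetical record layout: (chrom, pos, ref, alt), pos is 1-based.
Variant = Tuple[str, int, str, str]


def remove_clustered(variants: List[Variant], window: int = 50) -> List[Variant]:
    """Remove every variant that lies within `window` bp of another candidate."""
    by_chrom = defaultdict(list)
    for v in variants:
        by_chrom[v[0]].append(v)

    clustered = set()
    for chrom_variants in by_chrom.values():
        chrom_variants.sort(key=lambda v: v[1])
        # After sorting, a variant has a neighbor within the window
        # if and only if one of its adjacent entries is within the window.
        for left, right in zip(chrom_variants, chrom_variants[1:]):
            if right[1] - left[1] <= window:
                clustered.add(left)
                clustered.add(right)

    # Preserve the input order of the surviving variants.
    return [v for v in variants if v not in clustered]


calls = [
    ("chr1", 100, "A", "G"),
    ("chr1", 140, "C", "T"),   # 40 bp from the previous call: both are removed
    ("chr1", 500, "G", "A"),
    ("chr2", 120, "T", "C"),   # different chromosome, not compared with chr1
]
isolated = remove_clustered(calls)
print(isolated)
```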
Variants located in genomic regions known to produce unreliable alignments are removed from the candidate set. These include:
These regions are prone to mapping ambiguity and increased sequencing error rates.
Note: The genomic region sets were generated from the UCSC Genome Browser.
For additional details on the Mills and 1000 Genomes Project reference, see the corresponding documentation in the “Variant Catalogs” section.
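The region-based exclusion can be illustrated with a simple interval lookup. This sketch assumes the excluded regions are available as BED-style intervals (0-based, half-open) and tests only the variant's start position rather than its full span. `build_index` and `in_excluded_region` are hypothetical helpers, and the coordinates in the example are made up.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Region = Tuple[str, int, int]  # (chrom, start, end), BED-style 0-based half-open


def build_index(regions: Iterable[Region]) -> Dict[str, Tuple[List[int], List[int]]]:
    """Merge overlapping intervals per chromosome and store sorted starts and ends."""
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))

    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        merged: List[Tuple[int, int]] = []
        for start, end in intervals:
            if merged and start <= merged[-1][1]:
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def in_excluded_region(index, chrom: str, pos: int) -> bool:
    """Return True if a 1-based VCF position falls inside any excluded interval."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    zero_based = pos - 1
    # Find the last interval that starts at or before the position.
    i = bisect_right(starts, zero_based) - 1
    return i >= 0 and zero_based < ends[i]


index = build_index([("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 0, 50)])
print(in_excluded_region(index, "chr1", 101))  # first base of the merged interval
print(in_excluded_region(index, "chr1", 301))  # one base past the interval end
```

Merging the intervals first keeps the binary search correct when the input region sets overlap, which is common when several annotation tracks are combined.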
Variants observed in the Brain Somatic Mosaicism Network (BSMN) Panel of Normals are removed.
This filter reduces recurrent technical artifacts observed across unrelated samples.
Note: For additional details, see the Brain Somatic Mosaicism Network Panel of Normals documentation in the “Variant Catalogs” section.
Variants with a population allele frequency above the following threshold are excluded to minimize inclusion of rare germline polymorphisms:
gnomAD v4.1 grpmax_joint_AF > 0.001
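The threshold check itself is a single comparison. The sketch below assumes the `grpmax_joint_AF` annotation has already been parsed into a float, and that a variant with no gnomAD annotation is treated as absent from gnomAD and kept. The handling of missing values is an assumption made here, not something stated above. `passes_population_af` is a hypothetical helper name.

```python
from typing import Optional

GNOMAD_AF_MAX = 0.001  # threshold stated above: exclude if grpmax_joint_AF > 0.001


def passes_population_af(grpmax_joint_af: Optional[float],
                         threshold: float = GNOMAD_AF_MAX) -> bool:
    """Return True if the variant should be kept by the population frequency filter."""
    if grpmax_joint_af is None:
        # Assumption: no gnomAD annotation means the variant was not observed there.
        return True
    # Strictly greater than the threshold is excluded; equal to it is kept.
    return grpmax_joint_af <= threshold


annotated = {"rare": 0.0002, "at_threshold": 0.001, "common": 0.02, "unannotated": None}
kept_names = [name for name, af in annotated.items() if passes_population_af(af)]
print(kept_names)
```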
After filtering, variants are separated by variant class:
All the relevant code can be accessed in the GitHub repository: