
Hierarchical Filtering

After call merging and normalization, candidate variants undergo hierarchical filtering to remove sequencing artifacts, alignment errors, and common germline variants.

This stage reduces false positives by combining variant annotations, genomic context filters, and population allele frequency data. Only variants passing these filters proceed to downstream validation steps.

Filtering Workflow

Filtering is applied sequentially to progressively remove low-confidence candidate variants.
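One way to picture sequential filtering is as an ordered list of named steps, each receiving the survivors of the previous one. The sketch below is illustrative only: the `Variant` tuple, the `apply_filters` helper, and the per-step removal counts are assumptions made for this example, not the pipeline's actual code.

```python
from typing import Callable, NamedTuple


class Variant(NamedTuple):
    # Hypothetical minimal variant record used only for illustration.
    chrom: str
    pos: int
    ref: str
    alt: str


# A filter step takes the current candidates and returns the survivors.
FilterStep = Callable[[list], list]


def apply_filters(variants, steps):
    """Apply (name, step) pairs in order and count removals per step."""
    removed = {}
    current = list(variants)
    for name, step in steps:
        survivors = step(current)
        removed[name] = len(current) - len(survivors)
        current = survivors
    return current, removed
```

Tracking how many candidates each step removes makes it easier to spot a filter that is unexpectedly aggressive for a given donor.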

Germline Variant Removal

Variants present in donor-level germline call sets are removed.

Germline variants are called using the Sentieon DNAscope Hybrid algorithm. Please refer to the relevant page for more information.
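As a rough illustration of germline removal, the sketch below drops any candidate whose site and alleles appear in the donor's germline call set. It assumes that both call sets have been normalized the same way and that a match means identical chromosome, position, reference allele, and alternate allele; the `remove_germline` name and the tuple layout are hypothetical.

```python
def remove_germline(candidates, germline_keys):
    """Drop candidates that exactly match a donor-level germline call.

    Both inputs hold (chrom, pos, ref, alt) tuples. Exact matching on all
    four fields is an assumption of this sketch.
    """
    germline = set(germline_keys)  # set gives constant-time membership tests
    return [v for v in candidates if v not in germline]
```

A candidate at a germline position but with a different alternate allele is kept under this matching rule.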

Variant Clustering Filter

Variants located within ±50 bp of another candidate variant are excluded to reduce clustering artifacts.
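The clustering rule can be sketched as a neighbor check on position-sorted candidates. This example assumes that distance is measured between start positions on the same chromosome, that a distance of exactly 50 bp counts as clustered, and that every member of a cluster is excluded; the function name and tuple layout are hypothetical.

```python
from collections import defaultdict


def remove_clustered(variants, window=50):
    """Drop every variant that has another candidate within `window` bp.

    `variants` holds (chrom, pos, ref, alt) tuples. After sorting by
    position, only the immediate neighbors need to be checked.
    """
    by_chrom = defaultdict(list)
    for v in variants:
        by_chrom[v[0]].append(v)

    kept = []
    for group in by_chrom.values():
        group.sort(key=lambda v: v[1])
        for i, v in enumerate(group):
            near_prev = i > 0 and v[1] - group[i - 1][1] <= window
            near_next = i + 1 < len(group) and group[i + 1][1] - v[1] <= window
            if not (near_prev or near_next):
                kept.append(v)
    return kept
```

Note that under these assumptions two alternate alleles at the same position are 0 bp apart and would both be excluded.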

Problematic Genomic Regions

Variants located in genomic regions known to produce unreliable alignments are removed from the candidate set. These include:

These regions are prone to mapping ambiguity and increased sequencing error rates.

Note: Genomic region annotations were derived from the UCSC Genome Browser.

For additional details on the Mills and 1000 Genomes Project reference, see the corresponding documentation in the “Variant Catalogs” section.
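A region-based filter of this kind can be sketched as a point-in-interval lookup. The example below assumes the problematic regions are available as BED-style intervals (0-based, half-open) and that variant positions are 1-based, as in VCF; the helper names are hypothetical and the pipeline may implement this step with dedicated tools instead.

```python
from bisect import bisect_right


def build_index(regions):
    """Group (chrom, start, end) intervals by chromosome and merge overlaps."""
    by_chrom = {}
    for chrom, start, end in sorted(regions):
        merged = by_chrom.setdefault(chrom, [])
        if merged and start <= merged[-1][1]:
            # Overlapping or adjacent interval: extend the previous one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return by_chrom


def in_region(index, chrom, pos):
    """True if the 1-based position `pos` falls inside any indexed interval."""
    intervals = index.get(chrom, [])
    zero_based = pos - 1
    # Locate the last interval whose start is <= the query position.
    i = bisect_right(intervals, (zero_based, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= zero_based < intervals[i][1]
```

Candidates for which `in_region` returns True would be removed; merging intervals up front keeps each lookup to a single binary search.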

Panel of Normals Filtering

Variants observed in the Brain Somatic Mosaicism Network (BSMN) Panel of Normals are removed.

This filter reduces recurrent technical artifacts observed across unrelated samples.

Note: For additional details, see the Brain Somatic Mosaicism Network Panel of Normals documentation in the “Variant Catalogs” section.

Population Allele Frequency Filtering

Variants with a population allele frequency above the following threshold are excluded to minimize inclusion of rare germline polymorphisms:

gnomAD v4.1 grpmax_joint_AF > 0.001
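The threshold check itself is a single comparison. The sketch below assumes the gnomAD value has already been annotated onto each variant and that a variant absent from gnomAD (represented here as `None`) is kept; how missing values are actually handled is not stated in this document, and the names are hypothetical.

```python
AF_THRESHOLD = 0.001  # threshold stated above for gnomAD v4.1 grpmax_joint_AF


def passes_af_filter(grpmax_joint_af, threshold=AF_THRESHOLD):
    """Return True if the variant should be kept.

    A variant is excluded only when its frequency is strictly above the
    threshold. Treating a missing value as passing is an assumption.
    """
    return grpmax_joint_af is None or grpmax_joint_af <= threshold
```

Because the rule is written as strictly greater than 0.001, a variant with a frequency of exactly 0.001 is retained.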

Variant Class Separation

After filtering, variants are separated by variant class:

Source Code

All the relevant code can be accessed in the GitHub repository:

