Germline Variant Calling

After generating analysis-ready alignment files, the pipeline performs germline variant calling. This step identifies single nucleotide variants (SNVs), small insertions and deletions (indels), structural variants (SVs), and copy number variants (CNVs) present in the sample relative to the reference genome.

The pipeline uses the Sentieon DNAscope Hybrid algorithm to call variants from combined short-read and long-read data. This approach leverages the high base accuracy of short reads and the improved mapping and structural resolution provided by long reads.

Variant Calling with DNAscope Hybrid

DNAscope Hybrid performs germline variant calling using aligned short-read and long-read data. The algorithm integrates information from both sequencing technologies to improve variant detection accuracy across different genomic contexts.

# Run DNAscope Hybrid variant calling
sentieon-cli dnascope-hybrid --rgsm sample_name \
                             -r reference.fasta \
                             --sr_aln short_reads.cram [short_reads.cram ...] \
                             --lr_aln long_reads.cram [long_reads.cram ...] \
                             -m hybrid_model \
                             -d known_sites_SNP.vcf \
                             output.vcf.gz

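Arguments, as they appear in the command above:

--rgsm sample_name: the sample name
-r reference.fasta: the reference genome in FASTA format
--sr_aln: one or more short-read alignment files
--lr_aln: one or more long-read alignment files
-m hybrid_model: the DNAscope Hybrid model file
-d known_sites_SNP.vcf: a VCF of known SNP sites
output.vcf.gz: the compressed output VCF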
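
The command above accepts one or more files after --sr_aln and --lr_aln. The following is a minimal sketch, assuming Bash, of how such a call could be assembled from arrays of input files. Every file name and variable name here is a placeholder, only flags that already appear in the command above are used, and the function prints the command rather than running it.

```shell
#!/usr/bin/env bash
# Sketch only: all values below are placeholders, not pipeline defaults.
SAMPLE="sample_name"
REFERENCE="reference.fasta"
MODEL="hybrid_model"
KNOWN_SITES="known_sites_SNP.vcf"
SHORT_READS=(short_reads_1.cram short_reads_2.cram)
LONG_READS=(long_reads_1.cram)

# Build the dnascope-hybrid command for the output file given as $1
# and print it on a single line (dry run, nothing is executed).
build_hybrid_cmd() {
    local output="$1"
    local cmd=(
        sentieon-cli dnascope-hybrid
        --rgsm "$SAMPLE"
        -r "$REFERENCE"
        --sr_aln "${SHORT_READS[@]}"
        --lr_aln "${LONG_READS[@]}"
        -m "$MODEL"
        -d "$KNOWN_SITES"
        "$output"
    )
    echo "${cmd[*]}"
}

build_hybrid_cmd output.vcf.gz
```

Keeping the inputs in arrays means each file stays a separate argument even if its path contains spaces. To execute the call instead of printing it, the function would run "${cmd[@]}" in place of the echo.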

DNAscope Hybrid uses machine learning models trained on curated datasets to improve variant detection accuracy over traditional heuristic-based callers. For additional details and to download the model file (Illumina PacBio whole genome), refer to the developer repository.

Variant calls with a FILTER status of MLrejected failed the DNAModelApply step and are retained in the output only for high-sensitivity use cases. Remove them before general analyses.
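
One way to remove the MLrejected calls is to filter on the FILTER column of the VCF. The sketch below is an illustration rather than the pipeline's own method: it assumes a standard tab-separated VCF in which FILTER is the seventh column and multiple filter names are separated by semicolons. The function name drop_mlrejected and the file names in the usage comment are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: read an uncompressed VCF on stdin and write it to stdout
# without the records flagged MLrejected.
drop_mlrejected() {
    awk -F'\t' '
        # Header lines pass through unchanged.
        /^#/ { print; next }
        {
            # FILTER (column 7) may hold several names separated by ";".
            n = split($7, filters, ";")
            for (i = 1; i <= n; i++) {
                if (filters[i] == "MLrejected") next
            }
            print
        }'
}

# Hypothetical usage (placeholder file names):
#   gzip -dc output.vcf.gz | drop_mlrejected > output.filtered.vcf
```

Matching whole filter names, rather than searching the line for the string, avoids dropping records that only mention MLrejected in another column.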

Implementation with Sentieon

The pipeline implementation uses the Sentieon DNAscope Hybrid algorithm, which is part of the Sentieon DNASeq toolkit.

The pipeline uses Sentieon version 202503.01 and Sentieon-CLI version 1.5.0.

Source Code

All the relevant code can be accessed in the GitHub repository:

