pipelines-docs

Calls Merging and Normalization

The pipeline consolidates variant calls produced by multiple variant callers into a unified representation. Raw variant calls from each algorithm are first filtered and normalized independently, and then merged into a single candidate variant set.

This step ensures that variants detected by different algorithms are represented consistently and that equivalent variants reported by multiple callers are unified into a single record.

VCF Preprocessing

Each VCF file produced by the variant callers undergoes preprocessing before merging. This step standardizes variant representation and removes redundant records.

Run preprocessing
bcftools_PASS_norm_dedup.sh \
  -i input.vcf.gz \
  -f additional.vcf.gz \
  -r reference.fasta

Arguments:

As the script name indicates, the preprocessing stage retains only PASS records, normalizes variant representation, and removes duplicate records.

These operations are implemented using BCFtools.
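The preprocessing described above can be sketched as a two-stage BCFtools pipeline driven from Python. This is only an illustration and not the contents of bcftools_PASS_norm_dedup.sh: the function names are hypothetical, splitting multiallelic sites with `-m -any` is an assumption, and the `-f additional.vcf.gz` argument of the script is not modeled here.

```python
import subprocess


def build_preprocess_commands(input_vcf, reference_fasta, output_vcf):
    """Return (view_cmd, norm_cmd) for a PASS -> normalize -> dedup pipeline.

    Hypothetical helper; the real script may use different options.
    """
    # Keep only records whose FILTER column is PASS; stream uncompressed BCF.
    view_cmd = ["bcftools", "view", "-f", "PASS", "-Ou", input_vcf]
    # Left-align indels against the reference, split multiallelic sites
    # (assumption), drop exact duplicates, and write bgzipped VCF.
    norm_cmd = [
        "bcftools", "norm",
        "-f", reference_fasta,
        "-m", "-any",
        "-d", "exact",
        "-Oz", "-o", output_vcf,
        "-",  # read the stream produced by `bcftools view`
    ]
    return view_cmd, norm_cmd


def run_preprocess(input_vcf, reference_fasta, output_vcf):
    """Run the two commands as a pipe; requires bcftools on PATH."""
    view_cmd, norm_cmd = build_preprocess_commands(
        input_vcf, reference_fasta, output_vcf
    )
    view = subprocess.Popen(view_cmd, stdout=subprocess.PIPE)
    subprocess.run(norm_cmd, stdin=view.stdout, check=True)
    view.stdout.close()
    if view.wait() != 0:
        raise subprocess.CalledProcessError(view.returncode, view_cmd)
```

Building the argument lists separately from running them keeps the command construction easy to inspect without BCFtools installed.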

Variant Merging

After preprocessing, the normalized VCF files generated by each variant caller are merged into a unified variant representation.

Run variant merging
merge_callers.py \
  -i TNhaplotyper2:tnhaplotyper2.vcf.gz \
  -i Strelka2:strelka2.vcf.gz \
  -i RUFUS:rufus.vcf.gz \
  -i longcallD:longcalld.vcf.gz \
  -s sample \
  -o merged.vcf.gz

Arguments:

The merging step consolidates variants reported by multiple callers into a single record when they share the same genomic position and allele representation (CHROM, POS, REF, ALT).
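The keying rule above can be illustrated with a minimal sketch. It is not the code of merge_callers.py; the function name `merge_calls` and the in-memory tuple representation are assumptions made for the example, and a real implementation would read VCF records rather than tuples.

```python
def merge_calls(calls_by_caller):
    """Merge calls keyed by (CHROM, POS, REF, ALT).

    calls_by_caller maps a caller name to an iterable of
    (chrom, pos, ref, alt) tuples. Returns a dict mapping each key to the
    sorted list of callers that reported it. Hypothetical helper.
    """
    merged = {}
    for caller, records in calls_by_caller.items():
        for chrom, pos, ref, alt in records:
            key = (chrom, pos, ref, alt)
            # Records with an identical key collapse into a single entry.
            merged.setdefault(key, set()).add(caller)
    return {key: sorted(callers) for key, callers in merged.items()}


calls = {
    "TNhaplotyper2": [("chr1", 1000, "A", "T"), ("chr2", 500, "G", "GA")],
    "Strelka2": [("chr1", 1000, "A", "T")],
}
merged = merge_calls(calls)
print(",".join(merged[("chr1", 1000, "A", "T")]))  # Strelka2,TNhaplotyper2
```

Because the key includes REF and ALT, two callers that report the same event with different allele representations would not collapse, which is why normalization precedes merging.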

Caller Tracking

For each merged variant, the pipeline records which algorithms detected it.

This information is stored in the INFO field:

CALLERS=Strelka2,TNhaplotyper2

This annotation allows downstream filtering steps to evaluate support for each variant across independent algorithms.
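A downstream step could read this annotation with a small parser such as the sketch below. The function name `callers_from_info` is hypothetical, and the input is assumed to be the raw semicolon-separated INFO column of a VCF record.

```python
def callers_from_info(info):
    """Return the caller names stored in the CALLERS INFO key, or []."""
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        if key == "CALLERS" and sep:
            # Caller names are comma-separated; skip empty items.
            return [name for name in value.split(",") if name]
    return []


support = callers_from_info("DP=40;CALLERS=Strelka2,TNhaplotyper2")
print(len(support))  # 2
```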

Header Standardization

During merging, the pipeline reconstructs the VCF header to ensure consistency across callers. The process includes:

Cross-Evidence Classification

After validation, variants are annotated based on the level of cross-technology support observed across sequencing platforms.

CrossCaller

Variants are labeled CrossCaller when they are independently detected by two or more somatic callers (listed in the CALLERS INFO field).
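The CrossCaller rule can be expressed as a short predicate. This is a sketch under assumptions: the function name `is_cross_caller` is hypothetical, and because the text does not say which callers count as somatic, the optional `somatic_callers` argument is an illustrative way to restrict the count.

```python
def is_cross_caller(callers, somatic_callers=None, min_callers=2):
    """Return True when at least min_callers distinct callers support a variant.

    callers: names taken from the CALLERS INFO field.
    somatic_callers: optional set limiting which names are counted
    (assumption; by default every listed caller counts).
    """
    distinct = set(callers)
    if somatic_callers is not None:
        distinct &= set(somatic_callers)
    return len(distinct) >= min_callers


print(is_cross_caller(["Strelka2", "TNhaplotyper2"]))  # True
print(is_cross_caller(["RUFUS"]))                      # False
```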

Source Code

All the relevant code can be accessed in the GitHub repository:

