pipelines-docs

polyG Artifacts Removal

Recent Illumina instruments that use one- or two-channel sequencing chemistry, such as NovaSeq, can introduce homopolymer runs of G bases (polyG) as artifacts. In these chemistries G is the "dark" base, called when no signal is detected. Once synthesis has terminated, the absence of signal is therefore read as high-confidence G bases at the ends of affected reads. A large number of these reads may then align to reference regions with high G content (e.g., chr2:32916230-32916625), creating problems for downstream processing.
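To gauge how many reads in a file are affected, the trailing run of G bases can be measured for each read. The following is a minimal sketch and not part of the pipeline: the function name count_polyg_reads and the 10-base default threshold are illustrative assumptions, and it only handles uncompressed FASTQ files with four lines per record.

```shell
# Count reads whose sequence ends in at least MIN_G consecutive G bases.
# Usage: count_polyg_reads FILE [MIN_G]
# Prints "<reads with polyG tail><TAB><total reads>".
count_polyg_reads() {
    fastq="$1"
    min_g="${2:-10}"   # assumed threshold, for illustration only
    awk -v n="$min_g" '
        NR % 4 == 2 {
            total++
            # Strip the trailing run of G bases and measure how long it was.
            seq = $0
            sub(/G+$/, "", seq)
            if (length($0) - length(seq) >= n) tailed++
        }
        END { printf "%d\t%d\n", tailed, total }
    ' "$fastq"
}
```

For example, `count_polyg_reads reads.fastq` prints the number of reads ending in at least 10 G bases followed by the total number of reads. The check requires an uninterrupted run of G bases, so it should be treated as a rough estimate rather than a reproduction of fastp's own polyG detection.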

As part of FASTQ file preprocessing, raw reads generated by Illumina sequencing systems are filtered with fastp to remove read pairs containing polyG artifacts.

Removing Read Pairs Containing polyG Artifacts

# Filter read pairs containing polyG artifacts
fastp \
    --dont_eval_duplication \
    --disable_adapter_trimming \
    --disable_quality_filtering \
    --trim_poly_g \
    --length_required read_length \
    -i reads.fastq -I mates.fastq \
    -o reads.filtered.fastq -O mates.filtered.fastq
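
In the command above, read_length is a placeholder. One way to fill it in is sketched here, assuming the input is uncompressed and all raw reads in the file have the same length. The helper names first_read_length and filter_polyg_pairs are hypothetical and are not part of fastp or of the pipeline.

```shell
# Print the sequence length of the first record in an uncompressed FASTQ file.
first_read_length() {
    awk 'NR == 2 { print length($0); exit }' "$1"
}

# Run the polyG filter on one read pair, using the length of the first read
# as the required length (assumption: all raw reads share that length).
# Usage: filter_polyg_pairs reads.fastq mates.fastq
filter_polyg_pairs() {
    reads="$1"
    mates="$2"
    read_length=$(first_read_length "$reads") || return 1
    # Refuse to run if no read length could be determined (e.g. empty file).
    [ -n "$read_length" ] || return 1
    fastp \
        --dont_eval_duplication \
        --disable_adapter_trimming \
        --disable_quality_filtering \
        --trim_poly_g \
        --length_required "$read_length" \
        -i "$reads" -I "$mates" \
        -o "${reads%.fastq}.filtered.fastq" -O "${mates%.fastq}.filtered.fastq"
}
```

Calling `filter_polyg_pairs reads.fastq mates.fastq` writes reads.filtered.fastq and mates.filtered.fastq next to the inputs, matching the file names used in the command above.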

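Arguments:

    --dont_eval_duplication      skip the duplication rate evaluation
    --disable_adapter_trimming   turn off adapter trimming, which fastp enables by default
    --disable_quality_filtering  turn off quality filtering, which fastp enables by default
    --trim_poly_g                force trimming of polyG tails
    --length_required            discard reads shorter than read_length after trimming
    -i, -I                       input FASTQ files for reads and mates
    -o, -O                       output FASTQ files for the filtered reads and mates

With read_length set to the full length of the raw reads, any read shortened by polyG trimming fails the length requirement, so the read pair is removed rather than trimmed.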

Implementation with fastp

The pipeline uses fastp version 0.23.2.
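
To reproduce the filtering outside the pipeline, it can help to confirm that the installed fastp matches this version. The following is a small sketch under the assumption that `fastp --version` prints a line of the form "fastp 0.23.2"; check_fastp_version is a hypothetical helper name.

```shell
# Succeed only if the installed fastp reports the expected version.
# Assumes the version line looks like "fastp 0.23.2"; stderr is merged
# into stdout because the version may be written there.
check_fastp_version() {
    expected="${1:-0.23.2}"
    found=$(fastp --version 2>&1 | awk '$1 == "fastp" { print $2; exit }')
    if [ "$found" != "$expected" ]; then
        echo "expected fastp $expected, found '${found:-none}'" >&2
        return 1
    fi
}
```

Running `check_fastp_version` with no argument checks for 0.23.2 and returns a non-zero status on a mismatch or when fastp is not installed.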

