
Read Groups

As the second step, the pipeline assigns reads to read groups, each identified by a unique ID that groups the reads together. A read group (@RG) captures relevant information about the sample and about the sequencing process and technology, and various downstream bioinformatics tools use this information.

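The relevant fields in defining a read group include:

- ID: read group identifier; each @RG line in a file must have a unique ID
- SM: sample name
- PL: platform or technology used to produce the reads (e.g., ILLUMINA)
- PU: platform unit, e.g., flowcell and lane for Illumina data
- LB: library identifier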
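In a SAM/BAM header, a read group is stored as a single tab-delimited @RG line made of TAG:VALUE pairs. The minimal Python sketch below models the ID, SM, PL, PU, and LB fields and renders them as such a line. The class and method names (ReadGroup, to_header_line) are hypothetical and are not taken from the pipeline's script; the values are the ones from the BAM example on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadGroup:
    """Hypothetical model of the @RG fields used on this page."""
    ID: str  # unique read group identifier
    SM: str  # sample name
    PL: str  # sequencing platform, e.g. ILLUMINA
    PU: str  # platform unit, e.g. flowcell and lane
    LB: str  # library identifier

    def to_header_line(self) -> str:
        # SAM header lines start with the record type, followed by
        # tab-delimited TAG:VALUE pairs.
        fields = [
            f"ID:{self.ID}",
            f"SM:{self.SM}",
            f"PL:{self.PL}",
            f"PU:{self.PU}",
            f"LB:{self.LB}",
        ]
        return "\t".join(["@RG"] + fields)


rg = ReadGroup(
    ID="SMAHT1.ST-E00127_336_HJ7YHCCXX.8",
    SM="SMAHT1",
    PL="ILLUMINA",
    PU="ST-E00127_336_HJ7YHCCXX.8",
    LB="SMAHT1.HISEQ-LIB1",
)
print(rg.to_header_line())
```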

Assigning Read Groups

To assign read groups, the pipeline uses an in-house Python script. The script can generate read groups automatically from Illumina read names and can handle multiple read groups in the same file (e.g., when reads from multiple lanes have been merged into a single file).
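The following is a minimal sketch of how read groups could be derived from Illumina read names; it is not the pipeline's in-house script. It assumes the Illumina (CASAVA 1.8+) read name layout <instrument>:<run>:<flowcell>:<lane>:<tile>:<x>:<y>, and it assumes, by inference from the BAM example on this page, that the platform unit is <instrument>_<run>_<flowcell>.<lane> and the read group ID is <sample>.<platform unit>. The function names platform_unit and assign_read_groups are hypothetical.

```python
from typing import Dict, Iterable


def platform_unit(read_name: str) -> str:
    """Build a platform unit (instrument_run_flowcell.lane) from an Illumina read name.

    Assumes the CASAVA 1.8+ layout <instrument>:<run>:<flowcell>:<lane>:<tile>:<x>:<y>,
    optionally followed by a space-separated comment. A leading '@' (as in FASTQ
    headers) is stripped.
    """
    name = read_name.lstrip("@").split()[0]
    parts = name.split(":")
    if len(parts) < 7:
        raise ValueError(f"not an Illumina 1.8+ read name: {read_name!r}")
    instrument, run, flowcell, lane = parts[:4]
    return f"{instrument}_{run}_{flowcell}.{lane}"


def assign_read_groups(read_names: Iterable[str], sample: str) -> Dict[str, str]:
    """Map each read name to a hypothetical read group ID <sample>.<platform unit>."""
    return {name: f"{sample}.{platform_unit(name)}" for name in read_names}


# Two reads from different lanes of the same flowcell, as in a merged multi-lane file.
reads = [
    "ST-E00127:336:HJ7YHCCXX:8:1101:10003:10012",
    "ST-E00127:336:HJ7YHCCXX:7:2204:21001:33011",
]
groups = assign_read_groups(reads, "SMAHT1")
print(sorted(set(groups.values())))
```

Because the lane is part of the platform unit, reads from different lanes end up in distinct read groups even when they share one file, which is the multi-lane case described above.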

The read groups are assigned as follows:

For example, in a BAM file:

@RG ID:SMAHT1.ST-E00127_336_HJ7YHCCXX.8  SM:SMAHT1  PL:ILLUMINA  PU:ST-E00127_336_HJ7YHCCXX.8  LB:SMAHT1.HISEQ-LIB1
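To inspect an existing header, an @RG line can be split back into its fields. The small Python sketch below does this for the example line; the function name parse_rg_line is hypothetical. The SAM specification separates fields with tabs, but this sketch splits on any whitespace so that it also accepts the space-separated rendering shown here.

```python
def parse_rg_line(line: str) -> dict:
    """Split an @RG header line into a {tag: value} dict.

    Splits on any whitespace (tabs or spaces), so values that themselves
    contain spaces would be truncated.
    """
    record_type, *fields = line.split()
    if record_type != "@RG":
        raise ValueError(f"not a read group line: {line!r}")
    # Each field is TAG:VALUE; split only on the first colon because
    # values may contain colons.
    return dict(field.split(":", 1) for field in fields)


rg = parse_rg_line(
    "@RG ID:SMAHT1.ST-E00127_336_HJ7YHCCXX.8  SM:SMAHT1  PL:ILLUMINA  "
    "PU:ST-E00127_336_HJ7YHCCXX.8  LB:SMAHT1.HISEQ-LIB1"
)
print(rg["SM"], rg["PU"])
```

In this example the ID is the sample name joined to the platform unit with a dot, which keeps IDs unique across lanes of the same sample.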

Source Code

All the relevant code is accessible in the GitHub repository:

