pipelines-docs

Read Groups

A read group (@RG) is a unique identifier that groups reads together. It captures relevant information about the sample and the sequencing process and technology, and it is used by various downstream bioinformatics tools.
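
As a rough illustration of what a read group record carries: in a SAM/BAM header, each @RG line is a tab-separated list of TAG:VALUE fields. The sketch below splits one such line into a mapping. The helper name `parse_rg_line` and the sample values are hypothetical, chosen only for illustration; they are not part of the pipeline.

```python
def parse_rg_line(line: str) -> dict[str, str]:
    """Split one @RG header line into a {tag: value} mapping."""
    record_type, *fields = line.rstrip("\n").split("\t")
    if record_type != "@RG":
        raise ValueError(f"not a read group line: {record_type!r}")
    # Each field is TAG:VALUE. Split on the first colon only, because
    # values such as timestamps can contain colons themselves.
    return dict(field.split(":", 1) for field in fields)


# Hypothetical example values, not taken from a real run.
example = "@RG\tID:run1\tPL:ONT\tSM:sample1\tDT:2024-02-21T12:56:53"
rg = parse_rg_line(example)
print(rg["ID"], rg["SM"])
```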

The relevant fields in defining a read group include:

Assigning Read Groups

The original read groups from the unaligned BAM files are carried over and preserved in the corresponding aligned BAM files. In-house bash code that uses samtools replaces the SM and LB values with the correct identifiers used by the portal, as follows:

For example, in a BAM file:

@RG	ID:bcdb4058-3545-4c45-aea9-4159f1c2ca7d_dna_r10.4.1_e8.2_400bps_sup@v4.2.0	DT:2024-02-21T12:56:53.022625-06:00	DS:runid=bcdb4058-3545-4c45-aea9-4159f1c2ca7d	basecall_model=dna_r10.4.1_e8.2_400bps_sup@v4.2.0	LB:SMACUWVOKOZU.SMALI56YAYM5	PL:ONT	PM:3A	PU:PAW14872	al:unclassified SM:SMACUWVOKOZU
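
The SM and LB replacement described above can be sketched as follows. This is not the in-house bash code, which is not shown here; it is a minimal Python illustration of the same idea, assuming the header text is available as a string and that @RG fields are tab-separated. The function name `replace_sm_lb` and the placeholder input values (`run1`, `oldlib`, `oldsample`) are hypothetical; the replacement identifiers are the ones from the example above.

```python
def replace_sm_lb(header: str, sample_id: str, library_id: str) -> str:
    """Return the header text with SM and LB rewritten on every @RG line."""
    new_values = {"SM": sample_id, "LB": library_id}
    out = []
    for line in header.splitlines():
        if line.startswith("@RG\t"):
            fields = line.split("\t")
            for i, field in enumerate(fields):
                tag, sep, _ = field.partition(":")
                # Only whole fields whose tag is SM or LB are rewritten;
                # all other fields (ID, DT, DS, PL, ...) are left untouched.
                if sep and tag in new_values:
                    fields[i] = f"{tag}:{new_values[tag]}"
            line = "\t".join(fields)
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical input header with placeholder SM and LB values.
header = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@RG\tID:run1\tLB:oldlib\tPL:ONT\tSM:oldsample\n"
)
new_header = replace_sm_lb(header, "SMACUWVOKOZU", "SMACUWVOKOZU.SMALI56YAYM5")
print(new_header, end="")
```

One way to apply such a rewritten header with samtools would be to extract the header with `samtools view -H`, edit it, and write it back with `samtools reheader`. Whether the in-house code takes this exact route is an assumption.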

Source Code

All the relevant code is accessible in the GitHub repository:

