A read group (@RG) is a unique identifier that groups reads together. It captures relevant information about the sample and the sequencing process and technology, and is used by various downstream bioinformatics tools.
The relevant fields in defining a read group include:
The original read groups from the unaligned BAM files are preserved in the corresponding aligned BAM files. In-house bash code that uses samtools replaces the SM and LB values with the identifiers used by the portal, as follows:
SM: <sample name>
LB: <sample name>.<library>
E.g., in a BAM file header:
@RG ID:bcdb4058-3545-4c45-aea9-4159f1c2ca7d_dna_r10.4.1_e8.2_400bps_sup@v4.2.0 DT:2024-02-21T12:56:53.022625-06:00 DS:runid=bcdb4058-3545-4c45-aea9-4159f1c2ca7d basecall_model=dna_r10.4.1_e8.2_400bps_sup@v4.2.0 LB:SMACUWVOKOZU.SMALI56YAYM5 PL:ONT PM:3A PU:PAW14872 al:unclassified SM:SMACUWVOKOZU
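The SM and LB replacement described above can be sketched in a few lines of Python. This is only an illustration of the transformation, not the in-house bash code: the function name rewrite_read_group and its parameters are hypothetical. It assumes the @RG line is tab-separated, as in a SAM header, and it appends SM or LB if either tag is missing, which is a choice made here for the sketch. Splitting on tabs rather than spaces matters because the DS field itself contains spaces.

```python
def rewrite_read_group(line: str, sample: str, library: str) -> str:
    """Return an @RG header line with SM and LB set to portal-style values.

    Hypothetical helper for illustration; assumes tab-separated SAM header fields.
    """
    if not line.startswith("@RG"):
        return line  # leave all other header lines untouched

    # Portal convention from the text: SM is the sample name,
    # LB is <sample name>.<library>.
    new_values = {"SM": sample, "LB": f"{sample}.{library}"}

    fields = line.rstrip("\n").split("\t")
    rewritten = [fields[0]]
    seen = set()
    for field in fields[1:]:
        tag = field.partition(":")[0]
        if tag in new_values:
            rewritten.append(f"{tag}:{new_values[tag]}")
            seen.add(tag)
        else:
            rewritten.append(field)  # ID, DT, DS, PL, PM, PU, ... stay as-is

    # Assumption of this sketch: add SM/LB when the original line lacks them.
    for tag, value in new_values.items():
        if tag not in seen:
            rewritten.append(f"{tag}:{value}")
    return "\t".join(rewritten)


# Usage with a shortened, made-up header line (placeholder old values).
header = "\t".join(
    ["@RG", "ID:run1", "DS:runid=run1 basecall_model=m", "LB:oldlib", "PL:ONT", "SM:oldsample"]
)
print(rewrite_read_group(header, "SMACUWVOKOZU", "SMALI56YAYM5"))
```

In a real workflow the rewritten header would still have to be written back to the BAM file, for instance with samtools reheader; whether the in-house code does it that way is not stated here.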
All the relevant code is accessible in the GitHub repository: