pipelines-docs

Read Annotation

In this step, the pipeline annotates individual full-length non-chimeric (FLNC) reads with the isoform-level classification generated in the previous step. This lets downstream analyses trace high-confidence isoforms back to the specific reads that support them.

The annotation is performed using a custom in-house script that lifts isoform classification to the read level.

Annotation Tags

Reads are annotated in the BAM format using the following custom tags:

| Tag | Format | Description |
|-----|--------|-------------|
| in:Z: | string | Isoform ID. |
| sc:Z: | string | Structural category. One of: full-splice_match, incomplete-splice_match, novel_in_catalog, novel_not_in_catalog, genic, antisense, fusion, intergenic, genic_intron. |
| gn:Z: | string | Associated reference gene name. |
| tn:Z: | string | Associated reference transcript name. |
| sb:Z: | string | Subcategory for additional splicing information. Values may include mono-exon, multi-exon, and intron_retention (separated by semicolons). |
| ct:i: | int | Total number of reads supporting the isoform. |
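To show how these tags can be read back, here is a minimal Python sketch that parses the optional fields from one SAM-format text line, such as a line printed by a SAM viewer. It relies only on the standard SAM layout (11 mandatory columns followed by TAG:TYPE:VALUE fields). The function name `parse_custom_tags` and the example record are hypothetical and are not part of the pipeline.

```python
def parse_custom_tags(sam_line):
    """Return the optional SAM fields of one alignment line as a dict."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    # The first 11 columns are mandatory SAM fields; optional
    # TAG:TYPE:VALUE fields start at column 12.
    for field in fields[11:]:
        # Split at most twice so values that contain ':' stay intact.
        tag, type_code, value = field.split(":", 2)
        # 'i' marks an integer tag (ct); string tags (Z) stay as text.
        tags[tag] = int(value) if type_code == "i" else value
    return tags


# Hypothetical annotated record: the read name, coordinates, and tag
# values are invented for illustration only.
example = "\t".join([
    "movie/1/ccs", "0", "chr1", "1000", "60", "10M", "*", "0", "0",
    "ACGTACGTAC", "*",
    "in:Z:PB.1.1", "sc:Z:full-splice_match", "gn:Z:GENE1", "tn:Z:TX1",
    "sb:Z:multi-exon", "ct:i:12",
])

tags = parse_custom_tags(example)
# The sb tag may hold several values separated by semicolons.
subcategories = tags["sb"].split(";")
print(tags["in"], tags["sc"], tags["ct"], subcategories)
```

Splitting `sb` on semicolons yields a list of subcategories, which is convenient when filtering reads by a single value such as intron_retention.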

Annotating FLNC Reads by Isoform Class

# Annotate FLNC reads
FLNC_ImportTags.py \
  --input_flnc aligned_flnc.bam \
  --output_flnc annotated_flnc.bam \
  --read_stat read_stat.txt \
  --classification filtered_classification.txt \
  --index

Arguments:

Implementation

The annotation step is implemented using a custom Python script maintained in-house.
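The in-house script itself is not reproduced here. As a rough illustration of what lifting isoform classification to the read level could look like, the sketch below joins a read-to-isoform table with a classification table and derives the six tag values per read. It is not the pipeline's actual code and rests on several assumptions: that the read_stat file is tab-separated with `id` and `pbid` columns, that the classification file uses SQANTI-style column names (`isoform`, `structural_category`, `associated_gene`, `associated_transcript`, `subcategory`), that `ct` is the number of reads assigned to the isoform in read_stat, and that reads whose isoform is missing from the filtered classification are left unannotated. The function name `build_read_annotations` is hypothetical.

```python
import csv
import io
from collections import Counter


def build_read_annotations(read_stat_file, classification_file):
    """Map each read ID to the tag values of the isoform it supports."""
    # Assumed layout: tab-separated, 'id' = read name, 'pbid' = isoform ID.
    read_to_isoform = {
        row["id"]: row["pbid"]
        for row in csv.DictReader(read_stat_file, delimiter="\t")
    }
    # Assumed layout: one row per isoform with SQANTI-style column names.
    isoforms = {
        row["isoform"]: row
        for row in csv.DictReader(classification_file, delimiter="\t")
    }
    # Assumption: ct is the number of reads assigned to each isoform.
    support = Counter(read_to_isoform.values())

    annotations = {}
    for read_id, isoform_id in read_to_isoform.items():
        row = isoforms.get(isoform_id)
        if row is None:
            # Isoform absent from the filtered classification:
            # this sketch leaves the read unannotated.
            continue
        annotations[read_id] = {
            "in": isoform_id,
            "sc": row["structural_category"],
            "gn": row["associated_gene"],
            "tn": row["associated_transcript"],
            "sb": row["subcategory"],
            "ct": support[isoform_id],
        }
    return annotations


# Toy inputs; all read names, isoform IDs, and gene names are invented.
read_stat = io.StringIO(
    "id\tpbid\n"
    "read1\tPB.1.1\n"
    "read2\tPB.1.1\n"
    "read3\tPB.2.1\n"
)
classification = io.StringIO(
    "isoform\tstructural_category\tassociated_gene\t"
    "associated_transcript\tsubcategory\n"
    "PB.1.1\tfull-splice_match\tGENE1\tTX1\tmulti-exon\n"
)

annotations = build_read_annotations(read_stat, classification)
print(annotations["read1"])
```

In this toy input, read1 and read2 share an isoform that passed filtering, so both receive the same tag values, while read3 is skipped because its isoform is not in the classification table. Writing the values onto BAM records would need a BAM library and is left out of the sketch.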

Source Code

All the relevant code can be accessed in the GitHub repository:

