In this step, the pipeline annotates individual full-length non-chimeric (FLNC) reads with the isoform-level classification generated in the previous step. This allows downstream analyses to trace high-confidence isoforms back to the specific reads that support them.
The annotation is performed using a custom in-house script that lifts isoform classification to the read level.
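The idea of lifting a classification from isoforms to reads can be illustrated with a minimal sketch. It is not the in-house script itself. It assumes that the read-to-isoform table has `id` and `pbid` columns and that the classification table has `isoform`, `structural_category`, and `associated_gene` columns; these column names, the function name `lift_to_reads`, and the example values are all illustrative assumptions.

```python
import csv

def lift_to_reads(read_stat_lines, classification_lines):
    """Map each read ID to the classification row of the isoform it supports."""
    # Index the isoform-level classification by isoform ID (assumed column: "isoform").
    by_isoform = {
        row["isoform"]: row
        for row in csv.DictReader(classification_lines, delimiter="\t")
    }
    annotated = {}
    # Assumed columns: "id" (read name) and "pbid" (isoform the read belongs to).
    for row in csv.DictReader(read_stat_lines, delimiter="\t"):
        isoform_row = by_isoform.get(row["pbid"])
        if isoform_row is not None:
            annotated[row["id"]] = isoform_row
        # Reads whose isoform is absent from the classification stay unannotated.
    return annotated

# Hypothetical toy inputs, not real pipeline output.
read_stat = "id\tpbid\nread_1\tPB.1.1\nread_2\tPB.1.1\nread_3\tPB.2.1\n".splitlines()
classification = (
    "isoform\tstructural_category\tassociated_gene\n"
    "PB.1.1\tfull-splice_match\tGENE_A\n"
).splitlines()

annotated = lift_to_reads(read_stat, classification)
for read_id, row in annotated.items():
    print(read_id, row["isoform"], row["structural_category"], row["associated_gene"])
```

In this toy input, `read_3` belongs to an isoform that is missing from the classification table, so it receives no annotation.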
Reads are annotated in BAM format with the following custom tags:
Tag | Format | Description |
---|---|---|
in:Z: | string | Isoform ID. |
sc:Z: | string | Structural category. One of: full-splice_match, incomplete-splice_match, novel_in_catalog, novel_not_in_catalog, genic, antisense, fusion, intergenic, genic_intron. |
gn:Z: | string | Associated reference gene name. |
tn:Z: | string | Associated reference transcript name. |
sb:Z: | string | Subcategory for additional splicing information. Values may include mono-exon, multi-exon, and intron_retention (separated by semicolons). |
ct:i: | int | Total number of reads supporting the isoform. |
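For orientation, the tags above follow the standard SAM optional-field layout `TAG:TYPE:VALUE`, where `Z` marks a string and `i` an integer. The following minimal sketch renders and parses such fields in plain Python; the helper names `to_sam_tags` and `parse_sam_tags` and the example values are hypothetical and are not part of the pipeline.

```python
def to_sam_tags(isoform_id, category, gene, transcript, subcategory, read_count):
    """Render the six custom tags as tab-separated SAM optional fields."""
    fields = [
        ("in", "Z", isoform_id),
        ("sc", "Z", category),
        ("gn", "Z", gene),
        ("tn", "Z", transcript),
        ("sb", "Z", subcategory),
        ("ct", "i", int(read_count)),
    ]
    return "\t".join(f"{tag}:{typ}:{value}" for tag, typ, value in fields)

def parse_sam_tags(text):
    """Parse tab-separated TAG:TYPE:VALUE fields into a dict keyed by tag."""
    tags = {}
    for field in text.split("\t"):
        # Split on the first two colons only, so values may contain colons.
        tag, typ, value = field.split(":", 2)
        tags[tag] = int(value) if typ == "i" else value
    return tags

# Hypothetical example values.
line = to_sam_tags("PB.1.1", "full-splice_match", "GENE_A", "TX_A", "multi-exon", 12)
tags = parse_sam_tags(line)
print(tags["sc"], tags["ct"])
```

Libraries that read BAM files expose these fields directly, so manual parsing like this is only needed when working with SAM text output.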
FLNC_ImportTags.py \
--input_flnc aligned_flnc.bam \
--output_flnc annotated_flnc.bam \
--read_stat read_stat.txt \
--classification filtered_classification.txt \
--index
Arguments:
The annotation step is implemented using a custom Python script maintained in-house.
All the relevant code can be accessed in the GitHub repository: