pipelines-docs

Transcript Collapsing

In this step, the pipeline merges redundant consensus transcripts that align to the same genomic loci. Transcripts with identical exon–intron structures are collapsed into a single representative transcript model. The output includes unique isoforms in GFF format, a FASTA sequence file, and several supporting metric files.

Both the aligned consensus reads and the original full-length non-chimeric (FLNC) reads are used: the aligned reads determine transcript structure, and the FLNC reads quantify read support.
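
As a toy illustration of what "identical exon–intron structures" means, the sketch below groups transcripts that share the same chromosome, strand, and intron chain. It is not how isoseq collapse works internally; the four-column tab-separated input format and the function name collapse_by_structure are assumptions made for this example only.

```shell
# Toy illustration only, not the isoseq algorithm.
# Hypothetical input, one transcript per line, tab-separated:
#   transcript_id <TAB> chrom <TAB> strand <TAB> intron_chain
collapse_by_structure() {
    awk -F'\t' '
        {
            # Transcripts with the same chrom, strand and intron chain
            # share a key and therefore end up in the same group.
            key = $2 "\t" $3 "\t" $4
            if (!(key in members)) {
                order[++n] = key          # remember first-seen order
                members[key] = $1
            } else {
                members[key] = members[key] "," $1
            }
        }
        END {
            # One output line per unique structure, with its member IDs.
            for (i = 1; i <= n; i++)
                print order[i] "\t" members[order[i]]
        }
    ' "$1"
}
```

Each output line is one unique structure followed by a comma-separated list of the transcripts that share it. Exact string matching on the intron chain is a simplification chosen to keep the sketch short.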

Collapsing Consensus Transcripts

Collapse redundant transcripts into unique isoforms:

isoseq collapse --do-not-collapse-extra-5exons aligned_transcripts.bam flnc.bam collapsed_isoforms.gff

Arguments:

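--do-not-collapse-extra-5exons: prevents collapsing of isoforms that differ only in extra 5’ exons. This preserves transcription start site (TSS) diversity and is recommended for bulk Iso-Seq applications.
aligned_transcripts.bam: consensus transcripts aligned to the reference genome; these define the exon–intron structure of each isoform.
flnc.bam: original FLNC reads, used to assess transcript support by counting the number of reads assigned to each isoform.
collapsed_isoforms.gff: output file containing the unique isoforms in GFF format.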
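
The command above could be wrapped in a small function that checks its inputs before running. This is a minimal sketch, not the contents of isoseq_collapse.sh: the function name run_collapse and the ISOSEQ_BIN override variable are hypothetical and introduced only for illustration.

```shell
# Hypothetical wrapper; run_collapse and ISOSEQ_BIN are illustrative names
# and are not taken from isoseq_collapse.sh.
run_collapse() {
    local aligned_bam="$1"
    local flnc_bam="$2"
    local out_gff="$3"

    # Fail early if either input BAM is missing or empty.
    for f in "$aligned_bam" "$flnc_bam"; do
        if [ ! -s "$f" ]; then
            echo "error: input not found or empty: $f" >&2
            return 1
        fi
    done

    # Same invocation as documented above; ISOSEQ_BIN (an assumption)
    # allows pointing at a specific isoseq binary.
    "${ISOSEQ_BIN:-isoseq}" collapse --do-not-collapse-extra-5exons \
        "$aligned_bam" "$flnc_bam" "$out_gff"
}
```

Usage: run_collapse aligned_transcripts.bam flnc.bam collapsed_isoforms.gff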

Note: In addition to the GFF, the output includes a TXT file with read-to-isoform mappings, a TXT file listing transcript support statistics (FLNC counts), and a JSON file with detailed metrics. These files are required for downstream annotation and quality control.
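
A quick sanity check after the run might look like the sketch below. It assumes two things that are not stated above: that the supporting files share the GFF's file-name prefix, and that each isoform appears in the GFF as a row whose third column is "transcript". Both function names are hypothetical.

```shell
# Assumption: supporting files are written next to the GFF with the same
# prefix (for example collapsed_isoforms.*). Prints the non-empty ones.
list_collapse_outputs() {
    local prefix="${1%.gff}"
    for f in "$prefix".*; do
        if [ -s "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
}

# Assumption: one "transcript" feature row per unique isoform.
count_isoforms() {
    awk -F'\t' '$3 == "transcript" { n++ } END { print n + 0 }' "$1"
}
```

Usage: list_collapse_outputs collapsed_isoforms.gff, then count_isoforms collapsed_isoforms.gff. An empty listing or a count of zero suggests the collapse step did not complete as expected.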

Implementation with IsoSeq

The pipeline uses IsoSeq version 4.2.0.

Source Code

All the relevant code can be accessed in the GitHub repository:

isoseq_collapse.sh [collapse]
