The pipeline uses STAR in paired-end mode to align reads to both the reference genome and the transcriptome. Multiple sets of paired FASTQ files can be supplied in a single run, and their alignments are combined into a single output. Genome alignments are sorted by genomic coordinate, and transcriptome alignments are written to a separate file. An integrity check is then performed on the resulting BAM files.
sentieon STAR \
--readFilesIn reads1.fastq.gz,reads2.fastq.gz mates1.fastq.gz,mates2.fastq.gz \
--genomeDir star/index/path \
--readFilesCommand zcat \
--twopassMode Basic \
--twopass1readsN -1 \
--outFilterMultimapNmax 20 \
--alignSJoverhangMin 8 \
--alignSJDBoverhangMin 1 \
--outFilterMismatchNmax 999 \
--outFilterMismatchNoverLmax 0.1 \
--alignIntronMin 20 \
--alignIntronMax 1000000 \
--alignMatesGapMax 1000000 \
--outFilterType BySJout \
--outFilterScoreMinOverLread 0.33 \
--outFilterMatchNmin 0 \
--outFilterMatchNminOverLread 0.33 \
--limitSjdbInsertNsj 1200000 \
--outSAMstrandField intronMotif \
--outFilterIntronMotifs None \
--alignSoftClipAtReferenceEnds Yes \
--quantMode TranscriptomeSAM GeneCounts \
--outSAMtype BAM Unsorted \
--outStd BAM_Unsorted \
--outBAMcompression 0 \
--outSAMunmapped Within \
--genomeLoad NoSharedMemory \
--chimSegmentMin 15 \
--chimJunctionOverhangMin 15 \
--chimOutType Junctions WithinBAM SoftClip \
--chimMainSegmentMultNmax 1 \
--chimOutJunctionFormat 0 \
--outSAMattributes NH HI AS nM NM ch \
--outFileNamePrefix star_out/OUT. \
| samtools sort --no-PG -o sorted.bam -
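Combining several FASTQ sets into one run relies on STAR's `--readFilesIn` syntax: files for the same mate are joined with commas, and the mate-1 list and mate-2 list are given as two space-separated arguments in matching order. The following is a minimal bash sketch of how those two arguments could be assembled from arrays. The lane file names and the `join_by_comma` helper are hypothetical and are not part of the pipeline.

```shell
#!/usr/bin/env bash
# Sketch only: file names below are hypothetical placeholders.

# Join all arguments into one comma-separated string.
join_by_comma() {
  local IFS=,
  printf '%s' "$*"
}

# Mate-1 and mate-2 files, listed in the same lane order so pairs line up.
r1_files=(laneA_R1.fastq.gz laneB_R1.fastq.gz)
r2_files=(laneA_R2.fastq.gz laneB_R2.fastq.gz)

if [ "${#r1_files[@]}" -ne "${#r2_files[@]}" ]; then
  echo "error: R1 and R2 file counts differ" >&2
else
  # Two words: the mate-1 list and the mate-2 list, as --readFilesIn expects.
  read_files_in=("$(join_by_comma "${r1_files[@]}")" "$(join_by_comma "${r2_files[@]}")")
  printf '%s\n' "${read_files_in[@]}"
fi
# Prints:
# laneA_R1.fastq.gz,laneB_R1.fastq.gz
# laneA_R2.fastq.gz,laneB_R2.fastq.gz
```

The array can then be passed as `--readFilesIn "${read_files_in[@]}"`, which keeps each comma-separated list as a single argument.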
Note: The STAR index was generated from a modified version of the reference genome that excludes the ALT, HLA and decoy contigs. See the reference files section for further details.
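The following sketch shows one possible way to produce a comparable reference. It is not necessarily how the pipeline's index was built. It filters contig names from a FASTA index (`.fai`) before genome generation and assumes GRCh38-style naming: ALT contigs end in `_alt`, decoys end in `_decoy`, and HLA contigs start with `HLA-`. The `keep_contigs` helper and the file names in the comments (`ref.fa`, `ref.primary.fa`, `annotation.gtf`) are hypothetical.

```shell
#!/usr/bin/env bash
# Print the names of contigs to keep from a FASTA index (.fai), dropping
# contigs whose names match the ASSUMED GRCh38 ALT/HLA/decoy conventions.
# Unplaced/unlocalized contigs (e.g. *_random, chrUn_*) are kept.
keep_contigs() {
  awk -F'\t' '$1 !~ /_alt$/ && $1 !~ /^HLA-/ && $1 !~ /_decoy$/ { print $1 }' "$1"
}

# Example usage (hypothetical file names):
#   samtools faidx ref.fa
#   samtools faidx ref.fa $(keep_contigs ref.fa.fai) > ref.primary.fa
#   STAR --runMode genomeGenerate --genomeDir star/index/path \
#        --genomeFastaFiles ref.primary.fa --sjdbGTFfile annotation.gtf
```

Filtering by name keeps the step transparent and easy to audit. The trade-off is that the patterns must be adjusted if the reference uses a different naming scheme.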
To confirm the integrity of the alignment BAM file, in-house Python code checks that the file ends with the 28-byte empty BGZF block that the BAM format uses as its end-of-file (EOF) marker.
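For illustration, the same idea can be expressed in bash. The function compares the last 28 bytes of a file with the BGZF EOF block defined in the SAM/BAM specification. This is a hedged sketch of the technique, not the pipeline's Python implementation. The function name `bam_has_eof` is hypothetical.

```shell
#!/usr/bin/env bash
# Expected BGZF EOF block (28 bytes) as hex, split into 4-byte groups.
BGZF_EOF_HEX="1f8b0804""00000000""00ff0600""42430200""1b000300""00000000""00000000"

# Return 0 if the last 28 bytes of "$1" match the BGZF EOF block, 1 otherwise.
bam_has_eof() {
  local bam="$1" tail_hex
  [ -f "$bam" ] || return 1
  # Dump the final 28 bytes as contiguous lowercase hex
  # (-v disables od's suppression of repeated lines).
  tail_hex=$(tail -c 28 "$bam" | od -An -v -tx1 | tr -d ' \n')
  [ "$tail_hex" = "$BGZF_EOF_HEX" ]
}

# Example usage with the outputs of the command above. The transcriptome BAM
# name assumes STAR's default naming for the given --outFileNamePrefix:
#   for f in sorted.bam star_out/OUT.Aligned.toTranscriptome.out.bam; do
#     bam_has_eof "$f" || echo "possibly truncated: $f" >&2
#   done
```

A missing EOF block usually indicates that the writer was interrupted. `samtools quickcheck` performs a comparable check if samtools is already available.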
The Sentieon implementation replicates the original STAR code. The pipeline uses Sentieon version 202308.01, which corresponds to STAR version 2.7.10b.
All the relevant code can be accessed in the GitHub repository: