pipelines-docs

RSEM Reference

The RSEM Reference is generated from the standard Genome Reference Consortium Human Build 38 (GRCh38) FASTA distributed by the Broad Institute, following the procedure described in the GTEx analysis pipeline.

The RSEM Reference uses the GENCODE comprehensive gene annotations. For more detail, refer to the “Genome Annotations” section of the GENCODE documentation.

Downloading and Preparing the Genome Reference

Download the reference genome
wget https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.fasta
ALT, HLA, and decoy contigs are excluded from the reference genome FASTA using the following Python code:
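with open('Homo_sapiens_assembly38.fasta', 'r') as fasta:
    contigs = fasta.read()
# each record starts with '>'; the first element is the (empty) text before the first header
contigs = contigs.split('>')
# the contig ID is the first whitespace-delimited token of each header line
contig_ids = [c.split(None, 1)[0] if c.strip() else '' for c in contigs]

# exclude ALT, HLA and decoy contigs
filtered_fasta = '>'.join([c for i, c in zip(contig_ids, contigs)
    if not (i.endswith('_alt') or i.startswith('HLA') or i.endswith('_decoy'))])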

with open('Homo_sapiens_assembly38_noALT_noHLA_noDecoy.fasta', 'w') as fasta:
    fasta.write(filtered_fasta)
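The code above reads the whole genome FASTA into memory, which may be inconvenient on a small machine. A minimal streaming sketch of the same exclusion rule is shown here; the function names `is_excluded` and `filter_fasta` are hypothetical and are not part of the GTEx pipeline. Unlike the code above, it drops any text that precedes the first header line, which a well-formed FASTA does not contain.

```python
def is_excluded(contig_id):
    """Same rule as above: drop ALT, HLA and decoy contigs."""
    return (contig_id.endswith('_alt')
            or contig_id.startswith('HLA')
            or contig_id.endswith('_decoy'))


def filter_fasta(src_path, dst_path):
    """Copy src_path to dst_path line by line, skipping excluded contigs.

    Returns the IDs of the contigs that were kept, in file order.
    """
    kept = []
    keep = False  # nothing is written until the first header line is seen
    with open(src_path) as src, open(dst_path, 'w') as dst:
        for line in src:
            if line.startswith('>'):
                # the contig ID is the first whitespace-delimited token
                fields = line[1:].split(None, 1)
                contig_id = fields[0] if fields else ''
                keep = not is_excluded(contig_id)
                if keep:
                    kept.append(contig_id)
            if keep:
                dst.write(line)
    return kept
```

It could be called as `filter_fasta('Homo_sapiens_assembly38.fasta', 'Homo_sapiens_assembly38_noALT_noHLA_noDecoy.fasta')`; the returned list gives a quick way to inspect which contigs remain.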
Generate FASTA indexes
samtools faidx Homo_sapiens_assembly38_noALT_noHLA_noDecoy.fasta

java -jar picard.jar \
    CreateSequenceDictionary \
    R=Homo_sapiens_assembly38_noALT_noHLA_noDecoy.fasta \
    O=Homo_sapiens_assembly38_noALT_noHLA_noDecoy.dict
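Once the index exists, it offers a cheap way to confirm that the filtering worked. The sketch below reads the `.fai` file written by `samtools faidx` (tab-separated, with the contig name and length in the first two columns) and lists any contig that still matches the exclusion rule. The helper names `read_fai` and `excluded_contigs` are hypothetical.

```python
def read_fai(fai_path):
    """Return {contig_name: length} from a samtools .fai index."""
    lengths = {}
    with open(fai_path) as fai:
        for line in fai:
            if not line.strip():
                continue  # tolerate blank lines
            name, length = line.rstrip('\n').split('\t')[:2]
            lengths[name] = int(length)
    return lengths


def excluded_contigs(names):
    """Return the names that look like ALT, HLA or decoy contigs."""
    return [n for n in names
            if n.endswith('_alt') or n.startswith('HLA') or n.endswith('_decoy')]
```

For example, `excluded_contigs(read_fai('Homo_sapiens_assembly38_noALT_noHLA_noDecoy.fasta.fai'))` would be expected to return an empty list for a correctly filtered FASTA.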

Generating RSEM Reference

Generate RSEM reference
rsem-prepare-reference \
    --gtf gencode.annotation.gtf \
    Homo_sapiens_assembly38_noALT_noHLA_noDecoy.fasta \
    rsem_reference
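With `--gtf`, RSEM extracts transcript sequences from the genome FASTA, so the sequence names in the annotation need to match the contig names in the FASTA. A naming mismatch (for example `chr1` versus `1`) is easy to miss, so a pre-flight check along the following lines may be useful before building the reference. This is a sketch only; the function names are hypothetical, and it assumes an uncompressed GTF whose first tab-separated column is the sequence name and whose comment lines start with `#`.

```python
def fasta_contig_ids(fasta_path):
    """Collect contig IDs (first token of each header line) from a FASTA."""
    ids = set()
    with open(fasta_path) as fasta:
        for line in fasta:
            if line.startswith('>'):
                fields = line[1:].split(None, 1)
                if fields:
                    ids.add(fields[0])
    return ids


def gtf_seqnames(gtf_path):
    """Collect the sequence names used in the first column of a GTF."""
    names = set()
    with open(gtf_path) as gtf:
        for line in gtf:
            if line.startswith('#') or not line.strip():
                continue  # skip header comments and blank lines
            names.add(line.split('\t', 1)[0])
    return names


def missing_from_fasta(gtf_path, fasta_path):
    """Return GTF sequence names that have no matching FASTA contig."""
    return sorted(gtf_seqnames(gtf_path) - fasta_contig_ids(fasta_path))
```

Calling `missing_from_fasta('gencode.annotation.gtf', 'Homo_sapiens_assembly38_noALT_noHLA_noDecoy.fasta')` returns the annotation sequence names that are absent from the filtered genome; an empty list suggests the two files use consistent names.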

Implementation with RSEM

The current reference was generated using RSEM v1.3.3.

