The GENCODE project[1] provides comprehensive annotation of gene features for the human genome, including protein-coding and non-coding genes, pseudogenes, and other significant genomic elements.
The version used here is GENCODE Release 47 (GRCh38.p14), which is based on the Genome Reference Consortium Human Build 38 (GRCh38) and can be downloaded from the EBI FTP server:
wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_47/gencode.v47.annotation.gtf.gz
gunzip gencode.v47.annotation.gtf.gz
python3 collapse_annotation.py \
--collapse_only gencode.v47.annotation.gtf \
gencode.v47.genes.gtf
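Conceptually, collapsing merges the transcripts of each gene into a single gene model. The core step, taking the union of a gene's exon intervals across all of its transcripts, can be sketched in Python as below. This is a simplified illustration under our own assumptions, not code from collapse_annotation.py: the helper names `merge_intervals` and `collapse_genes` are hypothetical, exons are assumed to arrive as `(gene_id, (start, end))` pairs in 1-based inclusive GTF coordinates, and any additional rules the real script applies (for example, how regions shared by overlapping genes are treated) are left out.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical representation: (start, end), 1-based and inclusive, as in GTF.
Interval = Tuple[int, int]


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Return the union of closed intervals as sorted, non-overlapping intervals."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval by at least one base: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def collapse_genes(
    exons: Iterable[Tuple[str, Interval]],
) -> Dict[str, List[Interval]]:
    """Collapse exons from all transcripts of each gene into one exon set.

    `exons` yields (gene_id, (start, end)) pairs, e.g. taken from the
    'exon' rows of a GTF file.
    """
    by_gene: Dict[str, List[Interval]] = defaultdict(list)
    for gene_id, interval in exons:
        by_gene[gene_id].append(interval)
    return {gene: merge_intervals(ivs) for gene, ivs in by_gene.items()}


# Made-up example: two transcripts of GENE_A share an overlapping exon.
exons = [
    ("GENE_A", (100, 200)),   # exon from transcript 1
    ("GENE_A", (150, 250)),   # exon from transcript 2, overlaps the one above
    ("GENE_A", (400, 500)),
    ("GENE_B", (1000, 1100)),
]
print(collapse_genes(exons))
# prints: {'GENE_A': [(100, 250), (400, 500)], 'GENE_B': [(1000, 1100)]}
```

Only overlapping exons are merged in this sketch; exons that merely abut (one ending at 200, the next starting at 201) stay separate, which is one of several choices the real script may make differently.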
Source code for the collapse_annotation.py script[2] is available here.
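To inspect the downloaded annotation or the collapsed gencode.v47.genes.gtf programmatically, it helps to recall that GTF is a tab-separated format with nine columns, the last holding semicolon-delimited `key "value"` attributes such as `gene_id` and `gene_type`. A minimal parsing sketch follows; `parse_gtf_line` and `GtfRecord` are hypothetical names, and the example line is made up for illustration rather than taken from the GENCODE file.

```python
from typing import Dict, NamedTuple


class GtfRecord(NamedTuple):
    """One GTF line: eight fixed columns plus parsed attributes (column 9)."""
    seqname: str
    source: str
    feature: str   # e.g. "gene", "transcript", "exon"
    start: int     # 1-based, inclusive
    end: int       # 1-based, inclusive
    score: str
    strand: str
    frame: str
    attributes: Dict[str, str]


def parse_gtf_line(line: str) -> GtfRecord:
    """Parse one non-comment GTF line (hypothetical helper)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    attrs: Dict[str, str] = {}
    for item in fields[8].split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        # Repeated keys (e.g. GENCODE "tag") overwrite earlier values here.
        attrs[key] = value.strip().strip('"')
    return GtfRecord(
        fields[0], fields[1], fields[2], int(fields[3]), int(fields[4]),
        fields[5], fields[6], fields[7], attrs,
    )


# Made-up example line (not taken from GENCODE).
example = (
    'chr1\tHAVANA\tgene\t100\t200\t.\t+\t.\t'
    'gene_id "GENE_A"; gene_type "lncRNA"; level 2;'
)
rec = parse_gtf_line(example)
print(rec.feature, rec.start, rec.end, rec.attributes["gene_type"])
# prints: gene 100 200 lncRNA
```

Because GENCODE repeats some attribute keys (such as `tag`) on a single line, the dictionary above keeps only the last occurrence; a fuller parser would collect repeated keys into lists.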
[1] Frankish A, et al. GENCODE: reference annotation for the human and mouse genomes in 2023. Nucleic Acids Res. 2023 Jan 6;51(D1):D942–D949. doi: 10.1093/nar/gkac1071.
[2] Original author: Francois Aguet.