The Genome Reference Consortium Human Build 38 (GRCh38), or hg38, serves as a standardized representation of DNA sequences for various alignment and analysis pipelines.
The specific version in use is the GCA_000001405.15 no_alt_analysis_set, available for download as the following files:
This version excludes ALT contigs and the human decoy sequences from hs38d1 (GCA_000786075.2), and includes the following sequences:
Note: The two PAR regions on chrY have been hard-masked with Ns, and the chromosome Y sequence is not identical to the GenBank sequence but shares the same coordinates. Similarly, duplicate copies of centromeric arrays and WGS on chromosomes 5, 14, 19, 21 & 22 have been hard-masked with Ns.
Note: The EBV sequence is not part of the genome assembly but is included in the analysis set as a sink for aligning EBV reads, which are often present in sequencing samples.
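The hard-masking described in the notes above can be spot-checked by counting N bases per sequence. The following is a minimal sketch, not part of the original instructions: `count_n_bases` is a hypothetical helper name, and it assumes a POSIX shell with `awk` available. It reads FASTA records on standard input and prints one tab-separated line per record: sequence name, total bases, N bases.

```shell
# Hypothetical helper (not from the original instructions).
# Reads FASTA on stdin; prints "<name>\t<total bases>\t<N bases>" per record.
count_n_bases() {
  awk '
    # Print the totals for the record collected so far, if any.
    function emit() { if (name != "") printf "%s\t%d\t%d\n", name, total, n }
    # Header line: finish the previous record and start a new one.
    /^>/ { emit(); name = substr($1, 2); total = 0; n = 0; next }
    # Sequence line: add its length, then count uppercase N (hard-masked) bases.
    { total += length($0); n += gsub(/N/, "") }
    END { emit() }
  '
}

# Example usage (assumes the compressed file is in the current directory):
#   gunzip -c GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz | count_n_bases
```

Only uppercase N is counted, since soft-masked (lowercase) bases are not hard-masked. Assembly gaps are also represented as N, so the counts include more than the masked PAR and centromeric regions.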
gunzip GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz
Rename the extension to .fasta:
mv GCA_000001405.15_GRCh38_no_alt_analysis_set.fna GCA_000001405.15_GRCh38_no_alt_analysis_set.fasta
Create the faidx index (requires samtools):
samtools faidx GCA_000001405.15_GRCh38_no_alt_analysis_set.fasta
java -jar picard.jar CreateSequenceDictionary R=GCA_000001405.15_GRCh38_no_alt_analysis_set.fasta
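Once both commands have run, it can be useful to confirm that the index and the sequence dictionary describe the same sequences. The sketch below is one possible check, not part of the original instructions: `check_index_consistency` is a hypothetical helper name. It assumes the `.fai` file has the sequence name and length in its first two tab-separated columns, and that the `.dict` file is a SAM-style header whose `@SQ` lines carry `SN:` and `LN:` tags. Both paths are passed as arguments, because where Picard writes the dictionary depends on the version and options used.

```shell
# Hypothetical helper (not from the original instructions).
# Usage: check_index_consistency <file.fai> <file.dict>
# Succeeds only if both files list the same sequence names and lengths
# in the same order. The body runs in a subshell so variables stay local.
check_index_consistency() (
  fai=$1
  dict=$2
  # Columns 1 and 2 of the .fai file are the sequence name and its length.
  fai_pairs=$(cut -f1,2 "$fai")
  # Pull the SN (name) and LN (length) tags out of each @SQ header line.
  dict_pairs=$(awk -F'\t' '
    $1 == "@SQ" {
      name = ""; len = ""
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^SN:/) name = substr($i, 4)
        if ($i ~ /^LN:/) len = substr($i, 4)
      }
      print name "\t" len
    }' "$dict")
  [ "$fai_pairs" = "$dict_pairs" ]
)

# Example usage (the .dict path is an assumption; adjust it to your output):
#   check_index_consistency \
#     GCA_000001405.15_GRCh38_no_alt_analysis_set.fasta.fai \
#     GCA_000001405.15_GRCh38_no_alt_analysis_set.dict && echo "consistent"
```

A non-zero exit status means the two files disagree on names, lengths, or order, which usually indicates that one of them was built from a different FASTA.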