#### README ####

IMPORTANT: Please note you can download subsets of data via the BioMart
data mining tool. See https://www.ensembl.org/info/data/biomart/ for more
information.

##################
Fasta cds dumps
#################

These files hold the coding sequences corresponding to Ensembl genes.
CDS does not contain UTR or intronic sequence.

------------
FILE NAMES
------------
The files are consistently named following this pattern:
<species>.<assembly>.<sequence type>.<status>.fa.gz

<species>: The systematic name of the species.
<assembly>: The assembly build name.
<sequence type>: cds for CDS sequences
<status>
  * 'cds.all' - all transcript coding sequences resulting from Ensembl genes.

EXAMPLES
  for Human:
    Homo_sapiens.NCBI37.cds.all.fa.gz
      cds sequences for all protein-coding transcripts

-------------------------------
FASTA Sequence Header Lines
------------------------------
The FASTA sequence header lines are designed to be consistent across
all types of Ensembl FASTA sequences.

Stable IDs for genes and transcripts are suffixed with a version if they
have been generated by Ensembl (this is typical for vertebrate species,
but not for non-vertebrates).
All ab initio data is unversioned.

General format:

>TRANSCRIPT_ID SEQTYPE LOCATION GENE_ID GENE_BIOTYPE TRANSCRIPT_BIOTYPE

Example of an Ensembl CDS header:

>ENST00000525148.1 cds chromosome:GRCh37:11:66188562:66193526:1 gene:ENSG00000174576.1 gene_biotype:protein_coding transcript_biotype:nonsense_mediated_decay
 ^                 ^   ^                                        ^                      ^                           ^
 TRANSCRIPT_ID     |   LOCATION                                 GENE_ID                GENE_BIOTYPE                TRANSCRIPT_BIOTYPE
                   SEQTYPE
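
The header layout described above can be split into its fields with a few
lines of Python. This is a minimal sketch and not an official Ensembl tool.
The function name parse_cds_header and the dictionary keys are hypothetical.
The names given to the colon-separated parts of LOCATION (coordinate system,
assembly, sequence region, start, end, strand) are an assumption read off
the example header and are not defined in this README. Any header fields
beyond the six listed in the general format are not handled specially.

```python
def parse_cds_header(line):
    """Split one CDS FASTA header line into a dict of named fields."""
    if not line.startswith(">"):
        raise ValueError("not a FASTA header line")

    tokens = line[1:].split()
    if len(tokens) < 3:
        raise ValueError("expected at least TRANSCRIPT_ID SEQTYPE LOCATION")

    # The first three fields are positional.
    record = {
        "transcript_id": tokens[0],
        "seqtype": tokens[1],
        "location": tokens[2],
    }

    # Assumption: LOCATION has six colon-separated parts, as in the example.
    parts = tokens[2].split(":")
    if len(parts) == 6:
        record["coord_system"] = parts[0]
        record["assembly"] = parts[1]
        record["seq_region"] = parts[2]
        record["start"] = int(parts[3])
        record["end"] = int(parts[4])
        record["strand"] = int(parts[5])

    # The remaining fields are written as key:value, e.g. gene:ENSG...
    for token in tokens[3:]:
        key, sep, value = token.partition(":")
        if sep:
            record[key] = value

    return record


example = (
    ">ENST00000525148.1 cds chromosome:GRCh37:11:66188562:66193526:1 "
    "gene:ENSG00000174576.1 gene_biotype:protein_coding "
    "transcript_biotype:nonsense_mediated_decay"
)
fields = parse_cds_header(example)
print(fields["transcript_id"], fields["gene"], fields["transcript_biotype"])
```

To apply this to a downloaded file, one option is to open it with
gzip.open(path, "rt") from the Python standard library and pass every line
that starts with ">" to parse_cds_header.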