#### README ####

IMPORTANT: Please note you can download subsets of data via the BioMart
data mining tool. See https://www.ensembl.org/info/data/biomart/ for more
information.

####################
FASTA Peptide dumps
####################

These files hold the protein translations of Ensembl genes.

------------
FILE NAMES
------------
The files are consistently named following this pattern:
<species>.<assembly>.<sequence type>.<status>.fa.gz

<species>:       The systematic name of the species.
<assembly>:      The assembly build name.
<sequence type>: pep for peptide sequences
<status>:
  * 'pep.all' - all translations resulting from Ensembl genes.
  * 'pep.abinitio' - translations resulting from 'ab initio' gene
     prediction algorithms such as SNAP and GENSCAN. In general, all
     'ab initio' predictions are based solely on the genomic sequence and
     not on any other experimental evidence. Therefore, not all GENSCAN
     or SNAP predictions represent biologically real proteins.
fa : All files in these directories are FASTA database files.
gz : All files are compressed with GNU Zip (gzip) for storage efficiency.

EXAMPLES (Note: Not all species have 'pep.abinitio' data)
 for Human:
  Homo_sapiens.NCBI36.pep.all.fa.gz
    contains all annotated peptides
  Homo_sapiens.NCBI36.pep.abinitio.fa.gz
    contains all ab initio predicted peptides

------------------------------
FASTA Sequence Header Lines
------------------------------
The FASTA sequence header lines are designed to be consistent across
all types of Ensembl FASTA sequences.

Stable IDs for genes, transcripts, and translations are suffixed with
a version if they have been generated by Ensembl (this is typical for
vertebrate species, but not for non-vertebrates).
All ab initio data is unversioned.

General format:

>TRANSLATION_ID SEQTYPE LOCATION GENE_ID TRANSCRIPT_ID GENE_BIOTYPE TRANSCRIPT_BIOTYPE

Example of Ensembl Peptide header:

>ENSP00000328693.1 pep chromosome:NCBI35:1:904515:910768:1 gene:ENSG00000158815.1 transcript:ENST00000328693.1 gene_biotype:protein_coding transcript_biotype:protein_coding
 ^                 ^   ^                                   ^                      ^                            ^                           ^
 TRANSLATION_ID    |   LOCATION                            GENE_ID                TRANSCRIPT_ID                GENE_BIOTYPE                TRANSCRIPT_BIOTYPE
                SEQTYPE
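
The naming pattern and header layout described above can be read
mechanically. What follows is a minimal sketch in Python, not part of any
Ensembl tooling: the function names parse_pep_filename and parse_pep_header
are hypothetical, and the code assumes that the species name contains no
dots and that a header carries only the fields listed under "General
format". Any other token in a header is ignored, and LOCATION is kept as
an unparsed string.

```python
import re
from typing import Dict

# Hypothetical helpers for illustration; the names are not Ensembl APIs.

# Assumption: the species part contains no '.', while the assembly part may.
_FILENAME_RE = re.compile(
    r"(?P<species>[^.]+)\.(?P<assembly>.+)\.pep\.(?P<status>all|abinitio)\.fa\.gz"
)

# Only the key:value fields named in the README's general format are kept.
_KNOWN_KEYS = {"gene", "transcript", "gene_biotype", "transcript_biotype"}


def parse_pep_filename(filename: str) -> Dict[str, str]:
    """Split a peptide dump file name into species, assembly and status."""
    match = _FILENAME_RE.fullmatch(filename)
    if match is None:
        raise ValueError(f"not a peptide FASTA dump name: {filename!r}")
    return match.groupdict()


def parse_pep_header(line: str) -> Dict[str, str]:
    """Parse one '>' header line into a dict of its documented fields."""
    if not line.startswith(">"):
        raise ValueError("FASTA header lines start with '>'")
    fields = line[1:].split()
    if len(fields) < 3:
        raise ValueError("expected TRANSLATION_ID, SEQTYPE and LOCATION")
    record = {
        "translation_id": fields[0],
        "seqtype": fields[1],
        "location": fields[2],  # left unparsed in this sketch
    }
    for field in fields[3:]:
        key, sep, value = field.partition(":")
        if sep and key in _KNOWN_KEYS:
            record[key] = value
    return record


if __name__ == "__main__":
    print(parse_pep_filename("Homo_sapiens.NCBI36.pep.all.fa.gz"))
    print(parse_pep_header(
        ">ENSP00000328693.1 pep chromosome:NCBI35:1:904515:910768:1 "
        "gene:ENSG00000158815.1 transcript:ENST00000328693.1 "
        "gene_biotype:protein_coding transcript_biotype:protein_coding"
    ))
```

Restricting the parser to a fixed set of keys is a deliberate choice: a
header that carries extra free-text tokens will not add unexpected entries
to the result.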