815	Protein features that represent the mapping between Ensembl proteins (ENSP) and AlphaFoldDB protein structures (including their corresponding chains). Imported via UniProt mappings	AFDB-ENSP mappings	1	\N
21	Alignments of EST clusters (Unigenes, Gene Indices) from Arabidopsis	EST Cluster (Arabidopsis)	1	{"type": "cdna"}
299	Gene annotation by ARAPORT through a process of automatic and manual curation.	Genes	1	{"label_key": "[biotype]", "caption": "Genes", "colour_key": "[biotype]", "default": {"cytoview": "gene_label", "MultiBottom": "collapsed_label", "alignsliceviewbottom": "as_collapsed_label", "MultiTop": "gene_label", "contigviewbottom": "transcript_label", "contigviewtop": "gene_label"}, "name": "Genes", "key": "ensembl"}
798	Conserved Domain Database models.	CDD	1	{"type": "domain"}
542	Covariance models from Rfam (release 12.2), aligned to the genome with 'cmscan' from the Infernal suite of programs. Models are restricted to those observed in species that share a last common ancestor (LCA).	Rfam Models (LCA)	1	{"type": "rna"}
84	Density of coding genes, calculated by dividing the chromosome into 150 "bins" and counting the genes in each. (For very short chromosomes, e.g. MT, some genes contribute to multiple bins.)	Coding genes (density)	1	\N
22	Alignments of EST clusters (Unigenes, Gene Indices) from all dicot species	EST Cluster (Dicot)	1	{"type": "cdna"}
737	Dust is a program that identifies low-complexity sequences (regions of the genome with a biased distribution of nucleotides, such as a repeat). The Dust module is widely used with BLAST to prevent 'sticky' regions from determining false hits.	Low complexity (Dust)	1	\N
133	External links from plants collaborators	External Plant Links	1	\N
16	Ab initio prediction of protein coding genes, based on the genomic sequence alone. AA Salamov et al., Genome Res. 2000 4:516-22.	FGENESH prediction	1	{"caption": "FGENESH prediction", "colour_key": "[biotype]", "default": {"MultiBottom": "collapsed_label", "MultiTop": "gene_label", "alignsliceviewbottom": "as_collapsed_label", "contigviewbottom": "transcript_label", "contigviewtop": "gene_label", "cytoview": "gene_label"}, "key": "fgenesh", "label_key": "[biotype]", "multi_name": "FGENESH prediction", "name": "FGENESH prediction"}
329	Gene Annotation File (GAF) loader to import GO annotations.	GAF loader	1	\N
811	CATH/Gene3D families.	Gene3D	1	{"type": "domain"}
818	\N	GOA annotation	0	\N
787	GO term derived transitively from a UniProt record	UniProt-derived GO term	0	\N
298	The Gene Ontology XRef projection pipeline	GO projected xrefs	0	\N
741	Gene encoding an enzyme annotated at the Plant Reactome.	Plant Reactome	1	\N
794	HAMAP families.	HAMAP	1	{"type": "domain"}
793	PANTHER families.	PANTHER	1	{"type": "domain"}
812	InterPro2GO mapping, defined by InterPro.	InterPro2GO mapping	0	\N
122	Density of long non-coding RNA genes, calculated by dividing the chromosome into 150 "bins" and counting the genes in each. (For very short chromosomes, e.g. MT, some genes contribute to multiple bins.)	Long non-coding genes (density)	1	\N
25	Alignments of EST clusters (Unigenes, Gene Indices) from Maize	EST Cluster (Maize)	1	{"type": "cdna"}
544	MicroRNA from miRBase.	miRBase miRNA	1	{"type": "rna"}
804	Intrinsically disordered regions predicted by MobiDB lite.	MobiDB lite	1	{"type": "feature"}
23	Alignments of EST clusters (Unigenes, Gene Indices) from monocot species	EST Cluster (Monocot)	1	{"type": "cdna"}
803	Coiled-coil regions predicted by Ncoils.	Coiled-coils (Ncoils)	1	{"type": "feature"}
85	Density of non-coding RNA genes, calculated by dividing the chromosome into 150 "bins" and counting the genes in each. (For very short chromosomes, e.g. MT, some genes contribute to multiple bins.)	Non-coding genes (density)	1	\N
26	Genomic location of Overgo probe markers	Overgo Probes	1	\N
74	Percentage of repetitive elements for top level sequences (such as chromosomes, scaffolds, etc.)	Repeats (percent)	1	\N
78	Percentage of G/C bases in the sequence.	GC content	1	\N
791	Protein domains and motifs from the Pfam database.	Pfam	1	{"type": "domain"}
813	Protein domains and motifs from the PROSITE profiles database.	PROSITE profiles	1	{"type": "domain"}
796	Protein domains and motifs from the PIR (Protein Information Resource) Superfamily database.	PIRSF	1	{"type": "domain"}
795	Protein fingerprints (groups of conserved motifs) from the PRINTS database.	Prints	1	{"type": "domain"}
86	Density of pseudogenes, calculated by dividing the chromosome into 150 "bins" and counting the genes in each. (For very short chromosomes, e.g. MT, some genes contribute to multiple bins.)	Pseudogenes (density)	1	\N
740	Repeats detected using RepeatMasker to scan non-redundant elements from several plant libraries, including REdat, TREP and RepetDB among others.	Repeats: nrplants	1	\N
547	Repeats detected by RepeatMasker using the MIPS Repeat Database (REdat).	Repeats: REdat	1	\N
27	Genomic location of RFLP (restriction fragment length polymorphism) markers	Marker (RFLP)	1	\N
24	Alignments of EST clusters (Unigenes, Gene Indices) from rice (Oryza) species	EST Cluster (Rice)	1	{"type": "cdna"}
814	Protein domains and motifs from the PROSITE patterns database.	PROSITE patterns	1	{"type": "domain"}
808	Low complexity peptide sequences identified by Seg.	Low complexity (Seg)	1	{"type": "feature"}
799	Structure-Function Linkage Database families.	SFLD	1	{"type": "domain"}
121	Density of short non-coding RNA genes, calculated by dividing the chromosome into 150 "bins" and counting the genes in each. (For very short chromosomes, e.g. MT, some genes contribute to multiple bins.)	Short non-coding genes (density)	1	\N
805	Signal peptide cleavage sites predicted by SignalP.	Cleavage site (Signalp)	1	{"type": "feature"}
801	Protein domains and motifs from the SMART database.	SMART	1	{"type": "domain"}
82	Density of single nucleotide polymorphisms (SNPs), calculated by dividing the chromosome into 150 "bins" and counting the SNPs in each. (For very short chromosomes, e.g. MT, some SNPs contribute to multiple bins.)	SNP Density	1	\N
800	Protein domains and motifs from the SUPERFAMILY database.	Superfamily	1	{"type": "domain"}
41	Gene annotation by TAIR through a process of automatic and manual curation.	TAIR	1	{"colour_key": "[biotype]", "default": {"MultiBottom": "collapsed_label", "MultiTop": "gene_label", "alignsliceviewbottom": "as_collapsed_label", "caption": "TAIR gene", "contigviewbottom": "transcript_label", "contigviewtop": "gene_label", "cytoview": "gene_label", "multi_name": "TAIR Genes", "name": "TAIR Genes"}, "key": "ensembl", "label_key": "[biotype]"}
802	Protein domains and motifs from the TIGRFAM database.	TIGRFAM	1	{"type": "domain"}
807	Transmembrane helices predicted by TMHMM.	Transmembrane helices	1	{"type": "feature"}
738	Tandem Repeats Finder locates adjacent copies of a pattern of nucleotides.	Tandem repeats (TRF)	1	\N
543	tRNA models predicted with tRNAscan-SE (release 1.3.1).	tRNA Models	1	{"type": "rna"}
784	UniParc mapping based on sequence checksums	UniParc cross-reference	0	\N
75	Sequences from various databases are matched to Ensembl transcripts using Exonerate. These are external references, or 'Xrefs'.	DNA match	0	\N
76	match	Protein	0	\N
785	UniProt cross-reference derived transitively from a UniParc identifier	UniParc-derived cross-reference	0	\N
786	Cross-reference derived transitively from a UniProt record	UniProt-derived cross-reference	0	\N
583	Cross references to RefSeq nucleotide sequences, determined by alignment against the transcriptome with blastx.	RefSeq transcripts	0	\N
582	Cross references to RefSeq peptide sequences, determined by alignment against the proteome with blastp.	RefSeq peptides	0	\N
788	Cross references to UniProt Swiss-Prot (reviewed) proteins, determined by alignment against the proteome with blastp.	UniProt reviewed proteins	0	\N
789	Cross references to UniProt TrEMBL (unreviewed) proteins, determined by alignment against the proteome with blastp.	UniProt unreviewed proteins	0	\N