6 AGI Repeat sequences repeats-AGI 1 \N
116 Alignments of EST clusters (Unigenes, Gene Indices) from Arabidopsis EST Cluster (Arabidopsis) 1 {"type": "cdna"}
890 Conserved Domain Database models. CDD 1 {"type": "domain"}
633 Covariance models from Rfam (release 12.2), aligned to the genome with 'cmscan' from the Infernal suite of programs. Models are restricted to those observed in species that share a last common ancestor (LCA). Rfam Models (LCA) 1 {"type": "rna"}
188 Density of coding genes, calculated by dividing the chromosome into 150 "bins" and counting the genes in each. (For very short chromosomes, e.g. MT, some genes contribute to multiple bins.) Coding genes (density) 1 \N
117 Alignments of EST clusters (Unigenes, Gene Indices) from all dicot species EST Cluster (Dicot) 1 {"type": "cdna"}
635 Dust is a program that identifies low-complexity sequences (regions of the genome with a biased distribution of nucleotides, such as a repeat). The Dust module is widely used with BLAST to prevent 'sticky' regions from determining false hits. Low complexity (Dust) 1 \N
133 Ab initio prediction of protein coding genes, based on the genomic sequence alone AA Salamov et al., Genome Res. 2000 4:516-22. FGENESH prediction 1 {"caption": "FGENESH prediction", "colour_key": "[biotype]", "default": {"MultiBottom": "collapsed_label", "MultiTop": "gene_label", "alignsliceviewbottom": "as_collapsed_label", "contigviewbottom": "transcript_label", "contigviewtop": "gene_label", "cytoview": "gene_label"}, "key": "fgenesh", "label_key": "[biotype]", "multi_name": "FGENESH prediction", "name": "FGENESH prediction"}
914 CATH/Gene3D families. Gene3D 1 {"type": "domain"}
150 Gene annotation by BGI through a process of automatic and manual curation Genes 1 {"caption": "BGI gene", "colour_key": "[biotype]", "default": {"MultiBottom": "collapsed_label", "MultiTop": "gene_label", "alignsliceviewbottom": "as_collapsed_label", "contigviewbottom": "transcript_label", "contigviewtop": "gene_label", "cytoview": "gene_label"}, "key": "ensembl", "label_key": "[biotype]", "multi_name": "BGI Genes", "name": "BGI Genes"}
917 \N GOA annotation 0 \N
669 GO term derived transitively from a UniProt record UniProt-derived GO term 0 \N
425 The Gene Ontology XRef projection pipeline GO projected xrefs 0 \N
721 Gene encoding an enzyme annotated at the Plant Reactome. Plant Reactome 1 \N
894 HAMAP families. HAMAP 1 {"type": "domain"}
892 PANTHER families. PANTHER 1 {"type": "domain"}
912 InterPro2GO mapping, defined by InterPro. InterPro2GO mapping 0 \N
225 Density of long non-coding RNA genes, calculated by dividing the chromosome into 150 "bins" and counting the genes in each. (For very short chromosomes, e.g. MT, some genes contribute to multiple bins.) Long non-coding genes (density) 1 \N
120 Alignments of EST clusters (Unigenes, Gene Indices) from Maize EST Cluster (Maize) 1 {"type": "cdna"}
131 Genic non-coding microsatellites Marker (microsat) 1 \N
641 RNA genes imported from miRBase. RNA genes 1 {"label_key": "[biotype]", "caption": "Genes", "colour_key": "[biotype]", "default": {"cytoview": "gene_label", "MultiBottom": "collapsed_label", "alignsliceviewbottom": "as_collapsed_label", "MultiTop": "gene_label", "contigviewbottom": "transcript_label", "contigviewtop": "gene_label"}, "name": "Genes", "key": "ensembl"}
902 Intrinsically disordered regions predicted by MobiDB lite. MobiDB lite 1 {"type": "feature"}
118 Alignments of EST clusters (Unigenes, Gene Indices) from monocot species EST Cluster (Monocot) 1 {"type": "cdna"}
903 Coiled-coil regions predicted by Ncoils. Coiled-coils (Ncoils) 1 {"type": "feature"}
189 Density of non-coding RNA genes, calculated by dividing the chromosome into 150 "bins" and counting the genes in each. (For very short chromosomes, e.g. MT, some genes contribute to multiple bins.) Non-coding genes (density) 1 \N
127 Genomic location of Overgo probe markers Overgo Probes 1 \N
169 Percentage of repetitive elements for top level sequences (such as chromosomes, scaffolds, etc.) Repeats (percent) 1 \N
168 Percentage of G/C bases in the sequence. GC content 1 \N
893 Protein domains and motifs from the Pfam database. Pfam 1 {"type": "domain"}
911 Protein domains and motifs from the PROSITE profiles database. PROSITE profiles 1 {"type": "domain"}
896 Protein domains and motifs from the PIR (Protein Information Resource) Superfamily database. PIRSF 1 {"type": "domain"}
897 Protein fingerprints (groups of conserved motifs) from the PRINTS database. Prints 1 {"type": "domain"}
190 Density of pseudogenes, calculated by dividing the chromosome into 150 "bins" and counting the genes in each. (For very short chromosomes, e.g. MT, some genes contribute to multiple bins.) Pseudogenes (density) 1 \N
637 Repeats detected by RepeatMasker, using the MIPS Repeat Database (REdat). Repeats: REdat 1 \N
638 Repeats identified by RepeatMasker, using the Repbase library of repeat profiles. Repeats: Repbase 1 \N
642 RNA genes produced by filtering alignments of Rfam (release 12.2) covariance models. RNA genes 1 {"label_key": "[biotype]", "caption": "Genes", "colour_key": "[biotype]", "default": {"cytoview": "gene_label", "MultiBottom": "collapsed_label", "alignsliceviewbottom": "as_collapsed_label", "MultiTop": "gene_label", "contigviewbottom": "transcript_label", "contigviewtop": "gene_label"}, "name": "Genes", "key": "ensembl"}
123 Genomic location of RFLP (restriction fragment length polymorphism) markers Marker (RFLP) 1 \N
121 Genomic alignments of rice (Oryza) "Expressed Sequence Tags" (ESTs) from dbEST Rice ESTs 1 {"gene": {"do_not_display": "1"}, "type": "est"}
119 Alignments of EST clusters (Unigenes, Gene Indices) from rice (Oryza) species EST Cluster (Rice) 1 {"type": "cdna"}
913 Protein domains and motifs from the PROSITE patterns database. PROSITE patterns 1 {"type": "domain"}
906 Low complexity peptide sequences identified by Seg. Low complexity (Seg) 1 {"type": "feature"}
899 Structure-Function Linkage Database families. SFLD 1 {"type": "domain"}
224 Density of short non-coding RNA genes, calculated by dividing the chromosome into 150 "bins" and counting the genes in each. (For very short chromosomes, e.g. MT, some genes contribute to multiple bins.) Short non-coding genes (density) 1 \N
904 Signal peptide cleavage sites predicted by SignalP. Cleavage site (Signalp) 1 {"type": "feature"}
900 Protein domains and motifs from the SMART database. SMART 1 {"type": "domain"}
149 Density of single nucleotide polymorphisms (SNPs), calculated by dividing the chromosome into 150 "bins" and counting the SNPs in each. (For very short chromosomes, e.g. MT, some SNPs contribute to multiple bins.) SNP Density 1 \N
901 Protein domains and motifs from the SUPERFAMILY database. Superfamily 1 {"type": "domain"}
122 Locations of T-DNA and transposable element insertion sites identified via alignment to the genome of appropriate flanking sequence tags (FST) Insertion Sites 1 \N
907 Protein domains and motifs from the TIGRFAM database. TIGRFAM 1 {"type": "domain"}
905 Transmembrane helices predicted by TMHMM. Transmembrane helices 1 {"type": "feature"}
636 Tandem Repeats Finder locates adjacent copies of a pattern of nucleotides. Tandem repeats (TRF) 1 \N
634 tRNA models predicted with tRNAscan-SE (release 1.3.1). tRNA Models 1 {"type": "rna"}
643 RNA genes produced by filtering predictions from tRNAscan-SE v1.23. RNA genes 1 {"label_key": "[biotype]", "caption": "Genes", "colour_key": "[biotype]", "default": {"cytoview": "gene_label", "MultiBottom": "collapsed_label", "alignsliceviewbottom": "as_collapsed_label", "MultiTop": "gene_label", "contigviewbottom": "transcript_label", "contigviewtop": "gene_label"}, "name": "Genes", "key": "ensembl"}
667 UniParc mapping based on sequence checksums UniParc cross-reference 0 \N
164 Sequences from various databases are matched to Ensembl transcripts using Exonerate. These are external references, or 'Xrefs'. DNA match 0 \N
165 match Protein 0 \N
668 UniProt cross-reference derived transitively from a UniParc identifier UniParc-derived cross-reference 0 \N
670 Cross-reference derived transitively from a UniProt record UniProt-derived cross-reference 0 \N
673 Cross references to RefSeq nucleotide sequences, determined by alignment against the transcriptome with blastx. RefSeq transcripts 0 \N
672 Cross references to RefSeq peptide sequences, determined by alignment against the proteome with blastp. RefSeq peptides 0 \N
671 Cross references to UniProt Swiss-Prot (reviewed) proteins, determined by alignment against the proteome with blastp. UniProt reviewed proteins 0 \N
674 Cross references to UniProt TrEMBL (unreviewed) proteins, determined by alignment against the proteome with blastp. UniProt unreviewed proteins 0 \N