Some notes on Gramene's changes to Ensembl to use Oracle. All pathnames are relative to gramene_ensembl What we did was not so much change Ensembl to work with Oracle, but as much as possible, make it work without change, so that upgrading to newer version of Ensembl would be easier, at the cost of some efficiency. Right now we're still using the 1.0.0 we started with, but should be up-to-date later this spring. genbank-fetch/create.sql ======================================================================== Our db schema We prefixed column names which are oracle reserved words, like 'sequence', with "ORCL_". ------------------------------------------------------------------------ Some additions to the sequence database schema are made by scripts in load-scripts. This directory also contains the schemas for the comparative map and ontology databases. ======================================================================== The general part of the adaptation is Bio::EnsEMBL::DBSQL::DBAdaptor uses Bio::EnsEMBL::DBSQL::SQL.pm uses Bio::EnsEMBL::DBSQL::SQL::StatementHandle uses Bio::EnsEMBL::DBSQL::SQL::Orowhash Oracle returns all column names as upper case, and the queries usually have them lower case--a problem for fetchrow_hashref(). I added some NAME_lc's ( $sth->fetchrow_hashref('NAME_lc') ) before realizing this wasn't sufficient. (There are queries mentioning columns like "xrefId". So I gave up fixing the Ensembl code and did the Orowhashref thing.) So these NAME_lc's can be ignored. ensembl/modules/Bio/EnsEMBL/DBSQL/DBAdaptor.pm ======================================================================== Oracle-style DBI connect and also save more connect parameters for the map database (I think this works, but we're using our own map db now). Adds some tracing -- so you can see better which queries don't work I'm not sure if the AUTOLOAD I added is really necessary. You'll want to change other Adaptors you use similarly. ======================================================================== ensembl/modules/Bio/EnsEMBL/DBSQL/SQL.pm ======================================================================== Tries to fix up queries prepare returns our own StatementHandle type so the results can be munged. Ofixsql handles reserved words, single vs. double quotes, IF (we're lucky they can all be made DECODE's), DATES, etc. The -- stuff is necessary because of queries build with "Where field1>field2-$x" and $x is negative ======================================================================== ensembl/modules/Bio/EnsEMBL/DBSQL/SQL/StatementHandle.pm ======================================================================== Just makes fetchrow_hasref return our funny kind of hash. ======================================================================== ensembl/modules/Bio/EnsEMBL/DBSQL/SQL/Orowhash.pm ======================================================================== For select_row_hashref. Makes it case insensitive, Also handles the column names we changed by adding "ORCL_" ======================================================================== conf/SiteDefs.pm ======================================================================== Relevant variables: $ENSEMBL_DRIVER = 'Oracle'; # ensembl database driver $ENSEMBL_HOST = 'cheese'; # ensembl database server $ENSEMBL_HOST_PORT = '1521'; # ensembl database port (Oracle) $ENSEMBL_DB = 'gramene'; # ensembl database name $ENSEMBL_HELP = 'gramene'; # help database name $ENSEMBL_HELP_HOST = 'cheese'; # help database name $ENSEMBL_HELP_PORT = '1521'; # help database port $ENSEMBL_HELP_DBUSER = 'riceseq'; # help database name $ENSEMBL_HELP_DBPASS = 'passwd'; # help database name $ENSEMBL_DBUSER = 'riceseq'; # database username $ENSEMBL_DBPASS = 'passwd'; # database password (etc. for other databases SNP, FAMILY, etc. if you have them) $ENSEMBL_ORACLE_HOME = '/oracle1/app/oracle/product/9.0.1.1'; $ENSEMBL_ORACLE_BASE = '/oracle1/app/oracle'; $ENSEMBL_TWO_TASK = 'T:localhost:gramene'; $ENSEMBL_TNS_ADMIN = '/oracle1/app/oracle/product/9.0.1.1/network/admin'; $ENSEMBL_ORACLE_SID = 'gramene'; ======================================================================== conf/httpd.conf ======================================================================== These variables are put into the environment of each mod_perl process: $ENSEMBL_ORACLE_HOME $ENSEMBL_ORACLE_BASE $ENSEMBL_TWO_TASK $ENSEMBL_TNS_ADMIN ======================================================================== perl/EnsWeb.pm ======================================================================== get_locator() mentions everything Oracle wants. ======================================================================== ensembl/modules/Bio/EnsEMBL/DBSQL/RawContig.pm ======================================================================== get_all_SimilarityFeatures_above_score: Oracle doesn't like "not in" with >1000 entries, so I build up a "not in" clause for the first 900 features already processed and remember the rest in a hash. You might have this problem elsewhere. ======================================================================== ensembl/modules/Bio/EnsEMBL/DBSQL/DBPrimarySeq.pm ======================================================================== Since varchar2 is too short for the sequence dna table, we use CLOB. So we can't do SUBSTR (etc.) in SQL (Why can't Oracle throw in a hack for this??) For convenience we add a column for the length. You could do better by using the DBMS_LOB PL/SQL package, but we haven't gotten around to it. ======================================================================== helpview: ======================================================================== The text is often too long to be a varchar2. So each record is split into several, distinguished by a "piece" number. kw and title are repeated, so it's not normalized, but makes for quick adaption of the code. helpview/mkhelpview.sql is the new schema helpview/prep_table.pl change the helpview table as given by EnsEMBL into a format that's easier to edit helpview/gramene_helpview.txt is our help in this format helpview/load_helpview.pl loads this file into the database, splitting each entry that needs to be. ( EnsemblLocator is defined in CSHL::Config as use constant EnsemblLocator => 'Bio::EnsEMBL::DBSQL::DBAdaptor/host=cheese.cshl.org;port=1521;dbname=gramene;user=riceseq;pass=passwd;perlonlyfeatures=0;mapdbname=;driver=Oracle;debug=0'; ) perl/HelpView.pm is modified to fetch all the pieces and put them together. ======================================================================== perl/unisearch ======================================================================== Database connection fixed-up ( added %dbh to avoid Apache::Registry "will not stay shared" errors(this has nothing to do with Oracle)) Fixed CONCAT(A,B,C) to CONCAT(CONCAT(A,B),C) in queries Got rid of the offset & limit stuff. ======================================================================== doc/EnsEMBL_Oracle.txt contains some old notes