Some notes on Gramene's changes to Ensembl to use Oracle.

All pathnames are relative to gramene_ensembl

What we did was not so much change Ensembl to work with Oracle, but
as much as possible, make it work without change, so that upgrading to
newer version of Ensembl would be easier, at the cost of some efficiency.


Right now we're still using the 1.0.0 we started with, but should be
up-to-date later this spring.

genbank-fetch/create.sql
========================================================================
Our db schema
We prefixed column names which are oracle reserved words, like 'sequence',
with "ORCL_".
------------------------------------------------------------------------
Some additions to the sequence database schema are made by scripts
in load-scripts.
This directory also contains the schemas for the comparative map and
ontology databases.
========================================================================


The general part of the adaptation is 
Bio::EnsEMBL::DBSQL::DBAdaptor uses 
Bio::EnsEMBL::DBSQL::SQL.pm uses
Bio::EnsEMBL::DBSQL::SQL::StatementHandle uses
Bio::EnsEMBL::DBSQL::SQL::Orowhash

Oracle returns all column names as upper case, and the queries
usually have them lower case--a problem for fetchrow_hashref().
I added  some NAME_lc's (  $sth->fetchrow_hashref('NAME_lc')  )
before realizing this wasn't sufficient.
(There are queries mentioning columns like "xrefId".
So I gave up fixing the Ensembl code and did the Orowhashref thing.)
So these NAME_lc's can be ignored.

ensembl/modules/Bio/EnsEMBL/DBSQL/DBAdaptor.pm
========================================================================
Oracle-style DBI connect 
and also save more connect parameters for the map database
(I think this works, but we're using our own map db now).

Adds some tracing -- so you can see better which queries don't work

I'm not sure if the AUTOLOAD I added is really necessary.

You'll want to change other Adaptors you use similarly.


========================================================================

ensembl/modules/Bio/EnsEMBL/DBSQL/SQL.pm
========================================================================
Tries to fix up queries
prepare returns our own StatementHandle type so the results can be munged.

Ofixsql  handles reserved words, single vs. double quotes,  IF (we're
lucky they can all be made DECODE's), DATES, etc.
The   --  stuff is necessary because of queries build with
    "Where field1>field2-$x" and $x is negative
========================================================================

ensembl/modules/Bio/EnsEMBL/DBSQL/SQL/StatementHandle.pm
========================================================================
Just makes fetchrow_hasref return our funny kind of hash.
========================================================================

ensembl/modules/Bio/EnsEMBL/DBSQL/SQL/Orowhash.pm
========================================================================
For select_row_hashref.
Makes it case insensitive,
Also handles the column names we changed by adding "ORCL_"
========================================================================

conf/SiteDefs.pm
========================================================================
Relevant variables:
$ENSEMBL_DRIVER		= 'Oracle';	    # ensembl database driver
$ENSEMBL_HOST           = 'cheese';      # ensembl database server
$ENSEMBL_HOST_PORT	= '1521';	    # ensembl database port (Oracle)
$ENSEMBL_DB             = 'gramene';	    # ensembl database name
$ENSEMBL_HELP           = 'gramene';    # help database name
$ENSEMBL_HELP_HOST      = 'cheese';      # help database name
$ENSEMBL_HELP_PORT	= '1521';	    # help database port
$ENSEMBL_HELP_DBUSER    = 'riceseq';	    # help database name
$ENSEMBL_HELP_DBPASS    = 'passwd';	    # help database name
$ENSEMBL_DBUSER         = 'riceseq';         # database username
$ENSEMBL_DBPASS         = 'passwd';	    # database password
(etc. for other databases SNP, FAMILY, etc. if you have them)

$ENSEMBL_ORACLE_HOME    = '/oracle1/app/oracle/product/9.0.1.1';
$ENSEMBL_ORACLE_BASE    = '/oracle1/app/oracle';
$ENSEMBL_TWO_TASK       = 'T:localhost:gramene';
$ENSEMBL_TNS_ADMIN      = '/oracle1/app/oracle/product/9.0.1.1/network/admin';
$ENSEMBL_ORACLE_SID     = 'gramene';
========================================================================

conf/httpd.conf
========================================================================
These variables are put into the environment of each mod_perl process:
$ENSEMBL_ORACLE_HOME
$ENSEMBL_ORACLE_BASE
$ENSEMBL_TWO_TASK
$ENSEMBL_TNS_ADMIN
========================================================================

perl/EnsWeb.pm
========================================================================
get_locator() mentions everything Oracle wants.
========================================================================

ensembl/modules/Bio/EnsEMBL/DBSQL/RawContig.pm
========================================================================
get_all_SimilarityFeatures_above_score:
Oracle doesn't like "not in" with >1000 entries,
so I build up a "not  in" clause for the first 900 features already
processed and remember the rest in a hash.
You might have this problem elsewhere.
========================================================================


ensembl/modules/Bio/EnsEMBL/DBSQL/DBPrimarySeq.pm
========================================================================
Since varchar2 is too short for the sequence dna table, we use CLOB.
So we can't do  SUBSTR (etc.) in SQL  (Why can't Oracle throw in a hack
for this??)
For convenience we add a column for the length.
You could do better by using the DBMS_LOB PL/SQL package, but we haven't
gotten around to it.
========================================================================


helpview:
========================================================================
The text is often too long to be a varchar2. So each record is split into
several, distinguished by a "piece" number.  kw and title are repeated,
so it's not normalized, but makes for quick adaption of the code.

helpview/mkhelpview.sql is the new schema
helpview/prep_table.pl change the helpview table as given by EnsEMBL
				into a format that's easier to edit
helpview/gramene_helpview.txt is our help in  this format
helpview/load_helpview.pl loads this file into the database, splitting
					    each entry that needs to be.
( EnsemblLocator  is defined in CSHL::Config as 
    use constant EnsemblLocator =>    'Bio::EnsEMBL::DBSQL::DBAdaptor/host=cheese.cshl.org;port=1521;dbname=gramene;user=riceseq;pass=passwd;perlonlyfeatures=0;mapdbname=;driver=Oracle;debug=0';
)
perl/HelpView.pm is modified to fetch all the pieces and put them together.
========================================================================


perl/unisearch
========================================================================
Database connection fixed-up
( added %dbh to avoid Apache::Registry "will not stay shared" errors(this
					has nothing to do with Oracle))
Fixed CONCAT(A,B,C) to CONCAT(CONCAT(A,B),C) in queries
Got rid of the offset & limit stuff.
========================================================================

doc/EnsEMBL_Oracle.txt contains some old notes