              GDPDM - Perl tools by Ken Youens-Clark

This distribution represents a suite of tools for working with the
GDPDM schema.  It relies heavily on many CPAN modules, so the first
thing you should do is run "perl Build.PL" to see what is missing and
then use CPAN or CPANPLUS to satisfy all the dependencies.  I wouldn't
necessarily recommend doing "./Build install" to put everything into
standard Perl library paths (and scripts into "/usr/local/bin/" or
wherever).  In fact, you can avoid having to change a few
hard-coded paths if you just put everything into "/opt/GDPDM" for now.
I will fix this in the future.

Possible Import Flow:

What I've come up with so far for importing GDPDM data isn't optimal,
but it seems to work.  I'm giving the curators an Excel
spreadsheet where each worksheet is a table and the columns are named
the same as those in the table.  E.g., there is a worksheet called
"div_taxonomy" with columns the same as in the GDPDM "div_taxonomy"
table.  
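
Since each worksheet has to mirror a table, a header check before
import can catch misspelled columns early.  The following is only a
minimal sketch of that idea, not code from this distribution:  the
"%schema" hash, the "check_worksheet" function, and the "genus" and
"species" column names are all illustrative assumptions.

```perl
use strict;
use warnings;

# Hypothetical schema excerpt: table name => expected column names.
# (Column names are placeholders, not taken from the real schema.)
my %schema = (
    div_taxonomy => [qw( div_taxonomy_id genus species )],
);

# Return a list of problems found in one worksheet's header row.
# An empty list means the header matches the table definition.
sub check_worksheet {
    my ( $sheet_name, @header ) = @_;

    my $expected = $schema{$sheet_name}
        or return ("no table named '$sheet_name'");

    my %want = map { $_ => 1 } @$expected;
    my %have = map { $_ => 1 } @header;

    my @problems;
    push @problems, "missing column '$_'" for grep { !$have{$_} } @$expected;
    push @problems, "unknown column '$_'" for grep { !$want{$_} } @header;
    return @problems;
}

my @ok  = check_worksheet( 'div_taxonomy', qw( div_taxonomy_id genus species ) );
my @bad = check_worksheet( 'div_taxonomy', qw( div_taxonomy_id genus specis ) );

print "good sheet: ", scalar(@ok), " problem(s)\n";
print "bad sheet: $_\n" for @bad;
```

In practice the expected columns would come from the schema definition
rather than being typed in by hand.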

The curator is expected to fill in the values of the spreadsheet as
they would appear in the database with one notable exception:  All of
the primary key values need only be internally consistent within the
single spreadsheet and will not be used on import (unless directed).

The idea is that a curator will work up many small spreadsheets.
E.g., to load taxonomy and passport data, the curator could create one
spreadsheet with a handful of species and use the "div_taxonomy" ID
for each in the "div_taxonomy_id" column of the "div_passport"
worksheet to indicate which species to link the passport to.  If, when
importing into the database, an identical species (taxonomy) is found,
it will not be recreated, and the existing or new "div_taxonomy_id"
will be used to link the new (or updated) passport record.  Work
remains to specify how existing records are looked up and when (and
what) to update.
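
To make the ID handling concrete, here is a minimal sketch of the
"find or create, then remap" idea.  It is not the code in
"gdpdm-import.pl":  the database is faked with an in-memory hash, the
match is assumed to be on hypothetical "genus" and "species" columns,
and "accession" is a placeholder passport column.

```perl
use strict;
use warnings;

# In-memory stand-in for the "div_taxonomy" table (assumption: a real
# import would query the database instead).
my %db_taxonomy = (
    101 => { genus => 'Zea', species => 'mays' },
);
my $next_id = 102;

# Return the database ID of a matching taxonomy, creating it if absent.
# Matching on genus + species is an assumed lookup rule.
sub find_or_create_taxonomy {
    my ($row) = @_;

    for my $id ( sort { $a <=> $b } keys %db_taxonomy ) {
        my $existing = $db_taxonomy{$id};
        return $id
            if  $existing->{genus}   eq $row->{genus}
            and $existing->{species} eq $row->{species};
    }

    my $id = $next_id++;
    $db_taxonomy{$id} = { genus => $row->{genus}, species => $row->{species} };
    return $id;
}

# Worksheet rows: the IDs here are only consistent within the spreadsheet.
my @taxonomy_sheet = (
    { div_taxonomy_id => 1, genus => 'Zea',   species => 'mays' },
    { div_taxonomy_id => 2, genus => 'Oryza', species => 'sativa' },
);
my @passport_sheet = (
    { div_passport_id => 1, div_taxonomy_id => 1, accession => 'A-001' },
    { div_passport_id => 2, div_taxonomy_id => 2, accession => 'A-002' },
);

# Map each spreadsheet-local taxonomy ID to its real database ID.
my %taxonomy_id_map;
for my $row (@taxonomy_sheet) {
    $taxonomy_id_map{ $row->{div_taxonomy_id} } = find_or_create_taxonomy($row);
}

# Rewrite the passport foreign keys using that map; the spreadsheet's
# own primary keys are dropped.
my @passports_to_load;
for my $row (@passport_sheet) {
    push @passports_to_load, {
        accession       => $row->{accession},
        div_taxonomy_id => $taxonomy_id_map{ $row->{div_taxonomy_id} },
    };
}

printf "%s -> div_taxonomy_id %d\n", $_->{accession}, $_->{div_taxonomy_id}
    for @passports_to_load;
```

Here the "Zea mays" row matches the existing record 101 and is not
recreated, while "Oryza sativa" is new and gets ID 102; each passport
row ends up pointing at the database ID rather than the spreadsheet
one.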

To go from the Excel format to something worth working with, I've
written the "excel2sexp" script (in "scripts").  Run "perldoc" on that
to find out all the details, but essentially this will convert an
Excel file to something that the "gdpdm-import.pl" script will be
happy to work with.  (While it creates an S-expression file, that is
not the only valid import format;  XML and space-indented files are
also acceptable, and I could see easily accepting YAML, as well.)

To import the data, use the "gdpdm-import.pl" script (in "scripts").
To get things working, you probably won't have to do much if you took
my earlier suggestion to untar everything into "/opt/GDPDM" as this is
the hard-coded path for the config file.  If you put it somewhere
else, you'll need to change the setting in the file
"lib/GDPDM/Config.pm".  Lastly, you'll need to set the
"conf/gdpdm.conf" values for your database.

Data Viewing Tools:

A separate project, "GDPDM-Catalyst", is a Catalyst-based web
viewing/editing tool.  For an intro to Catalyst (something of a Perl
clone of Ruby on Rails), see here:

    http://dev.catalyst.perl.org/

SQL::Translator:

FWIW, I've used my SQL::Translator module to help automate things like
the creation of the "lib/GDPDM/Config.pm" module or the Excel
spreadsheet.  These are made directly from the schema definition,
saving time and effort and ensuring correctness.

Ideas for the Future:

As an alternative to SPOPS, I'm very interested in Class::DBI (CDBI),
a very mature database persistence module on CPAN.  SPOPS gives me a
few things, notably a configuration that tells me how tables relate,
that I'm not sure I can get from CDBI, so I'm still looking.  The
upside to moving to CDBI is that I think it would be easier to hook up
to Catalyst.

To install:
 
  perl Build.PL   # optionally satisfy any missing dependencies
  ./Build
  ./Build test    # no tests right now
  ./Build install # probably as root or with sudo
