processing PhylOData (pPOD):
Core Database Technologies to Enable the Integration of AToL Information
A joint project of the University of Pennsylvania, the University of California, Davis, and Yale University.
pPOD community meeting, 11-12 September 2007 (at NESCent, Durham, North Carolina)
Complete information
The researchers working in NSF's AToL (Assembling the Tree of Life)
program http://atol.sdsc.edu aim to reconstruct the
evolutionary origins of all living things. Each of the program's 30+
projects generates and consumes large volumes of data.
pPOD is an NSF-funded collaborative project
(IIS 0629846 + IIS 0630033 + IIS 0629702) dedicated to the
development of tools for the integration of AToL data across projects
and for the interoperability of AToL data within analysis pipelines.
AToL Projects' Data
The AToL projects include studies of bacteria, microbial eukaryotes,
vertebrates, flowering plants and many more. The data being generated
by these projects include:
- Genotypic descriptions and their provenance;
- Phenotypic descriptions and their provenance;
- Specimens and their provenance including collection information, voucher deposition, etc.;
- Interpretation of the primary measurements including homology;
- Estimates of phylogenies, and information about the methods employed;
- Supertree construction, and information about the methods employed; and
- Post-tree analyses such as character evolution hypotheses.
While the data collection, storage, and dissemination within each AToL
project are well coordinated, there is a critical need to develop the
infrastructure to integrate all AToL data sources together with
other valuable resources such as publication archival databases,
morphological character databases, phylogenomics databases, etc. Such
integration will allow a project to share some of its data with the
community (export), as well as to benefit from retrieving useful
information from the rest of the community (import).
Core Technologies
We plan to develop and provide a reference implementation
for a core set of technologies that will enable interoperability,
i.e., both data and tool integration, following a three-pronged approach:
- Develop an extensible core data model for phylogenetic data.
The model will include a query language as well as extensible data structures and will benefit from research on efficiently querying phylogenetic data.
- Develop schema mappings for peer-to-peer data integration and exchange, where a project can join existing integration groups by providing mappings between the schema of their data and the core data model or one of its extensions.
- Develop a scientific workflow system (lab notebook) that will allow research groups to put together the data integration components with the local database access components and with the analysis tools.
This system will provide strong support for systematics-oriented provenance management, in anticipation of provenance becoming increasingly useful in future tools.
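To make the three prongs above concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the pPOD design: the names `Node`, `mrca`, `to_core`, `PROJECT_TO_CORE`, and `Workflow`, and the local field names `taxon_name` and `voucher_id`, are all hypothetical. It shows an extensible tree structure with one simple query, a field-level mapping from a project's local schema into that structure, and a workflow runner that records provenance for each step.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Node:
    """One node of a rooted tree; `extra` holds project-specific extensions."""
    label: Optional[str] = None
    children: List["Node"] = field(default_factory=list)
    extra: Dict[str, str] = field(default_factory=dict)

    def leaves(self) -> List[str]:
        """Return the leaf labels below this node, left to right."""
        if not self.children:
            return [self.label] if self.label is not None else []
        return [name for child in self.children for name in child.leaves()]


def mrca(root: Node, taxa: set) -> Optional[Node]:
    """Query: the smallest clade containing all of `taxa`, or None if absent."""
    if not taxa <= set(root.leaves()):
        return None
    for child in root.children:
        found = mrca(child, taxa)
        if found is not None:
            return found
    return root


# Hypothetical mapping from one project's local field names to core names.
PROJECT_TO_CORE = {"taxon_name": "label", "voucher_id": "voucher"}


def to_core(record: Dict[str, str], mapping: Dict[str, str]) -> Node:
    """Translate a local record into a core-model leaf.

    Fields without a mapping keep their local name and are stored in `extra`.
    """
    core = {mapping.get(key, key): value for key, value in record.items()}
    label = core.pop("label", None)
    return Node(label=label, extra=core)


@dataclass
class Workflow:
    """Runs named steps and logs which inputs produced which output."""
    log: List[Dict[str, str]] = field(default_factory=list)

    def run(self, step: str, func: Callable, *inputs):
        output = func(*inputs)
        self.log.append(
            {"step": step, "inputs": repr(inputs), "output": repr(output)}
        )
        return output


# Usage: import a record, build a small tree, then query it with provenance.
wf = Workflow()
leaf_a = wf.run("import", to_core,
                {"taxon_name": "A", "voucher_id": "V1"}, PROJECT_TO_CORE)
tree = Node(children=[Node(children=[leaf_a, Node("B")]), Node("C")])
clade = wf.run("mrca", mrca, tree, {"A", "B"})
```

After this runs, `wf.log` holds one entry per step, so a later reader can see that the `mrca` result was derived from a tree containing the imported record. A real system would need a richer query language and persistent provenance storage; the open-ended `extra` dictionary is only one simple way to leave room for extensions.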
Personnel
Penn:
Susan Davidson, Zack Ives, Val Tannen (coord. PI),
Sam Donnelly http://db.cis.upenn.edu
Junhyong Kim http://www.bio.upenn.edu/faculty/kim
UC Davis:
Shawn Bowers, Bertram Ludaescher (PI), and Tim McPhillips http://daks.ucdavis.edu/~ludaesch
Yale:
Reed Beaman (PI), Bill Piel http://www.yale.edu/peabody/databases/inform , http://treebase.peabody.yale.edu/treebase
Consultants:
Peter Buneman (U. Edinburgh) http://www.dcc.ac.uk/about/directory
Michael Donoghue (Yale) http://www.phylodiversity.net/donoghue
Jim Leebens-Mack (U Georgia) http://www.plantbio.uga.edu/~jleebensmack/JLMmain.html
Francois Lutzoni (Duke) http://www.lutzonilab.net
David Maddison (U Arizona) http://david.bembidion.org
Wayne Maddison (U British Columbia) http://salticidae.org/wpm
Brent Mishler (UC Berkeley) http://ucjeps.berkeley.edu/bryolab
Bernard Moret (EPF Lausanne) http://lcbb.epfl.ch
Rod Page (U Glasgow) http://taxonomy.zoology.gla.ac.uk/rod/rod.html
Mike Sanderson (U Arizona) http://ginger.ucdavis.edu
Todd Vision (U North Carolina and NESCent) http://visionlab.bio.unc.edu http://www.nescent.org/about/leadership.php
Share your experience
The ultimate aim of the project is to produce easy-to-use
tools. We plan to draw on our combined experience in distributed database
integration and workflow systems, as well as the practical experience of
the AToL informatics and related communities. The project is
collecting suggestions, experience and, eventually, use cases from the
community. If you would like to help, please post on the wiki at:
Contribute