pPOD

processing PhylOData (pPOD):

Core Database Technologies to Enable the Integration of AToL Information

A joint project of University of Pennsylvania, University of California, Davis, and Yale University.


pPOD community meeting, 11-12 September 2007 (at NESCent, Durham, North Carolina)

Complete information


Researchers in NSF's AToL (Assembling the Tree of Life) program (http://atol.sdsc.edu) aim to reconstruct the evolutionary origins of all living things. Large amounts of data are generated and consumed within each of the program's 30+ projects.

pPOD is an NSF-funded collaborative project (IIS 0629846, IIS 0630033, and IIS 0629702) dedicated to developing tools for integrating AToL data across projects and for making AToL data interoperable within analysis pipelines.

AToL Projects' Data

The AToL projects include studies of bacteria, microbial eukaryotes, vertebrates, flowering plants, and many other groups. The data generated by these projects include:

  1. Genotypic descriptions and their provenance;
  2. Phenotypic descriptions and their provenance;
  3. Specimens and their provenance, including collection information, voucher deposition, etc.;
  4. Interpretations of the primary measurements, including homology;
  5. Estimates of phylogenies, and information about the methods employed;
  6. Supertree construction, and information about the methods employed; and
  7. Post-tree analyses such as character evolution hypotheses.

While data collection, storage, and dissemination are well coordinated within each AToL project, there is a critical need for infrastructure that integrates all AToL data sources with other valuable resources such as publication archives, morphological character databases, and phylogenomics databases. Such integration will allow a project both to share some of its data with the community (export) and to retrieve useful information from the rest of the community (import).

Core Technologies

We plan to develop, and provide a reference implementation for, a core set of technologies that will enable interoperability (i.e., both data and tool integration), following a three-pronged approach:

  1. Develop an extensible core data model for phylogenetic data.
    The model will include a query language as well as extensible data structures and will benefit from research on efficiently querying phylogenetic data.
  2. Develop schema mappings for peer-to-peer data integration and exchange, so that a project can join existing integration groups by providing mappings between the schema of its data and the core data model or one of its extensions.
  3. Develop a scientific workflow system (lab notebook) that will allow research groups to put together the data integration components with the local database access components and with the analysis tools.
    This system will provide strong support for systematics-oriented provenance management in anticipation of the increase in utility of provenance in future tools.

Personnel

Penn:
Susan Davidson, Zack Ives, Val Tannen (coordinating PI), Sam Donnelly http://db.cis.upenn.edu
Junhyong Kim http://www.bio.upenn.edu/faculty/kim
UC Davis:
Shawn Bowers, Bertram Ludaescher (PI), and Tim McPhillips http://daks.ucdavis.edu/~ludaesch
Yale:
Reed Beaman (PI), Bill Piel http://www.yale.edu/peabody/databases/inform, http://treebase.peabody.yale.edu/treebase
Consultants:
Peter Buneman (U. Edinburgh) http://www.dcc.ac.uk/about/directory
Sarah Cohen-Boulakia (Université Paris-Sud, at Penn at the inception of the project) http://www.lri.fr/~cohen
Michael Donoghue (Yale) http://www.phylodiversity.net/donoghue
Jim Leebens-Mack (U Georgia) http://www.plantbio.uga.edu/~jleebensmack/JLMmain.html
Francois Lutzoni (Duke) http://www.lutzonilab.net
David Maddison (U Arizona) http://david.bembidion.org
Wayne Maddison (U British Columbia) http://salticidae.org/wpm
Brent Mishler (UC Berkeley) http://ucjeps.berkeley.edu/bryolab
Bernard Moret (EPF Lausanne) http://lcbb.epfl.ch
Rod Page (U Glasgow) http://taxonomy.zoology.gla.ac.uk/rod/rod.html
Mike Sanderson (U Arizona) http://ginger.ucdavis.edu
Todd Vision (U North Carolina and NESCent) http://visionlab.bio.unc.edu, http://www.nescent.org/about/leadership.php

pPOD on Google Code

This is where the core data model part of the pPOD project is being hosted. You'll find source code, javadocs, test coverage reports, bug tracking, and all new publicly available documentation.

Share your experience

The ultimate justification of the project is to produce easy-to-use tools. We plan to leverage our combined experience in distributed database integration and workflow systems, as well as the practical experience of the AToL informatics and related communities. The project is collecting suggestions, experience, and, eventually, use cases from the community. If you are moved to help, please post on the Contribute page of this wiki.

Page last modified on June 08, 2010, at 01:59 PM