...

  1. Collection and/or integration of multi-disciplinary datasets into existing open-source community databases

Overarching Objectives

The key top level objectives of TP3 are:

‘Increase accessibility to existing open-source datasets’: Provide a centralized location through which datasets relevant to a wide range of earthquake resilience researchers can be accessed (this does not imply that all datasets will be hosted by QuakeCoRE; often there may be links to third-party data), and facilitate access through interfaces/APIs that allow subsets of complex, multi-faceted datasets to be downloaded in different formats.

‘Focus on transformative datasets’: Data is critical for all aspects of QuakeCoRE research, and is already collected and archived by groups of researchers. Open-source ‘community’ datasets should be those that require centralized hosting (and possibly curation) because they are mission-critical to achieving strategic QuakeCoRE goals.

Principles

The underlying principles for TP3 to attain the overarching objectives are:

Centralized vs decentralized: It is not efficient for QuakeCoRE to host/curate all datasets that exist. There is a natural progression in which every dataset starts as the focus of a subset of researchers who have a specific interest in the data and use it in a decentralized manner (relative to QuakeCoRE as a community of researchers). As the widespread value of some specific datasets becomes apparent (because of their size, comprehensiveness, or multidisciplinary nature), it makes sense to take a more coordinated approach and enable access to them in a centralized manner. Furthermore, to handle datasets that are both dynamic (i.e. continually growing at a rapid pace) and multi-faceted, it may be necessary to have interfaces/APIs that enable users to interact with the dataset before identifying a specific subset of the data to download and manipulate remotely. Thus datasets can be seen to sit in one of three categories: (1) decentralized, where QuakeCoRE does nothing; (2) centralized, where QuakeCoRE either hosts the data in a ‘raw’ format or provides a link to a third-party location; (3) centralized data with appropriate interfaces/APIs to enable interrogation of the dataset prior to download to a remote location.

Open Source: The data must be open source to maintain flexibility, enhance collaboration, and avoid excessive dependence on individual organisations. This also recognises that NZ researchers represent a small portion of the global human resource in this field, and that the use of open-source datasets enables greater leverage of international initiatives.

Flexible: Any curation and hosting that QuakeCoRE provides for external datasets must be flexible, so that the datasets can be used in a multitude of ways by researchers from different disciplines.

Recognise QuakeCoRE’s position in the data ecosystem: Data curation and hosting are extremely expensive, and several research entities that we have interacted with (PEER/SCEC) have said that they have avoided this topic because it is a black hole for resources. We must be extremely careful not to fall into this trap. Staying true to the centralized vs decentralized mantra is one important concept, as is the idea that, in order for QuakeCoRE to continue to innovate, we must continually shift resources onto new datasets. In most cases there should be an emphasis on transferring QuakeCoRE community datasets to external third parties that can provide a long-term ‘home’ for them (particularly if curation of the dataset costs more than simply server space).

Personnel

The QuakeCoRE (and aligned) staff for Technology Platform 3 (some of whom are also involved in TP4) are:

Sung Bae, IT Architect

Sharmila Savarimuthu, Software Engineer 

Key performance indicators:

...