Data Intake and Discovery

The primary objective of the Data Intake and Discovery (DID) workflow is to provide a flexible environment in which individual users can manage and discover hurricane field data, one that integrates readily with other major repositories while also curating the data generated by the RRA. The diverse data and event types associated with hurricanes make it particularly difficult to keep the stored data robustly queryable. This challenge was compounded by the developers' decision not to impose, a priori, a rigid standard that supports only a limited number of data fields. Instead, to create an incentive for information sharing within the community, a large database of attributes was assembled from field reconnaissance instruments developed by the North Carolina Sea Grant, US Army Corps of Engineers, Louisiana State University, Texas Tech University, and the University of Notre Dame. When supplying data to the Data Warehouse, users select subsets of these attributes and map them to the fields in their personal databases, rather than completely reformatting those databases to match a rigid standard.
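The attribute-mapping idea can be illustrated with a minimal Python sketch. Everything named here is hypothetical and chosen only for illustration: the attribute names, the `WAREHOUSE_ATTRIBUTES` excerpt, and the `map_record` helper are assumptions, not the Data Warehouse's actual schema or API.

```python
from typing import Any

# Hypothetical excerpt of the shared attribute database: attribute -> category.
WAREHOUSE_ATTRIBUTES = {
    "roof_cover_type": "Structural Details",
    "number_of_stories": "Basic Structural Information",
    "roof_damage_percent": "Damage",
    "peak_gust_mph": "Hazard Characterization",
}


def map_record(record: dict[str, Any], field_map: dict[str, str]) -> dict[str, Any]:
    """Translate one row of a user's database into warehouse attribute names.

    field_map maps the user's own column names to warehouse attributes.
    Columns the user chose not to map are simply left out of the result.
    """
    unknown = sorted(set(field_map.values()) - set(WAREHOUSE_ATTRIBUTES))
    if unknown:
        raise ValueError(f"not in the attribute database: {unknown}")
    return {
        attribute: record[column]
        for column, attribute in field_map.items()
        if column in record
    }


# The user keeps their own column names and maps only the fields they share.
user_row = {"Stories": 2, "RoofCover": "asphalt shingle", "Inspector": "A. Smith"}
user_map = {"Stories": "number_of_stories", "RoofCover": "roof_cover_type"}
mapped = map_record(user_row, user_map)
```

In this sketch the unmapped `Inspector` column is dropped, and a mapping that targets an attribute outside the shared database is rejected, so the user's original database never has to be restructured.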

Available data fields are organized into eleven major categories: Demographics, Basic Structural Information, Structural Details (I-III), Site Inventory, Hazard Characterization, and Damage (I-IV). The database was designed to allow users to attach models, data (field observations and measurements), or URLs to their reports, and back-end file naming conventions allow users to attach notes and images to groups of entries in any of the categories in the database. This flexible approach to reconnaissance data curation also enables users to create and save their own customized damage reporting forms with fields selected from the database of attributes, or to use pre-defined forms generated by other members of the community.

[Figure 7]

With no centralized repository, investigators supervise their own curation, at best publishing their databases on personal websites with varying standards and completeness and limited ability to query data, though most reconnaissance databases remain offline. Thus DID provides a rare open repository for hurricane reconnaissance, with a wide variety of supported fields and dynamic visualization and querying capabilities making this valuable data discoverable.

-- Tracy Kijewski-Correa
CyberEye Director

The DID workflow is primarily enabled by the CyberEye Data Warehouse, a PostGIS spatial database that houses both these user-supplied entries and the outputs of other modules, e.g., RRA runs. The Warehouse stores reports (field data or simulation results that include metadata and descriptions of the event/scenario) and data items (individual measurements or observations). Because everything resides in one centralized repository, users can execute robust searches over the entirety of the Warehouse, a far more manageable approach to data discovery for large stores of heterogeneous data. Searches can be executed using any combination of Standard Filters (logical operators) and Spatial Filters (bounding boxes). More importantly, the use of a RESTful architecture for the overall platform provides the flexibility to build customizable applications on mobile platforms, so that field reconnaissance teams can submit post-disaster data directly, either in real time when connectivity in the field is available or in batch mode upon returning to a base of operations.
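As a rough illustration of how Standard and Spatial Filters might be combined into a single query against a PostGIS database, consider the sketch below. It is not the Warehouse's actual query layer: the `data_items` table, the `geom` column, the attribute whitelist, and the `build_query` helper are all assumed names, and the `%s` placeholders assume a psycopg-style driver. The only PostGIS features it relies on are `ST_MakeEnvelope` and the `&&` bounding-box overlap operator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StandardFilter:
    """attribute <op> value, e.g. number_of_stories >= 2."""
    attribute: str
    op: str
    value: object


@dataclass(frozen=True)
class BoundingBox:
    """Longitude/latitude box, assumed to be in WGS 84 (SRID 4326)."""
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float


ALLOWED_OPS = {"=", "<>", "<", "<=", ">", ">="}
# Hypothetical column whitelist: identifiers cannot be bound as parameters,
# so they are checked against a fixed set instead.
ALLOWED_ATTRIBUTES = {"number_of_stories", "roof_damage_percent", "event_name"}


def build_query(filters, bbox=None, combine="AND"):
    """Return (sql, params) for a parameterized search over data_items."""
    if combine not in ("AND", "OR"):
        raise ValueError("combine must be 'AND' or 'OR'")
    clauses, params = [], []
    for flt in filters:
        if flt.attribute not in ALLOWED_ATTRIBUTES or flt.op not in ALLOWED_OPS:
            raise ValueError(f"unsupported filter: {flt}")
        clauses.append(f"{flt.attribute} {flt.op} %s")
        params.append(flt.value)
    where = f" {combine} ".join(clauses)
    if bbox is not None:
        # && keeps rows whose geometry bounding box overlaps the envelope.
        spatial = "geom && ST_MakeEnvelope(%s, %s, %s, %s, 4326)"
        where = f"({where}) AND {spatial}" if where else spatial
        params += [bbox.min_lon, bbox.min_lat, bbox.max_lon, bbox.max_lat]
    sql = "SELECT * FROM data_items"
    if where:
        sql += f" WHERE {where}"
    return sql, params


# Example: multi-story buildings with heavy roof damage inside a search box.
sql, params = build_query(
    [
        StandardFilter("number_of_stories", ">=", 2),
        StandardFilter("roof_damage_percent", ">", 50),
    ],
    bbox=BoundingBox(-90.5, 29.5, -89.5, 30.5),
)
```

The resulting `sql` string and `params` list could then be passed to a database cursor's `execute` call. Keeping values as bound parameters and restricting attribute names to a whitelist is one way to let users compose arbitrary filter combinations without exposing the database to injection.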
