We seek to develop innovative software solutions to big data problems. Our problem scope includes the challenges posed by overwhelming data volumes in streaming applications, massive data archives, and in situ operations with limited communication bandwidth. The unifying theme is the use of machine learning to elicit, model, and incorporate investigator preferences into fast, automated analysis. Innovative aspects of our solutions include:
These methods can also support data analysis and real-time decision-making in mission operations.
Several data triage systems exist for detecting rare events of high scientific interest amidst large (but not yet overwhelming) data volumes. The V-FASTR system at the VLBA is at the forefront of commensal (piggyback) real-time detection for radio astronomy [Wayth et al., 2011]. It autonomously detects transient radio events such as pulses from pulsars and other astrophysical phenomena. The Virtual Observatory performs a similar function across a network of optical telescopes, and the OGLE survey autonomously detects rare exoplanet microlensing events.
However, existing data triage systems are model-driven "one-size-fits-none" solutions. They generally derive their detection, excision, and prioritization rules exclusively from prior physical models or laboratory measurements. Such rules seldom extrapolate to new science goals and users, let alone to the wide range of observing conditions experienced by a petascale instrument. For example, observing conditions can change from one observation to the next, as in radio transient detection, where the noise environment fluctuates strongly because of local interference. Further, most instruments must support multiple scientific goals and users. Today this is achieved through manual discussion and prioritization, which are too slow to keep up with anticipated data streams.
We are working with scientists in a variety of disciplines who face big data challenges. Our collaborations include:
Similar data prioritization challenges exist for Earth orbital multi-angle data (from MISR), hyperspectral data (from AVIRIS), and exoplanet atmospheric spectra (from Spitzer).