Machine Learning and Instrument Autonomy Group

Principal Investigator

Jack Lightholder
Phone: (626) 710-3246

Download CODEX on GitHub

Modern science datasets from missions like OCO-2, and telemetry records in operational environments, may have 500+ simultaneous measurements at each of millions of time samples. Scientists often want to look through the record and discover not only expected trends but also ones they did not initially anticipate, while operations (Ops) personnel perform the same task under serious time pressure when an anomaly occurs. In both cases, the ideal environment for rapid exploration of large data would be one where visualizations are clear, interactive, and responsive, permitting the investigator to “play” with the data to gain rapid insight, falsify hypotheses, and make discoveries.

Machine Learning (ML) has proven invaluable in providing some of these key data insights, but doing so in a statistically robust and reliable manner requires a data science professional and a lot of custom Python code, which loses any sense of interaction and play.

CODEX will address these concerns by providing a desktop-like environment with standard scientific graph types that support rapid, powerful exploration. For example, all graphs are linked, so that selecting data on one graph reveals those same points in all others regardless of type, including histograms and heat maps. Machine Learning comes into play as well, allowing users to make high-level requests like “Show me more like this region,” “How many groups best describe all of these observations,” and “Predict this value based on these 500 other values, and show me the small set of variables that were responsible for 90% of the prediction.” All of this power normally requires considerable tuning of complex parameters in the ML algorithms; CODEX achieves this tuning visually, by showing demonstrations of what would happen for various hyperparameters and letting the user pick the desired outcome by eye.
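The linked-graph behavior described above can be illustrated with a minimal sketch. This is not CODEX's implementation: the `LinkedSelection` class, its methods, the column names `xco2` and `albedo`, and the sample values are all hypothetical, chosen only to show the idea that every graph reads one shared set of selected rows.

```python
from dataclasses import dataclass, field


@dataclass
class LinkedSelection:
    """Hypothetical shared selection state: row indices that every view reads."""
    rows: list                                   # one dict of measurements per time sample
    selected: set = field(default_factory=set)   # indices of currently selected rows

    def brush(self, column, low, high):
        """Select rows whose `column` value lies in [low, high], as if dragged on one graph."""
        self.selected = {
            i for i, row in enumerate(self.rows) if low <= row[column] <= high
        }

    def histogram(self, column, edges):
        """Return per-bin counts for all rows and for selected rows only.

        A histogram view can draw `total` as the bars and `highlighted`
        as the overlay showing where the selection falls.
        """
        n_bins = len(edges) - 1
        total = [0] * n_bins
        highlighted = [0] * n_bins
        for i, row in enumerate(self.rows):
            for b in range(n_bins):
                # Half-open bins [edges[b], edges[b + 1])
                if edges[b] <= row[column] < edges[b + 1]:
                    total[b] += 1
                    if i in self.selected:
                        highlighted[b] += 1
                    break
        return total, highlighted


# Illustrative values only, not real mission data.
rows = [
    {"xco2": 398.1, "albedo": 0.12},
    {"xco2": 401.5, "albedo": 0.31},
    {"xco2": 402.2, "albedo": 0.35},
    {"xco2": 399.0, "albedo": 0.18},
]
views = LinkedSelection(rows)
views.brush("xco2", 401.0, 403.0)                       # selection made on one graph
total, highlighted = views.histogram("albedo", [0.0, 0.2, 0.4])  # seen on another
```

Because the selection is stored as row indices rather than as a region of any particular plot, a histogram, scatter plot, or heat map of any other variable can highlight the same points without knowing which graph the selection came from.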