Machine Learning Systems
Selected Current Projects
HiiHAT: Automated Hyperspectral Image Summary

Hyperspectral imagery has provided dramatic new insight into the geology and atmosphere of other planets. However, understanding these images can be quite challenging since scientists can only visualize a small number of bands at a time. Many discoveries come months or years after the data's initial release, and many key mineralogical signatures may still lie undiscovered in the ever-growing archive of hyperspectral planetary data.

We are developing algorithms for efficient automated summary and analysis of hyperspectral scenes. These are aimed primarily at the challenges of planetary science datasets, such as high noise levels, uncertain and subtle mineralogical constituents, and the need for fast processing turnaround during tactical mission planning. We have incorporated the techniques into the Hyperspectral Image Interpretation and Holistic Analysis Tools (HiiHAT), an intelligent assistant to help mission operators efficiently browse, summarize, and search hyperspectral scenes.

Detecting Transient Surface Features with Dynamic Landmarking

We have developed methods to dynamically and autonomously detect transient surface features, such as dust devil tracks or dark slope streaks on Mars, from images. Most prior work on this subject has relied on manual examination of image pairs. Exciting discoveries of new surface features such as gullies and impact craters have been made, usually serendipitously. How many more such features remain undiscovered in the massive volume of images being collected and returned?

We have developed an automated approach to this problem that computes the "salience" of each pixel in an image, with respect to its neighbors, and uses this information to outline regions of high salience (landmarks). By matching landmarks between images of the same region taken at different times, we can automatically detect and catalog salient changes.
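The idea of per-pixel salience and change detection can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: salience is taken to be the absolute difference between a pixel and the mean of its neighbors, a fixed threshold marks landmark pixels, and the function names are hypothetical. It also compares individual pixels rather than outlined regions, which is a simplification of the landmark matching described above.

```python
def salience_map(image, radius=1):
    """Assumed salience: |pixel value - mean of its neighbors in a square window|."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            neighbors = [
                image[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
                if (rr, cc) != (r, c)
            ]
            if neighbors:
                out[r][c] = abs(image[r][c] - sum(neighbors) / len(neighbors))
    return out


def landmark_pixels(image, threshold, radius=1):
    """Coordinates of pixels whose salience exceeds the (assumed) fixed threshold."""
    sal = salience_map(image, radius)
    return {(r, c) for r, row in enumerate(sal) for c, s in enumerate(row) if s > threshold}


def changed_landmarks(before, after, threshold, radius=1):
    """Landmark pixels present in the later image but absent from the earlier one."""
    return landmark_pixels(after, threshold, radius) - landmark_pixels(before, threshold, radius)


# Two co-registered toy images of the same region; a bright feature appears in the second.
before = [[0] * 5 for _ in range(5)]
after = [row[:] for row in before]
after[2][2] = 10
new_features = changed_landmarks(before, after, threshold=5)
```

In practice the two images would first need to be co-registered, and matching would tolerate differences in illumination and viewing geometry; the sketch ignores both.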

IMBUE: Interactive Machine Learning for Big Data Understanding and Explanation

We seek to develop innovative software solutions to big data problems. Our problem scope includes challenges associated with overwhelming data volumes in streaming applications, massive data archives, and in-situ operations with limited communication bandwidth. The unifying theme is the use of machine learning to elicit, model, and incorporate investigator preferences into fast, automated analysis.

We have applied these methods to data sets from Earth science (AVIRIS, MISR), optical astronomy (PTF), radio astronomy (VLBA), exoplanet studies (Kepler), and more.

Onboard Autonomous Science Investigation System

Rover traverse distances are increasing faster than downlink capacity. As this trend continues, the quantity of data that can be returned to Earth per meter traversed is reduced. The capacity of the rover to collect data, however, remains high. This circumstance creates an opportunity to increase mission science return by carefully selecting the data with the highest science interest for downlink. We have developed an onboard science analysis technology for increasing science return from missions. Our technology evaluates the geologic data gathered by the rover. This analysis is used to prioritize the data for transmission, so that the data with the highest science value is transmitted to Earth. In addition, the onboard analysis results are used to identify science opportunities. A planning and scheduling component of the system enables the rover to take advantage of the identified science opportunities.
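A minimal Python sketch of downlink prioritization under a fixed bandwidth budget follows. The `Product` record, its `science_score` field, and the greedy selection rule are hypothetical choices made for illustration; they are not the scoring or planning method used by the onboard system.

```python
from dataclasses import dataclass


@dataclass
class Product:
    """A hypothetical data product with a size and an assumed science score."""
    name: str
    size_bits: int
    science_score: float


def select_for_downlink(products, budget_bits):
    """Greedy selection: take products in descending score order while they fit."""
    ranked = sorted(products, key=lambda p: p.science_score, reverse=True)
    chosen, used = [], 0
    for product in ranked:
        if used + product.size_bits <= budget_bits:
            chosen.append(product)
            used += product.size_bits
    return chosen


products = [
    Product("image_A", 100, 0.2),
    Product("image_B", 300, 0.9),
    Product("image_C", 250, 0.6),
    Product("image_D", 100, 0.5),
]
selected = [p.name for p in select_for_downlink(products, budget_bits=400)]
```

With a 400-bit budget, the highest-scoring product is taken first, the next one is skipped because it no longer fits, and a smaller product fills the remaining capacity. A greedy rule like this is simple but not guaranteed to maximize total score.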

Past Projects
Adaptive Data Processing for Next-Generation Radio Arrays

We are using machine learning methods to enable large-scale radio astronomy data analyses in real time. Specifically, we focus on the detection and characterization of time-varying sources, one of the major scientific goals of current and planned radio arrays (such as the Square Kilometer Array). Detection and characterization of fast transients is expected to open up a previously unexplored area of astronomy, but it presents a daunting challenge in terms of recording and processing technologies. We are designing on-line, adaptive, cost-sensitive algorithms that allocate computational and storage resources "on the fly" to detect the most promising data using a multi-tier approach to trigger on fast transients.
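The multi-tier idea can be sketched in Python as a cheap first test applied to every data chunk, followed by a costlier test applied only to the chunks that pass. The two tests, their thresholds, and all function names are assumptions chosen for illustration; the sketch uses fixed thresholds and does not attempt the adaptive, cost-sensitive allocation described above.

```python
def tier1_cheap(chunk, threshold):
    """Assumed cheap test: does any sample exceed a fixed amplitude threshold?"""
    return max(chunk) > threshold


def tier2_costly(chunk, snr_threshold):
    """Assumed costlier test: peak deviation from the mean, in standard deviations."""
    n = len(chunk)
    mean = sum(chunk) / n
    std = (sum((x - mean) ** 2 for x in chunk) / n) ** 0.5
    if std == 0:
        return False
    return (max(chunk) - mean) / std > snr_threshold


def multi_tier_trigger(chunks, amplitude_threshold, snr_threshold):
    """Return indices of chunks that pass both tiers and would be kept for analysis."""
    kept = []
    for i, chunk in enumerate(chunks):
        # The costly test only runs when the cheap one fires (short-circuit `and`).
        if tier1_cheap(chunk, amplitude_threshold) and tier2_costly(chunk, snr_threshold):
            kept.append(i)
    return kept


chunks = [
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],         # quiet: rejected by tier 1
    [1, 1, 1, 1, 20, 1, 1, 1, 1, 1],        # isolated spike: passes both tiers
    [9, 10, 9, 10, 9, 10, 9, 10, 9, 10],    # uniformly bright: rejected by tier 2
]
triggered = multi_tier_trigger(chunks, amplitude_threshold=5, snr_threshold=2.5)
```

The design point is that the expensive computation is spent only on data the inexpensive tier has already flagged as promising.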

Cellerator

Cellerator is a Mathematica® package designed to facilitate biological modeling via automated equation generation. Cellerator was designed with the intent of simulating at least the following essential biological processes:

  1. signal transduction networks (STNs);
  2. cells that are represented by interacting signal transduction networks; and
  3. multi-cellular tissues that are represented by interacting networks of cells that may themselves contain internal STNs.

These processes combine to form an obvious hierarchy that can be further subdivided for notational simplicity (e.g., STNs as elements of STNs, and so forth). In the past it has been necessary to manually translate chemical networks from cartoon diagrams to chemical equations and thence to ordinary differential equations. This process is tedious, highly error-prone, and impractical for all but the simplest of systems because of the combinatorial increase in the number of equations with the number of chemical species. Cellerator provides a framework for generating, translating, and numerically solving a potentially unlimited number of biochemical interactions.
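To make the translation from chemical equations to ordinary differential equations concrete, here is a minimal Python sketch based on the law of mass action. It is not Cellerator's Mathematica interface: the reaction encoding, the function names, and the forward Euler integrator are all assumptions chosen to keep the example short.

```python
def mass_action_rhs(reactions, species):
    """Build d(conc)/dt from reactions given as (reactants, products, rate_constant).

    Assumes mass-action kinetics: each reaction proceeds at
    rate_constant * product of its reactant concentrations.
    """
    index = {name: i for i, name in enumerate(species)}

    def rhs(conc):
        derivs = [0.0] * len(species)
        for reactants, products, k in reactions:
            rate = k
            for name in reactants:
                rate *= conc[index[name]]
            for name in reactants:
                derivs[index[name]] -= rate   # reactants are consumed
            for name in products:
                derivs[index[name]] += rate   # products are formed
        return derivs

    return rhs


def euler(rhs, conc, dt, steps):
    """Integrate with fixed-step forward Euler (for illustration only)."""
    conc = list(conc)
    for _ in range(steps):
        conc = [x + dt * dx for x, dx in zip(conc, rhs(conc))]
    return conc


# A single binding reaction A + B -> C with an assumed rate constant of 2.0.
species = ["A", "B", "C"]
reactions = [(["A", "B"], ["C"], 2.0)]
rhs = mass_action_rhs(reactions, species)
initial = [1.0, 0.5, 0.0]
final = euler(rhs, initial, dt=0.001, steps=1000)
```

Adding a reaction to the list extends the generated equations automatically, which is the kind of bookkeeping that becomes impractical by hand as the number of species grows.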

Collaborative Learning for Sensor Networks

Imagine a machine learning agent deployed at each station in a sensor network, so that it can analyze incoming data and determine when something interesting happens. Traditionally, this analysis would be done independently at each station. But what if each agent could talk to its neighbors and find out what they're seeing? We've developed a learning system that enables collaboration so that the agents can autonomously (without human input) improve their performance. Each agent can ask its neighbors for their opinions, then use them to refine its own results. We've evaluated this approach to learning for both classification and clustering. Our target application domain involves the analysis of seismic and infrasonic data collected by the Mount Erebus Volcano Observatory to better understand different types of volcanic activity.
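One simple way an agent might fold its neighbors' opinions into its own classification is sketched below in Python. The blending rule, the `trust` weight, and the class labels are hypothetical; the sketch illustrates the general idea of opinion sharing, not the learning system described above.

```python
def refine_with_neighbors(own_probs, neighbor_probs, trust=0.5):
    """Blend an agent's class probabilities with the mean of its neighbors' opinions.

    `trust` (an assumed parameter) is the weight given to the neighbors, in [0, 1].
    """
    if not neighbor_probs:
        return dict(own_probs)
    refined = {}
    for label, own in own_probs.items():
        neighbor_mean = sum(p.get(label, 0.0) for p in neighbor_probs) / len(neighbor_probs)
        refined[label] = (1 - trust) * own + trust * neighbor_mean
    total = sum(refined.values())
    if total == 0:
        return dict(own_probs)
    # Renormalize so the refined opinions sum to one.
    return {label: value / total for label, value in refined.items()}


# A station that leans toward "noise" asks two neighbors that both see an eruption.
own = {"eruption": 0.4, "noise": 0.6}
neighbors = [
    {"eruption": 0.9, "noise": 0.1},
    {"eruption": 0.8, "noise": 0.2},
]
refined = refine_with_neighbors(own, neighbors, trust=0.5)
decision = max(refined, key=refined.get)
```

Here the agent's initial opinion favors "noise", but after consulting its neighbors the refined opinion favors "eruption".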

Heterogeneous Agricultural Research Via Interactive, Scalable Technology

We developed and demonstrated a machine learning analysis toolkit that uses Support Vector Machines, clustering, and regression models to identify the connections between weather and agriculture (e.g., crop yield). We integrated data from orbiting satellites and weather stations on the ground with historical crop yield archives.

Bioinformatics Support for the Functional Annotation of Human Chromosome 19

The MLS Group developed a variety of bioinformatics tools to support Caltech and LLNL biologists' efforts to produce a complete functional annotation of genes and regulatory sequences in the unusually gene-dense human chromosome HSA19. The tools include automated image analysis software that enables the high-throughput interpretation of tissue arrays, and systems for integrated analysis of diverse sequence annotation and transcript expression data.

MISR Automated Cloud Classification

The Multi-angle Imaging SpectroRadiometer (MISR) instrument captures images of the Earth at moderate resolution (275 m or 1.1 km) from nine different angles, ranging from straight down to 70 degrees in either direction. By comparing images of the same area of the Earth from different angles, scientists are able to identify thin clouds and determine approximate cloud heights with unprecedented accuracy, leading to greater understanding of the planet's global distribution of clouds and how it affects the global climate. Automating the process of detecting clouds and distinguishing between different types of clouds and aerosols remains a challenge. We applied machine learning technology to this problem to complement the physics-based algorithms currently being used by scientists.