Distributed Optimization of Machine Learning Incorporating Nested Evaluation (DOMINE)
Principal Investigator
Lukas Mandrake
Email: lukas.mandrake@jpl.nasa.gov
DOMINE is a software tool for large-scale exploration and tuning of competing machine learning technologies, including feature selection, in a robust, cross-validated, and distributed manner. Users specify their datasets, labels, and classifiers, and DOMINE efficiently diagnoses which technologies, hyperparameter settings, and feature settings perform best. By design, the system uses nested cross validation, so that results are of publication quality and decisions about model optimality are statistically robust and defensible. All of this runs on top of a client-server interface (see Figure 1) that supports large-scale, distributed computation for quicker results.
Figure 1. The architecture of the DOMINE system. Instances of the same module share the same color.
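To make the idea of nested cross validation concrete, the following is a minimal sketch in plain Python, not DOMINE's actual implementation. It assumes a toy k-nearest-neighbor classifier and synthetic two-class data; all function names (k_fold_indices, knn_predict, nested_cv) are hypothetical. The inner loop selects a hyperparameter using only the outer training data, and the outer loop scores that choice on data the selection never saw.

```python
import random
from statistics import mean

def k_fold_indices(n, k, rng):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k nearest training points (squared Euclidean)."""
    order = sorted(range(len(train_X)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    votes = [train_y[i] for i in order[:k]]
    return max(sorted(set(votes)), key=votes.count)

def accuracy(train_idx, test_idx, X, y, k):
    """Fit on train_idx, return the fraction of test_idx classified correctly."""
    tX = [X[i] for i in train_idx]
    ty = [y[i] for i in train_idx]
    hits = sum(knn_predict(tX, ty, X[i], k) == y[i] for i in test_idx)
    return hits / len(test_idx)

def nested_cv(X, y, candidates, outer_k=5, inner_k=3, seed=0):
    """Return one (selected hyperparameter, held-out accuracy) pair per outer fold."""
    rng = random.Random(seed)
    outer_folds = k_fold_indices(len(X), outer_k, rng)
    results = []
    for o, test_idx in enumerate(outer_folds):
        train_idx = [i for j, fold in enumerate(outer_folds) if j != o for i in fold]
        # Inner loop: choose the hyperparameter using outer-training data only.
        inner_folds = k_fold_indices(len(train_idx), inner_k, rng)

        def inner_score(k):
            scores = []
            for v, val_pos in enumerate(inner_folds):
                val = [train_idx[p] for p in val_pos]
                fit = [train_idx[p] for j, fold in enumerate(inner_folds)
                       if j != v for p in fold]
                scores.append(accuracy(fit, val, X, y, k))
            return mean(scores)

        best_k = max(candidates, key=inner_score)
        # Outer loop: score the selected setting on the untouched outer test fold.
        results.append((best_k, accuracy(train_idx, test_idx, X, y, best_k)))
    return results

# Synthetic data (an assumption for illustration): two Gaussian blobs.
data_rng = random.Random(1)
X = [(data_rng.gauss(c, 1.0), data_rng.gauss(c, 1.0)) for c in (0, 3) for _ in range(30)]
y = [0] * 30 + [1] * 30
results = nested_cv(X, y, candidates=[1, 3, 5])
for best_k, acc in results:
    print(f"selected k={best_k}, outer-fold accuracy={acc:.2f}")
```

Averaging the outer-fold accuracies gives a performance estimate that is not biased by the hyperparameter search, which is the property the nested scheme is meant to provide.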
Distributed Computing
Evaluating machine learning algorithms (including deep learning algorithms) with hyperparameter optimization in a fully cross-validated manner requires significant computational resources, a challenge that every automated machine learning system has to handle. We address it with a classic client-server distribution system. Because tasks in DOMINE can be processed and evaluated independently, they are well suited to distributed computing. DOMINE server and client instances have a one-to-many relationship, and they communicate over HTTP.
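The pattern described above, independent tasks handed out by one server to several clients over HTTP, can be sketched with the Python standard library alone. This is an illustrative sketch under stated assumptions, not DOMINE's actual protocol: the /task and /result paths, the JSON task format, and the names TaskHandler, run_client, and demo are all hypothetical.

```python
import json
import queue
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

TASKS = queue.Queue()    # pending, mutually independent tasks
RESULTS = {}             # task id -> score reported by a client
LOCK = threading.Lock()

class TaskHandler(BaseHTTPRequestHandler):
    def _send_json(self, payload):
        body = json.dumps(payload).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        # Hypothetical GET /task: hand out the next task, or null when none remain.
        try:
            task = TASKS.get_nowait()
        except queue.Empty:
            task = None
        self._send_json(task)

    def do_POST(self):
        # Hypothetical POST /result: record a finished task's score.
        length = int(self.headers["Content-Length"])
        result = json.loads(self.rfile.read(length))
        with LOCK:
            RESULTS[result["id"]] = result["score"]
        self._send_json({"ok": True})

    def log_message(self, *args):
        pass  # silence per-request logging in this demo

def run_client(base_url, evaluate):
    """Poll the server for tasks until none remain, posting one result per task."""
    # Bypass any system proxy settings, since the demo server is local.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    while True:
        with opener.open(f"{base_url}/task") as resp:
            task = json.loads(resp.read())
        if task is None:
            return
        payload = json.dumps({"id": task["id"], "score": evaluate(task)}).encode()
        req = urllib.request.Request(f"{base_url}/result", data=payload,
                                     headers={"Content-Type": "application/json"})
        opener.open(req).close()

def demo():
    """Run one server and two clients locally; return the collected results."""
    for i, c in enumerate([0.1, 1.0, 10.0]):
        TASKS.put({"id": i, "C": c})
    server = ThreadingHTTPServer(("127.0.0.1", 0), TaskHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_port}"
    # Stand-in for a real model evaluation: the "score" is just 2 * C.
    clients = [threading.Thread(target=run_client, args=(url, lambda t: t["C"] * 2))
               for _ in range(2)]
    for c in clients:
        c.start()
    for c in clients:
        c.join()
    server.shutdown()
    server.server_close()
    return dict(RESULTS)

if __name__ == "__main__":
    print(demo())
```

Because each task carries everything a client needs, clients never coordinate with one another, and adding clients is the only change needed to process the queue faster.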
Machine Learning Algorithms
• Supports both traditional and deep learning classification algorithms.
• Random and grid search for hyperparameter selection.
• Supports various cross validation methods.
• Built-in evaluation methods that compute metrics including accuracy, precision/recall, and ROC curves.
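As a rough illustration of the search strategies and metrics listed above, the sketch below enumerates a grid of settings, samples settings at random, and computes accuracy, precision/recall, and ROC points for binary labels. It is a plain-Python sketch, not DOMINE's API: the function names and the example search space (C, kernel) are assumptions chosen for illustration.

```python
import itertools
import random

def grid_search(space):
    """Yield every combination from a dict of parameter name -> list of values."""
    names = list(space)
    for values in itertools.product(*(space[n] for n in names)):
        yield dict(zip(names, values))

def random_search(space, n_iter, seed=0):
    """Yield n_iter settings, each value drawn uniformly from its list."""
    rng = random.Random(seed)
    for _ in range(n_iter):
        yield {name: rng.choice(values) for name, values in space.items()}

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision_recall(y_true, y_pred):
    """Precision and recall for the positive class (label 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def roc_points(y_true, scores):
    """(false positive rate, true positive rate) at each distinct score threshold.

    Assumes y_true contains at least one positive and one negative label.
    """
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(t == 1 and s >= thr for t, s in zip(y_true, scores))
        fp = sum(t == 0 and s >= thr for t, s in zip(y_true, scores))
        points.append((fp / neg, tp / pos))
    return points

# Hypothetical search space and toy predictions, for illustration only.
space = {"C": [0.1, 1.0], "kernel": ["linear", "rbf"]}
grid = list(grid_search(space))
sampled = list(random_search(space, n_iter=3))

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
print(accuracy(y_true, y_pred))          # 0.75
print(precision_recall(y_true, y_pred))  # (1.0, 0.5)
print(roc_points(y_true, scores))
```

Grid search covers the space exhaustively, while random search caps the number of evaluations, which matters when each evaluation is itself a full cross-validated run.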