Synthetic data files for multiple-instance regression studies
Kiri Wagstaff, February 2008
kiri.wagstaff@jpl.nasa.gov
----------------------------------------------------------------------
We have generated several synthetic data sets containing 1D data.
Each data set contains 20 bags, and each bag contains k components,
where k ranges from 2 to 10 across the data sets. Each component is
represented by 10 data items.
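One reading of the description above is that every one of the 20 bags holds all k components, each contributing 10 items. Under that assumption, the expected size of each data set follows directly. The sketch below computes it. The helper name `expected_items` is hypothetical and is not part of the distributed files.

```python
N_BAGS = 20               # bags per data set (from the description above)
ITEMS_PER_COMPONENT = 10  # items per component (from the description above)


def expected_items(k: int) -> tuple[int, int]:
    """Return (items per bag, total items in the file) for k components.

    Assumes (not stated explicitly) that every bag contains all k
    components, each represented by 10 items.
    """
    per_bag = k * ITEMS_PER_COMPONENT
    return per_bag, N_BAGS * per_bag


for k in range(2, 11):
    per_bag, total = expected_items(k)
    print(f"k={k}: {per_bag} items per bag, {total} lines in total")
```

If a downloaded file's line count disagrees with these numbers, the assumption about bag composition is the first thing to revisit.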
The data files are:
1. synth_nBags=20_k=$k_bagData.dat, where $k ranges from 2 to 10.
Format (per item/line of the file):
bagID bag_label x_value
The bag_label is redundant: it is repeated on every line of the file,
but all items from the same bag share the same value. The bag label is
generated only from component 1. The data set for k components
contains all of the data generated for k-1 components, plus one new
component that is unrelated to the bag label.
2. synth_nBags=20_k=$k_groundTruth.dat, where $k ranges from 2 to 10.
Format:
slope y_intercept
This file stores the true regression model that was used to generate
the bag labels from the true component (component 1) only.
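The following is a minimal sketch for loading both files. It assumes whitespace-separated columns in the orders listed above, one item per line in the bag data file, and a single `slope y_intercept` line in the ground-truth file. The function names are hypothetical. The loader also checks that bag_label is constant within each bag. This README does not specify how the component-1 items of a bag are combined into its label, so `predict` only evaluates the line y = slope * x + y_intercept at a single x.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def load_bag_data(lines: Iterable[str]) -> Tuple[Dict[str, List[float]], Dict[str, float]]:
    """Parse lines of the form 'bagID bag_label x_value' (assumed whitespace-separated).

    Returns (items, labels): items maps each bag ID to its x values in file
    order; labels maps each bag ID to its single label.
    """
    items: Dict[str, List[float]] = defaultdict(list)
    labels: Dict[str, float] = {}
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        bag_id, label, x = fields[0], float(fields[1]), float(fields[2])
        # bag_label is repeated on every line, so it must agree within a bag.
        if bag_id in labels and labels[bag_id] != label:
            raise ValueError(f"line {line_no}: inconsistent label for bag {bag_id}")
        labels[bag_id] = label
        items[bag_id].append(x)
    return dict(items), labels


def load_ground_truth(lines: Iterable[str]) -> Tuple[float, float]:
    """Parse the 'slope y_intercept' pair (assumed to be the first two fields)."""
    fields = " ".join(lines).split()
    return float(fields[0]), float(fields[1])


def predict(slope: float, intercept: float, x: float) -> float:
    """Evaluate the true linear model at a single x value."""
    return slope * x + intercept
```

Because both loaders accept any iterable of lines, an open file works directly. For example, pass `open("synth_nBags=20_k=2_bagData.dat")` to `load_bag_data` and `open("synth_nBags=20_k=2_groundTruth.dat")` to `load_ground_truth`.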
----------------------------------------------------------------------