Synthetic data files for multiple-instance regression studies
Kiri Wagstaff, February 2008
kiri.wagstaff@jpl.nasa.gov
----------------------------------------------------------------------

We have generated several synthetic data sets containing 1D data.
Each data set contains 20 bags with k components, where k ranges from
2 to 10. Each component is represented by 10 data items.

The data files are:

1. synth_nBags=20_k=$k_bagData.dat, where $k ranges from 2 to 10.

   Format (one item per line):
     bagID bag_label x_value

   The bag_label is redundant: it is repeated on every line, but it is
   the same for all items from the same bag. The bag label is generated
   from component 1 only. The data set for k components contains all of
   the data generated for k-1 components, plus one new component that is
   not related to the bag label.

2. synth_nBags=20_k=$k_groundTruth.dat, where $k ranges from 2 to 10.

   Format:
     slope y_intercept

   This file stores the true regression model that was used to generate
   the bag labels, using only the true component (component 1).

----------------------------------------------------------------------
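The following is a minimal Python sketch for reading one of the synth_nBags=20_k=$k_bagData.dat files into a per-bag structure. It makes these assumptions, none of which come from the data set itself:

* Columns are whitespace-separated.
* Each line holds exactly "bagID bag_label x_value".
* The files sit in the working directory.
* bagID is kept as a string, and the label and x_value are parsed as floats.

The helper names (bag_data_filename, parse_bag_lines, load_bag_data) are hypothetical. They are not shipped with the data. The parser also checks that the repeated bag_label agrees within each bag, as described above.

```python
def bag_data_filename(k):
    """Build the bagData file name for k components (hypothetical helper)."""
    if not 2 <= k <= 10:
        raise ValueError("k must be between 2 and 10")
    return f"synth_nBags=20_k={k}_bagData.dat"


def parse_bag_lines(lines):
    """Group 'bagID bag_label x_value' lines into {bagID: (label, [x, ...])}."""
    bags = {}  # insertion order follows first appearance of each bag
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines
        if len(fields) != 3:
            raise ValueError(f"line {line_no}: expected 3 fields, got {len(fields)}")
        bag_id, label, x = fields[0], float(fields[1]), float(fields[2])
        if bag_id in bags:
            known_label, xs = bags[bag_id]
            # bag_label is repeated on every line but must agree within a bag
            if known_label != label:
                raise ValueError(f"line {line_no}: inconsistent label for bag {bag_id}")
            xs.append(x)
        else:
            bags[bag_id] = (label, [x])
    return bags


def load_bag_data(k):
    """Read the bagData file for k components from the working directory."""
    with open(bag_data_filename(k)) as f:
        return parse_bag_lines(f)


if __name__ == "__main__":
    demo = ["1 0.5 0.1", "1 0.5 0.3", "2 1.2 0.7"]
    print(parse_bag_lines(demo))
    # prints {'1': (0.5, [0.1, 0.3]), '2': (1.2, [0.7])}
```

Keeping every x_value of a bag in one list leaves open how components are separated. The file format above records only the bag, not the component, of each item.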
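This is a similarly hedged Python sketch for the synth_nBags=20_k=$k_groundTruth.dat files. It reads the two numbers "slope y_intercept" and evaluates the line y = slope * x + y_intercept. It assumes the file contains exactly those two whitespace-separated values and sits in the working directory. The helper names (ground_truth_filename, parse_ground_truth, load_ground_truth, predict) are hypothetical. The README does not say exactly how component 1's items are combined into a bag label, so the sketch only evaluates the model at a single x.

```python
def ground_truth_filename(k):
    """Build the groundTruth file name for k components (hypothetical helper)."""
    if not 2 <= k <= 10:
        raise ValueError("k must be between 2 and 10")
    return f"synth_nBags=20_k={k}_groundTruth.dat"


def parse_ground_truth(text):
    """Parse 'slope y_intercept' into a (slope, intercept) tuple of floats."""
    fields = text.split()
    if len(fields) != 2:
        raise ValueError(f"expected 2 fields (slope y_intercept), got {len(fields)}")
    return float(fields[0]), float(fields[1])


def load_ground_truth(k):
    """Read the groundTruth file for k components from the working directory."""
    with open(ground_truth_filename(k)) as f:
        return parse_ground_truth(f.read())


def predict(model, x):
    """Evaluate the linear model y = slope * x + y_intercept at x."""
    slope, intercept = model
    return slope * x + intercept


if __name__ == "__main__":
    model = parse_ground_truth("2.0 -1.0\n")
    print(predict(model, 3.0))  # prints 5.0
```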