Reads a full dataset; limited to those that fit in memory.
#include <Dataset.hpp>
Public Member Functions

| Return type | Member | Description |
|---|---|---|
| | Dataset () | Default constructor. |
| | Dataset (string fname, double enc0, double enc1) | Read from a file. |
| const vector< double > & | row (size_t i) const | Look at an indexed row of data. |
| size_t | getNClasses () const | Return the number of classes. |
| size_t | getNRows () const | Return the number of rows (instances). |
| size_t | getTargetClass (size_t ind) const | Return the target class, 1..N. |
| size_t | getNRowsInClass (size_t c) const | Return the number of rows that belong to class c. |
| const vector< double > & | prototype (size_t row) | Return a binary vector encoding the target class of the given row. The two real values used in the encoding are given as constructor parameters (enc0, enc1); the default is -1 for non-class elements and +1 for the class element. |
| void | toStream (ostream &ost) const | Dump the data to a stream. |

Protected Attributes

| Type | Member | Description |
|---|---|---|
| vector< vector< double > > | invec | Input vectors. |
| vector< size_t > | targetc | Target classes 1, 2, ... |
| vector< vector< double > > | prototypes | Target prototypes. |
| size_t | nclasses | Number of classes; discovered upon load. |
| vector< size_t > | ninc | Count of rows in each class 1, 2, ...; ninc[0] is unused. |
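The encoding described for prototype() amounts to a one-hot vector with configurable values. The sketch below is a hypothetical free function, not the class's actual code. It assumes the vector has one element per class (label c at index c - 1) and, going only by the parameter names, that enc0 fills the non-class elements and enc1 the class element.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration of the encoding documented for Dataset::prototype().
// Assumptions (not confirmed by the documentation):
//   - the vector has nclasses elements, and 1-based label c maps to index c - 1;
//   - enc0 is used for non-class elements and enc1 for the target class element.
std::vector<double> encodePrototype(std::size_t targetClass, std::size_t nclasses,
                                    double enc0, double enc1) {
    assert(targetClass >= 1 && targetClass <= nclasses);  // labels are 1..N
    std::vector<double> proto(nclasses, enc0);  // every element starts as "not this class"
    proto[targetClass - 1] = enc1;              // mark the target class
    return proto;
}
```

With enc0 = -1 and enc1 = +1, class 2 of 3 encodes as {-1, +1, -1}. Because prototype() returns a const reference and the class keeps a prototypes member, the real class presumably builds these vectors once at load time rather than on every call.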
Reads a full dataset; limited to those that fit in memory.
All users must share a pointer to a single instance. So far, the class works only for classification datasets, in which the last value on each row is the integral class label and all other values are real-valued features.
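The row layout described above can be parsed one line at a time. This is a minimal sketch that assumes whitespace-separated values (the actual separator is not documented); ParsedRow and parseRow are hypothetical names. It rejects rows whose last value is not a positive integer, since class labels start at 1.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// One parsed row: real-valued features plus a 1-based class label (hypothetical type).
struct ParsedRow {
    std::vector<double> features;
    std::size_t label = 0;
};

// Parse a line whose last value is the integral class label and whose other
// values are real-valued features. Assumption: values are whitespace-separated.
// Returns false (leaving `out` untouched) for empty lines, non-numeric tokens,
// or a last value that is not a positive integer.
bool parseRow(const std::string& line, ParsedRow& out) {
    std::istringstream in(line);
    std::vector<double> values;
    double v;
    while (in >> v) values.push_back(v);
    // Extraction must have stopped at end of line, not at a bad token.
    if (!in.eof() || values.empty()) return false;
    const double last = values.back();
    if (last < 1.0 || std::floor(last) != last) return false;  // labels are 1..N
    values.pop_back();
    out.label = static_cast<std::size_t>(last);
    out.features = std::move(values);
    return true;
}
```

For example, "0.5 -1.25 3" yields the features {0.5, -1.25} and class 3, while "0.5 1.5" is rejected because 1.5 is not an integral label.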
Dataset::Dataset (string fname, double enc0, double enc1)
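Read from a file. The parameters enc0 and enc1 are the real values that prototype() uses in its encoding.
TODO: differentiate between classification and approximation datasets?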
size_t Dataset::getNClasses () const
Return the number of classes.
TODO: classification vs. approximation?
size_t Dataset::getTargetClass (size_t ind) const
Return the target class (1..N) of row ind.
NOTE: class indices are 1-based, while data (row) indices are 0-based!
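The 1-based class labels and the ninc layout (index 0 unused) fit together as in this sketch. countRowsPerClass is a hypothetical helper, not part of Dataset. It also assumes that "discovered upon load" means nclasses is the largest label seen, which the documentation does not state.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper. targetc[i] is the 1-based class of row i (rows are 0-based).
// Assumption: the number of classes is the largest label seen.
// Returns counts where counts[c] is the number of rows in class c and counts[0]
// is unused, mirroring the documented ninc member.
std::vector<std::size_t> countRowsPerClass(const std::vector<std::size_t>& targetc,
                                           std::size_t& nclasses) {
    nclasses = 0;
    for (std::size_t c : targetc) {
        assert(c >= 1);  // class labels start at 1
        nclasses = std::max(nclasses, c);
    }
    std::vector<std::size_t> counts(nclasses + 1, 0);  // slot 0 stays unused
    for (std::size_t c : targetc) ++counts[c];
    return counts;
}
```

For the targets {2, 1, 2, 3, 2}, this yields nclasses = 3 and counts {0, 1, 3, 1}: row 0 (0-based) belongs to class 2 (1-based), and class 2 holds three rows.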
const vector< double > & Dataset::row (size_t i) const
Look at an indexed row of data.
TODO: This is likely to be replaced by iterators that can be generated for data splits, or rather by objects of type DataSplit.
void Dataset::toStream (ostream &ost) const
Dump data to a stream.
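The output format of toStream() is not documented. A plausible sketch, assuming one line per row with the features followed by the class label (the same layout the loader reads), is shown here. dumpRows and rowsToString are hypothetical names, and the invec and targetc data are passed in explicitly instead of being read from members.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical dump routine in the spirit of Dataset::toStream().
// Assumption: one line per row, space-separated features, then the 1-based label.
void dumpRows(std::ostream& ost,
              const std::vector<std::vector<double>>& invec,
              const std::vector<std::size_t>& targetc) {
    assert(invec.size() == targetc.size());  // one label per row
    for (std::size_t i = 0; i < invec.size(); ++i) {
        for (double x : invec[i]) ost << x << ' ';
        ost << targetc[i] << '\n';
    }
}

// Convenience wrapper that captures the dump in a string (useful for debugging).
std::string rowsToString(const std::vector<std::vector<double>>& invec,
                         const std::vector<std::size_t>& targetc) {
    std::ostringstream out;
    dumpRows(out, invec, targetc);
    return out.str();
}
```

Writing the label last keeps the dump readable by the same parser that loads the file, which makes round-trip checks straightforward.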