This section lists the standard classification data sets available in the repository. Each one defines a supervised classification problem in which every example is composed of nominal or numerical attributes and a nominal output attribute (its class).
Each data file has the following structure:
- @relation: Name of the data set
- @attribute: Description of an attribute (one for each attribute)
- @inputs: List with the names of the input attributes
- @output: Name of the output attribute
- @data: Starting tag of the data
The rest of the file contains all the examples belonging to the data set, expressed in comma-separated values format.
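The file layout described above can be read with a few lines of Python. The following is a minimal sketch rather than an official KEEL reader: the function name `parse_keel` is hypothetical, and the attribute descriptions in the sample text (a type followed by a range or a set of values) are an assumption for illustration only, so the sketch keeps each description as an unparsed string.

```python
import csv

def parse_keel(text):
    """Split KEEL-style text into its header tags and its data rows."""
    header = {"relation": None, "attributes": [], "inputs": [], "output": None}
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if in_data:
            # After @data, every line is one comma-separated example.
            rows.append([v.strip() for v in next(csv.reader([line]))])
            continue
        tag, _, rest = line.partition(" ")
        tag, rest = tag.lower(), rest.strip()
        if tag == "@relation":
            header["relation"] = rest
        elif tag == "@attribute":
            # Kept as a raw string; its inner syntax is not assumed here.
            header["attributes"].append(rest)
        elif tag == "@inputs":
            header["inputs"] = [name.strip() for name in rest.split(",")]
        elif tag == "@output":
            header["output"] = rest
        elif tag == "@data":
            in_data = True
    return header, rows

# Hypothetical sample file; the attribute descriptions are assumed syntax.
SAMPLE = """@relation toy
@attribute width real [0.0, 10.0]
@attribute color {red, green}
@attribute class {yes, no}
@inputs width, color
@output class
@data
1.5, red, yes
7.2, green, no
"""

header, rows = parse_keel(SAMPLE)
print(header["relation"], header["inputs"], header["output"], len(rows))
```

All values are returned as strings; converting numerical attributes is left to the caller, since the conversion depends on the attribute descriptions.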
Below you can find all the standard classification data sets available. For each data set, the table shows its name and its number of instances, attributes (detailing the number of Real/Integer/Nominal attributes in the data) and classes (number of possible values of the output variable). In addition, the table shows whether the data set has missing values (for data sets with missing values, the table shows the number of instances without missing values, with the total number of instances in brackets).
The table lets you download each data set in KEEL format (inside a ZIP file). Additionally, each data set can be obtained already partitioned by means of a 10-fold or 5-fold stratified cross-validation (SCV) procedure. Partitions produced by 10-fold or 5-fold distribution optimally balanced stratified cross-validation (DOB-SCV) are also available (except for data sets with a very high number of examples, since this partitioning scheme requires considerable computation). The latter validation procedure was proposed in:
J.G. Moreno-Torres, J.A. Sáez, F. Herrera, Study on the impact of partition-induced dataset shift on k-fold cross-validation, IEEE Transactions on Neural Networks and Learning Systems 23 (8) (2012) 1304-1313.
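To illustrate what a stratified partition means, here is a minimal sketch of plain stratified k-fold assignment (SCV), in which every fold keeps roughly the class proportions of the whole data set. It is not the DOB-SCV procedure from the reference above, nor the code used to build the partitions offered here; the function name `stratified_folds` and the toy labels are hypothetical.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k, seed=0):
    """Return k lists of example indices with class proportions roughly preserved."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # Deal the examples of this class round-robin across the folds,
        # continuing where the previous class stopped so fold sizes stay even.
        for i, idx in enumerate(members):
            folds[(offset + i) % k].append(idx)
        offset += len(members)
    return folds

# Toy labels: 10 examples of class "a" and 5 of class "b".
labels = ["a"] * 10 + ["b"] * 5
folds = stratified_folds(labels, k=5)
for fold in folds:
    print(sorted(labels[i] for i in fold))
```

Each fold then serves once as the test set, with the remaining folds forming the training set.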
For data sets with missing values, only the cleaned version (where instances with missing values are not included) is provided. A complete version including instances with missing values can be found on the description page of each data set or in the missing values section of KEEL-dataset. Finally, we provide a header file that gives additional information about each data set and its attributes.
By clicking on the column headers, you can sort the table by name (alphabetically), by the number of examples, attributes or classes, or by the presence of missing values. Clicking again sorts the rows in reverse order.
Collecting Data Sets
If you have some example data sets and you would like to share them with the rest of the research community by means of this page, please be so kind as to send your data to the Webmaster Team with the following information:
- People responsible for the data (full name, affiliation, e-mail, web page, ...).
- Training and test data sets considered, preferably in ASCII format.
- A brief description of the application.
- References where it is used.
- Results obtained by the methods proposed by the authors or used for comparison.
- Type of experiment developed.
- Any additional useful information.
Collecting Results
If you have applied your methods to some of the problems presented here, we will be glad to show your results on this page. Please be so kind as to send the following information to the Webmaster Team:
- Name of the application considered and type of experiment developed.
- Results obtained by the methods proposed by the authors or used for comparison.
- References where the results are shown.
- Any additional useful information.
Contact Us
If you are interested in being informed of each update made to this page, or you would like to comment on it, please contact the Webmaster Team.