gas-drift

Status: active. Format: ARFF. Visibility: public. Uploaded 22-05-2015 by unknown.
Tags: OpenML100, study_123, study_135, study_14, time_series, study_225


Author: Alexander Vergara

Source: [UCI](https://archive.ics.uci.edu/ml/datasets/gas+sensor+array+drift+dataset) - 2012

Please cite: Alexander Vergara, Shankar Vembu, Tuba Ayhan, Margaret A. Ryan, Margie L. Homer, Ramón Huerta. Chemical gas sensor drift compensation using classifier ensembles, Sensors and Actuators B: Chemical (2012) doi: 10.1016/j.snb.2012.01.074.

### Description

Gas Sensor Array Drift Dataset

### Sources

```
(a) Creators: Alexander Vergara (vergara '@' ucsd.edu)
    BioCircuits Institute
    University of California San Diego
    San Diego, California, USA
(b) Donors: Alexander Vergara (vergara '@' ucsd.edu)
    Ramon Huerta (rhuerta '@' ucsd.edu)
```

### Dataset Information

This archive contains 13910 measurements from 16 chemical sensors, used in simulations of drift compensation in a discrimination task of 6 gases at various concentration levels. The goal is to achieve good performance (or as little degradation as possible) over time, as reported in the paper cited below (Section 2: Data collection). The primary purpose of providing this dataset is to make it freely accessible online to the chemo-sensor and artificial intelligence research communities, so that they can develop strategies to cope with sensor/concept drift. The dataset may be used exclusively for research purposes. Commercial purposes are fully excluded.

The dataset was gathered between January 2007 and February 2011 (36 months) in a gas delivery platform facility at the ChemoSignals Laboratory in the BioCircuits Institute, University of California San Diego. The platform is operated entirely by a computerized environment, controlled by National Instruments LabVIEW software running on a PC fitted with the appropriate serial data acquisition boards. The measurement system can deliver the desired concentrations of the chemical substances of interest with high accuracy and in a highly reproducible manner. This minimizes the common mistakes caused by human intervention and makes it possible to concentrate exclusively on the chemical sensors when compensating for real drift.

The resulting dataset comprises recordings from six distinct pure gaseous substances, namely Ammonia, Acetaldehyde, Acetone, Ethylene, Ethanol, and Toluene, each dosed at a wide variety of concentration values ranging from 5 to 1000 ppmv. An extension of this dataset with the concentration values is available at [Gas Sensor Array Drift Dataset at Different Concentrations Data Set](http://archive.ics.uci.edu/ml/datasets/Gas+Sensor+Array+Drift+Dataset+at+Different+Concentrations).

### Attribute Information

The response of each sensor is read out as the resistance across its active layer. Each measurement therefore produces a 16-channel time series, and each channel is represented by a set of features reflecting the dynamic processes that occur at the sensor surface in reaction to the chemical substance being evaluated. Two distinct types of features were considered in the creation of this dataset:

(i) The so-called steady-state feature (ΔR), defined as the difference between the maximal resistance change and the baseline, together with its normalized version, expressed as the ratio of the maximal resistance to the baseline value when the chemical vapor is present in the test chamber.

(ii) A set of features reflecting the sensor dynamics of the increasing/decaying transient portion of the sensor response during the entire measurement procedure under controlled conditions, namely the exponential moving average (ema_α). This transform, borrowed from the field of econometrics and originally introduced to the chemo-sensing community by Muezzinoglu et al. (2009), converts the transient portion into a real scalar by estimating the maximum value (minimum for the decaying portion of the sensor response) of its exponential moving average (ema_α). The initial condition is set to zero, and the scalar smoothing parameter α, which ranges between 0 and 1, determines both the quality of the feature and the time at which it occurs along the time series.

Three different values of α were used to obtain three feature values from the pre-recorded rising portion of the sensor response, and three additional features with the same α values were obtained from the decaying portion, thus covering the entire sensor response dynamics. For a more detailed analysis and discussion of these features, and a graphical illustration of them, please refer to Section 2.3 and Figure 2, respectively, of the annotated manuscript.

Once these features are calculated, a feature vector is formed from the 8 features extracted from each sensor, multiplied by the 16 sensors considered here. The result is a 128-dimensional feature vector containing all the features indicated above.

There are six possible classes:

```
1: Ethanol
2: Ethylene
3: Ammonia
4: Acetaldehyde
5: Acetone
6: Toluene
```

### Relevant Papers

Alexander Vergara, Shankar Vembu, Tuba Ayhan, Margaret A. Ryan, Margie L. Homer and Ramón Huerta, Chemical gas sensor drift compensation using classifier ensembles, Sensors and Actuators B: Chemical (2012) doi: 10.1016/j.snb.2012.01.074.
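The eight per-sensor features described under Attribute Information can be illustrated with a small sketch. It is not the authors' code. The recurrence `y[k] = (1 - alpha) * y[k-1] + alpha * (r[k] - r[k-1])`, the three `alpha` values, the use of the first sample as the baseline, and the function names `ema_transform` and `sensor_features` are all assumptions made for illustration, since this page does not state them. The synthetic `response` is made up, not taken from the dataset.

```python
from typing import List, Sequence


def ema_transform(r: Sequence[float], alpha: float) -> List[float]:
    """Exponential moving average of the first difference of r.

    Assumed recurrence (not given on this page):
        y[0] = 0
        y[k] = (1 - alpha) * y[k-1] + alpha * (r[k] - r[k-1])
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    y = [0.0]  # initial condition set to zero, as the description says
    for k in range(1, len(r)):
        y.append((1.0 - alpha) * y[-1] + alpha * (r[k] - r[k - 1]))
    return y


def sensor_features(
    r: Sequence[float],
    alphas: Sequence[float] = (0.1, 0.01, 0.001),  # assumed values
) -> List[float]:
    """Return 8 features for one sensor's resistance time series."""
    baseline = r[0]  # assumption: the first sample is the baseline
    peak = max(r)
    delta_r = peak - baseline    # steady-state feature
    normalized = peak / baseline  # ratio of maximal resistance to baseline
    # The maximum of the transform is reached on the rising portion and
    # the minimum on the decaying portion, so the full series is scanned.
    rising = [max(ema_transform(r, a)) for a in alphas]
    decaying = [min(ema_transform(r, a)) for a in alphas]
    return [delta_r, normalized] + rising + decaying


# Synthetic response: baseline, rise, plateau, decay, baseline.
response = (
    [10.0] * 3
    + [12.0, 14.0, 16.0, 18.0, 20.0]
    + [20.0] * 5
    + [18.0, 16.0, 14.0, 12.0, 10.0]
    + [10.0] * 3
)
features = sensor_features(response)
print(len(features), features[:2])
```

Repeating this for each of the 16 sensors and concatenating the results would give the 128-dimensional vector described above.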

129 features

| Feature | Type | Unique values | Missing |
|---|---|---|---|
| Class (target) | nominal | 6 | 0 |
| V1 | numeric | 13904 | 0 |
| V2 | numeric | 13890 | 0 |
| V3 | numeric | 13904 | 0 |
| V4 | numeric | 13905 | 0 |
| V5 | numeric | 13904 | 0 |
| V6 | numeric | 13897 | 0 |
| V7 | numeric | 13895 | 0 |
| V8 | numeric | 13907 | 0 |
| V9 | numeric | 13897 | 0 |
| V10 | numeric | 13888 | 0 |
| V11 | numeric | 13905 | 0 |
| V12 | numeric | 13909 | 0 |
| V13 | numeric | 13906 | 0 |
| V14 | numeric | 13906 | 0 |
| V15 | numeric | 13902 | 0 |
| V16 | numeric | 13908 | 0 |
| V17 | numeric | 13910 | 0 |
| V18 | numeric | 13892 | 0 |
| V19 | numeric | 13896 | 0 |
| V20 | numeric | 13903 | 0 |
| V21 | numeric | 13909 | 0 |
| V22 | numeric | 13883 | 0 |
| V23 | numeric | 13903 | 0 |
| V24 | numeric | 13899 | 0 |
| V25 | numeric | 13896 | 0 |
| V26 | numeric | 13885 | 0 |
| V27 | numeric | 13891 | 0 |
| V28 | numeric | 13892 | 0 |
| V29 | numeric | 13893 | 0 |
| V30 | numeric | 13872 | 0 |
| V31 | numeric | 13886 | 0 |
| V32 | numeric | 13891 | 0 |
| V33 | numeric | 13904 | 0 |
| V34 | numeric | 13874 | 0 |
| V35 | numeric | 13855 | 0 |
| V36 | numeric | 13894 | 0 |
| V37 | numeric | 13886 | 0 |
| V38 | numeric | 13835 | 0 |
| V39 | numeric | 13869 | 0 |
| V40 | numeric | 13891 | 0 |
| V41 | numeric | 13908 | 0 |
| V42 | numeric | 13877 | 0 |
| V43 | numeric | 13864 | 0 |
| V44 | numeric | 13891 | 0 |
| V45 | numeric | 13894 | 0 |
| V46 | numeric | 13820 | 0 |
| V47 | numeric | 13859 | 0 |
| V48 | numeric | 13882 | 0 |
| V49 | numeric | 13908 | 0 |
| V50 | numeric | 13898 | 0 |
| V51 | numeric | 13906 | 0 |
| V52 | numeric | 13908 | 0 |
| V53 | numeric | 13907 | 0 |
| V54 | numeric | 13893 | 0 |
| V55 | numeric | 13903 | 0 |
| V56 | numeric | 13903 | 0 |
| V57 | numeric | 13909 | 0 |
| V58 | numeric | 13897 | 0 |
| V59 | numeric | 13900 | 0 |
| V60 | numeric | 13905 | 0 |
| V61 | numeric | 13906 | 0 |
| V62 | numeric | 13902 | 0 |
| V63 | numeric | 13901 | 0 |
| V64 | numeric | 13904 | 0 |
| V65 | numeric | 13899 | 0 |
| V66 | numeric | 13889 | 0 |
| V67 | numeric | 13902 | 0 |
| V68 | numeric | 13906 | 0 |
| V69 | numeric | 13907 | 0 |
| V70 | numeric | 13891 | 0 |
| V71 | numeric | 13907 | 0 |
| V72 | numeric | 13906 | 0 |
| V73 | numeric | 13904 | 0 |
| V74 | numeric | 13887 | 0 |
| V75 | numeric | 13904 | 0 |
| V76 | numeric | 13903 | 0 |
| V77 | numeric | 13905 | 0 |
| V78 | numeric | 13897 | 0 |
| V79 | numeric | 13898 | 0 |
| V80 | numeric | 13900 | 0 |
| V81 | numeric | 13908 | 0 |
| V82 | numeric | 13888 | 0 |
| V83 | numeric | 13906 | 0 |
| V84 | numeric | 13906 | 0 |
| V85 | numeric | 13905 | 0 |
| V86 | numeric | 13892 | 0 |
| V87 | numeric | 13899 | 0 |
| V88 | numeric | 13903 | 0 |
| V89 | numeric | 13908 | 0 |
| V90 | numeric | 13900 | 0 |
| V91 | numeric | 13903 | 0 |
| V92 | numeric | 13905 | 0 |
| V93 | numeric | 13903 | 0 |
| V94 | numeric | 13886 | 0 |
| V95 | numeric | 13896 | 0 |
| V96 | numeric | 13902 | 0 |
| V97 | numeric | 13902 | 0 |
| V98 | numeric | 13882 | 0 |
| V99 | numeric | 13872 | 0 |
| V100 | numeric | 13905 | 0 |
| V101 | numeric | 13902 | 0 |
| V102 | numeric | 13854 | 0 |
| V103 | numeric | 13882 | 0 |
| V104 | numeric | 13895 | 0 |
| V105 | numeric | 13910 | 0 |
| V106 | numeric | 13885 | 0 |
| V107 | numeric | 13876 | 0 |
| V108 | numeric | 13894 | 0 |
| V109 | numeric | 13895 | 0 |
| V110 | numeric | 13850 | 0 |
| V111 | numeric | 13875 | 0 |
| V112 | numeric | 13875 | 0 |
| V113 | numeric | 13905 | 0 |
| V114 | numeric | 13898 | 0 |
| V115 | numeric | 13903 | 0 |
| V116 | numeric | 13908 | 0 |
| V117 | numeric | 13906 | 0 |
| V118 | numeric | 13898 | 0 |
| V119 | numeric | 13903 | 0 |
| V120 | numeric | 13907 | 0 |
| V121 | numeric | 13909 | 0 |
| V122 | numeric | 13898 | 0 |
| V123 | numeric | 13903 | 0 |
| V124 | numeric | 13907 | 0 |
| V125 | numeric | 13903 | 0 |
| V126 | numeric | 13898 | 0 |
| V127 | numeric | 13905 | 0 |
| V128 | numeric | 13907 | 0 |
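The description says the 128 numeric columns come from 8 features for each of 16 sensors. The sketch below maps a column name such as `V9` to a (sensor, feature) pair. It assumes a sensor-major layout, in which columns V1 to V8 belong to sensor 1, V9 to V16 to sensor 2, and so on. This page does not state that ordering, so check it against the original UCI documentation before relying on it. The helper names are hypothetical.

```python
from typing import List, Sequence, Tuple

N_SENSORS = 16
FEATURES_PER_SENSOR = 8
N_FEATURES = N_SENSORS * FEATURES_PER_SENSOR  # 128 numeric columns


def column_to_sensor_feature(column: str) -> Tuple[int, int]:
    """Map 'V1'..'V128' to 1-based (sensor, feature) indices.

    Assumes a sensor-major layout (not confirmed by this page).
    """
    index = int(column[1:])
    if not column.startswith("V") or not 1 <= index <= N_FEATURES:
        raise ValueError(f"unexpected column name: {column!r}")
    sensor, feature = divmod(index - 1, FEATURES_PER_SENSOR)
    return sensor + 1, feature + 1


def split_by_sensor(row: Sequence[float]) -> List[List[float]]:
    """Split one 128-value row into 16 lists of 8 values each."""
    if len(row) != N_FEATURES:
        raise ValueError(f"expected {N_FEATURES} values, got {len(row)}")
    return [
        list(row[start:start + FEATURES_PER_SENSOR])
        for start in range(0, N_FEATURES, FEATURES_PER_SENSOR)
    ]


example_row = [float(i) for i in range(1, N_FEATURES + 1)]
per_sensor = split_by_sensor(example_row)
print(column_to_sensor_feature("V9"), len(per_sensor), len(per_sensor[0]))
```

Grouping columns this way makes it straightforward to drop or inspect a single sensor, which can be useful when studying how drift affects individual channels.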

107 properties

| Property | Value |
|---|---|
| Number of instances (rows) of the dataset. | 13910 |
| Number of attributes (columns) of the dataset. | 129 |
| Number of distinct values of the target attribute (if it is nominal). | 6 |
| Number of missing values in the dataset. | 0 |
| Number of instances with at least one value missing. | 0 |
| Number of numeric attributes. | 128 |
| Number of nominal attributes. | 1 |
| Average class difference between consecutive instances. | 0.59 |
| Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W | 0.97 |
| Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W | 0.05 |
| Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W | 0.94 |
| Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W | 0.97 |
| Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W | 0.05 |
| Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W | 0.94 |
| Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W | 0.97 |
| Error rate achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W | 0.05 |
| Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W | 0.94 |
| Entropy of the target attribute values. | 2.55 |
| Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump | 0.71 |
| Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump | 0.61 |
| Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump | 0.22 |
| Number of attributes divided by the number of instances. | 0.01 |
| Number of attributes needed to optimally describe the class (under the assumption of independence among attributes). Equals ClassEntropy divided by MeanMutualInformation. | not listed |
| Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .00001 | 0.98 |
| Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .00001 | 0.03 |
| Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .00001 | 0.96 |
| Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .0001 | 0.98 |
| Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .0001 | 0.03 |
| Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .0001 | 0.96 |
| Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .001 | 0.98 |
| Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .001 | 0.03 |
| Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .001 | 0.96 |
| Percentage of instances belonging to the most frequent class. | 21.63 |
| Number of instances belonging to the most frequent class. | 3009 |
| Maximum entropy among attributes. | not listed |
| Maximum kurtosis among attributes of the numeric type. | 13909.09 |
| Maximum of means among attributes of the numeric type. | 57340.1 |
| Maximum mutual information between the nominal attributes and the target attribute. | not listed |
| The maximum number of distinct values among attributes of the nominal type. | 6 |
| Maximum skewness among attributes of the numeric type. | 117.93 |
| Maximum standard deviation of attributes of the numeric type. | 69844.79 |
| Average entropy of the attributes. | not listed |
| Mean kurtosis among attributes of the numeric type. | 1037.15 |
| Mean of means among attributes of the numeric type. | 2791.46 |
| Average mutual information between the nominal attributes and the target attribute. | not listed |
| An estimate of the amount of irrelevant information in the attributes regarding the class. Equals (MeanAttributeEntropy - MeanMutualInformation) divided by MeanMutualInformation. | not listed |
| Average number of distinct values among the attributes of the nominal type. | 6 |
| Mean skewness among attributes of the numeric type. | 4.62 |
| Mean standard deviation of attributes of the numeric type. | 2729.31 |
| Minimal entropy among attributes. | not listed |
| Minimum kurtosis among attributes of the numeric type. | -0.07 |
| Minimum of means among attributes of the numeric type. | -72.75 |
| Minimal mutual information between the nominal attributes and the target attribute. | not listed |
| The minimal number of distinct values among attributes of the nominal type. | 6 |
| Minimum skewness among attributes of the numeric type. | -87.65 |
| Minimum standard deviation of attributes of the numeric type. | 0.53 |
| Percentage of instances belonging to the least frequent class. | 11.8 |
| Number of instances belonging to the least frequent class. | 1641 |
| Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes | 0.84 |
| Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes | 0.43 |
| Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes | 0.49 |
| Number of binary attributes. | 0 |
| Percentage of binary attributes. | 0 |
| Percentage of instances having missing values. | 0 |
| Percentage of missing values. | 0 |
| Percentage of numeric attributes. | 99.22 |
| Percentage of nominal attributes. | 0.78 |
| First quartile of entropy among attributes. | not listed |
| First quartile of kurtosis among attributes of the numeric type. | 4.23 |
| First quartile of means among attributes of the numeric type. | -4.77 |
| First quartile of mutual information between the nominal attributes and the target attribute. | not listed |
| First quartile of skewness among attributes of the numeric type. | -2.29 |
| First quartile of standard deviation of attributes of the numeric type. | 4.36 |
| Second quartile (Median) of entropy among attributes. | not listed |
| Second quartile (Median) of kurtosis among attributes of the numeric type. | 10.32 |
| Second quartile (Median) of means among attributes of the numeric type. | 5.37 |
| Second quartile (Median) of mutual information between the nominal attributes and the target attribute. | not listed |
| Second quartile (Median) of skewness among attributes of the numeric type. | 1.3 |
| Second quartile (Median) of standard deviation of attributes of the numeric type. | 9.63 |
| Third quartile of entropy among attributes. | not listed |
| Third quartile of kurtosis among attributes of the numeric type. | 80.89 |
| Third quartile of means among attributes of the numeric type. | 15.19 |
| Third quartile of mutual information between the nominal attributes and the target attribute. | not listed |
| Third quartile of skewness among attributes of the numeric type. | 2.54 |
| Third quartile of standard deviation of attributes of the numeric type. | 24.79 |
| Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 1 | 0.99 |
| Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 1 | 0.04 |
| Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 1 | 0.95 |
| Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 2 | 0.99 |
| Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 2 | 0.04 |
| Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 2 | 0.95 |
| Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 3 | 0.99 |
| Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 3 | 0.04 |
| Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 3 | 0.95 |
| Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1 | 0.98 |
| Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1 | 0.04 |
| Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1 | 0.95 |
| Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2 | 0.98 |
| Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2 | 0.04 |
| Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2 | 0.95 |
| Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3 | 0.98 |
| Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3 | 0.04 |
| Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3 | 0.95 |
| Standard deviation of the number of distinct values among attributes of the nominal type. | 0 |
| Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk | 1 |
| Error rate achieved by the landmarker weka.classifiers.lazy.IBk | 0.01 |
| Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk | 0.99 |
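Several of the properties above are simple ratios of other listed values, so they can be cross-checked with a few lines of arithmetic. The sketch below recomputes them from the counts on this page. The variable names are illustrative, and rounding to two decimals is an assumption about how the page displays its values.

```python
import math

# Counts copied from the properties listed above.
n_instances = 13910
n_attributes = 129
n_numeric = 128
n_nominal = 1
n_classes = 6
majority_count = 3009
minority_count = 1641

# Class balance, as percentages of all instances.
pct_majority = round(100 * majority_count / n_instances, 2)
pct_minority = round(100 * minority_count / n_instances, 2)

# Attribute type shares, as percentages of all attributes.
pct_numeric = round(100 * n_numeric / n_attributes, 2)
pct_nominal = round(100 * n_nominal / n_attributes, 2)

# Dimensionality: attributes divided by instances.
dimensionality = round(n_attributes / n_instances, 2)

# Upper bound on the target entropy: log2 of the number of classes,
# reached only if all six classes were equally frequent.
max_class_entropy = math.log2(n_classes)

print(pct_majority, pct_minority, pct_numeric, pct_nominal, dimensionality)
print(round(max_class_entropy, 3))
```

The reported class entropy of 2.55 bits sits just below the bound of about 2.585 bits, which is consistent with the mild class imbalance shown by the majority and minority percentages.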

44 tasks

10683 runs - estimation_procedure: 10-fold Crossvalidation - target_feature: Class
31 runs - estimation_procedure: 10-fold Crossvalidation - target_feature: Class
1 runs - estimation_procedure: 5 times 2-fold Crossvalidation - target_feature: Class
0 runs - estimation_procedure: 33% Holdout set - evaluation_measure: predictive_accuracy - target_feature: Class
0 runs - estimation_procedure: 10 times 10-fold Crossvalidation - target_feature: Class
0 runs - estimation_procedure: Test on Training Data - target_feature: Class
0 runs - estimation_procedure: 33% Holdout set - target_feature: Class
0 runs - estimation_procedure: 10-fold Learning Curve - target_feature:
0 runs - target_feature: Class
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
1305 runs - target_feature: Class
1303 runs - target_feature: Class
1302 runs - target_feature: Class
1301 runs - target_feature: Class
1300 runs - target_feature: Class
1296 runs - target_feature: Class
0 runs - target_feature: Class
0 runs - target_feature: Class
0 runs - target_feature: Class
0 runs - target_feature: Class
0 runs - target_feature: Class
0 runs - target_feature: Class
0 runs - target_feature: Class
0 runs - target_feature: Class
0 runs - target_feature: Class
0 runs - target_feature: Class
0 runs - target_feature: Class
0 runs - target_feature: Class
0 runs - target_feature: Class
0 runs - target_feature: Class
0 runs - target_feature: Class
0 runs - target_feature: Class
0 runs - target_feature: Class
0 runs - target_feature: Class
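Most of the runs above use 10-fold cross-validation with `Class` as the target. The sketch below shows how such folds could be built while keeping class proportions roughly equal across folds. It is a standard-library illustration, not OpenML's actual splitting code. Stratification, the seed, and the function name `stratified_kfold` are assumptions, and the labels are synthetic.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_kfold(
    labels: Sequence[str], k: int = 10, seed: int = 0
) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, test_indices) pairs.

    Indices of each class are shuffled and dealt round-robin into the
    folds, so every fold holds a near-equal share of every class.
    """
    rng = random.Random(seed)
    by_class: Dict[str, List[int]] = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds: List[List[int]] = [[] for _ in range(k)]
    offset = 0  # continue dealing where the previous class stopped
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[(offset + position) % k].append(index)
        offset += len(indices)

    splits = []
    for held_out in range(k):
        test = sorted(folds[held_out])
        train = sorted(
            index
            for fold_id, fold in enumerate(folds)
            if fold_id != held_out
            for index in fold
        )
        splits.append((train, test))
    return splits


# Synthetic labels standing in for the Class column.
toy_labels = ["1"] * 30 + ["2"] * 20 + ["3"] * 10
splits = stratified_kfold(toy_labels, k=10)
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

Each sample appears in exactly one test fold. The "5 times 2-fold" procedure listed above could be sketched by calling the same function with `k=2` and five different seeds.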