QSAR-DATASET-FOR-DRUG-TARGET-CHEMBL233

deactivated | ARFF | Publicly available | Visibility: public | Uploaded 14-07-2016 by unknown
This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on the drug target ChEMBL_ID: CHEMBL233 (TID: 129). It has 4089 rows and 213 features, not counting the molecule identifier (molecule_id) and the target (pXC50). The features are molecular descriptors generated from SMILES strings. Missing values were imputed with the median of each feature, and feature selection was also applied.
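The description only states that missing values were replaced by the median. As a minimal sketch of what per-column median imputation looks like, assuming a toy in-memory table with `None` marking a missing value (the function name `impute_median` and the toy data are hypothetical and are not taken from the dataset or its preprocessing pipeline):

```python
from statistics import median


def impute_median(rows):
    """Replace None entries in each column with that column's median.

    `rows` is a list of equal-length lists of floats or None. This layout
    is an assumption made for illustration, not the dataset's ARFF format.
    """
    n_cols = len(rows[0])
    medians = []
    for j in range(n_cols):
        # Median is computed from the observed (non-missing) values only.
        observed = [r[j] for r in rows if r[j] is not None]
        medians.append(median(observed))
    return [
        [medians[j] if r[j] is None else r[j] for j in range(n_cols)]
        for r in rows
    ]


# Hypothetical toy table with two descriptor columns.
toy = [
    [1.0, 10.0],
    [None, 20.0],
    [3.0, None],
    [5.0, 40.0],
]
filled = impute_median(toy)
print(filled)
```

A column with no observed values at all would make `median` raise `StatisticsError`; the sketch does not handle that case.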

215 features

pXC50 (target, numeric): 1631 unique values, 0 missing
molecule_id (row identifier, nominal): 4089 unique values, 0 missing
D.Dtr12 (numeric): 563 unique values, 0 missing
nR12 (numeric): 4 unique values, 0 missing
Eig01_EA.ed. (numeric): 1297 unique values, 0 missing
SM10_AEA.dm. (numeric): 1297 unique values, 0 missing
SpMax_EA.ed. (numeric): 1297 unique values, 0 missing
SpDiam_EA.ed. (numeric): 1567 unique values, 0 missing
Eig01_AEA.ed. (numeric): 880 unique values, 0 missing
SpMax_AEA.ed. (numeric): 880 unique values, 0 missing
Eig01_AEA.dm. (numeric): 886 unique values, 0 missing
SpMax_AEA.dm. (numeric): 886 unique values, 0 missing
SpDiam_AEA.dm. (numeric): 885 unique values, 0 missing
SpDiam_EA.ri. (numeric): 864 unique values, 0 missing
Eig01_EA.ri. (numeric): 857 unique values, 0 missing
SpMax_EA.ri. (numeric): 857 unique values, 0 missing
Eig01_AEA.ri. (numeric): 831 unique values, 0 missing
SpMax_AEA.ri. (numeric): 831 unique values, 0 missing
Eig01_EA (numeric): 779 unique values, 0 missing
SM09_AEA.bo. (numeric): 779 unique values, 0 missing
SpDiam_EA (numeric): 779 unique values, 0 missing
SpMax_EA (numeric): 779 unique values, 0 missing
D.Dtr03 (numeric): 310 unique values, 0 missing
nR03 (numeric): 3 unique values, 0 missing
SRW03 (numeric): 3 unique values, 0 missing
SM11_EA.ed. (numeric): 1912 unique values, 0 missing
Eig01_EA.bo. (numeric): 802 unique values, 0 missing
SM11_AEA.ri. (numeric): 802 unique values, 0 missing
SpMax_EA.bo. (numeric): 802 unique values, 0 missing
SM12_EA.ed. (numeric): 1920 unique values, 0 missing
SM15_EA.ed. (numeric): 1856 unique values, 0 missing
SM14_EA.ed. (numeric): 1865 unique values, 0 missing
SpMin1_Bh.s. (numeric): 532 unique values, 0 missing
SM10_EA.ed. (numeric): 1961 unique values, 0 missing
SM09_EA.ed. (numeric): 1950 unique values, 0 missing
SpDiam_EA.bo. (numeric): 822 unique values, 0 missing
SM13_EA.ed. (numeric): 1897 unique values, 0 missing
SpMin1_Bh.m. (numeric): 263 unique values, 0 missing
Eig01_AEA.bo. (numeric): 782 unique values, 0 missing
SpMax_AEA.bo. (numeric): 782 unique values, 0 missing
SM09_EA.dm. (numeric): 533 unique values, 0 missing
SpMin1_Bh.i. (numeric): 259 unique values, 0 missing
SpMin1_Bh.e. (numeric): 248 unique values, 0 missing
SpDiam_AEA.ed. (numeric): 1282 unique values, 0 missing
SM08_EA.ed. (numeric): 1946 unique values, 0 missing
SpDiam_AEA.bo. (numeric): 850 unique values, 0 missing
SM10_EA.ri. (numeric): 2042 unique values, 0 missing
SM15_EA.ri. (numeric): 2256 unique values, 0 missing
SM08_EA.ri. (numeric): 1882 unique values, 0 missing
SM14_EA.ri. (numeric): 2262 unique values, 0 missing
ATSC5v (numeric): 3504 unique values, 0 missing
SM09_EA.ri. (numeric): 1981 unique values, 0 missing
SpMin1_Bh.v. (numeric): 234 unique values, 0 missing
Eig02_AEA.ed. (numeric): 1014 unique values, 0 missing
SM13_EA.ri. (numeric): 2202 unique values, 0 missing
SM15_EA (numeric): 1948 unique values, 0 missing
SM12_EA.ri. (numeric): 2137 unique values, 0 missing
ATSC4v (numeric): 3422 unique values, 0 missing
SM03_EA.ri. (numeric): 1155 unique values, 0 missing
SM11_EA.ri. (numeric): 2083 unique values, 0 missing
nR08 (numeric): 6 unique values, 0 missing
SM14_EA (numeric): 1945 unique values, 0 missing
nCs (numeric): 36 unique values, 0 missing
ATSC4m (numeric): 3563 unique values, 0 missing
SM10_EA.dm. (numeric): 896 unique values, 0 missing
SpDiam_AEA.ri. (numeric): 975 unique values, 0 missing
ATSC4p (numeric): 3346 unique values, 0 missing
SM15_AEA.ed. (numeric): 1890 unique values, 0 missing
SpMin5_Bh.s. (numeric): 764 unique values, 0 missing
ATS5i (numeric): 1380 unique values, 0 missing
ATSC3v (numeric): 3292 unique values, 0 missing
ATSC6p (numeric): 3400 unique values, 0 missing
SM15_EA.dm. (numeric): 478 unique values, 0 missing
ATSC1v (numeric): 2539 unique values, 0 missing
ATS4i (numeric): 1288 unique values, 0 missing
SM14_EA.dm. (numeric): 806 unique values, 0 missing
ATSC5p (numeric): 3429 unique values, 0 missing
SM06_EA.ri. (numeric): 1619 unique values, 0 missing
SM07_EA.ri. (numeric): 1817 unique values, 0 missing
Si (numeric): 2478 unique values, 0 missing
CIC0 (numeric): 1281 unique values, 0 missing
ATS4p (numeric): 1302 unique values, 0 missing
ATS4e (numeric): 1297 unique values, 0 missing
SM09_AEA.ed. (numeric): 1714 unique values, 0 missing
ATS5e (numeric): 1384 unique values, 0 missing
SpMin6_Bh.s. (numeric): 805 unique values, 0 missing
SM07_EA.ed. (numeric): 1934 unique values, 0 missing
SM13_EA (numeric): 1904 unique values, 0 missing
nCrs (numeric): 22 unique values, 0 missing
SM11_EA (numeric): 1836 unique values, 0 missing
SM13_AEA.ed. (numeric): 1873 unique values, 0 missing
X5v (numeric): 2717 unique values, 0 missing
ATSC3m (numeric): 3409 unique values, 0 missing
ATS2p (numeric): 1070 unique values, 0 missing
SM07_EA.dm. (numeric): 555 unique values, 0 missing
C.002 (numeric): 28 unique values, 0 missing
X1A (numeric): 93 unique values, 0 missing
SM14_AEA.ed. (numeric): 1875 unique values, 0 missing
ATSC3p (numeric): 3185 unique values, 0 missing
SM12_EA (numeric): 1890 unique values, 0 missing
ATSC4i (numeric): 1897 unique values, 0 missing
ATS5v (numeric): 1386 unique values, 0 missing
ATSC1p (numeric): 2408 unique values, 0 missing
SM08_AEA.ed. (numeric): 1681 unique values, 0 missing
SM06_AEA.ed. (numeric): 1447 unique values, 0 missing
SM09_EA (numeric): 1715 unique values, 0 missing
Eta_C_A (numeric): 901 unique values, 0 missing
SM07_AEA.ed. (numeric): 1554 unique values, 0 missing
SM03_EA.ed. (numeric): 946 unique values, 0 missing
MWC10 (numeric): 1479 unique values, 0 missing
Eig02_EA.ed. (numeric): 1585 unique values, 0 missing
SM11_AEA.dm. (numeric): 1585 unique values, 0 missing
SM04_EA.ed. (numeric): 1626 unique values, 0 missing
SM12_AEA.ed. (numeric): 1865 unique values, 0 missing
SM11_AEA.ed. (numeric): 1848 unique values, 0 missing
SM08_EA (numeric): 1624 unique values, 0 missing
ATSC2v (numeric): 2988 unique values, 0 missing
SM10_EA (numeric): 1765 unique values, 0 missing
SM05_EA.ri. (numeric): 1558 unique values, 0 missing
SM07_EA (numeric): 1182 unique values, 0 missing
SM10_AEA.ed. (numeric): 1803 unique values, 0 missing
TWC (numeric): 1432 unique values, 0 missing
ATSC6i (numeric): 2032 unique values, 0 missing
ATSC5i (numeric): 2029 unique values, 0 missing
SM12_EA.dm. (numeric): 858 unique values, 0 missing
SpDiam_EA.dm. (numeric): 327 unique values, 0 missing
SM05_EA.ed. (numeric): 1798 unique values, 0 missing
ATS3p (numeric): 1182 unique values, 0 missing
SM05_EA.dm. (numeric): 498 unique values, 0 missing
Se (numeric): 2380 unique values, 0 missing
SM06_EA.ed. (numeric): 1877 unique values, 0 missing
MWC09 (numeric): 1422 unique values, 0 missing
ATS4v (numeric): 1303 unique values, 0 missing
SpMin1_Bh.p. (numeric): 229 unique values, 0 missing
ATS3v (numeric): 1179 unique values, 0 missing
ATS2v (numeric): 1077 unique values, 0 missing
ATS3i (numeric): 1198 unique values, 0 missing
SM05_AEA.ed. (numeric): 1406 unique values, 0 missing
SM11_EA.dm. (numeric): 510 unique values, 0 missing
SRW10 (numeric): 1297 unique values, 0 missing
SM06_EA (numeric): 1375 unique values, 0 missing
Eig10_EA.ri. (numeric): 1314 unique values, 0 missing
TRS (numeric): 58 unique values, 0 missing
ATS7i (numeric): 1541 unique values, 0 missing
ATS3e (numeric): 1196 unique values, 0 missing
SM06_EA.dm. (numeric): 1073 unique values, 0 missing
MPC07 (numeric): 490 unique values, 0 missing
SpMin4_Bh.s. (numeric): 649 unique values, 0 missing
ATSC6v (numeric): 3531 unique values, 0 missing
P_VSA_MR_2 (numeric): 635 unique values, 0 missing
SpMax1_Bh.v. (numeric): 333 unique values, 0 missing
ATS6i (numeric): 1457 unique values, 0 missing
nCrq (numeric): 5 unique values, 0 missing
SM04_EA.ri. (numeric): 1327 unique values, 0 missing
ATS5p (numeric): 1374 unique values, 0 missing
MPC08 (numeric): 584 unique values, 0 missing
Qindex (numeric): 66 unique values, 0 missing
SpMin3_Bh.s. (numeric): 560 unique values, 0 missing
ATSC2p (numeric): 2828 unique values, 0 missing
ATS6p (numeric): 1443 unique values, 0 missing
MPC06 (numeric): 383 unique values, 0 missing
SpMAD_EA.ri. (numeric): 275 unique values, 0 missing
MPC09 (numeric): 665 unique values, 0 missing
MWC08 (numeric): 1369 unique values, 0 missing
SM05_EA (numeric): 248 unique values, 0 missing
Hy (numeric): 921 unique values, 0 missing
ATS2e (numeric): 1100 unique values, 0 missing
MPC05 (numeric): 301 unique values, 0 missing
SM08_EA.dm. (numeric): 1026 unique values, 0 missing
GGI2 (numeric): 129 unique values, 0 missing
ATSC3i (numeric): 1667 unique values, 0 missing
SM13_EA.dm. (numeric): 489 unique values, 0 missing
SRW09 (numeric): 178 unique values, 0 missing
ATSC5m (numeric): 3601 unique values, 0 missing
ATS7e (numeric): 1557 unique values, 0 missing
MWC07 (numeric): 1322 unique values, 0 missing
X1MulPer (numeric): 2861 unique values, 0 missing
SRW06 (numeric): 817 unique values, 0 missing
ATS6e (numeric): 1453 unique values, 0 missing
SpMax3_Bh.i. (numeric): 483 unique values, 0 missing
SRW05 (numeric): 12 unique values, 0 missing
C.003 (numeric): 8 unique values, 0 missing
nCIC (numeric): 14 unique values, 0 missing
nCsp3 (numeric): 48 unique values, 0 missing
SpMin2_Bh.s. (numeric): 520 unique values, 0 missing
Eig04_AEA.ed. (numeric): 1183 unique values, 0 missing
CATS2D_03_LL (numeric): 50 unique values, 0 missing
SpMin8_Bh.s. (numeric): 869 unique values, 0 missing
GGI4 (numeric): 1436 unique values, 0 missing
P_VSA_e_1 (numeric): 92 unique values, 0 missing
P_VSA_m_1 (numeric): 91 unique values, 0 missing
P_VSA_v_1 (numeric): 91 unique values, 0 missing
nR10 (numeric): 7 unique values, 0 missing
X1Kup (numeric): 2895 unique values, 0 missing
SM02_EA.ed. (numeric): 789 unique values, 0 missing
nCt (numeric): 9 unique values, 0 missing
SM02_EA.ri. (numeric): 1121 unique values, 0 missing
SM02_AEA.ed. (numeric): 382 unique values, 0 missing
SpMax1_Bh.p. (numeric): 336 unique values, 0 missing
H.052 (numeric): 34 unique values, 0 missing
P_VSA_p_1 (numeric): 170 unique values, 0 missing
SM04_AEA.ed. (numeric): 1282 unique values, 0 missing
X1Per (numeric): 2806 unique values, 0 missing
X4v (numeric): 2816 unique values, 0 missing
Eig02_EA.ri. (numeric): 838 unique values, 0 missing
X3v (numeric): 2922 unique values, 0 missing
BIC0 (numeric): 199 unique values, 0 missing
TPC (numeric): 1729 unique values, 0 missing
Eig02_EA (numeric): 816 unique values, 0 missing
SM10_AEA.bo. (numeric): 816 unique values, 0 missing
RFD (numeric): 124 unique values, 0 missing
nBT (numeric): 165 unique values, 0 missing
MPC10 (numeric): 725 unique values, 0 missing
SpMax1_Bh.e. (numeric): 311 unique values, 0 missing
P_VSA_LogP_7 (numeric): 213 unique values, 0 missing

62 properties

Number of instances (rows) of the dataset: 4089
Number of attributes (columns) of the dataset: 215
Number of distinct values of the target attribute (if it is nominal): 0
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 214
Number of nominal attributes: 1
First quartile of mutual information between the nominal attributes and the target attribute: (no value reported)
Percentage of binary attributes: 0
Percentage of instances having missing values: 0
Percentage of missing values: 0
Percentage of numeric attributes: 99.53
Percentage of nominal attributes: 0.47
First quartile of entropy among attributes: (no value reported)
First quartile of kurtosis among attributes of the numeric type: -0.31
First quartile of means among attributes of the numeric type: 4.1
Standard deviation of the number of distinct values among attributes of the nominal type: (no value reported)
First quartile of skewness among attributes of the numeric type: 0.08
First quartile of standard deviation of attributes of the numeric type: 0.35
Second quartile (median) of entropy among attributes: (no value reported)
Second quartile (median) of kurtosis among attributes of the numeric type: 1.11
Second quartile (median) of means among attributes of the numeric type: 5.68
Second quartile (median) of mutual information between the nominal attributes and the target attribute: (no value reported)
Second quartile (median) of skewness among attributes of the numeric type: 0.24
Second quartile (median) of standard deviation of attributes of the numeric type: 0.77
Third quartile of entropy among attributes: (no value reported)
Third quartile of kurtosis among attributes of the numeric type: 3.93
Third quartile of means among attributes of the numeric type: 13.72
Third quartile of mutual information between the nominal attributes and the target attribute: (no value reported)
Third quartile of skewness among attributes of the numeric type: 1.25
Third quartile of standard deviation of attributes of the numeric type: 2.61
Average class difference between consecutive instances: -0.11
Mean of means among attributes of the numeric type: 18.51
Entropy of the target attribute values: (no value reported)
Number of attributes divided by the number of instances: 0.05
Number of attributes needed to optimally describe the class (under the assumption of independence among attributes); equals ClassEntropy divided by MeanMutualInformation: (no value reported)
Percentage of instances belonging to the most frequent class: (no value reported)
Number of instances belonging to the most frequent class: (no value reported)
Maximum entropy among attributes: (no value reported)
Maximum kurtosis among attributes of the numeric type: 213.33
Maximum of means among attributes of the numeric type: 345.67
Maximum mutual information between the nominal attributes and the target attribute: (no value reported)
Maximum number of distinct values among attributes of the nominal type: (no value reported)
Maximum skewness among attributes of the numeric type: 12.18
Maximum standard deviation of attributes of the numeric type: 121.64
Average entropy of the attributes: (no value reported)
Mean kurtosis among attributes of the numeric type: 4.42
Number of binary attributes: 0
Average mutual information between the nominal attributes and the target attribute: (no value reported)
Estimate of the amount of irrelevant information in the attributes regarding the class; equals (MeanAttributeEntropy - MeanMutualInformation) divided by MeanMutualInformation: (no value reported)
Average number of distinct values among the attributes of the nominal type: (no value reported)
Mean skewness among attributes of the numeric type: 0.75
Mean standard deviation of attributes of the numeric type: 5.77
Minimal entropy among attributes: (no value reported)
Minimum kurtosis among attributes of the numeric type: -1.55
Minimum of means among attributes of the numeric type: 0.12
Minimal mutual information between the nominal attributes and the target attribute: (no value reported)
Minimal number of distinct values among attributes of the nominal type: (no value reported)
Minimum skewness among attributes of the numeric type: -2.17
Minimum standard deviation of attributes of the numeric type: 0.01
Percentage of instances belonging to the least frequent class: (no value reported)
Number of instances belonging to the least frequent class: (no value reported)
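Several of the properties above are simple ratios of the listed counts. The following sketch recomputes three of them from the reported counts (4089 instances, 215 attributes, 214 numeric, 1 nominal) as a consistency check. The variable names are illustrative, and rounding to two decimals is an assumption based on how the values are displayed.

```python
# Counts as reported in the properties list.
n_instances = 4089
n_attributes = 215
n_numeric = 214
n_nominal = 1

# Percentage of numeric and nominal attributes.
pct_numeric = round(100 * n_numeric / n_attributes, 2)
pct_nominal = round(100 * n_nominal / n_attributes, 2)

# Number of attributes divided by the number of instances.
dimensionality = round(n_attributes / n_instances, 2)

print(pct_numeric, pct_nominal, dimensionality)
```

Under that rounding assumption the results agree with the listed 99.53, 0.47 and 0.05.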

12 tasks

2 runs - estimation_procedure: Custom 10-fold Crossvalidation - target_feature: pXC50
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
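The first task above evaluates models with 10-fold cross-validation on the target pXC50. The page does not show how its "custom" folds are defined, so the following is only a generic sketch of splitting 4089 row indices into 10 shuffled folds; the function name `k_fold_indices` and the seed are hypothetical and do not reproduce the task's actual splits.

```python
import random


def k_fold_indices(n_rows, k=10, seed=0):
    """Partition row indices 0..n_rows-1 into k shuffled, near-equal folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Taking every k-th index gives fold sizes that differ by at most one.
    return [indices[i::k] for i in range(k)]


folds = k_fold_indices(4089, k=10)
fold_sizes = [len(f) for f in folds]

# Each fold serves once as the test set; the other nine form the training set.
for test_fold_id, test_idx in enumerate(folds):
    train_idx = [
        i for fold_id, fold in enumerate(folds) if fold_id != test_fold_id
        for i in fold
    ]
    assert len(train_idx) + len(test_idx) == 4089

print(fold_sizes)
```

Because 4089 is not divisible by 10, the folds cannot all be the same size; this scheme keeps them within one row of each other.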