QSAR-DATASET-FOR-DRUG-TARGET-CHEMBL248

deactivated | ARFF | Publicly available | Visibility: public | Uploaded 15-07-2016 by unknown


This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit: pseudo-pCI50) of several compounds on the drug target with ChEMBL ID CHEMBL248 (TID: 235). It has 1560 rows and 212 features, not counting the molecule identifier (molecule_id) and the target feature (pXC50). The features are molecular descriptors generated from SMILES strings. Missing values were imputed with the median, and feature selection was also applied.
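The description does not say which tool performed the median imputation. As a minimal sketch of the general technique only, assuming missing entries are represented as `None` and using a made-up toy table (the function name `impute_median` and the values are illustrative, not taken from the dataset):

```python
from statistics import median


def impute_median(rows):
    """Replace None entries in each column with that column's median.

    The median is computed over the observed (non-None) values only.
    """
    n_cols = len(rows[0])
    medians = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        medians.append(median(observed))
    return [
        [medians[j] if value is None else value for j, value in enumerate(row)]
        for row in rows
    ]


# Toy descriptor table with made-up values (not from the dataset).
toy = [
    [1.0, 10.0],
    [None, 20.0],
    [3.0, None],
    [5.0, 40.0],
]
imputed = impute_median(toy)
print(imputed)
```

A column with no observed values at all would make `median` raise `StatisticsError`; this sketch does not handle that case.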

214 features

pXC50 (target, numeric): 706 unique values, 0 missing
molecule_id (row identifier, nominal): 1560 unique values, 0 missing
SpMax7_Bh.m. (numeric): 769 unique values, 0 missing
ZM2Per (numeric): 1435 unique values, 0 missing
Eig01_EA.dm. (numeric): 189 unique values, 0 missing
SpMax_EA.dm. (numeric): 189 unique values, 0 missing
Eig07_AEA.bo. (numeric): 838 unique values, 0 missing
SM14_EA.dm. (numeric): 397 unique values, 0 missing
SM12_EA.ed. (numeric): 937 unique values, 0 missing
Eig09_EA.bo. (numeric): 855 unique values, 0 missing
ZM2Kup (numeric): 1401 unique values, 0 missing
ZM2MulPer (numeric): 1433 unique values, 0 missing
Eig01_AEA.ed. (numeric): 401 unique values, 0 missing
SpMax_AEA.ed. (numeric): 401 unique values, 0 missing
SpMax8_Bh.v. (numeric): 747 unique values, 0 missing
SpMax4_Bh.m. (numeric): 675 unique values, 0 missing
Eig01_EA.ed. (numeric): 555 unique values, 0 missing
SM10_AEA.dm. (numeric): 555 unique values, 0 missing
SpMax_EA.ed. (numeric): 555 unique values, 0 missing
nSO2N (numeric): 3 unique values, 0 missing
C.003 (numeric): 7 unique values, 0 missing
MATS3m (numeric): 454 unique values, 0 missing
C.007 (numeric): 3 unique values, 0 missing
SpMax7_Bh.v. (numeric): 725 unique values, 0 missing
Eig14_EA.bo. (numeric): 845 unique values, 0 missing
Eig01_AEA.ri. (numeric): 456 unique values, 0 missing
SpMax_AEA.ri. (numeric): 456 unique values, 0 missing
Eig08_EA.bo. (numeric): 840 unique values, 0 missing
SM12_EA.dm. (numeric): 428 unique values, 0 missing
SpMax3_Bh.m. (numeric): 571 unique values, 0 missing
GATS3m (numeric): 517 unique values, 0 missing
Eig09_AEA.bo. (numeric): 827 unique values, 0 missing
SM06_AEA.bo. (numeric): 842 unique values, 0 missing
SM13_EA.ed. (numeric): 924 unique values, 0 missing
SpMax4_Bh.v. (numeric): 665 unique values, 0 missing
SM10_EA.dm. (numeric): 465 unique values, 0 missing
C.009 (numeric): 3 unique values, 0 missing
Eig01_EA (numeric): 395 unique values, 0 missing
SM09_AEA.bo. (numeric): 395 unique values, 0 missing
SpMax_EA (numeric): 395 unique values, 0 missing
Eta_beta (numeric): 229 unique values, 0 missing
SM15_EA.ed. (numeric): 910 unique values, 0 missing
SpMax4_Bh.p. (numeric): 645 unique values, 0 missing
IC2 (numeric): 933 unique values, 0 missing
Eig06_EA.bo. (numeric): 818 unique values, 0 missing
Eig06_EA.ri. (numeric): 931 unique values, 0 missing
Eig02_AEA.bo. (numeric): 535 unique values, 0 missing
NsssN (numeric): 5 unique values, 0 missing
SM05_EA.dm. (numeric): 351 unique values, 0 missing
SpMaxA_EA.ri. (numeric): 188 unique values, 0 missing
GGI2 (numeric): 90 unique values, 0 missing
C.013 (numeric): 4 unique values, 0 missing
F.083 (numeric): 4 unique values, 0 missing
nCRX3 (numeric): 4 unique values, 0 missing
SM07_AEA.bo. (numeric): 871 unique values, 0 missing
nR04 (numeric): 2 unique values, 0 missing
SpMax4_Bh.e. (numeric): 635 unique values, 0 missing
P_VSA_e_6 (numeric): 11 unique values, 0 missing
SpDiam_EA (numeric): 398 unique values, 0 missing
SpDiam_EA.dm. (numeric): 241 unique values, 0 missing
SpMax4_Bh.i. (numeric): 625 unique values, 0 missing
SM03_EA.dm. (numeric): 225 unique values, 0 missing
Eta_FL (numeric): 1250 unique values, 0 missing
SpMax6_Bh.v. (numeric): 690 unique values, 0 missing
nBeta.Lactams (numeric): 2 unique values, 0 missing
O.058 (numeric): 15 unique values, 0 missing
ATS4s (numeric): 1021 unique values, 0 missing
CATS2D_04_AL (numeric): 37 unique values, 0 missing
MAXDN (numeric): 987 unique values, 0 missing
Eig06_AEA.ed. (numeric): 853 unique values, 0 missing
SpMin5_Bh.m. (numeric): 614 unique values, 0 missing
Eig04_EA.bo. (numeric): 742 unique values, 0 missing
SM14_AEA.ri. (numeric): 742 unique values, 0 missing
SpMin1_Bh.s. (numeric): 334 unique values, 0 missing
NdO (numeric): 15 unique values, 0 missing
SM11_EA.ed. (numeric): 946 unique values, 0 missing
Eig01_EA.ri. (numeric): 438 unique values, 0 missing
SpMax_EA.ri. (numeric): 438 unique values, 0 missing
P_VSA_p_1 (numeric): 211 unique values, 0 missing
Eig01_AEA.dm. (numeric): 415 unique values, 0 missing
SpMax_AEA.dm. (numeric): 415 unique values, 0 missing
SpMax5_Bh.m. (numeric): 720 unique values, 0 missing
SpDiam_AEA.dm. (numeric): 484 unique values, 0 missing
Pol (numeric): 127 unique values, 0 missing
Eig05_AEA.ri. (numeric): 846 unique values, 0 missing
nX (numeric): 10 unique values, 0 missing
C.038 (numeric): 3 unique values, 0 missing
ATS8m (numeric): 1153 unique values, 0 missing
nRCO (numeric): 3 unique values, 0 missing
ECC (numeric): 644 unique values, 0 missing
SM08_EA.dm. (numeric): 524 unique values, 0 missing
CATS2D_06_AL (numeric): 39 unique values, 0 missing
CSI (numeric): 852 unique values, 0 missing
GGI8 (numeric): 640 unique values, 0 missing
SM06_AEA.ed. (numeric): 857 unique values, 0 missing
GATS2m (numeric): 465 unique values, 0 missing
SM06_EA.bo. (numeric): 900 unique values, 0 missing
ATSC4e (numeric): 1071 unique values, 0 missing
X3 (numeric): 1163 unique values, 0 missing
ATSC6s (numeric): 1483 unique values, 0 missing
GATS2s (numeric): 298 unique values, 0 missing
Eta_F_A (numeric): 662 unique values, 0 missing
SpMax6_Bh.p. (numeric): 705 unique values, 0 missing
SM03_EA.bo. (numeric): 176 unique values, 0 missing
ATS6s (numeric): 1083 unique values, 0 missing
SpDiam_EA.ed. (numeric): 674 unique values, 0 missing
Eig02_EA.ri. (numeric): 593 unique values, 0 missing
Eig07_EA (numeric): 825 unique values, 0 missing
SM15_AEA.bo. (numeric): 825 unique values, 0 missing
ATSC6p (numeric): 1440 unique values, 0 missing
ZM1Per (numeric): 1427 unique values, 0 missing
Wap (numeric): 1154 unique values, 0 missing
SpMax5_Bh.v. (numeric): 709 unique values, 0 missing
SRW09 (numeric): 100 unique values, 0 missing
SM08_AEA.bo. (numeric): 883 unique values, 0 missing
Eig01_EA.bo. (numeric): 496 unique values, 0 missing
SM11_AEA.ri. (numeric): 496 unique values, 0 missing
SpMax_EA.bo. (numeric): 496 unique values, 0 missing
SpMin4_Bh.m. (numeric): 600 unique values, 0 missing
NssNH (numeric): 14 unique values, 0 missing
ATS3s (numeric): 1047 unique values, 0 missing
ATSC1i (numeric): 831 unique values, 0 missing
ATSC4s (numeric): 1484 unique values, 0 missing
SM04_EA.bo. (numeric): 805 unique values, 0 missing
X3sol (numeric): 1183 unique values, 0 missing
CATS2D_04_AA (numeric): 12 unique values, 0 missing
MATS1v (numeric): 173 unique values, 0 missing
ZM1MulPer (numeric): 1436 unique values, 0 missing
GATS8m (numeric): 713 unique values, 0 missing
Eig02_AEA.ri. (numeric): 619 unique values, 0 missing
Eig10_AEA.ed. (numeric): 828 unique values, 0 missing
nPyrrolidines (numeric): 4 unique values, 0 missing
ATSC7s (numeric): 1482 unique values, 0 missing
ZM1V (numeric): 391 unique values, 0 missing
IAC (numeric): 1156 unique values, 0 missing
TIC0 (numeric): 1156 unique values, 0 missing
Mi (numeric): 90 unique values, 0 missing
SpMax8_Bh.m. (numeric): 763 unique values, 0 missing
SM08_AEA.ed. (numeric): 920 unique values, 0 missing
MPC03 (numeric): 129 unique values, 0 missing
SM05_AEA.ed. (numeric): 869 unique values, 0 missing
Eig05_EA.ri. (numeric): 867 unique values, 0 missing
Eig14_AEA.ed. (numeric): 794 unique values, 0 missing
H.046 (numeric): 45 unique values, 0 missing
Eig07_EA.bo. (numeric): 886 unique values, 0 missing
MSD (numeric): 1144 unique values, 0 missing
RDSQ (numeric): 1232 unique values, 0 missing
SM06_EA (numeric): 854 unique values, 0 missing
ATSC8s (numeric): 1472 unique values, 0 missing
MATS1i (numeric): 410 unique values, 0 missing
Eig08_EA.ri. (numeric): 932 unique values, 0 missing
SM05_EA (numeric): 207 unique values, 0 missing
ON0 (numeric): 326 unique values, 0 missing
SpMax1_Bh.s. (numeric): 94 unique values, 0 missing
SM03_EA.ed. (numeric): 616 unique values, 0 missing
MWC05 (numeric): 755 unique values, 0 missing
SM07_EA (numeric): 790 unique values, 0 missing
piPC03 (numeric): 597 unique values, 0 missing
X4 (numeric): 1162 unique values, 0 missing
Eig01_AEA.bo. (numeric): 437 unique values, 0 missing
SpMax_AEA.bo. (numeric): 437 unique values, 0 missing
Eig05_EA.ed. (numeric): 1004 unique values, 0 missing
SM14_AEA.dm. (numeric): 1004 unique values, 0 missing
SpDiam_EA.ri. (numeric): 458 unique values, 0 missing
Eig08_AEA.bo. (numeric): 843 unique values, 0 missing
SM10_EA.ed. (numeric): 963 unique values, 0 missing
TPC (numeric): 848 unique values, 0 missing
MWC04 (numeric): 549 unique values, 0 missing
Eig02_EA.ed. (numeric): 777 unique values, 0 missing
SM11_AEA.dm. (numeric): 777 unique values, 0 missing
Eta_betaP (numeric): 69 unique values, 0 missing
nRCOOR (numeric): 6 unique values, 0 missing
Eig08_EA (numeric): 826 unique values, 0 missing
SM02_AEA.dm. (numeric): 826 unique values, 0 missing
Eig10_AEA.bo. (numeric): 821 unique values, 0 missing
MWC03 (numeric): 306 unique values, 0 missing
ZM2 (numeric): 306 unique values, 0 missing
GATS2p (numeric): 523 unique values, 0 missing
Eig15_AEA.ed. (numeric): 775 unique values, 0 missing
SpDiam_AEA.bo. (numeric): 533 unique values, 0 missing
SpMax5_Bh.i. (numeric): 663 unique values, 0 missing
ATSC5p (numeric): 1432 unique values, 0 missing
ATSC6m (numeric): 1472 unique values, 0 missing
Psi_e_0 (numeric): 1366 unique values, 0 missing
TIC2 (numeric): 1334 unique values, 0 missing
Eig13_EA.ri. (numeric): 934 unique values, 0 missing
Eig14_EA.ri. (numeric): 919 unique values, 0 missing
Psi_i_s (numeric): 954 unique values, 0 missing
SpMax2_Bh.v. (numeric): 450 unique values, 0 missing
Eig13_AEA.ri. (numeric): 937 unique values, 0 missing
ZM2Mad (numeric): 1416 unique values, 0 missing
AECC (numeric): 940 unique values, 0 missing
Eig14_AEA.ri. (numeric): 922 unique values, 0 missing
IDE (numeric): 903 unique values, 0 missing
MATS8m (numeric): 619 unique values, 0 missing
Eig13_EA (numeric): 783 unique values, 0 missing
SM07_AEA.dm. (numeric): 783 unique values, 0 missing
SdO (numeric): 1426 unique values, 0 missing
SpAD_AEA.bo. (numeric): 1276 unique values, 0 missing
SM09_AEA.ed. (numeric): 927 unique values, 0 missing
SpAD_AEA.ri. (numeric): 1441 unique values, 0 missing
SPI (numeric): 1215 unique values, 0 missing
nR05 (numeric): 5 unique values, 0 missing
ATS4p (numeric): 1023 unique values, 0 missing
SpAD_EA.ri. (numeric): 1440 unique values, 0 missing
Eig14_EA (numeric): 798 unique values, 0 missing
SM08_AEA.dm. (numeric): 798 unique values, 0 missing
P_VSA_i_4 (numeric): 237 unique values, 0 missing
Eta_F (numeric): 1463 unique values, 0 missing
SpAD_AEA.ed. (numeric): 1223 unique values, 0 missing
Chi0_EA.ri. (numeric): 1375 unique values, 0 missing
ATS1s (numeric): 920 unique values, 0 missing
HVcpx (numeric): 860 unique values, 0 missing
P_VSA_m_2 (numeric): 1277 unique values, 0 missing

62 properties

Number of instances (rows) of the dataset: 1560
Number of attributes (columns) of the dataset: 214
Number of distinct values of the target attribute (if it is nominal): 0
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 213
Number of nominal attributes: 1
First quartile of mutual information between the nominal attributes and the target attribute: no value reported
Percentage of binary attributes: 0
Percentage of instances having missing values: 0
Percentage of missing values: 0
Percentage of numeric attributes: 99.53
Percentage of nominal attributes: 0.47
First quartile of entropy among attributes: no value reported
First quartile of kurtosis among attributes of the numeric type: -0.22
First quartile of means among attributes of the numeric type: 1.99
Standard deviation of the number of distinct values among attributes of the nominal type: no value reported
First quartile of skewness among attributes of the numeric type: -0.34
First quartile of standard deviation of attributes of the numeric type: 0.4
Second quartile (median) of entropy among attributes: no value reported
Second quartile (median) of kurtosis among attributes of the numeric type: 0.81
Second quartile (median) of means among attributes of the numeric type: 4.35
Second quartile (median) of mutual information between the nominal attributes and the target attribute: no value reported
Second quartile (median) of skewness among attributes of the numeric type: 0.34
Second quartile (median) of standard deviation of attributes of the numeric type: 0.74
Third quartile of entropy among attributes: no value reported
Third quartile of kurtosis among attributes of the numeric type: 15.52
Third quartile of means among attributes of the numeric type: 12.35
Third quartile of mutual information between the nominal attributes and the target attribute: no value reported
Third quartile of skewness among attributes of the numeric type: 2.63
Third quartile of standard deviation of attributes of the numeric type: 2.89
Average class difference between consecutive instances: -0.5
Mean of means among attributes of the numeric type: 1238.66
Entropy of the target attribute values: no value reported
Number of attributes divided by the number of instances: 0.14
Number of attributes needed to optimally describe the class (under the assumption of independence among attributes), equal to ClassEntropy divided by MeanMutualInformation: no value reported
Percentage of instances belonging to the most frequent class: no value reported
Number of instances belonging to the most frequent class: no value reported
Maximum entropy among attributes: no value reported
Maximum kurtosis among attributes of the numeric type: 347.59
Maximum of means among attributes of the numeric type: 254696.33
Maximum mutual information between the nominal attributes and the target attribute: no value reported
Maximum number of distinct values among attributes of the nominal type: no value reported
Maximum skewness among attributes of the numeric type: 17.94
Maximum standard deviation of attributes of the numeric type: 3624912.39
Average entropy of the attributes: no value reported
Mean kurtosis among attributes of the numeric type: 16.27
Number of binary attributes: 0
Average mutual information between the nominal attributes and the target attribute: no value reported
Estimate of the amount of irrelevant information in the attributes regarding the class, equal to (MeanAttributeEntropy - MeanMutualInformation) divided by MeanMutualInformation: no value reported
Average number of distinct values among the attributes of the nominal type: no value reported
Mean skewness among attributes of the numeric type: 1.49
Mean standard deviation of attributes of the numeric type: 17046.71
Minimal entropy among attributes: no value reported
Minimum kurtosis among attributes of the numeric type: -1.92
Minimum of means among attributes of the numeric type: -0.13
Minimal mutual information between the nominal attributes and the target attribute: no value reported
Minimal number of distinct values among attributes of the nominal type: no value reported
Minimum skewness among attributes of the numeric type: -1.36
Minimum standard deviation of attributes of the numeric type: 0.02
Percentage of instances belonging to the least frequent class: no value reported
Number of instances belonging to the least frequent class: no value reported
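Several of the simple properties can be rechecked from the reported counts alone. The short sketch below assumes the percentages and the attribute-to-instance ratio are plain ratios rounded to two decimals; the variable names are illustrative and not OpenML identifiers.

```python
# Counts reported in the property list.
n_instances = 1560
n_attributes = 214
n_numeric = 213
n_nominal = 1

# Assumed definitions: simple ratios rounded to two decimal places.
pct_numeric = round(100 * n_numeric / n_attributes, 2)
pct_nominal = round(100 * n_nominal / n_attributes, 2)
dimensionality = round(n_attributes / n_instances, 2)

print(pct_numeric, pct_nominal, dimensionality)
```

Under these assumptions the results agree with the listed values of 99.53, 0.47, and 0.14.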

12 tasks

2 runs - estimation_procedure: Custom 10-fold Crossvalidation - target_feature: pXC50
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
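The first task uses a custom 10-fold cross-validation whose fold assignment is not given on this page. As a rough illustration of what a 10-fold split of 1560 rows looks like, here is a generic shuffled k-fold sketch. It is not the task's custom split; `k_fold_indices` and the seed are hypothetical choices made for the example.

```python
import random


def k_fold_indices(n_rows, k=10, seed=0):
    """Return k (train, test) index pairs from a shuffled k-fold partition."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin into k folds.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


# 1560 rows, as reported for this dataset.
splits = k_fold_indices(1560, k=10)
train, test = splits[0]
print(len(splits), len(train), len(test))
```

With 1560 rows, each of the 10 folds holds 156 test rows and leaves 1404 rows for training, and every row appears in exactly one test fold.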