rainfall_bangladesh

Status: in_preparation | Format: ARFF | Visibility: public (publicly available) | Uploaded: 09-11-2018
People have tried to predict the weather since prehistory, and for good reason: it tells them when to plant crops, when to build, and when to prepare for drought and flood. In a nation such as Bangladesh, being able to predict the weather, especially rainfall, has never been more important. This research aims to produce rainfall prediction models using machine learning algorithms. The base data was collected from the Bangladesh Meteorological Department. The work focuses mainly on developing models for long-term rainfall prediction for Bangladesh's divisions and districts (weather stations).

Rainfall prediction is very important for the Bangladeshi economy and for day-to-day life. Both scarce and heavy rainfall affect rural and urban life to a great extent as the climate's patterns change. Unusual rainfall and a prolonged rainy season are major factors to take into account. We want to see whether unusual behavior is occurring often enough to constitute a new pattern that calls for a new climatic description. Agriculture depends on rain, and heavy rainfall frequently causes floods that lead to great crop losses. Rainfall is a very complex phenomenon that depends on various atmospheric, oceanic and geographical parameters, and the relationship between these parameters and rainfall is unstable. In addition, changing climatic conditions are making existing meteorological forecasts less useful to their users.

Initially, linear regression models were developed for monthly rainfall prediction at station and national level, by day, month and year. Here humidity, temperature and wind parameters were used as predictors. The study was then extended with another popular regression algorithm, Random Forest Regression. After that, several classification algorithms were used for model building, training and prediction: Naive Bayes Classification, Decision Tree Classification (Entropy and Gini) and Random Forest Classification. In all model building and training, the predictor parameters were Station, Year, Month and Day. Because the effect of the parameters that influence rainfall is embedded in the rainfall values themselves, rainfall was the label (dependent variable) in these models.

The trained models can predict rainfall in advance for a given month of a given year in a given area. The areas used here are the weather stations, whose parameter values are measured by the Bangladesh Meteorological Department. The accuracy of rainfall estimation is above 65% and varies from algorithm to algorithm. Two regression models and three classification models were developed for rainfall prediction at 33 Bangladeshi weather stations. Apache Spark's machine learning library was used from the Scala programming language. The main reason for using both classification and regression analysis is to compare the prediction output, predictive power and usability of the two types of algorithm.

This thesis is a contribution to the effort of rainfall prediction within Bangladesh. It takes the strategy of applying machine learning models to historical weather data gathered in Bangladesh. As part of this work, a web-based software application was written using Apache Spark, Scala and HighCharts to demonstrate rainfall prediction using multiple machine learning models. The models were successively improved in rainfall prediction accuracy.
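The description contains no code. As a rough illustration of the prediction setup it describes (Station, Year and Month as predictors, Rainfall as the label), the Python sketch below fits a per-station, per-month mean baseline. This is not one of the thesis models, which were built with Apache Spark in Scala; it is only a simple reference point that such models could be compared against. The station names, the month spelling and all rainfall values are invented for illustration, and `fit_baseline` and `predict` are hypothetical helper names.

```python
from collections import defaultdict
from statistics import mean

# Toy rows in the dataset's column layout: (station, year, month, rainfall).
# ASSUMPTION: names and values are invented; they are NOT dataset records.
ROWS = [
    ("StationA", 2000, "June", 300.0),
    ("StationA", 2001, "June", 340.0),
    ("StationA", 2000, "January", 10.0),
    ("StationB", 2000, "June", 800.0),
    ("StationB", 2001, "June", 760.0),
]


def fit_baseline(rows):
    """Return (mean rainfall per (station, month), overall mean rainfall)."""
    groups = defaultdict(list)
    for station, _year, month, rainfall in rows:
        groups[(station, month)].append(rainfall)
    group_means = {key: mean(values) for key, values in groups.items()}
    overall = mean(rainfall for *_rest, rainfall in rows)
    return group_means, overall


def predict(model, station, month):
    """Predict the group mean, falling back to the overall mean if unseen."""
    group_means, overall = model
    return group_means.get((station, month), overall)


model = fit_baseline(ROWS)
print(predict(model, "StationA", "June"))
print(predict(model, "StationC", "March"))  # unseen pair: overall mean
```

The year is deliberately ignored here; a model that beats this baseline would have to extract signal from the year or from interactions that a plain group mean cannot capture.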

4 features

Rainfall (target): numeric, 1128 unique values, 0 missing
Year: numeric, 47 unique values, 0 missing
Station: nominal, 33 unique values, 0 missing
Month: string, 12 unique values, 0 missing

62 properties

Value      Property
16755      Number of instances (rows) of the dataset.
4          Number of attributes (columns) of the dataset.
0          Number of distinct values of the target attribute (if it is nominal).
0          Number of missing values in the dataset.
0          Number of instances with at least one value missing.
2          Number of numeric attributes.
1          Number of nominal attributes.
(none)     First quartile of mutual information between the nominal attributes and the target attribute.
0          Percentage of binary attributes.
0          Percentage of instances having missing values.
0          Percentage of missing values.
50         Percentage of numeric attributes.
25         Percentage of nominal attributes.
(none)     First quartile of entropy among attributes.
-1.13      First quartile of kurtosis among attributes of the numeric type.
202.1      First quartile of means among attributes of the numeric type.
0          Standard deviation of the number of distinct values among attributes of the nominal type.
-0.11      First quartile of skewness among attributes of the numeric type.
13.16      First quartile of standard deviation of attributes of the numeric type.
(none)     Second quartile (Median) of entropy among attributes.
1.97       Second quartile (Median) of kurtosis among attributes of the numeric type.
1098.29    Second quartile (Median) of means among attributes of the numeric type.
(none)     Second quartile (Median) of mutual information between the nominal attributes and the target attribute.
0.88       Second quartile (Median) of skewness among attributes of the numeric type.
131.53     Second quartile (Median) of standard deviation of attributes of the numeric type.
(none)     Third quartile of entropy among attributes.
5.06       Third quartile of kurtosis among attributes of the numeric type.
1994.48    Third quartile of means among attributes of the numeric type.
(none)     Third quartile of mutual information between the nominal attributes and the target attribute.
1.87       Third quartile of skewness among attributes of the numeric type.
249.9      Third quartile of standard deviation of attributes of the numeric type.
-136.51    Average class difference between consecutive instances.
1098.29    Mean of means among attributes of the numeric type.
(none)     Entropy of the target attribute values.
0          Number of attributes divided by the number of instances.
(none)     Number of attributes needed to optimally describe the class (under the assumption of independence among attributes). Equals ClassEntropy divided by MeanMutualInformation.
(none)     Percentage of instances belonging to the most frequent class.
(none)     Number of instances belonging to the most frequent class.
(none)     Maximum entropy among attributes.
5.06       Maximum kurtosis among attributes of the numeric type.
1994.48    Maximum of means among attributes of the numeric type.
(none)     Maximum mutual information between the nominal attributes and the target attribute.
33         The maximum number of distinct values among attributes of the nominal type.
1.87       Maximum skewness among attributes of the numeric type.
249.9      Maximum standard deviation of attributes of the numeric type.
(none)     Average entropy of the attributes.
1.97       Mean kurtosis among attributes of the numeric type.
0          Number of binary attributes.
(none)     Average mutual information between the nominal attributes and the target attribute.
(none)     An estimate of the amount of irrelevant information in the attributes regarding the class. Equals (MeanAttributeEntropy - MeanMutualInformation) divided by MeanMutualInformation.
33         Average number of distinct values among the attributes of the nominal type.
0.88       Mean skewness among attributes of the numeric type.
131.53     Mean standard deviation of attributes of the numeric type.
(none)     Minimal entropy among attributes.
-1.13      Minimum kurtosis among attributes of the numeric type.
202.1      Minimum of means among attributes of the numeric type.
(none)     Minimal mutual information between the nominal attributes and the target attribute.
33         The minimal number of distinct values among attributes of the nominal type.
-0.11      Minimum skewness among attributes of the numeric type.
13.16      Minimum standard deviation of attributes of the numeric type.
(none)     Percentage of instances belonging to the least frequent class.
(none)     Number of instances belonging to the least frequent class.
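The numeric properties above (mean, standard deviation, skewness, kurtosis) are moment-based summaries of each numeric attribute. The following Python sketch shows one common way to compute them, using population (divide-by-n) moments and excess kurtosis. The page does not state which estimator OpenML uses, so treat the choice as an assumption; the negative kurtosis values listed do at least imply an excess-kurtosis convention, since raw kurtosis cannot be below 1. The function name `moment_summary` and the input values are hypothetical.

```python
import math


def moment_summary(values):
    """Mean, std, skewness and excess kurtosis from population moments.

    ASSUMPTION: divide-by-n (biased) estimators; OpenML may use others.
    """
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n  # variance
    m3 = sum((v - mu) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - mu) ** 4 for v in values) / n  # fourth central moment
    return {
        "mean": mu,
        "std": math.sqrt(m2),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3.0,  # excess kurtosis (normal = 0)
    }


# Illustrative input only: evenly spaced values, like a run of years.
summary = moment_summary([1, 2, 3, 4, 5])
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Evenly spaced values give zero skewness and a negative excess kurtosis, which is the same qualitative signature as the minimum skewness and kurtosis reported in the table.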

10 tasks

0 runs - estimation_procedure: 10-fold Crossvalidation - evaluation_measure: kendallTau, r-squared - target_feature: Rainfall
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
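The first task above names 10-fold cross-validation with Kendall's tau and r-squared as evaluation measures. The Python sketch below shows what those three pieces compute, using only the standard library. It is an illustration under assumptions, not OpenML's implementation: the shuffling scheme, the tau-a variant (no tie correction) and the function names `k_fold_indices`, `r_squared` and `kendall_tau_a` are all choices made here.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for k roughly equal shuffled folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mu = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mu) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (x[i] - x[j]) * (y[i] - y[j])
            score += (product > 0) - (product < 0)  # tied pairs add 0
    return score / (n * (n - 1) / 2)


splits = list(k_fold_indices(25, k=10))
print(len(splits), [len(test) for _, test in splits])
print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
print(kendall_tau_a([1, 2, 3], [3, 2, 1]))
```

The pairwise tau loop is quadratic in the number of rows, which is fine for a demonstration but slow on all 16755 instances; a library implementation would be the practical choice there.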