Syllabus: Types of data mining problems. The process of data mining. Statistical evaluation of big data: statistical prediction, performance measures, pitfalls in data-mining evaluation. Data preparation: data models, data transformations, handling of missing data, time-dependent data, textual data. Data reduction: feature selection, principal components, smoothing data, case subsampling. Predictive modeling: mathematical models, linear models, neural nets, advanced statistical models, distance solutions, logic solutions, decision trees, decision rules, model combination. Solution analyses: graphical trend analyses, comparison of methods. Case studies. Future trends: text mining, visualization, distributed data. Practical sessions using open-source software.