Prerequisites : CS204

Syllabus :
Types of data mining problems. The process of data mining. Statistical evaluation of big data: statistical prediction, performance measures, pitfalls in data-mining evaluation. Data preparation: data models, data transformations, handling of missing data, time-dependent data, textual data. Data reduction: feature selection, principal components, smoothing data, case subsampling. Predictive modeling: mathematical models, linear models, neural nets, advanced statistical models, distance solutions, logic solutions, decision trees, decision rules, model combination. Solution analyses: graphical trend analyses, comparison of methods. Case studies. Future trends: text mining, visualization, distributed data. Practical sessions using open-source software.

Texts :
1. S. Weiss and N. Indurkhya, Predictive Data Mining: A Practical Guide, Morgan Kaufmann, 1998.

References :
1. S. Weiss, N. Indurkhya, T. Zhang and F. Damerau, Text Mining: Predictive Methods for Analyzing Unstructured Information, Springer, 2004.