1. What’s an attribute? What’s a data instance?
2. What’s noise? How can noise be reduced in a dataset?
3. Define outlier. Describe 2 different approaches to detect outliers in a dataset.
4. Describe 3 different techniques to deal with missing values in a dataset. Explain when each of these techniques would be most appropriate.
5. Given a sample dataset with missing values, apply an appropriate technique to deal with them.
6. Give 2 examples in which aggregation is useful.
7. Given a sample dataset, apply aggregation of data values.
8. What’s sampling?
9. What’s simple random sampling? Is it possible to sample data instances using a distribution different from the uniform distribution? If so, give an example of a probability distribution of the data instances that is different from uniform (i.e., equal probability).
10. What’s stratified sampling?
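
As a study aid for the hands-on questions above (missing values, aggregation, and sampling), here is a minimal Python sketch. It is not a model answer. The toy dataset `rows` is made up, and the helper names (`impute_mean`, `aggregate_mean`, `simple_random_sample`, `weighted_sample`, `stratified_sample`) are hypothetical, chosen only for illustration. Mean imputation is just one of several ways to handle missing values, and whether it is appropriate depends on the data.

```python
import random
from collections import defaultdict
from statistics import mean

# Hypothetical toy dataset: each dict is a data instance, each key an attribute.
# None marks a missing value.
rows = [
    {"city": "A", "age": 30, "income": 40.0},
    {"city": "A", "age": None, "income": 50.0},
    {"city": "B", "age": 50, "income": 60.0},
    {"city": "B", "age": 40, "income": None},
    {"city": "B", "age": 60, "income": 90.0},
    {"city": "A", "age": 20, "income": 30.0},
]


def impute_mean(rows, attr):
    """Return copies of rows with missing values of attr replaced by the observed mean."""
    observed = [r[attr] for r in rows if r[attr] is not None]
    fill = mean(observed)
    return [{**r, attr: fill if r[attr] is None else r[attr]} for r in rows]


def aggregate_mean(rows, key, attr):
    """Aggregate attr into one mean value per distinct value of key."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[key]].append(r[attr])
    return {k: mean(v) for k, v in groups.items()}


def simple_random_sample(rows, n, rng):
    """Pick n instances without replacement; every instance is equally likely."""
    return rng.sample(rows, n)


def weighted_sample(rows, n, weights, rng):
    """Pick n instances with replacement, with probability proportional to weights."""
    return rng.choices(rows, weights=weights, k=n)


def stratified_sample(rows, key, fraction, rng):
    """Sample roughly the same fraction from each group (stratum) defined by key."""
    strata = defaultdict(list)
    for r in rows:
        strata[r[key]].append(r)
    sample = []
    for members in strata.values():
        k = max(1, round(fraction * len(members)))  # at least one per stratum
        sample.extend(rng.sample(members, k))
    return sample


rng = random.Random(0)  # fixed seed so repeated runs give the same samples
filled = impute_mean(impute_mean(rows, "age"), "income")
print(aggregate_mean(filled, "city", "income"))
print(simple_random_sample(filled, 3, rng))
# Non-uniform example: higher-income instances are more likely to be drawn.
print(weighted_sample(filled, 3, [r["income"] for r in filled], rng))
print(stratified_sample(filled, "city", 0.5, rng))
```

In this sketch the weighted sampler uses income as the sampling weight, which is one example of a non-uniform distribution over the data instances. The stratified sampler draws from each city separately, so both cities are represented in the sample.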