Data instance

1. What’s an attribute? What’s a data instance?

2. What’s noise? How can noise be reduced in a dataset?

3. Define outlier. Describe 2 different approaches to detect outliers in a dataset.

4. Describe 3 different techniques to deal with missing values in a dataset. Explain when each of these techniques would be most appropriate.

5. Given a sample dataset with missing values, apply an appropriate technique to deal with them.

6. Give 2 examples in which aggregation is useful.

7. Given a sample dataset, apply aggregation of data values.

8. What’s sampling?

9. What’s simple random sampling? Is it possible to sample data instances using a distribution different from the uniform distribution? If so, give an example of a probability distribution of the data instances that is different from uniform (i.e., equal probability).

10. What’s stratified sampling?
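
Questions 5, 7, 9 and 10 ask for techniques to be applied, so a small sketch may help make them concrete. The questions supply no dataset, so the `rows` table below is entirely hypothetical, as are all function names. Mean imputation is only one of several possible ways to handle missing values, and weighting by `sales` is only one possible example of a non-uniform sampling distribution. Only the Python standard library is assumed.

```python
import random
from collections import defaultdict
from statistics import mean

# Hypothetical toy dataset: each dict is one data instance, each key an attribute.
rows = [
    {"city": "A", "month": 1, "sales": 10.0},
    {"city": "A", "month": 2, "sales": None},   # missing value
    {"city": "A", "month": 3, "sales": 14.0},
    {"city": "B", "month": 1, "sales": 20.0},
    {"city": "B", "month": 2, "sales": 22.0},
    {"city": "B", "month": 3, "sales": None},   # missing value
]

def impute_mean(rows, attr):
    """Replace missing (None) values of attr with the mean of the observed values."""
    observed = [r[attr] for r in rows if r[attr] is not None]
    fill = mean(observed)
    return [{**r, attr: fill if r[attr] is None else r[attr]} for r in rows]

def aggregate_sum(rows, key, attr):
    """Aggregate attr by summing over all instances that share the same key value."""
    totals = defaultdict(float)
    for r in rows:
        totals[r[key]] += r[attr]
    return dict(totals)

def simple_random_sample(rows, n, rng):
    """Draw n instances without replacement; every instance is equally likely."""
    return rng.sample(rows, n)

def weighted_sample(rows, weights, n, rng):
    """Draw n instances with replacement, with probability proportional to weights."""
    return rng.choices(rows, weights=weights, k=n)

def stratified_sample(rows, key, per_group, rng):
    """Draw per_group instances without replacement from each group defined by key."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[key]].append(r)
    sample = []
    for members in groups.values():
        sample.extend(rng.sample(members, per_group))
    return sample

rng = random.Random(0)  # fixed seed so the sketch is repeatable
filled = impute_mean(rows, "sales")               # question 5
totals = aggregate_sum(filled, "city", "sales")   # question 7: monthly sales -> per-city totals
srs = simple_random_sample(filled, 3, rng)        # question 9: uniform probabilities
weighted = weighted_sample(filled, [r["sales"] for r in filled], 3, rng)  # question 9: non-uniform
strat = stratified_sample(filled, "city", 1, rng) # question 10: one instance per city
print(totals)  # {'A': 40.5, 'B': 58.5}
```

In this sketch the stratified sample always contains one instance from each city, whereas the simple random sample of the same data could by chance miss a city entirely.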
