1. (5 points) Conduct a principal component analysis (PCA) on the predictors. Plot the proportion of variance explained by each component.
2. (5 points) Implement the following prediction methods:
• OLS or logistic regression
• K-nearest neighbors
• LASSO (discuss how you choose the tuning parameter)
• Regression (or classification) tree
• Random forest
3. (5 points) Compare the above prediction methods in terms of (estimated) out-of-sample performance. Explain what you have done as if the grader does not know R (or any other programming language).
4. (5 points) Which prediction method works best? Offer a conjecture as to why.
5. (5 points) Which prediction method works worst? Offer a conjecture as to why.
6. (5 points) If you could use only one variable as a predictor, which variable would you choose? Base the choice on (estimated) out-of-sample performance.
7. (5 points) Using your preferred method, predict the outcome for a given value of the predictors. (You may choose any value.) Discuss whether the predicted value seems reasonable.
8. (5 points) Your final paper uses at least one scatter plot of raw data. The plot should be self-contained, so that the grader does not need to read the main text to understand the information in the figure.
9. (5 points) Your final paper uses at least one box (or violin) plot. The plot should be self-contained.
10. (5 points) Your final paper has at least one figure that you use to discuss whether the LASSO result is sensitive to the tuning parameter. The figure should be self-contained.
11. (5 points) Your final paper has at least one figure that you use to discuss some results of your regression (or classification) tree. The figure should be self-contained.
12. (5 points) Your final paper uses at least one table summarizing the OLS or logistic regression. The table should be self-contained.
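
Items 3 and 6 both rely on an estimate of out-of-sample performance. One common way to get such an estimate is K-fold cross-validation: split the data into K folds, fit the method on K-1 folds, measure the squared prediction error on the held-out fold, and average over all folds. The Python sketch below illustrates the idea for a single predictor. It is not part of the assignment: the data are synthetic, the names (x1, x2, cv_mse and the helper functions) are hypothetical, and the choices of 5 folds and 5 neighbors are assumptions made only for illustration.

```python
import random
from statistics import mean

def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_ols_1d(x, y):
    """Least-squares intercept and slope for a single predictor."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def predict_knn_1d(x_train, y_train, x0, k=5):
    """Average the outcomes of the k training points nearest to x0."""
    nearest = sorted(range(len(x_train)), key=lambda i: abs(x_train[i] - x0))[:k]
    return mean(y_train[i] for i in nearest)

def cv_mse(x, y, method, k_folds=5, seed=0):
    """Estimated out-of-sample MSE; each fold is held out exactly once."""
    sq_errors = []
    for test_idx in k_fold_indices(len(x), k_folds, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(x)) if i not in held_out]
        x_tr = [x[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        if method == "ols":
            intercept, slope = fit_ols_1d(x_tr, y_tr)
            preds = [intercept + slope * x[i] for i in test_idx]
        elif method == "knn":
            preds = [predict_knn_1d(x_tr, y_tr, x[i]) for i in test_idx]
        else:
            raise ValueError(f"unknown method: {method}")
        sq_errors.extend((p - y[i]) ** 2 for p, i in zip(preds, test_idx))
    return mean(sq_errors)

# Synthetic data (an assumption for illustration): y depends on x1 only.
rng = random.Random(1)
n = 200
x1 = [rng.uniform(-2, 2) for _ in range(n)]
x2 = [rng.uniform(-2, 2) for _ in range(n)]
y = [3.0 * a + rng.gauss(0, 0.5) for a in x1]

# Cross-validated MSE for every (predictor, method) pair, smallest first.
results = {
    (name, method): cv_mse(col, y, method)
    for name, col in [("x1", x1), ("x2", x2)]
    for method in ["ols", "knn"]
}
for key, mse in sorted(results.items(), key=lambda kv: kv[1]):
    print(key, round(mse, 3))
```

The predictor and method with the smallest cross-validated error are the ones expected to predict new data best. Using the same folds for every candidate (here, the same seed) keeps the comparison fair, since every method is evaluated on identical held-out observations.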
Sample Solution