Data Analytics: Identify an important business problem, find one or more relevant
datasets, generate insightful visualizations of the data, fit a range of models to the data to
produce your best predictions/forecasts, and make and justify recommendations to a
decision maker related to this problem. (Tableau and R are required.)
Section 1: The Problem (10%)
• Discuss the problem you are addressing.
• What are the questions and business/management decisions your analysis is trying to address?
• Describe your problem’s decision maker and what is important for them to know from your data analysis.
• Discuss the source of your data. Questions to consider include:
– Where did you find this data?
– How reliable or uncertain is this data?
– How old is the data?
– Is the data recorded at given dates or times?
• Identify and justify your choice of target attribute(s) and explain how this/these should be derived, if not
already available.
Section 2: Understand the Data (30%)
• Discuss the nature and size of the dataset(s) you are using.
• Discuss the data attributes that are relevant to your problem. Exactly what does the data represent and, if
relevant, how was it derived? How is it distributed? What type of data is it?
• Explore and discuss whether any of the data attributes you have focused on are closely correlated with other
attributes – either positively or negatively.
• Include at least 3 different types of Tableau visualisations (e.g. map, scatter plot, bar chart, pie chart, box-and-whisker plot) to support your discussions.
• Include at least 3 R-generated plots or aggregation tables to support your discussions.
• Include the R-code you used in the code appendix.
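As an illustration of the correlation check and the R-generated plots and aggregation tables asked for above, here is a minimal sketch in base R. It assumes the built-in mtcars dataset as a stand-in for your own data and treats mpg as a hypothetical target attribute; neither comes from this brief.

```r
# Stand-in data (assumption): the built-in mtcars dataset.
# Replace df with your own data frame and "mpg" with your target attribute.
df <- mtcars

# Pearson correlation of every numeric attribute with the target,
# sorted from most negative to most positive.
cors <- cor(df)[, "mpg"]
cors <- sort(cors[names(cors) != "mpg"])
print(round(cors, 2))

# Aggregation table: mean target value and row count per category.
agg <- aggregate(mpg ~ cyl, data = df,
                 FUN = function(x) c(mean = mean(x), n = length(x)))
print(agg)

# Three simple plots: distribution, relationship, and comparison by group.
hist(df$mpg, main = "Distribution of target", xlab = "mpg")
plot(df$wt, df$mpg, main = "Target vs. weight", xlab = "wt", ylab = "mpg")
boxplot(mpg ~ cyl, data = df, main = "Target by number of cylinders")
```

Strongly correlated attribute pairs found this way are worth discussing, since they may carry overlapping information when the models are configured later.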
Section 3: Prepare the Data (10%)
• If required, explain how you have derived your chosen target attribute(s) in Tableau and in R.
• Discuss and justify what other steps you may have taken to prepare your data, including, where relevant:
removing attributes from consideration, adding further “derived” attributes (e.g. dates), imputing “reasonable”
values for missing data, and standardizing data values.
• Prepare suitable separate “Training”, “Validation” (if required) and “Testing” subsets of the dataset.
• Include any R-code you used to prepare your data in the code appendix.
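The preparation steps listed above (a derived date attribute, imputation, standardization, and the Training/Validation/Testing split) could be sketched in base R as follows. The built-in airquality dataset, the choice of Ozone as target, the 60/20/20 split, and median imputation are all assumptions made for illustration, not requirements of this brief.

```r
set.seed(42)                       # fixed seed so the split is reproducible
df <- airquality                   # stand-in data (assumption); has missing values

# Derived attribute: a Date built from Month and Day (the data are from 1973).
df$Date <- as.Date(sprintf("1973-%02d-%02d", df$Month, df$Day))

# Rows with a missing target are dropped rather than imputed.
df <- df[!is.na(df$Ozone), ]

# Random 60/20/20 split into Training, Validation and Testing subsets.
n       <- nrow(df)
idx     <- sample(seq_len(n))
n_train <- floor(0.6 * n)
n_valid <- floor(0.2 * n)
train <- df[idx[1:n_train], ]
valid <- df[idx[(n_train + 1):(n_train + n_valid)], ]
test  <- df[idx[(n_train + n_valid + 1):n], ]

# Impute missing Solar.R with the TRAINING median, so that no information
# from the validation or testing rows leaks into the preparation step.
med  <- median(train$Solar.R, na.rm = TRUE)
fill <- function(d) { d$Solar.R[is.na(d$Solar.R)] <- med; d }
train <- fill(train); valid <- fill(valid); test <- fill(test)

# Standardize predictors using training means and standard deviations only.
predictors <- c("Solar.R", "Wind", "Temp")
mu  <- colMeans(train[, predictors])
sdv <- apply(train[, predictors], 2, sd)
standardize <- function(d) {
  d[, predictors] <- scale(d[, predictors], center = mu, scale = sdv)
  d
}
train <- standardize(train); valid <- standardize(valid); test <- standardize(test)
```

Computing the imputation value and the scaling parameters from the training subset alone is a design choice that keeps the validation and testing error estimates honest; for time-ordered data a chronological split may be more appropriate than a random one.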
Section 4: Generate and Test Prediction Models (40%)
• Select and justify at least 3 different prediction model types that are likely to best help with your stated
problem objectives.
• Configure your models (e.g. select attributes and/or other model parameters) in the way you expect will best
deliver relevant insights and/or provide the lowest error rates, justifying your decisions.
• Run these models, discussing the model outputs and drawing, where possible, insights related to your
problem.
• Prepare and discuss at least 1 ensemble model, combining two or more of your prediction models.
• Select a proper scoring rule to measure the accuracy of your models. Determine and comment on the best
generalised error rate across your individual prediction models and your ensemble model(s).
• Discuss what steps you may have taken to improve your individual models.
• Combining the results from your various analysis steps, draw conclusions about the particular problem and
questions stated at the beginning.
• What recommendations would you now make to your problem’s decision maker and why?
• Which are the most important variables for the decision maker to look at?
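To make the modelling steps in this section concrete, the following base R sketch fits three different model types, combines them in a simple averaging ensemble, and compares test-set error. Everything specific here is an assumption for illustration: the mtcars stand-in data, mpg as the target, the chosen predictors, the hand-written helper knn_predict, and RMSE as the accuracy measure. RMSE is one common choice for numeric targets; for probability forecasts a rule such as the Brier score or log loss may be a better fit.

```r
set.seed(1)
df    <- mtcars                                  # stand-in data (assumption)
idx   <- sample(seq_len(nrow(df)), size = 22)    # roughly 70% for training
train <- df[idx, ]
test  <- df[-idx, ]

# Accuracy measure: root mean squared error on held-out data.
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Model 1: linear regression.
m_lm <- lm(mpg ~ wt + hp, data = train)
p_lm <- predict(m_lm, newdata = test)

# Model 2: local regression; surface = "direct" lets it predict for
# test points that fall outside the training range.
m_lo <- loess(mpg ~ wt, data = train, span = 0.9, degree = 1,
              control = loess.control(surface = "direct"))
p_lo <- predict(m_lo, newdata = test)

# Model 3: k-nearest neighbours (hypothetical helper, written by hand).
knn_predict <- function(train_x, train_y, test_x, k = 3) {
  mu <- colMeans(train_x)
  s  <- apply(train_x, 2, sd)
  a  <- scale(train_x, center = mu, scale = s)   # scaled training points
  b  <- scale(test_x,  center = mu, scale = s)   # scaled with training stats
  apply(b, 1, function(row) {
    d <- sqrt(colSums((t(a) - row)^2))           # Euclidean distances
    mean(train_y[order(d)[1:k]])                 # average of k nearest targets
  })
}
vars  <- c("wt", "hp")
p_knn <- knn_predict(train[, vars], train$mpg, test[, vars])

# Ensemble: unweighted average of the three sets of predictions.
p_ens <- (p_lm + p_lo + p_knn) / 3

scores <- c(lm       = rmse(test$mpg, p_lm),
            loess    = rmse(test$mpg, p_lo),
            knn      = rmse(test$mpg, p_knn),
            ensemble = rmse(test$mpg, p_ens))
print(round(sort(scores), 2))
```

An unweighted average is the simplest ensemble; weights or model parameters such as k and span would normally be tuned on the Validation subset, with the Testing subset kept for the final generalised error estimate.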