Apply learned machine learning, deep learning, and explainable AI skills to predict and explain outcomes in
a binary classification problem in which the data is heavily weighted toward one class.
Learning Objectives
• Consolidate problem-solving skills in designing and executing a deep learning pipeline
• Develop skills in handling unbalanced data
• Consolidate skills in programming in TensorFlow/Keras
• Perform tuning of deep learning models
• Compare regularisation strategies commonly used in deep learning
• Compare the outcomes of different classification algorithms
• Implement an explainable AI system to compare the performance of different models
• Reflect on machine learning model performance
Indicative Timetable
The following table shows a breakdown of the activities for this assignment to assist you in prioritising your
time. Note that these times are representative of the average time needed to complete each task. The exact
time to complete each task is dependent on your implementation/analysis speed and personal circumstances.
Activity    Approximate Duration
Task 1      2-3 days
Task 2      5-6 days
Task 3      3-4 days
Task 4      3-4 days
Write-Up    3 days
Note that this assignment is time-consuming and requires thoughtful planning to complete all required tasks.
There will be discussion threads available through Keats to answer your questions and help you with technical
difficulties.
A Jupyter notebook containing snippets of useful code has been supplied to help you get started.
Project Overview:
Task: Classifying the presence of cold and flu in speech samples
Now that you have been equipped with the skills to use different machine learning and deep learning
algorithms, you will have the opportunity to practise applying them to a practical problem.
The task you will conduct in this assignment is classifying the presence or absence of cold and flu in speech
samples. For this task, you will be using a publicly available dataset from the INTERSPEECH Computational
Paralinguistics Challenge Series. For details on this data, please see the following paper
• https://www.isca-speech.org/archive/Interspeech_2017/pdfs/0043.PDF
The data has already been partitioned into three separate groupings. There are two partitions (train and
development) for the initial testing of your models and a third (test) for blind testing of their generalisability.
The distribution of speech samples across these partitions is:
            Train    Development    Test     Σ
Cold        970      1,011          895      2,876
Not Cold    8,535    8,585          8,656    25,776
Σ           9,505    9,596          9,551    28,652
To protect the privacy of the speakers in these files, you will not be given the speech files. Instead, you will be
supplied with features already extracted from each speech sample. The feature representation is an 88-
dimensional representation of each file, known as the extended Geneva Minimalistic Acoustic Parameter Set
(eGeMAPS). Note that each speech file has been converted into a single 88-dimensional vector; while speech is a
time series, the conversion into an eGeMAPS feature vector removes time dependencies. Therefore, there is
no benefit in using Recurrent Neural Networks in this assignment.
Your task for the assignment will be to create generalisable deep learning models using the supplied data and
TensorFlow/Keras. If you do not have this software, or need help installing it on your computer, please contact
the module organiser. You will be supplied with the “Cold” and “Not-Cold” labels for the training and
development partitions. Using these, you can develop different systems for performing the 2-class classification
task.
Then, from your most suitable models, you will generate predictions for the test set. The accuracy of the
test set predictions will be independently verified by your lecturer. You are required to submit predictions
from 5 different models for verification.
Additionally, you will implement the Local Interpretable Model-agnostic Explanations (LIME) framework to help
compare the performance of your developed models on exemplar data instances.
Instructions:
Task 1: Baseline System and Data Balancing (15% of marks)
As deep neural networks contain a considerably higher number of parameters and a wider choice of
hyperparameters than other classification approaches, it is often necessary to first implement a simpler
classifier to understand the difficulty of the associated task. Use the supplied training and development
data partitions for this task.
1. Generate a suitable baseline classification model. For this, use either a Support Vector Machine or Random
Forest Classifier. Both are implementable through SciKit-Learn.
2. Identify the most suitable metric for assessing system performance
3. Perform a suitable grid-search to find a reasonable set-up of your chosen classifier
4. Using your chosen baseline system, explore the benefits of random downsampling of the
majority class; random upsampling of the minority class and cost-sensitive training (i.e., the inclusion of class
weights)
• Hint #1: both sampling methods are implementable using the imbalanced-learn toolkit
• Hint #2: you may need to find new hyperparameter values for your models to realise the benefits of these
methods
5. Document your findings and observations
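The steps above can be sketched as follows. This is a minimal illustration only, assuming the eGeMAPS features have already been loaded into NumPy arrays: the random placeholder data, variable names, and grid values are assumptions, not recommendations. The resampling step uses scikit-learn's `resample` to keep the sketch dependency-free; imbalanced-learn's `RandomOverSampler` provides the same behaviour.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.utils import resample

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 88))      # placeholder for the eGeMAPS features
y_train = rng.integers(0, 2, size=200)    # placeholder "Cold"/"Not-Cold" labels

# Cost-sensitive training: weight each class inversely to its frequency, and
# score the grid search with unweighted average recall (UAR, "recall_macro"),
# which is robust to heavy class imbalance.
grid = GridSearchCV(
    SVC(class_weight="balanced"),
    param_grid={"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]},
    scoring="recall_macro",
    cv=3,
)
grid.fit(X_train, y_train)
print(grid.best_params_)

# Random upsampling of the minority class: draw samples with replacement
# until it matches the size of the majority class.
minority = X_train[y_train == 1]
upsampled = resample(minority, n_samples=int((y_train == 0).sum()),
                     random_state=0)
```

Random downsampling is the mirror image: resample the majority class (without replacement) down to the size of the minority class.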
Task 2: Neural Network Classification (40% of marks)
The aim of this task is to identify suitable deep learning models using the supplied training and
development data partitions and their associated labels.
1. Explore different approaches for implementing both Feedforward (Dense) Neural Networks and Convolutional
Neural Networks as realised through TensorFlow/Keras. This work should include observing the effect of
changing:
• The number and width of hidden layers used
• Different activation functions
• Different optimisation techniques
• Different regularisation strategies
• Combinations of the above
2. Once you have identified reasonable network architectures, observe the effect of different data balancing
approaches with different networks.
3. Document your findings and observations
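As one possible starting point, the sketch below builds one dense and one convolutional architecture for the 88-dimensional eGeMAPS vectors. The widths, depth, dropout rate, L2 strength, and optimiser shown are assumptions to be tuned in this task, not recommended settings.

```python
import tensorflow as tf

def build_dense_model(input_dim=88, width=64, depth=2, dropout=0.3, l2=1e-4):
    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=(input_dim,)))
    for _ in range(depth):
        model.add(tf.keras.layers.Dense(
            width, activation="relu",
            kernel_regularizer=tf.keras.regularizers.l2(l2)))  # weight decay
        model.add(tf.keras.layers.Dropout(dropout))            # regularisation
    model.add(tf.keras.layers.Dense(1, activation="sigmoid"))  # 2-class output
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model

def build_conv_model(input_dim=88, filters=32):
    # Treats the feature vector as a length-88 "sequence" with one channel;
    # inputs must be reshaped to (n_samples, 88, 1) before training.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(input_dim, 1)),
        tf.keras.layers.Conv1D(filters, kernel_size=5, activation="relu"),
        tf.keras.layers.GlobalMaxPooling1D(),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

dense = build_dense_model()
conv = build_conv_model()
```

Varying the arguments of `build_dense_model` is one convenient way to organise the architecture comparisons this task asks for; activation, optimiser, and regularisation choices can be parameterised in the same way.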
Task 3: Generating Explanations (25% of marks)
The aim of this task is to compare the predictions made by your baseline and your strongest deep learning
system using the LIME toolkit.
1. Identify your most robust baseline and deep learning models
2. Document your reasons for choosing these models.
3. From the development data, identify a data instance that returned true-positive results for
both classifiers, and compare the resulting explanations
4. Do the same for a false-positive, true-negative, and false-negative instance.
5. Finally, compare explanations for a data instance that returned a true positive on the baseline
system but not on the deep learning system, and for a data instance that returned a true
positive on your deep learning system but not on your baseline system
6. Document your findings for steps 3, 4 and 5. What similarities did you see in each pair of
explanations? What were the main differences?
Task 4: Generalisability Testing (20% of marks)
The aim of this task is to test the generalisability of your developed models on completely held-out test set
data.
1. Identify your 5 most suitable models for further evaluation
• Choose two “baseline” approaches and three deep learning approaches, including at
least one feedforward and one convolutional network
2. Document your reasons for choosing these models.
3. Combine the training and development features and labels to create a ‘new’, larger training
dataset.
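Step 3 amounts to stacking the two partitions along the sample axis, for example as below; the array names are assumptions, and the zero-filled placeholders simply mimic the partition sizes.

```python
import numpy as np

# Placeholder arrays standing in for the loaded features and labels
X_train, y_train = np.zeros((9505, 88)), np.zeros(9505)
X_dev, y_dev = np.zeros((9596, 88)), np.zeros(9596)

# Stack train and development into one larger training set
X_full = np.concatenate([X_train, X_dev], axis=0)
y_full = np.concatenate([y_train, y_dev], axis=0)
```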