Machine Learning: Assignment 2

As a simplified example of character recognition, we will compare several supervised learning classifiers, with validation, on a larger version of the MNIST digit recognition dataset. In this assignment we will use a much larger dataset than the one used for Assignment 1; this should better represent the natural variability in handwritten 8s and 9s.

Download NumberRecognitionBigger.mat from Moodle. Note that the dataset includes data samples for all handwritten digits 0 to 9, but we will be using only 8s and 9s for this assignment. You can implement your assignment in either Matlab or Python, with details to follow:
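If you work in Python, scipy.io.loadmat can typically read the .mat file. The short sketch below assumes only that NumberRecognitionBigger.mat has been downloaded to the working directory; it deliberately does not assume the names of the variables stored inside the file, so inspect the keys before using them.

from scipy.io import loadmat

# Assumes NumberRecognitionBigger.mat is in the working directory; the
# variable names stored inside the file are not assumed here.
mat = loadmat("NumberRecognitionBigger.mat")
print(mat.keys())  # lists the stored arrays (plus loadmat's metadata entries)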

Coding

Example Matlab and Python functions that can be relied upon are already outlined in Assignment 1. Assignment 2 may also benefit from the following commands. You are expected to read the documentation on the available commands and try to get them working before asking for assistance. Please address questions to the course

Python

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis as QDA

from sklearn.naive_bayes import GaussianNB as NB

Also strongly consider using:

from sklearn.model_selection import cross_validate

from sklearn.model_selection import StratifiedKFold, StratifiedShuffleSplit

and using the random_state argument for either StratifiedShuffleSplit or StratifiedKFold.
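As a rough illustration of why the random_state argument matters, here is a minimal sketch of fixing the fold randomization once so that every classifier is later evaluated on identical splits. The use of sklearn's small built-in digits set as a stand-in for NumberRecognitionBigger.mat, and the names X, y and splits, are assumptions made for the example only.

import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import StratifiedKFold

# Placeholder data: sklearn's built-in digits set stands in for
# NumberRecognitionBigger.mat; keep only the 8s and 9s.
digits = load_digits()
mask = np.isin(digits.target, [8, 9])
X, y = digits.data[mask], digits.target[mask]

# Fix the randomization once; reuse these folds for every classifier type.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
splits = list(skf.split(X, y))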

Question 1: Implement K-Fold cross validation (K=5). Within the validation, you will train and compare a Linear Discriminant Analysis Classifier, a Quadratic Discriminant Analysis Classifier, a Bayesian Classifier (Naïve Bayes) and a K-NN (K=1, K=5 and K=10) classifier. The validation loop will train these models for predicting 8s and 9s. NOTE: for a fair comparison, K-Fold randomization should only be performed once, with any selected samples for training applied to the creation of all classifier types (LDA, QDA, Bayes, KNN) in an identical manner (i.e. the exact same set of training data will be used to construct each model being compared, to ensure a fair comparison).

Provide a K-Fold validated error rate for each of the classifiers. Provide a printout of your code (Matlab or Python). Answer the following questions:

a) Which classifier performs the best in this task?

b) Why do you think this classifier outperforms the others?

c) How does KNN compare to the results obtained in Assignment 1? Why do you observe this comparative pattern?
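For orientation only, one possible shape for the Question 1 validation loop is sketched below. It assumes the X, y and splits names from the earlier sketch and takes the error rate as one minus the mean cross-validated accuracy; it is a sketch under those assumptions, not the required solution.

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis as QDA
from sklearn.naive_bayes import GaussianNB as NB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_validate

# One instance of each classifier type to be compared on the same folds.
classifiers = {
    "LDA": LDA(),
    "QDA": QDA(),
    "Naive Bayes": NB(),
    "KNN (K=1)": KNeighborsClassifier(n_neighbors=1),
    "KNN (K=5)": KNeighborsClassifier(n_neighbors=5),
    "KNN (K=10)": KNeighborsClassifier(n_neighbors=10),
}

for name, clf in classifiers.items():
    # cv=splits reuses the pre-computed folds, so every model is trained
    # and tested on exactly the same samples.
    scores = cross_validate(clf, X, y, cv=splits, scoring="accuracy")
    error_rate = 1.0 - scores["test_score"].mean()
    print(f"{name}: K-Fold validated error rate = {error_rate:.4f}")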

It was previously announced on multiple occasions that each student is required to assemble their own dataset compatible with supervised learning-based classification (i.e. a collection of measurements across many samples/instances/subjects that includes a group of interest distinct from the rest of the samples). If you are happy with your choice from Assignment 1, then re-provide your answer to Assignment 1 Question 2 below. If you want to change your dataset for this assignment, for a future assignment or for your graduate project, you are free to do so, but you must update your answer to Question 2 based on your new dataset choice.

Question 2: (Repeat) Describe the dataset you have collected: the total number of samples, the total number of measurements, a brief description of the measurements included, the nature of the group of interest and what differentiates it from the other samples, the sample count for your group of interest, and the sample count for the group not of interest. Write a program that analyzes each measurement/feature individually. For each measurement, compute Cohen's d statistic (the difference between the average value of the group of interest and the average value of the group not of interest, divided by the standard deviation of the joint distribution that includes both groups). Provide a printout of the 10 leading measurements (d statistic furthest from zero) with their corresponding d statistics, making it clear what those measurements represent in your dataset (these are the measurements with the most obvious potential to inform prediction in any given machine learning algorithm). Provide a printout of this code.
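As a rough illustration of the per-measurement calculation, the sketch below follows the definition given above (difference of group means divided by the standard deviation of the combined distribution). The arrays X and y and the positive label are placeholders for illustration only, not anything specific to your dataset.

import numpy as np

def cohens_d_per_feature(X, y, group_of_interest):
    # X: (n_samples, n_features) measurements; y: group labels.
    # Returns one d value per measurement, following the definition above:
    # group-mean difference divided by the std of the joint distribution.
    in_group = (y == group_of_interest)
    mean_diff = X[in_group].mean(axis=0) - X[~in_group].mean(axis=0)
    joint_std = X.std(axis=0)
    return mean_diff / joint_std

# Placeholder data for illustration; replace with your own measurements.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = rng.integers(0, 2, size=100)

d = cohens_d_per_feature(X, y, group_of_interest=1)
top10 = np.argsort(-np.abs(d))[:10]  # the 10 d values furthest from zero
for idx in top10:
    print(f"measurement {idx}: d = {d[idx]:+.3f}")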

Question 3: Adapt your code from Question 1 to be applied to the dataset that you’ve organized for yourself. Provide a printout of the error rates for the different classifiers and your code. Answer the following question: is the best performing classifier from Question 1 the same in Question 3? Elaborate on those similarities/differences: what about your dataset may have contributed to the differences/similarities observed?

Deadline: October 24th, 2019.
