Programming Homework Help

Modify a Python program about classification

Q1. Classification

Write a function “classify” to conduct a classification experiment as follows:

1. Take the training and testing file name strings as inputs, e.g. classify(training_file, testing_file).
2. Classify the text samples in the training file using a linear support vector machine as follows:

a. First, apply grid search with 6-fold cross-validation to find the best values for the parameters min_df, stop_words, and C (the penalty parameter of the SVM) used in the modeling pipeline. Use f1-macro as the scoring metric to select the best parameter values. The candidate values for these parameters are:

'min_df': [1, 2, 5]

'stop_words': [None, 'english']

'C': [0.5, 1, 5]

b. Using the best parameter values, train a linear support vector machine classifier on all samples in news_train.csv (the training file).

3. Test the linear support vector classifier created in Step 2.b using the testing file. Compare the f1-macro score you obtain on the test dataset with the f1-macro score of the best model from grid search, and comment on whether the model is overfitted. Save your comment into a PDF file.
4. Your function “classify” has no return value. However, when it is called, it should print the best parameter values from grid search and the testing precision, recall, and f1 score from Step 3.
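A minimal end-to-end sketch of “classify” with scikit-learn, under assumptions the assignment does not state: the CSV files have columns named text and label (hypothetical names; adjust them to the real files), the vectorizer is TfidfVectorizer, and the classifier is LinearSVC. The grid has 3 × 2 × 3 = 18 combinations, so 6-fold cross-validation runs 18 × 6 = 108 fits; with refit=True, GridSearchCV then retrains the best pipeline on all training samples, which covers Step 2.b.

```python
import csv

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import precision_recall_fscore_support
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC


def load_csv(path, text_col="text", label_col="label"):
    """Return (texts, labels) from a CSV file; the column names are assumptions."""
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    return [row[text_col] for row in rows], [row[label_col] for row in rows]


def classify(training_file, testing_file):
    train_texts, train_labels = load_csv(training_file)
    test_texts, test_labels = load_csv(testing_file)

    # Step names "tfidf" and "clf" are arbitrary; grid keys use "<step>__<param>".
    pipeline = Pipeline([("tfidf", TfidfVectorizer()), ("clf", LinearSVC())])
    param_grid = {
        "tfidf__min_df": [1, 2, 5],
        "tfidf__stop_words": [None, "english"],
        "clf__C": [0.5, 1, 5],
    }

    # Step 2.a: 6-fold grid search scored by macro F1.
    # Step 2.b: refit=True retrains the best pipeline on all training samples.
    search = GridSearchCV(pipeline, param_grid, scoring="f1_macro", cv=6, refit=True)
    search.fit(train_texts, train_labels)
    print("Best parameters:", search.best_params_)
    print("Best CV macro F1: %.4f" % search.best_score_)

    # Step 3: evaluate the refitted model on the testing file.
    predicted = search.predict(test_texts)
    precision, recall, f1, _ = precision_recall_fscore_support(
        test_labels, predicted, average="macro", zero_division=0
    )
    print("Test macro precision: %.4f" % precision)
    print("Test macro recall: %.4f" % recall)
    print("Test macro F1: %.4f" % f1)
    # A test F1 far below the CV F1 hints at overfitting; a small gap suggests it is not.
    print("CV F1 minus test F1: %.4f" % (search.best_score_ - f1))
```

The printed gap is only an input to the written comment; how large a gap counts as overfitting remains a judgment call for the analysis.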

Q2. How many samples are enough? Show the impact of sample size on classifier performance

Write a function “impact_of_sample_size” as follows:

Take the full path strings of the training and test dataset files as inputs, e.g. impact_of_sample_size(train_file, test_file).

Starting with 300 samples from the training file, in each round you build a classifier with 300 more samples, i.e. in round 1 you use samples 0:300, in round 2 you use samples 0:600, …, until you use all samples.
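The round schedule can be computed up front. This is a small sketch with a hypothetical helper name, sample_sizes, and it assumes that when the training set size is not a multiple of 300, the last round simply uses every sample.

```python
def sample_sizes(n_total, step=300):
    """Return cumulative training-set sizes: step, 2*step, ..., ending at n_total."""
    sizes = list(range(step, n_total, step))
    sizes.append(n_total)  # the final round always uses all samples
    return sizes


# Round k trains on samples [0:sizes[k-1]].
print(sample_sizes(1000))  # [300, 600, 900, 1000]
```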

In each round, do the following:

a. Create a tf-idf matrix using TfidfVectorizer with stop words removed.
b. Train a classifier using a multinomial Naive Bayes model.
c. Train a classifier using a linear support vector machine model.
d. For each classifier, test its performance on the testing file and collect the following metrics: macro precision and macro recall. Note: make sure you use the same model parameters in all rounds.
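One round of the steps above might look like the following sketch. The function name evaluate_round is hypothetical; it assumes the vectorizer is refit on each training slice and that scikit-learn's default hyperparameters for MultinomialNB and LinearSVC are the fixed "same model parameters" used in every round.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import precision_score, recall_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC


def evaluate_round(train_texts, train_labels, test_texts, test_labels):
    """Train NB and linear SVM on one training slice; return macro precision/recall."""
    # Refit the vectorizer on each slice so the vocabulary reflects only the samples used.
    vectorizer = TfidfVectorizer(stop_words="english")
    X_train = vectorizer.fit_transform(train_texts)
    X_test = vectorizer.transform(test_texts)

    results = {}
    # Default hyperparameters are kept fixed so every round is comparable.
    for name, model in (("NB", MultinomialNB()), ("SVM", LinearSVC())):
        model.fit(X_train, train_labels)
        predicted = model.predict(X_test)
        results[name] = (
            precision_score(test_labels, predicted, average="macro", zero_division=0),
            recall_score(test_labels, predicted, average="macro", zero_division=0),
        )
    return results
```

For a round of size n, the call would be evaluate_round(train_texts[:n], train_labels[:n], test_texts, test_labels). If the training file is sorted by label, an early slice may contain a single class, which LinearSVC rejects with an error; shuffling the training data once before the loop is one way around this.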

Draw a line chart (two lines, one for each classifier) showing the relationship between sample size and precision. Similarly, plot another line chart showing the relationship between sample size and recall.
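A sketch of one such chart with matplotlib; the helper name plot_metric and the axis labels are illustrative choices, not part of the assignment. It returns the figure instead of calling plt.show(), so the same function serves both charts.

```python
import matplotlib.pyplot as plt


def plot_metric(sizes, nb_scores, svm_scores, metric_name):
    """Draw one line chart comparing NB and SVM on a single metric."""
    fig, ax = plt.subplots()
    ax.plot(sizes, nb_scores, marker="o", label="Multinomial NB")
    ax.plot(sizes, svm_scores, marker="s", label="Linear SVM")
    ax.set_xlabel("Training sample size")
    ax.set_ylabel("Macro " + metric_name)
    ax.set_title("Sample size vs. macro " + metric_name)
    ax.legend()
    return fig
```

Calling it once with the precision lists and once with the recall lists, followed by plt.show() in a plain script, produces the two charts; in a Jupyter notebook the figures display without plt.show().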

Write your analysis of the following:

How does sample size affect each classifier’s performance?

How many samples do you think would be needed for each model for good performance?

How does the performance of the SVM classifier compare with that of the Naïve Bayes classifier as the sample size increases?

There is no return for this function, but the charts should be plotted.
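A compact sketch of the whole “impact_of_sample_size” function under stated assumptions: the CSV files have columns named text and label (hypothetical names), both models use scikit-learn defaults so parameters stay identical across rounds, and the last round uses every sample when the training size is not a multiple of 300. It draws two figures and returns nothing; it does not call plt.show(), so in a plain script the call would be followed by plt.show().

```python
import csv

import matplotlib.pyplot as plt
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import precision_score, recall_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC


def _load(path):
    # Hypothetical column names "text" and "label".
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    return [r["text"] for r in rows], [r["label"] for r in rows]


def impact_of_sample_size(train_file, test_file):
    train_texts, train_labels = _load(train_file)
    test_texts, test_labels = _load(test_file)
    # 300, 600, ... and finally all samples.
    sizes = list(range(300, len(train_texts), 300)) + [len(train_texts)]
    scores = {"NB": ([], []), "SVM": ([], [])}  # model -> (precisions, recalls)

    for n in sizes:
        vectorizer = TfidfVectorizer(stop_words="english")
        X_train = vectorizer.fit_transform(train_texts[:n])
        X_test = vectorizer.transform(test_texts)
        for name, model in (("NB", MultinomialNB()), ("SVM", LinearSVC())):
            predicted = model.fit(X_train, train_labels[:n]).predict(X_test)
            scores[name][0].append(
                precision_score(test_labels, predicted, average="macro", zero_division=0))
            scores[name][1].append(
                recall_score(test_labels, predicted, average="macro", zero_division=0))

    # One figure per metric, one line per classifier.
    for idx, metric in enumerate(("precision", "recall")):
        plt.figure()
        for name in ("NB", "SVM"):
            plt.plot(sizes, scores[name][idx], marker="o", label=name)
        plt.xlabel("Training sample size")
        plt.ylabel("Macro " + metric)
        plt.title("Sample size vs. macro " + metric)
        plt.legend()
```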
