Programming Homework Help

Modify a Python program about classification

Q1. Classification

Write a function classify to conduct a classification experiment as follows:

  • Take the training and testing file name strings as inputs, e.g. classify(training_file, testing_file).
  • Classify text samples in the training file using a linear support vector machine as follows:

a. First apply grid search with 6-fold cross validation to find the best values for the parameters min_df, stop_words, and C (the penalty parameter of the SVM) that are used in the modeling pipeline. Use f1-macro as the scoring metric to select the best parameter values. Potential values for these parameters are:

min_df: [1, 2, 5]

stop_words: [None, "english"]

C: [0.5, 1, 5]

b. Using the best parameter values, train a linear support vector machine classifier with all samples in the training file (news_train.csv).

  • Test the linear support vector classifier created in Step 2.b using the testing file. Compare the f1-macro score you obtain from the test dataset with the f1-macro of the best model from grid search, and comment on whether the model is overfitted. Save your comment in a PDF file.
  • Your function “classify” has no return value. However, when it is called, it prints the best parameter values from grid search and the testing precision, recall, and f1 score from Step 3.
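The steps above can be sketched roughly as follows. This is a minimal sketch, not a reference solution: the assignment does not describe the CSV layout, so the column names text_col and label_col (and their defaults "text" and "label") are assumptions, as is reading the files with pandas. With its default refit behaviour, GridSearchCV retrains the best pipeline on all training samples, which is used here to cover step b.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import precision_recall_fscore_support
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC


def classify(training_file, testing_file, text_col="text", label_col="label"):
    # ASSUMPTION: both files are CSVs with a text column and a label column;
    # the column names are hypothetical and must match the real data.
    train = pd.read_csv(training_file)
    test = pd.read_csv(testing_file)

    # tf-idf features followed by a linear SVM, tuned as one pipeline
    pipeline = Pipeline([
        ("tfidf", TfidfVectorizer()),
        ("clf", LinearSVC()),
    ])
    param_grid = {
        "tfidf__min_df": [1, 2, 5],
        "tfidf__stop_words": [None, "english"],
        "clf__C": [0.5, 1, 5],
    }

    # 6-fold cross validation, selecting parameters by macro-averaged F1
    search = GridSearchCV(pipeline, param_grid, scoring="f1_macro", cv=6)
    search.fit(train[text_col], train[label_col])
    print("Best parameters:", search.best_params_)
    print("Best cross-validation f1-macro:", search.best_score_)

    # search.predict uses the best pipeline refitted on the full training set
    predicted = search.predict(test[text_col])
    precision, recall, f1, _ = precision_recall_fscore_support(
        test[label_col], predicted, average="macro"
    )
    print("Test precision (macro):", precision)
    print("Test recall (macro):", recall)
    print("Test f1 (macro):", f1)
```

A test f1-macro that is clearly lower than the cross-validation f1-macro would be one sign of overfitting; similar scores suggest the model generalizes about as well as the grid search estimated.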

Q2. How many samples are enough? Show the impact of sample size on classifier performance

Write a function “impact_of_sample_size” as follows:

Take the full file name path strings for training and test datasets as inputs, e.g.

impact_of_sample_size(train_file, test_file).

Starting with 300 samples from the training file, in each round you build a classifier with 300 more samples, i.e. in round 1, you use samples from 0:300, and in round 2, you use samples from 0:600, …, until you use all samples.
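As an illustration of the round schedule, the cumulative slice sizes could be generated as below. The helper name round_sizes is hypothetical, and the sketch assumes that a final, smaller round is added when the total is not a multiple of 300, so that every sample is eventually used.

```python
def round_sizes(n_samples, step=300):
    """Return the cumulative training-set sizes: step, 2*step, ..., n_samples."""
    sizes = list(range(step, n_samples, step))
    sizes.append(n_samples)  # the final round always uses every sample
    return sizes


# Each size selects the first `size` rows, e.g. train_texts[:size]
print(round_sizes(1000))  # [300, 600, 900, 1000]
```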

In each round, do the following:

  • create tf-idf matrix using TfidfVectorizer with stop words removed
  • train a classifier using multinomial Naive Bayes model
  • train a classifier using linear support vector machine model
  • for each classifier, test its performance using the testing file and collect the following metrics: macro precision and macro recall. Note: make sure you use the same model parameters for all iterations.
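One round of the steps above might look like the sketch below. The helper name evaluate_round is hypothetical, and two choices are assumptions rather than requirements from the assignment: the built-in "english" stop-word list, and default parameters for both models (which keeps the parameters identical across rounds).

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import precision_score, recall_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC


def evaluate_round(train_texts, train_labels, test_texts, test_labels):
    """Fit both models on one training slice; return {model: (precision, recall)}."""
    # ASSUMPTION: "stop words removed" is read as scikit-learn's English list
    vectorizer = TfidfVectorizer(stop_words="english")
    x_train = vectorizer.fit_transform(train_texts)
    x_test = vectorizer.transform(test_texts)  # reuse the vocabulary of this slice

    # Default parameters, created fresh each round so every round is comparable
    models = {"Naive Bayes": MultinomialNB(), "Linear SVM": LinearSVC()}
    scores = {}
    for name, model in models.items():
        model.fit(x_train, train_labels)
        predicted = model.predict(x_test)
        scores[name] = (
            precision_score(test_labels, predicted, average="macro", zero_division=0),
            recall_score(test_labels, predicted, average="macro", zero_division=0),
        )
    return scores
```

Calling this once per training slice and appending the returned values to per-model lists gives the series needed for the charts.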

Draw a line chart (two lines, one for each classifier) showing the relationship between sample size and precision. Similarly, plot another line chart to show the relationship between sample size and recall.
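The two charts could be drawn along these lines with matplotlib. The helper name plot_metric_curves and its arguments are hypothetical: sizes is the list of training sample sizes, and each of the two dictionaries maps a classifier name to its metric values, one per size. Placing both charts side by side in one figure is a layout choice, not something the assignment asks for.

```python
import matplotlib.pyplot as plt


def plot_metric_curves(sizes, precision_by_model, recall_by_model):
    """Plot one chart per metric, with one line per classifier."""
    fig, (ax_precision, ax_recall) = plt.subplots(1, 2, figsize=(10, 4))
    charts = (
        (ax_precision, "Macro precision", precision_by_model),
        (ax_recall, "Macro recall", recall_by_model),
    )
    for ax, metric_name, by_model in charts:
        for model_name, values in by_model.items():
            ax.plot(sizes, values, marker="o", label=model_name)
        ax.set_xlabel("Training sample size")
        ax.set_ylabel(metric_name)
        ax.set_title(f"{metric_name} vs. sample size")
        ax.legend()
    fig.tight_layout()
    return fig
```

Since impact_of_sample_size itself has no return value, it would call this helper and then plt.show() to display the charts.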

Write your analysis on the following:

How does sample size affect each classifier’s performance?

How many samples do you think would be needed for each model to perform well?

How does the performance of the SVM classifier compare with that of the Naïve Bayes classifier as the sample size increases?

There is no return for this function, but the charts should be plotted.
