Computer Science Homework Help

k-Nearest Neighbor Classification

The purpose of this assignment is to perform k-Nearest Neighbor
classification, interpret the results, and analyze whether or not the
information generated can be used to address a specific business
problem.

For this assignment, you will use the “Adult Incomes” data set from the Topic Materials.

ABC Survey Company collects data via surveys that it then sells to
marketing departments. Marketing departments typically do not like
missing data. Because respondents are often reluctant to disclose their
salary, the question most often left unanswered in the survey results
is, “Is your annual salary $50,000 or more?”

You are the analyst who has been tasked with finding a way to impute
(i.e., fill in) the answer to the question, “Is your annual salary
$50,000 or more?” The answer is to be imputed from how individuals
answer other survey questions about their marital status, educational
level, occupation, and familial relationship status. If this question
can be imputed accurately, the survey data provided by ABC Survey
Company becomes substantially more valuable.

Question 1: Using only “Marital_Status,”
“Education,” “Occupation,” and “Relationship” variables, find the number
of neighbors (k) that minimizes the error rate. Use a range of k
between 3 and 10. Include the “k Selection Error Log” output when
submitting the answer.
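The k search in Question 1 can be illustrated outside Modeler. The Python sketch below is a minimal illustration, not Modeler's implementation. It assumes each record is a dict with a hypothetical "Income" key holding "<=50K" or ">50K", measures distance as the number of mismatched categorical fields (for unweighted one-hot encoded fields this ranks neighbors exactly as squared Euclidean distance does, since each mismatch adds 2), and scores each k on a separate holdout set.

```python
from collections import Counter

# The four survey fields named in Question 1.
FIELDS = ("Marital_Status", "Education", "Occupation", "Relationship")


def mismatch_distance(a, b, fields=FIELDS):
    """Number of fields on which two records disagree (overlap distance)."""
    return sum(a[f] != b[f] for f in fields)


def knn_predict(train, query, k, fields=FIELDS):
    """Majority vote over the k training records closest to `query`."""
    # sorted() is stable, so records at equal distance keep training-set order.
    neighbors = sorted(train, key=lambda r: mismatch_distance(r, query, fields))[:k]
    # On a tied vote, most_common() returns the label seen first among neighbors.
    return Counter(r["Income"] for r in neighbors).most_common(1)[0][0]


def error_rate(train, holdout, k, fields=FIELDS):
    """Fraction of holdout records whose predicted Income is wrong."""
    wrong = sum(knn_predict(train, r, k, fields) != r["Income"] for r in holdout)
    return wrong / len(holdout)


def select_k(train, holdout, k_values=range(3, 11), fields=FIELDS):
    """Return (best_k, log); on equal error, the first k listed wins."""
    log = {k: error_rate(train, holdout, k, fields) for k in k_values}
    best_k = min(log, key=log.get)
    return best_k, log
```

The returned `log` plays the role of the “k Selection Error Log”; the figures Modeler reports will depend on its own partitioning and distance settings.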

Question 2: Using the same variables and
the k selected in Question 1, rerun the nearest neighbor model using the
feature selection option in IBM SPSS Modeler. Which set of
variables minimizes the error rate? Include the “Predictor Selection
Error Log” output when submitting the answer.
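Feature selection of the kind asked for in Question 2 can be sketched as greedy forward selection: start with no predictors, add whichever remaining field lowers the holdout error most, and stop when no field helps. This is an assumption about the general approach, not a reproduction of Modeler's exact stopping rule; the function names and the "Income" key are hypothetical, and distance is again a simple mismatch count.

```python
from collections import Counter


def knn_predict(train, query, k, fields):
    """Majority vote over the k records with the fewest mismatched fields."""
    neighbors = sorted(train, key=lambda r: sum(r[f] != query[f] for f in fields))[:k]
    return Counter(r["Income"] for r in neighbors).most_common(1)[0][0]


def error_rate(train, holdout, k, fields):
    """Fraction of holdout records misclassified using only `fields`."""
    wrong = sum(knn_predict(train, r, k, fields) != r["Income"] for r in holdout)
    return wrong / len(holdout)


def forward_select(train, holdout, k, candidates):
    """Greedy forward selection; returns (selected_fields, log of (field, error))."""
    selected, log = [], []
    best_err = float("inf")
    remaining = list(candidates)
    while remaining:
        # Try adding each remaining field to the current set.
        errs = {f: error_rate(train, holdout, k, selected + [f]) for f in remaining}
        f_best = min(errs, key=errs.get)
        if errs[f_best] >= best_err:
            break  # no remaining field lowers the error rate
        selected.append(f_best)
        remaining.remove(f_best)
        best_err = errs[f_best]
        log.append((f_best, best_err))
    return selected, log
```

Each `log` entry records the error after a field was added, which is the same kind of information a predictor selection error log summarizes.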

Question 3: Using the value of k and the
set of variables that minimizes the error rate, rerun the k-Nearest
Neighbor model. What is the classification table? Include the pivot
table output when submitting the answer.
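The classification table in Question 3 is a cross-tabulation of actual against predicted income classes. A minimal sketch, assuming the actual and predicted labels are available as parallel lists of "<=50K"/">50K" strings (the function names are illustrative):

```python
from collections import Counter

LABELS = ("<=50K", ">50K")


def classification_table(actual, predicted, labels=LABELS):
    """Return ({(actual, predicted): count}, overall accuracy)."""
    counts = Counter(zip(actual, predicted))
    table = {(a, p): counts[(a, p)] for a in labels for p in labels}
    # Correct predictions are the diagonal cells (actual == predicted).
    accuracy = sum(table[(lab, lab)] for lab in labels) / len(actual)
    return table, accuracy


def print_table(table, labels=LABELS):
    """Print rows = actual class, columns = predicted class."""
    print("actual \\ predicted".ljust(20) + "".join(lab.rjust(8) for lab in labels))
    for a in labels:
        print(a.ljust(20) + "".join(str(table[(a, p)]).rjust(8) for p in labels))
```

Accuracy is the diagonal total divided by the number of records; the row and column orientation of Modeler's pivot table may differ from this layout.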

Question 4: Consider the following
individual: Marital_Status=Never-married, Education=Masters,
Occupation=Sales, and Relationship=Not-in-family. Based on the k-Nearest
Neighbor model from Question 3, how would this individual be
classified? Provide the predicted income level (“>50K” or “<=50K”)
and explain the process that you used to determine the income level.
Include the table illustrating the data when submitting the answer.
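Classifying the individual in Question 4 amounts to finding that person's k nearest training records and taking a majority vote of their income labels. The sketch below assumes the mismatch-count distance as a stand-in for Modeler's metric, a hypothetical training list of dicts with an "Income" key, and that `fields` is narrowed to the predictors chosen in Question 2:

```python
from collections import Counter

FIELDS = ("Marital_Status", "Education", "Occupation", "Relationship")

# The individual described in Question 4.
QUERY = {"Marital_Status": "Never-married", "Education": "Masters",
         "Occupation": "Sales", "Relationship": "Not-in-family"}


def nearest_neighbors(train, query, k, fields=FIELDS):
    """The k training records with the fewest mismatched fields, closest first."""
    return sorted(train, key=lambda r: sum(r[f] != query[f] for f in fields))[:k]


def classify(train, query, k, fields=FIELDS):
    """Return (predicted_label, neighbors) using a simple majority vote."""
    neighbors = nearest_neighbors(train, query, k, fields)
    label = Counter(r["Income"] for r in neighbors).most_common(1)[0][0]
    return label, neighbors
```

Listing the returned neighbors with their labels is one way to show how the vote was reached. Records at equal distance are ordered here by their position in the training set, which Modeler may handle differently.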

Question 5: Describe the model-building
process you used to determine whether or not a particular survey taker
earned an annual salary of $50,000 or more. Include discussion of the
accuracy of the k-Nearest Neighbor model and how it can be used in
practice to impute the answer to the question, “Is your annual salary
$50,000 or more?”
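In practice, the fitted model would be applied to survey records whose income answer is missing, using the answered records as the reference set. A minimal sketch, assuming missing answers are stored as None under a hypothetical "Income" key; the "Imputed" flag is an illustrative addition, not part of the assignment:

```python
from collections import Counter

FIELDS = ("Marital_Status", "Education", "Occupation", "Relationship")


def knn_predict(reference, query, k, fields=FIELDS):
    """Majority vote over the k reference records with the fewest mismatches."""
    neighbors = sorted(reference, key=lambda r: sum(r[f] != query[f] for f in fields))[:k]
    return Counter(r["Income"] for r in neighbors).most_common(1)[0][0]


def impute_income(records, k, fields=FIELDS):
    """Fill Income where it is None, using the answered records as reference.

    Returns new dicts; an "Imputed" flag marks which answers were filled in.
    """
    reference = [r for r in records if r.get("Income") is not None]
    filled = []
    for r in records:
        if r.get("Income") is None:
            guess = knn_predict(reference, r, k, fields)
            filled.append({**r, "Income": guess, "Imputed": True})
        else:
            filled.append({**r, "Imputed": False})
    return filled
```

Flagging imputed rows lets a buyer of the data weigh them according to the model's measured accuracy rather than treating them as reported answers.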

General Requirements:

Submit your answers to Questions 1-5, including the specified screenshots and software outputs, in a Word document.

APA format is not required, but solid academic writing is expected.

This assignment uses a grading rubric. Please review the rubric prior
to beginning the assignment to become familiar with the expectations
for successful completion.
