Programming Homework Help

Python with Natural Language Processing

Retrieve relevant answers to questions by similarity

Each row in “qa.txt” defines a question and its corresponding answer. Now assume we do not know the answers to these questions. Let’s design an algorithm to retrieve the most relevant answer to each question.

1. Define another function match_question_answer as follows:

takes two inputs: a list of questions as strings (i.e. questions), and a list of answers as strings (i.e. answers).

uses the “tokenize” function defined in Q1 to tokenize each document

generates a tf_idf matrix from the tokens (hint: refer to the tf_idf function defined in Section 7.5 of the lecture notes)

calculates the cosine distance between every question and every answer using the tf_idf matrix (hint: you can use the scipy.spatial.distance.cdist function; cosine similarity is 1 minus the cosine distance)

for each question q, identifies the answer that is most similar to q as the most relevant answer (denoted as a)

returns a list of tuples, each with 3 elements (index of q, index of a, similarity score), one for every question q in the dataset.
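The steps of match_question_answer can be sketched as follows. This is only a minimal, dependency-free illustration, not the expected solution: the tokenize function here is a hypothetical stand-in for the one from Q1, the tf_idf function uses one common smoothed weighting that may differ from Section 7.5 of the lecture notes, and the pairwise similarity is computed with a plain loop. With a NumPy matrix, scipy.spatial.distance.cdist(q_rows, a_rows, metric="cosine") would return the cosine distances in one call, and 1 minus that result gives the similarities.

```python
import math
import re
from collections import Counter


def tokenize(doc):
    # Hypothetical stand-in for the Q1 tokenizer: lowercase alphanumeric words.
    return re.findall(r"[a-z0-9]+", doc.lower())


def tf_idf(token_lists):
    # Assumed weighting (the lecture notes may define it differently):
    # tf = count / document length, idf = log((1 + N) / (1 + df)) + 1.
    n_docs = len(token_lists)
    df = Counter(term for tokens in token_lists for term in set(tokens))
    vocab = sorted(df)
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for tokens in token_lists:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        rows.append([counts[t] / total * idf[t] for t in vocab])
    return rows


def cosine_similarity(u, v):
    # Cosine similarity of two equal-length vectors; 0.0 if either is all zeros.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def match_question_answer(questions, answers):
    # Build one tf_idf matrix over questions and answers so they share a vocabulary.
    docs = list(questions) + list(answers)
    matrix = tf_idf([tokenize(d) for d in docs])
    q_rows, a_rows = matrix[:len(questions)], matrix[len(questions):]
    result = []
    for qi, q_vec in enumerate(q_rows):
        sims = [cosine_similarity(q_vec, a_vec) for a_vec in a_rows]
        best = max(range(len(sims)), key=sims.__getitem__)  # most similar answer
        result.append((qi, best, sims[best]))
    return result
```

Calling match_question_answer(questions, answers) with the two columns of “qa.txt” would return one (index of q, index of a, similarity score) tuple per question. The sketch assumes the list of answers is not empty.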

2. Define a function evaluate to evaluate the performance of retrieval as follows:

takes the returned list from match_question_answer function as an input

sets a minimum similarity threshold (denoted as min_sim), and selects entries from the list with similarity >= the threshold (denoted as matching_pairs)

calculates two metrics for the selected matching_pairs:

recall: the percentage of questions with matching answers, i.e.

len(matching_pairs)/len(questions)

precision: the percentage of questions in matching_pairs that are indeed matched with their corresponding answers, as indicated in the dataset.

Vary the similarity threshold from 0 to 0.6 in increments of 0.05, calculate the recall and precision in each round, and plot a chart with two lines, with recall and precision on the Y axis and the threshold on the X axis.
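The evaluation and threshold sweep described above might look like the sketch below. It rests on a few assumptions: the input list holds one tuple per question (so its length equals the number of questions), row i of “qa.txt” pairs question i with answer i (so a match is correct when the two indices are equal), and precision is reported as 0.0 when no pair passes the threshold. The helper names sweep_thresholds and plot_curves are hypothetical, and matplotlib is imported only inside the plotting helper.

```python
def evaluate(matches, min_sim):
    # matches: list of (question index, answer index, similarity) tuples,
    # assumed to contain exactly one entry per question.
    matching_pairs = [m for m in matches if m[2] >= min_sim]
    recall = len(matching_pairs) / len(matches)
    # Assumption: the correct answer to question i is answer i (same row in qa.txt).
    correct = sum(1 for q, a, _ in matching_pairs if q == a)
    # Convention chosen here: precision is 0.0 when nothing passes the threshold.
    precision = correct / len(matching_pairs) if matching_pairs else 0.0
    return recall, precision


def sweep_thresholds(matches, start=0.0, stop=0.6, step=0.05):
    # Build thresholds from integer multiples of the step, then round,
    # so floating-point drift does not shift values such as 0.3.
    n_steps = int(round((stop - start) / step))
    thresholds = [round(start + i * step, 10) for i in range(n_steps + 1)]
    recalls, precisions = [], []
    for t in thresholds:
        r, p = evaluate(matches, t)
        recalls.append(r)
        precisions.append(p)
    return thresholds, recalls, precisions


def plot_curves(thresholds, recalls, precisions):
    # Imported lazily so the metric functions work without matplotlib installed.
    import matplotlib.pyplot as plt

    plt.plot(thresholds, recalls, label="recall")
    plt.plot(thresholds, precisions, label="precision")
    plt.xlabel("similarity threshold (min_sim)")
    plt.ylabel("score")
    plt.legend()
    plt.show()
```

A typical call would be plot_curves(*sweep_thresholds(matches)), where matches is the list returned by match_question_answer.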

3. As the threshold increases, how do precision and recall change? What would be a good similarity threshold for retrieving the most relevant answers to these questions? Write down your analysis in a document along with your code.
