Business Finance Homework Help
Midterm exam in Business Intelligence and Analytics, multiple choice.
Please answer these questions within the three-hour window, before 11:00. See the questions attached in the Word file.
In using a classification (aka decision) tree…
each data instance will correspond to one and only one path in the tree.
we are relying on deduction rather than induction.
the segmentation is unsupervised.
parent nodes may share descendants.
If the True Positive Rate for a classification model is .75, which of the following must be true?
The True Negative Rate is .25
The True Negative Rate is .75
The True Positive Count is greater than the False Negative Count
The True Positive Count is greater than the False Positive Count
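For a quick check of the arithmetic behind this question: the True Positive Rate is TP / (TP + FN), so a rate of .75 forces TP = 3 × FN. The counts below are hypothetical, chosen only to illustrate the ratio.

```python
# True Positive Rate (recall): the fraction of actual positives
# that the model classifies as positive.
def true_positive_rate(tp, fn):
    return tp / (tp + fn)

# Hypothetical counts giving TPR = 0.75. Any such counts satisfy
# TP = 3 * FN, so the true positive count exceeds the false negative count.
tp, fn = 75, 25
print(true_positive_rate(tp, fn))  # 0.75
```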
Question 3 (5 pts)
Find the BEST matching of the items below.
Entropy:
Choose
a- Numeric target
b- Maximum margin
c- log odds
d- difference between parents and children
e- How mixed up classes are
Logistic regression:
Choose
a- Numeric target
b- Maximum margin
c- log odds
d- difference between parents and children
e- How mixed up classes are
Information Gain:
Choose
a- Numeric target
b- Maximum margin
c- log odds
d- difference between parents and children
e- How mixed up classes are
Regression:
Choose
a- Numeric target
b- Maximum margin
c- log odds
d- difference between parents and children
e- How mixed up classes are
SVMs:
Choose
a- Numeric target
b- Maximum margin
c- log odds
d- difference between parents and children
e- How mixed up classes are
The decision boundaries associated with a classification tree will always be perpendicular to an axis of the instance space.
True
False
The Laplace correction, a formula for smoothing the frequency-based estimate, can help with probability estimation for leaves with a large number of instances.
True
False
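For reference, the Laplace correction mentioned above can be sketched for the binary-class case as (n⁺ + 1) / (n + 2), a standard formulation; the function name below is illustrative. Note how the smoothing matters mainly for leaves with few instances.

```python
def laplace_estimate(n_positive, n_total, n_classes=2):
    """Laplace-smoothed probability that a leaf's instances are positive."""
    return (n_positive + 1) / (n_total + n_classes)

# A 2-instance, all-positive leaf: the raw frequency estimate is 1.0,
# but the smoothed estimate is 0.75 -- small leaves get pulled toward 0.5.
print(laplace_estimate(2, 2))               # 0.75
# A 100-instance, all-positive leaf barely moves (about 0.99).
print(round(laplace_estimate(100, 100), 2))
```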
When ranking attributes for use at a node in a classification tree…
each attribute is always evaluated on the entire set of instances.
the attributes are always ranked the same, no matter where in the tree they are being considered.
the rankings are based on how many instances end up in each “child” node.
the ranking depends on the splits above the node in the tree.
You would like to build a model for predicting defaults on student loans. You are given a large number of categorical attributes of each loan, such as the type of school that a student will attend, the state where it is located, etc., as well as numerical attributes such as outstanding loan amount, student’s age, loan interest rate, and so on. Your client asks that your model must provide a clear explanation of the reason for its predictions, since the final judgment on whether to give a loan or not will be made by a human agent. What data mining technique would you suggest using?
Logistic regression
Decision tree
Support vector machine
Artificial neural network
All of the following are true about linear discriminant functions EXCEPT
The function is a weighted sum of the values of the attributes.
The function will always divide the data into mutually exclusive groups associated with the target variable.
The data mining will determine the weights of the function for the best fit to the data.
Linear discriminant functions are a form of supervised segmentation.
Match the definitions of these two-class classification problem confusion matrix entries.
True positive count
Choose
a- Actually negative; classified as positive
b- Classified as negative; actually positive
c- Actually negative; classified as negative
d- Classified as positive; actually positive
True negative count
Choose
a- Actually negative; classified as positive
b- Classified as negative; actually positive
c- Actually negative; classified as negative
d- Classified as positive; actually positive
False positive count
Choose
a- Actually negative; classified as positive
b- Classified as negative; actually positive
c- Actually negative; classified as negative
d- Classified as positive; actually positive
False negative count
Choose
a- Actually negative; classified as positive
b- Classified as negative; actually positive
c- Actually negative; classified as negative
d- Classified as positive; actually positive
Question 10 (2 pts)
A confusion matrix is used to evaluate a diagnostic model for a binary disease classifier. Out of 165 patients, the model predicted “yes” 50 times and “no” 115 times. In reality, 55 patients have the disease and 110 do not. There are 40 true positives. What is the Accuracy of this model?
80%
85%
73%
91%
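The counts given in the question determine the full confusion matrix, so the accuracy can be checked directly. A sketch of the arithmetic:

```python
total, pred_yes, actual_yes, tp = 165, 50, 55, 40

fp = pred_yes - tp           # predicted yes but actually no: 10
fn = actual_yes - tp         # actually yes but predicted no: 15
tn = total - tp - fp - fn    # correctly predicted no: 100

accuracy = (tp + tn) / total
print(round(accuracy, 3))    # 0.848, i.e. about 85%
```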
Question 11 (2 pts)
In order to train a clustering model, you do not need labelled data.
True
False
Question 12 (2 pts)
Which of the following statements is true regarding the use of parametric modeling in data mining?
The data is used to specify both the form of the model and the values of the parameters.
The decision boundaries resulting from parametric models are axis-parallel.
Parametric models should be evaluated both on predictive performance and understandability.
All parametric models are in the form of a linear function.
Question 13 (2 pts)
Which of the following statements best explains the reason for using holdout data?
Holdout data provides an assessment of how well a model generalizes to unseen cases.
Holdout data improves the base rate performance of a model.
Accuracy on the holdout data correlates with accuracy on the training data.
Holdout data is considered “unlabeled,” since the target variable values are unknown, and the model can be used to predict these target values.
Question 14 (2 pts)
Techniques of model regularization include all of the following EXCEPT
Tree pruning
Cross-validation
Employing complexity penalties
Feature selection
Question 15 (2 pts)
Problems associated with increasing model complexity include…
more overfitting
inability to use hinge loss
the need for more test data
All of the above
Question 16 (4 pts)
Match these overfitting tools and concepts with their primary characteristics.
Cross-validation
Choose
a- controlling complexity
b- k-folds
c- changes in model complexity
d- changes in amount of training data
Learning curve
Choose
a- controlling complexity
b- k-folds
c- changes in model complexity
d- changes in amount of training data
Fitting graph
Choose
a- controlling complexity
b- k-folds
c- changes in model complexity
d- changes in amount of training data
Regularization
Choose
a- controlling complexity
b- k-folds
c- changes in model complexity
d- changes in amount of training data
Question 17 (2 pts)
Consider this picture from the DS text.
Would splitting on head shape result in the same information gain as splitting on body color?
Yes
No
Question 18 (2 pts)
Except for the root node, features in a classification tree are not evaluated on the entire set of instances.
True
False
Question 19 (2 pts)
It is usually easy to determine in advance whether the linear decision boundaries of a tree induction model or a linear classifier will be a better fit for a particular data set.
True
False
Question 20 (2 pts)
All model types can be overfit, but tree induction models are the least susceptible to overfitting.
True
False
Question 21 (2 pts)
The convention in representing data for data mining has the ________________ in rows and the ________________ in columns.
features; observations
instances; attributes
predictors; targets
variables; examples
Question 22 (2 pts)
The logistic regression model is more prone to overfitting to accommodate outliers than the support vector machine model.
True
False
Question 23 (2 pts)
Which of the following is true about the hinge loss function, used by support vector machines?
It provides the same penalty values as a zero-one loss function.
An example that is on the wrong side of the margin incurs no penalty.
It only becomes positive when an example is on the wrong side of the boundary and beyond the margin.
Penalties increase exponentially with the example’s distance from the margin.
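In the standard formulation, hinge loss is max(0, 1 − y·f(x)) for a label y ∈ {−1, +1} and classifier score f(x). A sketch of how the penalty behaves at various distances from the boundary:

```python
def hinge_loss(y, score):
    """y in {-1, +1}; score is the signed output f(x) of the classifier."""
    return max(0.0, 1.0 - y * score)

print(hinge_loss(+1, 2.0))   # 0.0: correct side, at or beyond the margin
print(hinge_loss(+1, 0.5))   # 0.5: correct side but inside the margin
print(hinge_loss(+1, -1.0))  # 2.0: wrong side; penalty grows linearly
```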
Question 24 (2 pts)
Usually, model generalizability ____________ with the amount of training data.
increases
decreases
stays the same
The amount of training data does not affect the generalizability of a model.
Question 25 (2 pts)
All of the following statements about training data and test data are true EXCEPT
As model complexity increases, model accuracy increases on training data, but increases, then decreases, on test data.
Model generalizability can be compromised if the training data and test data do not match the field data to which the model will be applied.
Cross-validation exchanges test and training data in a systematic procedure designed to guard against overfitting.
Test data should generally have more attributes than training data.
Question 26 (2 pts)
In data mining, prediction is always associated with forecasting a future event.
True
False
Question 27 (2 pts)
Which TWO of the following statements about classification models are true?
A classification tree is a logical classification model.
A linear discriminant function is a logical classification model.
A classification tree is a numeric classification model.
A linear discriminant function is a numeric classification model.
Question 28 (2 pts)
All of the following are true about the use of data mining results EXCEPT
Training data have all class values specified
The deployed model is built using new data instances
A deployed model can predict both the class value and the probability of belonging to that class
There is a difference between mining data to find patterns and build models, and using the results of data mining
Question 29 (2 pts)
For a cross-validation procedure that splits the data into k folds…
k different results can be used to compute the average accuracy and its variance.
an equal amount of testing and training data is used in each iteration.
training and testing are iterated k-1 times.
All of the above.
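The mechanics of a k-fold split can be sketched in plain Python: each fold serves as the test set exactly once, the remaining k − 1 folds form the training set, and the k resulting scores give an average accuracy and a variance.

```python
def k_fold_indices(n, k):
    """Partition indices 0..n-1 into k disjoint test folds."""
    return [list(range(i, n, k)) for i in range(k)]

n, k = 10, 5
folds = k_fold_indices(n, k)
for test_idx in folds:
    train_idx = [i for i in range(n) if i not in test_idx]
    # ...fit a model on train_idx, score it on test_idx.
    # After k iterations there are k accuracy figures to
    # average, plus a variance across folds.
```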
Question 30 (2 pts)
A classification/decision tree is equivalent to a rule set for scoring new instances of unseen data.
True
False
Question 31 (2 pts)
The flexibility of tree induction to represent nonlinear relationships between the features and the target can be an advantage with larger training sets.
True
False
Question 32 (2 pts)
Which of these can be used to justify investing (or not) in additional training data?
Regularization
Fitting graph
Cross-validation
Learning curve
Question 33 (2 pts)
For any probability, the log-odds function will produce a value between zero and positive infinity.
True
False
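For reference, the log-odds (logit) function is ln(p / (1 − p)). A quick sketch of its sign at a few probabilities:

```python
import math

def log_odds(p):
    """Log-odds (logit) of a probability 0 < p < 1."""
    return math.log(p / (1 - p))

print(round(log_odds(0.25), 3))  # -1.099: negative for p < 0.5
print(log_odds(0.5))             #  0.0
print(round(log_odds(0.75), 3))  #  1.099: positive for p > 0.5
```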
Question 34 (2 pts)
The _____________ in a fitting graph represents the desired balance of complexity and generalizability for a particular set of data.
accuracy
sweet spot
training data performance
holdout data performance
Question 35 (2 pts)
All of the following are true about overfitting and generalization EXCEPT
Evaluation on training data provides no assessment of how well a model will generalize to unseen cases.
We can estimate generalization performance by comparing predicted values on holdout data to their known true values.
Generally speaking, there will be greater generalization the more complex a model is.
Cross-validation can provide some simple statistics on the generalization performance of a model.
Question 36 (2 pts)
The entropy of the parent data set in the tennis example is approximately
.89
.94
.97
1
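In the play-tennis data set commonly used for this example, the parent node holds 9 "yes" and 5 "no" instances (an assumption here; check your version of the example), so the entropy can be checked directly:

```python
import math

def entropy(counts):
    """Shannon entropy (base 2) of a list of class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)

# Parent set: 9 "yes" vs 5 "no" instances.
print(round(entropy([9, 5]), 2))  # 0.94
```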
Question 37 (2 pts)
In the tennis classification example, all of the following are true EXCEPT
Outlook is the attribute that provides the most information about the target value.
An overcast outlook results in a pure leaf node.
Humidity is the second split because it has the second highest information gain at the top split.
All the leaf nodes at the bottom of the tree have zero entropy.
Course Feedback
These questions are intended to help me gather some data and get some feedback on your experience with the course so far. You will receive one point for each question answered in this part. THERE ARE NO WRONG ANSWERS.
Question 38 (1 pt)
Which of these course features has contributed MOST to your learning so far?
(You may check more than one.)
Textbook reading
Weekly narrated lecture
Additional activities
Discussion forums
Live sessions
Homework
Question 39 (1 pt)
Which of these course features has contributed LEAST to your learning so far?
(You may check more than one.)
Textbook reading
Weekly narrated lecture
Additional activities
Discussion forums
Live sessions
Homework
Question 40 (1 pt)
Compared to your expectations regarding the technical difficulty of this class, do you find the material…
too light on technical details
about right on technical details
too technically complex
Question 41 (1 pt)
Compared with the time estimates included in each Module, the amount of time you spend on this course is…
more than the time estimates, on average.
less than the time estimates, on average.
about the same as the time estimates, on average.
Question 42 (1 pt)
Please suggest ONE way in which the course could be improved in the second half of the semester.