🧪 Machine Learning (ML) MCQ Quiz Hub
Machine Learning (ML) MCQ Set 02
Choose a topic to test your knowledge and improve your Machine Learning (ML) skills
1. What are support vectors?
all the examples that have a non-zero weight αk in an svm
the only examples necessary to compute f(x) in an svm.
all of the above
none of the above
2. A perceptron adds up all the weighted inputs it receives, and if it exceeds a certain value, it outputs a 1, otherwise it just outputs a 0.
true
false
sometimes – it can also output intermediate values as well
can’t say
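The thresholded-sum rule described in question 2 can be sketched in a few lines. This is a minimal illustration; the weights, inputs, and threshold values are made up for the example.

```python
def perceptron_output(inputs, weights, threshold):
    """Return 1 if the weighted sum of inputs exceeds the threshold, else 0."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total > threshold else 0

print(perceptron_output([1.0, 0.5], [0.6, 0.4], 0.7))  # 0.8 > 0.7, so outputs 1
print(perceptron_output([1.0, 0.5], [0.2, 0.4], 0.7))  # 0.4 <= 0.7, so outputs 0
```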
3. What is the purpose of the Kernel Trick?
to transform the data from nonlinearly separable to linearly separable
to transform the problem from regression to classification
to transform the problem from supervised to unsupervised learning
all of the above
4. Which of the following can only be used when training data are linearly separable?
linear hard-margin svm
linear logistic regression
linear soft margin svm
parzen windows
5. The firing rate of a neuron
determines how strongly the dendrites of the neuron stimulate axons of neighboring neurons
is more analogous to the output of a unit in a neural net than the output voltage of the neuron
only changes very slowly, taking a period of several seconds to make large adjustments
can sometimes exceed 30,000 action potentials per second
6. Which of the following evaluation metrics cannot be applied to compare logistic regression output with the target?
auc-roc
accuracy
mean-squared-error
7. The cost parameter in the SVM means:
the number of cross-validations to be made
the kernel to be used
the tradeoff between misclassification and simplicity of the model
none of the above
8. The kernel trick
can be applied to every classification algorithm
is commonly used for dimensionality reduction
changes ridge regression so we solve a d × d linear system instead of an n × n system, given n sample points with d features
exploits the fact that in many learning algorithms, the weights can be written as a linear combination of input points
9. How does the bias-variance decomposition of a ridge regression estimator compare with that of ordinary least squares regression?
ridge has larger bias, larger variance
ridge has smaller bias, larger variance
ridge has larger bias, smaller variance
ridge has smaller bias, smaller variance
10. Which of the following are real world applications of the SVM?
text and hypertext categorization
image classification
clustering of news articles
all of the above
11. How can SVM be classified?
it is a model trained using unsupervised learning. it can be used for classification and regression.
it is a model trained using unsupervised learning. it can be used for classification but not for regression.
it is a model trained using supervised learning. it can be used for classification and regression.
it is a model trained using unsupervised learning. it can be used for classification but not for regression.
12. Which of the following can help to reduce overfitting in an SVM classifier?
use of slack variables
high-degree polynomial features
normalizing the data
setting a very low learning rate
13. Suppose you have trained an SVM with a linear decision boundary, and after training you correctly infer that your SVM model is underfitting. Which of the following options would you be more likely to consider when iterating on the SVM next time?
you want to increase your data points
you want to decrease your data points
you will try to calculate more variables
you will try to reduce the features
14. What is/are true about kernels in SVM? 1. Kernel functions map low-dimensional data to a high-dimensional space. 2. A kernel is a similarity function.
1
2
1 and 2
none of these
15. You trained a binary classifier model which gives very high accuracy on the training data, but much lower accuracy on validation data. Which of the following is false?
this is an instance of overfitting
this is an instance of underfitting
the training was not well regularized
the training and testing examples are sampled from different distributions
16. Suppose your model is demonstrating high variance across the different training sets. Which of the following is NOT a valid way to try and reduce the variance?
increase the amount of training data in each training set
improve the optimization algorithm being used for error minimization.
decrease the model complexity
reduce the noise in the training data
17. Suppose you are using RBF kernel in SVM with high Gamma value. What does this signify?
the model would consider even far away points from hyperplane for modeling
the model would consider only the points close to the hyperplane for modeling
the model would not be affected by distance of points from hyperplane for modeling
none of the above
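The effect asked about in question 17 follows from the RBF kernel formula k(x, z) = exp(-γ · ||x − z||²): with a high gamma, similarity decays so fast with distance that only points near the boundary influence the model. A minimal sketch, with made-up point coordinates and gamma values:

```python
import math

def rbf_kernel(x, z, gamma):
    """RBF (Gaussian) kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

x, far_point = [0.0, 0.0], [2.0, 0.0]        # squared distance = 4
print(rbf_kernel(x, far_point, gamma=0.1))   # ~0.67: far point still influential
print(rbf_kernel(x, far_point, gamma=10.0))  # ~4e-18: far point effectively ignored
```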
18. We usually use feature normalization before using the Gaussian kernel in SVM. What is true about feature normalization? 1. We do feature normalization so that the new feature will dominate other features. 2. Sometimes, feature normalization is not feasible in case of categorical variables. 3. Feature normalization always helps when we use a Gaussian kernel in SVM.
1
1 and 2
1 and 3
2 and 3
19. Wrapper methods are hyper-parameter selection methods that
should be used whenever possible because they are computationally efficient
should be avoided unless there are no other options because they are always prone to overfitting.
are useful mainly when the learning machines are “black boxes”
should be avoided altogether.
20. Which of the following methods cannot achieve zero training error on any linearly separable dataset?
decision tree
15-nearest neighbors
hard-margin svm
perceptron
21. Suppose we train a hard-margin linear SVM on n > 100 data points in R2, yielding a hyperplane with exactly 2 support vectors. If we add one more data point and retrain the classifier, what is the maximum possible number of support vectors for the new hyperplane (assuming the n + 1 points are linearly separable)?
1
2
n
n+1
22. Let S1 and S2 be the set of support vectors and w1 and w2 be the learnt weight vectors for a linearly separable problem using hard and soft margin linear SVMs respectively. Which of the following are correct?
s1 ⊂ s2
s1 may not be a subset of s2
w1 = w2
all of the above
23. Which statement about outliers is true?
outliers should be part of the training dataset but should not be present in the test data
outliers should be identified and removed from a dataset
the nature of the problem determines how outliers are used
outliers should be part of the test dataset but should not be present in the training data
24. If TP = 9, FP = 6, FN = 26, and TN = 70, then the error rate will be
45 percentage
99 percentage
28 percentage
20 percentage
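Question 24 is plain arithmetic on the given counts: the error rate is the fraction of misclassified examples, (FP + FN) / total.

```python
# Worked arithmetic for question 24, using the counts given there.
TP, FP, FN, TN = 9, 6, 26, 70
total = TP + FP + FN + TN        # 111
error_rate = (FP + FN) / total   # misclassified fraction: 32 / 111
print(f"{error_rate:.1%}")       # 28.8%, i.e. roughly 28 percent
```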
25. Imagine you are solving a classification problem with a highly imbalanced class. The majority class is observed 99% of the time in the training data. Your model has 99% accuracy after taking predictions on the test data. Which of the following is true in such a case? 1. The accuracy metric is not a good idea for imbalanced class problems. 2. The accuracy metric is a good idea for imbalanced class problems. 3. Precision and recall metrics are good for imbalanced class problems. 4. Precision and recall metrics aren't good for imbalanced class problems.
1 and 3
1 and 4
2 and 3
2 and 4
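The scenario in question 25 can be reproduced with synthetic labels: a classifier that always predicts the majority class reaches 99% accuracy on a 99%-imbalanced dataset while learning nothing. The labels below are made-up illustration data.

```python
labels = [0] * 99 + [1]   # 99% majority class, 1% minority class
predictions = [0] * 100   # degenerate model: always predict the majority class
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
print(accuracy)           # 0.99
# Recall for the minority class is 0 here, which accuracy alone hides.
```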
26. The minimum time complexity for training an SVM is O(n²). According to this fact, what sizes of datasets are not best suited for SVMs?
large datasets
small datasets
medium sized datasets
size does not matter
27. Perceptron Classifier is
unsupervised learning algorithm
semi-supervised learning algorithm
supervised learning algorithm
soft margin classifier
28. Type of dataset available in Supervised Learning is
unlabeled dataset
labeled dataset
csv file
excel file
29. Which among the following is the most appropriate kernel that can be used with SVM to separate the classes?
linear kernel
gaussian rbf kernel
polynomial kernel
option 1 and option 3
30. The SVMs are less effective when
the data is linearly separable
the data is clean and ready to use
the data is noisy and contains overlapping points
option 1 and option 2
31. What is the precision value for the following confusion matrix of binary classification?
0.91
0.09
0.9
0.95
32. Which of the following are components of generalization Error?
bias
variance
both of them
none of them
33. Which of the following is not a kernel method in SVM?
linear kernel
polynomial kernel
rbf kernel
nonlinear kernel
34. During the treatment of cancer patients, the doctor needs to be very careful about which patients need to be given chemotherapy. Which metric should we use in order to decide which patients should be given chemotherapy?
precision
recall
call
score
35. Which one of the following is suitable? 1. When the hypothesis space is richer, overfitting is more likely. 2. When the feature space is larger, overfitting is more likely.
true, false
false, true
true, true
false, false
36. Which of the following is a categorical data?
branch of bank
expenditure in rupees
price of house
weight of a person
37. The soft margin SVM is preferred over the hard-margin SVM when:
the data is linearly separable
the data is noisy and contains overlapping points
the data is not noisy and linearly separable
weight of a person
38. Consider an SVM with a quadratic kernel (polynomial of degree 2) that has the slack penalty C as one hyperparameter. What would happen if we use a very large value for C?
we can still classify the data correctly for the given setting of hyperparameter c
we cannot classify the data correctly for the given setting of hyperparameter c
we cannot classify the data at all
the data can be classified correctly without any impact of c
39. In SVM, an RBF kernel with appropriate parameters is used to perform binary classification where the data is non-linearly separable. In this scenario:
the decision boundary in the transformed feature space is non-linear
the decision boundary in the transformed feature space is linear
the decision boundary in the original feature space is not considered
the decision boundary in the original feature space is linear
40. Which of the following is true about SVM? 1. Kernel functions map low-dimensional data to a high-dimensional space. 2. A kernel is a similarity function.
1 is true, 2 is false
1 is false, 2 is true
1 is true, 2 is true
1 is false, 2 is false
41. What is the accuracy based on the following confusion matrix of three-class classification? Confusion matrix C = [14 0 0; 1 15 0; 0 0 6]
0.75
0.97
0.95
0.85
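For a multiclass confusion matrix like the one in question 41, accuracy is the sum of the diagonal (correct predictions) divided by the total number of examples.

```python
# Worked arithmetic for question 41, using the matrix given there.
C = [[14, 0, 0],
     [1, 15, 0],
     [0, 0, 6]]
correct = sum(C[i][i] for i in range(3))  # diagonal sum: 35
total = sum(sum(row) for row in C)        # all entries: 36
print(round(correct / total, 2))          # 0.97
```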
42. Which of the following method is used for multiclass classification?
one vs rest
loocv
all vs one
one vs another
43. Based on a survey, it was found that the probability that a person likes to watch serials is 0.25 and the probability that a person likes to watch Netflix series is 0.43. Also, the probability that a person likes to watch both serials and Netflix series is 0.12. What is the probability that a person doesn't like to watch either?
0.32
0.2
0.44
0.56
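Question 43 is an inclusion-exclusion calculation: P(neither) = 1 − P(A or B) = 1 − (P(A) + P(B) − P(A and B)).

```python
# Worked arithmetic for question 43, using the probabilities given there.
p_serials, p_netflix, p_both = 0.25, 0.43, 0.12
p_either = p_serials + p_netflix - p_both  # inclusion-exclusion: 0.56
p_neither = 1 - p_either
print(round(p_neither, 2))                 # 0.44
```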
44. A machine learning problem involves four attributes plus a class. The attributes have 3, 2, 2, and 2 possible values each. The class has 3 possible values. How many maximum possible different examples are there?
12
24
48
72
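Question 44 is a counting exercise: the maximum number of distinct examples is the product of the attribute cardinalities times the number of class values.

```python
# Worked arithmetic for question 44, using the cardinalities given there.
attribute_values = [3, 2, 2, 2]  # possible values per attribute
class_values = 3                 # possible class values
total = class_values
for v in attribute_values:
    total *= v
print(total)  # 3 * 3 * 2 * 2 * 2 = 72
```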
45. MLE estimates are often undesirable because
they are biased
they have high variance
they are not consistent estimators
none of the above
46. The difference between the actual Y value and the predicted Y value found using a regression equation is called the
slope
residual
outlier
scatter plot
47. Neural networks
optimize a convex cost function
always output values between 0 and 1
can be used for regression as well as classification
all of the above
48. Linear Regression is a _______ machine learning algorithm.
supervised
unsupervised
semi-supervised
can’t say
49. Which of the following methods do we use to find the best fit line for data in Linear Regression?
least square error
maximum likelihood
logarithmic loss
both a and b
50. Which of the following methods do we use to best fit the data in Logistic Regression?
least square error
maximum likelihood
jaccard distance
both a and b