Wednesday, August 13, 2025

20 Interview Questions - From the 'Hundred-Page Machine Learning Book' - Ch.1


Prompt: Extract 10 simple and 10 complex questions from this pdf with detailed answers

Got it — I’ll pull 10 simple and 10 complex questions from the provided section (pages 7–13) and give detailed answers for each.


10 Simple Questions

1. What is Machine Learning according to the book?

Answer:
Machine learning is a subfield of computer science concerned with building algorithms that rely on examples of some phenomenon. It can also be defined as solving a practical problem by:

  1. Gathering a dataset, and

  2. Algorithmically building a statistical model based on that dataset.
    The model is then used to solve the practical problem.


2. What are the main types of learning in machine learning?

Answer:

  1. Supervised learning – learns from labeled data.

  2. Unsupervised learning – learns from unlabeled data.

  3. Semi-supervised learning – uses both labeled and unlabeled data.

  4. Reinforcement learning – learns through interactions with an environment, aiming to maximize rewards.


3. What is a feature vector?

Answer:
A feature vector is an ordered list of values (features) describing an example in the dataset. Each position corresponds to a specific attribute. Example: for a person, x(1) could be height, x(2) weight, and x(3) gender.
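As a minimal sketch of the person example above, the snippet below builds a three-element feature vector. The attribute order, the units, and the 0/1 coding for gender are illustrative assumptions, not something the book prescribes.

```python
from typing import List

# Hypothetical attribute order; position j always means the same attribute.
FEATURE_NAMES = ["height_cm", "weight_kg", "gender"]


def to_feature_vector(height_cm: float, weight_kg: float, gender: str) -> List[float]:
    """Encode one person as an ordered list of numeric features."""
    # Assumed coding for this sketch only: 1.0 for "female", 0.0 otherwise.
    gender_code = 1.0 if gender == "female" else 0.0
    return [height_cm, weight_kg, gender_code]


x = to_feature_vector(170.0, 65.0, "female")
print(dict(zip(FEATURE_NAMES, x)))
```

Every example in the dataset must use the same ordering, so that a given position means the same attribute for all examples.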


4. In supervised learning, what is the goal of the model?

Answer:
The goal is to take a feature vector x as input and output information (class label, probability, etc.) that allows deducing the correct label y.


5. What is clustering?

Answer:
Clustering is an unsupervised learning task where the model assigns each data point to a group (cluster) so that points in the same cluster are similar.
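One way to make this concrete is k-means, a common clustering algorithm that the answer above does not name and that is chosen here purely for illustration. The sketch assumes 2D points, Euclidean distance, and hand-picked starting centroids.

```python
from math import dist
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def assign(points: Sequence[Point], centroids: Sequence[Point]) -> List[int]:
    """Give each point the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
        for p in points
    ]


def update(points: Sequence[Point], labels: Sequence[int],
           centroids: Sequence[Point]) -> List[Point]:
    """Move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if not members:
            new_centroids.append(old)  # keep an empty cluster where it was
            continue
        mean_x = sum(p[0] for p in members) / len(members)
        mean_y = sum(p[1] for p in members) / len(members)
        new_centroids.append((mean_x, mean_y))
    return new_centroids


def kmeans(points: Sequence[Point], centroids: Sequence[Point],
           iterations: int = 10) -> Tuple[List[int], List[Point]]:
    """Alternate the assignment and update steps a fixed number of times."""
    centroids = list(centroids)
    for _ in range(iterations):
        labels = assign(points, centroids)
        centroids = update(points, labels, centroids)
    return assign(points, centroids), centroids


points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels)
```

No labels are supplied anywhere: the grouping comes only from distances between the feature vectors.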


6. What is a decision boundary?

Answer:
It’s the surface (line, plane, or hyperplane) in the feature space that separates different classes.


7. What is a policy in reinforcement learning?

Answer:
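A policy is a function that takes the feature vector of the environment's current state as input and outputs the action to execute in that state. The goal of reinforcement learning is to learn a policy whose actions maximize the expected reward.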


8. What is a margin in SVM?

Answer:
In SVM, the margin is the distance between the closest positive and negative examples, measured across the decision boundary. Maximizing the margin helps generalization.


9. What is the bag-of-words model in NLP?

Answer:
It’s a way to convert text into a feature vector by checking for the presence/absence (or frequency) of each word from a dictionary in the text.
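The following is a small sketch of that conversion. The four-word dictionary, the sample email, and the simple regex tokenizer are assumptions made for the example; a real dictionary would be far larger.

```python
import re
from typing import List


def bag_of_words(text: str, dictionary: List[str], binary: bool = True) -> List[int]:
    """Return one feature per dictionary word: presence (0/1) or count."""
    # Lowercase and split into simple word tokens (an assumed tokenizer).
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    counts = [tokens.count(word) for word in dictionary]
    return [int(c > 0) for c in counts] if binary else counts


dictionary = ["free", "money", "meeting", "offer"]
email = "Free offer! Claim your free money now."

presence = bag_of_words(email, dictionary)
frequency = bag_of_words(email, dictionary, binary=False)
print(presence)   # [1, 1, 0, 1]
print(frequency)  # [2, 1, 0, 1]
```

The vector length equals the dictionary size, so every text maps to a vector of the same dimensionality regardless of its length.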


10. What does the sign function do in SVM?

Answer:
It outputs +1 if its input is positive and –1 if the input is negative, allowing binary classification.
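A short sketch of how the sign function turns a linear score into a class label, using the form sign(wx - b) that matches the SVM constraint in this post. The weights, the bias, and the choice to return +1 for an input of exactly zero are assumptions for illustration; the definition above only covers positive and negative inputs.

```python
from typing import Sequence


def sign(z: float) -> int:
    """Return +1 for positive input and -1 for negative input."""
    # Assumption: an input of exactly zero is mapped to +1.
    return 1 if z >= 0 else -1


def predict(w: Sequence[float], x: Sequence[float], b: float) -> int:
    """Classify x with the linear model sign(w . x - b)."""
    score = sum(wj * xj for wj, xj in zip(w, x)) - b
    return sign(score)


# Hypothetical parameters, not learned from any data.
w, b = [1.0, -1.0], 0.5
label_a = predict(w, [2.0, 0.5], b)  # score = 2.0 - 0.5 - 0.5 = 1.0
label_b = predict(w, [0.0, 1.0], b)  # score = 0.0 - 1.0 - 0.5 = -1.5
print(label_a, label_b)
```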


10 Complex Questions

1. How does semi-supervised learning leverage unlabeled data to improve performance?

Answer:
Even though unlabeled data lacks target labels, it provides additional information about the underlying data distribution. A larger dataset (labeled + unlabeled) better approximates the probability distribution of the phenomenon, allowing the model to make more informed decisions.


2. How does SVM find the optimal decision boundary?

Answer:
SVM formulates an optimization problem:

$$\min \|w\| \quad \text{s.t.} \quad y_i(w x_i - b) \ge 1$$

Minimizing ||w|| maximizes the margin between the two classes, ensuring better generalization. The optimal w* and b* define the decision boundary.
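To illustrate why a smaller norm means a wider margin, the sketch below checks the constraint y_i(w x_i - b) >= 1 for a few hand-picked candidates and computes the margin width as 2 / ||w||, the distance between the hyperplanes wx - b = 1 and wx - b = -1. The toy dataset and the candidate parameters are assumptions; nothing here solves the optimization problem itself.

```python
from math import sqrt
from typing import List, Sequence, Tuple

Vector = Tuple[float, float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def is_feasible(w: Vector, b: float, X: Sequence[Vector], y: Sequence[int]) -> bool:
    """True if every example satisfies y_i * (w . x_i - b) >= 1."""
    return all(yi * (dot(w, xi) - b) >= 1 for xi, yi in zip(X, y))


def margin_width(w: Vector) -> float:
    """Distance between the hyperplanes w.x - b = 1 and w.x - b = -1."""
    return 2.0 / sqrt(dot(w, w))


# Toy data: negatives at x1 = 0, positives at x1 = 4.
X: List[Vector] = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]
y = [-1, -1, 1, 1]

# Hypothetical candidates (w, b), chosen by hand for the comparison.
candidates = [((1.0, 0.0), 2.0), ((0.5, 0.0), 1.0), ((0.25, 0.0), 0.5)]

feasible = [(w, b) for w, b in candidates if is_feasible(w, b, X, y)]
best_w, best_b = min(feasible, key=lambda wb: dot(wb[0], wb[0]))
print(best_w, best_b, margin_width(best_w))
```

The third candidate has the smallest norm but violates the constraint, so it is excluded. Among the feasible candidates, the one with the smaller norm has the wider margin.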


3. Why does maximizing the margin in SVM improve generalization?

Answer:
A larger margin places the decision boundary farther from the nearest training points, reducing the risk of misclassification for new, unseen examples. This is supported by the principle that simpler boundaries generalize better.


4. How can SVM handle non-linear boundaries?

Answer:
SVM can use kernels to implicitly map the data into a higher-dimensional space where a linear separation becomes possible, even if the data is not linearly separable in the original space.
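A toy sketch of the idea: four 1D points that no single threshold can separate become linearly separable after the mapping phi(x) = (x, x^2). The data, the mapping, the kernel, and the hand-picked boundary are all assumptions for illustration. The kernel function returns the same value as the inner product in the mapped space without constructing phi explicitly, which is the shortcut a kernelized SVM relies on.

```python
from typing import List, Tuple


def phi(x: float) -> Tuple[float, float]:
    """Explicit map from 1D to 2D: x -> (x, x^2)."""
    return (x, x * x)


def kernel(x: float, z: float) -> float:
    """Equals phi(x) . phi(z) without building phi: x*z + (x*z)^2."""
    return x * z + (x * z) ** 2


def predict(x: float) -> int:
    """Linear rule in the mapped space with hand-picked w = (0, 1), b = 2."""
    x1, x2 = phi(x)
    score = 0.0 * x1 + 1.0 * x2 - 2.0
    return 1 if score >= 0 else -1


# Positives on the outside, negatives in the middle:
# not separable by one threshold on x alone.
xs: List[float] = [-2.0, -0.5, 0.5, 2.0]
ys = [1, -1, -1, 1]

predictions = [predict(x) for x in xs]
print(predictions == ys)

# The kernel matches the explicit inner product in the mapped space.
a, b = phi(-2.0), phi(0.5)
print(kernel(-2.0, 0.5), a[0] * b[0] + a[1] * b[1])
```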


5. Why might perfect separation be impossible in some datasets?

Answer:
Due to noise, labeling errors, or outliers, the classes may overlap in feature space, making a perfect decision boundary impossible without misclassifying some points.


6. How is a real-world object transformed into a machine-readable feature vector in supervised learning?

Answer:
A data analyst chooses a representation (feature engineering). Example: In spam detection, an email can be transformed into a binary vector using the bag-of-words approach, where each position represents the presence or absence of a specific word.


7. What is the role of optimization in training machine learning models?

Answer:
Optimization algorithms search for parameters (e.g., w, b) that minimize a loss function subject to constraints. In SVM, the objective is to maximize the margin while classifying all training examples correctly.


8. What assumptions are made about training data to ensure a model generalizes well?

Answer:

  • Training examples are randomly sampled.

  • They are independent of each other.

  • They follow the same distribution as the test data.


9. What is PAC learning and why is it important?

Answer:
PAC (Probably Approximately Correct) learning is a framework that studies under what conditions an algorithm will produce a model that is probably close to the correct classifier, given finite data. It relates model complexity, dataset size, and expected error.


10. How does feature dimensionality affect SVM performance?

Answer:
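With many features relative to the number of examples, the data is sparse in feature space, so a separating hyperplane is easier to find. That same sparsity, one aspect of the curse of dimensionality, raises the risk of overfitting, especially when the number of features greatly exceeds the number of examples. Regularization and feature selection help control this.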



Tags: Interview Preparation, Machine Learning, Technology
