Friday, March 27, 2026
Supervised Machine Learning - Regression and Classification (at DeepLearning.ai)
Friday, November 7, 2025
YouTube Academy For Machine Learning
What is Machine Learning?
- Google for Developers
- Google Open Online Education
- TensorFlow
- Google for Developers

What is On-device Machine Learning?
- Google for Developers

Supervised Machine Learning
- Google Open Online Education

Types of Machine Learning
- Python Simplified
- IBM Technology

Generalization
- Google for Developers

Linear Regression
- Khan Academy

Supervised Learning
- Google for Developers

Logistic Regression
- Google for Developers
- Gate Smashers
- 5 Minutes Engineering
- Krish Naik

Decision Tree
- Intuitive Machine Learning

Support Vector Machines
- Intuitive Machine Learning
- TileStats
- Visually Explained

Gradient Descent
- Google for Developers

Neural Networks
- IBM Technology
- TileStats
- 3Blue1Brown
- IBM Technology
- IBM Technology
- IBM Technology

Machine Learning Courses
- Google for Developers
- Google for Developers
- Google for Developers
- IBM Technology
- Stanford Online
Tuesday, August 19, 2025
20 Interview Questions - From the 'Hundred-Page Machine Learning Book' - Ch.2
Prompt: Extract 10 simple and 10 complex questions from this pdf with detailed answers
Here’s a set of 10 simple and 10 complex questions with detailed answers from the chapter you provided (pp. 29–42 of The Hundred-Page Machine Learning Book).
I’ve made sure the simple ones check basic understanding, while the complex ones require reasoning, synthesis, or application.
Simple Questions (with Answers)
1. What is the main difference between Linear Regression and SVM in terms of their objective?
Answer:
In Linear Regression, the model’s hyperplane is chosen to be as close as possible to all training examples, minimizing the average squared error. In SVM, the hyperplane serves as a decision boundary that separates classes and is placed to maximize the margin between them.
2. What is the loss function used in Linear Regression?
Answer:
Linear Regression uses squared error loss, which is (f(x_i) − y_i)² for each example, averaged across the dataset.
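As a quick illustration, the averaged squared error can be computed directly for a toy one-feature linear model f(x) = w·x + b; the weights and data below are made up for the example:

```python
def mse(w, b, xs, ys):
    """Average of (f(x_i) - y_i)^2 over the dataset, with f(x) = w*x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # generated by y = 2x, so w=2, b=0 fits perfectly
print(mse(2.0, 0.0, xs, ys))  # perfect fit -> 0.0
print(mse(1.0, 0.0, xs, ys))  # underestimating slope -> (1 + 4 + 9) / 3
```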
3. Why is squared loss preferred over absolute loss in Linear Regression?
Answer:
Squared loss has a continuous derivative, making it smoother and easier to optimize using linear algebra for closed-form solutions, unlike absolute loss, which is not smooth.
4. Is Logistic Regression used for regression tasks?
Answer:
No. Despite the name, Logistic Regression is a classification algorithm, typically for binary classification, not regression.
5. What function is used in Logistic Regression to map outputs between 0 and 1?
Answer:
The sigmoid (logistic) function: f(x) = 1 / (1 + e^(−x)).
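A minimal sketch of the sigmoid, using only the standard library, to show it squashes any real input into (0, 1):

```python
import math

def sigmoid(x):
    """Logistic function: maps any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

print(sigmoid(0))    # 0.5 (the midpoint)
print(sigmoid(10))   # close to 1
print(sigmoid(-10))  # close to 0
```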
6. What is the main optimization criterion for Logistic Regression?
Answer:
It uses Maximum Likelihood Estimation (MLE), maximizing the likelihood of the observed data under the model.
7. What does the ID3 algorithm use to determine the best split?
Answer:
ID3 uses entropy to measure uncertainty and selects the feature/threshold that minimizes the weighted average entropy after the split.
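The entropy measure ID3 relies on can be sketched in a few lines; the toy label lists below are illustrative:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

print(entropy([0, 0, 1, 1]))  # maximally mixed 50/50 -> 1.0
print(entropy([0, 0, 0, 0]))  # pure node -> 0.0 (no uncertainty)
```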
8. What is the difference between hard-margin and soft-margin SVM?
Answer:
Hard-margin SVM requires perfect separation of data without errors. Soft-margin SVM allows some misclassifications using hinge loss and a regularization parameter C to balance margin size and classification errors.
9. What is the kernel trick in SVM?
Answer:
It’s a method to compute dot products in a higher-dimensional feature space without explicitly transforming the data, using a kernel function.
10. What does the parameter k represent in k-Nearest Neighbors (kNN)?
Answer:
It represents the number of nearest neighbors considered when predicting the label for a new example.
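A bare-bones kNN prediction, assuming Euclidean distance and a majority vote among the k closest points; the training data is made up for the example:

```python
import math
from collections import Counter

def knn_predict(k, train, query):
    """Majority vote among the k nearest training points (Euclidean distance).
    `train` is a list of ((features...), label) pairs."""
    by_dist = sorted(train, key=lambda p: math.dist(p[0], query))
    votes = Counter(label for _, label in by_dist[:k])
    return votes.most_common(1)[0][0]

train = [((0, 0), 'a'), ((0, 1), 'a'), ((5, 5), 'b'), ((6, 5), 'b')]
print(knn_predict(3, train, (1, 1)))  # two 'a' neighbors outvote one 'b' -> 'a'
```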
Complex Questions (with Answers)
1. Explain why overfitting can occur with high-degree polynomial regression, using the concepts from the text.
Answer:
High-degree polynomial regression can create a curve that fits the training data almost perfectly, capturing noise and outliers rather than the underlying pattern. This leads to poor generalization on unseen data, as shown in Fig. 2 of the text. The curve follows training points too closely, increasing variance and overfitting.
2. Why does Logistic Regression use log-likelihood instead of raw likelihood for optimization?
Answer:
Log-likelihood simplifies the product of probabilities into a sum (via logarithms), making it easier to compute and differentiate. Since the log function is monotonically increasing, maximizing log-likelihood yields the same result as maximizing likelihood but is more numerically stable and computationally convenient.
3. How does the choice of the hyperparameter C in SVM affect bias and variance?
Answer:
- High C: Focuses on minimizing classification errors, leading to low bias but high variance (risk of overfitting).
- Low C: Allows more misclassifications for a larger margin, increasing bias but reducing variance (better generalization).
4. Describe the process of finding the best split in the ID3 decision tree algorithm.
Answer:
For each feature and possible threshold, ID3:
- Splits the dataset into two subsets.
- Computes the weighted average entropy of the subsets.
- Selects the split (feature + threshold) with the lowest weighted entropy.

This process is repeated recursively until stopping criteria (e.g., pure nodes, max depth) are met.
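The split-search step can be sketched for a single numeric feature; the threshold candidates and toy data below are illustrative:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(xs, ys):
    """Scan each candidate threshold on one numeric feature and return the
    (threshold, weighted_entropy) pair minimizing weighted average entropy."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        if not left or not right:
            continue  # a split must produce two non-empty subsets
        w = (len(left) * entropy(left) + len(right) * entropy(right)) / len(ys)
        if best is None or w < best[1]:
            best = (t, w)
    return best

xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = [0, 0, 0, 1, 1, 1]
print(best_split(xs, ys))  # threshold 10.0 separates the classes perfectly
```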
5. Why is the kernel trick computationally advantageous compared to explicit transformation?
Answer:
Explicit transformation to high-dimensional space is computationally expensive and memory-intensive. The kernel trick computes the dot product in that space directly from the original feature vectors, avoiding the explicit mapping and thus saving time and resources.
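A small check of this equivalence for the quadratic kernel (x·z)² in 2-D, where the explicit map φ(x) = (x₁², √2·x₁x₂, x₂²) is still small enough to write out; the vectors are made up:

```python
import math

def explicit_map(v):
    """Explicit degree-2 feature map for a 2-D vector: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = v
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def poly_kernel(x, z):
    """Quadratic kernel: the same dot product, computed without the mapping."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2

x, z = (1.0, 2.0), (3.0, 0.5)
mapped = sum(a * b for a, b in zip(explicit_map(x), explicit_map(z)))
print(mapped, poly_kernel(x, z))  # both paths give the same value
```

For a real SVM the mapped space can be enormous (infinite-dimensional for RBF), so only the kernel path is feasible.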
6. Explain the role of hinge loss in soft-margin SVM.
Answer:
Hinge loss, defined as max(0, 1 − y_i(w·x_i − b)), penalizes points on the wrong side of the margin or decision boundary. It allows the SVM to tolerate some violations (misclassifications) while still aiming to maximize the margin, balancing between training accuracy and generalization.
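A direct translation of the hinge-loss formula; the weights and example points are made up:

```python
def hinge_loss(w, b, x, y):
    """Hinge loss max(0, 1 - y*(w.x - b)) for one example.
    y is +1 or -1; w and x are equal-length vectors."""
    score = sum(wi * xi for wi, xi in zip(w, x)) - b
    return max(0.0, 1.0 - y * score)

w, b = [1.0, 0.0], 0.0
print(hinge_loss(w, b, [2.0, 0.0], +1))   # well beyond the margin -> 0.0
print(hinge_loss(w, b, [0.5, 0.0], +1))   # inside the margin band -> 0.5
print(hinge_loss(w, b, [-1.0, 0.0], +1))  # wrong side of the boundary -> 2.0
```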
7. In what situations would the RBF kernel be preferred over a polynomial kernel?
Answer:
RBF kernel is preferred when:
- The relationship between features is highly nonlinear.
- You expect the decision boundary to be smooth rather than oscillatory.
- The data is not well-represented by polynomial combinations of features.

RBF maps to an infinite-dimensional space, allowing flexible, smooth boundaries.
8. How does cosine similarity differ from Euclidean distance in kNN, and when might it be preferred?
Answer:
- Cosine similarity measures the angle between vectors, focusing on direction rather than magnitude.
- Euclidean distance measures absolute geometric distance.

Cosine similarity is preferred when the magnitude of features is irrelevant (e.g., text data with TF-IDF vectors) and direction (pattern) matters more.
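The contrast shows up numerically for two vectors pointing the same way but with different magnitudes; the values are illustrative:

```python
import math

def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.dist(a, b)

def cosine_sim(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

a = (1.0, 1.0)
b = (10.0, 10.0)  # same direction as a, ten times the magnitude

print(euclidean(a, b))   # large: the magnitudes differ
print(cosine_sim(a, b))  # ~1.0: the directions are identical
```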
9. Why does the ID3 algorithm not guarantee an optimal decision tree?
Answer:
ID3 makes greedy, local decisions at each split without considering future splits. It doesn’t backtrack to evaluate whether earlier splits could lead to better overall trees, so it may get stuck in local optima.
10. Explain how kNN can be interpreted as a locally linear classifier under certain assumptions.
Answer:
With normalized feature vectors and cosine similarity, kNN predicts based on a weighted sum of feature vectors from neighbors with label 1. This sum acts like a local linear model, where the classification is based on the dot product (cosine similarity) between the summed vector and the input vector.
If you want, I can also make a printable 2-column table with these questions and answers so you can directly use it as a quick study sheet. That would make it easy to revise before exams. Would you like me to prepare that?
Wednesday, August 13, 2025
Q8: What is a margin in SVM?
Other Ques From: Ch.1 from the 'Hundred-Page Machine Learning Book'
In Support Vector Machines (SVM), the margin is the distance between the decision boundary (the separating hyperplane) and the closest data points from each class.
📌 Key Points:
- Support Vectors
  - The data points that lie closest to the decision boundary.
  - They "support" or define the position and orientation of the boundary.
- Margin Definition
  - The margin is measured as the perpendicular distance from the decision boundary to the nearest support vector from either class.
  - In a maximum-margin classifier, SVM chooses the boundary that maximizes this distance.
- Why Maximize the Margin?
  - A larger margin means the decision boundary is farther from any training data point, which usually improves generalization on unseen data.
  - This aligns with the structural risk minimization principle: simpler decision boundaries generalize better.
📐 Formula for Margin
For a linear SVM, the margin M is:

M = 2 / ∥w∥

where:
- w = weight vector defining the hyperplane w·x + b = 0
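A one-line translation of the margin formula; the weight vectors below are made up to give round numbers:

```python
import math

def margin(w):
    """Margin M = 2 / ||w|| of a linear SVM with weight vector w."""
    return 2.0 / math.hypot(*w)

print(margin([2.0, 0.0]))  # ||w|| = 2 -> margin 1.0
print(margin([0.6, 0.8]))  # ||w|| = 1 -> margin 2.0
```

This is why maximizing the margin is equivalent to minimizing ∥w∥, which is how the SVM optimization problem is usually stated.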
🖼 Intuition Example:
Imagine two parallel lines enclosing the training points of both classes:
- The middle line between them is the decision boundary.
- The distance from this middle line to either parallel line is half the margin.
- SVM’s goal is to push these lines as far apart as possible while still separating the classes.
If you want, I can create a visual diagram showing an SVM decision boundary with the margin and support vectors labeled. That usually makes this concept much clearer.
Do you want me to make it?
Q10: How does feature dimensionality affect SVM performance?
Other Ques From: Ch.1 from the 'Hundred-Page Machine Learning Book'
Feature dimensionality affects SVM performance in both positive and negative ways, depending on the balance between number of features and number of training samples.
🔹 1. Positive Effect — High Dimensions Can Help
-
In higher-dimensional spaces, data points are more likely to become linearly separable.
-
This is why SVMs often perform better with more features, especially when the original space is not separable.
-
The kernel trick can implicitly create such high-dimensional feature spaces.
Example:
In 2D, two intertwined spirals may be inseparable, but in 3D after a transformation, a plane can separate them.
🔹 2. Negative Effect — Curse of Dimensionality
When the number of features greatly exceeds the number of training samples:
- SVM might overfit: the model finds boundaries that perfectly classify the training set but fail on new data.
- The margin can become unstable because distances in high dimensions tend to lose meaning (all points look equally far apart).
- More dimensions = higher computation cost.
🔹 3. Impact on Margin
- SVM tries to maximize the margin between classes.
- In very high dimensions, the margin can shrink or become noisy if irrelevant features are present.
- This makes feature selection and regularization (via C, gamma) crucial.
🔹 4. Practical Tips
- If features are high-dimensional: use regularization (C in SVM) to avoid overfitting.
- If data is sparse: a linear SVM (with a linear kernel) often works better.
- If many irrelevant features exist: apply feature selection (e.g., PCA, mutual information).
✅ Summary Table:
| Dimensionality Change | Effect on SVM | Risk |
|---|---|---|
| Moderate increase | Easier separation | Higher computation |
| Large increase (> samples) | Overfitting risk | Poor generalization |
| Many irrelevant features | Noisy margin | Reduced accuracy |
If you want, I can make you a diagram showing how increasing feature dimensions changes the SVM decision boundary and margin.
Do you want me to create that?