Showing posts with label Machine Learning. Show all posts

Friday, March 27, 2026

Supervised Machine Learning - Regression and Classification (at DeepLearning.ai)

View Course on DeepLearning.AI    View Other Courses Audited By Us


Download Lecture Slides




Quizzes

Week 1

1.1: Supervised vs unsupervised learning

1.2: Regression

1.3: Train the model with gradient descent


Week 2

2.1: Multiple linear regression

2.2: Gradient descent in practice


Week 3

3.1: Classification with logistic regression

3.2: Cost function for logistic regression

3.3: Gradient descent for logistic regression

3.4: The problem of overfitting

Tags: Machine Learning, Classification, Regression

Friday, November 7, 2025

YouTube Academy For Machine Learning




What is Machine Learning?

What is On-device Machine Learning?

Supervised Machine Learning

  1. Google Open Online Education

Types of Machine Learning

Generalization

Linear Regression

Supervised Learning

Logistic Regression

Decision Tree

  1. Intuitive Machine Learning

Support Vector Machines

Gradient Descent

Neural Networks

Machine Learning Courses

Tags: Machine Learning, Technology, Video, YouTube Academy

Tuesday, August 19, 2025

20 Interview Questions - From the 'Hundred-Page Machine Learning Book' - Ch.2

To See All Interview Preparation Articles: Index For Interviews Preparation

Prompt: Extract 10 simple and 10 complex questions from this pdf with detailed answers

Here’s a set of 10 simple and 10 complex questions with detailed answers from the chapter you provided (pp. 29–42 of The Hundred-Page Machine Learning Book).
I’ve made sure the simple ones check basic understanding, while the complex ones require reasoning, synthesis, or application.


Simple Questions (with Answers)

1. What is the main difference between Linear Regression and SVM in terms of their objective?
Answer:
In Linear Regression, the model’s hyperplane is chosen to be as close as possible to all training examples, minimizing the average squared error. In SVM, the hyperplane serves as a decision boundary that separates classes and is placed to maximize the margin between them.


2. What is the loss function used in Linear Regression?
Answer:
Linear Regression uses squared error loss, which is (f(x_i) - y_i)^2 for each example, averaged across the dataset.


3. Why is squared loss preferred over absolute loss in Linear Regression?
Answer:
Squared loss has a continuous derivative, making it smoother and easier to optimize using linear algebra for closed-form solutions, unlike absolute loss, which is not smooth.
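To make the closed-form point concrete, here is a minimal sketch (toy data invented for illustration) fitting a line by least squares with NumPy's normal-equation solver:

```python
import numpy as np

# Toy 1-D dataset: y is roughly 2x + 1 with a little noise (values are illustrative)
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.1, 4.9, 7.2, 8.8])

# The smooth squared loss admits a closed-form least-squares solution
Xb = np.hstack([X, np.ones((len(X), 1))])       # add a bias column
w, b = np.linalg.lstsq(Xb, y, rcond=None)[0]    # normal-equation solve

# Mean squared error of the fitted line over the training set
mse = np.mean((Xb @ np.array([w, b]) - y) ** 2)
```

Absolute loss has no such closed form; its non-smooth corner at zero is exactly why the squared version is more convenient to optimize.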


4. Is Logistic Regression used for regression tasks?
Answer:
No. Despite the name, Logistic Regression is a classification algorithm, typically for binary classification, not regression.


5. What function is used in Logistic Regression to map outputs between 0 and 1?
Answer:
The sigmoid (logistic) function:

f(x) = 1 / (1 + e^(-x))
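A one-line sketch of the sigmoid and its defining properties (the function below is the standard logistic function, not code from the book):

```python
import math

def sigmoid(z):
    """Logistic function: maps any real z into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Centered at 0.5, saturating toward 0 and 1 at the extremes
mid = sigmoid(0.0)
high = sigmoid(10.0)
low = sigmoid(-10.0)
```

Note the symmetry sigmoid(z) + sigmoid(-z) = 1, which is what lets the two class probabilities sum to one in binary classification.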

6. What is the main optimization criterion for Logistic Regression?
Answer:
It uses Maximum Likelihood Estimation (MLE), maximizing the likelihood of the observed data under the model.


7. What does the ID3 algorithm use to determine the best split?
Answer:
ID3 uses entropy to measure uncertainty and selects the feature/threshold that minimizes the weighted average entropy after the split.


8. What is the difference between hard-margin and soft-margin SVM?
Answer:
Hard-margin SVM requires perfect separation of data without errors. Soft-margin SVM allows some misclassifications using hinge loss and a regularization parameter C to balance margin size and classification errors.


9. What is the kernel trick in SVM?
Answer:
It’s a method to compute dot products in a higher-dimensional feature space without explicitly transforming the data, using a kernel function.


10. What does the parameter k represent in k-Nearest Neighbors (kNN)?
Answer:
It represents the number of nearest neighbors considered when predicting the label for a new example.
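As a rough sketch (toy 1-D data invented for illustration), a minimal kNN predictor with majority voting looks like:

```python
from collections import Counter

def knn_predict(train, query, k):
    """Predict by majority vote among the k nearest training points
    (squared Euclidean distance on 1-D inputs, to keep the sketch minimal)."""
    neighbors = sorted(train, key=lambda p: (p[0] - query) ** 2)[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

train = [(1.0, "a"), (1.5, "a"), (3.0, "b"), (3.5, "b")]
label = knn_predict(train, 1.2, k=3)   # two "a" neighbors outvote one "b"
```

Larger k smooths the decision over more neighbors; k = 1 just copies the single closest label.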


Complex Questions (with Answers)

1. Explain why overfitting can occur with high-degree polynomial regression, using the concepts from the text.
Answer:
High-degree polynomial regression can create a curve that fits the training data almost perfectly, capturing noise and outliers rather than the underlying pattern. This leads to poor generalization on unseen data, as shown in Fig. 2 of the text. The curve follows training points too closely, increasing variance and overfitting.


2. Why does Logistic Regression use log-likelihood instead of raw likelihood for optimization?
Answer:
Log-likelihood simplifies the product of probabilities into a sum (via logarithms), making it easier to compute and differentiate. Since the log function is monotonically increasing, maximizing log-likelihood yields the same result as maximizing likelihood but is more numerically stable and computationally convenient.
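A small numeric illustration of the stability point (the probabilities are made up): multiplying many probabilities underflows to zero in floating point, while summing their logs stays perfectly representable:

```python
import math

# 2000 probabilities of 0.5 each: the raw product is 2^-2000,
# far below the smallest representable double, so it underflows to 0.0
probs = [0.5] * 2000

product = 1.0
for p in probs:
    product *= p                    # underflows to exactly 0.0

# The log-likelihood is just a sum of logs: about -1386.3, no underflow
log_likelihood = sum(math.log(p) for p in probs)
```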


3. How does the choice of the hyperparameter C in SVM affect bias and variance?
Answer:

  • High C: Focuses on minimizing classification errors, leading to low bias but high variance (risk of overfitting).

  • Low C: Allows more misclassifications for a larger margin, increasing bias but reducing variance (better generalization).


4. Describe the process of finding the best split in the ID3 decision tree algorithm.
Answer:
For each feature and possible threshold, ID3:

  1. Splits the dataset into two subsets.

  2. Computes the weighted average entropy of the subsets.

  3. Selects the split (feature + threshold) with the lowest weighted entropy.
    This process is repeated recursively until stopping criteria (e.g., pure nodes, max depth) are met.
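The steps above can be sketched as follows (single numeric feature, toy labels invented for illustration):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(xs, ys):
    """Greedy ID3-style search on one numeric feature: try every threshold
    and keep the one minimizing the weighted average entropy of the subsets."""
    best_t, best_h = None, float("inf")
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue                 # degenerate split, skip
        n = len(ys)
        h = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
        if h < best_h:
            best_t, best_h = t, h
    return best_t, best_h

# Labels separate cleanly at x <= 2, so that threshold gives weighted entropy 0
t, h = best_split([1, 2, 3, 4], [0, 0, 1, 1])
```

A full tree would apply this recursively to each resulting subset until a stopping criterion is hit.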


5. Why is the kernel trick computationally advantageous compared to explicit transformation?
Answer:
Explicit transformation to high-dimensional space is computationally expensive and memory-intensive. The kernel trick computes the dot product in that space directly from the original feature vectors, avoiding the explicit mapping and thus saving time and resources.


6. Explain the role of hinge loss in soft-margin SVM.
Answer:
Hinge loss, defined as max(0, 1 - y_i(w·x_i - b)), penalizes points on the wrong side of the margin or decision boundary. It allows the SVM to tolerate some violations (misclassifications) while still aiming to maximize the margin, balancing between training accuracy and generalization.
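A minimal sketch of hinge loss for a single example (the weight vector and points below are illustrative):

```python
def hinge_loss(w, b, x, y):
    """Hinge loss for one example: zero when the point sits on the correct
    side with margin >= 1, growing linearly as it violates the margin."""
    score = sum(wi * xi for wi, xi in zip(w, x)) - b
    return max(0.0, 1.0 - y * score)

w, b = [1.0, 0.0], 0.0
safe = hinge_loss(w, b, [2.0, 0.0], +1)    # well inside the correct side: loss 0
inside = hinge_loss(w, b, [0.5, 0.0], +1)  # correct side but inside the margin: small loss
wrong = hinge_loss(w, b, [2.0, 0.0], -1)   # wrong side entirely: large loss
```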


7. In what situations would the RBF kernel be preferred over a polynomial kernel?
Answer:
RBF kernel is preferred when:

  • The relationship between features is highly nonlinear.

  • You expect the decision boundary to be smooth rather than oscillatory.

  • The data is not well-represented by polynomial combinations of features.
    RBF maps to an infinite-dimensional space, allowing flexible, smooth boundaries.


8. How does cosine similarity differ from Euclidean distance in kNN, and when might it be preferred?
Answer:

  • Cosine similarity measures the angle between vectors, focusing on direction rather than magnitude.

  • Euclidean distance measures absolute geometric distance.
    Cosine similarity is preferred when the magnitude of features is irrelevant (e.g., text data with TF-IDF vectors) and direction (pattern) matters more.
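A quick numeric sketch of the contrast (vectors chosen for illustration): two vectors pointing the same way are far apart in Euclidean terms yet have cosine similarity exactly 1:

```python
import math

def euclidean(a, b):
    """Absolute geometric distance between two vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def cosine_sim(a, b):
    """Cosine of the angle between two vectors: direction only, not magnitude."""
    dot = sum(ai * bi for ai, bi in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

u = (1.0, 1.0)
v = (10.0, 10.0)        # same direction as u, ten times the magnitude

dist = euclidean(u, v)  # large, because the magnitudes differ
sim = cosine_sim(u, v)  # 1.0, because the directions are identical
```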


9. Why does the ID3 algorithm not guarantee an optimal decision tree?
Answer:
ID3 makes greedy, local decisions at each split without considering future splits. It doesn’t backtrack to evaluate whether earlier splits could lead to better overall trees, so it may get stuck in local optima.


10. Explain how kNN can be interpreted as a locally linear classifier under certain assumptions.
Answer:
With normalized feature vectors and cosine similarity, kNN predicts based on a weighted sum of feature vectors from neighbors with label 1. This sum acts like a local linear model, where the classification is based on the dot product (cosine similarity) between the summed vector and the input vector.



Tags: Technology, Machine Learning, Interview Preparation

Wednesday, August 13, 2025

Q8: What is a margin in SVM?

To See All Interview Preparation Articles: Index For Interviews Preparation
Other Ques From: Ch.1 from the 'Hundred-Page Machine Learning Book'

In Support Vector Machines (SVM), the margin is the distance between the decision boundary (the separating hyperplane) and the closest data points from each class.


📌 Key Points:

  1. Support Vectors

    • The data points that lie closest to the decision boundary.

    • They "support" or define the position and orientation of the boundary.

  2. Margin Definition

    • The margin is measured as the perpendicular distance from the decision boundary to the nearest support vector from either class.

    • In a maximum-margin classifier, SVM chooses the boundary that maximizes this distance.

  3. Why Maximize the Margin?

    • A larger margin means the decision boundary is farther from any training data point, which usually improves generalization on unseen data.

    • This aligns with the structural risk minimization principle: simpler decision boundaries generalize better.


📐 Formula for Margin

For a linear SVM, the margin M is:

M = 2 / ‖w‖

Where:

  • w = weight vector defining the hyperplane w·x + b = 0
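A one-liner makes the formula concrete (the weight vector is illustrative):

```python
import math

def margin(w):
    """Margin of a linear SVM with weight vector w: M = 2 / ||w||."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

m = margin([3.0, 4.0])   # ||w|| = 5, so the margin is 2/5 = 0.4
```

Shrinking ‖w‖ is therefore equivalent to widening the margin, which is why the SVM objective minimizes ‖w‖ subject to the separation constraints.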


🖼 Intuition Example:

Imagine two parallel lines enclosing the training points of both classes:

  • The middle line between them is the decision boundary.

  • The distance from this middle line to either parallel line is half the margin.

  • SVM’s goal is to push these lines as far apart as possible while still separating the classes.



Tags: Technology, Machine Learning, Interview Preparation

Q10: How does feature dimensionality affect SVM performance?

To See All Interview Preparation Articles: Index For Interviews Preparation
Other Ques From: Ch.1 from the 'Hundred-Page Machine Learning Book'

Feature dimensionality affects SVM performance in both positive and negative ways, depending on the balance between number of features and number of training samples.


🔹 1. Positive Effect — High Dimensions Can Help

  • In higher-dimensional spaces, data points are more likely to become linearly separable.

  • This is why SVMs often perform better with more features, especially when the original space is not separable.

  • The kernel trick can implicitly create such high-dimensional feature spaces.

Example:
In 2D, two intertwined spirals may be inseparable, but in 3D after a transformation, a plane can separate them.
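A minimal 1-D analogue of the same idea (data invented for illustration): a class arrangement no single threshold can separate becomes linearly separable after lifting each point with the map x → (x, x²):

```python
# 1-D data: class 1 in the middle, class 0 at both ends.
# No single threshold on x separates the classes.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 1, 1, 0]

# Lift to 2-D; in the lifted space the horizontal line x2 = 2 separates them
lifted = [(x, x * x) for x in xs]
separable = all((x2 < 2.0) == (y == 1) for (x, x2), y in zip(lifted, ys))
```

Kernels achieve the same effect implicitly, without ever materializing the lifted coordinates.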


🔹 2. Negative Effect — Curse of Dimensionality

When the number of features greatly exceeds the number of training samples:

  • SVM might overfit — the model finds boundaries that perfectly classify the training set but fail on new data.

  • The margin can become unstable because distances in high dimensions tend to lose meaning (all points look equally far apart).

  • More dimensions = higher computation cost.


🔹 3. Impact on Margin

  • SVM tries to maximize the margin between classes.

  • In very high dimensions, the margin can shrink or become noisy if irrelevant features are present.

  • This makes feature selection and regularization (via C, gamma) crucial.


🔹 4. Practical Tips

  • If features are high-dimensional: Use regularization (C in SVM) to avoid overfitting.

  • If data is sparse: A linear kernel often works better.

  • If many irrelevant features exist: Apply feature selection (e.g., PCA, mutual information).


Summary Table:

| Dimensionality Change | Effect on SVM | Risk |
| --- | --- | --- |
| Moderate increase | Easier separation | Higher computation |
| Large increase (> samples) | Overfitting risk | Poor generalization |
| Many irrelevant features | Noisy margin | Reduced accuracy |


Tags: Technology, Machine Learning, Interview Preparation