Saturday, August 10, 2024

Getting the Geometric Intuition Behind Logistic Regression

To See All ML Articles: Index of Machine Learning
One of the first things to know about Logistic Regression is that:

• It is a Linear Model.

That means the output of this model depends on a linear combination of its features. Having said that, as a first step, let's write the linear combination of the features of a dataset with $n$ features:

$$\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_n x_n$$

where $\beta_0$ is the intercept, and $\beta_1, \beta_2, \ldots, \beta_n$ are the coefficients of the features $x_1, x_2, \ldots, x_n$.

Geometric Intuition

This equation of linear combination of features resembles the equation of a plane.

The equation of a plane in three-dimensional space is a linear equation that represents all the points $(x, y, z)$ that lie on the plane. The general form of the equation of a plane is:

$$ax + by + cz = d$$

Formula for Distance from a Point to a Plane

For a point $P(x_0, y_0, z_0)$, the distance $D$ from the point to the plane is given by:

$$D = \frac{|ax_0 + by_0 + cz_0 - d|}{\sqrt{a^2 + b^2 + c^2}}$$
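As a quick sanity check, the distance formula can be sketched in Python (the plane and point below are made-up example values, not from the article):

```python
import numpy as np

def point_to_plane_distance(point, normal, d):
    """Distance from a point to the plane a*x + b*y + c*z = d,
    where `normal` holds the coefficients (a, b, c)."""
    point = np.asarray(point, dtype=float)
    normal = np.asarray(normal, dtype=float)
    # |a*x0 + b*y0 + c*z0 - d| / sqrt(a^2 + b^2 + c^2)
    return abs(normal @ point - d) / np.linalg.norm(normal)

# Example: the plane z = 0 has (a, b, c) = (0, 0, 1), d = 0.
# The point (1, 2, 3) is 3 units above it.
print(point_to_plane_distance([1, 2, 3], [0, 0, 1], 0))  # 3.0
```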

The second thing to remember about Logistic Regression is that:

• It is a Binary Classification Model.

But how does that matter?

Being a linear model, we can say that the decision boundary for Logistic Regression is a line in 2D, a plane in 3D, and a hyperplane in nD. Being a binary classification model, we can say that points lie on either side of the decision boundary. This means that the distance of a point sitting exactly on the decision boundary is zero:

$$D = \frac{|ax_0 + by_0 + cz_0 - d|}{\sqrt{a^2 + b^2 + c^2}} = 0$$

So D = 0 for points on the decision boundary. Equivalently, we can say:

$$ax_0 + by_0 + cz_0 - d = 0$$

***

Or, for our Logistic Regression model, the decision boundary is:

$$\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_n x_n = 0$$

So the way to decide the class of a point is to evaluate the Beta expression $\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_n x_n$:

If this expression > 0, the point lies above the plane (on one side of the plane). If this expression < 0, the point lies below the plane (on the other side of the plane).
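The sign-based decision rule can be sketched as follows (the coefficient values are hypothetical, chosen only to illustrate the rule):

```python
import numpy as np

def decide_side(beta, x):
    """Sign of beta_0 + beta_1*x_1 + ... + beta_n*x_n.
    beta[0] is the intercept; beta[1:] are the feature coefficients.
    Returns +1 (one side of the plane), -1 (the other side),
    or 0 (exactly on the decision boundary)."""
    value = beta[0] + np.dot(beta[1:], x)
    return int(np.sign(value))

beta = np.array([-1.0, 2.0, 0.5])      # hypothetical coefficients
print(decide_side(beta, [1.0, 1.0]))   # -1 + 2 + 0.5 = 1.5 > 0  ->  +1
print(decide_side(beta, [0.0, 0.0]))   # -1 < 0                  ->  -1
```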

Logistic (or Sigmoid) Comes Into Picture

Now, statisticians knew that the range of $\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_n x_n$ is $(-\infty, \infty)$. To squash the Beta expression into the interval $(0, 1)$, so that it can be interpreted as a probability and follow the properties of a probability, we can pass it through the Logistic (or Sigmoid) function:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

For Logistic Regression, we write:

$$\sigma(\beta, x) = \frac{1}{1 + e^{-\left(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n\right)}}$$

Very Important Point: this expression $\sigma(\beta, x)$ is the probability that the data point under consideration lies in class $Y = 1$.
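The squashing step above can be sketched in Python. Note how a point exactly on the decision boundary (Beta expression = 0) gets probability 0.5, and points further from the plane get probabilities closer to 0 or 1 (the coefficients below are hypothetical):

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(beta, x):
    """P(Y = 1 | x) for logistic regression.
    beta[0] is the intercept; beta[1:] are the feature coefficients."""
    return sigmoid(beta[0] + np.dot(beta[1:], x))

beta = np.array([-1.0, 2.0, 0.5])        # hypothetical coefficients
print(sigmoid(0.0))                      # 0.5: a point on the boundary
print(predict_proba(beta, [1.0, 1.0]))   # Beta expression = 1.5 > 0, so P > 0.5
```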

Bonus Video:

Logistic Regression Indepth Intuition - Part 1
Logistic Regression Indepth Intuition - Part 2
Tags: Technology,Machine Learning,Mathematical Foundations for Data Science,
