Wednesday, August 14, 2024

Likelihood of a Single Observation For Logistic Regression Model

To See All ML Articles: Index of Machine Learning

How do we arrive at the expression for the likelihood function of a single observation?

The expression for the likelihood function of a single observation is derived based on the following reasoning:

1. Understanding the Logistic Function:

  • In logistic regression, we model the probability that a binary outcome $y^{(i)}$ equals 1 (i.e., the event happens) using the logistic function: $\phi(z) = \frac{1}{1 + e^{-z}}$
  • Here $z = \mathbf{w}^T \mathbf{x}^{(i)}$, where $\mathbf{w}$ is the weight vector and $\mathbf{x}^{(i)}$ is the feature vector for the $i$-th observation.
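
To make step 1 concrete, here is a minimal NumPy sketch of the logistic (sigmoid) function $\phi(z)$; the function name phi and the sample values of z are chosen for illustration and are not taken from the reference.

```python
import numpy as np

def phi(z):
    """Logistic (sigmoid) function: phi(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

# The sigmoid squashes any real-valued net input z into the interval (0, 1):
for z in (-5.0, 0.0, 5.0):
    print(z, phi(z))   # approx. 0.0067, exactly 0.5, approx. 0.9933
```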

2. Probability of Class 1 and Class 0:

  • The probability that $y^{(i)} = 1$ (the event happens) given the input features $\mathbf{x}^{(i)}$ is: $P(y^{(i)} = 1 \mid \mathbf{x}^{(i)}; \mathbf{w}) = \phi(\mathbf{w}^T \mathbf{x}^{(i)})$
  • The probability that $y^{(i)} = 0$ (the event does not happen) is: $P(y^{(i)} = 0 \mid \mathbf{x}^{(i)}; \mathbf{w}) = 1 - \phi(\mathbf{w}^T \mathbf{x}^{(i)})$
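
A small sketch of step 2, assuming NumPy and the phi function defined above; the weight vector w and feature vector x below are hypothetical values chosen only for illustration.

```python
import numpy as np

def phi(z):
    # Logistic (sigmoid) function
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([0.4, -0.2, 0.8])   # hypothetical weight vector
x = np.array([1.0, 2.5, -0.3])   # hypothetical feature vector for observation i

p_y1 = phi(w @ x)    # P(y = 1 | x; w)
p_y0 = 1.0 - p_y1    # P(y = 0 | x; w)
print(p_y1, p_y0)    # the two probabilities always sum to 1
```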

3. Likelihood Function for a Single Observation:

  • The likelihood for a single observation $i$ is the probability of observing $y^{(i)}$ given the input $\mathbf{x}^{(i)}$ and the model parameters $\mathbf{w}$. Since $y^{(i)}$ can be either 0 or 1, we can combine the two cases into a single expression: $P(y^{(i)} \mid \mathbf{x}^{(i)}; \mathbf{w}) = \phi(\mathbf{w}^T \mathbf{x}^{(i)})^{y^{(i)}} \cdot \left(1 - \phi(\mathbf{w}^T \mathbf{x}^{(i)})\right)^{1 - y^{(i)}}$
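
The combined expression in step 3 translates directly into a small function. This is a minimal sketch; the name likelihood_single and the sample values are assumptions made for this example, not code from the reference.

```python
import numpy as np

def phi(z):
    return 1.0 / (1.0 + np.exp(-z))

def likelihood_single(y, x, w):
    """P(y | x; w) = phi(w^T x)^y * (1 - phi(w^T x))^(1 - y), with y in {0, 1}."""
    p = phi(w @ x)                    # P(y = 1 | x; w)
    return (p ** y) * ((1.0 - p) ** (1 - y))

w = np.array([0.4, -0.2, 0.8])        # hypothetical weights
x = np.array([1.0, 2.5, -0.3])        # hypothetical features

print(likelihood_single(1, x, w))     # equals phi(w^T x)
print(likelihood_single(0, x, w))     # equals 1 - phi(w^T x)
```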

4. Explanation of the Formula:

  • When $y^{(i)} = 1$:

    • The expression simplifies to: $P(y^{(i)} = 1 \mid \mathbf{x}^{(i)}; \mathbf{w}) = \phi(\mathbf{w}^T \mathbf{x}^{(i)})$
    • This is because $y^{(i)} = 1$ makes the first term $\phi(\mathbf{w}^T \mathbf{x}^{(i)})^{y^{(i)}}$ equal to $\phi(\mathbf{w}^T \mathbf{x}^{(i)})$, while the second term $\left(1 - \phi(\mathbf{w}^T \mathbf{x}^{(i)})\right)^{1 - y^{(i)}}$ becomes 1.
  • When $y^{(i)} = 0$:

    • The expression simplifies to: $P(y^{(i)} = 0 \mid \mathbf{x}^{(i)}; \mathbf{w}) = 1 - \phi(\mathbf{w}^T \mathbf{x}^{(i)})$
    • This is because $y^{(i)} = 0$ makes the first term $\phi(\mathbf{w}^T \mathbf{x}^{(i)})^{y^{(i)}}$ become 1 (anything raised to the power of 0 is 1), while the second term $\left(1 - \phi(\mathbf{w}^T \mathbf{x}^{(i)})\right)^{1 - y^{(i)}}$ becomes $1 - \phi(\mathbf{w}^T \mathbf{x}^{(i)})$.
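
A quick numerical check of the two cases described in step 4, using the same hypothetical weights and features as in the earlier sketches:

```python
import numpy as np

def phi(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([0.4, -0.2, 0.8])
x = np.array([1.0, 2.5, -0.3])
p = phi(w @ x)   # P(y = 1 | x; w)

# Case y = 1: the combined formula reduces to phi(w^T x)
assert np.isclose(p**1 * (1 - p)**(1 - 1), p)
# Case y = 0: the combined formula reduces to 1 - phi(w^T x)
assert np.isclose(p**0 * (1 - p)**(1 - 0), 1 - p)
print("Both cases agree with the combined expression.")
```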

5. General Case:

  • By combining both cases (i.e., whether $y^{(i)}$ is 0 or 1) into a single expression, we arrive at the likelihood function for a single observation: $P(y^{(i)} \mid \mathbf{x}^{(i)}; \mathbf{w}) = \phi(\mathbf{w}^T \mathbf{x}^{(i)})^{y^{(i)}} \cdot \left(1 - \phi(\mathbf{w}^T \mathbf{x}^{(i)})\right)^{1 - y^{(i)}}$
  • This expression lets us compute the probability of observing the outcome $y^{(i)}$ given the input features and the model's parameters, regardless of whether $y^{(i)}$ is 0 or 1.

Summary:

The key idea is that this formulation captures both possible outcomes for $y^{(i)}$ in a single expression: the first factor is raised to the power $y^{(i)}$ and the second to the power $1 - y^{(i)}$. When $y^{(i)} = 1$ the second factor reduces to 1 and only the first remains; when $y^{(i)} = 0$ the first factor reduces to 1 and only the second remains, so one formula covers both scenarios. In other words, the likelihood of a single observation is the probability mass function of a Bernoulli distribution with success probability $\phi(\mathbf{w}^T \mathbf{x}^{(i)})$.
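
Since the summary notes that this is just the Bernoulli probability mass function with success probability $\phi(\mathbf{w}^T \mathbf{x}^{(i)})$, here is a short sanity check; it assumes SciPy is available, and the values of w and x are again hypothetical.

```python
import numpy as np
from scipy.stats import bernoulli   # assumes SciPy is installed

def phi(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([0.4, -0.2, 0.8])   # hypothetical weights
x = np.array([1.0, 2.5, -0.3])   # hypothetical features
p = phi(w @ x)                   # P(y = 1 | x; w)

for y in (0, 1):
    combined = p**y * (1 - p)**(1 - y)            # the formula derived in this post
    assert np.isclose(combined, bernoulli.pmf(y, p))
print("The combined expression matches the Bernoulli PMF.")
```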

Reference

Raschka, Sebastian. Python Machine Learning. Packt Publishing, 2015.
Tags: Machine Learning, Mathematical Foundations for Data Science
