Wednesday, August 14, 2024

Cost Function For Logistic Regression Model


Can you show the derivation for this cost function?

Here's a step-by-step derivation of the cost function:

1. Understanding Logistic Regression:

  • In logistic regression, the probability that a given sample $\mathbf{x}^{(i)}$ belongs to class $y^{(i)}$ (which can be either 0 or 1) is modeled using the logistic function: $\phi(z) = \frac{1}{1 + e^{-z}} \quad \text{where} \quad z = \mathbf{w}^T \mathbf{x}$
  • For binary classification, $y^{(i)}$ can be either 0 or 1, so: $P(y^{(i)} = 1 \mid \mathbf{x}^{(i)}; \mathbf{w}) = \phi(\mathbf{w}^T \mathbf{x}^{(i)})$ and $P(y^{(i)} = 0 \mid \mathbf{x}^{(i)}; \mathbf{w}) = 1 - \phi(\mathbf{w}^T \mathbf{x}^{(i)})$
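A minimal NumPy sketch of this step, assuming an illustrative weight vector w and feature vector x (the names and values below are placeholders, not from the text):

```python
import numpy as np

def sigmoid(z):
    """Logistic function: phi(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative 3-feature sample and weight vector.
x = np.array([1.0, 2.0, -1.5])
w = np.array([0.5, -0.25, 0.75])

z = w @ x           # z = w^T x
p_y1 = sigmoid(z)   # P(y = 1 | x; w)
p_y0 = 1.0 - p_y1   # P(y = 0 | x; w)
print(p_y1, p_y0)
```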

2. Likelihood Function for a Single Observation:

  • The likelihood for a single observation $i$ can be written as: $P(y^{(i)} \mid \mathbf{x}^{(i)}; \mathbf{w}) = \phi(\mathbf{w}^T \mathbf{x}^{(i)})^{y^{(i)}} \cdot \left(1 - \phi(\mathbf{w}^T \mathbf{x}^{(i)})\right)^{1 - y^{(i)}}$
  • This formula works because:
    • If $y^{(i)} = 1$, the likelihood reduces to $\phi(\mathbf{w}^T \mathbf{x}^{(i)})$.
    • If $y^{(i)} = 0$, the likelihood reduces to $1 - \phi(\mathbf{w}^T \mathbf{x}^{(i)})$.
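As a small sketch, the single-observation likelihood follows directly from the formula above; it reuses the sigmoid helper from the previous snippet, and the argument names are illustrative:

```python
def sample_likelihood(w, x_i, y_i):
    """P(y_i | x_i; w) = phi(w^T x_i)^y_i * (1 - phi(w^T x_i))^(1 - y_i)."""
    phi = sigmoid(w @ x_i)
    return (phi ** y_i) * ((1.0 - phi) ** (1 - y_i))

# For y_i = 1 this returns phi; for y_i = 0 it returns 1 - phi.
```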

3. Likelihood Function for the Entire Dataset:

  • Assuming that the samples are independent, the likelihood of the entire dataset (with $n$ samples) is the product of the likelihoods of the individual observations: $L(\mathbf{w}) = \prod_{i=1}^{n} P(y^{(i)} \mid \mathbf{x}^{(i)}; \mathbf{w}) = \prod_{i=1}^{n} \left[\phi(\mathbf{w}^T \mathbf{x}^{(i)})^{y^{(i)}} \cdot \left(1 - \phi(\mathbf{w}^T \mathbf{x}^{(i)})\right)^{1 - y^{(i)}}\right]$
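A sketch of the full-dataset likelihood as a product over samples, assuming X is an (n, d) feature matrix and y an (n,) array of 0/1 labels (hypothetical names, building on the sigmoid above):

```python
def likelihood(w, X, y):
    """L(w) = prod_i phi(w^T x_i)^y_i * (1 - phi(w^T x_i))^(1 - y_i)."""
    phi = sigmoid(X @ w)                           # shape (n,)
    per_sample = np.where(y == 1, phi, 1.0 - phi)  # phi or 1 - phi per label
    return np.prod(per_sample)
```

Note that a product of many probabilities underflows toward zero as n grows, which is a practical reason (besides convenience) for switching to the log-likelihood in the next step.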

4. Log-Likelihood:

  • It is often easier to work with the log-likelihood (because the logarithm turns the product into a sum), which is given by: $\log L(\mathbf{w}) = \sum_{i=1}^{n} \left[y^{(i)} \log \phi(\mathbf{w}^T \mathbf{x}^{(i)}) + (1 - y^{(i)}) \log \left(1 - \phi(\mathbf{w}^T \mathbf{x}^{(i)})\right)\right]$
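The same quantity in log space, again as a sketch using the hypothetical X, y, and sigmoid from the earlier snippets; a small epsilon keeps log(0) from occurring:

```python
def log_likelihood(w, X, y, eps=1e-12):
    """log L(w) = sum_i [ y_i * log(phi_i) + (1 - y_i) * log(1 - phi_i) ]."""
    phi = np.clip(sigmoid(X @ w), eps, 1.0 - eps)
    return np.sum(y * np.log(phi) + (1.0 - y) * np.log(1.0 - phi))
```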

5. Cost Function:

  • In practice, we minimize the negative log-likelihood, which gives us the cost function for logistic regression: $J(\mathbf{w}) = -\log L(\mathbf{w}) = -\sum_{i=1}^{n} \left[y^{(i)} \log \phi(\mathbf{w}^T \mathbf{x}^{(i)}) + (1 - y^{(i)}) \log \left(1 - \phi(\mathbf{w}^T \mathbf{x}^{(i)})\right)\right]$
  • This cost function is convex, which makes it suitable for optimization algorithms like gradient descent.
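Putting the pieces together, the cost is just the negative log-likelihood, and since its gradient takes the well-known form X^T(phi(Xw) - y), a few plain gradient-descent steps can minimize it. This is a minimal sketch built on the snippets above, with an assumed learning rate and iteration count, not a production implementation:

```python
def cost(w, X, y):
    """J(w) = -log L(w)."""
    return -log_likelihood(w, X, y)

def gradient(w, X, y):
    """dJ/dw = X^T (phi(Xw) - y) for the negative log-likelihood."""
    return X.T @ (sigmoid(X @ w) - y)

def fit(X, y, lr=0.1, n_iter=1000):
    """Plain gradient descent on the convex cost J(w)."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        w -= lr * gradient(w, X, y)
    return w
```

On a toy dataset, w = fit(X, y) followed by sigmoid(X @ w) > 0.5 recovers the predicted class labels.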

Conclusion:

The derivation of the likelihood function for logistic regression involves expressing the probability of each individual sample given the model's parameters and then combining these probabilities across all samples in the dataset, assuming independence. The final likelihood function provides a way to measure how well the model's parameters fit the data, and minimizing the negative log-likelihood (or equivalently, maximizing the likelihood) leads to the optimal model parameters.

Reference

Raschka, Sebastian. Python Machine Learning. Packt Publishing, 2015.
Tags: Machine Learning, Mathematical Foundations for Data Science
