Can you show the derivation of this cost function?
Here's a step-by-step derivation of the cost function:
1. Understanding Logistic Regression:
- In logistic regression, the probability that a given sample x belongs to class y = 1 is modeled using the logistic (sigmoid) function:
  P(y = 1 | x; w) = σ(z) = 1 / (1 + e^(−z)), where z = w^T x is the net input (the weighted sum of the sample's features).
- For binary classification, y can either be 0 or 1, so the complementary probability is:
  P(y = 0 | x; w) = 1 − σ(z)
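As a purely illustrative sketch, the logistic function and net input could be written in NumPy as follows; the helper names sigmoid and net_input are assumptions for this example, not code from the referenced book:

import numpy as np

def sigmoid(z):
    # Logistic (sigmoid) function: squashes the net input z into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def net_input(X, w):
    # Net input z = w^T x for every sample; X has shape (n_samples, n_features), w has shape (n_features,)
    return X.dot(w)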
2. Likelihood Function for a Single Observation:
- The likelihood of a single observation (x^(i), y^(i)) can be written as:
  P(y^(i) | x^(i); w) = σ(z^(i))^(y^(i)) · (1 − σ(z^(i)))^(1 − y^(i)), where z^(i) = w^T x^(i)
- This formula works because:
- If y^(i) = 1, the likelihood is σ(z^(i)).
- If y^(i) = 0, the likelihood is 1 − σ(z^(i)).
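A minimal sketch of this per-sample likelihood, reusing the numpy import and the sigmoid helper from the sketch above (sample_likelihood is an illustrative name, not from the original):

def sample_likelihood(x_i, y_i, w):
    # P(y_i | x_i; w) = p^y_i * (1 - p)^(1 - y_i), where p = sigmoid(w^T x_i)
    p = sigmoid(np.dot(x_i, w))
    return (p ** y_i) * ((1.0 - p) ** (1 - y_i))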
3. Likelihood Function for the Entire Dataset:
- Assuming that the samples are independent, the likelihood of the entire dataset (with n samples) is the product of the likelihoods of the individual observations:
  L(w) = ∏_{i=1}^{n} σ(z^(i))^(y^(i)) · (1 − σ(z^(i)))^(1 − y^(i))
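Under the same assumptions, the full-dataset likelihood could be sketched as below; note that multiplying many probabilities quickly underflows in floating point, which is a practical reason for the log transform in the next step:

def likelihood(X, y, w):
    # Product over all n samples of the per-sample likelihoods
    p = sigmoid(net_input(X, w))
    return np.prod((p ** y) * ((1.0 - p) ** (1 - y)))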
4. Log-Likelihood:
- It is often easier to work with the log-likelihood (because the logarithm turns the product into a sum), which is given by:
  ℓ(w) = log L(w) = Σ_{i=1}^{n} [ y^(i) log(σ(z^(i))) + (1 − y^(i)) log(1 − σ(z^(i))) ]
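The corresponding log-likelihood, again as an illustrative sketch using the same assumed helpers:

def log_likelihood(X, y, w):
    # Sum over samples of y * log(p) + (1 - y) * log(1 - p)
    p = sigmoid(net_input(X, w))
    return np.sum(y * np.log(p) + (1 - y) * np.log(1.0 - p))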
5. Cost Function:
- In practice, we minimize the negative log-likelihood, which gives us the cost function for logistic regression:
  J(w) = −ℓ(w) = Σ_{i=1}^{n} [ −y^(i) log(σ(z^(i))) − (1 − y^(i)) log(1 − σ(z^(i))) ]
- This cost function is convex, which makes it suitable for optimization algorithms like gradient descent.
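Putting the pieces together, here is a hedged sketch of the cost J(w) and a single gradient-descent update; the gradient expression X^T(σ(Xw) − y), the learning rate eta, and the clipping used to keep log() finite are assumptions made for this illustration, not details from the post:

def cost(X, y, w):
    # Negative log-likelihood J(w); p is clipped away from 0 and 1 so log() stays finite
    p = np.clip(sigmoid(net_input(X, w)), 1e-12, 1.0 - 1e-12)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1.0 - p))

def gradient_descent_step(X, y, w, eta=0.01):
    # One update w <- w - eta * dJ/dw, using dJ/dw = X^T (sigmoid(Xw) - y)
    grad = X.T.dot(sigmoid(net_input(X, w)) - y)
    return w - eta * grad

# Tiny usage example: with w = 0 every prediction is 0.5, so J = 2 * log(2) ≈ 1.386
X = np.array([[1.0, 2.0], [2.0, 1.0]])
y = np.array([1.0, 0.0])
w = np.zeros(2)
print(cost(X, y, w))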
Conclusion:
The derivation of the likelihood function for logistic regression involves expressing the probability of each individual sample given the model's parameters and then combining these probabilities across all samples in the dataset, assuming independence. The final likelihood function provides a way to measure how well the model's parameters fit the data, and minimizing the negative log-likelihood (or equivalently, maximizing the likelihood) leads to the optimal model parameters.
Reference
Sebastian Raschka, Python Machine Learning, Packt Publishing, 2015.