Saturday, January 13, 2024

Ex 1.9 - Show that the mode of the multivariate Gaussian is given by µ. (Pattern Recognition and ML by Christopher Bishop)

Pre-read (1):
Pre-read (2):
Pre-read (3):
Ex 1.9.A: Show that the mode (i.e. the maximum) of the Gaussian distribution (1.46) is given by µ.
Ex 1.9.B: Similarly, show that the mode of the multivariate Gaussian (1.52) is given by µ.
Solution hint from the book:

Solution by ChatGPT

The mode of a multivariate Gaussian distribution is the point at which the probability density function (PDF) is maximized. For a multivariate Gaussian distribution with mean vector μ and covariance matrix Σ, the PDF is given by:

$$f(\mathbf{x}; \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \frac{1}{(2\pi)^{n/2} \det(\boldsymbol{\Sigma})^{1/2}} \exp\left(-\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})\right)$$

where:

  • $\mathbf{x}$ is the vector of random variables.
  • $\boldsymbol{\mu}$ is the mean vector.
  • $\boldsymbol{\Sigma}$ is the covariance matrix.
  • $n$ is the dimensionality of the multivariate distribution.
  • $\det(\boldsymbol{\Sigma})$ is the determinant of the covariance matrix.

To find the mode, we need to maximize this PDF with respect to $\mathbf{x}$. Taking the natural logarithm of the PDF simplifies the maximization: because the logarithm is strictly increasing, the PDF and its logarithm attain their maximum at the same point:

$$\ln f(\mathbf{x}; \boldsymbol{\mu}, \boldsymbol{\Sigma}) = -\frac{n}{2} \ln(2\pi) - \frac{1}{2} \ln(\det(\boldsymbol{\Sigma})) - \frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})$$

To find the mode, we need to find the point $\mathbf{x}$ at which the gradient of the log PDF with respect to $\mathbf{x}$ is zero. This can be expressed as:

$$\frac{\partial}{\partial \mathbf{x}} \ln f(\mathbf{x}; \boldsymbol{\mu}, \boldsymbol{\Sigma}) = 0$$

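Solving this equation will give us the mode $\mathbf{x}$. Only the quadratic term depends on $\mathbf{x}$, and because $\boldsymbol{\Sigma}^{-1}$ is symmetric, the gradient of $(\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})$ is $2 \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})$. Taking the derivative, we get:

$$-\boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) = 0$$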

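Solving for $\mathbf{x}$ (multiply both sides on the left by $-\boldsymbol{\Sigma}$), we find:

$$\mathbf{x} = \boldsymbol{\mu}$$

This stationary point is a maximum: $\boldsymbol{\Sigma}^{-1}$ is positive definite, so the quadratic form $(\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})$ is nonnegative and vanishes only at $\mathbf{x} = \boldsymbol{\mu}$. Therefore, the mode of the multivariate Gaussian distribution is given by the mean vector $\boldsymbol{\mu}$.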
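
The result above can be sanity-checked numerically. The following is a minimal sketch, assuming NumPy is available; the mean vector, the covariance matrix, and the helper names `log_pdf` and `grad_log_pdf` are illustrative choices made for this example and do not come from the book. It evaluates the analytic gradient of the log PDF at the mean and compares the log PDF at the mean against 1000 randomly drawn points.

```python
import numpy as np

def log_pdf(x, mu, sigma):
    """Log-density of a multivariate Gaussian N(mu, sigma) at point x."""
    n = mu.shape[0]
    diff = x - mu
    # Solve sigma @ y = diff instead of forming the explicit inverse.
    quad = diff @ np.linalg.solve(sigma, diff)
    _, logdet = np.linalg.slogdet(sigma)
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + quad)

def grad_log_pdf(x, mu, sigma):
    """Analytic gradient of the log-density: -sigma^{-1} (x - mu)."""
    return -np.linalg.solve(sigma, x - mu)

# Illustrative parameters (assumed for this example only).
mu = np.array([1.0, -2.0, 0.5])
a = np.array([[2.0, 0.3, 0.0],
              [0.1, 1.0, 0.4],
              [0.0, 0.2, 1.5]])
sigma = a @ a.T + 0.1 * np.eye(3)  # symmetric positive definite by construction

# The gradient of the log-density vanishes at x = mu.
grad_at_mu = grad_log_pdf(mu, mu, sigma)

# None of the randomly drawn points should have a higher log-density than mu.
rng = np.random.default_rng(0)
candidates = mu + rng.normal(scale=2.0, size=(1000, 3))
best_candidate = max(log_pdf(x, mu, sigma) for x in candidates)
mode_value = log_pdf(mu, mu, sigma)

print("gradient at mu:", grad_at_mu)
print("log PDF at mu exceeds all candidates:", mode_value > best_candidate)
```

A random search like this is only a spot check, not a proof; the argument above is what establishes the result for every positive definite covariance matrix.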

Solution by Bard

Here's a detailed proof demonstrating that the mode of a multivariate Gaussian distribution is indeed its mean:

1. Probability Density Function (PDF) of Multivariate Gaussian:

  • The PDF of a multivariate Gaussian distribution with mean vector μ and covariance matrix Σ is given by:

    f(x) = (1 / ((2π)^(n/2) * |Σ|^(1/2))) * exp(-1/2 * (x - μ)^T * Σ^(-1) * (x - μ))
        

2. Finding the Mode:

  • The mode is the point where the PDF has its maximum value.
  • To find the mode, we need to find the maximum of f(x) with respect to x.

3. Taking the Gradient:

  • We differentiate f(x) with respect to x:

    ∇f(x) = -Σ^(-1) * (x - μ) * f(x)
        

4. Setting the Gradient to Zero:

  • At the maximum, the gradient will be zero. Since f(x) > 0 for every x, the scalar factor f(x) can be divided out:

    ∇f(x) = 0  ==>  -Σ^(-1) * (x - μ) = 0
        

5. Solving for x:

  • Since Σ^(-1) is invertible (its inverse is Σ), we can multiply both sides on the left by -Σ:

    x - μ = 0
        

6. Mode x = μ:

  • This leads to the solution x = μ.

7. Second Derivative Test (Hessian):

  • To confirm that this is a maximum, we can examine the Hessian matrix, which is the matrix of second partial derivatives.
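  • The Hessian of f(x) is f(x) * (Σ^(-1) * (x - μ) * (x - μ)^T * Σ^(-1) - Σ^(-1)). At x = μ it reduces to -f(μ) * Σ^(-1), which is negative definite (all eigenvalues are negative) because Σ^(-1) is positive definite and f(μ) > 0.
  • This guarantees that f(x) has a local maximum at x = μ. Since x = μ is the only stationary point and f(x) → 0 as ‖x‖ → ∞, this maximum is also the global one.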

Therefore, we have proven that the mode of the multivariate Gaussian distribution is equal to its mean vector μ.
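
The second derivative test in step 7 can also be checked numerically. This is a minimal sketch, assuming NumPy; the 2-dimensional parameters and the helper names `pdf` and `numerical_hessian` are made up for illustration. It approximates the Hessian of f at x = μ with central finite differences and compares it with -f(μ) * Σ^(-1), the value the Hessian of the PDF takes at the mean.

```python
import numpy as np

def pdf(x, mu, sigma):
    """Density of a multivariate Gaussian N(mu, sigma) at point x."""
    n = mu.shape[0]
    diff = x - mu
    quad = diff @ np.linalg.solve(sigma, diff)
    norm = np.sqrt((2.0 * np.pi) ** n * np.linalg.det(sigma))
    return np.exp(-0.5 * quad) / norm

def numerical_hessian(func, x, eps=1e-4):
    """Central finite-difference approximation of the Hessian of func at x."""
    n = x.shape[0]
    hess = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            e_i = np.zeros(n)
            e_j = np.zeros(n)
            e_i[i] = eps
            e_j[j] = eps
            hess[i, j] = (func(x + e_i + e_j) - func(x + e_i - e_j)
                          - func(x - e_i + e_j) + func(x - e_i - e_j)) / (4.0 * eps ** 2)
    return hess

# Illustrative parameters (assumed for this example only).
mu = np.array([0.5, -1.0])
sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])

hess_numeric = numerical_hessian(lambda x: pdf(x, mu, sigma), mu)
hess_analytic = -pdf(mu, mu, sigma) * np.linalg.inv(sigma)

# All eigenvalues negative means the Hessian at mu is negative definite.
eigenvalues = np.linalg.eigvalsh(hess_analytic)

print("numeric Hessian at mu:\n", hess_numeric)
print("analytic Hessian at mu:\n", hess_analytic)
print("eigenvalues:", eigenvalues)
```

The step size `eps` is a judgment call: a much smaller value amplifies floating-point rounding error, while a much larger one increases the truncation error of the finite-difference formula.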
