Saturday, January 13, 2024

Ex 1.9 - Show that the mode of the multivariate Gaussian is given by µ. (Pattern Recognition and ML by Christopher Bishop)

Ex 1.9.A: Show that the mode (i.e. the maximum) of the Gaussian distribution (1.46) is given by µ.

Ex 1.9.B: Similarly, show that the mode of the multivariate Gaussian (1.52) is given by µ.

Solution hint from the book:

Solution by ChatGPT

The mode of a multivariate Gaussian distribution is the point at which the probability density function (PDF) is maximized. For a multivariate Gaussian distribution with mean vector μ and covariance matrix Σ, the PDF is given by:

$$f(\mathbf{x}; \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \frac{1}{(2\pi)^{n/2} \det(\boldsymbol{\Sigma})^{1/2}} \exp\left(-\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})\right)$$

where:

  • $\mathbf{x}$ is the vector of random variables.
  • $\boldsymbol{\mu}$ is the mean vector.
  • $\boldsymbol{\Sigma}$ is the covariance matrix.
  • $n$ is the dimensionality of the multivariate distribution.
  • $\det(\boldsymbol{\Sigma})$ is the determinant of the covariance matrix.

To find the mode, we need to maximize this PDF with respect to $\mathbf{x}$. Taking the natural logarithm of the PDF simplifies the maximization problem: the logarithm is strictly increasing, so the PDF and its logarithm are maximized at the same point:

$$\ln f(\mathbf{x}; \boldsymbol{\mu}, \boldsymbol{\Sigma}) = -\frac{n}{2} \ln(2\pi) - \frac{1}{2} \ln(\det(\boldsymbol{\Sigma})) - \frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})$$

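To find the mode, we need the point $\mathbf{x}$ at which the gradient of the log PDF with respect to $\mathbf{x}$ is zero:

$$\frac{\partial}{\partial \mathbf{x}} \ln f(\mathbf{x}; \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \mathbf{0}$$

Only the quadratic term depends on $\mathbf{x}$. Because $\boldsymbol{\Sigma}^{-1}$ is symmetric, the gradient of $-\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})$ is $-\boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})$, so the condition becomes:

$$-\boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) = \mathbf{0}$$

Since $\boldsymbol{\Sigma}^{-1}$ is invertible, multiplying both sides by $-\boldsymbol{\Sigma}$ gives:

$$\mathbf{x} = \boldsymbol{\mu}$$

The Hessian of the log PDF is $-\boldsymbol{\Sigma}^{-1}$, which is negative definite, so this stationary point is a maximum. Therefore, the mode of the multivariate Gaussian distribution is given by the mean vector $\boldsymbol{\mu}$.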
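As a quick numerical sanity check of this result, here is a minimal sketch in plain Python. The values of `MU` and `SIGMA` are hypothetical, chosen only for illustration (a 2-D case, so n = 2), and the helper names are not from the book. It evaluates the log PDF at the mean and at randomly perturbed points, and confirms that none of them beats the mean.

```python
import math
import random

# Hypothetical 2-D example; MU and SIGMA are illustrative values, not from the book.
MU = (1.0, -2.0)
SIGMA = ((2.0, 0.5), (0.5, 1.0))  # symmetric and positive definite


def inv_2x2(m):
    """Return the inverse and the determinant of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det)), det


def log_pdf(x, mu=MU, sigma=SIGMA):
    """Log-density of a 2-D Gaussian, i.e. the log PDF formula with n = 2."""
    inv, det = inv_2x2(sigma)
    d0, d1 = x[0] - mu[0], x[1] - mu[1]
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu)
    quad = d0 * (inv[0][0] * d0 + inv[0][1] * d1) + d1 * (inv[1][0] * d0 + inv[1][1] * d1)
    return -math.log(2 * math.pi) - 0.5 * math.log(det) - 0.5 * quad


def mode_beats_random_points(trials=1000, seed=0):
    """True if no randomly perturbed point has a higher log-density than MU."""
    rng = random.Random(seed)
    at_mu = log_pdf(MU)
    return all(
        log_pdf((MU[0] + rng.uniform(-3, 3), MU[1] + rng.uniform(-3, 3))) <= at_mu
        for _ in range(trials)
    )


if __name__ == "__main__":
    print(mode_beats_random_points())
```

A random search like this is only evidence, not a proof; the proof is the gradient argument above.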

Solution by Bard

Here's a detailed proof demonstrating that the mode of a multivariate Gaussian distribution is indeed its mean:

1. Probability Density Function (PDF) of Multivariate Gaussian:

  • The PDF of a multivariate Gaussian distribution with mean vector μ and covariance matrix Σ is given by:

    f(x) = (1 / ((2π)^(n/2) * |Σ|^(1/2))) * exp(-1/2 * (x - μ)^T * Σ^(-1) * (x - μ))
        

2. Finding the Mode:

  • The mode is the point where the PDF has its maximum value.
  • To find the mode, we need to find the maximum of f(x) with respect to x.

3. Taking the Gradient:

  • We differentiate f(x) with respect to x:

    ∇f(x) = -Σ^(-1) * (x - μ) * f(x)
        

4. Setting the Gradient to Zero:

  • At the maximum, the gradient will be zero. Because f(x) > 0 for every x, it can be divided out:

    ∇f(x) = 0  ==>  -Σ^(-1) * (x - μ) = 0
        

5. Solving for x:

  • Since Σ^(-1) is invertible, we can multiply both sides by its inverse Σ (and by -1):

    x - μ = 0
        

6. Mode x = μ:

  • This leads to the solution x = μ.

7. Second Derivative Test (Hessian):

  • To confirm that this is a maximum, we can examine the Hessian matrix, which is the matrix of second partial derivatives.
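  • At x = μ, the Hessian of f(x) is -f(μ) * Σ^(-1), which is negative definite because f(μ) > 0 and Σ^(-1) is positive definite (so all eigenvalues of the Hessian are negative).
  • This guarantees that f(x) has a local maximum at x = μ. Since μ is the only stationary point and f(x) → 0 as ‖x‖ → ∞, it is also the global maximum.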

Therefore, we have proven that the mode of the multivariate Gaussian distribution is equal to its mean vector μ.
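The gradient formula in step 3 can be checked numerically. The sketch below uses plain Python and a hypothetical 2-D example: `MU` and the precision matrix `P` (standing in for Σ^(-1)) are illustrative values, not taken from the book. It compares the analytic gradient -Σ^(-1) (x - μ) f(x) with a central finite-difference estimate.

```python
import math

# Hypothetical 2-D example: MU and the precision matrix P = Sigma^{-1}
# are illustrative values, not taken from the post or the book.
MU = (1.0, -2.0)
P = ((0.8, -0.4), (-0.4, 1.2))                   # symmetric and positive definite
DET_P = P[0][0] * P[1][1] - P[0][1] * P[1][0]    # det(Sigma^{-1}) = 1 / det(Sigma)
NORM = math.sqrt(DET_P) / (2 * math.pi)          # 1 / ((2*pi)^(n/2) * |Sigma|^(1/2)), n = 2


def precision_times_diff(x):
    """Return Sigma^{-1} (x - mu) as a 2-tuple."""
    d0, d1 = x[0] - MU[0], x[1] - MU[1]
    return (P[0][0] * d0 + P[0][1] * d1, P[1][0] * d0 + P[1][1] * d1)


def pdf(x):
    """Density f(x) of the 2-D Gaussian."""
    d0, d1 = x[0] - MU[0], x[1] - MU[1]
    v0, v1 = precision_times_diff(x)
    return NORM * math.exp(-0.5 * (d0 * v0 + d1 * v1))


def analytic_grad(x):
    """Gradient from step 3: -Sigma^{-1} (x - mu) f(x)."""
    v0, v1 = precision_times_diff(x)
    fx = pdf(x)
    return (-v0 * fx, -v1 * fx)


def numeric_grad(x, h=1e-6):
    """Central finite differences, one coordinate at a time."""
    g0 = (pdf((x[0] + h, x[1])) - pdf((x[0] - h, x[1]))) / (2 * h)
    g1 = (pdf((x[0], x[1] + h)) - pdf((x[0], x[1] - h))) / (2 * h)
    return (g0, g1)


def max_abs_diff(u, v):
    """Largest coordinate-wise absolute difference between two tuples."""
    return max(abs(a - b) for a, b in zip(u, v))


if __name__ == "__main__":
    point = (0.3, -1.1)
    print(max_abs_diff(analytic_grad(point), numeric_grad(point)))
    print(analytic_grad(MU))
```

The two gradients should agree up to finite-difference error at any test point, and the analytic gradient vanishes at x = μ, consistent with steps 4 to 6.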

What happens when we reverse the limits of a definite integral?

When you reverse the limits of a definite integral, the sign of the integral changes. Mathematically, if you have a definite integral of a function $f(x)$ from $a$ to $b$:

$$\int_{a}^{b} f(x) \,dx$$

and you reverse the limits to $b$ and $a$:

$$\int_{b}^{a} f(x) \,dx$$

the new integral is equal to the negative of the original integral:

$$\int_{b}^{a} f(x) \,dx = -\int_{a}^{b} f(x) \,dx$$

This is a consequence of the fact that the definite integral represents the signed area under the curve of the function between the given limits. Reversing the limits effectively changes the orientation of the interval, resulting in a change of sign for the area.

In summary, reversing the limits of a definite integral changes the sign of the integral.
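The sign flip can be seen numerically. The following is a minimal sketch in plain Python using a midpoint rule; the function name `midpoint_integral`, the integrand x^2, and the interval [0, 3] are illustrative choices. The step size is signed, so swapping the limits negates every term of the sum.

```python
def midpoint_integral(f, lower, upper, steps=10_000):
    """Midpoint-rule approximation of the integral of f from lower to upper.

    The step h is signed, so lower > upper is allowed and yields the
    negated value, mirroring the rule for reversed limits.
    """
    h = (upper - lower) / steps
    return h * sum(f(lower + (i + 0.5) * h) for i in range(steps))


def square(x):
    return x * x


forward = midpoint_integral(square, 0.0, 3.0)   # approximates the integral from 0 to 3
backward = midpoint_integral(square, 3.0, 0.0)  # same integrand, limits reversed

if __name__ == "__main__":
    print(forward, backward)
```

Here `forward` approximates the exact value 9, and `backward` is its negative up to floating-point rounding.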

Can you give the example of an odd integrand and prove that it is odd?

Example of an odd integrand and its proof.

Certainly! Let's consider the function $f(x) = x^3$. We'll first show that $f(x)$ is odd, and then integrate it over a symmetric interval, say $[-a, a]$, to confirm that the result is zero, as it must be for any odd integrand.

The integral is given by:

$$\int_{-a}^{a} x^3 \,dx$$

To evaluate this integral, we'll use the properties of odd functions. An odd function $f(x)$ satisfies $f(-x) = -f(x)$. Let's evaluate $f(-x)$ for $f(x) = x^3$:

$$f(-x) = (-x)^3 = -x^3$$

Now, compare $f(-x)$ with $-f(x)$:

$$-f(x) = -x^3$$

As we can see, $f(-x) = -f(x)$. This property holds for all values of $x$.

Now, let's calculate the integral:

$$\int_{-a}^{a} x^3 \,dx = \frac{1}{4}x^4 \Big|_{-a}^{a}$$

$$= \frac{1}{4}(a^4 - (-a)^4)$$

$$= \frac{1}{4}(a^4 - a^4)$$

$$= \frac{1}{4}(0)$$

$$= 0$$

The result is zero, as expected for the integral of an odd function over the symmetric interval $[-a, a]$. Note that the oddness of $f(x) = x^3$ is established by the identity $f(-x) = -f(x)$ shown above; a zero integral on its own would not prove that a function is odd.
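Both parts of this argument can be spot-checked in plain Python. This is only a sketch: the helper names `is_odd_on` and `integral_of_cube` are hypothetical, and testing f(-x) = -f(x) at a handful of sample points is evidence rather than a proof.

```python
import math


def cube(x):
    return x ** 3


def is_odd_on(f, points, tol=1e-12):
    """Check f(-x) == -f(x) at the given sample points (evidence, not a proof)."""
    return all(math.isclose(f(-x), -f(x), rel_tol=tol, abs_tol=tol) for x in points)


def integral_of_cube(a):
    """Integral of x^3 over [-a, a], using the antiderivative x^4 / 4."""
    def antiderivative(x):
        return x ** 4 / 4
    return antiderivative(a) - antiderivative(-a)


if __name__ == "__main__":
    samples = [0.0, 0.5, 1.0, 2.0, 3.7]
    print(is_odd_on(cube, samples))
    print(integral_of_cube(2.0))
```

For contrast, passing an even function such as x^2 to `is_odd_on` with nonzero sample points returns `False`.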