Monday, February 12, 2024

Ex 1.12 - Pattern Recognition and ML by Christopher Bishop

Pre-read

Question

Solution from Book

Solution Using ChatGPT

Are you aware of the result $E[x_n x_m] = \mu^2 + I \cdot \sigma^2$?

I assume you meant $E[X_n \cdot X_m] = \mu^2 + I_{nm} \cdot \sigma^2$, where $I_{nm}$ is the Kronecker delta, which is 1 when $n = m$ and 0 otherwise.

In that case, $E[X_n \cdot X_m]$ simplifies to:

$$E[X_n \cdot X_m] = \begin{cases} \mu^2 + \sigma^2 & \text{if } n = m \\ \mu^2 & \text{if } n \neq m \end{cases}$$

This is because when $n = m$ you have the product of a variable with itself, so the expectation is the second moment $E[X_n^2] = \mu^2 + \sigma^2$; the additional $\sigma^2$ is the variance term. When $n \neq m$ the two draws are independent, their covariance is zero, and you are left with the square of the mean.

If you have further questions or if there's anything specific you'd like to clarify, feel free to ask!
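As a quick sanity check of this identity, here is a small Monte Carlo simulation (a minimal sketch using NumPy; the values of $\mu$, $\sigma$, and the sample size are arbitrary choices, not part of the exercise):

```python
import numpy as np

# Monte Carlo check of E[x_n x_m] = mu^2 + I_nm * sigma^2.
rng = np.random.default_rng(seed=0)
mu, sigma, num_samples = 2.0, 3.0, 1_000_000

# Each row is one draw of (x_n, x_m): two independent samples from N(mu, sigma^2).
x = rng.normal(loc=mu, scale=sigma, size=(num_samples, 2))

same_index = np.mean(x[:, 0] * x[:, 0])   # n == m, should be close to mu^2 + sigma^2
diff_index = np.mean(x[:, 0] * x[:, 1])   # n != m, should be close to mu^2

print(f"n == m: {same_index:.3f} (expected {mu**2 + sigma**2})")
print(f"n != m: {diff_index:.3f} (expected {mu**2})")
```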

Can you derive this expression?

Certainly! Let's derive the expression for $E[X_n \cdot X_m]$ for two continuous random variables $X_n$ and $X_m$ drawn independently from the same Gaussian distribution $\mathcal{N}(\mu, \sigma^2)$ with mean $\mu$ and variance $\sigma^2$.

The expectation of the product is given by:

$$E[X_n \cdot X_m] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x_n \, x_m \, f(x_n, x_m) \, dx_n \, dx_m$$

where $f(x_n, x_m)$ is the joint density of $(X_n, X_m)$.

Since $X_n$ and $X_m$ are drawn from the same distribution, let $f(x)$ denote their common marginal probability density function (PDF).

Now split the integral into two cases: $n = m$ and $n \neq m$.

Case 1: $n = m$

When $n = m$, both factors are the same random variable, so the expectation reduces to a single integral:

$$E[X_n \cdot X_m] = E[X_n^2] = \int_{-\infty}^{\infty} x_n^2 \, f(x_n) \, dx_n = \mu^2 + \sigma^2$$

This result comes from the fact that the second moment of a Gaussian with mean $\mu$ and variance $\sigma^2$ is $\mu^2 + \sigma^2$.
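For completeness, that second-moment identity follows from the definition of the variance (a standard fact, not specific to this exercise):

$$E[X^2] = E\!\left[(X - \mu)^2\right] + 2\mu \, E[X] - \mu^2 = \sigma^2 + 2\mu^2 - \mu^2 = \mu^2 + \sigma^2$$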

Case 2: $n \neq m$

When $n \neq m$, the two draws are independent, so the joint density factorizes as $f(x_n) \, f(x_m)$ and the double integral separates:

$$E[X_n \cdot X_m] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x_n \, x_m \, f(x_n) \, f(x_m) \, dx_n \, dx_m$$

$$= \left( \int_{-\infty}^{\infty} x_n \, f(x_n) \, dx_n \right) \cdot \left( \int_{-\infty}^{\infty} x_m \, f(x_m) \, dx_m \right) = \mu \cdot \mu = \mu^2$$

This result comes from the fact that $\int_{-\infty}^{\infty} x \, f(x) \, dx$ is simply the mean $\mu$ of the Gaussian.

Therefore, combining both cases:

$$E[X_n \cdot X_m] = \mu^2 + I_{nm} \cdot \sigma^2$$

where $I_{nm}$ is the Kronecker delta, equal to 1 when $n = m$ and 0 otherwise.
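As a pointer to how this identity is typically applied in Exercise 1.12, here is a sketch assuming the usual sample-mean estimator $\mu_{ML} = \frac{1}{N} \sum_{n} x_n$:

$$E[\mu_{ML}^2] = \frac{1}{N^2} \sum_{n=1}^{N} \sum_{m=1}^{N} E[x_n x_m] = \frac{1}{N^2} \left( N(\mu^2 + \sigma^2) + N(N-1)\mu^2 \right) = \mu^2 + \frac{\sigma^2}{N}$$

The $N$ diagonal terms contribute $\mu^2 + \sigma^2$ each, and the $N(N-1)$ off-diagonal terms contribute $\mu^2$ each.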


Tags: Mathematical Foundations for Data Science, Machine Learning

Thursday, February 8, 2024

Solution 1.11 - On Gaussian Distribution and its Log Likelihood (Pattern Recognition and ML by Christopher Bishop)

Can you give me the equation for the Gaussian distribution?
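For reference, the standard form of the univariate Gaussian density (Bishop's eq. 1.46) is:

$$\mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{(x - \mu)^2}{2\sigma^2} \right\}$$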

Can you give me its log likelihood function?

Note: This is already given as part of the problem statement.

Can you show the detailed steps how you arrived at it?

Some paperwork

In the next slide we will solve for the below expression:

Some Logarithm Properties we will use:
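A sketch of the standard steps, assuming $N$ independent observations $\mathbf{x} = (x_1, \ldots, x_N)^T$ and using $\ln(ab) = \ln a + \ln b$, $\ln a^b = b \ln a$, and $\ln e^a = a$:

$$p(\mathbf{x} \mid \mu, \sigma^2) = \prod_{n=1}^{N} \mathcal{N}(x_n \mid \mu, \sigma^2)$$

$$\ln p(\mathbf{x} \mid \mu, \sigma^2) = \sum_{n=1}^{N} \ln \mathcal{N}(x_n \mid \mu, \sigma^2) = -\frac{1}{2\sigma^2} \sum_{n=1}^{N} (x_n - \mu)^2 - \frac{N}{2} \ln \sigma^2 - \frac{N}{2} \ln(2\pi)$$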

Can you maximize the log likelihood expression w.r.t. mu and show the detailed steps?

Some paperwork for derivative of:

We will prove it next:

Some ‘Differentiation’ identities we will see next:
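A sketch of the maximization with respect to $\mu$, assuming the log likelihood above (only the first term depends on $\mu$):

$$\frac{\partial}{\partial \mu} \ln p(\mathbf{x} \mid \mu, \sigma^2) = \frac{1}{\sigma^2} \sum_{n=1}^{N} (x_n - \mu) = 0 \quad \Longrightarrow \quad \mu_{ML} = \frac{1}{N} \sum_{n=1}^{N} x_n$$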

And now maximize log likelihood w.r.t. sigma-squared and show the detailed steps

Identity:
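And a sketch of the corresponding steps for $\sigma^2$ (differentiating with respect to $\sigma^2$ as a single variable, then substituting $\mu_{ML}$ for $\mu$):

$$\frac{\partial}{\partial \sigma^2} \ln p(\mathbf{x} \mid \mu, \sigma^2) = \frac{1}{2\sigma^4} \sum_{n=1}^{N} (x_n - \mu)^2 - \frac{N}{2\sigma^2} = 0 \quad \Longrightarrow \quad \sigma^2_{ML} = \frac{1}{N} \sum_{n=1}^{N} (x_n - \mu_{ML})^2$$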

References

https://chat.openai.com/share/4278db33-9b76-4d8b-9586-4e1e78844433

Tags: Mathematical Foundations for Data Science, Machine Learning

Question 1.11 - On Gaussian Distribution and its Log Likelihood (Pattern Recognition and ML by Christopher Bishop)

Pre-read:

(1) The Gaussian distribution (1.46):

$$\mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{(x - \mu)^2}{2\sigma^2} \right\}$$

(2) The likelihood of $N$ i.i.d. observations $\mathbf{x} = (x_1, \ldots, x_N)^T$ (1.53):

$$p(\mathbf{x} \mid \mu, \sigma^2) = \prod_{n=1}^{N} \mathcal{N}(x_n \mid \mu, \sigma^2)$$

(3) From (1.46) and (1.53), the log likelihood function can be written in the form (1.54):

$$\ln p(\mathbf{x} \mid \mu, \sigma^2) = -\frac{1}{2\sigma^2} \sum_{n=1}^{N} (x_n - \mu)^2 - \frac{N}{2} \ln \sigma^2 - \frac{N}{2} \ln(2\pi)$$
Question: By setting the derivatives of the log likelihood function (1.54) with respect to µ and σ^2 equal to zero, verify the results (1.55) and (1.56). Link to Solution
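For reference, the results to be verified are the maximum likelihood solutions for the mean (1.55) and variance (1.56):

$$\mu_{ML} = \frac{1}{N} \sum_{n=1}^{N} x_n \qquad\qquad \sigma^2_{ML} = \frac{1}{N} \sum_{n=1}^{N} (x_n - \mu_{ML})^2$$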