Data Interview Question

Maximum Likelihood Estimation

Solution & Explanation

Maximum Likelihood Estimation (MLE):

Maximum Likelihood Estimation is a statistical method used to estimate the parameters of a probability distribution by maximizing the likelihood function. The likelihood function measures how likely it is to observe the given data under different parameter values.

Key Concepts:

  1. Likelihood Function:

    • Represents the probability (or, for continuous data, the probability density) of the observed data, viewed as a function of the parameters.
    • For a data set $\{y_1, y_2, \ldots, y_n\}$ of independent observations and model parameters $\theta$, the likelihood function is given by: $L(\theta \mid y_1, y_2, \ldots, y_n) = P(y_1 \mid \theta) \times P(y_2 \mid \theta) \times \cdots \times P(y_n \mid \theta)$
    • In practice, we often use the log-likelihood function, which turns the product into a sum: $\log L(\theta \mid y_1, y_2, \ldots, y_n) = \sum_{i=1}^{n} \log P(y_i \mid \theta)$
  2. Objective:

    • The goal of MLE is to find the parameter $\theta$ that maximizes the likelihood function. Because the logarithm is monotonically increasing, this is the same $\theta$ that maximizes the log-likelihood: $\hat{\theta} = \arg\max_{\theta} \log L(\theta \mid y_1, y_2, \ldots, y_n)$
  3. Steps in MLE:

    • Define the Probability Model:
      • Assume the data follow a specific probability distribution (e.g., normal or binomial).
      • Identify the parameters $\theta$ of this distribution.
    • Construct the Likelihood Function:
      • Based on the assumed distribution, write down the likelihood function.
    • Take the Log-Likelihood:
      • Convert the product of probabilities into a sum by taking the logarithm.
    • Differentiate and Solve:
      • Compute the derivative of the log-likelihood with respect to $\theta$.
      • Set the derivative to zero and solve for $\theta$ to find the maximum likelihood estimator.
  4. Connection to Bayesian and Frequentist Perspectives:

    • Frequentist Perspective:
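      • MLE treats $\theta$ as a fixed unknown quantity and uses only the observed data. It does not incorporate a prior distribution or the marginal evidence, which aligns it with the frequentist approach.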
    • Bayesian Perspective:
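      • MLE uses the likelihood, which is also central in Bayesian statistics: the posterior is proportional to the likelihood times the prior, so under a flat (uniform) prior the posterior mode coincides with the MLE.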
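
The steps listed above can be sketched in a few lines of Python. This is a minimal illustration and not part of the original solution: it assumes a Bernoulli model for hypothetical coin-flip data, and the function names and the `flips` list are made up for the example. The closed-form estimate comes from setting the derivative of the log-likelihood to zero, and a brute-force grid search is included only as a sanity check.

```python
import math

def bernoulli_log_likelihood(p, data):
    """Log-likelihood of independent 0/1 observations under Bernoulli(p)."""
    return sum(math.log(p) if y == 1 else math.log(1.0 - p) for y in data)

def bernoulli_mle_closed_form(data):
    """Closed-form MLE.

    With k successes in n trials, log L(p) = k*log(p) + (n - k)*log(1 - p).
    Setting d/dp log L = k/p - (n - k)/(1 - p) to zero gives p = k/n.
    """
    return sum(data) / len(data)

def bernoulli_mle_grid(data, steps=999):
    """Sanity check: the grid point in (0, 1) with the highest log-likelihood."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]
    return max(grid, key=lambda p: bernoulli_log_likelihood(p, data))

# Hypothetical data: 7 successes in 10 trials.
flips = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]

p_closed = bernoulli_mle_closed_form(flips)
p_grid = bernoulli_mle_grid(flips)
print(p_closed, p_grid)
```

Working with the log-likelihood here keeps the computation as a sum of logs rather than a product of many small probabilities, which matches the reason given above for taking logarithms.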

Example:

Suppose we have a sample $\{x_1, x_2, \ldots, x_n\}$ from a normal distribution with unknown mean $\mu$ and known variance $\sigma^2$. The likelihood function is: $L(\mu \mid x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x_i - \mu)^2}{2\sigma^2}}$

Taking the log-likelihood: $\log L(\mu \mid x_1, x_2, \ldots, x_n) = -\frac{n}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2$

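Differentiating with respect to $\mu$ and setting the derivative to zero, $\frac{\partial}{\partial \mu} \log L = \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu) = 0$, gives: $\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i$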

This $\hat{\mu}$ is the maximum likelihood estimate of the mean, which is the sample mean.
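
As a rough numerical check of this result, the sketch below compares the sample mean with the value of $\mu$ found by directly maximizing the log-likelihood. The sample `xs`, the variance `sigma2`, and the helper names are hypothetical choices for illustration. The ternary search relies on the log-likelihood being concave in $\mu$, which holds for the normal model with known variance.

```python
import math

def normal_log_likelihood(mu, xs, sigma2):
    """Log-likelihood of an independent normal sample with known variance sigma2."""
    n = len(xs)
    return (-0.5 * n * math.log(2.0 * math.pi * sigma2)
            - sum((x - mu) ** 2 for x in xs) / (2.0 * sigma2))

def maximize_concave(f, lo, hi, tol=1e-8):
    """Ternary search for the maximizer of a concave function on [lo, hi]."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1  # the maximizer lies to the right of m1
        else:
            hi = m2  # the maximizer lies to the left of m2
    return 0.5 * (lo + hi)

# Hypothetical sample; the variance is assumed known, as in the example above.
xs = [4.8, 5.1, 5.6, 4.9, 5.3, 5.0]
sigma2 = 0.25

mu_closed = sum(xs) / len(xs)  # closed-form MLE: the sample mean
mu_numeric = maximize_concave(
    lambda m: normal_log_likelihood(m, xs, sigma2), min(xs), max(xs)
)
print(mu_closed, mu_numeric)
```

The two values should agree up to a small numerical tolerance. Changing `sigma2` rescales the log-likelihood but does not move its maximizer, consistent with $\hat{\mu}$ not depending on $\sigma^2$.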