Data Interview Question

Ad Raters: Part 2

Solution & Explanation

Question 1: Probability that a rater is lazy given all ads are rated as good

To find the probability that a rater is lazy given that they rated three ads as good, we use Bayes' Theorem:

$$P(L \mid G_3) = \frac{P(G_3 \mid L) \cdot P(L)}{P(G_3)}$$

Where:

  • $P(G_3 \mid L) = 1^3 = 1$ (Lazy raters always rate ads as good)
  • $P(L) = 0.2$
  • $P(G_3 \mid L^c) = 0.6^3 = 0.216$ (Careful raters rate good with 60% probability)
  • $P(L^c) = 0.8$

The total probability of all ads being rated as good, $P(G_3)$, is:

$$P(G_3) = P(L)P(G_3 \mid L) + P(L^c)P(G_3 \mid L^c) = 0.2 \cdot 1 + 0.8 \cdot 0.216 = 0.3728$$

Thus, the probability that a rater is lazy given they rated all three ads as good is:

$$P(L \mid G_3) = \frac{0.2 \cdot 1}{0.3728} \approx 0.5365$$

Explanation: This means there's a 53.65% chance the rater is lazy if they rate all three ads as good.
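
The calculation above can be reproduced with a few lines of Python. This is a minimal sketch, not part of the original solution: the function name `posterior_lazy` and its parameter names are hypothetical, and the defaults (20% lazy raters, careful raters answering "good" 60% of the time) are the values used in this problem.

```python
def posterior_lazy(n_good, p_lazy=0.2, p_good_careful=0.6):
    """P(lazy | n_good consecutive 'good' ratings) via Bayes' theorem."""
    like_lazy = 1.0                           # lazy raters always answer "good"
    like_careful = p_good_careful ** n_good   # careful raters: independent ratings
    # Law of total probability for the evidence P(G_n)
    evidence = p_lazy * like_lazy + (1 - p_lazy) * like_careful
    return p_lazy * like_lazy / evidence


print(round(posterior_lazy(3), 4))  # 0.5365
```

With `n_good=0` the function returns the prior of 0.2, which is a quick sanity check that no evidence leaves the belief unchanged.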


Question 2: Probability of being lazy as N approaches infinity

We need to determine how the probability that a rater is lazy behaves when they rate all $N$ ads as good and $N$ approaches infinity:

$$P(L \mid G_N) = \frac{P(L)P(G_N \mid L)}{P(G_N)}$$

Where:

  • $P(G_N \mid L) = 1^N = 1$
  • $P(G_N \mid L^c) = 0.6^N$

The total probability $P(G_N)$ is:

$$P(G_N) = P(L)P(G_N \mid L) + P(L^c)P(G_N \mid L^c) = 0.2 + 0.8 \cdot 0.6^N$$

As $N \to \infty$, $0.6^N \to 0$, so:

$$P(G_N) \to 0.2$$

Thus:

$$P(L \mid G_N) \to \frac{0.2}{0.2} = 1$$

Explanation: As $N$ increases, the probability that a rater is lazy approaches 100% if they rate all ads as good.
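
To see how quickly the posterior approaches 1, it can be evaluated for a few values of $N$. The sketch below is illustrative only: `posterior_lazy` is a hypothetical helper name, and the sample values of $N$ are arbitrary choices rather than anything specified in the problem.

```python
def posterior_lazy(n_good, p_lazy=0.2, p_good_careful=0.6):
    """P(L | G_N) = 0.2 / (0.2 + 0.8 * 0.6**N) with the problem's defaults."""
    careful_term = (1 - p_lazy) * p_good_careful ** n_good
    return p_lazy / (p_lazy + careful_term)


# Arbitrary sample sizes chosen to show the trend
posteriors = {n: posterior_lazy(n) for n in (1, 3, 5, 10, 20, 50)}
for n, p in posteriors.items():
    print(f"N={n:>2}  P(L | G_N) = {p:.6f}")
```

The printed values increase with every step, because each additional "good" rating multiplies the careful-rater term by 0.6 while the lazy-rater term stays fixed at 0.2.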


Question 3: Method to distinguish between careful and lazy raters

To filter out lazy raters, we can set a threshold $\alpha$ for the probability $P(L \mid G_N)$. We flag a rater as lazy if:

$$P(L \mid G_N) > \alpha$$

Rewriting:

$$\frac{0.2}{0.2 + 0.8 \cdot 0.6^N} > \alpha$$

Solving for $N$, we get:

$$0.2 > \alpha(0.2 + 0.8 \cdot 0.6^N)$$

$$0.2(1-\alpha) > 0.8 \cdot 0.6^N \cdot \alpha$$

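Taking the natural logarithm, which is increasing and therefore preserves the direction of the inequality:

$$\ln(0.2) + \ln(1-\alpha) > \ln(0.8) + N \cdot \ln(0.6) + \ln(\alpha)$$

Solving for $N$, and noting that dividing by $\ln(0.6) < 0$ reverses the inequality:

$$N > \frac{\ln(0.2) + \ln(1-\alpha) - \ln(0.8) - \ln(\alpha)}{\ln(0.6)}$$

So, when the right-hand side is positive and not itself an integer, the smallest qualifying $N$ is:

$$N = \left\lceil \frac{\ln(0.2) + \ln(1-\alpha) - \ln(0.8) - \ln(\alpha)}{\ln(0.6)} \right\rceil$$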

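Explanation: This formula provides the minimum number of consecutive good ratings required before the posterior probability that the rater is lazy exceeds the threshold $\alpha$. For example, with $\alpha = 0.95$, the bound is $N > \ln(0.01/0.76)/\ln(0.6) \approx 8.48$, so a rater needs to rate at least 9 ads as good to be flagged as lazy with 95% confidence. Because the prior is already $P(L) = 0.2$, any threshold $\alpha < 0.2$ is exceeded before a single ad is rated, so the rule is only informative for $\alpha > 0.2$.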
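
The threshold rule can be checked numerically in two ways: by searching for the first $N$ whose posterior exceeds $\alpha$, and by evaluating the logarithmic bound directly. This is a sketch under the problem's assumptions; the function names `posterior_lazy`, `min_good_ratings`, and `closed_form_bound` are hypothetical and not part of the original solution.

```python
import math


def posterior_lazy(n_good, p_lazy=0.2, p_good_careful=0.6):
    """P(L | G_N) for a rater who marked all n_good ads as good."""
    return p_lazy / (p_lazy + (1 - p_lazy) * p_good_careful ** n_good)


def min_good_ratings(alpha, p_lazy=0.2, p_good_careful=0.6):
    """Smallest N with P(L | G_N) > alpha, found by direct search."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be strictly between 0 and 1")
    n = 0
    # The posterior increases toward 1, so the loop terminates for alpha < 1
    while posterior_lazy(n, p_lazy, p_good_careful) <= alpha:
        n += 1
    return n


def closed_form_bound(alpha, p_lazy=0.2, p_good_careful=0.6):
    """Real-valued bound x such that the rater is flagged for every N > x."""
    numerator = (math.log(p_lazy) + math.log(1 - alpha)
                 - math.log(1 - p_lazy) - math.log(alpha))
    return numerator / math.log(p_good_careful)  # log(0.6) is negative


print(min_good_ratings(0.95))             # 9
print(round(closed_form_bound(0.95), 2))  # 8.48
```

The direct search also handles thresholds below the prior: `min_good_ratings(0.05)` returns 0, because a posterior of 0.2 already exceeds 0.05 before any ads are rated, whereas the logarithmic bound is negative in that case.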