My Notes on Online Experiments: A/B Testing

Practical applications and concepts of A/B testing.

Posted by Jiashu Miao on August 24, 2022

A/B Testing

Workflow

  1. Know the business goal.
  2. Define the goal metrics.
  3. Unit of diversion (randomization unit): page views, users, or cookies.
  4. Population: run the experiment only on the population that will be affected (e.g., for a checkout feature, use users who start the checkout process instead of all users).
  5. Sample size: based on the baseline metric value, the practical significance level (the smallest effect worth detecting), the significance level, and the statistical power.
  6. Duration: consider usage patterns, business cycles, and novelty effects. If the experiment is risky or not easily reversible, start with a very small share of traffic and run it longer.
  7. Assignment: decide how to split users into control and treatment (simple randomization? network effects between users?).
  8. Sanity checks, then check whether the result is statistically significant, i.e., whether we can reject the null hypothesis and conclude that there is a difference between control and treatment.
  9. Launch or not: weigh the trade-offs.
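Steps 5 and 6 can be made concrete with a standard power calculation. The sketch below is a minimal Python version that assumes a conversion-rate metric, a two-sided two-proportion z-test, and a 50/50 split. The function names `sample_size_per_group` and `duration_days`, and the rule of rounding the duration up to whole weeks (to cover the weekly usage pattern), are illustrative choices rather than a prescribed method.

```python
import math
from statistics import NormalDist


def sample_size_per_group(baseline: float, mde: float,
                          alpha: float = 0.05, power: float = 0.8) -> int:
    """Users needed in EACH group to detect an absolute lift `mde` on a
    conversion rate `baseline` with a two-sided two-proportion z-test."""
    p1, p2 = baseline, baseline + mde
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for power = 1 - beta
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / mde ** 2)


def duration_days(n_per_group: int, daily_eligible_users: int,
                  treatment_share: float = 0.5) -> int:
    """Days until the smaller group reaches n_per_group, rounded up to whole
    weeks so every day of the weekly cycle is covered (hypothetical helper)."""
    smaller_share = min(treatment_share, 1 - treatment_share)
    days = math.ceil(n_per_group / (daily_eligible_users * smaller_share))
    return 7 * math.ceil(days / 7)


# Hypothetical inputs: 10% baseline conversion, +1 point minimum detectable
# effect, 2,000 eligible users per day.
n = sample_size_per_group(baseline=0.10, mde=0.01)
print(n, duration_days(n, daily_eligible_users=2_000))
```

Halving the minimum detectable effect roughly quadruples the required sample size, which is why the practical significance level drives both sample size and duration.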
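For step 7, one common way to randomize (an assumption here, not the only option) is to hash the user ID together with an experiment name, so each user lands in the same group on every visit. For the sanity check in step 8, a frequently used test is for sample ratio mismatch (SRM): whether the observed control/treatment counts match the configured split. The helpers `assign` and `srm_p_value` and the experiment name `checkout_v2` are hypothetical.

```python
import hashlib
from statistics import NormalDist


def assign(user_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Deterministically map a user to 'control' or 'treatment'.

    Salting the hash with the experiment name decorrelates assignments
    across different experiments."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # roughly uniform in [0, 1)
    return "treatment" if bucket < treatment_share else "control"


def srm_p_value(n_control: int, n_treatment: int,
                expected_treatment_share: float = 0.5) -> float:
    """Chi-square goodness-of-fit test (1 degree of freedom) for sample ratio
    mismatch. A tiny p-value suggests the split is broken."""
    total = n_control + n_treatment
    exp_t = total * expected_treatment_share
    exp_c = total - exp_t
    chi2 = (n_treatment - exp_t) ** 2 / exp_t + (n_control - exp_c) ** 2 / exp_c
    # With 1 df, chi2 = z^2, so use the two-sided normal tail of sqrt(chi2).
    return 2 * (1 - NormalDist().cdf(chi2 ** 0.5))


groups = [assign(f"user{i}", "checkout_v2") for i in range(10_000)]
n_t = groups.count("treatment")
print(n_t, srm_p_value(len(groups) - n_t, n_t))
```

If the SRM p-value is very small, the assignment or logging is likely broken, and the metric comparison should not be trusted until the cause is found.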

Misinterpretation of Statistical Results

Lack of statistical power

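  • In Null Hypothesis Significance Testing (NHST), we assume there is no difference in the metric value between the control and treatment groups (the null hypothesis) and reject it only if the data provide strong evidence against it. A result that is not statistically significant therefore does not prove there is no treatment effect; the experiment may simply be underpowered to detect it.
  • If an experiment only impacts a small subset of the population, even a large effect on that subset can be diluted and become undetectable in the overall metric.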
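The dilution effect can be seen with a quick power calculation. In this sketch all numbers are hypothetical (10% baseline conversion, 5% of users affected, a +2 point lift for those users, 100,000 users per group), the baseline is assumed to be the same for affected and unaffected users, and power uses the normal approximation for a two-sided two-proportion z-test.

```python
from statistics import NormalDist


def power_two_proportions(p_control: float, p_treatment: float,
                          n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided two-proportion z-test."""
    se = ((p_control * (1 - p_control) + p_treatment * (1 - p_treatment))
          / n_per_group) ** 0.5
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Ignores the negligible chance of rejecting in the wrong direction.
    return NormalDist().cdf(abs(p_treatment - p_control) / se - z_crit)


baseline = 0.10           # hypothetical conversion rate
affected_share = 0.05     # hypothetical: 5% of users see the change
lift_on_affected = 0.02   # +2 points absolute among affected users
n = 100_000               # users per group

# Analyzing only the affected users: fewer users, but the full effect.
power_affected = power_two_proportions(
    baseline, baseline + lift_on_affected, int(n * affected_share))
# Analyzing everyone: the effect is diluted to affected_share * lift.
power_overall = power_two_proportions(
    baseline, baseline + affected_share * lift_on_affected, n)
print(f"{power_affected:.2f} {power_overall:.2f}")
```

With these numbers the comparison restricted to affected users has power of about 0.89, while the diluted overall comparison has about 0.11, which is the motivation for restricting the population in step 4 of the workflow.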

Misinterpretation of P-values

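  • The p-value does NOT represent the probability that the average metric value in control differs from the average metric value in treatment. It is the probability of observing a difference at least as extreme as the one measured, assuming the null hypothesis (no difference) is true.
  • In other words, a p-value on its own does not tell you that the two groups differ; it tells you how surprising the data would be if they did not.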
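One way to see what a p-value is (and is not) is to simulate A/A tests, where both groups share the same true conversion rate, so the null hypothesis is true by construction. The sketch below assumes a pooled two-proportion z-test; the function names, the 10% rate, and the group sizes are hypothetical.

```python
import random
from statistics import NormalDist


def two_proportion_p_value(conv_c: int, n_c: int, conv_t: int, n_t: int) -> float:
    """Two-sided p-value of a pooled two-proportion z-test: the probability of
    a difference at least this extreme IF there is no true difference."""
    p_pool = (conv_c + conv_t) / (n_c + n_t)
    se = (p_pool * (1 - p_pool) * (1 / n_c + 1 / n_t)) ** 0.5
    z = (conv_t / n_t - conv_c / n_c) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_aa_tests(n_experiments: int = 1000, n_users: int = 500,
                      rate: float = 0.1, seed: int = 7) -> list[float]:
    """A/A tests: both groups have the same true rate, so the null holds."""
    rng = random.Random(seed)
    p_values = []
    for _ in range(n_experiments):
        conv_c = sum(rng.random() < rate for _ in range(n_users))
        conv_t = sum(rng.random() < rate for _ in range(n_users))
        p_values.append(two_proportion_p_value(conv_c, n_users, conv_t, n_users))
    return p_values


p_values = simulate_aa_tests()
share = sum(p < 0.05 for p in p_values) / len(p_values)
print(f"{share:.3f} of A/A tests have p < 0.05")
```

Even though none of these tests has a real difference, roughly 5% of them still produce p < 0.05, and the p-values spread out roughly uniformly between 0 and 1. A small p-value says the data would be unusual under the null, not that a difference exists with some stated probability.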

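Peeking at P-values

Checking p-values repeatedly while the experiment runs and stopping as soon as the result looks significant inflates the false positive rate. Two ways to avoid this:

  1. Use sequential tests with always-valid p-values.
  2. Use a predetermined experiment duration, such as one week, and determine statistical significance only at the end.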
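The cost of peeking can be checked with an A/A simulation: no true difference exists, so every significant result is a false positive. This sketch assumes a metric with known unit variance, a z-test, 14 daily looks, and 1,000 users per group per day; all of these are illustrative choices, and `peeking_false_positive_rates` is a hypothetical helper.

```python
import random
from statistics import NormalDist


def peeking_false_positive_rates(n_experiments: int = 2000, days: int = 14,
                                 users_per_day: int = 1000, alpha: float = 0.05,
                                 seed: int = 42) -> tuple[float, float]:
    """Return (false-positive rate when stopping at the first significant
    daily look, false-positive rate when testing once at the end)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    peek_hits = final_hits = 0
    for _ in range(n_experiments):
        sum_c = sum_t = 0.0
        n = 0
        significant_at_some_look = False
        for _ in range(days):
            # The sum of users_per_day N(0, 1) draws is N(0, users_per_day).
            sum_c += rng.gauss(0.0, users_per_day ** 0.5)
            sum_t += rng.gauss(0.0, users_per_day ** 0.5)
            n += users_per_day
            # z = (mean_t - mean_c) / sqrt(2 / n) with known variance 1
            z = (sum_t - sum_c) / (2 * n) ** 0.5
            if abs(z) > z_crit:
                significant_at_some_look = True
        peek_hits += significant_at_some_look
        final_hits += abs(z) > z_crit  # only the last look counts
    return peek_hits / n_experiments, final_hits / n_experiments


peek_rate, final_rate = peeking_false_positive_rates()
print(f"peeking daily: {peek_rate:.3f}, single test at the end: {final_rate:.3f}")
```

Testing once at the end keeps the false positive rate near the nominal 5%, while stopping at the first significant daily look produces a rate several times higher.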

Multiple Hypothesis Tests