Benefits of A/B testing


How To Use A/B Testing

by Claire Lee

What is A/B Testing

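● A test that compares variations (for example, samples A and B) against a hypothesis to obtain statistically measured results for improving marketing and services.

● Widely used in web-based services.

● For example, if customers are sent one of two emails with a promotion code, you can measure which group of users actually visits the web page and has the higher purchase rate.

- 1,000 customers are told the promotion code ends "this Saturday": "Offer ends this Saturday! Use code A1" (result: 5% response rate)

- Another 1,000 customers are told the promotion ends "soon": "Offer ends soon! Use code B1" (result: 3% response rate)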

Segmentation & targeting

                    Overall            Men              Women
Total sends         2,000              1,000            1,000
Total responses     80                 35               45
Treatment A         50 / 1,000 (5%)    10 / 500 (2%)    40 / 500 (8%)
Treatment B         30 / 1,000 (3%)    25 / 500 (5%)    5 / 500 (1%)

https://en.wikipedia.org/wiki/A/B_testing#Segmentation_and_targeting
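The segmentation table can be checked with a few lines of Python. This sketch only recomputes the response rates from the table's counts, overall and per segment; it shows that Treatment A wins overall and among women, while Treatment B wins among men, which is why looking at segments can change the decision.

```python
# Counts from the segmentation table: (responses, sends) per treatment and segment
results = {
    "A": {"Men": (10, 500), "Women": (40, 500)},
    "B": {"Men": (25, 500), "Women": (5, 500)},
}


def rate(responses, sends):
    """Response rate as a fraction of sends."""
    return responses / sends


def overall(segments):
    """Pool every segment of one treatment into a single (responses, sends) pair."""
    responses = sum(r for r, _ in segments.values())
    sends = sum(s for _, s in segments.values())
    return responses, sends


def best_treatment(segment=None):
    """Treatment with the highest response rate, overall or within one segment."""
    def score(treatment):
        counts = overall(results[treatment]) if segment is None else results[treatment][segment]
        return rate(*counts)
    return max(results, key=score)


for treatment, segments in results.items():
    per_segment = {name: f"{rate(*c):.0%}" for name, c in segments.items()}
    print(treatment, f"overall {rate(*overall(segments)):.0%}", per_segment)

print("overall winner:", best_treatment())
for segment in ("Men", "Women"):
    print(segment, "winner:", best_treatment(segment))
```

If the email could be targeted by segment, these numbers suggest sending version B to men and version A to women rather than picking a single overall winner.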



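Basic Terms of A/B Testing

● Variations: one control variation (A) and any number of treatments (B, C, ... N).

● Null hypothesis (H0): the hypothesis, set up with the expectation of rejecting it, that there is no difference or no meaningful difference. Statistical evidence is then used to decide whether it can be rejected. For example, when a detective investigates a suspect in a criminal case, the detective's alternative hypothesis is that the suspect committed the crime, and the null hypothesis is that the suspect is innocent. Null hypotheses are used whenever hypotheses are tested with statistical methods; Ronald Fisher introduced the term in The Design of Experiments (1935). In an A/B test, H0 states that the variations perform equally; when p-value <= 0.05, H0 is rejected and the result is treated as statistically significant. https://ko.wikipedia.org/wiki/%EA%B7%80%EB%AC%B4%EA%B0%80%EC%84%A4

● p-value (significance probability): assuming H0 is true, the probability of observing a difference at least as large as the one measured. Because it reflects random sampling error, it varies from sample to sample. If p-value <= 0.05, H0 is rejected; if p-value > 0.05, H0 is not rejected, which does not prove that H0 is true. http://i.investopedia.com/u53524/normal-distribution-ht1.png

● Effect size and sample size: the effect size is how large the difference between variations is; the smaller the effect you want to detect, the larger the sample size you need.

● False positive and false negative: a false positive is a result that looks like a real effect but is actually an error (H0 is rejected although it is true). A false negative is a failure to detect an effect that really exists.

● Segmentation and targeting: assigning test conditions by user characteristics (segments) can make the test results more reliable.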
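To make the p-value concrete, here is a minimal sketch of a two-sided two-proportion z-test applied to the email example (50 of 1,000 responded to A, 30 of 1,000 to B). The pooled z-test with a normal approximation is one common choice for comparing conversion rates; it is not necessarily the method used by the sources or tools cited in this deck.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided two-proportion z-test using the pooled standard error.

    Returns (z, p_value). Relies on the normal approximation, which is
    reasonable when each group has enough conversions and non-conversions.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # conversion rate if H0 (no difference) holds
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value: P(|Z| >= |z|) for a standard normal Z
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


z, p = two_proportion_z_test(50, 1000, 30, 1000)
print(f"z = {z:.2f}, p-value = {p:.3f}")
```

For these counts the p-value comes out at roughly 0.02, below the 0.05 threshold, so H0 ("both emails perform the same") would be rejected.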

https://ko.wikipedia.org/wiki/%ED%86%B5%EA%B3%84%ED%95%99

https://ko.wikipedia.org/wiki/%EB%8C%80%EB%A6%BD%EA%B0%80%EC%84%A4


https://ko.wikipedia.org/wiki/%EB%A1%9C%EB%84%90%EB%93%9C_%ED%94%BC%EC%85%94




Figure 0: Null hypothesis graph http://m9.i.pbase.com/o9/10/152510/1/152440759.QXJVvzP1.NIPCC_Null_Hypothesis.PNG




A/B Testing Case Study at Airbnb

http://nerds.airbnb.com/wp-content/uploads/2014/05/img2_price.png

Figure 2 - Example of a new feature that we tested and rejected.

Why Experiments?



http://nerds.airbnb.com/wp-content/uploads/2014/05/img1_launch.png

Figure 1 - It’s hard to tell the effect of this product launch.




The case of Airbnb

http://nerds.airbnb.com/wp-content/uploads/2014/05/img3_flow.png

Figure 3 - Example of an experiment result broken down by booking flow steps.


http://nerds.airbnb.com/wp-content/uploads/2014/05/img4_max_price.png

Figure 4 - Example experiment testing the value of the price filter

How long do you need to run an experiment?



http://nerds.airbnb.com/wp-content/uploads/2014/05/img5_max_price_results.png

Figure 5 - Result of the price filter experiment over time.


Sample size calculator: http://www.evanmiller.org/ab-testing/sample-size.html

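Deciding how long to run an experiment starts with the sample size. Below is a minimal sketch using the standard normal-approximation formula for a two-sided two-proportion test; online calculators such as the one linked above may use a slightly different formula, so their numbers may not match exactly. The baseline (3%), target (5%), and traffic of 500 users per day are hypothetical values chosen for illustration.

```python
import math
from statistics import NormalDist


def sample_size_per_variation(p_control, p_treatment, alpha=0.05, power=0.8):
    """Approximate users needed per variation for a two-sided two-proportion test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for power = 0.8
    p_bar = (p_control + p_treatment) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p_control * (1 - p_control) + p_treatment * (1 - p_treatment))
    ) ** 2
    return math.ceil(numerator / (p_control - p_treatment) ** 2)


def days_needed(n_per_variation, variations, daily_users):
    """Days to reach the sample size, assuming traffic is split evenly across variations."""
    return math.ceil(n_per_variation * variations / daily_users)


n = sample_size_per_variation(0.03, 0.05)  # hypothetical: detect a lift from 3% to 5%
print(n, "users per variation")
print(days_needed(n, variations=2, daily_users=500), "days at a hypothetical 500 users/day")
```

Halving the detectable difference (3% to 4% instead of 3% to 5%) roughly triples to quadruples the required sample, which is the effect-size/sample-size trade-off described earlier.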




http://nerds.airbnb.com/wp-content/uploads/2014/05/img6_dynamic_p.png

Figure 6 - An example of a dynamic p-value curve.

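Figures 5 and 6 show that a p-value moves around while an experiment runs, which is why stopping the first time it dips below 0.05 is risky. The sketch below does not reproduce Airbnb's dynamic p-value curve; it simulates A/A experiments (no real difference) and compares "stop at the first p < 0.05" with a single test at the planned end. All parameters (500 experiments, 20 checks, 100 users per variation per check, a 10% conversion rate) are hypothetical.

```python
import math
import random


def p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided pooled two-proportion z-test p-value."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    if pooled in (0.0, 1.0):
        return 1.0  # all or no users converted: no evidence of a difference
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (conv_a / n_a - conv_b / n_b) / se
    return math.erfc(abs(z) / math.sqrt(2))


def simulate(experiments=500, looks=20, users_per_look=100, rate=0.1, alpha=0.05, seed=1):
    """Return (false positive rate with peeking, false positive rate with one final test)."""
    rng = random.Random(seed)
    peeking_hits = final_hits = 0
    for _ in range(experiments):
        conv_a = conv_b = n = 0
        stopped = False
        for _ in range(looks):
            # Both variations share the same true conversion rate (an A/A setup)
            conv_a += sum(rng.random() < rate for _ in range(users_per_look))
            conv_b += sum(rng.random() < rate for _ in range(users_per_look))
            n += users_per_look
            if not stopped and p_value(conv_a, n, conv_b, n) < alpha:
                stopped = True  # a "peeking" experimenter would stop and declare a winner here
        peeking_hits += stopped
        final_hits += p_value(conv_a, n, conv_b, n) < alpha
    return peeking_hits / experiments, final_hits / experiments


peeking, fixed = simulate()
print(f"stop at first p < 0.05: {peeking:.1%} false positives")
print(f"single test at the end: {fixed:.1%} false positives")
```

The single final test should flag roughly 5% of these no-difference experiments, as intended, while checking after every batch flags several times as many. Airbnb's response, per the conclusion below, is to demand a stricter threshold early in an experiment.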



Understanding results in context

http://nerds.airbnb.com/wp-content/uploads/2014/05/img7_magellan.png

Figure 7 – Before and after a full redesign of the search page.


http://nerds.airbnb.com/wp-content/uploads/2014/05/img8_magellan_results.png

Figure 8 - Results of the new search design





Assuming the system works (A/A or Dummy Testing)

http://nerds.airbnb.com/wp-content/uploads/2014/05/img9_dummy.png

Figure 9 - Results of an example dummy experiment.




http://nerds.airbnb.com/wp-content/uploads/2014/05/img9a_dummy_results.png

Figure 10 - Results of a number of dummy experiments.

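A dummy (A/A) experiment gives both groups the identical experience, so any "significant" result is a false positive. A minimal sketch of that sanity check, assuming a hypothetical setup of 1,000 dummy experiments with 1,000 users each, 50/50 random assignment, and a 10% conversion rate: if the assignment and the statistics work, about 5% of experiments should come out significant at alpha = 0.05, and about half should have p < 0.5.

```python
import math
import random


def p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided pooled two-proportion z-test p-value."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    if pooled in (0.0, 1.0):
        return 1.0  # all or no users converted: no evidence of a difference
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (conv_a / n_a - conv_b / n_b) / se
    return math.erfc(abs(z) / math.sqrt(2))


def run_dummy_experiment(rng, users=1000, rate=0.1):
    """One A/A experiment: both groups see the identical experience."""
    conversions = {"A": 0, "B": 0}
    sizes = {"A": 0, "B": 0}
    for _ in range(users):
        group = "A" if rng.random() < 0.5 else "B"  # 50/50 random assignment
        sizes[group] += 1
        conversions[group] += rng.random() < rate   # same true rate in both groups
    return p_value(conversions["A"], sizes["A"], conversions["B"], sizes["B"])


def dummy_report(experiments=1000, alpha=0.05, seed=7):
    """Fraction of A/A experiments flagged as significant, plus all p-values."""
    rng = random.Random(seed)
    p_values = [run_dummy_experiment(rng) for _ in range(experiments)]
    return sum(p < alpha for p in p_values) / experiments, p_values


flagged, p_values = dummy_report()
print(f"{flagged:.1%} of dummy experiments were 'significant' at alpha = 0.05")
print(f"{sum(p < 0.5 for p in p_values) / len(p_values):.1%} had p < 0.5")
```

In a real system, a flagged fraction far above 5%, or p-values that are not spread roughly evenly between 0 and 1, would point to a bug in assignment, logging, or the statistics.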




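Conclusion

1. Before running a test, calculate the sample size so you can decide how long the test must run to produce meaningful results.

2. If your testing system produces results sooner than expected, judge for yourself whether they have converged. In this scenario it is usually best to be conservative.

3. If you need a gradual launch, or want to avoid premature decisions, approach the results carefully by using dynamic p-value thresholds.

4. Always be scientific, and when something looks off, investigate why. The simplest way to do this is to set up a dummy test.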

Third-Party Tools for A/B Testing

● Amazon A/B Testing: https://developer.amazon.com/public/apis/manage/ab-testing
● Optimizely: https://www.optimizely.com
● Taplytics: https://www.taplytics.com
● Apptimize: https://www.apptimize.com





Terms of A/B Testing in Amazon Insights

https://images-na.ssl-images-amazon.com/images/G/01/mobile-apps/devportal2/content/sdk/images/abtesting/abt-report-1._V364877828_.png

● Experiments: the type or state of an experiment (e.g. Draft Experiment, Active Experiment and Archive Experiment)

● # of Views: the total number of users for whom a View Event was recorded

● # of Conversions: the number of times users completed each Goal Event

● Conversion Rate: the conversion events recorded on the server for each variation divided by that variation's number of views. The +/- value shows the margin of error.

● Change: the increase or decrease relative to Variation A (Control)

● Confidence: how confident you can be that the difference between each variation and Variation A (Control) is real; a result of 95% or higher is considered reliable (related to the p-value).

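The report columns above can be approximated with textbook formulas. This is a hypothetical sketch, not Amazon's documented math (see "The math behind Amazon A/B Testing" in the references for that): it assumes conversion rate = conversions / views, a normal-approximation 95% margin of error, change relative to the control's rate, and confidence = 1 minus a two-sided pooled z-test p-value. The counts reuse the email example, treating the 30/1,000 group as the control purely for illustration.

```python
import math


def variation_report(views, conversions, control_views, control_conversions, z=1.96):
    """Report-style metrics for one variation against the control (A).

    Assumed formulas, not a specific vendor's implementation.
    """
    rate = conversions / views
    control_rate = control_conversions / control_views
    margin = z * math.sqrt(rate * (1 - rate) / views)  # +/- for a ~95% interval
    change = (rate - control_rate) / control_rate       # relative lift over control
    pooled = (conversions + control_conversions) / (views + control_views)
    se = math.sqrt(pooled * (1 - pooled) * (1 / views + 1 / control_views))
    p_value = math.erfc(abs(rate - control_rate) / se / math.sqrt(2))
    return {"rate": rate, "margin": margin, "change": change, "confidence": 1 - p_value}


report = variation_report(views=1000, conversions=50, control_views=1000, control_conversions=30)
print(
    f"{report['rate']:.1%} +/- {report['margin']:.1%}, "
    f"change {report['change']:+.0%}, confidence {report['confidence']:.1%}"
)
```

With these counts the confidence lands just under 98%, which would clear the 95% bar described above.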





Taplytics

Apptimize Live Demo

http://apptimize.com/wp-content/uploads/2014/11/results_line_chart_nov14.png

http://apptimize.com/wp-content/uploads/2014/10/Screen-Shot-2014-10-14-at-4.26.03-PM.png

https://apptimize.com/demo?hm














Optimizely Demo

https://help.optimizely.com/hc/en-us/article_attachments/202666277/buttons_selected.png

https://app.optimizely.com/projects/3732872310/experiments




Comparison of Third-Party Tool Features (O = supported, X = not supported)

Feature | Amazon A/B Testing | Taplytics | Apptimize | Optimizely
Supported APIs and SDKs | iOS, Amazon Apps and Android SDK | iOS, Android and JavaScript SDK | iOS, Android SDK | REST API, JS API, iOS SDK, Android SDK
Multiple variations | 5 | Unlimited | Unlimited | Unlimited
Dynamic variables | O | O | O | O
Visual editor | X | O | O | O
Segmentation | O | O | O | O
Code block | X | O | O | O
Visual graph report | X | O | O | O
Importing 3rd-party APIs | Amazon Insights SDK (App Testing Service, Promotions, Security Profiles, Login with Amazon, Analytics, etc.) | Google Analytics, Flurry, Mixpanel, Intercom, Adobe, Localytics, Parse, Apsalar | Google Analytics, Mixpanel, Omniture and Flurry | Adobe Analytics, Google Analytics, Mixpanel, etc.

Pricing

Tool | MAU (across all apps) | Additional MAU | Promotion | Price
Amazon A/B Testing | All | X | X | Free
Taplytics | 800,000 | +200,000 ($350) | X | $1,633 (per month)
Apptimize | 3,000,000 | +100,000 ($1,250) | +2 months free (Q4) | $45,000 (per year)
 | 300,000 | +100,000 ($2,500) | X | $15,000 (per year)
Optimizely | 800,000 | ? | $2,000 discount (per month) | ?

References & further reading

* Optimizely blog, including case studies: https://blog.optimizely.com/
* How Amazon A/B Testing works: https://developer.amazon.com/public/apis/manage/ab-testing/doc/how-ab-testing-works
* The math behind Amazon A/B Testing: https://developer.amazon.com/public/apis/manage/ab-testing/doc/math-behind-ab-testing
* Amazon A/B Testing case study and blog posts:
  - https://youtu.be/D_vK9n5QuPg
  - https://developer.amazon.com/public/community/post/Tx3HA7JW52PWD1T/-span-class-matches-Amazon-span-8217-span-class-matches-s-span-span-class-matche
  - https://developer.amazon.com/public/community/post/Tx3BKTQFUERH5GP/-span-class-matches-Webinar-span-span-class-matches-Replay-span-span-class-match
  - https://developer.amazon.com/public/search?query=A%2FB+Testing&image.x=17&image.y=19
* p-value: http://m.blog.naver.com/diegur/90078422801
* Null hypothesis: https://en.wikipedia.org/wiki/Null_hypothesis
* What is A/B testing:
  - https://en.wikipedia.org/wiki/A/B_testing
  - https://en.wikipedia.org/wiki/False_positives_and_false_negatives
* Airbnb case study in A/B testing:
  - http://nerds.airbnb.com/experiments-at-airbnb
  - http://nerds.airbnb.com/redesigning-search/
  - http://www.evanmiller.org/how-not-to-run-an-ab-test.html
  - https://www.youtube.com/watch?v=lVTIcf6IhY4 and http://nerds.airbnb.com/experiment-reporting-framework/
* How to design, plan, implement, and analyze online experiments: http://eytan.github.io/icwsm14_tutorial/
* Some surprising A/B test results people have seen: http://www.wordstream.com/blog/ws/2012/09/25/a-b-testing#Levy
* Rules of thumb for running A/B tests: http://www.exp-platform.com/Documents/2014%20experimentersRulesOfThumb.pdf
* A glossary of terms for Apptimize: http://apptimize.com/docs/reference/glossary/

