<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<!-- BEGIN Info -->
<meta name="description"
content="Waypost - An open-source feature flag management system that specializes in A/B Testing" />
<meta name="title" property="og:title" content="Waypost" />
<meta property="og:type" content="website" />
<meta name="image" property="og:image" content="images/thumb.jpg" />
<meta name="description" property="og:description"
content="Waypost - An open-source feature flag management system that specializes in A/B Testing" />
<meta name="author" content="Waypost" />
<!-- END Info -->
<!-- BEGIN favicon -->
<link rel="apple-touch-icon" sizes="180x180" href="images/favicon/apple-touch-icon.png" />
<link rel="icon" type="image/png" sizes="32x32" href="images/favicon/favicon-32x32.png" />
<link rel="icon" type="image/png" sizes="16x16" href="images/favicon/favicon-16x16.png" />
<link rel="manifest" href="images/favicon/site.webmanifest" />
<link rel="shortcut icon" href="images/favicon/favicon.ico" />
<meta name="msapplication-TileColor" content="#ffffff" />
<meta name="msapplication-config" content="images/favicon/browserconfig.xml" />
<meta name="theme-color" content="#ffffff" />
<!-- END favicon -->
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Waypost</title>
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.15.3/css/all.min.css" />
<link rel="stylesheet" href="stylesheets/reset.css" />
<link rel="stylesheet" href="stylesheets/style.css" />
<link rel="stylesheet" href="stylesheets/responsive.css" />
</head>
<body>
<header class="mobile-menu-closed">
<div id="header">
<a href="/">
<img src="images/logo/Waypost_graphic_color.svg" alt="Waypost logo" />
<h1>Waypost</h1>
</a>
<nav>
<a href="#start-here" class="selected">Start Here</a>
<a href="#case-study">Case Study</a>
<a href="#presentation">Presentation</a>
<a href="#our-team">Our Team</a>
<a href="/documentation">Docs</a>
<a href="https://github.com/waypost-io" target="_blank" class="icon"><i class="fab fa-github"></i></a>
</nav>
<div id="menu">
<button type="button">
<svg id="mobile-open" xmlns="http://www.w3.org/2000/svg" fill="none" viewBox="0 0 24 24" stroke="currentColor"
aria-hidden="true">
<path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M4 6h16M4 12h16M4 18h16" />
</svg>
<svg id="mobile-close" xmlns="http://www.w3.org/2000/svg" fill="none" viewBox="0 0 24 24"
stroke="currentColor" aria-hidden="true">
<path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M6 18L18 6M6 6l12 12" />
</svg>
</button>
</div>
</div>
<div id="header-buffer"></div>
<div id="mobile-menu">
<a href="#start-here" class="selected">Start Here</a>
<a href="#case-study">Case Study</a>
<a href="#presentation">Presentation</a>
<a href="#our-team">Our Team</a>
<a href="/documentation">Docs</a>
<a href="https://github.com/waypost-io" target="_blank"><i class="fab fa-github"></i> GitHub</a>
</div>
</header>
<div id="start-here" class="main-section">
<div class="h-full">
<div class="bg-offwhite static-logo-color"></div>
<div class="bg-black">
<h4 id="hero-text">
<span class="text-skyblue">Waypost</span> is an open-source <span class="text-skyblue">feature flagging</span>
platform specializing in <span class="text-skyblue">A/B testing</span>
</h4>
</div>
</div>
<div class="h-full">
<div class="bg-violet static-logo-dark">
<h2>Easily Toggle and Test New Features</h2>
</div>
<div class="bg-violet">
<h2 class="sm-header">Easily Toggle and Test New Features</h2>
<p>
Manage feature flags and experiments with a couple of clicks.
</p>
<p>Waypost analyzes the data and provides experiment results.</p>
<img src="./images/diagrams/5.2_flags_dashboard.gif" alt="Flags Dashboard" />
</div>
</div>
<div class="h-full">
<div class="bg-turquoise static-logo-dark">
<h2>Customizable and Flexible</h2>
</div>
<div class="bg-turquoise">
<h2 class="sm-header">Customizable and Flexible</h2>
<p>Waypost comes Dockerized for simple deployment.</p>
<p>Completely open-source and self-hosted, so it can be customized according to your needs.</p>
<p>Provides real-time updates and is easily scalable.</p>
<img src="./images/diagrams/architecture_with_bg.png" alt="Architecture" />
</div>
</div>
</div>
<aside id="toc">
<ul>
<!-- Section 1 -->
<li data-section="section-1" class="selected">
<a href="#section-1">
<div>
<div class="bullet">
<div></div>
</div>
<p>Introduction</p>
</div>
</a>
</li>
<li data-section="section-1" class="subitem">
<a href="#section-1-1">
<div>
<div class="bullet">
<div></div>
</div>
<p>Hypothetical Waypost User</p>
</div>
</a>
</li>
<li data-section="section-1" class="subitem">
<a href="#section-1-2">
<div>
<div class="bullet">
<div></div>
</div>
<p>Comparing Two Versions</p>
</div>
</a>
</li>
<!-- Section 2 -->
<li data-section="section-2">
<a href="#section-2">
<div>
<div class="bullet">
<div></div>
</div>
<p>A/B Testing</p>
</div>
</a>
</li>
<li data-section="section-2" class="subitem">
<a href="#section-2-1">
<div>
<div class="bullet">
<div></div>
</div>
<p>What is A/B Testing</p>
</div>
</a>
</li>
<li data-section="section-2" class="subitem">
<a href="#section-2-2">
<div>
<div class="bullet">
<div></div>
</div>
<p>Interpreting Results</p>
</div>
</a>
</li>
<li data-section="section-2" class="subitem">
<a href="#section-2-3">
<div>
<div class="bullet">
<div></div>
</div>
<p>Challenges of A/B Testing</p>
</div>
</a>
</li>
<!-- Section 3 -->
<li data-section="section-3">
<a href="#section-3">
<div>
<div class="bullet">
<div></div>
</div>
<p>Implementing Two Versions of a Site</p>
</div>
</a>
</li>
<li data-section="section-3" class="subitem">
<a href="#section-3-1">
<div>
<div class="bullet">
<div></div>
</div>
<p>Multiple Deployments</p>
</div>
</a>
</li>
<li data-section="section-3" class="subitem">
<a href="#section-3-2">
<div>
<div class="bullet">
<div></div>
</div>
<p>Feature Flags</p>
</div>
</a>
</li>
<!-- Section 4 -->
<li data-section="section-4">
<a href="#section-4">
<div>
<div class="bullet">
<div></div>
</div>
<p>Existing Solutions</p>
</div>
</a>
</li>
<li data-section="section-4" class="subitem">
<a href="#section-4-1">
<div>
<div class="bullet">
<div></div>
</div>
<p>DIY</p>
</div>
</a>
</li>
<li data-section="section-4" class="subitem">
<a href="#section-4-2">
<div>
<div class="bullet">
<div></div>
</div>
<p>Paid Solutions</p>
</div>
</a>
</li>
<!-- Section 5 -->
<li data-section="section-5">
<a href="#section-5">
<div>
<div class="bullet">
<div></div>
</div>
<p>Waypost</p>
</div>
</a>
</li>
<li data-section="section-5" class="subitem">
<a href="#section-5-1">
<div>
<div class="bullet">
<div></div>
</div>
<p>Architecture Overview</p>
</div>
</a>
</li>
<li data-section="section-5" class="subitem">
<a href="#section-5-2">
<div>
<div class="bullet">
<div></div>
</div>
<p>Manager App</p>
</div>
</a>
</li>
<li data-section="section-5" class="subitem">
<a href="#section-5-3">
<div>
<div class="bullet">
<div></div>
</div>
<p>Flag Provider Service</p>
</div>
</a>
</li>
<li data-section="section-5" class="subitem">
<a href="#section-5-4">
<div>
<div class="bullet">
<div></div>
</div>
<p>SDK</p>
</div>
</a>
</li>
<li data-section="section-5" class="subitem">
<a href="#section-5-5">
<div>
<div class="bullet">
<div></div>
</div>
<p>How Flag Data is Sent From the Manager to SDKs</p>
</div>
</a>
</li>
<li data-section="section-5" class="subitem">
<a href="#section-5-6">
<div>
<div class="bullet">
<div></div>
</div>
<p>A/B Testing with Waypost</p>
</div>
</a>
</li>
<!-- Section 6 -->
<li data-section="section-6">
<a href="#section-6">
<div>
<div class="bullet">
<div></div>
</div>
<p>Engineering Decisions</p>
</div>
</a>
</li>
<li data-section="section-6" class="subitem">
<a href="#section-6-1">
<div>
<div class="bullet">
<div></div>
</div>
<p>Hosted vs. Self-Hosted</p>
</div>
</a>
</li>
<li data-section="section-6" class="subitem">
<a href="#section-6-2">
<div>
<div class="bullet">
<div></div>
</div>
<p>Collecting User Event Data</p>
</div>
</a>
</li>
<li data-section="section-6" class="subitem">
<a href="#section-6-2">
<div>
<div class="bullet">
<div></div>
</div>
<p>Communication Between Manager App and Flag Provider</p>
</div>
</a>
</li>
<li data-section="section-6" class="subitem">
<a href="#section-6-3">
<div>
<div class="bullet">
<div></div>
</div>
<p>Providing Feature Flag Data to Client</p>
</div>
</a>
</li>
<li data-section="section-6" class="subitem">
<a href="#section-6-4">
<div>
<div class="bullet">
<div></div>
</div>
<p>Providing Feature Flag Data to the SDK</p>
</div>
</a>
</li>
<li data-section="section-6" class="subitem">
<a href="#section-6-5">
<div>
<div class="bullet">
<div></div>
</div>
<p>Statistics Pipeline</p>
</div>
</a>
</li>
<!-- Section 7 -->
<li data-section="section-7">
<a href="#section-7">
<div>
<div class="bullet">
<div></div>
</div>
<p>Future Work</p>
</div>
</a>
</li>
<li data-section="section-7" class="subitem">
<a href="#section-7-1">
<div>
<div class="bullet">
<div></div>
</div>
<p>Extended Database Integration</p>
</div>
</a>
</li>
<li data-section="section-7" class="subitem">
<a href="#section-7-2">
<div>
<div class="bullet">
<div></div>
</div>
<p>Separate Feature Flags by Application</p>
</div>
</a>
</li>
<li data-section="section-7" class="subitem">
<a href="#section-7-3">
<div>
<div class="bullet">
<div></div>
</div>
<p>Login Capability</p>
</div>
</a>
</li>
<li data-section="section-7" class="subitem">
<a href="#section-7-4">
<div>
<div class="bullet">
<div></div>
</div>
<p>Additional Language Support for SDKs</p>
</div>
</a>
</li>
<!-- Section 8 -->
<li data-section="section-8">
<a href="#section-8">
<div>
<div class="bullet">
<div></div>
</div>
<p>Glossary</p>
</div>
</a>
</li>
<!-- Section 9 -->
<li data-section="section-9">
<a href="#section-9">
<div>
<div class="bullet">
<div></div>
</div>
<p>References</p>
</div>
</a>
</li>
</ul>
</aside>
<div id="case-study" class="main-section">
<div id="case-study-content">
<div class="prose prose-xl">
<h1>Case Study</h1>
<!-- Case study goes here -->
<!-- Section 1 -->
<h2 id="section-1">1. Introduction</h2>
<p>
<strong>Waypost is an open-source, lightweight, self-hosted feature flag management platform that specializes in A/B Testing.</strong>
</p>
<p>
Waypost provides feature flag management infrastructure that integrates with an organization’s application and existing user event logging system. Waypost allows developers to control functionality remotely without deploying code and to run A/B tests to determine how a new feature impacts user behavior and business goals.
</p>
<p>
This case study explores the engineering problems Waypost addresses, how it works, and some of the key engineering decisions we made while building it. To better understand how an organization would benefit from integrating with Waypost, we introduce a hypothetical company, Solar Flair.
</p>
<h3 id="section-1-1">1.1 Hypothetical Waypost User</h3>
<p>Solar Flair is a solar panel installation company with a handful of software engineers on their staff. Their
main way of gaining business is via their website’s landing page. As is standard in the solar panel
installation business, their primary sales funnel starts with clicking a button called “Get a Quote Today”
which opens up a form.</p>
<p>In Solar Flair’s first iteration of their website, the form requires the customer to input the architectural
data and their address in order to calculate how much sunlight their location typically gets. The developer
team at Solar Flair sets up event tracking to try to understand how people interact with their website, and
they discover that 80% of people who click the “Get a Quote Today” button never finish the form, so their
quote never gets calculated. They get lost in step one of the funnel.</p>
<p>The developers get together to discuss why this is happening and theorize that entering all the
architectural data is tedious and cumbersome. They believe they can increase the percentage of people signing
up for an appointment by using third-party software that only requires the home’s address as input and
then calculates the quote using satellite imagery and climate statistics. However, this software charges a
subscription fee, so the team wants to be sure the fee is worth the cost and brings in enough new
customers to turn a profit. In addition, since it’s third-party software that may get updated, they don’t
want their entire site to go down if a major update breaks the quote calculation.</p>
<img src="./images/diagrams/1.1_feature_comparison.png" alt="feature comparison" width="500px">
<p>Solar Flair needs a way to <strong>measure and analyze user behavior</strong> when they use the new quote feature vs. the
old one. In other words:</p>
<ol>
<li>
They need to serve users either the new quote feature or the old one.
</li>
<li>
They also need to determine whether users preferred the new quote feature, the old one, or whether the change made no difference in their behavior.
</li>
</ol>
<p>If something were to go wrong with the new quote feature, Solar Flair would like all users to be served the
old one as soon as possible to minimize any business losses from the bug while it gets fixed.</p>
<h3 id="section-1-2">1.2 Comparing Two Versions</h3>
<p>One option is to roll out the new feature temporarily and measure how the metrics change. But there are
several problems with this:</p>
<ol>
<li>
The new feature hasn’t been tested in production and could have bugs. Therefore, Solar Flair wouldn’t want it rolled out to everyone at once.
</li>
<li>
This strategy doesn’t control for external factors, like time. For example, users may be more likely to sign up at different times of the year. Without controlling for external factors, the change in metrics might be attributed to the new feature when it was actually caused by something else.
</li>
</ol>
<p>
Therefore, Solar Flair needs to be sure the only difference between the two user experiences is the different
feature being served to them. To do this, they’ll have to <strong>show both features during the same time period and
randomize who gets which feature</strong>.
</p>
<p>
The solution Solar Flair is looking for is experimentation, also known as <strong>A/B testing</strong>.
</p>
<h2 id="section-2">2. A/B Testing</h2>
<h3 id="section-2-1">2.1 What is A/B Testing?</h3>
<p>
A/B testing is the practice of testing two or more versions of a feature at once, in which a random group of
users is assigned to receive one version, and another random group of users is assigned the other version, for
a certain period of time. The A/B test can be considered the most basic kind of <strong>randomized controlled
experiment</strong>. In its simplest form, there are two treatments and one acts as the control for the other [1].
</p>
<figure>
<img src="./images/diagrams/2.1_A_and_b_features.png" alt="A and B features" width="400px">
<figcaption>In an experiment, users are served either version A or version B of an application.</figcaption>
</figure>
<h3 id="section-2-2">2.2 Interpreting Results</h3>
<p>
After an A/B test has finished running, one can analyze the results to make a data-driven decision. Every
experiment should have a primary <strong>metric</strong>, and possibly secondary metrics, by which it will be evaluated. The
aggregate values for these metrics get collected for the test group and the control group for comparison. One
can use statistics to determine if there was a statistically significant change in the metrics or not.
<strong>Statistical significance</strong> is determined by a statistic called a “p-value”. With this information, the team can
conclude whether the new feature being tested was “successful” or not, and make a decision whether to <strong>rollout
the feature to their entire userbase</strong>, to <strong>remove it</strong>, or to <strong>continue to run more experiments</strong>. For example, if
Solar Flair obtained a statistically significant increase in form completion rate with their new quote
feature, then they could confidently roll out the new feature.
</p>
<figure>
<img src="./images/diagrams/2.2_AB_decision.png" alt="A/B decision" width="1000px">
<figcaption>Interpreting results of an A/B test to decide whether to roll out a new feature.</figcaption>
</figure>
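<p>
As a concrete illustration, the p-value for a change in a conversion-rate metric could be computed with a
two-proportion z-test. The sketch below is a simplified JavaScript example, not Waypost’s actual statistics
pipeline; the error-function approximation is the standard Abramowitz–Stegun formula.
</p>

```javascript
// Sketch: two-proportion z-test for comparing conversion rates.
// Illustrative only; Waypost's real statistics pipeline may differ.

// Error function via the Abramowitz–Stegun polynomial approximation.
function erf(x) {
  const sign = x < 0 ? -1 : 1;
  x = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * x);
  const y = 1 - (((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
      - 0.284496736) * t + 0.254829592) * t) * Math.exp(-x * x);
  return sign * y;
}

// Standard normal cumulative distribution function.
function normalCdf(z) {
  return 0.5 * (1 + erf(z / Math.SQRT2));
}

// Two-sided p-value for the difference between two conversion rates,
// e.g. form completions out of visitors for the control and test groups.
function twoProportionPValue(convA, totalA, convB, totalB) {
  const pA = convA / totalA;
  const pB = convB / totalB;
  const pPool = (convA + convB) / (totalA + totalB);     // pooled proportion
  const se = Math.sqrt(pPool * (1 - pPool) * (1 / totalA + 1 / totalB));
  const z = (pB - pA) / se;                              // z statistic
  return 2 * (1 - normalCdf(Math.abs(z)));
}
```

<p>
For example, 100 form completions out of 1,000 visitors in the control group versus 150 out of 1,000 in the
test group yields a p-value well below 0.05, so Solar Flair could treat that lift as statistically significant.
</p>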
<h3 id="section-2-3">2.3 Challenges of A/B Testing</h3>
<p>
As A/B testing requires a large amount of data to work, implementing A/B testing comes with several
challenges. There are two primary categories that the challenges of A/B testing fall under: engineering
challenges and statistics challenges.
</p>
<h4>2.3.1 Engineering Challenges</h4>
<p>
From an engineering standpoint, developers must contend with the challenges of:
</p>
<ul>
<li>
<strong>How to assign users randomly into groups?</strong> Users must be randomly assigned so that the two groups can be
fairly compared, and so the two groups can be roughly the same size.
</li>
<li>
<strong>How to keep track of what group they were in?</strong> Each user must be given the same treatment on each visit.
</li>
<li>
<strong>How to log user events?</strong> Each time an event that we are interested in occurs, it must be logged to the
database along with relevant information regarding who, what, and where. In addition, one must log when a user
is exposed to an experiment. This will result in a huge volume of writes to the database.
</li>
<li>
<strong>How to pull the metrics to analyze afterward?</strong> Each metric is calculated differently and some may rely on the
same events, requiring the event data to be queryable and accessible. The metric calculation should also be
automated so that one does not have to manually query the metrics each time.
</li>
</ul>
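<p>
As a small illustration of the event-logging challenge, an exposure event might be recorded as a structured
object like the one below. The event shape and field names here are illustrative assumptions, not a prescribed
schema.
</p>

```javascript
// Sketch: building an "exposure" event record when a user is shown
// one variant of an experiment. In a real system this record would be
// written to an event-logging service or database; here we just build
// and return it.
function logExposure(userId, experimentName, variant) {
  return {
    type: "exposure",
    userId,
    experimentName,
    variant, // e.g. "test" or "control"
    timestamp: new Date().toISOString(),
  };
}

const event = logExposure("user-123", "new-quote-form", "test");
```
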
<h4>2.3.2 Statistical Challenges</h4>
<p>From a statistics standpoint, the main challenges are:</p>
<ul>
<li>
<strong>Designing the experiment correctly such that we can gain meaningful results from it.</strong> It may be surprising how
easy it is to design an experiment incorrectly. For example, when running multiple experiments at the same
time, one must ensure that different experiments do not influence each other’s results.
</li>
<li>
<strong>Determining which statistical tests to run on the data.</strong> There are many different types of statistical tests
to choose from, each with different purposes. Therefore, picking which statistical tests to run must be done
thoughtfully and with care.
</li>
</ul>
<h2 id="section-3">3. Implementing Two Versions of a Site</h2>
<p>
Before Solar Flair can place their users into “test” and “control” groups, they must be able to serve the
groups different versions of their website at the same time. There are two common ways to implement this:
</p>
<ol>
<li>Have multiple deployments of the website running and route users to one or the other.</li>
<li>Use feature flags to render their site one way or the other at run time.</li>
</ol>
<h3 id="section-3-1">3.1 Possible Solution: Multiple Deployments</h3>
<p>
This strategy is implemented by having <strong>two versions of an application running in production at once</strong>. The new
version of the application is deployed to a set percentage of the organization’s servers. A <strong>router</strong> (usually a
load balancer) will route users to the two sets of servers. The “<strong>rollout percentage</strong>,” the percentage of
users that get routed to one group vs. the other, can be set to whatever split the organization wants, whether it be 50/50,
10/90, etc.
</p>
<figure>
<img src="./images/diagrams/3.1_multiple_deployments.png" alt="multiple deployments">
<figcaption>Deploy two versions of an application and route users to one version or another.</figcaption>
</figure>
<p>
From here, if the new version proves to be an improvement, then the new version of the application is pushed
to all the servers. If instead it has a negative effect on business metrics, then all servers with the new
version are rolled back.
</p>
<p>
This approach allows an organization to run two versions of their website, collect user data on both
versions, and compare the results. However, there are some trade-offs to this strategy.
</p>
<h4>3.1.1 Trade-off #1: Full redeploy to fix bugs</h4>
<p>
Using multiple deployments for serving two versions of a website requires a <strong>full take-down and redeploy to
fix a bug</strong>. If there’s a critical bug in the new version of the application, the following steps must be
completed:
</p>
<ol>
<li>
The rollout percentage must first be set to 0% so all requests will be handled by the group of machines
with the old version of the application.
</li>
<li>
The machines serving the new version must be rolled back so they can handle traffic without crashing.
</li>
<li>
The load balancer must be updated to allow rolled-back servers to receive requests again.
</li>
<li>
Once the bug is fixed, the new version must be deployed, the rollout percentage must be set and the router
must be configured accordingly.
</li>
</ol>
<h4>3.1.2 Trade-off #2: Complexity when testing multiple features</h4>
<p>
As discussed earlier, serving two versions of a feature with multiple deployments means there are
two versions of the entire application running in production. If another feature is also being tested at that
time, yet another version of the application is needed, even assuming that the two features being
tested do not overlap.
</p>
<p>
For example, if Solar Flair also wants to test out a new contact form that is hosted on a separate page from
their quote form, then those two new features would not overlap. If instead they wanted to test changing the
form’s submit button and wanted to see how each combination of the form and button affected user behavior,
they’d need four total deployments.
</p>
<figure>
<img src="./images/diagrams/3.1.2_multivariate_testing.png" alt="multivariate testing" width="500px">
<figcaption>To test two overlapping features using the “Multiple Deployment” strategy, there must be four total deployments!</figcaption>
</figure>
<p>
Regardless of whether the additional new features being tested overlap, more tests result in:
</p>
<ul>
<li>
More deployed copies of an application.
</li>
<li>
More versions of an application to keep track of, which can lead to human errors, such as implementing a
change on the wrong version of the application.
</li>
</ul>
<p>
A large company with a large infrastructure and some DevOps engineers may not be bothered by additional
deployments of code and keeping track of the different versions of the application. But it would be more
challenging for smaller organizations to implement this solution.
</p>
<br>
<p>
Solar Flair only has a handful of developers and servers. Therefore it would not be wise to implement this
strategy given the trade-offs. Fortunately for them, there’s a way to serve users two different versions of a
site with just one instance of an application, through using <strong>feature flags</strong>.
</p>
<h3 id="section-3-2">3.2 Possible Solution: Feature Flags</h3>
<p>
“A feature flag is a software development process used to enable or disable functionality remotely without
deploying code” [2].
</p>
<p>
<strong>A feature flag is conditional logic connected to a remote service that can alter its flow without a
redeployment.</strong> One can think of a feature flag as a toggle that determines whether a feature is turned “on” or
“off.” If the flag is turned “on”, the conditional returns <code>true</code> and the user receives one version of the
application. If the flag is turned “off”, the conditional returns <code>false</code> and they receive another.
</p>
<figure>
<img src="./images/diagrams/3.2_flag_evaluation.png" alt="Flag evaluation">
<figcaption>An application using feature flags to serve multiple versions of a website.</figcaption>
</figure>
<pre class="code-block"><code> if (evaluateFlag('Quote Form')) {
renderNewQuoteForm()
} else {
renderOldQuoteForm()
}</code></pre>
<figcaption>Example of a feature flag in application code</figcaption>
<p>
One can wrap as much or as little code in the feature flag as they like. It could be something big like a
completely different UI, or something small, like different colors on a “Checkout” button.
</p>
<p>
Feature flags can be <strong>static</strong>, meaning they’re either “on” or “off” for everyone; or <strong>dynamic</strong> meaning that
they’re “on” or “off” for some users based on a criteria, like whether they’re in a “control group” or a “test
group” of an A/B test. In order to make a flag dynamic, a “Toggle Router” is needed [3].
</p>
<h4>3.2.1 Toggle Router</h4>
<p>
A toggle router is simply something that “can be used to dynamically control which codepath is live” [3]. The
<code>evaluateFlag</code> function from the example above is a toggle router. One can implement toggle routers in many
ways, like a simple in-memory store or a standalone app with a user interface (UI).
</p>
<p>
A toggle router can be customized to evaluate a flag given any desired criteria. Using a toggle router, Solar
Flair could set a rollout percentage so that a set percentage of users receive the new quote form while the
rest receive the old one.
</p>
<p>
For example, Jack and Jill are two potential customers, both looking for solar panel installations for their
respective homes. They each visit <span id="fake-website">solarflair.net</span> in their browsers. A toggle router inputs their unique
identifiers (like their IP address) into a hashing algorithm and determines that Jack will receive the old
version of the quote form and Jill will receive the new version. All this happens using just one version of
Solar Flair’s website.
</p>
<figure>
<img src="./images/diagrams/3.2.1_jack_and_jill.png" alt="jack and jill comparison">
<figcaption>
Using an ID and hashing algorithm to evaluate flags allows Solar Flair to consistently serve a user the same version of their application on subsequent visits.
</figcaption>
</figure>
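<p>
To make the hashing idea concrete, here is a minimal sketch of a toggle router in JavaScript. The hash function
(FNV-1a) and the flag shape used here are illustrative assumptions, not Waypost’s actual implementation.
</p>

```javascript
// Sketch of a toggle router that buckets users deterministically.
// Hashing the user ID together with the flag name maps each user to a
// stable bucket in [0, 100), so the same user always gets the same
// treatment for a given flag.

// Simple FNV-1a string hash, reduced to a bucket in [0, 100).
function bucketFor(userId, flagName) {
  let hash = 0x811c9dc5;
  for (const ch of userId + flagName) {
    hash ^= ch.charCodeAt(0);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash % 100;
}

// A flag is "on" for a user when their bucket falls under the flag's
// rollout percentage.
function evaluateFlag(flag, userId) {
  if (!flag.active) return false;
  return bucketFor(userId, flag.name) < flag.rolloutPercentage;
}

const quoteFormFlag = { name: "Quote Form", active: true, rolloutPercentage: 50 };
evaluateFlag(quoteFormFlag, "jack-unique-id"); // same answer on every visit
```

<p>
Because the bucket is derived purely from the user’s identifier, Jack and Jill each land in the same bucket on
every visit, which is what keeps their experience consistent across sessions.
</p>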
<h4>3.2.2 Using Feature Flags for A/B Testing</h4>
<p>
As mentioned earlier, two key challenges of implementing A/B testing are:
</p>
<ol>
<li>Assigning users randomly into groups.</li>
<li>Serving a user the same treatment on each visit.</li>
</ol>
<p>
<strong>Solar Flair can solve these two challenges by using feature flags with a toggle router.</strong>
</p>
<h2 id="section-4">4. Existing Solutions</h2>
<p>
Solar Flair now knows they want to integrate feature flags into their web app and use them to perform A/B
tests. What they don’t know is exactly how to implement this new system. Do they want to build it from scratch
or integrate with an existing solution? If they want to build it, how much time and effort is needed to
accomplish that?
</p>
<h3 id="section-4-1">4.1 DIY</h3>
<p>If they decide to build the system, they’ll need to accomplish three things:</p>
<ol>
<li>Persistently store their feature flag data.</li>
<li>Connect the flag data to their application to dynamically serve the two versions of their site.</li>
<li>Collect user event data and analyze it to gain insights to drive their decision making.</li>
</ol>
<h4>4.1.1 Config File</h4>
<p>
The easiest and simplest way to store all the flag data in one place is by using a configuration file. This
file would contain the flag data in a data structure and could look something like the code below:
</p>
<!-- Code snippet -->
<pre class="code-block"><code> const flags = {
"new quote": false,
"other feature": true
};
export default flags;</code></pre>
<figcaption>Example of a config file to store feature flags.</figcaption>
<p>The trade-offs with the config file are:</p>
<ul>
<li>
<strong>Flags can’t be set dynamically</strong>, they’re “on” or “off” for all users.
</li>
<li>
Every time the config file is updated the application must be <strong>redeployed for the changes to take effect</strong>.
</li>
</ul>
<p>The config file’s shortcomings make this option unappealing.</p>
<h4>4.1.2 Database</h4>
<figure>
<img src="./images/diagrams/4.1.2_application_to_database.png" alt="application to database" width="300px">
<figcaption>Using a database to store flag data for an application.</figcaption>
<img src="./images/diagrams/4.1.2_flag_in_database_table.png" alt="application to database table">
<figcaption>Flags table in the database.</figcaption>
</figure>
<p>
A slightly better option for storing flag data is to use a database. With this configuration, Solar Flair’s
app would query the database each time a flag’s status is needed, and updates to the flags would be made
using SQL commands, which makes managing feature flags cumbersome for non-technical employees. The main
benefit of this architecture is that it allows flag data to be <strong>updated without redeployment</strong>. [4]
</p>
<p>
However, there are some downsides to this architecture that make it unappealing for Solar Flair’s use case:
</p>
<ul>
<li>
It doesn’t solve the core problem of <strong>dynamically evaluating the flags</strong>.
</li>
<li>
It <strong>restricts</strong> management of the flags to technical members of the team who know SQL.
</li>
<li>
It doesn’t address <strong>analyzing the user event data</strong>.
</li>
</ul>
<p>
What Solar Flair needs is something bigger and more robust: a place to easily manage their flags and their experiments. What they need is a feature flag management service, specifically one that can handle A/B testing.
</p>
<h4>4.1.3 Building a Feature Flag Management Service</h4>
<p>
Feature flag management services are platforms that allow developers to view all their feature flags in one
place, usually through a user interface. They make the process of performing CRUD operations on the flags
easier to implement and keep track of. In addition, the services often contain tools, such as custom flag
evaluations, allowing developers to get more out of their feature flags.
</p>
<p>
Building this system in-house means building both the feature flagging and the A/B testing services. The
first task is to design and build the architecture to manage and evaluate feature flags, which requires, at
a minimum, a frontend, a backend, a database, and an SDK.
</p>
<ol>
<li>
A <strong>frontend interface</strong> will enable both developers and non-technical members of the organization to
easily manage feature flags and experiments.
</li>
<li>
A <strong>backend</strong> server and database will enable persistent storage of data.
</li>
<li>
An <strong>SDK</strong> that lives in the organization’s app should return the “on” or “off” status for a given user
based on factors like rollout percentage. It is typically necessary to have SDKs on both the client and
server, since there are features to be managed on both.
</li>
<li>
The SDKs also need to receive <strong>updated feature flag data</strong> when a change is made from the frontend,
ideally in real-time so that features can be shut off immediately if needed.
</li>
</ol>
<figure>
<img src="./images/diagrams/4.1.3_diy_architecture.png" alt="diy architecture">
<figcaption>Minimum Architecture of Feature Flag and A/B Testing Platform</figcaption>
</figure>
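<p>
A sketch of the core evaluation such an SDK might perform, assuming each flag carries an enabled toggle and a rollout percentage (the names are illustrative, not a specific product’s API):
</p>
<pre class="code-block"><code>function hashToUnit(str) {
  // Deterministic string hash mapped onto [0, 1); illustrative only.
  let h = 0;
  for (const ch of str) {
    h = (h * 31 + ch.charCodeAt(0)) % 1000003;
  }
  return h / 1000003;
}

function isEnabled(flag, userId) {
  if (!flag.enabled) {
    return false; // kill switch: off for everyone, regardless of rollout
  }
  // A user's bucket in [0, 100) is stable across visits, so raising the
  // rollout percentage only ever adds users to the "on" group.
  const bucket = hashToUnit(userId + ":" + flag.name) * 100;
  return flag.rolloutPercentage > bucket;
}</code></pre>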
<h4>4.1.4 Building the A/B Testing Functionality</h4>
<p>
In addition to the infrastructure for feature flag management, one must also build the A/B testing
functionality to gain insights from changes in user event data. There are four steps to this:
</p>
<ol>
<li>Collect the user event data.</li>
<li>Process the data.</li>
<li>Run statistical analysis.</li>
<li>Display the results.</li>
</ol>
<p>
For the first step, a system must be in place to <strong>log user events</strong> and save them to a <strong>database</strong> using ETL
pipelines. This infrastructure can be built in-house, or a 3rd party service like Mixpanel can be used.
However, using a 3rd party service will offer less flexibility.
</p>
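<p>
A sketch of the logging step, with an in-memory buffer standing in for the ETL pipeline (the field and event names are illustrative):
</p>
<pre class="code-block"><code>const eventBuffer = [];

function logEvent(userId, eventType, treatment) {
  // In production this record would flow through an ETL pipeline
  // into the events database rather than an in-memory array.
  eventBuffer.push({
    userId,
    eventType,   // e.g. "page_view" or "quote_requested"
    treatment,   // which variant the user was shown
    timestamp: new Date().toISOString()
  });
}</code></pre>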
<p>
Then, the data must be processed by <strong>querying the event data</strong> and <strong>calculating metrics</strong> for each experiment and
treatment.
</p>
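<p>
The processing step might look like the sketch below, which aggregates raw event rows into a conversion rate per treatment (the event and field names are illustrative):
</p>
<pre class="code-block"><code>function conversionRates(events) {
  const totals = {};
  for (const e of events) {
    const t = totals[e.treatment] || { visits: 0, conversions: 0 };
    if (e.eventType === "page_view") {
      t.visits += 1;
    }
    if (e.eventType === "quote_requested") {
      t.conversions += 1;
    }
    totals[e.treatment] = t;
  }
  const rates = {};
  for (const name of Object.keys(totals)) {
    rates[name] = totals[name].conversions / totals[name].visits;
  }
  return rates;
}</code></pre>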
<p>
Next, the metrics are used as inputs in <strong>statistical tests</strong> to determine whether the change in them is likely
due to the change in feature or to random chance.
</p>
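<p>
For example, with conversion counts as the metric, a two-proportion z-test is one common choice. A minimal sketch, assuming samples large enough for the normal approximation to hold:
</p>
<pre class="code-block"><code>function twoProportionZ(conversionsA, visitorsA, conversionsB, visitorsB) {
  const pA = conversionsA / visitorsA;
  const pB = conversionsB / visitorsB;
  // Pooled proportion under the null hypothesis that A and B convert equally.
  const pooled = (conversionsA + conversionsB) / (visitorsA + visitorsB);
  const stderr = Math.sqrt(pooled * (1 - pooled) * (1 / visitorsA + 1 / visitorsB));
  return (pB - pA) / stderr;
}

// A |z| above roughly 1.96 suggests the observed difference is unlikely
// to be random chance at the conventional 5% significance level.</code></pre>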
<p>
Lastly, the results are displayed, ideally with some <strong>visualization</strong>.
</p>
<p>
Having a data scientist on a team to give guidance would make the last three steps go smoothly. For example,
a data scientist would know which statistical tests to run depending on the data types, how the metrics need
to be measured and aggregated, and which kinds of visuals are most helpful. Without someone with that
knowledge, a team may have trouble building an A/B testing solution.
</p>
<h4>4.1.5 Trade-offs</h4>
<p>Building your own platform for feature flagging and experimentation comes with several benefits:</p>
<ul>
<li>
<strong>Flexibility over how it is deployed</strong>, and ability to customize what features to include and how they are
implemented. For example, the team could choose whether to use classical statistics or Bayesian statistics for
the experiment analysis.
</li>
<li>
<strong>Keep data in-house</strong>, reducing concerns around privacy and security
</li>
</ul>
<p>However, this approach also comes with a set of trade-offs:</p>
<ul>
<li>
<strong>Time and resources</strong> to build the feature flag management and A/B testing platform
</li>
<li>
Responsibility to <strong>maintain</strong> the system into the future
</li>
</ul>
<p>
As we saw earlier, building an A/B testing platform completely from scratch is extremely hard and time
consuming (expect at least 2,000 hours of work to get something decent) [5]. In other words, building the
platform yourself is not free.
</p>
<p>
One must also consider the <strong>accuracy and reliability</strong> of the system. Maintaining reliable analytics pipelines
is more difficult than one might think [6], and as the system’s creators, the team will be responsible for
keeping it accurate and reliable into the future.
</p>
<h3 id="section-4-2">4.2 Paid Solutions</h3>
<p>
The other option is to use a 3rd party feature flagging and A/B testing platform that handles the engineering
aspects for you so that your team does not have to invest time into building and maintaining this system. Some
well-known players are LaunchDarkly and Optimizely. Both are feature-rich and reliable. LaunchDarkly is a
popular feature flag management platform that offers a wide range of features, such as A/B testing, targeting
users by attribute, integrations with productivity tools, workflow automation, and a support team. Optimizely
is a similar platform which focuses primarily on A/B testing. However, there are several trade-offs that come
with this approach:
</p>
<ul>
<li>
Since they are built by an external team, there is not as much flexibility for <strong>customization</strong> as an in-house
platform. They would not be able to change what features to include and their implementation, or the types of
statistical tests to use.
</li>
<li>
Using one of these services entails letting a third party <strong>store</strong> user data, which some businesses may not be
comfortable with or legally permitted to do.
</li>
<li>
They typically <strong>cost</strong> a monthly fee and can be out of budget for smaller companies.
</li>
</ul>
<h2 id="section-5">5. Waypost</h2>
<p>
If Solar Flair were a bigger company with more engineers, then building their own system could be a good
option. If they wanted a large set of features quickly and had a larger budget, then a paid solution like
LaunchDarkly would be a good choice. However, the DIY option costs too much time and resources, and the paid
option is too expensive for them and doesn’t allow them to customize the functionality. They also don’t need
many extra features, as they only need to run simple experiments.
</p>
<p>
Therefore, there was an opportunity to create a feature flagging platform that fits the needs of small
businesses like Solar Flair. We created Waypost, a feature flag management platform that specializes in A/B
Testing. <strong>Waypost is self-hosted and open-source, so it is completely customizable while still containing the
core features our target user needs.</strong>
</p>
<img src="./images/diagrams/5_competitor_comparison.png" alt="competitor comparison">
<h3 id="section-5-1">5.1 Architecture Overview</h3>
<p>
Waypost provides a feature flag and A/B testing solution through the integration of its own components
(colored blue) and some existing infrastructure (colored red). The existing infrastructure is the application
that is using feature flags and a PostgreSQL database for storing user event data, which will be referred to
as the “Events DB”. For example, Solar Flair’s “Application” would be their web application that renders their
website and their “Events DB” would be their existing user event database.
</p>
<figure>
<img src="./images/diagrams/5.1_waypost_architecture.png" alt="Waypost architecture" width="800px">
<figcaption>Waypost's Architecture</figcaption>
</figure>
<p>
<span class="underline">General Overview of the Components and their responsibilities:</span>
</p>
<ul>
<li>The <strong>Manager Application</strong> is responsible for managing feature flags and experiments.</li>
<li>The <strong>Manager App</strong> sends copies of the flag data to the <strong>Flag Provider</strong>, which saves the copy and forwards it to
each <strong>application</strong> running the <strong>Waypost SDK</strong>.</li>
<li>The <strong>SDK</strong> is embedded into an <strong>application</strong> and is responsible for evaluating flags at run-time, and thus
allowing one to serve multiple versions of their website to their users. It also performs the assignment of
users into treatments.</li>
<li>The developer using Waypost is responsible for supplying their existing user event logging solution, in which
their <strong>application</strong> sends user event data to their <strong>Events DB</strong>.</li>
<li>The <strong>Manager App</strong> queries the <strong>Events DB</strong> when it runs statistical analysis for the experiments and displays the
results in the <strong>Manager’s UI</strong>.</li>
</ul>
<h3 id="section-5-2">5.2 Manager Application</h3>
<img src="./images/diagrams/5.2_Manager_app.png" alt="Manager app">
<p>
The <strong>Manager App</strong> is what developers use to manage their feature flags and view experiment results. It also
contains a <strong>statistics pipeline</strong> that performs the querying of data and statistical analysis for experiments.
The Manager App has three components: a React.js application that serves as the User Interface (UI), a backend
server built with Express.js, and a PostgreSQL database used for data persistence.
</p>
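<p>
As a rough sketch of how these pieces fit together (the route, payload shape, and flag fields below are hypothetical, not Waypost’s actual API), the Express backend could expose flag data to the React UI through a JSON endpoint:
</p>
<pre class="code-block"><code>// Flag data would normally be read from the PostgreSQL database.
const flags = [{ name: "new quote", enabled: false }];

// An Express-style handler, written as a plain function so the server
// wiring (e.g. app.get("/api/flags", getFlags)) stays separate.
function getFlags(req, res) {
  res.json({ flags });
}</code></pre>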
<p>
The Manager App’s <strong>UI</strong> provides features for developers and non-technical members of an organization to manage
their flags and experiments:
</p>