Bandit Algorithms for Website Optimization
John Myles White
Format: PDF / Kindle (mobi) / ePub
When looking for ways to improve your website, how do you decide which changes to make? And which changes to keep? This concise book shows you how to use Multiarmed Bandit algorithms to measure the real-world value of any modifications you make to your site. Author John Myles White shows you how this powerful class of algorithms can help you boost website traffic, convert visitors to customers, and increase many other measures of success.
This is the first developer-focused book on bandit algorithms, which were previously described only in research papers. You’ll quickly learn the benefits of several simple algorithms—including the epsilon-Greedy, Softmax, and Upper Confidence Bound (UCB) algorithms—by working through code examples written in Python, which you can easily adapt for deployment on your own website.
- Learn the basics of A/B testing—and recognize when it’s better to use bandit algorithms
- Develop a unit testing framework for debugging bandit algorithms
return def select_arm(self): t = sum(self.counts) + 1 temperature = 1 / math.log(t + 0.0000001) z = sum([math.exp(v / temperature) for v in self.values]) probs = [math.exp(v / temperature) / z for v in self.values] return categorical_draw(probs) def update(self, chosen_arm, reward): self.counts[chosen_arm] = self.counts[chosen_arm] + 1 n = self.counts[chosen_arm] value = self.values[chosen_arm] new_value = ((n - 1) / float(n)) * value + (1 / float(n)) * reward self.values[chosen_arm] = new_value
before it starts to apply its confidence-based decision rule. It’s important to keep this initialization step in mind when you consider deploying UCB1: if you will only let the algorithm run for a small number of plays (say M) and you have many arms to explore (say N), it’s possible that UCB1 will just try every single arm in succession and not even make it to the end. If M < N, this is definitely going to occur. If M is close to N, you’ll still spend a lot of time just doing this initial
will work in practice is important because there is no universal bandit algorithm that will always do the best job of optimizing a website: domain expertise and good judgment will always be necessary. To help you develop the intuition and judgment you’ll need, we’ve advocated a Monte Carlo simulation framework that lets you see how these algorithms and others will behave in hypothetical worlds. By testing an algorithm in many different hypothetical worlds, you can build an appreciation for the
happens 100 * X percent of the time. So saying that a coin comes up tails with probability 0.01 means that it comes up tails 1% of the time.) • If the coin comes up heads, the algorithm is going to exploit. To exploit, the algo‐ rithm looks up the historical conversion rates for both the green and red logos in whatever data source it uses to keep track of things. After determining which color had the highest success rate in the past, the algorithm decides to show the new visitor the color that’s
different payout schedule. For example, some of the slot machines in this hypothetical casino might pay out $5 on 1 out of 100 pulls, while other machines would pay out $25 on 1 out of 1,000 pulls. For whatever reason, the original mathematicians decided to treat the different slot ma‐ chines in their thought experiment as if they were one giant slot machine that had many arms. This led them to refer to the options in their problem as arms. It also led them to call this thought experiment the