Fast-forward a decade and I now consider myself a (secular) Bayesian. What is Bayesian statistics *really* about?

In the beginning of every probability book, you will see Bayes’ theorem for events proudly stated as a direct consequence of the definition of conditional probability and the product (chain) rule. I argue here that it is not really helpful to think of Bayes’ theorem this way. If you wonder what Bayes’ theorem is, you can look it up online.

When I was in high school, I remember that we saw in a book that if you toss a coin 3 times and it lands heads 3 times, the probability that the next toss will land heads too is … 0.5 (50%). I was a bit surprised by the result, because I had overlooked the crucial fact that we knew the coin was unbiased, i.e. that it lands heads with probability 0.5 on every toss.

However, if you don’t know whether the coin is biased or not, then your belief that the next coin toss is going to land heads should surely be a little higher than for tails. A relevant question you can ask yourself is: at what odds would you accept to bet on tails? Assuming you want your expected gain to be positive, of course.
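To make the betting question concrete, here is a minimal sketch (pure Python, all numbers hypothetical) of the expected gain of a bet at given decimal odds:

```python
def expected_gain(p_win, decimal_odds, stake=1.0):
    """Expected profit: winning pays stake * (odds - 1), losing costs the stake."""
    return p_win * stake * (decimal_odds - 1) - (1 - p_win) * stake

# Suppose you believe P(tails) = 0.4 after seeing three heads.
# The fair decimal odds are then 1 / 0.4 = 2.5: above that, the bet is favorable.
assert abs(expected_gain(0.4, 2.5)) < 1e-9  # break-even at fair odds
assert expected_gain(0.4, 3.0) > 0          # favorable
assert expected_gain(0.4, 2.0) < 0          # unfavorable
```

At exactly the fair odds the expected gain is zero, which is what "fair" means here.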

Born c. 1701, Thomas Bayes was a pioneer of statistics and a Presbyterian minister. Fittingly, these days some people are so deep into the Bayesian doctrine that we can consider them part of the *Bayesian church*.

*An Essay towards solving a Problem in the Doctrine of Chances* was published in 1763, two years after Bayes’ death. In its opening, he states Bayes’ theorem as we know it. On Wikipedia we can read:

it does not appear that Bayes emphasized or focused on this finding. Rather, he focused on finding the solution to a much broader inferential problem:

“Given the number of times in which an unknown event has happened and failed [… Find] the chance that the probability of its happening in a single trial lies somewhere between any two degrees of probability that can be named.”

Sounds like he was wondering about the coin toss problem!

During the 18th and 19th centuries, mathematicians such as Laplace and De Morgan referred to the *inverse probability* for the probability distribution of an unobserved variable. The term *Bayesian* was coined by Ronald Fisher in 1950, with the development of frequentism, to refer to the same thing.

Intuitively, statistics is about inverse problems: given some unobserved variables and possibly observed variables (often in the form of events), how can you assess the distribution of the unobserved variables?

In Bayesian inference, parameters are random variables, while the observed data is fixed.

Here is the formula that we care about in inferential Bayesian statistics:

$$\text{posterior} \propto \text{likelihood} \times \text{prior}$$

The sign $\propto$ means *proportional to*; as such, $f \propto g$ means that $f = c \cdot g$ for some **constant** $c$.

In more precise terms, let $\theta$ be the unobserved random variable that we care about (the weights & biases in machine learning, for instance).

Let $p(\theta)$ denote the distribution of $\theta$. More rigorously, in this post $p$ loosely denotes either the probability of an event or the probability density (or probability mass function) of a random variable. When I write *distribution*, it often means the density of the random variable.

Let $D$ (for *Data*) denote our observed data, for instance the pairs $(x_i, y_i)$ for a regression or classification problem. Then the above formula can be written more precisely:

$$p(\theta \mid D) \propto p(D \mid \theta) \, p(\theta)$$

The **prior distribution** $p(\theta)$ is the probability distribution of the unobserved random variable (r.v.) of interest *a priori*, i.e. before having seen any data, observed related r.v., or events. It may reflect your knowledge about the world.

The **likelihood** $p(D \mid \theta)$ is the probability of the data conditioned on this unobserved random variable. This may sound paradoxical: how can we condition on something we don’t observe? Well, remember that the likelihood is a function of the unobserved random variable, with the data held fixed. As such, the likelihood is NOT a density: it doesn’t integrate to 1. We can put it another way: if we knew the unobserved variable, how likely would the data (events) we have observed be?
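To see this concretely with the coin example: given a bias $p$, the likelihood of observing 3 heads in 3 tosses is $p^3$. Integrating it over $p$ gives 1/4, not 1, so it is not a density in $p$. A minimal numerical check (pure Python, midpoint rule):

```python
# Binomial likelihood of 3 heads in 3 tosses, viewed as a function of the bias p
def likelihood(p):
    return p ** 3

# Midpoint-rule integral of the likelihood over p in [0, 1]:
# the exact value is 1/4, not 1, so this is not a probability density in p.
n = 100_000
area = sum(likelihood((i + 0.5) / n) for i in range(n)) / n
assert abs(area - 0.25) < 1e-6
```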

The **posterior distribution** $p(\theta \mid D)$ is what we ultimately care about. It describes the distribution of the *unobserved* r.v. conditioned on the data that we’ve collected, i.e. events that happened.

There are mainly 3 types of inference methods that we can leverage based on the above formula:

- **Maximum likelihood estimation (MLE)**: we seek one point estimate of $\theta$ based on the observations and the likelihood
- **Maximum a posteriori (MAP)**: we seek one point estimate of $\theta$ based on the observations and prior knowledge (likelihood and prior)
- **Full posterior inference**: we seek representative samples from the posterior distribution of $\theta$ based on the inference formula.

In this setting, we just optimize the likelihood to get the argmax. We ignore the prior and the posterior; hence, we don’t really use the inference formula. Kevin Murphy refers to MLE and MAP as **poor man’s Bayes**.

This method is convenient because we can just optimize the function, e.g. with the gradient, and find the value of $\theta$ that maximizes the likelihood function. We can either compute the gradient in closed form and see where it vanishes, or run an iterative algorithm such as (stochastic) gradient descent.

In practice, we don’t maximize the likelihood function directly, but rather minimize the negative log-likelihood (NLL):

$$\mathrm{NLL}(\theta) = -\log p(D \mid \theta)$$

Note that in the case of a Gaussian likelihood, for a regression, the negative log-likelihood is equal (up to a constant) to the mean-squared error loss function, widely used in machine learning. In the linear case, this is the *Ordinary Least Squares (OLS)* method.
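A small numerical sketch of that equivalence (pure Python, toy data, noise scale fixed at 1 as an assumption): with the noise scale held fixed, the Gaussian NLL and the squared error differ only by an additive constant, so they rank predictions identically.

```python
import math

def gaussian_nll(y_true, y_pred, sigma=1.0):
    """Negative log-likelihood of y_true under independent N(y_pred_i, sigma^2)."""
    const = len(y_true) * math.log(sigma * math.sqrt(2 * math.pi))
    return const + sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / (2 * sigma ** 2)

def sse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred))

y = [1.0, 2.0, 3.0]
pred_a = [1.1, 1.9, 3.2]
pred_b = [0.5, 2.5, 2.0]

# The gap between the NLL and half the squared error is the same constant
# for any prediction, so minimizing one minimizes the other.
gap_a = gaussian_nll(y, pred_a) - sse(y, pred_a) / 2
gap_b = gaussian_nll(y, pred_b) - sse(y, pred_b) / 2
assert abs(gap_a - gap_b) < 1e-12
```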

A more precise inference technique is to find the argmax of the posterior, that is the point that maximizes the posterior distribution. We call this value the **mode** of the distribution.

As with the likelihood, we use the negative logarithm. Since $\log p(\theta \mid D) = \log p(D \mid \theta) + \log p(\theta) + \text{const}$, we want to minimize the sum of the NLL and the negative log-prior.

Note that MAP with a uniform prior is the same as MLE, since in this case the log-prior is a constant with respect to $\theta$.

Intuitively, the negative log-prior is a regularization term when we assume a centered random variable.

In the case of a centered Gaussian prior, we recover exactly the $\ell_2$ regularization framework. If the likelihood is also Gaussian, as above, we recover Ridge regression. The regularization parameter of the Ridge is then inversely proportional to the variance of the prior.

If you choose instead a Laplace prior, you will recover $\ell_1$ regularization, i.e. Lasso regression.
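We can check the Ridge correspondence numerically. In this sketch (one-dimensional toy data, all values hypothetical), the MAP objective with likelihood noise scale $\sigma$ and a centered Gaussian prior of scale $\tau$ matches the Ridge objective with $\lambda = \sigma^2 / \tau^2$:

```python
def neg_log_posterior(w, X, y, sigma, tau):
    """Gaussian NLL plus negative log of a centered Gaussian prior on w,
    dropping the w-independent constants."""
    nll = sum((yi - xi * w) ** 2 for xi, yi in zip(X, y)) / (2 * sigma ** 2)
    neg_log_prior = w ** 2 / (2 * tau ** 2)
    return nll + neg_log_prior

def ridge_objective(w, X, y, lam):
    return sum((yi - xi * w) ** 2 for xi, yi in zip(X, y)) + lam * w ** 2

X = [0.0, 1.0, 2.0, 3.0]
y = [0.1, 0.9, 2.2, 2.8]
sigma, tau = 1.0, 2.0
lam = sigma ** 2 / tau ** 2  # regularization strength, inverse of the prior variance

# Up to a factor 2 * sigma**2, the two objectives coincide for every w
for w in [-1.0, 0.0, 0.5, 1.0]:
    map_obj = 2 * sigma ** 2 * neg_log_posterior(w, X, y, sigma, tau)
    assert abs(map_obj - ridge_objective(w, X, y, lam)) < 1e-9
```

A wider prior (larger $\tau$) means weaker regularization, which matches the Bayesian intuition: a vaguer prior constrains the weights less.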

In our case, we are now not restricted to Lasso or Ridge: we can choose any prior for the MAP as long as we can take its gradient (assuming we are using an automatic differentiation framework such as JAX). We can also freely pick the likelihood we want. How nice! We are now only constrained by imagination (and tractable probability distributions).

The Expectation-Maximization (EM) algorithm can also be used to find the MLE or MAP, if for some reason you can’t compute the gradient.

If we are true Bayesians, we need to sample from the posterior, to recover at least the posterior mean instead of the mode.

Last but not least, posterior inference is about efficiently sampling the posterior distribution. We want to obtain a set of samples on the **typical set** of the posterior distribution. This step is usually computationally intensive, as we need to produce enough samples to represent the full probability distribution of the posterior, as opposed to just finding the mode with e.g. MAP.

In the best situation, we have the (unnormalized) log-posterior function. That is, we have a computer function that computes the log-density of the posterior distribution, up to an additive constant, for any value of $\theta$.

So we won, right? We have the function!

Well, not yet. We still have to sample from this distribution inside the typical set.

The typical set is the region where we have *both* high density and high volume. In high dimensions, it is crucial to think about volume, in that regions with high density don’t necessarily have high volume. In practice, we find the typical set with the sampling process that explores the parameter space.
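A quick illustration with a standard Gaussian as a stand-in for a posterior (pure Python, $d = 1000$ chosen arbitrarily): samples concentrate on a thin shell of radius about $\sqrt{d}$, far from the mode at the origin where the density is highest.

```python
import math
import random

random.seed(0)
d = 1000                      # dimension
norms = []
for _ in range(200):          # 200 draws from a d-dimensional standard Gaussian
    x = [random.gauss(0, 1) for _ in range(d)]
    norms.append(math.sqrt(sum(xi * xi for xi in x)))

mean_norm = sum(norms) / len(norms)

# The draws live on a shell of radius ~ sqrt(d) ~ 31.6: the mode (the origin,
# where the density is highest) carries essentially no probability mass.
assert abs(mean_norm - math.sqrt(d)) / math.sqrt(d) < 0.05
assert min(norms) > 0.5 * math.sqrt(d)
```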

A popular family of algorithms is Markov Chain Monte Carlo (MCMC), where we sample from the posterior by exploring the space in an iterative fashion. Each sample defines a state in a Markov chain, i.e. a sequential discrete process that has no memory, and we explore the space according to some *step* function.
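As a toy illustration (this is not what a production sampler does, and far less sophisticated than e.g. NUTS), here is a random-walk Metropolis sketch for the coin problem of the introduction: with a uniform prior and three observed heads, the unnormalized posterior of the bias $p$ is $p^3$, and the exact posterior is Beta(4, 1) with mean 0.8.

```python
import random

# Unnormalized posterior of the coin bias: uniform prior, three heads observed
def unnorm_posterior(p):
    return p ** 3 if 0.0 <= p <= 1.0 else 0.0

random.seed(42)
p_current, samples = 0.5, []
for _ in range(50_000):
    proposal = p_current + random.gauss(0, 0.2)     # symmetric random-walk step
    ratio = unnorm_posterior(proposal) / unnorm_posterior(p_current)
    if random.random() < ratio:                     # Metropolis acceptance rule
        p_current = proposal
    samples.append(p_current)

kept = samples[5_000:]                              # drop warmup iterations
posterior_mean = sum(kept) / len(kept)
assert abs(posterior_mean - 0.8) < 0.02  # exact Beta(4, 1) posterior mean is 0.8
```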

We can also use Variational Inference, where we approximate the posterior density with a known parameterized family of distributions such as Gaussians that we can optimize with e.g. stochastic gradient descent; in this case, it becomes stochastic variational inference (SVI). Note that the similarity between posterior and the variational distribution is often evaluated with the Kullback-Leibler divergence.
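As a small sanity check of that last point (pure Python, arbitrary toy parameters): the KL divergence between two one-dimensional Gaussians has a closed form, which a Monte Carlo estimate of $\mathbb{E}_q[\log q(x) - \log p(x)]$ over samples from $q$ should reproduce.

```python
import math
import random

def kl_gaussians(mu_q, sd_q, mu_p, sd_p):
    """Closed-form KL(q || p) for one-dimensional Gaussians."""
    return math.log(sd_p / sd_q) + (sd_q ** 2 + (mu_q - mu_p) ** 2) / (2 * sd_p ** 2) - 0.5

def log_pdf(x, mu, sd):
    return -0.5 * ((x - mu) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))

random.seed(0)
mu_q, sd_q, mu_p, sd_p = 0.0, 1.0, 1.0, 2.0
draws = [random.gauss(mu_q, sd_q) for _ in range(200_000)]

# Monte Carlo estimate of E_q[log q - log p], which is KL(q || p) by definition
mc_kl = sum(log_pdf(x, mu_q, sd_q) - log_pdf(x, mu_p, sd_p) for x in draws) / len(draws)
assert abs(mc_kl - kl_gaussians(mu_q, sd_q, mu_p, sd_p)) < 0.01
```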

If you need something even more scalable, at the cost of approximation quality, you can use the Laplace approximation ^{1}.

The detailed machinery of inference algorithms is out of scope for this blog post, but these algorithms are the workhorse of modern Bayesian inference. Indeed, for many problems, inference using e.g. MCMC is very computationally intensive, so scalable techniques are highly desirable.

We’re going to solve the coin toss problem with Python and NumPyro, a probabilistic programming library based on JAX. We are going to explore full posterior inference with an MCMC algorithm (in our case the No-U-Turn Sampler, or NUTS).

Let’s assume a uniform prior for the bias of the coin, that is, *any* bias is equally likely. Note that you can make one side of a coin heavier so that it is biased, but I doubt we could have a coin that always lands heads.

Anyway, as a first step, let’s try this model.

```
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import numpyro as ny
import numpyro.distributions as dist
from numpyro.infer import MCMC, NUTS, Predictive
from jax import random
plt.style.use('ggplot')
%config InlineBackend.figure_format = 'svg'
```

Let’s describe the data. Note that here 1 means heads, so the observations `[1, 1, 1]` map to the realizations of our experiment.

```
tosses = np.asarray([1, 1, 1])
nb_heads = (tosses == 1).sum()
total_tosses = tosses.shape[0]
```

Then, let us define the model. As discussed above, we use a Uniform prior for the bias of the coin, together with a Binomial likelihood. Passing the `nb_heads` parameter as `obs=nb_heads` in the likelihood tells the model to condition on this observation for the inference. This parameter is optional because for prediction we don’t observe it.

```
def model(N, nb_heads=None):
    p = ny.sample("p_heads", dist.Uniform())
    ny.sample("nb_heads", dist.Binomial(N, probs=p), obs=nb_heads)
```

Let’s check a prior predictive simulation, i.e. what we would predict without having seen any data. We simulate 1 toss with 10_000 samples from the prior distribution:

```
prior_predictive = Predictive(model, num_samples=10_000)
all_prior_samples = prior_predictive(random.PRNGKey(5), N=1)
plt.hist(all_prior_samples["p_heads"], bins=30)
plt.title("(Beta) Prior probability distribution of the bias of the coin p(heads)")
```

```
prior_preds = all_prior_samples["nb_heads"]
prior_proba_next_heads = (prior_preds == 1).mean()
plt.hist(prior_preds)
plt.title(
    f"Prior predictive probability of next toss to be Heads is {prior_proba_next_heads:.3f}"
)
```

Nothing surprising: the prior predictive probability of the next toss being heads is approximately 0.5, since we have a uniform prior, i.e. all possible biases are equally likely.

Now let’s run the inference using a Markov Chain Monte Carlo (MCMC) algorithm to infer the posterior distribution of the bias of the coin. We also plot this distribution and its mean.

```
mcmc = MCMC(NUTS(model), num_warmup=1000, num_samples=10000)
mcmc.run(random.PRNGKey(0), N=total_tosses, nb_heads=nb_heads)
posterior_p_heads_samples = mcmc.get_samples()["p_heads"]
plt.hist(posterior_p_heads_samples, bins=30)
plt.axvline(posterior_p_heads_samples.mean(), linestyle="--", color="red", label="mean")
plt.legend()
plt.title("Posterior distribution of the probability of Heads (bias of the coin)")
mcmc.print_summary()
```

```
sample: 100%|██████████| 11000/11000 [00:02<00:00, 5274.72it/s, 3 steps of size 1.05e+00. acc. prob=0.92]
mean std median 5.0% 95.0% n_eff r_hat
p_heads 0.67 0.15 0.68 0.43 0.91 3534.42 1.00
Number of divergences: 0
```

We obtain predictions directly with the `Predictive` class. Note that since we simulate one toss, this is a Bernoulli experiment, thus the predictive probability is exactly equal to the posterior mean.

```
predictive = Predictive(model, mcmc.get_samples())
preds = predictive(random.PRNGKey(10), N=1)["nb_heads"]
proba_next_heads = (preds == 1).mean()
plt.hist(preds)
plt.title(
    f"Posterior probability of next toss to lead to heads is approx. {proba_next_heads:.3f}"
)
```

Let’s modify the model to use a Beta(3, 3) prior:

```
def model(N, nb_heads=None):
    p = ny.sample("p_heads", dist.Beta(3, 3))
    ny.sample("nb_heads", dist.Binomial(N, probs=p), obs=nb_heads)
```

The prior predictive is almost the same, since the prior is centered at 0.5. However, the posterior distribution is very different:

Same as before, since we only simulate one toss, the probability of the next toss being Heads is the mean of the posterior distribution:

As discovered by Bruno de Finetti, our beliefs about the world should translate into betting odds. In other words, what are the **fair** odds that you should accept for a given outcome? The fair decimal odds for an event of probability $p$ can be obtained by taking $1/p$.

In the case of a uniform prior, the fair odds on heads for the next toss are the inverse of the posterior predictive probability of heads computed above.

However, with the Beta(3, 3) prior, the predictive probability of heads is lower, hence the fair odds are higher. For the same set of observations!

So, which are the right odds? **Neither**. There are many scenarios, that is, many possible biases, that could lead to the sequence *Heads - Heads - Heads* as the first three tosses. As Michael Betancourt would say, *embracing uncertainty with probability* is what makes us Bayesian. Hence we should not focus only on the mean of the posterior distribution of `p_heads`, but rather check the whole extent of the density. This gives us plenty of information, from which we can draw credible intervals if we want.

As George Box famously said:

All models are wrong, but some are useful.

We could even argue that the *true* probability of an event doesn’t exist; it is merely a way to measure our incomplete information about the world in a specific context. In the real world, there is no probability, only events that happen or don’t happen. But what about quantum physics? Well, we don’t deal with such a small scale here, so I argue it’s irrelevant. We got a bit sidetracked, and to be honest this stuff is over my head.

Note: I wrote a bit more about how to translate probabilities to bets if you are interested in this stuff.

If by now you are not at least a little bit Bayesian, you must be deep into the frequentist cult! Jokes aside, I hope that the mysteries of Bayesian inference are now a bit less obscure, and that you will produce lots of great models with probabilistic programming.

*Many thanks to Théo Stoskopf, Théo Dumont, Haixuan Xavier Tao, Ai-Jun Tai for their precious feedback on early drafts of this post*

See e.g. https://github.com/aleximmer/Laplace↩︎

BibTeX citation:

```
@online{guy2024,
author = {Guy, Horace},
title = {An Introduction to Bayesian Inference},
date = {2024-04-24},
url = {https://blog.horaceg.xyz/posts/bayes},
langid = {en}
}
```

For attribution, please cite this work as:

Guy, Horace. 2024. “An Introduction to Bayesian Inference.”
April 24, 2024. https://blog.horaceg.xyz/posts/bayes.

Due to the warm reviews, I first tried to read *Crafting Interpreters* and went through the first chapters, but I was quickly underwhelmed, for the following reasons:

- The language of the first part is Java, which I am not proficient in. I still managed to roughly translate the scanning part to Python, but my unfamiliarity with Java adds a barrier.
- The imperative, stateful style is hard to wrap one’s head around.
- In order to have a working interpreter you have to go through the entire first part.
- I want to learn assembly, and this book doesn’t have any.

All in all, after the Scanning part I was unsatisfied and looked for something else.

So I pivoted and instead I went with University of California San Diego’s *Compiler Construction* course (codename CSE 131) of fall 2019, taught by Joe Gibbs Politz. Incidentally, he is also part of the Pyret language crew.

It checks almost all of my (digital) boxes:

- The code is in OCaml, which I know a bit
- It goes all the way to assembly
- It references the classic *Modern Compiler Implementation in ML* by Andrew W. Appel (also known as the Tiger book), which is on my to-do list after this course
- The approach is incremental: at the end of each assignment, I have a fully functional compiler!

The only minor drawback is that I am more interested in ARM than x86_64, because from what I’ve read x86 is messier than ARM.

I find it rewarding since, after the first assignment, I already have an end-to-end compiler for a calculator with variables. Of course, this is not a full-fledged language, but I’m going to incrementally add features.

I just wish it were a book because I prefer this format to video lectures.

The course leverages basic features of the `OCaml` programming language and relies on makefiles for the build system. I tried to follow this style, but unfortunately I didn’t get satisfactory intellisense in vscode without `dune`, the newish `OCaml` build system. So I converted everything to use `dune`.

Also, the old `OUnit` library is used for unit tests, so I switched to the more user-friendly and colorful `Alcotest`.

Look at this colorful and explicit output!

The structure is the following:

```
~/dev/cse131/pa1 main*
❯ tree -C
.
├── bin
│ ├── dune
│ ├── main.c
│ └── main.ml
├── dune-project
├── input
│ ├── 42.ana
│ ├── addnums.ana
│ ├── letdup.ana
│ ├── letdup_valid.ana
│ ├── letlet.ana
│ └── sub1.ana
├── lib
│ ├── asm.ml
│ ├── compile.ml
│ ├── compile_simple.ml
│ ├── dune
│ ├── expr.ml
│ ├── parser.ml
│ └── runner.ml
├── Makefile
├── output
├── pa1.opam
├── README.md
└── test
├── dune
├── myTests.ml
└── test_pa1.ml
5 directories, 23 files
```

My own code is on github.

[edit: January 14 2024]

I stopped after the second assignment because the lectures started to get hairy and I got the feeling that everything wasn’t recorded. And also because of other constraints (life). Still, the mini-compiler should be functional.

All-in-all, I enjoyed working on this project.

BibTeX citation:

```
@online{guy2022,
author = {Guy, Horace},
title = {Crafting Compilers},
date = {2022-10-23},
url = {https://blog.horaceg.xyz/posts/compiler-1},
langid = {en}
}
```

For attribution, please cite this work as:

Guy, Horace. 2022. “Crafting Compilers.” October 23, 2022.
https://blog.horaceg.xyz/posts/compiler-1.

TL;DR: resume.horaceg.xyz

I saw a blog post by Christophe-Marie Duquesne that seemed to exactly fit the bill. It is even inspired by *moderncv* !

So I ~~copied~~ took inspiration from it and wrote my own experience. However, it didn’t display well on mobile, since the left section overflowed into the center one.

Since we are in 2022, mobile first and all, I modified it to look good on mobile as well as *desktop* and *pdf*.

The first thing I had to modify was changing the body `width` property to `max-width`:

```
body {
...
max-width: 900px
}
```

Then, I had to manage the overflow of the left section on mobile:

```
dt {
...
overflow-wrap: break-word;
}
```

This fixed the first issue: now one could somewhat read it on mobile!

Still, it was hardly readable, since there’s a lot of text spanning many paragraphs for each item. So I had the idea of making the individual items collapsible.

Fortunately, there are HTML elements exactly for this: `<details>` & `<summary>`! There is no specific markdown syntax for these, so I had to put them as-is in the `.md` file (having tried and failed to write a custom pandoc filter).

```
2019 - now
: <details open><summary>*Data scientist* in R&D at **Deepki** (Paris, France)</summary>
Data science improving energy efficiency of buildings.
...
</details>
```

This looks good, but there’s still an improvement possible. I want to show only the summary on mobile, and show the full, extended version on desktop.

This is where this bit of `javascript` helped:

```
window.addEventListener("load", function () {
var elements = document.getElementsByTagName("details");
for (let e of elements) {
if (window.innerWidth < 500) {
e.open = false;
} else {
e.open = true;
}
}
})
```

On the `load` event, this script toggles the `open` attribute of all the `details` elements based on the screen width.

There’s one thing left to be desired: since it does not look like a button, users won’t necessarily understand that collapsed sections are clickable. One could make it more obvious by styling the summary like a button, but I liked the clean, simple design.

So I added this snippet of css:

```
summary::after {
content: "\a" attr(preview);
white-space: pre;
opacity: 0.5;
}
details[open] > summary::after {
content: none;
}
```

When collapsed, it shows the `preview` attribute of the `summary` element at half opacity, preceded by a newline (`"\a"`).

Now it works but I still have to manually fill the preview attributes, which is boring and error-prone in case I change items.

```js
window.addEventListener("load", function () {
  var elements = document.getElementsByTagName("details");
  for (let e of elements) {
    ...
    e.children[0].setAttribute(
      "preview",
      e.outerHTML
        .split("<p>")[1]
        .split(" ")
        .slice(0, 4)
        .join(" ")
        .replace("amp;", "") + "..."
    );
  }
});
```

Now, the `preview` attribute of each `details` element contains the first 4 words of the following section, with “…” appended at the end.

To build the HTML, I need to include the javascript in the header of the html file. Since pandoc uses `wkhtmltopdf` for the PDF, and the latter puts large margins everywhere by default, I need to set the margins to zero so that they are handled by the CSS.

In `build.bash`:

```
#! /bin/bash
set -e
date
pandoc -s --from markdown --to html \
-H <(echo "<script>" $(cat script.js) "</script>") \
-c style.css -o resume.html resume.md
pandoc -s --from markdown --to html \
-V margin-top=0 -V margin-left=0 -V margin-right=0 -V margin-bottom=0 \
-V papersize=letter \
-c style.css -o horace_guy.pdf resume.md
```

When I edit the document, I use `entr` to watch the files in the folder and rebuild on change. In `watch.bash`:

```
#! /bin/bash
set -e
ls *.{md,css,js} | entr ./build.bash
```

The dev workflow is now: `./watch.bash`.

Well, we need to put online an HTML file, a CSS file and a PDF file. I use sitejs with `site push` to put it online on a $5/month Linode box.

All in all, I am happy with this new resume. I need to focus more on the content now that I have the template. All input files are in the github repo.

BibTeX citation:

```
@online{guy2022,
author = {Guy, Horace},
title = {The Responsive Markdown Resume},
date = {2022-02-24},
url = {https://blog.horaceg.xyz/posts/resume},
langid = {en}
}
```

For attribution, please cite this work as:

Guy, Horace. 2022. “The Responsive Markdown Resume.”
February 24, 2022. https://blog.horaceg.xyz/posts/resume.

I’ve struggled for a long time with concurrency and parallelism. Let’s dive in with the hot-cool-new ASGI framework, FastAPI. It is a concurrent framework, which means it is `asyncio`-friendly. Tiangolo, the author, claims that the performance is on par with Go and Node webservers. We’re going to see a glimpse of the reason (spoiler: concurrency).

First things first, let’s install FastAPI by following the guide.

We are going to simulate a pure IO operation, such as waiting for a database to finish its operation. Let’s create the following `server.py` file:

```
## server.py
import time

from fastapi import FastAPI

app = FastAPI()

@app.get("/wait")
def wait():
    duration = 1.
    time.sleep(duration)
    return {"duration": duration}
```

Run it with

`uvicorn server:app --reload`

You should see at `http://127.0.0.1:8000/wait` something like:

`{ "duration": 1 }`

Ok, it works. Now, let’s dive into the performance comparison. We could use ApacheBench, but here we are going to implement everything in python for the sake of clarity.

Let’s create a `client.py` file:

```
## client.py
import functools
import time

import requests

def timed(N, url, fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.time()
        res = fn(*args, **kwargs)
        stop = time.time()
        duration = stop - start
        print(f"{N / duration:.2f} reqs / sec | {N} reqs | {url} | {fn.__name__}")
        return res
    return wrapper

def get(url):
    resp = requests.get(url)
    assert resp.status_code == 200
    return resp.json()

def sync_get_all(url, n):
    l = [get(url) for _ in range(n)]
    return l

def run_bench(n, funcs, urls):
    for url in urls:
        for func in funcs:
            timed(n, url, func)(url, n)

if __name__ == "__main__":
    urls = ["http://127.0.0.1:8000/wait"]
    funcs = [sync_get_all]
    run_bench(10, funcs, urls)
```

Let’s run this:

`python client.py`

`0.99 reqs / sec | 10 reqs | http://127.0.0.1:8000/wait | sync_get_all`

So far, so good: ten sequential one-second requests took about 10.1 seconds, i.e. an overhead of roughly 10 ms per request. Cool!

Now, we are going to simulate multiple simultaneous connections. This is usually a problem we want to have: the more users of our web API or app, the more simultaneous requests. The previous test wasn’t very realistic: users rarely browse sequentially, but rather appear simultaneously, forming bursts of activity.

We are going to implement concurrent requests using a **threadpool**:

```
## client.py
...
from concurrent.futures import ThreadPoolExecutor as Pool
...
def thread_pool(url, n, limit=None):
    limit_ = limit or n
    with Pool(max_workers=limit_) as pool:
        result = pool.map(get, [url] * n)
    return result

if __name__ == "__main__":
    urls = ["http://127.0.0.1:8000/wait"]
    run_bench(10, [sync_get_all, thread_pool], urls)
```

We get:

```
0.99 reqs / sec | 10 reqs | http://127.0.0.1:8000/wait | sync_get_all
9.56 reqs / sec | 10 reqs | http://127.0.0.1:8000/wait | thread_pool
```

This looks 10x better! The overhead is 44 ms for 10 requests; where does that come from?

Also, how come the server was able to answer asynchronously, since we only wrote synchronous (regular) Python code? There are no `async` nor `await`…

Well, this is how FastAPI works behind the scenes: it runs every synchronous request in a threadpool. So, we have threadpools both client-side and **server-side**!

Let’s lower the duration:

```
## server.py
...
@app.get("/wait")
def wait():
    duration = 0.05
    time.sleep(duration)
    return {"duration": duration}
```

Let’s also run the benchmark 100 times:

```
## client.py
...
if __name__ == "__main__":
    urls = ["http://127.0.0.1:8000/wait"]
    run_bench(100, [sync_get_all, thread_pool], urls)
```

```
15.91 reqs / sec | 100 reqs | http://127.0.0.1:8000/wait | sync_get_all
196.06 reqs / sec | 100 reqs | http://127.0.0.1:8000/wait | thread_pool
```

We can see there is some overhead on the server side. Indeed, with a 50 ms sleep and 100 concurrent requests, a frictionless server would serve up to 2000 requests per second, far above the ~196 we measured.

There is another way to declare a route with FastAPI, using the `async` and `await` keywords of `asyncio`.

```
## server.py
import asyncio
...
@app.get("/asyncwait")
async def asyncwait():
    duration = 0.05
    await asyncio.sleep(duration)
    return {"duration": duration}
```

Now just add this route to the client:

```
## client.py
if __name__ == "__main__":
    urls = ["http://127.0.0.1:8000/wait", "http://127.0.0.1:8000/asyncwait"]
    run_bench(100, [sync_get_all, thread_pool], urls)
```

And run the benchmark:

```
15.66 reqs / sec | 100 reqs | http://127.0.0.1:8000/wait | sync_get_all
195.41 reqs / sec | 100 reqs | http://127.0.0.1:8000/wait | thread_pool
15.52 reqs / sec | 100 reqs | http://127.0.0.1:8000/asyncwait | sync_get_all
208.06 reqs / sec | 100 reqs | http://127.0.0.1:8000/asyncwait | thread_pool
```

We see a small improvement. But isn’t asyncio supposed to be very performant? And Uvicorn is based on uvloop, described as:

Ultra fast asyncio event loop.

Maybe the overhead comes from the client? Threadpools maybe?

To check this, we’re going to implement a fully-asynchronous client. This is a bit more *involved*. Yes, this means `async`s and `await`s. I know you secretly enjoy these.

Just do `pip install aiohttp`, then:

```
## client.py
import asyncio
...
import aiohttp
...

async def aget(session, url):
    async with session.get(url) as response:
        assert response.status == 200
        json = await response.json()
        return json

async def gather_limit(n_workers, *tasks):
    semaphore = asyncio.Semaphore(n_workers)

    async def sem_task(task):
        async with semaphore:
            return await task

    return await asyncio.gather(*(sem_task(task) for task in tasks))

async def aget_all(url, n, n_workers=None):
    limit = n_workers or n
    async with aiohttp.ClientSession() as session:
        result = await gather_limit(limit, *[aget(session, url) for _ in range(n)])
    return result

def async_main(url, n):
    return asyncio.run(aget_all(url, n))
```

We also add this function to the benchmark. Let’s also run a benchmark with 1000 requests, just for the concurrent clients.

```
## client.py
if __name__ == "__main__":
urls = ["http://127.0.0.1:8000/wait", "http://127.0.0.1:8000/asyncwait"]
funcs = [sync_get_all, thread_pool, async_main]
run_bench(100, funcs, urls)
run_bench(1000, [thread_pool, async_main], urls)
```

The results can be surprising:

```
15.84 reqs / sec | 100 reqs | http://127.0.0.1:8000/wait | sync_get_all
191.74 reqs / sec | 100 reqs | http://127.0.0.1:8000/wait | thread_pool
187.36 reqs / sec | 100 reqs | http://127.0.0.1:8000/wait | async_main
15.69 reqs / sec | 100 reqs | http://127.0.0.1:8000/asyncwait | sync_get_all
217.35 reqs / sec | 100 reqs | http://127.0.0.1:8000/asyncwait | thread_pool
666.23 reqs / sec | 100 reqs | http://127.0.0.1:8000/asyncwait | async_main
234.24 reqs / sec | 1000 reqs | http://127.0.0.1:8000/wait | thread_pool
222.16 reqs / sec | 1000 reqs | http://127.0.0.1:8000/wait | async_main
316.08 reqs / sec | 1000 reqs | http://127.0.0.1:8000/asyncwait | thread_pool
1031.05 reqs / sec | 1000 reqs | http://127.0.0.1:8000/asyncwait | async_main
```

It appears that the bottleneck was indeed on the client-side! When both sides are asynchronous - and there is a lot of IO - the speed is impressive!

This is all great, until some heavy computation is required. We refer to these as *CPU-bound* workloads, as opposed to *IO-bound*. Inspired by the legendary David Beazley’s live coding, we are going to use a naive implementation of the Fibonacci sequence to perform heavy computations.

```
## server.py
...
def fibo(n):
    if n < 2:
        return 1
    else:
        return fibo(n - 1) + fibo(n - 2)

@app.get("/fib/{n}")
def fib(n: int):
    return {"fib": fibo(n)}
```

Now, when I open two terminals, running `curl -I http://127.0.0.1:8000/fib/42` in one and `python client.py` in the other, we see the following results:

```
8.75 reqs / sec | 100 reqs | http://127.0.0.1:8000/wait | sync_get_all
54.94 reqs / sec | 100 reqs | http://127.0.0.1:8000/wait | thread_pool
60.64 reqs / sec | 100 reqs | http://127.0.0.1:8000/wait | async_main
9.52 reqs / sec | 100 reqs | http://127.0.0.1:8000/asyncwait | sync_get_all
53.02 reqs / sec | 100 reqs | http://127.0.0.1:8000/asyncwait | thread_pool
46.81 reqs / sec | 100 reqs | http://127.0.0.1:8000/asyncwait | async_main
72.87 reqs / sec | 1000 reqs | http://127.0.0.1:8000/wait | thread_pool
122.97 reqs / sec | 1000 reqs | http://127.0.0.1:8000/wait | async_main
72.36 reqs / sec | 1000 reqs | http://127.0.0.1:8000/asyncwait | thread_pool
51.73 reqs / sec | 1000 reqs | http://127.0.0.1:8000/asyncwait | async_main
```

It’s not that bad, but a bit disappointing: we now get roughly 20× less throughput for the originally most performant combination (the `asyncwait` route with the `async_main` client).

What’s happening here? In Python, there is a Global Interpreter Lock (GIL), which allows only one thread to execute Python bytecode at a time. If one request takes a very long time to process with high CPU activity, the other requests cannot be processed as quickly in the meantime: priority is given to the computations. We will see later how to take care of this.
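One classic workaround — a sketch under assumptions, not the route this post takes — is to push the CPU-bound call into a separate process with `run_in_executor`, so the event loop stays free to serve other requests:

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor

def fibo(n):
    # Same naive CPU-bound function as in server.py
    if n < 2:
        return 1
    return fibo(n - 1) + fibo(n - 2)

async def compute(n: int) -> int:
    # The heavy call runs in another process (with its own GIL),
    # so this coroutine does not block the event loop.
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor() as pool:
        return await loop.run_in_executor(pool, fibo, n)

if __name__ == "__main__":
    print(asyncio.run(compute(20)))  # 10946
```

Process pools have their own overhead (pickling arguments, spawning workers), so this pays off only for genuinely heavy computations.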

For now, we try nested recursive concurrency. Let’s add:

```
## server.py
...
async def afibo(n):
    if n < 2:
        return 1
    else:
        fib1 = await afibo(n - 1)
        fib2 = await afibo(n - 2)
        return fib1 + fib2

@app.get("/asyncfib/{n}")
async def asyncfib(n: int):
    res = await afibo(n)
    return {"fib": res}
```

Let’s also add a timing middleware to our FastAPI app:

```
## server.py
...
import time

from fastapi import FastAPI, Request

@app.middleware("http")
async def add_process_time_header(request: Request, call_next):
    start_time = time.time()
    response = await call_next(request)
    process_time = time.time() - start_time
    response.headers["X-Process-Time"] = str(process_time)
    return response
```

Now let’s test the speed:

`curl -D - http://127.0.0.1:8000/fib/30`

```
HTTP/1.1 200 OK
server: uvicorn
content-length: 15
content-type: application/json
x-process-time: 0.17467308044433594
{"fib":1346269}
```

And with async:

`curl -D - http://127.0.0.1:8000/asyncfib/30`

```
HTTP/1.1 200 OK
server: uvicorn
content-length: 15
content-type: application/json
x-process-time: 0.46001315116882324
{"fib":1346269}
```

It’s not that bad for overhead (0.46 s vs 0.17 s, roughly 2.6× slower). But we see here a limitation of concurrency in Python: the same code in Julia would lead to a speed-up (using parallelism)!

So far we’ve used FastAPI with Uvicorn. The latter can be run with Gunicorn, which forks a base process into `n` worker processes, each managed by Uvicorn (with the asynchronous uvloop). Which means:

- Each worker is concurrent
- The worker pool implements parallelism

This way, we can have the best of both worlds: concurrency (multithreading) and parallelism (multiprocessing).

Let’s try this with the last setup, where we ran the benchmark while asking for the 42nd Fibonacci number:

`pip install gunicorn`

`gunicorn server:app -w 2 -k uvicorn.workers.UvicornWorker --reload`

we get the following results:

```
19.02 reqs / sec | 100 reqs | http://127.0.0.1:8000/wait | sync_get_all
216.84 reqs / sec | 100 reqs | http://127.0.0.1:8000/wait | thread_pool
223.52 reqs / sec | 100 reqs | http://127.0.0.1:8000/wait | async_main
18.80 reqs / sec | 100 reqs | http://127.0.0.1:8000/asyncwait | sync_get_all
400.12 reqs / sec | 100 reqs | http://127.0.0.1:8000/asyncwait | thread_pool
208.68 reqs / sec | 100 reqs | http://127.0.0.1:8000/asyncwait | async_main
241.06 reqs / sec | 1000 reqs | http://127.0.0.1:8000/wait | thread_pool
311.40 reqs / sec | 1000 reqs | http://127.0.0.1:8000/wait | async_main
433.80 reqs / sec | 1000 reqs | http://127.0.0.1:8000/asyncwait | thread_pool
1275.48 reqs / sec | 1000 reqs | http://127.0.0.1:8000/asyncwait | async_main
```

This is on par with (if not a bit better than!) a single Uvicorn process.

The final files (client and server) are available as a GitHub gist.

I wholeheartedly recommend this amazing live-coding session by David Beazley. You may want to google websockets first, just to understand that they open a bi-directional channel between client and server.

You can also read this detailed answer from stackoverflow to grasp differences between concurrency and parallelism in python.

BibTeX citation:

```
@online{guy2021,
author = {Guy, Horace},
title = {Concurrency in {Python} with {FastAPI}},
date = {2021-09-10},
url = {https://blog.horaceg.xyz/posts/python-concurrency},
langid = {en}
}
```

For attribution, please cite this work as:

Guy, Horace. 2021. “Concurrency in Python with FastAPI.”
September 10, 2021. https://blog.horaceg.xyz/posts/python-concurrency.

In March 2020, lockdown measures were implemented all over the world due to the spread of Covid-19. I was at home, a bit bored and I was looking for something to do.

**TL;DR**:

- Kaggle submission
- Github repo

Google and Apple published datasets related to the (anonymized) activity of smartphone users. They called this *mobility* data, although it doesn’t show trajectories but rather the frequentation of some categories of places, such as grocery stores, residential areas or workplaces.

At first, there were no CSV files, only PDF graphs that did not show the raw data. I found a reddit post that extracted some of the data from the reports (for US regions), and I extended it to make it work with worldwide data. I then published it on a small website I made for the occasion.

The code for the download, parsing and extraction is available in a GitHub repo.

The Uncover challenge appeared on Kaggle a few days afterwards. From the description:

The Roche Data Science Coalition (RDSC) is requesting the collaborative effort of the AI community to fight COVID-19. This challenge presents a curated collection of datasets from 20 global sources and asks you to model solutions to key questions that were developed and evaluated by a global frontline of healthcare providers, hospitals, suppliers, and policy makers.

The task that immediately caught my attention:

How is the implementation of existing strategies affecting the rates of COVID-19 infection?

I had a (not so secret) weapon: mobility data.

First I did some exploration and dataviz leveraging the ACAPS dataset, Google mobility data and epidemiology data (number of cases, deaths etc.).

I exported a jupyter notebook to html, and since I leveraged Altair based on Vega-lite, the data is embedded in the js so it works without a server.

Then I found this Kaggle notebook by @anjum48 (*Datasaurus*), published in the Covid forecasting competition (another Kaggle competition). I like the scientific approach: based on well-known Compartmental models in epidemiology, he extends it to fit the Covid-19 situation.

After this preliminary work, I reached out to two friends for a collaboration: a data scientist and a quant analyst. They were also motivated by the project! So we went to work on the challenge with this special-forces team.

I had heard about bayesian inference before, but did not know how it worked. We decided to explore this path. As a first step, we chose a framework: the young and promising NumPyro, based on JAX and made by some of the (very skilled) Pyro authors. This is also how I discovered JAX.

An example in particular seemed to fit our problem: a prey-predator differential equations model.

Imperial College London had already started to publish a series of papers called the Covid-19 reports. Report 13 was particularly relevant to our challenge, and can be found here.

Kaggle required a notebook submission. Collaborating on notebooks is particularly challenging: the JSON format that mixes code and data is not git-friendly. So I developed a small IPython magic command that lets us modify code inside a notebook cell and, when executed, saves it into a specific file. The notebook wasn’t version-controlled, but every cell snippet was. It wasn’t ideal, since the imports were not included in every cell.
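The magic itself isn’t shown here, but its core behavior can be sketched in plain Python (names hypothetical): write the cell body to a file, then execute it, so that git tracks every snippet:

```python
from pathlib import Path

def save_and_run(filename: str, code: str, namespace: dict) -> None:
    """Hypothetical stand-in for the cell magic: persist the snippet
    to a version-controlled file, then run it like a notebook cell."""
    Path(filename).write_text(code)  # git tracks this file
    exec(code, namespace)            # behave like executing the cell

ns = {}
save_and_run("snippet.py", "answer = 21 * 2", ns)
print(ns["answer"])  # 42
```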

*In fine*, we implemented bayesian Covid-19 models that perform:

- An estimation of the effective reproduction number as a function of mobility data and reported deaths
- An estimation of mobility as a function of non-pharmaceutical interventions

See the kaggle submission for the full analysis.

The Roche team accepted our submission, so we won the challenge for this particular task.

BibTeX citation:

```
@online{guy2021,
author = {Guy, Horace},
title = {Bayesian Epidemiology},
date = {2021-09-09},
url = {https://blog.horaceg.xyz/posts/uncover-covid},
langid = {en}
}
```

For attribution, please cite this work as:

Guy, Horace. 2021. “Bayesian Epidemiology.” September 9,
2021. https://blog.horaceg.xyz/posts/uncover-covid.

As a data scientist by day, and tinkerer by night, I am often browsing Hacker News and the like in order to be up-to-date with the latest cool technologies.

On the scientific computing side, there is one behemoth: Python. However, it still isn’t on par with R for statistics. `Pandas` is often criticized and compared to the supposedly more user-friendly `dplyr`. The GIL prevents true multithreading. There is a fragmented community that can have a hard time understanding each other with the introduction of:

- Static types
- Asynchronous IO (`asyncio`)
- Scientific Python:
  - foundation stack: `numpy`, `pandas`, `matplotlib`, `scipy`, `scikit-learn`
  - automatic differentiation stacks: `tensorflow`, `pytorch`, `jax`…

Web developers, tinkerers, scientists, machine learning researchers, engineers and even business people are getting up to speed with Python nowadays! This is a wonderful success, and I believe no other programming language has ever thrived this much.

It could appear that the needs of these communities are disparate: although it might be useful here and there, few scientists will bother with asyncio, since their workloads are mostly CPU-bound.

However, these can mix and match, as we can see with the popularity of FastAPI, which leverages static types and concurrent programming, and is used for machine learning REST endpoints at e.g. Uber and Microsoft (as stated on its landing page).

Every day witnesses tens of new Python packages, many of which relate in some way to machine learning. What an exciting time to be a Python dev in machine learning! `:)`

There’s one slight problem with this situation though: numerical code in pure Python will never be nearly as fast as in a low-level language like C, C++ or Rust. Thus, there will always be a great divide between library developers, who need to leverage all the machine hardware, and library users, probably scientists, who won’t be able to tinker with the internals. Of course there are exceptions, but statistically I believe this is true to some degree. I, for one, have never once glanced at the C++ source code of e.g. XLA or `jaxlib`, which I use regularly.

Julia is also gaining traction, a dozen years after its inception. Very fast once JIT-compiled, its founders want to solve the two-languages problem in scientific computing: (at least) one low-level language used by library developers for fast numerical computation, e.g. C, C++, Rust, and one language for high-level manipulation, e.g. Python, R, Matlab.

Julia offers a unified stack in numerical computing, with the ease of manipulation of Matlab for arrays, the development speed of Python, and the speed of C. It already has a vibrant ecosystem of packages, and is better suited than Python for statistics, optimization, differential equations. However, the huge boom right now is in Machine learning, and even though Julia has strong libraries to offer, network effects retain most people on Python.

One promising Julia library is the mysterious Diffractor.jl. Keno Fischer, its author, has gone deep into category theory to implement a next-gen (sic!) automatic differentiation system. The potential is huge: imagine you could efficiently differentiate any Julia code! The first thing I will do when it’s ready is implement a Markov chain Monte Carlo engine.

Some very skilled colleagues of mine mentioned an emerging programming language: Elixir. I didn’t bother until I discovered that Jose Valim launched the Numerical Elixir project:

- Nx, for multi-dimensional arrays
- Axon, for neural networks
- Explorer, for dataframes

and a blog post announcing an innovative development environment: Livebook.

This has been my gateway drug to web app development. Indeed, I googled Liveview and witnessed Chris McCord building a Twitter clone in 15 minutes with it. A few days later, I was avidly following Pragmatic Studio’s Liveview course and was instantly hooked (no pun intended) on Elixir, Phoenix and Liveview. Stay tuned for new projects!

I want to learn more about the web, *THE* platform. I have started to experiment with Svelte, and I picked up JS (and HTML) basics along the way.

However, I lack the CSS fundamentals to build a pleasant webpage, which is why I send all my gratitude to the wizards that built all these amazing open-source Static Site Generators (SSG) and themes!

When I am ready, SvelteKit will probably have reached 1.0 already and I will be able to make a blog with MdSvex, inserting Svelte components into the articles written in Markdown. I would love to build a cool portfolio like Markus Hatvan.

I am also very intrigued by Tailwind CSS which is all the rage right now ; so I don’t know if I should learn vanilla CSS or Tailwind first.

BibTeX citation:

```
@online{guy2021,
author = {Guy, Horace},
title = {Things {I’m} Excited About},
date = {2021-09-08},
url = {https://blog.horaceg.xyz/posts/excited-about},
langid = {en}
}
```

For attribution, please cite this work as:

Guy, Horace. 2021. “Things I’m Excited About.” September 8,
2021. https://blog.horaceg.xyz/posts/excited-about.

I learned early about static websites, first with Jekyll and Github pages, in 2016. At the time, I was using Windows 10 and wasn’t very experienced. Since Jekyll is written^{1} in Ruby, I needed to bootstrap Ruby and the many gems required to run the blog. It was painful: I couldn’t understand the cryptic error messages, everything broke with every update, I don’t think I used `rbenv` at all, etc.^{2}

I still managed to publish to Github pages somehow, and I was very proud of it. However, a few days after the first post, I gave up. I still learned to write in Markdown and initiated my software journey. It was somewhat fun to play with the terminal, the filesystem, the web.

Fast-forward three years since my failed attempt. This time, I had some content: a summary of a peculiar political book I was particularly excited about. I also had a bit more experience with the terminal and I was using Windows Subsystem for Linux (WSL) to have a *nix shell available.

I wrote the post first, in Markdown, then I hunted online for a static site generator (SSG) easier than Jekyll. This is when I found Hugo: a single binary and you’re good to go!^{3}

The next step was finding a good theme. Once I found one that seemed to fit the bill (with e.g. built-in search), I needed to wrestle with git submodules to make it work.

*In fine*, I managed to publish my content to the world, this time using Netlify for more flexibility than Github pages. It worked okay, but I had some trouble with large image files that took a very long time to load. Moreover, the built-in search wasn’t very good and I wasn’t excited about the overall look of the website.

Two years later: here we go again. This time, I have a few years of experience under my belt, and I am using Linux as my daily driver (and macOS at work). I’m much more comfortable with the terminal and software development in general.

A friend of mine had been excited about Rust for a while, and I knew there were some SSGs written in it. So I browsed Zola themes hoping to find a decent-looking website. Since there aren’t that many themes for Zola, I quickly found a good one^{4}.

With this setup, I had everything I was looking for:

- A single binary: no dependencies
- Blazingly fast
- support
- A good search UI & UX
- Image processing built-in
- A nice theme
- Bulma CSS, easy for the CSS newbie that I am

So I just went for it on a whim. It took me three hours to write my first post, set up the dev environment and deploy it to Cloudflare Pages. I even have web analytics for free, *without additional javascript*^{5}: no cookies, yay!

Overall I am very happy with the experience so far. The website could be leaner and faster, but I believe it is fast enough. For now. And it’s free (as in beer).

By Tom Preston-Werner, founder and former CEO of Github.↩︎

Later on, I learned that I wasn’t the only one experiencing this fatigue.↩︎

No pun intended since Hugo is written in Go.↩︎

Actually this theme ships with the Google Analytics script, since there is no tree-shaking nor dead-code elimination (yet)↩︎

BibTeX citation:

```
@online{guy2021,
author = {Guy, Horace},
title = {Setting up Your Own Blog},
date = {2021-09-07},
url = {https://blog.horaceg.xyz/posts/static-blog},
langid = {en}
}
```

For attribution, please cite this work as:

Guy, Horace. 2021. “Setting up Your Own Blog.” September 7,
2021. https://blog.horaceg.xyz/posts/static-blog.

Have you ever placed a bet? Wondered what the odds actually mean?

It’s all linked to probability.

Let us focus on a betting situation: a bookie offers odds on an event, and on its complement. There can be multiple binary events at the same time (or close). We will then extend to multiple outcomes.

In our case, we consider real-life processes, which are physical, and their outcomes. We regularly take the example of sports gambling, where processes are fixtures, which generate various outcomes.

For a given bet, we have (decimal) *odds* such that if you place a bet with *stake* on event , you lose in all cases, and if you win, you get in addition. So your total net gain is in case of a win, and otherwise.

Denote the inverse odds. This quantity is usually *not* a probability.

Define the net unit gain:

is then the net gain of placing a bet on event with stake s and inverse odds .

Let us also define the *natural* bet, that is the bet on with a stake of :

We will see that this quantity is useful for staking bets in order to combine them. Intuitively, it makes sense to stake bets with low odds with more money, since you are (supposedly) likely to win, and bets with high odds with less money, since you are (supposedly) unlikely to win.
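The formulas here did not survive the export; under standard decimal-odds conventions, the definitions described above would plausibly read as follows (reconstructed notation, not necessarily the author’s):

```latex
% Assumed notation: event A, decimal odds o_A, stake s.
p_A = \frac{1}{o_A} \qquad \text{(inverse odds)}
\\
G_A = \frac{\mathbf{1}_A}{p_A} - 1
  \qquad \text{(net unit gain: } o_A - 1 \text{ if } A \text{ occurs, } -1 \text{ otherwise; stake } s \text{ yields } s\,G_A\text{)}
\\
N_A = p_A\,G_A = \mathbf{1}_A - p_A
  \qquad \text{(the natural bet: a bet on } A \text{ with stake } p_A\text{)}
```

With these definitions, placing natural bets on an event and its complement gives a deterministic total gain of one minus the sum of the inverse odds, which is the booksum story told in the rest of the post.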

Define a probability on the universe, and the probability of the event. This might be *the true* probability law on sports events, or your belief of it, or someone else’s belief. All we know is that it is a coherent belief, since it satisfies the axioms of probability.

A bet is *sure* when its gain is deterministic, i.e. there is no randomness involved.

When there is no possible confusion, we write and

We say that the odds are *coherent* for a given bookie, or that a *book* is coherent, if we have the following:

- when

It is another way to say that is additive.

Then we can deduce other rules:

This last equality shows that the **booksum** is universal. This also allows us to prove that

Which means that is a measure, and which is finite.

Warning: , rather . We will see afterwards that this allows the bookie to make a profit.

If we apply the right transformation to , we might get (approximately) the **implied probability** law estimated by the bookie, and his margin on each bet.

Note that the popular saying that is false, since those do not sum to 1. To go further see the implied probability section.

We say that (for ) if we have .

- For ,

i.e.

Let , such that . We have :

- .

i.e.

- .

In the general case, when , we have:

When we apply the additivity formula for , we get:

i.e.

Unsurprisingly, this is a sure bet. It is the opposite of the bookie’s *margin*: the gain of the unit bet on the sure event, placing a bet on both outcomes with the proper stakes ratio.

By identifying , we recognize that:

This quantity is the **booksum** in general (for a given event ).

Let us write what is now trivial:

Note that we always have , even if the odds are incoherent. However, if the odds are not coherent, we can’t say that this quantity is indeed . For instance, the bookie could theoretically offer other odds for the event , leading to . However, would they really offer such a bet?

In the following sections, we do not assume coherent odds. However, we use the notations

A *fair* bet implies . Since , we then have:

- .

In this case, the odds are coherent since is a probability measure. We have , that is .

We also have the following:

This last formula is crucial to understand fairness. It is useful to hedge your bets with the right amount, since e.g. . More on this at Negative bets.

Note that even if bookkeepers were unwilling to make a profit, they don’t really know the true probability (is there such a thing?). So the best they can do is to offer odds that are fair for a probability measure , that is the reflection of their honest and coherent beliefs. This is equivalent to , and thus to the two following statements:

is coherent (additive), e.g. is a (finite) measure

Let’s compute the original hedging quantity that sums to zero in a fair betting situation. Recall that is the gain of placing a bet on combined with a bet on with the proper stakes. In an unfair (real) situation, this does not sum to zero.

If the booksum , then : this means that one can bet on both outcomes and make a sure profit, according to any outcome. This is called an *arbitrage* (or *arb*).

Thus, a bookie needs to set the booksum , otherwise this will lead to quick bankruptcy: everyone *in the know* will bet on both outcomes (with the proper stakes ratio).
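Numerically, the arbitrage test and the proper stakes ratio can be sketched like this (a toy illustration with names of my choosing, not code from the post):

```python
def booksum(odds):
    """Sum of inverse decimal odds over exhaustive, mutually exclusive outcomes."""
    return sum(1.0 / o for o in odds)

def natural_stakes(odds, budget=1.0):
    """Stake each outcome proportionally to its inverse odds, so the
    payout is identical whichever outcome wins."""
    s = booksum(odds)
    return [budget * (1.0 / o) / s for o in odds]

def sure_net_gain(odds, budget=1.0):
    """Guaranteed profit (or loss) of betting on all outcomes at once."""
    stakes = natural_stakes(odds, budget)
    return stakes[0] * odds[0] - budget  # same for every outcome

odds = [2.1, 2.1]                      # booksum ~0.952 < 1: an arb
print(round(sure_net_gain(odds), 4))   # 0.05 per unit staked
```

A booksum below 1 yields a positive `sure_net_gain`; a real bookie keeps the booksum above 1, making this quantity negative for bettors.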

Moreover, the higher the booksum, the higher the profits for bookies since . Recall that your loss is the bookie’s gain.

However, since maximizing the booksum means minimizing the average odds, informed bettors will likely place their bets with another bookie. Thus, for the bookie, it is all about maximizing the booksum while still attracting customers, keeping in mind that competitors do the same.

Let us now consider this *smart bookie* configuration: . Is there still any opportunity to make money as a bettor (and lose money as a bookie)? Well, *probably*. Recall the expected gains for a given event:

This means that if you trust your belief , and that it is higher than the one the bookie offers you:

- with the margin

then you should bet. In practice, you should probably look for differences that are wide enough for you to bet.

However, in this case there is no sure bet, and hence no sure profit: there is only an estimate of the expectation. Hence, this is much riskier than an arbitrage. In order to make a profit this way, you need to bet on many opportunities where you find a positive expectation, so that on average your gain is positive. If you have a robust estimation method, then you will make a profit if you size your bets correctly (see e.g. the Kelly criterion). On the other hand, if your estimates are often off compared to the bookies’, then you will eventually go bankrupt (see the gambler’s ruin).
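For bet sizing, the Kelly criterion for a single binary bet with decimal odds `o` and an estimated win probability `p` reduces to a one-liner (a sketch; the symbols are mine):

```python
def kelly_fraction(p: float, o: float) -> float:
    """Fraction of bankroll to stake at decimal odds o given an
    estimated win probability p: f* = (p*o - 1) / (o - 1).
    Returns 0 when the expected edge p*o - 1 is not positive."""
    edge = p * o - 1.0
    return max(edge, 0.0) / (o - 1.0)

print(round(kelly_fraction(0.55, 2.0), 3))  # 0.1: stake 10% of the bankroll
```

Kelly staking maximizes long-run logarithmic growth, but it assumes your probability estimate is right; overestimating your edge leads to over-betting and ruin.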

*Negative* bets are simply when you take the role of the bookie: you offer someone else a bet. Then, their gain is exactly your loss, and vice-versa. Hence, placing a negative bet is simply the opposite of the positive bet that the person takes. But what if you cannot offer someone else a bet at the price you buy it? It is hard to find either a bookie that allows you to take his place, or an exchange that does not take commissions.

Well, we have to resort to . Recall that, in the general case:

.

In the **fair situation**, . Thus, we can simply place the opposite bet with the proper stake:

However, in the **unfair situation**, considering a smart bookie, we have:

where . This is the bookie’s margin: you cannot hedge yourself at net zero cost. You must pay the price to the bookie. You can clearly see that again by placing a sure net negative bet:

Note that if the bookie is not smart, or if you are dealing with several bookies at the same time (treating them as one bookie offering the highest odds for any event), you might see an opportunity where , that is . This quantity becomes your sure net gain in this arbitrage case.

In the case of incoherent odds, a bettor can build a *dutch book* against the bookie, that is a set of bets that constitutes a surebet. That is, provided the bookie offers odds on all relevant events, which they don’t in practice to avoid this situation.

See Appendix for the construction of surebets with the help of reversing (negative) bets.

If we think like a bookie, here are our tasks in order to thrive:

- Estimate the true probabilities as precisely as possible
- Set the odds lower than the estimated ones (inverse odds are thus higher than estimated probabilities)
- Attract bettors by offering high enough odds
- Minimize the risk

As a bookie, you want to minimize the *risk*, i.e. the largest sum of money you could potentially lose. Consider for example that a large number of people bet on an unlikely outcome with high odds, and this event realizes. In order to avoid this, the bookie can intentionally skew the odds upon realizing that many people (or a few people with large sums of money) are betting on this outcome. This can also mean there is a value opportunity for bettors, i.e. that the odds are not set properly. In this example, the bookie can monitor the volume of bets while computing the risk, and reduce it when it is too high by decreasing the corresponding odds and increasing the odds of opposite events, in order to attract bettors. He can also place bets at another bookie in order to hedge.
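The monitoring described above boils down to tracking worst-case liability; a minimal sketch (names hypothetical):

```python
def worst_case_loss(stakes, odds):
    """Bookie's exposure: for each outcome, the payout owed if it wins
    (stakes placed on it times its decimal odds) minus the total
    stakes collected across all outcomes."""
    total = sum(stakes)
    return max(s * o - total for s, o in zip(stakes, odds))

# Money piling up on a long shot: the worst case is that it wins.
print(worst_case_loss([100.0, 50.0], [1.5, 8.0]))  # 250.0
```

A negative result means the book is balanced: the bookie profits whatever happens.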

We want to know the *implied probabilities* behind the odds: the probabilities estimated by the bookie. We actually need to reverse-engineer the recipe they use to maximize profit based on their estimate of the true probabilities, in order to recover it. This is basically an inverse problem.

Denote:

with the bookie’s (coherent) belief, the bookie’s margin on event A. The bookie certainly hopes that , and tries to do that.

An easy technique to get a probability from consistent (and unfair) odds is to normalize the inverse odds:

This assumes that the bookie’s margin is uniform on all outcomes, which is probably false in real situations. However, this provides a first guess that is easy to compute.

If we assume that is coherent, it is immediate that we have and thus is a probability measure.

See Shin 1991, Strumbelj 2014 for further analysis.

Observe that for any given event, is a convex set:

.

This gives us a useful formula to combine gains, betting on multiple odds at once:

where .

In practice, you can also use with the odds:

If we have access to multiple odds at a given time, we have virtually access to all the odds in between. Why would you bet in-between though, since you should take the highest odds?

Also, this trick might be useful to *reduce* a position to one equivalent bet, or *expand* a single bet into multiple ones. This might also be useful to test a risk assessment system (with e.g. a property-based testing framework).

A more realistic situation arises when we place multiple bets on the same event at different times, while the odds have been varying (in-play betting, for example). If you spotted value at some time and the odds then move in your favour, you should still bet at the later time. In order to simplify your position, you can virtually reduce your two bets to one equivalent bet at odds in-between.
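Reducing two bets on the same event to one equivalent bet is just a stake-weighted average of the odds (a sketch; the symbols are mine):

```python
def combine_bets(s1, o1, s2, o2):
    """Two bets on the same event (stakes s1, s2 at decimal odds o1, o2)
    are equivalent to one bet of stake s1 + s2 at the stake-weighted
    average odds: the total payout on a win is identical."""
    s = s1 + s2
    return s, (s1 * o1 + s2 * o2) / s

stake, odds = combine_bets(10.0, 3.0, 20.0, 2.4)
print(stake, odds)  # 30.0 2.6
```

The same identity, read in reverse, expands one bet into several, which is handy for testing a risk-assessment system as suggested above.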

A probability defined on any is a finite measure that satisfies the property .

We say for iff i.e.

More generally, we say for iff

,

Let’s consider the case when the odds are incoherent. In that case, if you choose to bet both on and separately, then you can actually (implicitly) bet on with the proper stakes proportion. We assume that the bookie offers (incoherent) odds for , , and . This situation never arises with binary-outcome sports, but typically does in football (soccer):

We introduce proper notation:

We want to compare and .

We compute the theoretical sum of a positive and a negative bet:

If the “coherent” version of the inverse odds for are low enough, there is an opportunity. One has to bet with the inverse formula:

Consider the betting opportunity:

We have a sure (triple) bet. We will make a profit if and only if this quantity is positive, i.e.:

This is exactly the same arbitrage formula as before, but with three outcomes.

This time we target . In this case, we place the following bets:

We use exactly the same formula for , and we get a betting opportunity:

This time, we do not assume .

We leverage the following:

i.e.

.

Let .

We essentially want to compare and .

Then, consider the following pseudo-bet with the pseudo-odds :

Now we need to compare this to

First, we target . In order to actually place an approximation of this bet, we need to reverse the two negative bets and .

First we have:

With the same reasoning, we use .

We get the sure bet combinations:

Consider the opposite quantity:

We need to reverse the bets on A and B to get a sure bet:

Provided we can place bets on , , and , we need to check the two above-mentioned quantities. However, it is usually not possible to compose bets in such a fashion: the bookie does not offer any bet on .

BibTeX citation:

```
@online{guy2021,
author = {Guy, Horace},
title = {Mathematics of Bookmaking},
date = {2021-09-06},
url = {https://blog.horaceg.xyz/posts/betting-theory},
langid = {en}
}
```

For attribution, please cite this work as:

Guy, Horace. 2021. “Mathematics of Bookmaking.” September
6, 2021. https://blog.horaceg.xyz/posts/betting-theory.

The first thing I made used Jupyter, and a nifty package called *Voilà*, that turns a notebook into a standalone app. For automatic differentiation, I used vanilla JAX. For the front-end, I used ipywidgets, bqplot and interactive matplotlib (ipympl).

Bqplot is amazing for classic plots and plays very well with ipywidgets for reactivity. I was unable to find a way to make it display a vector field nicely though, so I had to use interactive matplotlib, which is heavier and slower, to display the gradient of the loss.

I deployed this on Heroku with a small Procfile:

`web: voila --port=$PORT --no-browser techni.ipynb`

And boom! It was live!

In this setting, every state change triggers a round-trip to the server, through websockets (Jupyter is based on Tornado). Locally, this works great since there is zero latency between my computer and my browser (running on… my computer).

However, whenever I browsed the Heroku-hosted app, there were significant lags. There is now a latency due to the speed of light through the wire, which ranges from 100ms to 600ms depending on your region of the world.

I needed a proper front-end, so that every action on the client does not require a round-trip to the server. I also would have more control over how to react once a new payload is sent from the server.

Moreover, the dataviz I made were functional and did a good job of explaining the concept. However, they were a bit primitive: no transition nor interpolation between distinct states, which provides a bit of a boorish experience.

I decided to leverage the hot new JS framework Svelte, with built-in reactivity. So I surgically removed the JAX bits of the code and moved them to a back-end built with FastAPI, which the Svelte front-end calls. The front-end is hosted on Cloudflare Pages, and calls the API whenever the hyperparameters of the descent are modified.

For plotting, I used the experimental `pancake` library by the creator of Svelte, Rich Harris. Thanks to Svelte’s `tweened` and `spring` motion functions, the transitions are smooth and easy on the eyes.

There is a shortcoming of this setting: the latency is still there. It is handled in a better way with async javascript, but if the API is hosted in, say Frankfurt, then a user in Sydney is not going to have a very snappy experience.

I was wondering: could it be possible to replicate the backend in a few data centers, and route users to the closest one in order to minimize the latency? Well, after a few days of research I found that this is exactly what Fly provides. So, I went a bit crazy and replicated the back-end in 10 regions and thoroughly tested the latency with a VPN (accounting for the additional round-trips). It seemed to work well enough!

Now I had

- A front-end replicated on a CDN on 200+ locations
- A back-end replicated at 10 locations

At this point, you might think that this is a tad overkill for a simple pedagogical app, and I don’t think you would be very wrong.

I ported the JAX code to TensorFlow.js, on Observable at first and then directly in the Svelte front-end. Now I have a fully client-side app: no server, yay!?

For the fast 30-step gradient descent, this works great and the computation happens instantly. However, when I added a simple neural network, I hit the performance wall: what takes 5 ms in (already JIT-compiled) JAX takes 300 ms in the browser! Yay…

BibTeX citation:

```
@online{guy2021,
author = {Guy, Horace},
title = {The Need for Speed},
date = {2021-09-03},
url = {https://blog.horaceg.xyz/posts/need-for-speed},
langid = {en}
}
```

For attribution, please cite this work as:

Guy, Horace. 2021. “The Need for Speed.” September 3, 2021.
https://blog.horaceg.xyz/posts/need-for-speed.