Micromorts – how much risk of death would you accept?

tl;dr
 – A 1% death risk is deceptively high
 – Micromorts (1/10,000th of a percent) are a useful scale to model death risks
 – Statistical models of human lifespans use light-tailed distributions: high values are extremely rare

Today a short post that I had in my drafts for a long time. I didn’t expect that it would (unfortunately) be so relevant to today’s context.

Life is finite and all human activities are risky. Although we all face a certain (hopefully low) risk of dying each time we breathe, that's not reason enough to give up every activity and live isolated in bubbles. But exactly how much risk of dying is acceptable? How much risk to your own life would you be willing to accept?

Living in a bubble would be fun! – xkcd.com

It turns out most people, sometimes even trained scientists, are bad at estimating death-risk probabilities. They often underestimate how bad even seemingly low probabilities of dying can turn out to be. During a dramatic time of my life, I cared for a child who suddenly became sick. The head surgeon told us she had an 85% chance of making it.
So maybe you're thinking: oh, that's not so bad! And I mean, I understand: 85 is pretty close to 100, the situation's looking fairly good, right?

I was terrified.

To put this number into perspective, imagine if all admitted patients faced such a risk. Let's say each doctor sees about one patient per hour, works 10 hours a day, and that the department has 10 doctors. This represents approximately 35,000 patients per year, which seems to fit this UK data. With a 15% death rate, this department would have to deal with more than 5,000 deaths over the course of the year, which is the number of people who died in the entire city of San Francisco in 2018! That is one death every 1 hour and 40 minutes!
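The arithmetic, with round numbers:

\(\begin{align*}
10 \text{ doctors} \times 1 \text{ patient/hour} \times 10 \text{ hours} \times 365 \text{ days} &\approx 36{,}500 \approx 35{,}000 \text{ patients} \\
35{,}000 \times 15\% &= 5{,}250 \text{ deaths} \\
8{,}760 \text{ hours per year} \div 5{,}250 &\approx 1.67 \text{ hours} \approx 1\text{ h }40\text{ min}
\end{align*}\)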

An activity with a 99% chance of survival would almost certainly kill you in less than a year

In fact, routine surgical procedures with death risks greater than 5% are classified as high-risk. Even a 99% chance of survival doesn't look so good: if you were enough of a daredevil to engage every day in an activity that exposes you to "only" a 1% death probability, you'd almost certainly be dead within a year.
(You can see this with an easy rule of thumb: for a random event occurring with probability p, there is a 95% chance that the event will occur within 3/p tries. Here, that gives 3 / 0.01 = 300 days, which is less than a year.)
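Where does the 3/p come from? If the tries are independent, the probability that the event has not occurred after \(n\) tries is \((1-p)^n\), and we want this to fall below 5%:

\((1-p)^n \le 0.05 \iff n \ge \frac{\ln 0.05}{\ln (1-p)} \approx \frac{-3}{-p} = \frac{3}{p}\)

where we used \(\ln 0.05 \approx -3\) and \(\ln (1-p) \approx -p\) for small \(p\).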

Micromort – the right scale for death risks

As it turns out, percentages are not the right scale to talk about death risks. Ronald A. Howard realized this in 1979 and created the notion of the micromort. A micromort represents a one-in-a-million chance of dying.

A micromort is a one-in-a-million chance of dying – it is roughly equivalent to tossing 20 coins and getting 20 heads

Wikipedia has a list of how much risk some activities expose you to:

One day alive at age 20 – 1 micromort
Skydiving (one jump) – 10 micromorts
One day alive at age 90 – 400 micromorts
Being infected by the Spanish flu – 30,000 micromorts
An ascent to Mt Everest – 40,000 micromorts

Using this scale, my child's illness exposed her to 150,000 micromorts… which suddenly looks much more frightening, and is a much more intuitive representation of the risk she was actually exposed to.
(Side note if you’re wondering: my kid is fine, and I am really grateful for this 🙏 Diane, if you ever read this you are the sunshine of my life! ❤️☀️)

If this starts feeling scary, don't worry too much. A risk of one micromort is equivalent to tossing 20 coins and getting only heads – pretty unlikely! The problem is that you play this game every day, and every once in a while you have to remove some coins. By the time you're 50, you only have 17 coins left; by the time you're 90, only 11.
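The coin picture maps directly onto probabilities: \(n\) coins all landing heads has probability \(2^{-n}\), so a daily death probability \(p\) corresponds to roughly \(\log_2(1/p)\) coins. One micromort is \(2^{-20} \approx 9.5 \times 10^{-7}\), hence the 20 coins; and the 400 micromorts faced each day at age 90 give \(\log_2(10^6 / 400) \approx 11.3\), hence about 11 coins.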

Statistical models of death

A neat thing about micromorts is that they also make a good and intuitive statistical model of age at death for humans. Let's consider a very simple model based on the "game" described in the previous paragraph. Every day, you play this game with a certain risk of dying (for the sake of simplicity, let's not model childhood and concentrate on adult life):

  • Between the ages of 20 and 80, the risk is Floor(age / 10) – 1 micromorts (for example, all days of your 26th year you face a 1-micromort risk, and all days of your 63rd year, a 5-micromort risk)
  • At age 80, the risk jumps to 100 micromorts, and each additional year adds 50 micromorts (150 micromorts at age 81, 200 micromorts at age 82, etc.)

We can run a few simulations in R to see what life expectancy looks like under this simple model. First, let's define a vector of the daily risks matching the model we just described:

max_age <- 300
min_age <- 20
age_cut <- 80

# Ages 20 to 80: Floor(age / 10) - 1 micromorts per day, i.e. 1 to 6
risks1 <- rep(1:6, each = 365 * 10)
# From age 80 on: 100 micromorts per day, plus 50 more for each additional year
risks2 <- rep(((age_cut:max_age) - age_cut) * 50 + 100, each = 365)
# Convert micromorts to daily death probabilities
risks <- c(risks1, risks2) / 1e6

Then we can run a few simulations to get a vector of age at death for 10,000 people playing this "game". Note that the simulation relies on the vectorization of the function rbinom:

N <- length(risks) # number of days in a maximal simulated lifetime
N_sims <- 1e4      # number of simulated lives

# One column per simulated life; each entry is a Bernoulli draw with that day's death probability
days_sims <- matrix(rbinom(N_sims * N, 1, risks), ncol = N_sims, byrow = FALSE)

death_ages_days <- apply(days_sims, 2, function(x) {
    day_death <- match(1, x) # first day on which the draw came up "death", NA if none
    if (!is.na(day_death)) {
      return(day_death / 365 + min_age)
    } else {
      return(max_age) # survived every single day of the simulation
    }
  })

The mean and max age at death are:

> mean(death_ages_days)
[1] 84.9348
> max(death_ages_days)
[1] 103.0274

Not too far from what we observe in most Western countries! For example, life expectancy for Canadian women in 2018 was 84.3 years, and that same year, the oldest Canadian man alive was 109 years old.

We can plot the distribution of age at death:

library(ggplot2)

model_plot <-
   ggplot(data.frame(age = death_ages_days)) +
     geom_histogram(aes(x = age, y = ..density..), fill = "#4b86b4") +
     geom_density(aes(x = age, y = ..density..), colour = "#2a4d69") +
     xlab("Age at death") +
     ylab("Density") +
     theme_bw()
print(model_plot)

In addition to being close to actual demographic values, the most interesting property of this model is that it generates a light-tailed distribution.

A light-tailed distribution is one whose density falls off quickly for the highest values. Contrary to quantities like income, where extreme values (i.e. values several standard deviations greater than the mean) are routinely observed, extreme values are very unlikely when it comes to human lifespans. The oldest person we know of reached 122 years of age, and people who make it past 100 are still a tiny minority. It is extremely unlikely that we would ever observe someone living to 150, let alone 200.
Even a good old normal distribution would yield extreme values more often (a distribution is said to have a "light" tail if extreme values are less likely than under a normal distribution, and a "fat" tail if they are more likely). We can see this in a plot from a StackExchange post, where the life expectancy distribution is plotted against the best normal fit.
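To make this concrete, take a normal model of adult lifespans with an (illustrative, not fitted) mean of 85 years and standard deviation of 10 years. Under that model,

\(P(X > 122) = 1 - \Phi\left(\frac{122 - 85}{10}\right) = 1 - \Phi(3.7) \approx 1.1 \times 10^{-4}\)

so among the roughly 100 billion humans who have ever lived, such a model would predict millions of people outliving Jeanne Calment's 122 years, when we know of exactly one. The actual upper tail is lighter still.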

One of the best statistical distributions to model human lifespans is the Weibull distribution. It is a generalization of the exponential distribution, which is often taught in high school and is typically used to model memoryless failures (i.e. a probability of failing that does not depend on elapsed time). The Weibull distribution is very similar, except that its failure rate increases over time, mimicking an aging process. The statistical model I used for this article is in fact very close to how the Weibull distribution works, as the hazard formula below shows.
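Concretely, the Weibull hazard rate (the instantaneous risk of failing at age \(t\)) with shape \(k\) and scale \(\lambda\) is

\(h(t) = \frac{k}{\lambda} \left( \frac{t}{\lambda} \right)^{k-1}\)

For \(k = 1\), the hazard is constant and we recover the memoryless exponential distribution; for \(k > 1\), it increases with age, just like the growing daily micromort risk in the simulation above.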

Do you like data science and would be interested in building products that help entrepreneurs around the world start and grow their businesses? Shopify is hiring 🎉 for many locations in Canada 🇨🇦 and around the world 🌎! Feel free to reach out directly to me 🙂

Edit: A previous version of the article didn't feature the tweet illustrating the simple death model, and had typos in two micromort numbers, which have been corrected

[Science] Looking for a new holiday, or: When does day length increase the fastest?

We’re now in the midst of winter (at least in the northern hemisphere). Our Parisian winter, although not very cold, is quite rainy and cloudy, and it always depresses me a little not to see the sun for several days… Fortunately, at this time of the year, each new day brings a little more daylight, reminding us we’ve never been closer to the next summer! This made me wonder: when does the length of the day increase the fastest? My plan was to find out which day of winter it happened on, and make it my own holiday, which I would celebrate by opening a bottle of Côte Rôtie 2003 I was saving for a special occasion.

Before I went on with the math, I told a few friends about my question, and they all thought the day when day length increases the fastest would likely be the vernal equinox. But it didn’t seem that obvious to me! Sure, on March 21st the length of daytime is equal everywhere on the planet, but that doesn’t mean it has to be the same day that day length increases the fastest, does it? Let’s see who’s right!

Let’s do math!

As I’m not a physicist, I needed an equation describing the length of daytime as a function of some parameters (including, hopefully, the latitude of the city where I live!). Fortunately, Robert Ferreol has already done a great job finding such an equation. For latitudes below the arctic circle (approximately 66.566667°), day length is given, in hours, by:
\(D = \frac{24}{\pi} \arccos \left( -\tan \lambda \cdot \tan \left( \arcsin \left( \sin \delta \cdot \cos \left( \frac{2\pi}{365} (r-172) \right) \right) \right) \right)\)

with:

\(\begin{align*}
\lambda &= \text{latitude} \\
\delta &= \text{Earth's axial tilt} \\
r &= \text{day of the year, } \in [0,365]
\end{align*}\)

Sure, this equation is a tad complicated, but it doesn’t really matter if your trigonometry lessons feel a bit distant: you just need to know how to read simple graphs to understand what follows.

We can plot the day length function, for example for Paris (latitude: 48.8567°):

day_length_paris

I’m quite satisfied with this graph: the maximum happens around day number 172, which happens to be the summer solstice, and the minimum around day number 355, which is December 21st, aka the winter solstice. We can also read from the graph that day length on day number 1 (January 1st) is a little more than 8 hours, which matches this more precise time table.
We can also plot our equation for different latitudes. For example, at the equator (latitude 0°), we have:

day_length_equator

Again, this is what we expected: at the equator, day length is the same all year round.
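For reference, here is a minimal Sympy sketch that reproduces these two figures; it uses the same formula and constants as the code introduced in the next section (at the equator the expression reduces to a constant 12 hours, so I simply evaluate it):

import math
from sympy import symbols, acos, asin, sin, cos, tan
from sympy.plotting import plot

r = symbols('r') # day of the year
delta = math.radians(23.4373408135) # Earth's axial tilt

def day_length(lat_degrees):
    # Day length in hours, as a symbolic expression in r
    lat = math.radians(lat_degrees)
    return 24 / math.pi * acos(-tan(lat) * tan(asin(sin(delta) * cos(2 * math.pi / 365 * (r - 172)))))

plot(day_length(48.8567), (r, 0, 365)) # Paris
print(day_length(0).evalf()) # equator: constant 12.0 hours all year round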

Studying the derivative

Great, it seems we now have a nice equation describing day length with good accuracy. But what we really want is the variation of day length; in math terms, we want to study the derivative of the day length function. Sure, I could compute the derivative by hand, but I am a little lazy (and with a function this ugly, it’s very likely I would have to start over a few times before getting it right), so I’m going to ask my computer to do it instead. This is called symbolic computation. Scientists usually use software like Maple or MuPAD, but those are expensive, non-free programs, and anyway I have a huge crush on Python, so I will use the excellent Sympy:

import math
from sympy import *
from sympy.plotting import *

r = symbols('r') # This is the variable of my function
lat = math.radians(48.8567) # Paris
delta = math.radians(23.4373408135) # Earth's axial tilt

lengthDay = 24/(math.pi) * acos(-tan(lat) * tan( asin( sin(delta) * cos(2*(math.pi)/365*(r-172)) ) ) )

# Compute the derivative
dLengthDay = diff(lengthDay, r)

# Print the analytical form of the derivative (...it is as complicated as expected!)
print(dLengthDay)
plot(dLengthDay, (r, 0, 365))

I’m not really interested in the closed form of the derivative, but the plot should be very informative. In fact, for Paris, it gives:

deriv_paris

So, what is happening here? If you remember your calculus lessons, when the derivative is greater than 0, the function increases, and when it is less than 0, the function decreases. So it’s no big surprise that the derivative is above the x-axis before r = 172 (the summer solstice), and below the x-axis between r = 172 and r = 355 (i.e. between the two solstices). More generally, the value of the derivative gives the rate at which the function increases, which means the day we’re looking for is the one where the derivative reaches its maximum. In our graph, it seems to be around r = 80, but it’s not easy to identify a clear maximum graphically. So, what is the best method to find the maximum of a smooth function like this one? Yep, we differentiate it (yet again, you might add). To be more specific, the day when day length increases the fastest is an inflection point of the day length function, i.e. a point where the second derivative equals 0.

# Second derivative
d2LengthDay = diff(dLengthDay, r)
print(d2LengthDay)
plot(d2LengthDay, (r, 0, 365))

The plot of the second derivative for Paris is:

deriv2_paris

To identify the day when the second derivative is zero, we can solve numerically with Newton’s method (seeded with r = 80, which visually seems to be a good approximation of the zero):

print(nsolve(d2LengthDay, 80))

nsolve returns:

80.7499999999996

Now it’s clear that our maximum is unique, and happens around day number 80(*), which is… March 21st, the vernal equinox! Darn, it seems my friends were right!

(*) If you wonder why I turn 80.749999 into “day number 80” and not “day number 81”, check out the last paragraph of this post.

 

Wait a minute, what about other latitudes?

I was not entirely satisfied with this answer, so I started running tests for other latitudes. The shape of the derivative was very similar to the one I got for Paris, until I tried more northern latitudes. For example, at 63.5°N, we get:

deriv_fairbanks

What does this graph mean? It appears that for r = 80 (the spring equinox), we now have a local minimum of the derivative instead of the maximum we had in Paris. There are now two local maxima: one between winter and spring, and the other between spring and summer.
We can check this by plotting the second derivative: if we’re right, it will show three zeros between r = 1 and r = 172:

deriv2_fairbanks

So it seems we were right! The zeros appear to be around 40, 80 and 120, but we can compute them more precisely. For example, for Fairbanks, Alaska (latitude 64.843611°), we write:

print(nsolve(d2LengthDay, 50))
print(nsolve(d2LengthDay, 80))
print(nsolve(d2LengthDay, 120))

and get:

39.8279585264938
80.7499999999996
121.672041473505

The first one (r approximately equal to 40) means that February 9th is (one of) the inflection points we were looking for. To ensure that my equation was accurate enough, I compared my result with an astronomical time table: if you look at the “daylength” column, you will see that March 21st is indeed a local minimum, and that there are two local maxima for day length.

 

Computing the limit latitude

After a few tests, I realized that for every latitude above approximately 62°N there were always two maxima, whereas for every latitude below there was only ever one maximum, around spring. This suggests that there exists a limit latitude, above which there are always two maxima and below which there is always only one.
To compute (numerically) the limit latitude, we can write a quick and simple binary search:

def searchLimitLatitude(lat, count=0, lastBelow=0.0, lastAbove=math.radians(66.56278), latBackup=None):

	if latBackup is None: # first call: remember the starting latitude
		latBackup = lat

	print("Test for latitude = ", math.degrees(lat))
	print("Iteration number: ", count)

	lengthDay = 24/(math.pi) * acos(-tan(lat) * tan( asin( sin(delta) * cos(2*(math.pi)/365*(r-172)) ) ) )
	dLengthDay = diff(lengthDay, r)
	d2LengthDay = diff(dLengthDay, r)

	# How many distinct maxima are there? Locate them around the two seeds
	max1 = nsolve(d2LengthDay, 50)
	max2 = nsolve(d2LengthDay, 80)

	if count >= 1 and math.fabs(lat - latBackup) <= 1e-7: # we ask for 7-digit precision
		print("Finished, limit latitude is: ", math.degrees(lat))
		return lat

	if math.fabs(max1 - max2) <= 1e-16: # single maximum: below the limit, search higher
		return searchLimitLatitude(0.5 * (lastAbove + lat), count+1, lat, lastAbove, lat)
	else: # two distinct maxima: above the limit, search lower
		return searchLimitLatitude(0.5 * (lastBelow + lat), count+1, lastBelow, lat, lat)

print("Search limit latitude....")
lat = math.radians(50) # Initialization
searchLimitLatitude(lat)

Running this program gives:

Iteration number:  22
Finished, limit latitude is:  61.2497155042

As expected, the binary search converged very quickly (just 22 iterations to give a latitude precise to 7 digits!). Finally, we have a value for our limit latitude:

\(\lambda_{\text{lim}} \approx 61.2497^{\circ} = 61^{\circ} \, 14' \, 59''\)

which, according to Wikipedia, is just above Anchorage, Alaska and slightly below Tampere, Finland.

Conclusion

My friends were partly right: I won’t be able to act snobbish and celebrate my own holiday. In fact, where I live (Paris, France), the inflection point happens on the spring equinox, which many cultures have already been celebrating for centuries.
But if you live in Fairbanks, Alaska, I suggest you call your boss to let them know you won’t be coming to work on February 9th (which happens to be the day this is posted!) ;)

A few more details

You might wonder why, when we found an inflection point at 80.74999 for Paris, I said the corresponding day was #80 (March 21st), while when we found 39.8279 for Fairbanks, I said the corresponding day was #40. In fact, we reach here the limits of our simplified equation. One of the assumptions we made was that the Sun is located at an infinite distance from the Earth (i.e. that the Sun is just a tiny point in the sky). Of course, this is not true, and it can lead to small errors if we stick to our equation. For example, on the vernal equinox at the North Pole, day length is really about 6 minutes longer than our equation predicts. In order to decide which day was really the day of the fastest increase in day length, I had to “cheat” a little (because I needed a more precise equation) by checking an astronomical table for these locations.

It is also worth noting that Earth’s axial tilt changes slightly every year, which means the second derivative (and thus the inflection point) also changes every year. I didn’t have time for more than a few tests, but the location of the inflection point seems to be very sensitive to the parameters of the equation. This means the inflection point in Fairbanks could happen on quite different dates each year.

EDIT: As Boris pointed out in the comments, I could have dealt with my problem more simply by writing day length as a discrete series of 365 values, one per day, instead of differentiating a continuous function. With the continuous method, day number “39.8” doesn’t make much sense, and I couldn’t decide whether it was February 8th (#39) or February 9th (#40). Using finite sequences, I needn’t have used the astronomical table for Fairbanks, for the day I was looking for would simply have been the maximum of the series of day-to-day differences.
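For illustration, here is a rough sketch of that discrete approach (my own reconstruction of the idea, not Boris’s actual code), reusing the day-length formula from above for Fairbanks:

import math

LAT = math.radians(64.843611) # Fairbanks
DELTA = math.radians(23.4373408135) # Earth's axial tilt

def day_length(day):
    # Day length in hours for a given day of the year (1..365)
    return 24 / math.pi * math.acos(-math.tan(LAT) * math.tan(math.asin(math.sin(DELTA) * math.cos(2 * math.pi / 365 * (day - 172)))))

lengths = [day_length(d) for d in range(1, 366)]
deltas = [lengths[i + 1] - lengths[i] for i in range(364)] # daily increases
best = max(range(364), key=lambda i: deltas[i])
print("Largest increase between day", best + 1, "and day", best + 2)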