Browsed by
Category: R

Rugby World Cup explainer using data

Rugby World Cup explainer using data

Last week, a stereotypical “French” ceremony opened the 10th Rugby World Cup in Stade de France, in the suburbs of Paris, France. As a small boy growing up in the southern half of France, I developed a strong interest for the sport. Now being an adult living and working in North America, where barely anyone has ever heard the word “Rugby”, I now rarely have anyone else to talk to about Antoine Dupont’s (captain of the French team and best…

Read More Read More

Using R to build predictions for UEFA Euro 2020

Using R to build predictions for UEFA Euro 2020

Last friday, Euro 2020, one of the biggest events in International soccer, was kicked off by the inaugural match between Italy and Turkey (Italy won it 3-0). Euros (short for European Championships) are usually held every 4 years, but because of he-who-must-not-be-named, last year’s edition was postponed to this summer, while keeping the name “Euro 2020” (much like the Tokyo Olympics). 4 5 years ago, for Euro 2016, I basically wanted to try some cool methods based on splines on…

Read More Read More

Have you checked your features distributions lately?

Have you checked your features distributions lately?

tl;dr Trying to debug a poorly performing machine learning model, I discovered that the distribution of one of the features varied from one date to another. I used a simple and neat affine rescaling. This simple quality improvement brought down the model’s prediction error by a factor 8 Data quality trumps any algorithm I was recently working on a cool dataset that looked unusually friendly. It was tidy, neat, interesting… the kind of things that you rarely encounter in the wild!…

Read More Read More

The Mrs. White probability puzzle

The Mrs. White probability puzzle

tl;dr -I don’t remember how many games of Clue I’ve played but I do remember being surprised by Mrs White being the murderer in only 2 of those games. Can you give an estimate and an upper bound for the number of games I have played?We solve this problem by using Bayes theorem and discussing the data generation mechanism, and illustrate the solution with R. Making use of external information with Bayes theorem Having been raised a frequentist, I first…

Read More Read More

Est-ce que cette piscine est bien notée ?

Est-ce que cette piscine est bien notée ?

J’ai pris la (mauvaise ?) habitude d’utiliser Google Maps et son système de notation (chaque utilisateur peut accorder une note de une à cinq étoiles) pour décider d’où je me rend : restaurants, lieux touristiques, etc. Récemment, j’ai déménagé et je me suis intéressé aux piscines environnantes, pour me rendre compte que leur note tournait autour de 3 étoiles. Je me suis alors fait la réflexion que je ne savais pas, si, pour une piscine, il s’agissait d’une bonne ou…

Read More Read More

Analyse de pronostics pour le Mondial 2018

Analyse de pronostics pour le Mondial 2018

On est les champions ! Si nous n’avons pas eu le temps de faire un modèle de prédiction pour cette coupe du monde de football 2018 (mais FiveThirtyEight en a fait un très sympa, voir ici), cela ne nous a pas empêché de faire un concours de pronostics entre collègues et ex-collègues statisticiens, sur le site Scorecast. Les résultats obtenus sont les suivants : Un autre système de points ? Le système de points utilisé par Scorecast est le suivant…

Read More Read More

Weighting tricks for machine learning with Icarus – Part 1

Weighting tricks for machine learning with Icarus – Part 1

Calibration in survey sampling is a wonderful tool, and today I want to show you how we can use it in some Machine Learning applications, using the R package Icarus. And because ’tis the season, what better than a soccer dataset to illustrate this? The data and code are located on this gitlab repo: https://gitlab.com/haroine/weighting-ml First, let’s start by installing and loading icarus and nnet, the two packages needed in this tutorial, from CRAN (if necessary): install.packages(c(“icarus”,”nnet”)) library(icarus) library(nnet) Then…

Read More Read More

A shiny app to convert sports scores

A shiny app to convert sports scores

I’m a huge sports fan, but I certainly don’t have extended knowledge about all team sports. Sometimes when I hear about scores in a sports I’m not quite “fluent” in, I wonder how they would translate in a sports I know better. I guess many people ask the same question from time to time. For instance, three years ago, many americans started wondering how the 7-1 blowout that happened during the World Cup semifinals would translate in basketball, football or…

Read More Read More

Announcing Icarus v0.3

Announcing Icarus v0.3

This weekend I released version 0.3.0 of the Icarus package to CRAN. Icarus provides tools to help perform calibration on margins, which is a very important method in sampling. One of these days I’ll write a blog post explaining calibration on margins! In the meantime if you want to learn more, you can read our course on calibration (in French) or the original paper of Deville and Sarndal (1992). Shortly said, calibration computes new sampling weights so that the sampling estimates match…

Read More Read More