R [english] – NC233

Rugby World Cup explainer using data

September 20, 2023 Antoine Rebecq

Last week, a stereotypical “French” ceremony opened the 10th Rugby World Cup at the Stade de France, in the suburbs of Paris, France. As a small boy growing up in the southern half of France, I developed a strong interest in the sport. Now an adult living and working in North America, where barely anyone has ever heard the word “Rugby”, I rarely have anyone to talk to about Antoine Dupont’s (captain of the French team and best…


Using R to build predictions for UEFA Euro 2020

June 15, 2021 Antoine Rebecq

Last Friday, Euro 2020, one of the biggest events in international soccer, kicked off with the inaugural match between Italy and Turkey (Italy won 3-0). The Euros (short for European Championships) are usually held every 4 years, but because of he-who-must-not-be-named, last year’s edition was postponed to this summer while keeping the name “Euro 2020” (much like the Tokyo Olympics). Five years ago, for Euro 2016, I basically wanted to try some cool methods based on splines on…


Have you checked your feature distributions lately?

April 14, 2021 Antoine Rebecq

tl;dr Trying to debug a poorly performing machine learning model, I discovered that the distribution of one of the features varied from one date to another. I used a simple and neat affine rescaling. This simple quality improvement brought down the model’s prediction error by a factor of 8. Data quality trumps any algorithm. I was recently working on a cool dataset that looked unusually friendly. It was tidy, neat, interesting… the kind of thing you rarely encounter in the wild!…
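The excerpt does not show the exact rescaling used in the post, so here is a minimal sketch under stated assumptions: the feature’s true distribution is assumed stable across dates, so any per-date difference in mean and standard deviation is treated as an artifact, and each date is mapped affinely onto a reference date’s mean and SD. The data, the dates and the `rescale_by_group()` helper are hypothetical; `ave()` and `aggregate()` are base R.

```r
# Hypothetical data: one feature observed on three dates; on the third date
# the values come out shifted and stretched (e.g. a unit change upstream).
set.seed(42)
df <- data.frame(
  date = rep(c("2021-01-01", "2021-01-02", "2021-01-03"), each = 100),
  x = c(rnorm(100, mean = 10, sd = 2),
        rnorm(100, mean = 10, sd = 2),
        3 * rnorm(100, mean = 10, sd = 2) + 5),
  stringsAsFactors = FALSE
)

# Per-date summaries reveal the drift in mean and SD
aggregate(x ~ date, data = df, FUN = function(v) c(mean = mean(v), sd = sd(v)))

# Hypothetical helper. Affine rescaling within each date onto a reference
# mean and SD:  x' = (x - mean_d) / sd_d * sd_ref + mean_ref
rescale_by_group <- function(x, group, ref_mean, ref_sd) {
  ave(x, group, FUN = function(v) (v - mean(v)) / sd(v) * ref_sd + ref_mean)
}

ref <- df$x[df$date == "2021-01-01"]  # first date taken as the reference
df$x_rescaled <- rescale_by_group(df$x, df$date, mean(ref), sd(ref))
```

Because the map is affine, the shape of each date’s distribution is preserved; only its location and scale are aligned. This is only appropriate when the shift is known to be an artifact rather than a real change in the data.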


Creating a hex map of France electricity consumption

June 2, 2020 Thomas M

The French Ministry for the Ecological and Inclusive Transition (where I currently work) is in the process of opening up data related to energy consumption. Each year, we publish data for every neighborhood in France (at the IRIS statistical level, and even at the address level in some cases), broken down by type of final consumer (a household, an industry, a shop…). These data are available here (website in French – direct link to 2018 electricity consumption data). Making a map to have…


Causal Inference cheat sheet for data scientists

April 29, 2020 Antoine Rebecq

Being able to make causal claims is a key business value for any data science team, no matter its size. Quick analytics (in other words, descriptive statistics) are the bread and butter of any good data analyst working in quick cycles with their product team to understand their users. But sometimes important questions arise that need more precise answers. Business value sometimes means distinguishing true insights from incidental noise. Insights that will hold up versus temporary marketing…


Micromorts – how much risk of death would you accept?

March 8, 2020 Antoine Rebecq

A micromort is a one-in-a-million chance of dying – it is roughly equivalent to tossing 20 coins and getting 20 heads.
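A quick check of the coin-toss comparison in R: 20 heads in a row with a fair coin has probability 0.5^20 = 1/1,048,576, slightly under one in a million, and 20 is the whole number of tosses whose all-heads probability comes closest to 10^-6.

```r
# Probability of 20 heads in 20 tosses of a fair coin
p_20_heads <- 0.5^20
micromort <- 1e-6

p_20_heads              # exactly 1 / 1048576
p_20_heads / micromort  # ratio to one micromort, a little below 1

# Number of tosses n such that 0.5^n is closest to one in a million
log2(1 / micromort)     # just under 20, so 20 tosses
```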

The Mrs. White probability puzzle

April 28, 2019 Antoine Rebecq

tl;dr: I don’t remember how many games of Clue I’ve played, but I do remember being surprised by Mrs. White being the murderer in only 2 of those games. Can you give an estimate and an upper bound for the number of games I have played? We solve this problem using Bayes’ theorem, discuss the data generation mechanism, and illustrate the solution with R. Making use of external information with Bayes’ theorem: having been raised a frequentist, I first…
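A minimal sketch of the Bayesian reasoning in R, under assumptions that are ours and not taken from the post: in every game the murderer is one of the 6 suspects chosen uniformly at random, games are independent, and the prior on the number of games n is uniform from 2 up to a hypothetical cap `n_max = 200`. The post also discusses the data generation mechanism (why a count of 2 was memorable in the first place), which this sketch ignores.

```r
# Assumptions (illustrative only): murderer uniform among 6 suspects,
# independent games, uniform prior on n over k..n_max.
p_white <- 1 / 6
k <- 2          # games in which Mrs. White was the murderer
n_max <- 200    # hypothetical cap on the number of games played
n <- k:n_max

prior <- rep(1, length(n))
likelihood <- dbinom(k, size = n, prob = p_white)  # P(k | n)
posterior <- prior * likelihood
posterior <- posterior / sum(posterior)            # Bayes' theorem, normalized

n_mode <- n[which.max(posterior)]                  # most probable n
n_mean <- sum(n * posterior)                       # posterior mean
# 95% upper credible bound: smallest n with cumulative posterior >= 0.95
n_upper <- n[which(cumsum(posterior) >= 0.95)[1]]

c(mode = n_mode, mean = n_mean, upper95 = n_upper)
```

Under these assumptions the likelihood is maximized at n = 11 and n = 12 (exactly tied, so `which.max()` may return either), and the posterior mean comes out at about 17 games. The upper bound depends on the choice of prior, which is where the external information discussed in the post comes in.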


Ranking places with Google to create maps

March 11, 2019 Thomas M

Today we’re going to use the googleway R package, which lets its users send requests to the Google Maps Places API. The goal is to create maps of specific places (restaurants, museums, etc.) with information from Google Maps ratings (the number of stars given by other people). I already discussed this in French here, ranking swimming pools in Paris. Let’s start by loading the three libraries I’m going to use: googleway, leaflet to create interactive maps, and RColorBrewer for…


Weighting tricks for machine learning with Icarus – Part 1

July 5, 2018 Antoine Rebecq

Calibration in survey sampling is a wonderful tool, and today I want to show you how we can use it in some machine learning applications, using the R package icarus. And because ’tis the season, what better than a soccer dataset to illustrate this? The data and code are located in this GitLab repo: https://gitlab.com/haroine/weighting-ml First, let’s start by installing (if necessary) and loading icarus and nnet, the two packages needed in this tutorial, from CRAN:

install.packages(c("icarus", "nnet"))
library(icarus)
library(nnet)

Then…
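The excerpt stops before the icarus calls themselves, so the block below deliberately does not use icarus. It is a minimal base-R sketch of raking (iterative proportional fitting), one of the classic calibration methods: initial weights are rescaled margin by margin until the weighted sample totals match known population totals. The toy sample, the margins and the `rake()` helper are all hypothetical.

```r
# Hypothetical toy sample of 200 units from a population of 10,000, with two
# categorical auxiliary variables (not the post's soccer data).
set.seed(1)
n <- 200
sample_df <- data.frame(
  sex = sample(c("F", "M"), n, replace = TRUE, prob = c(0.3, 0.7)),
  age = sample(c("young", "old"), n, replace = TRUE, prob = c(0.6, 0.4)),
  stringsAsFactors = FALSE
)
d <- rep(10000 / n, n)  # initial (design) weights

# Hypothetical known population totals for each margin
margins <- list(
  sex = c(F = 5100, M = 4900),
  age = c(young = 4000, old = 6000)
)

# Hypothetical helper: rake the weights until every margin is matched
rake <- function(df, w, margins, tol = 1e-10, max_iter = 100) {
  for (iter in seq_len(max_iter)) {
    w_old <- w
    for (var in names(margins)) {
      current <- tapply(w, df[[var]], sum)  # weighted totals per category
      ratio <- margins[[var]][names(current)] / as.numeric(current)
      w <- as.numeric(w * ratio[df[[var]]])  # rescale each unit's weight
    }
    if (max(abs(w - w_old)) < tol) break     # weights have stabilized
  }
  w
}

w <- rake(sample_df, d, margins)

# The calibrated weighted totals should now reproduce the population margins
tapply(w, sample_df$sex, sum)
tapply(w, sample_df$age, sum)
```

Weights like these could then be passed as case weights to a model, for instance through the `weights` argument that `nnet::multinom()` accepts; whether the post wires them in exactly this way is not shown in the excerpt.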

