Sampling – NC233

Graph sampling for machine learning at Montreal AI Symposium

September 6, 2019 Antoine Rebecq

I’ll be at the 2019 Montreal AI Symposium today, presenting a poster about network sampling and an application to machine learning. Poster: Network sampling and applications to big data and machine learning, by Antoine Rebecq. Featured image: Montreal Skyline, by Taxiarchos228.

[Sampling] Presentation in Ottawa – a new sampling frame for INSEE surveys

November 8, 2018 Thomas M

Tomorrow (Thursday, November 8), I will give a presentation at the Statistics Canada Methodology Symposium on the rollout of INSEE’s new sampling system for household and individual surveys, which is built on tax data sources. This change of sampling frame brings new opportunities (new variables, new ways of contacting respondents, better coordination between surveys) but also challenges (reconciling concepts, managing the coverage of the administrative frame). The slides are below:

[Sampling] Big data and sampling in Ottawa

November 6, 2018 Antoine Rebecq

Tomorrow (November 7th), I’ll give a talk at the Statistics Canada Symposium on survey sampling and big data. I’ll show how techniques that were developed at official statistics institutes can now be used in the context of big data and machine learning, and add a lot of value. I’ll show some examples with A/B testing, tracking design, calibration in machine learning, network analysis, and user feedback. Slides: Bring survey sampling techniques into big data, by Antoine Rebecq. And really glad to…

Weighting tricks for machine learning with Icarus – Part 1

July 5, 2018 Antoine Rebecq

Calibration in survey sampling is a wonderful tool, and today I want to show you how we can use it in some machine learning applications, using the R package Icarus. And because ’tis the season, what better than a soccer dataset to illustrate this? The data and code are located on this GitLab repo: https://gitlab.com/haroine/weighting-ml

First, let’s start by installing (if necessary) and loading icarus and nnet, the two packages needed in this tutorial, from CRAN:

install.packages(c("icarus", "nnet"))
library(icarus)
library(nnet)

Then…

How are election results announced at 8 p.m.?

April 30, 2017 Thomas M

Almost exactly one week ago, on Sunday, April 23 at 8 p.m., the results of the first round of the presidential election were announced on the sets of the major TV networks, TF1 and France Télévisions for example. To give this result, waiting for the official returns is not an option: they only arrive late at night, once every polling station has finished counting. On the other hand, it would not make much sense to collect the results as…

Margins of error, the model-based approach, and polls

April 22, 2017 Antoine Rebecq

If this presidential election has achieved anything, it is to spark interesting discussions about polls! This four-way race is unprecedented in the history of the Fifth Republic, and with the big surprises of recent news (Trump and Brexit), it is natural to wonder about the real uncertainty contained in this polling data. So today I propose to talk about “margins of error” (also called “95% confidence intervals”), which aim to quantify this uncertainty. I…
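As a point of reference for the “margin of error” mentioned above, here is a minimal sketch of the textbook formula for a proportion. It assumes simple random sampling and the normal approximation, which real polls (often based on quotas) do not strictly satisfy. The function name `margin_of_error` and the numbers (20% of voting intentions, 1,000 respondents) are hypothetical and not taken from the post.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Half-width of the normal-approximation confidence interval for a
    proportion p estimated from n respondents (z=1.96 gives about 95%).

    Assumption: simple random sampling, which is a simplification.
    """
    return z * math.sqrt(p * (1 - p) / n)


# Hypothetical example: a candidate polled at 20% among 1,000 respondents.
estimate = 0.20
sample_size = 1000
moe = margin_of_error(estimate, sample_size)
print(f"{estimate:.1%} +/- {moe:.1%}")
```

Under these assumptions the margin is roughly 2.5 points, which is a useful order of magnitude when several candidates sit within a few points of each other.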

Are pollsters really copying each other? (herding)

April 20, 2017 Antoine Rebecq

A tweet posted by Nate Silver this Monday seems to have stirred up strong feelings among many observers: “I continue to worry about the lack of variation in French election polls. Polls shouldn't be this consistent unless there's massive herding. pic.twitter.com/Xgd8dNUytN” — Nate Silver (@NateSilver538) April 17, 2017. In this tweet, Nate Silver (a famous American statistical analyst and editor-in-chief of fivethirtyeight.com) notes that French polling institutes’ estimates of voting intentions are fairly close to…
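One simple way to put a number on “lack of variation” is to compare the observed spread of several polls with the spread that sampling noise alone would produce. The sketch below does this under strong assumptions: independent polls, simple random sampling, and a common sample size. It is not Nate Silver’s method, and the poll figures are made up for illustration.

```python
import math
import statistics


def spread_ratio(estimates, sample_size):
    """Ratio of the observed standard deviation across polls to the standard
    deviation expected from sampling noise alone.

    Assumption: independent simple random samples of identical size.
    """
    p_bar = statistics.mean(estimates)
    observed_sd = statistics.stdev(estimates)
    expected_sd = math.sqrt(p_bar * (1 - p_bar) / sample_size)
    return observed_sd / expected_sd


# Hypothetical poll results for one candidate (not real data).
polls = [0.23, 0.235, 0.24, 0.23, 0.235, 0.24]
ratio = spread_ratio(polls, sample_size=1000)
print(f"observed / expected spread: {ratio:.2f}")
```

A ratio well below 1 means the polls agree more than chance would suggest. That is consistent with herding, but it can also come from shared methodology or from designs that are more precise than simple random sampling, so it is a warning sign rather than a proof.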

Sampling graphs – MAD-Stat Seminar at Toulouse School of Economics

March 22, 2017 Antoine Rebecq

Tomorrow (March 23rd), I’ll be presenting my work on sampling designs for graphs (particularly extension sampling designs, with an application to Twitter data) at the MAD Stat seminar of the Toulouse School of Economics. Here are my slides: Sampling graphs efficiently – MAD Stat (TSE), by Antoine Rebecq

Announcing Icarus v0.3

March 7, 2017 Antoine Rebecq

This weekend I released version 0.3.0 of the Icarus package to CRAN. Icarus provides tools to help perform calibration on margins, which is a very important method in sampling. One of these days I’ll write a blog post explaining calibration on margins! In the meantime, if you want to learn more, you can read our course on calibration (in French) or the original paper by Deville and Särndal (1992). In short, calibration computes new sampling weights so that the sampling estimates match…
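To make the idea of calibration on margins concrete, here is a minimal sketch of raking (iterative proportional fitting), which is one classical calibration method. It is not Icarus’s API or implementation: the function `rake`, the five-unit sample, and the population margins are all hypothetical, chosen only so the example is small enough to check by hand.

```python
def rake(units, weights, margins, n_iter=100):
    """Adjust weights so that weighted totals match known population margins.

    units:   list of dicts, one per sampled unit, e.g. {"sex": "F", "age": "old"}
    weights: initial sampling weights, same order as units
    margins: {variable: {category: known population total}}
    Assumption: every category in the margins appears in the sample.
    """
    w = list(weights)
    for _ in range(n_iter):
        for var, targets in margins.items():
            # Current weighted total for each category of this variable.
            totals = {cat: 0.0 for cat in targets}
            for unit, wi in zip(units, w):
                totals[unit[var]] += wi
            # Rescale each weight by (known total / current total) of its category.
            w = [wi * targets[u[var]] / totals[u[var]] for u, wi in zip(units, w)]
    return w


def weighted_total(units, weights, var, cat):
    """Weighted count of the units falling in category cat of variable var."""
    return sum(wi for u, wi in zip(units, weights) if u[var] == cat)


# Hypothetical sample of 5 units drawn from a population of 100.
sample = [
    {"sex": "F", "age": "young"},
    {"sex": "F", "age": "old"},
    {"sex": "M", "age": "young"},
    {"sex": "M", "age": "old"},
    {"sex": "M", "age": "old"},
]
initial_weights = [20.0] * 5
margins = {
    "sex": {"F": 52.0, "M": 48.0},
    "age": {"young": 40.0, "old": 60.0},
}

calibrated = rake(sample, initial_weights, margins)
for var, targets in margins.items():
    for cat in targets:
        print(var, cat, round(weighted_total(sample, calibrated, var, cat), 3))
```

With the initial weights, the sample estimates 40 women and 60 men. After raking, the weighted totals line up with the known margins on both variables at once, while each weight moves as little as the method allows.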

A winning strategy at the lottery

January 17, 2017 Antoine Rebecq

tl;dr – It is possible to construct a winning strategy at the lottery by choosing the numbers that other people rarely select. We discuss this and prove it on a small example. There are many things I don’t like about so-called mathematical reasoning applied to lotteries, and I have wanted to write about it for a very long time. So, on the one hand we have the classic scammers who try to sell you the “most probable numbers” (or alternatively the “numbers that are…
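The intuition behind the tl;dr can be illustrated with a toy lottery in which the prize pool is shared among everyone holding the drawn number. This is a hypothetical setup, not the example from the post: three possible numbers, 99 other players with the made-up pick counts below, a ticket price of 1, and 70% of ticket sales returned as the prize pool.

```python
from fractions import Fraction


def expected_payout(choice, other_picks, ticket_price=1, payout_rate=Fraction(7, 10)):
    """Expected payout of one extra ticket on `choice` in a toy shared-pot lottery.

    other_picks: {number: how many other players picked it}
    Assumptions: each number is equally likely to be drawn, and the pot
    (payout_rate * total ticket sales) is split equally among the winners.
    """
    n_numbers = len(other_picks)
    n_players = sum(other_picks.values()) + 1  # the other players plus us
    pot = payout_rate * ticket_price * n_players
    winners_if_drawn = other_picks[choice] + 1  # we share with those who picked it
    return Fraction(1, n_numbers) * pot / winners_if_drawn


# Hypothetical popularity of each number among the 99 other players.
other_picks = {1: 60, 2: 30, 3: 9}
payouts = {k: expected_payout(k, other_picks) for k in other_picks}
for number, value in payouts.items():
    print(number, float(value))
```

Every number has the same chance of being drawn, yet in this toy setup only the rarely chosen number 3 has an expected payout above the ticket price, because the pot is split among far fewer winners.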
