Are you parallelizing your raster operations? You should!

If you plan to do anything with the raster package, you should definitely consider parallelizing all your processes, especially if you are working with very large image files. I couldn’t find any blog post describing how to parallelize with the raster package (it is well documented in the package documentation, though). So here are my notes.

Thursday, 17 January 2019

2018 Italian general election: Details on my simulation

This article describes the simulation behind the app that you find here

This simulation of the results for the 2018 general election is based on the results from the last two national elections (the 2013 Italian parliament election and the 2014 European Parliament election) and on national polls conducted until 16 February 2018. The simulation rests on one assumption, which is reasonable but not necessarily realistic: the relative territorial strength of parties is stable. It follows from this assumption that if the national support for a party (as measured by national voting intention polls) varies, it varies consistently and proportionally everywhere. A rising tide lifts all boats, and vice versa. The assumption has some empirical justification. If we compare each district's difference from the national support (in percentage) in 2013 and in 2014, we see a significant correlation, especially for the major parties.
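To make the proportionality assumption concrete, here is a minimal sketch of one way to read it: each party's past share in a district is scaled by the ratio between its current and past national support. The function name, the party labels and all the numbers are hypothetical, and this is not necessarily the exact computation used in the app.

```python
def project_district_shares(district_then, national_then, national_now):
    """Scale each party's past district share by its national swing ratio.

    All three arguments are dicts mapping a party label to a vote share
    between 0 and 1. This is an illustrative reading of the "rising tide"
    assumption, not the app's actual code.
    """
    projected = {}
    for party, share in district_then.items():
        # Ratio > 1 means the party gained nationally, < 1 means it lost.
        swing_ratio = national_now[party] / national_then[party]
        projected[party] = share * swing_ratio
    return projected


# Hypothetical figures for two made-up parties.
national_then = {"A": 0.25, "B": 0.30}  # national result at the past election
national_now = {"A": 0.30, "B": 0.24}   # national support in current polls
district_then = {"A": 0.20, "B": 0.40}  # past result in one district

projected = project_district_shares(district_then, national_then, national_now)
for party, share in projected.items():
    print(f"{party}: {share:.3f}")
```

In this made-up district, party A gains nationally and its projected share rises from 0.20 to about 0.24, while party B loses nationally and falls from 0.40 to about 0.32. Shares projected this way do not necessarily sum to one across parties, so a real implementation would need to decide how to handle that.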

Votes to party in the 2018 Chamber districts


Tuesday, 27 February 2018

NDVI, risk assessment and developing countries

The Normalized Difference Vegetation Index (NDVI) estimates the greenness of plants covering the surface of the Earth by measuring the light reflected by the vegetation into space. The main idea behind the NDVI is that healthy and unhealthy plants absorb visible and near-infrared light in different proportions: a green plant will reflect 50% of the near-infrared light it receives and only 8% of the visible light, while an unhealthy plant will reflect 40% and 30%, respectively. NDVI can then be used to quantitatively compare vegetation conditions across time and space (and indeed it is quite widely used: a Google Scholar search on NDVI produced 60,500 hits).
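As a quick illustration, the standard NDVI formula is the difference between near-infrared and visible reflectance divided by their sum. The sketch below applies it to the two reflectance pairs quoted above; the function name is my own and the code is only a worked example, not part of any particular package.

```python
def ndvi(nir: float, visible: float) -> float:
    """Return (NIR - VIS) / (NIR + VIS) for two reflectance values."""
    total = nir + visible
    if total == 0:
        # No reflected light in either band: the index is undefined.
        raise ValueError("reflectances sum to zero; NDVI is undefined")
    return (nir - visible) / total


# Reflectance values quoted in the text above.
healthy = ndvi(nir=0.50, visible=0.08)
unhealthy = ndvi(nir=0.40, visible=0.30)

print(f"healthy plant:   {healthy:.2f}")
print(f"unhealthy plant: {unhealthy:.2f}")
```

With these values the healthy plant scores about 0.72 and the unhealthy one about 0.14, which is the kind of gap that makes the index useful for comparing vegetation conditions.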


Thursday, 14 February 2013


Twitter: frbailo



