Are you parallelizing your raster operations? You should!

If you plan to do anything with the raster package you should definitely consider parallelize all your processes, especially if you are working with very large image files. I couldn’t find any blog post describing how to parallelize with the raster package (it is well documented in the package documentation, though). So here my notes.

Thursday, 17 January 2019

2018 Italian general election: Details on my simulation

This article describes the simulation behind the app that you find here

This simulation of the results for the 2018 general election is based on the results from the last two national elections (the Italian parliament election in 2013 and the European Parliament election 2014) and national polls conducted until 16 February 2018. The simulation is based on one assumption, which is reasonable but not necessarily realistic: the relative territorial strength of parties is stable. From this assumption derives that if the national support for a party (as measured by national voting intention polls) varies, it varies consistently and proportionally everywhere. A rising tide lifts all boats and vice versa. The assumption has some empirical justification. If we compare the difference from the national support (in percentage) for each district in 2013 and 2014 we see a significant correlation, especially in the major parties.

Votes to party in the 2018 Chamber districts


Tuesday, 27 February 2018

NDVI, risk assessment and developing countries

The Normalized Difference Vegetation Index (NDVI) estimates the greenness of plants covering the surface of the Earth by measuring the light reflected by the vegetation into space. The main idea behind the NDVI is that visible and near-infrared light is absorbed in different proportions by healthy and unhealthy plants: a green plant will reflect 50% of the near infrared-light it receives and only 8% of the visible light while an unhealthy plant will reflect respectively 40% and 30%. NDVI can then be used to quantitatively compare vegetation conditions across time and space (and indeed is quite widely used, a Google Scholar search on NDVI produced 60,500 hits).


Thursday, 14 February 2013


Twitter: frbailo




  • Handling & Sharing PCAPs Like a Boss with PacketTotal
    The fine folks over at @PacketTotal bequeathed an API token on me so I cranked out an R package for it to enable more dynamic investigations work (RStudio makes for an amazing incident responder investigations console given that you can script in multiple languages, code in C[++], and write documentation all at the same time... […]
  • Code and Data in a large Machine Learning project
    We did a large machine learning project at work recently. It involved two data scientists, two backend engineers and a data engineer, all working on-and-off on the R code during the project. The project had many interesting and new aspects to me, among them are doing data science in an agilish way, how to keep […]
  • RQuantLib 0.4.8: Small updates
    A new version 0.4.8 of RQuantLib reached CRAN and Debian. This release was triggered by a CRAN request for an update to the script which was easy enough (and which, as it happens, did not result in changes in the configure script produce...
  • Rcpp 1.0.1: Updates
    Following up on the 10th anniversary and the 1.0.0. release, we excited to share the news of the first update release 1.0.1 of Rcpp. package turned ten on Monday—and we used to opportunity to mark the current version as 1.0.0! It arrived at CRAN ov...
  • wrapr::let()
    I would like to once again recommend our readers to our note on wrapr::let(), an R function that can help you eliminate many problematic NSE (non-standard evaluation) interfaces (and their associate problems) from your R programming tasks. The idea is to imitate the following lambda-calculus idea: let x be y in z := ( λ […]

RSS Simply Statistics

  • 10 things R can do that might surprise you
    Over the last few weeks I’ve had a couple of interactions with folks from the computer science world who were pretty disparaging of the R programming language. A lot of the critism focused on perceived limitations of R to statistical analysis. It’s true, R does have a hugely comprehensive list of analysis packages on CRAN, […]
  • Open letter to journal editors: dynamite plots must die
    Statisticians have been pointing out the problem with dynamite plots, also known as bar and line graphs, for years. Karl Broman lists them as one of the top ten worst graphs. The problem has even been documented in the peer reviewed literature. For example, this British Journal of Pharmacology paper titled Show the data, don’t […]
  • Interview with Stephanie Hicks
    Editor’s note: For a while we ran an interview series for statisticians and data scientists, but things have gotten a little hectic around here so we’ve dropped the ball! But we are re-introducing the series, starting with Stephanie Hicks. If you have recommendations of a (junior) person in academics or industry you would like to […]

RSS Statistical Modeling, Causal Inference, and Social Science