recogeo: A new R package to reconcile changing geographic boundaries (and corresponding variables)

Demographic information is usually reported in relation to precise boundaries: administrative, electoral, statistical, etc. Comparing demographic information reported at different points in time is often problematic because boundaries keep changing. The recogeo package facilitates reconciling boundaries and their data through a spatial analysis of the boundaries from two different periods. In this post, I explain how to install the package, reconcile two spatial objects and check the results.


Friday, 1 February 2019

Are you parallelizing your raster operations? You should!

If you plan to do anything with the raster package, you should definitely consider parallelizing all your processes, especially if you are working with very large image files. I couldn’t find any blog post describing how to parallelize with the raster package (it is well documented in the package documentation, though). So here are my notes.
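As a minimal sketch of the idea (not the code from the post itself), the raster package provides `beginCluster()`, `clusterR()` and `endCluster()` for spreading a cell-wise operation across workers. The synthetic raster, the two-worker cluster and the square-root function below are assumptions chosen for illustration, and the example assumes the raster package and its parallel backend are installed.

```r
library(raster)

# Small synthetic raster, standing in for a very large image file (assumption)
r <- raster(nrows = 100, ncols = 100)
values(r) <- runif(ncell(r))

# Start a cluster with two workers (an arbitrary choice for this sketch)
beginCluster(2)

# Apply calc() in parallel; arguments for calc() are passed through `args`
r_par <- clusterR(r, calc, args = list(fun = sqrt))

# Always release the workers when done
endCluster()

# Sequential version of the same operation, for comparison
r_seq <- calc(r, sqrt)
all.equal(values(r_par), values(r_seq))
```

For a raster this small the overhead of starting the cluster will likely outweigh any gain; the benefit is expected with large files and more expensive functions.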

Thursday, 17 January 2019

How to (quickly) enrich a map with natural and anthropic details

In this post I show how to enrich a ggplot map with data obtained from the OpenStreetMap (OSM) API. After adding elevation details to the map, I add water bodies and elements identifying human activity. To highlight the more densely inhabited areas, I propose applying a density-based clustering algorithm to OSM features.


Thursday, 9 August 2018

Explicit semantic analysis with R

Explicit semantic analysis (ESA) was proposed by Gabrilovich and Markovitch (2007) to compute the position of a document in a high-dimensional concept space. At its core, the technique compares the terms of the input document with the terms of the documents describing each concept, and so estimates the relatedness of the input document to each concept. In spatial terms, if I know the relative distance of the input document from meaningful concepts (e.g. ‘car’, ‘Leonardo da Vinci’, ‘poverty’, ‘electricity’), I can infer the meaning of the document relative to explicitly defined concepts from the document’s position in the concept space.
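The comparison described above can be sketched in base R. This is a toy illustration under stated assumptions, not the implementation from the post or from the original paper: the three concept "articles" and the input document are made-up text, the weighting is a plain TF-IDF, and relatedness is measured with cosine similarity.

```r
# Hypothetical concept descriptions (stand-ins for, e.g., encyclopedia articles)
concepts <- c(
  car         = "car engine wheel road driver fuel",
  electricity = "electricity current voltage power wire grid",
  poverty     = "poverty income inequality welfare hunger"
)

# Hypothetical input document
doc <- "the engine needs fuel and the grid supplies power"

# Lower-case, split on non-letters, drop empty tokens
tokenize <- function(x) {
  tokens <- strsplit(tolower(x), "[^a-z]+")[[1]]
  tokens[nzchar(tokens)]
}

concept_tokens <- lapply(concepts, tokenize)
vocab <- sort(unique(unlist(concept_tokens)))

# Term-frequency vector over the concept vocabulary (unknown terms are ignored)
tf <- function(tokens) as.numeric(table(factor(tokens, levels = vocab)))

# Concept-by-term matrix, weighted by inverse document frequency
tf_mat <- t(sapply(concept_tokens, tf))
colnames(tf_mat) <- vocab
idf <- log(nrow(tf_mat) / colSums(tf_mat > 0))
concept_mat <- sweep(tf_mat, 2, idf, `*`)

# Input document in the same weighted term space
doc_vec <- tf(tokenize(doc)) * idf

# Cosine similarity between the document and each concept
cosine <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
relatedness <- apply(concept_mat, 1, cosine, b = doc_vec)

# One coordinate per concept: the document's position in the concept space
print(round(relatedness, 3))
```

The resulting vector has one entry per concept, so the toy document sits close to ‘car’ and ‘electricity’ and shares no terms with ‘poverty’. A realistic concept space would use thousands of concept documents rather than three.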


Tuesday, 26 April 2016


Twitter: frbailo



