How to (quickly) enrich a map with natural and anthropic details

In this post I show how to enrich a ggplot map with data obtained from the OpenStreetMap (OSM) API. After adding elevation details to the map, I add water bodies and elements identifying human activity. To highlight the more densely inhabited areas, I propose using a density-based clustering algorithm on OSM features.
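The clustering step can be illustrated with a minimal DBSCAN sketch in Python. This is not the author's code. The post does not say which density-based algorithm it uses, so DBSCAN is assumed here as a representative one. The sketch also assumes that each OSM feature has already been reduced to a single point (for example, a building centroid) in a projected coordinate system measured in metres. The function `dbscan`, its parameters `eps` and `min_pts`, and the sample coordinates are all hypothetical.

```python
import math
from typing import List, Tuple

# Assumption: each OSM feature is a single (x, y) point in a projected CRS
# measured in metres, not longitude/latitude.
Point = Tuple[float, float]


def dbscan(points: List[Point], eps: float, min_pts: int) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or -1 for noise.

    eps is the neighbourhood radius, in the same units as the coordinates.
    min_pts is the minimum number of points, including the point itself,
    that a point's neighbourhood must contain for it to be a core point.
    """
    UNVISITED, NOISE = -2, -1
    labels = [UNVISITED] * len(points)

    def neighbours(i: int) -> List[int]:
        # Brute-force range query: O(n) per call, fine for a small sketch.
        xi, yi = points[i]
        return [j for j, (xj, yj) in enumerate(points)
                if math.hypot(xi - xj, yi - yj) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] != UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        # i is a core point: start a new cluster and grow it.
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable, but not core
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is also core: keep expanding
        cluster += 1
    return labels


if __name__ == "__main__":
    # Hypothetical building centroids (metres): two dense groups, one outlier.
    pts = [(0, 0), (5, 0), (0, 5), (5, 5),
           (100, 100), (104, 100), (100, 104),
           (500, 500)]
    print(dbscan(pts, eps=10, min_pts=3))  # → [0, 0, 0, 0, 1, 1, 1, -1]
```

Points labelled `-1` are noise, meaning isolated features. Each non-negative label marks a dense group that could be drawn on the map as a more inhabited area. The brute-force neighbour search makes the whole run O(n²). For a real city extract, a spatial index or an existing DBSCAN implementation would be the practical choice.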


Thursday, 9 August 2018


Twitter: frbailo

