recogeo: A new R package to reconcile changing geographic boundaries (and corresponding variables)

Demographic information is usually reported in relation to precise boundaries: administrative, electoral, statistical, etc. Comparing demographic information reported at different points in time is often problematic because boundaries keep changing. The recogeo package facilitates reconciling boundaries and their data through a spatial analysis of the boundaries from two different periods. In this post, I explain how to install the package, reconcile two spatial objects and check the results.

Friday, 1 February 2019

Are you parallelizing your raster operations? You should!

If you plan to do anything with the raster package, you should definitely consider parallelizing all your processes, especially if you are working with very large image files. I couldn’t find any blog post describing how to parallelize with the raster package (it is well documented in the package documentation, though). So here are my notes.
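
To give a flavour of what this looks like, here is a minimal sketch using the raster package's cluster helpers (`beginCluster()`, `clusterR()`, `endCluster()`). The toy raster, the `rescale` function and the choice of two workers are assumptions made for illustration only, and the sketch assumes raster and its parallel backend are installed.

```r
library(raster)

# Hypothetical in-memory raster filled with random values (illustration only)
r <- raster(nrows = 200, ncols = 200)
values(r) <- runif(ncell(r))

# Hypothetical cell-wise operation to apply in parallel
rescale <- function(x) x * 10

# Start a cluster with two workers (an arbitrary choice for this sketch)
beginCluster(2)

# clusterR() splits the raster into chunks and runs calc() on each worker
r_scaled <- clusterR(r, calc, args = list(fun = rescale))

# Always shut the cluster down when done
endCluster()
```

On a raster this small the overhead of starting workers likely outweighs any gain; the pattern is meant for large files.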

Thursday, 17 January 2019

How to (quickly) enrich a map with natural and anthropic details


In this post I show how to enrich a ggplot map with data obtained from the OpenStreetMap (OSM) API. After adding elevation details to the map, I add water bodies and elements identifying human activity. To highlight the more densely inhabited areas, I propose applying a density-based clustering algorithm to OSM features.

Thursday, 9 August 2018

Explicit semantic analysis with R

Explicit semantic analysis (ESA) was proposed by Gabrilovich and Markovitch (2007) to compute a document’s position in a high-dimensional concept space. At its core, the technique compares the terms of the input document with the terms of the documents describing each concept, and so estimates the relatedness of the input document to each concept. In spatial terms, if I know the relative distance of the input document from meaningful concepts (e.g. ‘car’, ‘Leonardo da Vinci’, ‘poverty’, ‘electricity’), I can infer the meaning of the document relative to explicitly defined concepts from its position in the concept space.
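
The idea can be sketched in a few lines of base R. This is a simplified illustration, not the implementation discussed in the post: the three concept descriptions are made-up stand-ins for real concept articles, the helper names (`tokenize`, `term_counts`, `esa_vector`) are hypothetical, and relatedness is approximated here as the cosine similarity between TF-IDF vectors.

```r
# Toy concept space: each concept is described by a short, made-up text
concepts <- c(
  car         = "car engine wheel road driver fuel engine",
  electricity = "electricity current voltage power grid wire",
  poverty     = "poverty income inequality welfare poor income"
)

# Lower-case a text and split it into alphabetic tokens
tokenize <- function(text) {
  tokens <- strsplit(tolower(text), "[^a-z]+")[[1]]
  tokens[nzchar(tokens)]
}

concept_tokens <- lapply(concepts, tokenize)
vocab <- sort(unique(unlist(concept_tokens)))

# Count terms over the fixed vocabulary (out-of-vocabulary terms are dropped)
term_counts <- function(tokens) {
  as.numeric(table(factor(tokens, levels = vocab)))
}

# Term-frequency matrix: rows are concepts, columns are terms
tf <- t(sapply(concept_tokens, term_counts))
colnames(tf) <- vocab

# Inverse document frequency, then TF-IDF weights for each concept
idf <- log(nrow(tf) / colSums(tf > 0))
tfidf <- sweep(tf, 2, idf, `*`)

# Position of a document in concept space: one relatedness score per concept
esa_vector <- function(text) {
  doc <- term_counts(tokenize(text)) * idf
  scores <- as.vector(tfidf %*% doc)
  norms <- sqrt(rowSums(tfidf^2)) * sqrt(sum(doc^2))
  out <- ifelse(norms > 0, scores / norms, 0)
  names(out) <- rownames(tfidf)
  out
}

relatedness <- esa_vector("The driver stopped the car to refuel the engine")
print(round(relatedness, 3))
```

In this toy setup the input sentence shares terms only with the ‘car’ description, so ‘car’ gets the highest score and the other two concepts score zero. With real concept descriptions, the scores would be spread across many concepts.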

Tuesday, 26 April 2016

tweets


Twitter: frbailo
