recogeo: A new R package to reconcile changing geographic boundaries (and corresponding variables)

Demographics information is usually reported in relation to precise boundaries: administrative, electoral, statistical, etc. Comparing demographics information reported at different point in time is often problematic because boundaries keep changing. The recogeo package faciliates reconciling boundaries and their data by a spatial analysis of the boundaries of two different periods. In this post, I explain how to install the package, reconcile two spatial objects and check the results.

(more…)

Friday, 1 February 2019

Are you parallelizing your raster operations? You should!

If you plan to do anything with the raster package you should definitely consider parallelize all your processes, especially if you are working with very large image files. I couldn’t find any blog post describing how to parallelize with the raster package (it is well documented in the package documentation, though). So here my notes.
(more…)

Thursday, 17 January 2019

How to (quickly) enrich a map with natural and anthropic details


In this post I show how to enrich a ggplot map with data obtained from the Open Street Map (OSM) API. After adding elevation details to the map, I add water bodies and elements identifying human activity. To highlight the areas more densely inhabitated, I propose to use a density-based clustering algorithm of OSM features.

(more…)

Thursday, 9 August 2018

Explicit semantic analysis with R

Explicit semantic analysis (ESA) was proposed by Gabrilovich and Markovitch (2007) to compute a document position in a high-dimensional concept space. At the core, the technique compares the terms of the input document with the terms of documents describing the concepts estimating the relatedness of the document to each concept. In spatial terms if I know the relative distance of the input document from meaningful concepts (e.g. ‘car’, ‘Leonardo da Vinci’, ‘poverty’, ‘electricity’), I can infer the meaning of the document relatively to explicitly defined concepts because of the document’s position in the concept space.

(more…)

Tuesday, 26 April 2016

tweets


Twitter: frbailo

links


blogroll


RSS r-bloggers.com

  • New Course Available Now: Machine Learning with Tidymodels
    New Course Available Now: Machine Learning with Tidymodels The ever increasing application of machine learning models in industry and academia requires tools which are easy to use and ensure a reliable model fitting process. The R package universe cov... The post New Course Available Now: Machine Learning with Tidymodels first appeared on R-bloggers.
  • Cluster Analysis in R
    Cluster Analysis in R, when we do data analytics, there are two kinds of approaches one is supervised and another is unsupervised. Clustering is... The post Cluster Analysis in R appeared first on finnstats. The post Cluster Analysis in R first appeared on R-bloggers.
  • Recidivism: Identifying the Most Important Predictors for Re-offending with OneR
    In 2018 the renowned scientific journal science broke a story that researchers had re-engineered the commercial criminal risk assessment software COMPAS with a simple logistic regression (Science: The accuracy, fairness, and limits of predicting recidivism). According to this article, COMPAS uses 137 features, the authors just used two. In this post, I ... The post […]
  • Webscraping Tables in R: Datapasta Copy-and-Paster
    This article is part of R-Tips Weekly, a weekly video tutorial that shows you step-by-step how to do common R coding tasks. Here are the links to get set up. 👇 Get the Code YouTube Tutorial (Click image to play tutorial) ... The post Webscraping Tables in R: Datapasta Copy-and-Paster first appeared on R-bloggers.
  • SwimmeR goes to the Para Games and other Updates – v0.9.0
    There’s a new version of SwimmeR available, v0.9.0. It follows v0.8.0, which I didn’t like and didn’t write about. I’ve made some improvements though and here we are. Rather than just telling you what’s in v0.9.0 I’m going to indulge myself and approach this ... The post SwimmeR goes to the Para Games and other […]

RSS Simply Statistics

  • Streamline - tidy data as a service
    Tldr: We started a company called Streamline Data Science https://streamlinedatascience.io/ that offers tidy data as a service. We are looking for customers, partnerships and employees as we scale up after closing our funding round! Most of my career, I have worked in the muck of data cleaning. In the world of genomics, a lot of […]
  • The Four Jobs of the Data Scientist
    In 2019 I wrote a post about The Tentpoles of Data Science that tried to distill the key skills of the data scientist. In the post I wrote: When I ask myself the question “What is data science?” I tend to think of the following five components. Data science is (1) the application of design […]
  • Palantir Shows Its Cards
    File this under long-term followup, but just about four years ago I wrote about Palantir, the previously secretive but now soon to be public data science company, and how its valuation was a commentary on the value of data science more generally. Well, just recently Palantir filed to go public and therefore submitted a registration […]

RSS Statistical Modeling, Causal Inference, and Social Science

  • Can you trust international surveys? A follow-up:
    Michael Robbins writes: A few years ago you covered a significant controversy in the survey methods literature about data fabrication in international survey research. Noble Kuriakose and I put out a proposed test for data quality. At the time there were many questions raised about the validity of this test. As such, I thought you […]
  • We’re hiring (in Melbourne)
    Andrew, Qixuan and I (Lauren) are hiring a postdoctoral research fellow to explore research topics around the use on multi-level regression and poststratification with non-probability surveys. This work is funded by the National Institutes of Health, and is collaborative work with Prof Andrew Gelman (Statistics and Political Science, Columbia University) and Assoc/Prof Qixuan Chen (Biostatistics, […]
  • Hierarchical modeling of excess mortality time series
    Elliott writes: My boss asks me: For our model to predict excess mortality around the world, we want to calculate a confidence interval around our mean estimate for total global excess deaths. We have real excess deaths for like 60 countries, and are predicting on another 130 or so. we can easily calculate intervals for […]