recogeo: A new R package to reconcile changing geographic boundaries (and corresponding variables)

Demographic information is usually reported in relation to precise boundaries: administrative, electoral, statistical, etc. Comparing demographic information reported at different points in time is often problematic because boundaries keep changing. The recogeo package facilitates reconciling boundaries and their data through a spatial analysis of the boundaries from two different periods. In this post, I explain how to install the package, reconcile two spatial objects and check the results.

(more…)

Friday, 1 February 2019

Are you parallelizing your raster operations? You should!

If you plan to do anything with the raster package you should definitely consider parallelizing all your processes, especially if you are working with very large image files. I couldn’t find any blog post describing how to parallelize with the raster package (it is well documented in the package documentation, though). So here are my notes.
(more…)

Thursday, 17 January 2019

How to (quickly) enrich a map with natural and anthropic details


In this post I show how to enrich a ggplot map with data obtained from the OpenStreetMap (OSM) API. After adding elevation details to the map, I add water bodies and elements identifying human activity. To highlight the more densely inhabited areas, I propose applying a density-based clustering algorithm to OSM features.

(more…)

Thursday, 9 August 2018

Explicit semantic analysis with R

Explicit semantic analysis (ESA) was proposed by Gabrilovich and Markovitch (2007) to compute a document’s position in a high-dimensional concept space. At its core, the technique compares the terms of the input document with the terms of the documents that describe each concept, and so estimates the relatedness of the input document to each concept. In spatial terms, if I know the relative distance of the input document from meaningful concepts (e.g. ‘car’, ‘Leonardo da Vinci’, ‘poverty’, ‘electricity’), I can infer the meaning of the document relative to explicitly defined concepts from its position in the concept space.

(more…)

Tuesday, 26 April 2016

Twitter: frbailo
