recogeo: A new R package to reconcile changing geographic boundaries (and corresponding variables)

Demographic information is usually reported in relation to precise boundaries: administrative, electoral, statistical, etc. Comparing demographic information reported at different points in time is often problematic because boundaries keep changing. The recogeo package facilitates reconciling boundaries and their data through a spatial analysis of the boundaries of two different periods. In this post, I explain how to install the package, reconcile two spatial objects and check the results.


Friday, 1 February 2019

Are you parallelizing your raster operations? You should!

If you plan to do anything with the raster package, you should definitely consider parallelizing all your processes, especially if you are working with very large image files. I couldn’t find any blog post describing how to parallelize with the raster package (it is well documented in the package documentation, though). So here are my notes.
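As a language-neutral illustration of the general idea (split a large grid into row blocks, process the blocks on separate workers, reassemble the result in order), here is a minimal Python sketch using only the standard library. It does not reproduce the raster package's API: the toy grid and the functions `split_rows`, `rescale_block` and `process_in_blocks` are hypothetical names invented for this example, and the approach as written only suits cell-by-cell operations that need no neighbouring rows.

```python
from concurrent.futures import ThreadPoolExecutor


def split_rows(n_rows, n_blocks):
    """Return (start, stop) row ranges that cover 0..n_rows in up to n_blocks chunks."""
    size, extra = divmod(n_rows, n_blocks)
    ranges, start = [], 0
    for i in range(n_blocks):
        # The first `extra` blocks get one additional row each
        stop = start + size + (1 if i < extra else 0)
        if stop > start:
            ranges.append((start, stop))
        start = stop
    return ranges


def rescale_block(block):
    """Hypothetical per-cell operation: rescale 0-255 values to the 0-1 range."""
    return [[cell / 255 for cell in row] for row in block]


def process_in_blocks(grid, func, n_blocks, executor):
    """Apply func to row blocks of grid through the executor and reassemble them."""
    blocks = [grid[start:stop] for start, stop in split_rows(len(grid), n_blocks)]
    result = []
    # executor.map yields results in input order, so rows stay in place
    for processed in executor.map(func, blocks):
        result.extend(processed)
    return result


# Toy 6x4 "raster" standing in for a large image
grid = [[(r * 4 + c) * 10 for c in range(4)] for r in range(6)]

with ThreadPoolExecutor(max_workers=3) as pool:
    parallel = process_in_blocks(grid, rescale_block, n_blocks=3, executor=pool)

sequential = rescale_block(grid)
```

The function accepts any `concurrent.futures` executor. A thread pool keeps the sketch easy to run anywhere; for CPU-bound work in pure Python, a `ProcessPoolExecutor` created under an `if __name__ == "__main__":` guard is the usual choice.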

Thursday, 17 January 2019

How to (quickly) enrich a map with natural and anthropic details


In this post I show how to enrich a ggplot map with data obtained from the Open Street Map (OSM) API. After adding elevation details to the map, I add water bodies and elements identifying human activity. To highlight the more densely inhabited areas, I propose using a density-based clustering algorithm on OSM features.


Thursday, 9 August 2018

The two alternatives to the monasterisation of the World Wide Web

Saint Michael’s Abbey, in the Susa Valley, Piedmont. Source: Wikipedia.

In Medieval Europe, information was physically concentrated in very few secluded libraries and archives. Powerful institutions managed them and regulated who could access what. The library of the fictional abbey described in Umberto Eco’s The Name of the Rose is located in a fortified tower, and only the librarian knows how to navigate its mysteries. Monasteries played an essential role in preserving written information and creating new intelligence from that knowledge. But because written information was a scarce resource, with the keys to the libraries also came authority and power. Similarly, Internet companies are amassing information within their fortified walls. In so doing, they provide services that we now see as essential, but they also contravene the two core principles of the Internet: openness and decentralisation.


Monday, 7 May 2018

Local participation and not unemployment explains the M5S result in the South

The abundance of economic data and the scarcity of social data with a comparable level of granularity is a problem for the quantitative analysis of social phenomena. I argue that this fundamental problem has misguided the analysis of the electoral results of the Five Star Movement (M5S) and its interpretation. In this article, I provide statistical evidence suggesting that — in the South — unemployment is not associated with the exceptional increase in the M5S support and that local participation is a stronger predictor of support than most of the demographics.

What happened

The 2018 Italian general elections (elections, since both the Chamber of Deputies and the Senate were renewed) saw

  1. a significant increase in the number of votes for two parties, the Five Star Movement (M5S) and the League (formerly Northern League),

and

  2. an increase in the importance of geography as an explanatory dimension for the distribution of votes.

The following two maps show where the M5S and the League increased their electoral support from 2013 to 2018. (Electoral data always refer to the election of the Chamber of Deputies.)

Vote difference: 2018-2013 (a few communes have not reported all the results, notably Rome)

 

The geographic pattern is quite simple. The M5S has increased its support in the South and maintained its votes in the North; the League has significantly strengthened its support in the North but has also collected votes in the South, where it previously had virtually no support. The third and the fourth most voted parties, the Democratic Party (PD) and Berlusconi’s Forza Italia (FI), have lost votes almost everywhere. If we map the results of the four parties side by side on the same scale, the PD and FI almost fade into the background.

Votes in the 2018 General elections

Yet major metropolitan areas do not always follow the national trend. While Naples unambiguously voted M5S, in Turin, Milan and Rome the Democratic Party was the most voted party in the wealthiest districts.

Votes in the 2018 General elections (Clockwise from top-left: Turin, Milan, Naples, Rome)

The density of the distribution of results at the commune and sub-commune level in the macro regions indicates that while the M5S dominates electorally in the South and on the two major islands, the League is the most popular party in the North.

Distribution of votes at commune or sub-commune level

The territoriality of the results, especially along the North-South dimension, makes the analysis especially complicated. This is because the strong result of the League in the North and of the M5S in the South might simplistically suggest that immigration (which is much stronger in the North) explains the League’s result in the North, while unemployment and poverty (stronger in the South) explain the M5S’s result in the South. This reading is especially attractive since immigration and the M5S proposal to introduce a guaranteed minimum income dominated the campaign.


Tuesday, 20 March 2018

2018 Italian general election: Details on my simulation

This article describes the simulation behind the app that you find here

This simulation of the results for the 2018 general election is based on the results from the last two national elections (the Italian parliament election in 2013 and the European Parliament election in 2014) and national polls conducted until 16 February 2018. The simulation is based on one assumption, which is reasonable but not necessarily realistic: the relative territorial strength of parties is stable. It follows from this assumption that if the national support for a party (as measured by national voting intention polls) varies, it varies consistently and proportionally everywhere. A rising tide lifts all boats and vice versa. The assumption has some empirical justification. If we compare, for each district, the difference from the national support (in percentage) in 2013 and in 2014, we see a significant correlation, especially for the major parties.

Votes to party in the 2018 Chamber districts
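To make the proportionality assumption concrete, here is a minimal Python sketch of how a district projection could be computed from it. This is an illustration under stated assumptions, not the code behind the app: the function name `project_district`, the party labels and all the numbers are made up, and the final renormalisation step (forcing projected shares to sum to 100) is an assumption of this sketch.

```python
def project_district(past_district, past_national, current_national):
    """Scale each party's past district share by its national swing ratio.

    All three arguments map a party name to a vote share in percent.
    """
    scaled = {
        party: share * current_national[party] / past_national[party]
        for party, share in past_district.items()
    }
    total = sum(scaled.values())
    # Renormalise so the projected shares sum to 100 (assumption of this sketch)
    return {party: 100 * value / total for party, value in scaled.items()}


# Made-up numbers for three hypothetical parties, for illustration only
past_national = {"A": 40.0, "B": 40.0, "C": 20.0}
current_national = {"A": 30.0, "B": 50.0, "C": 20.0}
past_district = {"A": 50.0, "B": 30.0, "C": 20.0}

projection = project_district(past_district, past_national, current_national)

for party, share in projection.items():
    print(f"{party}: {share:.1f}%")
```

In this toy district, party A was stronger than its national average and party B weaker. Because A falls nationally and B rises, the two end up level in the projection, while each keeps its relative territorial strength with respect to its own national figure.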


Tuesday, 27 February 2018

Quick analysis of the Italian referendum results

The 2016 Italian referendum torpedoed the constitutional reform presented by the government presided over by Matteo Renzi (41). According to the final count, which includes 1.2 million votes cast overseas, the reform was rejected by almost 60% of the voters.

Three parties played a predominant role during the electoral campaign: the ruling Democratic Party (PD), led by the head of government Renzi; the Five Star Movement (M5S), founded and led by Beppe Grillo (68); and the Lega Nord (LN), led by Matteo Salvini (43). The fourth Italian party, Forza Italia, for different reasons – including the health of Silvio Berlusconi (80) – played a minor role.


Monday, 5 December 2016

What we can learn from the M5S

I read and respond to Massimo Mantellini’s post (Il M5S, il wifi e il principio di precauzione), which notes with concern how the Movement has brought anti-scientific positions into Parliament, thereby in some way legitimising them: a “toxic, banal and, in its own way, unassailable way of thinking that harms the whole country”.

With an electoral base of between 25 and 30% (8.5-10 million people), the Five Star Movement is necessarily complex in terms of demographic representation and diversity of opinion. Assuming an abstention rate of 25%, if you are queuing at the supermarket, about two of the 10 people ahead of you vote M5S. Unfortunately, this complexity rarely comes through in journalistic narratives, and those who report the news tend (too) often to prefer caricature (the tinfoil hat or the trip to North Korea, so to speak). But this kind of reporting is wrong: first because it distorts by simplifying, and second because it encourages farcical, grotesque and sloppy behaviour from those who, sitting in crowded institutions, seek visibility.


Friday, 22 July 2016

Road to Rome: The organisational and political success of the M5S

The Five Star Movement (M5S) obtained two major victories in the second round of municipal elections on 19 June 2016 in Rome and Turin. Rome attracted the most international attention but it is M5S’ victory in Turin that is likely the most consequential for them and other European anti-establishment parties.

In Rome, a municipality with 2.8 million people and an annual budget of €5 billion, Virginia Raggi (age 37) gained double the votes of her opponent Roberto Giachetti (age 55). In Turin, a city with a population of 900,000 and an annual budget of €1.69 billion, Chiara Appendino (age 31) outstripped Piero Fassino (age 66) by about 10 percentage points.

Continue reading on Pop Politics Aus

Friday, 8 July 2016

Explicit semantic analysis with R

Explicit semantic analysis (ESA) was proposed by Gabrilovich and Markovitch (2007) to compute a document’s position in a high-dimensional concept space. At its core, the technique compares the terms of the input document with the terms of the documents describing the concepts, estimating the relatedness of the input document to each concept. In spatial terms, if I know the relative distance of the input document from meaningful concepts (e.g. ‘car’, ‘Leonardo da Vinci’, ‘poverty’, ‘electricity’), I can infer the meaning of the document relative to explicitly defined concepts from the document’s position in the concept space.
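The idea can be sketched in a few lines of Python. This is a simplified illustration, not the implementation discussed in the post: the three one-line concept descriptions are toy stand-ins (ESA as published uses Wikipedia articles as concept descriptions), the helper names are hypothetical, and relatedness is approximated here by the cosine similarity between TF-IDF vectors.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (deliberately naive)."""
    return text.lower().split()


def tfidf_vectors(texts):
    """Return one {term: tf-idf weight} dict per text, plus the idf table."""
    counts = [Counter(tokenize(t)) for t in texts]
    n = len(texts)
    # Document frequency: in how many texts each term appears
    df = Counter(term for c in counts for term in c)
    idf = {term: math.log(n / d) + 1.0 for term, d in df.items()}
    vectors = [{term: tf * idf[term] for term, tf in c.items()} for c in counts]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy concept descriptions (hypothetical stand-ins for encyclopedia articles)
concepts = {
    "car": "car engine wheels road driver fuel",
    "electricity": "electricity current voltage power grid wire",
    "poverty": "poverty income inequality welfare unemployment",
}

names = list(concepts)
concept_vectors, idf = tfidf_vectors([concepts[name] for name in names])


def concept_position(document):
    """Relatedness of the document to each concept: its concept-space position."""
    tf = Counter(tokenize(document))
    # Terms that appear in no concept description carry no information here
    doc_vector = {term: count * idf[term] for term, count in tf.items() if term in idf}
    return {name: cosine(doc_vector, vec) for name, vec in zip(names, concept_vectors)}


position = concept_position("the driver stopped the car to buy fuel")
closest = max(position, key=position.get)
```

Each input document is thus reduced to one score per concept. Two documents can then be compared through their score vectors even when they share no terms with each other.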


Tuesday, 26 April 2016



Twitter: frbailo
