Explicit semantic analysis with R

Explicit semantic analysis (ESA) was proposed by Gabrilovich and Markovitch (2007) to compute a document's position in a high-dimensional concept space. At its core, the technique compares the terms of the input document with the terms of the documents that describe each concept, and from that comparison estimates how related the input document is to each concept. In spatial terms, if I know the distance of the input document from meaningful concepts (e.g. ‘car’, ‘Leonardo da Vinci’, ‘poverty’, ‘electricity’), I can infer the meaning of the document relative to those explicitly defined concepts from its position in the concept space.
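To make the idea concrete, here is a minimal sketch in base R. It is not the original implementation: the three concept texts are toy stand-ins I made up (ESA as proposed uses Wikipedia articles as concept descriptions), and the TF-IDF weighting, the cosine similarity, and the helper names `tokenize`, `tf` and `cosine` are assumptions chosen for illustration.

```r
# Toy concept descriptions (hypothetical; stand-ins for real concept articles).
concepts <- c(
  car         = "car engine wheel road driver fuel engine",
  electricity = "electricity current voltage power grid wire",
  poverty     = "poverty income poor inequality welfare income"
)
input_doc <- "the engine of the car needs fuel and power"

# Lower-case the text and split it on anything that is not a letter.
tokenize <- function(text) {
  tokens <- strsplit(tolower(text), "[^a-z]+")[[1]]
  tokens[nzchar(tokens)]
}

concept_tokens <- lapply(concepts, tokenize)
vocab <- sort(unique(unlist(concept_tokens)))

# Term-frequency vector over the concept vocabulary;
# tokens outside the vocabulary are silently dropped.
tf <- function(tokens) as.numeric(table(factor(tokens, levels = vocab)))

# Term-by-concept matrix: one row per term, one column per concept.
tf_matrix <- sapply(concept_tokens, tf)
rownames(tf_matrix) <- vocab

# Inverse document frequency, computed across the concept descriptions.
idf <- log(ncol(tf_matrix) / rowSums(tf_matrix > 0))
tfidf <- tf_matrix * idf

# Weight the input document with the same idf values.
doc_vec <- tf(tokenize(input_doc)) * idf

# Cosine similarity, returning 0 when either vector is all zeros.
cosine <- function(a, b) {
  denom <- sqrt(sum(a^2)) * sqrt(sum(b^2))
  if (denom == 0) 0 else sum(a * b) / denom
}

# The document's position in concept space: one relatedness score per concept.
esa_vector <- apply(tfidf, 2, cosine, b = doc_vec)
print(round(esa_vector, 3))
```

In this toy setup the input document lands closest to ‘car’, somewhat close to ‘electricity’ (through the shared term ‘power’), and gets a score of zero for ‘poverty’, with which it shares no terms. The vector `esa_vector` is the document's coordinates in the three-dimensional concept space.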


Tuesday, 26 April 2016


Twitter: frbailo


