Explicit semantic analysis with R

Explicit semantic analysis (ESA) was proposed by Gabrilovich and Markovitch (2007) to compute a document's position in a high-dimensional concept space. At its core, the technique compares the terms of the input document with the terms of the documents that describe each concept, and so estimates the relatedness of the input document to each concept. In spatial terms, if I know the relative distance of the input document from meaningful concepts (e.g. ‘car’, ‘Leonardo da Vinci’, ‘poverty’, ‘electricity’), I can infer the meaning of the document relative to these explicitly defined concepts from its position in the concept space.
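
To make the idea concrete, here is a minimal base R sketch of the comparison described above. It is only an illustration under stated assumptions: the three concept texts are invented toy strings standing in for real concept-describing documents, terms are weighted by TF-IDF, and relatedness is measured by cosine similarity. The names `concepts`, `tokenize`, `tf` and `esa_vector` are hypothetical helpers introduced for this example.

```r
# Toy concept "articles" (invented text, standing in for real concept documents)
concepts <- c(
  car         = "engine wheel road driver engine fuel",
  electricity = "current voltage power grid wire power",
  poverty     = "income poor inequality welfare income"
)

# Lower-case a string and split it into alphabetic tokens
tokenize <- function(x) {
  tokens <- strsplit(tolower(x), "[^a-z]+")[[1]]
  tokens[nzchar(tokens)]
}

concept_tokens <- lapply(concepts, tokenize)
vocab <- sort(unique(unlist(concept_tokens)))

# Term-frequency vector over the shared vocabulary
# (tokens outside the vocabulary are silently dropped)
tf <- function(tokens) as.numeric(table(factor(tokens, levels = vocab)))

# Rows = concepts, columns = terms
tf_matrix <- t(sapply(concept_tokens, tf))
colnames(tf_matrix) <- vocab

# Inverse document frequency, then TF-IDF weights (assumed weighting scheme)
idf   <- log(nrow(tf_matrix) / colSums(tf_matrix > 0))
tfidf <- sweep(tf_matrix, 2, idf, `*`)

# Position of a document in the concept space:
# one cosine similarity per concept
esa_vector <- function(doc) {
  d <- tf(tokenize(doc)) * idf
  if (all(d == 0)) {
    return(setNames(rep(0, nrow(tfidf)), rownames(tfidf)))
  }
  sims <- (tfidf %*% d) / (sqrt(rowSums(tfidf^2)) * sqrt(sum(d^2)))
  setNames(as.numeric(sims), rownames(tfidf))
}

doc_position <- esa_vector("The driver stopped the car to refill fuel for the engine")
print(round(doc_position, 3))
```

The resulting named vector is the document's coordinates in the concept space: the larger a value, the closer the document sits to that concept. With a real concept collection the same steps apply, but the term-by-concept matrix becomes large and sparse, so a sparse matrix representation would likely be needed.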


Tuesday, 26 April 2016


Twitter: frbailo


