Explicit semantic analysis with R

Explicit semantic analysis (ESA) was proposed by Gabrilovich and Markovitch (2007) to compute a document's position in a high-dimensional concept space. At its core, the technique compares the terms of the input document with the terms of the documents that describe each concept, and so estimates how related the input document is to each concept. In spatial terms, if I know the distance of the input document from meaningful concepts (e.g. ‘car’, ‘Leonardo da Vinci’, ‘poverty’, ‘electricity’), I can infer its meaning relative to those explicitly defined concepts from its position in the concept space.
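To make the idea concrete, here is a minimal sketch in base R. It is not the implementation from the original paper: the three concept descriptions and the input document are made-up toy texts (in practice the concepts would be described by much longer documents, such as encyclopedia articles), and the helper names `tokenize`, `term_counts` and `cosine` are hypothetical. I assume TF-IDF weighting and cosine similarity as the relatedness score.

```r
# Toy concept space: each concept is described by a short, invented text.
concepts <- c(
  car         = "car engine wheel road driver fuel engine",
  electricity = "electricity current voltage power grid wire",
  poverty     = "poverty income inequality welfare hunger income"
)
document <- "the engine of the car needs fuel and electric power"

# Lower-case the text and split it on anything that is not a letter.
tokenize <- function(text) {
  tokens <- strsplit(tolower(text), "[^a-z]+")[[1]]
  tokens[nzchar(tokens)]
}

concept_tokens <- lapply(concepts, tokenize)
vocabulary <- sort(unique(unlist(concept_tokens)))

# Count how often each vocabulary term occurs; unknown terms are ignored.
term_counts <- function(tokens) {
  as.numeric(table(factor(tokens, levels = vocabulary)))
}

# Term-by-concept frequency matrix (rows = terms, columns = concepts).
tf <- sapply(concept_tokens, term_counts)
rownames(tf) <- vocabulary

# Inverse document frequency: terms found in fewer concepts weigh more.
idf <- log(ncol(tf) / rowSums(tf > 0))
tfidf <- tf * idf

# Represent the input document in the same weighted term space.
doc_vec <- term_counts(tokenize(document)) * idf

cosine <- function(a, b) {
  denom <- sqrt(sum(a^2)) * sqrt(sum(b^2))
  if (denom == 0) 0 else sum(a * b) / denom
}

# One relatedness score per concept: the document's position in concept space.
esa_vector <- apply(tfidf, 2, cosine, b = doc_vec)
print(round(esa_vector, 3))
```

In this toy example the document shares the most weighted terms with the ‘car’ description, a single term with ‘electricity’ and none with ‘poverty’, so the resulting vector places it closest to ‘car’.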


Tuesday, 26 April 2016


Twitter: frbailo


