Another language-related R project, this time building on the Apache OpenNLP library. I did it as part of the Johns Hopkins Data Science MOOC on Coursera. It was one of the more open-ended course projects, which left room for a fairly independent and creative effort.
The app I built is here, and as always the code is on GitHub.
The idea behind the app is to help kids with dictation exercises in which they have to underline particular kinds of words in a text (e.g. all the verbs or all the nouns), by automating the task with NLP. The app is a simple implementation of two steps: tokenization, which identifies the words in the text and where each one begins and ends, and part-of-speech tagging, which identifies the word class of each of those tokens.
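To make the tokenization step concrete, here is a minimal sketch in Python of a tokenizer that reports where each word begins and ends. It is not OpenNLP's tokenizer (the app itself uses OpenNLP's trained model from R); the regular expression and the function name `tokenize_with_spans` are hypothetical, chosen only to illustrate the idea of tokens with character offsets.

```python
import re

# Hypothetical, deliberately simple rule (not OpenNLP's model): a word is a run of
# word characters, optionally with one internal apostrophe ("don't"); any other
# non-space character (e.g. punctuation) becomes a token of its own.
TOKEN_RE = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def tokenize_with_spans(text):
    """Return (token, start, end) triples; end is exclusive, as in Python slicing."""
    return [(m.group(), m.start(), m.end()) for m in TOKEN_RE.finditer(text)]


if __name__ == "__main__":
    for token, start, end in tokenize_with_spans("The dog runs fast."):
        print(token, start, end)
```

The offsets are what make underlining possible: given `start` and `end`, the app can highlight exactly the characters of each word in the original text, including the punctuation boundaries a naive split on spaces would get wrong.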
The project took about a day to build from scratch. It required no model training, because the Apache OpenNLP models have already been trained on annotated corpora. So it was just a matter of writing the code that takes a string and a word class as inputs from a Shiny app.
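Once the text is tagged, picking out the words of the requested class is a simple filter. The Python sketch below shows that last step only, under two assumptions: the tagger emits Penn Treebank tags (the tag set used by OpenNLP's English POS models), and the tagged sentence is hand-written rather than produced by OpenNLP. The mapping `WORD_CLASS_PREFIXES` and the function `words_of_class` are hypothetical names for illustration.

```python
# Assumption: tags follow the Penn Treebank tag set. Each word class is matched by
# tag prefix, so e.g. "VB" covers VB, VBD, VBG, VBN, VBP and VBZ.
WORD_CLASS_PREFIXES = {
    "noun": ("NN",),       # NN, NNS, NNP, NNPS
    "verb": ("VB",),       # VB, VBD, VBG, VBN, VBP, VBZ
    "adjective": ("JJ",),  # JJ, JJR, JJS
    "adverb": ("RB",),     # RB, RBR, RBS
}


def words_of_class(tagged_tokens, word_class):
    """Return the words whose tag belongs to the requested word class."""
    prefixes = WORD_CLASS_PREFIXES[word_class.lower()]
    # str.startswith accepts a tuple of prefixes.
    return [word for word, tag in tagged_tokens if tag.startswith(prefixes)]


if __name__ == "__main__":
    # Hand-written (word, tag) pairs standing in for real tagger output.
    tagged = [("The", "DT"), ("dog", "NN"), ("runs", "VBZ"), ("fast", "RB"), (".", ".")]
    print(words_of_class(tagged, "verb"))  # ['runs']
    print(words_of_class(tagged, "noun"))  # ['dog']
```

Matching on tag prefixes rather than listing every tag keeps the word-class choices offered in the app short, while still catching inflected forms such as plural nouns and past-tense verbs.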