1. Getting R to do what we need

Many literature scholars have brought “computational literary analysis” (CLA) into their own sub-fields, whether that be Shakespearean studies, intertextual studies, or metadata studies. As the popularity of CLA grows and more humanities scholars embrace the digital turn in research, there will be increasing demand for computational literacy and the ability to code well. In this project, Visualizing Tolkien with R, I hope to accomplish two goals: first, to establish a baseline of what Tolkien literary studies can do with natural language processing (since little research has been done at this crossroads); second, to make this project as open as possible to humanist researchers who may have little to no background in computational studies. I should emphasize that CLA is not just numbers and pretty graphs: there is no point in creating these statistics, probabilities, or visualizations (called “distant reading”) if there is no “close reading” of the texts being studied. Close and distant reading do not exist in isolation from each other. That is why every sub-project posted on this website merges close-reading literary analysis with distant reading.

In R, “packages” are toolboxes, each holding tools designed for a specific purpose. This is useful because it means that many different things can be done under the umbrella of R, and these toolboxes can sometimes talk to each other. There are a lot of packages in R just for text analysis: some are specially designed to handle computational linguistics, some are designed for data science, and some focus on literary analysis. So whenever we start a new project, we have to think about which package best suits our needs and fits with what we already know.

For Visualizing Tolkien with R, I’ve decided to use <quanteda> most of the time. Sometimes I will also turn <quanteda>’s outputs into a tidy version so it can be used with the <tidyverse>. Here’s the breakdown of what each package will do:

<quanteda> will be used for studying token distribution, sentiment analysis, and topic modelling
The <tidyverse> will be used for turning some of our <quanteda> outputs into graphs
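
To make that division of labour concrete, here is a minimal sketch of the hand-off, assuming <quanteda> and <ggplot2> (the plotting package in the <tidyverse>) are already installed. The sample sentence is a made-up placeholder rather than a quotation from Tolkien, and the object names (`sample_text`, `token_counts`, `tidy_counts`) are my own illustrative choices, not the ones used in the sub-projects.

```r
library(quanteda)
library(ggplot2)  # the plotting package of the tidyverse

# Hypothetical placeholder text (not a quotation from Tolkien)
sample_text <- c(doc1 = "The road goes on and on. The road is long, and the night is dark.")

# Step 1 (quanteda): split the text into tokens and count them
toks <- tokens(sample_text, remove_punct = TRUE)
token_counts <- dfm(toks, tolower = TRUE)  # document-feature matrix of lowercased tokens

# Step 2: reshape the five most frequent tokens into a tidy data frame
# (one row per token, one column per variable)
top <- topfeatures(token_counts, n = 5)
tidy_counts <- data.frame(token = names(top), n = as.integer(top))

# Step 3 (tidyverse): turn the tidy data frame into a bar chart
p <- ggplot(tidy_counts, aes(x = reorder(token, n), y = n)) +
  geom_col() +
  coord_flip() +
  labs(x = "Token", y = "Count", title = "Most frequent tokens in the sample text")
p
```

The counting happens entirely in <quanteda>; the only thing passed to the <tidyverse> side is an ordinary data frame, which is what lets the two toolboxes talk to each other.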

Please feel free to follow along with every project, as I have posted video walkthroughs and the GitHub links to the code for each project! This website was designed for researchers to learn alongside me.


Following along with me? Here are your first steps.

2. Getting familiar with RStudio's layout.


1. Installing R and RStudio.