On Medium: A (very) short essay on race, technology, and invisibility in America.
I first read Invisible Man in 2014 in a "Great American Novel" college course. A year later, when I took "African American Writers and Autobiography," I kept thinking about the book. We read only memoirs and autobiographies, but themes from Ellison's novel kept coming up. I wished there were a way to "see" the threads of connection between Ellison's extended metaphor of Black identity and subjecthood in fiction and the lived experiences and reflections of authors like Richard Wright, Maya Angelou, and Samuel Delany. These connections have already been drawn broadly in scholarly writing. One of the major points of Ellison's novel is that its narrator is an everyman: he even ends it with the question, "Who knows but that, on the lower frequencies, I speak for you?"
I was fascinated, however, by the idea of visualizing these "lower frequencies." I made it the focus of my senior thesis in the English department, and thus was born my fascination with topic modeling as a way to suggest theme and metaphorical significance. I love the idea that topic modeling might actually pick up on some of those lower frequencies in Invisible Man, digitally pointing out patterns, associations, and signifiers that slip past an analog reader's attention or lie outside the reach of analog analysis altogether.
That being said, what I actually produced in 2017 was a bar chart in R that was extremely hard to read or intuit unless you had quite a bit of background on the project. This reboot gives the topic model a home that is A) not in a 60-page paper written by an undergrad and B) accessible and interactive.
Topic modeling is a way of measuring relationships between words. This project uses LDA topic modeling, a statistical model that works backward from a text sample to infer the topics that could have generated it. The researcher decides how many topics the model will return (20 worked best when I ran the model), and then, through a process called Gibbs sampling, the topic model "sorts" the text into that number of topics (Jockers). The end result is a set of lists of highly correlated words, each list making up a different "topic." Another way of understanding it is this: when Ellison uses the word "invisibility," he uses the words "music," "light," "hole," and "world" disproportionately often compared to how often he uses those words when he has not just used the word "invisibility." In plain English: when Ellison talks about invisibility, he also often talks about music, light, the protagonist's hole, and the world.
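For the curious, here is a minimal sketch of that workflow in R, using the tm and topicmodels packages. This is not my original 2017 pipeline; the file name and the 1,000-word chunk size are placeholders for whatever text and chunking you choose.

```r
library(tm)
library(topicmodels)

# Read a plain-text copy of the novel (placeholder path) and split it into
# roughly 1,000-word chunks, which become the model's "documents."
raw    <- paste(readLines("invisible_man.txt"), collapse = " ")
words  <- strsplit(raw, "\\s+")[[1]]
chunks <- sapply(split(words, ceiling(seq_along(words) / 1000)), paste, collapse = " ")

# Build a document-term matrix of word counts.
corpus <- VCorpus(VectorSource(chunks))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("en"))
dtm    <- DocumentTermMatrix(corpus)

# k is the researcher-chosen number of topics; Gibbs sampling does the "sorting."
model <- LDA(dtm, k = 20, method = "Gibbs", control = list(seed = 1, iter = 2000))

# Each column is one "topic": the words most strongly associated with it.
terms(model, 10)
```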
You can see this in the word frequency plot for each topic -- words are not randomly or evenly distributed through the novel, but clustered into "topics" based on what Ellison is writing about on a given page or in a given chapter.
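Here is a sketch of one way to draw that kind of per-topic word plot, assuming the `model` object from the sketch above and using tidytext to pull out each topic's word probabilities (this is a generic approach, not the exact chart in the project):

```r
library(tidytext)
library(dplyr)
library(ggplot2)

# Per-topic word probabilities ("beta"), keeping the ten strongest words per topic.
topic_words <- tidy(model, matrix = "beta") %>%
  group_by(topic) %>%
  slice_max(beta, n = 10) %>%
  ungroup()

# One small bar chart per topic, with words ordered within their own topic.
ggplot(topic_words, aes(x = reorder_within(term, beta, topic), y = beta)) +
  geom_col() +
  coord_flip() +
  scale_x_reordered() +
  facet_wrap(~ topic, scales = "free_y")
```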
In progress: A short tutorial on how I used d3.dispatch in this project.