My end-of-semester project is a proof of concept for a tool called myopic appliance. Myopic appliance takes its inspiration from, and is a response to, Voyant Tools. That web application takes in plain text, performs natural language processing (NLP), and produces quantified data visualizations and tables for the input text. In the context of the humanities, this type of analysis is called distant reading. The question explored in creating myopic appliance is: can computational techniques like NLP be used to conduct close reading, a method of literary study in which the reader observes how word choice, syntax, and sentence sequencing interact with the content to inform the meaning of a written passage?
I used the phrase “proof of concept” above purposefully, rather than “prototype.” The coding completed over the past three weeks demonstrates the feasibility of applying NLP and web styling in aid of close reading. What was built in this time period can be used on any plain text, albeit with a threshold of technical know-how as a barrier to entry. The steps are as follows:
- Clone the repo: `git clone [email protected]:klp/myoptic_appliances.git`
- Create one or more `.txt` files and save them to the `source_txt` folder in the project
- Install the required Python packages with `pip install -r requirements.txt`, either at the system level or in a virtual environment (e.g. `python -m venv name_of_virtualenv`)
- Open the `process_text.py` file and update the `file_paths` list with the files to process (e.g. `"source_txt/example.txt"`)
- Run `process_text.py` with a Python interpreter, which generates a JSON file in `processed_txt` that maps all NLP operations
- Open `script.js` and update the `jsonPaths` variable with the `processed_txt` path(s) to the files you want to ingest into the tool (e.g. `"processed_txt/example.json"`)
- Start up a web server to host the `index.html` page. At the CUNY Graduate Center, students are fond of Microsoft’s Visual Studio Code, and use the Live Server extension
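For orientation, the processing step in the list above can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not the contents of the project's `process_text.py`: the helper names (`naive_tag`, `output_path_for`, `process_files`) and the JSON shape are hypothetical, and the stand-in tagger only separates words from punctuation, where the real script relies on an NLP library.

```python
import json
import re
from pathlib import Path


def naive_tag(text):
    """Hypothetical stand-in for a real NLP pipeline: label tokens WORD or PUNCT."""
    tokens = re.findall(r"\w+|[^\w\s]", text)
    return [
        {"text": tok, "pos": "WORD" if re.match(r"\w", tok) else "PUNCT"}
        for tok in tokens
    ]


def output_path_for(source_path, out_dir="processed_txt"):
    """Map e.g. source_txt/example.txt to processed_txt/example.json."""
    return Path(out_dir) / (Path(source_path).stem + ".json")


def process_files(file_paths, out_dir="processed_txt", tag=naive_tag):
    """Tag each text file and write one JSON file per input (assumed layout)."""
    written = []
    for source in file_paths:
        text = Path(source).read_text(encoding="utf-8")
        target = output_path_for(source, out_dir)
        target.parent.mkdir(parents=True, exist_ok=True)
        payload = {"source": str(source), "tokens": tag(text)}
        target.write_text(json.dumps(payload, indent=2), encoding="utf-8")
        written.append(target)
    return written
```

Calling `process_files(["source_txt/example.txt"])` would write `processed_txt/example.json`, which is the kind of path the `jsonPaths` variable in `script.js` expects. Passing the tagger in as the `tag` argument is only a convenience for the sketch; it makes it easy to swap the placeholder for a real NLP library.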
This is far from a prototype, which would represent a step towards a final product. But it is possible to experiment with different source texts following this simple, if not easy, process.
As is often the case with digital projects, the final output deviates from the initial proposal. I moved away from a deformance-based meshing of two texts published the same year in different countries with similar cultural impacts (Stowe’s Uncle Tom’s Cabin and Turgenev’s A Sportsman’s Sketches). Two factors went into this deviation. First, before working on the web presentation layer, I cobbled together a brittle command line utility to experiment with combining sentences and swapping parts of speech between the two texts, and found the results wanting. I opted to use three chapters from Second, feedback across multiple class sessions reached a consensus that this project appears to veer into pedagogical territory. Because the first functionality focused on emphasizing and de-emphasizing parts of speech, it seemed a natural fit to divert from the deformance angle to a more straightforward and predetermined method of manipulating text.
Challenges, I’ve had a few, but not too few to mention. After a significant break from using popular Python NLP libraries like Natural Language Toolkit (nltk) and even the industrial-strength spaCy, I had forgotten how much the output requires finessing. To avoid context switching and, frankly, to strengthen a particular skill, perhaps advisedly, I moved some of that finesse to the client-side JavaScript. The concern with that decision comes from previous experience trying to process text in the browser, and the potential for poor performance (see Our Mutual Language Processor). The fear of performance issues led me to the typewriter-like display of text across the screen in myopic appliance. I stuck with that effect even when no performance issues emerged because it felt in keeping with the close reading ethos of the project.
In any event, the in-browser text processing resembled Whac-A-Mole at times. At first, all punctuation was treated as words, creating all sorts of stray spacing. After six hours spent trying to convert text between underscores into italicized words, I’ve given up for this particular iteration. Dashes (i.e. ‘-‘) weren’t labeled as punctuation, and required their own special handling that sometimes works predictably.
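To make the spacing and underscore problems concrete, here is a small sketch of the two operations. It is written in Python purely for illustration; the project does this work in client-side JavaScript, and the function names, the punctuation set, and the regular expression are all assumptions rather than the project's code. Dashes and opening brackets are deliberately left out, since they need rules of their own.

```python
import re

# Assumed set of punctuation that attaches to the preceding token without a space.
CLOSING_PUNCT = {".", ",", ";", ":", "!", "?", ")", "”", "’"}


def join_tokens(tokens):
    """Rebuild a sentence from tokens, with no space before closing punctuation."""
    out = ""
    for tok in tokens:
        if out and tok not in CLOSING_PUNCT:
            out += " "
        out += tok
    return out


def underscores_to_italics(text):
    """Wrap _marked_ spans in <em> tags, leaving snake_case words untouched."""
    # The lookarounds require that the underscores sit at word boundaries.
    return re.sub(r"(?<!\w)_([^_\n]+)_(?!\w)", r"<em>\1</em>", text)
```

For example, `join_tokens(["Hello", ",", "world", "."])` gives `Hello, world.` and `underscores_to_italics("a _very_ long day")` gives `a <em>very</em> long day`. A pattern like this only handles emphasis that opens and closes on the same line, which may be part of why the real problem proved harder.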
Another challenge for me: my still nascent frontend web skills like CSS, JavaScript and UI design. Most of my programming experience includes backend APIs, microservices, and data processing. My standards are high enough in the technical realm that I thought they would translate into the world of frontend engineering. Not entirely so. While the digital humanities mentality welcomes projects that have some rough edges, unintentional stylistic choices that diverge from what’s recognizable on the internet risk dismissal by the interactor. Getting margin, padding, and flexbox concepts to work exactly as I want them still presents a challenge. Intuiting UI/UX is also not a real thing, though I think building a crummy UI and refining it feels faster and easier than sweating the interaction details ahead of development. I also split out the UI into a logical set of steps, which introduced a number of bugs. For instance, there was a state in which you could both emphasize and de-emphasize parts of speech.
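One way to rule out the conflicting emphasize and de-emphasize state is to store a single mode per part of speech instead of two independent flags. The sketch below is in Python for illustration only and assumes the conflict was per part of speech; the names `Mode` and `PosDisplayState` are hypothetical and do not come from the project, whose UI logic lives in JavaScript.

```python
from enum import Enum


class Mode(Enum):
    NEUTRAL = "neutral"
    EMPHASIZE = "emphasize"
    DEEMPHASIZE = "de-emphasize"


class PosDisplayState:
    """Track exactly one display mode per part-of-speech tag."""

    def __init__(self):
        self._modes = {}

    def set_mode(self, pos, mode):
        # Assigning replaces any earlier mode, so a tag can never hold two at once.
        self._modes[pos] = mode

    def mode_for(self, pos):
        # Tags that were never set fall back to the neutral display.
        return self._modes.get(pos, Mode.NEUTRAL)
```

With this shape, setting `"NOUN"` to `Mode.EMPHASIZE` and then to `Mode.DEEMPHASIZE` leaves only the second mode in effect, so the contradictory state cannot be represented at all.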
When looking at the iteration completed for this class, relating the output to the course material can be a stretch. I never quite got to the exploration of alea or mimicry, though ilinx could sneak into the idea of collecting parts of speech when related to the original text. I liken the play at work here to playing a piece of music.
One neat idea I had during the development process was using GitHub releases. This allowed me to bundle up versions of my application into zip files. I also made the releases available on the GitHub Pages site, linked below:
- https://klp.github.io/myoptic_appliances/versions/v0.2.0/
- https://klp.github.io/myoptic_appliances/versions/v0.3.0/
- https://klp.github.io/myoptic_appliances/versions/v0.3.1/
- https://klp.github.io/myoptic_appliances/versions/v0.4.0/
- https://klp.github.io/myoptic_appliances/versions/v0.5.0/
- https://klp.github.io/myoptic_appliances/versions/v0.5.1/
Finally, if we were pushing toward a v1 of this project, uploading your own text would have to be the bare minimum, I would wager, as it provides the most value to someone using this tool. Expecting most users to go through the process above to get going on a text of their own is not realistic in most scenarios. My guess is that the distance between v0.5 and v1.0 would be much longer than the distance from v0 to now.









