Unveiled during our trip to the Scholars’ Lab at the University of Virginia, Prism is a “crowdsourcing” tool that lets users analyze a given text by highlighting its words and phrases in up to three color-coded categories.
Prism has many applications and benefits. The coolest part of the tool is that the same functions serve different purposes across disciplines and professions. For instance, in an English classroom, Prism can be used to interpret the themes of a text (e.g., rhetoric, orientalism, and social Darwinism in Thomas Jefferson’s Notes on the State of Virginia). Marketing researchers can use Prism in focus groups to elicit how certain writings make participants feel (i.e., positive, negative, or indifferent). With any text, Prism can prompt users to sort the ostensibly similar into distinct buckets.
For our purposes, Prism can play a very special role in interpreting the thousands of pages of literature on coeducation at Washington and Lee University. Using categories such as pro, con, and indifferent, visitors can mark whether an artifact is for or against coeducation as a whole. Furthermore, if users rate official statements and decisions as positive, negative, or neutral, the university can better understand how to craft future decisions so that they are more readily accepted. As a form of crowd-sourced sentiment analysis, Prism can provide a reliable way to categorize material that would otherwise be difficult to categorize without extensive interpretation.
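To make the crowd-sourcing idea concrete, here is a minimal sketch of how many readers' labels for one document could be boiled down to a single result. This is not Prism's actual code or data format; the function name tally_labels and the sample votes are made up for illustration, and it assumes each reader assigns exactly one label to the whole document.

```python
from collections import Counter

def tally_labels(labels):
    """Return (winning label, share of votes) for one document's reader labels."""
    if not labels:
        raise ValueError("need at least one label")
    counts = Counter(labels)
    # most_common(1) gives a one-item list holding the top (label, count) pair.
    winner, votes = counts.most_common(1)[0]
    return winner, votes / len(labels)

# Hypothetical labels from five readers of a single letter.
votes = ["pro", "con", "pro", "indifferent", "pro"]
winner, share = tally_labels(votes)
print(winner, share)
```

The share matters as much as the winner: a label that wins with barely more votes than the runner-up tells us the readers disagreed, which is worth knowing before we file the artifact under one category.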
Of course, the tool has limitations for our project. The greatest is that each text must be OCRed and uploaded separately. Second, the tool is limited to three categories, which prevents an overabundance of information but can also keep more complicated questions from being asked. Third, there is not yet an automated way to upload a large number of texts or to download them after they have been interpreted.
Visualization tools help us understand large data sets by representing large numbers or juxtapositions with pictures and animations. Cirrus lets users get a sense of a text by way of a word cloud. (The tool is part of a much larger visualization experience from Voyant Tools.)
Beyond simply showing the most used words in a text, Cirrus also gives users the option to filter out certain words that may distract from the main points of a text, using what Cirrus calls a “stop words list.” For instance, running Cirrus on its pre-loaded database of Shakespeare’s plays, we find that the most prominent words are and, the, to, that, my, and your. However, once the generic stop words list is applied, the most prominent words are thou, shall, king, lord, come, and thee. This is a much more recognizable and helpful list of words for understanding Shakespeare.
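The effect of a stop words list is easy to see in a few lines of code. The sketch below is not how Cirrus itself is implemented; it is a rough illustration of the idea, and the function name top_words, the tiny stop list, and the sample sentence are all invented for the example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop list; a real one would be far longer.
STOP_WORDS = {"and", "the", "to", "that", "my", "your"}

def top_words(text, stop_words=frozenset(), n=5):
    """Return the n most frequent words in text, skipping any in stop_words."""
    # Lowercase the text and pull out runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in stop_words)
    return [word for word, _ in counts.most_common(n)]

sample = "the king and the lord, the king"
print(top_words(sample, n=2))              # → ['the', 'king']
print(top_words(sample, STOP_WORDS, n=2))  # → ['king', 'lord']
```

Without the stop list, the filler word "the" tops the count; with it, only the content words are left, which is the same shift we see in the Shakespeare example.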
If we process all of the letters, articles, and reports from the coeducation debate through Cirrus, I predict we will see a special focus on culture and tradition. Of course, we will need to use a stop words list that includes coeducation, W&L, Washington and Lee, and a variety of other words that are expected to appear and would not give us much insight.
Even after we process texts through Cirrus, we will still face the problem of potentially misleading information. For instance, those who were most impassioned on either side of the coeducation decision may dominate the texts, so the results may reveal a polarity without displaying the moderate opinions. Furthermore, highlighting the words that appear most often in a text may not surface the rarely voiced opinions that carried a lot of weight in the W&L community, or opinions that are omitted by sloppy stop listing.
The two tools that I have discussed are both pleasing to the eye and allow users to potentially learn something new and surprising. They are not perfect, but the preponderance of each tool’s beneficial effects calls for its use on our project.