- Daniel J. Cohen, “From Babel to Knowledge: Data Mining Large Digital Collections,” Essays on History and New Media. http://chnm.gmu.edu/essays-on-history-new-media/essays/?essayid=40
- Ted Underwood, “Where to Start with Text Mining,” The Shell and the Stone. http://tedunderwood.com/2012/08/14/where-to-start-with-text-mining/
- John Theibault, “Visualizations and Historical Arguments,” Writing History in the Digital Age, eds. Jack Dougherty and Kristen Nawrotzki. http://writinghistory.trincoll.edu/evidence/theibault-2012-spring/
It is interesting to consider how search engines and APIs can make sense of a collection of sources that would otherwise lack cohesion and be hard to understand. Digitizing history clearly improves how we organize and access historical aids, tools, and databases. I had not heard of Syllabus Finder or H-Bot before these readings, so when I looked them up and tried them myself, I was pleasantly surprised at how much I found. And these are only two of many search tools that let users reach a wide variety of information on a specific topic. H-Bot intrigues me in particular because you can ask it specific questions, something few search engines allow, since most match key phrases rather than answer a question. Tools like these are instrumental to the progress of digital history. As Daniel J. Cohen says: “these computational methods, which allow us to find patterns, determine relationships, categorize documents, and extract information from massive corpuses, will form the basis for new tools for research in the humanities and other disciplines in the coming decade.” The patterns and answers these tools produce will help us, as historians, further develop our analysis of the subjects that interest us.
Text mining, and relying on a larger body of work, helps historians answer their questions and gain insight into them. However, not enough material has been digitized yet to make full use of these methods. Digitizing documents obviously costs money and time, which is a deterrent, but having digitized primary documents on the web is enormously helpful. The Library of Congress website is by far my favorite because of the wide range of material and time periods it has digitized. Still, I wish archival research centers would upload more primary sources, especially in areas that are less commonly accessed or less accessible to the public. Programming and digitization not only make documents easier for researchers to reach, but also help categorize and organize primary documents held in many different locations. With this, word graphs (such as those in Underwood’s article) can be built from multiple sources, drawing relationships between specific terms and time periods. Tools such as the Google Ngram Viewer help trace history over time, something that would have been difficult to achieve before. As we see in Writing History in the Digital Age, graphs, tables, maps, and diagrams can turn digitized information into a concise and cohesive visual work. This is fascinating because a single image can show a multitude of factors, yet every one of these visuals can only draw on the works that have been digitized so far.