On Distant and Close Reading Again

For the development of the “Pico Project,” we are considering the application of “topic modelling” techniques to detect the sources of the Conclusiones CM in the available digitised corpora of some medieval authors (Albertus Magnus, Thomas Aquinas and possibly John Duns Scotus). Similar attempts have already been carried out successfully; cf. Timothy Allen et al., “Plundering Philosophers: Identifying Sources of the Encyclopédie,” http://quod.lib.umich.edu/j/jahc/3310410.0013.107/–plundering-philosophers-identifying-sources?rgn=main;view=fulltext, and Glenn H. Roe, “Intertextuality and Influence in the Age of Enlightenment: Sequence Alignment Applications for Humanities Research,” http://www.dh2012.uni-hamburg.de/conference/programme/abstracts/intertextuality-and-influence-in-the-age-of-enlightenment-sequence-alignment-applications-for-humanities-research/. A procedure associated with distant reading can thus be of help in identifying possible sources of a given text.

So far, nothing new. But topic modelling may also be brought to bear on the practice of Linked Open Data annotation, a procedure more commonly associated with close reading. For identified topics, that is, groups of terms that frequently occur together, can provide actually observed data with which to define controlled languages and, possibly, to develop specific ontologies for annotating the texts. If we do this, instead of “Using semantic enrichment to enhance big data solutions” (see http://www-01.ibm.com/software/ebusiness/jstart/semantic/), we may use topic modelling to enhance annotation and semantic enrichment or, to put it otherwise, we may use distant reading to enhance close reading.


Distant and Close Reading

I must confess that I am still an affectionate adherent of “close” reading. Nevertheless, I am powerfully impressed by the results of “distant” reading. So I won’t engage here in the mock battle between the two opposing parties, but rather suggest another possible way of creating “synergistically recursive interactions” between distant and close reading (Katherine Hayles, How We Think, 2012, p. 31). Aerial photography allows us to see archeological sites that had gone unnoticed, but then we have to dig…

The reason for this post can be disclosed at once: I recently wrote a letter to Ernesto Priani about the Pico Project, and when I congratulated Massimo Riva on the opening of this blog, I was prompted, in reply, to publish my letter. So let me try to contextualise it. Ernesto and I had an exchange about annotation in the discussion that was started to prepare the new project proposal: I was insisting on the use of linked data to bring to the fore intra-textual relations among terms, whereas Ernesto was stressing the need to point out inter-textual relations to other works, either sources or later works influenced by Pico. We agreed that the two concerns could indeed be reconciled, and lately I have been reflecting on a practical approach, which I presented to Ernesto.

Before translating my letter, a few explanatory remarks are in order. Pico’s 900 Theses or Conclusiones CM are a collection of statements by past philosophers of all schools, together with a collection of statements of his own, all of which aim at confirming their possible overall concordance. To identify the exact sources of the statements that Pico reports is a daunting philological task. But in the case of Thomas Aquinas there might be a chance… so let me (more or less) translate what I wrote:

« In some cases—I am thinking of Thomas Aquinas—we have at our disposal the entire corpus online.  A sort of “non-consumptive reading” or “topic modelling” may be of help, I believe, in finding within Aquinas’ corpus the passages referred to by each one of the theses attributed to him by Pico in his Conclusiones.

« Finding the exact references through this form of “distant reading” may be very helpful indeed, for here lies a chance to satisfy both our requirements and to reconcile our two distinct points of view.  Singling out source references in this way can offer us a heuristic basis for a subsequent analysis and “close reading” of Pico’s text, aimed at a critical interpretation of his thought.

« In “topic modelling,” a “topic” is defined in the following way: “A ‘topic’ consists of a cluster of words that frequently occur together” (cf. MALLET). In relation to Pico’s theses, and by referring to their sources, it then seems possible to identify distinct “topics,” which may be used as an interpretative device to analyse Pico’s works, possibly enabling us to develop, in a bottom-up way, fragments of controlled vocabularies or ontologies, which can be taken as a basis for a systematic annotation of his texts and the production of linked open data for their semantic enrichment.

« This is, in short, the idea I am proposing. As a tool to identify source references, instead of MALLET (see above), I would prefer the “word2vec” approach, which consists of a vector representation of words based on their co-occurrence. A very interesting feature of this method is that by adding or removing a term in a cluster of words that we may choose to define a “topic,” the set of the passages referred to changes radically. This method seems to me very interesting indeed, because it brings to mind the notion of ‘language games’ introduced by Wittgenstein, according to which the meaning of a term is defined by the set of its relations to all the other terms in a given game. And I think that this particular aspect of the “word2vec” approach can bring about, along with significant results, also very important theoretical insights ».
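
To make the idea in the letter a little more concrete, here is a minimal sketch in Python. It is not word2vec itself, which learns dense vectors with a neural model, and it is not MALLET; it swaps in plain co-occurrence counts as a simplified stand-in, so that the mechanism (words as vectors of their neighbours, a “topic” as a cluster of terms, passages ranked against that cluster) stays visible. The passages are invented toy sentences, not quotations from Aquinas, and every name in the code (PASSAGES, STOPWORDS, cooccurrence_vectors, rank_passages) is hypothetical.

```python
import math
import re
from collections import Counter

# Invented toy passages standing in for a digitised corpus (assumption:
# these are NOT real quotations from any author).
PASSAGES = {
    "p1": "the intellect abstracts the species from the phantasm",
    "p2": "the will moves the intellect toward the good",
    "p3": "the species in the intellect is a likeness of the thing",
    "p4": "the good is the object of the will",
}

# A tiny, hand-picked stopword list (assumption, for the toy data only).
STOPWORDS = {"the", "of", "a", "is", "in", "from", "toward"}


def tokenize(text):
    """Lowercase the text and keep alphabetic words that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def cooccurrence_vectors(passages):
    """Represent each word by counts of the other words sharing a passage with it."""
    vectors = {}
    for text in passages.values():
        words = set(tokenize(text))
        for word in words:
            vec = vectors.setdefault(word, Counter())
            for other in words:
                if other != word:
                    vec[other] += 1
    return vectors


def sum_vectors(words, vectors):
    """Add up the vectors of the given words; unknown words contribute nothing."""
    total = Counter()
    for word in words:
        total.update(vectors.get(word, Counter()))
    return total


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_passages(terms, passages, vectors):
    """Rank passages by similarity to a cluster of terms (a hand-made 'topic')."""
    query = sum_vectors(terms, vectors)
    scored = [
        (pid, cosine(query, sum_vectors(tokenize(text), vectors)))
        for pid, text in passages.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    vectors = cooccurrence_vectors(PASSAGES)
    # Changing a single term in the cluster changes which passage ranks first.
    for cluster in (["species", "phantasm"], ["species", "likeness"]):
        print(cluster, rank_passages(cluster, PASSAGES, vectors))
```

On a corpus of four sentences the effect of swapping one term is naturally modest; whether the shift is as radical on a full corpus as the letter suggests is something only an experiment with real texts and a properly trained model could show.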

Dino Buzzetti