On Distant and Close Reading Again

Our current thinking about the development of the “Pico Project” focuses on applying “topic modelling” techniques to detect sources of the Conclusiones CM in the available digitised corpora of some medieval authors (Albertus Magnus, Thomas Aquinas and possibly John Duns Scotus). Similar attempts have already been carried out successfully: cf. Timothy Allen et al., “Plundering Philosophers: Identifying Sources of the Encyclopédie,” http://quod.lib.umich.edu/j/jahc/3310410.0013.107/–plundering-philosophers-identifying-sources?rgn=main;view=fulltext, and Glenn H. Roe, “Intertextuality and Influence in the Age of Enlightenment: Sequence Alignment Applications for Humanities Research,” http://www.dh2012.uni-hamburg.de/conference/programme/abstracts/intertextuality-and-influence-in-the-age-of-enlightenment-sequence-alignment-applications-for-humanities-research/. A procedure associated with distant reading can thus help identify possible sources of a given text.
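
To make this concrete, here is a minimal sketch, in Python, of one way such source detection could be attempted: ranking candidate passages from a digitised corpus by lexical similarity to a single thesis. All texts, names and parameters below are purely illustrative (a real experiment would work on lemmatised Latin and a far larger corpus):

```python
# A minimal sketch: rank candidate source passages by lexical similarity
# to one of Pico's theses. All texts here are illustrative placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical inputs: a thesis from the Conclusiones and candidate
# passages drawn from a digitised corpus (e.g. Aquinas).
thesis = "anima intellectiva est forma corporis"
passages = [
    "anima intellectiva est forma substantialis corporis humani",
    "voluntas movet intellectum ad finem",
    "materia prima est pura potentia",
]

# Build a TF-IDF representation over the thesis and all passages.
vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform([thesis] + passages)

# Compare the thesis (row 0) against every passage (rows 1..n) and
# print the candidates in descending order of similarity.
scores = cosine_similarity(matrix[0], matrix[1:]).flatten()
for passage, score in sorted(zip(passages, scores), key=lambda p: -p[1]):
    print(f"{score:.3f}  {passage}")
```

TF-IDF similarity is, of course, only a crude stand-in for the sequence-alignment and topic-modelling methods used in the studies cited above, but it illustrates the basic retrieval step on which they build.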

So far, nothing new. But the application of topic modelling may also be brought to bear on the practice of Linked Open Data annotation, a procedure more commonly associated with close reading. Identified topics, i.e. groups of terms that frequently occur together, can provide actually observed data with which to define controlled vocabularies and, possibly, to develop specific ontologies for annotating the texts. If we do this, instead of “Using semantic enrichment to enhance big data solutions” (see http://www-01.ibm.com/software/ebusiness/jstart/semantic/), we may use topic modelling to enhance annotation and semantic enrichment or, to put it another way, we may use distant reading to enhance close reading.
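
As a rough sketch of this second step, the terms of an extracted topic could be recast as a small SKOS concept scheme from which annotations might then draw. The example below uses Python’s rdflib; the namespace, topic terms and labels are all hypothetical:

```python
# A minimal sketch: recast the terms of one extracted topic as a small
# SKOS concept scheme. Namespace, terms, and labels are hypothetical.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, SKOS

PICO = Namespace("http://example.org/pico/vocab/")  # illustrative namespace

topic_terms = ["anima", "intellectus", "forma"]  # e.g. output of a topic model

g = Graph()
scheme = PICO["topic-01"]
g.add((scheme, RDF.type, SKOS.ConceptScheme))
g.add((scheme, SKOS.prefLabel, Literal("Topic 01", lang="en")))

for term in topic_terms:
    concept = PICO[term]
    g.add((concept, RDF.type, SKOS.Concept))
    g.add((concept, SKOS.prefLabel, Literal(term, lang="la")))
    g.add((concept, SKOS.inScheme, scheme))

print(g.serialize(format="turtle"))
```

A vocabulary grown in this bottom-up way would record terms actually observed in the corpus, rather than categories imposed in advance.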

Planning the VHL/DH meeting @ Brown (April 17-18, 2015)

I met with Andy Ashton and Elli Mylonas last week, and we began to jot down a plan for our April 2015 meeting. It looks like the dates are definitely going to be April 17 and 18. We thought that this two-day symposium/workshop could focus on three general themes, along the lines of another symposium held at Brown a couple of years ago, and that for each of these main themes we could have a “keynote” speaker and a series of two or three panelists responding to the keynote and/or presenting their work in progress.

Here is a very tentative, preliminary outline of the themes and panels:

Themes/panels:

Friday, April 17, afternoon 2-5pm

1. Annotations (Ontologies and Tools). Proposed keynote: Jan Christoph Meister (University of Hamburg, DE)

Saturday, April 18, morning

2. Corpora and Collections (Repositories and Interoperability). Proposed keynote: Trevor Muñoz (University of Maryland)

Saturday, April 18, afternoon

3. Scholarly Networks (Scholarly Communication and Collaboration). Keynote: Ray Siemens (University of Victoria, BC, director of INKE), “Planning for Renaissance Knowledge Network” (followed by a final discussion)

It would be great to involve these scholars who work at nearby institutions:

Julia Flanders (Northeastern University, Boston)
John Unsworth (Brandeis University, Boston)
MIT Annotation Studio (Cambridge, MA): Kurt Fendt, Jim Paradis (others?)

And we could also reach out to people from the Shared Canvas project: Robert Sanderson (Los Alamos National Laboratory) and Benjamin Albritton (Stanford University), as well as European colleagues such as Fabio Ciotti (Roma, Tor Vergata), Riccardo Pozzo (CNR), and Paul Caton (King’s College, London). Other scholars from the DH group @ Brown will be invited, as will colleagues from other U.S. DH projects in Italian Studies (such as the Dartmouth Dante Lab). Invitations to scholars outside the U.S. will obviously depend on available funding, and we will also have to wait and see who is available to join us on the proposed dates. The sooner we begin to contact potential invitees, the better. So please let me know your thoughts about all this, at your earliest convenience!

Planning the next phase

About a month has gone by since I forwarded you the detailed NEH feedback for our unfunded grant proposal. I thank you all for your renewed support. However, I believe that, in light of the feedback we received, we need a longer gestation period before we resubmit the proposal to NEH or another funding agency. Rather than rushing (again) to meet the February deadline of the NEH Implementation program, to which we submitted our proposal last year, it makes more sense to take a little more time in order to rethink, reframe and hopefully refine our proposal and, more importantly, reaffirm and clarify our goals.

One of the reviewers, indeed the most critical of our proposal, also offered perhaps the most constructive criticism. While this reviewer found “some merit” in the proposal (indeed, s/he considered it “in many ways an excellent proposal to develop a digital editing and curation environment for texts from Early Modern Italy”), s/he also expressed the following concerns:

“- Year 1 seems to include many planning, prototyping, and experimental activities. Despite the fact that this initiative would build on earlier DH projects, the initiative may benefit from a 1 year start up phase prior to an implementation grant. In particular, I note from the detailed timeline included in an appendix that project staff would need to familiarize themselves with Shared Canvas (a complex data model) during the grant term and draft a feature list for annotation functionality. These seem like preliminary, pre-implementation activities.”

Personally, I believe this could be, in a nutshell, our plan for the next few months. This planning phase could (and perhaps should) also include another suggestion made by the same reviewer:

“- The environmental scan mentions MESA, but the project team should note that Renaissance scholars are also planning an ARC node to meet their needs named the Renaissance Knowledge Network (ReKN). The project team should be consulting closely with ReKN members and particularly with Ray Siemens at University of Victoria who is leading the initiative. Since the project team will be using Shared Canvas and OAC standards, interoperability with other resources and projects should be an essential part of a planning phase.”

These seem very specific and useful goals to be discussed among us, also in view of our planned meeting here at Brown, for which I’d like to propose the following dates: April 17-18 or May 1-2, 2015 (please let me know at your earliest convenience which of these dates you prefer).

Of course, there might be other ideas (and other tools) to be taken into consideration and perhaps also tested in this planning phase, according to the specific goals of our various projects, and I invite you to propose them in response to this post (Dino, for example, has already posted some interesting suggestions for the next phase of the Pico project). The underlying question for me remains: what can the VHL provide that would make it possible for us to meet our specific research goals and, in the process, allow us to develop our scholarly network to include operative connections among our university libraries? I remain convinced that “l’unione fa la forza” (unity makes strength), and I consider this preliminary discussion essential for the successful planning of our seminar/workshop in the spring (more thoughts about this shortly).

A Call for Papers that may be of interest for our projects

DH-CASE II: Collaborative Annotations in Shared Environments: metadata, tools and techniques in the Digital Humanities will be held in conjunction with the DocEng 2014 conference.

I copy here a message sent by Patrick Schmitz:

We invite submissions for DH-CASE II: Collaborative Annotations in Shared Environments: metadata, tools and techniques in the Digital Humanities, to be held in conjunction with the ACM Document Engineering 2014 conference.

Digital Humanities is rapidly becoming a central part of humanities research, drawing upon tools and approaches from Computer Science, Information Organization, and Document Engineering to address the challenges of analyzing and annotating the growing number and range of corpora that support humanist scholarship.
== Focus of workshop
From cuneiform tablets, ancient scrolls, and papyri, to contemporary letters, books, and manuscripts, corpora of interest to humanities scholars span the world’s cultures and historic range. More and more documents are being transliterated, digitized, and made available for study with digital tools. Scholarship ranges from translation to interpretation, from syntactic analysis to multi-corpus synthesis of patterns and ideas. Underlying much of humanities scholarship is the activity of annotation. Annotation of the “aboutness” of documents and entities ranges from linguistic markup, to structural and semantic relations, to subjective commentary; annotation of “activity” around documents and entities includes scholarly workflows, analytic processes, and patterns of influence among a community of scholars. Sharable annotations and collaborative environments support scholarly discourse, facilitating traditional practices and enabling new ones.

The focus of this workshop is on the tools and environments that support annotation, broadly defined, including modeling, authoring, analysis, publication and sharing. We will explore shared challenges and differing approaches, seeking to identify emerging best practices, as well as those approaches that may have potential for wider application or influence.
== Call
We invite contributions related to the intersection of theory, design, and implementation, emphasizing a “big-picture” view of architectural, modeling and integration approaches in digital humanities. Submissions are encouraged that discuss data and tool reuse, and that explore what the most successful levels are for reusing the products of a digital humanities project (complete systems? APIs? plugins/modules? data models?). Submissions discussing an individual project should focus on these larger questions, rather than primarily reporting on the project’s activities.

This workshop is a forum in which to consider the connections and influences between DH annotation tools and environments, and the tools and models used in other domains, that may provide new approaches to the challenges we face. It is also a locus for the discussion of emerging standards and practices such as OAC (Open Annotation Collaboration) and Linked Open Data in Libraries, Archives, and Museums (LODLAM).
== Submission procedures
Papers should be submitted at www.easychair.org/conferences/?conf=dhcase2014. An abstract of up to 400 words must be submitted by June 1st, and the deadline for full papers (6 to 8 pages) is June 8, 2014. Submissions will be reviewed by the program committee and selected external reviewers. Papers must follow the ACM SIG Proceedings format.
Up to three papers of exceptional quality/impact will be invited to submit an extended abstract (2-4 pages) for inclusion in the DocEng 2014 conference proceedings.
== Key dates:
June 1    Abstracts due (400 words max)
June 8    Full workshop papers due
June 30   Notification of acceptance to workshop; up to 3 papers may be invited to submit extended abstracts
Sept. 16  Workshop
We look forward to seeing you in Ft. Collins!
Workshop Organizers: Patrick Schmitz, Laurie Pearce, Quinn Dombrowski

Distant and Close Reading

I must confess that I am still an affectionate adherent of “close” reading. Nevertheless, I am powerfully impressed by the results of “distant” reading. So I won’t engage here in the mock battle between the two opposing parties, but rather suggest another possible way of creating “synergistically recursive interactions” between distant and close reading (Katherine Hayles, How We Think, 2012, p. 31). Aerial photography allows us to see unnoticed archaeological sites, but then we have to dig…

The reason for this post can be disclosed at once: I recently wrote a letter to Ernesto Priani about the Pico Project, and when I congratulated Massimo Riva on the opening of this blog, I was invited, in reply, to publish my letter here. So let me try to contextualise it. Ernesto and I had an exchange about annotation in the discussion that was started to prepare the new project proposal: I was insisting on the use of linked data to bring to the fore intra-textual relations among terms, whereas Ernesto was stressing the need to point out inter-textual relations to other works, either sources or later works influenced by Pico. We agreed that the two concerns could indeed be reconciled, and I have lately been reflecting on a practical approach, which I presented to Ernesto.

Before translating my letter, a few explanatory remarks are in order. Pico’s 900 Theses or Conclusiones CM are a collection of statements by past philosophers of all schools, together with a collection of statements of his own, all aiming to confirm their possible overall concordance. Identifying the exact sources of the statements that Pico reports is a daunting philological task. But in the case of Thomas Aquinas there might be a chance… so let me (more or less) translate what I wrote:

« In some cases—I am thinking of Thomas Aquinas—we have at our disposal the entire corpus online. A sort of “non-consumptive reading” or “topic modelling” may be of help, I believe, in finding within Aquinas’ corpus the passages referred to by each of the theses attributed to him by Pico in his Conclusiones.

« Finding the exact references through this form of “distant reading” may be very helpful indeed, for here lies a chance to satisfy both our requirements and to reconcile our two distinct points of view.  Singling out source references in this way can offer us a heuristic basis for a subsequent analysis and “close reading” of Pico’s text, aimed at a critical interpretation of his thought.

« In “topic modelling,” a “topic” is defined in the following way: “A ‘topic’ consists of a cluster of words that frequently occur together” (cf. MALLET). In relation to Pico’s theses and by reference to their sources, it then seems possible to individuate distinct “topics” that may be used as an interpretative device to analyse Pico’s works, possibly enabling us to develop, in a bottom-up way, fragments of controlled vocabularies or ontologies that can serve as a basis for a systematic annotation of his texts and the production of linked open data for their semantic enrichment.
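
To give a concrete sense of this notion of a topic, here is a minimal sketch of topic extraction in Python, using gensim’s LDA implementation as a stand-in for MALLET; the toy documents and all parameters are illustrative only:

```python
# A minimal sketch of topic extraction, using gensim's LDA implementation
# as a stand-in for MALLET. The toy documents are illustrative only.
from gensim import corpora, models

documents = [
    ["anima", "intellectus", "forma", "corpus"],
    ["voluntas", "intellectus", "actus"],
    ["materia", "forma", "potentia", "actus"],
]  # in practice: tokenised, lemmatised passages from the corpus

dictionary = corpora.Dictionary(documents)
bow_corpus = [dictionary.doc2bow(doc) for doc in documents]

# Each resulting topic is a weighted cluster of words that tend to
# occur together across the documents.
lda = models.LdaModel(bow_corpus, num_topics=2, id2word=dictionary,
                      random_state=0, passes=10)
for topic_id, words in lda.show_topics(num_topics=2, num_words=4):
    print(topic_id, words)
```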

« This is, in short, the idea I am proposing. As a tool to individuate source references, instead of MALLET (see above), I would prefer the “word2vec” approach, which consists in a vector representation of words based on their co-occurrence. A very interesting feature of this method is that by adding or removing a term in a cluster of words that we may choose to define as a “topic,” the set of passages referred to changes radically. This method seems to me very interesting indeed, because it brings to mind the notion of ‘language games’ introduced by Wittgenstein, according to which the meaning of a term is defined by the set of its relations to all the other terms in a given game. And I think that this particular aspect of the “word2vec” approach can bring about, along with significant results, very important theoretical insights ».
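
For readers who want to experiment with the approach sketched in the letter, here is a minimal word2vec example in Python using gensim (parameter names follow gensim 4.x; the toy corpus and settings are purely illustrative). Note how adding a term to the query cluster changes the resulting neighbourhood, and hence the passages one would retrieve:

```python
# A minimal word2vec sketch with gensim (parameter names follow
# gensim 4.x); the toy corpus and all settings are illustrative only.
from gensim.models import Word2Vec

sentences = [
    ["anima", "intellectiva", "forma", "corporis"],
    ["intellectus", "agens", "anima", "separata"],
    ["materia", "forma", "potentia", "actus"],
] * 50  # repeat the toy corpus so the model has something to learn from

model = Word2Vec(sentences, vector_size=50, window=3, min_count=1,
                 seed=0, epochs=20)

# Querying with a cluster of words: adding or removing a term changes
# the neighbourhood, and hence the passages one would retrieve.
print(model.wv.most_similar(positive=["anima", "forma"], topn=3))
print(model.wv.most_similar(positive=["anima", "forma", "materia"], topn=3))
```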

Dino Buzzetti


Welcome/Benvenuti (Massimo Riva, Director, Virtual Humanities Lab @ Brown University)

This blog is a platform for discussing topics in the Digital Humanities, with a focus on the implementation of an experimental framework for close collaboration among a worldwide network of scholars who contribute to the Virtual Humanities Lab at Brown University and are currently at work on the creation of significant digital resources for the study of various facets of humanist culture.

In the age of data mining, “distant reading” and cultural analytics, we increasingly rely upon automated, algorithm-based procedures to parse the exponentially growing database of digitized textual and visual resources. Yet, within this deeply networked and massively interactive environment, it is crucial to preserve the “expert logic” of primary and secondary sources, expert opinions, textual stability, citations, and so on, which forms the heritage and legacy of humanities scholarship. Scholarly collaboration cannot be limited to the development of tools or the application of tools developed by others, but must envision “a disciplined set of practices that problematizes methodology, tools and interpretation at the same time” (Stefan Sinclair, Introduction: Correcting Methods).

We want to develop “strategies for Scholarsourcing” (D’Iorio-Barbera), as opposed to crowdsourcing, because we believe that comprehensive research protocols for open collaborative work would advance the agenda of networked communities of practice similar to the one envisioned here.