
"Is it okay that Shakespeare provides the model for scholarly digital editions of early modern plays?" by Gabriel Egan

The short answer is "Yes" [SLIDE], but since leaving the matter there would make for rather a short talk I will elaborate by adding some provisos [SLIDE]. Those provisos concern working with a commercial publisher to make printed and online editions, and working with Extensible Markup Language (XML) conforming to the Text Encoding Initiative (TEI) standard.

    The question arises for me because I am a general editor of the New Oxford Shakespeare edition [SLIDE], which when we finish it will consist of six volumes. There are two volumes to go, called the Complete Alternative Versions, and I persuaded my fellow general editors Gary Taylor and Terri Bourus, and the publisher, Oxford University Press, that we should create them from scratch in Extensible Markup Language conforming to the Text Encoding Initiative standard. The Press had up to that point been requiring the editors to submit copy in the form of Microsoft Word documents marked up with an arcane set of symbols that made the start of the first edition of Hamlet look like this:

<D Enter two Centinels.>
+<S1 1.><S2 First Sentinel> <IC>S</IC>Tand: who is that?
=<S1 2.><S2 Second Sentinel> Tis I.
<S1 1.><S2 First Sentinel> O you come most carefully vpon your watch,
<S1 2.><S2 Second Sentinel> And if you meete {Marcellus} and {Horatio},
The partners of my watch, bid them make haste.
$<S1 1.><S2 First Sentinel> I will: See who goes there.

This, I was told, was known as JML or Jowett Markup Language, and I still do not know if they were pulling my leg about that name. [SLIDE] Notice that it uses an old convention that the oldest among you might recognize from the COCOA tag set developed in the 1960s, in which a pair of pointy brackets encloses the text to be marked up, as with the opening stage direction "Enter two Centinels". It also uses the later convention from Standard Generalized Markup Language (SGML) of a pair of pointy brackets forming an opening tag, as we call it, in front of the text to be marked up and a second pair of pointy brackets forming the closing tag after the text. Additionally, whole type lines are marked up by having as their first character a plus sign or an equals sign (indicating the first and last parts of a split verse line) or a dollar sign (indicating an amphibious verse line).

    [BLANK SLIDE] Aside from the barbarity of inflicting this Frankenstein markup language on junior editors, the main drawback of JML was that the software used to create the edition was simply Microsoft Word. No validation of the encoding could be done by the editor as she worked: if she placed her symbols in the wrong places nothing alerted her to the error, and even if the codes were in the right places there was no way to see their effect until the publisher received the encoded file, processed it to produce proofs, and sent those back to the editor. I pressed for the project and the publisher to switch to TEI XML precisely so that the editor could validate her work as she edited and produce a local proof showing the visual effect of her encoding. My proposal, which was accepted, was for the editors to do their work using the Oxygen XML Editor and for the Press to create an Extensible Stylesheet Language Transformation (XSLT) by which the editor could, on her own computer, turn the draft edition into HTML resembling the final edition, for local proofing. The Press would also accept the final copy in XML form for ingestion into its production system for print and online publication.
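    To give a flavour of what such a transformation does, here is a minimal sketch of my own devising, not the Press's actual stylesheet: the element names are standard TEI, and the HTML class name is invented for illustration:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    exclude-result-prefixes="tei">
  <xsl:output method="html"/>
  <!-- Each TEI verse line becomes an HTML paragraph in the local proof -->
  <xsl:template match="tei:l">
    <p class="verse-line"><xsl:apply-templates/></p>
  </xsl:template>
  <!-- Speaker names are set in bold before the speech -->
  <xsl:template match="tei:speaker">
    <b><xsl:apply-templates/></b>
  </xsl:template>
</xsl:stylesheet>

The real stylesheet is, of course, vastly more elaborate, since it must cope with the four kinds of note I will describe in a moment.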

    Of course, there isn't just one standard called TEI XML that we can apply to the editing of early modern plays. There are many choices to be made within the TEI specification, and perhaps the most consequential is the means by which the textual apparatus is created. For those familiar with this part of the TEI specification, I will say that we chose the Parallel Segmentation Method over the Double End-Point Attachment Method for identifying lemmas and their alternative readings. Our prime consideration was that the Parallel Segmentation Method seems the easier for the XML beginner to understand and learn, but it imposes some limitations that I will return to.
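    For those who want that contrast made concrete, here is a rough sketch, anticipating the Edward II example I turn to shortly and using invented anchor identifiers, of how the Double End-Point Attachment Method records a variant. Empty anchor elements mark the boundaries of the lemma within the text, and the apparatus entry lives elsewhere in the file, pointing back at those anchors:

<l>My men, like satyrs <anchor xml:id="a1"/>grazing<anchor xml:id="a2"/> on the lawns</l>
<!-- the apparatus entry, stored apart from the text -->
<app type="textual" from="#a1" to="#a2">
  <rdg wit="#MS">gasing</rdg>
</app>

Because the anchors are empty, they may be dropped almost anywhere without disturbing the document's hierarchy, but the editor must maintain by hand the links between text and apparatus, which is largely what makes this method the harder one for beginners.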

    Oxford University Press has invested in the making of the New Oxford Shakespeare Complete Alternative Versions in this way, not least in paying for a software developer to write the complex Extensible Stylesheet Language Transformation that turns the editor's draft XML into HTML for local proofing. Because the edition will print original-spelling and modern-spelling versions of each work we need to encode four kinds of note. The original-spelling texts will have textual notes detailing readings in textual witnesses other than the base text and lineation notes indicating editorial departures from the lineation of the base text. The modern-spelling texts will have glossarial notes at the bottom of the page and performance notes about theatrical choices in the right margin. Getting these to appear correctly in the local proofs required a lot of work from the XSLT developer.

    As many of you know, the Oxford Marlowe: Collected Works is a new project whose general editors are our hosts Rory Loughnane, Catherine Richardson, and Sarah Dustagheer. Along with Laura Estill I have the honour of being a Digital Advisor to the project. For this ambitious new Marlowe edition, Oxford University Press understandably wanted to take advantage of the effort and money it had put into developing a digital workflow for the completion of the New Oxford Shakespeare. It has been agreed that the editors will, from the outset, work in TEI XML using the same elements, attributes, and workflow as have been developed for the New Oxford Shakespeare. Hence my title about whether this is okay.

    The editorial notes for the Marlowe edition will, if anything, be more complex still than those for Shakespeare. The original-spelling Marlowe will have textual notes and lineation notes as in the original-spelling Shakespeare, but the modern-spelling Marlowe will have three layers of notes at the bottom of the page -- glossarial notes, textual notes, and lineation notes -- and a layer of reception notes (including performance notes) in the right margin.

    The TEI's Parallel Segmentation Method of creating an editorial apparatus relies on first identifying a span of text in the edition being presented to the reader, the lemma, that the editor wishes to make a note about, and then accreting around this lemma the encoding that creates the note. Quite often an editor wants to make multiple notes of different kinds about the same span of text. Take this line from Marlowe's play Edward II [SLIDE]:

<l>My men, like satyrs grazing on the lawns</l>

When we have applied the encoding for the four layers of notes that appear on the Marlowe modern-spelling page this line can become rather buried by the encoding. If we want to record the fact that the reading "grazing" comes from the authority we are calling JONES and that another authority we call MS has the reading "gasing" at this point, the encoding is [SLIDE]:

<l>My men, like satyrs
  <app type="textual">
    <lem wit="#JONES">grazing</lem>
    <rdg wit="#MS">gasing</rdg>
  </app> on the lawns</l>

If we want to provide a commentary note explaining that satyrs are woodland gods, the additional encoding is this [SLIDE]:

<l>My men, like <app type="commentary">
    <lem>satyrs</lem>
    <note>woodland gods</note>
  </app>
  <app type="textual">
    <lem wit="#JONES">grazing</lem>
    <rdg wit="#MS">gasing</rdg>
  </app> on the lawns</l>

If we want to provide a reception note recording that in his film of the play Derek Jarman made a particular visual choice to accompany the words "like satyrs", the additional encoding is this [SLIDE]:

<l>My men, <app type="reception">
    <lem>like <app type="commentary">
        <lem>satyrs</lem>
        <note>woodland gods</note>
      </app>
    </lem>
    <note>In Jarman's film ...</note>
  </app>
  <app type="textual">
    <lem wit="#JONES">grazing</lem>
    <rdg wit="#MS">gasing</rdg>
  </app> on the lawns</l>

Finally, if we want to provide a lineation note recording that another edition broke the line after "satyrs", the additional encoding is [SLIDE]:

<app type="lineation">
    <lem wit="#JONES>
        <l>My men, <app type="reception">
            <lem>like <app type="commentary">
    <lem>satyrs</lem>
    <note>woodland gods</note>
</app>
        </lem>
        <note>In Jarman's film ...</note>
    </app>
    <app type="textual">
        <lem wit="#JONES">grazing</lem>
        <rdg wit="#MS">gasing</rdg>
    </app> on the lawns </l>
    </lem>
    <rdg wit="#SMITH">2 lines: satyrs|</rdg>
</app>

This, I submit, is complex but acceptably complex. So long as editors apply each layer of encoding one at a time -- if they do all their textual notes first, then go back and do all their lineation notes, then their glossarial and reception notes -- I think that editors who are early modern drama experts first and technical experts second can cope with this level of accreted markup. But I think this is at the limit of what we can expect editors to achieve in encoding XML, as it were, by hand.

    Because we have chosen the Parallel Segmentation Method rather than the Double End-Point Method we have imposed certain limitations upon ourselves. Principally, each new layer of encoding for a different kind of editorial note has to be contained within an existing layer or else must wrap itself around an existing layer [SLIDE]. This is the famous Russian Doll principle of XML: no element may sprawl across the boundaries of another element. In our case this means that lineation notes, for instance, can govern only a whole line or a group of lines, and that likewise textual, commentary, and reception notes must discuss either a fragment of a line or a whole line or a group of lines; their subject cannot be the last words on one line and the first words on the next. I will come back to this point about the Russian Doll principle shortly [BLANK SLIDE].
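    To illustrate with an invented case: suppose we adopted SMITH's lineation, breaking the verse after "satyrs", and then wanted a commentary note on the phrase "satyrs grazing". The encoding we would need is not well-formed XML and no parser will accept it:

<!-- NOT well-formed: the app element straddles the boundary between two l elements -->
<l>My men, like <app type="commentary"><lem>satyrs</l>
<l>grazing</lem><note>...</note></app> on the lawns</l>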

    As a model for editing early modern literature using digital markup, the work of Shakespeare has several obvious advantages. His canon of around one million words is unusually large and reasonably varied in styles. He wrote plays in all the familiar genres of comedy, history, tragedy, and tragicomedy, using verse and prose. He wrote short poems, his sonnets, and long narrative poems, his Venus and Adonis and Lucrece. His works come down to us primarily in the form of printed books, but we have one whole manuscript play, Sir Thomas More. A substantial proportion of the works (about one third) were written collaboratively with other authors and/or were adapted by others before our earliest or most-reliable textual witness was created, so we must edit these as inherently co-authored texts. For a model of just how editors should approach this problem, I refer you to Rory Loughnane's essay "Re-editing non-Shakespeare for the Modern Reader" in the Review of English Studies in 2017 (Loughnane 2017). Admittedly, we do not have any examples of Shakespeare's creative writing in the form of prose, but then prose is by some way the easiest textual form to encode in any markup language.

    Most importantly, much of our rich inheritance of textual theory and best practice was developed and honed for use with the works of Shakespeare. The principles of the twentieth-century New Bibliography were largely developed with Shakespeare as the model author. I want now to spend a few minutes illustrating just how exceptionally forward-thinking W. W. Greg, the founding father of the New Bibliography, was in his conception of the key aspects of textuality. Greg grasped the essentially tree-like structure of the textual materials he was dealing with, and the fact that we can encode this structure in multiple forms to suit our varied purposes.

    Greg's book The Calculus of Variants of 1927 is a difficult read, and a difficult book presents two cognitive challenges, not one. The first is just making sense of it, of course, but the second is avoiding a cognitive bias that arises from having succeeded in making sense of it. Almost 40 years ago, Stanley Wells made an astute comment about this cognitive bias in the context of an exceptionally ingenious textual hypothesis devised by John Dover Wilson regarding a crux in Love's Labour's Lost. Wells wrote that "Wilson's solution . . . requires so much effort to understand that a reader who masters it is predisposed to reward himself with belief" (Wells et al. 1987, 271). In the brief account of Greg's achievement that follows, I have tried not to fall into this trap.

    In The Calculus of Variants, Greg drew family trees, stemmata, to show the relationships between the various textual witnesses to a work for cases where one witness was made by copying one of the others. Greg understood that such trees can be represented in multiple ways that all convey the same information [SLIDE]. Here is the nesting of apparatus elements from our encoding of one line of Marlowe's Edward II. [SLIDE x 7] It is a tree structure that we can also represent like this [SLIDE]. These two diagrams convey the same information about the relationships. [SLIDE] And a third way to show these relationships is with one of the diagrams that John Venn used to communicate set theory [SLIDE]. I will give you a moment to convince yourselves that these three pictures show the same relationship: that A has two children called B and C, who are therefore siblings, and that B has a child called D.
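    In XML the same relationship is expressed by containment; using the hypothetical element names A, B, C, and D, it is the trivial document:

<A>
  <B>
    <D/>
  </B>
  <C/>
</A>

B and C are children of A because their tags nest inside A's opening and closing tags, and D likewise nests inside B.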

    Greg was aware that at the heart of what he was trying to do with ancestral relationships of textual witnesses was a branch of set theory, as developed to its highest form in that other nearly unreadable book of this era, Bertrand Russell and Alfred North Whitehead's Principia Mathematica [SLIDE]:

The whole matter is, of course, at bottom one of formal logic, and the necessary foundations are fully set forth by Russell and Whitehead in those sections of Principia Mathematica which deal with the ancestral relation (*R: see Pt. II, Sect. E, *90-*97, in Vol. i; also Introd. sect. vii and Appx. B in the second edition). No doubt, most of what is significant in the present essay could be expressed in their symbolism by anyone sufficiently trained to its use. (Greg 1927, v)

We can show that Greg was right: the various symbolisms for set theory are equivalent and we can move between them. Indeed, the notation using formulas to represent family trees that Greg developed in The Calculus of Variants is, essentially, XML. [SLIDE] Here is Greg's drawing of the relationships between a set of textual witnesses, with his equivalent formula notation written above the tree. [SLIDE] Here is how we would encode the same tree in XML. In quoting XML, we generally use indentation from the left margin to distinguish children from their parents and add a line break after each element. These conventions exist only to make the code more readable by humans: they make no difference to the meaning. [SLIDE] If we flatten our XML by removing the indentations and line breaks we find that Greg's notation is the same as the XML. Greg's system is, I think, the earliest application to textual matters of the insight by the Victorian mathematician Arthur Cayley (1821-1895) that formulas and trees are interchangeable representations of the same structures (Cayley 1857).
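    Using the hypothetical witnesses A, B, C, and D from a moment ago, the indented form

<A>
  <B>
    <D/>
  </B>
  <C/>
</A>

flattens to the single line

<A><B><D/></B><C/></A>

which carries exactly the same information and reads as a formula of just the kind Greg wrote.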

    In The Calculus of Variants, Greg was of course referring to textual witnesses existing in relationships that are like family trees. The use of genealogical trees to represent how textual documents are related one to another by processes of copying in transmission was popularized (although not invented) by the German philologist Karl Lachmann, 1793-1851 (Kenney 1974). It was not until the 1960s that a research group at the computer company International Business Machines (IBM) realized the usefulness to digital representation of treating individual texts as also internally structured as trees. The result was the Standard Generalized Markup Language (SGML) that preceded XML. The internal hierarchical features common to a set of documents -- such as a play being made of a given number of acts, each made of one or more scenes, each made of one or more speeches, each made of one or more verse or prose lines -- can also be abstracted and expressed separately as a tree structure. Thus the internal relationships within textual witnesses are essentially of the same kind as the external relationships between textual witnesses. Both are tree structures. And both can be represented by drawings that actually look like trees (upside-down ones) or by the entirely equivalent formulas used by Greg and by XML, employing only the letters and punctuation found on a typewriter keyboard [BLANK SLIDE].
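    A minimal sketch of that internal hierarchy in TEI terms, using the opening of Hamlet from earlier (the element names are standard TEI; the speech attributions are simplified), looks like this:

<div type="act" n="1">
  <div type="scene" n="1">
    <sp>
      <speaker>First Sentinel</speaker>
      <l>Stand: who is that?</l>
    </sp>
    <sp>
      <speaker>Second Sentinel</speaker>
      <l>Tis I.</l>
    </sp>
  </div>
</div>

Acts contain scenes, scenes contain speeches, and speeches contain lines: a tree, just as a stemma is a tree.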

    To sum up this part of my talk: leaving aside all other considerations, if anyone is going to learn encoding of literary texts they should learn a rigorous and scientifically valid system such as XML, not a proprietary and intellectually incoherent system such as Oxford University Press's JML that we started with. This is not a criticism of publishers. Indeed, we have to work within the limitations of what publishers can practically do if we are to produce editions such as the New Oxford Shakespeare and Oxford Marlowe: Collected Works. The purpose of encoding in these projects is to make a complete-works edition that fits into an existing tradition of such editions. We do not have the luxury of encoding in our XML every feature that we think someone might in the future be interested in. Instead we encode only what we want to have visually distinguished in the final print and online outputs.

    I mention this last point because my experience on another XML project that has just finished -- the online 'Transforming Middlemarch' edition of the 1994 BBC television adaptation of George Eliot's novel -- is that the temptation to encode everything that you know how to encode, because you have read the TEI guidelines carefully, can be almost overwhelming. I would be interested to hear from others who have worked on large TEI encoding projects if that has been their experience. I follow the email technical discussion list for the TEI consortium and I am often amazed at the complexities of encoding that other projects are doing. I cannot fathom the complexities of the Extensible Stylesheet Language Transformation that would be needed to extract the information in this encoding and turn it into something a reader might want to see.

    I do not say that other projects are over-encoding, only that I see merit in keeping the encoding as light and hence as human-readable as possible. The temptation to over-encode that I have witnessed apparently goes back to the beginnings of the TEI, when it was still using SGML instead of XML. Describing this period in the late 1990s, Peter Robinson recalls a wilful avoidance of practicality and the aim being to capture the very essence of a text [SLIDE]:

A constant refrain of TEI meetings was "that [regardless of what it was] is an implementation matter." This trump card was played at moments when the conversation was verging dangerously upon the practical. The implication was the following: we must not allow what is practical to impose upon the purity of our theory. It also happened to be a very safe gambit, as there was little chance of whatever was being discussed actually being implemented. (Robinson 2010, 139-40)

I counsel a very practical approach to TEI in which the purity of theory is indeed sacrificed. The intellectually pure solutions to some of the limitations arising from the Oxford University Press approach to TEI are ones that, I fear, merely mortal textual scholars would have trouble applying, and the work of writing and maintaining Extensible Stylesheet Language Transformations to cope with them might well be an uneconomical proposition for commercial publishers.

    I will end, then, with the provisos promised at my beginning. [SLIDE] It is okay to use a model of TEI XML devised for Shakespeare on other early modern dramatists so long as:

:: it's for a commercial publisher with an investment in XML/XSLT it wants to build on (by reusing code)

:: you can live with the limitations imposed by the choices you made when Shakespeare was your model

:: you don't care if your XML is theoretically impure and might never see the light of day when it is finished

_____________

Q&A

Examples of TEI that seem to me to go too far: the Folger Digital Editions, which use a unique identifier for each of the one million tokens in the canon.

Example of encoding that I wish were made available to the reader: the 1989 Electronic Edition of the 1986-87 Wells & Taylor William Shakespeare: The Complete Works for Oxford University Press included COCOA tags showing when the editors thought there was a change of author and when they thought that three lines of verse were amphibious (so that the first two might make a single complete line of verse just as readily as the second and third might, and they could not decide which to join up with which).

Cayley, Arthur. 1857. "On the Theory of the Analytical Forms Called Trees." Philosophical Magazine. 4th series 13. 172-76.

Greg, W. W. 1927. The Calculus of Variants: An Essay on Textual Criticism. Oxford. Clarendon Press.

Kenney, E. J. 1974. The Classical Text: Aspects of Editing in the Age of the Printed Book. Sather Classical Lectures. 44. Berkeley CA. University of California Press.

Loughnane, Rory. 2017. "Re-editing Non-Shakespeare for the Modern Reader: The Murder of Mutius in Titus Andronicus." Review of English Studies. New Series 68. 268-95.

Robinson, Peter. 2010. "How We Have Been Publishing the Wrong Way, and How We Might Publish a Better Way." Electronic Publishing: Politics and Pragmatics. Edited by Gabriel Egan. New Technologies in Medieval and Renaissance Studies. Toronto. Medieval and Renaissance Texts and Studies (MRTS) and ITER. 139-55.

Wells, Stanley, Gary Taylor, John Jowett and William Montgomery. 1987. William Shakespeare: A Textual Companion. Oxford. Oxford University Press.