"Why the world needs people who understand computing and creativity" by Gabriel Egan
In his celebrated talk at Cambridge University in May 1959, C. P. Snow lamented the fact that people in the humanities tend to be ignorant of even the basic facts of scientific knowledge. Snow was a scientist and a novelist, well placed to stand astride the arts and the sciences. What particularly annoyed him was the asymmetry. Scientists, he observed, tend to be quite literate and cultured, but cultured literary people are often innumerate and unable to understand even the basic language and ideas of science. The level of ignorance of science shown by people in the arts and humanities would, said Snow, be the equivalent of a scientist never having read any of the works of William Shakespeare.
Shakespeare is an interesting test case because although not everyone has studied him closely, almost everyone has some idea of the generally agreed facts about him. [SLIDE] We all feel we know the basic facts. We know that from Shakespeare's mind came the words for about 38 plays. We know that Shakespeare invented a lot of the words we currently use (more than anybody else). We know that Shakespeare had an unusually large vocabulary. We know that he was in many ways exceptional.
Actually, though, we don't know these things. They are myths. [SLIDE] The number of known Shakespeare plays is now 43, not 38, because we have found ones not known before. But Shakespeare did not write them on his own: about a third of all his plays were co-written with other dramatists of his time. [SLIDE] Shakespeare coined no new words at all. [SLIDE] Shakespeare was average in the size of his vocabulary; you almost certainly know more words than he did. [SLIDE] And in his inventiveness in choosing words, Shakespeare falls in the middle of the pack of dramatists of his time: he was not at all exceptional.
How have we come only recently to discover these things about Shakespeare? The answer is the use of computational methods that were not available to us before. I don't have time to talk you through all the recent discoveries, so you'll have to trust me. (At the end I'll give some references if you want to follow this up.) The key thing is that computation gives us new knowledge about Shakespeare that no methods arising within the arts and humanities could give us. [SLIDE] As Samuel Johnson put it, "Nothing amuses more harmlessly than computation, and nothing is oftener applicable to real business or speculative inquiries. A thousand stories which the ignorant tell, and believe, die away at once, when the computist takes them in his gripe".
* * *
What should we do? Can the arts and the sciences speak to each other? You might at this point be expecting me to advocate for more interdisciplinary collaborations between arts and humanities scholars and science and technology scholars. You might, because I am an Arts and Humanities scholar, expect me to observe that there are core principles and approaches common to computation and the Humanities, such as the fact that:
* computers are programmed in a language
* in computers, as in texts, context is all
* Boolean logic is like philosophical logic
These things are true. The great hero of twentieth-century linguistics, Noam Chomsky, is also a hero in Computer Science for his classification of grammars by their generative power, now known as the Chomsky hierarchy. The meaning of the same string of binary digits inside a computer is indeed contextual: in one place it might represent a number, in another a letter of the alphabet, and in another the opcode for a machine instruction. And first-order predicate logic is common to computing and philosophy.
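To make that contextuality concrete, here is a minimal sketch -- in Python, purely for illustration, since nothing in what follows depends on it -- showing one and the same byte read three different ways:

```python
# One byte, three readings: the bits mean nothing until context is supplied.
byte = 0b01000001  # eight bits: 0x41

print(byte)        # read as a number: 65
print(chr(byte))   # read as ASCII text: 'A'
# Read as an instruction by the Intel 8080 (the Altair 8800's processor),
# 0x41 is the opcode for MOV B,C: copy register C into register B.
```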
[BLANK SLIDE] But these likenesses don't save those of us in the Humanities from the charge of innumeracy and scientific ignorance. Snow was essentially right. The situation isn't symmetrical. In general, scientists are already literate and cultured enough to work on arts and humanities problems. The reverse is not true. In general, Humanists do not know enough about quantitative and computational methods to even distinguish between a feasible investigation and an impossible one.
My suggestion is that we should teach more about quantitative and computational methods within our undergraduate Arts and Humanities provision. I am not referring to the kind of 'digital awareness' and 'digital literacy' courses that some universities, particularly in North America, are offering. These do not in my view meet the need. The way for Humanists to get a feel for what is possible and what is not in quantitative and computational methods is for them to get hands-on experience of actually using the methods. And that means teaching Humanities students how to do computer programming.
This is what the English Literature team does on the final-year module 'Textual Studies Using Computers'. For introducing students to how computers work and how to program them, modern computers -- even stripped-down ones such as the Raspberry Pi -- are much too complicated. Before one can use a Windows or Macintosh computer or a Raspberry Pi to teach a modern, simplified programming language such as Scratch, one has to 'boot' the computer to load an operating system of bewildering complexity. If a student asks "What is happening now?" as the machine boots, we have to answer "It is complicated, so let us not discuss it now: wait until we get to the simple Scratch environment". This, I submit, is an unsatisfactory answer. We have to pretend not to notice all the complexity needed just to reach the simple programming environment, a simplicity that is entirely false since it depends on the underlying complexity it exists to conceal.
[SLIDE] The solution to this problem is to use old, simple computers. We practise what is called Minimal Computing. We use replicas of the world's first desktop computer, the Altair 8800 from 1975, and we use paper tape for mass storage. Paper tape makes it clear that digital text is always stored on a physical medium: there is nothing virtual about it. In modern media the zeroes and ones are very small (say, microscopic pits on a metal surface) and/or stored in a form, such as magnetism, that our eyes cannot detect. With paper tape, the zeroes and ones can be read directly off the tape by the naked eye, and one of the first tasks the students are given is to manually decode the binary ASCII of a strip of paper tape containing a famous quotation.
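For anyone who wants to try the decoding exercise without a tape reader, here is a minimal sketch in Python (the rows below spell a stand-in word, not the quotation we actually use, which I leave for the students):

```python
# Each row of a paper tape is one character: '1' where a hole is punched,
# '0' where it is not. Seven data holes give a 7-bit ASCII code.
tape_rows = [
    "1001000",  # decodes to 'H'
    "1100101",  # decodes to 'e'
    "1101100",  # decodes to 'l'
    "1101100",  # decodes to 'l'
    "1101111",  # decodes to 'o'
]
text = "".join(chr(int(row, 2)) for row in tape_rows)
print(text)  # prints: Hello
```

The students do exactly this with pencil and paper, which is the point: the encoding is simple enough to verify by eye.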
Once the students understand the essential architecture of the machine, they begin programming in the language BASIC. I am sometimes asked why we do not teach a 'real' programming language that the students can use elsewhere, such as C or Python. My defence of teaching BASIC is that, after trying everything else, it turns out to be the language that Humanities students most readily take to. We're not trying to teach them skills that will get them jobs as computer programmers in today's market. Rather, we want them quite quickly to be able to implement simple algorithms of their own devising that manipulate language. We also want them to hit obstacles arising from the limitations of the hardware.
When one teaches computational stylistics in a lab of modern computers, students can go a very long way before they reach the limits of the machines' capacity to store and process texts. Having written a program that counts the frequencies of all the words in Jane Austen's novel Persuasion, a student finds it trivial to rewrite the program to do the same for all of Austen's novels, and then for all of Charles Dickens's novels too. In the students' minds, the problem of scale is, at least for a while, deferred by modern computers' prodigious memory and mass-storage capacities. But the problem of scale will nonetheless at some point return. If one tries to initialize a memory array to hold all the words of all the eighteenth- and nineteenth-century novels currently available in digital form -- totalling several billion words -- even the latest computers run out of memory. An algorithm that works for a few novels will not work for hundreds of novels considered at once.
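To show how little code such an exercise involves, here is a sketch of the word-frequency counter in Python (in class the students write the equivalent in BASIC; the filename here is hypothetical):

```python
from collections import Counter
import re

def word_frequencies(text):
    # Pick out runs of letters and apostrophes, lower-cased, then tally.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

# A hypothetical local copy of the novel:
with open("persuasion.txt", encoding="utf-8") as f:
    freqs = word_frequencies(f.read())

print(freqs.most_common(10))  # the ten most frequent words
```

Scaling this up to a whole corpus is a one-line change to the input, which is exactly why the problem of scale stays invisible for so long on modern machines.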
With its mere 64 kilobytes of memory, the Altair 8800 computer hits this limitation of scale quite early in students' projects, and they have to learn the methods for overcoming it. If they cannot fit even the whole of the novel Persuasion into the machine at one time, they have to learn how we analyse texts in smaller chunks, compiling the results for Chapter One before starting afresh on Chapter Two, and so on, collating all the results once the final chapter has been processed. That is, the use of Minimal Computing resources forces attention onto the inevitable problems of scale right from the start, and teaches students not to assume that everything can be solved by just adding more memory and mass storage. Instead, they learn to rethink their algorithms so that, when scaled up to work on more texts, they take longer to execute but do not demand substantially greater resources of memory or mass storage.
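Here is a sketch of that chunk-by-chunk discipline, again in Python for legibility (fixed-size chunks stand in for chapter boundaries, and the path is hypothetical); on the Altair the students express the same idea in BASIC, with paper tape as the intermediate store:

```python
from collections import Counter
import re

def frequencies_by_chunk(path, chunk_size=64 * 1024):
    # Tally word counts one fixed-size chunk at a time, so the whole
    # text never has to sit in memory at once.
    totals = Counter()
    leftover = ""  # a word possibly split across a chunk boundary
    with open(path, encoding="utf-8") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            text = leftover + chunk
            # Hold back any trailing word fragment for the next chunk.
            match = re.search(r"[A-Za-z']+$", text)
            leftover = match.group(0) if match else ""
            text = text[: len(text) - len(leftover)]
            totals.update(re.findall(r"[a-z']+", text.lower()))
    totals.update(re.findall(r"[a-z']+", leftover.lower()))
    return totals

print(frequencies_by_chunk("persuasion.txt").most_common(10))
```

Memory use is now bounded by the chunk size, not by the size of the corpus: the program takes longer on more texts but does not demand more space.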
In other words, the principles of Minimal Computing are an excellent introduction, for those who wish to pursue the topic, to the principles of supercomputing. I realize that this is a large claim. Later this week the first meeting of the University's High Performance Computing user group takes place, to plan the exploitation of the newly acquired supercomputing hardware. It is my intention that Humanities applications will fully exploit this new opportunity. [SLIDE]
