Digital Byzantinist – Ruminations of a Byzantine geek

17 December 2015

The cultural becomes digital: what now? Reflections from the technological humanities

What follows is the English version of an article I was asked to write for Unipress, the magazine of the University of Bern, for a themed issue on “digital realities”. It appears in a German version in this month’s issue.

Computers regulate our lives. We increasingly keep our personal and cultural memories in “the cloud”, on services such as Facebook and Instagram. We rely on algorithms to tell us what we might like to buy, or see. Even our cars can now detect when their drivers are getting tired, and make the inviting suggestion that we pull off the road for a cup of coffee.

But there is a great irony that lies at the heart of the digital age, which is this: the invention of the computer was the final nail in the coffin of the great dream of Leibniz, the seventeenth-century polymath. The development of scientific principles that took place in his era of the Enlightenment gave rise to a widely shared belief that humans were a sort of extraordinarily complex biological mechanism, a rational machine. Leibniz himself had a firm belief that it should be possible to come up with a symbolic logic for all human thought and a calculus to manipulate it. He envisioned that the princes and judges of his future would be able to use this “universal characteristic” to calculate the true and just answer to any question that presented itself, whether in scientific discourse or in a dispute between neighbors. Moreover, he firmly believed that there was nothing that would be impossible to calculate – no Unknowable.

Over the course of the next 250 years, a language for symbolic logic, known today as Boolean logic, was developed and proven to be complete. Leibniz’ question was refined: is there a way, in fact, to prove anything at all? To answer any question? More precisely, given a set of starting premises and a proposed conclusion, is there a way to know whether the conclusion can be either reached or disproven from those premises? This challenge, posed by David Hilbert in 1928, became known as the Entscheidungsproblem.

After Kurt Gödel demonstrated the impossibility of answering the question in 1930, Alan Turing in 1936 proved the positive existence of insoluble problems. He did this by imagining a sort of conceptual machine, with which both a set of mathematical operations and its starting input could be encoded, and he showed that there exist combinations of operation and input that would cause the machine to run forever, never finding a solution.

Conceptually, this is a computer! Turing’s thought experiment was meant to prove the existence of unsolvable problems, but he was so taken with the problems that could be solved with his “machine” that he wanted to build a real one. Opportunities presented themselves during and immediately after WWII for Turing to build his machines, specifically Enigma decryption engines, and computers were rapidly developed in the post-war environment even as Turing’s role in conceiving of them was forgotten for decades. And although Turing had definitively proved the existence of the Unknowable, he remained convinced until the end of his life that a Turing machine should be able to solve any problem that a human can solve – that it should be possible to build a machine complex enough to replicate all the functions of the human brain.

Another way to state Turing’s dilemma is: he proved that there exist unsolvable problems. But do human reasoning and intuition have the capacity to solve problems that a machine cannot? Turing did not believe so, and he spent the rest of his life pursuing, in one way or another, a calculating machine complex enough to rival the human brain. And this leads us straight to the root of the insecurity, hostility even, that finds its expression in most developed cultures toward automata and computers in particular. If all human thought can be expressed via symbolic logic, does it mean that humans have no special purpose beyond computers?

Into this minefield comes the discipline known today as the Digital Humanities. The early pioneers of the field, known until the early 2000s as “Humanities Computing”, were not too concerned with the question – computers were useful calculating devices, but they themselves remained firmly in charge of interpretation of the results. But as the technology that the field’s practitioners used developed against a cultural background of increasingly pervasive technological transformation, a cultural clash between the “makers” and the “critics” within Humanities Computing was inevitable.

Digital Humanities is, more than usually for the academic disciplines of the humanities, concerned with making things. This is “practice-based” research – the scholar comes up with an idea, writes some code to implement it, decides whether it “works”, and draws conclusions from that. And so Digital Humanities has something of a hacker culture – playful, even arrogant or hubristic sometimes – trying things just to see what happens, building systems to find out whether they work. This is the very opposite of theoretical critique, which has been the cornerstone of what many scholars within the humanities perceive as their specialty. Some of these critics perceive the “hacking” as necessarily being a flight from theory – if Digital Humanities practitioners are making programs, they are not critiquing or theorizing, and their work is thus flawed.

Yet these critics tend to underestimate the critical or theoretical sophistication of those who do computing. Most Digital Humanities scholars are very well aware of the limitations of what we do, and of the fact that our science is provisional and contingent. Nevertheless, we are often guilty of failing to communicate these contingencies when we announce our results, and our critics are just as often guilty of a certain deafness when we do mention them.

A good example of how this dynamic functions can be seen with ‘distant reading’. A scholar named Franco Moretti pointed out in the early 2000s that the literary “canon” – those works that come to mind when you think of, for example, nineteenth-century German literature – is actually very small. It consists of the things you read in school, those works that survived the test of time to be used and quoted and reshaped in later eras. But that’s a very small subset of the works of German literature that were produced in the 19th century! Our “canon” is by its very nature unrepresentative. But it has to be, since a human cannot possibly read everything that was published in 100 years.

So can there be such a thing as reading books on a large scale, with computers? Moretti and others have tried this, and it is called distant reading. Rather than personally absorbing all these works, the scholar has them all digitized and on hand, so that comparative statistical analysis can be done and patterns in the canon can be sought against the entire background against which it was written.

As a result we now have two models of ‘reading’. One says that human interpretation should be the starting point in our humanistic investigations; the other says that human interpretation should be the end point, delayed as long as possible while we use machines to identify and highlight patterns. Once that’s done, we can make sense of the patterns.

And so what we digital humanities practitioners, the makers, tend toward is a sort of hybrid model between human interpretation and machine analysis. In literature, this means techniques such as distant reading; in history, it might mean social network analysis of digitized charters or a time-lapse map of trading routes based on shipping logs. The ultimate question that our field faces is this: can we make out of this hybrid, out of this interaction between the digital and the interpretative, something wholly new?

The next great frontier in Digital Humanities will be exactly this: whether we can extend our computational models to handle the ambiguity, uncertainty, and interpretation that are ubiquitous in the humanities. Up to now everything in computer science has been based on 1 and 0, true and false. These two extraordinarily simple building blocks have enabled us to create machines and algorithms of stunning sophistication, but there is no “maybe” in binary logic, no “I believe”. Computer scientists and humanists are working together to try to bridge that gap; if they succeed, the result will be true thinking machines.

29 November 2015

A round table un-discussion, part 1

In September I had the privilege of sitting on a round table at the Digital Humanities Autumn School at the University of Trier, along with Manfred Thaller, Richard Coyne, Claudine Moulin, Andreas Fickers, Susan Schreibman, and Fabio Ciotti. Per standard procedure we were given a list of discussion questions to think about in advance; as with any such discussion panel, especially when the audience weighs in, the conversation did not follow the prescribed plan.

Lively and worthwhile as the round table was, it would still be a pity for my preparation to go entirely to waste. Richard published his own reactions to the questions beforehand; I’ll join him by presenting mine here, in retrospect. This turns out to be a rather longer piece than I initially envisioned, and so here I present the first in what will be a five-part series.

1. The field of digital humanities is characterized by a great variety of approaches, methods and disciplinary traditions. Do you see this openness and dynamic as a chance or even necessary condition of the field or do you think that this endangers its institutionalization in an academic environment which is still dominated by disciplines? Does it make sense to make distinctions between specific disciplinary approaches in DH, for example digital history, digital linguistics, or digital geography?

If we leave the word “digital” out of the first sentence, it holds perfectly true – the humanities are heterogeneous. Thus I don’t think it is really any surprise that the digital humanities would be the same. Computational methods have developed fairly independently in the different disciplines, which does make specific disciplinary approaches rather inevitable – there are digital methods used primarily by linguists, that have been developed and shaped by the sorts of questions linguists ask themselves. These questions are very different from the sorts of questions that art historians ask themselves, and so art history will have a different set of tools and different methods for their application. And so we might perhaps better ask: does it make sense to not make distinctions between specific disciplinary approaches in DH?

That said, the answer to this question is not so clear-cut as all that. In practice, very many people inside and outside the field think first of methods for text processing – encoding, tagging and markup – when they hear the words “digital humanities”. And so as a community, digital humanities practitioners tend to be extremely (though not exclusively, of course) text-oriented. Some of these conceive of the elements of DH pedagogy in terms of a specific set of tools; these usually include the XML infrastructure and TEI encoding for different text-oriented problem domains, as well as analysis of large (usually plain-text) corpora, which can include a combination of natural-language processing tools and statistical text-mining tools. That said, while text may be the Alpha of DH it is no longer the Omega – in the last five to ten years the archaeological and historical sciences have brought methods and techniques for mapping, timeline representation, and network analysis firmly onto the scene. One nevertheless retains the impression of Digital Humanities as a grab bag of skills and techniques to be taught to a group of master’s students, knowing that the students will perhaps apply 20-40% of the learned skills to their own independent work, but that the 20-40% will be different for each student.

So then, can digital humanities really be called a field or a discipline? It’s such a good question that it comes up again and again, and many people have attempted answers. The answer that has perhaps found the most consensus goes something like this: digital humanities is about method, and specifically about how we bring our humanistic enquiry to (or indeed into) the domain of computationally-tractable algorithms and data. That question of modeling would seem to be the common thread that unites the digital work going on in the different branches of the humanities, and it brings up in its turn questions of epistemology, sociology of academia, and science and technology studies.

What bothers me about this answer is that it gives us two choices, neither of which is entirely satisfactory: DH is either an auxiliary science (Hilfswissenschaft, if you speak German) or a meta-field whose object of study is the phenomenon of humanities research. The former is difficult to justify as an independent academic discipline with degree programs; the latter is much easier to justify, but appeals to something like 1% of those who consider themselves DH practitioners. I haven’t come up with an answer that I deem satisfactory, that ties a majority of the practitioners to a coherent set of research agendas.

In that case, a reader might reasonably ask, what is it that I am trying to accomplish, that fits under the “Digital Humanities” rubric? To answer that question, I have to say a little about who I am. I am extremely fortunate to be of the generation best-placed to really understand computers and what they can do: young enough that personal computers were a feature of my childhood, but old enough to be there for the evolution of these computers from rather simplistic and ‘dumb’ systems to extremely sophisticated ones, and to remember what it was like to use a computer before operating system developers made any effort to hide or elide what goes on “under the hood”. This means that I have a fluent sense of what computers can be made to do, and how those things are accomplished, that I have been able to gain gradually over thirty years. In comparison, I began post-graduate study of the humanities twelve years ago.

So my work in the digital humanities so far has been a process of seeing how much of my own humanistic enquiries, and the evidence I have gathered in their pursuit, can be supported, eased, and communicated with the computer. It meant computer-assisted collation of texts, when I began to work on a critical edition. It has meant a source-code implementation of what I believe a stemma actually is, and how variation in a text is treated by philologists, as I have come to work on medieval stemmatology. It has recently begun to mean a graph-based computational model of the sorts of information that a historical text is likely to contain, and how those pieces of information relate to each other. And so on. Nowhere in this am I especially concerned with encoding, standards, or data formats, although from time to time I employ all three to get my work done. Rather, I rely on the computer to capture my hypotheses and my insights, and so I find myself needing to express these hypotheses and insights in as rigorous and queryable a way as possible, so as not to simply lose track of them. Critics might say (indeed, have said) that my idea of “digital humanities” is a glorified note-taking system; those critics may as well call Facebook a glorified family newsletter. Rather (and for all the sentiment that DH is first and foremost about collaboration) the computer allows an individual researcher like myself to track, and ingest, and retain, and make sense of, and feel secure in the knowledge that I will not forget, far more information than I could ever deal with alone. Almost as a side effect, it allows me in the long run to present not just the polished rhetoric appearing in some journal or monograph that is the usual output of a scholar in the humanities, but also a full accounting of the assumptions and inferences that produced the rhetoric. That, for me, is what digital humanities is about.

28 April 2015 (updated 3 April 2016)

Tools for digital philology: Transcription

In the last few months I’ve had the opportunity to revisit all of the decisions I made in 2007 and 2008 about how I transcribe my manuscripts. In this post I’ll talk about why I make full transcriptions in the first place, the system I devised six years ago, my migration to T-PEN, and a Python tool I’ve written to convert my T-PEN data into usable TEI XML.

Transcription vs. collation

When I make a critical edition of a text, I start with a full transcription of the manuscripts that I’m working with, fairly secure in the knowledge that I’ll be able to get the computer to do 99% of the work of collating them. There are plenty of textual scholars out there who will regard me as crazy for this. Transcribing a manuscript is a lot of work, after all, and wouldn’t it just be faster to do the collation myself in the first place? But my instinctive answer has always been no, and I’ll begin this post by trying to explain why.

When I transcribe my manuscripts, I’m working with a plain-text copy of the text that was made via OCR of the most recent (in this case, 117-year-old) printed edition. So in a sense the transcription I do is itself a collation against that edition text – I make a file copy of the text and begin to follow along it with reference to the printed edition, and wherever the manuscript varies, I make a change in the file copy. At the same time, I can add notation for where line breaks, page divisions, scribal corrections, chapter or section markings, catchwords, colored ink, etc. all occur in the manuscript. By the end of this process, which is in principle no different from what I would be doing if I were constructing a manual collation, I have a reasonably faithful transcription of the manuscript I started with.

But there are two things about this process that make it, in my view, simpler and faster than constructing that collation. The first is the act I’m performing on the computer, and the second is the number of simultaneous comparisons and decisions I have to make at each point in the process. When I transcribe I’m correcting a single text copy, typing in my changes and moving on, in a lines-and-paragraphs format that is pretty similar to the text I’m looking at. The physical process is fairly similar to copy-editing. If I were collating, I would be working – most probably – in a spreadsheet program, trying to follow the base text word-by-word in a single column and the manuscript in its paragraphs, which are two very different shapes for text. Wherever the text diverged, I would first have to make a decision about whether to record it (that costs mental energy), then have to locate the correct cell to record the difference (that costs both mental energy and time spent switching from keyboard to mouse entry), and then decide exactly how to record the change in the appropriate cell (switching back from mouse to keyboard), thinking also about how it coordinates with any parallel variants in manuscripts already collated. Quite frankly, when I think about doing work like that I not only get a headache, but my tendinitis-prone hands also start aching in sympathy.

Making a transcription

So for my own editorial work I am committed to the path of making transcriptions now and comparing them later. I was introduced to the TEI for this purpose many years ago, and conceptually it suits my transcription needs. XML, however, is not a great format for writing out by hand for anyone, and if I were to try, the transcription process would quickly become as slow and painful as I have just described the process of manual collation as being.

As part of my Ph.D. work I solved this problem by creating a sort of markup pidgin, in which I used single-character symbols to represent the XML tags I wanted to use. The result was that, when I had a manuscript line like this one:

[Image of the manuscript line]

whose plaintext transcription is this:

Եւ յայնժամ սուրբ հայրապետն պետրոս և իշխանքն ելին առ աշոտ. և

and whose XML might look something like this:

<lb/><hi rend="red">Ե</hi>ւ յայնժ<ex>ա</ex>մ ս<ex>ուր</ex>բ 
հ<ex>ա</ex>յր<ex>ա</ex>պ<ex>ե</ex>տն պետրոս և 
իշխ<ex>ա</ex>նքն ելին առ աշոտ. և

I typed this into my text editor

*(red)Ե*ւ յայնժ\ա\մ ս\ուր\բ հ\ա\յր\ա\պ\ե\տն պետրոս և իշխ\ա\նքն 
ելին առ աշոտ. և

and let a script do the work of turning that into full-fledged XML. The system was effective, and had the advantage that the text was rather easier to compare with the manuscript image than full XML would be, but it was not particularly user-friendly – I had to have all my symbols and their tag mappings memorized, I had to make sure that my symbols were well-balanced, and I often ran into situations (e.g. any tag that spanned more than one line) where my script was not quite able to produce the right result. Still, it worked well enough: I know at least one person who was actually willing to use it for her own work, and I even wrote an online tool to do the conversion and highlight any probable errors that could be detected.
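The conversion itself is not complicated. What follows is a minimal sketch in Python, not the script I actually used: it handles only the two symbols shown above (asterisks with a bracketed colour for highlighted text, and backslashes for expanded abbreviations), and the function name pidgin_to_xml is invented for the illustration.

```python
import re
from xml.sax.saxutils import escape


def pidgin_to_xml(line):
    """Turn one line of shorthand into TEI-style XML (illustrative only)."""
    # The symbols must be balanced, or the resulting tags will not be.
    if line.count("\\") % 2 or line.count("*") % 2:
        raise ValueError("unbalanced markup symbols in: " + line)
    text = escape(line)  # protect any literal &, < or > in the text
    # *(red)X*  ->  <hi rend="red">X</hi>
    text = re.sub(r"\*\((\w+)\)(.*?)\*", r'<hi rend="\1">\2</hi>', text)
    # \X\  ->  <ex>X</ex>
    text = re.sub(r"\\(.+?)\\", r"<ex>\1</ex>", text)
    # Every transcribed line starts with a line-break milestone.
    return "<lb/>" + text


print(pidgin_to_xml(r"*(red)Ե*ւ յայնժ\ա\մ ս\ուր\բ"))
```

Because a sketch like this works one line at a time, a tag that spans a line break defeats it, which is exactly the limitation described above.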

My current solution

Last October I was at a collation workshop in Münster, where I saw a presentation by Alison Walker about T-PEN, an online tool for manuscript transcription. Now I’ve known about T-PEN since 2010, and had done a tiny bit of experimental work with it when it was released, but had not really thought much about it since. During that meeting I fired up T-PEN for the first time in years, really, and started working on some manuscript transcription, and actually it was kind of fun!

What T-PEN does is to take the manuscript images you have, find the individual lines of text, and then let you do the transcription line-by-line directly into the browser. The interface looks like this:

[Screenshot of the T-PEN transcription interface]

which makes it just about the ideal transcription environment from a user-interface perspective. You would have to try very hard to inadvertently skip a line; your eyes don’t have to travel very far to get between the manuscript image and the text rendition; when it’s finished, you have not only the text but also the information you need to link the text to the image for later presentation.

The line recognition is not perfect, in my experience, but it is often pretty good, and the user is free to correct the results. It is pretty important to have good images to work with – cropped to include only the pages themselves, rotated and perhaps de-skewed so that the lines are straight, and with good contrast. I have had the good fortune this term to have an intern, and we have been using ImageMagick to do the manuscript image preparation as efficiently as we can. It may be possible to do this fully automatically – I think that OCR software like FineReader has similar functionality – but so far I have not looked seriously into the possibility.
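As an illustration of the kind of preparation involved, here is a sketch that builds an ImageMagick command for one page image. It is not the script we use: the option values are guesses that would need tuning for each manuscript, it assumes the ImageMagick convert program is on the path (newer installations call it magick), and the function names and file names are made up. Note too that -trim only removes a border of roughly uniform colour, so it is no substitute for cropping by eye when the photograph shows the desk or the facing page.

```python
import subprocess


def prepare_command(source, target, binary="convert"):
    """Build an ImageMagick command line for one page image (values are guesses)."""
    return [
        binary, source,
        "-deskew", "40%",    # straighten slightly rotated text lines
        "-fuzz", "10%",      # tolerance for what counts as border colour
        "-trim", "+repage",  # crop away a uniform border, reset the canvas
        "-normalize",        # stretch the contrast
        target,
    ]


def prepare_image(source, target):
    # Runs ImageMagick; raises CalledProcessError if the conversion fails.
    subprocess.run(prepare_command(source, target), check=True)


print(" ".join(prepare_command("f001r.jpg", "prepared/f001r.jpg")))
```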

T-PEN does not actively support TEI markup, or any other sort of markup. What it does offer is the ability to define buttons (accessible by clicking the ‘XML Tags’ button underneath the transcription box) that will apply a certain tag to any portion of text you choose. I have defined the TEI tags I use most frequently in my transcriptions, and using them is fairly straightforward.

Getting data back out

There are a few listed options for exporting a transcription done in T-PEN. I found that none of them were quite satisfactory for my purpose, which was to turn the transcription I’d made automatically into TEI XML, so that I can do other things with it. One of the developers on the project, Patrick Cuba, who has been very helpful in answering all the queries I’ve had so far, pointed out to me the (so far undocumented) possibility of downloading the raw transcription data – stored on their system using the Shared Canvas standard – in JSON format. Once I had that it was the work of a few hours to write a Python module that will convert the JSON transcription data into valid TEI XML, and will also tokenize valid TEI XML for use with a collation tool such as CollateX.
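To give a flavour of what that conversion involves, here is a heavily simplified sketch. It is not the tpen2tei code, and the JSON layout it reads (sequences, canvases, otherContent, resources, cnt:chars) is my shorthand assumption about a Shared Canvas manifest rather than a documented T-PEN export format; the function name and the sample data are invented. It also produces only a TEI body fragment, not a complete document with a header.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

TEI_NS = "http://www.tei-c.org/ns/1.0"


def manifest_to_tei_body(manifest):
    """Turn an (assumed) Shared Canvas manifest into a TEI body fragment."""
    parts = []
    for canvas in manifest["sequences"][0]["canvases"]:
        # One page-break milestone per image, labelled with the canvas label.
        parts.append("<pb n=%s/>" % quoteattr(canvas.get("label", "")))
        for annotation_list in canvas.get("otherContent", []):
            for annotation in annotation_list.get("resources", []):
                # Each annotation is one transcribed line; its text may
                # already contain the XML tags inserted in the interface.
                parts.append("<lb/>" + annotation["resource"]["cnt:chars"])
    xml = '<body xmlns="%s"><ab>%s</ab></body>' % (TEI_NS, "\n".join(parts))
    ET.fromstring(xml)  # raises ParseError if the result is not well-formed
    return xml


# Invented sample data in the assumed layout.
sample = {"sequences": [{"canvases": [{
    "label": "1r",
    "otherContent": [{"resources": [
        {"resource": {"cnt:chars": "յայնժ<ex>ա</ex>մ ս<ex>ուր</ex>բ"}},
        {"resource": {"cnt:chars": "ելին առ աշոտ. և"}},
    ]}],
}]}]}

print(manifest_to_tei_body(sample))
```

The well-formedness check matters in practice: a tag left open on one transcribed line will otherwise only surface much later, when something downstream tries to read the file.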

The tpen2tei module isn’t quite in a state where I’m willing to release it to PyPI. For starters, most of the tests are still stubs; also, I suspect that I should be using an event-based parser for the word tokenization, rather than the DOM parser I’m using now. Still, it’s on GitHub and there for the using, so if it is the sort of tool you think you might need, go wild.
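The tokenization step can be sketched in the same spirit. This is an illustration and not the module’s code: it uses a tree parser from the Python standard library, splits on whitespace only, and the lower-casing used for the normalized form is an arbitrary choice. The output follows the JSON witness format that CollateX accepts, with a reading in "t" and a normalized form in "n".

```python
import json
import xml.etree.ElementTree as ET


def tokenize_witness(sigil, xml_text):
    """Flatten an XML transcription to word tokens for one witness."""
    root = ET.fromstring(xml_text)
    # itertext() walks the tree in document order, so abbreviations
    # expanded inside <ex> are rejoined with the letters around them.
    plain = "".join(root.itertext())
    tokens = [{"t": word, "n": word.lower()} for word in plain.split()]
    return {"id": sigil, "tokens": tokens}


line = ("<ab><lb/>յայնժ<ex>ա</ex>մ ս<ex>ուր</ex>բ "
        "հ<ex>ա</ex>յր<ex>ա</ex>պ<ex>ե</ex>տն</ab>")
witness = tokenize_witness("A", line)
print(json.dumps({"witnesses": [witness]}, ensure_ascii=False, indent=1))
```

Flattening to plain text like this throws away the markup entirely; a real tokenizer would want to keep track of which tags each word sat inside, which is where the choice between a tree parser and an event-based one starts to matter.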

Desiderata

There are a few things that T-PEN does not currently do, that I wish it did. The first is quite straightforward: on the website it is possible to enter some metadata about the manuscript being transcribed (library information, year of production, etc.), but this metadata doesn’t make it back into the Shared Canvas JSON. It would be nice if I had a way to get all the information about my manuscript in one place.

The second is also reasonably simple: I would like to be able to define an XML button that is a milestone element. Currently the interface assumes that XML elements will have some text inside them, so the button will insert a <tag> and a </tag> but never a <tag/>. This isn’t hard to patch up manually – I just close the tag myself – but from a usability perspective it would be really handy.
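The manual fix can also be automated after export. A minimal sketch, assuming that any element left completely empty was meant as a milestone (which will not be true of every transcription); the function name is made up.

```python
import re

# An opening tag, optional attributes, and its immediate closing tag.
EMPTY_PAIR = re.compile(r"<(\w+)((?:\s[^<>]*)?)></\1>")


def collapse_empty_elements(xml_text):
    """Rewrite <tag></tag> as <tag/>, keeping any attributes."""
    return EMPTY_PAIR.sub(r"<\1\2/>", xml_text)


print(collapse_empty_elements('ելին<pb n="2v"></pb>առ <ex>ա</ex>'))
```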

The third has to do with resource limits currently imposed by T-PEN: although there doesn’t seem to be a limit to the number of manuscripts you can upload, each manuscript can contain only up to 200MB of image files. If your manuscript is bigger, you will have to split it into multiple projects and combine the transcriptions after the fact. Relatedly, you cannot add new images to an existing manuscript, even if you’re under the 200MB limit. I’m told that an upcoming version of T-PEN will address at least this second issue.

The other two things I miss in T-PEN have to do with the linking between page area and text flow, and aren’t quite so simple to solve. Occasionally a manuscript has a block of text written in the margin; sometimes the block is written sideways. There is currently no good mechanism for dealing with blocks of text with weird orientations; the interface assumes that all zones should be interpreted right-side-up. Relatedly, T-PEN makes the assumption (when it is called upon to make any assumption at all) that text blocks should be interpreted from top left to bottom right. It would be nice to have a way to define a default – perhaps I’m transcribing a Syriac manuscript? – and to specify a text flow in a situation that doesn’t match the default. (Of course, there are also situations where it isn’t really logical or correct to interpret the text as a single sequence! That is part of what makes the problem interesting.)

Conclusion

If someone who is starting an edition project today asks me for advice on transcription, I would have little reservation in pointing them to T-PEN. The only exception I would make is for anyone working on a genetic or documentary edition of authors’ drafts or the like. The T-PEN interface does assume that the documents being transcribed are relatively clean manuscripts without a lot of editorial scribbling. Apart from that caveat, though, it is really the best tool for the task that I have seen. It has a great user interface for the task, it is an open source tool, its developers have been unfailingly helpful, and it provides a way to get out just about all of the data you put into it. In order to turn that data into XML, you may have to learn a little Python first, but I hope that the module I have written will give someone else a head start on that front too!

20 April 2015 (updated 2 April 2016)

Coming back to proper (digital) philology

For the last three or four months I have been engaging in proper critical text edition, of the sort that I haven’t done since I finished my Ph.D. thesis. Transcribing manuscripts, getting a collation, examining the collation to derive a critical text, and all. I haven’t had so much fun in ages.

The text in question is the same one that I worked on for the Ph.D. – the Chronicle of Matthew of Edessa. I have always intended to get back to it, but the realities of modern academic life simply don’t allow a green post-doc the leisure to spend several more years on a project just because it was too big for a Ph.D. thesis in the first place. Of course I didn’t abandon textual scholarship entirely – I transferred a lot of my thinking about how text traditions can be structured and modelled and analyzed to the work that became my actual post-doctoral project. But Matthew of Edessa had to be shelved throughout much of this, since I was being paid to do other things.

Even so, in the intervening time I have been pressed into service repeatedly as a sort of digital-edition advice columnist. I’m by no means the only person ever to have edited text using computational tools, and it took me a couple of years after my own forays into text edition to put it online in any form, but all the work I’ve done since 2007 on textual-criticism-related things has given me a reasonably good sense of what can be done digitally in theory and in practice, for someone who has a certain amount of computer skill as well as for someone who remains a bit intimidated by these ornery machines.

Since the beginning of this year, I’ve had two reasons to finally take good old Matthew off the shelf and get back to what will be the long, slow work of producing an edition. The first is a rash commitment I made to contribute to a Festschrift in Armenian studies. I thought it might be nice to provide an edited version of the famous (if you’re a Byzantinist) letter purportedly written by the emperor Ioannes Tzimiskes to the Armenian king Ashot Bagratuni in the early 970s, preserved in Matthew’s Chronicle. The second is even better: I’ve been awarded a grant from the Swiss National Science Foundation to spend the next three years leading a small team not only to finish the edition, but also to develop the libraries, tools, and data models (including, of course, integration of ones already developed by others!) necessary to express the edition as digitally, accessibly, and sustainably as I can possibly dream of doing, and to offer it as a model for other digital work on medieval texts within Switzerland and, hopefully, beyond. I have been waiting six years for this moment, and I am delighted that it’s finally arrived.

The technology has moved on in those six years, though. When I worked on my Ph.D. I essentially wrote all my own tools to do the editing work, and there was very little focus on usability, generalizability, or sustainability. Now the landscape of digital tools for text critical edition is much more interesting, and one of my tasks has been to get to grips with all the things I can do now that I couldn’t practically do in 2007-9.

Over the next few weeks, as I prepare the article that I promised, I will use this blog to provide something of an update to what I published over the years on the topic of “how to make a digital edition”. I’m not going to explore here every last possibility, but I am going to talk about what tools I use, how I choose to use them, and how (if at all) I have to modify or supplement them in order to do the thing I am trying to do. With any luck this will be helpful to others who are starting out now with their own critical editions, no matter their comfort with computers. I’ll try to provide a sense of what is easy, what has a good user interface, what is well-designed for data accessibility or sustainability. And of course I’d be very happy to have discussion from others who have walked similar roads, to say what has worked for them.

17 June 2014 (updated 2 April 2016)

SOLVED! The mystery of the character encoding

Update, two hours later: we have a solution! And it’s pretty disgusting. Read on below.

Two posts in a row about the deep technical guts of something I’m working on. Well I guess this is a digital humanities blog.

Yesterday I got a wonderful present in my email – a MySQL dump of a database full of all sorts of historical goodness. The site that it powers displays snippets of relevant primary sources in their original language, including things like Arabic and Greek. Since the site has been around for rather longer than MySQL has had any Unicode support to speak of, it is not all that surprising that these snippets of text in their original language are rather badly mis-encoded.

Not too much of a problem, I naïvely thought to myself. I’ll just fix the encoding knowing what it’s supposed to have been.

A typical example looks like this. The Greek displayed on the site is:
μηνὶ Νοἐμβρίω εἰς τὰς κ ´ ινδικτιῶνος ε ´ ἔτους ,ς

but what I get from the database dump is:
Î¼Î·Î½á½¶ ÎÎ¿á¼Î¼Î²Ïá½·Ï‰ Îµá¼°Ï‚ Ï„á½°Ï‚ Îº á¿½ Î¹Î½Î´Î¹ÎºÏ„Î¹á¿¶Î½Î¿Ï‚ Îµ á¿½ á¼”Ï„Î¿Ï…Ï‚ ,Ï‚

Well, I recognise that kind of garbage, I thought to myself. It’s double-encoded UTF-8. So all I ought to need to do is to undo the spurious re-encoding and save the result. Right?

Sadly, it’s not that easy, and here is where I hope I can get comments from some DB/encoding wizards out there because I would really like to understand what’s going on.

It starts easily enough in this case – the first letter is μ. In Unicode, that is character 3BC (notated in hexadecimal.) When you convert this to UTF-8, you get two bytes: CE BC. Unicode character CE is indeed Î, and Unicode character BC is indeed ¼. As I suspected, each of these UTF-8 bytes that make up μ has been treated as a character in its own right, and further encoded to UTF-8, so that μ has become Î¼. That isn’t hard to undo.
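The round trip for that one letter can be reproduced in a few lines. This is a minimal sketch in Python rather than Perl; it assumes the spurious step read each UTF-8 byte as the Unicode character with the same number, which is exactly what Latin-1 decoding does, and the variable names are only illustrative.

```python
# Reproduce the double encoding of a single letter, step by step.
mu = "\u03bc"                          # Greek small letter mu, U+03BC

utf8_bytes = mu.encode("utf-8")        # the correct encoding: CE BC
assert utf8_bytes == b"\xce\xbc"

# The mistake: each byte is taken for a character in its own right
# (U+00CE and U+00BC, which display as the two characters seen in the dump)...
misread = utf8_bytes.decode("latin-1")
assert misread == "\u00ce\u00bc"

# ...and those two characters are then encoded as UTF-8 a second time.
double_encoded = misread.encode("utf-8")
assert double_encoded == b"\xc3\x8e\xc2\xbc"

# Undoing it is the same two steps in reverse.
repaired = double_encoded.decode("utf-8").encode("latin-1").decode("utf-8")
assert repaired == mu
```

This simple reversal only works while every byte maps to the character with the same number.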

But then we get along to that ω further down the line, which has become Ï‰. That is Unicode character 3C9, which in UTF-8 becomes CF 89. Unicode CF is the character Ï as we expect, but there is no printable Unicode character 89 (that code point is only a control code). Now it is perfectly possible to render 89 as UTF-8 (it would become C2 89) but instead I’m getting a rather inexplicable character whose Unicode value is 2030 (UTF-8 E2 80 B0)! And here the system starts to break down – I cannot figure out what possible mathematical transformation has taken place to make 89 become 2030.

There seems to be little mathematical pattern to the results I’m getting, either. From the bad characters in this sample:

ρ -> 3C1 -> CF 81 --> CF 81    (correct!!)
ς -> 3C2 -> CF 82 --> CF 201A
τ -> 3C4 -> CF 84 --> CF 201E
υ -> 3C5 -> CF 85 --> CF 2026
ω -> 3C9 -> CF 89 --> CF 2030

~~Ideas? Comments? Do you know MySQL like the back of your hand and have you spotted immediately what’s going on here? I’d love to crack this mystery.~~

After this post went live, someone observed to me that the ‘per mille’ sign, i.e. that double-percent thing at Unicode value 2030, has the value 89 in…Windows CP-1250! And, perhaps more relevantly, Windows CP-1252. (In character encodings just as in almost everything else, Windows always liked to have their own standards that are different from the ISO standards. Pre-Unicode, most Western European characters were represented in an eight-bit encoding called ISO Latin 1 everywhere except Windows*, where they used this CP-1252 instead. For Eastern Europe, it was ISO Latin 2 / CP-1250.)

So what we have here is: MySQL is interpreting its character data as Unicode, and expressing it as UTF-8, as we requested. Only then it hits a Unicode value like 89 which is not actually a character at all. But instead of passing it through and letting us deal with it, MySQL says “hm, they must have meant the Latin 1 value here. Only when I say Latin 1 I really mean CP-1252. So I’ll just take this value (89 in our example), see that it is the ‘per mille’ sign in CP-1252, and substitute the correct Unicode for ‘per mille’. That will make the user happy!”
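The substitution can be checked against the table of bad characters above. The sketch below uses Python's built-in `cp1252` codec as a stand-in for whatever lookup MySQL performs internally (that is an assumption, not a description of MySQL's code), and the function name `observed_code_point` is made up for the illustration.

```python
def observed_code_point(byte: int) -> int:
    """Code point that one UTF-8 byte appears to have been turned into."""
    try:
        # If CP-1252 assigns a character to this byte, that character is used.
        return ord(bytes([byte]).decode("cp1252"))
    except UnicodeDecodeError:
        # CP-1252 leaves 81, 8D, 8F, 90 and 9D unassigned in Python's codec;
        # those bytes keep their own value, which is why rho came out "correct".
        return byte


# Second bytes of rho, final sigma, tau, upsilon and omega from the table.
second_bytes = [0x81, 0x82, 0x84, 0x85, 0x89]
table = {b: observed_code_point(b) for b in second_bytes}

for b, cp in table.items():
    print(f"{b:02X} -> {cp:04X}")
```

Under that assumption the results line up with the dump: 82 becomes 201A, 84 becomes 201E, 85 becomes 2026, 89 becomes 2030, and 81 is left alone.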

Hint: It really, really, doesn’t make the user happy.

So here is the Perl script that will take the garbage I got and turn it back into Greek. Maybe it will be useful to someone else someday!

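#!/usr/bin/env perl

use strict;
use warnings;
use Encode;
use Encode::Byte;

while(<>) {
    # Decode the doubly-encoded input into a string of characters.
    my $line = decode_utf8( $_ );
    my @chr;
    foreach my $c ( map { ord( $_ ) } split( '', $line ) ) {
        # Anything above 255 is one of the CP-1252 substitutions;
        # map it back to the single byte value it stands for.
        if( $c > 255 ) {
            $c = ord( encode( 'cp1252', chr( $c ) ) );
        }
        push( @chr, $c );
    }
    # Every value now fits in one byte, so printing the string writes
    # out the original UTF-8 byte sequence.
    my $newline = join( '', map { chr( $_ ) } @chr );
    print $newline;
}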
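For readers who do not use Perl, the same repair can be sketched in Python. This is a rough equivalent under the same assumptions as the script above, not a tested drop-in replacement: the function name `repair` is hypothetical, and unlike the Perl version (which substitutes a `?`) it raises an error on a character that has no CP-1252 equivalent.

```python
def repair(text: str) -> str:
    """Turn doubly-encoded text (already decoded once from UTF-8) back into the original."""
    raw = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > 255:
            # One of the CP-1252 substitutions: recover the byte it stands for.
            raw += ch.encode("cp1252")
        else:
            # Everything else is a byte that was passed through unchanged.
            raw.append(cp)
    # The recovered bytes are the original UTF-8.
    return raw.decode("utf-8")


# The first three letters of the sample, as they appear in the dump.
garbled = "\u00ce\u00bc\u00ce\u00b7\u00ce\u00bd"
print(repair(garbled))
```

Reading a whole dump would then be a matter of opening the file as UTF-8 and passing each line through `repair`.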

[*] Also, as I realized after posting this, except Mac, which used MacRoman. Standards are great! Let’s all have our own!

17 November 2013

How to have several Catalyst apps behind one Apache server

Since I’ve changed institutions this year, I am in the process of migrating Stemmaweb from its current home (on my family’s personal virtual server) to the academic cloud service being piloted by SWITCH. Along the way, I ran into a Perl Catalyst configuration issue that I thought would be useful to write about here, in case others run into a similar problem.

I have several Catalyst applications – Stemmaweb, my edition-in-progress of Matthew of Edessa, and pretty much anything else I will develop with Perl in the future. I also have other things (e.g. this blog) on the Web, and being somewhat stuck in my ways, I still prefer Apache as a webserver. So basically I need a way to run all these standalone web applications behind Apache, with a suitable URL prefix to distinguish them.

There is already a good guide to getting a single Catalyst application set up behind an Apache front end. The idea is that you start up the application as its own process, listening on a local network port, and then configure Apache to act as a proxy between the outside world and that application. My problem was, I want to have more than one application, and I want to reach each different application via its own URL prefix (e.g. /stemmaweb, /ChronicleME, /ncritic, and so on.) The difficulty with a reverse proxy in that situation is this:

I send my request to http://my.public.server/stemmaweb/
It gets proxied to http://localhost:5000/ and returned
But then all my images, JavaScript, CSS, etc. are at the root of localhost:5000 (the backend server) and so look like they’re at the root of my.public.server, instead of neatly within the stemmaweb/ directory!
And so I get a lot of nasty 404 errors and a broken application.

What I need here is an extra plugin: Plack::Middleware::ReverseProxyPath. I install it (in this case with the excellent ‘cpanm’ tool):

$ cpanm -S Plack::Middleware::ReverseProxyPath

And then I edit my application’s PSGI file to look like this:

use strict;
use warnings;
use lib '/var/www/catalyst/stemmaweb/lib';
use stemmaweb;
use Plack::Builder;

builder {
    enable( "Plack::Middleware::ReverseProxyPath" );
    my $app = stemmaweb->apply_default_middlewares(stemmaweb->psgi_app);
    $app;
}

where /var/www/catalyst/stemmaweb is the directory that my application lives in.

In order to make it all work, my Apache configuration needs a couple of extra lines too:

# Configuration for Catalyst proxy apps. This should eventually move
# to its own named virtual host.
RewriteEngine on
<Location /stemmaweb>
    RequestHeader set X-Forwarded-Script-Name /stemmaweb
    RequestHeader set X-Traversal-Path /
    ProxyPass http://localhost:5000/
    ProxyPassReverse http://localhost:5000/
</Location>
RewriteRule ^/stemmaweb$ stemmaweb/ [R]

The RequestHeaders inform the backend (Catalyst) that what we are calling “/stemmaweb” is the thing that it is calling “/”, and that it should translate its URLs accordingly when it sends us back the response.
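To make the translation concrete, here is a rough Python (WSGI) analogue of the idea. It is a sketch of the principle only, not the Plack module's actual implementation; `reverse_proxy_path` and `demo_app` are hypothetical names, and the only things taken from the configuration above are the two header names and their values.

```python
def reverse_proxy_path(app):
    """Wrap a WSGI app so that it learns the public prefix it is served under."""
    def wrapper(environ, start_response):
        public_prefix = environ.get("HTTP_X_FORWARDED_SCRIPT_NAME")
        backend_prefix = environ.get("HTTP_X_TRAVERSAL_PATH")
        if public_prefix is not None and backend_prefix is not None:
            full_path = environ.get("SCRIPT_NAME", "") + environ.get("PATH_INFO", "")
            backend_prefix = backend_prefix.rstrip("/")
            if full_path.startswith(backend_prefix):
                # Swap the backend's idea of its root for the public one.
                environ["SCRIPT_NAME"] = public_prefix.rstrip("/")
                environ["PATH_INFO"] = full_path[len(backend_prefix):]
        return app(environ, start_response)
    return wrapper


def demo_app(environ, start_response):
    """Hypothetical application that links to one of its own static files."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    link = environ["SCRIPT_NAME"] + "/static/style.css"
    return [link.encode("utf-8")]


# A request for http://my.public.server/stemmaweb/ as the backend sees it.
environ = {
    "SCRIPT_NAME": "",
    "PATH_INFO": "/",
    "HTTP_X_FORWARDED_SCRIPT_NAME": "/stemmaweb",
    "HTTP_X_TRAVERSAL_PATH": "/",
}
body = reverse_proxy_path(demo_app)(environ, lambda status, headers: None)
print(body[0].decode())  # /stemmaweb/static/style.css
```

Without the wrapper the link would come out as /static/style.css, which is the broken case described earlier: the browser would look for the file at the root of the public server.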

The second thing I needed to address was how to start these things up automatically when the server turns on. The guide gives several useful configurations for starting a single service, but again, I want to make sure that all my Catalyst applications (and not just one of them) start up properly. I am running Ubuntu, which uses Upstart to handle its services; to start all my applications I use a pair of scripts and the ‘instance’ keyword.

description "Starman master upstart control"
author      "Tara L Andrews (tla@mit.edu)"
# Control all Starman jobs via this script
start on filesystem or runlevel [2345]
stop on runlevel [!2345]
# No daemon of our own, but here's how we start them
pre-start script
    port=5000
    for dir in `ls /var/www/catalyst`; do
        start starman-app APP=$dir PORT=$port || :
        port=$((port+1))
    done
end script
# and here's how we stop them
post-stop script
    # Instance names are $APP$PORT run together, so pass the whole name
    # as APP and leave PORT empty to address the same instance.
    for inst in `initctl list|grep "^starman-app "|awk '{print $2}'|tr -d ')'|tr -d '('`; do
        stop starman-app APP=$inst PORT= || :
    done
end script

The application script, which gets called by the control script for each application in /var/www/catalyst:

description "Starman upstart application instance"
author      "Tara L Andrews (tla@mit.edu)"
respawn limit 10 5
setuid www-data
umask 022
instance $APP$PORT
exec /usr/local/bin/starman --l localhost:$PORT /var/www/catalyst/$APP/$APP.psgi

There is one thing about this solution that is not so elegant, which is that each application has to start on its own port and I need to specify the correct port in the Apache configuration file. As it stands the ports will be assigned in sequence (5000, 5001, 5002, …) according to the way the application directory names sort with the ‘ls’ command (which roughly means, alphabetically.) So whenever I add a new application I will have to remember to adjust the port numbers in the Apache configuration. I would welcome a more elegant solution if anyone has one!

5 July 2013 (updated 3 April 2016)

Enabling the science of history

One of the great ironies of my academic career was that, throughout my Ph.D. work on a digital critical edition of parts of the text of Matthew of Edessa’s Chronicle, I had only the vaguest inkling that anyone else was doing anything similar. I had heard of Peter Robinson and his COLLATE program, of course, but when I met him in 2007 he only confirmed to me that the program was obsolete and, if I needed automatic text collation anytime soon, I had better write my own program. Through blind chance I was introduced to James Cummings around the same time, who told me of the existence of the TEI guidelines and suggested I use them.

It was, in fact, James who finally gave me a push into the world of digital humanities. I was in the last panicked stages of writing up the thesis when he arranged an invitation for me to attend the first ‘bootcamp’ held by the Interedition project, whose subject was to be none other than text collation tools. By the time the meeting was held I was in that state of anxious bliss of having submitted my thesis and having nothing to do but wait for the viva, so I could bend all my hyperactive energy in that direction. Through Interedition I made some first-rate friends and colleagues with whom I have continued to work and hack to this day, and it was through that project that I met various people within KNAW (the Royal Netherlands Academy of Arts and Sciences.)

After I joined Interedition I very frequently found myself talking to its head, Joris van Zundert, about all manner of things in this wide world of digital humanities. At the time I knew pretty much nothing of the people within DH and its nascent institutional culture, and was moreover pretty ignorant of how much there was to know, so as often as not we ended up in some kind of debate or argument over the TEI, over the philosophy of science, over what constitutes worthwhile research. The main object of these debates was to work out who was holding what unstated assumption or piece of background context.

One evening we found ourselves in a heated argument about the application of the scientific method to humanities research. I don’t remember quite how we got there, but Joris was insisting (more or less) that humanities research needed to be properly scientific, according to the scientific method, or else it was rubbish, nothing more than creative writing with a rhetorical flourish, and not worth anyone’s time or attention. Historians needed to demonstrate reproducibility, falsifiability, the whole works. I was having none of it–while I detest evidence-free assumption-laden excuses for historical argument as much as any scholar with a proper science-based education would, surely Joris and everyone else must understand that medieval history is neither reproducible nor falsifiable, and that the same goes for most other humanities research? What was I to do, write a Second Life simulation to re-create the fiscal crisis of the eleventh century, complete with replica historical personalities, and simulate the whole to see if the same consequences appeared? Ridiculous. But of course, I was missing the point entirely. What Joris was pushing me to do, in an admittedly confrontational way, was to make clear my underlying mental model for how history is done. When I did, it became really obvious to me how and where historical research ultimately stands to gain from digital methods.

OK, that’s a big claim, so I had better elucidate this mental model of mine. It should be borne in mind that my experience is drawn almost entirely from Near Eastern medieval history, which is grossly under-documented and fairly starved of critical attention in comparison to its Western cousin, so if any of you historians of other places or eras have a wildly different perspective or model, I’d be very interested to hear about it!

When we attempt a historical re-construction or create an argument, we begin with a mixture of evidence, report, and prior interpretation. The evidence can be material (mostly archaeological) or documentary, and we almost always wish we had roughly ten times as much of it as we actually do. The reports are usually those of contemporaneous historians, which are of course very valuable but must be examined in themselves for what they aren’t telling us, or what they are misrepresenting, as much as for what they positively tell us. The prior interpretation easily outweighs the evidence, and even the reports, for sheer volume, and it is this that constitutes the received wisdom of our field.

So we can imagine a rhetorical structure of dependency that culminates in a historical argument, or a reconstruction. We marshal our evidence, we examine our reports, we make interpretations in the light of received wisdom and prior interpretations. In effect it is a huge and intricate connected structure of logical dependencies that we carry around in our head. If our argument goes unchallenged or even receives critical acceptance, this entire structure becomes a ‘black box’ of the sort described by Bruno Latour, labelled only with its main conclusion(s) and ready for inclusion in the dependency structure of future arguments.

Now what if some of our scholarship, some of the received wisdom even, is wide of the mark? Pretty much any historian will relish the opportunity to demonstrate that “everything we thought we knew is wrong”, and in Near Eastern history in particular these opportunities come thick and fast. This is a fine thing in itself, but it poses a thornier problem. When the historian demonstrates that a particular assumption or argument doesn’t hold water–when the paper is published and digested and its revised conclusion accepted–how quickly, or slowly, will the knock-on effects of this new bit of insight make themselves clear? How long will it take for the implications to sort themselves out fully? In practice, the weight of tradition and patterns of historical understanding for Byzantium and the Near East are so strong, and have gone for so long unchallenged, that we historians simply haven’t got the capacity to identify all the black boxes, to open them up and find the problematic components, to re-assess each of these conclusions with these components altered or removed. And this, I think, is the biggest practical obstacle to the work of historians being accepted as science rather than speculation or storytelling.

Well. Once I had been made to put all of this into words, it became clear what the most useful and significant contribution of digital technology to the study of history must eventually be. Big data and statistical analysis of the contents of documentary archives is all well and good, but what if we could capture our very arguments, our black boxes of historical understanding, and make them essentially searchable and available for re-analysis when some of the assumptions have changed? They would even be, dare I say it, reproducible and/or falsifiable. Even, perish the thought, computable.
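A toy version of the idea fits in a few lines of Python. This is only an illustrative sketch with abstract, made-up claim names; it says nothing about how such structures would really be captured from scholarship, and `affected_by` is a hypothetical helper.

```python
from collections import deque

# Each claim lists the evidence, reports, or earlier arguments it rests on.
depends_on = {
    "evidence_1": [],
    "report_1": [],
    "received_wisdom_1": [],
    "argument_A": ["evidence_1", "report_1"],
    "argument_B": ["argument_A", "received_wisdom_1"],
    "argument_C": ["received_wisdom_1"],
}


def affected_by(overturned, depends_on):
    """Return every claim that directly or indirectly rests on `overturned`."""
    # Invert the structure: for each premise, which claims build on it?
    supports = {}
    for claim, premises in depends_on.items():
        for premise in premises:
            supports.setdefault(premise, []).append(claim)

    # Walk outward from the overturned claim, breadth first.
    seen, queue = set(), deque([overturned])
    while queue:
        current = queue.popleft()
        for dependent in supports.get(current, []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


# If report_1 turns out to be unreliable, which black boxes need reopening?
print(sorted(affected_by("report_1", depends_on)))
```

Here overturning `report_1` flags `argument_A` and, through it, `argument_B`, while `argument_C` is untouched. That is the kind of knock-on question a historian currently has to answer from memory.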

A few months after this particular debate, I was invited to join Joris and several other members of the Alfalab project at KNAW in preparing a paper for the ‘Computational Turn’ workshop in early 2010, which was eventually included in a collection that arose from the workshop. In the article we take a look at the processes by which knowledge is formalized in various fields in the humanities, and how the formalization can be resisted by scholars within each field. Among other things we presented a form of this idea for the formalization of historical research. Three years later I am still working on making it happen.

I was very pleased to find that Palgrave Macmillan makes its author self-archiving policies clear on their website, for books of collected papers as well as for journals. Unfortunately the policy is that the chapter is under embargo until 2015, so I can’t post it publicly until then, but if you are interested meanwhile and can’t track down a copy of the book then please get in touch!

J. J. van Zundert, S. Antonijevic, A. Beaulieu, K. van Dalen-Oskam, D. Zeldenrust, and T. L. Andrews, ‘Cultures of Formalization – Towards an encounter between humanities and computing’, in Understanding Digital Humanities, edited by D. Berry (London: Palgrave Macmillan, 2012), pp. 279-94.

11 June 2013

Early-career encyclopedism

So there I was, a newly-minted Ph.D. enjoying my (all too brief) summer of freedom in 2009 from major academic responsibilities. There must be some sort of scholarly pheromone signal that gets emitted in cases like these, some chemical signature that senior scholars are attuned to that reads ‘I am young and enthusiastic and am not currently crushed by the burden of a thousand obligations’. I was about to meet the Swarm of Encyclopedists.

It started innocently enough, actually even before I had submitted, when Elizabeth Jeffreys (who had been my MPhil degree supervisor) offered me the authorship of an article on the Armenians to go into an encyclopedia that she was helping to edit. As it happened, this didn’t intrude again on my consciousness until the following year–I was duly signed up as author, but my email address was entered incorrectly in a database so I was blissfully ignorant of what exactly I had committed to until I began to get mysterious messages in 2010 from a project I hadn’t really even heard of, demanding to know where my contribution was.

Lesson learned: you can almost always get a deadline extended in these large collaborative projects. After all, what alternatives do the editors have, really?

The second lure came quite literally the evening following my DPhil defense, when Tim Greenwood (who had been my MPhil thesis supervisor) got in touch to tell me about a project on Christian-Muslim relations being run out of Birmingham by David Thomas, and that I would seem to be the perfect person to write an entry on Matthew of Edessa and his Chronicle. Flush with victory and endorphins, of course I accepted within the hour. Technically speaking this was a ‘bibliographical history’ rather than an ‘encyclopedia’, but the approach to writing my piece was very similar, and it was more or less the ideal moment for me to summarize everything I knew about Matthew.

For a little bit of doctoral R&R, academic style, I flew off a few days later to Los Angeles for the 2009 conference of the Society for Armenian Studies. There in the sunshine I must have been positively telegraphing my relaxation and lack of obligations, because Theo van Lint (who had only just ceased being my DPhil supervisor) brought up the subject of a number of encyclopedia articles on Armenian authors that he had promised and was simply not going to have a chance to do. By this time I was beginning to get a little surprised at the number of encyclopedia articles floating around in the academic ether looking for an authorly home, and I was not so naïve as to accept the unworkable deadline that he had, but subject to reasonability I said okay. He assured me that he would send me the details soon.

Around that time, through one of the mailing lists to which I had subscribed in the last month or so of my D.Phil., I got wind of the Encyclopedia of the Medieval Chronicle (EMC). The general editor, Graeme Dunphy, was looking for contributors to take on some of the orphan articles in this project. Matthew of Edessa was on the list, and I was already writing something similar for the Christian-Muslim Relations project, so I wrote to volunteer.

And then everything happened at once. Theo wrote to me with his list, which turned out to be for precisely this EMC project. The project manager at Brill, Ernest Suyver, who knew me from my work on another Brill project, wrote to me to ask if I would consider taking on several of the Armenian articles. Before I could answer either of these, Graeme wrote back to me, offering me not only the article on Matthew of Edessa that I’d asked for–not only the entire set of Armenian articles that both Theo and Ernest had sent in my direction–but the job of section editor for all Armenian and Syriac chronicles! The previous section editor had evidently disappeared from the project and it seems that only someone as young and unburdened as me had any hope of pulling off the organization and project management they needed on the exceedingly short timescale they had, or of being unwise enough to believe it could be done.

But I was at least learning enough by then to expect that any appeal to more senior scholars than myself was likely to be met with “Sorry, I have too much work already” and an unspoken coda of “…and encyclopedia articles are not exactly a priority for me right now.” There was the rare exception of course, but I turned pretty quickly to my own cohort of almost- or just-doctored scholars to farm out the articles I couldn’t (or didn’t want to) write myself. So I suppose by that time even I was beginning to detect the “yes I can” signals coming from the early-career scholars around me. Naturally the articles were not all done on time–it was a pretty ludicrous time frame I was given, after all–but equally naturally, delays in the larger project meant that my part was completed by the time it really needed to be. And so in my first year as a postdoc I had a credit on the editorial team of a big encyclopedia project, and a short-paper-length article, co-authored with Philip Wood, giving an overview of Eastern Christian historiography as a whole. I remain kind of proud of that little piece.

Lesson learned: your authors can almost always get you to agree to a deadline extension in these large collaborative projects. After all, what alternative do you have as editor, short of finding another author, who will need more time anyway, and pissing off the first one by withdrawing the commission?

The only trouble with these articles is that it’s awfully hard to know how to express them in the tickyboxes of a typical publications database like KU Leuven’s Lirias. Does each of the fifteen entries I wrote get its own line? Should I list the editorship separately, or the longer article on historiography? It’s a little conundrum for the CV.

Nevertheless I’m glad I got the opportunity to do the EMC project, definitely. And here’s another little secret–if I am able to make the time, I kind of like writing encyclopedia articles. It’s a nice way to get to grips with a subject, to cut straight to the essence of “What does the reader–and what do I–really need to know in these 250 words?” This might be why, when yet another project manager for yet another encyclopedia project found me about a year ago, I didn’t say no, and so this list will have an addition in the future. After that, though, I might finally have to call a halt.

I have written to Wiley-Blackwell to ask about their author self-archiving policies; I have a PDF offprint but am evidently not allowed to make it public, frustratingly enough. I will update the Lirias record if that changes. Brill has a surprisingly humane policy that allows me to link freely to the offprints of my own contributions in an edited collection, so I have done that here. I don’t seem to have an offprint for all the articles I wrote, though, so will need to rectify that.

Andrews, T. (2012). Armenians. In: Encyclopedia of Ancient History, ed. R. Bagnall et al. Malden, MA: Wiley-Blackwell.

Andrews, T. (2012). Matthew of Edessa. In: Christian–Muslim Relations. A Bibliographical History 1. Volume 3 (1050-1200), ed. D. Thomas and B. Roggema. Leiden: Brill.

Andrews, T. and P. Wood. (2012). Historiography of the Christian East. In: Encyclopedia of the Medieval Chronicle, general editor G. Dunphy. Leiden: Brill.
(Additional articles on Agatʿangełos, Aristakēs Lastivertcʿi, Ełišē, Kʿartʿlis Cxovreba, Łazar Pʿarpecʿi, Mattʿēos Uṙhayecʿi, Movsēs Dasxurancʿi, Pʿawstos Buzand, Smbat Sparapet, Stepʿanos Asołik, Syriac Short Chronicles (with J. J. van Ginkel), Tʿovma Arcruni, Yovhannēs Drasxanakertcʿi.)

5 November 2012 (updated 2 April 2016)

Public accountability, #acwrimo, and The Book

Over the course of 2011, among the long-delayed things I finally managed to do was to put together a book proposal for the publication of my Ph.D. research. While I am reasonably pleased with the thesis I produced, it is no exception to the general rule that it would not make a very good book if I tried to publish it as it stands. As it happens there is a reasonably well-known series by a well-respected publisher, edited by someone I know, where my research fits in rather nicely. Even more nicely, they accepted my proposal.

Now here is where I have to humblebrag a little: I wrote my Ph.D. thesis kind of quickly, and much more quickly than I would recommend to any current Ph.D. students. Part of this was luck–once I hit upon my main theme, a lot of it just started falling into place–but part of it was the sheer terror of an externally-imposed deadline. I had rather optimistically applied for a British Academy post-doctoral fellowship in October 2008, figuring that either I’d be rejected and it would make no difference at all, or that I’d be shortlisted and have a deadline of 1 April 2009 to have my thesis finished and defended. At the time I applied I had a reasonable outline, one more or less completed chapter and the seeds for two more, and software that was about 1/3 finished. By the beginning of January I was only a little farther along, and I realized that the BA was going to make its shortlisting decisions very soon and, unless I made a serious and concerted effort to produce some thesis draft, I might as well withdraw my name. Amazingly enough this little self-motivational talk worked wonders and I spent the middle two weeks of January writing like crazy and dosing myself with ibuprofen for the increasingly severe tendinitis in my hands. (See? Not recommended.) Then, wonder of wonders, I was shortlisted and I got to dump the entire thing in my supervisor’s lap and say “Read this, now!” The next month was a panic-and-endorphin-fuelled rush to get the thing ready for submission by 20 February, so that I could have my viva by the end of March. This involved some fairly amusing-in-retrospect scenes. I had to enlist my husband to draw a manuscript stemma for me in OmniGraffle because my hands were too wrecked to operate a trackpad. I imposed a series of strict deadlines on my own supervisor for reading and commenting on my draft, and met him on the morning of Deadline Day to incorporate the last set of his corrections, which involved directly hacking a horribly complicated (and programmatically generated) LaTeX file that contained the edited text I had produced. (Yes, *very* poor programming practice that, and I am still suffering the consequences of not having taken the time to do it properly.)

In the end the British Academy rejected me anyway, but what did I care? I had a Ph.D.

With that experience in mind, I set myself an ambitious and optimistic target of ‘spring 2012’ for having a draft of the book. For the record the conversion requires light-to-moderate revision of five existing chapters, complete re-drafting of the introductory chapter, and addition of a chapter that involves a small chunk of further research. It was in this context, last October, that I saw the usual buzz surrounding the ramp-up to NaNoWriMo and thought to myself “you know, it would be kind of cool to have an academic version of that.”

It turns out I’m not the only one who thought this thought–there actually was an “Ac[ademic ]Bo[ok ]WriMo” last year. In the end the project that was paying my salary demanded too much of my attention to even think about working on the book, and the idea went by the wayside. The target of spring 2012 for production of the complete draft was also a little too optimistic, even by my standards, and that deadline whizzed right on by.

Here it is November again, though, and AcWriMo is still a thing (though they have dropped the explicit ‘book’ part of it), and my book still needs to be finished, and this year I don’t have any excuses. So I signed myself up, and I am using this post to provide that extra little bit of public accountability for my good intentions. I am excusing myself from weekend work on account of family obligations, but for the weekdays (except *possibly* for the days of ESTS) I am requiring of myself a decent chunk of written work, with one week each dedicated to the two chapters that need major revision or drafting de novo.

I won’t be submitting the thing to the publisher on 30 November, but I am promising myself (and now the world) that by the first of December, all that will remain is bibliographic cleanup and cosmetic issues. I am really looking forward to my Christmas present of a finished manuscript, and I am counting on public accountability to help make sure I get it. Follow me on Twitter or App.net (if you don’t already) and harass me if I don’t update!

23 October 2012

Conference-driven doctoral theses

In the computer programming world I have occasionally come across the concept of ‘conference-driven development’ (and, let’s be honest, I’ve engaged in it myself a time or two.) This is the practice of submitting a talk to a conference that describes the brilliant software that you have written and will be demonstrating, where by “have written” you actually mean “will have written”. Once the talk gets accepted, well, it would be downright embarrassing to withdraw it so you had better get busy.

It turns out that this concept can also work in the field of humanities research (as, I suspect, certain authors of Digital Humanities conference abstracts are already aware.) Indeed, the fact that I am writing this post is testament to its workability even as a means of getting a doctoral thesis on track. (Graduate students take note!)

In the autumn of 2007 I was afloat on that vast sea of Ph.D. research, no definite outline of land (i.e. a completed thesis) in sight, and not much wind in the sails of my reading and ideas to provide the necessary direction. I had set out to create a new critical edition of the Chronicle of Matthew of Edessa, but it had been clear for a few months that I was not going to be able to collect the necessary manuscript copies within a reasonable timeframe. Even if I had, the text was far too long and copied far too often for the critical edition ever to have been feasible.

One Wednesday evening, after the weekly Byzantine Studies department seminar, an announcement was made about the forthcoming Cambridge International Chronicles Symposium to be held in July 2008. It was occurring to me by this point that it might be time to branch out from graduate-student conferences and try to get something accepted in ‘grown-up’ academia, and a symposium devoted entirely to medieval chronicles seemed a fine place to start. I only needed a paper topic.

Matthew wrote his Chronicle a generation after the arrival of the First Crusade had changed pretty much everything about the dynamics of power within the Near East, and his city Edessa was no exception. Early in his text he features a pair of dire prophetic warnings attributed to the monastic scholar John Kozern; the second of these ends with a rather spectacular prediction of the utter overthrow of Persian (read: Muslim, but given the cultural context you may as well read “Persian” too) power by the victorious Roman Emperor, and Christ’s peace until the end of time. It is a pretty clearly apocalyptic vision, and much of the Chronicle shows Matthew struggling to make sense of the fact that some seriously apocalyptic events (to wit, the Crusade) had occurred, and yet it was pretty apparent forty years later that the world was not yet drawing to an end with the return of Christ.

Post-apocalyptic history, I thought to myself, that’s nicely attention-getting, so I made it the theme of my paper. This turned out to be a real stroke of luck – I spent the next six months considering the Chronicle from the perspective of somewhat frustrated apocalyptic expectations, and little by little a lot of strange features of Matthew’s work began to fall into place. The paper was presented in July 2008; in October I submitted it for publication and turned it into the first properly completed chapter of my thesis. Although this wasn’t the first article I submitted, it was the first one that appeared in print.

Andrews, T. L. (2009). The new age of prophecy: the Chronicle of Matthew of Edessa and its place in Armenian historiography. In Kooper, E. (ed.) The Medieval Chronicle VI. Amsterdam: Rodopi.