So, you finally got that beautiful scholarly editing project on the rails. You have begun to painstakingly edit those [insert crazy number] letters from late Renaissance artists. Or those [insert second crazy number] medieval prose manuscripts.
Or, as in my case, you have collected data on a humungous amount of performance events from twentieth-century theatre, letting various groups of students work together using a Google Spreadsheet. (A project going by the name of Belgium is Happening, more on this below & in upcoming posts.)
It goes without saying that for the editing phase of those projects the Text Encoding Initiative has been a tremendous boon.
But how about publishing your results? For some projects, ‘going print’ is no longer a viable option. The materials you have collected may simply be too large or cumbersome for print. Besides, a scholarly publication (in my experience, anyway) takes at least half as much time for negotiations with publishers, going through another editorial process, &c., as the edition itself took to complete.
So, you want to do an online scholarly publishing project.
When I started out on my first project in the summer of 2008 (the Corpus Toneelkritiek Interbellum, a corpus of Dutch theatre reviews from the interwar period), the natural choice for publishing my TEI files online seemed to be the stylesheet language XSLT. To a TEI beginner, it was the evident way to go, as it is used in many projects for the presentation of an edition.
XSLT is basically a powerful stylesheet language (and, in some ways, also a rudimentary programming language) that tells the web browser how to sort, lay out, and present the intricacies of your XML edition for the benefit of the illiterate masses out there who prefer to do their reading without angle brackets or namespace declarations.
XSLT has the distinct advantage of a ridiculously simple publishing model. If you simply connect a stylesheet to an XML document on your web server, the browser will pick up both and render your XML document accordingly. At least, that’s what is supposed to happen once you have managed to write or customize your own XSLT stylesheet.
For the sake of clarity, I did not reach that stage of my initiation.
As it happened, I was also picking up some basic Python scripting skills at the time. And what I saw in Python (delicious pseudo-code featuring inoffensive commands such as open()) looked quite different from the verbose and swollen style of your average XSLT stylesheet.
I decided to convert my source files into run-of-the-mill HTML using Python’s quite powerful XML parsing capabilities. I then published them, first as static files on a general-purpose server at our university, later on as part of a dynamic (and again Python-driven) website on my own web server.
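To give an idea of what such a conversion can look like, here is a minimal sketch using only the `xml.etree.ElementTree` module from Python's standard library. It is not the script I used for the corpus: the `TAG_MAP` table, the function names, and the sample document are all hypothetical, and a real TEI project would need to handle many more elements and attributes.

```python
import xml.etree.ElementTree as ET
from html import escape

# The TEI P5 namespace, in the {uri}name form that ElementTree uses for tags.
TEI_NS = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical mapping from TEI element names to HTML tags.
# Elements not listed here are dropped, but their contents are kept.
TAG_MAP = {"head": "h2", "p": "p", "hi": "em"}


def to_html(elem):
    """Recursively render a TEI element and its children as an HTML string."""
    name = elem.tag.replace(TEI_NS, "")
    tag = TAG_MAP.get(name)
    # Text before the first child, then each child followed by its tail text.
    inner = escape(elem.text or "")
    for child in elem:
        inner += to_html(child) + escape(child.tail or "")
    return f"<{tag}>{inner}</{tag}>" if tag else inner


def tei_body_to_html(tei_source):
    """Parse a TEI document (as a string) and return its body as HTML."""
    root = ET.fromstring(tei_source)
    body = root.find(f".//{TEI_NS}body")
    return to_html(body).strip()


# A tiny, made-up TEI document for illustration only.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <div><head>Review</head>
    <p>A <hi>remarkable</hi> premiere &amp; more.</p></div>
  </body></text>
</TEI>"""

if __name__ == "__main__":
    print(tei_body_to_html(SAMPLE))
```

The point of the table-driven approach is that extending the conversion is mostly a matter of adding entries to `TAG_MAP`, which is about as far from a swollen stylesheet as you can get.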
However, there are certain distinct disadvantages & upper limits to these approaches. Either way (running your website on a university server, a private hosting solution, or your own server), you are basically in the business of self-publishing. Will you use an established platform, i.e., a CMS (Content Management System, e.g., WordPress or Drupal), or do you prefer to grow your own HTML/CSS? What is the most advantageous and flexible place to host it? If you run your own server, when does it need to be updated? Do you really need that latest Apache update? If you are doing a dynamic website, will the database continue to behave as it does today? When do you update your database software? Is it possible that your website will one day attract a lot of traffic, necessitating more than one server? What search engine do you use for your collection of texts? Do you simply plug in a Google search box, or do you want some more searching power for your users? If so, what software do you choose?
Ah, the joys of server maintenance.
While looking for alternative models of publishing online, I stumbled upon Google Sites. It is part of the Google Apps suite that is marketed to companies, schools, and groups. Using Google Sites, you design a basic website and populate it with content. Plus, every page you add is automatically indexed for powerful and lightning-fast site searching.
At first, that was exactly how I used it. It was a good choice for course background materials & wikis — i.e., as a high-tech replacement for the idiotic constraints of your average Blackboard installation.
But then I realised the full potential of the feature allowing you to edit the HTML contents of any page. If you simply convert your scholarly project’s source files to HTML (using any of the above-mentioned methods) and insert them into a Google Site, it could grow to be the ideal publication platform for your project:
- Styling is applied automatically to your text. No more fiddling about with CSS stylesheets, just choose a general theme for your site, insert the HTML, and you’re done. If you want to apply additional styling, you can use the WYSIWYG editor or the HTML editor that accepts in-line CSS styling, too.
- Your project is running on Google’s servers. They are doing the maintenance, and since they’re the world’s largest data farm, it should remain accessible even when under heavy load. (The Google approach to hosting is dealt with from a different perspective in a Campfire One video on Google App Engine.)
- Your project’s texts are immediately indexed and fully searchable. Anyone who has dipped into the difficulties of configuring their own search engine (certainly when coming from a non-IT background) should be happy about this.
- If you decide to make your website public, it seems quite reasonable to expect that its contents are also swiftly added to the main Google search index, so your editing efforts may become more visible.
The only point is … how are you going to upload those [refer to crazy number 1] HTML documents to your Google Site?
More on that soon, in an upcoming post.