Thomas Crombez

Publishing scholarly projects using Google Sites, pt. 2

In Scholarly Publishing on February 13, 2010 at 10:47 pm

In my previous post on scholarly publishing projects, I summed up the advantages of using Google Sites to make documents available online. But how to automate that process for a huge number of documents? For example, how to build a fully-searchable website for hundreds of Victorian letters, or scores of descriptions of performance events?

In my earlier publishing projects (e.g., the SARMA project for collecting dance & theatre reviews by Pieter T’Jonck) there seemed to be no other option but to master the power, and overcome the restrictions, of HTML and CSS, throw a website together, add a search engine, and host the whole bunch on your own webserver, your university’s, or a hosting company’s. That means lots of work, and between three and five new areas of expertise to explore if you’re coming from a traditional humanities background, as I do (I studied Philosophy and Theatre Studies, and those are still the main subjects of my teaching assignments).

That’s how I got into maintaining my own webserver, and although it’s certainly been a rich and rewarding process (insert your favorite funny accent to pronounce these words) it is getting on my nerves, too.

Still, the other systems I looked at seemed either like even more work (such as learning the ropes of a Content Management System) or too limited in their possibilities. That is also how I first thought of Google Sites (and similar systems): too limited.

What changed my mind was the release of the Google Data API for Google Sites in September 2009. That’s quite a mouthful. The Data API is Google’s backdoor for programmers. Using the “Application Programming Interface,” you can automate sending emails through Gmail, automatically update events in Google Calendar, upload movies to YouTube, request maps from Google Maps, etc. etc. Basically, anything you can do with mouse clicks in one of Google’s services, you can automate through the Data API. Moreover, there’s not only an API client for serious Java people, but also one for the queen of playful code: Python!

It’s a godsend, and I love every inch of it.

(Well, except for that one stupid little inch that makes it impossible to publish the PDF files in your Google Docs account automatically, but more on that in a future post.)

Using the Google Data API, you can automatically post HTML-formatted documents to a Google-hosted website, where they are instantly made available online and fully indexed for site searching.

Read that sentence again. It’s crazy, if only because of this: although there is an upper limit to the amount of material you can freely post on a Google Site, this limit only applies to page attachments. If it’s simple HTML pages you are posting, there is no upper limit.

To give a concrete example: this is the workflow I followed for the current version of Belgium is Happening, an online register of post-war performance events.

  1. Students collected information on performance events through research in books, journals, and archival records. They recorded this info in a gigantic spreadsheet on which they could all work simultaneously (i.e., collaborative editing: I used Google Docs, but there are other systems, too). Each row of the spreadsheet holds one event, detailing its date, place, participants, and a short description.
  2. A first script pulled in the data from the Google spreadsheet through the Google Data API for Google Docs. (If they put Google in the name of one more new service they devise, I’ll have them sued for artificially limiting the vocabulary richness of my blogposts.) Note that this step isn’t strictly necessary — the script could also read in the data from a local data file.
  3. A second script formatted the event data as HTML, logged in to the Google Sites API (using my normal Google account details), selected the first event, created a new page on the Google Site for the project, posted the HTML to the new page, and then continued to do so for the next 1,399 events.
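The three steps above can be sketched in Python. This is a minimal illustration, not the scripts used for Belgium is Happening: it takes the local-file route mentioned in step 2 (a CSV export of the spreadsheet), and the column names `date`, `place`, `participants` and `description` are assumptions based on the fields listed in step 1. To avoid guessing at the exact gdata calls, the page-creation step is left to a caller-supplied function, `post_page(title, html_body)`, which is hypothetical.

```python
import csv
import html
import io
from collections.abc import Callable, Iterable

# Assumed column names; the post only says each row holds a date, place,
# participants, and a short description.
FIELDS = ("date", "place", "participants", "description")


def read_events(csv_text: str) -> list[dict[str, str]]:
    """Step 2 (local-file variant): one dict per spreadsheet row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def event_to_html(event: dict[str, str]) -> str:
    """Step 3a: render one event as a small HTML table, escaping all values."""
    rows = "".join(
        f"<tr><th>{html.escape(field.capitalize())}</th>"
        f"<td>{html.escape(event.get(field) or '')}</td></tr>"
        for field in FIELDS
    )
    return f"<table>{rows}</table>"


def publish_all(events: Iterable[dict[str, str]],
                post_page: Callable[[str, str], None]) -> int:
    """Step 3b: create one page per event and return how many were posted.

    post_page(title, html_body) is a hypothetical placeholder for whatever
    call actually creates a page on the site; it is not a library function.
    """
    count = 0
    for event in events:
        title = f"{event.get('date') or ''} {event.get('place') or ''}".strip()
        post_page(title, event_to_html(event))
        count += 1
    return count


if __name__ == "__main__":
    # Made-up sample row, for illustration only.
    sample = ("date,place,participants,description\n"
              "2000-01-01,Somewhere,Person A & Person B,Illustrative entry\n")
    published = []
    n = publish_all(read_events(sample),
                    lambda title, body: published.append((title, body)))
    print(n, published[0][0])  # 1 2000-01-01 Somewhere
```

Passing the upload function in as a parameter keeps the HTML formatting testable offline; in a real run it would wrap the login and page-creation calls of whatever Sites client is used.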

And that’s how I posted no less than 1,400 separate pages on the site of Belgium is Happening, and made all of those events fully searchable at the same time. It all runs not on my server, and not on my time, but on Google’s servers, i.e., much more efficiently than I could ever aspire to do. Check out a sample of events here: belgiumishappening/home/events

(To be perfectly honest, what you see there is already a bit of a hack. I really wanted to offer visitors a random selection of events, which is impossible in the current setup of Google Sites. You can present an overview of subpages, but since all my events are under one ‘parent’ page, this list would be very large and unwieldy. So I cheated a bit; more on this later.)
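The details of this workaround are left for a later post. Purely as a hypothetical illustration, and not necessarily what was done for Belgium is Happening, one way to approximate a random selection on a page that cannot randomize anything itself is to let a script pick a few events and post the result as static HTML, regenerating it from time to time. The `(title, url)` pairs and the function name `random_event_links` are assumptions.

```python
import html
import random
from collections.abc import Sequence
from typing import Optional


def random_event_links(pages: Sequence[tuple[str, str]], k: int = 10,
                       seed: Optional[int] = None) -> str:
    """Return an HTML list linking to up to k randomly chosen event pages.

    pages holds hypothetical (title, url) pairs for already-published events.
    The output is static HTML; rerunning the script and re-posting the result
    is what makes the selection change over time.
    """
    rng = random.Random(seed)  # a fixed seed gives a reproducible selection
    chosen = rng.sample(list(pages), min(k, len(pages)))
    items = "".join(
        f'<li><a href="{html.escape(url)}">{html.escape(title)}</a></li>'
        for title, url in chosen
    )
    return f"<ul>{items}</ul>"
```

Because the list is ordinary HTML, it can be posted to the overview page the same way as the event pages themselves, keeping that page short no matter how many events sit under the parent page.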

I originally meant to share the Python scripts I used at the end of this post, but they are too tailor-made for my project to be informative for others. However, I strongly suggest reading the code of the sample applications that come with the gdata Python client: all the basics are there.
