What follows is a summary of the 'roadmap' meeting on 'New Directions in Humanities Computing' held in Pisa this April, as a preparation for the final plenary of the conference. It included both the ALLC Committee (plus the former chair, Susan Hockey) and the Conference Programme Committee. The meeting was divided into sessions on:
- Literary studies
- Bibliography & textual criticism
- Manuscripts, libraries & archives
- Visual imaging, multimedia & performance studies
- Methodologies, multi-disciplines and digital scholarship
Each of these topics was assigned to a sub-group, who made presentations addressing the following issues:
- Where are we now, especially recent developments?
- Where are we going - especially promising new developments?
- What should our agenda be?
Orthogonal to this, each presentation was asked to cover:
- New methodologies for research
- New approaches to teaching
- Changing functions and roles in society
As these headings show, 'Humanities Computing' has been defined fairly selectively, in line with the main interests represented in the Conference. For this and other reasons, the summary is tentative and provisional, and very far from being comprehensive or systematic. It has tried to cover the main points made in the meeting, but has not identified areas of uncertainty or disagreement. In particular, in the interests of brevity, points made under the heading 'Where are we now?' have been omitted. However links to much fuller accounts of the presentations and discussion on each topic are included under the relevant headings. The aim here is simply to provide a reasonably concise starting-point for discussion at the Round Table, where we shall begin with the question: how do the conference proceedings cause us to modify, develop or add to the points that are listed here?
For a fuller account, see linguistics.
- Logical modelling of language. In addition to the use of programming languages such as PROLOG and Lisp for the realization of applied linguistic tasks, we now find the same programming environments used for the modelling of linguistic structure, in the form of computer simulations of structures predicted by a given theory.
- Creation of more advanced corpora, including speech corpora, and web-based corpora. Multimodality: opportunities for combining spoken, written and non-verbal language.
- The Web itself as a corpus.
- (Foreign) language teaching. Increased incorporation and integration of multimedia elements and non-linear structures. New speech technologies: not only will programs speak to the language learner but they will also evaluate the speech responses of the student.
- Speech technology. Linguistics will play an increasing role in solving practical, industry-level speech technology tasks, both for the consumer market and for education in general.
- Methodology/Tools. Broadening of markup techniques from text studies to other types of material (speech, multimodal communication).
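As a concrete illustration of the kind of corpus work mentioned above, the frequency profile of a text can be sketched in a few lines. The code below is a hypothetical toy, not a tool discussed at the meeting; real corpus tools add proper tokenization, lemmatization and much larger data:

```python
import re
from collections import Counter

def word_frequencies(text):
    """Tokenize on runs of letters (a deliberately crude tokenizer)
    and count occurrences, ignoring case."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens)

# Invented sample text for illustration.
sample = "The text is the thing: the text, the whole text."
freqs = word_frequencies(sample)
print(freqs.most_common(2))  # the two most frequent word forms
```

The same counting step underlies work on much larger corpora, including material harvested from the Web.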
For a fuller account, see literature.
- Markup. A lot still needs to be done in the field of deep markup.
- Markup as interpretation that others can then work on: a new view of criticism and interpretation as work-in-progress.
- Manuscript images. Within manuscript studies we now have the ability to link the manuscript image with its transcription. This not only makes the manuscript more widely available to scholars who may not be able to travel to where the actual manuscript is held; it also has the potential to change the interpretation of that literary text. This technology is of course applicable outside the field of literary studies as well.
- Linking versions of texts. We can now build parallel concordances to help us look at different versions of texts, documenting authorial change and the genesis of texts, and leading us back to the question of what the text is.
- Hypertext novels: are these a dead end or an exciting new development? One way or the other, they again raise the question of what a text is.
- Large corpora. Because we have such vast quantities of electronic texts and varied corpora we can now say things about specific topics which we simply couldn't say before. Technology is forcing us to redefine what we mean by literary studies; conversely, in other respects it also reinforces the canon.
- Application of new resources for linguistics to literary analysis: parsers, lemmatized dictionaries, electronic thesauri, etc.
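The parallel-concordance idea above can be illustrated with a minimal keyword-in-context (KWIC) routine. The function and the two 'versions' of a line are invented for illustration; real concordancers align whole witnesses, not single lines:

```python
def kwic(tokens, keyword, width=2):
    """Return keyword-in-context hits: each occurrence of `keyword`
    with up to `width` tokens of context on either side."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword:
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            hits.append((" ".join(left), tok, " ".join(right)))
    return hits

# Two hypothetical versions of the same verse line, viewed side by side
# to document authorial change.
version_a = "the pale king sat upon the throne".split()
version_b = "the old king stood beside the throne".split()
for version in (version_a, version_b):
    print(kwic(version, "king"))
```

Running the same query over each witness places the variant contexts next to one another, which is the core of a parallel concordance.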
For a fuller account, see textual.
- New kinds of critical edition. Computer technology has made it possible to overcome the limitations of printed editions: variant readings need no longer be listed 'atomistically' in a critical apparatus, but can be presented in their full context; adding facsimiles to the transcription of the original sources can help to clarify difficult cases, such as paleographically difficult readings, abbreviations and fragmentary sources.
- Encoding systems. Encoding standards like XML and TEI help in structuring texts, and in their exchange and publication. Linking techniques allow almost instantaneous switching between the edition text, apparatus, transcription and facsimile of the sources, commentary, illustrations, etc.
- The computer as tool. On the other hand, the easy availability of these new possibilities should not be used as a replacement for scholarly work. An XML-editor like XMETAL is an editor and a valuable tool to control the tagging, but nothing more; editorial work consists of many steps: textual collation, collecting and classifying the variants, analysing, indexing, sorting, transforming, and formatting. Different tools are needed for all of these steps.
- An electronic edition should not be merely a surrogate for a book; it must offer additional functionality for screening, searching and analysing the text, and for solving questions which have not been thought of (or not been handled) by the editing team. This means that the 'naked' marked-up text should be included in the electronic distribution.
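To make concrete what such encoded apparatus material looks like, the sketch below parses a simplified, TEI-style apparatus entry. The element names `app`, `lem` and `rdg` follow the TEI critical apparatus module, but the snippet itself and its witnesses A and B are invented for illustration:

```python
import xml.etree.ElementTree as ET

# A simplified, TEI-style apparatus entry: one lemma with a
# variant reading, each attributed to a hypothetical witness.
snippet = """
<l>Sing, goddess, the
  <app>
    <lem wit="#A">wrath</lem>
    <rdg wit="#B">anger</rdg>
  </app>
of Achilles</l>
"""

root = ET.fromstring(snippet)
app = root.find("app")
lem = app.find("lem")
# Collect witness siglum -> reading for lemma and variants alike.
readings = {lem.get("wit"): lem.text}
for rdg in app.findall("rdg"):
    readings[rdg.get("wit")] = rdg.text
print(readings)
```

Because the variants sit inside the running text rather than in a separate apparatus list, software can present each reading in its full context, or switch between witness texts on demand.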
For a fuller account, see manuscripts.
- New cataloguing systems. Catalogue data are still very much influenced by printed catalogues and card archives. One of the aims for the future is to liberate cataloguing systems and the use of metadata from this.
- Cataloguing for a wider public. Libraries have a tradition of cataloguing for the specialist. When catalogue information - and ideally copies of the books and manuscripts - is made available to the general public through the Internet, it has to be done in a way which is useful outside the sphere of the library professional.
- Work on digital preservation, which is giving us new understandings of the complexity of digital data and the integration of this with systems and programs. In particular, work on migration and emulation.
- Automated methods of generating metadata and searchability in complex texts.
- Information retrieval, including automatic keywording from bodies of digital data, then creation and refinement of thesauri, use of lemmatized dictionaries for searching, creation of topic maps, etc.
- Query by image content for visual materials.
- E-books: new modes of publication which give us new corpora of data which don’t need to be rekeyed.
- Born digital materials are being catalogued and preserved with greater attention to their sustainability than has happened in the past.
- Digital object repositories are being created, e.g. the Flexible and Extensible Digital Object and Repository Architecture (FEDORA).
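The automatic keywording mentioned above can be sketched as a naive term-frequency filter. The stopword list and sample text below are invented for illustration; real systems add lemmatization, weighting such as tf-idf, and thesaurus lookup on top of this:

```python
import re
from collections import Counter

# A deliberately tiny stopword list for the illustration.
STOPWORDS = {"the", "of", "and", "a", "in", "is", "to", "for", "each"}

def keywords(text, n=3):
    """Naive automatic keywording: the n most frequent
    non-stopword terms in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return [term for term, _ in counts.most_common(n)]

doc = ("The manuscript catalogue describes the manuscript collection "
       "and the catalogue records for each manuscript.")
print(keywords(doc, 2))
```

Terms extracted this way can seed a thesaurus or a topic map, which is then refined by hand.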
For a fuller account, see multimedia.
[Many of the following points apply to other subject areas as well, but came up particularly under the present heading.]
- Wireless technologies are developing and some standards are emerging.
- Multilingual initiatives, and the increased development of Unicode, are simplifying access to and enriching the available resources.
- High bandwidth creates new possibilities.
- Archiving initiatives such as the CNI/Internet 2 working groups are developing advanced network applications and technologies, in partnerships of academia, industry and government.
- The automation of processes and procedures is facilitating project work, with particularly promising developments emerging in metadata harvesting.
- Multimedia facilities are being integrated into building design and classrooms.
- Computing humanists are disseminating their expertise through initiatives such as NINCH 'Guide to good practice', and 'Database of Humanities Computing Projects'.
- New kinds of projects, integrating documents, databases and mapping tools, such as GIS.
- Markup and multimedia/video.
- We need to have a broader picture of what is currently being done in the way of humanities computing projects.
- We need to set up a database of research groups, types of activities, methodologies, goals and prospects.
- We need to find out what scholars and teachers want from digital resources, and what they are actually doing with them.
- A software forum, where people who actually do research on humanities subjects can come together with programmers to create good interactive and integrated tools.
- We need to reach scholars, publishers and librarians before they start to create electronic resources.
- A Virtual Institute of Humanities Computing. The strength of our community may eventually be multiplied under the organizational force of a virtual institute which may offer:
- online courses on methodology as well as on specific humanities computing tasks, especially for beginners,
- distance diplomas in humanities computing for which the academic background of a network of universities may be required,
- online publications, especially of work in progress, to stimulate dialogue and provide a forum for talented young scholars,
- a software repository,
- workshops by leading representatives of various fields of humanities computing to disseminate and enhance accumulated knowledge and constantly increase the share of computing in humanities research in general.
- Reusability. In Humanities Computing we have always had a lot of individual projects; the emphasis should increasingly be on reusable resources. Nothing in digital form is ever finished: the products of humanities computing should be seen as contributions to work-in-progress.
- The present technical and organizational conditions of electronic distribution (both online and offline) are far less reliable than those for print media. Industry has long experience of deterioration in analogue media; the life-span of digital media is not yet known.
- Multimedia objects have different issues of storage, file size and formats. Decisions have to be made about compression, about the preservation master object, and about the derivative object for access. There is as yet no formal technical standard.
- How to preserve electronic resources for a time when the related software is no longer current?
- Preservation of born-digital materials: how do we deal with them, and with issues of legal deposit and future access?
- Problems of rights.
- Copyright. The results of many expensive editorial endeavours are 'buried' in libraries, while electronic copies of out-of-copyright (and out-of-date) texts are used even in academic instruction.
- Micropayments and licensing issues.
- Institutions sharing courseware, materials.
- Data protection.
- Stakeholders need to collaborate and communicate: pieces of projects come from many areas and must be multidisciplinary. This requires multi-professionalism - between different disciplines within academic institutions, between humanities, engineering and industry.
- Humanities computing scholars act as mediators, interpreters and bridge-builders. They can inform other professionals of recent developments in adding value through metadata, and encourage a focus on content and purpose, not medium.
- New role for librarians
- No longer responsible only for containers (printed books) but now also for (digital) content.
- Library and software communities need to work more closely with users.
- Putting the cultural heritage on the Internet, with tools to help people of all ages to work with it. More research is needed on what those tools might do. We need generalisable systems with reusable resources. There is a lot of political pressure to reach out to wider communities and the rest of society.
- Work on multilinguality and Unicode - oriental and ancient languages are still not as well covered as we would wish.
- Automated metadata extraction projects: more work needed.
- High bandwidth is required for accessing multimedia resources on the Web, and we are all aware of the digital divide with respect to under-developed countries. This divide also exists in the UK: a recent survey of Internet access in the UK found that 32% of the population had access, but in Scotland only 15%, and very few of those had high-bandwidth access. We must continue to be aware of this in our projects.
- Metadata for digital objects is well developed for access, retrieval and management of resources, but there is still work to be done on metadata for sounds and especially for moving images. The creation of hypermedia archives and resources is driving work on both metadata and methods of defining relationships. This work is primarily being done in academic projects, such as the Library of Congress Making of America project; METS; and the NYU moving image archive initiatives.
- Electronic plagiarism as a major problem for teaching.
- At present too many libraries are putting their content on the Internet as images rather than searchable text.
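One concrete Unicode issue behind the multilinguality point above: the same accented character can be encoded in composed or decomposed form, and searches across multilingual resources fail unless both sides are normalized. A minimal Python sketch:

```python
import unicodedata

# "é" can be stored as one code point (composed) or as "e" plus a
# combining accent (decomposed); the raw strings are not equal.
composed = "\u00e9tude"      # é as a single code point
decomposed = "e\u0301tude"   # e + COMBINING ACUTE ACCENT

print(composed == decomposed)  # False: raw strings differ
nfc = unicodedata.normalize("NFC", decomposed)
print(nfc == composed)         # True after NFC normalization
```

Applying a single normalization form to both the indexed text and the query is the usual remedy.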
For a fuller account, see methodologies.
A decade ago we knew enough to relate common techniques to the various disciplines: we first suspected, then partly knew that humanities computing was concerned with a methodological common ground within which disciplinary boundaries did not apply. Recent experience, say within the last 5 years, has increasingly involved multi-technology/multi-media work on the basis of large-scale resources, with pronounced multidisciplinary results and discovery of further potentials. In effect networked resources have begun to manifest the ancient model of the research library, in which singular and relatively unchanging resources are separated from their manifold and highly changeable uses, allowing for indefinite recontextualization across the many fields of study to which each resource is relevant. The emergence of this multidisciplinary digital library has served not to fragment the methodological common ground but to emphasize its centrality and extend its breadth. The future directions for humanities computing therefore involve systematic exploration of this common ground to ensure that developments are coherent, cohesive and responsible to its cultural inheritance.
Humanities computing specialists thus have a vital role as interdisciplinary and interprofessional mediators. The old model of support services is no longer valid: research should be seen as a common enterprise between 'technologists' and 'scholars'.