posted 28 Aug 2003 in Volume 7 Issue 1
On the web: Launching the ARK
ARKive does not rescue wildlife directly. Instead, it provides a digital safe haven for detailed records of wildlife (films, photographs, sound recordings and memories) that document its existence. The site was launched in May 2003 by Sir David Attenborough, and here Rob Curtis describes its development and the need for content management.
ARKive is an initiative of the Wildscreen Trust, designed to capture and preserve digital copies of wildlife stills and moving images. Slides and film are fragile, perishable with age and bulky to store, and they have limited economic value for an organisation unless they depict 'valuable' charismatic species. Organisations have no preservation remit and, hence, no reason to archive physical media that are of low quality, low interest or low value.
The ARKive project was designed to save the content held on those media, preserve it and make it publicly available. The project is run under the auspices of the Wildscreen Trust, which for over 20 years has organised the Wildscreen Festival, the world's largest and most prestigious wildlife film and television festival. The original idea for ARKive was developed ten years ago by the late Chris Parsons, then chief executive of the Trust, former head of the BBC Natural History Unit, and producer of the BBC's Life on Earth.
Images and recordings are being donated to ARKive by the most famous names in natural history film-making, including ABC Australia, the BBC Natural History Unit, Discovery, Granada, National Geographic and Oxford Scientific Films, as well as by specialist photographic agencies such as Auscape, Ardea, Naturepl and NHPA. ARKive has also been inundated with donations of media from conservation organisations around the world and from individual wildlife film-makers, photographers, scientists and academics.
How does a charity that runs a film festival design a project for rich-media storage and playout on the web? The Trust realised from the outset that the technology of the day could not meet the challenge, so initial work concentrated on project design. Would people give us media to digitise? What would we accept? Where would the money come from?
In fact, the wildlife industry has embraced the project, with donations from almost every major company and individual. The money has come from the Heritage Lottery Fund, the New Opportunities Fund and in-kind sponsorship from HP.
Initial work on project design centred on building taxonomies and on the idea of creating a species record. At the time, no existing taxonomy for media describing species was considered adequate for ARKive's preservation remit. So we created our own, in consultation with academics concerned with endangered species and with a research team at HP Labs. The result was a flexible taxonomy that would allow for expansion but was manageable enough for our team of media researchers to use daily.
Zoologists and media researchers built up our taxonomy internally, with help from external reviewing groups. The aim was to create a record for each species that would illustrate a range of behaviours to a scientist or interested layperson. The top nodes of the taxonomy therefore include locomotion, sound, reproduction, habitat and so on. Under the locomotion node are terms such as walking, hopping, wading and swimming, and each of these is further subdivided: swimming, for example, splits into surface swimming and underwater swimming. This allows us to select media and write text that together build a complete species record.
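A hierarchical taxonomy like this is naturally a tree, where every root-to-leaf path is one fully specific tag. The sketch below is illustrative only: it encodes just the terms named above (the real taxonomy had more top nodes, hence the "and so on"), and the root label "species record" and the `Term` class are hypothetical, not ARKive's actual data model.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Term:
    """One node in a hypothetical hierarchical behaviour taxonomy."""
    name: str
    children: list[Term] = field(default_factory=list)

    def add(self, *names: str) -> Term:
        # Attach child terms and return self so calls can be chained.
        self.children.extend(Term(n) for n in names)
        return self

    def child(self, name: str) -> Term:
        # Look up a direct child by name; raise KeyError if it is absent.
        for c in self.children:
            if c.name == name:
                return c
        raise KeyError(name)

    def paths(self, prefix: tuple[str, ...] = ()) -> list[tuple[str, ...]]:
        # Every root-to-leaf path; each leaf path is a fully specific tag.
        here = prefix + (self.name,)
        if not self.children:
            return [here]
        return [p for c in self.children for p in c.paths(here)]


# Only the terms mentioned in the article; the root label is hypothetical.
taxonomy = Term("species record").add("locomotion", "sound", "reproduction", "habitat")
taxonomy.child("locomotion").add("walking", "hopping", "wading", "swimming")
taxonomy.child("locomotion").child("swimming").add("surface swimming", "underwater")

for path in taxonomy.paths():
    print(" > ".join(path))
```

Tagging a clip with a full path such as `locomotion > swimming > underwater` lets a researcher later ask for anything under `locomotion` or narrow down to one specific behaviour, which is what makes the tree expandable without retagging existing media.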
Having designed a way to tag the material coming into the building, the project needed a digital-asset management (DAM) and workflow application. Tapes and slides move in and out of the building regularly, and in many cases these media are impossible to replace should they be damaged.
At the time, there was no commercial software suite that would fulfil the requirements, even with significant additional bespoke work. HP Labs, having investigated the DAM and workflow markets, undertook a large philanthropic development effort to build an accessioning system for ARKive.
The ARKive accessions system is the result of two years’ collaboration between the HP research team and the zoologists and media researchers of the Wildscreen Trust. It is almost entirely bespoke, written on a Microsoft platform using a SQL Server database as the indexing backbone to a set of XML snippets, which represent the metadata associated with a piece of media. The interface is written entirely in ASP.
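The pattern described here (whole XML metadata snippets, with a relational database acting as the index over them) can be sketched as follows. This is a minimal illustration under stated assumptions: SQLite stands in for SQL Server, Python stands in for the ASP front end, and the element names, table layout and sample records are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical metadata snippets, one per piece of media.
# Element and attribute names are illustrative, not ARKive's schema.
SNIPPETS = [
    '<media id="tape-0001"><species>Otter</species>'
    "<behaviour>locomotion/swimming/underwater</behaviour></media>",
    '<media id="slide-0042"><species>Otter</species>'
    "<behaviour>habitat</behaviour></media>",
    '<media id="tape-0007"><species>Heron</species>'
    "<behaviour>locomotion/wading</behaviour></media>",
]


def build_index(snippets: list[str]) -> sqlite3.Connection:
    """Store each XML snippet whole, plus a few fields copied into indexed columns."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE media (id TEXT PRIMARY KEY, species TEXT, behaviour TEXT, xml TEXT)"
    )
    db.execute("CREATE INDEX idx_species ON media (species)")
    for snippet in snippets:
        root = ET.fromstring(snippet)
        db.execute(
            "INSERT INTO media VALUES (?, ?, ?, ?)",
            (root.get("id"), root.findtext("species"), root.findtext("behaviour"), snippet),
        )
    db.commit()
    return db


def find(db: sqlite3.Connection, species: str, behaviour_prefix: str = "") -> list[str]:
    """Return media ids for a species, optionally narrowed to one taxonomy branch."""
    rows = db.execute(
        "SELECT id FROM media WHERE species = ? AND behaviour LIKE ? ORDER BY id",
        (species, behaviour_prefix + "%"),
    )
    return [row[0] for row in rows]


db = build_index(SNIPPETS)
print(find(db, "Otter"))                 # ['slide-0042', 'tape-0001']
print(find(db, "Otter", "locomotion/"))  # ['tape-0001']
```

Keeping the full XML alongside a handful of indexed columns means the metadata format can grow without schema changes, while common queries still run against ordinary database indexes.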
This system allows for the collection, indexing, tracking and digitisation of the media. For slide previews, further bespoke HP work was leveraged, drawing on previous projects undertaken for the BBC. Previews of the digitised film were generated using Flip Factory, a software suite that takes video and exports it to a variety of file formats. For ARKive, we generate five outputs from each high-quality source movie: Windows Media streams for high and low bandwidth, RealMedia streams for high and low bandwidth, and a downloadable QuickTime file.
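The five outputs per source movie amount to a fixed rendition list that each transcode job fans out over. The sketch below only models that list; it is not Flip Factory's configuration format. The file extensions are the conventional ones for each format, and the `output_names` naming scheme is hypothetical. No bit rates are given because the article does not state them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    container: str  # delivery format
    bandwidth: str  # "high", "low", or "download" for the QuickTime file
    extension: str  # conventional file extension for the format


# The five outputs described in the text.
RENDITIONS = (
    Rendition("Windows Media", "high", ".wmv"),
    Rendition("Windows Media", "low", ".wmv"),
    Rendition("RealMedia", "high", ".rm"),
    Rendition("RealMedia", "low", ".rm"),
    Rendition("QuickTime", "download", ".mov"),
)


def output_names(source_id: str) -> list[str]:
    """Hypothetical naming scheme: one output file per rendition of a source movie."""
    return [f"{source_id}_{r.bandwidth}{r.extension}" for r in RENDITIONS]


print(output_names("tape-0001"))
```

Declaring the renditions once as data, rather than scattering them through job scripts, makes it straightforward to add or retire a format later without touching the transcoding logic.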
Slides are scanned by commercial scanning agencies and returned to ARKive as 60MB raw TIFF files. Moving pictures are digitised at the highest quality available for each piece. Note that ARKive's film arrives in a number of physical formats, and because donated material spans a wide range of ages, its image quality varies considerably.
Once the content that ARKive requires has been selected, it is digitised and placed into a storage or preservation vault, known as the asset store, which is the reason for the whole project. Within the store lie all of the high-quality preservation assets and associated metadata. The next issue we faced was how to disseminate this material on the web.
ARKive and content management
The first task was to decide what needed to be produced. We came up with a roster of websites required under the ARKive brand. www.arkive.org was to be the flagship: a site aimed at sophisticated users such as academics or people with a serious interest in wildlife, offering streaming video in multiple formats and high-quality images to download. To meet the educational objectives from multiple angles, we undertook to build www.planetarkive.org, a site aimed at Key Stage 2 pupils, and www.arkiveeducation.org, a site aimed at those pupils' teachers and parents. Finally, we built www.arkivearkade.org, a static site, to add a viral-marketing element to the mix.
Having decided on the deliverables (content-rich sites with streaming media), we set out in November 2002 to select a content-management system (CMS) from the products available on the market. Our procurement process showed that mid-range CMSs are ideal for presenting text but do not cope well with rich media. We therefore took a twin-track approach: the Red Hat CCM suite for www.planetarkive.org and www.arkiveeducation.org, and a main ARKive site developed by in-house contractors using Java/JSP and the Struts framework.
The main site is not really a fully functional CMS. It allows its six users some editorial control of items like image selection, bulk text entry and image strapline creation, but its main purpose is to harvest data from our DAM system, transform it, and play it out to the web. The RedHat system is used by three people and takes on many more traditional CMS roles.
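The harvest, transform and play-out cycle can be illustrated in a few lines. The real main site was written in Java/JSP with Struts; the Python below is only a sketch of the idea, and the DAM record layout, element names, image URL pattern and sample values are all hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical DAM export for one image; element names and values are illustrative.
DAM_RECORD = """
<media id="slide-0042">
  <species>Eurasian otter</species>
  <behaviour>locomotion/swimming/underwater</behaviour>
  <strapline>Otter hunting underwater</strapline>
  <credit>Example Photographer</credit>
</media>
"""


def harvest(xml_text: str) -> dict:
    """Pull the fields the web page needs out of a DAM metadata record."""
    root = ET.fromstring(xml_text.strip())
    return {
        "id": root.get("id"),
        "species": root.findtext("species", default=""),
        "behaviour": root.findtext("behaviour", default="").split("/"),
        "strapline": root.findtext("strapline", default=""),
        "credit": root.findtext("credit", default=""),
    }


def render(record: dict) -> str:
    """Transform a harvested record into an HTML fragment, escaping all text."""
    behaviour = " &gt; ".join(escape(term) for term in record["behaviour"])
    media_id = escape(record["id"])
    return (
        f'<figure id="{media_id}">\n'
        f'  <img src="/media/{media_id}.jpg" alt="{escape(record["species"])}">\n'
        f'  <figcaption>{escape(record["strapline"])} ({behaviour}). '
        f'&copy; {escape(record["credit"])}</figcaption>\n'
        f"</figure>"
    )


print(render(harvest(DAM_RECORD)))
```

Separating `harvest` from `render` mirrors the split described above: editors adjust a few fields such as the strapline, while the bulk of each page is generated mechanically from DAM data.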
The education team use it primarily to produce standards-compliant websites for children aged 9 to 11 and their teachers and parents. While the two systems share media content from a central store, they do not, at present, share any other data or assets. This will be rectified as we attempt to merge the two into one unified system.
The ARKive web systems were built very quickly: less than nine months passed from the initial invitation to tender to go-live. We realised early on that we would not be in a position to specify every aspect of the system, and so opted for the Dynamic Systems Development Method (DSDM). This allowed us to write specifications for the parts of the system that were well understood, while using prototyping for the aspects of the interface and the information flow that were less well understood.
DSDM is always a risky approach, especially when third parties on a day rate are involved. But in a project where hard and fast specifications cannot be delivered well in advance of initiation due to time constraints, it’s an option that can work well. The key here is to develop good links with the suppliers.
In anticipation of the expected publicity, the ARKive hosting environment was designed to be especially robust. Our rack is hosted at the University of the West of England, with a connection to the JANET network. We have four streaming-media servers running the Helix product, two web servers running Resin and Apache, an Oracle database server and a network file server.
Our tips on CM
ARKive is a hugely complex project carrying a large amount of risk. One such risk is that almost no part of the project uses a purely off-the-shelf product. What we have learnt is that in such a delicate environment, responses to upstream changes need to be so rapid that flexible partners and internal resources are essential. On this project, specifications were largely unavailable owing to the rapid development lifecycle and the fact that we were pushing technical boundaries. We would advocate employing staff who are used to DSDM-style projects.
Developing ARKive’s suite of websites involved four external agencies and several contractors and permanent staff over nine months. The large number of third-party companies complicated development significantly. In an ideal world, the preference would obviously be for either an entirely in-house development or an entirely outsourced one.
As with all systems, there have been changes since initial release. Very few of them have been on the back end of the system, with the majority of changes concerning browser compatibility, usability, and accessibility on the front end. Managing these changes, and the expectations of the non-technical board, is another key area for project success.
The future of ARKive
ARKive is now actively seeking funding to continue the project. Work will continue to complete the species holdings and bring more images and movies to the wider public.
Rob Curtis is web-development manager at Wildscreen. He can be contacted at firstname.lastname@example.org