The 2008 OpenRepositories conference was a lot of fun, but extremely tiring. Many of the days lasted 13 hours, and one day went to 15. As one of my fellow attendees stated, someone was trying to make sure we experienced every last bit of “conferencely goodness”. This, combined with the fact that I had some non-conference work to do, meant that I had to consume a lot more caffeine than normal.

Despite a few minor irritations, the conference went very smoothly. The organizers relied on a lot of technology, and for the most part it worked as expected. Although the schedule was packed full, everything ran on time. Kudos to Les Carr and the team at Southampton!

A major achievement of this conference was the development of a conference repository. Most of the papers/presentations were collected beforehand, and it was very useful to refer to them during talks. Unfortunately, the content from the DSpace and Fedora user group meetings is not available yet; I hope it will eventually appear, because there were some great talks in both.

There was an increased focus on scientific data in repositories this year, starting with a keynote by Peter Murray-Rust that described the outlook for repositories from the viewpoint of chemistry researchers. Some of the points he brought up were echoed in many other talks throughout the conference:

  • Get into the researchers’ authoring stream as early as possible. One method that seems to be making headway is to propose the repository as a (dark) backup for the scientist’s local machine. This puts the content into the repository immediately, and there is little effort required of the scientist when the time comes to make it public.
  • Repositories must focus on text mining and other automated methods for metadata generation because “scientists hate metadata”.
  • PDF can cause serious problems for automatic processing methods. It is often better to locate the document that was used to produce the PDF, and process that instead.

I was surprised to see that code for new repository functionality is coming out at an astounding rate, to the point where I no longer have time to keep up with everything. Here are just a few of the announcements I remember:

  • The SWORD project has released a client for producing SWORD ingest packages, and server-side tools to ingest these packages into four major repository platforms.
  • The NSDL is starting to release most of the tools they have developed for their Fedora system, including the OnRamp/OnFire enhancements to Fez. It looks like OnRamp/OnFire will be rolled into the main Fez distribution, while other tools are available from the NCore SourceForge space.
  • Several add-ons to DSpace 1.5 have been released by Graham Triggs and Tim Donohue. (See the HOWTO Category on the DSpace wiki.)
  • Sneep, the Social Networking Extensions for EPrints, should be available in May.
  • Within the Fedora community, there are quite a few new projects releasing code. Muradora and eSciDoc are the most interesting to me, but I’m sure there are others I missed.

Another major development is Microsoft’s entry into the repository arena. When this was announced shortly before the conference, I was extremely skeptical. Within libraries and universities, there has been a backlash against vendors, to the point where most people working on a “serious” repository won’t touch a product unless they can see the source. Even if Microsoft decides to open-source the repository itself, it depends on closed-source pieces, including the .NET framework and SQL Server. Due to these factors, I’m unlikely to even play with the new repository software, but I don’t wish Microsoft ill. I was incredibly surprised at the amount of negativity directed their way by some of the other conference attendees. I can understand frustration from the Fedora community, because the new system mirrors many features of Fedora, but some people seemed offended by the simple fact that Microsoft sent representatives to the conference.

What does Microsoft’s move mean for the future? I have no idea. Microsoft has a hit-and-miss record when entering new markets, and only time will tell if they manage to build something that customers want. Regardless, it means this whole idea of repositories is really starting to catch on, because the big kids are paying attention.

The conference ended with the official European launch of the OAI-ORE standard (called simply “ORE” by most people). So far, the greatest success of ORE is in getting attention from influential people. While there are a few demonstration systems, it is unclear just how useful the standard will be. In some ways, ORE is a dumbed-down version of METS. But the simplicity of the basic standard (assuming there is eventually a simple set of documentation) will appeal to many, and the use of arbitrary graphs rather than hierarchical structure means that ORE can handle a few types of information that are painful to represent in METS. However, note that ORE is an abstract model, while METS is a concrete data format, so it is theoretically possible to represent ORE information in METS format, though this may not be useful.
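To make the graph-versus-hierarchy point concrete, here is a minimal sketch in Python. It is not an ORE or METS implementation: the triples are plain tuples, the "ex:" identifiers and the "ex:derivedFrom" relation are made up for illustration, and only the "ore:aggregates" label is borrowed from the ORE vocabulary. It shows one resource belonging to two aggregations at once, which a graph states directly but a strict one-parent tree cannot hold without extra pointers or duplication.

```python
from collections import defaultdict

# Illustrative triples (subject, predicate, object). The "ex:" names are
# hypothetical; "ore:aggregates" is used only as a label here.
triples = [
    ("ex:article-aggregation", "ore:aggregates", "ex:paper.pdf"),
    ("ex:article-aggregation", "ore:aggregates", "ex:dataset.csv"),
    ("ex:data-aggregation", "ore:aggregates", "ex:dataset.csv"),
    ("ex:paper.pdf", "ex:derivedFrom", "ex:dataset.csv"),
]


def parents_of(triples, predicate="ore:aggregates"):
    """Map each aggregated resource to the set of aggregations containing it."""
    parents = defaultdict(set)
    for subject, pred, obj in triples:
        if pred == predicate:
            parents[obj].add(subject)
    return dict(parents)


def fits_strict_tree(triples):
    """True only if every aggregated resource has exactly one parent."""
    return all(len(p) == 1 for p in parents_of(triples).values())


# Resources that sit in more than one aggregation.
shared = sorted(r for r, p in parents_of(triples).items() if len(p) > 1)

print(shared)                     # resources with multiple parents
print(fits_strict_tree(triples))  # whether a one-parent tree would suffice
```

A hierarchical format can still record this kind of sharing, but only by adding cross-references alongside the tree, which is the "painful" part.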

A few other notables:

  • All three of the major repository systems are coming out with new versions. DSpace 1.5 and EPrints 3.1 are available now, and Fedora 3.0 is in testing. I don’t know how much has changed in EPrints, but the DSpace and Fedora releases both represent major upgrades.
  • A large number of groups are building new systems for managing user accounts, most based on OpenID or Shibboleth.
  • Many are starting to view non-library systems like Flickr and Facebook as part of the repository ecosystem.
  • Quite a few Australian projects are using the Australian METS profile. I need to take a closer look at that.
  • I thought that I was working on a type of project that only a few other people in the world cared about: linking publications and the data used to create them. All of a sudden, everyone I meet is working on this exact problem!

One last result of the conference: I made a decision. Due to various events in my life, this blog has been on the back burner for a long time. No more. There are some pressing things that need to be said about repositories in general, and the Fedora vs. DSpace question in particular. During the past year, circumstances caused me to switch from the Fedora world to the DSpace world. Predictably, many of my conversations at the conference revolved around this switch. Now that my mind has started to truly process the problem, it is time to lay out the details. Fedora vs. DSpace. No holds barred. Coming soon.