Last week, the Encyclopedia of Life (EOL) released the first public version of their system. Although the site drew a huge amount of traffic, most of the critical reaction was neutral or negative. Notably, Rod Page posted a review that was categorized under the heading “suck”, and the comments on Slashdot were largely negative as well.

In general, it seems that no one out there understands what a difficult task the EOL is taking on. Yes, they have a lot of money and a lot of press/hype. Fine. Hold them to high standards. But don’t hold them to an unrealistic timeline.

I’ve been involved in various digital library projects over the last six years, and I have never seen a serious project like this come together in less than two years. It takes time to find the right people for the job, time for those people to agree on the proper technical architecture, time to agree on policies, time to actually implement the system, time for testing, etc. This isn’t something that can be sped up by adding an extra $10 million or by using a “silver bullet” technology (*cough* Rails *cough*). If you want a high-quality system that will scale to large amounts of data and heavy use, while ensuring the longevity of the data, it’s gonna take two years. Or more.

You might argue that the underlying technologies have been around for a while, and should be simple to put together into a new system. Unfortunately, every new digital repository has to work through the same standard set of questions for itself. A small sample of these questions:

  • How do I obtain content?
  • How do I massage content to fit my internal format?
  • What form of identifiers should I use?
  • How much metadata should I capture?
  • Can I get metadata automatically?
  • Can I build a system that encourages users to augment/correct the metadata?
  • Should I build the system from scratch, or build off an existing repository framework that doesn’t quite fit my needs?
  • What kinds of search/discovery make sense for this content, and what is the best way to implement them?
  • Am I violating any copyright or licensing restrictions by redistributing the content?
  • How can I present the content in a user-friendly way, while still accurately reflecting any copyright or licensing restrictions?

Answering each of these questions takes a certain amount of thinking, discussion, and implementation, all repeated until the solution is satisfactory. (If the answers to these questions are already known before the project starts, that means the project is just duplicating the functionality of another system, and there is no reason to build it in the first place.) Did I mention that this process takes about two years?

I’ve met a few people involved with the EOL. They are all smart people with good intentions and a willingness to work with the biology community to build something truly worthwhile. They were forced to release an alpha version of their product in less than a year, primarily due to the schedule of the TED conferences. Yes, the current system is not overly impressive. I’m not worried about that. I’m much more concerned about where they go from here. Check the EOL site at this time next year, and then tell me what you think.