Mon 3 Mar 2008
Stop picking on the EOL!
Posted by ryan under Repositories, Technology
Last week, the Encyclopedia of Life (EOL) released the first public version of their system. Although they had a huge amount of traffic, most of the critical reaction was neutral or negative. Notably, Rod Page posted a review which was categorized under the heading “suck”. And the comments from Slashdot were largely negative as well.
In general, it seems that no one out there understands what a difficult task the EOL is taking on. Yes, they have a lot of money and a lot of press/hype. Fine. Hold them to high standards. But don’t hold them to an unrealistic timeline.
I’ve been involved in various digital library projects over the last six years, and I have never seen a serious project like this come together in less than two years. It takes time to find the right people for the job, time for those people to agree on the proper technical architecture, time to agree on policies, time to actually implement the system, time for testing, etc. This isn’t something that can be sped up by adding an extra $10 million or by using a “silver bullet” technology (*cough* Rails *cough*). If you want a high-quality system that will scale to large amounts of data and large amounts of use, while ensuring the longevity of the data, it’s gonna take two years. Or more.
You might argue that these technologies have been around for a while, and should be simple to put together into a new system. Unfortunately, as each new digital repository is built, there is a standard set of questions to be answered. A small sample of these questions:
- How do I obtain content?
- How do I massage content to fit my internal format?
- What form of identifiers should I use?
- How much metadata should I capture?
- Can I get metadata automatically?
- Can I build a system that encourages users to augment/correct the metadata?
- Should I build the system from scratch, or build off an existing repository framework that doesn’t quite fit my needs?
- What kinds of search/discovery make sense for this content, and what is the best way to implement them?
- Am I violating any copyright or licensing restrictions by redistributing the content?
- How can I present the content in a user-friendly way, while still accurately reflecting any copyright or licensing restrictions?
Answering each of these questions takes a certain amount of thinking, discussion, and implementation, all repeated until the solution is satisfactory. (If the answers to these questions are already known before the project starts, that means the project is just duplicating the functionality of another system, and there is no reason to build it in the first place.) Did I mention that this process takes about two years?
I’ve met a few people involved with the EOL. They are all smart people with good intentions, and a willingness to work with the biology community to build something truly worthwhile. They were forced to release an alpha version of their product in less than a year, primarily due to the schedule of the TED conferences. Yes, the current system is not overly impressive. I’m not worried about that. I’m much more concerned about where they go from here. Check the EOL site at this time next year, and then tell me what you think.
March 6th, 2008 at 5:37 am
Ryan, I agree that projects of this magnitude take time. I guess I have two problems with the way things are going. First is the extraordinary amount of hype (some of it unintentionally funny). This creates wildly unrealistic expectations, coupled with a release schedule that is driven by concerns that are ultimately irrelevant (does TED really matter that much? I suspect not). Ironically, so much buzz was generated that EOL fell over on the day of launch. The buzz also generates political fights about credit, hence the logo-fest on the EOL pages.
Secondly, the first release is almost exactly the wrong vision for EOL — static web pages littered with logos, capturing a tiny fraction of what we actually know. I think the notion of “a web page per species” is actually the least useful part of the whole project. EOL should be the place where enormous amounts of data are synthesised to yield new knowledge. Much of this is relatively easy to do — text mining article titles alone would generate a database of ecological associations, based on co-occurrence of species names.
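As a rough illustration of the co-occurrence idea, here is a minimal sketch, not anything EOL has built. It assumes a list of known species names is already available and finds them by plain case-insensitive substring matching; the function name `cooccurrences` and the sample titles are invented for this example.

```python
from collections import Counter
from itertools import combinations


def cooccurrences(titles, species_names):
    """Count how often each pair of species names appears in the same title."""
    counts = Counter()
    for title in titles:
        lowered = title.lower()
        # Assumption: a name "occurs" if it appears as a substring of the title.
        found = sorted({name for name in species_names if name.lower() in lowered})
        # Every unordered pair of names found in one title counts as one association.
        for pair in combinations(found, 2):
            counts[pair] += 1
    return counts


# Hypothetical article titles, made up purely for illustration.
titles = [
    "Predation of Acyrthosiphon pisum by Coccinella septempunctata",
    "Coccinella septempunctata feeding rates on Acyrthosiphon pisum",
    "Host range of Acyrthosiphon pisum on Medicago sativa",
]
species = ["Acyrthosiphon pisum", "Coccinella septempunctata", "Medicago sativa"]

pairs = cooccurrences(titles, species)
for (a, b), n in pairs.most_common():
    print(f"{a} <-> {b}: {n}")
```

A real pipeline would need proper taxonomic name recognition (synonyms, abbreviations such as "A. pisum", misspellings), which simple substring matching does not handle.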
I’m sure much of this will be worked out in time, but I suspect there will have to be a rethink of what the goals are and how to achieve them. The first release was never going to be great, but I would have liked some indication that EOL had a decent vision of its future.
March 7th, 2008 at 5:23 pm
I couldn’t agree more. Though I also think that much of the hype is self-inflicted, and with it the expectations that this would basically perform like a red-hot start-up company with lots of investor capital behind it.
I have little doubt that all the technical issues, and some of them are formidable, will be surmounted by the EOL team with impressive enthusiasm, skill, and energy. What really worries me, however, is how the EOL is going to get the content. How will they persuade the zillion citizen scientists out there to update or add content to the EOL species pages instead of to the Wikipedia species pages, which is what they are doing in large numbers already? And if it doesn’t matter if they just continue doing that on Wikipedia, then just exactly what will EOL add in value on top of the Wikipedia species pages? Fancier layout? Is that where we repeat the word mash-up 10x in unison?
Maybe if mashing up content from various content providers is really what EOL will be all about, they should just fully focus on that and for now resign from the idea of generating content?
Most importantly though, why do I need to learn from Patrick about this post?
March 8th, 2008 at 10:27 am
Rod, points well taken, especially about the logos. Both Wikipedia and Google News reference external sites without obtrusive logos. Hopefully, the EOL will be able to follow their lead in convincing the community that a link is more important than a logo.
Hilmar, the difference between the EOL and Wikipedia is all about structure. Even with Wikipedia templates, the added (biology-oriented) structure of the EOL can make it much more useful. This is why we will always have a hierarchy of systems. The general-purpose Wikipedia can lead users to biology-oriented EOL, which can lead users to the more detailed organism-oriented and discipline-oriented repositories.