Friday, June 01, 2012

Can you trust EOL?

There's a recent thread on the Encyclopedia of Life concerning erroneous images for the crab Leptograpsus. This is a crab I used to chase around rocks on stormy west-coast beaches near Auckland, so I was a little surprised to see the EOL page for Leptograpsus looks like this:

[Image: screenshot of the EOL page for Leptograpsus]

The name and classification are those of the crab, but the image is of a fish (Lethrinus variegatus). Perhaps at some point in aggregating the images the two taxa, which share the abbreviated name "L. variegatus", got mixed up.

Now, errors like this are bound to happen in a project the size of EOL, and EOL has some pretty active efforts to clean up errors (e.g., the Homonym Hunters). But what bothers me about this example is the prominent label Trusted that appears below the image. If I look at all the images for Leptograpsus on EOL, I see "trusted" images for fish. All images of the crab (i.e., the real Leptograpsus) are labelled "unreviewed" and implicitly "untrusted":

[Image: screenshot of all the images for Leptograpsus on EOL]

If you are going to claim something is "trusted" you need to be very careful. The images of the fish may well come from a trusted source (FishBase), and FishBase's assertion that the image is of Lethrinus variegatus may well be "trusted", but I certainly can't trust the assertion made by EOL that this image depicts a crab.

In this example the error is easy to spot (if you know that crabs and fish are different), but what if the error were more subtle? Or what if you are using EOL's API and explicitly asking for only content you can trust? Then you get the fish images (see https://gist.github.com/2850321).

If I can't trust "trusted" then EOL has a problem. One way forward is to unpack the notion of "trust" and make sure the user knows what "trusted" means. In this case there are at least two assertions being made:
  1. This image is of a fish (made by FishBase)
  2. This image is of a crab (made by EOL)

EOL needs to make clear what assertions are being made, and which ones it is stating can be "trusted". Ideally it also needs to move away from blanket assertions of "trusted" versus "not trusted", because that's far too coarse (just because FishBase knows about fish doesn't mean I'm going to trust equally that every image it contains has been correctly identified). Trust is something that is conferred by users and acquired over time, not something to be simply asserted.
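
To make the idea of unpacking "trusted" a bit more concrete, here is a rough sketch in Python of what tracking trust per assertion, rather than per image, might look like. This is not EOL's data model or API. The class names, the fields, the image identifier, and the `trusted_for` function are all hypothetical, and I'm assuming for illustration that the link between the image and the crab page has never been reviewed by anyone.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Assertion:
    """One claim about what an image depicts (hypothetical structure)."""
    depicts: str    # taxon the image is claimed to show
    made_by: str    # who is making the claim
    reviewed: bool  # has this particular claim been checked?


@dataclass
class ImageRecord:
    image_id: str
    assertions: List[Assertion]


def trusted_for(record: ImageRecord, taxon: str) -> bool:
    """True only if a reviewed assertion says the image depicts `taxon`."""
    return any(a.reviewed and a.depicts == taxon for a in record.assertions)


# The fish image from the example above, with its two assertions kept apart.
# The identifier and the `reviewed` values are assumptions for illustration.
fish_image = ImageRecord(
    image_id="example-image-1",
    assertions=[
        Assertion(depicts="Lethrinus variegatus", made_by="FishBase", reviewed=True),
        Assertion(depicts="Leptograpsus", made_by="EOL", reviewed=False),
    ],
)

print(trusted_for(fish_image, "Lethrinus variegatus"))
print(trusted_for(fish_image, "Leptograpsus"))
```

With the assertions separated like this, a request for trusted images of Leptograpsus would not return the fish, because the only reviewed claim attached to the image is FishBase's. A single "trusted" flag on the image cannot make that distinction.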