Friday, January 24, 2014

VertNet starts issue tracking using GitHub

VertNet has announced that it has implemented issue tracking using GitHub. This is a really interesting development, as figuring out how to capture and make use of annotations in biodiversity databases is a problem that's attracting a lot of attention. VertNet has decided to use GitHub to handle annotations, but in a way that hides most of GitHub from users (developers tend to love things like GitHub; regular folks, not so much. See The Two Cultures of Computing).

The VertNet blog has a detailed walkthrough of how it works. I've made some comments on that blog, but I'll repeat them here.

At the moment the VertNet interface doesn't show any evidence of issue tracking (there's a link to add an issue, but you can't see whether any issues exist). For example, visiting the record for CUMV Amphibian 1766, I don't see any evidence on that page that there is an issue for this record (there is one, see https://github.com/cumv-vertnet/cumv-amph/issues/1). I think it's important that people see evidence of interaction (that way you might encourage others to participate). This would also enable people to gauge how active collection managers are in resolving issues ("gee, they fixed this problem in a couple of days, cool").

Likewise, it would be nice to have a collection-level summary in the portal. For example, looking at CUMV Amphibian 1766, I'm not able to click through to a page for CUMV Amphibians to see how many issues there are for the whole collection, and how fast they are being closed. (Why can't I do this at the moment? There needs to be a way for me to get to the collection from a record.)
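
To make the idea of a collection-level summary concrete, here is a rough Python sketch of how the numbers could be pulled from GitHub's public REST API. This is not how VertNet does (or plans to do) anything. The function names fetch_issues and summarize are mine, and I'm assuming each collection has its own repository, as the cumv-vertnet/cumv-amph URL above suggests. It only reads the first page of results and makes unauthenticated requests, which GitHub rate-limits.

```python
import json
from datetime import datetime
from statistics import median
from urllib.request import urlopen

# GitHub REST API endpoint listing issues for one repository (open and closed).
API = "https://api.github.com/repos/{owner}/{repo}/issues?state=all&per_page=100"


def fetch_issues(owner, repo):
    """Fetch the first page (up to 100) of issues for a repository.

    Hypothetical helper: no paging, no authentication.
    """
    with urlopen(API.format(owner=owner, repo=repo)) as response:
        return json.load(response)


def parse_time(stamp):
    """Parse a GitHub timestamp such as '2014-01-24T10:00:00Z'."""
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")


def summarize(issues):
    """Count open and closed issues and compute the median days to close."""
    # The issues endpoint also returns pull requests; skip them.
    issues = [i for i in issues if "pull_request" not in i]
    open_issues = [i for i in issues if i["state"] == "open"]
    closed = [i for i in issues if i["state"] == "closed" and i.get("closed_at")]
    days = [
        (parse_time(i["closed_at"]) - parse_time(i["created_at"])).total_seconds() / 86400
        for i in closed
    ]
    return {
        "total": len(issues),
        "open": len(open_issues),
        "closed": len(closed),
        "median_days_to_close": median(days) if days else None,
    }
```

Something like summarize(fetch_issues("cumv-vertnet", "cumv-amph")) would then give the figures a collection page could display: how many issues exist, how many are still open, and roughly how quickly they get closed.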

I think the approach VertNet are using has a lot of potential, although it sidesteps some of the most compelling features of GitHub, namely forking and merging code and other documents. I can't, for example, take a record, edit it, and have those edits merged into the data. It's still a fairly passive "hey, there's a problem here", which means that the burden is still on curators to fix the issue. This raises the whole question of what to do with user-supplied edits. There's a nice paper regarding validating user input into Freebase that is relevant here, see "Trust, but Verify: Predicting Contribution Quality for Knowledge Base Construction and Curation" (http://dx.doi.org/10.1145/2556195.2556227 [not live yet], PDF here).