What is your provenance?

Gavin Bell, Nature Publishing Group, me@gavinbell.com

Overview
Provenance [wikipedia2007] is an old-fashioned expression, widely used in the world of antiques. Every item for sale has a provenance: the history of the artefact, showing its origin and how it has passed from hand to hand. For the buyer it acts as proof that the item is genuine and not a fake. Each step between the origination of the artefact and the current seller depends on trust between buyer and seller at that transaction. It is a basic trust network.

People have a provenance too: they are known for a certain field or a certain task. In science publishing this includes a person’s citation record, the number of times their articles have been mentioned in other papers. There are obvious parallels with how the web works in terms of publication, but in both cases there is no common identity framework that lets me know that one Simon is the same Simon as another.

This paper looks at the whole internet as a single social network and will show that it is possible to build up a detailed picture of an individual, their friends and what they know, entirely automatically and with a surprising degree of validity.

Online identity is a deeply complex issue and one that presents much opportunity for heated debate and nit-picking of implementations. In this paper I’ll be looking at some issues I see in the path to creating an online provenance for us all on the internet. There are snakes, toads and some gems under virtually every rock in this landscape, so if I blithely trample on your favourite topic, forgive me. Looking to the future is fraught with mishap.

Building blocks
Many of us (conference-attending geeks) have multiple identities online; certainly many people have more than one email address. Whilst some people have multiple online identities, they tend to have a single web presence which they call home. Increasingly these personal pages are held on social software applications like MySpace or Livejournal, rather than being hand-coded. Some of these people prefer a blog on Typepad or Blogger, but there is a tendency to opt for a managed presence via a provider rather than hosting their own site.

They may also have a career-based presence, to separate the pictures of their kids from their papers on chemical bonding. Finally, they may have a separate employer-supplied webpage, usually quite out of date, if supplied at all. There is a blurring between career and personal for many people, but the distinction between family and friends on the one hand and professional interests on the other remains a valid one.

Many users struggle to maintain more than one of these spaces at any one time. Users, especially teenagers, also tend to migrate en masse if one significant person moves, or if the new site is “better”, leaving an out-of-date presence on the old site [boyd2006]. This was most marked in the move from Friendster to MySpace.

The ability to separate identities and to act anonymously is important for free speech [EFF2007], but I’m going to put this to one side, as content generated in this manner does not contribute to provenance. Kim Cameron has a good overview of the wider issues in identity [Cameron2005].