Comparative evaluation of enrichments
This page gathers material and results for an experiment in the context of the EuropeanaTech task force on enrichment and evaluation.
The focus is mainly on the method: to define and encourage a unified approach to evaluation in our community, and perhaps also to unify APIs (since the same script should be able to run the different enrichment services). We do not want to limit the scope to a certain country or even to a certain type of enrichment tool (it could be a concept or a person detection service).
Current participants: Daniel, Dimitris, Hugo, Nuno, Vladimir, Aitor, Rainer
Experiment 1: Evaluate the results of different enrichment tools on EDM data
Objective
- Make different enrichment tools work with EDM data.
- Understand the real challenges of harmonizing the presentation of the results.
- Understand the real challenges of evaluating the results.
- Gather relevant cases to expand the gold standard.
Steps
- Download the EDM data (see below);
- Run your own enrichment service on top of these data;
- Generate a CSV following the pattern given below to present the results, and send it to the mailing list;
- Assess the results (not for now; this will be done in a second step).
Datasets
- An extraction from TEL of 17300 EDM records covering 19 countries.
- The following zip file contains a single XML file with all records: file:dataset.zip in all.zip
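A minimal sketch of how the records could be read before running an enrichment service, assuming an RDF/XML file in which the descriptive properties sit directly on the edm:ProvidedCHO elements (in a full EDM export they may sit on the ore:Proxy instead); the file name is a placeholder:

```python
# Minimal sketch: iterate over edm:ProvidedCHO resources in the extracted
# RDF/XML file and collect the literals that are typical enrichment sources.
# The exact file layout is an assumption; adjust paths/properties as needed.
from lxml import etree

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "edm": "http://www.europeana.eu/schemas/edm/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

def iter_records(path="dataset.xml"):
    """Yield (record URI, property, literal) tuples to feed an enrichment service."""
    for _, cho in etree.iterparse(path, tag="{%s}ProvidedCHO" % NS["edm"]):
        uri = cho.get("{%s}about" % NS["rdf"])
        for prop in ("dc:subject", "dcterms:spatial", "dc:creator"):
            prefix, local = prop.split(":")
            for el in cho.findall("{%s}%s" % (NS[prefix], local)):
                if el.text:
                    yield uri, prop, el.text.strip()
        cho.clear()  # free memory while streaming the large file

if __name__ == "__main__":
    for uri, prop, text in iter_records():
        print(uri, prop, text, sep="\t")
```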
Results (produced enrichments)
- Each participant will generate a CSV file with the results (we opted for CSV as it is the easiest format to generate and process), following this structure:
A line for each enrichment (i.e. link): <ProvidedCHO>;<property>;<target>;<confidence>;<source_text>, where <ProvidedCHO> is the URI of the enriched record, <property> the property being enriched, <target> the URI of the linked resource, <confidence> the confidence score of the enrichment, and <source_text> the original text that was matched.
Example: http://data.theeuropeanlibrary.org/BibliographicResource/2000085482942;dcterms:spatial;http://dbpedia.org/resource/Prague;0.9;Praha
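A minimal sketch of how a participant could serialize enrichment results into this format (the example row and output file name are placeholders):

```python
import csv

# Hypothetical enrichment results: (ProvidedCHO, property, target, confidence, source_text)
# as produced by whatever enrichment service a participant runs.
enrichments = [
    ("http://data.theeuropeanlibrary.org/BibliographicResource/2000085482942",
     "dcterms:spatial", "http://dbpedia.org/resource/Prague", 0.9, "Praha"),
]

with open("enrich.example.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f, delimiter=";")
    for cho, prop, target, confidence, source_text in enrichments:
        writer.writerow([cho, prop, target, confidence, source_text])
```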
- Results collected from the participants
Participant | File(s) | Notes
Europeana | file:enrich.europeana.csv in all.zip |
TEL | file:enrich.tel.csv in all.zip |
LoCloud | file:loCloud.zip in all.zip | Two results: one using the English background link service and another using the vocabulary match service. There is no source_text in any record; is it possible to fix this?
Pelagios (Simon Rainer) | file:pelagios-wikidata.csv.zip in all.zip; coreferenced to GeoNames and DBpedia: file:enrich.pelagios.coref.csv in all.zip | One possible way to derive such coreferences is sketched below the table.
SILK (Daniel) | file:dct_spatial_dbpedia.csv in all.zip | For now, only dct:spatial enrichments with DBpedia, produced using Silk.
Ontotext | file:ontotext.tar.gz in all.zip; coreferenced to DBpedia: file:enrich.ontotext.v1.coref.zip in all.zip, file:enrich.ontotext.v2.coref.zip in all.zip | Two versions, both using Ontotext's concept extractor. Both return rich results for English (general concepts, not limited by type) but are limited to Person and Place for the other languages. The versions differ in what is considered "English" or "other language": one uses the record language, the other the language tag of the literals.
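The Pelagios and Ontotext results above were coreferenced to GeoNames/DBpedia. How this was actually done is not documented here; as a minimal sketch of one possible approach, a Wikidata target can be mapped to a DBpedia URI through its English Wikipedia sitelink (the endpoint usage and the example QID are illustrative only):

```python
import requests

WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"

def wikidata_to_dbpedia(qid):
    """Derive a DBpedia URI from a Wikidata item via its English Wikipedia sitelink.

    This is only one possible coreferencing strategy, shown for illustration.
    """
    query = """
    SELECT ?article WHERE {
      ?article schema:about wd:%s ;
               schema:isPartOf <https://en.wikipedia.org/> .
    }""" % qid
    r = requests.get(WIKIDATA_SPARQL, params={"query": query, "format": "json"})
    r.raise_for_status()
    bindings = r.json()["results"]["bindings"]
    if not bindings:
        return None
    title = bindings[0]["article"]["value"].rsplit("/", 1)[-1]
    return "http://dbpedia.org/resource/" + title

print(wikidata_to_dbpedia("Q1085"))  # expected: http://dbpedia.org/resource/Prague
```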
- CSV file with all the results: file:enrich.all_05062015.zip in all.zip
- Simple analysis of the results: https://docs.google.com/spreadsheets/d/1xNmeMiP_Y5e5IqRsx4WAIaPF84ns9eX0EwIaFCq6Ajg
Evaluation
Agreements between results:
- A total of 22 agreement combinations were identified (a sketch of how agreement between result files can be computed is given after this list).
- An overview of the agreement computed for the results is available at: https://docs.google.com/spreadsheets/d/1xNmeMiP_Y5e5IqRsx4WAIaPF84ns9eX0EwIaFCq6Ajg/edit#gid=1109289169
- The corresponding clusters can be downloaded from here: file:agreement_clusters.zip in all.zip
- Files starting with "only" contain the results for which no agreement was found.
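A minimal sketch of how such agreement can be computed, assuming each result file follows the CSV format above (the participant names and file names below are placeholders): identical (record, property, target) triples are clustered, and the set of participants producing each triple defines the agreement combination.

```python
import csv
from collections import defaultdict

def agreement_clusters(files):
    """Map each (ProvidedCHO, property, target) to the set of participants producing it."""
    clusters = defaultdict(set)
    for participant, path in files.items():
        with open(path, newline="", encoding="utf-8") as f:
            for row in csv.reader(f, delimiter=";"):
                cho, prop, target = row[0], row[1], row[2]
                clusters[(cho, prop, target)].add(participant)
    return clusters

# Hypothetical file names; the real result files are listed in the table above.
files = {
    "Europeana": "enrich.europeana.csv",
    "TEL": "enrich.tel.csv",
    "Pelagios": "enrich.pelagios.coref.csv",
}

# Count how many enrichments fall into each agreement combination.
combos = defaultdict(int)
for participants in agreement_clusters(files).values():
    combos[frozenset(participants)] += 1
for combo, count in sorted(combos.items(), key=lambda kv: -kv[1]):
    print(sorted(combo), count)
```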
- Spreadsheet: https://docs.google.com/spreadsheets/d/1gKos1qDPlH1LZc_QLavjVdpfVKR3JMdnY6PPNxrmWao/edit#gid=938663830
- Sampling rate: a maximum of 100 enrichments per agreement combination (a sampling sketch is given after the column list below)
- Columns of the spreadsheet:
- Link to the original resource
- Title, or description if no title is available
- Property and corresponding value (i.e. the source of the enrichment)
- Matched Term (the term that was linked in the original source, typically a substring of the value)
- Target Resource (the link, i.e. URI, to the external vocabulary)
- Annotation (to be filled in with the assessment according to the guidelines)
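A minimal sketch of the sampling step, under the assumption that each agreement cluster is a CSV file in the format above (the file name is a placeholder):

```python
import csv
import random

random.seed(0)  # make the sample reproducible

def sample_cluster(path, max_rows=100):
    """Draw at most max_rows enrichments from one agreement-cluster file."""
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f, delimiter=";"))
    return rows if len(rows) <= max_rows else random.sample(rows, max_rows)

# Hypothetical cluster file name; the real cluster files are in agreement_clusters.zip.
for row in sample_cluster("cluster_europeana_tel.csv"):
    print(row)
```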
- Annotated corpus (made out of the samples): file:annotatedsamples.zip in all.zip
- Spreadsheet: https://docs.google.com/spreadsheets/d/10fV31D0QQ9flRAt3pgJNh8wVHq6n0ZRgc83-3QHb1OQ
- Inter-annotator corpus + annotations (ready for download):
- Fleiss Kappa evaluation: https://docs.google.com/spreadsheets/d/10fV31D0QQ9flRAt3pgJNh8wVHq6n0ZRgc83-3QHb1OQ/edit#gid=1905159313
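For reference, Fleiss' kappa can be computed from per-item annotation counts as in this minimal sketch (the toy input is illustrative; the actual computation lives in the spreadsheet above):

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a list of per-item category counts.

    counts[i][j] = number of annotators who assigned category j to item i;
    every item must be rated by the same number of annotators.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])

    # Observed agreement: per-item pairwise agreement, averaged over items.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items

    # Expected agreement from the overall category proportions.
    totals = [sum(row[j] for row in counts) for j in range(n_categories)]
    p_j = [t / (n_items * n_raters) for t in totals]
    p_e = sum(p * p for p in p_j)

    return (p_bar - p_e) / (1 - p_e)

# Toy example: 3 items, 3 annotators, categories (correct, incorrect).
print(fleiss_kappa([[3, 0], [2, 1], [0, 3]]))
```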
Ontotext Remarks
- At the end we also print the "type" (Person, Organization, Place, or Thing, which is generic).
- Our pipeline is tuned for English only, so some of the results on non-English records are nonsensical. We may send two variants: the full results and EN-records only.