This page gathers material and results for an experiment in the context of the EuropeanaTech task force on enrichment and evaluation.

The focus is mainly on the method: we want to define and encourage a unified approach to evaluation in our community, and perhaps also a way to unify APIs (as the same script should be able to run the different enrichment services). We do not want to limit the scope to a certain country, nor even to a certain type of enrichment tool (it could be a concept or person detection service).

Current participants: Daniel, Dimitris, Hugo, Nuno, Vladimir, Aitor, Rainer


Experiment 1: Evaluate the results of different enrichment tools on EDM data

Objective

Steps

  1. Download the EDM data (see below);
  2. Run your own enrichment service on top of these data;
  3. Generate a CSV following the patterns given below to present the results and send to the mailing list;
  4. Assess the results (not for now; this will be done in a second step).
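
Step 3 could be sketched as follows (a minimal example; the output file name and the enrichment tuples are hypothetical, and the field order follows the pattern defined under "Results" below):

```python
import csv

# Hypothetical enrichments produced in step 2:
# (ProvidedCHO URI, property, target URI, confidence, source text)
enrichments = [
    ("http://data.theeuropeanlibrary.org/BibliographicResource/2000085482942",
     "dcterms:spatial", "http://dbpedia.org/resource/Prague", 0.9, "Praha"),
]

with open("enrich.example.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f, delimiter=";")
    for cho, prop, target, confidence, source_text in enrichments:
        # An empty confidence field means "unknown"
        conf = "" if confidence is None else confidence
        writer.writerow([cho, prop, target, conf, source_text])
```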

Datasets

Vladimir: this set is riddled with scientific articles (math, biology, livestock care, etc.). Example record: 1000095952730, about varieties of wheat called "Pliska, Sreca, Vila, Holly". When an author uses well-known proper names to name new things, the concept extractor has no way to know this and will return the original meaning (places). These records come from specific scientific fields (not really cultural), with specialised terminology.

Results (produced enrichments)

A line for each enrichment (i.e. link):

<ProvidedCHO>;<property>;<target>;<confidence>;<source_text>

Meaning:

  • <ProvidedCHO>: URI of the edm:ProvidedCHO;
  • <property>: Qualified name of the property (e.g. dcterms:spatial)
  • <target>: URI of the entity (e.g. DBPedia, Geonames);
  • <confidence>: floating point value from 0 (least certain) to 1 (most certain), or empty if the confidence is unknown;
  • <source_text>: Literal where the entity was identified.

Example:

http://data.theeuropeanlibrary.org/BibliographicResource/2000085482942;dcterms:spatial;http://dbpedia.org/resource/Prague;0.9;Praha

Europeana: file:enrich.europeana.csv in all.zip

TEL: file:enrich.tel.csv in all.zip

LoCloud: file:loCloud.zip in all.zip
Two results: one using the English background link service and another using the vocabulary match service. There is no source_text in any record; is it possible to fix this?

Pelagios (Simon Rainer): file:pelagios-wikidata.csv.zip in all.zip
Coreferenced to Geonames and DBPedia: file:enrich.pelagios.coref.csv in all.zip

SILK (Daniel): file:dct_spatial_dbpedia.csv in all.zip
For now this includes only dct:spatial enrichments with DBpedia, produced using Silk.

Ontotext: file:ontotext.tar.gz in all.zip
Coreferenced to DBPedia:
file:enrich.ontotext.v1.coref.zip in all.zip
file:enrich.ontotext.v2.coref.zip in all.zip
Two versions, both using Ontotext's concept extractor. Both versions return rich results for English (general concepts, not limited by type), but are limited to Person and Place for other languages. The difference between the versions comes from what we consider "English" or "other language": either the record language or the language tag of the literals.

Evaluation

Agreements between results:

Gold Standard (sampled from the agreements):
Inter-Annotator agreement:
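
One possible way to compute agreement between two result files (a sketch only, not the task force's chosen metric): treat each enrichment as the triple (ProvidedCHO, property, target), ignore confidence and source text, and take the Jaccard overlap of the two triple sets. The example triples below are hypothetical.

```python
def agreement(results_a, results_b):
    """Jaccard overlap between two sets of (cho, property, target) triples."""
    a, b = set(results_a), set(results_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Hypothetical outputs of two enrichment tools:
tool_a = {("cho1", "dcterms:spatial", "http://dbpedia.org/resource/Prague"),
          ("cho2", "dcterms:spatial", "http://dbpedia.org/resource/Vienna")}
tool_b = {("cho1", "dcterms:spatial", "http://dbpedia.org/resource/Prague")}
```

Sampling the gold standard from the agreements would then mean drawing records from the intersection `set(tool_a) & set(tool_b)`.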

Ontotext Remarks