Comparative evaluation of enrichments

This page gathers material and results for an experiment in the context of the EuropeanaTech task force on enrichment and evaluation.

The focus is mainly on the method: to define and encourage a unified approach to evaluation in our community, and perhaps also a step towards unifying APIs (since the same script should be able to run the different enrichment services). We do not want to limit the scope to a certain country, nor even to one type of enrichment tool (it could be a concept or a person detection service).

Current participants: Daniel, Dimitris, Hugo, Nuno, Vladimir, Aitor, Rainer


Experiment 1: Evaluate the results of different enrichment tools on EDM data

Objective

  • Make different enrichment tools work with EDM data.
  • Understand the real challenges of harmonizing the presentation of the results.
  • Understand the real challenges of evaluating the results.
  • Gather relevant cases to expand the gold standard.

Steps

  1. Download the EDM data (see below);
  2. Run your own enrichment service on top of these data;
  3. Generate a CSV following the pattern given below to present the results, and send it to the mailing list (a minimal sketch of steps 2 and 3 follows this list);
  4. Assess the results (not for now; this will be done in a second step).
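
A minimal sketch of steps 2 and 3 in Python, assuming the participant supplies an enrich callable wrapping their own service; the record iterator, the callable and the output file name are illustrative, and the CSV pattern itself is specified under "Results" below.

# Sketch only: `records` yields (cho_uri, literals) pairs and `enrich` wraps a
# participant's own service, returning (property, target_uri, confidence,
# source_text) tuples; both names are assumptions, not part of the task setup.
import csv

def write_results(records, enrich, out_path="enrich.myservice.csv"):
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out, delimiter=";")
        for cho_uri, literals in records:
            for prop, target, confidence, source_text in enrich(cho_uri, literals):
                writer.writerow([
                    cho_uri,
                    prop,                                    # e.g. dcterms:spatial
                    target,                                  # e.g. a DBpedia or GeoNames URI
                    "" if confidence is None else confidence,
                    source_text,                             # literal the entity was found in
                ])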

Datasets

  • An extraction from TEL of 17300 EDM records covering 19 countries.
  • The following zip file contains a single XML file with all records: file:dataset.zip in all.zip (a minimal parsing sketch follows below).
Vladimir: this set is riddled with scientific articles (math, biology, livestock care, etc.). Example record: 1000095952730, about varieties of wheat called "Pliska, Sreca, Vila, Holly". When an author reuses well-known proper names to name new things, the concept extractor has no way to tell and returns the original meaning (places). These records come from specific scientific fields (not really culture) and use specialised terminology.
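
As a possible starting point for running an enrichment service over the dump, the sketch below streams records out of the single XML file. It assumes the standard EDM namespaces and that the descriptive literals (dc:*, dcterms:*) are attached directly to the edm:ProvidedCHO elements; if the dump attaches them to ore:Proxy elements instead, the element paths need adjusting. The file name is illustrative.

# Streaming reader for the single XML dump (file name is an assumption).
# Assumes descriptive literals sit on edm:ProvidedCHO; adjust if the dump
# uses ore:Proxy elements instead.
from lxml import etree

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "edm": "http://www.europeana.eu/schemas/edm/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

def iter_records(path="all.xml"):
    """Yield (cho_uri, {qualified_property: [literal, ...]}) per record."""
    for _, cho in etree.iterparse(path, tag="{%s}ProvidedCHO" % NS["edm"]):
        uri = cho.get("{%s}about" % NS["rdf"])
        literals = {}
        for child in cho:
            qname = etree.QName(child)
            if qname.namespace == NS["dc"]:
                key = "dc:" + qname.localname
            elif qname.namespace == NS["dcterms"]:
                key = "dcterms:" + qname.localname
            else:
                continue
            if child.text and child.text.strip():
                literals.setdefault(key, []).append(child.text.strip())
        yield uri, literals
        cho.clear()  # release memory while streaming the large file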

Results (produced enrichments)

  • Each participant will generate a CSV file with the results (we opted for CSV as it is the easiest format to generate and to process), following this structure:

One line for each enrichment (i.e. each link):

<ProvidedCHO>;<property>;<target>;<confidence>;<source_text>

Meaning:

  • <ProvidedCHO>: URI of the edm:ProvidedCHO;
  • <property>: Qualified name of the property (e.g. dcterms:spatial);
  • <target>: URI of the target entity (e.g. in DBpedia or GeoNames);
  • <confidence>: floating point value from 0 (no confidence) to 1 (full confidence), or empty if the confidence is unknown;
  • <source_text>: Literal in which the entity was identified.

Example:

http://data.theeuropeanlibrary.org/BibliographicResource/2000085482942;dcterms:spatial;http://dbpedia.org/resource/Prague;0.9;Praha
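
A minimal reader for such files, assuming plain ';'-separated fields without quoting (as in the example) and allowing an empty confidence; the function name is illustrative.

# Minimal reader for result files in the format above; assumes plain
# ';'-separated lines without quoting, as in the example.
import csv
from collections import namedtuple

Enrichment = namedtuple("Enrichment", "cho prop target confidence source_text")

def read_enrichments(path):
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f, delimiter=";"):
            if len(row) < 5:
                continue  # skip malformed or truncated lines
            cho, prop, target, confidence, source_text = row[:5]
            yield Enrichment(cho, prop, target,
                             float(confidence) if confidence else None,
                             source_text)

# The sets of (cho, prop, target) triples from two such files can then be
# intersected to compute the agreements used in the evaluation below.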

  • Results collected from the participants:

Europeana: file:enrich.europeana.csv in all.zip

TEL: file:enrich.tel.csv in all.zip

LoCloud: file:loCloud.zip in all.zip
Two results: one using the English background link service and another using the vocabulary match service. There is no source_text in any record; is it possible to fix this?

Pelagios (Simon Rainer): file:pelagios-wikidata.csv.zip in all.zip
Coreferenced to GeoNames and DBpedia: file:enrich.pelagios.coref.csv in all.zip

SILK (Daniel): file:dct_spatial_dbpedia.csv in all.zip
Included for now are only dct:spatial enrichments with DBpedia, using Silk.

Ontotext: file:ontotext.tar.gz in all.zip
Coreferenced to DBpedia:
file:enrich.ontotext.v1.coref.zip in all.zip
file:enrich.ontotext.v2.coref.zip in all.zip
Two versions, both using Ontotext's concept extractor. Both return rich results for English (general concepts, not limited by type), but are limited to Person and Place for other languages. The difference between the two versions is what counts as "English" or "other language": one uses the record-level language, the other the language tag of the individual literals.
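
To make the difference concrete, here is a small illustration in Python (not Ontotext's actual code; the function and argument names are made up): one variant consults the record-level language, the other the xml:lang tag of the literal being processed.

# Illustration only (hypothetical names): the choice of pipeline depends on
# which language value is consulted.
def pipeline_for(record_language, literal_language, use_literal_language):
    # One variant uses the record-level language, the other the xml:lang
    # tag of the individual literal.
    lang = literal_language if use_literal_language else record_language
    if (lang or "").lower().startswith("en"):
        return "full"               # general concepts, not limited by type
    return "person_and_place_only"  # other languages: Person and Place only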

Evaluation

Agreements between results

Gold Standard (sampled from the agreements)

Inter-Annotator agreement



Manual evaluation on samples

Ontotext Remarks

  • At the end we print a "type" (Person, Organization, Place, or the generic Thing).
  • Our pipeline is tuned for English only, so some of the results on non-English records are nonsensical. We may send two variants: full, and EN-records only.