Comparative evaluation of enrichments
This page gathers material and results for an experiment in the context of the EuropeanaTech task force on enrichment and evaluation.
The focus is mainly on the method: to define and encourage a unified approach to evaluation in our community, and perhaps also to unify APIs (since the same script should be able to run the different enrichment services). We do not want to limit the scope to a certain country or even to a certain type of enrichment tool (it could be a concept or a person detection service).
Current participants: Daniel, Dimitris, Hugo, Nuno, Vladimir, Aitor, Rainer
Experiment 1: Evaluate the results of different enrichment tools on EDM data
Objective
- Make different enrichment tools work with EDM data.
- Understand the real challenges of harmonizing the presentation of the results.
- Understand the real challenges of evaluating the results.
- Gather relevant cases to expand the gold standard.
Steps
- Download the EDM data (see below);
- Run your own enrichment service on top of these data;
- Generate a CSV following the pattern given below to present the results, and send it to the mailing list;
- Assess the results (not for now; this will be done in a second step).
Datasets
- An extraction from TEL of 17300 EDM records covering 19 countries.
- The following zip file contains a single XML file with all records: file:dataset.zip in all.zip
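A minimal sketch of how the records could be read before running an enrichment service, assuming an RDF/XML file in which the descriptive properties sit directly on the edm:ProvidedCHO elements (in a full EDM export they may sit on the ore:Proxy instead); the file name is a placeholder:

```python
# Minimal sketch: iterate over edm:ProvidedCHO resources in the extracted
# RDF/XML file and collect the literals that are typical enrichment sources.
# The exact file layout is an assumption; adjust paths/properties as needed.
from lxml import etree

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "edm": "http://www.europeana.eu/schemas/edm/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

def iter_records(path="dataset.xml"):
    """Yield (record URI, property, literal) tuples to feed an enrichment service."""
    for _, cho in etree.iterparse(path, tag="{%s}ProvidedCHO" % NS["edm"]):
        uri = cho.get("{%s}about" % NS["rdf"])
        for prop in ("dc:subject", "dcterms:spatial", "dc:creator"):
            prefix, local = prop.split(":")
            for el in cho.findall("{%s}%s" % (NS[prefix], local)):
                if el.text:
                    yield uri, prop, el.text.strip()
        cho.clear()  # free memory while streaming the large file

if __name__ == "__main__":
    for uri, prop, text in iter_records():
        print(uri, prop, text, sep="\t")
```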
Results (produced enrichments)
- Each participant will generate a CSV file with the results (we opted for CSV as it is the easiest format to generate and process), following this structure:
A line for each enrichment (i.e. link): <ProvidedCHO>;<property>;<target>;<confidence>;<source_text>, where <ProvidedCHO> is the URI of the enriched record, <property> the property being enriched, <target> the URI of the linked resource, <confidence> the confidence score of the enrichment, and <source_text> the original text that was matched.
Example: http://data.theeuropeanlibrary.org/BibliographicResource/2000085482942;dcterms:spatial;http://dbpedia.org/resource/Prague;0.9;Praha
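A minimal sketch of how a participant could serialize enrichment results into this format (the example row and output file name are placeholders):

```python
import csv

# Hypothetical enrichment results: (ProvidedCHO, property, target, confidence, source_text)
# as produced by whatever enrichment service a participant runs.
enrichments = [
    ("http://data.theeuropeanlibrary.org/BibliographicResource/2000085482942",
     "dcterms:spatial", "http://dbpedia.org/resource/Prague", 0.9, "Praha"),
]

with open("enrich.example.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f, delimiter=";")
    for cho, prop, target, confidence, source_text in enrichments:
        writer.writerow([cho, prop, target, confidence, source_text])
```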
- Results collected from the participants
Participant | File(s) | Notes
Europeana | file:enrich.europeana.csv in all.zip |
TEL | file:enrich.tel.csv in all.zip |
LoCloud | file:loCloud.zip in all.zip | Two results: one using the English background link service and another using the vocabulary match service. There is no source_text in any record; is it possible to fix this?
Pelagios (Simon Rainer) | file:pelagios-wikidata.csv.zip in all.zip; coreferenced to GeoNames and DBpedia: file:enrich.pelagios.coref.csv in all.zip | One possible way to derive such coreferences is sketched below the table.
SILK (Daniel) | file:dct_spatial_dbpedia.csv in all.zip | For now, only dct:spatial enrichments with DBpedia, produced using Silk.
Ontotext | file:ontotext.tar.gz in all.zip; coreferenced to DBpedia: file:enrich.ontotext.v1.coref.zip in all.zip, file:enrich.ontotext.v2.coref.zip in all.zip | Two versions, both using Ontotext's concept extractor. Both return rich results for English (general concepts, not limited by type) but are limited to Person and Place for the other languages. The versions differ in what is considered "English" or "other language": one uses the record language, the other the language tag of the literals.
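The Pelagios and Ontotext results above were coreferenced to GeoNames/DBpedia. How this was actually done is not documented here; as a minimal sketch of one possible approach, a Wikidata target can be mapped to a DBpedia URI through its English Wikipedia sitelink (the endpoint usage and the example QID are illustrative only):

```python
import requests

WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"

def wikidata_to_dbpedia(qid):
    """Derive a DBpedia URI from a Wikidata item via its English Wikipedia sitelink.

    This is only one possible coreferencing strategy, shown for illustration.
    """
    query = """
    SELECT ?article WHERE {
      ?article schema:about wd:%s ;
               schema:isPartOf <https://en.wikipedia.org/> .
    }""" % qid
    r = requests.get(WIKIDATA_SPARQL, params={"query": query, "format": "json"})
    r.raise_for_status()
    bindings = r.json()["results"]["bindings"]
    if not bindings:
        return None
    title = bindings[0]["article"]["value"].rsplit("/", 1)[-1]
    return "http://dbpedia.org/resource/" + title

print(wikidata_to_dbpedia("Q1085"))  # expected: http://dbpedia.org/resource/Prague
```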
- CSV file with all the results: file:enrich.all_05062015.zip in all.zip
- Simple analysis of the results: https://docs.google.com/spreadsheets/d/1xNmeMiP_Y5e5IqRsx4WAIaPF84ns9eX0EwIaFCq6Ajg
Evaluation
Agreements between results:
- A total of 22 agreement combinations were identified (a sketch of how agreement between result files can be computed is given after this list).
- An overview of the agreement computed for the results is available at: https://docs.google.com/spreadsheets/d/1xNmeMiP_Y5e5IqRsx4WAIaPF84ns9eX0EwIaFCq6Ajg/edit#gid=1109289169
- The corresponding clusters can be downloaded from here: file:agreement_clusters.zip in all.zip
- Files starting with "only" contain the results for which no agreement was found.
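A minimal sketch of how such agreement can be computed, assuming each result file follows the CSV format above (the participant names and file names below are placeholders): identical (record, property, target) triples are clustered, and the set of participants producing each triple defines the agreement combination.

```python
import csv
from collections import defaultdict

def agreement_clusters(files):
    """Map each (ProvidedCHO, property, target) to the set of participants producing it."""
    clusters = defaultdict(set)
    for participant, path in files.items():
        with open(path, newline="", encoding="utf-8") as f:
            for row in csv.reader(f, delimiter=";"):
                cho, prop, target = row[0], row[1], row[2]
                clusters[(cho, prop, target)].add(participant)
    return clusters

# Hypothetical file names; the real result files are listed in the table above.
files = {
    "Europeana": "enrich.europeana.csv",
    "TEL": "enrich.tel.csv",
    "Pelagios": "enrich.pelagios.coref.csv",
}

# Count how many enrichments fall into each agreement combination.
combos = defaultdict(int)
for participants in agreement_clusters(files).values():
    combos[frozenset(participants)] += 1
for combo, count in sorted(combos.items(), key=lambda kv: -kv[1]):
    print(sorted(combo), count)
```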
- Spreadsheet: https://docs.google.com/spreadsheets/d/1gKos1qDPlH1LZc_QLavjVdpfVKR3JMdnY6PPNxrmWao/edit#gid=938663830
- Sampling rate: a maximum of 100 enrichments per agreement combination (a sampling sketch is given after the column list below)
- Columns of the spreadsheet:
- Link to the original resource
- Title, or description if no title is available
- Property and corresponding value (i.e. the source of the enrichment)
- Matched Term (the term that was linked in the original source, typically a substring of the value)
- Target Resource (the link, i.e. URI, to the external vocabulary)
- Annotation (to be filled in with the assessment according to the guidelines)
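A minimal sketch of the sampling step, under the assumption that each agreement cluster is a CSV file in the format above (the file name is a placeholder):

```python
import csv
import random

random.seed(0)  # make the sample reproducible

def sample_cluster(path, max_rows=100):
    """Draw at most max_rows enrichments from one agreement-cluster file."""
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f, delimiter=";"))
    return rows if len(rows) <= max_rows else random.sample(rows, max_rows)

# Hypothetical cluster file name; the real cluster files are in agreement_clusters.zip.
for row in sample_cluster("cluster_europeana_tel.csv"):
    print(row)
```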
- Annotated corpus (made out of the samples): file:annotatedsamples.zip in all.zip
- Spreadsheet: https://docs.google.com/spreadsheets/d/10fV31D0QQ9flRAt3pgJNh8wVHq6n0ZRgc83-3QHb1OQ
- Inter-annotator corpus + annotations (ready for download):
- Fleiss Kappa evaluation: https://docs.google.com/spreadsheets/d/10fV31D0QQ9flRAt3pgJNh8wVHq6n0ZRgc83-3QHb1OQ/edit#gid=1905159313
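For reference, Fleiss' kappa can be computed from per-item annotation counts as in this minimal sketch (the toy input is illustrative; the actual computation lives in the spreadsheet above):

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a list of per-item category counts.

    counts[i][j] = number of annotators who assigned category j to item i;
    every item must be rated by the same number of annotators.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])

    # Observed agreement: per-item pairwise agreement, averaged over items.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items

    # Expected agreement from the overall category proportions.
    totals = [sum(row[j] for row in counts) for j in range(n_categories)]
    p_j = [t / (n_items * n_raters) for t in totals]
    p_e = sum(p * p for p in p_j)

    return (p_bar - p_e) / (1 - p_e)

# Toy example: 3 items, 3 annotators, categories (correct, incorrect).
print(fleiss_kappa([[3, 0], [2, 1], [0, 3]]))
```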
Ontotext Remarks
- At the end we also print the "type" (Person, Organization, Place, or Thing, which is generic).
- Our pipeline is tuned for English only, so some of the results on non-English records are nonsensical. We may send two variants: the full results and EN-records only.