Multilinguality
Europeana offers content in more than 40 languages and is queried by users all around the world. This poses an important challenge in terms of multilinguality. Several experiments have been conducted to assess the feasibility of query translation, and a number of reports and evaluations have been produced to improve the multilinguality of our content.
Evaluation
- Extensive work on measuring the multilinguality of Europeana metadata has been undertaken through 2016 and 2017 by Peter Kiraly, Juliane Stiller, and Vivien Petras.
- A description and overview of the work can be found in the Multilingual Saturation of Metadata document, along with relevant links and instructions for the Metadata Quality Assurance Framework application.
- Report on quality metrics and improvement of multilinguality in Europeana (2017-08-31)
- Crowdsourced Multilingual Queries (Tim Hill created a tool, mobsource, to gather queries from users, including Europeana staff): see the attached tarball for a list of non-English queries submitted by users, along with their ratings of the resulting SERP and other comments.
Experiments
The experiments were carried out within the Galateas Project to assess the quality of query translations. The final report on query translation can be found in this internal document: GALATEAS_D7_4.pdf. Additional information related to the experiments (also internal documents):
- Evaluation of a 250 query corpus in English, French and German performed within the Galateas project:
- Documentation of Creation of Gold Standard from Europeana Query Corpus
- Query corpus in English: file:English_corpus_Europeana.xml
- Query corpus in French: file:French_corpus_Europeana.xml
- Query corpus in German: file:German_corpus_Europeana.xml
- Evaluation of query translation using the Portal (done using the same corpus as in the Galateas Project):
Additional work
- Juliane Stiller and Vivien Petras (2016), White Paper on Best Practice for Multilingual Access
- Péter Király (2015) "Query Translation in Europeana", Code4Lib Journal, Issue 27, http://journal.code4lib.org/articles/10285
- Juliane Stiller, Vivien Petras, Maria Gäde, Antoine Isaac (2014) "Automatic Enrichments with Controlled Vocabularies in Europeana: Challenges and Consequences", in: Digital Heritage. Progress in Cultural Heritage: Documentation, Preservation, and Protection. EuroMed 2014, pp. 238-247
- J. Stiller, M. Gäde, and V. Petras (2010), Ambiguity of Queries and the Challenges for Query Language Detection, in CLEF 2010 Labs and Workshops Notebook Papers, ed. by M. Braschler, D. Harman and E. Pianta
- Multilingual Interface Preferences: http://dl.acm.org/citation.cfm?id=2637002.2637030
- Maria Gäde's PhD Thesis - Country and language level differences in multilingual digital libraries: http://edoc.hu-berlin.de/dissertationen/gaede-maria-2014-02-05/PDF/gaede.pdf
- Which Log for Which Information? Gathering Multilingual Data from Different Log File Types: http://link.springer.com/chapter/10.1007%2F978-3-642-15998-5_9
- Multilingual Access to Digital Libraries: The Europeana Use Case: http://www.degruyter.com/dg/viewarticle/j$002fiwp.2013.64.issue-2-3$002fiwp-2013-0014$002fiwp-2013-0014.xml
- Cross-lingual information retrieval and semantic interoperability for cultural heritage repositories: http://www.aclweb.org/anthology/R13-1063
Europeana at CLEF
The Europeana collection has been used in CLEF evaluation campaigns, in the Cultural Heritage in CLEF (CHiC) task, during the 2011, 2012 and 2013 campaigns: