Ops Team meeting topics (mid-2017)
New elements in EDM, EDM profiles
Changes to EDM schema
Minutes of the meeting on EDM changes at: https://docs.google.com/document/d/1jPSTfFs3IpnMjO9JFoubGB7Q7gs6ouOsFzWtN4XWlIY/edit#
Last EDM changes reported at:
https://docs.google.com/document/d/1jopc4L9Mc55YV3iAY3JSYcEnf8x4eMWZavlSWzDZSSY/edit#
Future changes listed in: https://europeanadev.assembla.com/spaces/europeana-ingestion/tickets/2404-edm-schema-changes-for-future-updates-/details#
EDM test records at https://app.assembla.com/spaces/europeana-ingestion/wiki/EDM_Test_records
Test record for EDM internal
ACTION: Kirsten to create two separate test records as a Google doc -- ONGOING
list of EDMInternal properties from Github: https://github.com/europeana/corelib/wiki/EDMObjectTemplatesEuropeana
https://docs.google.com/document/d/1z7cyuJuOKFO_JB_UTjDTEXuD7kLtZlm7nq7bINfD1Vo/edit?usp=sharing
Test record for EDM external
EDM profiles
- Full roadmap in smartsheet (you should all have access)
- Public roadmap at https://docs.google.com/document/d/1omFFf4KsNZAAnaOUvXOvxRc1nmrKcLHxrrB58KwLY2g/edit#
EDM extensions specified, implementation on the way
- Annotations: The EDM extension is split into two documents, one that explains the basics of the model (as a concrete implementation of the Web Annotation Data Model), the classes and properties, and a companion document which explains how the model should be used to support each of the application scenarios that have been implemented. It is being implemented as part of the AnnotationAPI
The Annotation EDM profile spec: Main EDM Annotation profile, Modelling of the Application Scenarios
- Review of Organisation profile. New requirements for METIS available at https://docs.google.com/document/d/1NLBJ6tSg1PBkEdPZ03qPnkHD_ox9vHDfiDpuAr46fzk/edit The review has been done. We need to publish the updated profile.
- Tables for users, datasets and organisations in Metis available here: https://docs.google.com/spreadsheets/d/1jrL_ie5NU7JnnlQ-kmv8gK_HbGMPMJ1gcrZ_a3y5WhQ/edit#gid=0
- Dataset profile still needs some work/decisions on implementing desired features (e.g. notifications)
- Collection profile developed after the data modelling R&D work done at http://hdl.handle.net/2142/45860
Cloud pilot, could be used as data: https://docs.google.com/spreadsheet/ccc?key=0ArFeVeAoD0YBdE1YSkJIT2hfMjZQR285QUxhdGxVbVE
Kate is interested in mapping some of the CARARE data to the collection profile. ESounds used the profile. Creative has user generated sets and saved searches. TEL could migrate TEL collections in EDM to EDM
will be implemented via the User Sets (and one element in the annotations to represent membership in virtual exhibitions and thematic collections)
DDB starting to model their implementation
- EDM for sounds http://pro.europeana.eu/web/network/europeana-tech/-/wiki/Main/Task+Force+on+EDM+profile+for+Sound
- Technical metadata profile Implemented as part of CRF
- Mapping to IIIF at http://pro.europeana.eu/files/Europeana_Professional/Share_your_data/Technical_requirements/EDM_profiles/IIIFtoEDM_profile_042016.pdf
ACTION: Call with DDB. They have now updated their Organisation profile and examples. We need to have a look again.
EDM for collections
Questionnaire at https://docs.google.com/forms/d/e/1FAIpQLSfo9UeMhI86F9Uzr0ST1DnT4CFMcQYSclJjnybny1RWlHDr3g/viewform?c=0&w=1 + discussion on Opsteam list
Possible EDM extensions being discussed
- Adaptation of the existing Rights part of EDM (for statements with deprecation date) will be discussed with the RightsStatements.org group. Proposal to RS.org group at https://basecamp.com/1768384/projects/11769988/messages/62190960
- Representing automatic enrichments. Projects that have them: DDB, APEx, SOCH, MIMO, LOCloud, TEL, DM2E, Food and Drink, Fashion. The Fashion profile creates momentum for this. Implementation of enrichments as annotations (not an ideal solution, but still) is being discussed
- Representation of Full-text in EDM as part of TEL migration and Cloud (Europeana Newspapers)
Longer-term EDM items:
Representing Europeana links (internal to Europeana data space) derived from provider-sent links, for hierarchical objects, edm:isDerivativeOf, etc. This should be done on the Europeana proxies (e.g. an (edm:isRepresentationOf,dm2e.eu/blah) statement on the provider proxy is replicated as (edm:isRepresentationOf,europeana.eu/blah) on the Europeana proxy).
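A minimal sketch of the replication described above, assuming statements are simple (property, target) pairs and that a lookup from provider URIs to Europeana URIs already exists. The function name, the LINK_PROPERTIES set and the uri_map lookup are hypothetical and only illustrate the idea; they are not part of any existing Europeana component.

```python
from typing import Dict, FrozenSet, List, Tuple

Statement = Tuple[str, str]  # (property, target URI)

# Assumption: an illustrative set of link properties worth replicating.
LINK_PROPERTIES: FrozenSet[str] = frozenset(
    {"edm:isRepresentationOf", "edm:isDerivativeOf", "dcterms:isPartOf"}
)


def replicate_links(
    provider_statements: List[Statement],
    uri_map: Dict[str, str],
    link_properties: FrozenSet[str] = LINK_PROPERTIES,
) -> List[Statement]:
    """Build Europeana-proxy statements from provider-proxy link statements.

    A statement is replicated only if its property is a link property and
    its target has a known counterpart in the Europeana data space.
    """
    europeana_statements: List[Statement] = []
    for prop, target in provider_statements:
        if prop not in link_properties:
            continue  # not a link between objects, nothing to replicate
        europeana_target = uri_map.get(target)
        if europeana_target is not None:
            europeana_statements.append((prop, europeana_target))
    return europeana_statements
```

Links whose target is not (yet) in the Europeana data space are skipped rather than copied as-is, so the Europeana proxy only carries internal links.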
Validating rights / see ticket https://europeanadev.assembla.com/spaces/europeana-ingestion/tickets/realtime_list?ticket=561 (for Metis)
Finished EDM task forces (for reference):
Data documentation
- EDM documentation
- ESE documentation
- Survey on the EDM mappings, refinements and extensions after the Aggregator Forum is updated and available from http://pro.europeana.eu/share-your-data/data-guidelines/edm-profiles
- Content strategy published http://pro.europeana.eu/publication/content-strategy
- Europeana Publishing Framework
Re-shaping EDM Documentation Robina has proposal for reconfiguring the documents ("how to") to meet data provider needs.
Is Marie-Claire's metadata brief (https://docs.google.com/document/d/1PUwINOvMxyRg2qQzYLOJfufWvTKxDQWomyzgO3YvEWU) fitting Robina's recommendation? Someone should check whether it would be possible to create the EDM "how to" by mixing the brief and Robina's draft.
Last update of EDM guidelines and definition October 2017
ACTION: Valentine will update the IIIF Profile with the one change about svcs:Service + check if Dataset and organisation profile need update
- All-to-EDM Mapping template: https://drive.google.com/open?id=0BwwUW142mo9gYm9RRkRFT3pCMlU
To be used internally, for the ingestion team. To keep track of the ideas behind (XSLT) mappings.
EDM ingestion from partners
- Page with older or paused EDM experiments with partners, and relevant EDM features spotted in datasets
- Tracking most relevant issues and answers for EDM data ingestion
- List of existing EDM mappings, refinements or extensions
- Contextual vocabularies ingested from Partners
- Full list of Europeana activities. See Collections with IIIF content section at https://europeanadev.assembla.com/spaces/europeana-r-d/wiki/IIIF
Ongoing:
- DDB (Kirsten) working on their profile
- Europeana Space (Pierre)
"AthenaRC has developed two new micro-services: (1) a micro-service that allows mapping of subject terms to a common thesaurus (in this case AAT) and (2) a micro-service that maps temporal information to a common thesaurus (in this case Perio.do). Both micro-services require that the content providers create the appropriate mappings using MORe."
http://perio.do/guide/#finding-and-using-a-uri-for-a-period-or-collection
http://www.europeana.eu/api/v2/search.json?wskey=api2demo&query=when:*n2t*
http://mint-events.image.ntua.gr/wp-content/uploads/2016/03/Gavrilis-MoRe-presentation-Technopolis.pdf
- OpenUp/CommonNames! Do we get CommonNames data in a useful way (i.e. do the URIs they send us dereference to RDF)?
- There are two types of links. Pierre is talking with Gerda
Follow-up with Gerda now on the structure of the CommonNames vocabulary (Pierre, Valentine, Hugo)
- Europeana 14-18: new revision: some data is now prepared by Richard with new mapping. Data being tested but issues within UIM (difference in number of CHO/records in proxy). Ready for ingestion. Just waiting for CRF
first was https://europeanadev.assembla.com/spaces/europeana-npc/tickets/1684-europeana-1914-18--create-complete-list-of-complete-field-mapping-14-18-data/details#
then this https://europeanadev.assembla.com/spaces/europeana-npc/tickets/1689-spike--best-way-to-represent-stories-items/details#
then supposedly new mapping will be https://europeanadev.assembla.com/spaces/europeana-npc/tickets/1855-create-an-updated-mapping-that-takes-hierarchies-into-account/details#
- IIIF ingestion
Marjolein working with Nuno: NLWales Photography to replace current data > some issues on their side, but looking good. Testing with Wellcome library of improvements with current data
Issues are on descriptive side or IIIF data?
Mostly richer descriptive data, IIIF is good. No news
Swedish museum [mysterious name] with IIIF (Pablo) working to map to EDM
Follow-ups from IIIF Vatican conference, after Nuno's presentation on harvesting experiments:
- BSB is interested in IIIF harvesting. Maybe we should try to see when DDB would be ready, and potentially discuss with them if they'd be ok with BSB sending us stuff 'on the side' if they won't be IIIF compatible for a long time?
Can be interesting for KPIs
ACTION: Kirsten to communicate with DDB about timeframe for implementing IIIF
- Durham University is also interested in contributing some IIIF collections
- also Heidelberg. much longer-term though (that would go from a research dept through their UL)
ACTION: Ingestion team to update the list of vocabularies used by data partners
NB: there is already a list of problems at https://europeanadev.assembla.com/spaces/europeana-ingestion/tickets/1992-list-of-bad-enrichments-to-be-removed-from-uim/details?tab=followers To be updated with automatic enrichment problems (entities to be removed)
Changes from http to https might impact some embeddable players. Will impact BnF, Dismarc .... some sounds datasets
Data quality
Issues with language tags: https://europeanadev.assembla.com/spaces/europeana-npc/tickets/927/ (nothing will be done on this issue in the short term).
- Have a vocabulary agreed and available to normalise dc:type. ONGOING
https://docs.google.com/spreadsheets/d/1kqazJP74zNcsRmLsQgxoxqctQz8hMDdBChVSwHBwRzM/edit?usp=sharing
- Vocabulary was reviewed
- libraries for feedback on TEXT. Adina got a reply from Serbia.
- Fashion for feedback on Fashion types
- shared the vocabulary with the Rise of literacy
Draft report from Pablo https://docs.google.com/document/d/1_o2k_aqc4qcE9QmA73MjiG26AocWvC8ift-tgRNAEhM/edit (ACTION: open comments rights) > missing words 'multilingual' and 'searchable' --> this needs to be in the report
There may be need of guidelines for using specific values (or parts of the vocabularies) with specific EDM properties (dcterms:medium, format, etc).
It's dangerous to risk that a concept used with dc:format (e.g. by Fashion) would be used with another property by another aggregator.
Feedback from Kate Fernie
Call with EFashion. They will provide feedback to refine the list for Fashion items
- Data Quality Plan (Kirsten and Pablo). https://docs.google.com/document/d/1bveUqx1KJP35UVrkpk3rJeC2axtm6MKaE-Af4fW5qC0/edit (template; can be used as an introduction to the idea of a Data Quality Plan)
- Introduction letter to DSI partners (Draft)
https://docs.google.com/document/d/1uF7uv7dAviP5LGrvjv8D6vylTDYA6NyEtJbPYw-RELw/edit?usp=sharing
- Data Quality Plan progress report document
https://docs.google.com/document/d/1QqfYNrvWE_0oPI9yEyoderfi6JTaKyLrYOUWGl43LaU/edit?usp=sharing
MdV: Draft data quality plan has been shared with Fashion and should be agreed on by the 15th of August.
Carare - still in progress - I would like to include those, or some of them, in the DQP. It needs to be agreed with Kate
https://docs.google.com/document/d/1OjkcRG-CxTU1JJbgV2bvG0ndWcV53ArnrWNt0vjVp-M/edit?usp=sharing
Euscreen - done https://docs.google.com/document/d/144lOhwL3dOIO1dkJBHzmIludPE00Kzfc5G22XiWMphM/edit?usp=sharing
Photoconsortium - sent, waiting for their reply
https://docs.google.com/document/d/19OhisD3jhUtvGXs1bj-k5C18KFj0qrdHinzozz2XeMI/edit?usp=sharing
Fashion (finalised): https://docs.google.com/document/d/1p_tjHEv0qy2HJv_U4DESZA0-l2fPSumxDzWQXX4HYuA/edit?usp=sharing
Museu - still in progress https://docs.google.com/document/d/1_e3eTodKgz-ElgpkxgHofc0lZy55alu5PjLWW2kWLBQ/edit?usp=sharing
Sounds (finalised) https://docs.google.com/document/d/1XAkxThdkPMvSzm8HLrr7q___LWb-kEklhlePRRtaE1M/edit?usp=sharing
Open-up - sent, waiting for their reply
https://docs.google.com/document/d/1iOtymVnaJlVCk0zhJXkwktJ2geM5ONYUp6SAQ0LOSOc/edit?usp=sharing
EFG: to come
APEF - sent, waiting for their reply
https://docs.google.com/document/d/1KmIvrVq0mdfE76APn_iF2dNpPpmB8UeWU4-yrjZUf3M/edit?usp=sharing
- Problems pattern catalogue
To be distributed to data providers https://docs.google.com/spreadsheets/d/1atZr1w-h9AdWwWSBYLCCk6fAdJSxrCNP56QRLNY1jLg/edit#gid=1801176604
- Update from DQC
Completeness measure
- Collaboration with Peter on completeness measures + overview of related work at https://europeanadev.assembla.com/spaces/europeana-r-d/wiki/Task_Group_on_data_quality_
- Work on completeness requirements has been reactivated.
- Draft available at https://docs.google.com/document/d/1--_vWh9CMfH3yMJZ7X6MaGTaDIo3Wjo3UYqnLY-2pbQ/edit
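As a rough illustration of what a completeness measure can look like, the sketch below scores a record by the share of expected fields that carry at least one non-empty value. This is a generic field-presence ratio written for these notes, not the measure defined in the draft; the function name and the choice of expected fields are assumptions.

```python
from typing import Dict, Iterable, List


def completeness(record: Dict[str, List[str]], expected_fields: Iterable[str]) -> float:
    """Return the fraction of expected fields with at least one non-blank value."""
    fields = list(expected_fields)
    if not fields:
        return 0.0  # nothing to measure against
    filled = sum(
        1
        for field in fields
        if any(value.strip() for value in record.get(field, []))
    )
    return filled / len(fields)


# Hypothetical record: description is present but blank, type is missing.
example = {"dc:title": ["A"], "dc:description": [""], "dc:language": ["en"]}
score = completeness(example, ["dc:title", "dc:description", "dc:language", "dc:type"])
```

A weighted variant (mandatory fields counting more than optional ones) would follow the same shape with a weight per field.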
Problem patterns
ACTION: everyone to look at the sheet and provide feedback
https://docs.google.com/spreadsheets/d/1zoU-1uPk2O5t5zRC1-MD3LakBQGJ2hsWlSnp3XS2iAk/edit?disco=AAAAA2-Id5E
Kirsten and Henning made a plan to disseminate patterns to providers. Currently in DQC.
20th century black hole
https://github.com/hugomanguinhas/europeana/blob/master/rd-exp/experiments/BlackHole.md
http://pro.europeana.eu/blogpost/the-missing-decades-the-20th-century-black-hole-in-europeana
Document was amended with comments from Hugo and sent to Kennisland. But we don’t know how Kennisland is using it
Date enrichment and normalization will be considered in the coming prioritizations for enrichment and normalization. Discussions were continued as part of the definition of the DSI 2 KPIs.
Kennisland is re-starting work. Pablo is liaising with them.
Enhance provider data
use ISNI?
- Pablo's analysis: https://docs.google.com/spreadsheets/d/1l6Pot1-hZhpst_HDPJbYpG9pQynhPIDEFdCIqhdCQC8/edit?usp=sharing
- Not much perspective of Europeana registering providers in ISNI (as part of customer relationship management). This is long-term perspective only.
Milestone on the representation of accurate information for providers or data providers: http://pro.europeana.eu/files/Europeana_Professional/Projects/Project_list/Europeana_DSI/Milestones/europeana-dsi-ms1-specifications-for-the-accurate-representation-of-data-providers-names-in-the-dsi.pdf
Pablo working on the cleaning of organisation names - ONGOING. R&D team will consider matching our current provider names with ISNI providers ('ISNI enrichment') as part of an action on Hugo and Nuno to make assessments and recommendations for enrichment tools and rules for July.
Cécile: Move to URIs (replace data provider literals with URI) is one thing. We will need to have a workflow in place: how do organizations register to get an identifier? Do we then ask them to use the identifier in the data? If yes, we need a communication plan because the change is huge.
Some data providers like DDB have organisation data. Should we harvest these data or not?
Relation with https://basecamp.com/1768384/projects/1000684/messages/68533674
Cécile: Workflow for organisations for METIS needs to be refined. How to ingest Organisation data from data providers, and what will be the process for attributing identifiers?
No feedback yet to Aggregator Forum feedback.
What type of data we accept: with Europeana ID. But after feedback from data providers do we continue on the same line, + owl:sameAs, form on Pro to get Org.
How do we create new organisations? Registry within Pro, or an email where they are provided with their Identifier?
Misunderstanding? by ID we mean a URI to resources, while their identifiers are catalogue numbers.
Do we accept these other identifiers and map them somewhere else?
Pablo: I'd say stick to original plan.
Cécile: there will be no changing data in METIS for first release. What data do we get in, and what do we want to get in.
ACTION: DPS to think of process flow needed for getting Organisations identifiers.
https://docs.google.com/spreadsheets/d/1BFmJidtdsSVEA10lcbKKAqZEH13cCkWhd-GYJaCUifU/edit
--> the meeting happened, some things still need to be discussed about how we envision our organization inventory. Henning met with Dasha and Aubery following the first meeting to start evaluating the process with Zoho in mind.
Probably nothing will be done before September. Cécile will follow-up with Dasha
Conversation required between Network team and ingestion team to discuss the relationships between the inventory and Zoho
DSI2 KPI on deduplication of providers at risk. Discuss how we want to report on this.
Same applies to the completeness measure and dc:language normalized.
Normalization
Normalising provider and data provider names is on the way (Pablo). Plan has been written: https://docs.google.com/document/d/1jLy971Zwpv9qu7hL1DzMsfREzAtMoJWUPhJnj6D25UY/edit
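One common first step for this kind of name clean-up is to group spelling variants under a normalised key. The sketch below is only an illustration of that idea and is not taken from the plan linked above; name_key and group_variants are hypothetical helpers.

```python
import re
import unicodedata
from collections import defaultdict
from typing import Dict, Iterable, List


def name_key(name: str) -> str:
    """Reduce a name to a comparison key: no accents, case, or punctuation."""
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop combining marks so that accented and unaccented spellings match.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Replace runs of punctuation/underscores with a space, keep letters and digits.
    cleaned = re.sub(r"[\W_]+", " ", stripped.casefold())
    return " ".join(cleaned.split())


def group_variants(names: Iterable[str]) -> Dict[str, List[str]]:
    """Group the original name strings by their normalised key."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        groups[name_key(name)].append(name)
    return dict(groups)
```

Groups with more than one member are candidates for manual review; the key is deliberately conservative and will not merge abbreviations or translated names.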
Normalization service for Metis requirements: see https://europeanadev.assembla.com/spaces/europeana-ingestion/tickets/2156-metis-requirements--cleaning-and-normalization-service-v1/details#
https://docs.google.com/document/d/1nJKZk7xgXXiCBAzA423MLT4V94-b8VEgIv0OCz6X68Y/edit
Overview of the different quality work started or needed at Europeana available at https://docs.google.com/document/d/1YW6829VGl1LSc-tguSLFh-tvvwrw88-SXqe1RfeNCd4/edit
Normalisation language: Nuno developed a new plugin to normalise the values of dc:language. We are now evaluating the first results of this normalisation
https://docs.google.com/spreadsheets/d/1Z-CGWr6rS7lkcGK75a4UxzRzrd80_O6L_s5DAuj_9uM/edit
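To make the idea of dc:language normalisation concrete, here is a small sketch that maps free-form values to two-letter ISO 639-1 codes. It is not Nuno's plugin: the lookup table is a tiny hand-written sample and normalise_language is a hypothetical name; a real service would rely on the full ISO 639 code lists.

```python
from typing import Optional

# Assumption: illustrative sample only, covering a few codes and language names.
LANGUAGE_CODES = {
    "en": "en", "eng": "en", "english": "en",
    "fr": "fr", "fre": "fr", "fra": "fr", "french": "fr", "français": "fr",
    "de": "de", "ger": "de", "deu": "de", "german": "de", "deutsch": "de",
    "nl": "nl", "dut": "nl", "nld": "nl", "dutch": "nl", "nederlands": "nl",
}


def normalise_language(value: str) -> Optional[str]:
    """Return the ISO 639-1 code for a dc:language value, or None if unknown."""
    cleaned = value.strip().casefold()
    if cleaned in LANGUAGE_CODES:
        return LANGUAGE_CODES[cleaned]
    # Fall back to the primary subtag of values such as "en-GB" or "fr_FR".
    primary = cleaned.replace("_", "-").split("-", 1)[0]
    return LANGUAGE_CODES.get(primary)
```

Values that cannot be resolved return None so they can be listed for evaluation instead of being silently changed.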
Enrichment and entity collection
Documentation on tickets and progress:
https://www.assembla.com/spaces/europeana-ingestion/wiki/Enrichment_work_and_entities_collection
Old actions and pointers:
- Report of Quality Issues (including OCLC duplicates) and Actions to Improve the Metadata quality: https://www.assembla.com/spaces/europeana-r-d/wiki/Data_quality_
- 4 Stars system from LOD-LAM http://lod-lam.net/summit/2011/06/06/proposed-a-4-star-classification-scheme-for-linked-open-cultural-metadata/
- Task Force on multilingual Enrichment http://pro.europeana.eu/web/guest/network/task-forces/overview
- Queries for measuring quality -> benchmark for LOD pilot
- DM2E stats. Paper and table sent by email. Visualization at http://data.dm2e.eu/visualize/index.html.
- Other stats on the outdated LOD pilot: http://demo.seco.tkk.fi/aether/. For Europeana
METIS Requirements
Technical design plan version 3 published: https://docs.google.com/document/d/1zlOMDsrb1TTBtomdrzpkmOqmg4t9nMXc7mk6ryuUo-Y/edit
See also : https://www.assembla.com/spaces/europeana-ingestion/wiki/Metis_ALL
Design work progressing: https://projects.invisionapp.com/share/246QE59X7#/screens/219433550
Reindex
Document for re-index is here: https://docs.google.com/document/d/1lkea78ZgjkngiDqGCAqGT-Vuu_1eoElSE2B_kjg1EfY/edit
Events-Conferences
EuropeanaTech conference table at
https://docs.google.com/spreadsheets/d/11r3O7XhQDYz_2wWwYFP1K0FI54lMSPdXcuBHzrlivsM/edit#gid=1291848083. Interesting conferences and call for papers can be flagged there.