Recommendations for providing temporal metadata
Scope of this documentation
Dates are notoriously heterogeneous in many types of resource cataloguing. Differences can arise, for example, where systems changed, where standards were absent (i.e. free-text fields were used for date entry) or changed over time, or where responsible staff members left and new ones joined. Even where institutions use the same systems and standards, several dates can be associated with a particular item; when the data is aggregated into a different source, this can introduce noise and hinder discovery. Moreover, the process of aggregation can itself introduce issues when transforming data from one model into another that has fewer fields, or less specific fields, for date values.
This section explains how dates are normalised by Europeana and gives recommendations to providers on how to submit, and especially format, dates so that they work well with this processing.
The recommendations also address the case of some date-related problems like lack of specificity in metadata elements used (cf the DQC problem pattern "Generic property is used while there is a more specific appropriate one"), the impossibility to assign an exact date of creation or use of an item, or changes of calendars.
Date normalisation at Europeana
Normalisation goal
The Europeana Data Quality Committee has adopted the Extended Date/Time Format (EDTF) as the optimal standard for the representation of dates in Europeana. Europeana has set up a normalisation process that aims to convert as many original dates as possible to this format, so that they can be exploited by downstream processes (for example for including dates and time ranges to the search filter options).
Note: This process is different from the automatic generation of the edm:year field, which is a mere extraction and simplification of (4-digit) years that can be found in metadata for the purpose of advanced searches.
Normalisation process
The normalisation process first checks whether a provided date is valid against the EDTF or ISO8601 standards. EDTF is supported with level 1 compliance (consult the EDTF Specification for details), and ISO8601 is supported only in the “extended format”, which requires the use of a separator between the date components. The ISO8601 “basic format”, which omits separators (e.g. 20050924T100000), is not supported. If a date is not a valid EDTF or ISO8601 date, the process seeks to recognise patterns in the provided date values [cf appendix “Date patterns that can be normalised by Europeana”].
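The difference between the ISO8601 extended and basic formats can be illustrated with a minimal Python sketch. This is not Europeana's implementation: the function name `classify_iso_date` is hypothetical, the regular expressions cover only plain calendar dates with an optional time, and month or day ranges are not validated.

```python
import re

# Extended format: date components separated by "-", time components by ":".
EXTENDED = re.compile(r"^\d{4}(-\d{2}(-\d{2}(T\d{2}:\d{2}(:\d{2})?)?)?)?$")
# Basic format: a complete date written without separators (e.g. 20050924T100000).
BASIC = re.compile(r"^\d{8}(T\d{4}(\d{2})?)?$")


def classify_iso_date(value: str) -> str:
    """Return "extended", "basic" or "other" for a date string (hypothetical helper)."""
    value = value.strip()
    if EXTENDED.match(value):
        return "extended"
    if BASIC.match(value):
        return "basic"
    return "other"
```

Under these assumptions, "2005-09-24T10:00:00" is classified as extended, "20050924T100000" as basic, and a value such as "January, 1765" as other, which is the kind of value that would be handed over to pattern recognition.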
Normalised dates are attached to the equivalent element in the Europeana proxy in the EDM record for the object (while the original values remain on the provider proxy). Proxies in EDM are meant to represent metadata on an object, from the perspective of a specific data provider (a CHI, an aggregator, Europeana Foundation). For more detail please see the EDM Primer at https://pro.europeana.eu/page/edm-documentation . A visual explanation can also be found at https://pro.europeana.eu/page/linked-open-data#data-structure .
The normalised value is then stored in an instance of the contextual class edm:TimeSpan whose label is the date represented in EDTF. Both the original value in the provider proxy and the EDTF conforming label resulting from the normalisation process are displayed in the front end. Furthermore, links to Europeana entities for time spans (centuries), that result from enrichment processes, can show up as temporal metadata in the front end.
Two examples of the result of the normalisation process are shown below:
Example 1 - Original value
<dcterms:created>January, 1765</dcterms:created>
Example 1 - Normalised value
<dcterms:created rdf:resource="#1765-01"/>
[...]
<edm:TimeSpan rdf:about="#1765-01">
<skos:prefLabel xml:lang="zxx">1765-01</skos:prefLabel>
<skos:notation rdf:datatype="http://id.loc.gov/datatypes/edtf/EDTF-level1">1765-01</skos:notation>
<dcterms:isPartOf rdf:resource="http://data.europeana.eu/timespan/18"/>
<edm:begin>1765-01-01</edm:begin>
<edm:end>1765-01-31</edm:end>
</edm:TimeSpan>
Example 2 - Original value
<dcterms:created>13th century</dcterms:created>
Example 2 - Normalised value
Note: The zxx language tag reflects that the values of skos:prefLabel are meant to conform to a non-linguistic norm (here, EDTF), with possible values like 12XX (for the 13th century).
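The edm:begin and edm:end values in Example 1 can be derived from the EDTF label with simple calendar arithmetic. The following Python sketch shows one way to do this for year-month values only; the function names are hypothetical, and the century calculation assumes AD years and the convention that the 18th century ends in 1800, which is an assumption rather than something stated in this documentation.

```python
import calendar
import re
from typing import Tuple


def month_bounds(edtf: str) -> Tuple[str, str]:
    """Return the first and last day of an EDTF year-month value such as "1765-01"."""
    match = re.fullmatch(r"(\d{4})-(\d{2})", edtf)
    if match is None:
        raise ValueError(f"not a year-month value: {edtf!r}")
    year, month = int(match.group(1)), int(match.group(2))
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range: {edtf!r}")
    # monthrange returns (weekday of first day, number of days in the month).
    last_day = calendar.monthrange(year, month)[1]
    return f"{edtf}-01", f"{edtf}-{last_day:02d}"


def century_number(year: int) -> int:
    """Ordinal century of an AD year (assumed convention: 1701 to 1800 is century 18)."""
    return (year - 1) // 100 + 1
```

For "1765-01" this gives the bounds 1765-01-01 and 1765-01-31, and the year 1765 falls in century 18, matching the dcterms:isPartOf link in Example 1.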
EDM properties normalised
The date properties that are subject to normalisation are the following:
In addition to these date properties, the following generic properties are also subject to normalisation:
Note: the above two generic properties are normalised only with highly reliable methods to minimise the risk of matching non-date values with the date patterns [cf appendix “Date patterns that can be normalised by Europeana” and recommendations on generic fields]. The current date normalisation does not process properties from any contextual entity (edm:TimeSpan, edm:Agent) that the provider would have contributed. It only considers properties of the provided CHO from the provider ore:Proxy.
Recommendations for providing dates
Formatting and conformance with standards
Ideally, temporal metadata is provided in EDTF/ISO8601. However, if data providers use one of the patterns recognised by the normalisation process (cf appendix below “Date patterns that can be normalised by Europeana”), these data will also be processed.
Note that we expect dates formatted according to a certain standard to be fully conformant with it. For example, many date values using DCMI Period in the Europeana data include a period name but omit ‘name=’, for example ‘Fayum Neolithic Period; start=-5300; end=-4000’ instead of ‘name=Fayum Neolithic Period; start=-5300; end=-4000’. Such values are not conformant with the DCMI Period specification and therefore will not be normalised.
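Providers can check this kind of conformance before submitting data. The Python sketch below is a simplified check, not the DCMI reference parser nor Europeana's code: the function name is hypothetical, and it only verifies that every semicolon-separated component carries one of the labels name, start, end or scheme.

```python
ALLOWED_KEYS = {"name", "start", "end", "scheme"}


def parse_dcmi_period(value: str) -> dict:
    """Split a DCMI Period string into its labelled components.

    Raises ValueError when a component has no recognised "label=" prefix.
    """
    components = {}
    for part in value.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing ";"
        key, separator, content = part.partition("=")
        key = key.strip()
        if not separator or key not in ALLOWED_KEYS:
            raise ValueError(f"component without a recognised label: {part!r}")
        components[key] = content.strip()
    return components
```

With this check, "name=Fayum Neolithic Period; start=-5300; end=-4000" is split into three components, while the variant without "name=" raises an error on its first component.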
Europeana's normalisation process handles many cases of dates, be they expressed in cardinal form, e.g., "1800-01-01" or in ordinal form, e.g., "18th century", or using BC/AD qualifiers. But it is not able to handle every possible variation, especially across languages.
Examples of values that are difficult to handle include forms like "1st of January 1800" and some patterns that are ambiguous across languages or countries. For example, in the USA dates are typically written with the month first, while most European countries place the month in the middle. Also, the "eKr" abbreviation can refer to "BC" in Estonian and Finnish (https://en.wiktionary.org/wiki/eKr.) and to "AD" in Danish and Norwegian (https://en.wiktionary.org/wiki/e.Kr.).
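The day and month ambiguity can be made concrete with a short Python sketch that tries both readings of a numeric date. The function name and the use of "/" as separator are assumptions made for illustration; this is not how Europeana's normalisation is implemented.

```python
from datetime import date, datetime


def possible_readings(value: str) -> set:
    """Return every calendar date that a dd/mm/yyyy or mm/dd/yyyy string could denote."""
    readings = set()
    for fmt in ("%d/%m/%Y", "%m/%d/%Y"):
        try:
            readings.add(datetime.strptime(value, fmt).date())
        except ValueError:
            pass  # the value is not valid under this reading
    return readings
```

A value such as "01/02/1800" yields two different dates (1 February and 2 January 1800), so it cannot be normalised safely without knowing the convention used, whereas "25/12/1800" has only one valid reading.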
"Annotations" on dates, indicating for example a type of object lifecycle event, are also hard to process.
For example "ca. 1673 (Herstellung)" would require two steps: one for the annotation and one for 'ca.' Such annotations have benefits in terms of the information communicated to website users who have found the item, especially while the Event class in EDM is not yet implemented. But they make the item more difficult to find in the first place, as they are less machine-readable.
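The two steps can be sketched in Python as follows. This is an illustrative assumption about how such a value could be handled, not a description of Europeana's process: the function name is hypothetical, only a trailing parenthesised annotation and a four-digit year are supported, and the approximation is expressed with the EDTF level 1 "~" qualifier.

```python
import re


def normalise_annotated_year(value: str):
    """Return (edtf_or_None, annotation_or_None) for values like "ca. 1673 (Herstellung)"."""
    # Step 1: separate a trailing annotation given in parentheses.
    annotation = None
    match = re.fullmatch(r"(.*?)\s*\(([^()]*)\)\s*", value)
    if match:
        value, annotation = match.group(1), match.group(2)
    # Step 2: interpret an optional "ca." or "circa" prefix as approximate.
    match = re.fullmatch(r"\s*(ca\.|circa)?\s*(\d{4})\s*", value, flags=re.IGNORECASE)
    if match is None:
        return None, annotation
    qualifier = "~" if match.group(1) else ""
    return match.group(2) + qualifier, annotation
```

Under these assumptions, "ca. 1673 (Herstellung)" becomes the EDTF value "1673~" with the annotation "Herstellung" kept separately.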
In the future, reporting about cases of failed normalisation could be included in the process of publishing metadata in Europeana or in a later step. In the meantime, providers who use specific formatting for dates should be cautious in what they assume for the normalisation process!
Other relevant recommendations
In EDM there are several properties that can be used for expressing dates of different events in the life of the provided cultural heritage object. Between dc:date and the more specialised dcterms:created, dcterms:issued, dcterms:temporal, always choose the most appropriate one. We know it is not always possible, but it is a pity to miss opportunities when existing information would lead to better (re-)user experience. [cf appendix “Use of specialised properties in the ARMA project”]
We have noticed that when providers map their dates to EDM they tend to provide the same date twice, both as a (numeric or) literal value and as a reference to an instance of the TimeSpan class. This should not happen as it may create redundancy which can result in a cluttered interface and a confusing user experience. Especially now that Europeana is able to normalise many of the provided dates and enrich them with additional information appended in a TimeSpan class, providers should be extra careful and try not to repeat information! Note that this works best if the TimeSpan is provided with a human-readable skos:prefLabel (as recommended in the EDM Mapping Guidelines) that can be picked up by display routines.
The normalisation process handles BC/AD dates. But these assume a calendar reference!
In EDTF and in Europeana’s normalisation, only the Gregorian calendar is assumed and supported, so take care that this does not lead to misinterpretation of dates. The transition from the Julian to the Gregorian calendar is expected to raise complications. It is unlikely that many metadata specialists undertake a specific conversion to ensure that their Julian dates, should they be known, are exactly expressed in the Gregorian calendar. This means that timelines may often be (slightly) inaccurate or vary from institution to institution. This should not be a reason to withhold this information, however!
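For providers who do know that a date is expressed in the Julian calendar, the conversion can be done through the Julian Day Number. The Python sketch below uses the widely published integer arithmetic for this conversion; the function names are hypothetical, it is intended for AD years only, and it is not part of Europeana's normalisation.

```python
from typing import Tuple


def julian_to_jdn(year: int, month: int, day: int) -> int:
    """Julian Day Number of a date given in the Julian calendar."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


def jdn_to_gregorian(jdn: int) -> Tuple[int, int, int]:
    """(year, month, day) in the (proleptic) Gregorian calendar for a Julian Day Number."""
    a = jdn + 32044
    b = (4 * a + 3) // 146097
    c = a - 146097 * b // 4
    d = (4 * c + 3) // 1461
    e = c - 1461 * d // 4
    m = (5 * e + 2) // 153
    day = e - (153 * m + 2) // 5 + 1
    month = m + 3 - 12 * (m // 10)
    year = 100 * b + d - 4800 + m // 10
    return year, month, day


def julian_to_gregorian(year: int, month: int, day: int) -> Tuple[int, int, int]:
    """Convert a Julian calendar date to the equivalent Gregorian date."""
    return jdn_to_gregorian(julian_to_jdn(year, month, day))
```

For instance, the Julian date 25 October 1917 corresponds to 7 November 1917 in the Gregorian calendar, a difference of thirteen days.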
There are cases where temporal information provided for a cultural heritage object does not actually relate to the object (the edm:ProvidedCHO) itself, but for example to its digital representation. This information can be misleading for search and retrieval functionalities, and date normalisation cannot fix this issue, as it is rooted not in the date formatting but in the semantics. As explained in the One-to-One Principle, “conceptually distinct entities, such as a painting and a digital image of the painting, should be described by conceptually distinct descriptions.” This means that a record describing a painting should, for example, not include metadata about the date of the creation of the record, the digitisation process or the file that resulted from it. Thus, our recommendation is to be careful when choosing the class for providing temporal information (e.g. the date of creation of a digitisation file should be added to dcterms:created in the corresponding edm:WebResource) and to avoid including dates that have no dedicated class (e.g. the date of catalogue record creation).
For many cultural heritage objects, especially older ones, the metadata includes no precise dates, or no dates at all.
When the metadata is recorded, a numeric date may be omitted because the exact date is not of primary concern, mostly because the time spans are conceptual ("Roman period") or broadly defined ("first half of first century AD"). The dates linked to some of these periods vary with the point of view of the person attributing the period to an object, and the beginning and end of a period can differ between countries or regions, depending on each region’s history (“Medieval period”).
This is fully acceptable, of course. Yet to add precision to the metadata, providers may sometimes still be able to express dates as edm:TimeSpan instances with approximate or partial date ranges, in order to allow for exploitation of the temporal information on Europeana (e.g. filtering options). Such ranges could in principle be normalised, although normalisation of provided timespans is not yet implemented. In other cases, periods may instead be expressed as (skos:)Concept, either with specific vocabularies like the Greek Historical Periods from Semantics.gr, or by linking to linked open data sources (which can also relate to time intervals) such as Wikidata or PeriodO, or a combination of both [cf appendix “Time periods in the Europeana Archaeology project”].
In this way it is possible to reflect the conceptual time period the metadata creator intended the object to be associated with. This is especially true when some of these sources come with time intervals. There is thus no concrete recommendation on whether to use a TimeSpan or Concept class - or both - for expressing time periods. Instead we want to make the data providers aware of the different options available to them.
The different ways to provide temporal data for time periods in Europeana come with their own advantages and caveats. An overview can be found in the appendix “Processing of timespans and concepts”.
Appendices
Date patterns that can be normalised by Europeana
Time periods in the Europeana Archaeology project
Processing of timespans and concepts
Credits and edit history
Editors: Antoine Isaac, Kristina Rose, Eleftheria Tsoupra, Adina Ciocoiu
Contributors: Nuno Freire, Fiona Mowat
Last update: Jun 12, 2024