Introduction
The International Image Interoperability Framework (IIIF) is a set of standards initially designed to offer a standardized and scalable way to describe how digital objects composed of one or more images can be displayed to an end-user. This gives developers and users the flexibility to build and use different IIIF-compatible viewers according to their needs and preferences. Since its first release, it has grown to cover other types of media such as sound and video, and soon also 3D.
Europeana’s IIIF offer covers the Presentation API (versions 2.1 and 3) and the Content Search API (version 1), and makes use of the IIIF Image API when made available by content providers. It also includes support for full-text (such as transcriptions, translations of transcriptions, captions and subtitles) via the IIIF Fulltext API, using an EDM extension for full-text. These APIs follow the specifications defined by the IIIF consortium.
Before starting to use these APIs, we recommend reading Registering for an API key and the Terms of Use. To get started with these APIs, go directly to the Getting Started section or try them out on the Console. To get regular updates about the Europeana API, provide feedback and discuss it with other developers, we suggest joining the Europeana API discussion group at Google Groups.
IIIF datasets in Europeana: A Scholar’s delight
The recently launched Europeana Media player brings Europeana into a new era of International Image Interoperability Framework (IIIF) compatible, interoperable and unified playout of audiovisual heritage material online.
IIIF & Europeana Working Group
This dedicated IIIF & Europeana Working Group follows the first of the proposals from the Task Force 'Preparing Europeana for IIIF involvement.'
Impact Assessment report: EuropeanaTech and IIIF
This assessment looked at the impact of EuropeanaTech’s members, steering group and the Europeana Initiative’s work on IIIF - read a summary of the research and download the full report.
Getting Started
Retrieving a manifest
A manifest describes the information a viewer needs to display a digital object to the end-user: basic metadata such as a title and description, and the content that makes up the digital object. The manifest is not meant to present all the descriptive metadata associated with a given digital object, only the bare minimum for a user to grasp what it is about. If you wish to access the full metadata for an item, see the Record API. The manifest also links to the Record API using the “seeAlso” field.
Presently, both versions 2.1 and 3 of the IIIF Presentation API are supported. The novelty of version 3 is that it covers audio and video in addition to images.
Request
https://iiif.europeana.eu/presentation/[RECORD_ID]/manifest Accept: [ACCEPT]
Parameter | Description |
---|---|
RECORD_ID | The identifier of the record which is composed of the dataset identifier plus a local identifier within the dataset in the form of "/DATASET_ID/LOCAL_ID", for more detail see Europeana ID. |
format | A convenience parameter used to indicate the version of the IIIF Presentation API. Indicating the format within the Accept header is the preferred way to request a specific version. This parameter should not be used if a profile is indicated in the Accept header. |
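As a minimal sketch of the request above, the following Python builds the manifest URL and the Accept header without sending anything. The helper name `build_manifest_request` is hypothetical, and the Accept values are an assumption based on the context URIs published in the IIIF Presentation API 2 and 3 specifications; check them against the service before relying on them.

```python
from urllib.request import Request

MANIFEST_BASE = "https://iiif.europeana.eu/presentation"

# Assumption: Accept values built from the IIIF Presentation context URIs.
IIIF_ACCEPT = {
    "2": 'application/ld+json;profile="http://iiif.io/api/presentation/2/context.json"',
    "3": 'application/ld+json;profile="http://iiif.io/api/presentation/3/context.json"',
}


def build_manifest_request(record_id: str, version: str = "3") -> Request:
    """Prepare (but do not send) a manifest request for one record.

    record_id has the form "/DATASET_ID/LOCAL_ID".
    """
    if version not in IIIF_ACCEPT:
        raise ValueError(f"unsupported IIIF version: {version!r}")
    # Drop the leading slash so the URL does not contain "//".
    path = record_id.strip("/")
    url = f"{MANIFEST_BASE}/{path}/manifest"
    return Request(url, headers={"Accept": IIIF_ACCEPT[version]})


if __name__ == "__main__":
    req = build_manifest_request("/DATASET_ID/LOCAL_ID", "3")
    print(req.full_url)
    print(req.get_header("Accept"))
```

Passing the returned object to `urllib.request.urlopen` would perform the actual call. The version is expressed in the Accept header here because the parameter table names that as the preferred way to request a specific version.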
Response
Retrieving full-text content
The term full-text refers to the correlation between the content resource (e.g. image, audio or video) and its textual representation (i.e. transcription, subtitle, caption). In the EDM profile, the textual representation of the content resource is referred to as the Full-Text Resource, while the relations between the segments of the text and the coordinates in the image are referred to as Annotations.
Annotation Pages
An Annotation Page contains all the annotations that make up the full-text of a content resource (i.e. image, audio or video). It is referred to by the Manifest and can be accessed via the following request.
Request
https://iiif.europeana.eu/presentation/[RECORD_ID]/annopage/[PAGE_ID] Accept: [ACCEPT]
Parameter | Description |
---|---|
RECORD_ID | The identifier of the record which is composed of the dataset identifier plus a local identifier within the dataset in the form of "/DATASET_ID/LOCAL_ID", for more detail see Europeana ID. |
PAGE_ID | The identifier of the annotation page. |
Header | Description |
---|---|
ACCEPT | Used to indicate the format and version of the format. The following indicate the Accept header to be used for version 2.1 and 3 respectively: |
Response
The response is a JSON-LD structure composed of the following fields:
Annotation
An Annotation specifies a single relation between the full-text resource and the content resource (i.e. image, audio or video). It is referred to by the Annotation Page and can be accessed via the following request.
Request
https://iiif.europeana.eu/presentation/[RECORD_ID]/anno/[ANNO_ID]
Parameter | Description |
---|---|
RECORD_ID | The identifier of the record which is composed of the dataset identifier plus a local identifier within the dataset in the form of "/DATASET_ID/LOCAL_ID", for more detail see Europeana ID. |
ANNO_ID | The identifier of the annotation. |
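The Annotation Page and Annotation requests differ only in their final path segments, which the sketch below makes explicit. The function names are hypothetical, and the identifiers in the example are placeholders rather than real records.

```python
from urllib.parse import quote

IIIF_BASE = "https://iiif.europeana.eu/presentation"


def _record_path(record_id: str) -> str:
    """Turn "/DATASET_ID/LOCAL_ID" into "DATASET_ID/LOCAL_ID", checking its shape."""
    parts = record_id.strip("/").split("/")
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected /DATASET_ID/LOCAL_ID, got {record_id!r}")
    return "/".join(parts)


def annotation_page_url(record_id: str, page_id: str) -> str:
    """URL of the Annotation Page holding the full-text of one content resource."""
    return f"{IIIF_BASE}/{_record_path(record_id)}/annopage/{quote(page_id, safe='')}"


def annotation_url(record_id: str, anno_id: str) -> str:
    """URL of a single Annotation."""
    return f"{IIIF_BASE}/{_record_path(record_id)}/anno/{quote(anno_id, safe='')}"


if __name__ == "__main__":
    print(annotation_page_url("/DATASET_ID/LOCAL_ID", "PAGE_ID"))
    print(annotation_url("/DATASET_ID/LOCAL_ID", "ANNO_ID"))
```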
Response v2.1
The response is a JSON-LD structure composed of the following fields:
Fulltext Resource
The full-text resource represents the (image or audio) transcription of a single content resource (e.g. a page of a newspaper or manuscript). A full-text resource can be accessed separately from an annotation or annotation page using the following method.
Request
https://api.europeana.eu/fulltext/[RECORD_ID]/[FULLTEXT_ID]
Parameter | Description |
---|---|
RECORD_ID (path) | The identifier of the record which is composed of the dataset identifier plus a local identifier within the dataset in the form of "/DATASET_ID/LOCAL_ID", for more detail see Europeana ID. |
FULLTEXT_ID (path) | The identifier of the full-text resource. |
lang (query) | A parameter used to request full-text in a specific language, if available. The value must be a two-letter ISO 639 code and match one of the languages supported by Europeana. Available values: en, nl, fr, de, es, sv, it, fi, da, el, cs, sk, sl, pt, hu, lt, pl, ro, bg, hr, lv, ga, mt, et, no, ca, ru, eu |
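The following sketch assembles a full-text resource URL and validates the optional `lang` value against the list in the table before any request is made. The function name is hypothetical, and API key handling is left out because this section does not show how a key is passed.

```python
from typing import Optional
from urllib.parse import quote, urlencode

FULLTEXT_BASE = "https://api.europeana.eu/fulltext"

# Language codes copied from the "lang" row of the parameter table.
SUPPORTED_LANGS = frozenset(
    "en nl fr de es sv it fi da el cs sk sl pt hu lt pl ro bg hr lv ga mt et no ca ru eu".split()
)


def fulltext_resource_url(record_id: str, fulltext_id: str, lang: Optional[str] = None) -> str:
    """Build the URL of one full-text resource, optionally for a given language."""
    url = f"{FULLTEXT_BASE}/{record_id.strip('/')}/{quote(fulltext_id, safe='')}"
    if lang is not None:
        if lang not in SUPPORTED_LANGS:
            raise ValueError(f"unsupported language code: {lang!r}")
        url += "?" + urlencode({"lang": lang})
    return url


if __name__ == "__main__":
    print(fulltext_resource_url("/DATASET_ID/LOCAL_ID", "FULLTEXT_ID", lang="nl"))
```

Rejecting an unknown code locally avoids a round trip for a request that the table says cannot be served.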
Response
The response is a JSON-LD structure composed of the following fields:
Searching on full-text
There are two methods for searching on full-text: one searches across all items where full-text is available, and the other searches within the full-text of a single item.
Search across all items
This method adopts the same API structure and functionality as the Search API, with the addition of two search fields and one profile as described below. For more information on the other methods, see the Search API documentation.
Request
https://api.europeana.eu/fulltext/search.json
Query fields | Datatype | Description |
---|---|---|
fulltext | Text | Allows searching on the transcribed text (i.e. full-text) of the item. |
issued | Date | A date field reflecting the date of the Newspapers Issue. |
Profile | Description |
---|---|
hits | Displays the mentions in the transcribed text where the search keyword was found. |
Parameter | Datatype | Description |
---|---|---|
hit.fl (optional) | List (String) | A comma- or space-separated list of fields from which hit highlighting should be generated. A wildcard of “*” (asterisk) can be used to match multiple fields, such as “fulltext.*”, or even “*” to highlight on all fields where highlighting is possible. If omitted, defaults to “*”. |
hit.selectors (optional) | Number | Specifies the maximum number of highlighted selectors (i.e. snippets in Solr) to generate per result (i.e. record). If omitted, defaults to 1. It is possible for any number of selectors from 1 to this value to be generated, up to a limit of 10. |
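A search URL using the hits profile could be assembled as in the sketch below. The `query` and `profile` parameter names are an assumption carried over from the Search API, which this section says the method mirrors; they are not listed in the tables above. The function name and the example query are illustrative only, and API key handling is omitted.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://api.europeana.eu/fulltext/search.json"


def fulltext_search_url(query: str, hit_fl: str = "*", hit_selectors: int = 1) -> str:
    """Build a full-text search URL that requests highlighted hits."""
    # The table gives 1 as the default and 10 as the upper limit.
    if not 1 <= hit_selectors <= 10:
        raise ValueError("hit_selectors must be between 1 and 10")
    params = {
        "query": query,        # assumed parameter name (from the Search API)
        "profile": "hits",     # assumed parameter name (from the Search API)
        "hit.fl": hit_fl,
        "hit.selectors": hit_selectors,
    }
    return f"{SEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical query against the "fulltext" search field.
    print(fulltext_search_url("fulltext:amsterdam", hit_fl="fulltext.*", hit_selectors=3))
```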
Accessing images in high resolution: downloading data
To foster the reuse of the data that is published in Europeana as part of the Newspapers Thematic Collections, we make both the metadata and the full-text available for bulk download as compressed zip files. The metadata is available as CC0 the same way as all the metadata exposed via the API (see Terms of Use) while the full-text is available as Public Domain Mark.
List of datasets
The table below lists all the datasets that are published and available for download. If you are looking for the complete text of a Newspaper, we suggest using option (4) rather than option (3), where the transcription is partitioned per page.
Because the files are very large and can take many hours to download, as an alternative to downloading directly via the browser you can log in to the FTP server at "download.europeana.eu" with username "anonymous". This allows you to resume if the download gets stuck.
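One way to resume an interrupted FTP download from a script is sketched below with Python's standard `ftplib`. The remote path is a hypothetical argument, since the directory layout of the server is not described here, and the sketch assumes the server honours the FTP `REST` command.

```python
import os
from ftplib import FTP

FTP_HOST = "download.europeana.eu"


def resume_offset(local_path: str) -> int:
    """Return how many bytes are already on disk (0 if the file is absent)."""
    return os.path.getsize(local_path) if os.path.exists(local_path) else 0


def download_with_resume(remote_path: str, local_path: str) -> None:
    """Download remote_path, continuing from a partial local file if present."""
    offset = resume_offset(local_path)
    with FTP(FTP_HOST) as ftp, open(local_path, "ab") as out:
        ftp.login()  # ftplib logs in as "anonymous" when no user is given
        # rest=<offset> asks the server to start sending from that byte.
        ftp.retrbinary(f"RETR {remote_path}", out.write, rest=offset or None)


if __name__ == "__main__":
    # Hypothetical call; replace the remote path with a real file on the server.
    # download_with_resume("path/to/dataset.zip", "dataset.zip")
    print(resume_offset("dataset.zip"))
```

Calling `download_with_resume` again after a failure appends to the partial file instead of starting over.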
Dataset number | Metadata (1) | Full-text (ALTO) (2) | Page level full-text (EDM) (3) | Issue level full-text (EDM) (4) |
---|---|---|---|---|
9200300 | (229M) (MD5) | (63G) (MD5) | (116G) (MD5) | (113G) (MD5) |
9200301 | (37M) (MD5) | (13G) (MD5) | (20G) (MD5) | (20G) (MD5) |
9200338 | (213M) (MD5) | (158G) (MD5) | (278G) (MD5) | (277G) (MD5) |
9200339 | (39M) (MD5) | (11G) (MD5) | (21G) (MD5) | (17G) (MD5) |
9200355 | (212M) (MD5) | (97G) (MD5) | (159G) (MD5) | (157G) (MD5) |
9200356 | (137M) (MD5) | (40G) (MD5) | (17G) (MD5) | (17G) (MD5) |
9200357 | (23M) (MD5) | (5G) (MD5) | (9G) (MD5) | (9G) (MD5) |
9200396 | (4M) (MD5) | (849M) (MD5) | (2G) (MD5) | (1G) (MD5) |
Legend:
(1) The original metadata in EDM XML format before being ingested into Europeana. There are slight differences between this data and the published data. For more information see the /wiki/spaces/EF/pages/2385313809.
(2) The full-text encoded using ALTO (Analyzed Layout and Text Object) as it was delivered to Europeana. ALTO is an open XML Schema meant to describe text coming from OCR and layout information of pages for digitized material. For more information see the official documentation page at the Library of Congress.
(3) The full-text encoded using the EDM profile for IIIF full-text after being preprocessed for publication in Europeana. Note that, as opposed to the format used by the API (i.e. JSON-LD), the data is in RDF/XML, as it is the format used for ingestion into Europeana.
(4) Very similar to (3) but with the full-text represented at the Issue level. This means that the edm:FullTextResource will convey the complete transcription of the Newspaper.
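Each download in the table is paired with an MD5 checksum. A minimal sketch for checking a downloaded archive against it follows; the function names are hypothetical, and the expected digest is passed in as a plain hexadecimal string because the layout of the published MD5 files is not described here.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Compute the MD5 hex digest of a file without loading it all into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5(path: str, expected_hex: str) -> bool:
    """Compare a file's digest with the published value, ignoring case and whitespace."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

Reading in chunks matters here because several of the archives are well over 100 GB.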
Dataset structure
Each compressed zip file typically contains one file per item (i.e. metadata or issue level full-text) or per page (i.e. ALTO and page level full-text), with the following structure:
Item | DATASET_ID/LOCAL_ID.xml |
Page | DATASET_ID/LOCAL_ID/PAGE_ID.xml |
That structure can be translated into links to the Europeana Collection portal where the item can be displayed or into the several APIs described on this page.
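As an illustration of that translation, the sketch below maps a file name from an archive to a record identifier and to the request URLs described earlier on this page. The function names are hypothetical, and it assumes the `PAGE_ID` in the archive is the same identifier the Annotation Page request expects, which should be verified against real data.

```python
from typing import Dict, Optional, Tuple

IIIF_BASE = "https://iiif.europeana.eu/presentation"


def parse_entry(name: str) -> Tuple[str, Optional[str]]:
    """Split an archive entry into (record_id, page_id or None)."""
    if not name.endswith(".xml"):
        raise ValueError(f"not an XML entry: {name!r}")
    parts = name[: -len(".xml")].split("/")
    if len(parts) == 2:   # Item: DATASET_ID/LOCAL_ID.xml
        return "/" + "/".join(parts), None
    if len(parts) == 3:   # Page: DATASET_ID/LOCAL_ID/PAGE_ID.xml
        return "/" + "/".join(parts[:2]), parts[2]
    raise ValueError(f"unexpected entry layout: {name!r}")


def api_links(name: str) -> Dict[str, str]:
    """Build the IIIF URLs that correspond to one archive entry."""
    record_id, page_id = parse_entry(name)
    base = f"{IIIF_BASE}{record_id}"
    links = {"manifest": f"{base}/manifest"}
    if page_id is not None:
        # Assumption: the archive PAGE_ID matches the annopage PAGE_ID.
        links["annopage"] = f"{base}/annopage/{page_id}"
    return links


if __name__ == "__main__":
    print(api_links("DATASET_ID/LOCAL_ID/PAGE_ID.xml"))
```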
Source code and changelog
As mentioned in the introduction, the IIIF APIs are made up of several distinct APIs, each one with its own project in GitHub and changelog as listed below.
API | Last version | Description |
---|---|---|
IIIF Manifest API | | Supports only the retrieval of manifests. |
IIIF Full-text API | | Supports the retrieval of full-text, both the Annotation Pages and Full-text resources. |