Metis Sandbox User Guide
- 1 Goal of the Metis Sandbox
- 2 Where to find the sandbox
- 3 How to prepare your dataset
- 4 General interface elements
- 5 The Basics: arriving at the Metis Sandbox
- 6 Upload a new dataset
- 7 Dataset processing
- 8 The tier calculation report
- 9 Problem patterns
- 9.1 Dataset / Overview
- 9.2 Record
- 10 Tier Zero Records
- 11 Troubleshooting
1 Goal of the Metis Sandbox
The Metis Sandbox is a test environment for your data. It consists of a set of tools with which you can:
simulate ingesting and running the Metis workflow on your data,
see what your records would look like on the actual Europeana.eu portal,
get insight into the quality of your records.
2 Where to find the sandbox
The Sandbox can be accessed through https://metis-sandbox.europeana.eu/ .
3 How to prepare your dataset
It is advised to make sure that your dataset can be used with the Metis Sandbox. Some things to keep in mind:
A dataset for the Metis Sandbox can currently not exceed 1,000 records. If your dataset is larger than that then you will see a warning message indicating that only the first 1,000 records will be processed.
A dataset should contain one record at minimum. If your dataset is empty, you will see an error message.
The records in your dataset need to meet the requirements of the Europeana Data Model (EDM) external. More information on the EDM can be found on Europeana Data Model | Europeana PRO . If records are found that do not conform to this schema, you will see error messages. Note that you may choose to provide an XSLT file with the dataset, which the Metis Sandbox will use to try to transform your records into the correct format before validating them against the EDM external specifications.
A dataset must either be uploaded as a zip file or be supplied via HTTP (i.e. zip file download) or the OAI-PMH protocol.
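The constraints above can be checked locally before you upload anything. The following is a minimal sketch in Python, assuming one record per XML file in a single folder. The function name `build_sandbox_zip` and the folder layout are illustrative assumptions and not part of the Metis Sandbox.

```python
import zipfile
from pathlib import Path

MAX_RECORDS = 1000  # limit stated in this guide


def build_sandbox_zip(record_dir, zip_path):
    """Zip every .xml file in record_dir and return the number of records.

    Assumption (hypothetical layout): one record per XML file.
    """
    records = sorted(Path(record_dir).glob("*.xml"))
    if not records:
        # An empty dataset is rejected by the Sandbox with an error message.
        raise ValueError("dataset is empty: at least one record is required")
    if len(records) > MAX_RECORDS:
        # Larger datasets are accepted, but only partly processed.
        print(f"warning: only the first {MAX_RECORDS} of {len(records)} records will be processed")
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for record in records:
            archive.write(record, arcname=record.name)
    return len(records)
```

This sketch does not validate the records against EDM external; it only guards against the empty and oversized cases described above.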
Scheduled and unscheduled clean up
The Metis Sandbox is a testing environment for data; it does not aim to retain data indefinitely. This means that the Metis Sandbox gets cleaned up regularly. All data uploaded more than one month ago may be removed.
Additionally, during system maintenance or the release of a new Metis Sandbox version data may be removed at any time. Where possible, these events will be announced beforehand.
Datasets that are deleted from the Metis Sandbox will need to be uploaded again if you wish to access the tests and reports.
4 General interface elements
There are several different methods to interact with the sandbox. Below is a list of the general interface elements and their uses.
4.1 Page header elements
The page header contains two navigational elements, both of which are visible at all times.
The ‘hamburger icon’ (the three horizontal lines) on the left opens the side panel with external links and the theme selection.
Furthermore, clicking on the Europeana logo brings you back to the welcome screen at any time.
4.2 Buttons
Buttons are used to upload a dataset.
4.3 Input fields
Input fields are the white boxes where information can or must be entered by you. The description with the input field states what information should or can be entered. Input fields that are required to have a value can be recognised by an asterisk*.
Invalid or missing entries will result in an error message being displayed below or next to the input field.
4.4 Page Indicators
Page indicators are shown at the top of the page. They behave as tab headers: clicking on an orb will navigate to the corresponding page. The number of page indicators can vary depending on your use of the Sandbox.
The active page is shaded orange with a yellow border (in the image below, the “Upload Dataset” is the active page). This is also indicated by the page title to the left of the orbs.
There are page indicators for the following pages (going clockwise from the orange item):
Upload Dataset
Track Dataset
Problem Patterns – dataset overview
Problem Patterns – record report
Tier calculations – record report
In addition, a page indicator can display a page’s state. In the example below the page indicator for the Record Report shows that:
Data has loaded
The values in the id fields (supplied by you and always available) reflect the data being displayed, i.e. the form is “clean”
A cog indicates that the page is busy. An example of this is when a new dataset is still being processed.
Note that the page indicator (orb) for Problem Patterns can appear twice: once for viewing problem patterns of a dataset and once for viewing problem patterns and individual records.
4.5 Links
Links are used to navigate between pages or to open popups in the Sandbox. There are different types of links used in the Sandbox interface.
For example:
The “track a new dataset” link takes you to the corresponding page.
The view detail links open up a popup that displays the details of an error.
Links with a warning sign open up a pop up with more information.
Links with a light bulb take you to the page with more information.
Underlined links switch between views of the tiers.
External links have an icon.
Some links, when hovered, show a small “copy” button which if clicked will copy the link (the URL) to your clipboard:
Links can be greyed when required information is missing. The image below shows that the Track and Issues links are greyed out because there is no information in the input field left of the links.
4.6 Drop-down menus
Drop-down menus allow you to make a selection from a list of predetermined values.
5 The Basics: arriving at the Metis Sandbox
5.1 The Welcome screen
The default view, the screen you land on when navigating to the tool, is the Welcome screen.
You can click ‘GET STARTED’ to navigate to the Home screen (see section below).
The page indicators are already active, so you may for instance use the upload icon (the left-most one in the example above) to take a shortcut and navigate directly to the dataset upload form.
5.2 The page header and side panel
These two page elements are present and functional at any time, in any Metis Sandbox page you may find yourself in.
Most useful is the ‘hamburger icon’:
This icon opens the side panel. This panel contains three external links and a theme related option:
The available links are:
A link to training material, that can be used to try out some of the Metis Sandbox functionality in a more controlled setting.
A link to the feedback page, that also contains a helpdesk functionality. You can register a bug here, ask for support or suggest a new feature/improvement.
You are strongly encouraged to use the feedback page in case you find a bug, if you need support or if you come up with an idea for a new feature or the improvement of an existing one. The Europeana Foundation is committed to keep improving the Metis Sandbox for its users.
A link to the User Guide (which is the document you’re currently reading).
Additionally, you will find an option to switch (toggle) between the two available themes.
5.3 The Home screen
This screen allows you to start accessing the Metis Sandbox functionality. Here you can track an existing dataset, request information about a record within that dataset or create a new dataset. It looks like this:
A. Page Indicator: indicates that "Dataset Processing" is the current step. Once other steps become available then clicking this will return you to this step.
B. Dataset Id Input: used to enter the id of a previously uploaded dataset.
C. Record Id Input: used to enter the id of a record within the specified dataset. It enables when a dataset id is entered.
D. Create New Dataset Link: enables and navigates to the “Upload a new Dataset” functionality (see below).
E. Track link. This link enables when a dataset id is entered and, when clicked, takes you to the “Dataset Processing” functionality (see below) for the dataset with this dataset id.
F. Issues (Overview) link. This link enables when a dataset id is entered and, when clicked, takes you to the “Problem Patterns” functionality (see below) for the dataset with this dataset id.
G. Issues (Record) link. This link enables when a record id is entered and, when clicked, takes you to the “Problem Patterns” functionality (see below) for the record with this record id.
H. Tier Report link. This link enables when a record id is entered and, when clicked, takes you to the “Record Report” functionality (see below) for the record with this record id.
When you type a dataset ID or a record ID, a green link will appear in the input field. If you click it, you will be taken to the dataset or record preview as it would look on Europeana.
6 Upload a new dataset
To create a new dataset click on the “create a new dataset” link at the bottom of the home screen (D in the image above). This will take you to the “Upload Dataset” form.
6.1 The Upload Form
The “Upload Dataset” view looks like this:
A. Step Indicator: clicking this will take you to the “Dataset Processing” step.
B. The dataset name input field. A dataset name is valid if it contains only letters, digits and the underscore character (‘_’).
C. The dataset country drop-down.
D. The dataset language drop-down.
E. The harvest protocol radio button set.
F. The zip file input. This appears because “file upload” is the selected protocol. If the selected protocol is changed to “OAI-PMH upload” or “HTTP upload” then an alternative field (or set of fields) will appear here.
G. Step size field.
H. An (optional) checkbox to specify that you want the Metis Sandbox Server to transform your dataset using XSLT. If selected then a file input will appear below it allowing you to upload an XSL file.
I. The “Submit” button: enables when all the (obligatory) fields have been completed.
J. Step Indicator (inactive): indicates that "Upload Dataset" is the current step. If you switch to another step then clicking this will return you to this step.
Enter a descriptive name for your dataset in the input field below “Name”. Only letters, digits and the underscore character (‘_’) are supported. You can select the country and language of the dataset with the dropdown menus.
The next step is to determine the “Harvest protocol”: how you will upload your dataset. This is described in detail below. The “Submit” button at the bottom left will be enabled when all information is filled in and valid.
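The naming rule can be expressed as a short check, for example when generating dataset names in a script. This is a sketch only: it assumes that “letters” means the ASCII letters A-Z and a-z, which this guide does not state, and the function name is hypothetical.

```python
import re

# Assumption: "letters" means ASCII letters; the Sandbox may accept more.
_NAME_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def is_valid_dataset_name(name):
    """Return True if name contains only letters, digits and underscores."""
    return _NAME_PATTERN.fullmatch(name) is not None
```

With this check, a name such as `my_dataset_01` passes, while `my dataset` (space) and `name-1` (hyphen) do not.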
6.2 The Harvest Protocol
There are three ways to upload your datasets to the sandbox:
File upload: upload an archive (e.g. a zip file)
OAI-PMH upload: Ingestion with OAI-PMH
HTTP upload: ingestion via a hosted archive (e.g. a zip file) on a server through HTTP or HTTPS
6.2.1 Zip File
The “File upload” protocol is selected by default. This option allows you to upload an archive file with a dataset that is stored locally. The supported archive types are .zip, .tar and .tar.gz archives.
Note that, even though it is not currently possible to upload multiple archive files, you can still achieve the same result by wrapping all your archives in one new zip file. The application fully supports nested archives (i.e. zip files of zip files).
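Wrapping several archives in one zip file can be scripted. The sketch below uses Python's standard `zipfile` module; the function name `wrap_archives` is an illustrative assumption.

```python
import zipfile
from pathlib import Path


def wrap_archives(archive_paths, wrapper_path):
    """Store several existing archives inside one new zip file."""
    # ZIP_STORED adds the inner archives without compressing them again.
    with zipfile.ZipFile(wrapper_path, "w", zipfile.ZIP_STORED) as wrapper:
        for path in map(Path, archive_paths):
            # Only the file name is kept, so inner archives need distinct names.
            wrapper.write(path, arcname=path.name)
    return wrapper_path
```

The resulting wrapper file can then be selected in the zip file input like any other archive.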
6.2.2 OAI-PMH
To set the harvest protocol to OAI-PMH, you should enter values for the harvest URL, the metadata format, and optionally a setSpec value. For more details on these, please see the OAI-PMH specification.
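To check these three values before submitting, it can help to see how they map onto a standard OAI-PMH `ListRecords` request, which takes `metadataPrefix` and an optional `set` argument. The sketch below only builds such a URL. The endpoint `https://example.org/oai` is a placeholder, and whether the Sandbox issues exactly this request is an assumption.

```python
from urllib.parse import urlencode


def list_records_url(harvest_url, metadata_format, set_spec=None):
    """Build an OAI-PMH ListRecords request URL from the form values."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_format}
    if set_spec:
        # The setSpec value is optional and is passed as the "set" argument.
        params["set"] = set_spec
    return f"{harvest_url}?{urlencode(params)}"


# Hypothetical endpoint and values, for illustration only.
print(list_records_url("https://example.org/oai", "edm", "my_set"))
```

Opening the printed URL in a browser is a quick way to confirm that the endpoint responds with records for the chosen format and set.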
6.2.3 HTTP(S) upload
You can also specify an archive that is accessible with a URL. Set the harvest protocol to “HTTP upload” to be able to enter a value for the URL. The URL should be the (HTTP or HTTPS) download location of an archive (.zip, .tar or .tar.gz file) that contains the dataset records.
6.3 XSL Transformation to EDM (Optional)
It is possible to transform the records in the dataset to the EDM format, using XSLT before any further processing. Check the option “Records are not provided in the EDM (external) format”. An additional file input will appear for an XSL file to be specified.
6.4 The step size
This field allows you to influence the sampling behaviour.
A step size of n tells the Metis Sandbox to select every nth record for processing. This value must be a strictly positive whole number (i.e. 1 or larger). The default value is 1.
For instance, with a step size of 3, the records in positions 3, 6, 9, 12, …, 3000 will be selected (or fewer, if the dataset is smaller than 3,000 records).
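The sampling rule can be sketched in a few lines of Python. This is an illustration rather than the Sandbox's actual code: it assumes that the 1,000-record limit is applied after sampling, which is what the example with position 3000 suggests, and the function name is hypothetical.

```python
MAX_RECORDS = 1000  # processing limit stated in this guide


def sample_positions(total_records, step_size, limit=MAX_RECORDS):
    """Return the 1-based positions selected for a given step size."""
    if step_size < 1:
        raise ValueError("step size must be a whole number of 1 or larger")
    # Every nth record: n, 2n, 3n, ... up to the end of the dataset.
    positions = range(step_size, total_records + 1, step_size)
    # Assumption: the record limit is applied to the sampled positions.
    return list(positions[:limit])
```

For a dataset of 10 records and a step size of 3 this returns positions 3, 6 and 9; for 5,000 records and a step size of 3 the last selected position is 3000.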
6.5 The Generated Dataset ID
The “Submit” button will become enabled once you have filled all fields. Click the “Submit” button to upload your dataset. You will be redirected to the “Dataset Processing” page, where you can see the data being processed in real-time.
A unique dataset id is generated for your upload and displayed at the top-right of the “Dataset Processing” page. Remember or save this ID to be able to get back to the dataset in the future (i.e. from the home screen, see above).
7 Dataset processing
Enter a dataset ID in the home screen (the “Dataset Processing” page) and click the ‘Track’ link to track (monitor) the processing of an uploaded dataset, or to see the results after it finishes processing. The “Track” button for the dataset id field is disabled when the field value is empty. This button will be enabled when you type in a valid dataset id.
Invalid ids will show a warning, and the submit buttons will be disabled again.
A record id can only be entered when a valid dataset id has been entered. The links next to the record field are greyed out when the field is empty or when an invalid value has been entered. The links will be enabled once you enter a valid record id.
See “record provider IDs and Europeana IDs” (below) for more information about record ids and record provider ids.
7.1 The Data Processing View
A submitted dataset id will bring up the dataset processing view. It will also change the page’s url to reflect the id of the dataset processing being displayed. The dataset processing view looks like the picture below.
A. The dataset name. The tick after the dataset name indicates that processing is complete
B. An (optional) flag indicating whether the dataset was xsl-transformed.
C. The processing date, preceded by an (optional) flag indicating that not all records in the dataset were processed.
D. The country and language of the dataset selected when the dataset was uploaded.
E. The processing steps performed on the dataset (they correspond to the list of items just below, element F).
F. The details of the processing steps performed on the dataset.
G. The (optional) warning indicating that not all records in the dataset were processed. See “step size” above for more information.
H. The (not enabled) record id field.
I. The dataset ID of the current dataset.
J. A link to the dataset preview as it would look on Europeana.
K. The tier statistics tab opener.
L. The tier-zero indicator.
The tick after the dataset name indicates that processing is complete, and the generated dataset id is shown at the top-right.
The main (white) panel shows a list of processing steps, detailing how many records were processed during each, and an (optional) warning indicating that not all records in the dataset were processed. Clicking this warning, if present, will show additional information about the import.
The dataset id will also be filled in at the bottom of the screen, enabling the “record id” field.
To track the data processing of a different dataset just replace the value in the dataset id field with another id and click the “track” button.
7.2 The Metis workflow
The data goes through nine steps as part of the processing workflow. These steps are:
Harvest (H): how many dataset records have been successfully imported
Transformation to EDM (Te): How many records have been transformed to the external EDM format (optional step)
Validation External (Ve): how many records passed EDM validation
Transformation (T): how many records have been transformed from the external EDM format to the internal EDM format
Validation Internal (Vi): how many records have passed internal validation
Normalisation (N): how many records have been normalised. Normalisation acts on individual values in the data and could include the deletion of redundant whitespace or of duplicate values
Enrichment (E): how many records have been successfully enriched
Media Processing (M): how many records have had their associated media processed
Publish (Pu): how many records have been published, i.e. uploaded to the Sandbox preview environment (which is a copy of the ‘real’ Europeana website, but does not share the same data) (see chapter 7)
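To make the Normalisation step more concrete, the sketch below applies the two operations named above (removing redundant whitespace and removing duplicate values) to a list of values. It is a simplified illustration and not the Metis implementation, which may do more than this; the function name is hypothetical.

```python
def normalise_values(values):
    """Collapse redundant whitespace and drop duplicate values, keeping order."""
    seen = set()
    result = []
    for value in values:
        # split() without arguments also trims leading and trailing whitespace.
        cleaned = " ".join(value.split())
        if cleaned not in seen:
            seen.add(cleaned)
            result.append(cleaned)
    return result
```

For example, the values `"  Mona   Lisa "`, `"Mona Lisa"` and `"Portrait"` reduce to `"Mona Lisa"` and `"Portrait"`.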
The colours of each step indicate how successful this step was:
Green: (success) - the step completed without errors, and all records are considered suitable for ingestion
Yellow: (non-critical warning) - problems with the records have been detected, but the records could still be processed.
Red: (critical warning) - more serious problems with the records have been detected, and (some of) these records could not continue their path through the pipeline. These should no longer be considered for ingestion (in their current form).
7.3 The Data Processing Errors Window
Shown below is an example of a dataset that processed with many errors:
A. A link to the errors window
B. The bold font of the number indicates that this is another link to the errors window
C. No report is available for this error, so the number does not have a bold font and there is no link to the errors window
Errors are flagged by red numbers in the panel, and if an error report is available, by the “view detail” links in the right-hand column. The red number indicates the number of records affected (one in this case) and this number is repeated (parenthesised) in the “view detail” link. The red number also serves as a link to the error report, if available. In the screenshot above an error report is available for all processing steps apart from the last.
Clicking a link to the errors report will open a pop-up window, allowing you to see the error detail.
7.4 View the published records
Click on “view published records” (item J in the image in 7.1) to view your final data in a copy of the Europeana website. This link is shown in the top-right of the submitted “Dataset Processing” page UI, underneath the generated dataset id. This will show the dataset records as published on the Sandbox Preview environment.
It may, for example, appear like the image below.
7.5 Tier Statistics
Once a dataset has been processed it’s possible to view its tier statistics to help assess the dataset’s quality. The dataset processing tab will look something like this once a dataset has been processed:
A. The tier statistics tab opener
When you click the tier statistics tab opener, you will see a tab that looks like this:
A. The pie chart gives an overview of the statistics - shown by the content tier dimension (by default).
B. If you click the column headers, you toggle the column sort order and change the data dimension of the pie chart to that header’s default.
C. The second row of clickable column headers allow specific data dimensions to be set and sorted on.
D. The search input allows you to filter the record data by (part of the) record id.
E. The data grid shows the record data in a panel that you can scroll through. The fields are record id, content tier, content tier license, metadata tier (aggregate value), metadata tier (language dimension), metadata tier (enabling elements dimension) and metadata tier (contextual classes dimension). If you click on a record id, you will be taken to the tier calculation report for that record (see below).
F. Page navigation is enabled where necessary.
G. Here you can select the number of rows shown at a time in the table.
H. Here you can jump to a specified page by entering a (valid) page number.
I. The dataset floor row gives the lowest tier value present in the dataset (and the value you probably wish to look at to improve the quality of your data).
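The idea behind the dataset floor row can be sketched as taking the minimum tier per dimension across all records. The record structure and the metadata tier ordering ("0" lowest, then "A", "B", "C") are assumptions made for this example; this guide itself only shows the values "0" and "A".

```python
# Assumed ordering, lowest first. This guide only shows "0" and "A".
METADATA_TIER_ORDER = ["0", "A", "B", "C"]


def dataset_floor(records):
    """Return (lowest content tier, lowest metadata tier) for the records.

    records: list of dicts with 'content_tier' (int) and 'metadata_tier' (str).
    """
    content_floor = min(r["content_tier"] for r in records)
    metadata_floor = min(
        (r["metadata_tier"] for r in records),
        key=METADATA_TIER_ORDER.index,
    )
    return content_floor, metadata_floor
```

A single low-tier record is enough to lower the floor, which is why this row is a useful starting point when looking for records to improve.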
7.6 Filtering Tier Statistics
Clicking a pie-slice (or its corresponding legend item) will filter the data down to that value. A click on the value "3" in the pie, for example, will restrict the grid to showing only records that have a content tier value of "3".
A. The active filter. Clicking the active pie-slice will remove the applied filter.
B. The active filter's legend item. Clicks on legend items are equivalent to clicks on pie-slices.
C. Orange column headers indicate the active filter.
D. A new summary row appears below the data grid indicating aggregate values for the filtered data.
E. The pagination updates to reflect the filtered data.
F. Only records with a content-tier value of "3" are visible in the grid.
7.7 Sorting Filtered Tier Statistics
When dataset tier statistic data is filtered by content tier you can sort it by one of the other dimensions by clicking its column header. Usually clicking a column header changes the pie chart dimension and sorts on that column, but when a filter is active the sort will be applied within the data dimension that has been filtered on.
Here we see data that was filtered by content tier (value 3) and sorted by metadata tier (aggregate value).
A. Clicking this column-header will not change the dimension (it will remain “content tier”), but it will apply the sort (by metadata tier) within that dimension.
B. As before, the specific type of metadata tier sort (aggregate value) is clarified with an arrow-head indicator in the second sub-header row.
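The combination of filtering and sorting described in 7.6 and 7.7 can be sketched as follows. The record structure and the metadata tier ordering are assumptions for illustration; the Sandbox performs this in its user interface.

```python
# Assumed ordering of metadata tier values, lowest first.
METADATA_TIER_RANK = {"0": 0, "A": 1, "B": 2, "C": 3}


def filter_and_sort(records, content_tier, descending=False):
    """Keep records with the given content tier, sorted by metadata tier."""
    filtered = [r for r in records if r["content_tier"] == content_tier]
    return sorted(
        filtered,
        key=lambda r: METADATA_TIER_RANK[r["metadata_tier"]],
        reverse=descending,
    )
```

The filter decides which records remain visible; the sort only reorders those records and leaves the filter untouched.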
8 The tier calculation report
You can view a tier calculation report by clicking on a record ID in the tier statistics grid (see above). Alternatively, you can view the report by entering both the id of a dataset as well as the id of a record within this dataset (see below).
8.1 Record Provider Ids and Europeana Ids
Every processed record has both a Provider id and a Europeana id.
A Europeana id begins with a forward slash followed by the record’s dataset id, another forward slash and then a further sequence of (non-whitespace) characters. You can find the Europeana ID of a specific record by clicking the dataset preview link and finding and inspecting the records there.
A record’s Provider id, on the other hand, can be any sequence of (non-whitespace) characters, and is the value that can be found in the ‘rdf:about’ attribute of the ‘providedCHO’ section of your record.
You can search for a record using either of these record ids, so the “Report” button will enable itself when any sequence of non-whitespace characters has been entered into the record id field. If, however, the UI detects that you’ve entered an id that matches the format of a valid Europeana record id, then it will show a line connecting the record id with the dataset id, as shown here:
A. The record id begins with a slash followed by the dataset id, so the id fields are shown as connected.
B. You can now open the record report by clicking the button labelled “Tier Report”.
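The format rule for a Europeana id can be written as a small check. This sketch follows the description above (a slash, the dataset id, another slash, then non-whitespace characters); the function name is hypothetical and the Sandbox's own detection logic may differ.

```python
import re


def looks_like_europeana_id(record_id, dataset_id):
    """True if record_id has the form /<dataset id>/<non-whitespace characters>."""
    pattern = rf"/{re.escape(str(dataset_id))}/\S+"
    return re.fullmatch(pattern, record_id) is not None
```

An id such as `/123/abc_def` matches dataset `123`, while `abc_def` would be treated as a provider id.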
8.2 The Record Report
The record report - or Tier Report - is divided into two main sections:
the content tier section
the metadata tier section
You can navigate between these sections by clicking the corresponding navigation orbs. The computed value of each tier is shown within its navigation orb at the bottom. These computed values are single characters: a digit in the case of the content tier and a letter (or zero) in the case of the metadata tier.
In the illustration below the computed values are “3” (for the content tier) and “A” (for the metadata tier).
A. Page Indicator: the inactive "Dataset Processing" orb indicates that this page is not active and, if clicked, will bring you to the dataset processing page.
B. The Record Report summary: top-level information about this record as well as record download and viewing links.
C. Tier Navigation Orbs: you can toggle between the content and the metadata tier report from here.
D. Content Tier Information: data about the record's content tier.
E. Media Navigation Orbs: you can navigate multiple media items from here.
F. Processing Errors: record processing error information appears here.
G. Page Indicator: indicates that "Record Report" is the current page (via its orange colour) and that the form below is “clean” (via its tick icon).
8.3 Content Tier Media Information
The media information appears under the content tier breakdown section. If there are 5 or fewer items, then a navigation orb corresponding to each item will appear. The icon of each navigation orb illustrates the type of media item, as shown below.
If there are more than 5 media items available in the record report then the navigation orbs will be replaced with navigation arrows, an editable field and a spinner allowing you to browse the items or jump directly to a specific one, as shown below.
8.4 Metadata Tier Information
You can see the record report’s metadata tier information by clicking on the metadata tier navigation orb. Metadata tier information is split into three sub-sections:
Language dimension
Enabling Elements Dimension
Contextual Classes Dimension
These, like the main sections of the report, are navigable by clicking on the corresponding navigation orb.
Active language dimension
Active enabling elements dimension
Active contextual classes dimension
9 Problem patterns
You can view problem patterns for both a dataset and for a record. The dataset id and record id fields each have a (secondary) link labelled “Issues”.
Clicking “Issues (Overview)”, next to the dataset id input field (A), will open a problem viewer page for the whole dataset. Clicking “Issues (Record)” (B) will open a problem viewer page for an individual record.
9.1 Dataset / Overview
The problem pattern viewer for datasets shows all the problem types that occur within a given dataset.
A key is shown (P1, P2, P3 etc.) together with a list of records in which that problem pattern was found. The little arrows at the top-right corner may be used to navigate between the different problem patterns.
The record-references behave as (internal) links to the separate instance of the problem pattern viewer used for records (with the exception of the references for P1, as they are not displayable for individual records).
The problem pattern report can be downloaded using the “export as pdf” link.
The 8 problem patterns that are in use now are:
Key | Title | Description |
P1 | Systematic use of the same title. | Check across all records if there are any duplicate titles, ignoring letter (upper or lower) case. |
P2 | Equal title and description fields. | Check whether there is a title - description pair for which the values are equal, ignoring letter (upper or lower) case. |
P3 | Near-Identical title and description fields. | Determine whether there is a title - description pair for which the values are too similar (or if one contains the other). We do this ignoring the letter case. |
P5 | Unrecognisable title. | Apply heuristics to determine whether a title is not human-readable. We check whether there are at most 5 characters that are not either alphanumeric or simple spaces. We also check whether the value fully contains a dc:identifier value. |
P6 | Non-meaningful title. | Check whether the record has a title of 2 characters or less as a rough heuristic of whether a title is meaningful. |
P7 | Missing description fields. | Check whether the record is lacking a description (or only has empty descriptions). |
P9 | Very short description. | Check whether the record has a description of 50 characters or less. |
P12 | Extremely long titles. | Check whether the record has a title of more than 70 characters. |
Click on the title of a specific problem pattern to see a description.
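Several of these patterns are simple length or equality checks, so you can approximate them locally before uploading. The sketch below covers P2, P6, P7, P9 and P12 using the thresholds from the table. It is a simplified illustration and not the Metis implementation: trimming whitespace before comparing is an assumption, and P1, P3 and P5 are left out because they need cross-record data or heuristics not specified here.

```python
def check_record(titles, descriptions):
    """Return the keys of the problem patterns found in one record."""
    # Assumption: surrounding whitespace is ignored and blank descriptions
    # count as empty.
    titles = [t.strip() for t in titles]
    descriptions = [d.strip() for d in descriptions if d.strip()]
    found = []
    # P2: a title and a description are equal, ignoring letter case.
    if any(t.lower() == d.lower() for t in titles for d in descriptions):
        found.append("P2")
    # P6: a title of 2 characters or less.
    if any(len(t) <= 2 for t in titles):
        found.append("P6")
    # P7: no description, or only empty descriptions.
    if not descriptions:
        found.append("P7")
    # P9: a description of 50 characters or less.
    if any(len(d) <= 50 for d in descriptions):
        found.append("P9")
    # P12: a title of more than 70 characters.
    if any(len(t) > 70 for t in titles):
        found.append("P12")
    return found
```

For instance, a record whose only title is “Vase” and whose only description is “vase” would be flagged for P2 and P9.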
9.2 Record
The problem pattern viewer for records shows all the types of problem patterns that occur within a single record.
Note that two of the page indicators in the image above show the same icon - one for each instance of the problem pattern viewer.
If you click on the “</>” button to the right of the problem pattern viewer, a panel expands that provides access to download links for the record.
10 Tier Zero Records
In the track tab of the dataset processing page, you will be warned if your dataset contains any records that have a “tier zero” rating, either for the content tier or for the metadata tier.
One or two indicators will be shown on the right side of the screen whenever a “tier zero” record was detected. The first is for records with content tier zero (the orb with stars), the second for records with metadata tier zero (the orb with a gauge). Only one may appear, or both, as appropriate.
Click the warning indicators to see the tier-zero warning panel. This panel will show links to at most 10 sample records that were detected as having content or metadata tier 0.
These links open the Record Report (see above) for the clicked record, opening the relevant subsection of the report according to whether the tier zero warning pertained to the content-tier or the metadata-tier. The small yellow triangular warning icons will be visible until the warnings have been reviewed. Only one warning is present in the image above, because the content tier zero records have already been viewed.
11 Troubleshooting
Dataset not found
Every two weeks the sandbox is emptied. It is highly possible that the dataset has been removed because of this.