...

This screen allows you to start accessing the Metis Sandbox functionality. Here you can track an existing dataset, request information about a record within that dataset or create a new dataset. It looks like this:

...

A. Page Indicator: indicates that "Dataset Processing" is the current step. Once other steps become available, clicking this returns you to this step.
B. Dataset Id Input: used to enter the id of a previously uploaded dataset.
C. Record Id Input: used to enter the id of a record within the specified dataset. It is enabled once a dataset id has been entered.
D. Create New Dataset Link: enables the “Upload a new Dataset” functionality (see below) and navigates to it.
E. Track Link: enabled once a dataset id has been entered. Clicking it takes you to the “Dataset Processing” functionality (see below) for the dataset with that id.
F. Issues (Overview) Link: enabled once a dataset id has been entered. Clicking it takes you to the “Problem Patterns” functionality (see below) for the dataset with that id.
G. Issues (Record) Link: enabled once a record id has been entered. Clicking it takes you to the “Problem Patterns” functionality (see below) for the record with that id.
H. Tier Report Link: enabled once a record id has been entered. Clicking it takes you to the “Record Report” functionality (see below) for the record with that id.
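
The enablement rules in the list above can be summarised in a few lines of code. This is only an illustrative sketch, not the Sandbox's own implementation: the `LinkState` class, its field names and the `link_state` function are hypothetical, and it assumes that "entered" means "non-empty after trimming whitespace".

```python
from dataclasses import dataclass


@dataclass
class LinkState:
    """Which home-screen controls are enabled (all names are hypothetical)."""
    record_id_input: bool
    track: bool
    issues_overview: bool
    issues_record: bool
    tier_report: bool


def link_state(dataset_id: str, record_id: str) -> LinkState:
    # Assumption: "entered" means non-empty once whitespace is trimmed.
    has_dataset = bool(dataset_id.strip())
    # The record id input is only usable once a dataset id is present,
    # so a record id without a dataset id is ignored here.
    has_record = has_dataset and bool(record_id.strip())
    return LinkState(
        record_id_input=has_dataset,
        track=has_dataset,
        issues_overview=has_dataset,
        issues_record=has_record,
        tier_report=has_record,
    )
```

With only a dataset id filled in, the sketch enables the record id input, "Track" and "Issues (Overview)", and leaves the two record-level links disabled.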

...

The “Upload Dataset” view looks like this:

...

A. Step Indicator: clicking this will take you to the “Dataset Processing” step.
B. The dataset name input field.  A dataset name is valid if it contains only letters, digits and the underscore character (‘_’).
C. The dataset country drop-down.
D. The dataset language drop-down.
E. The harvest protocol radio button set.
F. The zip file input.  This appears because “file upload” is the selected protocol.  If the selected protocol is changed to “OAI-PMH upload” or “HTTP upload” then an alternative field (or set of fields) will appear here.
G. Step size field.
H. An (optional) checkbox to specify that you want the Metis Sandbox Server to transform your dataset using XSLT.  If selected then a file input will appear below it allowing you to upload an XSL file.
I. The “Submit” button: enabled once all the (obligatory) fields have been completed.
J. Step Indicator (inactive): indicates that "Upload Dataset" is the current step.  If you switch to another step then clicking this will return you to this step.

Enter a descriptive name for your dataset in the input field below “Name”. Only letters, digits and the underscore character (‘_’) are supported. You can select the country and language of the dataset with the dropdown menus.
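
If you want to check a dataset name before submitting it, the naming rule can be approximated with a regular expression. This is a minimal sketch: the function name is hypothetical, and it assumes that "letters" means the ASCII letters A-Z and a-z (the Sandbox itself may accept a wider range).

```python
import re

# Assumption: "letters" means ASCII letters; the Sandbox may accept more.
DATASET_NAME = re.compile(r"[A-Za-z0-9_]+")


def is_valid_dataset_name(name: str) -> bool:
    """Return True if name contains only letters, digits and underscores."""
    return DATASET_NAME.fullmatch(name) is not None
```

Under these assumptions `my_dataset_01` is accepted, while `my dataset` (contains a space) and the empty string are rejected.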

...

Submitting a dataset id brings up the dataset processing view and changes the page’s URL to reflect the id of the dataset being displayed. The dataset processing view looks like the picture below.

...

A. The dataset name.  The tick after the dataset name indicates that processing is complete.
B. An (optional) flag indicating whether the dataset was XSL-transformed.
C. The processing date, preceded by an (optional) flag indicating that not all records in the dataset were processed.
D. The country and language of the dataset selected when the dataset was uploaded.
E. The processing steps performed on the dataset (they correspond to the list of items just below, element F).
F. The details of the processing steps performed on the dataset.
G. The (optional) warning indicating that not all records in the dataset were processed. See “step size” above for more information.
H. The (not enabled) record id field.
I. The dataset ID of the current dataset.
J. A link to a preview of the dataset as it would look on Europeana.

K. The tier statistics tab opener.
L. The tier-zero indicator.

The tick after the dataset name indicates that processing is complete, and the generated dataset id is shown at the top-right.

...

The colour of each step indicates how successful that step was:

  • Green: (success) - the step completed without errors, and all records are considered suitable for ingestion.

  • Yellow: (non-critical warning) - problems with the records have been detected, but the records could still be processed.

  • Red: (critical warning) - more serious problems with the records have been detected, and (some of) these records could not continue their path through the pipeline. These should no longer be considered for ingestion (in their current form).
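
The colour rule can be pictured as a small decision function. This is a sketch under stated assumptions only: it supposes that each step reports how many records raised warnings and how many failed, and both the `warned` and `failed` parameters and the function name are hypothetical rather than taken from the Sandbox.

```python
def step_colour(warned: int, failed: int) -> str:
    """Map hypothetical per-step record counts to the colours described above."""
    if failed > 0:
        return "red"     # some records could not continue through the pipeline
    if warned > 0:
        return "yellow"  # problems detected, but the records were still processed
    return "green"       # no errors: all records are suitable for ingestion
```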

...

Shown below is an example of a dataset that was processed with many errors:

...

A. A link to the errors window.
B. The bold font of the number indicates that this is another link to the errors window.
C. No report is available for this error, so the number does not have a bold font and there is no link to the errors window.

...

Once a dataset has been processed it’s possible to view its tier statistics to help assess the dataset’s quality. The dataset processing tab will look something like this once a dataset has been processed:

...

A. The tier statistics tab opener

When you click the tier statistics tab opener, you will see a tab that looks like this:

...

A. The pie chart gives an overview of the statistics - shown by the content tier dimension (by default).

B. If you click the column headers, you toggle the column sort order and change the data dimension of the pie chart to that header’s default.

C. The second row of clickable column headers allows specific data dimensions to be set and sorted on.

D. The search input allows you to filter the record data by (part of the) record id.

E. The data grid shows the record data in a panel that you can scroll through. The fields are record id, content tier, content tier license, metadata tier (aggregate value), metadata tier (language dimension), metadata tier (enabling elements dimension) and metadata tier (contextual classes dimension). If you click on a record id, you will be taken to the tier calculation report for that record (see below).

F. Page navigation is enabled where necessary.

G. Here you can select the number of rows shown at a time in the table.

H. Here you can jump to a specified page by entering a (valid) page number.

I. The dataset floor row gives the lowest tier value present in the dataset (and the value you probably wish to look at to improve the quality of your data).
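
To make the idea of the dataset floor concrete, the sketch below computes the lowest tier value present in a handful of records. The sample rows and field names are invented for illustration, and the code assumes that content tiers are ordered 0 < 1 < 2 < 3 < 4 and metadata tiers 0 < A < B < C.

```python
# Assumption: tier values are ordered from lowest to highest as listed here.
CONTENT_ORDER = ["0", "1", "2", "3", "4"]
METADATA_ORDER = ["0", "A", "B", "C"]


def tier_floor(values, order):
    """Return the lowest tier value present, according to `order`."""
    return min(values, key=order.index)


# Invented sample rows; the field names are hypothetical.
records = [
    {"record_id": "/1/a", "content_tier": "3", "metadata_tier": "A"},
    {"record_id": "/1/b", "content_tier": "1", "metadata_tier": "C"},
    {"record_id": "/1/c", "content_tier": "4", "metadata_tier": "B"},
]

content_floor = tier_floor([r["content_tier"] for r in records], CONTENT_ORDER)
metadata_floor = tier_floor([r["metadata_tier"] for r in records], METADATA_ORDER)
```

For these sample rows the content tier floor is "1" and the metadata tier floor is "A", which are the records you would look at first to raise the quality of the dataset.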

...

Clicking a pie-slice (or its corresponding legend item) will filter the data down to that value. A click on the value "3" in the pie, for example, will restrict the grid to showing only records that have a content tier value of "3".

...

A. The active filter. Clicking the active pie-slice will remove the applied filter.

B. The active filter's legend item. Clicks on legend items are equivalent to clicks on pie-slices.

C. Orange column headers indicate the active filter.

D. A new summary row appears below the data grid indicating aggregate values for the filtered data.

E. The pagination updates to reflect the filtered data.

F. Only records with a content-tier value of "3" are visible in the grid.
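
The effect of a pie-slice filter on the grid and on the pagination can be sketched as follows. The sample rows, field names and function names are hypothetical, and the rule that an empty result still occupies one page is an assumption, not documented behaviour.

```python
import math

# Invented sample rows; the field names are hypothetical.
records = [
    {"record_id": "/1/a", "content_tier": "3"},
    {"record_id": "/1/b", "content_tier": "1"},
    {"record_id": "/1/c", "content_tier": "3"},
]


def filter_by_value(rows, dimension, value):
    """Keep only the rows whose `dimension` equals `value`."""
    return [row for row in rows if row[dimension] == value]


def page_count(row_total, rows_per_page):
    """Number of pages needed to show `row_total` rows."""
    # Assumption: an empty result still occupies one (empty) page.
    return max(1, math.ceil(row_total / rows_per_page))


filtered = filter_by_value(records, "content_tier", "3")
pages = page_count(len(filtered), rows_per_page=10)
```

Here `filtered` holds only the two records with a content tier of "3", and the page count is recalculated from the filtered total rather than from the full dataset.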

...

7 Sorting Filtered Tier Statistics

When the tier statistics data is filtered by content tier, you can sort it by one of the other dimensions by clicking that dimension's column header. Usually, clicking a column header changes the pie chart dimension and sorts on that column, but when a filter is active the sort is applied within the data dimension that has been filtered on.

Here we see data that was filtered by content tier (value 3) and sorted by metadata tier (aggregate value).

...

A. Clicking this column-header will not change the dimension (it will remain “content tier”), but it will apply the sort (by metadata tier) within that dimension.

B. As before, the specific type of metadata tier sort (aggregate value) is clarified with an arrow-head indicator in the second sub-header row.
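
Sorting within an active filter amounts to filtering first and ordering the remaining rows afterwards. The sketch below illustrates this with invented sample rows and hypothetical field names, assuming that metadata tiers are ordered 0 < A < B < C.

```python
# Assumption: metadata tiers are ordered from lowest to highest as listed here.
METADATA_ORDER = ["0", "A", "B", "C"]

# Invented sample rows; the field names are hypothetical.
records = [
    {"record_id": "/1/a", "content_tier": "3", "metadata_tier": "C"},
    {"record_id": "/1/b", "content_tier": "1", "metadata_tier": "A"},
    {"record_id": "/1/c", "content_tier": "3", "metadata_tier": "A"},
    {"record_id": "/1/d", "content_tier": "3", "metadata_tier": "B"},
]


def sort_within_filter(rows, content_tier, descending=False):
    """Filter on content tier, then sort the remaining rows by metadata tier."""
    kept = [row for row in rows if row["content_tier"] == content_tier]
    return sorted(
        kept,
        key=lambda row: METADATA_ORDER.index(row["metadata_tier"]),
        reverse=descending,
    )


result = sort_within_filter(records, "3")
```

The record with content tier "1" never appears in `result`, whichever sort direction is chosen: the filter stays in place and only the order of the filtered rows changes.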

...

You can search for a record using either of these record ids, so the “Report” button will enable itself when any sequence of non-whitespace characters has been entered into the record id field.  If, however, the UI detects that you’ve entered an id that matches the format of a valid Europeana record id, then it will show a line connecting the record id with the dataset id, as shown here:

...

A. The record id begins with a slash followed by the dataset id, so the id fields are shown as connected.
B. You can now open the record report by clicking the button labelled “Tier Report”.
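
The two checks described here (when the button is enabled, and when the id fields are shown as connected) can be sketched as below. The function names are hypothetical. The code also assumes that the dataset id must be followed by a second slash and a non-empty local part, which goes slightly beyond what the text above states.

```python
def report_button_enabled(record_id: str) -> bool:
    """Enabled as soon as the field contains any non-whitespace characters."""
    return bool(record_id.strip())


def ids_connected(dataset_id: str, record_id: str) -> bool:
    """True if the record id begins with a slash followed by the dataset id."""
    # Assumption: the dataset id is followed by another slash and a local part,
    # so that dataset "123" does not match a record id starting with "/1234/".
    prefix = f"/{dataset_id}/"
    return bool(dataset_id) and record_id.startswith(prefix) and len(record_id) > len(prefix)
```

For example, with dataset id `123` the record id `/123/abc` would be shown as connected, whereas `/1234/abc` and `abc` would not, even though all three enable the button.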

...

In the illustration below the computed values are “3” (for the content tier) and “A” (for the metadata tier).

...

A. Page Indicator: the inactive "Dataset Processing" orb indicates that this page is not active. Clicking it brings you to the dataset processing page.
B. The Record Report summary: top-level information about this record as well as record download and viewing links.
C. Tier Navigation Orbs: you can toggle between the content tier report and the metadata tier report from here.
D. Content Tier Information: data about the record's content tier.
E. Media Navigation Orbs: you can navigate multiple media items from here.
F. Processing Errors: record processing error information appears here.
G. Page Indicator: indicates that "Record Report" is the current page (via its orange colour) and that the form below is “clean” (via its tick icon).

...

Clicking “Issues (Overview)” (A), next to the dataset id input field, will open a problem viewer page for the whole dataset. Clicking “Issues (Record)” (B) will open a problem viewer page for an individual record.

...

| Key | Title | Description |
|-----|-------|-------------|
| P1 | Systematic use of the same title. | Check across all records if there are any duplicate titles, ignoring letter (upper or lower) case. |
| P2 | Equal title and description fields. | Check whether there is a title - description pair for which the values are equal, ignoring letter (upper or lower) case. |
| P3 | Near-identical title and description fields. | Determine whether there is a title - description pair for which the values are too similar (or if one contains the other). We do this ignoring the letter case. |
| P5 | Unrecognisable title. | Apply heuristics to determine whether a title is not human-readable. We check whether there are at most 5 characters that are not either alphanumeric or simple spaces. We also check whether the value fully contains a dc:identifier value. |
| P6 | Non-meaningful title. | Check whether the record has a title of 2 characters or less as a rough heuristic of whether a title is meaningful. |
| P7 | Missing description fields. | Check whether the record is lacking a description (or only has empty descriptions). |
| P9 | Very short description. | Check whether the record has a description of 50 characters or less. |
| P12 | Extremely long titles. | Check whether the record has a title of more than 70 characters. |
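
Some of the simpler per-record patterns can be approximated in a few lines, which may help when pre-checking data before upload. This is a rough sketch and not the Sandbox's own detection code: the function name and the plain-list record model are hypothetical, only P2, P6, P7, P9 and P12 are covered, and the handling of whitespace and empty values is an assumption.

```python
def detect_patterns(titles, descriptions):
    """Return the keys of the (simplified) problem patterns found in one record."""
    found = set()
    lowered_descriptions = {d.casefold() for d in descriptions}

    # P2: a title equal to a description, ignoring letter case.
    if any(t.casefold() in lowered_descriptions for t in titles):
        found.add("P2")
    # P6: a title of 2 characters or less.
    if any(len(t) <= 2 for t in titles):
        found.add("P6")
    # P7: no description at all, or only empty descriptions.
    if not any(d.strip() for d in descriptions):
        found.add("P7")
    # P9: a description of 50 characters or less.
    # Assumption: empty descriptions are left to P7 and not counted here.
    if any(d.strip() and len(d) <= 50 for d in descriptions):
        found.add("P9")
    # P12: a title of more than 70 characters.
    if any(len(t) > 70 for t in titles):
        found.add("P12")

    return sorted(found)
```

For a record with the single title "Map" and the single description "map", the sketch reports P2 (equal ignoring case) and P9 (the description is shorter than 50 characters).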

...