...
Note |
---|
Scheduled and unscheduled clean-up. Additionally, during system maintenance or the release of a new Metis Sandbox version, data may be removed at any time. Where possible, these events will be announced beforehand. Datasets that are deleted from the Metis Sandbox will need to be uploaded again if you wish to access the tests and reports. |
...
Page indicators are shown at the top of the page. They behave as tab headers: clicking on an orb will navigate to the corresponding page. The number of page indicators can vary depending on your use of the Sandbox.
...
Links can be greyed out when required information is missing. The image below shows that the Track and Issues links are greyed out because there is no information in the input field left of the links.
...
...
4.6 Drop-down menus
Drop-down menus allow you to select from a list of predetermined values.
...
This screen allows you to start accessing the Metis Sandbox functionality. Here you can track an existing dataset, request information about a record within that dataset or create a new dataset. It looks like this:
...
A. Page Indicator: indicates that "Dataset Processing" is the current step. Once other steps become available then clicking this will return you to this step.
B. Dataset Id Input: used to enter the id of a previously uploaded dataset.
C. Record Id Input: used to enter the id of a record within the specified dataset. It is enabled once a dataset id is entered.
D. Create New Dataset Link: enables and navigates to the “Upload a new Dataset” functionality (see below).
E. Track link. This link is enabled once a dataset id is entered and, when clicked, takes you to the “Dataset Processing” functionality (see below) for the dataset with this dataset id.
F. Issues (Overview) link. This link is enabled once a dataset id is entered and, when clicked, takes you to the “Problem Patterns” functionality (see below) for the dataset with this dataset id.
G. Issues (Record) link. This link is enabled once a record id is entered and, when clicked, takes you to the “Problem Patterns” functionality (see below) for the record with this record id.
H. Tier Report link. This link is enabled once a record id is entered and, when clicked, takes you to the “Record Report” functionality (see below) for the record with this record id.
When you type a dataset ID or a record ID, a green link will appear in the input field. If you click it, you will be taken to the dataset or record preview as it would look on Europeana.
...
...
6 Upload a new dataset
To create a new dataset click on the “create a new dataset” link at the bottom of the home screen (D in the image above). This will take you to the “Upload Dataset” form.
...
The “Upload Dataset” view looks like this:
...
A. Step Indicator: clicking this will take you to the “Dataset Processing” step.
B. The dataset name input field. A dataset name is valid if it contains only letters, digits and the underscore character (‘_’).
C. The dataset country drop-down.
D. The dataset language drop-down.
E. The harvest protocol radio button set.
F. The zip file input. This appears because “file upload” is the selected protocol. If the selected protocol is changed to “OAI-PMH upload” or “HTTP upload” then an alternative field (or set of fields) will appear here.
G. Step size field.
H. An (optional) checkbox to specify that you want the Metis Sandbox Server to transform your dataset using XSLT. If selected then a file input will appear below it allowing you to upload an XSL file.
I. The “Submit” button: enables when all the (obligatory) fields have been completed.
J. Step Indicator (inactive): indicates that "Upload Database" is the current step. If you switch to another step then clicking this will return you to this step.
Enter a descriptive name for your dataset in the input field below “Name”. Only letters, digits and the underscore character (‘_’) are supported. You can select the country and language of the dataset with the drop-down menus.
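The naming rule can be checked before you fill in the form. The following is a minimal sketch, not the Sandbox's own validation: the function name is hypothetical, and it assumes that "letters" means ASCII letters and that an empty name is invalid, neither of which is stated in this guide.

```python
import re

# Assumption: only ASCII letters, digits and the underscore are accepted,
# and the name must contain at least one character.
DATASET_NAME_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def is_valid_dataset_name(name: str) -> bool:
    """Return True if the name contains only letters, digits and underscores."""
    return DATASET_NAME_PATTERN.fullmatch(name) is not None


print(is_valid_dataset_name("my_dataset_2024"))  # True
print(is_valid_dataset_name("my dataset"))       # False (contains a space)
```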
...
There are three ways to upload your datasets to the sandbox:
File upload: upload an archive (e.g. a zip file)
OAI-PMH upload: ingestion with OAI-PMH
HTTP upload: ingestion via an archive (e.g. a zip file) hosted on a server and accessible through HTTP or HTTPS
6.2.1 Zip File
The “File upload” protocol is selected by default. This option allows you to upload an archive file with a dataset that is stored locally. The supported archive types are .zip, .tar and .tar.gz.
...
Note that, even though it is not currently possible to upload multiple archive files, you can still achieve the same result by wrapping all your archives in one new zip file. The application fully supports nested archives (i.e. zip files of zip files).
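Wrapping several archives into one can be done with any zip tool. As an illustration only, here is a short Python sketch using the standard `zipfile` module; the function name and the file names in the usage comment are hypothetical.

```python
import zipfile
from pathlib import Path


def wrap_archives(archive_paths, output_path):
    """Place several existing archives, unmodified, into one new zip file."""
    # ZIP_STORED avoids recompressing archives that are already compressed.
    with zipfile.ZipFile(output_path, "w", compression=zipfile.ZIP_STORED) as outer:
        for path in map(Path, archive_paths):
            # Each archive is stored at the top level under its own file name,
            # so the input file names should be unique.
            outer.write(path, arcname=path.name)
    return output_path


# Example usage (hypothetical file names):
# wrap_archives(["part1.zip", "part2.zip"], "combined.zip")
```

The resulting file can then be selected in the zip file input of the upload form.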
...
To use the OAI-PMH harvest protocol, you should enter values for the harvest URL, the metadata format, and optionally a setSpec value. For more details on these, please see the OAI-PMH specification.
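To show how these three values relate to each other, the sketch below builds a standard OAI-PMH `ListRecords` request URL from them. It illustrates the protocol only: this guide does not describe the requests the Sandbox sends, and the function name, the example endpoint and the example values are hypothetical.

```python
from urllib.parse import urlencode


def build_list_records_url(harvest_url, metadata_format, set_spec=None):
    """Build an OAI-PMH ListRecords request URL.

    Assumes harvest_url is the repository base URL without a query string.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_format}
    if set_spec:
        # The setSpec value is optional and is sent as the "set" argument.
        params["set"] = set_spec
    return f"{harvest_url}?{urlencode(params)}"


print(build_list_records_url("https://example.org/oai", "edm", "my_set"))
# https://example.org/oai?verb=ListRecords&metadataPrefix=edm&set=my_set
```

Opening such a URL in a browser is a quick way to confirm that the endpoint, metadata format and set return records before you submit the form.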
...
6.2.3 HTTP(S) upload
You can also specify an archive that is accessible with a URL. Set the harvest protocol to “HTTP upload” to be able to enter a value for the URL. The URL should be the (HTTP or HTTPS) download location of an archive (.zip, .tar or .tar.gz file) that contains the dataset records.
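Before submitting, you may want to check that a URL has the expected shape. The sketch below is a hypothetical pre-check, not the Sandbox's own validation: it assumes the archive type is visible as a file extension in the URL path, which may not hold for every download location.

```python
from urllib.parse import urlparse

# Archive types named in this guide.
SUPPORTED_EXTENSIONS = (".zip", ".tar", ".tar.gz")


def looks_like_archive_url(url: str) -> bool:
    """Return True for an HTTP(S) URL whose path ends in a supported extension."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return False
    return parsed.path.lower().endswith(SUPPORTED_EXTENSIONS)


print(looks_like_archive_url("https://example.org/data/records.tar.gz"))  # True
print(looks_like_archive_url("ftp://example.org/data/records.zip"))       # False
```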
...
It is possible to transform the records in the dataset to the EDM format using XSLT before any further processing. Tick the checkbox before the option “Records are not provided in the EDM (external) format”. An additional file input will appear for an XSL file to be specified.
...
Submitting a dataset id will bring up the dataset processing view. It will also change the page’s URL to reflect the id of the dataset being displayed. The dataset processing view looks like the picture below.
...
A. The dataset name. The tick after the dataset name indicates that processing is complete.
B. An (optional) flag indicating whether the dataset was xsl-transformed.
C. The processing date, preceded by an (optional) flag indicating that not all records in the dataset were processed.
D. The country and language of the dataset selected when the dataset was uploaded.
E. The processing steps performed on the dataset (they correspond to the list of items just below, element F).
F. The details of the processing steps performed on the dataset.
G. The (optional) warning indicating that not all records in the dataset were processed. See “step size” above for more information.
H. The (not enabled) record id field.
I. The dataset ID of the current dataset.
J. A link to the dataset preview as it would look on Europeana.
K. The tier statistics tab opener.
L. The tier-zero indicator.
The tick after the dataset name indicates that processing is complete, and the generated dataset id is shown at the top-right.
...
The colours of each step indicate how successful this step was:
Green: (success) - the step completed without errors, and all records are considered suitable for ingestion.
Yellow: (non-critical warning) - problems with the records have been detected, but the records could still be processed.
Red: (critical warning) - more serious problems with the records have been detected, and (some of) these records could not continue their path through the pipeline. These should no longer be considered for ingestion (in their current form).
...
Shown below is an example of a dataset that was processed with many errors:
...
A. A link to the errors window.
B. The bold font of the number indicates that this is another link to the errors window.
C. No report is available for this error, so the number does not have a bold font and there is no link to the errors window.
...
Once a dataset has been processed it’s possible to view its tier statistics to help assess the dataset’s quality. The dataset processing tab will look something like this once a dataset has been processed:
...
A. The tier statistics tab opener
When you click the tier statistics tab opener, you will see a tab that looks like this:
...
A. The pie chart gives an overview of the statistics - shown by the content tier dimension (by default).
B. If you click the column headers, you toggle the column sort order and change the data dimension of the pie chart to that header’s default.
C. The second row of clickable column headers allows specific data dimensions to be set and sorted on.
D. The search input allows you to filter the record data by (part of the) record id.
E. The data grid shows the record data in a panel that you can scroll through. The fields are record id, content tier, content tier license, metadata tier (aggregate value), metadata tier (language dimension), metadata tier (enabling elements dimension) and metadata tier (contextual classes dimension). If you click on a record id, you will be taken to the tier calculation report for that record (see below).
F. Page navigation is enabled where necessary.
G. Here you can select the number of rows shown at a time in the table.
H. Here you can jump to a specified page by entering a (valid) page number.
I. The dataset floor row gives the lowest tier value present in the dataset (and the value you probably wish to look at to improve the quality of your data).
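The floor row (item I) can be thought of as a per-column minimum. The following sketch shows the idea for two of the columns. The record structure, the field names and the sample data are hypothetical, and the ordering of metadata tier values ("0" lowest, then "A", "B", "C") is an assumption that this guide does not state.

```python
# Assumed ordering of metadata tier values, lowest first.
METADATA_TIER_ORDER = ["0", "A", "B", "C"]


def dataset_floor(records):
    """Return the lowest content tier and lowest metadata tier in the dataset."""
    lowest_content = min(r["content_tier"] for r in records)
    lowest_metadata = min(
        (r["metadata_tier"] for r in records), key=METADATA_TIER_ORDER.index
    )
    return {"content_tier": lowest_content, "metadata_tier": lowest_metadata}


# Hypothetical sample data.
records = [
    {"record_id": "/1/a", "content_tier": 3, "metadata_tier": "B"},
    {"record_id": "/1/b", "content_tier": 1, "metadata_tier": "A"},
    {"record_id": "/1/c", "content_tier": 4, "metadata_tier": "C"},
]
print(dataset_floor(records))  # {'content_tier': 1, 'metadata_tier': 'A'}
```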
7.6 Filtering Tier Statistics
Clicking a pie-slice (or its corresponding legend item) will filter the data down to that value. A click on the value "3" in the pie, for example, will restrict the grid to showing only records that have a content tier value of "3".
...
A. The active filter. Clicking the active pie-slice will remove the applied filter.
B. The active filter's legend item. Clicks on legend items are equivalent to clicks on pie-slices.
C. Orange column headers indicate the active filter.
D. A new summary row appears below the data grid indicating aggregate values for the filtered data.
E. The pagination updates to reflect the filtered data.
F. Only records with a content-tier value of "3" are visible in the grid.
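The effect of a filter on the grid and on the pagination can be sketched as follows. This is an illustration only: the record structure, the field names and the sample data are hypothetical, and the actual UI logic is not described in this guide.

```python
def filter_by_content_tier(records, tier):
    """Keep only the records whose content tier equals the selected value."""
    return [r for r in records if r["content_tier"] == tier]


def page_count(row_count, rows_per_page):
    """Number of pages needed to show row_count rows (ceiling division)."""
    return -(-row_count // rows_per_page)


# Hypothetical sample data.
records = [
    {"record_id": "/1/a", "content_tier": 3},
    {"record_id": "/1/b", "content_tier": 1},
    {"record_id": "/1/c", "content_tier": 3},
    {"record_id": "/1/d", "content_tier": 4},
    {"record_id": "/1/e", "content_tier": 3},
]

filtered = filter_by_content_tier(records, 3)
print([r["record_id"] for r in filtered])  # ['/1/a', '/1/c', '/1/e']
print(page_count(len(filtered), 2))        # 2
```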
7.7 Sorting Filtered Tier Statistics
When dataset tier statistic data is filtered by content tier, you can sort it by one of the other dimensions by clicking its column header. Usually clicking a column header changes the pie chart dimension and sorts on that column, but when a filter is active the sort will be applied within the data dimension that has been filtered on.
Here we see data that was filtered by content tier (value 3) and sorted by metadata tier (aggregate value).
...
A. Clicking this column-header will not change the dimension (it will remain “content tier”), but it will apply the sort (by metadata tier) within that dimension.
B. As before, the specific type of metadata tier sort (aggregate value) is clarified with an arrow-head indicator in the second sub-header row.
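Sorting within an active filter can be sketched as a filter followed by a sort on a different column. The function name, field names and sample data are hypothetical. The sketch relies on plain string comparison for metadata tier values, which assumes the order "0" < "A" < "B" < "C".

```python
def sort_filtered(records, content_tier, sort_key, descending=False):
    """Filter on content tier first, then sort the remaining rows by sort_key."""
    filtered = [r for r in records if r["content_tier"] == content_tier]
    return sorted(filtered, key=lambda r: r[sort_key], reverse=descending)


# Hypothetical sample data.
records = [
    {"record_id": "/1/a", "content_tier": 3, "metadata_tier": "B"},
    {"record_id": "/1/b", "content_tier": 1, "metadata_tier": "A"},
    {"record_id": "/1/c", "content_tier": 3, "metadata_tier": "A"},
    {"record_id": "/1/d", "content_tier": 3, "metadata_tier": "C"},
]

rows = sort_filtered(records, 3, "metadata_tier")
print([r["record_id"] for r in rows])  # ['/1/c', '/1/a', '/1/d']
```

The filter dimension stays fixed (content tier 3); only the order of the visible rows changes.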
...
You can view a tier calculation report by clicking on a record ID in the tier statistics grid (see above). Alternatively, you can view the report by entering both the id of a dataset as well as the id of a record within this dataset (see below).
8.1 Record Provider Ids and Europeana Ids
...
You can search for a record using either of these record ids, so the “Report” button will enable itself when any sequence of non-whitespace characters has been entered into the record id field. If, however, the UI detects that you’ve entered an id that matches the format of a valid Europeana record id, then it will show a line connecting the record id with the dataset id, as shown here:
...
A. The record id begins with a slash followed by the dataset id, so the id fields are shown as connected.
B. You can now open the record report by clicking the button labelled “Tier Report”.
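The connection between the two id fields can be sketched as a simple format check. The exact rule the UI applies is not given in this guide, so the sketch assumes a Europeana record id has the shape `/<dataset id>/<local id>`. The function names are hypothetical.

```python
def extract_dataset_id(record_id: str):
    """Return the dataset id embedded in a Europeana-style record id, or None.

    Assumed format: "/<dataset id>/<local id>".
    """
    parts = record_id.split("/")
    # A matching id splits into ["", dataset_id, local_id].
    if len(parts) == 3 and parts[0] == "" and parts[1] and parts[2]:
        return parts[1]
    return None


def ids_are_connected(record_id: str, dataset_id: str) -> bool:
    """True when the record id starts with a slash followed by the dataset id."""
    found = extract_dataset_id(record_id)
    return found is not None and found == dataset_id


print(ids_are_connected("/123/abc", "123"))  # True
print(ids_are_connected("abc", "123"))       # False (a provider id)
```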
...
In the illustration below the computed values are “3” (for the content tier) and “A” (for the metadata tier).
...
A. Page Indicator: the inactive "Dataset Processing" orb indicates that this page is not active and, if clicked, will bring you to the dataset processing page.
B. The Record Report summary: top-level information about this record as well as record download and viewing links.
C. Tier Navigation Orbs: you can toggle between the content tier report and the metadata tier report from here.
D. Content Tier Information: data about the record's content tier.
E. Media Navigation Orbs: you can navigate multiple media items from here.
F. Processing Errors: record processing error information appears here.
G. Page Indicator: indicates that "Record Report" is the current page (via its orange colour) and that the form below is “clean” (via its tick icon).
...
Clicking “Issues (Overview)”, next to the dataset id input field (A), will open a problem viewer page for the whole dataset. Clicking “Issues (Record)” (B) will open a problem viewer page for an individual record.
...
Key | Title | Description |
---|---|---|
P1 | Systematic use of the same title. | Check across all records if there are any duplicate titles, ignoring letter (upper or lower) case. |
P2 | Equal title and description fields. | Check whether there is a title - description pair for which the values are equal, ignoring letter (upper or lower) case. |
P3 | Near-Identical title and description fields. | Determine whether there is a title - description pair for which the values are too similar (or if one contains the other). We do this ignoring the letter case. |
P5 | Unrecognisable title. | Apply heuristics to determine whether a title is not human-readable. We check whether there are at most 5 characters that are not either alphanumeric or simple spaces. We also check whether the value fully contains a dc:identifier value. |
P6 | Non-meaningful title. | Check whether the record has a title of 2 characters or less as a rough heuristic of whether a title is meaningful. |
P7 | Missing description fields. | Check whether the record is lacking a description (or only has empty descriptions). |
P9 | Very short description. | Check whether the record has a description of 50 characters or less. |
P12 | Extremely long titles. | Check whether the record has a title of more than 70 characters. |
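Some of the simpler checks in the table can be expressed directly in code. The sketch below covers P2, P6, P7, P9 and P12 using the thresholds given above. It is an illustration, not the Sandbox implementation: the function name is hypothetical, and it assumes a single title per record and that descriptions consisting only of whitespace count as empty.

```python
def check_record(title, descriptions):
    """Return the keys of the (simplified) problem patterns found in a record."""
    found = []
    non_empty = [d for d in descriptions if d.strip()]

    # P2: title equal to a description, ignoring letter case.
    if any(title.casefold() == d.casefold() for d in non_empty):
        found.append("P2")
    # P6: title of 2 characters or less.
    if len(title) <= 2:
        found.append("P6")
    # P7: no description, or only empty descriptions.
    if not non_empty:
        found.append("P7")
    # P9: a description of 50 characters or less.
    if any(len(d) <= 50 for d in non_empty):
        found.append("P9")
    # P12: title of more than 70 characters.
    if len(title) > 70:
        found.append("P12")
    return found


print(check_record("Map", ["map"]))  # ['P2', 'P9']
print(check_record("A" * 71, []))    # ['P7', 'P12']
```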
...