User sets / static collections (Researchers) 2024
(Meeting with Alba on January 9/25)
The main questions for this audience concern how they intend to use items once they have found them, and what their needs are at that point.
QUESTIONS
Do they use the gallery feature to save items?
Yes, especially professors teaching at the post-secondary level.
But the 100-item limit and having to add or remove items one by one are limiting.
What additional needs do they have that galleries don’t meet?
More items allowed (no limit)
Sharing, collaboration, versioning, and other information about the set itself (set metadata)
Documentation about the set(s) (see additional notes below).
Clear license and terms of use allowing reuse of the dataset without restrictions (we already do this at the item level)
A suggested citation for the dataset so reusers know how to cite it (we already do this at the item level)
Documentation could include: provenance information, how the dataset was created, by whom, what is included, a description of the transformation process, possible uses, and how the dataset can be accessed. Concrete forms could be: i) a website that introduces the dataset; ii) a README file describing how the dataset was created and possible uses; iii) a metadata file; and iv) a data cover sheet with a more substantive overview of the data and the collection from which it is derived.
Machine-readable metadata about the content included in the dataset (see the sketch after this list)
They need to know when more items have been added to their [saved search]; also, would the saved search and the data set be related somehow?
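As a concrete illustration of the machine-readable, set-level metadata mentioned above, the sketch below shows what a description of a user-created data set could look like. It is only a sketch under assumptions: the field names loosely follow common dataset-description vocabularies (e.g. DCAT / schema.org Dataset), and the set, fields, and values are all hypothetical, not an agreed schema.

```python
# Hypothetical sketch of set-level (dataset) metadata.
# Field names loosely follow DCAT / schema.org Dataset terms; they are
# illustrative only, not an agreed schema for our sets.
import json

dataset_metadata = {
    "title": "Example research data set",            # hypothetical set
    "description": "Items gathered for a course on 19th-century maps.",
    "creator": "Jane Researcher",                     # who created the set
    "created": "2024-11-02",                          # provenance: when
    "provenance": "Derived from a saved search; duplicates removed.",
    "license": "CC0 1.0",                             # set-level terms of use
    "suggested_citation": "Jane Researcher (2024). Example research data set.",
    "item_count": 2,
    "formats_available": ["CSV", "JSON", "XML"],
    "items": [
        # one entry per item, so the set contents are machine readable
        {"id": "/0001/item_a", "rights": "CC BY 4.0"},
        {"id": "/0001/item_b", "rights": "Public Domain"},
    ],
}

print(json.dumps(dataset_metadata, indent=2))
```

The same structure could travel with any set export, so reusers would get the license, suggested citation, and provenance alongside the items themselves.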
What is the max. number of items they’d need to save?
Unlimited
Do they want to save a set of items from a specific point in time, or save the search that resulted in them (meaning that set of results would change over time, as new items are added), or both?
Both
Once they have that set, do they need to search, filter, or sort within that set?
They need annotations; they need to interact with the sets that way (not just at the item level).
Do they need to share the set? Collaborate?
Yes, both would be useful.
Do they need to export or download that set of items in some way? If so, what file formats do they want? ZIP files, export to CSV, or others? (We would need to check with the devs that these are technically feasible.)
File formats I found in the examples: CSV, JSON, XML, JPEG, TIFF, METS, ALTO (see this page)
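To make the export question more concrete, here is a minimal sketch of one possible shape of a set export: item records written to CSV and JSON, then bundled into a single ZIP for download. The item records and file names are invented for illustration; which formats we actually offer would depend on what the devs confirm is feasible.

```python
# Hypothetical sketch of a set export: item records written to CSV and JSON,
# then bundled into a single ZIP for download. Records are invented examples.
import csv
import json
import zipfile

items = [
    {"id": "/0001/item_a", "title": "Map of Brussels", "rights": "CC BY 4.0"},
    {"id": "/0001/item_b", "title": "Map of Ghent", "rights": "Public Domain"},
]

# CSV export of the set
with open("set_items.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["id", "title", "rights"])
    writer.writeheader()
    writer.writerows(items)

# JSON export of the same records
with open("set_items.json", "w", encoding="utf-8") as f:
    json.dump(items, f, indent=2)

# Single ZIP bundle for download
with zipfile.ZipFile("set_export.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    zf.write("set_items.csv")
    zf.write("set_items.json")
```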
Do they also use the Set API for their needs?
Search APIs (not up to date) and Record APIs. Our APIs are not at the stage where users can get what they need; they are also difficult to use, which makes them a huge barrier. Ideally, the website interface could meet their needs.
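For context on why the APIs are a barrier, the sketch below shows roughly what retrieving a page of results through the public Search API looks like today. The endpoint, parameter names, and response fields are assumptions recalled from the public API documentation and should be verified against the current docs; the API key and query are placeholders.

```python
# Rough sketch of fetching search results via the public Search API.
# Endpoint, parameters, and response fields are assumptions to verify;
# YOUR_API_KEY and the query are placeholders.
import requests

SEARCH_URL = "https://api.europeana.eu/record/v2/search.json"  # assumed endpoint

params = {
    "wskey": "YOUR_API_KEY",     # personal API key (placeholder)
    "query": "maps AND brussels",
    "rows": 100,                 # results per page
}

response = requests.get(SEARCH_URL, params=params, timeout=30)
response.raise_for_status()
data = response.json()

# 'items' is assumed to hold the result records in the JSON response
for item in data.get("items", []):
    print(item.get("id"), item.get("title"))
```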
Additional notes:
COLLECTIONS AS DATA WORKFLOW (read this to understand how people work with data at the set level): data-driven research for computational reuse. Example project: https://www.kbr.be/en/projects/data-kbr-be/
Documentation of the data sets is also structural to the R+D team. This documentation is structured and standardised to a certain extent: see the table Antoine is working on, which lists the fields we use to describe data sets: https://docs.google.com/spreadsheets/d/1iZNNviqzfrbAMtoy3frszDcHqRchCOsbSL9E5pPgnQA/edit?gid=0#gid=0. There is coherence from the item level to the data set level, so users can understand what a set contains at the data set level.
There are data sets that come pre-packaged via ingestion, and users also create data sets via galleries (for now).
CH professionals are interested in the functionality we have around galleries (“we want to be the ones who create data sets”). Researchers and students create data sets that are research data; there is already manipulation involved. The CH data sets that are ingested are not necessarily user friendly; you need an API to work at that data set level.