This page describes the updates made in the Editable Metadata Branch to provide enhanced support for managing metadata terms, and annotating resources, with a new design. 

Prior Design

Updated Design


 

Metadata display showing widget for input (date) and display (a single line display of the scientific variable term and unit from the SAS service), and the edit/delete icons for term currently hovered over. Adding/updating/deleting entries will dynamically update the page display without a reload.

The provenance of Audience entries. Hovering over an edit or delete will show the add or prior edit that it acted upon.


Updated metadata definitions table showing sorting and info about columns, condensed type info for lists derived from services, and the 'allow entries' column. In this example, 'Abstract' is no longer an option in the 'add metadata' list for this space, but existing abstract entries have not been deleted. 


Updated search interface showing the addition of a source (a space, no space (using the defaults for the clowder instance) or 'Managed Metadata' - the json terms generated by extractors.) Also shown is how widgets are used for terms that are, for example from dynamic lists. Queries from json-ld metadata use the formal URIs rather than labels and will find semantically matching entries across spaces. "Managed Metadata" is a configurable term, which could be changed, for example, to "Extracted Information". Typing in the term box for this entry will pull up matching terms indexed in Elastic Search as before.


Status

As of Aug 15, the branch is up-to-date with the dev branch and includes all of the functionality described above. It passes 80 tests (3 tests for add/get metadata apis on datasets and files, which I believe are deprecated, have been removed since the underlying methods, which use the old metadata model, have been removed.). A working test server is available at http://141.211.23.234/ for anyone who'd like to sign up. At present, metadata summary documents are generated on-demand when a page is viewed. This could be moved to a database updater. A new collection - metadatasummary - is created for the new metadata model. At present, existing user metadata entries are not removed, which means that a given clowder instance could be ~rolled back to its current state by switching back to the dev branch software and removing this collection. (Adds/edits/deletes made using the new model will be deleted. New metadata definitions will remain, and will have the new flag, which will be ignored by the dev-branch code.) To accept this new code, a decision needs to be made about whether to process existing user metadata up front, in a database update, or dynamically, and to remove the old metadata docs and context entries that are replaced (nominally they can remain but they won't be used).

Comparison:

With goals and design ideas: The above design as implemented is consistent with the ideas outlined below (from the original discussion) though support for cardinality and selecting definitions from external vocabularies have not been implemented. (They should be straight forward to add).

Design Goals:

The redesign retains the benefits of the current design (e.g. ability to add custom terms to a space, provenance regarding the origin of specific annotations, ability to add widgets for input/display, send notices of changes to the message bus) and supports:

and should simplify future enhancements such as:

 Design Ideas

Pros

This design builds on the strengths of the existing one in terms of maintaining (and improving provenance), supporting typed metadata, and enabling space-level customization. Many of the GUI-level changes are design agnostic and could probably be used/adapted for other storage designs. However, changing the low-level design to focus on a single space-level context/config artifact and pre-integrating the metadata for a given item, provides a clear mechanism to handle edits, context differences between sources (different extractors, different spaces, changes in space config over time) that will avoid confusion going forward - cases where different entries for the same predicate get shown with different labels, where a label is show twice because it is used for two different predicates, where it would be unclear whether type or cardinality rules only apply to the correct label/predicate combo, matching labels, or matching predicates regardless of label. It would also clean-up these types of issues for publication, and would make it easier to share the context/config of one space with another. Aside from changing how information is stored, which would be best to do early, most of the other functionality could be pursued incrementally.

Cons

In some sense, the benefits of this redesign only accrue to projects that actually use the flexibility and customizability of the system. For instances with a single space or no customization of metadata per space, where all of the metadata comes from the GUI rather than the API, where metadata is added by an authoritative user and isn't edited, etc., the current design is OK - the issues pointed out above could happen, but probably won't in practice. (Over time, and with more independent groups using an instance, these issues can and do become real though.). Some groups may thus see this design as overkill/premature. Aside from the added work to implement it, it's not clear that there are big cons. Adding a new metadata value means editing a document rather than just storing a new one, but retrieving the existing values would be one document recall rather than a scan across separate docs. Extracted metadata could be managed (with the space determining the label and/or enforcing type/cardinality rules) which would make the interface look more consistent but raises the issue of what happens if extracted metadata doesn't meet constraints? (Similar for other uses of the API.) Most of these are relatively neutral, but they make it clear that this design would have impacts outside the core of user-entered metadata. To be fair, once spaces have access control and customizable metadata, the issues of how extractors and api users handle those customizations, and how features such as being able to share datasets across spaces interact with these features (if two spaces have different metadata terms, which terms show up in the add metadata list? is it affected by whether I have access to one space or the other? what happens for extractors that work on behalf of a user through their key (a capability in development I think), etc.). I think this design can help answer those questions over time, but it does favor/flavor the answers towards a space-centric view of management/control.

Other design ideas:

GUI-centric

Leave the current metadata and context storage ~as is and focus in the GUI changes:

Pros

potentially less work up front

Cons

Would still leave questions of consistent labels as metadata is edited open unless some design changes are made (e.g. when a label is updated, do exiting entries get scanned and changed? are they changed dynamically for display?) - Similarly, a means of tracking deletes is required if edit is delete/add and/or if we want to track provenance for deletes. And, if deletes are tracked, assembling the set of metadata to display involves playing the sequence of operations forward. One can keep going through all of the other requires/desirable features above and ask similar questions and ask whether this approach, by the time it addresses those requirements, remains simpler/easier/faster to implement than the proposed design. I.e., it's not clear that this design really avoids any issues in the proposed one - if we want the functionality listed some decisions and changes are needed  and the question really becomes whether it is easier to implement given this underlying storage/service design.