This page is for discussing ways to provide enhanced support for managing metadata terms and annotating resources through a new design. Because a new design may not be able to support all desirable features simultaneously, the discussion should cover design options and how to prioritize features when conflicts occur. Similarly, some advanced functionality may be worth contemplating in the design even if it never gets implemented.

Goals:

The redesign should retain the benefits of the current design (e.g., the ability to add custom terms to a space, provenance regarding the origin of specific annotations, the ability to add widgets for input/display, and sending notices of changes to the message bus) and make it easier to do things such as:

 

Current Design:

Proposed Design

Space-based Managed metadata doc

How it would work:

Pros

This design builds on the strengths of the existing one by maintaining (and improving) provenance, supporting typed metadata, and enabling space-level customization. Many of the GUI-level changes are design-agnostic and could probably be used or adapted for other storage designs. However, changing the low-level design to focus on a single space-level context/config artifact, and pre-integrating the metadata for a given item, provides a clear mechanism for handling edits and context differences between sources (different extractors, different spaces, changes in space config over time). That should avoid confusion going forward, such as cases where different entries for the same predicate are shown with different labels, where a label is shown twice because it is used for two different predicates, or where it is unclear whether type or cardinality rules apply only to the correct label/predicate combination, to matching labels, or to matching predicates regardless of label. It would also clean up these types of issues for publication and would make it easier to share the context/config of one space with another. Aside from changing how information is stored, which would be best done early, most of the other functionality could be pursued incrementally.
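As a rough illustration of the storage idea described above, the sketch below models a single space-level config that ties each predicate to one label, type, and cardinality, plus one pre-integrated metadata document per item. All names here (`SpaceConfig`, `TermDefinition`, `ItemMetadata`, `display_entries`) are hypothetical and are not taken from the existing codebase; this is a minimal sketch under those assumptions, not a proposed schema.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class TermDefinition:
    """One managed term: the predicate URI is the identity, the label is display-only."""
    predicate: str
    label: str
    value_type: type = str
    max_values: Optional[int] = None  # None means no cardinality limit


@dataclass
class SpaceConfig:
    """Hypothetical single space-level context/config artifact."""
    terms: Dict[str, TermDefinition] = field(default_factory=dict)

    def define(self, term: TermDefinition) -> None:
        # Refuse a label that is already bound to a different predicate,
        # so the same label never appears twice with two meanings.
        for other in self.terms.values():
            if other.label == term.label and other.predicate != term.predicate:
                raise ValueError(f"label {term.label!r} already used by {other.predicate}")
        self.terms[term.predicate] = term


@dataclass
class Entry:
    value: Any
    agent: str  # provenance: who or what added the value


@dataclass
class ItemMetadata:
    """Hypothetical pre-integrated metadata document for one item, keyed by predicate."""
    entries: Dict[str, List[Entry]] = field(default_factory=dict)

    def add(self, predicate: str, value: Any, agent: str) -> None:
        self.entries.setdefault(predicate, []).append(Entry(value, agent))


def display_entries(config: SpaceConfig, doc: ItemMetadata) -> Dict[str, List[Any]]:
    """Resolve labels from the space config at display time."""
    shown: Dict[str, List[Any]] = {}
    for predicate, entries in doc.entries.items():
        term = config.terms.get(predicate)
        # Unmanaged predicates fall back to the raw predicate as their label.
        label = term.label if term else predicate
        shown[label] = [e.value for e in entries]
    return shown


config = SpaceConfig()
config.define(TermDefinition("http://purl.org/dc/terms/creator", "Author"))

doc = ItemMetadata()
doc.add("http://purl.org/dc/terms/creator", "A. User", agent="user:1")
doc.add("http://purl.org/dc/terms/creator", "B. Extractor", agent="extractor:x")
print(display_entries(config, doc))
```

Because entries are stored by predicate and the label is looked up only when displaying, renaming a label in the config would change how every existing entry is shown without touching the stored values.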

Cons

In some sense, the benefits of this redesign only accrue to projects that actually use the flexibility and customizability of the system. For instances with a single space or no customization of metadata per space, where all of the metadata comes from the GUI rather than the API, where metadata is added by an authoritative user and isn't edited, etc., the current design is OK: the issues pointed out above could happen, but probably won't in practice. (Over time, and with more independent groups using an instance, these issues can and do become real, though.) Some groups may thus see this design as overkill or premature.

Aside from the added work to implement it, it's not clear that there are big cons. Adding a new metadata value means editing a document rather than just storing a new one, but retrieving the existing values would be one document recall rather than a scan across separate docs. Extracted metadata could be managed (with the space determining the label and/or enforcing type/cardinality rules), which would make the interface look more consistent but raises the question of what happens if extracted metadata doesn't meet the constraints. (The same applies to other uses of the API.) Most of these are relatively neutral, but they make it clear that this design would have impacts outside the core of user-entered metadata.

To be fair, once spaces have access control and customizable metadata, questions arise regardless of this design about how extractors and API users handle those customizations, and about how features such as sharing datasets across spaces interact with them. If two spaces have different metadata terms, which terms show up in the add metadata list? Is it affected by whether I have access to one space or the other? What happens for extractors that work on behalf of a user through their key (a capability in development, I think)? I think this design can help answer those questions over time, but it does favor/flavor the answers towards a space-centric view of management/control.
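To make the question above about extracted metadata that doesn't meet constraints more concrete, here is a minimal sketch of a type/cardinality check. The function name `check_constraints` and its parameters are hypothetical, and the sketch deliberately only reports violations; whether a non-conforming value is rejected, flagged, or stored as unmanaged is exactly the open policy question and is not decided here.

```python
from typing import Any, List, Optional


def check_constraints(existing_values: List[Any], new_value: Any,
                      expected_type: type, max_values: Optional[int]) -> List[str]:
    """Return human-readable violations; an empty list means the value conforms."""
    problems = []
    if not isinstance(new_value, expected_type):
        problems.append(
            f"expected {expected_type.__name__}, got {type(new_value).__name__}")
    # Cardinality: adding one more value must not exceed the limit (None = unlimited).
    if max_values is not None and len(existing_values) + 1 > max_values:
        problems.append(f"at most {max_values} value(s) allowed")
    return problems


# An extractor submits a string where the space expects a single float.
violations = check_constraints(existing_values=[21.5], new_value="warm",
                               expected_type=float, max_values=1)
for violation in violations:
    print(violation)
```

The same check could run for GUI, API, and extractor submissions alike, which is one way the space-level config would reach beyond user-entered metadata.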

Other design ideas:

GUI-centric

Leave the current metadata and context storage more or less as is and focus on the GUI changes:

Pros

potentially less work up front

Cons

Would still leave open questions about keeping labels consistent as metadata is edited, unless some design changes are made (e.g., when a label is updated, do existing entries get scanned and changed, or are they changed dynamically for display?). Similarly, a means of tracking deletes is required if an edit is a delete/add and/or if we want to track provenance for deletes. And, if deletes are tracked, assembling the set of metadata to display involves playing the sequence of operations forward. One can keep going through all of the other required/desirable features above, asking similar questions and asking whether this approach, by the time it addresses those requirements, remains simpler/easier/faster to implement than the proposed design. That is, it's not clear that this design really avoids any of the issues in the proposed one: if we want the functionality listed, some decisions and changes are needed, and the question really becomes whether it is easier to implement given this underlying storage/service design.
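The "playing the sequence of operations forward" step mentioned above can be sketched as follows. The `Operation` record and `replay` function are hypothetical names used only for illustration, and the sketch assumes each stored document records one add or delete together with the agent that performed it.

```python
from typing import Any, Dict, List, NamedTuple


class Operation(NamedTuple):
    action: str     # "add" or "delete"
    predicate: str
    value: Any
    agent: str      # provenance: who performed the operation


def replay(operations: List[Operation]) -> Dict[str, List[Any]]:
    """Play the operations forward, in order, to get the values currently visible."""
    current: Dict[str, List[Any]] = {}
    for op in operations:
        values = current.setdefault(op.predicate, [])
        if op.action == "add":
            values.append(op.value)
        elif op.action == "delete" and op.value in values:
            values.remove(op.value)
    # Drop predicates whose values were all deleted.
    return {predicate: vals for predicate, vals in current.items() if vals}


# An edit modeled as a delete followed by an add.
log = [
    Operation("add", "http://purl.org/dc/terms/title", "Draft", "user:1"),
    Operation("delete", "http://purl.org/dc/terms/title", "Draft", "user:2"),
    Operation("add", "http://purl.org/dc/terms/title", "Final", "user:2"),
]
print(replay(log))
```

Every display of an item would need this kind of pass over all of its operation records, which is the cost to weigh against the single-document recall of the proposed design.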