This is a live design document on how to support authorization at the level of project spaces. The primary driver of this development is the SEAD project.

The current design tries to support requirements described in this document:

 

 

The following notes are from a meeting on 06/09/15 attended by Indira Gutierrez Polo, Mario Felarca, Winston Jansz, Rob Kooper, and Luigi Marini.

Goals:
  1. Meet the requirements of the above document
  2. Meet the outcomes of the SEAD all hands meeting in May 2015
  3. Simplify the design as much as possible to not overwhelm the user and provide a stable implementation within the current efforts
  4. Accommodate other use cases
Features Needed:

The following are already available in the current implementation:

  1. A dataset can be in multiple collections

The following need to be implemented:

  1. A file can only exist as part of a dataset (currently it can exist in multiple or none)
    CATS-25
  2. A dataset can be part of multiple spaces (currently it can exist in multiple or none)
    CATS-26
    1. With this design there is no "move"; a dataset is simply assigned to one or more spaces
  3. A collection can be part of multiple spaces (currently it can exist in multiple or none)
    CATS-27
    1. With this design there is no "move"; a collection is simply assigned to one or more spaces
  4. Use permissions on the space, collection, and dataset pages to decide what is available in the GUI and what is not
    CATS-28
  5. Nested collections (which are different from folders because a collection can be in multiple collections)
    CATS-29
  6. Ability to list who has access to a dataset or collection on its page
    CATS-30
  7. (Bonus) Folders in dataset to organize files similar to a file system
    CATS-31
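The membership rules in the list above could be modelled roughly as follows. This is only a sketch: the names `Dataset`, `Collection`, `assignToSpace` and `ancestors` are hypothetical and are not taken from the existing code base, and resources are referenced by plain string ids for brevity.

```scala
object ResourceModelSketch {
  // Hypothetical shape: a dataset may sit in several collections and several spaces at once.
  case class Dataset(id: String,
                     collections: Set[String] = Set.empty,
                     spaces: Set[String] = Set.empty)

  // Hypothetical shape: a nested collection may have several parents,
  // which is what distinguishes it from a folder.
  case class Collection(id: String,
                        parents: Set[String] = Set.empty,
                        spaces: Set[String] = Set.empty)

  // There is no "move": assigning only adds a space to the existing set.
  def assignToSpace(d: Dataset, spaceId: String): Dataset =
    d.copy(spaces = d.spaces + spaceId)

  // All collections reachable by following parent links upwards from `start`.
  // Already visited ids are skipped, so the walk terminates even on cyclic data.
  def ancestors(start: String, all: Map[String, Collection]): Set[String] = {
    @annotation.tailrec
    def loop(frontier: Set[String], seen: Set[String]): Set[String] =
      if (frontier.isEmpty) seen
      else {
        val parents = frontier.flatMap(id => all.get(id).map(_.parents).getOrElse(Set.empty[String]))
        val next = parents -- seen
        loop(next, seen ++ next)
      }
    loop(Set(start), Set.empty)
  }
}
```

Because a collection can have more than one parent, the structure is a graph rather than a tree, which is why `ancestors` returns a set instead of a single path.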
Notes:
  1. Implement access control only at the level of spaces
    1. Dataset and collection authorization is based on the space
    2. For resources in multiple spaces take the union of permissions
  2. Only the owner can add a dataset/collection to a new space
  3. In a world where resources can be in multiple spaces, a space becomes a view into the data, not a simple self-contained place
  4. What happens if D1 is in C1, C1 is in S1, but D1 is not in S1?
  5. Publishing a dataset or collection for public viewing will be done as a separate feature from managing permissions at the space level
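Notes 1 and 2 can be sketched as below. This is an illustration under assumptions, not the implementation: `Space`, `Dataset`, `effectivePermissions` and `addToSpace` are hypothetical names, the enumeration holds only three of the permissions, and only spaces the dataset is directly assigned to are considered, so the open question in note 4 is left unanswered.

```scala
object SpaceAccessSketch {
  // Small subset of the permission names, for illustration only.
  object Permission extends Enumeration {
    val ViewDataset, EditDataset, DeleteDataset = Value
  }

  // Hypothetical shape: the permissions each user holds within one space.
  case class Space(id: String, grants: Map[String, Set[Permission.Value]])

  // Hypothetical shape: a dataset with one owner, assigned to any number of spaces.
  case class Dataset(id: String, owner: String, spaces: Set[String])

  // Union of the user's grants over every space the dataset is assigned to.
  def effectivePermissions(user: String,
                           d: Dataset,
                           spaces: Map[String, Space]): Set[Permission.Value] =
    for {
      sid   <- d.spaces
      space <- spaces.get(sid).toList
      p     <- space.grants.getOrElse(user, Set.empty[Permission.Value])
    } yield p

  // Only the owner may assign the dataset to a further space.
  def addToSpace(user: String, d: Dataset, spaceId: String): Either[String, Dataset] =
    if (user == d.owner) Right(d.copy(spaces = d.spaces + spaceId))
    else Left(s"$user is not the owner of ${d.id}")
}
```

Taking the union means a user who can only view a dataset through one space but can edit it through another ends up with both permissions, which is the behaviour note 1.2 describes.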
Permissions Cleanup:

(Note: this is the list from api.Permissions.Permission. It's pretty low level, and it's what controllers look for in the case of secured actions.)

 

 Public -> Public (will eventually be removed)
 Admin -> *Keep*
 CreateCollections -> CreateCollection
 DeleteCollections -> DeleteCollection
 EditCollection -> *Keep*
 ListCollections -> *Remove* (see ViewSpace)
 ShowCollection -> ViewCollection
 CreateSpaces -> CreateSpace
 UpdateSpaces -> EditSpace
 DeleteSpaces -> DeleteSpace
 EditSpace -> *Keep*
 ListSpaces -> *Remove*
 ShowSpace -> ViewSpace
 CreateDatasets -> CreateDataset
 DeleteDatasets -> DeleteDataset
 ListDatasets -> *Remove* (see ViewSpace)
 ShowDataset -> ViewDataset
 SearchDatasets -> ViewDataset
 AddDatasetsMetadata -> AddMetadata
 ShowDatasetsMetadata -> ViewMetadata
 CreateTagsDatasets -> AddTag
 DeleteTagsDatasets -> DeleteTag
 ShowTags -> ViewTags
 UpdateDatasetInformation -> EditDataset
 UpdateLicense -> EditLicense
 CreateComments -> CreateComment
 RemoveComments -> DeleteComment
 EditComments -> EditComment
 CreateNotes -> CreateNote
 AddSections -> AddSection
 GetSections -> ViewSection
 CreateTagsSections -> AddTag
 DeleteTagsSections -> DeleteTag
 CreateFiles -> AddFile
 DeleteFiles -> DeleteFile
 ListFiles -> *Remove* (everyone should be able to)
 ExtractMetadata -> ViewMetadata
 AddFilesMetadata -> AddMetadata
 ShowFilesMetadata -> ViewMetadata
 ShowFile -> ViewFile
 SearchFiles -> ViewFile
 CreateTagsFiles -> AddTag
 DeleteTagsFiles -> DeleteTag
 CreateStreams -> CreateGeoTemporalStream
 AddDataPoints -> CreateGeoTemporalDatapoint
 SearchStreams -> ViewGeoTemporalStream
 AddZoomTile -> CreatePreview
 Add3DTexture -> CreatePreview
 AddIndex -> CreateIndex
 CreateSensors -> CreateGeoTemporalSensor
 ListSensors -> ViewGeoTemporalSensor
 GetSensors -> ViewGeoTemporalSensor
 SearchSensor -> ViewGeoTemporalSensor
 RemoveSensors -> DeleteGeoTemporalSensor
 AddThumbnail -> CreatePreview
 DownloadFiles -> *Keep*
 GetUser -> ViewUser
 AddProject -> EditUser
 AddInstitution -> EditUser
 UserAdmin -> Admin
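If the renames above have to be applied to permission names that are already stored somewhere (for example in saved role definitions, which is an assumption here), a lookup table is one simple option. The sketch below includes only a handful of entries from the list, and `PermissionRenameSketch`, `renames` and `migrate` are hypothetical names.

```scala
object PermissionRenameSketch {
  // Some(newName) for a rename, None for a permission marked *Remove*.
  // Only a few entries from the cleanup list are shown.
  val renames: Map[String, Option[String]] = Map(
    "ShowCollection" -> Some("ViewCollection"),
    "UpdateSpaces"   -> Some("EditSpace"),
    "ShowDataset"    -> Some("ViewDataset"),
    "SearchDatasets" -> Some("ViewDataset"),
    "ListDatasets"   -> None,
    "ListSpaces"     -> None
  )

  // Translate a set of old names. Names missing from the table are kept as-is
  // (the *Keep* case), removed names are dropped, and old names that map to
  // the same new name collapse into one entry.
  def migrate(old: Set[String]): Set[String] =
    old.flatMap(name => renames.getOrElse(name, Some(name)).toList)
}
```

For example, a role holding `ShowDataset`, `SearchDatasets`, `ListDatasets` and `Admin` would end up with just `ViewDataset` and `Admin`.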
New List:
val Public, // Page is publicly accessible, i.e. no login needed
 Admin,

 // spaces
 ViewSpace,
 CreateSpace,
 DeleteSpace,
 EditSpace,

 // datasets
 ViewDataset,
 CreateDataset,
 DeleteDataset,
 EditDataset,

 // collections
 ViewCollection,
 CreateCollection,
 DeleteCollection,
 EditCollection,

 // files
 AddFile,
 DeleteFile,
 ViewFile,
 DownloadFiles,
 EditLicense,
 CreatePreview, // Used by extractors
 MultimediaIndexDocument,
 CreateNote,

 // sections
 CreateSection,
 ViewSection,
 DeleteSection, // FIXME: Unused right now
 EditSection, // FIXME: Unused right now

 // metadata
 AddMetadata,
 ViewMetadata,
 DeleteMetadata, // FIXME: Unused right now
 EditMetadata, // FIXME: Unused right now

 // social annotation
 AddTag,
 DeleteTag,
 ViewTags,
 AddComment,
 DeleteComment,
 EditComment,

 // geostreaming api
 GSCreateStream,
 GSAddDatapoint,
 GSViewDatapoints,
 GSAddSensor,
 GSViewSensor,
 GSDeleteSensor,

 // users
 ViewUser,
 EditUser = Value

7 Comments

  1. There would seem to be many issues ala note #4 with this approach: access suddenly becomes path dependent as soon as collections/datasets can be shared across projects. (And we dropped fine-grained permissions just to avoid this.) I'd also question whether/how different spaces can manage using different metadata terms, or choose different publication targets, or manage relationships, etc. if it is not clear which space the collection/dataset is 'from'. Even if technically possible, explaining to users when their comments in one space are being seen in other spaces, or why there are comments/edits from people who have no access to the space may be hard.

    In the discussion of publication, the idea of including published collections/datasets "by reference" has come up and seems to help. Thinking of sharing across spaces as 'pre-publication' of a frozen version of the collection/dataset might help here as well - there is one home where the data is live/editable/publishable, etc. all the other copies are read-only/frozen.

     

    1. There are some good issues raised here, especially in terms of user experience. Trying to keep things intuitive and simple with this set up will be non-trivial, and I believe will need to be carefully discussed and designed. For example, will there be a way for users associated with a space to have a discussion or exchange comments/notes/etc on a resource that has been shared but where that discussion would stay only within scope of the space? If not, is it something people may want or have a use for?

      Or when sharing a resource, would it be possible for the owner of the resource to specify or limit comments/notes/tags on the resource to a specific set of spaces? In a workflow where there is a project team and also collaborative reviewers, this may be useful at some stages.

      1. It's a bit of an aside for this thread, but comments seem to be enough of a special case that it might be something to treat separately. I could see comments frequently being too informal to be something that gets published (or at least they will require review). Flipping the design so that there's one threaded comment tree per space that mentions people and datasets, versus having comments directly associated with collections/datasets, might make this easier to manage and might be more powerful - "Hey @Mario, I just looked @D1 and @D4 and think they show sensor @S3 is out of whack." - that type of process discussion doesn't really belong on any one object...

        Anyway - for another day.

        1. I like this idea as well. The question is how would we highlight that certain comments are accessible for one space and not the other. Would we start listing comments by space?

          One option would be to have comments/forums on a space level and then add annotation to get something like what you are describing.

          1. I think we're talking about the same thing - comments belong to a space and they are displayed in a way that makes it clear they are not part of a collection/dataset. 

            When publishing/sharing, there could be some option to attach relevant threads or to move comment threads/excerpts/summaries into other metadata fields, i.e. part of what you do in a staging area might be to clean up this type of detail.

    2. I agree it is potentially confusing for the user and like what you are proposing. What would be the differences between viewing a dataset from a space it is part of and viewing a dataset in a space where it's only referenced?

      For example, user A has access to space X and user B to space Y. Dataset D is in space X and referenced in space Y. What does A view and what actions does she have accessible to her? What does B view and what actions does she have accessible to her?

      How can we highlight in the interface which space a resource belongs to and which ones it is only linked in?

      1. I think if we consider sharing and publication in concert with versioning (something else we've said we want for its own value) it can all hang together. There's relevant discussion in Curation/Publication Model, Model for research object lifecycle through SEAD system, and Design Issues for SEAD 2.0, but I think the broad view would be:

        Sharing and publication both create an object (~ an RO) that is fixed and independent of context (in terms of access control, vocabulary, and anything else we make project spaces or collection hierarchy handle). This object is composed of versions of the collections/datasets/files selected, plus some additional metadata / the selected objects are versioned when you publish/share. There could be two options in the originating space - show the frozen versions or auto-create new live versions to allow continuing editing. In the originating space and elsewhere, the fixed/reference data would be displayed with some indication (page color/icons/etc) that makes the no-edit/fixed content clear and references the published/shared object it came from. Shared/published items pushed/pulled into a new space would be governed by their licenses (published would usually be public, perhaps with an embargo, shared would probably list specific spaces that are allowed and might have a no-derived-works policy - we can decide whether we just display such things or try to enforce some of that.) If you need to edit, you create a new live item (if allowed, perhaps a version if you are in the original project space and a derived item if you are in another space) that conforms to the rules of that space (e.g. in terms of access control, allowed/required vocabulary, publication choices, etc.)

        There are probably more details to be worked out to make everything fully consistent (within spaces and with the designs for ACPS and how repositories see the world), but I think simple versioning of an unpublished/unshared object fits within this scheme too. When you version, a fixed copy is made and can be viewed in the same way as imported published/shared items.

        I think a way to make this consistent is to write down one description (more detail on the above) and then a series of use cases like your question, and any time a new use case comes in, if any change to the model is needed, you make sure it is back-propagated to the other cases so we continue to have one consistent model.

        For your A/B/X/Y case, B would see a static D1(version i) and, if B has author privs in Y could create a derived D2 that they could edit/change/extend (as usual). Whoever created the share in X would have decided whether to replace the shared dataset with its static version, in which case A sees the same as B (but would have the option to create a new version rather than a derived dataset if they were an author), or, if the sharer kept a live version going, A would see a live, editable D1 that would indicate that a static D1(version i) exists (and could be viewed).

         

        I think this can be extended to sharing a whole collection, etc. and in those cases, limiting B to deriving new collections/datasets, we avoid issues of B trying to add a live dataset to a fixed/reference collection.