I've encountered a number of situations where the ability to influence the execution of an extractor beyond what is available via file/dataset information and metadata would be convenient.
Having a way to pass additional parameters to an extractor would make extractors more useful and let a single extractor do more without having to be duplicated.
An example could be the image metadata extractor. ImageMagick currently extracts a "ton" of information about an image. Everyone seems to like the idea of having automatic image metadata extraction happening. However, many folks would prefer not to get all of the metadata but rather a subset. Imagine if we could pass a set of parameters to the image metadata extractor to instruct it to extract only the terms we are interested in. Another user could use the same extractor but pass a different list of terms. There are of course many other examples where parameters could be useful.
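To make the image metadata idea concrete, here is a minimal sketch in Python. It assumes the extractor already has everything it found in a plain dict and receives the wanted term names as a list; `filter_metadata` and the sample values are hypothetical and not part of any existing extractor.

```python
def filter_metadata(raw, wanted_terms):
    """Keep only the terms the caller asked for.

    raw          -- dict of everything the extractor found (hypothetical shape)
    wanted_terms -- list of term names passed in as parameters
    Terms that were requested but not found are simply skipped.
    """
    return {name: raw[name] for name in wanted_terms if name in raw}


# Made-up values standing in for the full ImageMagick output.
raw = {
    "Make": "Canon",
    "Model": "EOS 5D",
    "ExposureTime": "1/200",
    "Artist": "J. Doe",
}

# Two users, same extractor, different parameter lists.
print(filter_metadata(raw, ["Make", "Model"]))
print(filter_metadata(raw, ["Artist", "ImageDescription"]))
```

The second call shows that a requested term missing from the image is left out rather than treated as an error, which is only one possible choice.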
I believe others have talked about adding this capability so this isn't a new idea.
One question is how best to implement the passing of parameters, since there could be different reasons for wanting to pass them.
Also, where is the logical place to provide these parameters?
Personally, I like the idea of having the concept of "parameters" that can be specified at the instance and space level (possibly user level too?) for more than just extractors. Extractors could be one of the uses, but the use could be extended to other configuration options/overrides within the Clowder app itself, similar to the way CSS rules cascade.
For example (sorry, but I always visualize things like this as XML files)...
(Instance Level)
<extractor name="ncsa.image.metadata">
  <parameters>
    <term name="WinXP-Author" metaname="Author"/>
    <term name="Make"/>
    <term name="Model"/>
  </parameters>
</extractor>
(Space Level - Space1234)
<extractor name="ncsa.image.metadata" override="true">
  <parameters>
    <term name="Artist" metaname="Author"/>
    <term name="ImageDescription" metaname="Description"/>
  </parameters>
</extractor>
This would define that for ALL spaces the ncsa.image.metadata extractor should extract WinXP-Author (stored as Author), Make and Model. HOWEVER, for one space (Space1234) this is overridden, and Artist (stored as Author) and ImageDescription (stored as Description) are extracted instead. If override had been false or omitted, the parameters from both levels would have been passed to the extractor, and it would have been up to the extractor logic to decide how to handle the conflicting "Author" parameters.
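The override behaviour described above could be resolved along these lines. This is only a sketch under assumptions: the XML layout is the one proposed in this post (not an existing Clowder format), `read_terms` and `resolve_terms` are hypothetical helper names, and a missing metaname is assumed to default to the term name.

```python
import xml.etree.ElementTree as ET

INSTANCE_XML = """
<extractor name="ncsa.image.metadata">
  <parameters>
    <term name="WinXP-Author" metaname="Author"/>
    <term name="Make"/>
    <term name="Model"/>
  </parameters>
</extractor>
"""

SPACE_XML = """
<extractor name="ncsa.image.metadata" override="true">
  <parameters>
    <term name="Artist" metaname="Author"/>
    <term name="ImageDescription" metaname="Description"/>
  </parameters>
</extractor>
"""


def read_terms(extractor):
    """Return (name, metaname) pairs; metaname falls back to name (assumption)."""
    return [(t.get("name"), t.get("metaname", t.get("name")))
            for t in extractor.iter("term")]


def resolve_terms(instance_xml, space_xml=None):
    """Combine instance-level and space-level terms.

    override="true" on the space level replaces the instance-level terms.
    Otherwise both lists are passed on, and any conflict (for example two
    terms stored as "Author") is left for the extractor to sort out.
    """
    terms = read_terms(ET.fromstring(instance_xml))
    if space_xml is None:
        return terms
    space = ET.fromstring(space_xml)
    if space.get("override", "false").lower() == "true":
        return read_terms(space)
    return terms + read_terms(space)


# Space1234 overrides the instance defaults.
print(resolve_terms(INSTANCE_XML, SPACE_XML))
# A space with no configuration of its own gets the instance defaults.
print(resolve_terms(INSTANCE_XML))
```

A user level could be added the same way by applying the same rule once more on top of the space-level result.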
Please share your comments and ideas!