...
- Keep a registry of file types that can be directly opened by the comparison tool(s)
- For each job request converting from format A to format B
- Find a format alpha that can be reached from A that is within the set of loadable formats
- Find a format beta that can be reached from B that is within the set of loadable formats
- If both alpha and beta exist, carry out the conversions from A to alpha and B to beta
- Compare the files of type alpha and beta. If the difference between alpha and beta is below some threshold, record this edge as a good edge within the I/O-graph
The above algorithm assumes it is HIGHLY UNLIKELY that the conversions to alpha and beta undo any information loss incurred in the conversion from A to B (proof required).
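The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the Polyglot implementation: the I/O-graph is assumed to be a plain dictionary mapping each format to the formats it can be converted to, a format that is itself loadable counts as reachable with zero conversions, and `convert` and `compare` are hypothetical callables standing in for the conversion service and the comparison tool.

```python
from collections import deque


def find_loadable(graph, start, loadable):
    """Return the nearest format in `loadable` reachable from `start`, or None."""
    seen = {start}
    queue = deque([start])
    while queue:
        fmt = queue.popleft()
        if fmt in loadable:
            return fmt
        for nxt in graph.get(fmt, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return None


def is_good_edge(graph, loadable, a, b, file_a, file_b, convert, compare, threshold):
    """Return True/False for a measurable A -> B edge, or None if it cannot be measured.

    `convert(path, src, dst)` and `compare(x, y)` are hypothetical stand-ins.
    """
    alpha = find_loadable(graph, a, loadable)
    beta = find_loadable(graph, b, loadable)
    if alpha is None or beta is None:
        return None  # no loadable format reachable from A or from B
    loaded_a = convert(file_a, a, alpha)
    loaded_b = convert(file_b, b, beta)
    # The edge is good when the two loadable files differ by less than the threshold.
    return compare(loaded_a, loaded_b) < threshold
```

Breadth-first search is used here so that alpha and beta are reached with as few extra conversions as possible, which keeps additional loss introduced by the measurement itself small. That is a design choice of this sketch, not something the algorithm above requires.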
We implement this means of measuring information loss as follows:
- In PolyglotStewardAMQ create a new method convertWithLoss(...) that carries out the above algorithm (Jira BD-1313)
- Modify Polyglot.java to add a function convertWithLoss(...) that calls convert by default
- Create a list of loadable formats by the comparison tools
- Call the DAP with a conversion request to alpha (make sure this request doesn't also attempt information loss estimation) (Jira BD-1317)
- Call the DAP with a conversion request to beta (make sure this request doesn't also attempt information loss estimation)
- Add comparison tools to DTS (Jira BD-1300)
- Call the DTS with an extraction request for alpha
- Call the DTS with an extraction request for beta
- Add helper methods descriptor_set_distance and descriptor_distance (porting from https://opensource.ncsa.illinois.edu/bitbucket/projects/BD/repos/bdcli/browse/bd.py) (Jira BD-1314)
- Use descriptor_set_distance to compare the extracted JSON from alpha and beta; if a match is found, mark the edge as good in the I/O-graph (e.g. 1 vs 0)
- Save the result as a record in a mongo document of the form:
...
- Application, A, B, 0/1
- Add code and flag to PolyglotStewardAMQ.conf to load edge weights from mongo on Polyglot load (Jira BD-1315)
- Add endpoint to PolyglotRestlet.java that uses edge weights to determine best path (Jira BD-1316)
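As one possible illustration of the last step, the sketch below picks a conversion path from records of the form (Application, A, B, 0/1). It rests on assumptions not stated on this page: records are plain tuples rather than mongo documents, a good edge costs 1, and an edge marked 0 costs 1 plus a hypothetical `bad_penalty`, so lossy edges are avoided when an alternative exists but remain usable otherwise. Dijkstra's algorithm is used here as a stand-in; the actual endpoint may choose paths differently.

```python
import heapq


def best_path(records, source, target, bad_penalty=10):
    """Return (cost, steps) for the cheapest path, or None if target is unreachable.

    `records` holds (application, src, dst, good) tuples with good in {0, 1}.
    Each step in the result is an (application, src, dst) tuple.
    """
    graph = {}
    for app, src, dst, good in records:
        # Assumed cost model: good edges cost 1, other edges are penalised.
        cost = 1 if good == 1 else 1 + bad_penalty
        graph.setdefault(src, []).append((dst, app, cost))

    heap = [(0, source, [])]
    settled = set()
    while heap:
        cost, fmt, steps = heapq.heappop(heap)
        if fmt == target:
            return cost, steps
        if fmt in settled:
            continue
        settled.add(fmt)
        for dst, app, edge_cost in graph.get(fmt, []):
            if dst not in settled:
                heapq.heappush(heap, (cost + edge_cost, dst, steps + [(app, fmt, dst)]))
    return None
```

With the default penalty, a direct conversion marked 0 is only chosen when every alternative route made of good edges would need more than ten additional steps.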