We're working on a project to push data off the mainframe, describe it with DFDL, and parse it with Daffodil so we can store it in an ODS. The data is currently stored in charset IBM-1047-S390, which is EBCDIC (yes, old school). We are constrained by our systems team, which won't let us do any transformation on the mainframe.
I am having a conversation with the apps team that is looking at the Daffodil piece. We saw the list of available charsets (UTF-8, UTF-16BE, UTF-16LE, UTF-32BE, UTF-32LE, and ASCII), but is that list just for the encoding property on specific fields (dfdl:encoding="UTF-8"), or for the charsets the parser can accept in the data, or both?
The limited number of character sets is really a matter of testing, not because they're in any way hard to support or add. Ultimately we pass the encoding name through to the underlying Java/ICU libraries, so we can very easily add support for any encoding that Java/ICU supports.
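To illustrate what "pass the name through to Java" means, here is a minimal sketch (not Daffodil's actual code path) that decodes EBCDIC bytes by looking up the JDK charset named IBM1047. It assumes the JDK's extended charsets (the jdk.charsets module) are present; trimmed runtimes built with jlink may leave them out. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;

// Hypothetical sketch: decode EBCDIC (IBM-1047) bytes by charset name.
// Assumes the JDK ships its extended charsets (module jdk.charsets).
public class EbcdicDecodeSketch {

    // Java's canonical name for the IBM-1047 EBCDIC code page.
    static final String EBCDIC_1047 = "IBM1047";

    /** Decodes raw record bytes using the named charset. */
    static String decode(byte[] raw, String charsetName) {
        // Throws UnsupportedCharsetException if this runtime lacks the charset.
        Charset cs = Charset.forName(charsetName);
        return new String(raw, cs);
    }

    public static void main(String[] args) {
        // "HELLO 42" in IBM-1047: H E L L O = C8 C5 D3 D3 D6, space = 40, digits 4 2 = F4 F2.
        byte[] record = {
            (byte) 0xC8, (byte) 0xC5, (byte) 0xD3, (byte) 0xD3, (byte) 0xD6,
            (byte) 0x40, (byte) 0xF4, (byte) 0xF2
        };
        System.out.println("supported: " + Charset.isSupported(EBCDIC_1047));
        System.out.println(decode(record, EBCDIC_1047));
    }
}
```

The point of the sketch is that the encoding is just a lookup by name, which is why adding another Java-supported charset is mostly a matter of testing.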
Before we do that, I'd like to explore a few other things that come up with mainframe data. Basically, we haven't implemented the mainframe aspects of DFDL in general, as those funding our work don't have that kind of data, and IBM has a DFDL implementation that covers the mainframe issues quite well.
So, while we could easily add an EBCDIC encoding to Daffodil, we also lack support for packed decimal, zoned decimal, IBM 390 binary floating point, the P and V parts of number formatting, and lengthKind 'prefixed'. Those are a larger project and can't be done quickly; we hope to have them done late this year. If you need those in addition to the EBCDIC charset, I would suggest trying the IBM DFDL implementation. If your data really is all text, just in EBCDIC, then perhaps we can add that quickly and get you going.
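For readers unfamiliar with packed decimal and the V (implied decimal point), here is a minimal Java sketch of how such a field is decoded. It is not Daffodil or IBM DFDL code; it assumes the common COBOL COMP-3 layout (two BCD digits per byte, sign in the low nibble of the last byte, with C/A/E/F positive and D/B negative) and takes the implied decimal places as a plain parameter. The class and method names are hypothetical.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical sketch of decoding an IBM packed-decimal (COBOL COMP-3) field.
public class PackedDecimalSketch {

    /**
     * @param field raw bytes: two BCD digits per byte, sign in the last nibble
     * @param scale implied decimal places (the "V" position), e.g. 2 for PIC S9(3)V99
     */
    static BigDecimal decodePacked(byte[] field, int scale) {
        if (field.length == 0) {
            throw new IllegalArgumentException("empty packed field");
        }
        StringBuilder digits = new StringBuilder();
        for (int i = 0; i < field.length; i++) {
            int b = field[i] & 0xFF;
            digits.append(digit(b >>> 4));          // high nibble is always a digit
            if (i < field.length - 1) {
                digits.append(digit(b & 0x0F));     // low nibble is a digit except in the last byte
            }
        }
        int sign = field[field.length - 1] & 0x0F;  // last low nibble holds the sign
        BigInteger unscaled = new BigInteger(digits.toString());
        switch (sign) {
            case 0x0B: case 0x0D:
                unscaled = unscaled.negate();
                break;
            case 0x0A: case 0x0C: case 0x0E: case 0x0F:
                break;                              // positive (F is "unsigned")
            default:
                throw new IllegalArgumentException("invalid sign nibble: " + sign);
        }
        // Apply the implied decimal point; no decimal point is stored in the data.
        return new BigDecimal(unscaled, scale);
    }

    private static int digit(int nibble) {
        if (nibble > 9) {
            throw new IllegalArgumentException("invalid BCD digit nibble: " + nibble);
        }
        return nibble;
    }

    public static void main(String[] args) {
        System.out.println(decodePacked(new byte[] {0x12, 0x34, 0x5C}, 2)); // 12345, positive, V99
        System.out.println(decodePacked(new byte[] {0x00, 0x12, 0x3D}, 0)); // 00123, negative
    }
}
```

Using BigDecimal keeps the value exact, which matters for the monetary fields packed decimal is usually used for.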
And if you know any developers who want to contribute to Daffodil by implementing these, we'd appreciate the help.
Added JIRA ticket DFDL-1729 for enabling EBCDIC and other encodings.
Thanks for the quick response. It certainly makes sense. We didn't expect adding a new charset to be a major problem.
As for the other mainframe aspects of DFDL: yes, I know IBM has products to sell us to handle them. We don't want to be tied to IBM, but those products are being investigated.
Only one of those problems affects what we currently do on our mainframe, as far as I have been able to determine in a few hours of investigation: we do have packed decimals. I don't see any zoned decimals, at least in the first batch of records we plan to use this for. As for lengthKind="prefixed", that is not an option in the IBM DFDL version we use; we only use implicit or explicit. Based on your note, I assume normal hex (X) and binary digits (B, which we use for bit settings) are not an issue.
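For the hex and bit-setting fields, here is a minimal Java sketch of what reading them involves once the raw bytes are in hand: rendering bytes as hex and testing individual flag bits. It is illustrative only and assumes mainframe-style bit numbering (bit 0 is the most significant bit of the byte); the class, method names, and sample flag byte are hypothetical, not taken from the records described above.

```java
// Hypothetical sketch: hex rendering and flag-bit tests on raw record bytes.
public class BitFlagsSketch {

    /** Renders raw bytes as uppercase hex, two characters per byte. */
    static String toHex(byte[] raw) {
        StringBuilder sb = new StringBuilder(raw.length * 2);
        for (byte b : raw) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    /** True if bit n is set, numbering bits 0..7 from the most significant bit. */
    static boolean isBitSet(byte flags, int n) {
        if (n < 0 || n > 7) {
            throw new IllegalArgumentException("bit index must be 0..7: " + n);
        }
        int unsigned = flags & 0xFF;                // avoid sign extension of the byte
        return ((unsigned >>> (7 - n)) & 1) == 1;
    }

    public static void main(String[] args) {
        byte flags = (byte) 0xA0;                   // 1010 0000: bits 0 and 2 set
        System.out.println(toHex(new byte[] {flags, (byte) 0x0F}));
        for (int n = 0; n < 8; n++) {
            System.out.println("bit " + n + ": " + isBitSet(flags, n));
        }
    }
}
```

Masking with 0xFF before shifting keeps the logic correct for bytes with the high bit set, which Java otherwise treats as negative.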
As for contributing, that is possible at our company. We just have to get the approval and do the paperwork. I will have someone look into it.
Thanks - Tom