-
Bug
-
Resolution: Fixed
-
Critical
-
None
-
Data Transformation Service 4, Data Transformation Service 5
Symptom: All dap-dev bi-hourly tests failed.
Polyglot got stuck processing a single job, repeatedly throwing NullPointerExceptions, which caused all dap-dev bi-hourly tests to fail.
[Sat Feb 20 01:05:59 2016] [steward] [200590093]: Found path for http://username:password@141.142.227.81:8184/file/topo3scale.bmp->tiff, submitting as job-200590093
[Sat Feb 20 01:06:00 2016] [restlet] [200590093]: 128.8.216.157 request for http://username:password@141.142.227.81:8184/file/topo3scale.bmp->tiff will be at http://dap-dev.ncsa.illinois.edu:8184/file/200590093_topo3scale.tiff
[Sat Feb 20 01:06:03 2016] [steward] [200590093]: Submitting job-200590093's next step, topo3scale.bmp->tiff via ImageMagick
java.lang.NullPointerException
at edu.illinois.ncsa.isda.softwareserver.polyglot.PolyglotStewardAMQ.process_jobs(PolyglotStewardAMQ.java:521)
at edu.illinois.ncsa.isda.softwareserver.polyglot.PolyglotStewardAMQ$3.run(PolyglotStewardAMQ.java:597)
[Sat Feb 20 01:06:06 2016] [steward] [200590093]: Submitting job-200590093's next step, topo3scale.bmp->tiff via ImageMagick
java.lang.NullPointerException
at edu.illinois.ncsa.isda.softwareserver.polyglot.PolyglotStewardAMQ.process_jobs(PolyglotStewardAMQ.java:521)
at edu.illinois.ncsa.isda.softwareserver.polyglot.PolyglotStewardAMQ$3.run(PolyglotStewardAMQ.java:597)
The software server VMs in Nebula were suspected first, but their logs showed no job processing. polyglot.log on dap-dev then showed that Polyglot was stuck processing one job, throwing a NullPointerException each time it resubmitted the job's next step.
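The pattern in the log excerpt above (the same job resubmitted for the same step every few seconds) can be checked for mechanically. The following is only a minimal sketch, not part of Polyglot: the function name `find_stuck_jobs`, the `threshold` parameter, and the assumption that a healthy job submits each step only once are all hypothetical, and the regular expression is based solely on the steward lines shown in this report.

```python
import re
from collections import Counter

# Matches steward lines of the form seen in this report, e.g.
# [Sat Feb 20 01:06:03 2016] [steward] [200590093]: Submitting job-200590093's next step, ...
SUBMIT_RE = re.compile(
    r"^\[[^\]]+\] \[steward\] \[(\d+)\]: Submitting job-\d+'s next step, (.+)$"
)


def find_stuck_jobs(lines, threshold=2):
    """Return {(job_id, step): count} for steps submitted at least `threshold` times.

    Assumption (hypothetical): a healthy job submits each step once, so a
    repeated submission of the same step suggests the job is stuck.
    """
    counts = Counter()
    for line in lines:
        match = SUBMIT_RE.match(line.strip())
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return {key: n for key, n in counts.items() if n >= threshold}


# Sample lines copied from the log excerpt in this report.
sample = [
    "[Sat Feb 20 01:06:03 2016] [steward] [200590093]: Submitting job-200590093's next step, topo3scale.bmp->tiff via ImageMagick",
    "java.lang.NullPointerException",
    "[Sat Feb 20 01:06:06 2016] [steward] [200590093]: Submitting job-200590093's next step, topo3scale.bmp->tiff via ImageMagick",
    "java.lang.NullPointerException",
]

print(find_stuck_jobs(sample))
# {('200590093', 'topo3scale.bmp->tiff via ImageMagick'): 2}
```

To scan a whole log instead of the sample, the lines of polyglot.log can be passed in directly, for example `find_stuck_jobs(open("polyglot.log"))`. A higher `threshold` may be appropriate if occasional resubmissions turn out to be normal.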
After Polyglot was restarted, it put 3000+ jobs into the RabbitMQ queues, and the converters took a while to process them. (Rui also manually removed the ".doc" and ".rtf" files, which were known to have issues.) The dap-tests then passed.
The log files showing the start of the stuck job and the Polyglot restart are attached. Rui redacted the browndog user name and password.