Daffodil / DFDL-687

CSV Performance degrades rapidly as file size increases


    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • s15
    • None
    • Back End
    • None
    • Environment: Fedora 18 x86_64, java version "1.7.0_19", 20GB of RAM, sbt launched with "sbt -mem 8192"

      I added a number of tests that measure performance on CSV files as the file size increases. Currently, throughput drops quickly as the file gets larger: from 58.3 kb/s at 200k to 1.7 kb/s at 5m, roughly 34 times lower. I got the following numbers by running the tests on the machine described in the environment section:

      200k - 58.3 kb/s
      400k - 36.3 kb/s
      600k - 26.8 kb/s
      800k - 20.3 kb/s
      1m - 15.0 kb/s
      5m - 1.7 kb/s
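
To make the trend in these numbers easier to read, the following sketch converts each reported throughput into an elapsed time and estimates how fast that time grows with file size. It is only arithmetic on the figures above, not part of the Daffodil test suite. It assumes that "k" and "kb" share the same unit base and that 1m equals 1000k; the names `MEASUREMENTS`, `elapsed_seconds`, and `growth_exponents` are made up for illustration.

```python
import math

# (file size in k, reported throughput in kb/s), copied from the report.
# Assumption: "k" and "kb" use the same unit base, and 1m = 1000k.
MEASUREMENTS = [
    (200, 58.3),
    (400, 36.3),
    (600, 26.8),
    (800, 20.3),
    (1000, 15.0),
    (5000, 1.7),
]


def elapsed_seconds(size_k, rate_kb_per_s):
    """Total run time implied by a file size and an average throughput."""
    return size_k / rate_kb_per_s


def growth_exponents(measurements):
    """Estimate p in time ~ size**p between each pair of consecutive points."""
    times = [elapsed_seconds(size, rate) for size, rate in measurements]
    exponents = []
    for i in range(1, len(measurements)):
        size_ratio = measurements[i][0] / measurements[i - 1][0]
        time_ratio = times[i] / times[i - 1]
        exponents.append(math.log(time_ratio) / math.log(size_ratio))
    return exponents


if __name__ == "__main__":
    for size, rate in MEASUREMENTS:
        print(f"{size:>5}k  {elapsed_seconds(size, rate):8.1f} s")
    print([round(p, 2) for p in growth_exponents(MEASUREMENTS)])
```

A run time that scaled linearly with file size would give exponents close to 1. Under the stated assumptions, every exponent here comes out well above 1 (roughly 1.7 to 2.4), which suggests that the time grows about quadratically with input size.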

      The tests have been added to daffodil-perf/src/test/.../csv/TestCSV.scala.
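
For readers without access to the test file, here is a minimal sketch of how throughput can be measured against growing input sizes. It is not the code in TestCSV.scala and does not call Daffodil. Python's `csv` module stands in for the parser under test, the input is synthetic, and `make_csv`, `parse`, and `throughput_kb_per_s` are hypothetical names.

```python
import csv
import io
import time


def make_csv(num_rows, num_cols=10):
    """Build synthetic CSV text (a stand-in for the real data files)."""
    row = ",".join(f"field{i}" for i in range(num_cols))
    return "\n".join(row for _ in range(num_rows)) + "\n"


def parse(text):
    """Stand-in parser: Python's csv module, NOT Daffodil. Returns row count."""
    return sum(1 for _ in csv.reader(io.StringIO(text)))


def throughput_kb_per_s(text, parser=parse):
    """Time one parse of `text` and return kilobytes processed per second."""
    data_kb = len(text.encode("utf-8")) / 1024
    start = time.perf_counter()
    parser(text)
    # Guard against a zero reading on very small inputs.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return data_kb / elapsed


if __name__ == "__main__":
    # Double the input each step; a parser with linear run time should
    # report roughly constant throughput across these sizes.
    for rows in (1_000, 2_000, 4_000, 8_000):
        text = make_csv(rows)
        print(f"{rows:>6} rows  {throughput_kb_per_s(text):10.1f} kb/s")
```

Swapping a different parser into the `parser` argument and watching whether the reported kb/s stays flat or falls as the input grows reproduces the kind of measurement described in this issue.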

      The data files are located in a different repository on Tresys's network:

      svn+ssh://username@repos/repos/svn/ngf-dfdl/Input/csv

              jchab Jessie Chab
              jadams Joshua Adams

                  Estimated: Not Specified
                  Remaining: Not Specified
                  Logged: 2d 1h 33m