Type: Improvement
Resolution: Unresolved
Priority: Normal
Fix Version/s: deferred
Affects Version/s: None
Component/s: Performance, QA
Labels: None

A big improvement for these reports would be to make them "self-noise-eliminating": unlike the attached report, they would not red-light deltas that are "in the noise".

We want to attract attention to (i.e., red-light) deltas that represent a statistically significant drop in performance. This can be a drop relative to prior performance of this branch, or a drop relative to prior performance of a baseline release.

To do this you need variance-based statistics like the z-score, which is based on the standard deviation. A z-score means "how many standard deviations away from the mean is this value", i.e. (value - mean) / stdDev. Z-scores between -1 and 1 imply "it's ordinary variation, most likely due to noise". A z-score outside of -1 to 1 implies "it's significant; take a look."
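
A minimal sketch of the z-score described above, in Python for illustration (the issue does not name a language). The function names `z_score` and `is_noise` are hypothetical, and using the sample standard deviation (`statistics.stdev`) rather than the population one is an assumption.

```python
from statistics import mean, stdev
from typing import Sequence


def z_score(value: float, history: Sequence[float]) -> float:
    """How many standard deviations `value` lies from the mean of `history`.

    Assumption: uses the sample standard deviation (statistics.stdev),
    which needs at least two history points.
    """
    sd = stdev(history)
    if sd == 0:
        raise ValueError("history has zero variance; z-score is undefined")
    return (value - mean(history)) / sd


def is_noise(z: float, threshold: float = 1.0) -> bool:
    """True when z lies within [-threshold, threshold]: ordinary variation."""
    return -threshold <= z <= threshold


print(z_score(16, [10, 12, 14]))  # 2.0: two standard deviations above the mean
```

The threshold of 1.0 is the one this issue proposes. For normally distributed noise, roughly a third of runs (about 32%) land outside ±1 standard deviation, so a larger threshold such as 2.0 may be worth trying if too many lights remain.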

We need the mean and standard deviation of (previousVal - baselineVal) over the history. We can then compute (currentVal - baselineVal), and if its z-score against that history, ((currentVal - baselineVal) - mean) / stdDev, is < -1.0, we would red-light the value: it means there is a statistically significant degradation in performance (relative to the baseline) due to this commit's code changes. This only red-lights changes due to this code commit. If a test's performance is relatively unchanged day to day, but always slow relative to the baseline, this would not red-light that day's delta.
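
A sketch of this baseline-relative check. It assumes (the issue does not say) that the baseline release is re-measured in every run, so each previous value has a paired baseline value from the same run; with a single fixed baseline, (previousVal - baselineVal) has the same standard deviation as previousVal and this check reduces to the plain previous-value comparison. The names `baseline_delta_z` and `red_light_vs_baseline` are hypothetical, and higher values are assumed to be better.

```python
from statistics import mean, stdev
from typing import Sequence


def baseline_delta_z(current: float, current_baseline: float,
                     previous: Sequence[float],
                     previous_baselines: Sequence[float]) -> float:
    """z-score of (current - baseline) against the history of (previous - baseline).

    Assumes previous[i] and previous_baselines[i] were measured in the same run.
    """
    if len(previous) != len(previous_baselines):
        raise ValueError("previous and previous_baselines must be paired")
    deltas = [p - b for p, b in zip(previous, previous_baselines)]
    sd = stdev(deltas)
    if sd == 0:
        raise ValueError("history of deltas has zero variance")
    return ((current - current_baseline) - mean(deltas)) / sd


def red_light_vs_baseline(z: float, threshold: float = 1.0) -> bool:
    # Higher values are assumed better, so a large negative z is a degradation.
    return z < -threshold


z = baseline_delta_z(95, 103, previous=[102, 108, 98], previous_baselines=[102, 104, 102])
print(z, red_light_vs_baseline(z))  # -2.0 True
```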

We probably also want to red-light a general degradation in performance, even for tests that are running faster than the baseline, so we would also want the mean and standard deviation of previousVal, and similarly red-light if the z-score of currentVal relative to previousVal, (currentVal - mean) / stdDev, is < -1.0.
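
A sketch that runs both red-light checks and reports which one fired. The function `red_light_reasons` and its reason labels are hypothetical; it assumes paired per-run baseline measurements and that higher values are better. The sample data is a test that is still faster than its baseline (140 vs. 90) and unremarkable relative to it, yet has dropped relative to its own history.

```python
from statistics import mean, stdev
from typing import List, Sequence


def _z(value: float, history: Sequence[float]) -> float:
    sd = stdev(history)
    if sd == 0:
        raise ValueError("history has zero variance; z-score is undefined")
    return (value - mean(history)) / sd


def red_light_reasons(current: float, current_baseline: float,
                      previous: Sequence[float],
                      previous_baselines: Sequence[float],
                      threshold: float = 1.0) -> List[str]:
    """Return which checks flag a significant drop (empty list: no red light)."""
    if len(previous) != len(previous_baselines):
        raise ValueError("previous and previous_baselines must be paired")
    deltas = [p - b for p, b in zip(previous, previous_baselines)]
    reasons = []
    # Check 1: drop relative to the baseline, beyond the usual delta noise.
    if _z(current - current_baseline, deltas) < -threshold:
        reasons.append("vs-baseline")
    # Check 2: drop relative to this test's own history.
    if _z(current, previous) < -threshold:
        reasons.append("vs-previous")
    return reasons


print(red_light_reasons(140, 90, [150, 154, 146], [100, 102, 98]))  # ['vs-previous']
```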

And we want to red-light (or pink-light) tests that are simply slower than the baseline by a statistically significant amount as an ongoing trend. So we would fold currentVal into the history when computing the mean and stdDev of previousVal, and likewise of (previousVal - baselineVal). Like everything else here, this assumes higher values are better/faster (e.g., throughput), so a negative delta means slower; if the values are time taken, where lower is better, every sign and threshold flips. If the mean of (previousVal - baselineVal) is negative by more than stdDev(previousVal - baselineVal), then the trend is that this test is slower than the baseline by a significant amount on an ongoing basis, so we should "pink-light" the test results. That particular day's run might or might not reflect a statistically significant improvement or degradation, but the trend is still below the baseline by a statistically significant amount.
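
A sketch of the pink-light trend test: fold the current delta into the (previousVal - baselineVal) history, then pink-light when the mean sits more than one standard deviation below zero. `pink_light` is a hypothetical name; paired per-run baselines, higher-is-better values, and the sample standard deviation are all assumptions.

```python
from statistics import mean, stdev
from typing import Sequence


def pink_light(current: float, current_baseline: float,
               previous: Sequence[float], previous_baselines: Sequence[float],
               threshold: float = 1.0) -> bool:
    """True if the test trends slower than the baseline by a significant amount."""
    if len(previous) != len(previous_baselines):
        raise ValueError("previous and previous_baselines must be paired")
    # Fold today's delta into the history before computing the statistics.
    deltas = [p - b for p, b in zip(previous, previous_baselines)]
    deltas.append(current - current_baseline)
    # Negative mean delta means slower than baseline (higher values assumed better).
    return mean(deltas) < -threshold * stdev(deltas)


print(pink_light(80, 100, [80, 90, 70], [100, 100, 100]))    # True
print(pink_light(100, 100, [100, 110, 90], [100, 100, 100]))  # False
```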

This takes most of the noise variability out of the color highlighting.

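Example 1:
Baseline is 200, previous is 150, current is 139. The mean of previous is 175, so the mean of (previous - baseline) is -25, and the std-dev of (previous - baseline) is 12.

So (current - baseline) minus the mean of (previous - baseline) is -61 - (-25) = -36. The z-score of that is -36 / 12 = -3.0, which is < -1.0, so the red light goes on.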
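
The first example's arithmetic, replayed in a few lines of Python. It assumes the 175 is the mean of the previous values (so the mean of previous - baseline is 175 - 200 = -25) and that the std-dev of (previous - baseline) is 12; that is the reading under which the example's -36 and -3.0 come out. The single previous value of 150 does not enter the calculation. Variable names are illustrative.

```python
baseline = 200
current = 139
mean_previous = 175        # assumed: mean of the previous values
sd_prev_minus_baseline = 12

mean_delta = mean_previous - baseline        # mean of (previous - baseline): -25
current_delta = current - baseline           # -61
deviation = current_delta - mean_delta       # -36
z = deviation / sd_prev_minus_baseline       # -3.0
red_light = z < -1.0

print(deviation, z, red_light)  # -36 -3.0 True
```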

Example 2:
Current is 120. Mean of previous is 142, standard deviation of previous is 12.
Delta from mean is -22. The z-score is -22/12 = -1.83, which is < -1.0, so we red-light this because it represents a statistically significant drop in performance from the average for that test.
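
The same check for the second example, comparing the current value against the history of previous values only (no baseline involved). Variable names are illustrative.

```python
current = 120
mean_previous = 142
sd_previous = 12

z = (current - mean_previous) / sd_previous  # -22 / 12
red_light = z < -1.0

print(round(z, 2), red_light)  # -1.83 True
```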

Example 3:
Current is 120; folding that into the mean and std deviation of (previous - baseline) gives a mean of -20 and a stdDev of 10. That means the test is generally 20 units slower than the baseline. A mean of -20 with a stdDev of 10 sits 2.0 standard deviations below zero, so we would "pink-light" the test as generally being slower than the baseline on an ongoing basis.
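
The trend arithmetic of the third example, where the mean and stdDev of (previous - baseline) already include the current run. Variable names are illustrative.

```python
mean_delta = -20  # mean of (previous - baseline), current run folded in
sd_delta = 10     # stdDev of the same deltas

trend_z = mean_delta / sd_delta      # how many stdDevs the trend sits below zero
pink_light = mean_delta < -sd_delta  # more than one stdDev below zero

print(trend_z, pink_light)  # -2.0 True
```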

The inverse cases, statistically significant improvements, could generate a green light (or light green).
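
A sketch of the resulting color mapping, including the inverse (improvement) cases. Pairing green with the per-run checks and light green with the ongoing trend is an assumption about how the colors might be assigned; the function names are hypothetical and higher values are assumed better.

```python
from typing import Optional


def light_for_delta(z: float, threshold: float = 1.0) -> Optional[str]:
    """Color for one run's delta z-score (higher values assumed better)."""
    if z < -threshold:
        return "red"    # significant degradation
    if z > threshold:
        return "green"  # significant improvement
    return None         # in the noise: no highlight


def light_for_trend(mean_delta: float, sd_delta: float,
                    threshold: float = 1.0) -> Optional[str]:
    """Softer color for the ongoing (previous - baseline) trend."""
    if mean_delta < -threshold * sd_delta:
        return "pink"         # persistently slower than the baseline
    if mean_delta > threshold * sd_delta:
        return "light-green"  # persistently faster than the baseline
    return None


print(light_for_delta(-3.0), light_for_delta(0.4), light_for_delta(1.5))  # red None green
print(light_for_trend(-20, 10), light_for_trend(20, 10))                  # pink light-green
```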

To compute this, you need at least 12 points of history so that the mean and standard deviation are meaningful.
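
A sketch of guarding the statistics with the 12-point minimum, returning no z-score (and so no highlight) when the history is too short or has no variance. `safe_z` and `MIN_HISTORY` are hypothetical names.

```python
from statistics import mean, stdev
from typing import Optional, Sequence

MIN_HISTORY = 12  # minimum history length suggested in this issue


def safe_z(value: float, history: Sequence[float],
           min_history: int = MIN_HISTORY) -> Optional[float]:
    """z-score of value against history, or None when it would not be meaningful."""
    if len(history) < min_history:
        return None  # too little history: leave the result uncolored
    sd = stdev(history)
    if sd == 0:
        return None  # identical history values: z-score undefined
    return (value - mean(history)) / sd


print(safe_z(5, [1, 2, 3]))          # None: only 3 points
print(safe_z(5.5, list(range(12))))  # 0.0: value equals the mean
```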


Attachment: Performance report for 05-30-2016.html (16 kB, 01/Jun/16 12:28 PM)

Assignee: Unassigned
Reporter: Mike Beckerle
Votes: 0
Watchers: 1
Created: 01/Jun/16 12:20 PM
Updated: 15/Sep/17 12:55 PM
