There are two tests here: test 1, which looks pretty erratic, and test 2, which looks pretty stable and repeatable at around 3 ms. Given this, I'm pretty sure you would choose test 2 to monitor and track performance. The test is stable, meaning the variance is very low, and repeatable, because it does not drift over time. For an example of drift, check out the following:
Here you can see the results are pretty stable, in that the variance from number to number is pretty low, but they are not very repeatable and seem to drift up over time. For this particular test, a high ping time is bad. This is also an example of "death by a thousand cuts": from test to test the results look good, but over a long period you can see performance dropping off.
So the question then comes up: how do you make stable and repeatable performance tests? The answer is to follow a test pattern like the xUnit pattern, with a couple of extra steps. The pattern is the following:
1. Setup
2. Warmup
3. Execute
4. ??? - something most tests forget
5. Publish
6. Cleanup
7. Teardown
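The steps above can be sketched as a small Python harness. This is my own illustration of the pattern, not any particular framework's API; the callable names and `run_perf_test` structure are invented for the sketch:

```python
import time

def run_perf_test(setup, warmup, execute, validate, publish, cleanup, teardown,
                  iterations=3):
    """Run one performance test following the pattern; steps 3-6 repeat."""
    state = setup()                      # 1. Setup
    warmup(state)                        # 2. Warmup
    results = []
    for _ in range(iterations):
        start = time.perf_counter()
        outcome = execute(state)         # 3. Execute
        elapsed = time.perf_counter() - start
        if validate(state, outcome):     # 4. Validate: check before publishing
            publish(elapsed, results)    # 5. Publish (valid results only)
        cleanup(state)                   # 6. Cleanup
    teardown(state)                      # 7. Teardown
    return results

# Example wiring with trivial stand-in callables:
results = run_perf_test(
    setup=lambda: {},
    warmup=lambda s: None,
    execute=lambda s: sum(range(1000)),
    validate=lambda s, out: out == 499500,  # did execute return what we expect?
    publish=lambda t, res: res.append(t),
    cleanup=lambda s: None,
    teardown=lambda s: None,
)
```

Note that only steps 3-6 sit inside the loop; setup, warmup, and teardown run once per test.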
Notice the additional steps compared to plain xUnit: warmup, step 4, publish, and cleanup. Now let me explain these steps.
Warmup - This step is here to allow the performance test to "warm up" the system under test. For example, if you want to measure database queries, you generally have to decide whether you want hot numbers (most likely the common case), where the database has been in use for a while, or cold numbers, which reflect the state right after boot/init/etc. With a warmup step you can test both hot and cold cases simply by adding or removing this step. An example might be selecting 10 rows from a database before doing the general select tests.
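To make that concrete, here is a sketch using an in-memory SQLite database; the table name and row counts are invented for illustration. Skipping the `warmup` call is exactly how you would switch from hot to cold numbers:

```python
import sqlite3

def setup():
    # Create and populate a throwaway in-memory database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, value TEXT)")
    conn.executemany("INSERT INTO items (value) VALUES (?)",
                     [(f"item-{i}",) for i in range(1000)])
    conn.commit()
    return conn

def warmup(conn):
    # Touch the data once so later timings measure a "hot" database;
    # remove this call to measure "cold" numbers instead.
    conn.execute("SELECT * FROM items LIMIT 10").fetchall()

conn = setup()
warmup(conn)
rows = conn.execute("SELECT * FROM items LIMIT 10").fetchall()
```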
Step 4 - Ahh... the mystery. What is step 4? Take a quick look back at the first graph. Any ideas? Well, the answer is VALIDATE. Most performance tests forget to validate the results they are getting. In the warmup step we said to select 10 rows. Did the test actually return 10 rows? If not, there is likely some error. Be sure to check your results and don't publish them if there was an error. On performance graphs, invalid results generally show up as extremely high numbers, zeros, or extremely low numbers.
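A sketch of that check, continuing the SQLite example (the `timed_select` helper is my own name): time the query, but refuse to report the number unless the expected row count came back.

```python
import sqlite3
import time

def timed_select(conn, limit):
    start = time.perf_counter()
    rows = conn.execute("SELECT * FROM items LIMIT ?", (limit,)).fetchall()
    elapsed = time.perf_counter() - start
    # Validate: did we actually get the rows we asked for?
    if len(rows) != limit:
        raise RuntimeError(f"expected {limit} rows, got {len(rows)}; "
                           "not publishing this result")
    return elapsed

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO items (id) VALUES (?)",
                 [(i,) for i in range(100)])
elapsed = timed_select(conn, 10)  # valid: 10 rows exist, result is publishable
```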
Publish - This is the act of pushing the result into your tracking infrastructure. Individual performance results tend to have a limited lifetime of usefulness, but there is always good cause to look back over time.
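One lightweight way to sketch this, assuming no real tracking system (the file format and field names here are my own invention): append one timestamped JSON line per result so a later tool can graph them over time.

```python
import json
import os
import tempfile
import time

def publish(result_ms, test_name, path):
    # Append one JSON record per run; an append-only log keeps history cheap.
    record = {"test": test_name, "ms": result_ms, "timestamp": time.time()}
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")

path = os.path.join(tempfile.mkdtemp(), "perf_results.jsonl")
publish(3.1, "select_10_rows", path)
publish(2.9, "select_10_rows", path)
with open(path) as f:
    records = [json.loads(line) for line in f]
```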
Cleanup - Cleanup is like teardown without exiting all layers of initialization. Generally the role of cleanup is to put things back in order so the test can be run again with minimal side effects. For cold performance results you will need a full teardown instead.
While execute is not a new step in the performance pattern, I wanted to mention it because in performance tests you often want "stable" numbers. This is generally achieved by running the execute step a number of times and averaging, or by repeating steps 3 - 6 a number of times. While averaging is often the right answer, it can sometimes hide performance issues. Perhaps I'll blog about that another day.
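A quick sketch of why averaging can hide issues (the sample timings below are fabricated for illustration): with mostly ~3 ms runs and one bad outlier, the mean barely moves, while the max exposes the problem.

```python
# Simulated per-run timings in ms: mostly ~3 ms with one bad outlier.
timings = [3.0, 3.1, 2.9, 3.0, 3.2, 3.1, 2.8, 3.0, 3.1, 12.0]

mean = sum(timings) / len(timings)
fastest, slowest = min(timings), max(timings)

# The mean looks only mildly elevated; the max reveals the outlier run.
print(f"mean={mean:.2f} ms  min={fastest:.2f} ms  max={slowest:.2f} ms")
```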
Now that you have a solid performance test pattern go forth and create amazing results....