Sunday, October 30, 2011

Old performance adage... Polling is bad

It's long been known that polling is bad.  It uses a ton of resources.  The challenge is that it trips up even great developers.  Check out... http://m.guardian.co.uk/technology/2011/oct/29/iphone-4s-battery-location-services-bug?cat=technology&type=article

One way to catch this is to have a good set of resource monitoring tests.  It's very likely Apple had these; however, it's hard to catch everything with so many ways to configure software.  This is where collecting those same resource measurements from released devices can help (crowd-sourced testing).  Check out, for example, Microsoft's SQM (aka Customer Experience Improvement Program) data.

Should you decide to collect telemetry, just remember the second adage... Bad collection is like polling.
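
To make that second adage concrete, here is a minimal sketch (in Python, with hypothetical function names I made up for illustration) contrasting a polling-style collector, which wakes the device on a timer whether or not anything happened, with an event-driven collector that batches samples and uploads only when there is something worth sending:

  import queue
  import time

  # Polling-style collector: wakes up every second even when there is
  # nothing to report -- this is the pattern the adage warns about.
  def polling_collector(read_sample, upload, interval_s=1.0):
      while True:
          upload([read_sample()])      # one tiny upload per wakeup
          time.sleep(interval_s)       # burns CPU/radio/battery on a timer

  # Event-driven collector: samples are pushed when something actually
  # happens and uploaded in batches, so the device can stay idle otherwise.
  def batching_collector(events: queue.Queue, upload, batch_size=50):
      batch = []
      while True:
          batch.append(events.get())   # blocks until an event occurs
          if len(batch) >= batch_size:
              upload(batch)            # one upload per 50 events
              batch = []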

Friday, October 28, 2011

Crowd sourcing Apple iPhone 4S power performance

Interesting way to solve the issue...

http://m.guardian.co.uk/technology/2011/oct/28/iphone-4s-battery-apple-engineers?cat=technology&type=article

Monday, October 17, 2011

Performance Test Pattern

One of the biggest challenges in monitoring and tracking performance is getting stable and repeatable numbers.  Check out the following plot of two performance tests.  On which do you think it will be easier to spot regressions?


There are two tests here: test 1, which looks pretty erratic, and test 2, which looks pretty stable and repeatable around 3 ms.  Given this I'm pretty sure you are going to choose test 2 on which to monitor and track performance.  The test is very stable, meaning the variance is very low, and repeatable because it does not drift off over time.  For an example of drift check out the following:


Here you can see the results are pretty stable, in that the overall variance from number to number is pretty low, however the results are not very repeatable and seem to drift up over time.  For this particular test high ping time is bad.  This is also an example of "death by a thousand cuts," where from test to test results look good but over long periods of time you see performance is dropping off.
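
As a rough illustration (not any particular tool the post uses), you can measure the two properties separately: the variance of a series tells you how stable it is, and a fitted slope over time tells you whether it is drifting.

  from statistics import mean, pvariance

  def stability_and_drift(results_ms):
      """Return (variance, slope per run) for a series of timings."""
      n = len(results_ms)
      xs = range(n)
      x_bar, y_bar = mean(xs), mean(results_ms)
      # Least-squares slope: a positive slope on a "lower is better"
      # metric means the test is drifting the wrong way over time.
      slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, results_ms)) / \
              sum((x - x_bar) ** 2 for x in xs)
      return pvariance(results_ms), slope

  # Stable but drifting, like the ping example: low run-to-run variance,
  # yet the slope shows it creeping upward.
  print(stability_and_drift([3.0, 3.1, 3.2, 3.4, 3.5, 3.7]))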

So the question then becomes: how do you make stable and repeatable performance tests?  The answer is to follow a test pattern like the xUnit pattern with a couple of extra steps. The pattern is the following:

  1. setup 
  2. warmup
  3. execute
  4. something most tests forget
  5. publish
  6. cleanup
  7. teardown
Notice the following additional steps - warmup, step 4, publish, cleanup.  Now let me explain these steps.
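
Before walking through them, here is a minimal sketch of the pattern as a tiny Python harness.  The class and method names are mine, not part of any standard framework, and the stub bodies are placeholders for whatever your test actually does:

  import time

  class PerfTest:
      def setup(self): ...                  # 1. the usual xUnit setup
      def warmup(self): ...                 # 2. prime caches/connections for "hot" numbers
      def execute(self): ...                # 3. the measured work; returns a result to check
      def validate(self, result): ...       # 4. the step most tests forget
      def publish(self, elapsed_ms): ...    # 5. push into your tracking infrastructure
      def cleanup(self): ...                # 6. undo side-effects so the run can repeat
      def teardown(self): ...               # 7. the usual xUnit teardown

  def run_once(test):
      test.setup()
      test.warmup()
      start = time.perf_counter()
      result = test.execute()
      elapsed_ms = (time.perf_counter() - start) * 1000
      if test.validate(result):             # never publish a bad run
          test.publish(elapsed_ms)
      test.cleanup()
      test.teardown()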

Warmup - This step is here to allow the performance test to "warm up" the system under test.  For example, if you want to measure database queries you generally have to decide if you want hot numbers (most likely the common case), where the database has been in use for a while, or cold numbers, which is the state right after boot/init/etc.  By having warmup you can test both hot and cold cases by the addition or removal of this step.  An example might be selecting 10 rows from a database before doing the general select tests.
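
A hedged sketch of that 10-row warmup, using Python's built-in sqlite3 and an assumed orders table (both are illustrative, not from the original post):

  import sqlite3

  def warmup(conn: sqlite3.Connection):
      # Throwaway query: pull 10 rows so the connection, pages and caches
      # are warm before the real measurements start.  Skip this call to
      # measure the "cold" case instead.
      return conn.execute("SELECT * FROM orders LIMIT 10").fetchall()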

Step 4 - Ahh... the mystery.  What is step 4?  Take a quick look back at the first graph.  Any ideas?  Well the answer is VALIDATE.  Most performance tests forget to validate the results they are getting.  In the previous step of warmup we said to select 10 rows.  Did the test actually return 10 rows?  If not there is likely some error.  Be sure to check your results and don't publish them if there was an error.  Generally on performance graphs invalid results show up as super high, zero, or super low numbers.
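
Continuing the hypothetical 10-row example, a sketch of that validation: check the shape of the result and refuse to publish a timing from a failed run.

  def validate(rows, expected=10):
      # If the query errored or returned the wrong number of rows, the
      # timing is meaningless -- it would show up on the graph as a
      # near-zero or absurdly high point -- so don't publish it.
      return rows is not None and len(rows) == expected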

Publish - This is the act of pushing the result into your tracking infrastructure.  Performance results tend to have a limited lifetime of usefulness, but there is always good cause to be able to look back over time.
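
Your tracking infrastructure will look different; purely as an illustration, here is the simplest possible publish step, appending a timestamped result to a CSV file that a dashboard or spreadsheet can chart over time:

  import csv
  import datetime

  def publish(test_name: str, elapsed_ms: float, path="perf_results.csv"):
      # Append one row per run; keeping the history lets you look back
      # over time even after the individual result stops being "news".
      with open(path, "a", newline="") as f:
          csv.writer(f).writerow(
              [datetime.datetime.now().isoformat(), test_name, elapsed_ms])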

Cleanup - Cleanup is like teardown without exiting all layers of initialization.  Generally the role of cleanup is to get things put back in order so the test can be run again with minimal side-effects.  For cold performance results you will need a full teardown.
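
To make the cleanup/teardown distinction concrete, a sketch continuing the hypothetical sqlite example, assuming the test writes its rows into a scratch table named perf_scratch:

  def cleanup(conn):
      # Put the data back the way it was, but keep the connection (and the
      # warm caches) alive so steps 3-6 can repeat immediately.
      conn.execute("DELETE FROM perf_scratch")
      conn.commit()

  def teardown(conn):
      # Full teardown: drop everything and close, so the next run starts cold.
      conn.execute("DROP TABLE IF EXISTS perf_scratch")
      conn.commit()
      conn.close()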

While execute is not a new step in the performance pattern, I wanted to mention it because often in performance tests you want "stable" numbers.  This is generally achieved by running the execute step a number of times and averaging, or by repeating steps 3 - 6 a number of times.  While averaging is often the right answer, it can sometimes hide performance issues.  Perhaps I'll blog on that another day.
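
One way to get those "stable" numbers, sketched below with names of my own choosing: repeat the execute step, report the average, but also keep the median and worst case so the averaging doesn't quietly swallow a slow outlier.

  import time
  from statistics import mean, median

  def measure(execute, iterations=30):
      timings = []
      for _ in range(iterations):
          start = time.perf_counter()
          execute()
          timings.append((time.perf_counter() - start) * 1000)
      # The mean smooths the graph, but publish the spread too --
      # a single bad outlier disappears into a mean of 30 runs.
      return {"mean_ms": mean(timings),
              "median_ms": median(timings),
              "max_ms": max(timings)}

  print(measure(lambda: sum(range(100_000))))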

Now that you have a solid performance test pattern go forth and create amazing results....

  Tony