Friday, October 26, 2012

Google Compute Engine breaks Hadoop Terasort World Record

While I've been light on posts as of late, I thought I'd share a cool result that Google Cloud and MapR accomplished together.

This result came out of a lot of performance analysis, tuning, and much more, but nothing about the setup is specialized.  You get the same experience without any effort - just boot virtual machines and go.

Here is a press release on the accomplishment:

The Cloud is powerful.


PS.  My Google Cloud Security, Performance and Test team is hiring :)

Wednesday, October 3, 2012

Testing 2.0: Enter the human

Over the last couple of weeks I've started writing and speaking on what I'm calling Testing 2.0.  To get an overview of this new chapter in test, check out this post.

As a follow-on to the Google Testing Blog post, I had the honor of speaking at YaC in Russia.  The talk focused on how to improve engineering productivity, one of the focuses of Testing 2.0.  You can see the talk here (video soon).

On a side note... visiting Russia was awesome.

Saturday, March 24, 2012

Ads can steal your power - Mobile trade-offs

In reading the article "Study: Free Android Apps Can Steal Your Phone's Power" the other day, I was reminded of all the trade-offs one has to make when designing mobile applications.  Before we dig into some of those trade-offs, one does have to wonder about the purpose of one company doing a sanctioned study of another company's products. We'll leave digging into that topic for another day.

Now back to the topic of mobile trade-offs.  Some might argue this, but the single most important thing to design for is minimal power consumption.  Power is so important on mobile platforms because users really don't want to hang out at charging pods in airports, plug in on the train, or plug in at a friend's house.  One of the big delighters of the original Kindle was that it could run for weeks on a single charge.  The newer Kindle Fire lasts less than a day, like most current-generation platforms.  The new iPad 3 has a battery almost twice the size without any additional device life (~10 hours).  So where is all that power going?  The two biggest draws on power are the screen and the radio.  You have some control over the screen's power consumption: the dimmer you go, the longer you go.  The iPad 3 has a super-dense screen and new graphics processors.  I wonder how many people would keep the old screen and chips in exchange for 20 hours of use on an iPad 2?

I have a Windows Phone 7 (yes, I am willing to admit I have one - I'm a techie) and just use it as an in-house Wi-Fi device.  What totally surprised me is that the phone lasts for weeks with the radio off.  The first time that happened I was pretty surprised by just how much impact the radio had.

While I've talked about the screen and radio, we can't ignore use of the processor (CPU).  Most mobile devices have ARM chips which do all kinds of cool things, like clocking the CPU lower when not in use, having low-power cores for when the phone is idle, etc.  The take-away is that doing really CPU-intensive work, like my fractal app, will drain your power.  It has drained my phone because zooming in is so cool :)

So what are some of the trade-offs for mobile you should be thinking about?
  • Do you really need to send data to your backend server every 10 seconds, or would once an hour be good?  Not only will this save power, it will also cost your customers less of their network bandwidth.  (Rule #1 - Use the radio less)
  • Does having a white background really make the app better? Can you choose dimmer colors? If the app is idle, should you dim the screen? (Rule #2 - Lighting up more pixels uses more power)
  • What about pushing more computation to the server rather than having the phone do it? (Rule #3 - Don't use the CPU for big calculations)
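Rule #1 can be sketched with a toy simulation: batching readings locally and flushing them once cuts radio sessions dramatically compared with uploading every 10 seconds. The class, numbers, and the commented-out upload call below are all hypothetical, just to illustrate the idea.

```python
class BatchingUploader:
    """Queue readings locally and wake the radio once per flush interval,
    instead of once per data point (Rule #1: use the radio less)."""

    def __init__(self, flush_interval_sec=3600):
        self.flush_interval = flush_interval_sec
        self.queue = []
        self.radio_wakeups = 0

    def record(self, reading, now):
        self.queue.append((now, reading))
        # Flush only when a full interval's worth of data has accumulated.
        if now - self.queue[0][0] >= self.flush_interval:
            self.flush()

    def flush(self):
        if self.queue:
            self.radio_wakeups += 1   # one radio session for the whole batch
            # send_to_server(self.queue)  # hypothetical upload call
            self.queue = []

# Simulate one hour of sensor readings taken every 10 seconds.
naive_wakeups = 0
uploader = BatchingUploader(flush_interval_sec=3600)
for t in range(0, 3600, 10):
    naive_wakeups += 1        # send-every-10-seconds design: one wakeup each
    uploader.record(42, now=t)
uploader.flush()

print(naive_wakeups, uploader.radio_wakeups)  # 360 radio wakeups vs 1
```

Same data delivered either way; the batched design just lets the radio sleep between flushes.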
The last point, doing work on the server rather than on the phone, is one of the reasons Google App Engine is growing so quickly for mobile.  The more you can do on the server, the more power will be saved.  It's also very likely the computation will go much faster on the server.  The trade-off on the developer end is how to manage cost while getting the best customer experience.  By the way, computation in the Cloud is far "greener" than on any other computing device.  But that is a topic for another day.

Hope this got you thinking....

     Anthony F. Voellm (aka Tony)

Friday, March 2, 2012

Old rules of thumb always need to be reconsidered

After being in the computer industry for a while you begin to appreciate just how much machine capabilities change, and the need to change designs along with them.  For example, just 10 years ago developers would spend hours trying to find ways to save a few bytes of memory.  Now most of the code the world runs goes through interpreters and JIT runtimes (Python, PHP, JScript, ...), and a few bytes is less interesting.  I'm not saying to waste them, but I personally would not put it as my first priority.

Let me give you an example of how changes in machine capabilities caused the rethinking of an OS.  In Windows XP, Microsoft engineers designed the memory manager to aggressively push data from main memory (RAM) to disk.  This was done because RAM was costly and very small (~128MB) at the time, so if more memory could be freed up, new applications could start faster.  If you waited until an application started to free memory, users would wait 30 seconds to minutes before the application was usable because of paging RAM to disk.  Between the time Windows XP and Vista shipped, RAM prices dropped dramatically (from $40 for 128MB to just $2).

With the dramatic change in memory prices, and the fact that disks did not really get any faster, Vista fundamentally broke from the past rule of thumb of freeing up as much RAM as possible by pushing it to disk, to just the reverse.  RAM was cheap and relatively plentiful, so a feature called SuperFetch was created to aggressively page data from disk into RAM.  Thanks to the decision not to force RAM to disk, overall UI performance seemed snappier in Vista.  No more shaking the mouse after lunch with XP and waiting a minute or more before logging in.
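A toy model makes the trade-off concrete. The per-page costs below are made-up numbers, not real measurements; the point is only the orders-of-magnitude gap between starting an app from disk versus from prefetched RAM.

```python
# Toy model: "disk" reads are slow, RAM hits are fast (hypothetical costs).
DISK_MS = 10.0    # per page faulted in from disk
RAM_MS = 0.01     # per page already resident in RAM

def start_app(pages_needed, ram_cache):
    """Simulated startup cost (ms) for an app touching `pages_needed` pages."""
    cost = 0.0
    for page in pages_needed:
        cost += RAM_MS if page in ram_cache else DISK_MS
    return cost

app_pages = set(range(500))

# XP-style: RAM was freed aggressively, so startup pages come from disk.
cold_cost = start_app(app_pages, ram_cache=set())

# SuperFetch-style: likely-needed pages were prefetched into spare RAM.
warm_cost = start_app(app_pages, ram_cache=app_pages)

print(f"cold start: {cold_cost:.0f} ms, prefetched start: {warm_cost:.0f} ms")
```

Once RAM is cheap enough to hold those pages speculatively, the old "free as much RAM as possible" rule inverts, exactly as Vista's designers concluded.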

Well, it looks like with the improved performance of CPUs and networks, old rules of thumb around UI responsiveness are starting to be reconsidered.  Early UI research, by Miller in 1968 and Card in 1991, led to rules of thumb for UI regularly cited in "Response Times: The 3 Important Limits" and extended for the World Wide Wait, I mean Web.

Here is a recap of those rules, plus a few more that have been adopted from experience and very likely from papers I've read long ago and forgotten:

  • Users consider 100ms response times fast
  • At around 1 second users will notice a delay but are tolerant
  • At 5 seconds users are starting to get impatient and may take action
  • At 10 seconds they lose focus
  • At 15 seconds they are likely to hit “refresh”
  • At 30 seconds they generally navigate away and don’t come back if there is an acceptable alternative.
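Those thresholds are easy to encode as a quick sanity check against measured latencies. The helper below is just my own sketch of the list above; the reaction strings are paraphrases, not quotes from the research.

```python
# Rule-of-thumb thresholds (seconds) paired with the expected user reaction.
THRESHOLDS = [
    (0.1, "feels instant"),
    (1.0, "noticeable delay, still tolerant"),
    (5.0, "getting impatient"),
    (10.0, "losing focus"),
    (15.0, "likely to hit refresh"),
    (30.0, "navigating away"),
]

def user_reaction(latency_sec):
    """Map a measured latency to the first rule-of-thumb bucket it fits."""
    for limit, reaction in THRESHOLDS:
        if latency_sec <= limit:
            return reaction
    return "gone, probably for good"

print(user_reaction(0.08))   # feels instant
print(user_reaction(3))      # getting impatient
print(user_reaction(45))     # gone, probably for good
```

A table like this is handy in performance tests: assert that your page's measured latency lands in the bucket you intend, and the test fails the moment a regression pushes users into the next one.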
Well, it looks like even hard-earned rules of thumb for UI and the Web are now falling, as seen in a recent NYTimes article "For Impatient Web Users, an Eye Blink Is Just Too Long to Wait". Based on this article, it looks like 250 milliseconds is the new goal for web responsiveness, rather than the 1 second we had all used.

The overall moral of the story is: don't hold on too dearly to those rules of thumb, and perhaps you should rethink them often.

  -- Anthony F. Voellm

Monday, February 27, 2012

Fix security bugs early - Interesting paper

Interesting paper - find security bugs before release, because of the high cost to fix them later.  Internet apps change some of the cost dynamics; however, that does not mean fixing early is less important, because it's hard to fix your reputation.

Monday, November 7, 2011

A look at the Fundamentals in the Cloud

If you are interested in the Cloud and testing, the following is a talk I gave at GTAC 2011 that might interest you.

Part the Clouds and See Fact from Fiction

Sunday, October 30, 2011

Old performance adage... Polling is bad

It's long been known that polling is bad.  It uses a ton of resources.  The challenge is that it trips up even great developers.  Check out...
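A minimal sketch of why, using Python threads: the busy-polling loop churns through wasted flag checks while the event-based wait simply sleeps in the OS until it is signalled.

```python
import threading
import time

def busy_wait(flag):
    """Busy polling: burns CPU re-checking a flag in a tight loop."""
    checks = 0
    while not flag["done"]:
        checks += 1   # every iteration is wasted work
    return checks

def event_wait(event):
    """Event-driven: the thread blocks in the OS until signalled."""
    event.wait()      # near-zero CPU while waiting
    return 1          # woke up exactly once

done_event = threading.Event()
flag = {"done": False}

def worker():
    time.sleep(0.1)           # simulate some background work
    flag["done"] = True
    done_event.set()

t = threading.Thread(target=worker)
t.start()
polls = busy_wait(flag)       # spins until the worker finishes
wakeups = event_wait(done_event)
t.join()

print(f"polling checked the flag {polls} times; event wait woke {wakeups} time")
```

The polled version does thousands of pointless checks in a tenth of a second; scale that up to background services on a phone and the battery impact is obvious.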

One way to catch this is to have a good set of resource monitoring tests.  It's very likely Apple had these; however, it's hard to catch with so many ways to configure software.  This is where collecting these same resources from released devices can help (crowd-sourcing test).  Check out, for example, Microsoft's SQM (aka Customer Experience Improvement Program) data.

Should you decide to collect telemetry, just remember the second adage... Bad collection is like polling.