CPU Ready to %RDY Conversion

October 21st, 2010 by jason

Most customers expect a certain level of performance out of their virtual machines, which is typically defined by established Service Level Agreements (SLAs).  Capacity planning, tuning, and monitoring performance all play a role in meeting customer SLAs.  When questioning the performance of a physical machine, one of the first metrics that comes to mind is CPU utilization.  Server administrators are naturally inclined to look at this metric on virtual machines as well.  However, when looking at VM performance, Ready time is an additional metric to be examined from a CPU standpoint.  This metric tells us how much time the guest VM is waiting for its share of CPU execution from the host.

I began learning ESX in 2005 on version 2.0.  At that time, the VMware ICM class focused heavily on leveraging the Service Console.  vCenter Server 1.x was brand new back then and as such, ESXTOP was king for performance monitoring.  In particular, the %RDY metric in ESXTOP was used to reveal CPU bottlenecks as described above.  %RDY reports its statistics as a percentage.  I learned what acceptable tolerances were, I learned when to be a little nervous, and I could pretty well predict when the $hit was hitting the fan inside a VM from a CPU standpoint.  Duncan Epping at Yellow Bricks dedicates a page to ESXTOP statistics on his blog, and at the very beginning you’ll see a threshold he has published which you should keep in the back of your mind.

Fortunately, ESXTOP still exists today (it’s one of my favorite old-school go-to tools).  The Service Console is all but gone; however, you’ll still find resxtop in VMware’s vMA appliance, which is used to remotely manage ESXi (and ESX as well).  But what about the vSphere Client and vCenter Server?  With the introduction of vCenter Server, the disappearance of the Service Console, and the inclination of Windows-based administrators to prefer GUI tools, notable focus has moved away from the CLI approach and toward the vSphere Client (in conjunction with vCenter Server).

Focusing on a VM in the vSphere Client, you’ll find a performance metric called CPU Ready.  This is the vSphere Client metric which tells us how much time the guest VM is waiting for its share of CPU execution from the host, just as %RDY did in ESXTOP.  But when you look at the statistics, you’ll notice a difference.  %RDY in ESXTOP provides metrics in a percentage format.  CPU Ready in the vSphere Client provides metrics in a millisecond summation format.  I learned way back in the ICM class and through trench experience that ~10% RDY (per vCPU) is a threshold to watch out for.  How does a percentage value from ESXTOP translate to a millisecond value in the vSphere Client?  It doesn’t seem to be widely known or published, but I’ve found it explained in a few places: a VMware Communities document here and a Josh Townsend blog post here.

There’s a little math involved.  To convert the vSphere Client CPU Ready metric to the ESXTOP %RDY metric, you divide the CPU Ready value by the length of the rollup interval (both values in milliseconds) and multiply by 100.  What does this mean?  Say for instance you’re looking at the overall CPU Ready value for a VM in Real-time.  The Real-time chart is refreshed every 20 seconds and represents a rollup of values over a 20-second period (that’s 20,000 milliseconds).  Therefore…

  • If the CPU Ready value for the VM is, say, 500 milliseconds, we divide 500 milliseconds by 20,000 milliseconds and arrive at 2.5% RDY.  Hardly anything to be concerned about.
  • If the CPU Ready value were 7,500 milliseconds, we divide 7,500 milliseconds by 20,000 milliseconds and arrive at 37.5% RDY or, assuming a 1 vCPU VM, $hit hitting the fan.

What do I mean above by 1 vCPU VM?  The overall VM CPU Ready metric is the aggregate total of CPU Ready for each vCPU.  This should sound familiar – if you know how %RDY works in ESXTOP, then you’re armed with the knowledge needed to understand what I’m explaining.  The %RDY value in ESXTOP is the aggregate total of CPU Ready for each vCPU.  In other words, if you saw a 20% RDY value in ESXTOP for a 4 vCPU VM, the actual %RDY for each vCPU is 5%, which is well under the 10% threshold we generally watch for.  In the vSphere Client, not only can you look at the overall aggregate CPU Ready for a particular VM (which should be divided by the number of vCPUs assigned to the VM), but you can also look at the CPU Ready values for the individual vCPUs themselves.  It is the per-vCPU Ready value which should be compared with published and commonly known thresholds.  When looking at Ready values, it’s important to interpret the data correctly in order to compare the right data to the thresholds.
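To save the napkin math, here’s a minimal Python sketch of the conversion described above (the function and variable names are my own; the historical chart intervals noted in the docstring are vCenter’s default rollup periods):

def cpu_ready_to_pct_rdy(ready_ms, interval_s=20, vcpus=1):
    """Convert a vSphere Client CPU Ready summation (in milliseconds) to an
    ESXTOP-style %RDY value per vCPU.

    interval_s is the chart's rollup interval: 20 seconds for Real-time and,
    by default, 300 / 1800 / 7200 / 86400 seconds for the past day / week /
    month / year charts respectively."""
    interval_ms = interval_s * 1000
    aggregate_pct = (ready_ms / interval_ms) * 100  # total %RDY for the VM
    return aggregate_pct / vcpus                    # %RDY per vCPU

# The examples from this post (Real-time chart, 20,000 ms rollup):
print(cpu_ready_to_pct_rdy(500))            # 2.5% -- hardly anything to be concerned about
print(cpu_ready_to_pct_rdy(7500))           # 37.5% -- trouble on a 1 vCPU VM
print(cpu_ready_to_pct_rdy(7500, vcpus=4))  # ~9.4% per vCPU -- just under the ~10% threshold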

I’ve often heard the question “how do I convert millisecond values in the vSphere Client to % values in ESXTOP?”  I’ve provided a working example using CPU Ready data.  Understand that it can be applied to other metrics as well.  Hopefully this helps.

Hardware Status and Maintenance Mode

October 20th, 2010 by jason

I’m unable to view hardware health status data while a host is in maintenance mode in my vSphere 4.0 Update 1 environment.

[Screenshot: the Hardware Status tab is unavailable while the host is in maintenance mode]

A failed memory module was replaced on a host but I’m skeptical about taking it out of maintenance mode until I am sure it is healthy.  There is enough load on this cluster such that removing the host from maintenance mode will result in DRS moving VM workloads onto it within five minutes.  For obvious reasons, I don’t want VMs running on an unhealthy host.

So… I need to disable DRS at the cluster level, take the host out of maintenance mode, verify the hardware health on the Hardware Status tab, then re-enable DRS.  It’s a roundabout process, particularly in a production environment that requires a Change Request (CR) with associated approvals and lead time to toggle the DRS configuration.
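For what it’s worth, here’s a minimal sketch of that round trip using the (much newer) pyVmomi Python bindings.  The vCenter, cluster, and host names are hypothetical, and since pyVmomi post-dates vSphere 4.0, treat it as illustrative automation only:

from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

# Connect to vCenter (add SSL handling as appropriate for your environment).
si = SmartConnect(host="vcenter.lab.local", user="administrator", pwd="******")
content = si.RetrieveContent()

def find_by_name(vimtype, name):
    # Walk the inventory for a managed object of the given type and name.
    view = content.viewManager.CreateContainerView(content.rootFolder, [vimtype], True)
    return next(obj for obj in view.view if obj.name == name)

cluster = find_by_name(vim.ClusterComputeResource, "Cluster01")
host = find_by_name(vim.HostSystem, "esx01.lab.local")

def set_drs(enabled):
    # Toggle DRS on the cluster, leaving all other cluster settings alone.
    spec = vim.cluster.ConfigSpecEx(drsConfig=vim.cluster.DrsConfigInfo(enabled=enabled))
    cluster.ReconfigureComputeResource_Task(spec, modify=True)

set_drs(False)                            # 1. disable DRS at the cluster level
host.ExitMaintenanceMode_Task(timeout=0)  # 2. take the host out of maintenance mode
# 3. verify health on the Hardware Status tab (the tasks above are asynchronous,
#    so wait for them to complete in real use), then...
set_drs(True)                             # 4. re-enable DRS
Disconnect(si)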

Taking a look at KB 1011284, VMware acknowledges the steps above and considers the following a resolution to the problem:

Resolution

By design, the host monitoring agents (IPMI) are not supported while the ESX host is in maintenance mode. You must exit maintenance mode to view the information on the Hardware Status tab. To take the ESX host out of maintenance mode:

1. Right-click ESX host within vSphere Client.

2. Click on Exit Maintenance Mode.

Fortunately, VMware has improved upon this design in vSphere 4.1, where I have the ability to view hardware health while a host is in maintenance mode.

vCenter Storage Monitoring Plug-in Disabled

October 18th, 2010 by jason

Those who have upgraded to vSphere (hopefully most of you by now) may have become accustomed to the new tab in vCenter labeled Storage Views.  From time to time, you may notice that this tab mysteriously disappears from a view where it should normally be displayed.  If you’re a subscriber to my vCalendar, you’ll find a tip on July 18th which speaks to this:

Is your vSphere Storage Views tab or host Hardware Status tab not functioning or missing? Make sure the VMware VirtualCenter Management Webservices service is running on the vCenter Server.
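If you’d rather script that check than click through services.msc, here’s a minimal sketch; it assumes it is being run locally on the Windows vCenter Server and simply looks for the service’s display name in the list of started services:

import subprocess

DISPLAY_NAME = "VMware VirtualCenter Management Webservices"

# 'net start' with no arguments lists the display names of running services.
started = subprocess.run(["net", "start"], capture_output=True, text=True).stdout
if DISPLAY_NAME.lower() in started.lower():
    print(DISPLAY_NAME + " is running.")
else:
    print(DISPLAY_NAME + " is NOT running -- start it, then recheck the Storage Views tab.")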

The solution above is easy enough, but what if that doesn’t fix the problem?  I ran into another instance of the Storage Views tab disappearing, and it was not due to a stopped VMware VirtualCenter Management Webservices service.  After a short investigation, I found a failed or disabled vCenter Storage Monitoring (Storage Monitoring and Reporting) plug-in:

[Screenshot: the Plug-in Manager shows the vCenter Storage Monitoring plug-in disabled with a load error]

For those who cannot read the screen shot detail above, and for the purposes of Google search, I’ll paste the error code below:

The plug-in failed to load on server(s) <your vCenter Server> due to the following error: Could not load file or assembly ‘VpxClientCommon, Version=4.1.0.0, Culture=neutral, PublicKeyToken=7c80a434483c7c50’ or one of its dependencies. The system cannot find the file specified.

I performed some testing in the lab and here’s what I found.  Long story short, installation of the vSphere 4.1 Client on a system which already has the vSphere 4.0 Update 1 Client installed causes the issue.  The 4.1 Client installs a file called SMS.dll (dated 5/13/2010) into the directory C:\Program Files (x86)\VMware\Infrastructure\Virtual Infrastructure Client\Plugins\SMS\ overwriting the previous version (dated 11/7/2009).  While the newer version of the SMS.dll file causes no issues and works fine when connecting to vCenter 4.1 Servers, it’s not backward compatible with vCenter 4.0 Update 1.  The result is what you see in the image above: the plug-in is disabled and cannot be enabled.

Furthermore, if you investigate your vSphere Client log files at C:\Users\%username%\AppData\Local\VMware\vpx\, you’ll find a similar entry:

System.IO.FileNotFoundException: Could not load file or assembly ‘VpxClientCommon, Version=4.1.0.0, Culture=neutral, PublicKeyToken=7c80a434483c7c50’ or one of its dependencies. The system cannot find the file specified.

Copying the old version of the SMS.dll file back into its proper location resolves the plug-in issue when connecting to a vSphere 4.0 Update 1 vCenter Server (this much I tested); however, I’m sure it immediately breaks the plug-in when connecting to a vCenter 4.1 Server (I didn’t go so far as to test this).
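If you find yourself bouncing between 4.0 Update 1 and 4.1 vCenter Servers from the same workstation, a crude interim workaround is to keep both copies of SMS.dll on hand and swap in the one matching the vCenter version you’re about to connect to.  Here’s a minimal sketch (the saved-copy filenames are my own convention; the plug-in path is the one noted above):

import shutil
from pathlib import Path

PLUGIN_DIR = Path(r"C:\Program Files (x86)\VMware\Infrastructure"
                  r"\Virtual Infrastructure Client\Plugins\SMS")

def use_sms_dll(saved_copy):
    # Overwrite the active SMS.dll with a previously saved copy, e.g.
    # "SMS.dll.40u1" (the 11/7/2009 build) or "SMS.dll.41" (the 5/13/2010 build).
    shutil.copy2(PLUGIN_DIR / saved_copy, PLUGIN_DIR / "SMS.dll")

# use_sms_dll("SMS.dll.40u1")  # before connecting to vCenter 4.0 Update 1
# use_sms_dll("SMS.dll.41")    # before connecting to vCenter 4.1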

Essentially what this boils down to is a VMware vSphere Client bug which is going to bite people who have both vCenter Server 4.0 and 4.1 in their environment and both clients installed on the same endpoint machine.  I expect to hear about this more as people start their upgrades from vSphere 4.0 to vSphere 4.1.  Some may not even realize they have the issue; after all, I didn’t notice it until I was looking for the Storage Views tab and it wasn’t there.  After lab testing, I did some looking around on the net to see if anyone had discovered or documented this issue, and the only hit I came across was a recently started VMware Communities thread; however, there was no posted solution.  The thread does contain a few hints which would have pointed me in the right direction much quicker had I read it ahead of time.  Nonetheless, time spent in the lab is time well spent as far as I’m concerned.  Unfortunately, there’s no fix here I can offer.  This one is on VMware to fix with a new release of the vSphere 4.1 Client.

Update 12/1/10:  VMware has released KB 1024493 to identify this problem and temporarily address the issue with a workaround:

Installing each Client version in different folders does not work. When you install the first Client, you are asked where you want to install it. When you install the second Client, you are not asked for a location. Instead, the installer sees that you have already installed a Client and automatically tries to install the second Client in the same directory.

To install vSphere Client 4.0 and 4.1 in separate directories:

  1. Install vSphere Client 4.0 in C:\Client4.0.
  2. Copy C:\Client4.0 to an external drive (such as a share or USB).
  3. Uninstall vSphere Client 4.0. Do not skip this step.
  4. Install vSphere Client 4.1 in C:\Client4.1.
  5. Copy the 4.0 Client folder from the external drive to the machine.
  6. Run vpxClient.exe from the 4.0 or 4.1 folder.

I’m expecting a more permanent fix in the future which addresses the .DLL incompatibility in the 4.1 vSphere Client.

Update 2/15/11:  Through some lab testing, it looks as if VMware has resolved this issue with the release of vSphere 4.1 Update 1, although KB 1024493 has not yet been updated to reflect this.  I uninstalled all vSphere Clients, then installed vSphere Client 4.0 Update 1, then installed vSphere Client 4.1 Update 1.  The result is that the vCenter Storage Monitoring plug-in is no longer malfunctioning.  The Storage Views tab is also available.  Both of those items point to a resolution.  The Search function is failing in a different way, but I’m not convinced it has anything to do with two installed vSphere Clients because it is also failing on a different machine which has only one vSphere Client installed.

I’m a VCAP4-DCA

October 14th, 2010 by jason

I couldn’t have asked for a better night:  I attended the Minnesota Wild home opener with VMware, EMC, Tom Becchetti, Scot Joynt, and some great customers, met Paul Hokanson (TC with EMC), and the Wild defeated the Edmonton Oilers 4-2 in convincing fashion.  However, this was not the end of the evening coolness.  I checked my email when I got home and received the following notification from VMware:

Congratulations on passing the VMware Certified Advanced Professional vSphere4 Datacenter Administration exam!

I’m now a VCAP4-DCA.

On short notice, I was offered a chance  to sit the VCAP4-DCA BETA exam before it closed.  I drove 220 miles back in June to sit the exam.  I found the test to be extremely difficult and wasn’t expecting a passing score based on my experience.  I won’t go into the details now about the exam since I’ve already written about that previously.  Oddly, I sat the exam on 6/21/10, yet the date on the transcript shows 21-Jul-10.

I am pleased to have this exam in the books after previously thinking I would have to retake it.  It will allow me to focus on the VCAP4-DCD exam which will uplift my VCDX3 certification to VCDX4 certification.  Yes John Troyer, I am collecting them all.

Update 12/15/10: VMware has notified me that my transcript has been updated in the portal.  When I took a look, I saw I was awarded VCAPDCA-14.  I’m guessing this means #14.  If you don’t know what I’m referring to, VMware assigns sequential numbers to candidates who successfully meet the certification requirements, much like Microsoft did or still does (My MCP # from 1997 is 423097).  My VCP # is 2712 and my VCDX # is 34 (still not reflected in the portal).  On a podcast a few weeks ago, Jon Hall stated a new number would also be assigned for the VCAP4-DCD track.  I haven’t gotten the results of that BETA exam yet.

ESXi 4.x Installable HP Customized ISO Image DNA

October 12th, 2010 by jason

Those of you who are deploying ESXi in your environment probably know by now there are a few different flavors of the installable version you can deploy from:

  • ESXi 4.x Installable (the non-hardware-vendor-specific “vanilla” ESXi bits)
  • ESXi 4.x Installable Customized ISO Image (hardware-vendor-specific bits)
    • ESXi 4.x Installable HP Customized ISO Image
    • ESXi 4.x with IBM Customization
    • ESXi 4.x Installable Dell Customized ISO Image

Each of the major hardware manufacturers does things a little differently with respect to what they bake into ESXi and how.  There doesn’t seem to be much of a standard which the vendors are following.  The resulting .ISO file naming convention varies between vendors and even between builds from a specific vendor.  The lack of standards here can make a library of ESXi releases for a sea of datacenter hardware difficult to keep track of.  It seems a bit careless if you ask me, but there are bigger fish to fry.

This short post focuses specifically on the HP flavor of ESXi.  What’s the difference between ESXi 4.x Installable and the ESXi 4.x Installable HP Customized ISO Image?  The answer is the HP ESXi Offline Bundle.  Essentially, what this means is that if you install ESXi 4.x Installable and then install the HP ESXi Offline Bundle, what you end up with is identical to installing the ESXi 4.x Installable HP Customized ISO Image.

In mathematical terms…

ESXi 4.x Installable + HP ESXi Offline Bundle = ESXi 4.x Installable HP Customized ISO Image
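If you’re starting from the vanilla bits, the HP ESXi Offline Bundle gets applied like any other offline bundle, for example with vihostupdate from the vMA or the vSphere CLI.  Here’s a minimal sketch driving it from Python; the host name, credentials, and bundle filename are placeholders, and the host should be in maintenance mode first:

import subprocess

HOST = "esxi01.lab.local"                  # hypothetical ESXi 4.x host
BUNDLE = "hp-esxi4.x-offline-bundle.zip"   # the offline bundle downloaded from HP

# vihostupdate applies an offline bundle to a remote ESXi 4.x host
# (it will prompt for the password if one isn't supplied).
subprocess.run(
    ["vihostupdate.pl", "--server", HOST, "--username", "root",
     "--install", "--bundle", BUNDLE],
    check=True,
)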

Where are these HP ESXi Offline Bundles?  You can grab them from HP’s web site.  Thus far, HP has been producing an updated version for each release of vSphere.  For reader convenience, I’ve linked a few of the most recent and relevant versions below:

In addition to the above, both ESX 4.1 and ESXi 4.1 on HP systems require an add-on NMI Sourcing Driver which is discussed here and can be downloaded here.  Failure to install this driver might result in silent data corruption.  Isn’t that special.

vSphere Upgrade Path

October 11th, 2010 by jason

Old habits can be hard to break.  Such was the case today when I called out an individual for producing an ESXi 4.0 Update 2 upgrade document without referencing the requirement to upgrade vCenter 4.0 Update 1 to Update 2 first as a prerequisite. 

Up until the release of vSphere 4.0 Update 1 back in November of 2009, the VMware virtual infrastructure upgrade path was such that the vCenter Server was upgraded to the newer release, then the ESX(i) hosts were upgraded afterward.

As shown in the ESX and vCenter Server Compatibility matrix below, beginning with vSphere 4.0 Update 1, ESX(i) hosts can be upgraded ahead of their vCenter Server counterparts.  In fact, VMware allows a radically wider variation in versioning in that vCenter 4.0 (released May 2009, with no update) is compatible with ESX(i) 4.0 Update 2, which was released in June 2010, over a year later.

[Screenshot: ESX and vCenter Server Compatibility matrix]

After being corrected, I recalled hearing of this new compatibility some time back but the bits had fallen off the platter.  For the record, I’m not always right.  I’m fine with being wrong.  It happens plenty enough.  For me, it’s all about the learning.  Retaining the knowledge is an added benefit but isn’t always guaranteed if not used on a regular basis.

This provides some flexibility which may be needed to upgrade smaller groups of clusters or hosts (say, for troubleshooting purposes) without impacting the centralized vCenter Server, which in turn would impact the remaining clusters or hosts it manages by way of agent upgrades blasted out to each attached host.

Before you celebrate in the end zone Dallas Cowboys style, do note from the chart above that the upgrade to vSphere 4.1 reverts to the old methodology of upgrading the vCenter Server first and the attached ESX(i) hosts afterward.  In other words, ESX(i) 4.1 is ONLY compatible with vCenter Server 4.1.

Go Vikings!

Unisphere Client V1.0.0.12 Missing Federation

October 8th, 2010 by jason

A few weeks ago, the EMC Celerra NS-120 was upgraded to DART 6 and FLARE 30, in that order.  Before I get on with this post, let me just say that Unisphere is the bomb and offers at least a few opportunities for complimentary writing to give it the praise it truly deserves.  My hat is off to EMC; they answered the call (or was it the screams?) for unified management of unified storage.

What was my opinion of the old sauce? 

  • Navisphere for CLARiiON block storage management was OK, although it had a few bugs which forced me to resort to NaviCLI once in a while.  Other than that, it looked old and was in need of a face/efficiency lift.  I’ve managed a few enterprise arrays from other vendors which have this same feel, the biggest problem there being no end in sight for the lackluster management and performance-gathering tools.  Some vendors seem content with what they’ve always had, which leads me to a few conclusions:
    • They don’t use their own software
    • The expectation is to use the CLI only
    • Hardware vendors can have outstanding hardware components but that doesn’t make them software developers
    • EMC has bumped it up a notch, at least with Unisphere – I can’t speak to Symmetrix management as I have no experience there
  • Celerra Manager for management of the Data Movers/iSCSI/NFS/CIFS was bug free, but very slow at times, particularly at first login.
  • Seasoned CLARiiON and Celerra TCs (as well as NetApp pros) might laugh at my tendency to rely on GUI tools, but my storage management tasks are so few and far between that relearning the CLI for a seldom-performed task isn’t time well spent unless the tasks are going to be repeated often enough.

I’ve had some legacy Celerra software CDs sitting next to me in my den for several months (Navisphere, Celerra Network Server, etc.) and I will have no problem banishing them to the basement, probably not to be touched again until the next time the basement is cleaned out.  So look for some positive Unisphere posts from me in the future as I get the time.

Getting back on topic…  Earlier today I finished taking a look at Nicholas Weaver’s SRM video.  Later, I was in the lab playing around with the EMC Celerra UBER VSA 3.2 (it’s the latest craze, you really must check it out).  I noticed a Unisphere feature Nicholas highlighted in his video which I don’t have on the Celerra NS-120’s build of Unisphere – the ability to federate storage array management in Unisphere via a single pane of glass.

The UBER VSA has the ability to snap multiple storage arrays into the Unisphere dashboard by way of an Add button:

[Screenshot: the Unisphere dashboard in the UBER VSA, showing the Add button]

The Add button is missing in the Celerra NS-120’s build of Unisphere:

[Screenshot: the Unisphere dashboard on the Celerra NS-120, with no Add button]

The DART versions match at 6.0.36-4; however, the outstanding difference appears to be the Client Revision.  What’s worth pointing out is that the Add feature exists in the older client revision found in the UBER VSA, but is missing in the newer client revision found on the Celerra NS-120, which was upgraded a few weeks ago.

[Screenshot: Unisphere Client Revision comparison between the UBER VSA and the Celerra NS-120]

I’m not sure if federation of multiple arrays was purposely removed or if it was an oversight, but it would be nice to get it back.  I should also point out that although federation appears to be missing across multiple arrays, it still exists within a single unified storage array, consolidating management of the CLARiiON block and the Celerra iSCSI/NFS/CIFS components.

Update 3/4/11:  The Celerra NS-120 is now running DART 6.0.40-8, FLARE 04.30.000.5.511,7.30.10 (4.1), and Unisphere V1.0.0.14.  The Add feature to tie multiple EMC storage frames into a single view is still missing.