Sunday, February 7, 2010

Why VMware's VMmark Scores Have Become Useless

Well, it was bound to happen.  Every time an industry benchmark standard comes out, the manufacturers eventually figure out ways to "cook the books".  I've seen a LOT of FUD flying around from both HP and Cisco lately about VMmark scores and I have been asked a lot of questions about both platforms.  After taking a close look at the scores, I'm ready to throw in the towel.

Before I go further, take a look at the VMmark 8-core scores posted here.  You will see the Cisco B200 Blade is on top right now (of the major vendors, I don't count Fujitsu, sorry Fujitsu) with 25.06 and the HP BL490 is next with 24.54.  A couple of points:

What is the difference between really really fast and really really really fast?

What is the difference between 25.06 and 24.54?  About 2%.  Honestly, that isn't much if both blades meet your needs and you won't be pushing them to their limits.  I'm sorry, but that is within a margin of error, and either vendor could probably reconfigure the test to match the other's score.  At the end of the day both of them will meet your needs very well and the title of "fastest blade" means nothing!
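To put a number on that gap, here is a quick sketch of the arithmetic.  The two scores come from the results quoted above; the function name is just something I made up for illustration.

```python
def percent_gap(high: float, low: float) -> float:
    """Return how much larger `high` is than `low`, as a percentage of `low`."""
    return (high - low) / low * 100.0


cisco_b200 = 25.06  # VMmark score quoted above
hp_bl490 = 24.54    # VMmark score quoted above

gap = percent_gap(cisco_b200, hp_bl490)
print(f"Gap between the two scores: {gap:.1f}%")
```

Whether a gap that small matters depends entirely on how close to the limit you plan to run the blades.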

Both Cisco and HP sell "big memory solutions" but they are nowhere to be seen!

Take a look at the memory in the details for both of them.  Both the HP BL490 and the B200 use 96GB of memory.  Where is the B250 with the larger memory footprint?  Where is the BL490 with either 144GB or 192GB of memory?  You will also notice that the HP BL490 memory is running at 1333MHz and the B200 memory is running at 1066MHz.  Since there is no big jump in performance numbers, the VMmark score isn't memory bandwidth bound or HP would have had an advantage.  I suspect (although I don't have proof) that the VMmark score is now CPU bound and any memory above 96GB doesn't help the scores.  I further think (again, no proof) that the test isn't pushing the maximum memory bandwidth because there is no meaningful change between 1333MHz and 1066MHz.  It would be interesting to see if a drop to 800MHz by HP would be noticed in the scores.

Cisco is using an EMC SAN with SSDs on the back end!

Take a look at the EMC Storage section on the Cisco benchmark.  They are using an EMC CX-240 with SSD drives!  There is NOTHING wrong with this.  SSDs are coming down in price, but they provide a clear, known advantage in IOPS that could easily be the sole reason for the 1%-2% increase.  I'm willing to bet that if HP used the same storage configuration, they would produce similar scores.

Why didn't Cisco use the Palo card?

Cisco is using the QLogic CNA for the tests.  Why didn't they use the Palo card?  I suspect it is because the Palo card isn't "technically" released yet, but that is the benchmark everyone wants to know about.

What am I trying to say here?

What I'm saying is that both HP and Cisco make great products and they will go to great lengths to make the other look bad.  They are so close to each other from a VMmark score perspective that any clear difference can't be shown with the current test.  Don't make a purchase based on a score!

Saturday, February 6, 2010

HP Blades Offer a 16GB DIMM, With a Catch

I found out something interesting on Friday: HP is offering a 16GB DIMM in their blades!  My first thought was wow, that sucker is gonna be expensive (and it is!).  But after digging deeper, as I always do, I found out something that is slightly disturbing.  The 16GB DIMM is actually a quad rank DIMM, not a dual rank DIMM, and it only runs at 1066MHz.  Many of you are saying... So what?

Well, this actually can make a difference from a design perspective.  Take a look at the Memory tables in the BL460 and BL490 QuickSpecs and you will see what I mean.

HP BL460 G6 QuickSpecs
HP BL490 G6 QuickSpecs

Most of the DIMMs sold by the major vendors are dual rank and run at 1333MHz.  Let me explain the concept of ranks first.  On the Intel Nehalem architecture, you can only have 8 ranks per memory channel.  Each memory channel holds either 2 or 3 DIMMs.  I have more information on memory layout in this article I wrote on Scott Lowe's site a while back.  I never really discussed ranks because it was a limit you didn't hit.  You were only using either 4 ranks (2 x dual rank DIMMs per channel on the BL460) or 6 ranks (3 x dual rank DIMMs per channel on the BL490).  Quad rank DIMMs blow that math out of the water.  You can only put 2 on a memory channel before you hit 8 ranks.  This means the BL490 no longer brings extra memory capacity to the table.  Both the BL460 and BL490 top out at 192GB with the 16GB DIMMs.
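The rank math is easy to check with a few lines of Python.  This is only a sketch: the 8 rank limit and the 2 or 3 slots per channel come from the paragraph above, while the two sockets and three memory channels per socket are my assumptions about these blades, and the function names are made up for illustration.

```python
MAX_RANKS_PER_CHANNEL = 8  # limit quoted in the post
SOCKETS = 2                # assumption: two-socket blade
CHANNELS_PER_SOCKET = 3    # assumption: three memory channels per socket


def max_dimms_per_channel(slots_per_channel: int, ranks_per_dimm: int) -> int:
    """DIMMs usable per channel: limited by physical slots and by the rank budget."""
    return min(slots_per_channel, MAX_RANKS_PER_CHANNEL // ranks_per_dimm)


def max_capacity_gb(slots_per_channel: int, ranks_per_dimm: int, dimm_gb: int) -> int:
    """Total blade capacity in GB for one DIMM type."""
    dimms = max_dimms_per_channel(slots_per_channel, ranks_per_dimm)
    return SOCKETS * CHANNELS_PER_SOCKET * dimms * dimm_gb


# BL460 has 2 slots per channel, BL490 has 3 slots per channel
bl460_quad_16 = max_capacity_gb(slots_per_channel=2, ranks_per_dimm=4, dimm_gb=16)
bl490_quad_16 = max_capacity_gb(slots_per_channel=3, ranks_per_dimm=4, dimm_gb=16)
bl490_dual_8 = max_capacity_gb(slots_per_channel=3, ranks_per_dimm=2, dimm_gb=8)

print(bl460_quad_16, bl490_quad_16, bl490_dual_8)
```

With quad rank DIMMs the third slot per channel on the BL490 goes unused, which is why both blades land on the same number.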

Memory speed is the next issue.  If you are running a memory bandwidth intensive application, you can expect about a 7% boost in performance by keeping the memory speed at 1333MHz instead of dropping down to 1066MHz.  Because the maximum speed of the 16GB DIMM is 1066MHz, you will never reach 1333MHz.  Furthermore, populating both slots in each memory channel (the max of 12 DIMMs) drops the speed from 1066MHz to 800MHz.  The performance drop from 1333MHz to 800MHz is over 30%!!  This leads to an interesting trade-off of memory capacity vs. bandwidth.

While I applaud HP for thinking outside the box and bringing a 16GB DIMM to market, don't assume it is the same DIMM as the others.  Remember, "One of these is not like the others...."

My First Race (with Doughnuts)!!

Today I ran my first race!  I am very excited but this race also requires an explanation and some pictures.  A few years ago some NC State University college students started a challenge.  You had to race from the Bell Tower on campus to the local Krispy Kreme (about 2.2 miles), eat a dozen doughnuts, keep them down, and run back, all in an hour.

I am training for my first 10k coming up at the end of March in Charleston, SC so I thought this would be a good warm up for me.  The event is local, it was the right distance, and it is a little crazy and different so I was in.  I registered just in time because registration was cut off at 6,000 runners the day after I signed up.  Krispy Kremes from all over the state made over 72,000 doughnuts and trucked them in for the race.  I signed up for the casual group, meaning that I didn't have to eat the full dozen.  I ended up eating three.  I ran the race in 1:14 but that really doesn't matter because you stop halfway through the race and hang out, eat doughnuts, drink water, and just enjoy the atmosphere.  Some highlights of the race to prove it is a little different:

  • Mother and Daughter with KK T-shirts and the words "Family Bondin Southern Style" written on them
  • The quote "3 hours sleep, 9 beers, 12 doughnuts, I'm starting to feel bad"
  • We had the "privilege" of seeing two challengers "lose" their dozen doughnuts
  • Got to see Fat Bastard, Elvis, Superman, Wonder Woman (it was a guy), Elmo, Cookie Monster and many people running in just shorts or underwear in 30-40 degree weather
  • The famous quote "I'm never eating another doughnut again" more than once
Here are some pics:

 
 

Thursday, February 4, 2010

Cisco UCS - How Many FEX Uplinks Do I Need?

As a Data Center Architect, I have a constant need to know not just HOW to do things, but WHY to do things.  As I dig deeper into the Cisco UCS system I find the concept of FEX (Fabric Extenders) very fascinating.  The number of FEX uplinks may not seem like much, but a couple of cables can have a very significant impact on the design of a UCS system.  If you need a refresher on how a UCS system is set up, please see my first article for more information.

What is a Fabric Extender (FEX)?

In very simple terms, the FEX serves as the "pipe" between the blades in a UCS chassis and the 6100 Fabric Interconnects (FI's).  Each FEX has a maximum of four 10GB connections.  Think of it this way, you can "choose" your bandwidth back to the FI's in one, two, or four 10GB increments (the three uplink option isn't supported).  If you plug in one, you get 10GB bandwidth spread over 8 blades.  Need more bandwidth? Plug in the second to get 20GB.  If you really need the maximum then you can go for all four connections for a total of 40GB per FEX.  Remember, each UCS chassis contains two FEX's and each FEX is connected to one (and only one - don't cross connect them!) 6100.  If you plug in all eight connections, you achieve a maximum of 80GB to the chassis or 10GB per blade.  If you are interested in how the traffic flows from the blades to the FEX port (referred to as pinning), here is a great link from Rodney Haywood that details this relationship.  Here is a picture of a FEX close up:

 

Here is how you would cable them (don't cross connect them!!)

Why does this matter?

FEX uplinks directly affect three different areas:
  1. The bandwidth from the 6100's to the chassis
  2. The number of chassis supported per pair of 6100's
  3. The number of vNICs supported by the upcoming UCS Palo blade card.
Let's tackle each of them one by one

1. Maximum Bandwidth per Chassis based on FEX Uplinks -> (FEX's * Uplinks *10)

To calculate the amount of bandwidth available to a chassis: (FEX's * Uplinks * 10).  So, if I have two FEX's, each with 2 uplinks, I have 40GB at my disposal (2*2*10).
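Here is the same formula as a small Python sketch.  The supported uplink counts (1, 2, or 4) and the 10 per uplink figure come from the text above; the function and constant names are my own.

```python
BANDWIDTH_PER_UPLINK = 10        # each FEX uplink is a 10GB connection
SUPPORTED_UPLINKS = (1, 2, 4)    # three uplinks per FEX is not supported


def chassis_bandwidth(fex_count: int, uplinks_per_fex: int) -> int:
    """Total bandwidth to a chassis: FEX's * Uplinks * 10."""
    if uplinks_per_fex not in SUPPORTED_UPLINKS:
        raise ValueError(f"{uplinks_per_fex} uplinks per FEX is not a supported option")
    return fex_count * uplinks_per_fex * BANDWIDTH_PER_UPLINK


for uplinks in SUPPORTED_UPLINKS:
    print(f"2 FEX's x {uplinks} uplink(s): {chassis_bandwidth(2, uplinks)}")
```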

2. The number of chassis supported per 6100 is inversely proportional to the number of FEX uplinks -> ((total ports - uplinks) / FEX Uplinks per chassis)

Every time you use more than one FEX uplink, you actually reduce the number of chassis you can plug into the system.  Let me use a simple example.

Let's say you have a 20 port Cisco 6120 with the 6x10GB module in it for a total of 26 ports.  To make the math simple, you decide to dedicate six 10GB links for northbound (out of the chassis) traffic.  You can support up to 20 chassis by using a single FEX uplink per chassis.  What if you need a second FEX uplink?  The number of chassis goes down to 10 because you need two ports per chassis but you only have 20 ports to physically plug into.  If you need 4 FEX uplinks, then you can only support 5 chassis per 6120.

To calculate the maximum number of chassis, this is the formula to use: (total ports - uplinks) / FEX Uplinks per chassis.  To use the example above, a 6120 with 6 northbound uplinks and 4 FEX uplinks per chassis yields 5 chassis ((26-6) / 4 = 5)
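A quick sketch of that formula, using the 26 port 6120 example from above.  The function name is made up, and I am using integer division on the assumption that a partially cabled chassis doesn't count.

```python
def max_chassis(total_ports: int, northbound_uplinks: int, fex_uplinks_per_chassis: int) -> int:
    """(total ports - uplinks) / FEX uplinks per chassis, rounded down."""
    return (total_ports - northbound_uplinks) // fex_uplinks_per_chassis


TOTAL_PORTS = 26         # 20 fixed ports plus the 6x10GB module
NORTHBOUND_UPLINKS = 6   # ports reserved for traffic leaving the UCS system

for fex_uplinks in (1, 2, 4):
    count = max_chassis(TOTAL_PORTS, NORTHBOUND_UPLINKS, fex_uplinks)
    print(f"{fex_uplinks} FEX uplink(s) per chassis: {count} chassis per 6120")
```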

UPDATE:  As pointed out by UnixPlayer, what about uplinks out of the chassis??  Great question and I admit it was late when I wrote this.  I have now updated this to include uplinks.  You of course need some uplinks or this will happen to you!

3. The number of vNICs and vHBAs supported on a Palo card is proportional to the number of FEX uplinks -> ((15*FEX uplinks)-2)

As pointed out by Kevin in his article, the number of uplinks determines the number of vNICs the Palo card can present.  The theoretical maximum of the card is 128 but only 58 can be used currently.  This is for both vHBA's and vNICs.
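Plugging the supported uplink counts into the formula from the heading, (15 * FEX uplinks) - 2, shows where the 58 comes from.  This is just a sketch of the arithmetic with a made up function name, not anything pulled from Cisco tooling.

```python
def palo_virtual_interfaces(fex_uplinks: int) -> int:
    """vNICs plus vHBAs a Palo card can present: (15 * FEX uplinks) - 2."""
    return 15 * fex_uplinks - 2


for uplinks in (1, 2, 4):
    print(f"{uplinks} FEX uplink(s): {palo_virtual_interfaces(uplinks)} vNICs/vHBAs")
```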

As you can see, there are some interesting design choices to be made based on bandwidth, scalability, and virtual I/O.  I really like the customization ability of the UCS system to tailor to the requirements of the customer but it is also very important to understand the relationships presented above when designing the system.

Tuesday, February 2, 2010

Cisco UCS Information for "Server People"

I've been working with the UCS equipment as time allows for the last few weeks.  I've also had the privilege to visit Cisco TAC for UCS here in Raleigh, NC to pick their brains a bit.  Here is a quick bullet list of some features that I found interesting from my server based perspective.

  • The amount of 10GB oversubscription from the UCS chassis to the 6100's (Fabric Interconnects) depends on the number of uplinks.  There are four connections maximum per FEX, per chassis.  One uplink will provide an 8:1 ratio, two uplinks a 4:1 ratio, and lastly four uplinks a 2:1 ratio.  Three uplinks is not supported.  (My next article will be a detailed article on the FEX's)
  • This may seem obvious to the Cisco folks but it wasn't to me.  The 6100's are "backwards".  They are designed to be mounted in the back of the rack so all cabling is towards the rear of the UCS chassis.  Cooling is "front to back" on the 6100's to match the UCS chassis.
  • You can "Mix and Match" adapter cards on each blade because the uplink is a common 10GB fabric.  This means if you only need a few Palo cards in a chassis and maybe CNA's on the rest, you can do that.  Service Profiles won't be compatible but you do have that flexibility
  • Only Cisco memory is supported on UCS blades.  No 3rd Party memory
  • The UCS Chassis needs 2 power supplies.  It ships with zero.  Three power supplies provide N+1 redundancy and four power supplies provide N+N.  As more power supplies are added, the load is distributed evenly across each power supply
  • The UCS Chassis has 8 fans but needs 4 to operate so it is N+N redundant
  • When a Chassis is plugged in, multiple blades are powered up in serial fashion to prevent an in-rush current spike that could blow the circuits.  This has been a problem for customers of mine with other blades in the past.
  • The 6100's are active/active for 10GB data but are active/passive for management of the chassis.  At any given time one 6100 is active and constantly passing information over the L1/L2 connections to keep the passive management module up to date
  • The FEX connections on the back of a UCS chassis CAN'T be cross connected to the 6100's.  I'll have more information on this in the next article
  • The UCS Manager allows up to 4 KVM connections at one time.  I'm still checking if this is 4 per UCS Manager, 4 per chassis, or 4 per blade (If you know, please leave a comment!)
  • The maximum number of vNICs the Palo card can present is 56 and is dependent on the number of FEX links from the chassis to the 6100's.  I'm still getting details on this information and I will post this in the near future
  • The 6100 Fabric Interconnects are licensed per port.  The 6120 comes with 8 ports licensed and the 6140 comes with 16 ports licensed.  Additional ports must be purchased individually, kind of like an FC switch.  This applies to both Northbound and Southbound traffic (I REALLY don't like this!!!)
  • Smart Net needs to be purchased on the 6100's, each chassis, the blades, and the expansion modules in the 6100's.  Smart Net lasts for one year so if you want three year coverage, you need to purchase quantity 3 of the Smart Net item for each.  This is VERY different from HP and IBM servers.
  • The first three chassis in a UCS domain (managed by the same 6100's) communicate via an SEEPROM to verify and prevent a split brain scenario in the event of the 6100's losing communication
  • The UCS Manager includes the ability to e-mail alerts and also "call home" to Cisco, much like a NetApp storage system
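The oversubscription ratios in the first bullet fall out of simple division.  Here is a sketch, assuming 8 blades per chassis each presenting one 10GB link to a FEX; the names are mine.

```python
from fractions import Fraction

BLADES_PER_CHASSIS = 8          # assumption: fully populated chassis of single-wide blades
SUPPORTED_UPLINKS = (1, 2, 4)   # three uplinks is not supported


def oversubscription(uplinks_per_fex: int) -> str:
    """Ratio of blade-facing 10GB links to FEX uplinks, reduced to lowest terms."""
    ratio = Fraction(BLADES_PER_CHASSIS, uplinks_per_fex)
    return f"{ratio.numerator}:{ratio.denominator}"


for uplinks in SUPPORTED_UPLINKS:
    print(f"{uplinks} uplink(s) per FEX -> {oversubscription(uplinks)}")
```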

Monday, January 25, 2010

Cisco UCS vs IBM and HP - Where are the Brains?

UPDATE: Thank you to everyone for the great comments!  Please look for the updated sections that I have highlighted below.  I have learned a lot from everyone and I will continue to update this as more information rolls in.  I welcome any and all comments.  Thank you!

As many of you know, my company recently acquired some very nice lab gear for customer demonstrations and proof of concept work.  Many of my peers already know the UCS systems inside and out but I really need hands on to "get it".

As I learn the UCS system I will share my experiences here.  My perspective is to share what is different (good and bad) about UCS compared to the IBM and HP Blade products.  Before anyone asks, I will only be covering IBM and HP.  If you have additional experiences, please share them in the comments.  I also have no intention of picking sides.  At the end of the day I sell and support all of the above systems and I can get the job done with all of them.  They all have their own unique strengths and weaknesses that I intend to highlight.

In case you aren't familiar with what UCS is, I suggest you take a look at Colin's post over on his blog.  He does a great job putting all the pieces together.  Plus, I'm going to steal a few of his graphics. (thanks Colin!!)

A UCS system consists of one or more chassis and a pair of Cisco 6120 switches that provide both the 10GB bandwidth to the blades as well as the management of the system.  The last part of that statement is the key to understanding how UCS is currently different from the competition.  I define management in this example as the control of the blade hardware state.  This includes identification, power on, power off, remote control, remote media, and the virtual I/O assignments for MAC and WWPN's.

By moving the management from the chassis level to the switch level, the solution can now take advantage of a multi-chassis environment.  Here's a simple modification of Colin's diagram to illustrate this point.



(UPDATED!) What are the limitations to the Cisco UCS model?
Someone asked in the comments how this scales.  Honestly that was a great question.  I'm still learning Cisco and I was wrapped up in making it work.  Let's take a look at that.  Currently you can have up to 5 chassis per pair of UCS Managers (Cisco 6100's).  That number will increase in the upcoming weeks and eventually the limit will top out at 40.  But, the more realistic limitation is either 10 or 20 depending on the number of FEX uplinks from the chassis to the 6100's unless you are using double wide blades.  If you don't understand what that means right now, don't sweat it.  I'll be posting about that shortly.


(UPDATED) What if you need to manage more than the chassis limitations today?
If you need to go above the limit, then you have two options.  The first option is to purchase another pair of 6100's to create another UCS System and they will be independent of each other.  The second option is provided by BMC software.  This will allow you to manage more chassis and the solution also provides additional enhancements.  I admit I know little to nothing about the product so I'll just post the link from the comments and you can take a look.  The brain mapping for that would look like this.




How do you get into the brains?
Each 6120 has an ip address and both 6120's are linked together to create a clustered ip address.  The clustered ip is the preferred way to access the software.  The clustering is handled over dual 1GB links labeled L1 and L2 on each switch.  They are connected together like this:




Cisco uses a program to manage this environment called, creatively enough, Cisco UCS Manager or UCSM.  To access UCSM, point a browser at the clustered ip address.  Once authenticated, you will be prompted to download a 20MB java package (yes it is java, yuck!).  Here is a pic of ours with both chassis powered up.




Notice that both chassis are in the same "pane of glass".  This allows for management of all the blades from one interface and the movement of server profiles (covered later) from one chassis to another within the same management tool.


How does this compare to IBM? 

IBM is a two part answer.

IBM Part One - Single Chassis Interface in AMM

IBM uses a module in each BladeCenter chassis called the Advanced Management Module (AMM).  There can be up to two AMM's in each chassis.  If there are two AMM's, one is active and the other is passive.  They share the configuration and a single ip address on the network.  In the case of failure of the primary, the passive module becomes active and communication resumes on the original ip address.  The AMM will control power state, identification, virtual media and remote control out of the box.  Virtual I/O (both WWPN and MAC) is an additional purchased license in the AMM.  The product is called the Blade Open Fabric Manager (BOFM).  I don't know if BOFM supports 10GB but I know it supports 1GB ethernet and 2/4GB FC.  This is what it would look like with brains in place:



As you can see, each chassis is managed individually.  In my experience, this is the most common configuration I have seen.

IBM Part Two - Multiple Chassis Management with IBM Director

IBM does have a free management product called IBM Director that can pull all this together into a single pane of glass.  The blade administration tasks are built into the interface and virtualized I/O is handled through the Advanced BladeCenter Open Fabric Manager.  Advanced BOFM is a Director plug-in and is a fee based product.  Logically it would look something like this:





The downside to this solution is you now have another server in your environment to manage.  In my experience Director is a little flaky at times but I also haven't tried the newest version which is a redesign to address many of the issues.

How does this compare to HP?


HP is a two part answer as well.  I haven't implemented HP's Virtual Connect over multiple chassis so I will ask that if you know this answer and can throw some links my way, please do and I will update this section.


(UPDATED!) HP Part One - Single Chassis Interface in Onboard Administrator (OA)


HP's approach is very similar to IBM's.  HP's management modules are called the Onboard Administrator and there can be a maximum of two in each chassis.  HP is different from IBM because each module requires an ip address.  At any given time one ip address is active and one ip address is passive.  If you access the passive module on the network, it will tell you that you are on the passive module and instruct you to connect to the active module.  Like the IBM AMM, the OA will control all basic functions such as power state, identification, virtual media, and remote control.  Like IBM, HP has a separate product for virtual I/O called Virtual Connect.  Unlike the IBM and Cisco products, HP's Virtual Connect is implemented at the I/O module level.  The only way to achieve virtual I/O is to purchase the HP I/O modules.  HP's brain mapping is a little different than IBM's because you can connect up to four chassis into one interface.  Since you probably won't be able to power more than four chassis in a rack, think of it as consolidation at the rack level.



(UPDATED!) HP Part Two - Multiple Chassis Interface in HP Insight Tools


After you get to four chassis, HP Insight Tools need to be brought in to fulfill the needs.  Based on the comments below it appears that two products will fit the bill.  To manage the chassis and blade functions you will need Insight Dynamics VSE Server Suite and to manage the virtual I/O you will need the Virtual Connect Enterprise Manager product.  Both the Insight Dynamics VSE Server Suite and the Virtual Connect Enterprise Manager are fee based.




Summary

(If you made it this far, I'm impressed!)  Cisco's approach feels very "up to date".  I really like the idea of not having to add another server (and additional fees for virtualized I/O) to the environment for management of the products.  By moving all of the management centrally to the switches you are better able to see the environment and implement a multi-chassis/multi-rack solution.  IBM and HP offer a similar solution that has grown over time but the roots of the interface are in single chassis/rack management.  But, at the end of the day both IBM and HP offer a centralized management solution.

Thoughts?  Concerns?  Please leave a comment!

NetApp CIFS Tricks

Yes, the Cisco UCS blog posts will start up "soon".  Still putting the finishing touches on the first post.  In the meantime...

I recently had to perform the following during a CIFS (Windows File Sharing) installation from a NetApp storage controller.  The chances of me remembering this again aren't very good so I wanted to post it here for later.  We had two issues that caused us some grief.

Issue #1 - For whatever reason when looking for a domain controller it wasn't "attaching" to the local domain controller.  The system would ask for a list of domain controllers but then try to communicate with remote AD servers, some of which were behind firewalls.  NetApp is nice enough to allow us to "pin" the storage to a preferred list of domain controllers to correct this behavior.  From the command line, use the following commands:

  • cifs domaininfo - lists which domain controllers the NetApp is communicating with.  The preferred list is a list you specify, the favored list is the list AD thinks are closest to you, and then the rest are listed.
  • cifs prefdc - This command allows you to populate a list of the domain controllers you want to communicate with first.  More than one can be entered in the command separated by spaces in the format: cifs prefdc add (domain) (dc1) (dc2) (etc...)
  • cifs resetdc - After a dc is added you need to reset the connection
  • cifs prefdc print - Shows the list
Issue #2 - The site admin wasn't a domain admin.  This leads to many permission related issues because by default when a NetApp is added to AD only the local NetApp admin (created during CIFS setup) and the Domain Admins are in the machine administrators group.  We needed to add the site admin into the Administrators group on the NetApp.  This was achieved using the useradmin command.  Here is the syntax: useradmin domainuser add (username) -g Administrators

After these two steps were complete, we were able to proceed.

Tuesday, January 19, 2010

That's A Lot of Hardware!

Just a quick post today.  As some saw on Twitter yesterday I will be getting my hands on some pretty impressive hardware.  My company has decided to move our customer demo lab to our office and all the gear arrived yesterday.  Here are a few pictures for now but we will be setting all of this up over the next few weeks.  I will be posting some impressions and tips as I go.  With my HP and IBM Blade background I am hoping to write a good bit on the UCS experience.  In addition to the EMC NS-120, I am hoping to integrate our existing EMC NS-960 for some experience with that hardware as well.  Should be interesting!!

Picture #1 - 2x Cisco UCS Chassis each with 4 blades, 2x Cisco 6120 Nexus, 1x Cisco Nexus 5020




Picture #2 - A LOT of NetApp disk shelves (NetApp controller not pictured)



Picture #3 - EMC NS-120 still in the box (but not for long!)

Monday, January 18, 2010

Installing NetApp VSC According to Best Practices

If you haven't checked out NetApp's Virtual Storage Console, you should.  I did an article on it after NetApp Insight which is available here.

Vaughn recently posted on setting up VSC access to the NetApp using RBAC (Role Based Access Control) permissions.  This procedure is not currently in the VSC manual.

Quick tangent: Creating RBAC for every product appears to be an ongoing trend within NetApp.  Documentation exists for RBAC installation on SMVI (it's in the manual), VSC (link above), SnapDrive in a virtual machine, and I think there is an RCU writeup around but I can't find it right now.  This is great from a security perspective but gets a little tedious if you are loading multiple products on the same NetApp controller, and double the pain if it is an HA unit! (HINT to NetApp, figure out a way to consolidate this please!!)

Let's say you were an early adopter to VSC and installed it per the manual.  You probably used root as the user id and you never enabled SSL on the filer.  If this is the case, you are sending the root password in clear text (Yikes!).  Based on Vaughn's article we can easily go back and fix this.

  • Configure and Enable SSL on each NetApp Controller if not already enabled
    • From the command line you can use the secureadmin setup ssl and secureadmin status commands. This can also be configured from FilerView -> Secure Admin

  •  Create the role, group, and user on each NetApp controller. Enter each line from the command line
    • useradmin role add vsc-role -a login-http-admin,api-aggr-list-info,api-cf-get-partner,api-cf-status,api-disk-list-info,api-ems-autosupport-log,api-fcp-adapter-list-info,api-fcp-get-cfmode,api-license-list-info,api-lun-get-vdisk-attributes,api-lun-list-info,api-lun-map-list-info,api-nfs-exportfs-list-rules,api-qtree-list,api-snmp-get,api-snmp-get-next,api-system-get-info,api-system-get-version,api-volume-autosize-get,api-volume-list-info,api-volume-options-list-info
    • useradmin group add vsc-group -r vsc-role
    • useradmin user add vsc-user -g vsc-group
  • From the vSphere Client, go to the NetApp tab, Repeat the following for each controller
    • Right Click on the controller and click Modify Credentials

  • Enter the newly created vsc-user id and password, check Use SSL and click OK


Congratulations, you have just configured your vCenter Server to communicate with the NetApp systems in a safe and secure way!
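Since the role, group, and user have to be created on every controller (twice for an HA pair), a small Python sketch like the one below can generate the three useradmin lines so you can paste them in.  It only builds strings; the function name is made up, and the example passes just the first three capabilities from the list above to keep it short.  Use the full list when you do this for real.

```python
def vsc_rbac_commands(capabilities: list[str],
                      role: str = "vsc-role",
                      group: str = "vsc-group",
                      user: str = "vsc-user") -> list[str]:
    """Build the useradmin lines that create the VSC role, group, and user."""
    if not capabilities:
        raise ValueError("the role needs at least one capability")
    cap_list = ",".join(capabilities)  # capabilities are comma separated, no spaces
    return [
        f"useradmin role add {role} -a {cap_list}",
        f"useradmin group add {group} -r {role}",
        f"useradmin user add {user} -g {group}",
    ]


# Shortened capability list, for illustration only
example_caps = ["login-http-admin", "api-aggr-list-info", "api-cf-get-partner"]
rbac_commands = vsc_rbac_commands(example_caps)
for line in rbac_commands:
    print(line)
```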

    Thursday, January 14, 2010

    #vmtip Archive From Twitter

    For a few weeks now I have been posting VMware and storage related tips to Twitter.  I have been using the hash tag #vmtip for each of them.  I keep an archive of them in Evernote so I can remember what I have done but it isn't organized.  This is simply an attempt to better organize them into categories.  This won't be updated every day, but I will try to keep it somewhat up to date.

    Last Update: January 14th, 2010

    VMware Related Tips
    • #VMware tip: vSphereU1 increases the max# of vm's per host to 160 for up to 8 hosts (was 100), still only 40 vm's per hosts if >8 #vmtip
    • #VMware ESXi local boot only supported option today. ESXi Boot from SAN and PXE are both experimental right now (via @DuncanYB) #vmtip
    • Need to get data from a #VMware Workstation or ESX(i) vmdk? VMware Disk Mount Utility. It saved me this week! http://bit.ly/6rtY8e #vmtip
    VMware vCenter Related Tips
    • Prior to loading #VMware vCenter, make sure you set the final machine name, static ip address, and domain membership! #vmtip
    Virtual Machine Alignment Tips
    • #VMware tip: Windows 2008 vm's do not need alignment if created fresh. If it was upgraded from W2k03, it will be misaligned. #vmtip
    VMware Lab Manager Related Tips
    • #VMware Lab Manager Tip: If using VMFS, there is a maximum of 8 hosts per datastore due to disk chains. There is no limit for NFS. #vmtip
    • #VMWare Lab Manager Tip: Lab Manager disk chains can not span volumes due to the linked clone technology #vmtip
    NetApp Related Tips
    • #VMware on #NetApp tip: VSC will tweak ESXi installs for NetApp. Great since no NetApp Host Utilities Kit for ESXi #vmtip
    • #NetApp on #VMWare vmdk alignment tip: Windows Dynamic Disks, Linux LVM's and Citrix Servers can not be aligned with mbralign #vmtip
    NetApp SMVI Related Tips
    • When doing #VMware SRM on #NetApp and using SMVI, you CAN'T take a VMware snapshot as part of the backup! #vmtip
    • This appears to be undocumented: #NetApp SMVI backup of Windows virtual machine on IDE disk are not eligible for single file restore #vmtip
    NetApp SnapDrive Related Tips
    • #VMware on #NetApp: When installing Snap Drive, check the install account is an admin on BOTH the server & filer before install! #vmtip
    • #NetApp SnapDrive 6.2 for Windows requires .NET 3.5 SP1 and 3 MS hotfixes (with reboot) BEFORE installation of SnapDrive. #vmtip

    Wednesday, January 13, 2010

    Creating VMware NFS Datastores on NetApp in 3 Easy Steps

    I am often asked by customers how to set up a VMware NFS datastore on NetApp storage.  The first time I received the question, I pointed them to the NetApp vSphere TR.  It turns out the information isn't currently in the document.  I spoke to Vaughn about it and this was an oversight that will be corrected.  In the meantime, here is how I create NFS shares in 3 easy steps. I'm also taking screenshots from my lab for the first time, so let me know what you think of the screenshot format vs. just a bullet list.

    Step One - Create the volume on the NetApp system

    •  Log into FilerView, Open Volumes, Click Add -> Click Next

    • Accept the default value of Flexible Volume and Click Next
    • Create a name for the volume, set the language type, and click Next

    • Choose which aggregate to create the volume in and click Next
    • Set the size of the volume.  Please notice that the pull down defaults to MB, NOT GB!  I typically don't set a Snap Reserve, but if you don't understand the implications of this, just use the default of 20%.  Click Next


    • Click Commit
    • Click Volumes -> Manage and you will see the newly created volume


    • If your NetApp system has been configured for CIFS, you will need to make a slight change to the QTree type of the volume.  Click Volumes -> QTrees -> Manage.  If the QTree type is UNIX, skip to Step Two.  If the QTree type is NTFS, proceed.


    • Click on the volume link (sim3_vmware_01 in this example) to get the following screen
    • Change the QTree type to UNIX and click Apply
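    The two numbers in Step One that trip people up are the size unit and the Snap Reserve.  As a rough sketch of the arithmetic, the snippet below converts a size in GB to the MB value the pull down expects and shows how much space is left for data after the reserve.  It assumes 1 GB = 1024 MB, and the 500GB volume is a made-up example, not a recommendation.

```python
# Minimal sketch of the sizing arithmetic from Step One.
# Assumption: 1 GB = 1024 MB for the value typed into the size field.


def gb_to_mb(size_gb):
    """Convert a volume size in GB to the MB value to enter."""
    return size_gb * 1024


def usable_mb(volume_mb, snap_reserve_pct=20):
    """Space left for data after the Snap Reserve percentage is set aside."""
    return volume_mb * (100 - snap_reserve_pct) // 100


size_mb = gb_to_mb(500)  # hypothetical 500GB volume
print("Enter this many MB:", size_mb)
print("Usable with the default 20% reserve:", usable_mb(size_mb), "MB")
print("Usable with no reserve:", usable_mb(size_mb, 0), "MB")
```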


    Step Two - Create the NFS Export (Share the Volume)
    •  Click NFS -> Manage Exports -> Click on the Permissions for the newly created volume



    •  Make sure Read-Write Access, Root Access, and Security are all checked. Click Next


    • Click Next at the Export Path Screen but write down this path, you will need it later!
    • At the Read-Write Access Screen, uncheck the All Hosts box and enter the IP addresses of all the VMkernel ports for the vSphere server(s).  NOTE: This is not the Service Console IP address, it is the VMkernel IP address that vSphere will use to "talk" NFS to the storage

    • Repeat this process for the Root Access Screen and click Next
    • Click Next at the Security Menu accepting the defaults

    •  Click Commit. You are now finished configuring the NetApp System!
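    For readers who want to see what Step Two amounts to, FilerView is essentially writing an export rule that lists the same VMkernel addresses for both read-write and root access.  Here is a minimal sketch that validates the addresses and builds a rule in the style of a Data ONTAP 7-mode /etc/exports entry.  The IP addresses are hypothetical, the function name is my own, and the exact rule format is an assumption you should compare against what your own system generates.

```python
# Minimal sketch: build an export rule from a list of VMkernel IP addresses.
# Assumption: 7-mode style rule "<path> -sec=sys,rw=<hosts>,root=<hosts>"
# with hosts separated by colons. Compare with your own /etc/exports.
import ipaddress


def build_export_rule(export_path, vmkernel_ips):
    """Return an export rule granting rw and root access to the given IPs."""
    if not vmkernel_ips:
        raise ValueError("at least one VMkernel IP address is required")
    # ip_address() raises ValueError on anything that is not a valid address.
    hosts = ":".join(str(ipaddress.ip_address(ip)) for ip in vmkernel_ips)
    return "{} -sec=sys,rw={},root={}".format(export_path, hosts, hosts)


# Hypothetical VMkernel addresses for two vSphere servers.
rule = build_export_rule("/vol/sim3_vmware_01", ["192.168.10.11", "192.168.10.12"])
print(rule)
```

    Validating the list up front catches the most common mistake, a mistyped address that silently locks one host out of the datastore.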
    Step Three - Create the share in vCenter

    • From the vCenter Client, Click on a vSphere server and click the configuration tab. Click Storage, Click Add Storage


    • Choose Network File System and Click Next
    • Enter the IP address of the NetApp Storage, the path to the export that you wrote down in Step Two, and give your datastore a name as it will appear in vCenter


    • You should now see your storage
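    If you have many hosts, clicking through Add Storage on each one gets old.  The same three values from Step Three (storage IP, export path, datastore name) can be passed to the esxcfg-nas command on an ESX host.  The sketch below only builds and prints the command line, it does not run anything.  The IP address and datastore name are hypothetical, and you should confirm the options against the documentation for your ESX version.

```python
# Minimal sketch: build the esxcfg-nas command for adding an NFS datastore.
# This only prints the command; it does not execute it.
import shlex


def build_add_nas_command(datastore_name, storage_ip, export_path):
    """Return the argument list: esxcfg-nas -a -o <host> -s <share> <label>."""
    return ["esxcfg-nas", "-a", "-o", storage_ip, "-s", export_path, datastore_name]


# Hypothetical values; use the export path you wrote down in Step Two.
args = build_add_nas_command("netapp_nfs_01", "192.168.10.50", "/vol/sim3_vmware_01")
print(" ".join(shlex.quote(a) for a in args))
```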



    UPDATE: I was hoping to stay away from the command line for this article.  This was really designed for users that are just beginning to get their feet wet with NFS.  But, as Mike pointed out, there is one command that should be run on each volume and this can only be done from the command line.  It is outlined on page 37 of the 1.0 version of the TR.  The command is "vol options (volume-name) no_atime_update on", where (volume-name) is the name of the volume (sim3_vmware_01 in my example).  Thank you for pointing that out, Mike!


    A few final notes.  Once all of this is complete, I usually test read/write access by pulling up the datastore browser, creating a folder in the datastore, and then deleting it.  Also, if the datastore will be protected by NetApp's SnapManager for Virtual Infrastructure, then I will disable snapshots.  This is all detailed in Vaughn's vSphere TR.
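    The create-a-folder-then-delete-it test can also be scripted if you have the export mounted somewhere you can run Python.  This is a minimal sketch of the same check, not a replacement for the datastore browser.  The function name is my own, and the demo at the bottom uses a temporary directory as a stand-in for the real mount point.

```python
# Minimal sketch: confirm read/write access by creating and removing a folder.
import os
import tempfile
import uuid


def check_read_write(datastore_path):
    """Create a uniquely named folder under datastore_path, then delete it."""
    probe = os.path.join(datastore_path, "rw_probe_" + uuid.uuid4().hex)
    try:
        os.mkdir(probe)
        os.rmdir(probe)
    except OSError:
        # Covers a missing path, a read-only export, and permission problems.
        return False
    return True


# Stand-in for the real mount point of the datastore (an assumption for the demo).
with tempfile.TemporaryDirectory() as demo_path:
    print("read/write OK:", check_read_write(demo_path))
```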

    VMware Disk Mount Utility to the Rescue!

    I had a bit of a scare over the holidays.  I usually keep two copies of my data at all times.  One copy is on my laptop and the second copy is on an external hard drive at the house.  Well, what happens if you are installing a new OS on your laptop (one copy gone) and, as you are copying back all of your data, the external hard drive starts clicking and throwing up errors (two copies gone)?  Yikes!

    I tend to "sip my own champagne" (I don't "eat my own dog food", too crude of a reference) so I run my corporate workstation in a virtual machine with VMware Player.  All of my data was in one big 32GB vmdk file on the external hard disk.  I cracked the case on the USB disk and mounted it on my home PC.  The drive was recognized!  I tried to copy the vmdk off to the C:\ drive so I could transfer it to the laptop.  The transfer was VERY slow.  It was going to take most of the night and the next day to copy, but I needed my data NOW!

    VMware Disk Mount Utility to the rescue!  In case you aren't familiar with the product, you can download it here and the manual is here.  It allows you to mount vmdk files from either Windows or Linux.  There are some limitations as specified in the document, but you can mount disks from VMware Workstation as well as ESX virtual machines.

    Using the tool, I was able to mount the vmdk and just copy out the data I needed for work the next day.  Over the weekend I was able to recover all the rest of my data.  It took 3 days to copy 100GB worth of VMs and there were a few casualties.  My ESXi machine and a couple of XP builds were corrupt and I will have to recreate them.  I was lucky I got the data back, but a big thanks to VMware for the tool!