- Cisco UCS M81KR Virtual Interface Card. In a UCS B250 M1 Extended Memory Blade Server, the UCS M81KR Virtual Interface Card must go in slot 0 only and can not be mixed with other adapters.
Tuesday, February 23, 2010
Configuring Multiple Palo Adapters in the Cisco UCS B250 Blade
I found out something interesting today that I wanted to share. Version 1.1.1 of the Cisco UCS Manager code only supports one Palo card on the B250 at this time. Here is the line from the release notes and the link to the release notes.
Monday, February 22, 2010
How Cisco UCS Deals with Split Brains
This will be a short post this morning. I wanted to pass along how Cisco UCS deals with a split brain scenario. I'll start by explaining how you would get into a split brain scenario. In normal operations, one of the 6100's is the active brain and the other is the stand-by. A split brain in UCS would happen when both of the cluster interconnects between the 6100 Fabric Interconnects are severed (the L1 and L2 ports). The active brain still thinks he is active and the stand-by no longer sees the active so he tries to take over. You now have a potential power struggle because both brains think they are in charge.
Luckily the Cisco UCS folks are way ahead of this scenario. They added logic to the Serial EPROM (SEEPROM) in the UCS chassis to resolve the situation. An odd number of the chassis added to a UCS Domain act as judges during split brains. For example, with four chassis, three act as judges. A marker is added to the SEEPROM on these chassis to make them quorum resources. To clarify this a little further, if there is an odd number of chassis, all of them will be used. If there is an even number of chassis, it will drop the last one (n-1) so the number of quorum chassis will always be odd.
When the split brain is detected, both 6100's will immediately demote themselves and then claim as many of the quorum resources as possible. Whoever claims the most quorum chassis wins and promotes himself back to the active manager. The scenario would look something like the following. Pretty slick!
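The quorum rule described above can be sketched in a few lines of Python. This is only an illustration of the counting logic as I described it, not Cisco's implementation; the function names `quorum_chassis` and `elect_active` are made up for this example.

```python
def quorum_chassis(total_chassis: int) -> int:
    """Number of chassis that act as quorum judges (always odd)."""
    if total_chassis < 1:
        raise ValueError("a UCS domain needs at least one chassis")
    # Odd count: every chassis is a judge. Even count: drop the last one (n-1).
    return total_chassis if total_chassis % 2 == 1 else total_chassis - 1


def elect_active(claimed_by_a: int, claimed_by_b: int) -> str:
    """Pick the winner after both 6100's demote themselves and claim judges.

    Hypothetical helper: 'A' and 'B' are just labels for the two
    Fabric Interconnects.
    """
    if claimed_by_a == claimed_by_b:
        # Cannot happen when every one of an odd number of judges is claimed.
        raise ValueError("tie: not every quorum chassis has been claimed")
    return "A" if claimed_by_a > claimed_by_b else "B"


print(quorum_chassis(4))   # four chassis -> three judges
print(elect_active(2, 1))  # A claimed two of the three judges
```

Because the judge count is always odd, one side must end up with a majority once every judge is claimed, which is why the even chassis gets dropped.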
Monday, February 15, 2010
Buying an HS22V for VMware? READ THIS!
I have had some interest from our customers in the new IBM HS22V Blade Server. There is a great overview of the details of the new blade over at Kevin's site here. I did find out one thing very interesting that I wanted to share. The HS22V is different from previous models because it will only take up to two 1.8 inch SSD drives. No hard drives here! That's a great advancement except for one thing; the list price of ONE of the drives is currently over $1600!!! This means over $3200 (list prices!) to load an operating system if you want a raid-1 set. That is a pretty high price. Here's a screenshot of the IBM configuration tool with the SSD drive.
But, if you are running VMware ESXi you have another option. Hidden in the other section (not the storage section) is an option for ESXi version 3.5 or 4.0. The best thing is it is only $75 list!!
This cost difference brings about an interesting choice for ESX based organizations vs. ESXi. How much are you willing to pay for that Service Console?
Thursday, February 11, 2010
VMware PEX: Lab Manager Design and Implementation
VMware PEX: Lab Manager Design and Implementation (TECHMGT0922). This is written as I sit in the session so it could be messy.
Architecture
- Lab Manager Server (Windows based)
- vCenter Server
- One or more ESX 3.5 or 4.0 servers, ESXi 4 servers
- TCP port 902/903 for virtual machine console access
- Browser client to Lab Manager (LM) server is TCP 443
- TCP 5212 from LM server to ESX servers and TCP 443 from LM to vCenter
- Read the manual for all the requirements (installer checks for them and spits out errors)
- Install a service account on LM server - pages 14-15 of the user guide have details on permissions needed
- Pre-create resource pools if you want to use them so that it will pick them up at install time
- Pre-create all virtual switches, distributed switches, etc.
- Don't join it to the domain so nothing from AD would get pushed to it as a member server
- LM uses Linked Clones to save disk space
- Make sure the I/O can support your environment (just because you have the space doesn't mean you have enough I/O!)
- LUN locking and limit of 8 nodes if using vmfs (NFS doesn't have this limit)
- Understand the concept of disk chains and how they work (this isn't well documented)
- When Setting up LM server, create the default Physical network
- Design Considerations for Networking: Physical Network vs. Virtual, Fenced vs. Non-Fenced - Will IP's be from an IP Pool, DHCP, or Static?
- Physical network is nothing more than a connection out to a physical network
- Virtual network - a network that may or may not be connected to a physical network (could be on a different ip, vlan, etc) - A virtual network can be connected to a physical network if needed upon deployment
- Fencing - The ability to create a fence around a configuration (group of vm's) so they are isolated from the rest of the network. If the fenced configuration needs to get out, it will do so through a NAT router (small Linux vm). In this case it would have an internal ip inside the fence and an external ip address outside the fence using the NAT router. This is great for machines and applications that all have the same ip. This way there will not be ip conflicts on the network.
- Host spanning - In version 3, fencing isolation was limited to one host. The ability for a fenced configuration to cross servers (with vMotion) is called host spanning. Host spanning needs the Distributed Switch (and an Enterprise Plus license to get the Distributed Switch)
- IP Pools - Takes a lot of ip addresses - if using fencing remember the NAT router needs ip's as well
- Fencing can't use DHCP (DHCP can't cross the fence to provide the address)
- IP Static Pool must be used
- At least one virtual machine needs to connect to a physical network (otherwise it is a virtual network with no outside connection)
- Fence policy is traffic In and Out, All Blocked, or Out Only. There is no In Only policy.
- Be careful with outside communications if using fencing (many machines with same name but different ip's all hitting an outside source!)
- Domain Controller - member servers can be in a configuration with the same name as others as long as the machine password with AD doesn't expire (30 days by default). Otherwise, put a clone of the DC in the configuration and run it private to the configuration group
- Domain Controller Clone - be careful - a cloned DC will come up with a 169.254.x.x address because it detects one with the same ip address already out there. Best way to do this is clone the DC and completely isolate it from the production network.
- SQL server - if outside the fence, what happens when multiple configurations hit it?? Maybe different instances of the same database on the same server - adds a little bit of a manual intervention to the process
- Can create a workstation that is included in the configuration for the user to use as a workstation "in the fence"
Good Article: VMware KB1000023 - How to Backup the VLM Database
VMware PEX: Site Recovery Manager "Up and Running"
This VMware Partner Exchange (PEX) session was Site Recovery Manager "Up and Running" (TECHBC0321). I'm writing as I go so this might be a little messy.
This session will focus on the problems typically encountered during SRM implementations.
Required Components
2x vCenter servers
2x SRM Servers
Replication Product from the Storage Vendor
SRA (Storage Replication Adapter) from the Storage Vendor
Install Workflow
1. vCenter at each site
2. SRM Server (separate server)
3. SRM needs a DB instance
4. SRA (Often the most complex and causes the most problems) - Install on SRM server
SRA's Function
- Setup - Query for replicated luns, match luns to vm inventory
- Failover - Automates promotion of LUNS at remote site, and
- Testing - LUN Snapshot creation
- Not all SRA's are created equal. Each one is different and has had a different level of effort put into its development. Some require an additional framework (Java JRE for example). Always read all release notes and the install guide prior to the install attempt
- Always download a fresh SRA FROM THE VMWARE SITE NOT THE VENDOR SITE, many vendors change versions on a frequent basis
- Whatever you do on one site, do it on the other site
- When configuring SRA at the protected site, it may fail if not all components are installed at the recovery site (not configured, just installed)
- What if no datastores appear but the SRA seems to be installed OK? This is because the datastore doesn't have any vm's on it
- Always verify you have all the needed license features on BOTH storage systems to fully support replication in BOTH directions
- Disparate networks (re-ip of servers) - Most Common
- Stretch vlans (no re-ip of servers) - Less Common
- DNS services
- Active Directory services - Could be dedicated for testing and failover or same production AD
- Consider applications with hard coded IP's
- Remember Default Gateway and Subnet Mask
- When performing a recovery, the fewer changes the better (DOC-1491 in VMware Communities)
- SRM Supports RDM's but it isn't recommended
- If using multiple virtual hard disks, make sure both of them are replicated (or exist) at both locations
- SRM does not support replicating virtual machines with snapshots
- Need a port 80 https tunnel between sites for site pairing (it is encrypted but travels on port 80 instead of 443 to make security easier)
- 150 protection groups / 1000 protected vm's
- A protection group can consist of multiple datastores if a virtual machine spans datastores
Wednesday, February 10, 2010
VMware PEX: Reliable vCenter Database - Operations, Management and Troubleshooting
I was finally able to attend a VMware Partner Exchange (PEX) session that I was able to discuss. This session was Reliable vCenter Database - Operations, Management and Troubleshooting (TECHBC0330). This is being written as I'm in the session so it might be a little messy.
- vCenter at startup takes data from the Windows registry, the vpxd.cfg parameter file, and the vCenter database (VCDB).
- The executable name of the service is vpxd
- vpxd -p and -P are important because they are used to reset the password
- Almost everything you do in vCenter requires interaction with the database. For example:
- to start a vm - reads the location in the db and sends the command
- If vCenter fails - VMotion and DRS will fail but hosts and vm's will continue to run
- vCenter won't start with a corrupt or inaccessible database but it will run with an empty database
- HA will be able to execute commands but won't have any "eyes" to see how to execute them
Sizing and Location
- You have the option of a physical or virtual machine and you can co-locate the VCDB or put the VCDB on a separate server.
- Recommendations
- vCenter and VCDB - virtual and co-located only up to 40 hosts or 400 vm's
- if physical - must restart the db's together, one could take down the other
- The speaker recommended separate virtual vCenter and db servers with an anti-affinity rule to keep both vm's from being on the same host. (I'm not sure how I feel about that in the case of power failures)
- If the database server is virtual, you can take a db backup by cloning the vm
- Can be protected many ways (too much information too fast to list) but methods included physical rebuild, VMware HA, vCenter Heartbeat, MSCS, and FT
- The DSN Information is held in the registry: HKLM\SOFTWARE\VMware, Inc.\VMware VirtualCenter\DB
- Four objects of value under that:
- 1 = DSN Name
- 2 = Login ID
- 3 = password (encrypted)
- 4 = driver being used
- The VCDB is mainly performance information (over 80% of the database typically)
- The other information is the configuration of accounts and security information for vCenter
- All db tables have the prefix VPX_ - It is NOT recommended to use the tables directly!
Sunday, February 7, 2010
Why VMWare's VMmark Scores Have Become Useless
Well, it was bound to happen. Every time an industry benchmark standard comes out, the manufacturers eventually figure out ways to "cook the books". I've seen a LOT of FUD flying around from both HP and Cisco lately about VMmark scores and I have been asked a lot of questions about both platforms. After taking a close look at the scores, I'm ready to throw in the towel.
Before I go further, take a look at the VMmark 8-core scores posted here. You will see the Cisco B200 Blade is on top right now (of the major vendors, I don't count Fujitsu, sorry Fujitsu) with 25.06 and the HP BL490 is next with 24.54. A couple of points:
What is the difference between really really fast and really really really fast??
What is the difference between 25.06 and 24.54? Roughly 2%. Honestly, not much if they both meet your needs and you won't be pushing them to their limits. I'm sorry but that is within a margin of error and/or the test could be reconfigured by everybody to meet the score. At the end of the day both of them will meet your needs very well and the title of "fastest blade" means nothing!
Both Cisco and HP sell "big memory solutions" but they are nowhere to be seen!
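For anyone who wants to check the gap, here is a quick sketch of the arithmetic using the two scores quoted above. The helper name `percent_lead` is just something I made up for the example.

```python
cisco_b200 = 25.06  # VMmark score quoted above
hp_bl490 = 24.54    # VMmark score quoted above


def percent_lead(leader: float, runner_up: float) -> float:
    """How far ahead the leader is, as a percentage of the runner-up's score."""
    return (leader - runner_up) / runner_up * 100


lead = percent_lead(cisco_b200, hp_bl490)
print(f"{lead:.1f}%")  # prints "2.1%"
```

A lead of about two percent is the kind of difference that a storage or tuning change could plausibly erase.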
Take a look at the memory in the details for both of them. Both the HP 490 and B200 use 96GB of memory. Where is the B250 with the larger memory footprint? Where is the BL490 with either 144GB or 192GB of memory? You will also notice that the HP BL490 memory is running at 1333Mhz and the B200 is running at 1066 Mhz. Since there is no big jump in performance numbers the VMmark score isn't memory bandwidth bound or HP would have had an advantage. I suspect (although I don't have proof) that the VMmark score is now CPU bound and any memory above 96GB doesn't help the scores. I further think (again, no proof) that the test isn't pushing the maximum memory bandwidth because there is no change from 1333 Mhz to 1066 Mhz. It would be interesting to see if the drop to 800Mhz by HP would be noticed in the scores.
Cisco is using an EMC SAN with SSD's on the back end!
Take a look at the EMC Storage section on the Cisco benchmark. They are using an EMC CX4-240 with SSD drives! There is NOTHING wrong with this, SSD's are coming down in price, but they provide a clear, known advantage to the IOPS numbers that could easily be the sole reason for the 1%-2% increase. I'm willing to bet that if HP used the same storage configuration, they would produce similar scores.
Why didn't Cisco use the Palo card?
Cisco is using the Q-Logic CNA for the tests. Why didn't they use the Palo card? I suspect because it isn't "technically" released yet but that is the benchmark everyone wants to know about.
What am I trying to say here?
What I'm saying is that both HP and Cisco make great products and they will go to great lengths to make the other look bad. They are so close to each other from a VMmark score perspective that any clear difference can't be shown with the current test. Don't make a purchase based on a score!
Saturday, February 6, 2010
HP Blades Offer a 16GB DIMM, With a Catch
I found out something interesting on Friday, HP is offering a 16GB DIMM in their blades! My first thought was wow, that sucker is gonna be expensive (and it is!). But, after that I started to dig deeper as I always do, I found out something that is slightly disturbing. The 16GB DIMM is actually a quad rank DIMM and not a dual rank DIMM and it is only 1066Mhz speed. Many of you are saying... So what?
Well, this actually can make a difference from a design perspective. Take a look at the Memory tables in the BL460 and BL490 QuickSpecs and you will see what I mean.
HP BL460 G6 QuickSpecs
HP BL490 G6 QuickSpecs
Most of the DIMMs sold by the major vendors are dual rank and 1333Mhz speed. Let me explain the concept of ranks first. According to the Intel Nehalem architecture, you can only have 8 ranks per memory channel. Each memory channel holds either 2 or 3 DIMMs. I have more information on memory layout in this article I wrote on Scott Lowe's site a while back. I never really discussed ranks because it was a limit you didn't hit. You were only using either 4 ranks (2 x dual rank DIMMs on the BL460) or 6 ranks (3 x dual rank DIMMs on the BL490). Quad rank DIMMs blow the math out of the water. You can only put 2 on a memory channel, because 2 quad rank DIMMs already use all 8 ranks. This means the BL490 no longer brings extra memory capacity to the table. Both the BL460 and BL490 top out at 192GB with the 16GB DIMMs.
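Here is a rough sketch of the rank math, in case it helps. It assumes two-socket blades with three memory channels per socket (my assumption for these Nehalem blades), 2 DIMM slots per channel on the BL460 and 3 on the BL490, and an 8GB dual rank DIMM for comparison. The function name is made up for the example.

```python
MAX_RANKS_PER_CHANNEL = 8  # Nehalem limit described above
SOCKETS = 2                # assumption: two-socket blade
CHANNELS_PER_SOCKET = 3    # assumption: three memory channels per socket


def max_memory_gb(slots_per_channel: int, ranks_per_dimm: int, dimm_gb: int) -> int:
    """Largest memory configuration once the rank limit is applied."""
    # The channel fills up when either the slots or the ranks run out.
    dimms_per_channel = min(slots_per_channel,
                            MAX_RANKS_PER_CHANNEL // ranks_per_dimm)
    return SOCKETS * CHANNELS_PER_SOCKET * dimms_per_channel * dimm_gb


# Dual rank 8GB DIMMs: the third slot on the BL490 pays off.
print(max_memory_gb(2, 2, 8))   # BL460 -> 96
print(max_memory_gb(3, 2, 8))   # BL490 -> 144

# Quad rank 16GB DIMMs: both blades stop at 2 DIMMs per channel.
print(max_memory_gb(2, 4, 16))  # BL460 -> 192
print(max_memory_gb(3, 4, 16))  # BL490 -> 192
```

The `min()` is the whole story: with quad rank DIMMs the rank limit, not the slot count, decides how many DIMMs fit on a channel.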
Memory speed is the next issue. If you are running a memory bandwidth intensive application, you will expect about a 7% boost in performance by keeping the memory speed at 1333MHZ instead of dropping down to 1066Mhz. Because the maximum speed of the 16 GB DIMM is 1066Mhz you will never reach a 1333Mhz speed. Furthermore, populating both slots in the memory channel (the max of 12 DIMMs) drops the speed from 1066 Mhz to 800 Mhz. The performance drop from 1333 Mhz to 800 Mhz is over 30%!! This leads to an interesting trade off of memory capacity vs bandwidth speed.
While I applaud HP for thinking outside the box and bringing a 16GB DIMM to market, don't assume it is the same DIMM as the others. Remember, "One of these is not like the others...."
Labels:
HP
My First Race (with Doughnuts)!!
Today I ran my first race! I am very excited but this race also requires an explanation and some pictures. A few years ago some NC State University college students started a challenge. You had to race from the Bell Tower on campus to the local Krispy Kreme (about 2.2miles), eat a dozen doughnuts, keep them down, and run back, all in an hour.
I am training for my first 10k coming up at the end of March in Charleston, SC so I thought this would be a good warm up for me. The event is local, it was the right distance, and it is a little crazy and different so I was in. I registered just in time because registration was cut off the day after I signed up at 6,000 runners. Krispy Kreme's from all over the state made over 72,000 doughnuts and trucked them in for the race. I signed up for the casual group meaning that I didn't have to eat the full dozen. I ended up eating three. I ran the race in 1:14 but that really doesn't matter because you stop half way through the race and hang out, eat doughnuts, drink water, and just enjoy the atmosphere. Some highlights of the race to prove it is a little different:
- Mother and Daughter with KK T-shirts and the words "Family Bondin Southern Style" written on them
- The quote "3 hours sleep, 9 beers, 12 doughnuts, I'm starting to feel bad"
- We had the "privilege" of seeing two challengers "lose" their dozen doughnuts
- Got to see Fat Bastard, Elvis, Superman, Wonder Woman (it was a guy), Elmo, Cookie Monster and many people running in just shorts or underwear in 30-40 degree weather
- The famous quote "I'm never eating another doughnut again" more than once
Labels:
Life
Thursday, February 4, 2010
Cisco UCS - How Many FEX Uplinks Do I Need?
As a Data Center Architect, I have a constant need to know not just HOW to do things, but WHY to do things. As I dig deeper into the Cisco UCS system I find the concept of FEX (Fabric Extenders) very fascinating. The number of FEX uplinks may not seem like much, but a couple of cables can have a very significant impact on the design of a UCS system. If you need a refresher on how a UCS system is set up, please see my first article for more information.
What is a Fabric Extender (FEX)?
In very simple terms, the FEX serves as the "pipe" between the blades in a UCS chassis and the 6100 Fabric Interconnects (FI's). Each FEX has a maximum of four 10GB connections. Think of it this way, you can "choose" your bandwidth back to the FI's in one, two, or four 10GB increments (the three uplink option isn't supported). If you plug in one, you get 10GB bandwidth spread over 8 blades. Need more bandwidth? Plug in the second to get 20GB. If you really need the maximum then you can go for all four connections for a total of 40GB per FEX. Remember, each UCS chassis contains two FEX's and each FEX is connected to one (and only one - don't cross connect them!) 6100. If you plug in all eight connections, you achieve a maximum of 80GB to the chassis or 10GB per blade. If you are interested in how the traffic flows from the blades to the FEX port (referred to as pinning), here is a great link from Rodney Haywood that details this relationship. Here is a picture of a FEX close up:
Here is how you would cable them (don't cross connect them!!)
Why does this matter?
FEX uplinks directly affect three different areas:
- The bandwidth from the 6100's to the chassis
- The number of chassis supported per pair of 6100's
- The number of vNICs supported by the upcoming UCS Palo blade card.
1. Maximum Bandwidth per Chassis based on FEX Uplinks -> (FEX's * Uplinks *10)
To calculate the amount of bandwidth available to a chassis: (FEX's * Uplinks *10). So, If I have two FEX's, each with 2 uplinks, I have 40GB at my disposal (2*2*10).
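The bandwidth formula is simple enough to sketch in Python. The function name is made up for this example, and the check on the uplink count reflects the one, two, or four uplink options described above.

```python
SUPPORTED_UPLINKS = (1, 2, 4)  # the three uplink option isn't supported


def chassis_bandwidth_gb(fex_count: int, uplinks_per_fex: int, link_gb: int = 10) -> int:
    """Bandwidth from a chassis to the 6100's: FEX's * Uplinks * 10."""
    if uplinks_per_fex not in SUPPORTED_UPLINKS:
        raise ValueError(f"unsupported uplink count: {uplinks_per_fex}")
    return fex_count * uplinks_per_fex * link_gb


print(chassis_bandwidth_gb(2, 2))  # 40, the example above
print(chassis_bandwidth_gb(2, 4))  # 80, fully cabled chassis
```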
2. The number of chassis supported per 6100 is inversely proportional to the number of FEX uplinks -> ((total ports - uplinks) / FEX Uplinks per chassis)
Every time you use more than one FEX uplink, you actually reduce the number of chassis you can plug into the system. Let me use a simple example.
Let's say you have a 20 port Cisco 6120 with the 6x10GB module in it for a total of 26 ports. To make the math simple, you decide to dedicate six 10GB links for northbound (out of the chassis) traffic. You can support up to 20 chassis by using a single FEX uplink per chassis. What if you need a second FEX uplink? The number of chassis goes down to 10 because you need two per chassis but you only have 20 ports to physically plug into. If you need 4 FEX uplinks, then you can only support 5 chassis per 6120.
To calculate the maximum number of chassis, this is the formula to use: (total ports - uplinks) / FEX Uplinks per chassis. To use the example above, a 6120 with 4 uplinks yields 5 chassis ((26-6) / 4 = 5)
UPDATE: As pointed out by UnixPlayer, what about uplinks out of the chassis?? Great question and I admit it was late when I wrote this. I have now updated this to include uplinks. You of course need some uplinks or this will happen to you!
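Here is the chassis formula as a small Python sketch, run against the 6120 example above. The function and parameter names are my own for illustration.

```python
def max_chassis(total_ports: int, northbound_uplinks: int, fex_uplinks_per_chassis: int) -> int:
    """(total ports - uplinks) / FEX uplinks per chassis, rounded down."""
    server_ports = total_ports - northbound_uplinks
    return server_ports // fex_uplinks_per_chassis


# 6120 with the 6x10GB module: 26 ports, six reserved for northbound traffic.
for fex_uplinks in (1, 2, 4):
    print(fex_uplinks, max_chassis(26, 6, fex_uplinks))
# 1 uplink -> 20 chassis, 2 uplinks -> 10 chassis, 4 uplinks -> 5 chassis
```

The integer division matters when the ports don't divide evenly, because a chassis with only some of its uplinks cabled doesn't count.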
3. The number of vNICs and vHBAs supported on a Palo card is proportional to the number of FEX uplinks -> ((15*FEX uplinks)-2)
As pointed out by Kevin in his article, the number of uplinks determines the number of vNICs the Palo card can present. The theoretical maximum of the card is 128 but only 58 can be used currently (four FEX uplinks: (15*4)-2 = 58). This is for both vHBAs and vNICs combined.
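A quick Python sketch of the (15 * FEX uplinks) - 2 formula. The function name is made up for illustration, and the values for one and two uplinks are simply what the formula produces, not numbers I have confirmed on hardware.

```python
def palo_virtual_interfaces(fex_uplinks: int) -> int:
    """Return the vNIC + vHBA count a Palo card can present: (15 * uplinks) - 2."""
    if fex_uplinks not in (1, 2, 4):
        raise ValueError("FEX uplinks must be 1, 2, or 4")
    return 15 * fex_uplinks - 2


if __name__ == "__main__":
    for uplinks in (1, 2, 4):
        total = palo_virtual_interfaces(uplinks)
        print(f"{uplinks} FEX uplink(s): {total} vNICs/vHBAs")
```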
As you can see, there are some interesting design choices to be made based on bandwidth, scalability, and virtual I/O. I really like the customization ability of the UCS system to tailor to the requirements of the customer but it is also very important to understand the relationships presented above when designing the system.
Tuesday, February 2, 2010
Cisco UCS Information for "Server People"
I've been working with the UCS equipment as time allows for the last few weeks. I've also had the privilege to visit Cisco TAC for UCS here in Raleigh, NC to pick their brains a bit. Here is a quick bullet list of some features that I found interesting from my server based perspective.
- The amount of 10GB oversubscription from the UCS chassis to the 6100's (Fabric Interconnects) depends on the number of uplinks. There are four connections maximum per FEX, per chassis. One uplink provides an 8:1 ratio, two uplinks a 4:1 ratio, and four uplinks a 2:1 ratio. Three uplinks is not supported. (My next article will be a detailed article on the FEX's)
- This may seem obvious to the Cisco folks but it wasn't to me. The 6100's are "backwards". They are designed to be mounted in the back of the rack so all cabling is towards the rear of the UCS chassis. Cooling is "front to back" on the 6100's to match the UCS chassis.
- You can "Mix and Match" adapter cards on each blade because the uplink is a common 10GB fabric. This means if you only need a few Palo cards in a chassis and maybe CNA's on the rest, you can do that. Service Profiles won't be compatible but you do have that flexibility.
- Only Cisco memory is supported on UCS blades. No 3rd Party memory
- The UCS Chassis needs 2 power supplies. It ships with zero. Three power supplies provide N+1 redundancy and four power supplies provide N+N. As more power supplies are added, the load is distributed evenly across each power supply.
- The UCS Chassis has 8 fans but needs 4 to operate so it is N+N redundant
- When a Chassis is plugged in, multiple blades are powered up in serial fashion to prevent an in-rush current spike that could blow the circuits. This has been a problem with other blade systems for customers of mine in the past.
- The 6100's are active/active for 10GB data but are active/passive for management of the chassis. At any given time one 6100 is active and constantly passing information over the L1/L2 connections to keep the passive management module up to date
- The FEX connections on the back of a UCS chassis CAN'T be cross connected to the 6100's. I'll have more information on this in the next article
- The UCS Manager allows up to 4 KVM connections at one time. I'm still checking if this is 4 per UCS Manager, 4 per chassis, or 4 per blade (If you know, please leave a comment!)
- The maximum number of vNICs the Palo card can present is 56 and is dependent on the number of FEX links from the chassis to the 6100's. I'm still getting details on this information and I will post this in the near future
- The 6100 Fabric Interconnects are licensed per port. The 6120 comes with 8 ports licensed and the 6140 comes with 16 ports licensed. Additional ports must be purchased individually, kind of like an FC switch. This applies to both Northbound and Southbound traffic (I REALLY don't like this!!!)
- Smart Net needs to be purchased on the 6100's, each chassis, the blades, and the expansion modules in the 6100's. Smart Net lasts for one year so if you want three year coverage, you need to purchase quantity 3 of Smart Net item for each. This is VERY different from HP and IBM servers.
- The first three chassis in a UCS domain (managed by the same 6100's) communicate via an SEEPROM to verify and prevent a split brain scenario in the event of the 6100's losing communication.
- The UCS Manager includes the ability to e-mail alerts and also "call home" to Cisco, much like a NetApp storage system.
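The oversubscription ratios in the first bullet fall out of simple division: 8 blades sharing the uplinks on one FEX. Here is a small Python sketch of that arithmetic; the names are my own, and it assumes a fully populated chassis with one 10GB link from each blade to the FEX.

```python
BLADES_PER_CHASSIS = 8  # assumption: a fully populated chassis


def oversubscription_ratio(fex_uplinks: int) -> str:
    """Return the blade-to-uplink ratio for one FEX, e.g. '4:1'."""
    if fex_uplinks not in (1, 2, 4):
        raise ValueError("FEX uplinks must be 1, 2, or 4")
    # 8 divides evenly by every supported uplink count.
    return f"{BLADES_PER_CHASSIS // fex_uplinks}:1"


if __name__ == "__main__":
    for uplinks in (1, 2, 4):
        print(f"{uplinks} uplink(s): {oversubscription_ratio(uplinks)}")
```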