Cisco Nexus 5000 POC
Posted by Jas - on July 10th, 2009 in Data Center, Tips, vSphere | 10 Comments »

The past week has been busy with a VMware vSphere 4 and Cisco Nexus 5000 POC, and the results are really disappointing. Personally, I couldn't believe them, and I think I may have missed something. If anyone has any ideas or suggestions, please feel free to comment here or reply in the "Slow Performance with 10 Gb CNA card on vSphere 4" thread.
Benchmark Test Configuration
2 x VMware vSphere 4 hosts
2 x Dell PE2950 Hardware
– Intel(R) Xeon(R) CPU E5410 @ 2.33GHz
– 16GB RAM
– Qlogic QLE8042 10GbE Mercury Converged Network Adapter (CNA) – connected to a PCIe x8 slot
Updated: PERC 6/i version 6.2.0-0013 and BIOS version 2.6.1
VMware vSphere vCenter
– running as virtual machine
– 4vCPU
– 6GB RAM
– Microsoft Windows Server 2003 Standard edition
2 x Microsoft Windows Server 2003 Standard edition virtual machines
– 4vCPU
– 4GB RAM
Software:
Netperf
MTU 9000 setup
VMware vSphere host
esxcfg-vswitch -m 9000 vSwitch2
Windows Server 2003 standard edition virtual machine
Device Manager -> Network Adapters -> VMXNET3 Ethernet Adapter -> Properties -> Advanced -> Jumbo Packet -> Jumbo 9000 and Speed / Duplex -> 10Gbps Full Duplex
Cisco Nexus 5000 – Enable MTU 9000 and Jumbo Frame
VMware vSphere hosts
name: esx05
name: esx06
Windows Server 2003 standard edition virtual machine
name: test1 (running on esx06)
name: test2 (running on esx05)
Virtual Switch
Port Group name – test262 (Connected with single vNICs detected as Intel 82598EB 10 Gigabit AF Dual Port Network) with VLAN ID 1.
Note: vSphere auto-detects the CNA as ISP8432 4Gb FCoE PCI Express HBA and Intel 82598EB 10 Gigabit
And the result:
Updated: 10 July 2009
CNA connected back to back on 2 VMware vSphere servers.

You may want to refer to the VMware documentation linked below:
- 10GigE Performance
- ESX Networking Planning
Another tricky part: I managed to get a total of 6G out of 10G when running 10 VM instances with an 8192 message size and a 163840 socket size, as suggested in the documentation above. Again, a single netperf session gets about 2.7G on Linux and 1.5G on Windows, over either the Nexus 5K or a back-to-back connection. You may also get better results with 1 CPU than with 4 or 8 CPUs, which I believe is a limitation of netperf itself.
Updated: 27 July 2009
Windows 2008 Standard Edition with 1vCPU, E1000 vNIC and 5G single file transfer.

Windows 2008 Standard Edition with 1vCPU, E1000 vNIC and netperf.

Windows 2008 Standard Edition with 1vCPU, VMXNET3 vNIC, Internet Download Manager HTTP multiple sessions download.

Windows 2008 Data Center Edition with 1vCPU, VMXNET3 vNIC, 5G single file transfer.

Windows 2008 Data Center Edition with 8vCPU, VMXNET3 vNIC, 5G single file transfer with default TCP setup.

Windows 2008 Data Center Edition with 8vCPU, VMXNET3 vNIC, 5G single file transfer with TCP tuning enabled.

Summary:
The results are not consistent, and I believe there may be some limitation in VMware, the Microsoft Windows operating system, or the Qlogic CNA card driver.
Updated:
Thanks to Maurizio and Craig for their comments. I fully agree with you guys that the Cisco Nexus 5000 is not the bottleneck; other factors are.

10 Responses
Hey
Windows 2003 Std 32-bit only sees 3.6 GB of RAM and does not support 8 vCPUs either: http://technet.microsoft.com/en-us/library/cc758523(WS.10).aspx Maybe adjust that back to 2 or 4 vCPUs and 3.6 GB and see what happens…
Cheers
David, it's a typo. It's 4 vCPU, and I can actually see 8 vCPUs on Windows Server 2003 Standard Edition.
Dear Blogger,
the title of this blog is misleading because the results of this POC do not imply anything about the Nexus 5000 performance characteristics. Proper testing can be conducted with traffic generators or with tools that operate from memory rather than from disk. These tools are not constrained by the socket buffer size (which is configurable). The server hardware should be adequate, and if the test is conducted with a virtualized server, multiple VMs should be used.
The Cisco Nexus 5000 provides line-rate 10 Gigabit forwarding and works perfectly fine with Converged Network Adapters and VMware ESX servers as described and advertised in public documents: http://www.cisco.com/en/US/prod/collateral/switches/ps9441/ps9670/white_paper_c11-496511.pdf.
In a VMware environment used in conjunction with a Nexus 5000, and using a server with enough processing cores, it is possible to drive close to 10 Gbps of throughput by following the testing guidelines provided by VMware: http://www.vmware.com/pdf/10GigE_performance.pdf
Even if the network provides 10 Gigabit bandwidth, the performance of file transfers is constrained by the following factors (that are not network dependent):
Disk: IO performance of disk storage limits the maximum achievable performance, so any file application test benchmark should be accompanied by the information about the disk I/O performance.
TCP stack: TCP stack implementation varies in different OSes and newer OS versions support features like the TCP Window Scale Option and autotuning based on the measured latency. The reason for using these features is that even with low latency, due to the high speed, the Bandwidth Delay Product exceeds the standard max receive window of 64KB, hence you need TCP tuning.
As an example Windows 2008 provides Receive Window Auto-tuning:
- normal (scaling is limited to 16MB window size)
- restricted (1MB window)
- highlyrestricted (256 KB window)
- experimental (1GB window)
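The bandwidth-delay argument in the TCP stack point can be checked with a few lines of arithmetic. This is only a sketch: the 0.1 ms round-trip time is an assumed example value, not something measured in this POC, and the function names are made up for illustration.

```python
def bandwidth_delay_product_bytes(bandwidth_bps: float, rtt_seconds: float) -> float:
    """Bytes that must be in flight to keep a link of the given speed full."""
    return bandwidth_bps * rtt_seconds / 8


def window_limited_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on single-connection TCP throughput for a fixed window."""
    return window_bytes * 8 / rtt_seconds


# Assumed example values: a 10 Gbps link and a 0.1 ms round trip.
LINK_BPS = 10e9
RTT_S = 100e-6
UNSCALED_WINDOW = 65535  # largest receive window without the Window Scale option

bdp = bandwidth_delay_product_bytes(LINK_BPS, RTT_S)
ceiling = window_limited_throughput_bps(UNSCALED_WINDOW, RTT_S)
print(f"BDP: {bdp:.0f} bytes")
print(f"64 KB window ceiling: {ceiling / 1e9:.2f} Gbps")
```

With these assumed numbers the bandwidth-delay product is about 125,000 bytes, roughly twice the unscaled 64KB window, so a single connection without window scaling could not fill the link even at this low latency.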
Socket Buffer size: The file application's socket buffer size limits the maximum TCP window. For an application to drive more than 200-400Mbps, one needs to change the SO_RCVBUF and SO_SNDBUF settings to values similar to those indicated in the VMware document "10Gbps Networking Performance"; the exact configuration is application dependent.
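As a rough illustration of the socket buffer point, an application would request larger buffers through the standard socket options before connecting. The sketch below uses Python's socket module and the 163840 socket size quoted in the post; the helper names are hypothetical, and the operating system may clamp or adjust the requested size, so the code reads the values back rather than assuming they were applied.

```python
import socket


def make_tuned_socket(buffer_bytes: int) -> socket.socket:
    """Create a TCP socket and request larger send/receive buffers."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Set the buffers before connect() so the window can be negotiated accordingly.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, buffer_bytes)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, buffer_bytes)
    return sock


def effective_buffers(sock: socket.socket) -> tuple:
    """Return the (send, receive) buffer sizes the OS actually granted."""
    snd = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    rcv = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return snd, rcv


sock = make_tuned_socket(163840)  # socket size used in the netperf runs above
snd, rcv = effective_buffers(sock)
print(f"SO_SNDBUF={snd} SO_RCVBUF={rcv}")
sock.close()
```

The values printed can differ from the request, which is worth checking on each guest OS before trusting a benchmark result.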
Processing Cores: In a VMware environment, running a benchmark from an individual VM is not going to provide a valid measurement of server performance. An example of a valid test would be to run either as many VMs as the number of cores with each VM affinitized to each core, or at the very least an individual VM should be given multiple vCPUs and the Guest OS should be configured for Receive Side Scaling. In the end the test should look at the aggregate throughput from all VMs running on the same machine and not at the throughput of a single VM. Please refer to VMware documentation for instructions on how to run a proper server benchmark measurement.
TX versus RX performance: Network Adapters may offer better performance when Transmitting (TX) rather than when Receiving (RX), because the TX side may offer more offload capabilities (such as Large Segment Offload). As a result a back-to-back test can potentially be throttled by the RX server. Given a machine with enough cores, this is negligible, but, with a 4 core system as an example, it may very well be that the server could send close to 10 Gbps in TX and not be able to receive more than 4.5Gbps. When engineering a proper test with such a system it may be better to use more machines on the RX side.
The correct way to test the performance capabilities of a server system, independently of the server storage I/O capabilities and of the file application's constraints, is to test from memory with tools like netperf or Chariot, which allow setting a proper socket buffer size. In addition, one should run several instances of the testing tool (e.g. netperf) in order to leverage all the available server cores.
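Running several netperf instances and summing their throughput could be scripted roughly as below. This is a sketch under assumptions, not the procedure used in the POC: it assumes netperf 2.x option letters (-H, -t, -l, -P, and test-specific -m, -s, -S), assumes that with -P 0 the last number on the last output line is the throughput in 10^6 bits/s, and uses hypothetical function names. The 8192 message size and 163840 socket size are the values quoted in the post.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def netperf_command(host, duration_s=30, message_bytes=8192, socket_bytes=163840):
    """Build a TCP_STREAM command line (option letters assumed from netperf 2.x)."""
    return ["netperf", "-H", host, "-t", "TCP_STREAM", "-l", str(duration_s),
            "-P", "0", "--", "-m", str(message_bytes),
            "-s", str(socket_bytes), "-S", str(socket_bytes)]


def parse_throughput_mbps(output: str) -> float:
    """Assumption: the last field of the last non-empty line is 10^6 bits/s."""
    lines = [line for line in output.splitlines() if line.strip()]
    return float(lines[-1].split()[-1])


def run_netperf(cmd):
    """Run one netperf process and return its standard output."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def run_parallel(host, instances, runner=run_netperf):
    """Start several sessions at once and return the aggregate throughput in Mbps."""
    cmd = netperf_command(host)
    with ThreadPoolExecutor(max_workers=instances) as pool:
        outputs = list(pool.map(runner, [cmd] * instances))
    return sum(parse_throughput_mbps(out) for out in outputs)


# Example call (needs netperf installed and netserver running on the target):
# total = run_parallel("192.0.2.10", instances=4)
```

The runner parameter is there so the aggregation logic can be exercised without a live netserver. In a virtualized test, the same idea applies across VMs: start one session per VM and add up the per-VM results.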
Best Regards,
- Maurizio Portolani
This post has been up for quite some time, and I think what he meant to share here is the findings from his POC. Looking further into it, this does not mean the Nexus 5000 has a problem. I think multiple sessions from the virtual environment should be able to drive up the bandwidth usage. But for a single session, there is always a bottleneck somewhere else that keeps you from utilizing the bandwidth as expected. I personally tested vMotion with a CNA and the Nexus 5K; it completed within 25 seconds, and the bandwidth easily went up to 1.5Gbps during the vMotion. Netperf is just a tool for testing the scenario, but coming back to the situation, it is not practical to compare it with the amount of real data users will try to transfer within the network. Please do not be offended, as this is a result captured during the previous POC, and I believe the guy here just wants to share his experience of it.
In addition, most of the points in the comment you posted here were addressed and acknowledged during the POC with the engineer from Cisco. It did prove the multi-session capabilities and the limitations on the points you mentioned earlier.
You do realize the QLE8042 is not supported on vSphere 4, right? QLogic dropped support for this adapter at ESX 3.5 U2. You will not find this adapter on the vSphere 4 HCL.
You are right, I would prefer to go with Emulex, which is on the HCL. There are now 2nd generation CNA adapters available from QLogic that are certified for vSphere 4.
If you haven’t already purchased it I’d suggest you go with the QLogic 8142. It’s QL’s 2nd Gen CNA. Emulex is not far behind with releasing theirs. The 1st Gen CNAs are still supported with the N5K, but with the new FIP standards being solidified, don’t expect every other 10G switch to support pre-FIP cards (1st Gen). The 2nd Gen cards are about 1/2 the size and use 1/2 the power since the FC & Eth chips are now integrated together.
Thanks for sharing. I have a new project that will position this, and I am now considering the QLogic 8142 chip.
Check out the press release "QLogic Accelerates Move to Virtual Data Centers With Single Chip FCoE CNAs Now Certified With VMware vSphere(TM) 4": http://ir.qlogic.com/phoenix.zhtml?c=85695&p=RssLanding&cat=news&id=1358476