DPDK (Data Plane Development Kit) provides high-performance packet processing libraries and user space drivers. Open vSwitch (OVS) can use the DPDK libraries to run its datapath entirely in user space. According to Intel ONP performance tests, using OVS with DPDK can provide a significant performance improvement, increasing network packet throughput and reducing latency.
OVS-DPDK has been added to the oVirt Master branch as an experimental feature. The following post describes the OVS-DPDK installation and configuration procedures.
Please note: Accessing the OVS-DPDK feature requires installing the oVirt Master version. In addition, OVS-DPDK cannot access any features located within the Linux kernel. This includes Linux bridge, tun/tap devices, iptables, etc.
In order to achieve the best performance, please follow the instructions at: http://dpdk.org/doc/guides-16.11/linux_gsg/nic_perf_intel_platform.html
The network card must be on the supported vendor matrix, located here: http://dpdk.org/doc/nics
Ensure that your system supports 1GB hugepages:
grep -q pdpe1gb /proc/cpuinfo && echo "1GB supported"
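The check above can be extended to report the largest hugepage size the CPU advertises. This is a minimal sketch, assuming x86 flag names (pdpe1gb for 1GB pages, pse for 2MB pages); the function name hugepage_support is illustrative and not part of any tool.

```shell
# Report the largest hugepage size advertised by a CPU flags string.
# Assumes x86 flag names: pdpe1gb -> 1GB pages, pse -> 2MB pages.
hugepage_support() {
    local flags=" $1 "
    case "$flags" in
        *" pdpe1gb "*) echo "1GB" ;;
        *" pse "*)     echo "2MB" ;;
        *)             echo "none" ;;
    esac
}

# Use the first "flags" line of /proc/cpuinfo when it is readable.
if [ -r /proc/cpuinfo ]; then
    cpu_flags=$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)
    echo "Largest hugepage size supported: $(hugepage_support "$cpu_flags")"
fi
```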
- Hardware support: Make sure VT-d / AMD-Vi is enabled in BIOS
- Kernel support: When adding a new host via the oVirt GUI, go to Hosts > Edit > Kernel and add the following parameters to the kernel command line:
iommu=pt intel_iommu=on default_hugepagesz=1GB hugepagesz=1G hugepages=N
Here N is the number of hugepages to reserve. If 1GB hugepages are not supported, drop default_hugepagesz=1GB hugepagesz=1G; the default hugepage size will be used instead.
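As a rough illustration of the two variants, the following sketch assembles the kernel command line from a page count and a flag saying whether 1GB pages are available. The helper name kernel_args and its arguments are hypothetical; the parameters themselves are the ones listed above.

```shell
# Build the kernel arguments described above.
# $1 = number of hugepages (the N in hugepages=N)
# $2 = "yes" if the CPU supports 1GB hugepages, anything else otherwise
kernel_args() {
    local pages="$1" has_1g="$2"
    if [ "$has_1g" = "yes" ]; then
        echo "iommu=pt intel_iommu=on default_hugepagesz=1GB hugepagesz=1G hugepages=$pages"
    else
        # Without 1GB support, the default hugepage size is used.
        echo "iommu=pt intel_iommu=on hugepages=$pages"
    fi
}

kernel_args 4 yes
kernel_args 1024 no
```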
It is best to configure the host before installing it in oVirt Engine. If changes are made after VDSM is installed, move the host to maintenance mode, reboot it, and reinstall it for the changes to take effect.
Configuring OVS and DPDK on the Host
Install dpdk 16.11 packages:
# yum install dpdk dpdk-tools driverctl
Bind NICs to DPDK
In order to run a DPDK application, a suitable UIO module should be loaded into the kernel. Available modules are uio_pci_generic, igb_uio and vfio-pci. In many cases, the standard uio_pci_generic module included in the Linux kernel can provide the UIO capability. For devices that lack support for legacy interrupts, the igb_uio module should be used; note that it is an out-of-tree kernel module. VFIO is the preferred driver in recent DPDK versions on platforms that support it.
All ports on which you would like to run a DPDK application must be bound to one of these modules before OVS is started. DPDK ignores all devices that are not bound to one of the modules above.
# modprobe vfio_pci
Find the appropriate network devices you wish to bind:
# lspci | grep Ether
Review the status of current port bindings:
# driverctl -v list-devices
Bind the device to DPDK:
# driverctl set-override <PCI_ID> vfio-pci
Hugepages support is required for increasing performance. It allows the Linux kernel to manage larger pages, in addition to the standard 4K pages. Modern CPUs use a page-based mechanism to translate a virtual address into a physical address. The mapping is stored in page tables, which can grow very large. To make the translation faster, the CPU uses a transparent cache called the Translation Lookaside Buffer (TLB). A TLB miss is handled by the hardware and increases the translation time significantly. TLB misses can be reduced by increasing the page size. Therefore, 1GB pages are preferable.
The number of pages can vary according to usage requirements but is limited to memory size. Hugepages must be marked as shared in order to enable host and guest communication, as packets are transmitted and received on buffers allocated on hugepages memory.
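To make the effect of page size concrete, the sketch below counts how many pages (and therefore how many distinct translations) are needed to map a buffer of a given size. The 4 GiB region is an arbitrary example and pages_needed is an illustrative helper, not an existing command.

```shell
# Number of pages needed to map a region.
# $1 = region size in GiB, $2 = page size in KiB
pages_needed() {
    echo $(( $1 * 1024 * 1024 / $2 ))
}

echo "4 GiB with 4K pages: $(pages_needed 4 4)"
echo "4 GiB with 2M pages: $(pages_needed 4 2048)"
echo "4 GiB with 1G pages: $(pages_needed 4 1048576)"
```

Fewer pages means fewer distinct entries competing for the limited TLB, which is why larger pages reduce TLB misses.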
Review custom properties:
# engine-config -g UserDefinedVMProperties
Add engine custom properties for hugepages support, then restart the engine:
# engine-config -s "UserDefinedVMProperties=hugepages=^.*$;hugepages_shared=(true|false)"
Enable DPDK in OVS
To enable DPDK functionality, ovs-vswitchd should receive the following parameters:
dpdk-init: Allows OVS to initialize and support DPDK ports:
# ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
dpdk-socket-mem: A comma-separated list of the amount of memory (in MB) to allocate from each NUMA node. It is important to allocate memory on all NUMA nodes that have DPDK interfaces associated with them; otherwise, the NUMA node cannot be used and errors will occur. For example, in a dual-socket system, to allocate 1GB on NUMA node 0 only:
# ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem="1024,0"
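The value can also be generated from the list of NUMA nodes that host DPDK NICs. The following is a sketch only; socket_mem and its argument order are hypothetical, and the same amount of memory is assumed for every node that has a DPDK interface.

```shell
# Build a dpdk-socket-mem value.
# $1 = total number of NUMA nodes
# $2 = memory in MB for each node that hosts a DPDK NIC
# remaining arguments = ids of the NUMA nodes that host DPDK NICs
socket_mem() {
    local nodes="$1" mb="$2" out="" i=0 n val
    shift 2
    while [ "$i" -lt "$nodes" ]; do
        val=0
        for n in "$@"; do
            if [ "$n" -eq "$i" ]; then
                val="$mb"
            fi
        done
        # Append with a comma separator, except for the first entry.
        out="${out:+$out,}$val"
        i=$((i + 1))
    done
    echo "$out"
}

# Dual-socket system with DPDK NICs on node 0 only.
socket_mem 2 1024 0
```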
pmd-cpu-mask: PMD (poll mode driver) threads poll DPDK devices for new packets, instead of relying on a NIC driver that interacts with the kernel. To better scale the workloads across cores, multiple PMD threads can be created and pinned to CPU cores by explicitly specifying pmd-cpu-mask. Ensure that cores from the same NUMA node as the network device are enabled. With hyper-threading, it is best to use sibling CPUs.
For example, to spawn 2 PMD threads and pin them to cores 1 and 2:
# ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x6
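The mask is a bitmap in which bit N selects core N, so cores 1 and 2 give binary 110, which is 0x6. A small helper, shown here as a sketch with the illustrative name cores_to_mask, can compute the value for any list of cores:

```shell
# Convert a list of CPU core ids into a hex bitmask (bit N = core N).
cores_to_mask() {
    local mask=0 core
    for core in "$@"; do
        mask=$(( mask | (1 << core) ))
    done
    printf '0x%x\n' "$mask"
}

# Print (not run) the command for PMD threads on cores 1 and 2.
echo "ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=$(cores_to_mask 1 2)"
```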
dpdk-lcore-mask: A hex string specifying the CPU cores on which DPDK lcore threads are spawned. These threads handle OVS work other than PMD polling, and they must not overlap with PMD threads or vCPU threads. For example, to use core 0, which is outside the PMD mask 0x6 set above:
# ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=0x1
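Because the lcore and PMD masks must not share cores, it can be worth checking two masks before applying them. This is a minimal sketch; masks_overlap is a hypothetical helper that simply tests the bitwise AND of the two values.

```shell
# Succeed (exit status 0) if two CPU masks share at least one core.
masks_overlap() {
    [ $(( $1 & $2 )) -ne 0 ]
}

# 0x6 (cores 1,2) and 0x2 (core 1) share core 1;
# 0x6 and 0x1 (core 0) share nothing.
for lcore_mask in 0x2 0x1; do
    if masks_overlap 0x6 "$lcore_mask"; then
        echo "lcore mask $lcore_mask overlaps PMD mask 0x6"
    else
        echo "lcore mask $lcore_mask is disjoint from PMD mask 0x6"
    fi
done
```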
Restart openvswitch service:
# sudo systemctl restart openvswitch
Verify that DPDK is enabled:
# ovs-vsctl get Open_vSwitch . iface_types
[dpdk, dpdkr, dpdkvhostuser, dpdkvhostuserclient, geneve, gre, internal, ipsec_gre, lisp, patch, stt, system, tap, vxlan]
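For scripted checks, the returned list can be tested for an exact dpdk entry. The sketch below works on the output string only, so it can be tried without OVS installed; has_dpdk is an illustrative name.

```shell
# Succeed if an iface_types list (as printed by ovs-vsctl) contains "dpdk".
has_dpdk() {
    # Strip brackets and spaces, then match a whole comma-separated entry.
    case ",$(echo "$1" | tr -d '[] ')," in
        *,dpdk,*) return 0 ;;
        *)        return 1 ;;
    esac
}

if has_dpdk "[dpdk, dpdkr, dpdkvhostuser, geneve, gre, vxlan]"; then
    echo "DPDK is enabled"
fi
```

On a real host, the argument would be the output of the ovs-vsctl command shown above.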
OVS-DPDK in oVirt
- Upon creating a new cluster, choose OVS as the switch type.
- When setting up host networks, DPDK-bound NICs are shown in the dialog, as well as Linux-controlled ones.
- When creating a network on a DPDK device, an OVS bridge with datapath_type='netdev' is created and automatically attached to the DPDK device.
VMs will be created with a vhost-user network interface. In order to support vhostuser interfaces, hugepages must be configured on the VM.
In the oVirt GUI, configure the hugepages size and the number of pages to use, and enable shared hugepages.
Vhostuser XML file:
<interface type="vhostuser">
    <address bus="0x00" domain="0x0000" function="0x0" slot="0x04" type="pci" />
    <mac address="00:1a:4a:16:01:54" />
    <model type="virtio" />
    <source mode="server" path="/var/run/vdsm/vhostuser/ee844c42-7075-3bf5-86c2-ec831f287c22" type="unix" />
    <filterref filter="vdsm-no-mac-spoofing" />
    <link state="up" />
    <bandwidth />
</interface>
DPDK in the Guest
DPDK can be used in the guest in order to run a high-speed packet forwarding program between vhostuser ports. The VM should be configured to support DPDK using the same steps as on the host: configure hugepages, install the dpdk package, load a suitable kernel module and bind the network devices to DPDK. Testpmd is a good example of a DPDK application designed for testing packet forwarding.
# sudo testpmd -l 0-3 --socket-mem 1024 -n 4 -- -i
-l: List of cores to run on
--socket-mem: memory to allocate on specific sockets
-n: number of memory channels to use
-i: interactive mode
For more options, please consult testpmd documentation.
- Hugepages must be configured for each host via the kernel command line.
- Shared hugepages must be requested for every VM via a custom property.
- Performance depends heavily on the hardware and on the setup configuration.
- NIC Hotplug / Hotunplug and live migration are currently not supported.
Ways to achieve optimized performance will be discussed in a follow-up post.