mii/qemu

mirror of https://github.com/mii443/qemu.git synced 2025-08-22 23:25:48 +00:00

Ilya Maximets cb039ef3d9 net: add initial support for AF_XDP network backend

AF_XDP is a network socket family that allows communication directly
with the network device driver in the kernel, bypassing most or all
of the kernel networking stack.  In essence, the technology is
pretty similar to netmap.  But, unlike netmap, AF_XDP is Linux-native
and works with any network interface without driver modifications.
Unlike vhost-based backends (kernel, user, vdpa), AF_XDP doesn't
require access to character devices or unix sockets.  Only access to
the network interface itself is necessary.

This patch implements a network backend that communicates with the
kernel by creating an AF_XDP socket.  A chunk of userspace memory
is shared between QEMU and the host kernel.  4 ring buffers (Tx, Rx,
Fill and Completion) are placed in that memory along with a pool of
memory buffers for the packet data.  Data transmission is done by
allocating one of the buffers, copying packet data into it and
placing the pointer into the Tx ring.  After transmission, the device
will return the buffer via the Completion ring.  On Rx, the device
will take a buffer from a pre-populated Fill ring, write the packet
data into it and place the buffer into the Rx ring.
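
The Tx and Completion flow described above can be modelled with a small
single-threaded toy in C.  This is only a sketch and not the code from
this patch: the names (struct ring, struct umem, umem_transmit() and so
on) and the sizes are hypothetical, there are no memory barriers, and
the "device" side is simulated in the same process.  Real AF_XDP rings
are created by the kernel through the socket API.  Rx and Fill work the
same way in the opposite direction and are left out for brevity.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RING_SIZE   8u      /* power of two, so index masking works */
#define FRAME_SIZE  2048u   /* hypothetical buffer size */
#define NUM_FRAMES  8u      /* hypothetical pool size */

/* One descriptor ring; prod and cons are free-running counters. */
struct ring {
    uint32_t prod, cons;
    uint64_t addr[RING_SIZE];   /* offset of a buffer inside the pool */
    uint32_t len[RING_SIZE];
};

/* Toy stand-in for the shared memory: buffer pool plus two rings. */
struct umem {
    uint8_t frames[NUM_FRAMES * FRAME_SIZE];
    uint64_t free_addr[NUM_FRAMES];
    uint32_t n_free;
    struct ring tx, comp;
};

static int ring_push(struct ring *r, uint64_t addr, uint32_t len)
{
    if (r->prod - r->cons == RING_SIZE) {
        return -1;              /* ring is full */
    }
    r->addr[r->prod & (RING_SIZE - 1)] = addr;
    r->len[r->prod & (RING_SIZE - 1)] = len;
    r->prod++;
    return 0;
}

static int ring_pop(struct ring *r, uint64_t *addr, uint32_t *len)
{
    if (r->prod == r->cons) {
        return -1;              /* ring is empty */
    }
    *addr = r->addr[r->cons & (RING_SIZE - 1)];
    *len = r->len[r->cons & (RING_SIZE - 1)];
    r->cons++;
    return 0;
}

static void umem_init(struct umem *u)
{
    memset(u, 0, sizeof(*u));
    for (uint32_t i = 0; i < NUM_FRAMES; i++) {
        u->free_addr[u->n_free++] = (uint64_t)i * FRAME_SIZE;
    }
}

/* Tx: take a free buffer, copy the packet in, queue it on the Tx ring. */
static int umem_transmit(struct umem *u, const void *pkt, uint32_t len)
{
    if (len > FRAME_SIZE || u->n_free == 0) {
        return -1;
    }
    uint64_t addr = u->free_addr[--u->n_free];
    memcpy(&u->frames[addr], pkt, len);
    if (ring_push(&u->tx, addr, len) < 0) {
        u->free_addr[u->n_free++] = addr;   /* Tx ring full, undo */
        return -1;
    }
    return 0;
}

/* Simulated device: consume Tx and return buffers via Completion. */
static uint32_t device_complete_tx(struct umem *u)
{
    uint64_t addr;
    uint32_t len, n = 0;

    while (ring_pop(&u->tx, &addr, &len) == 0) {
        int rc = ring_push(&u->comp, addr, 0);
        assert(rc == 0);        /* cannot overflow: pool <= ring size */
        (void)rc;
        n++;
    }
    return n;
}

/* Move completed buffers back into the free pool. */
static uint32_t umem_reclaim(struct umem *u)
{
    uint64_t addr;
    uint32_t len, n = 0;

    while (ring_pop(&u->comp, &addr, &len) == 0) {
        u->free_addr[u->n_free++] = addr;
        n++;
    }
    return n;
}
```

In this model, a call to umem_transmit() takes one buffer out of the
pool, and the buffer only becomes reusable after device_complete_tx()
and umem_reclaim() have run.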

The AF_XDP network backend handles the communication with the host
kernel and the network interface and forwards packets to/from the
peer device in QEMU.

Usage example:

  -device virtio-net-pci,netdev=guest1,mac=00:16:35:AF:AA:5C
  -netdev af-xdp,ifname=ens6f1np1,id=guest1,mode=native,queues=1

An XDP program bridges the socket with a network interface.  It can be
attached to the interface in 2 different modes:

1. skb - this mode should work for any interface and doesn't require
         driver support, at the cost of lower performance.

2. native - this does require support from the driver and allows
            bypassing skb allocation in the kernel and potentially
            using zero-copy while getting packets in/out of userspace.

By default, QEMU will try to use native mode and fall back to skb.
The mode can be forced via the 'mode' option.  To force 'copy' even in
native mode, use the 'force-copy=on' option.  This might be useful if
there is some issue with the driver.

The 'queues=N' option specifies how many device queues should be
opened.  Note that all the queues that are not opened are still
functional and can receive traffic, but it will not be delivered to
QEMU.  So, the number of device queues should generally match the
QEMU configuration, unless the device is shared with something
else and the traffic re-direction to appropriate queues is correctly
configured on a device level (e.g. with ethtool -N).
The 'start-queue=M' option can be used to specify from which queue id
QEMU should start configuring 'N' queues.  It might also be necessary
to use this option with certain NICs, e.g. MLX5 NICs.  See the docs
for examples.

In the general case QEMU will need CAP_NET_ADMIN and CAP_SYS_ADMIN
or CAP_BPF capabilities in order to load default XSK/XDP programs to
the network interface and configure BPF maps.  It is possible, however,
to run with no capabilities.  For that to work, an external process
with enough capabilities will need to pre-load the default XSK program,
create AF_XDP sockets and pass their file descriptors to the QEMU
process on startup via the 'sock-fds' option.  The network backend will
need to be configured with 'inhibit=on' to avoid loading of the program.
QEMU will need 32 MB of locked memory (RLIMIT_MEMLOCK) per queue
or CAP_IPC_LOCK.
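
The locked memory requirement can be checked before starting QEMU.
The following is a minimal sketch, assuming that "32 MB" means
32 * 1024 * 1024 bytes and that nothing else in the process consumes
locked memory.  The helper names memlock_needed() and
memlock_limit_sufficient() are made up for this illustration, and the
check does not look at CAP_IPC_LOCK, which lifts the limit altogether.

```c
#include <stdint.h>
#include <sys/resource.h>

/* 32 MB per queue, as stated above (assumed to be 32 * 2^20 bytes). */
#define MEMLOCK_PER_QUEUE (32ull * 1024 * 1024)

/* Locked memory needed for the given number of queues, in bytes. */
static uint64_t memlock_needed(unsigned int queues)
{
    return (uint64_t)queues * MEMLOCK_PER_QUEUE;
}

/*
 * Returns 1 if the soft RLIMIT_MEMLOCK covers 'queues' queues,
 * 0 if it does not, and -1 if the limit cannot be read.
 */
static int memlock_limit_sufficient(unsigned int queues)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0) {
        return -1;
    }
    if (rl.rlim_cur == RLIM_INFINITY) {
        return 1;
    }
    return (uint64_t)rl.rlim_cur >= memlock_needed(queues);
}
```

For example, a backend with queues=4 would need 128 MB under these
assumptions, so memlock_limit_sufficient(4) returning 0 suggests
raising the limit or granting CAP_IPC_LOCK.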

There are a few performance challenges with the current network backends.

The first is that they do not support IO threads.  This means that the
data path is handled by the main thread in QEMU and may slow down other
work or may be slowed down by some other work.  This also means that
taking advantage of multi-queue is generally not possible today.

Another is that the data path goes through the device emulation
code, which is not really optimized for performance.  The fastest
"frontend" device is virtio-net.  But it's not optimized for heavy
traffic either, because it expects such use-cases to be handled via
some implementation of vhost (user, kernel, vdpa).  In practice, we
have virtio notifications and rcu lock/unlock on a per-packet basis
and not very efficient accesses to the guest memory.  Communication
channels between backend and frontend devices do not allow passing
more than one packet at a time either.

Some of these challenges can be avoided in the future by adding better
batching into device emulation or by implementing a vhost-af-xdp variant.

There are also a few kernel limitations.  AF_XDP sockets do not
support any kind of checksum or segmentation offloading.  Buffers
are limited to a page size (4K), i.e. the MTU is limited.  Multi-buffer
support for AF_XDP is in progress, but not ready yet.
Also, transmission in all non-zero-copy modes is synchronous, i.e.
done in a syscall.  That doesn't allow high packet rates on virtual
interfaces.

However, keeping in mind all of these challenges, the current
implementation of the AF_XDP backend shows decent performance while
running on top of a physical NIC with zero-copy support.

Test setup:

2 VMs running on 2 physical hosts connected via ConnectX6-Dx card.
Network backend is configured to open the NIC directly in native mode.
The driver supports zero-copy.  NIC is configured to use 1 queue.

Inside a VM - iperf3 for basic TCP performance testing and dpdk-testpmd
for PPS testing.

iperf3 result:
  TCP stream      : 19.1 Gbps

dpdk-testpmd (single queue, single CPU core, 64 B packets) results:
  Tx only         : 3.4 Mpps
  Rx only         : 2.0 Mpps
  L2 FWD Loopback : 1.5 Mpps
In skb mode the same setup shows much lower performance, similar to
the setup where the pair of physical NICs is replaced with a veth pair:

iperf3 result:
  TCP stream      : 9 Gbps

dpdk-testpmd (single queue, single CPU core, 64 B packets) results:
  Tx only         : 1.2 Mpps
  Rx only         : 1.0 Mpps
  L2 FWD Loopback : 0.7 Mpps

Results in skb mode or over the veth are close to results of a tap
backend with vhost=on and disabled segmentation offloading bridged
with a NIC.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> (docker/lcitool)
Signed-off-by: Jason Wang <jasowang@redhat.com>

2023-09-18 14:36:13 +08:00

.github/workflows

github: fix config mistake preventing repo lockdown commenting

2022-04-26 16:12:26 +01:00

.gitlab/issue_templates

.gitlab/issue_templates: Move suggestions into comments

2022-12-15 15:19:24 +01:00

.gitlab-ci.d

.gitlab-ci.d/cirrus.yml: Update FreeBSD to v13.2

2023-08-30 14:57:50 +01:00

accel

arm/kvm: Enable support for KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE

2023-09-08 16:41:36 +01:00

audio

audio: spelling fixes

2023-09-08 13:08:52 +03:00

authz

error: Drop superfluous #include "qapi/qmp/qerror.h"

2023-02-23 13:56:14 +01:00

backends

tpm: fix crash when FD >= 1024 and unnecessary errors due to EINTR

2023-09-13 08:42:57 -04:00

block

Merge tag 'for-upstream' of https://repo.or.cz/qemu/kevin into staging

2023-09-11 09:11:22 -04:00

bsd-user

trace-events: Fix the name of the tracing.rst file

2023-09-08 13:08:51 +03:00

chardev

misc/other: spelling fixes

2023-09-08 13:08:52 +03:00

common-user

common-user/host/ppc: Implement safe-syscall.inc.S

2023-01-23 14:39:48 -10:00

configs

target/loongarch: Add GDB support for loongarch32 mode

2023-08-24 11:17:56 +08:00

contrib

contrib/vhost-user-gpu: add support for sending dmabuf modifiers

2023-09-12 10:37:01 +04:00

crypto

crypto: Add SM4 constant parameter CK

2023-09-11 11:45:55 +10:00

disas

riscv/disas: Fix disas output of upper immediates

2023-07-19 14:30:04 +10:00

docs

docs: vhost-user-gpu: add protocol changes for dmabuf modifiers

2023-09-12 10:37:01 +04:00

dump

dump: kdump-zlib data pages not dumped with pvtime/aarch64

2023-08-07 15:46:59 +04:00

ebpf

trace-events: Fix the name of the tracing.rst file

2023-09-08 13:08:51 +03:00

fpu

fpu: Add float64_to_int{32,64}_modulo

2023-07-01 08:26:54 +02:00

fsdev

9pfs: deprecate 'proxy' backend

2023-07-06 11:42:08 +02:00

gdb-xml

target/loongarch: Split fcc register to fcc0-7 in gdbstub

2023-08-24 11:17:59 +08:00

gdbstub

configure, meson: remove target OS symbols from config-host.mak

2023-09-07 13:32:37 +02:00

host/include

other architectures: spelling fixes

2023-07-25 17:14:07 +03:00

hw

e1000e: rename e1000e_ba_state and e1000e_write_hdr_to_rx_buffers

2023-09-18 14:36:13 +08:00

include

tap: Add check for USO features

2023-09-18 14:36:13 +08:00

io

io: follow coroutine AioContext in qio_channel_yield()

2023-09-07 20:32:11 -05:00

libdecnumber

libdecnumber/dpd/decimal64: Fix compiler warning from Clang 15

2022-11-11 09:13:52 +01:00

linux-headers

linux-headers: Update to Linux v6.6-rc1

2023-09-12 11:34:56 +02:00

linux-user

linux-user/riscv: Add new extensions to hwprobe

2023-09-11 11:45:55 +10:00

migration

migration: Add .save_prepare() handler to struct SaveVMHandlers

2023-09-11 08:34:06 +02:00

monitor

hw/char: Have FEWatchFunc handlers return G_SOURCE_CONTINUE/REMOVE

2023-08-31 19:47:43 +02:00

nbd

Merge tag 'pull-trivial-patches' of https://gitlab.com/mjt0k/qemu into staging

2023-09-08 10:06:25 -04:00

net

net: add initial support for AF_XDP network backend

2023-09-18 14:36:13 +08:00

pc-bios

meson: compile bundled device trees

2023-09-07 13:32:14 +02:00

plugins

configure, meson: move --enable-plugins to meson

2023-09-07 13:32:37 +02:00

po

po: add ukrainian translation

2022-07-05 10:15:49 +02:00

python

Revert "mkvenv: work around broken pip installations on Debian 10"

2023-09-07 13:32:37 +02:00

qapi

net: add initial support for AF_XDP network backend

2023-09-18 14:36:13 +08:00

qga

qga/: spelling fixes

2023-09-08 13:08:52 +03:00

qobject

docs/interop: Convert qmp-spec.txt to rST

2023-05-22 10:21:01 +02:00

qom

meson: Replace softmmu_ss -> system_ss

2023-06-20 10:01:30 +02:00

replay

meson: Replace softmmu_ss -> system_ss

2023-06-20 10:01:30 +02:00

roms

roms/opensbi: Upgrade from v1.3 to v1.3.1

2023-07-23 19:32:02 +10:00

scripts

net: add initial support for AF_XDP network backend

2023-09-18 14:36:13 +08:00

scsi

io: follow coroutine AioContext in qio_channel_yield()

2023-09-07 20:32:11 -05:00

semihosting

accel/tcg: spelling fixes

2023-08-31 19:47:43 +02:00

softmmu

sysemu: Add prepare callback to struct VMChangeStateEntry

2023-09-11 08:34:05 +02:00

stats

meson: Replace softmmu_ss -> system_ss

2023-06-20 10:01:30 +02:00

storage-daemon

configure, meson: remove target OS symbols from config-host.mak

2023-09-07 13:32:37 +02:00

stubs

stubs/colo.c: spelling

2023-08-07 13:52:59 +03:00

subprojects

Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging

2023-09-07 10:29:06 -04:00

target

Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging

2023-09-13 13:41:27 -04:00

tcg

Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging

2023-09-07 10:29:06 -04:00

tests

net: add initial support for AF_XDP network backend

2023-09-18 14:36:13 +08:00

tools

ebpf: fix compatibility with libbpf 1.0+

2023-03-10 17:26:47 +08:00

trace

meson: Replace softmmu_ss -> system_ss

2023-06-20 10:01:30 +02:00

ui

ui: add precondition for dpy_get_ui_info()

2023-09-12 11:14:09 +04:00

util

util/iov: Avoid dynamic stack allocation

2023-09-07 20:32:11 -05:00

.dir-locals.el

…

.editorconfig

.editorconfig: update the automatic mode setting for Emacs

2021-03-10 15:34:11 +00:00

.exrc

…

.gdbinit

.gdbinit: load QEMU sub-commands when gdb starts

2017-06-07 14:38:45 +01:00

.git-blame-ignore-revs

metadata: add .git-blame-ignore-revs

2023-04-04 15:56:44 +01:00

.gitattributes

gitattributes: Cover Objective-C source files

2022-03-29 00:15:14 +02:00

.gitignore

configure: rename --enable-pypi to --enable-download, control subprojects too

2023-06-06 16:30:01 +02:00

.gitlab-ci.yml

docs: Document GitLab custom CI/CD variables

2021-07-29 07:56:01 +02:00

.gitmodules

meson: subprojects: replace berkeley-{soft,test}float-3 with wraps

2023-06-06 16:30:01 +02:00

.gitpublish

Add a git-publish configuration file

2018-03-05 09:03:17 +00:00

.mailmap

MAINTAINERS: Update Roman Bolshakov email address

2023-06-28 13:55:09 +02:00

.patchew.yml

scripts/checkpatch: roll diff tweaking into checkpatch itself

2021-06-25 10:08:33 +01:00

.readthedocs.yml

readthedocs: build with Python 3.6

2020-10-05 16:30:45 +01:00

.travis.yml

travis.yml: Add missing 'flex', 'bison' packages to 'GCC (user)' job

2023-04-20 11:24:50 +02:00

block.c

Merge tag 'for-upstream' of https://repo.or.cz/qemu/kevin into staging

2023-09-11 09:11:22 -04:00

blockdev-nbd.c

qapi block: Elide redundant has_FOO in generated C

2022-12-14 20:03:25 +01:00

blockdev.c

cutils: Adjust signature of parse_uint[_full]

2023-06-02 12:27:19 -05:00

blockjob.c

blockjob: Fix AioContext locking in block_job_add_bdrv()

2023-06-28 08:46:21 +02:00

configure

Python: Drop support for Python 3.7

2023-09-07 13:32:37 +02:00

COPYING

…

COPYING.LIB

COPYING.LIB: Synchronize the LGPL 2.1 with the version from gnu.org

2019-01-30 11:01:22 +01:00

cpu.c

trivial: Simplify the spots that use TARGET_BIG_ENDIAN as a numeric value

2023-09-08 13:08:52 +03:00

cpus-common.c

cpu: expose qemu_cpu_list_lock for lock-guard use

2023-05-11 09:53:41 +01:00

event-loop-base.c

util/event-loop-base: Introduce options to set the thread pool size

2022-05-09 10:43:23 +01:00

gitdm.config

contrib/gitdm: add group map for AMD

2023-03-22 15:08:26 +00:00

hmp-commands-info.hx

accel/tcg: remove CONFIG_PROFILER

2023-06-26 17:33:00 +02:00

hmp-commands.hx

net: add initial support for AF_XDP network backend

2023-09-18 14:36:13 +08:00

iothread.c

iothread: Set the GSource "name" field

2023-09-07 14:01:25 -04:00

job-qmp.c

qapi job: Elide redundant has_FOO in generated C

2022-12-14 20:04:47 +01:00

job.c

block: remove bdrv_try_set_aio_context and replace it with bdrv_try_change_aio_context

2022-10-27 20:14:11 +02:00

Kconfig

meson: Introduce target-specific Kconfig

2021-07-09 18:21:34 +02:00

Kconfig.host

vfio-user: build library

2022-06-15 16:42:33 +01:00

LICENSE

tcg/LICENSE: Remove out of date claim about TCG subdirectory licensing

2019-11-11 15:11:21 +01:00

MAINTAINERS

net: add initial support for AF_XDP network backend

2023-09-18 14:36:13 +08:00

Makefile

configure, meson: remove target OS symbols from config-host.mak

2023-09-07 13:32:37 +02:00

memory_ldst.c.inc

exec/memory_ldst: Use correct type sizes

2021-05-26 08:35:51 -07:00

meson_options.txt

net: add initial support for AF_XDP network backend

2023-09-18 14:36:13 +08:00

meson.build

net: add initial support for AF_XDP network backend

2023-09-18 14:36:13 +08:00

module-common.c

…

os-posix.c

os-posix.c: remove unneeded #includes

2023-09-01 23:46:20 +02:00

os-win32.c

Remove qemu-common.h include from most units

2022-04-06 14:31:55 +02:00

page-vary-common.c

Remove qemu-common.h include from most units

2022-04-06 14:31:55 +02:00

page-vary.c

include: move target page bits declaration to page-vary.h

2022-04-06 14:31:43 +02:00

pythondeps.toml

Revert "tests: Use separate virtual environment for avocado"

2023-08-28 09:55:48 +02:00

qemu-bridge-helper.c

qemu-bridge-helper: relocate path to default ACL

2020-09-30 19:11:36 +02:00

qemu-edid.c

qemu-edid: Restrict input parameter -d to avoid division by zero

2022-10-12 13:38:15 +02:00

qemu-img-cmds.hx

qemu-img: Unify [-b [-F]] documentation

2022-02-01 13:49:15 +01:00

qemu-img.c

qemu-img: omit errno value in error message

2023-09-08 17:03:09 +02:00

qemu-io-cmds.c

qemu-iotests: test zone append operation

2023-05-15 08:18:10 -04:00

qemu-io.c

include: move qemu_*_exec_dir() to cutils

2022-05-28 11:42:56 +02:00

qemu-keymap.c

qemu-keymap: properly check return from xkb_keymap_mod_get_index

2023-07-03 12:51:21 +01:00

qemu-nbd.c

qemu-nbd: Restore "qemu-nbd -v --fork" output

2023-09-08 07:20:58 -05:00

qemu-options.hx

net: add initial support for AF_XDP network backend

2023-09-18 14:36:13 +08:00

qemu.nsi

nsis installer: Fix mouse-over descriptions for emulators

2022-03-18 10:55:15 +00:00

qemu.sasl

sasl: remove comment about obsolete kerberos versions

2021-06-14 13:28:50 +01:00

README.rst

README.rst: fix link formatting

2022-08-04 13:44:21 +02:00

replication.c

replication: move include out of root directory

2021-05-26 14:49:46 +02:00

trace-events

trace-events: remove the remaining vcpu trace events

2023-06-01 11:05:05 -04:00

VERSION

Open 8.2 development tree

2023-08-22 07:14:07 -07:00

version.rc

configure: remove CONFIG_FILEVERSION and CONFIG_PRODUCTVERSION

2021-01-02 21:03:37 +01:00

README.rst

===========
QEMU README
===========

QEMU is a generic and open source machine & userspace emulator and
virtualizer.

QEMU is capable of emulating a complete machine in software without any
need for hardware virtualization support. By using dynamic translation,
it achieves very good performance. QEMU can also integrate with the Xen
and KVM hypervisors to provide emulated hardware while allowing the
hypervisor to manage the CPU. With hypervisor support, QEMU can achieve
near native performance for CPUs. When QEMU emulates CPUs directly it is
capable of running operating systems made for one machine (e.g. an ARMv7
board) on a different machine (e.g. an x86_64 PC board).

QEMU is also capable of providing userspace API virtualization for Linux
and BSD kernel interfaces. This allows binaries compiled against one
architecture ABI (e.g. the Linux PPC64 ABI) to be run on a host using a
different architecture ABI (e.g. the Linux x86_64 ABI). This does not
involve any hardware emulation, simply CPU and syscall emulation.

QEMU aims to fit into a variety of use cases. It can be invoked directly
by users wishing to have full control over its behaviour and settings.
It also aims to facilitate integration into higher level management
layers, by providing a stable command line interface and monitor API.
It is commonly invoked indirectly via the libvirt library when using
open source applications such as oVirt, OpenStack and virt-manager.

QEMU as a whole is released under the GNU General Public License,
version 2. For full licensing details, consult the LICENSE file.


Documentation
=============

Documentation can be found hosted online at
`<https://www.qemu.org/documentation/>`_. The documentation for the
current development version that is available at
`<https://www.qemu.org/docs/master/>`_ is generated from the ``docs/``
folder in the source tree, and is built by `Sphinx
<https://www.sphinx-doc.org/en/master/>`_.


Building
========

QEMU is multi-platform software intended to be buildable on all modern
Linux platforms, macOS, Win32 (via the Mingw64 toolchain) and a variety
of other UNIX targets. The simple steps to build QEMU are:


.. code-block:: shell

  mkdir build
  cd build
  ../configure
  make

Additional information can also be found online via the QEMU website:

* `<https://wiki.qemu.org/Hosts/Linux>`_
* `<https://wiki.qemu.org/Hosts/Mac>`_
* `<https://wiki.qemu.org/Hosts/W32>`_


Submitting patches
==================

The QEMU source code is maintained under the Git version control system.

.. code-block:: shell

   git clone https://gitlab.com/qemu-project/qemu.git

When submitting patches, one common approach is to use 'git
format-patch' and/or 'git send-email' to format & send the mail to the
qemu-devel@nongnu.org mailing list. All patches submitted must contain
a 'Signed-off-by' line from the author. Patches should follow the
guidelines set out in the `style section
<https://www.qemu.org/docs/master/devel/style.html>`_ of
the Developers Guide.

Additional information on submitting patches can be found online via
the QEMU website:

* `<https://wiki.qemu.org/Contribute/SubmitAPatch>`_
* `<https://wiki.qemu.org/Contribute/TrivialPatches>`_

The QEMU website is also maintained under source control.

.. code-block:: shell

  git clone https://gitlab.com/qemu-project/qemu-web.git

* `<https://www.qemu.org/2017/02/04/the-new-qemu-website-is-up/>`_

A 'git-publish' utility was created to make the above process less
cumbersome, and is highly recommended for making regular contributions,
or even just for sending consecutive patch series revisions. It also
requires a working 'git send-email' setup, and by default doesn't
automate everything, so you may want to go through the above steps
manually once.

For installation instructions, please go to:

*  `<https://github.com/stefanha/git-publish>`_

The workflow with 'git-publish' is:

.. code-block:: shell

  $ git checkout master -b my-feature
  $ # work on new commits, add your 'Signed-off-by' lines to each
  $ git publish

Your patch series will be sent and tagged as my-feature-v1 if you need to refer
back to it in the future.

Sending v2:

.. code-block:: shell

  $ git checkout my-feature # same topic branch
  $ # making changes to the commits (using 'git rebase', for example)
  $ git publish

Your patch series will be sent with a 'v2' tag in the subject and the git tip
will be tagged as my-feature-v2.

Bug reporting
=============

The QEMU project uses GitLab issues to track bugs. Bugs
found when running code built from QEMU git or upstream released sources
should be reported via:

* `<https://gitlab.com/qemu-project/qemu/-/issues>`_

If using QEMU via an operating system vendor pre-built binary package, it
is preferable to report bugs to the vendor's own bug tracker first. If
the bug is also known to affect the latest upstream code, it can also be
reported via GitLab.

For additional information on bug reporting consult:

* `<https://wiki.qemu.org/Contribute/ReportABug>`_


ChangeLog
=========

For version history and release notes, please visit
`<https://wiki.qemu.org/ChangeLog/>`_ or look at the git history for
more detailed information.


Contact
=======

The QEMU community can be contacted in a number of ways, with the two
main methods being email and IRC:

* `<mailto:qemu-devel@nongnu.org>`_
* `<https://lists.nongnu.org/mailman/listinfo/qemu-devel>`_
* #qemu on irc.oftc.net

Information on additional methods of contacting the community can be
found online via the QEMU website:

* `<https://wiki.qemu.org/Contribute/StartHere>`_
