Commit Graph

90342 Commits

Author SHA1 Message Date
7adb961995 Merge remote-tracking branch 'remotes/dgilbert-gitlab/tags/pull-virtiofs-20210916' into staging
virtiofsd pull 2021-08-16

Two minor fixes; one for performance, the other seccomp
on s390x.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

# gpg: Signature made Thu 16 Sep 2021 14:51:38 BST
# gpg:                using RSA key 45F5C71B4A0CB7FB977A9FA90516331EBC5BFDE7
# gpg: Good signature from "Dr. David Alan Gilbert (RH2) <dgilbert@redhat.com>" [full]
# Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A  9FA9 0516 331E BC5B FDE7

* remotes/dgilbert-gitlab/tags/pull-virtiofs-20210916:
  virtiofsd: Reverse req_list before processing it
  tools/virtiofsd: Add fstatfs64 syscall to the seccomp allowlist

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-09-19 18:53:29 +01:00
bedd7e93d0 virtio-net: fix use after unmap/free for sg
When mergeable buffer is enabled, we try to set the num_buffers after
the virtqueue elem has been unmapped. This will lead several issues,
E.g a use after free when the descriptor has an address which belongs
to the non direct access region. In this case we use bounce buffer
that is allocated during address_space_map() and freed during
address_space_unmap().

Fixing this by storing the elems temporarily in an array and delay the
unmap after we set the the num_buffers.

This addresses CVE-2021-3748.

Reported-by: Alexander Bulekov <alxndr@bu.edu>
Fixes: fbe78f4f55 ("virtio-net support")
Cc: qemu-stable@nongnu.org
Signed-off-by: Jason Wang <jasowang@redhat.com>
2021-09-17 16:07:52 +08:00
080832e4f4 ebpf: only include in system emulators
eBPF files are being included in user emulators, which is useless and
also breaks compilation because ebpf/trace-events is only processed
if a system emulator is included in the build.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/566
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2021-09-17 16:07:52 +08:00
c99e34e537 Merge remote-tracking branch 'remotes/vivier2/tags/linux-user-for-6.2-pull-request' into staging
Pull request linux-user 20210916

Code cleanup

# gpg: Signature made Thu 16 Sep 2021 16:11:58 BST
# gpg:                using RSA key CD2F75DDC8E3A4DC2E4F5173F30C38BD3F2FBE3C
# gpg:                issuer "laurent@vivier.eu"
# gpg: Good signature from "Laurent Vivier <lvivier@redhat.com>" [full]
# gpg:                 aka "Laurent Vivier <laurent@vivier.eu>" [full]
# gpg:                 aka "Laurent Vivier (Red Hat) <lvivier@redhat.com>" [full]
# Primary key fingerprint: CD2F 75DD C8E3 A4DC 2E4F  5173 F30C 38BD 3F2F BE3C

* remotes/vivier2/tags/linux-user-for-6.2-pull-request:
  linux-user: Check lock_user result for ip_mreq_source sockopts
  linux-user: Drop unneeded includes from qemu.h
  linux-user: Don't include gdbstub.h in qemu.h
  linux-user: Split linux-user internals out of qemu.h
  linux-user: Split safe-syscall macro into its own header
  linux-user: Split mmap prototypes into user-mmap.h
  linux-user: Split loader-related prototypes into loader.h
  linux-user: Split signal-related prototypes into signal-common.h
  linux-user: Split strace prototypes into strace.h
  linux-user: Fix coding style nits in qemu.h

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-09-16 21:09:18 +01:00
eb8033f658 python: pylint 2.11 support
We're not ready to enforce f-strings everywhere, so just silence this
new warning.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Willian Rampazzo <willianr@redhat.com>
Message-id: 20210916182248.721529-3-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
2021-09-16 15:04:04 -04:00
5690b4370b python: Update for pylint 2.10
A few new annoyances. Of note is the new warning for an unspecified
encoding when opening a text file, which actually does indicate a
potentially real problem; see
https://www.python.org/dev/peps/pep-0597/#motivation

Use LC_CTYPE to determine an encoding to use for interpreting QEMU's
terminal output. Note that Python states: "language code and encoding
may be None if their values cannot be determined" -- use a platform
default as a backup.

Notes: Passing encoding=None will generate a suppressed warning on
Python 3.10+ that 'None' should not be passed as the encoding
argument. This behavior may be deprecated in the future and the default
switched to be a ubiquitous UTF-8. Opting in to the locale default will
be done by passing the encoding 'locale', but that isn't available in
3.6 through 3.9. Presumably this warning will be unsuppressed some time
prior to the actual switch and we can re-investigate these issues at
that time if necessary.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Willian Rampazzo <willianr@redhat.com>
Message-id: 20210916182248.721529-2-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
2021-09-16 15:03:56 -04:00
74e43b04b0 linux-user: Check lock_user result for ip_mreq_source sockopts
In do_setsockopt(), the code path for the options which take a struct
ip_mreq_source (IP_BLOCK_SOURCE, IP_UNBLOCK_SOURCE,
IP_ADD_SOURCE_MEMBERSHIP and IP_DROP_SOURCE_MEMBERSHIP) fails to
check the return value from lock_user().  Handle this in the usual
way by returning -TARGET_EFAULT.

(In practice this was probably harmless because we'd pass a NULL
pointer to setsockopt() and the kernel would then return EFAULT.)

Fixes: Coverity CID 1459987
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20210809155424.30968-1-peter.maydell@linaro.org>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2021-09-16 17:04:21 +02:00
d1fe59377b Merge remote-tracking branch 'remotes/vivier2/tags/trivial-branch-for-6.2-pull-request' into staging
Trivial patches pull request 20210916

# gpg: Signature made Thu 16 Sep 2021 15:09:39 BST
# gpg:                using RSA key CD2F75DDC8E3A4DC2E4F5173F30C38BD3F2FBE3C
# gpg:                issuer "laurent@vivier.eu"
# gpg: Good signature from "Laurent Vivier <lvivier@redhat.com>" [full]
# gpg:                 aka "Laurent Vivier <laurent@vivier.eu>" [full]
# gpg:                 aka "Laurent Vivier (Red Hat) <lvivier@redhat.com>" [full]
# Primary key fingerprint: CD2F 75DD C8E3 A4DC 2E4F  5173 F30C 38BD 3F2F BE3C

* remotes/vivier2/tags/trivial-branch-for-6.2-pull-request:
  target/sparc: Make sparc_cpu_dump_state() static
  target/avr: Fix compiler errors (-Werror=enum-conversion)
  hw/vfio: Fix typo in comments
  intel_iommu: Fix typo in comments
  target/i386: spelling: occured=>occurred, mininum=>minimum
  configure: add missing pc-bios/qemu_vga.ndrv symlink in build tree
  spelling: sytem => system
  qdev: Complete qdev_init_gpio_out() documentation
  hw/i386/acpi-build: Fix a typo
  util: Remove redundant checks in the openpty()

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-09-16 16:02:31 +01:00
046d91c83c virtiofsd: Reverse req_list before processing it
With the thread pool disabled, we add the requests in the queue to a
GList, processing by iterating over there afterwards.

For adding them, we're using "g_list_prepend()", which is more
efficient but causes the requests to be processed in reverse order,
breaking the read-ahead and request-merging optimizations in the host
for sequential operations.

According to the documentation, if you need to process the request
in-order, using "g_list_prepend()" and then reversing the list with
"g_list_reverse()" is more efficient than using "g_list_append()", so
let's do it that way.

Testing on a spinning disk (to boost the increase of read-ahead and
request-merging) shows a 4x improvement on sequential write fio test:

Test:
fio --directory=/mnt/virtio-fs --filename=fio-file1 --runtime=20
--iodepth=16 --size=4G --direct=1 --blocksize=4K --ioengine libaio
--rw write --name seqwrite-libaio

Without "g_list_reverse()":
...
Jobs: 1 (f=1): [W(1)][100.0%][w=22.4MiB/s][w=5735 IOPS][eta 00m:00s]
seqwrite-libaio: (groupid=0, jobs=1): err= 0: pid=710: Tue Aug 24 12:58:16 2021
  write: IOPS=5709, BW=22.3MiB/s (23.4MB/s)(446MiB/20002msec); 0 zone resets
...

With "g_list_reverse()":
...
Jobs: 1 (f=1): [W(1)][100.0%][w=84.0MiB/s][w=21.5k IOPS][eta 00m:00s]
seqwrite-libaio: (groupid=0, jobs=1): err= 0: pid=716: Tue Aug 24 13:00:15 2021
  write: IOPS=21.3k, BW=83.1MiB/s (87.2MB/s)(1663MiB/20001msec); 0 zone resets
...

Signed-off-by: Sergio Lopez <slp@redhat.com>
Message-Id: <20210824131158.39970-1-slp@redhat.com>
Reviewed-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-09-16 14:50:48 +01:00
8cfd339b3d tools/virtiofsd: Add fstatfs64 syscall to the seccomp allowlist
The virtiofsd currently crashes on s390x when doing something like
this in the guest:

 mkdir -p /mnt/myfs
 mount -t virtiofs myfs /mnt/myfs
 touch /mnt/myfs/foo.txt
 stat -f /mnt/myfs/foo.txt

The problem is that the fstatfs64 syscall is called in this case
from the virtiofsd. We have to put it on the seccomp allowlist to
avoid that the daemon gets killed in this case.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2001728
Suggested-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20210914123214.181885-1-thuth@redhat.com>
Reviewed-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-09-16 14:50:48 +01:00
9ac200acce target/sparc: Make sparc_cpu_dump_state() static
The sparc_cpu_dump_state() function is only called within
the same file. Make it static.

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20210916084002.1918445-1-f4bug@amsat.org>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2021-09-16 14:52:46 +02:00
63cf61256a Merge remote-tracking branch 'remotes/kraxel/tags/vga-20210916-pull-request' into staging
virtio-gpu + ui: fence syncronization.
qxl: unbreak live migration.

# gpg: Signature made Thu 16 Sep 2021 06:56:03 BST
# gpg:                using RSA key A0328CFFB93A17A79901FE7D4CB6D8EED3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>" [full]
# gpg:                 aka "Gerd Hoffmann <gerd@kraxel.org>" [full]
# gpg:                 aka "Gerd Hoffmann (private) <kraxel@gmail.com>" [full]
# Primary key fingerprint: A032 8CFF B93A 17A7 9901  FE7D 4CB6 D8EE D3E8 7138

* remotes/kraxel/tags/vga-20210916-pull-request:
  virtio-gpu: Add gl_flushed callback
  ui/gtk-egl: Wait for the draw signal for dmabuf blobs
  ui: Create sync objects and fences only for blobs
  ui/egl: Add egl helpers to help with synchronization
  ui/gtk: Create a common release_dmabuf helper
  qxl: fix pre-save logic

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-09-16 13:48:09 +01:00
eba6814a0a target/avr: Fix compiler errors (-Werror=enum-conversion)
../target/avr/translate.c: In function ‘gen_jmp_ez’:
../target/avr/translate.c:1012:22: error: implicit conversion from ‘enum <anonymous>’ to ‘DisasJumpType’ [-Werror=enum-conversion]
 1012 |     ctx->base.is_jmp = DISAS_LOOKUP;
      |                      ^

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Michael Rolnik <mrolnik@gmail.com>
Message-Id: <20210706180936.249912-1-sw@weilnetz.de>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2021-09-16 12:11:16 +02:00
631ba5a128 hw/vfio: Fix typo in comments
Fix typo in comments:
*programatically  ==> programmatically
*disconecting  ==> disconnecting
*mulitple  ==> multiple
*timout  ==> timeout
*regsiter  ==> register
*forumula  ==> formula

Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210730012613.2198-1-caihuoqing@baidu.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2021-09-16 11:57:01 +02:00
37557b09a6 intel_iommu: Fix typo in comments
Fix typo:
*Unknwon  ==> Unknown
*futher  ==> further
*configed  ==> configured

Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Message-Id: <20210730014942.2311-1-caihuoqing@baidu.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2021-09-16 11:55:02 +02:00
7916b5fc8c target/i386: spelling: occured=>occurred, mininum=>minimum
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Message-Id: <20210818141352.417716-1-mjt@msgid.tls.msk.ru>
[lv: add mininum=>minimum in subject]
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2021-09-16 11:51:23 +02:00
57b6f58c1d Merge remote-tracking branch 'remotes/hreitz/tags/pull-block-2021-09-15' into staging
Block patches:
- Block-status cache for data regions
- qcow2 optimization (when using subclusters)
- iotests delinting, and let 297 (lint checker) cover named iotests
- qcow2 check improvements
- Added -F (target backing file format) option to qemu-img convert
- Mirror job fix
- Fix for when a migration is initiated while a backup job runs
- Fix for uncached qemu-img convert to a volume with 4k sectors (for an
  unaligned image)
- Minor gluster driver fix

# gpg: Signature made Wed 15 Sep 2021 18:39:11 BST
# gpg:                using RSA key CB62D7A0EE3829E45F004D34A1FA40D098019CDF
# gpg:                issuer "hreitz@redhat.com"
# gpg: Good signature from "Hanna Reitz <hreitz@redhat.com>" [marginal]
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: CB62 D7A0 EE38 29E4 5F00  4D34 A1FA 40D0 9801 9CDF

* remotes/hreitz/tags/pull-block-2021-09-15: (32 commits)
  qemu-img: Add -F shorthand to convert
  qcow2-refcount: check_refblocks(): add separate message for reserved
  qcow2-refcount: check_refcounts_l1(): check reserved bits
  qcow2-refcount: improve style of check_refcounts_l1()
  qcow2-refcount: check_refcounts_l2(): check reserved bits
  qcow2-refcount: check_refcounts_l2(): check l2_bitmap
  qcow2-refcount: fix_l2_entry_by_zero(): also zero L2 entry bitmap
  qcow2-refcount: introduce fix_l2_entry_by_zero()
  qcow2: introduce qcow2_parse_compressed_l2_entry() helper
  qcow2: compressed read: simplify cluster descriptor passing
  qcow2-refcount: improve style of check_refcounts_l2()
  qemu-img: Allow target be aligned to sector size
  qcow2: handle_dependencies(): relax conflict detection
  qcow2: refactor handle_dependencies() loop body
  simplebench: add img_bench_templater.py
  block: bdrv_inactivate_recurse(): check for permissions and fix crash
  tests: add migrate-during-backup
  block/mirror: fix NULL pointer dereference in mirror_wait_on_conflicts()
  iotests/297: Cover tests/
  mirror-top-perms: Fix AbnormalShutdown path
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-09-15 18:55:59 +01:00
1899bf4737 qemu-img: Add -F shorthand to convert
Although we have long supported 'qemu-img convert -o
backing_file=foo,backing_fmt=bar', the fact that we have a shortcut -B
for backing_file but none for backing_fmt has made it more likely that
users accidentally run into:

qemu-img: warning: Deprecated use of backing file without explicit backing format

when using -B instead of -o.  For similarity with other qemu-img
commands, such as create and compare, add '-F $fmt' as the shorthand
for '-o backing_fmt=$fmt'.  Update iotest 122 for coverage of both
spellings.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210913131735.1948339-1-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-15 18:42:38 +02:00
8fba395151 qcow2-refcount: check_refblocks(): add separate message for reserved
Split checking for reserved bits out of aligned offset check.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Tested-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20210914122454.141075-11-vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-15 18:42:38 +02:00
98bc07d6cd qcow2-refcount: check_refcounts_l1(): check reserved bits
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Tested-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20210914122454.141075-10-vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-15 18:42:38 +02:00
cd6efd60e9 qcow2-refcount: improve style of check_refcounts_l1()
- use g_autofree for l1_table
 - better name for size in bytes variable
 - reduce code blocks nesting
 - whitespaces, braces, newlines

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20210914122454.141075-9-vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-15 18:42:38 +02:00
289ef5f219 qcow2-refcount: check_refcounts_l2(): check reserved bits
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Tested-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20210914122454.141075-8-vsementsov@virtuozzo.com>
[hreitz: Separated `type` declaration from statements]
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-15 18:42:38 +02:00
9631c7822e qcow2-refcount: check_refcounts_l2(): check l2_bitmap
Check subcluster bitmap of the l2 entry for different types of
clusters:

 - for compressed it must be zero
 - for allocated check consistency of two parts of the bitmap
 - for unallocated all subclusters should be unallocated
   (or zero-plain)

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Tested-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Message-Id: <20210914122454.141075-7-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-15 18:42:38 +02:00
5c3216c046 qcow2-refcount: fix_l2_entry_by_zero(): also zero L2 entry bitmap
We'll reuse the function to fix wrong L2 entry bitmap. Support it now.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20210914122454.141075-6-vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-15 18:42:38 +02:00
a2debf6506 qcow2-refcount: introduce fix_l2_entry_by_zero()
Split fix_l2_entry_by_zero() out of check_refcounts_l2() to be
reused in further patch.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20210914122454.141075-5-vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-15 18:42:38 +02:00
a6e098462b qcow2: introduce qcow2_parse_compressed_l2_entry() helper
Add helper to parse compressed l2_entry and use it everywhere instead
of open-coding.

Note, that in most places we move to precise coffset/csize instead of
sector-aligned. Still it should work good enough for updating
refcounts.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20210914122454.141075-4-vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-15 18:42:38 +02:00
9a3978a46b qcow2: compressed read: simplify cluster descriptor passing
Let's pass the whole L2 entry and not bother with
L2E_COMPRESSED_OFFSET_SIZE_MASK.

It also helps further refactoring that adds generic
qcow2_parse_compressed_l2_entry() helper.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20210914122454.141075-3-vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-15 18:42:38 +02:00
786c22d9c2 qcow2-refcount: improve style of check_refcounts_l2()
- don't use same name for size in bytes and in entries
 - use g_autofree for l2_table
 - add whitespace
 - fix block comment style

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20210914122454.141075-2-vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-15 18:42:38 +02:00
7b7ab2d6c9 gitlab-ci: Mark manual-only jobs as allow_failure
If a gitlab CI job is marked as manual-only but is not marked
as allow_failure, then gitlab considers that the pipeline is
"blocked" until the job has been manually triggered. We need
to mark these manual-only jobs as also allow_failure: true
so that gitlab doesn't insist that they have run before it
will consider the pipeline to be complete.

Fixes: 4c9af1ea14
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-id: 20210915123412.8232-1-peter.maydell@linaro.org
Acked-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Willian Rampazzo <willianr@redhat.com>
2021-09-15 16:43:16 +01:00
a1c62436a4 qemu-img: Allow target be aligned to sector size
We cannot write to images opened with O_DIRECT unless we allow them to
be resized so they are aligned to the sector size: Since 9c60a5d197,
bdrv_node_refresh_perm() ensures that for nodes whose length is not
aligned to the request alignment and where someone has taken a WRITE
permission, the RESIZE permission is taken, too).

Let qemu-img convert pass the BDRV_O_RESIZE flag (which causes
blk_new_open() to take the RESIZE permission) when using cache=none for
the target, so that when writing to it, it can be aligned to the target
sector size.

Without this patch, an error is returned:

$ qemu-img convert -f raw -O raw -t none foo.img /mnt/tmp/foo.img
qemu-img: Could not open '/mnt/tmp/foo.img': Cannot get 'write'
permission without 'resize': Image size is not a multiple of request
alignment

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1994266
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20210819101200.64235-1-hreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2021-09-15 15:54:07 +02:00
ff812c5563 qcow2: handle_dependencies(): relax conflict detection
There is no conflict and no dependency if we have parallel writes to
different subclusters of one cluster when the cluster itself is already
allocated. So, relax extra dependency.

Measure performance:
First, prepare build/qemu-img-old and build/qemu-img-new images.

cd scripts/simplebench
./img_bench_templater.py

Paste the following to stdin of running script:

qemu_img=../../build/qemu-img-{old|new}
$qemu_img create -f qcow2 -o extended_l2=on /ssd/x.qcow2 1G
$qemu_img bench -c 100000 -d 8 [-s 2K|-s 2K -o 512|-s $((1024*2+512))] \
        -w -t none -n /ssd/x.qcow2

The result:

All results are in seconds

------------------  ---------  ---------
                    old        new
-s 2K               6.7 ± 15%  6.2 ± 12%
                                 -7%
-s 2K -o 512        13 ± 3%    11 ± 5%
                                 -16%
-s $((1024*2+512))  9.5 ± 4%   8.4
                                 -12%
------------------  ---------  ---------

So small writes are more independent now and that helps to keep deeper
io queue which improves performance.

271 iotest output becomes racy for three allocation in one cluster.
Second and third writes may finish in different order. Second and
third requests don't depend on each other any more. Still they both
depend on first request anyway. Filter out second and third write
offsets to cover both possible outputs.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210824101517.59802-4-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
[hreitz: s/ an / and /]
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-15 15:54:07 +02:00
6d207d3501 qcow2: refactor handle_dependencies() loop body
No logic change, just prepare for the following commit. While being
here do also small grammar fix in a comment.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20210824101517.59802-3-vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-15 15:54:07 +02:00
5b3f7daaec simplebench: add img_bench_templater.py
Add simple grammar-parsing template benchmark. New tool consume test
template written in bash with some special grammar injections and
produces multiple tests, run them and finally print a performance
comparison table of different tests produced from one template.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210824101517.59802-2-vsementsov@virtuozzo.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-15 15:54:07 +02:00
a13de40a05 block: bdrv_inactivate_recurse(): check for permissions and fix crash
We must not inactivate child when parent has write permissions on
it.

Calling .bdrv_inactivate() doesn't help: actually only qcow2 has this
handler and it is used to flush caches, not for permission
manipulations.

So, let's simply check cumulative parent permissions before
inactivating the node.

This commit fixes a crash when we do migration during backup: prior to
the commit nothing prevents all nodes inactivation at migration finish
and following backup write to the target crashes on assertion
"assert(!(bs->open_flags & BDRV_O_INACTIVE));" in
bdrv_co_write_req_prepare().

After the commit, we rely on the fact that copy-before-write filter
keeps write permission on target node to be able to write to it. So
inactivation fails and migration fails as expected.

Corresponding test now passes, so, enable it.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20210911120027.8063-3-vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-15 15:54:07 +02:00
2f43482733 tests: add migrate-during-backup
Add a simple test which tries to run migration during backup.
bdrv_inactivate_all() should fail. But due to bug (see next commit with
fix) it doesn't, nodes are inactivated and continued backup crashes
on assertion "assert(!(bs->open_flags & BDRV_O_INACTIVE));" in
bdrv_co_write_req_prepare().

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210911120027.8063-2-vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-15 15:54:07 +02:00
66fed30c9c block/mirror: fix NULL pointer dereference in mirror_wait_on_conflicts()
In mirror_iteration() we call mirror_wait_on_conflicts() with
`self` parameter set to NULL.

Starting from commit d44dae1a7c we dereference `self` pointer in
mirror_wait_on_conflicts() without checks if it is not NULL.

Backtrace:
  Program terminated with signal SIGSEGV, Segmentation fault.
  #0  mirror_wait_on_conflicts (self=0x0, s=<optimized out>, offset=<optimized out>, bytes=<optimized out>)
      at ../block/mirror.c:172
  172	                self->waiting_for_op = op;
  [Current thread is 1 (Thread 0x7f0908931ec0 (LWP 380249))]
  (gdb) bt
  #0  mirror_wait_on_conflicts (self=0x0, s=<optimized out>, offset=<optimized out>, bytes=<optimized out>)
      at ../block/mirror.c:172
  #1  0x00005610c5d9d631 in mirror_run (job=0x5610c76a2c00, errp=<optimized out>) at ../block/mirror.c:491
  #2  0x00005610c5d58726 in job_co_entry (opaque=0x5610c76a2c00) at ../job.c:917
  #3  0x00005610c5f046c6 in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>)
      at ../util/coroutine-ucontext.c:173
  #4  0x00007f0909975820 in ?? () at ../sysdeps/unix/sysv/linux/x86_64/__start_context.S:91
      from /usr/lib64/libc.so.6

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2001404
Fixes: d44dae1a7c ("block/mirror: fix active mirror dead-lock in mirror_wait_on_conflicts")
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20210910124533.288318-1-sgarzare@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-15 15:54:07 +02:00
098d983ea5 iotests/297: Cover tests/
297 so far does not check the named tests, which reside in the tests/
directory (i.e. full path tests/qemu-iotests/tests).  Fix it.

Thanks to the previous two commits, all named tests pass its scrutiny,
so we do not have to add anything to SKIP_FILES.

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Reviewed-by: Willian Rampazzo <willianr@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210902094017.32902-6-hreitz@redhat.com>
2021-09-15 15:54:07 +02:00
b90d7a18b6 mirror-top-perms: Fix AbnormalShutdown path
The AbnormalShutdown exception class is not in qemu.machine, but in
qemu.machine.machine.  (qemu.machine.AbnormalShutdown was enough for
Python to find it in order to run this test, but pylint complains about
it.)

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20210902094017.32902-5-hreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2021-09-15 15:54:07 +02:00
d8c2e47dbe migrate-bitmaps-test: Fix pylint warnings
There are a couple of things pylint takes issue with:
- The "time" import is unused
- The import order (iotests should come last)
- get_bitmap_hash() doesn't use @self and so should be a function
- Semicolons at the end of some lines
- Parentheses after "if"
- Some lines are too long (80 characters instead of 79)
- inject_test_case()'s @name parameter shadows a top-level @name
  variable
- "lambda self: mc(self)" were equivalent to just "mc", but in
  inject_test_case(), it is not equivalent, so add a comment and disable
  the warning locally
- Always put two empty lines after a function
- f'exec: cat > /dev/null' does not need to be an f-string

Fix them.

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20210902094017.32902-4-hreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2021-09-15 15:54:07 +02:00
e2ad17a62d migrate-bitmaps-postcopy-test: Fix pylint warnings
pylint complains that discards1_sha256 and all_discards_sha256 are first
set in non-__init__ methods.

These variables are not really class-variables anyway, so let them
instead be returned by start_postcopy(), thus silencing pylint.

Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210902094017.32902-3-hreitz@redhat.com>
2021-09-15 15:54:07 +02:00
26db7b23ce iotests/297: Drop 169 and 199 from the skip list
169 and 199 have been renamed and moved to tests/ (commit a44be0334b:
"iotests: rename and move 169 and 199 tests"), so we can drop them from
the skip list.

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Reviewed-by: Willian Rampazzo <willianr@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210902094017.32902-2-hreitz@redhat.com>
2021-09-15 15:54:07 +02:00
cc16153f1f iotests: Fix use-{list,dict}-literal warnings
pylint proposes using `[]` instead of `list()` and `{}` instead of
`dict()`, because it is faster.  That seems simple enough, so heed its
advice.

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20210824153540.177128-3-hreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2021-09-15 15:54:07 +02:00
81dcb9ca1f iotests: Fix unspecified-encoding pylint warnings
As of recently, pylint complains when `open()` calls are missing an
`encoding=` specified.  Everything we have should be UTF-8 (and in fact,
everything should be UTF-8, period (exceptions apply)), so use that.

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20210824153540.177128-2-hreitz@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
2021-09-15 15:54:07 +02:00
9dbf6455f4 block/iscsi: Do not force-cap *pnum
bdrv_co_block_status() does it for us, we do not need to do it here.

The advantage of not capping *pnum is that bdrv_co_block_status() can
cache larger data regions than requested by its caller.

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210812084148.14458-7-hreitz@redhat.com>
2021-09-15 15:54:07 +02:00
72b4cabe5e block/gluster: Do not force-cap *pnum
bdrv_co_block_status() does it for us, we do not need to do it here.

The advantage of not capping *pnum is that bdrv_co_block_status() can
cache larger data regions than requested by its caller.

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210812084148.14458-6-hreitz@redhat.com>
2021-09-15 15:54:07 +02:00
869e7ee827 block/file-posix: Do not force-cap *pnum
bdrv_co_block_status() does it for us, we do not need to do it here.

The advantage of not capping *pnum is that bdrv_co_block_status() can
cache larger data regions than requested by its caller.

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210812084148.14458-5-hreitz@redhat.com>
2021-09-15 15:54:07 +02:00
5a1cfd2150 block: Clarify that @bytes is no limit on *pnum
.bdrv_co_block_status() implementations are free to return a *pnum that
exceeds @bytes, because bdrv_co_block_status() in block/io.c will clamp
*pnum as necessary.

On the other hand, if drivers' implementations return values for *pnum
that are as large as possible, our recently introduced block-status
cache will become more effective.

So, make a note in block_int.h that @bytes is no upper limit for *pnum.

Suggested-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210812084148.14458-4-hreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2021-09-15 15:54:07 +02:00
0bc329fbb0 block: block-status cache for data regions
As we have attempted before
(https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg06451.html,
"file-posix: Cache lseek result for data regions";
https://lists.nongnu.org/archive/html/qemu-block/2021-02/msg00934.html,
"file-posix: Cache next hole"), this patch seeks to reduce the number of
SEEK_DATA/HOLE operations the file-posix driver has to perform.  The
main difference is that this time it is implemented as part of the
general block layer code.

The problem we face is that on some filesystems or in some
circumstances, SEEK_DATA/HOLE is unreasonably slow.  Given the
implementation is outside of qemu, there is little we can do about its
performance.

We have already introduced the want_zero parameter to
bdrv_co_block_status() to reduce the number of SEEK_DATA/HOLE calls
unless we really want zero information; but sometimes we do want that
information, because for files that consist largely of zero areas,
special-casing those areas can give large performance boosts.  So the
real problem is with files that consist largely of data, so that
inquiring the block status does not gain us much performance, but where
such an inquiry itself takes a lot of time.

To address this, we want to cache data regions.  Most of the time, when
bad performance is reported, it is in places where the image is iterated
over from start to end (qemu-img convert or the mirror job), so a simple
yet effective solution is to cache only the current data region.

(Note that only caching data regions but not zero regions means that
returning false information from the cache is not catastrophic: Treating
zeroes as data is fine.  While we try to invalidate the cache on zero
writes and discards, such incongruences may still occur when there are
other processes writing to the image.)

We only use the cache for nodes without children (i.e. protocol nodes),
because that is where the problem is: Drivers that rely on block-status
implementations outside of qemu (e.g. SEEK_DATA/HOLE).

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/307
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20210812084148.14458-3-hreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
[hreitz: Added `local_file == bs` assertion, as suggested by Vladimir]
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-15 15:54:06 +02:00
33ff4c9e08 block: Drop BDS comment regarding bdrv_append()
There is a comment above the BDS definition stating care must be taken
to consider handling newly added fields in bdrv_append().

Actually, this comment should have said "bdrv_swap()" as of 4ddc07cac
(nine years ago), and in any case, bdrv_swap() was dropped in
8e419aefa (six years ago).  So no such care is necessary anymore.

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210812084148.14458-2-hreitz@redhat.com>
2021-09-15 15:54:06 +02:00
e24154d878 gluster: Align block-status tail
gluster's block-status implementation is basically a copy of that in
block/file-posix.c, there is only one thing missing, and that is
aligning trailing data extents to the request alignment (as added by
commit 9c3db310ff).

Note that 9c3db310ff mentions that "there seems to be no other block
driver that sets request_alignment and [...]", but while block/gluster.c
does indeed not set request_alignment, block/io.c's
bdrv_refresh_limits() will still default to an alignment of 512 because
block/gluster.c does not provide a byte-aligned read function.
Therefore, unaligned tails can conceivably occur, and so we should apply
the change from 9c3db310ff to gluster's block-status implementation.

Reported-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20210805143603.59503-1-mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-15 15:54:06 +02:00