First, this permission never protected a node from being changed, as
generic child-replacing functions don't check it.
Second, it's a strange thing: it presents a permission of parent node
to change its child. But generally, children are replaced by different
mechanisms, like jobs or qmp commands, not by nodes.
Graph-mod permission is hard to understand. All other permissions
describe operations which done by parent node on its child: read,
write, resize. Graph modification operations are something completely
different.
The only place where BLK_PERM_GRAPH_MOD is used as "perm" (not shared
perm) is mirror_start_job, for s->target. Still modern code should use
bdrv_freeze_backing_chain() to protect from graph modification, if we
don't do it somewhere it may be considered as a bug. So, it's a bit
risky to drop GRAPH_MOD, and analyzing of possible loss of protection
is hard. But one day we should do it, let's do it now.
One more bit of information is that locking the corresponding byte in
file-posix doesn't make sense at all.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210902093754.2352-1-vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The return value of virtio_blk_handle_vq() is no longer used. Get rid of
it. This is a step towards unifying the dataplane and non-dataplane
virtqueue handler functions.
Prepare virtio_blk_handle_output() to be used by both dataplane and
non-dataplane by making the condition for starting ioeventfd more
specific. This way it won't trigger when dataplane has already been
started.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Message-id: 20211207132336.36627-4-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The virtqueue host notifier API
virtio_queue_aio_set_host_notifier_handler() polls the virtqueue for new
buffers. AioContext previously required a bool progress return value
indicating whether an event was handled or not. This is no longer
necessary because the AioContext polling API has been split into a poll
check function and an event handler function. The event handler is only
run when we know there is work to do, so it doesn't return bool.
The VirtIOHandleAIOOutput function signature is now the same as
VirtIOHandleOutput. Get rid of the bool return value.
Further simplifications will be made for virtio-blk and virtio-scsi in
the next patch.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Message-id: 20211207132336.36627-3-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
vhost-user-blk realize only attempts to reconnect if the previous
connection attempt failed on "a problem with the connection and not an
error related to the content (which would fail again the same way in the
next attempt)".
However this distinction is very subtle, and may be inadvertently broken
if the code changes somewhere deep down the stack and a new error gets
propagated up to here.
OTOH now that the number of reconnection attempts is limited it seems
harmless to try reconnecting on any error.
So relax the condition of whether to retry connecting to check for any
error.
This patch amends a527e312b5 "vhost-user-blk: Implement reconnection
during realize".
Signed-off-by: Roman Kagan <rvkagan@yandex-team.ru>
Message-Id: <20211111153354.18807-2-rvkagan@yandex-team.ru>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
The code that introduced "virtio-blk: Configure all host notifiers in
a single MR transaction" introduced a second loop variable to perform
cleanup in second loop, but mistakenly still refers to the first
loop variable within the second loop body.
Fixes: d0267da614 ("virtio-blk: Configure all host notifiers in a single MR transaction")
Signed-off-by: Mark Mielke <mark.mielke@gmail.com>
Message-id: CALm7yL08qarOu0dnQkTN+pa=BSRC92g31YpQQNDeAiT4yLZWQQ@mail.gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
backend_defaults property allow users to control if default block
properties should be decided with backend information.
If it is off, any backend information will be discarded, which is
suitable if you plan to perform live migration to a different disk backend.
If it is on, a block device may utilize backend information more
aggressively.
By default, it is auto, which uses backend information for block
sizes and ignores the others, which is consistent with the older
versions.
Signed-off-by: Akihiko Odaki <akihiko.odaki@gmail.com>
Message-id: 20210705130458.97642-2-akihiko.odaki@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The device model batching its ioeventfds in a single MR transaction is
an optimization. Clarify this in virtio-scsi, virtio-blk and generic
virtio code. Also clarify that the transaction must commit before
closing ioeventfds so that no one is tempted to merge the loops
in the start functions error path and in the stop functions.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <162125799728.1394228.339855768563326832.stgit@bahia.lan>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Commit dabefdd6 removed code that was supposed to try reconnecting
during .realize(), but actually just crashed and had several design
problems.
This adds the feature back without the crash in simple cases while also
fixing some design problems: Reconnection is now only tried if there was
a problem with the connection and not an error related to the content
(which would fail again the same way in the next attempt). Reconnection
is limited to three attempts (four with the initial attempt) so that we
won't end up in an infinite loop if a problem is permanent. If the
backend restarts three times in the very short time window of device
initialisation, we have bigger problems and erroring out is the right
course of action.
In the case that a connection error occurs and we reconnect, the error
message is printed using error_report_err(), but otherwise ignored.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210609154658.350308-8-kwolf@redhat.com>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This function is the part that we will want to retry if the connection
is lost during initialisation, so factor it out to keep the following
patch simpler.
The error path for vhost_dev_get_config() forgot disconnecting the
chardev, add this while touching the code.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210609154658.350308-7-kwolf@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Instead of just returning 0/-1 and letting the caller make up a
meaningless error message, add an Error parameter to allow reporting the
real error and switch to 0/-errno so that different kind of errors can
be distinguished in the caller.
config_len in vhost_user_get_config() is defined by the device, so if
it's larger than VHOST_USER_MAX_CONFIG_SIZE, this is a programming
error. Turn the corresponding check into an assertion.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210609154658.350308-6-kwolf@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This allows callers to return better error messages instead of making
one up while the real error ends up on stderr. Most callers can
immediately make use of this because they already have an Error
parameter themselves. The others just keep printing the error with
error_report_err().
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210609154658.350308-2-kwolf@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Change the '-device help' output from:
Storage devices:
name "floppy", bus floppy-bus, desc "virtual floppy drive"
name "isa-fdc", bus ISA
to:
Storage devices:
name "floppy", bus floppy-bus, desc "virtual floppy drive"
name "isa-fdc", bus ISA, desc "virtual floppy controller"
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20210614193220.2007159-7-philmd@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
Commit e50caf4a5c ("tracing: convert documentation to rST")
converted docs/devel/tracing.txt to docs/devel/tracing.rst.
We still have several references to the old file, so let's fix them
with the following command:
sed -i s/tracing.txt/tracing.rst/ $(git grep -l docs/devel/tracing.txt)
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210517151702.109066-2-sgarzare@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Block layer patches
- vhost-user-blk: Fix error handling during initialisation
- Add test cases for the vhost-user-blk export
- Fix leaked Transaction objects
- qcow2: Expose dirty bit in 'qemu-img info'
# gpg: Signature made Tue 18 May 2021 11:57:46 BST
# gpg: using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6
# gpg: issuer "kwolf@redhat.com"
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full]
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6
* remotes/kevin/tags/for-upstream:
vhost-user-blk: Check that num-queues is supported by backend
virtio: Fail if iommu_platform is requested, but unsupported
vhost-user-blk: Get more feature flags from vhost device
vhost-user-blk: Improve error reporting in realize
vhost-user-blk: Don't reconnect during initialisation
vhost-user-blk: Make sure to set Error on realize failure
vhost-user-blk-test: test discard/write zeroes invalid inputs
tests/qtest: add multi-queue test case to vhost-user-blk-test
test: new qTest case to test the vhost-user-blk-server
block/export: improve vu_blk_sect_range_ok()
block: Fix Transaction leak in bdrv_reopen_multiple()
block: Fix Transaction leak in bdrv_root_attach_child()
qcow2: set bdi->is_dirty
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Creating a device with a number of queues that isn't supported by the
backend is pointless, the device won't work properly and the error
messages are rather confusing.
Just fail to create the device if num-queues is higher than what the
backend supports.
Since the relationship between num-queues and the number of virtqueues
depends on the specific device, this is an additional value that needs
to be initialised by the device. For convenience, allow leaving it 0 if
the check should be skipped. This makes sense for vhost-user-net where
separate vhost devices are used for the queues and custom initialisation
code is needed to perform the check.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1935031
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Message-Id: <20210429171316.162022-7-kwolf@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Now that vhost_user_blk_connect() is not called from an event handler
any more, but directly from vhost_user_blk_device_realize(), we can
actually make use of Error again instead of calling error_report() in
the inner function and setting a more generic and therefore less useful
error message in realize() itself.
With Error, the callers are responsible for adding context if necessary
(such as the "-device" option the error refers to). Additional prefixes
are redundant and better omitted.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Message-Id: <20210429171316.162022-4-kwolf@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This is a partial revert of commits 77542d4314 and bc79c87bcd.
Usually, an error during initialisation means that the configuration was
wrong. Reconnecting won't make the error go away, but just turn the
error condition into an endless loop. Avoid this and return errors
again.
Additionally, calling vhost_user_blk_disconnect() from the chardev event
handler could result in use-after-free because none of the
initialisation code expects that the device could just go away in the
middle. So removing the call fixes crashes in several places.
For example, using a num-queues setting that is incompatible with the
backend would result in a crash like this (dereferencing dev->opaque,
which is already NULL):
#0 0x0000555555d0a4bd in vhost_user_read_cb (source=0x5555568f4690, condition=(G_IO_IN | G_IO_HUP), opaque=0x7fffffffcbf0) at ../hw/virtio/vhost-user.c:313
#1 0x0000555555d950d3 in qio_channel_fd_source_dispatch (source=0x555557c3f750, callback=0x555555d0a478 <vhost_user_read_cb>, user_data=0x7fffffffcbf0) at ../io/channel-watch.c:84
#2 0x00007ffff7b32a9f in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
#3 0x00007ffff7b84a98 in g_main_context_iterate.constprop () at /lib64/libglib-2.0.so.0
#4 0x00007ffff7b32163 in g_main_loop_run () at /lib64/libglib-2.0.so.0
#5 0x0000555555d0a724 in vhost_user_read (dev=0x555557bc62f8, msg=0x7fffffffcc50) at ../hw/virtio/vhost-user.c:402
#6 0x0000555555d0ee6b in vhost_user_get_config (dev=0x555557bc62f8, config=0x555557bc62ac "", config_len=60) at ../hw/virtio/vhost-user.c:2133
#7 0x0000555555d56d46 in vhost_dev_get_config (hdev=0x555557bc62f8, config=0x555557bc62ac "", config_len=60) at ../hw/virtio/vhost.c:1566
#8 0x0000555555cdd150 in vhost_user_blk_device_realize (dev=0x555557bc60b0, errp=0x7fffffffcf90) at ../hw/block/vhost-user-blk.c:510
#9 0x0000555555d08f6d in virtio_device_realize (dev=0x555557bc60b0, errp=0x7fffffffcff0) at ../hw/virtio/virtio.c:3660
Note that this removes the ability to reconnect during initialisation
(but not during operation) when there is no permanent error, but the
backend restarts, as the implementation was buggy. This feature can be
added back in a follow-up series after changing error paths to
distinguish cases where retrying could help from cases with permanent
errors.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20210429171316.162022-3-kwolf@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
With the introduction of the nvme-subsystem device we are really
cluttering up the hw/block directory.
As suggested by Philippe previously, move the nvme emulation to hw/nvme.
Suggested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Validation of the max_active and max_open zoned parameters are
independent of any other state, so move them to the early
nvme_ns_check_constraints parameter checks.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
It is not an error to report more active/open zones supported than the
number of zones in the namespace.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
The NvmeCtrl num_namespaces member is just an indirection for the
NVME_MAX_NAMESPACES constant.
Remove the indirection.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Streamline namespace array indexing such that both the subsystem and
controller namespaces arrays are 1-indexed.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
There is no need to look up the lba size and metadata size in the LBA
Format structure everytime we want to use it. And we use it a lot.
Cache the values in the NvmeNamespace and update them if the namespace
is formatted.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
The inline nvme_ns_status() helper only has a single call site. Remove
it from the header file and inline it for real.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
In preparation for moving the nvme device into its own subtree, merge
the header files into one.
Also add missing copyright notice and add list of authors with
substantial contributions.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Get rid of the (reserved) double underscore use. Rename the "generic"
zone open function to nvme_zrm_open_flags() and add a generic `int
flags` argument instead which allows more flags to be easily added in
the future. There is at least one TP under standardization that would
add an additional flag.
Cc: Philippe Mathieu-Daudé <philmd@redhat.com>
Cc: Thomas Huth <thuth@redhat.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
While QEMU coding style prefers lowercase hexadecimals in constants, the
NVMe subsystem uses the format from the NVMe specifications in comments,
i.e. 'h' suffix instead of '0x' prefix.
Fix this up across the code base.
Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
[k.jensen: updated message; added conversion in a couple of missing comments]
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Currently IO Command Set Profile feature is supported, but the feature
support flag not set. Further, this feature is changable. Fix that.
Additionally, remove filling default value of the CQE result with zero,
since it will fall back to the default case anyway.
Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
[k.jensen: fix up commit message]
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Currently in compare command metadata aio read blk_aio_preadv return
value ignored. Consider it and complete the block accounting.
Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
Fixes: 0a384f923f ("hw/block/nvme: add compare command")
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>