Spice hooks into migration status changes to figure out when to
transmit information to the new spice server, but the migration
status in postcopy doesn't quite fit: the destination starts
running before the source migration has finished.
It's not a case of hanging off the migration status change to
postcopy-active either, since that happens before we stop the
guest CPU.
Fix it by sending a notify just after sending the device state,
and adding a flag that can be tested by the notify receiver.
Symptom:
spice handover doesn't work with the error:
red_worker.c:11540:display_channel_wait_for_migrate_data: timeout
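As a rough, stand-alone sketch of the notify-plus-flag idea (names made up;
the real patch uses QEMU's Notifier lists and the spice-core callbacks):

    #include <stdbool.h>
    #include <stdio.h>

    static bool device_state_sent;

    static void spice_migration_notify(void)
    {
        /* The receiver tests the flag instead of the migration status,
         * which postcopy flips before the guest CPUs are stopped. */
        if (device_state_sent) {
            printf("handing over to the new spice server\n");
        }
    }

    static void send_device_state(void)
    {
        /* ... serialise and transmit the device state ... */
        device_state_sent = true;
        spice_migration_notify();   /* notify right after the device state */
    }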
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Message-id: 1456161452-25318-1-git-send-email-dgilbert@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
When using 'migrate -b', we must make sure to take ownership of the
image before writing to it. Otherwise metadata would be thrown away on
migration completion; this was caught by the assertions introduced in
commit 09e0c771.
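The gist of the fix, as a hedged sketch (call pattern only, not the complete
function, and assuming bdrv_invalidate_cache_all() is the right entry point):

    Error *local_err = NULL;

    /* Take ownership of the image before block migration writes to it, so
     * metadata updates are not thrown away when migration completes. */
    bdrv_invalidate_cache_all(&local_err);
    if (local_err) {
        error_report_err(local_err);
        /* fail rather than write to an image we don't own */
    }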
Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Within the if statement, time_spent is guaranteed to be non-zero.
This patch just removes the check on time_spent when calculating mbps.
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Although accesses to ram_list.dirty_memory[] use atomics so multiple
threads can safely dirty the bitmap, the data structure is not fully
thread-safe yet.
This patch handles the RAM hotplug case where ram_list.dirty_memory[] is
grown. ram_list.dirty_memory[] is changed from a regular bitmap to an
RCU array of pointers to fixed-size bitmap blocks. Threads can continue
accessing bitmap blocks while the array is being extended. See the
comments in the code for an in-depth explanation of struct
DirtyMemoryBlocks.
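A condensed sketch of the reader side (simplified from the real definitions
in include/exec/ram_addr.h; the block size below is illustrative):

    #define DIRTY_MEMORY_BLOCK_SIZE  (256 * 1024 * 8)   /* bits per block */

    typedef struct DirtyMemoryBlocks {
        struct rcu_head rcu;
        unsigned long *blocks[];        /* fixed-size bitmap blocks */
    } DirtyMemoryBlocks;

    static void sketch_set_dirty(ram_addr_t addr, unsigned client)
    {
        unsigned long page = addr >> TARGET_PAGE_BITS;
        unsigned long idx  = page / DIRTY_MEMORY_BLOCK_SIZE;
        unsigned long off  = page % DIRTY_MEMORY_BLOCK_SIZE;
        DirtyMemoryBlocks *blocks;

        rcu_read_lock();
        /* The array of block pointers may be replaced concurrently to grow
         * it, but every block it points to stays valid for this reader. */
        blocks = atomic_rcu_read(&ram_list.dirty_memory[client]);
        set_bit_atomic(off, blocks->blocks[idx]);
        rcu_read_unlock();
    }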
I have tested that live migration with virtio-blk dataplane works.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <1453728801-5398-2-git-send-email-stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Replace the uint8 softfloat-specific typedef with uint8_t.
This change was made with
find include hw fpu target-* -name '*.[ch]' | xargs sed -i -e 's/\buint8\b/uint8_t/g'
together with manual removal of the typedef definition and
manual fixing of more erroneous uses found via test compilation.
It turns out that the only code using this type is an accidental
use where uint8_t was intended anyway...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Acked-by: Leon Alrae <leon.alrae@imgtec.com>
Acked-by: James Hogan <james.hogan@imgtec.com>
Message-id: 1452603315-27030-7-git-send-email-peter.maydell@linaro.org
So far, live migration with shared storage meant that the image is in a
not-really-ready don't-touch-me state on the destination while the
source is still actively using it, but after completing the migration,
the image was fully opened on both sides. This is bad.
This patch adds a block driver callback to inactivate images on the
source before completing the migration. Inactivation means that it goes
to a state as if it was just live migrated to the qemu instance on the
source (i.e. BDRV_O_INACTIVE is set). You're then supposed to continue
either on the source or on the destination, which takes ownership of the
image.
A typical migration looks like this now with respect to disk images:
1. Destination qemu is started, the image is opened with
BDRV_O_INACTIVE. The image is fully opened on the source.
2. Migration is about to complete. The source flushes the image and
inactivates it. Now both sides have the image opened with
BDRV_O_INACTIVE and are expecting the other side to still modify it.
3. One side (the destination on success) continues and calls
bdrv_invalidate_all() in order to take ownership of the image again.
This removes BDRV_O_INACTIVE on the resuming side; the flag remains
set on the other side.
This ensures that the same image isn't written to by both instances
(unless both are resumed, but then you get what you deserve). This is
important because .bdrv_close for non-BDRV_O_INACTIVE images could write
to the image file, which is definitely forbidden while another host is
using the image.
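Roughly, the source-side inactivation in step 2 looks like this (a sketch;
the driver callback name and error handling are simplified and assumed):

    static int sketch_inactivate_one(BlockDriverState *bs)
    {
        int ret = bdrv_flush(bs);
        if (ret < 0) {
            return ret;
        }
        /* give the format driver a chance to write back cached metadata */
        if (bs->drv && bs->drv->bdrv_inactivate) {
            ret = bs->drv->bdrv_inactivate(bs);
            if (ret < 0) {
                return ret;
            }
        }
        /* from now on .bdrv_close must not touch the image file */
        bs->open_flags |= BDRV_O_INACTIVE;
        return 0;
    }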
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
This allows sending a partial array whose size is another
structure field multiplied by a constant.
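A usage sketch (fragment only; the state type, field names and constant are
made up, and the macro argument order is quoted from memory, so check
include/migration/vmstate.h):

    static const VMStateDescription vmstate_example = {
        .name = "example",
        .version_id = 1,
        .minimum_version_id = 1,
        .fields = (VMStateField[]) {
            VMSTATE_UINT32(num_lines, ExampleState),
            /* migrates num_lines * CHARS_PER_LINE elements of 'chars' */
            VMSTATE_VARRAY_MULTIPLY(chars, ExampleState, num_lines,
                                    CHARS_PER_LINE, vmstate_info_uint8, uint8_t),
            VMSTATE_END_OF_LIST()
        }
    };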
Signed-off-by: Juan Quintela <quintela@redhat.com>
[PMM: updated to current master]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Add vmstate support for migrating arrays of CPU_DoubleU via
VMSTATE_CPUDOUBLE_ARRAY.
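For example, a field entry migrating a 32-element FP register file might look
like this (a fragment; the state type and field name are placeholders):

    VMSTATE_CPUDOUBLE_ARRAY(fpr, SomeCPUState, 32),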
Signed-off-by: Juan Quintela <quintela@redhat.com>
[PMM: rebased, since files have all moved since 2012;
added VMSTATE_CPUDOUBLE_ARRAY_V for consistency with FLOAT64]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Error reporting patches for 2016-01-13
# gpg: Signature made Wed 13 Jan 2016 14:21:48 GMT using RSA key ID EB918653
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>"
# gpg: aka "Markus Armbruster <armbru@pond.sub.org>"
* remotes/armbru/tags/pull-error-2016-01-13: (41 commits)
checkpatch: Detect newlines in error_report and other error functions
error: Consistently name Error * objects err, and not errp
s390/sclp: Simplify control flow in sclp_realize()
hw/s390x: Rename local variables Error *l_err to just err
error: Clean up errors with embedded newlines (again)
vhdx: Fix "log that needs to be replayed" error message
pci-assign: Clean up "Failed to assign" error messages
vmdk: Clean up "Invalid extent lines" error message
vmdk: Clean up control flow in vmdk_parse_extents() a bit
error: Strip trailing '\n' from error string arguments (again)
qemu-io qemu-nbd: Use error_report() etc. instead of fprintf()
migration: Use error_reportf_err() instead of monitor_printf()
spapr: Use error_reportf_err()
error: Use error_prepend() where it makes obvious sense
error: Use error_reportf_err() where it makes obvious sense
error: Don't decorate original error message when adding to it
error: New error_prepend(), error_reportf_err()
test-throttle: Simplify qemu_init_main_loop() error handling
qemu-nbd: Clean up "Failed to load snapshot" error message
block: Clean up "Could not create temporary overlay" error message
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Both error_reportf_err() and monitor_printf() print to the same
destination when monitor_printf() is used correctly, i.e. within an
HMP monitor. Elsewhere, monitor_printf() does nothing, while
error_reportf_err() reports to stderr.
Both changed functions are HMP command handlers. These should only
run within an HMP monitor.
Unlike monitor_printf(), error_reportf_err() uses the error whole
instead of just its message obtained with error_get_pretty(). This
avoids suppressing its hint (see commit 50b7b00), but I don't think
the errors touched in this commit can come with hints.
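For illustration, a typical conversion looks roughly like this (handler
context and message text made up):

    /* before */
    monitor_printf(mon, "Failed to set parameter: %s\n", error_get_pretty(err));
    error_free(err);

    /* after: the whole error, hint included, reaches the same destination */
    error_reportf_err(err, "Failed to set parameter: ");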
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1450452927-8346-15-git-send-email-armbru@redhat.com>
Both error_report_err() and monitor_printf() print to the same
destination when monitor_printf() is used correctly, i.e. within an
HMP monitor. Elsewhere, monitor_printf() does nothing, while
error_report_err() reports to stderr.
Most changed functions are HMP command handlers. These should only
run within an HMP monitor. The one exception is bdrv_password_cb(),
which should also only run within an HMP monitor.
Four command handlers prefix the error message with the command name:
balloon, migrate_set_capability, migrate_set_parameter, migrate.
Pointless, drop.
Unlike monitor_printf(), error_report_err() uses the error whole
instead of just its message obtained with error_get_pretty(). This
avoids suppressing its hint (see commit 50b7b00). Example:
(qemu) device_add ivshmem,id=666
Parameter 'id' expects an identifier
Identifiers consist of letters, digits, '-', '.', '_', starting with a letter.
Try "help device_add" for more information
The "Identifiers consist of..." line is new with this patch.
Coccinelle semantic patch:
@@
expression M, E;
@@
- monitor_printf(M, "%s\n", error_get_pretty(E));
- error_free(E);
+ error_report_err(E);
@r1@
expression M, E;
format F;
position p;
@@
- monitor_printf(M, "...%@F@\n", error_get_pretty(E));@p
- error_free(E);
+ error_report_err(E);
@script:python@
p << r1.p;
@@
print "%s:%s:%s: prefix dropped" % (p[0].file, p[0].line, p[0].column)
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1450452927-8346-4-git-send-email-armbru@redhat.com>
Now that we guarantee the user doesn't have any enum values
beginning with a single underscore, we can use that for our
own purposes. Renaming ENUM_MAX to ENUM__MAX makes it obvious
that the sentinel is generated.
This patch was mostly generated by applying a temporary patch:
|diff --git a/scripts/qapi.py b/scripts/qapi.py
|index e6d014b..b862ec9 100644
|--- a/scripts/qapi.py
|+++ b/scripts/qapi.py
|@@ -1570,6 +1570,7 @@ const char *const %(c_name)s_lookup[] = {
| max_index = c_enum_const(name, 'MAX', prefix)
| ret += mcgen('''
| [%(max_index)s] = NULL,
|+// %(max_index)s
| };
| ''',
| max_index=max_index)
then running:
$ cat qapi-{types,event}.c tests/test-qapi-types.c |
sed -n 's,^// \(.*\)MAX,s|\1MAX|\1_MAX|g,p' > list
$ git grep -l _MAX | xargs sed -i -f list
The only things not generated are the changes in scripts/qapi.py.
Rejecting enum members named 'MAX' is now useless, and will be dropped
in the next patch.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1447836791-369-23-git-send-email-eblake@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
[Rebased to current master, commit message tweaked]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
My fix (84e7b80a) replaced the last_sent_block update that I'd
removed earlier; however it was too aggressive in the xbzrle case.
save_xbzrle_page might return '0' to mean that the page didn't
need sending since it was the same as the last sent version;
in this case we can't update 'last_sent_block' since we didn't
actually send it.
Symptom: 'Illegal RAM offset 1018000' as we try to send a page
to the wrong RAMBlock; potentially that could cause data
corruption if you were really unlucky.
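The rule this enforces, as a stand-alone illustration (names are hypothetical):

    struct RAMBlock;
    static struct RAMBlock *last_sent_block;

    /* Only remember the block when the XBZRLE path actually transmitted the
     * page (result > 0); result == 0 means "identical to the last version,
     * nothing sent", so last_sent_block must keep its previous value. */
    static void note_xbzrle_result(struct RAMBlock *block, int result)
    {
        if (result > 0) {
            last_sent_block = block;
        }
    }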
Fixes: 84e7b80a05
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-id: 1449765106-6528-1-git-send-email-dgilbert@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The code divides the integer expressions transferred_bytes and time_spent
and only then converts the integer quotient to type double, so any
remainder, or fractional part of the quotient, is discarded. Fix this by
converting to double before dividing.
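A stand-alone illustration of why the conversion has to happen before the
division:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint64_t transferred_bytes = 1500;
        uint64_t time_spent = 1000;

        /* integer division first: the fraction is already gone */
        double truncated = (double)(transferred_bytes / time_spent);  /* 1.0 */
        /* convert first, then divide: the fraction survives */
        double exact = (double)transferred_bytes / time_spent;        /* 1.5 */

        printf("%f vs %f\n", truncated, exact);
        return 0;
    }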
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
socket_writev_buffer() writes in a loop, using g_poll() to block. If
g_poll() fails, it tries to write more before the file descriptor is
ready. In theory, this could go into a tight loop. In practice,
errors other than EINTR are really unlikely, and when they happen,
we're probably screwed anyway, so we can just as well loop.
Clean it up a bit: retry poll on EINTR, keep ignoring other errors.
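The retry shape, as a small sketch (the real loop also handles partial writes):

    #include <errno.h>
    #include <glib.h>

    /* Block until the descriptor is writable; only EINTR causes a retry,
     * other errors are ignored and we simply try the write again. */
    static void wait_until_writable(GPollFD *pfd)
    {
        int ret;

        do {
            ret = g_poll(pfd, 1, -1);
        } while (ret < 0 && errno == EINTR);
    }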
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
If we set the migration speed to a very large value, block migration will
try to read all the data into memory, because
(block_mig_state.submitted + block_mig_state.read_done) * BLOCK_SIZE
overflows and therefore always compares as less than the rate limit.
There is no need to read that much data into memory when the rate limit is
very large, so limiting the memory usage fixes the overflow problem.
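A stand-alone illustration of the overflow and the cap (the BLOCK_SIZE and
MAX_IN_FLIGHT values are made up for the example):

    #include <stdint.h>

    #define BLOCK_SIZE    (1 << 20)           /* 1 MiB blocks */
    #define MAX_IN_FLIGHT (512 << 20)         /* hypothetical memory cap */

    /* Buggy form: the 32-bit product wraps once enough blocks are in
     * flight, so it always compares below a very large rate limit. */
    static int buggy_should_read_more(int submitted, int read_done,
                                      int64_t rate_limit)
    {
        return (submitted + read_done) * BLOCK_SIZE < rate_limit;
    }

    /* Capped form: bound the bytes held in memory independently of the
     * rate limit, which also sidesteps the overflow. */
    static int capped_should_read_more(int submitted, int read_done,
                                       int64_t rate_limit)
    {
        int64_t in_flight = (int64_t)(submitted + read_done) * BLOCK_SIZE;

        return in_flight < MAX_IN_FLIGHT && in_flight < rate_limit;
    }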
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
madvise() returns EINVAL in the case of many failures, but also
returns it in cases where the host kernel doesn't have THP enabled.
Postcopy only really cares that THP is off before it detects faults,
and turns it back on afterwards; so we're going to have
to assume that if the madvise fails then the host just doesn't do
THP and we can carry on with the postcopy.
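A stand-alone sketch of the tolerant call described above (simplified; not
the actual QEMU helper):

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    /* If MADV_NOHUGEPAGE fails (typically EINVAL on kernels without THP),
     * just note it and carry on: postcopy only needs THP disabled where
     * the host actually supports it. */
    static void sketch_disable_thp(void *addr, size_t len)
    {
        if (madvise(addr, len, MADV_NOHUGEPAGE)) {
            fprintf(stderr, "postcopy: madvise(MADV_NOHUGEPAGE): %s (continuing)\n",
                    strerror(errno));
        }
    }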
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Tested-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Check that a snapshot is available for all loaded block drivers.
The check bs != bs1 in hmp_info_snapshots is an optimization: the check
for availability of this snapshot would always return true, as the list
of snapshots was collected from that image.
The patch also ensures proper locking.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
Tested-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
The check is unnecessary: we read the value at the start of the
thread, use it, and never change it. The value is checked to be
non-NULL before thread creation.
Spotted by coverity, CID 1339211
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
I set current_time before the postcopy test but never use it;
(I think this was from the original version where it was time based).
Spotted by coverity, CID 1339208
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
In a82d593b61 I accidentally removed the setting of
last_sent_block; put it back.
Symptoms:
Multithreaded compression only uses one thread.
Migration is a bit less efficient since it won't use 'cont' flags.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Fixes: a82d593b61
Signed-off-by: Juan Quintela <quintela@redhat.com>
Peter reported a lock error on MacOS after my a82d593b
patch.
migrate_get_current does one-time initialisation of
a bunch of variables.
migrate_init does reinitialisation even on a 2nd
migrate after a cancel.
The problem here was that I'd initialised the mutex
in migrate_get_current, and the memset in migrate_init
corrupted it.
Remove the memset and replace it by explicit initialisation
of fields that need initialising; this also turns out to be simpler
than the old code that had to preserve some fields.
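A stand-alone sketch of the hazard and the fix (structure and field names
are made up; pthread stands in for QemuMutex):

    #include <pthread.h>

    typedef struct MigState {
        pthread_mutex_t lock;   /* initialised once, must never be memset */
        int state;
        long total_time;
    } MigState;

    /* one-time initialisation owns the mutex ... */
    static void mig_get_current_init(MigState *s)
    {
        pthread_mutex_init(&s->lock, NULL);
    }

    /* ... so re-initialisation before a second migrate sets fields
     * explicitly instead of memset(s, 0, sizeof(*s)), which would
     * corrupt the live mutex. */
    static void mig_reinit(MigState *s)
    {
        s->state = 0;
        s->total_time = 0;
    }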
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Fixes: a82d593b
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Improve the text in both the qapi-schema and hmp help to point out
you need to set the postcopy-ram capability prior to issuing
migrate-start-postcopy.
Also fix the text of the migrate_start_postcopy error that
deals with capabilities.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Acked-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Where the target page size is different from the host page size
we special-case it, but I messed up the check for the zero-page case.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Where we have iterable, but non-postcopiable devices (e.g. htab
or block migration), complete them before forming the 'package'
but with the CPUs stopped. This stops them filling up the package.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
The rest of the file already uses that trick. 64-bit offsets make no sense
on 32-bit archs, but that is ram_addr_t for you.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>