There are 27 pre-copy live migration scenarios being tested. In all of
these we force non-convergence and run for one iteration, then let it
converge and wait for completion during the second (or following)
iterations. At 3 mbps bandwidth limit the first iteration takes a very
long time (~30 seconds).
While it is important to test the migration passes and convergence
logic, it is overkill to do this for all 27 pre-copy scenarios. The
TLS migration scenarios in particular are merely exercising different
code paths during connection establishment.
To optimize time taken, switch most of the test scenarios to run
non-live (ie guest CPUs paused) with no bandwidth limits. This gives
a massive speed up for most of the test scenarios.
For test coverage the following scenarios are unchanged
* Precopy with UNIX sockets
* Precopy with UNIX sockets and dirty ring tracking
* Precopy with XBZRLE
* Precopy with UNIX compress
* Precopy with UNIX compress (nowait)
* Precopy with multifd
On a test machine this reduces execution time from 13 minutes to
8 minutes.
Tested-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20230601161347.1803440-10-berrange@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Change the migration test to use the new qtest event callback to watch
for the stop event. This ensures that we only watch for the STOP event
on the source QEMU. The previous code would set the single 'got_stop'
flag when either source or dest QEMU got the STOP event.
Reviewed-by: Juan Quintela <quintela@redhat.com>
Acked-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20230601161347.1803440-6-berrange@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
This function duplicates logic of qtest_qmp_assert_success_ref.
The qtest_qmp_assert_success_ref method has better diagnostics
on failure because it prints the entire QMP response, instead
of just asserting on existance of the 'error' key.
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20230601161347.1803440-4-berrange@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
The qmp_discard_response method simply ignores the result of the QMP
command, merely unref'ing the object. This is a bad idea for tests
as it leaves no trace if the QMP command unexpectedly failed. The
qtest_qmp_assert_success method will validate that the QMP command
returned without error, and if errors occur, it will print a message
on the console aiding debugging.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20230421171411.566300-2-berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Zhang Chen <chen.zhang@intel.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Add postcopy tests with compress enabled to ensure nothing breaks
with the refactoring in the next commits.
preempt+compress is blocked, so no test needed for that case.
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
There has never been tests for migration with compress enabled.
Add suitable tests, testing with compress-wait-thread = false
too.
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
It is possible to have a build with both TCG and KVM disabled due to
Xen requiring the i386 and x86_64 binaries to be present in an aarch64
host.
If we build with --disable-tcg on the aarch64 host, we will end-up
with a QEMU binary (x86) that does not support TCG nor KVM.
Skip tests that crash or hang in the above scenario. Do not include
any test cases if TCG and KVM are missing.
Make sure that calls to qtest_has_accel are placed after g_test_init
in similar fashion to commit ae4b01b349 ("tests: Ensure TAP version is
printed before other messages") to avoid TAP parsing errors.
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20230426180013.14814-9-farosas@suse.de
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The TAP protocol version line must be the first thing printed on
stdout. The migration test failed that requirement in certain
scenarios:
# Skipping test: Userfault not available (builtdtime)
TAP version 13
# random seed: R02Sc120c807f11053eb90bfea845ba1e368
1..32
# Start of x86_64 tests
# Start of migration tests
....
The TAP version is printed by g_test_init(), so we need to make
sure that any methods which print are run after that.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20230317170553.592707-1-berrange@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Thomas found an autoconverge test failure where the
migration completed before the autoconverge had kicked in.
To try and avoid this again:
a) Reduce the usleep in test_migrate_auto_converge
so that it should exit quicker when autoconverge kicks in
b) Make the loop exit immediately rather than have the sleep
when it does start autoconverge, otherwise the autoconverge
might succeed during the sleep.
c) Reduce inc_pct so auto converge happens more slowly
d) Reduce the max-bandwidth in migrate_ensure_non_converge
to make the ensure more ensure.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20230306152612.52291-1-dgilbert@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
migration-test has been flaky for a long time, both in CI and
otherwise:
https://gitlab.com/qemu-project/qemu/-/jobs/3806090216
(a FreeBSD job)
32/648 ERROR:../tests/qtest/migration-helpers.c:205:wait_for_migration_status: assertion failed: (g_test_timer_elapsed() < MIGRATION_STATUS_WAIT_TIMEOUT) ERROR
on a local macos x86 box:
▶ 34/621 ERROR:../../tests/qtest/migration-helpers.c:151:migrate_query_not_failed: assertion failed: (!g_str_equal(status, "failed")) ERROR
34/621 qemu:qtest+qtest-i386 / qtest-i386/migration-test ERROR 168.12s killed by signal 6 SIGABRT
――――――――――――――――――――――――――――――――――――― ✀ ―――――――――――――――――――――――――――――――――――――
stderr:
qemu-system-i386: Failed to peek at channel
query-migrate shows failed migration: Unable to write to socket: Broken pipe
**
ERROR:../../tests/qtest/migration-helpers.c:151:migrate_query_not_failed: assertion failed: (!g_str_equal(status, "failed"))
(test program exited with status code -6)
――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
▶ 37/621 ERROR:../../tests/qtest/migration-helpers.c:151:migrate_query_not_failed: assertion failed: (!g_str_equal(status, "failed")) ERROR
37/621 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test ERROR 174.37s killed by signal 6 SIGABRT
――――――――――――――――――――――――――――――――――――― ✀ ―――――――――――――――――――――――――――――――――――――
stderr:
query-migrate shows failed migration: Unable to write to socket: Broken pipe
**
ERROR:../../tests/qtest/migration-helpers.c:151:migrate_query_not_failed: assertion failed: (!g_str_equal(status, "failed"))
(test program exited with status code -6)
In the cases where I've looked at the underlying log, this seems to
be in the migration/multifd/tcp/plain/cancel subtest. Disable that
specific subtest by default until somebody can track down the
underlying cause. Enthusiasts can opt back in by setting
QEMU_TEST_FLAKY_TESTS=1 in their environment.
We might need to disable more parts of this test if this isn't
sufficient to fix the flakiness.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Acked-by: Thomas Huth <thuth@redhat.com>
Message-id: 20230302172211.4146376-1-peter.maydell@linaro.org
When running the migration test compiled with Clang from Fedora 37
and sanitizers enabled, there is an error complaining about unlink():
../tests/qtest/migration-test.c:1072:12: runtime error: null pointer
passed as argument 1, which is declared to never be null
/usr/include/unistd.h:858:48: note: nonnull attribute specified here
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior
../tests/qtest/migration-test.c:1072:12 in
(test program exited with status code 1)
TAP parsing error: Too few tests run (expected 33, got 20)
The data->clientcert and data->clientkey pointers can indeed be unset
in some tests, so we have to check them before calling unlink() with
those.
While we're at it, I also noticed that the code is only freeing
some but not all of the allocated strings in this function, and
indeed, valgrind is also complaining about memory leaks here.
So let's call g_free() on all allocated strings to avoid leaking
memory here.
Message-Id: <20221125083054.117504-1-thuth@redhat.com>
Tested-by: Bin Meng <bmeng@tinylab.org>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
When tmpfs is NULL, a build warning is seen with GCC 9.3.0.
It's strange that GCC 11.2.0 on Ubuntu 22.04 does not catch this,
neither did the QEMU CI.
While we are here, improve the error message as well.
Reported-by: Shengjiang Wu <shengjiang.wu@windriver.com>
Fixes: e5553c1b8d ("tests/qtest: migration-test: Avoid using hardcoded /tmp")
Signed-off-by: Bin Meng <bin.meng@windriver.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20221017132023.2228641-1-bmeng.cn@gmail.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Some migration test cases use TLS to communicate, but they fail on
Windows with the following error messages:
qemu-system-x86_64: TLS handshake failed: Insufficient credentials for that request.
qemu-system-x86_64: TLS handshake failed: Error in the pull function.
query-migrate shows failed migration: TLS handshake failed: Error in the pull function.
Disable them temporarily.
Signed-off-by: Bin Meng <bin.meng@windriver.com>
Message-Id: <20220925113032.1949844-51-bmeng.cn@gmail.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Waiting for the serial output can take a couple of seconds - and since
we're doing a lot of migration tests, this time easily sums up to
multiple minutes. But if a test is supposed to fail, it does not make
much sense to wait for the source to be in the right state first, so
we can skip the waiting here. This way we can speed up all tests where
the migration is supposed to fail. In the gitlab-CI gprov-gcov test,
each of the migration-tests now run two minutes faster!
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20220819053802.296584-2-thuth@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20220822165608.2980552-3-alex.bennee@linaro.org>
kvm_dirty_ring_supported() only checks whether the dirty ring support
is available on the x86 host, but it ignores whether the target QEMU
architecture is x86 or not. Thus the test_vcpu_dirty_limit() test
currently fails with the assert((strcmp(arch, "x86_64") == 0)) statement
in dirtylimit_start_vm() if the users run e.g. "make check-qtest-aarch64"
on their x86 host. Fix it by only executing the tests when we're running
with a x86_64 target QEMU binary with KVM.
Message-Id: <20220801114644.208197-1-thuth@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Thomas reported that auto-converge test will timeout on MacOS CI gatings.
Use the migrate_ensure_converge() helper too in the auto-converge as when
Daniel reworked the other test cases.
Since both max_bandwidth / downtime_limit will not be used for converge
calculations, make it simple by removing the remaining check, then we can
completely remove both variables altogether, since migrate_ensure_converge
is used the remaining check won't make much sense anyway.
Reported-by: Thomas Huth <thuth@redhat.com>
Suggested-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220728133516.92061-2-peterx@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
We just added TLS tests for precopy but not postcopy. Add the
corresponding test for vanilla postcopy.
Rename the vanilla postcopy to "postcopy/plain" because all postcopy tests
will only use unix sockets as channel.
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220707185525.27692-1-peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
dgilbert: Manual merge
The different migration test cases are using a variety of settings
to ensure convergance/non-convergance. Introduce two helpers to
extra the common functionality and ensure consistency.
* Non-convergance: 1ms downtime, 30mbs bandwidth
* Convergance: 30s downtime, 1gbs bandwidth
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20220628105434.295905-5-berrange@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
While 1 second might be enough to converge migration on a fast host,
this is not guaranteed, especially if using TLS in the tests without
hardware accelerated crypto available.
Increasing the downtime to 30 seconds should guarantee it can converge
in any sane scenario.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20220628105434.295905-4-berrange@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
When moving into the convergance phase, the precopy tests will first
look for a STOP event and once found will look for migration completion
status. If the test VM is not converging, the test suite will be waiting
for the STOP event forever. If we wait for the migration completion
status first, then we will trigger the previously added timeout and
prevent the test hanging forever.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20220628105434.295905-3-berrange@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Various methods in the migration test call 'query_migrate' to fetch the
current status and then access a particular field. Almost all of these
cases expect the migration to be in a non-failed state. In the case of
'wait_for_migration_pass' in particular, if the status is 'failed' then
it will get into an infinite loop. By validating that the status is
not 'failed' the test suite will assert rather than hang when getting
into an unexpected state.
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20220426160048.812266-10-berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
This validates that we correctly handle multifd migration success
and failure scenarios when using TLS with x509 certificates. There
are quite a few different scenarios that matter in relation to
hostname validation, but we skip a couple as we can assume that
the non-multifd coverage applies to some extent.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20220426160048.812266-9-berrange@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Most of the multifd migration test logic is common with the rest of the
precopy tests, so it can use the helper without difficulty. The only
exception of the multifd cancellation test which tries to run multiple
migrations in a row.
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20220426160048.812266-7-berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
This validates that we correctly handle migration success and failure
scenarios when using TLS with x509 certificates. There are quite a few
different scenarios that matter in relation to hostname validation.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20220426160048.812266-5-berrange@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
dgilbert: Manual merge due to ifdef change in 3