The TAP protocol version line must be the first thing printed on
stdout. The migration test failed that requirement in certain
scenarios:
# Skipping test: Userfault not available (builtdtime)
TAP version 13
# random seed: R02Sc120c807f11053eb90bfea845ba1e368
1..32
# Start of x86_64 tests
# Start of migration tests
....
The TAP version is printed by g_test_init(), so we need to make
sure that any methods which print are run after that.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20230317170553.592707-1-berrange@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Thomas found an autoconverge test failure where the
migration completed before the autoconverge had kicked in.
To try and avoid this again:
a) Reduce the usleep in test_migrate_auto_converge
so that it should exit quicker when autoconverge kicks in
b) Make the loop exit immediately rather than have the sleep
when it does start autoconverge, otherwise the autoconverge
might succeed during the sleep.
c) Reduce inc_pct so auto converge happens more slowly
d) Reduce the max-bandwidth in migrate_ensure_non_converge
to make the ensure more ensure.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20230306152612.52291-1-dgilbert@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
migration-test has been flaky for a long time, both in CI and
otherwise:
https://gitlab.com/qemu-project/qemu/-/jobs/3806090216
(a FreeBSD job)
32/648 ERROR:../tests/qtest/migration-helpers.c:205:wait_for_migration_status: assertion failed: (g_test_timer_elapsed() < MIGRATION_STATUS_WAIT_TIMEOUT) ERROR
on a local macos x86 box:
▶ 34/621 ERROR:../../tests/qtest/migration-helpers.c:151:migrate_query_not_failed: assertion failed: (!g_str_equal(status, "failed")) ERROR
34/621 qemu:qtest+qtest-i386 / qtest-i386/migration-test ERROR 168.12s killed by signal 6 SIGABRT
――――――――――――――――――――――――――――――――――――― ✀ ―――――――――――――――――――――――――――――――――――――
stderr:
qemu-system-i386: Failed to peek at channel
query-migrate shows failed migration: Unable to write to socket: Broken pipe
**
ERROR:../../tests/qtest/migration-helpers.c:151:migrate_query_not_failed: assertion failed: (!g_str_equal(status, "failed"))
(test program exited with status code -6)
――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
▶ 37/621 ERROR:../../tests/qtest/migration-helpers.c:151:migrate_query_not_failed: assertion failed: (!g_str_equal(status, "failed")) ERROR
37/621 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test ERROR 174.37s killed by signal 6 SIGABRT
――――――――――――――――――――――――――――――――――――― ✀ ―――――――――――――――――――――――――――――――――――――
stderr:
query-migrate shows failed migration: Unable to write to socket: Broken pipe
**
ERROR:../../tests/qtest/migration-helpers.c:151:migrate_query_not_failed: assertion failed: (!g_str_equal(status, "failed"))
(test program exited with status code -6)
In the cases where I've looked at the underlying log, this seems to
be in the migration/multifd/tcp/plain/cancel subtest. Disable that
specific subtest by default until somebody can track down the
underlying cause. Enthusiasts can opt back in by setting
QEMU_TEST_FLAKY_TESTS=1 in their environment.
We might need to disable more parts of this test if this isn't
sufficient to fix the flakiness.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Acked-by: Thomas Huth <thuth@redhat.com>
Message-id: 20230302172211.4146376-1-peter.maydell@linaro.org
When running the migration test compiled with Clang from Fedora 37
and sanitizers enabled, there is an error complaining about unlink():
../tests/qtest/migration-test.c:1072:12: runtime error: null pointer
passed as argument 1, which is declared to never be null
/usr/include/unistd.h:858:48: note: nonnull attribute specified here
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior
../tests/qtest/migration-test.c:1072:12 in
(test program exited with status code 1)
TAP parsing error: Too few tests run (expected 33, got 20)
The data->clientcert and data->clientkey pointers can indeed be unset
in some tests, so we have to check them before calling unlink() with
those.
While we're at it, I also noticed that the code is only freeing
some but not all of the allocated strings in this function, and
indeed, valgrind is also complaining about memory leaks here.
So let's call g_free() on all allocated strings to avoid leaking
memory here.
Message-Id: <20221125083054.117504-1-thuth@redhat.com>
Tested-by: Bin Meng <bmeng@tinylab.org>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
When tmpfs is NULL, a build warning is seen with GCC 9.3.0.
It's strange that GCC 11.2.0 on Ubuntu 22.04 does not catch this,
neither did the QEMU CI.
While we are here, improve the error message as well.
Reported-by: Shengjiang Wu <shengjiang.wu@windriver.com>
Fixes: e5553c1b8d ("tests/qtest: migration-test: Avoid using hardcoded /tmp")
Signed-off-by: Bin Meng <bin.meng@windriver.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20221017132023.2228641-1-bmeng.cn@gmail.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Some migration test cases use TLS to communicate, but they fail on
Windows with the following error messages:
qemu-system-x86_64: TLS handshake failed: Insufficient credentials for that request.
qemu-system-x86_64: TLS handshake failed: Error in the pull function.
query-migrate shows failed migration: TLS handshake failed: Error in the pull function.
Disable them temporarily.
Signed-off-by: Bin Meng <bin.meng@windriver.com>
Message-Id: <20220925113032.1949844-51-bmeng.cn@gmail.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Waiting for the serial output can take a couple of seconds - and since
we're doing a lot of migration tests, this time easily sums up to
multiple minutes. But if a test is supposed to fail, it does not make
much sense to wait for the source to be in the right state first, so
we can skip the waiting here. This way we can speed up all tests where
the migration is supposed to fail. In the gitlab-CI gprov-gcov test,
each of the migration-tests now run two minutes faster!
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20220819053802.296584-2-thuth@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20220822165608.2980552-3-alex.bennee@linaro.org>
kvm_dirty_ring_supported() only checks whether the dirty ring support
is available on the x86 host, but it ignores whether the target QEMU
architecture is x86 or not. Thus the test_vcpu_dirty_limit() test
currently fails with the assert((strcmp(arch, "x86_64") == 0)) statement
in dirtylimit_start_vm() if the users run e.g. "make check-qtest-aarch64"
on their x86 host. Fix it by only executing the tests when we're running
with a x86_64 target QEMU binary with KVM.
Message-Id: <20220801114644.208197-1-thuth@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Thomas reported that auto-converge test will timeout on MacOS CI gatings.
Use the migrate_ensure_converge() helper too in the auto-converge as when
Daniel reworked the other test cases.
Since both max_bandwidth / downtime_limit will not be used for converge
calculations, make it simple by removing the remaining check, then we can
completely remove both variables altogether, since migrate_ensure_converge
is used the remaining check won't make much sense anyway.
Reported-by: Thomas Huth <thuth@redhat.com>
Suggested-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220728133516.92061-2-peterx@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
We just added TLS tests for precopy but not postcopy. Add the
corresponding test for vanilla postcopy.
Rename the vanilla postcopy to "postcopy/plain" because all postcopy tests
will only use unix sockets as channel.
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220707185525.27692-1-peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
dgilbert: Manual merge
The different migration test cases are using a variety of settings
to ensure convergance/non-convergance. Introduce two helpers to
extra the common functionality and ensure consistency.
* Non-convergance: 1ms downtime, 30mbs bandwidth
* Convergance: 30s downtime, 1gbs bandwidth
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20220628105434.295905-5-berrange@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
While 1 second might be enough to converge migration on a fast host,
this is not guaranteed, especially if using TLS in the tests without
hardware accelerated crypto available.
Increasing the downtime to 30 seconds should guarantee it can converge
in any sane scenario.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20220628105434.295905-4-berrange@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
When moving into the convergance phase, the precopy tests will first
look for a STOP event and once found will look for migration completion
status. If the test VM is not converging, the test suite will be waiting
for the STOP event forever. If we wait for the migration completion
status first, then we will trigger the previously added timeout and
prevent the test hanging forever.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20220628105434.295905-3-berrange@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Various methods in the migration test call 'query_migrate' to fetch the
current status and then access a particular field. Almost all of these
cases expect the migration to be in a non-failed state. In the case of
'wait_for_migration_pass' in particular, if the status is 'failed' then
it will get into an infinite loop. By validating that the status is
not 'failed' the test suite will assert rather than hang when getting
into an unexpected state.
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20220426160048.812266-10-berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
This validates that we correctly handle multifd migration success
and failure scenarios when using TLS with x509 certificates. There
are quite a few different scenarios that matter in relation to
hostname validation, but we skip a couple as we can assume that
the non-multifd coverage applies to some extent.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20220426160048.812266-9-berrange@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Most of the multifd migration test logic is common with the rest of the
precopy tests, so it can use the helper without difficulty. The only
exception of the multifd cancellation test which tries to run multiple
migrations in a row.
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20220426160048.812266-7-berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
This validates that we correctly handle migration success and failure
scenarios when using TLS with x509 certificates. There are quite a few
different scenarios that matter in relation to hostname validation.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20220426160048.812266-5-berrange@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
dgilbert: Manual merge due to ifdef change in 3
Since commit a2ce7dbd91 ("meson: convert tests/qtest to meson"),
libqtest.h is under libqos/ directory, while libqtest.c is still in
qtest/. Move back to its original location to avoid mixing with libqos/.
Suggested-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
The migration precopy testing helper function always expects the
migration to run to a completion state. There will be test scenarios
for TLS where expect either the client or server to fail the migration.
This expands the helper to cope with these scenarios.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20220310171821.3724080-12-berrange@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
There are alot of different scenarios to test with migration due to the
wide number of parameters and capabilities available. To enable sharing
of the basic precopy test scenario, we need to be able to set arbitrary
parameters and capabilities before the migration is initiated, but don't
want to have all this logic in the common helper function. Solve this
by defining two hooks that can be provided by the test case, one before
migration starts and one after migration finishes.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20220310171821.3724080-10-berrange@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
test_migrate_start() will release the MigrateStart structure that passed
in, however that's not super clear to the caller because after the call
returned the pointer can still be referenced by the callers. It can easily
be a source of use-after-free.
Let's pass in a double pointer of that, then we can safely clear the
pointer for the caller after the struct is released.
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220301083925.33483-26-peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
dgilbert: Fixup apply since I didn't take 24/25
Even if <linux/kvm.h> seems to exist for all archs on linux, however including
it with __linux__ defined seems to be not working yet as it'll try to include
asm/kvm.h and that can be missing for archs that do not support kvm.
To fix this (instead of any attempt to fix linux headers..), we can mark the
header to be x86_64 only, because it's so far only service for adding the kvm
dirty ring test.
Fixes: 1f546b709d ("tests: migration-test: Add dirty ring test")
Reported-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20210728214128.206198-1-peterx@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
OpenBSD doesn't like :0 as an address, switch to using 127.0.0.1
in baddest; it's really testing the :0 port number that isn't allowed
on anything.
(The test doesn't currently run anyway because of the userfault
problem that Peter noticed, but this gets us closer to being able to
reenable it)
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20210719185217.122105-1-dgilbert@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Philippe Mathieu-Daude <philmd@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>