If we are exiting due to an error/finish/.... Just don't try to even
touch the channel with one IO operation.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Fill everything with zero, so the padding fields are also initialized.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Use WITH_RCU_READ_LOCK_GUARD to avoid exiting colo_init_ram_cache
without releasing RCU.
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
../migration/ram.c: In function ‘multifd_recv_thread’:
/home/elmarco/src/qq/include/qapi/error.h:165:5: error: ‘block’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
165 | error_setg_internal((errp), __FILE__, __LINE__, __func__, \
| ^~~~~~~~~~~~~~~~~~~
../migration/ram.c:818:15: note: ‘block’ was declared here
818 | RAMBlock *block;
| ^~~~~
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
During migration, a tmp page is allocated so that we could place a whole
host page during postcopy.
Currently the page is allocated during load stage, this is a little bit
late. And more important, if we failed to allocate it, the error is not
checked properly. Even it is NULL, we would still use it.
This patch moves the allocation to setup stage and if failed error
message would be printed and caller would notice it.
Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Commit f3f491fcd6 ('Postcopy: Maintain unsentmap') introduced
unsentmap to track not yet sent pages.
This is not necessary since:
* unsentmap is a sub-set of bmap before postcopy start
* unsentmap is the summation of bmap and unsentmap after canonicalizing
This patch just removes it.
Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Message-Id: <20190819061843.28642-3-richardw.yang@linux.intel.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
All pages, either partially sent or partially dirty, will be discarded in
postcopy_send_discard_bm_ram(), since we update the unsentmap to be
unsentmap = unsentmap | dirty in ram_postcopy_send_discard_bitmap().
This is not necessary to do discard when canonicalizing bitmap. And by
doing so, we separate the page discard into two individual steps:
* canonicalize bitmap
* discard page
Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Message-Id: <20190819061843.28642-2-richardw.yang@linux.intel.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
When encounter error, multifd_send_thread should always notify who pay
attention to it before exit. Otherwise it may block migration_thread
at multifd_send_sync_main forever.
Error as follow:
-------------------------------------------------------------------------------
(gdb) bt
#0 0x00007f4d669dfa0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
#1 0x00007f4d669dfa9f in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0
#2 0x00007f4d669dfb3b in sem_wait@@GLIBC_2.2.5 () from /lib64/libpthread.so.0
#3 0x0000562ccf0a5614 in qemu_sem_wait (sem=sem@entry=0x562cd1b698e8) at util/qemu-thread-posix.c:319
#4 0x0000562ccecb4752 in multifd_send_sync_main (rs=<optimized out>) at /qemu/migration/ram.c:1099
#5 0x0000562ccecb95f4 in ram_save_iterate (f=0x562cd0ecc000, opaque=<optimized out>) at /qemu/migration/ram.c:3550
#6 0x0000562ccef43c23 in qemu_savevm_state_iterate (f=0x562cd0ecc000, postcopy=false) at migration/savevm.c:1189
#7 0x0000562ccef3dcf3 in migration_iteration_run (s=0x562cd09fabf0) at migration/migration.c:3131
#8 migration_thread (opaque=opaque@entry=0x562cd09fabf0) at migration/migration.c:3258
#9 0x0000562ccf0a4c26 in qemu_thread_start (args=<optimized out>) at util/qemu-thread-posix.c:502
#10 0x00007f4d669d9e25 in start_thread () from /lib64/libpthread.so.0
#11 0x00007f4d6670635d in clone () from /lib64/libc.so.6
(gdb) f 4
#4 0x0000562ccecb4752 in multifd_send_sync_main (rs=<optimized out>) at /qemu/migration/ram.c:1099
1099 qemu_sem_wait(&p->sem_sync);
(gdb) list
1094 }
1095 for (i = 0; i < migrate_multifd_channels(); i++) {
1096 MultiFDSendParams *p = &multifd_send_state->params[i];
1097
1098 trace_multifd_send_sync_main_wait(p->id);
1099 qemu_sem_wait(&p->sem_sync);
1100 }
1101 trace_multifd_send_sync_main(multifd_send_state->packet_num);
1102 }
1103
(gdb) p i
$1 = 0
(gdb) p multifd_send_state->params[0].pending_job
$2 = 2 //It means the job before MULTIFD_FLAG_SYNC has already fail
(gdb) p multifd_send_state->params[0].quit
$3 = true
Signed-off-by: Ivan Ren <ivanren@tencent.com>
Message-Id: <1567044996-2362-1-git-send-email-ivanren@tencent.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
There is a race between TCG and accesses to the dirty log:
vCPU thread reader thread
----------------------- -----------------------
TLB check -> slow path
notdirty_mem_write
write to RAM
set dirty flag
clear dirty flag
TLB check -> fast path
read memory
write to RAM
Fortunately, in order to fix it, no change is required to the
vCPU thread. However, the reader thread must delay the read after
the vCPU thread has finished the write. This can be approximated
conservatively by run_on_cpu, which waits for the end of the current
translation block.
A similar technique is used by KVM, which has to do a synchronous TLB
flush after doing a test-and-clear of the dirty-page flags.
Reported-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The purpose of the calculation is to find a HostPage which is partially
dirty.
* fixup_start_addr points to the start of the HostPage to discard
* run_start points to the next HostPage to check
While in the middle stage, there would two cases for run_start:
* aligned with HostPage means this is not partially dirty
* not aligned means this is partially dirty
When it is aligned, no work and calculation is necessary. run_start
already points to the start of next HostPage and is ready to continue.
When it is not aligned, the calculation could be simplified with:
* fixup_start_addr = QEMU_ALIGN_DOWN(run_start, host_ratio)
* run_start = QEMU_ALIGN_UP(run_start, host_ratio)
By doing so, run_start always points to the next HostPage to check.
fixup_start_addr always points to the HostPage to discard.
Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Message-Id: <20190806004648.8659-2-richardw.yang@linux.intel.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
In postcopy-ram.c, we provide three functions to discard certain
RAMBlock range:
* postcopy_discard_send_init()
* postcopy_discard_send_range()
* postcopy_discard_send_finish()
Currently, we allocate/deallocate PostcopyDiscardState for each RAMBlock
on sending discard information to destination. This is not necessary and
the same data area could be reused for each RAMBlock.
This patch defines PostcopyDiscardState a static variable. By doing so:
1) avoid memory allocation and deallocation to the system
2) avoid potential failure of memory allocation
3) hide some details for their users
Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Message-Id: <20190724010721.2146-1-richardw.yang@linux.intel.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
When migrate_cancel a multifd migration, if run sequence like this:
[source] [destination]
multifd_send_sync_main[finish]
multifd_recv_thread wait &p->sem_sync
shutdown to_dst_file
detect error from_src_file
send RAM_SAVE_FLAG_EOS[fail] [no chance to run multifd_recv_sync_main]
multifd_load_cleanup
join multifd receive thread forever
will lead destination qemu hung at following stack:
pthread_join
qemu_thread_join
multifd_load_cleanup
process_incoming_migration_co
coroutine_trampoline
Signed-off-by: Ivan Ren <ivanren@tencent.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <1561468699-9819-4-git-send-email-ivanren@tencent.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
When we 'migrate_cancel' a multifd migration, live_migration thread may
hung forever at some points, because of multifd_send_thread has already
exit for socket error:
1. multifd_send_pages may hung at qemu_sem_wait(&multifd_send_state->
channels_ready)
2. multifd_send_sync_main my hung at qemu_sem_wait(&multifd_send_state->
sem_sync)
Signed-off-by: Ivan Ren <ivanren@tencent.com>
Message-Id: <1561468699-9819-3-git-send-email-ivanren@tencent.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
Remove spurious not needed bits
When we 'migrate_cancel' a multifd migration, live_migration thread may
go into endless loop in multifd_send_pages functions.
Reproduce steps:
(qemu) migrate_set_capability multifd on
(qemu) migrate -d url
(qemu) [wait a while]
(qemu) migrate_cancel
Then may get live_migration 100% cpu usage in following stack:
pthread_mutex_lock
qemu_mutex_lock_impl
multifd_send_pages
multifd_queue_page
ram_save_multifd_page
ram_save_target_page
ram_save_host_page
ram_find_and_save_block
ram_find_and_save_block
ram_save_iterate
qemu_savevm_state_iterate
migration_iteration_run
migration_thread
qemu_thread_start
start_thread
clone
Signed-off-by: Ivan Ren <ivanren@tencent.com>
Message-Id: <1561468699-9819-2-git-send-email-ivanren@tencent.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reproduce the problem:
migrate
migrate_cancel
migrate
Error happen for memory migration
The reason as follows:
1. qemu start, ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION] all set to
1 by a series of cpu_physical_memory_set_dirty_range
2. migration start:ram_init_bitmaps
- memory_global_dirty_log_start: begin log diry
- memory_global_dirty_log_sync: sync dirty bitmap to
ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION]
- migration_bitmap_sync_range: sync ram_list.
dirty_memory[DIRTY_MEMORY_MIGRATION] to RAMBlock.bmap
and ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION] is set to zero
3. migration data...
4. migrate_cancel, will stop log dirty
5. migration start:ram_init_bitmaps
- memory_global_dirty_log_start: begin log diry
- memory_global_dirty_log_sync: sync dirty bitmap to
ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION]
- migration_bitmap_sync_range: sync ram_list.
dirty_memory[DIRTY_MEMORY_MIGRATION] to RAMBlock.bmap
and ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION] is set to zero
Here RAMBlock.bmap only have new logged dirty pages, don't contain
the whole guest pages.
Signed-off-by: Ivan Ren <ivanren@tencent.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <1563115879-2715-1-git-send-email-ivanren@tencent.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
By removing the share ram check, qemu is able to migrate
to private destination ram when x-ignore-shared capability
is on. Then we can create multiple destination VMs based
on the same source VM.
This changes the x-ignore-shared migration capability to
work similar to Lai's original bypass-shared-memory
work(https://lists.gnu.org/archive/html/qemu-devel/2018-04/msg00003.html)
which enables kata containers (https://katacontainers.io)
to implement the VM templating feature.
An example usage in kata containers(https://katacontainers.io):
1. Start the source VM:
qemu-system-x86 -m 2G \
-object memory-backend-file,id=mem0,size=2G,share=on,mem-path=/tmpfs/template-memory \
-numa node,memdev=mem0
2. Stop the template VM, set migration x-ignore-shared capability,
migrate "exec:cat>/tmpfs/state", quit it
3. Start target VM:
qemu-system-x86 -m 2G \
-object memory-backend-file,id=mem0,size=2G,share=off,mem-path=/tmpfs/template-memory \
-numa node,memdev=mem0 \
-incoming defer
4. connect to target VM qmp, set migration x-ignore-shared capability,
migrate_incoming "exec:cat /tmpfs/state"
5. create more target VMs repeating 3 and 4
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Yury Kotov <yury-kotov@yandex-team.ru>
Cc: Jiangshan Lai <laijs@hyper.sh>
Cc: Xu Wang <xu@hyper.sh>
Signed-off-by: Peng Tao <tao.peng@linux.alibaba.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <1560494113-1141-1-git-send-email-tao.peng@linux.alibaba.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Currently we are doing log_clear() right after log_sync() which mostly
keeps the old behavior when log_clear() was still part of log_sync().
This patch tries to further optimize the migration log_clear() code
path to split huge log_clear()s into smaller chunks.
We do this by spliting the whole guest memory region into memory
chunks, whose size is decided by MigrationState.clear_bitmap_shift (an
example will be given below). With that, we don't do the dirty bitmap
clear operation on the remote node (e.g., KVM) when we fetch the dirty
bitmap, instead we explicitly clear the dirty bitmap for the memory
chunk for each of the first time we send a page in that chunk.
Here comes an example.
Assuming the guest has 64G memory, then before this patch the KVM
ioctl KVM_CLEAR_DIRTY_LOG will be a single one covering 64G memory.
If after the patch, let's assume when the clear bitmap shift is 18,
then the memory chunk size on x86_64 will be 1UL<<18 * 4K = 1GB. Then
instead of sending a big 64G ioctl, we'll send 64 small ioctls, each
of the ioctl will cover 1G of the guest memory. For each of the 64
small ioctls, we'll only send if any of the page in that small chunk
was going to be sent right away.
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190603065056.25211-12-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
cpu_physical_memory_sync_dirty_bitmap() has one RAMBlock* as
parameter, which means that it must be with RCU read lock held
already. Taking it again inside seems redundant. Removing it.
Instead comment on the functions about the RCU read lock.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20190603065056.25211-2-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
On receiving RAM_SAVE_FLAG_EOS, multifd_recv_sync_main() is called to
synchronize receive threads. Current synchronization mechanism is to wait
for each channel's sem_sync semaphore. This semaphore is triggered by a
packet with MULTIFD_FLAG_SYNC flag. While in current implementation, we
don't do multifd_send_sync_main() to send such packet when
blk_mig_bulk_active() is true.
This will leads to the receive threads won't notify
multifd_recv_sync_main() by sem_sync. And multifd_recv_sync_main() will
always wait there.
[Note]: normal migration test works, while didn't test the
blk_mig_bulk_active() case. Since not sure how to produce this
situation.
Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20190612014337.11255-1-richardw.yang@linux.intel.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Trivial fixes 06/06/2019
# gpg: Signature made Thu 06 Jun 2019 12:05:50 BST
# gpg: using RSA key CD2F75DDC8E3A4DC2E4F5173F30C38BD3F2FBE3C
# gpg: issuer "laurent@vivier.eu"
# gpg: Good signature from "Laurent Vivier <lvivier@redhat.com>" [full]
# gpg: aka "Laurent Vivier <laurent@vivier.eu>" [full]
# gpg: aka "Laurent Vivier (Red Hat) <lvivier@redhat.com>" [full]
# Primary key fingerprint: CD2F 75DD C8E3 A4DC 2E4F 5173 F30C 38BD 3F2F BE3C
* remotes/vivier2/tags/trivial-branch-pull-request:
hw/watchdog/wdt_i6300esb: Use DEVICE() macro to access DeviceState.qdev
hw/scsi: Use the QOM BUS() macro to access BusState.qbus
hw/sd: Use the QOM BUS() macro to access BusState.qbus
hw/audio/ac97: Use the QOM DEVICE() macro to access DeviceState.qdev
hw/vfio/pci: Use the QOM DEVICE() macro to access DeviceState.qdev
hw/usb-storage: Use the QOM DEVICE() macro to access DeviceState.qdev
hw/isa: Use the QOM DEVICE() macro to access DeviceState.qdev
hw/s390x/event-facility: Use the QOM BUS() macro to access BusState.qbus
hw/pci-bridge: Use the QOM BUS() macro to access BusState.qbus
hw/scsi/vmw_pvscsi: Use qbus_reset_all() directly
docs/devel/build-system: Update an example
test: Fix make target check-report.tap
util: Adjust qemu_guest_getrandom_nofail for Coverity
vhost: fix incorrect print type
migration: fix a typo
hw/rdma: Delete unused headers inclusion
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>