Virt Bug   

1. BUG 介绍

1.1 使用二分法定位

一般能够定位到 bug 是在某个准确的点引入的,那么修复这个 bug 一般都会比较容易了。对 KVM和 QEMU 开源社区来说,它们的源代码仓库都是使用 git 工具来管理的。而git 提供了便捷的命令工具 “git bisect” 来支持通过二分法来查找引入 bug 的代码修改。

# git bisect start
# git bisect bad
# git bisect good 8b19d450ad18
到这里 git 已经将当前版本切换到了中间的一个版本。

经过编译、测试后,告诉 Git 当前版本是 good/bad,循环就能找到 bug
# git bisect good/bad

在 进程查找过程中,可能遇到切换到中间某个版本不能编译或有其他 bug 存在而不能编译或有其他 bug 存在而不能对查找中的 bug  进行验证,那么可以使用 git bisect skip 命令跳过当前版本。
对于已知某个目录的代码引入了这个 bug,在执行二分法查找时,可以指定仅对某个目录来做,如
git bisect start -- arch/x86/kvm/ 

1.2 kvm test framework

kvm autotest

http://avocado-framework.readthedocs.io/en/45.0/GetStartedGuide.html#writing-a-simple-test

2. QEMU 的 bug 管理系统

https://bugs.launchpad.net/qemu  

bug例子:

https://bugs.launchpad.net/qemu/+bug/1013467

3. Xen Debug

3.1 查看日志

cat /var/log/xen/*.log

3.2 查看 debug 信息

xl -vvv 

3.3 查看 dmesg 信息

xl dmesg

3.4 查看 qemu 信息

cat  /var/log/xen/qemu-dm-vm2.log
qemu-system-i386: -drive file=pmu_xen.qcow2,if=ide,index=0,media=disk,format=qcow2,cache=writeback: Could not open backing file: Could not open '/share/xvs/img/linux/ia32e_rhel7u2_default.img': No such file or directory

4. KVM BUG

4.1 Search BUG

查询KVM bug https://bugzilla.kernel.org/buglist.cgi?quicksearch=KVM

KVM的bug 管理系统在 https://bugzilla.kernel.org/ 在提交bug 时需要选择“虚拟化”,然后在 bug的具体描述中选择 KVM 这个组件为对象来提交 bug.

bug 例子: https://bugzilla.kernel.org/show_bug.cgi?id=43328

4.2 Submit BUG

4.3 Examples

[Bug 155841] Kernel panic with high network tarffic to KVM guest

https://bugzilla.kernel.org/show_bug.cgi?id=155841
https://github.com/ffnord/ffnord-puppet-gateway
Our KVM host crashes randomly when there is high network traffic to the guest. For the host crashlog see the attached screenshot.
We are using this guest as a Freifunk (freifunk.net) gateway. We suspect that this occurs when to many nodes are connecting to the gateway.

At startup of the host error messages like the following are shown:
[   45.769454] kvm: zapping shadow pages for mmio generation wraparound
[   52.593232] kvm [1807]: vcpu0 unhandled rdmsr: 0x606
[   76.593085] kvm [1807]: vcpu0 unhandled rdmsr: 0x611
[   76.593138] kvm [1807]: vcpu0 unhandled rdmsr: 0x639
[   76.593183] kvm [1807]: vcpu0 unhandled rdmsr: 0x641
[   76.593227] kvm [1807]: vcpu0 unhandled rdmsr: 0x619
[   76.607495] kvm [1807]: vcpu0 unhandled rdmsr: 0x611
[   76.607559] kvm [1807]: vcpu0 unhandled rdmsr: 0x639
[   76.607609] kvm [1807]: vcpu0 unhandled rdmsr: 0x641
[   76.607658] kvm [1807]: vcpu0 unhandled rdmsr: 0x619

This happening under high network load is only our suspicion.

Can we increase vring size over 1024

> Subject: Re: Can we increase vring size over 1024?
>
> On Fri, Sep 02, 2016 at 06:55:35AM +0000, Gonglei (Arei) wrote:
> > Hi Michael & all,
> >
> > Michael, you made a presentation about the virto 1.1's new features
> > in KVM
> Forum last week.
> > That's wonderful!
> >
> > And I'd like to know can we increase vring size over 1024, such as
> > 4096 or
> 8192?
> >
> > My colleage had asked the same question in 2014, but she didn't get
> > a
> definite answare,
> > So, I want to rewake up the dissusstion about this. Becase for the
> virtio-crypto device,
> > I also need to increase the vring size to get better performance and
> thoughput, but the Qemu
> > side limit the thought as VIRTQUEUE_MAX_SIZE is 1024.
> >
> >[QA-virtio]:Why vring size is limited to 1024?
> >
> http://qemu.11.n7.nabble.com/QA-virtio-Why-vring-size-is-limited-to-10
> 24-td2
> 92450.html
> >
> > Avi Kivity said that google cloud exposed the vring size to 16k.
> >
> > Regards,
> > -Gonglei
>
> Fundamentally, the reason is that the ring size currently also sets
> the max s/g list length, and linux hosts can't support bigger lists.
>
But I don't think this is a problem.
Vring is just a container, we can say the max request's length is 1024, but the capacity of container shouldn't be the length of max request. For example, we can put 4K requests with one s/g list into vring at one time if the vring size is 4096, and 4 requests with 1024 s/g list into vring at one time.
Ignoring the indirect table support. Am I right?