% Feature Levelling
% Revision 1

\clearpage

# Basics

---------------- ----------------------------------------------------
         Status: **Supported**

   Architecture: x86

      Component: Hypervisor, toolstack, guest
---------------- ----------------------------------------------------


# Overview

On native hardware, a kernel will boot, detect features, typically optimise
certain codepaths based on the available features, and expect the features to
remain available until it shuts down.

The same expectation exists for virtual machines, and it is up to the
hypervisor/toolstack to fulfill this expectation for the lifetime of the
virtual machine, including across migrate/suspend/resume.


# User details

Many factors affect the featureset which a VM may use:

* The CPU itself
* The BIOS/firmware/microcode version and settings
* The hypervisor version and command line settings
* Further restrictions the toolstack chooses to apply

A firmware or software upgrade might reduce the available set of features
(e.g. Intel disabling TSX in a microcode update for certain Haswell/Broadwell
processors), as may editing the settings.

It is unsafe to make any assumption about features remaining consistent across
a host reboot.  Xen recalculates all information from scratch on each boot, and
provides the information for the toolstack to consume.

`xl` currently has no facilities to help the user collect appropriate feature
information from relevant hosts and compute appropriate feature specifications
for use in host or domain configurations.  (`xl` being a single-host
toolstack, it would in any case need external support for accessing remote
hosts, e.g. via ssh, in the form of automation software like GNU parallel or
Ansible.)

# Technical details

The `CPUID` instruction is used by software to query for features.  In the
virtualisation use case, guest software should query Xen rather than hardware
directly.  However, `CPUID` is an unprivileged instruction which doesn't
fault, complicating the task of hiding hardware features from guests.
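
As a rough illustration of what querying for features involves, the sketch
below decodes a few feature bits from the `ECX` and `EDX` values returned by
`CPUID` leaf 1.  The bit positions are the ones documented in the Intel and
AMD manuals; the `decode_leaf1` helper and the sample register values are
hypothetical and are not part of Xen.

```python
# Bit positions within the CPUID leaf 1 output registers.
LEAF1_ECX_BITS = {"sse3": 0, "xsave": 26, "avx": 28, "hypervisor": 31}
LEAF1_EDX_BITS = {"sse": 25, "sse2": 26}


def decode_leaf1(ecx, edx):
    """Return the names of the known features whose bit is set."""
    found = {name for name, bit in LEAF1_ECX_BITS.items() if (ecx >> bit) & 1}
    found |= {name for name, bit in LEAF1_EDX_BITS.items() if (edx >> bit) & 1}
    return found


# Made-up register values: SSE3, AVX and the hypervisor bit in ECX,
# SSE2 in EDX.
sample_ecx = (1 << 0) | (1 << 28) | (1 << 31)
sample_edx = 1 << 26
features = decode_leaf1(sample_ecx, sample_edx)
```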

Important files:

* Hypervisor
    * `xen/arch/x86/cpu/*.c`
    * `xen/arch/x86/cpuid.c`
    * `xen/include/asm-x86/cpuid-autogen.h`
    * `xen/include/public/arch-x86/cpufeatureset.h`
    * `xen/tools/gen-cpuid.py`
* `libxc`
    * `tools/libxc/xc_cpuid_x86.c`

## Ability to control CPUID

### HVM

HVM guests (using `Intel VT-x` or `AMD SVM`) will unconditionally exit to Xen
on all `CPUID` instructions, allowing Xen full control over all information.

### PV

The `CPUID` instruction is unprivileged, so executing it in a PV guest will
not trap, leaving Xen no direct ability to control the information returned.

### Xen Forced Emulation Prefix

Xen-aware PV software can make use of the 'Forced Emulation Prefix'

> `ud2a; .ascii 'xen'; cpuid`

which Xen recognises as a deliberate attempt to get the fully-controlled
`CPUID` information rather than the hardware-reported information.  This only
works with cooperative software.
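
The byte sequence behind the prefix can be modelled as shown here.  The opcode
encodings (`0F 0B` for `ud2a`, `0F A2` for `cpuid`) are the architectural
ones; the helper name `is_forced_emulation_cpuid` is hypothetical, and this is
a toy model of the pattern match rather than Xen's actual handler.

```python
UD2A = bytes([0x0F, 0x0B])          # ud2a: raises an invalid-opcode exception
XEN_TAG = b"xen"                    # .ascii 'xen'
CPUID_OPCODE = bytes([0x0F, 0xA2])  # cpuid

FORCED_EMULATION_CPUID = UD2A + XEN_TAG + CPUID_OPCODE


def is_forced_emulation_cpuid(insn_bytes):
    """Return True if the instruction stream starts with the prefixed CPUID."""
    return insn_bytes.startswith(FORCED_EMULATION_CPUID)
```

A plain `cpuid` (just `0F A2`) does not match, which is why the scheme only
helps with software that deliberately emits the prefix.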

### Masking and Override MSRs

AMD CPUs from the `K8` onwards support _Feature Override_ MSRs, which allow
direct control of the values returned for certain `CPUID` leaves.  These MSRs
allow any result to be returned, including the ability to advertise features
which are not actually supported.

Intel CPUs between `Nehalem` and `SandyBridge` have differing numbers of
_Feature Mask_ MSRs, which are a simple AND-mask applied to all `CPUID`
instructions requesting specific feature bitmap sets.  The exact MSRs, and
which feature bitmap sets they affect, are hardware specific.  These MSRs allow
features to be hidden by clearing the appropriate bit in the mask, but do
not allow unsupported features to be advertised.
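
The difference between the two mechanisms can be sketched on a single feature
word.  The function names and the 4-bit sample values are hypothetical; real
MSRs cover specific 32-bit `CPUID` registers, and the details vary by
processor.

```python
def apply_feature_mask(hw_word, mask):
    """Mask model: the guest sees hardware AND mask, so bits can only be cleared."""
    return hw_word & mask


def apply_feature_override(hw_word, override):
    """Override model: the guest sees the MSR value, whatever hardware supports."""
    return override


hw = 0b1011  # hardware supports features 0, 1 and 3

hidden = apply_feature_mask(hw, 0b1101)         # hide feature 1
not_added = apply_feature_mask(hw, 0b1111)      # cannot add missing feature 2
forced = apply_feature_override(hw, 0b1111)     # advertises feature 2 anyway
```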

### CPUID Faulting

Intel CPUs from `IvyBridge` onwards have _CPUID Faulting_, which allows Xen to
cause `CPUID` instructions executed in PV guests to fault.  This allows Xen
full control over all information, exactly like HVM guests.

## Compile time

As some features depend on other features, it is important that, when
disabling a certain feature, we disable all features which depend on it.  This
allows runtime logic to be simplified, by being able to rely on testing only
the single appropriate feature, rather than the entire feature dependency
chain.

To speed up runtime calculation of feature dependencies, the dependency chain
is calculated and flattened by `xen/tools/gen-cpuid.py` to create
`xen/include/asm-x86/cpuid-autogen.h` from
`xen/include/public/arch-x86/cpufeatureset.h`, allowing the runtime code to
disable all dependent features of a specific disabled feature in constant
time.
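
A minimal sketch of the flattening idea follows.  It is not the code in
`gen-cpuid.py`: the feature list, the bit assignments and the small dependency
table are an illustrative subset chosen for this example, and the names
`flatten_deps` and `disable_feature` are hypothetical.  The point is that once
the transitive dependents of each feature are precomputed as a bitmask,
disabling a feature is a single AND-NOT.

```python
FEATURES = ["XSAVE", "AVX", "AVX2", "FMA", "SSE2"]
BIT = {name: 1 << index for index, name in enumerate(FEATURES)}

# Illustrative direct dependencies: key -> features which require the key.
DIRECT_DEPS = {"XSAVE": ["AVX"], "AVX": ["AVX2", "FMA"]}


def flatten_deps(direct_deps):
    """Map each feature to a bitmask of everything transitively depending on it."""
    flat = {}
    for feature in direct_deps:
        mask, todo = 0, list(direct_deps[feature])
        while todo:
            dep = todo.pop()
            if not mask & BIT[dep]:
                mask |= BIT[dep]
                todo.extend(direct_deps.get(dep, ()))
        flat[feature] = mask
    return flat


# Done once, ahead of time (the build step in the real system).
DEEP_DEPS = flatten_deps(DIRECT_DEPS)


def disable_feature(featureset, feature):
    """Clear a feature and all of its dependents with one precomputed mask."""
    return featureset & ~(BIT[feature] | DEEP_DEPS.get(feature, 0))


everything = sum(BIT.values())
without_xsave = disable_feature(everything, "XSAVE")  # only SSE2 survives
```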

## Host boot

As Xen boots, it will enumerate the features it can see.  This is stored as
the *raw_featureset*.

Errata checks and command line arguments are then taken into account to reduce
the *raw_featureset* into the *host_featureset*, which is the set of
features Xen uses.  On hardware with masking/override MSRs, the default MSR
values are picked from the *host_featureset*.

The *host_featureset* is then used to calculate the *pv_featureset* and
*hvm_featureset*, which are the maximum featuresets Xen is willing to offer
to PV and HVM guests respectively.

In addition, Xen will calculate how much control it has over non-cooperative
PV `CPUID` instructions, storing this information as *levelling_caps*.
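
The chain of featuresets can be pictured as successive narrowing of a bitmap.
The sketch below assumes, as a simplification, that every stage only removes
bits; the function name, its parameters and the 5-bit sample values are
hypothetical and do not correspond to Xen's internal interfaces.

```python
def calculate_featuresets(raw, known_errata, cmdline_disabled, pv_max, hvm_max):
    """Derive host, PV and HVM featuresets from the raw featureset."""
    # Errata and command line options reduce raw into host.
    host = raw & ~known_errata & ~cmdline_disabled
    # Each guest type gets at most what the host featureset contains.
    pv = host & pv_max
    hvm = host & hvm_max
    return host, pv, hvm


host, pv, hvm = calculate_featuresets(
    raw=0b11111,
    known_errata=0b00001,      # feature 0 is broken on this (made-up) CPU
    cmdline_disabled=0b00010,  # feature 1 was turned off by the administrator
    pv_max=0b01111,            # feature 4 is never offered to PV guests
    hvm_max=0b11111,
)
```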

## Domain creation

The toolstack can query each of the calculated featuresets via
`XEN_SYSCTL_get_cpu_featureset`, and query for the levelling caps via
`XEN_SYSCTL_get_cpu_levelling_caps`.

These data should be used by the toolstack when choosing the eventual
featureset to offer to the guest.

Once a featureset has been chosen, it is set (implicitly or explicitly) via
`XEN_DOMCTL_set_cpuid`.  Xen will clamp the toolstack's choice to the
appropriate PV or HVM featureset.  On hardware with masking/override MSRs, the
guest cpuid policy is reflected in the MSRs, which are context switched with
other vcpu state.
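
One way a toolstack might use this data is sketched below: intersect the
featuresets reported by every host a guest may run on, then expect Xen to
clamp whatever is requested to its own maximum.  The sketch assumes a
featureset is represented as an equal-length list of integer words; the
function names and sample values are hypothetical, and no real hypercall is
made.

```python
from functools import reduce
from operator import and_


def level_featuresets(host_featuresets):
    """Return the word-wise AND of several equal-length featuresets."""
    return [reduce(and_, words) for words in zip(*host_featuresets)]


def clamp_to_max(requested, max_featureset):
    """Model of clamping: drop requested bits outside the maximum featureset."""
    return [req & limit for req, limit in zip(requested, max_featureset)]


host_a = [0b1111, 0b1010]
host_b = [0b0111, 0b1110]

# Features common to both hosts, safe to offer to a guest migrating between them.
common = level_featuresets([host_a, host_b])

# A request for more than the maximum is silently narrowed.
granted = clamp_to_max([0b1111, 0b0001], common)
```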

# Limitations

A guest which ignores the provided feature information and manually probes for
features will be able to find some of them.  For example, there is no way of
forcibly preventing a guest from using 1GB superpages if the hardware supports
them.

Some information simply cannot be hidden from guests.  There is no way to
control certain behaviour such as the hardware MXCSR_MASK or x87 FPU exception
behaviour.


# Testing

Feature levelling is a very wide area, and is used all over the hypervisor.
Please ask on xen-devel for help identifying more specific tests which could
be of use.


# Known issues / Areas for improvement

The feature querying and levelling functions should be exposed in a
convenient-to-use way by `xl`.

Xen currently has no concept of per-{socket,core,thread} CPUID information.
As a result, details such as APIC IDs, topology and cache information do not
match real hardware, and do not match the documented expectations in the Intel
and AMD system manuals.

The CPU feature flags are the only information which the toolstack has a
sensible interface for querying and levelling.  Other information in the CPUID
policy is important and should be levelled (e.g. maxphysaddr).

The CPUID policy is currently regenerated from scratch by the receiving side,
once memory and vcpu content has been restored.  This means that the receiving
Xen cannot verify the memory/vcpu content against the CPUID policy, and can
end up running a guest which will subsequently crash.  The CPUID policy should
be at the head of the migration stream.

MSRs are another source of features for guests.  There is no general provision
for controlling the available MSRs.  For example, 64-bit versions of Windows
notice changes in IA32_MISC_ENABLE, and suffer a BSOD 0x109 (Critical Structure
Corruption).


# References

[Intel Flexmigration](http://www.intel.co.uk/content/dam/www/public/us/en/documents/application-notes/virtualization-technology-flexmigration-application-note.pdf)

[AMD Extended Migration Technology](http://developer.amd.com/wordpress/media/2012/10/43781-3.00-PUB_Live-Virtual-Machine-Migration-on-AMD-processors.pdf)


# History

------------------------------------------------------------------------
Date       Revision Version  Notes
---------- -------- -------- -------------------------------------------
2016-05-31 1        Xen 4.7  Document written
---------- -------- -------- -------------------------------------------
