% Feature Levelling
% Revision 1

\clearpage

# Basics

---------------- ----------------------------------------------------
         Status: **Supported**

   Architecture: x86

      Component: Hypervisor, toolstack, guest
---------------- ----------------------------------------------------


# Overview

On native hardware, a kernel will boot, detect features, typically optimise
certain codepaths based on the available features, and expect the features to
remain available until it shuts down.

The same expectation exists for virtual machines, and it is up to the
hypervisor/toolstack to fulfill this expectation for the lifetime of the
virtual machine, including across migrate/suspend/resume.


# User details

Many factors affect the featureset which a VM may use:

* The CPU itself
* The BIOS/firmware/microcode version and settings
* The hypervisor version and command line settings
* Further restrictions the toolstack chooses to apply

A firmware or software upgrade might reduce the available set of features
(e.g. Intel disabling TSX in a microcode update for certain Haswell/Broadwell
processors), as may editing the settings.

It is unsafe to make any assumption about features remaining consistent across
a host reboot.  Xen recalculates all information from scratch each boot, and
provides the information for the toolstack to consume.

`xl` currently has no facilities to help the user collect appropriate feature
information from relevant hosts and compute appropriate feature specifications
for use in host or domain configurations.  (`xl` being a single-host
toolstack, it would in any case need external support for accessing remote
hosts, e.g. via ssh, in the form of automation software like GNU parallel or
ansible.)

# Technical details

The `CPUID` instruction is used by software to query for features.
In the virtualisation use case, guest software should query Xen rather than
hardware directly.  However, `CPUID` is an unprivileged instruction which
doesn't fault, complicating the task of hiding hardware features from guests.

Important files:

* Hypervisor
    * `xen/arch/x86/cpu/*.c`
    * `xen/arch/x86/cpuid.c`
    * `xen/include/asm-x86/cpuid-autogen.h`
    * `xen/include/public/arch-x86/cpufeatureset.h`
    * `xen/tools/gen-cpuid.py`
* `libxc`
    * `tools/libxc/xc_cpuid_x86.c`

## Ability to control CPUID

### HVM

HVM guests (using `Intel VT-x` or `AMD SVM`) will unconditionally exit to Xen
on all `CPUID` instructions, allowing Xen full control over all information.

### PV

The `CPUID` instruction is unprivileged, so executing it in a PV guest will
not trap, leaving Xen no direct ability to control the information returned.

### Xen Forced Emulation Prefix

Xen-aware PV software can make use of the 'Forced Emulation Prefix'

> `ud2a; .ascii 'xen'; cpuid`

which Xen recognises as a deliberate attempt to get the fully-controlled
`CPUID` information rather than the hardware-reported information.  This only
works with cooperative software.

### Masking and Override MSRs

AMD CPUs from the `K8` onwards support _Feature Override_ MSRs, which allow
direct control of the values returned for certain `CPUID` leaves.  These MSRs
allow any result to be returned, including the ability to advertise features
which are not actually supported.

Intel CPUs between `Nehalem` and `SandyBridge` have differing numbers of
_Feature Mask_ MSRs, which are a simple AND-mask applied to all `CPUID`
instructions requesting specific feature bitmap sets.  The exact MSRs, and
which feature bitmap sets they affect, are hardware specific.  These MSRs allow
features to be hidden by clearing the appropriate bit in the mask, but do
not allow unsupported features to be advertised.
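
The difference between the two MSR styles can be sketched in a few lines of
Python.  This is an illustrative model only, not Xen code: the function names
are hypothetical, a single 32-bit feature word stands in for whichever `CPUID`
leaves a given CPU lets the MSRs control, and the three bit positions are the
`CPUID` leaf 1 `ECX` bits for SSE3, AESNI and AVX.

```python
# Illustrative model of the two MSR styles (hypothetical helper names).
SSE3 = 1 << 0    # CPUID.1:ECX bit 0
AESNI = 1 << 25  # CPUID.1:ECX bit 25
AVX = 1 << 28    # CPUID.1:ECX bit 28


def apply_feature_mask(hw_features: int, mask: int) -> int:
    """Intel-style Feature Mask: the guest sees hardware AND mask.

    Bits can only be cleared; a bit set in the mask but absent in
    hardware stays clear.
    """
    return hw_features & mask


def apply_feature_override(hw_features: int, override: int) -> int:
    """AMD-style Feature Override: the guest sees the programmed value.

    The hardware value is ignored, so unsupported features can be
    advertised as well as supported ones hidden.
    """
    return override


# Assumed example hardware: SSE3 and AESNI, but no AVX.
hw = SSE3 | AESNI

hidden_aes = apply_feature_mask(hw, SSE3)            # AESNI hidden
still_no_avx = apply_feature_mask(hw, SSE3 | AVX)    # AVX cannot appear
fake_avx = apply_feature_override(hw, hw | AVX)      # AVX advertised anyway
```

In this model a mask can only ever shrink the visible featureset, whereas an
override can produce any value, which is why the override case needs more care
from whoever programs it.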

### CPUID Faulting

Intel CPUs from `IvyBridge` onwards have _CPUID Faulting_, which allows Xen to
cause `CPUID` instructions executed in PV guests to fault.  This allows Xen
full control over all information, exactly like HVM guests.

## Compile time

As some features depend on other features, it is important that, when
disabling a certain feature, we disable all features which depend on it.  This
allows runtime logic to be simplified, by being able to rely on testing only
the single appropriate feature, rather than the entire feature dependency
chain.

To speed up runtime calculation of feature dependencies, the dependency chain
is calculated and flattened by `xen/tools/gen-cpuid.py` to create
`xen/include/asm-x86/cpuid-autogen.h` from
`xen/include/public/arch-x86/cpufeatureset.h`, allowing the runtime code to
disable all dependent features of a specific disabled feature in constant
time.

## Host boot

As Xen boots, it will enumerate the features it can see.  This is stored as
the *raw_featureset*.

Errata checks and command line arguments are then taken into account to reduce
the *raw_featureset* into the *host_featureset*, which is the set of
features Xen uses.  On hardware with masking/override MSRs, the default MSR
values are picked from the *host_featureset*.

The *host_featureset* is then used to calculate the *pv_featureset* and
*hvm_featureset*, which are the maximum featuresets Xen is willing to offer
to PV and HVM guests respectively.

In addition, Xen will calculate how much control it has over non-cooperative
PV `CPUID` instructions, storing this information as *levelling_caps*.

## Domain creation

The toolstack can query each of the calculated featuresets via
`XEN_SYSCTL_get_cpu_featureset`, and query for the levelling caps via
`XEN_SYSCTL_get_cpu_levelling_caps`.
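
As a rough sketch of what a toolstack might do with featuresets gathered from
several hosts, the Python below computes the features common to all of them.
It is not part of `xl` or `libxc`: `common_featureset` is a hypothetical
helper, the representation of a featureset as a list of 32-bit words is an
assumption made for illustration, and the host values are made up.

```python
from functools import reduce
from operator import and_


def common_featureset(featuresets):
    """Return the word-wise AND of several featuresets.

    Each featureset is assumed to be a list of 32-bit words.  Words that
    some host does not report are treated as absent, so the result is
    truncated to the shortest input.
    """
    if not featuresets:
        raise ValueError("need at least one featureset")
    nr_words = min(len(fs) for fs in featuresets)
    return [
        reduce(and_, (fs[i] for fs in featuresets)) & 0xFFFFFFFF
        for i in range(nr_words)
    ]


# Made-up featuresets for two hosts; host_b reports one extra word.
host_a = [0x0000000F, 0x00000003]
host_b = [0x00000005, 0x00000001, 0x00000010]

levelled = common_featureset([host_a, host_b])
```

A guest restricted to `levelled` would only see features present on both
hosts in this model.  Collecting the per-host data is outside the scope of
the sketch.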

These data should be used by the toolstack when choosing the eventual
featureset to offer to the guest.

Once a featureset has been chosen, it is set (implicitly or explicitly) via
`XEN_DOMCTL_set_cpuid`.  Xen will clamp the toolstack's choice to the
appropriate PV or HVM featureset.  On hardware with masking/override MSRs, the
guest cpuid policy is reflected in the MSRs, which are context switched with
other vcpu state.

# Limitations

A guest which ignores the provided feature information and manually probes for
features will be able to find some of them.  For example, there is no way of
forcibly preventing a guest from using 1GB superpages if the hardware supports
them.

Some information simply cannot be hidden from guests.  There is no way to
control certain behaviour such as the hardware MXCSR_MASK or x87 FPU exception
behaviour.


# Testing

Feature levelling is a very wide area, and is used all over the hypervisor.
Please ask on xen-devel for help identifying more specific tests which could
be of use.


# Known issues / Areas for improvement

The feature querying and levelling functions should be exposed in a
convenient-to-use way by `xl`.

Xen currently has no concept of per-{socket,core,thread} CPUID information.
As a result, details such as APIC IDs, topology and cache information do not
match real hardware, and do not match the documented expectations in the Intel
and AMD system manuals.

The CPU feature flags are the only information which the toolstack has a
sensible interface for querying and levelling.  Other information in the CPUID
policy is important and should be levelled (e.g. maxphysaddr).

The CPUID policy is currently regenerated from scratch by the receiving side,
once memory and vcpu content has been restored.
This means that the receiving Xen cannot verify the memory/vcpu content
against the CPUID policy, and can end up running a guest which will
subsequently crash.  The CPUID policy should be at the head of the migration
stream.

MSRs are another source of features for guests.  There is no general provision
for controlling the available MSRs.  For example, 64-bit versions of Windows
notice changes in IA32_MISC_ENABLE, and suffer a BSOD 0x109 (Critical
Structure Corruption).


# References

[Intel Flexmigration](http://www.intel.co.uk/content/dam/www/public/us/en/documents/application-notes/virtualization-technology-flexmigration-application-note.pdf)

[AMD Extended Migration Technology](http://developer.amd.com/wordpress/media/2012/10/43781-3.00-PUB_Live-Virtual-Machine-Migration-on-AMD-processors.pdf)


# History

------------------------------------------------------------------------
Date       Revision Version  Notes
---------- -------- -------- -------------------------------------------
2016-05-31 1        Xen 4.7  Document written
---------- -------- -------- -------------------------------------------