% Intel Memory Bandwidth Allocation (MBA) Feature
% Revision 1.8

\clearpage

# Basics

---------------- ----------------------------------------------------
         Status: **Tech Preview**

Architecture(s): Intel x86

   Component(s): Hypervisor, toolstack

       Hardware: MBA is supported on Skylake Server and beyond
---------------- ----------------------------------------------------

# Terminology

* CAT         Cache Allocation Technology
* CBM         Capacity BitMasks
* CDP         Code and Data Prioritization
* COS/CLOS    Class of Service
* HW          Hardware
* MBA         Memory Bandwidth Allocation
* MSRs        Model Specific Registers
* PSR         Intel Platform Shared Resource
* THRTL       Throttle value or delay value

# Overview

The Memory Bandwidth Allocation (MBA) feature provides indirect and approximate
control over memory bandwidth available per-core. It gives the OS/hypervisor
the ability to slow misbehaving apps/domains by using a credit-based
throttling mechanism.

# User details

* Feature Enabling:

  Add "psr=mba" to the boot line parameters to enable the MBA feature.

* xl interfaces:

  1. `psr-mba-show [domain-id|domain-name]`:

     Show the memory bandwidth throttling value for a domain. The format of
     the value shown depends on the mode the hardware works in.

     There are two modes:

     Linear mode: the input precision is defined as 100-(MBA_MAX). For instance,
     if the MBA_MAX value is 90, the input precision is 10%. Values that are
     not an even multiple of the precision (e.g., 12%) will be rounded down
     (e.g., to 10% delay applied) by HW automatically. The response of the
     throttling value is linear.

     Non-linear mode: input delay values are powers-of-two from zero to the
     MBA_MAX value from CPUID. In this case any value that is not a power of
     two will be rounded down to the nearest power of two by HW automatically.
     The response of the throttling value is non-linear.

     For linear mode, it shows the decimal value. For non-linear mode, it shows
     the hexadecimal value.

  2. `psr-mba-set [OPTIONS] <domain-id|domain-name> <throttling>`:

     Set the memory bandwidth throttling value for a domain.

     Options:
     '-s': Specify the socket to process, otherwise all sockets are processed.

     The throttling value set in the register indicates the approximate amount
     of delay applied to the traffic between core and memory. A higher
     throttling value results in lower bandwidth. The max throttling value
     (MBA_MAX) supported is obtained through CPUID inside the hypervisor. Users
     can fetch the MBA_MAX value using the `psr-hwinfo` xl command.

# Technical details

MBA is a member of the Intel PSR features and shares the base PSR
infrastructure in Xen.

## Hardware perspective

  MBA defines a range of MSRs to support specifying a delay value (Thrtl) per
  COS, with details below.

  ```
   +----------------------------+----------------+
   | MSR (per socket)           |    Address     |
   +----------------------------+----------------+
   | IA32_L2_QOS_Ext_BW_Thrtl_0 |     0xD50      |
   +----------------------------+----------------+
   | ...                        |  ...           |
   +----------------------------+----------------+
   | IA32_L2_QOS_Ext_BW_Thrtl_n |     0xD50+n    |
   +----------------------------+----------------+
  ```

  When a context switch happens, the COS ID of the domain is written to the
  per-hyper-thread MSR `IA32_PQR_ASSOC`, and the hardware then enforces
  bandwidth allocation according to the throttling value stored in the Thrtl
  MSR of that COS.

## The relationship between MBA and CAT/CDP

  Generally speaking, MBA is completely independent of CAT/CDP, and any
  combination may be applied at any time, e.g. enabling MBA with CAT
  disabled.

  Note, however, that MBA shares the COS infrastructure with CAT, although
  MBA is enumerated by a different CPUID leaf from CAT (which means that the
  max COS of MBA may differ from that of CAT). In some cases, a domain is
  permitted to have a COS that is beyond the max COS of one (or more) of the
  PSR features but within that of the others. For instance, assume the max
  COS of MBA is 8 but the max COS of L3 CAT is 16. When a domain is assigned
  9 as its COS, the L3 CAT CBM associated with COS 9 is enforced, but for MBA
  the HW behaves as if the default value were set, since COS 9 is beyond the
  max COS (8) of MBA.

## Design Overview

* Core COS/Thrtl association

  When enforcing Memory Bandwidth Allocation, all cores of all domains use,
  by default, the same default Thrtl MSR (COS0), which stores the default
  Thrtl (0). The default Thrtl MSR is used only in the hypervisor and is
  transparent to the tool stack and the user.

  System administrators can change the PSR allocation policy at runtime by
  using the tool stack. Since MBA shares the COS ID with CAT/CDP, a COS ID
  corresponds to a 2-tuple, like [CBM, Thrtl], when only CAT is enabled. When
  CDP is enabled, the COS ID corresponds to a 3-tuple, like [Code_CBM,
  Data_CBM, Thrtl]. If neither CAT nor CDP is enabled, things are simpler,
  since one COS ID corresponds to one Thrtl.

* VCPU schedule

  This part reuses the CAT COS infrastructure.

* Multi-sockets

  Different sockets may have different MBA capabilities (like max COS),
  although the capability is consistent within a socket. So the MBA
  capability is recorded per socket.

  This part reuses the CAT COS infrastructure.

## Implementation Description

* Hypervisor interfaces:

  1. Boot line param: "psr=mba" to enable the feature.

  2. SYSCTL:
          - XEN_SYSCTL_PSR_MBA_get_info: Get system MBA information.

  3. DOMCTL:
          - XEN_DOMCTL_PSR_MBA_OP_GET_THRTL: Get throttling for a domain.
          - XEN_DOMCTL_PSR_MBA_OP_SET_THRTL: Set throttling for a domain.

* xl interfaces:

  1. `psr-mba-show [domain-id|domain-name]`

     Show the system/domain runtime MBA throttling value. For linear mode,
     it shows the decimal value. For non-linear mode, it shows the hexadecimal
     value.

     => XEN_SYSCTL_PSR_MBA_get_info/XEN_DOMCTL_PSR_MBA_OP_GET_THRTL

  2. `psr-mba-set [OPTIONS] <domain-id|domain-name> <throttling>`

     Set bandwidth throttling for a domain.

     => XEN_DOMCTL_PSR_MBA_OP_SET_THRTL

  3. `psr-hwinfo`

     Show PSR HW information, including L3 CAT/CDP/L2 CAT/MBA.

     => XEN_SYSCTL_PSR_MBA_get_info

* Key data structure:

  1. Feature HW info

     ```
     struct {
         unsigned int thrtl_max; /* max throttling value, i.e. MBA_MAX */
         bool linear;            /* true if the delay response is linear */
     } mba;
     ```

     - Member `thrtl_max`

       `thrtl_max` is the max throttling value that can be set, i.e. MBA_MAX.

     - Member `linear`

       `linear` indicates whether the response of the delay value is linear.

     As mentioned above, MBA is a member of the Intel PSR features and shares
     the base PSR infrastructure in Xen. For example, 'cos_max' is a common HW
     property for all features. So, for other data structure details, please
     refer to 'intel_psr_cat_cdp.pandoc'.

# Limitations

MBA can only work on HW which supports it (check CPUID).

# Testing

The following commands can be used to verify MBA on any HW that supports it.

For example:

1. The user can get the MBA hardware info through the 'psr-hwinfo' command.
   The result shows whether this hardware works in linear mode or non-linear
   mode, the max throttling value (MBA_MAX) and so on.

        root@:~$ xl psr-hwinfo --mba
        Memory Bandwidth Allocation (MBA):
        Socket ID               : 0
        Linear Mode             : Enabled
        Maximum COS             : 7
        Maximum Throttling Value: 90
        Default Throttling Value: 0

2. Then, the user can set a throttling value for a domain. For example, set
   '10', i.e. 10% delay.

        root@:~$ xl psr-mba-set 1 10

3. The user can check the current configuration of the domain through
   'psr-mba-show'. For linear mode, the decimal value is shown.

        root@:~$ xl psr-mba-show 1
        Socket ID       : 0
        Default THRTL   : 0
           ID                     NAME     THRTL
            1                 ubuntu14        10

# Areas for improvement

N/A

# Known issues

N/A

# References

"INTEL RESOURCE DIRECTOR TECHNOLOGY (INTEL RDT) ALLOCATION FEATURES" [Intel 64 and IA-32 Architectures Software Developer Manuals, vol3](http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html)

# History

------------------------------------------------------------------------
Date       Revision Version  Notes
---------- -------- -------- -------------------------------------------
2017-01-10 1.0      Xen 4.9  Design document written
2017-07-10 1.1      Xen 4.10 Changes:
                             1. Modify data structure according to latest
                                codes;
                             2. Add content for 'Areas for improvement';
                             3. Other minor changes.
2017-08-09 1.2      Xen 4.10 Changes:
                             1. Remove a special character to avoid error when
                                building pandoc.
2017-08-15 1.3      Xen 4.10 Changes:
                             1. Add terminology 'HW'.
                             2. Change 'COS ID of VCPU' to 'COS ID of domain'.
                             3. Change 'COS register' to 'Thrtl MSR'.
                             4. Explain the value shown for 'psr-mba-show' under
                                different modes.
                             5. Remove content in 'Areas for improvement'.
2017-08-16 1.4      Xen 4.10 Changes:
                             1. Add '<>' for mandatory argument.
2017-08-30 1.5      Xen 4.10 Changes:
                             1. Modify words in 'Overview' to make it easier to
                                understand.
                             2. Explain 'linear/non-linear' modes before mention
                                them.
                             3. Explain throttling value more accurate.
                             4. Explain 'MBA_MAX'.
                             5. Correct some words in 'Design Overview'.
                             6. Change 'mba_info' to 'mba' according to code
                                changes. Also, modify contents of it.
                             7. Add context in 'Testing' part to make things
                                more clear.
                             8. Remove 'n<64' to avoid out-of-sync.
2017-09-21 1.6      Xen 4.10 Changes:
                             1. Add 'domain-name' as parameter of 'psr-mba-show/
                                psr-mba-set'.
                             2. Fix some wordings.
                             3. Explain how user can know the MBA_MAX.
                             4. Move the description of 'Linear mode/Non-linear
                                mode' into section of 'psr-mba-show'.
                             5. Change 'per-thread' to 'per-hyper-thread'.
2017-09-29 1.7      Xen 4.10 Changes:
                             1. Correct some words.
                             2. Change 'xl psr-mba-set 1 0xa' to
                                'xl psr-mba-set 1 10'
2017-10-08 1.8      Xen 4.10 Changes:
                             1. Correct some words.
---------- -------- -------- -------------------------------------------
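
# Appendix: throttling value rounding sketch

The rounding that the 'psr-mba-show' description attributes to the HW in linear
and non-linear mode can be illustrated with a few lines of C. This is only a
sketch for reasoning about which value takes effect, not code from Xen: the
type `mba_hw` and the functions `mba_round_linear`, `mba_round_nonlinear` and
`mba_effective_thrtl` are hypothetical names, and the sketch assumes MBA_MAX is
below 100 in linear mode. It does not check the requested value against
MBA_MAX.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the `mba` HW info described in this document. */
struct mba_hw {
    unsigned int thrtl_max; /* MBA_MAX as reported by `psr-hwinfo` */
    bool linear;            /* true for linear mode */
};

/*
 * Linear mode: the input precision is 100 - MBA_MAX, and a value that is
 * not a multiple of the precision is rounded down to one.
 */
unsigned int mba_round_linear(unsigned int thrtl, unsigned int thrtl_max)
{
    assert(thrtl_max < 100); /* assumption of this sketch */
    unsigned int precision = 100 - thrtl_max;

    return thrtl - (thrtl % precision);
}

/*
 * Non-linear mode: a value that is not a power of two is rounded down to
 * the nearest power of two. Zero stays zero.
 */
unsigned int mba_round_nonlinear(unsigned int thrtl)
{
    unsigned int pow2 = 1;

    if (thrtl == 0)
        return 0;

    /* Double while the next power of two still fits below thrtl. */
    while (pow2 <= thrtl / 2)
        pow2 *= 2;

    return pow2;
}

/* Pick the rounding rule that matches the mode of the hardware. */
unsigned int mba_effective_thrtl(const struct mba_hw *hw, unsigned int thrtl)
{
    return hw->linear ? mba_round_linear(thrtl, hw->thrtl_max)
                      : mba_round_nonlinear(thrtl);
}
```

With an MBA_MAX of 90 in linear mode, a requested value of 12 maps to 10, which
matches the example given for linear mode. In non-linear mode the same request
maps to 8.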