% Intel Memory Bandwidth Allocation (MBA) Feature
% Revision 1.8

\clearpage

# Basics

---------------- ----------------------------------------------------
         Status: **Tech Preview**

Architecture(s): Intel x86

   Component(s): Hypervisor, toolstack

       Hardware: MBA is supported on Skylake Server and beyond
---------------- ----------------------------------------------------

# Terminology

* CAT         Cache Allocation Technology
* CBM         Capacity BitMasks
* CDP         Code and Data Prioritization
* COS/CLOS    Class of Service
* HW          Hardware
* MBA         Memory Bandwidth Allocation
* MSRs        Model Specific Registers
* PSR         Intel Platform Shared Resource
* THRTL       Throttle value or delay value

# Overview

The Memory Bandwidth Allocation (MBA) feature provides indirect and approximate
control over the memory bandwidth available to each core. It gives the OS or
hypervisor the ability to slow misbehaving apps/domains by using a credit-based
throttling mechanism.

# User details

* Feature Enabling:

  Add "psr=mba" to the Xen boot command line to enable the MBA feature.

* xl interfaces:

  1. `psr-mba-show [domain-id|domain-name]`:

     Show the memory bandwidth throttling value for a domain. The type of
     data shown depends on the mode the hardware works in.

     There are two modes:

     Linear mode: the input precision is defined as 100-(MBA_MAX). For instance,
     if the MBA_MAX value is 90, the input precision is 10%. Values that are not
     an exact multiple of the precision (e.g., 12%) are rounded down (e.g., to
     a 10% delay) by HW automatically. The response to the throttling value is
     linear.

     Non-linear mode: input delay values are powers of two from zero to the
     MBA_MAX value from CPUID. Any value that is not a power of two is rounded
     down to the nearest power of two by HW automatically. The response to the
     throttling value is non-linear.

     In linear mode, the value is shown in decimal. In non-linear mode, it is
     shown in hexadecimal.

  2. `psr-mba-set [OPTIONS] <domain-id|domain-name> <throttling>`:

     Set the memory bandwidth throttling value for a domain.

     Options:
     '-s': Specify the socket to process; otherwise all sockets are processed.

     The throttling value set in the register indicates the approximate amount
     of delay applied to the traffic between core and memory. A higher
     throttling value results in lower bandwidth. The max throttling value
     supported (MBA_MAX) is obtained through CPUID inside the hypervisor. Users
     can fetch the MBA_MAX value using the `psr-hwinfo` xl command.
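
The rounding rules of the two modes can be sketched in C as follows. This is only an illustration of the behaviour described above, not code from Xen or xl: the function name `mba_effective_thrtl` is hypothetical, and the sketch assumes the caller has already checked the requested value against MBA_MAX.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not part of Xen or xl): predict the throttling
 * value the HW would apply for a requested value.
 * Assumption: the caller has already ensured requested <= thrtl_max.
 */
unsigned int mba_effective_thrtl(unsigned int requested,
                                 unsigned int thrtl_max, bool linear)
{
    unsigned int applied = 1;

    assert(requested <= thrtl_max);

    if (linear) {
        /* Linear mode: values are percentages, precision = 100 - MBA_MAX. */
        unsigned int precision;

        assert(thrtl_max > 0 && thrtl_max < 100);
        precision = 100 - thrtl_max;

        /* Round down to a multiple of the precision. */
        return requested - (requested % precision);
    }

    /* Non-linear mode: zero stays zero. */
    if (requested == 0)
        return 0;

    /* Round down to the nearest power of two. */
    while (applied <= requested / 2)
        applied *= 2;

    return applied;
}
```

With MBA_MAX at 90 in linear mode, a request of 12 yields 10, which matches the 12% to 10% example above. In non-linear mode a request of 12 yields 8.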

# Technical details

MBA is a member of the Intel PSR features and shares the base PSR
infrastructure in Xen.

## Hardware perspective

  MBA defines a range of MSRs to support specifying a delay value (Thrtl) per
  COS, with details below.

  ```
   +----------------------------+----------------+
   | MSR (per socket)           |    Address     |
   +----------------------------+----------------+
   | IA32_L2_QOS_Ext_BW_Thrtl_0 |     0xD50      |
   +----------------------------+----------------+
   | ...                        |  ...           |
   +----------------------------+----------------+
   | IA32_L2_QOS_Ext_BW_Thrtl_n |     0xD50+n    |
   +----------------------------+----------------+
  ```

  When a context switch happens, the COS ID of the domain is written to the
  per-hyper-thread MSR `IA32_PQR_ASSOC`, and the hardware then enforces
  bandwidth allocation according to the throttling value stored in the
  corresponding Thrtl MSR.
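
The MSR arithmetic above can be sketched as below. The helper names are hypothetical and the code performs no MSR access; it only computes addresses and values. The `IA32_PQR_ASSOC` address (0xC8F) and the placement of the COS ID in bits 63:32 are not stated in this document and are assumptions taken from the Intel SDM.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed from the Intel SDM, not from this document. */
#define MSR_IA32_PQR_ASSOC  0x0C8Fu
/* Thrtl MSR for COS 0, as listed in the table above. */
#define MSR_MBA_THRTL_BASE  0x0D50u

/* Address of the Thrtl MSR for COS n, i.e. 0xD50 + n. */
uint32_t mba_thrtl_msr(unsigned int cos, unsigned int cos_max)
{
    /* A COS beyond the MBA max COS has no Thrtl MSR. */
    assert(cos <= cos_max);
    return MSR_MBA_THRTL_BASE + cos;
}

/*
 * Value to write to IA32_PQR_ASSOC on context switch: keep the low
 * 32 bits of the old value and place the COS ID in bits 63:32.
 */
uint64_t pqr_assoc_with_cos(uint64_t old_value, uint32_t cos)
{
    return (old_value & 0xFFFFFFFFull) | ((uint64_t)cos << 32);
}
```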

## The relationship between MBA and CAT/CDP

  Generally speaking, MBA is completely independent of CAT/CDP, and any
  combination may be applied at any time, e.g. enabling MBA with CAT
  disabled.

  Note, however, that MBA shares the COS infrastructure with CAT, although
  MBA is enumerated by a different CPUID leaf from CAT (so the max COS of
  MBA may differ from that of CAT). In some cases, a domain is permitted to
  have a COS that is beyond the range of one (or more) of the PSR features
  but within the range of the others. For instance, assume the max COS of
  MBA is 8 but the max COS of L3 CAT is 16. When a domain is assigned COS 9,
  the L3 CAT CBM associated with COS 9 is enforced, but for MBA the HW
  behaves as if the default value were set, since COS 9 is beyond the max
  COS (8) of MBA.
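
The out-of-range case can be modelled with a small lookup. This is a sketch of the behaviour described above rather than Xen code; the function name and the `thrtl_by_cos` table are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model: the Thrtl that takes effect for a COS ID.
 * thrtl_by_cos holds one entry per COS from 0 to mba_cos_max.
 */
unsigned int mba_thrtl_in_effect(unsigned int cos, unsigned int mba_cos_max,
                                 const unsigned int *thrtl_by_cos,
                                 unsigned int default_thrtl)
{
    assert(thrtl_by_cos != NULL);

    /* Beyond the MBA max COS: HW behaves as if the default were set. */
    if (cos > mba_cos_max)
        return default_thrtl;

    return thrtl_by_cos[cos];
}
```

With a max COS of 8 for MBA, COS 9 maps to the default Thrtl, while the L3 CAT CBM for COS 9 is still enforced by the CAT side.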

## Design Overview

* Core COS/Thrtl association

  When enforcing Memory Bandwidth Allocation, all cores of domains have
  the same default Thrtl MSR (COS0), which stores the same Thrtl (0). The
  default Thrtl MSR is used only in the hypervisor and is transparent to the
  tool stack and user.

  System administrators can change the PSR allocation policy at runtime by
  using the tool stack. Since MBA shares COS IDs with CAT/CDP, a COS ID
  corresponds to a 2-tuple, like [CBM, Thrtl], when only CAT is enabled. When
  CDP is enabled, the COS ID corresponds to a 3-tuple, like [Code_CBM,
  Data_CBM, Thrtl]. If neither CAT nor CDP is enabled, things are simpler,
  since one COS ID corresponds to one Thrtl.

* VCPU schedule

  This part reuses CAT COS infrastructure.

* Multi-sockets

  Different sockets may have different MBA capabilities (like max COS),
  although the capability is consistent within a socket. So the MBA
  capability is specified per socket.

  This part reuses CAT COS infrastructure.
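
As a rough illustration of the COS tuple and the per-socket capability described above, the sketch below counts how many values one COS ID stands for on a socket. All type and function names are hypothetical and do not reflect Xen's actual data structures, which are described in 'intel_psr_cat_cdp.pandoc'.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-socket feature state. */
enum cat_mode { CAT_OFF, CAT_ONLY, CAT_CDP };

struct socket_psr {
    enum cat_mode cat;        /* CAT disabled, CAT only, or CDP */
    bool mba;                 /* MBA enabled on this socket */
    unsigned int mba_cos_max; /* may differ between sockets */
};

/* Values one COS ID can stand for; unused members are ignored. */
struct cos_values {
    uint64_t cbm;       /* CAT only: the CBM; CDP: the code CBM */
    uint64_t data_cbm;  /* CDP only: the data CBM */
    unsigned int thrtl; /* MBA only: the throttling value */
};

/* Width of the tuple behind a COS ID: 1, 2 or 3 when MBA is enabled. */
unsigned int cos_tuple_width(const struct socket_psr *s)
{
    unsigned int width = 0;

    assert(s != NULL);

    if (s->cat == CAT_CDP)
        width += 2;           /* [Code_CBM, Data_CBM] */
    else if (s->cat == CAT_ONLY)
        width += 1;           /* [CBM] */

    if (s->mba)
        width += 1;           /* [Thrtl] */

    return width;
}
```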

## Implementation Description

* Hypervisor interfaces:

  1. Boot line param: "psr=mba" to enable the feature.

  2. SYSCTL:
          - XEN_SYSCTL_PSR_MBA_get_info: Get system MBA information.

  3. DOMCTL:
          - XEN_DOMCTL_PSR_MBA_OP_GET_THRTL: Get throttling for a domain.
          - XEN_DOMCTL_PSR_MBA_OP_SET_THRTL: Set throttling for a domain.

* xl interfaces:

  1. psr-mba-show [domain-id|domain-name]
          Show system/domain runtime MBA throttling value. For linear mode,
          it shows the decimal value. For non-linear mode, it shows the
          hexadecimal value.
          => XEN_SYSCTL_PSR_MBA_get_info/XEN_DOMCTL_PSR_MBA_OP_GET_THRTL

  2. psr-mba-set [OPTIONS] <domain-id|domain-name> <throttling>
          Set bandwidth throttling for a domain.
          => XEN_DOMCTL_PSR_MBA_OP_SET_THRTL

  3. psr-hwinfo
          Show PSR HW information, including L3 CAT/CDP/L2 CAT/MBA.
          => XEN_SYSCTL_PSR_MBA_get_info

* Key data structure:

  1. Feature HW info

     ```
     struct {
         unsigned int thrtl_max;
         bool linear;
     } mba;
     ```

     - Member `thrtl_max`

       `thrtl_max` is the max throttling value that can be set, i.e. MBA_MAX.

     - Member `linear`

       `linear` indicates whether the response to the delay value is linear.

     As mentioned above, MBA is a member of the Intel PSR features and shares
     the base PSR infrastructure in Xen. For example, 'cos_max' is a common HW
     property for all features. For other data structure details, please
     refer to 'intel_psr_cat_cdp.pandoc'.
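
The sketch below shows one way these members could be derived from CPUID register values. It is not the Xen implementation. The field layout (MBA_MAX minus one in EAX bits 11:0, the linear flag in ECX bit 2, max COS in EDX bits 15:0 of leaf 0x10, sub-leaf 3) is an assumption based on the Intel SDM, and the names `mba_hw_info` and `mba_decode_cpuid` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container mirroring the members described above. */
struct mba_hw_info {
    unsigned int thrtl_max; /* MBA_MAX */
    bool linear;            /* linear response to the delay value */
    unsigned int cos_max;   /* common PSR property, per feature */
};

/*
 * Decode raw register values of CPUID leaf 0x10, sub-leaf 3.
 * The bit layout is an assumption taken from the Intel SDM.
 */
struct mba_hw_info mba_decode_cpuid(uint32_t eax, uint32_t ecx, uint32_t edx)
{
    struct mba_hw_info info;

    info.thrtl_max = (eax & 0xFFFu) + 1;   /* field stores MBA_MAX - 1 */
    info.linear = (ecx & (1u << 2)) != 0;
    info.cos_max = edx & 0xFFFFu;

    /* In linear mode the value is a percentage below 100. */
    assert(!info.linear || info.thrtl_max < 100);

    return info;
}
```

Register values of EAX=89, ECX=4 and EDX=7 would decode to linear mode with MBA_MAX 90 and max COS 7.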

# Limitations

MBA only works on HW that supports it (check CPUID).

# Testing

The following commands can be used to verify MBA on HW that supports it.

For example:
  1. The user can get the MBA hardware info through the 'psr-hwinfo' command.
     The result shows whether the hardware works in linear mode or non-linear
     mode, the max throttling value (MBA_MAX), and so on.

    root@:~$ xl psr-hwinfo --mba
    Memory Bandwidth Allocation (MBA):
    Socket ID       : 0
    Linear Mode     : Enabled
    Maximum COS     : 7
    Maximum Throttling Value: 90
    Default Throttling Value: 0

  2. Then, the user can set a throttling value for a domain. For example, set
     '10', i.e. a 10% delay.

    root@:~$ xl psr-mba-set 1 10

  3. The user can check the current configuration of the domain through
     'psr-mba-show'. For linear mode, the decimal value is shown.

    root@:~$ xl psr-mba-show 1
    Socket ID       : 0
    Default THRTL   : 0
       ID                     NAME            THRTL
        1                 ubuntu14             10

# Areas for improvement

N/A

# Known issues

N/A

# References

"INTEL RESOURCE DIRECTOR TECHNOLOGY (INTEL RDT) ALLOCATION FEATURES" [Intel 64 and IA-32 Architectures Software Developer Manuals, vol3](http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html)

# History

------------------------------------------------------------------------
Date       Revision Version  Notes
---------- -------- -------- -------------------------------------------
2017-01-10 1.0      Xen 4.9  Design document written
2017-07-10 1.1      Xen 4.10 Changes:
                             1. Modify data structure according to latest
                                codes;
                             2. Add content for 'Areas for improvement';
                             3. Other minor changes.
2017-08-09 1.2      Xen 4.10 Changes:
                             1. Remove a special character to avoid error when
                                building pandoc.
2017-08-15 1.3      Xen 4.10 Changes:
                             1. Add terminology 'HW'.
                             2. Change 'COS ID of VCPU' to 'COS ID of domain'.
                             3. Change 'COS register' to 'Thrtl MSR'.
                             4. Explain the value shown for 'psr-mba-show' under
                                different modes.
                             5. Remove content in 'Areas for improvement'.
2017-08-16 1.4      Xen 4.10 Changes:
                             1. Add '<>' for mandatory argument.
2017-08-30 1.5      Xen 4.10 Changes:
                             1. Modify words in 'Overview' to make it easier to
                                understand.
                             2. Explain 'linear/non-linear' modes before mention
                                them.
                             3. Explain throttling value more accurate.
                             4. Explain 'MBA_MAX'.
                             5. Correct some words in 'Design Overview'.
                             6. Change 'mba_info' to 'mba' according to code
                                changes. Also, modify contents of it.
                             7. Add context in 'Testing' part to make things
                                more clear.
                             8. Remove 'n<64' to avoid out-of-sync.
2017-09-21 1.6      Xen 4.10 Changes:
                             1. Add 'domain-name' as parameter of 'psr-mba-show/
                                psr-mba-set'.
                             2. Fix some wordings.
                             3. Explain how user can know the MBA_MAX.
                             4. Move the description of 'Linear mode/Non-linear
                                mode' into section of 'psr-mba-show'.
                             5. Change 'per-thread' to 'per-hyper-thread'.
2017-09-29 1.7      Xen 4.10 Changes:
                             1. Correct some words.
                             2. Change 'xl psr-mba-set 1 0xa' to
                                'xl psr-mba-set 1 10'
2017-10-08 1.8      Xen 4.10 Changes:
                             1. Correct some words.
---------- -------- -------- -------------------------------------------
