# Xenstore Migration

## Background

The design for *Non-Cooperative Migration of Guests*[1] explains that extra
save records are required in the migration stream to allow a guest running PV
drivers to be migrated without its co-operation. Moreover, the save records
must include details of registered xenstore watches as well as content;
information that cannot currently be recovered from `xenstored`, and hence some
extension to the `xenstored` implementations will also be required.

As a similar set of data is needed for transferring xenstore data from one
instance to another when live updating xenstored, this document proposes an
image format for a 'migration stream' suitable for both purposes.

## Proposal

The image format consists of a _header_ followed by one or more _records_. Each
record consists of a type and length field, followed by any data mandated by
the record type. At minimum there will be a single record of type `END`
(defined below).

### Header

The header identifies the stream as a `xenstore` stream, including the version
of the specification that it complies with.

All fields in this header must be in _big-endian_ byte order, regardless of
the setting of the endianness bit.


```
    0       1       2       3       4       5       6       7    octet
+-------+-------+-------+-------+-------+-------+-------+-------+
| ident                                                         |
+-------------------------------+-------------------------------+
| version                       | flags                         |
+-------------------------------+-------------------------------+
```


| Field     | Description                                       |
|-----------|---------------------------------------------------|
| `ident`   | 0x78656e73746f7265 ('xenstore' in ASCII)          |
|           |                                                   |
| `version` | 0x00000001 (the version of the specification)     |
|           |                                                   |
| `flags`   | 0 (LSB): Endianness: 0 = little, 1 = big          |
|           |                                                   |
|           | 1-31: Reserved (must be zero)                     |

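As a rough illustration of the header layout, the following Python sketch packs
and validates the 16-octet header. The names `pack_header` and `parse_header`
are hypothetical and not part of any xenstored implementation; rejecting
non-zero reserved flag bits on read is an assumption, since the text only says
they must be zero.

```python
import struct

IDENT = b"xenstore"            # 0x78656e73746f7265
VERSION = 0x00000001
FLAG_BIG_ENDIAN = 0x00000001   # bit 0 of `flags`

# ident (8 octets), version (4 octets), flags (4 octets).
# The header is always big-endian, whatever the endianness bit says.
HEADER = struct.Struct(">8sII")


def pack_header(big_endian: bool) -> bytes:
    """Build a header announcing the byte order used after the header."""
    flags = FLAG_BIG_ENDIAN if big_endian else 0
    return HEADER.pack(IDENT, VERSION, flags)


def parse_header(buf: bytes) -> bool:
    """Validate a header; return True if the endianness bit is set."""
    ident, version, flags = HEADER.unpack_from(buf, 0)
    if ident != IDENT:
        raise ValueError("not a xenstore stream")
    if version != VERSION:
        raise ValueError(f"unsupported version {version}")
    if flags & ~FLAG_BIG_ENDIAN:
        # Assumption: a reader treats set reserved bits as an error.
        raise ValueError("reserved flag bits set")
    return bool(flags & FLAG_BIG_ENDIAN)
```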
### Records

Records immediately follow the header and have the following format:


```
    0       1       2       3       4       5       6       7    octet
+-------+-------+-------+-------+-------+-------+-------+-------+
| type                          | len                           |
+-------------------------------+-------------------------------+
| body
...
|       | padding (0 to 7 octets)                               |
+-------+-------------------------------------------------------+
```

NOTE: padding octets here and in all subsequent format specifications must be
      written as zero and should be ignored when the stream is read.


| Field  | Description                                          |
|--------|------------------------------------------------------|
| `type` | 0x00000000: END                                      |
|        | 0x00000001: GLOBAL_DATA                              |
|        | 0x00000002: CONNECTION_DATA                          |
|        | 0x00000003: WATCH_DATA                               |
|        | 0x00000004: TRANSACTION_DATA                         |
|        | 0x00000005: NODE_DATA                                |
|        | 0x00000006 - 0xFFFFFFFF: reserved for future use     |
|        |                                                      |
| `len`  | The length (in octets) of `body`                     |
|        |                                                      |
| `body` | The type-specific record data                        |

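The record framing can be sketched in Python as below. This is only an
illustration under two assumptions: that the endianness bit in the header
`flags` governs the `type` and `len` fields (the text fixes the byte order of
the header only), and that the padding rounds each body up to a multiple of 8
octets, as the diagram suggests. `pack_record` and `iter_records` are
hypothetical names.

```python
import struct

REC_END = 0x00000000


def pack_record(rec_type: int, body: bytes = b"", big_endian: bool = False) -> bytes:
    """Frame a body as type, len, body, then zero padding to 8 octets."""
    order = ">" if big_endian else "<"
    padding = b"\0" * (-len(body) % 8)
    # `len` counts the body only, not the padding.
    return struct.pack(order + "II", rec_type, len(body)) + body + padding


def iter_records(buf: bytes, big_endian: bool = False):
    """Yield (type, body) pairs up to and including the END record."""
    order = ">" if big_endian else "<"
    offset = 0
    while True:
        # Raises struct.error if the stream ends before an END record.
        rec_type, length = struct.unpack_from(order + "II", buf, offset)
        offset += 8
        body = buf[offset:offset + length]
        if len(body) != length:
            raise ValueError("truncated record body")
        yield rec_type, body
        if rec_type == REC_END:
            return
        # Skip the body and its padding; padding content is ignored.
        offset += length + (-length % 8)
```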
Some records will depend on other records in the migration stream. Records
upon which other records depend must always appear earlier in the stream.

The various formats of the type-specific data are described in the following
sections:

\pagebreak

### END

The end record marks the end of the image, and is the final record
in the stream.

```
    0       1       2       3       4       5       6       7    octet
+-------+-------+-------+-------+-------+-------+-------+-------+
```


The end record contains no fields; its body length is 0.

\pagebreak

### GLOBAL_DATA

This record is only relevant for live update. It contains details of global
xenstored state that needs to be restored.

```
    0       1       2       3    octet
+-------+-------+-------+-------+
| rw-socket-fd                  |
+-------------------------------+
| ro-socket-fd                  |
+-------------------------------+
```


| Field          | Description                                  |
|----------------|----------------------------------------------|
| `rw-socket-fd` | The file descriptor of the socket accepting  |
|                | read-write connections                       |
|                |                                              |
| `ro-socket-fd` | The file descriptor of the socket accepting  |
|                | read-only connections                        |

xenstored will resume in the original process context. Hence `rw-socket-fd` and
`ro-socket-fd` simply specify the file descriptors of the sockets. Sockets
are not always used, however, and so -1 will be used to denote an unused
socket.

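A minimal Python sketch of the `GLOBAL_DATA` body follows. It assumes the two
fields are stored as signed 32-bit integers, so that -1 is representable, and
that the byte order follows the header's endianness bit. The function names
are hypothetical, and `None` is used here to model an unused socket.

```python
import struct


def pack_global_data(rw_fd, ro_fd, big_endian: bool = False) -> bytes:
    """Encode the two listening-socket descriptors; None means unused."""
    order = ">" if big_endian else "<"
    return struct.pack(
        order + "ii",
        -1 if rw_fd is None else rw_fd,
        -1 if ro_fd is None else ro_fd,
    )


def parse_global_data(body: bytes, big_endian: bool = False):
    """Decode a GLOBAL_DATA body into (rw_fd, ro_fd), mapping -1 to None."""
    order = ">" if big_endian else "<"
    rw_fd, ro_fd = struct.unpack_from(order + "ii", body, 0)
    return (
        None if rw_fd == -1 else rw_fd,
        None if ro_fd == -1 else ro_fd,
    )
```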

\pagebreak

### CONNECTION_DATA

For live update the image format will contain a `CONNECTION_DATA` record for
each connection to xenstore. For migration it will only contain a record for
the domain being migrated.


```
    0       1       2       3       4       5       6       7    octet
+-------+-------+-------+-------+-------+-------+-------+-------+
| conn-id                       | conn-type     | flags         |
+-------------------------------+---------------+---------------+
| conn-spec
...
+---------------+---------------+-------------------------------+
| in-data-len   | out-resp-len  | out-data-len                  |
+---------------+---------------+-------------------------------+
| data
...
```


| Field          | Description                                  |
|----------------|----------------------------------------------|
| `conn-id`      | A non-zero number used to identify this      |
|                | connection in subsequent connection-specific |
|                | records                                      |
|                |                                              |
| `conn-type`    | 0x0000: shared ring                          |
|                | 0x0001: socket                               |
|                | 0x0002 - 0xFFFF: reserved for future use     |
|                |                                              |
| `flags`        | A bit-wise OR of:                            |
|                | 0x0001: read-only                            |
|                |                                              |
| `conn-spec`    | See below                                    |
|                |                                              |
| `in-data-len`  | The length (in octets) of any data read      |
|                | from the connection not yet processed        |
|                |                                              |
| `out-resp-len` | The length (in octets) of a partial response |
|                | not yet written to the connection            |
|                |                                              |
| `out-data-len` | The length (in octets) of any pending data   |
|                | not yet written to the connection, including |
|                | a partial response (see `out-resp-len`)      |
|                |                                              |
| `data`         | Pending data: first in-data-len octets of    |
|                | read data, then out-data-len octets of       |
|                | written data (either or both may be empty)   |

In the case of live update, the connection record for the connection via which
the live update command was issued will contain the response to that command
in the pending, not yet written data.

\pagebreak

The format of `conn-spec` is dependent upon `conn-type`.

For `shared ring` connections it is as follows:


```
    0       1       2       3       4       5       6       7    octet
+-------+-------+-------+-------+-------+-------+-------+-------+
| domid         | tdomid        | evtchn                        |
+---------------+---------------+-------------------------------+
```


| Field     | Description                                       |
|-----------|---------------------------------------------------|
| `domid`   | The domain-id that owns the shared page           |
|           |                                                   |
| `tdomid`  | The domain-id that `domid` acts on behalf of if   |
|           | it has been subject to a `SET_TARGET`             |
|           | operation [2] or DOMID_INVALID [3] otherwise      |
|           |                                                   |
| `evtchn`  | The port number of the interdomain channel used   |
|           | by `domid` to communicate with xenstored          |

The GFN of the shared page is not recorded, since the ABI guarantees that
entry 1 in `domid`'s grant table will always contain it.

For `socket` connections it is as follows:


```
    0       1       2       3       4       5       6       7    octet
+-------+-------+-------+-------+-------+-------+-------+-------+
| socket-fd                     | pad                           |
+-------------------------------+-------------------------------+
```


| Field       | Description                                     |
|-------------|-------------------------------------------------|
| `socket-fd` | The file descriptor of the connected socket     |

This type of connection is only relevant for live update, where xenstored
resumes in the original process context. Hence `socket-fd` simply specifies
the file descriptor of the socket connection.

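The `CONNECTION_DATA` body and its two `conn-spec` variants might be assembled
as in the following Python sketch. It reads the field widths off the diagrams
above (both `conn-spec` forms occupy 8 octets) and assumes the byte order
follows the header's endianness bit. The helper names are hypothetical, and
the value 0x7FF4 for `DOMID_INVALID` is taken from Xen's public `xen.h`.

```python
import struct

CONN_SHARED_RING = 0x0000
CONN_SOCKET = 0x0001
CONN_FLAG_READ_ONLY = 0x0001
DOMID_INVALID = 0x7FF4  # from xen/include/public/xen.h


def pack_ring_spec(domid, evtchn, tdomid=DOMID_INVALID, big_endian=False):
    """conn-spec for a shared ring: domid, tdomid, evtchn."""
    order = ">" if big_endian else "<"
    return struct.pack(order + "HHI", domid, tdomid, evtchn)


def pack_socket_spec(socket_fd, big_endian=False):
    """conn-spec for a socket: socket-fd followed by 4 zero pad octets."""
    order = ">" if big_endian else "<"
    return struct.pack(order + "i4x", socket_fd)


def pack_connection_data(conn_id, conn_type, conn_spec, in_data=b"",
                         out_data=b"", out_resp_len=0, flags=0,
                         big_endian=False):
    """Build a CONNECTION_DATA body: fixed fields, conn-spec, lengths, data."""
    if conn_id == 0:
        raise ValueError("conn-id must be non-zero")
    if len(conn_spec) != 8:
        raise ValueError("conn-spec must be 8 octets")
    if out_resp_len > len(out_data):
        # out-data-len includes the partial response counted by out-resp-len.
        raise ValueError("out-resp-len exceeds out-data-len")
    order = ">" if big_endian else "<"
    head = struct.pack(order + "IHH", conn_id, conn_type, flags)
    lens = struct.pack(order + "HHI", len(in_data), out_resp_len, len(out_data))
    # `data` holds the unprocessed input first, then the unwritten output.
    return head + conn_spec + lens + in_data + out_data
```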
\pagebreak

### WATCH_DATA

The image format will contain a `WATCH_DATA` record for each watch registered
by a connection for which there is a `CONNECTION_DATA` record previously
present.


```
    0       1       2       3    octet
+-------+-------+-------+-------+
| conn-id                       |
+---------------+---------------+
| wpath-len     | token-len     |
+---------------+---------------+
| wpath
...
| token
...
```


| Field       | Description                                     |
|-------------|-------------------------------------------------|
| `conn-id`   | The connection that issued the `WATCH`          |
|             | operation [2]                                   |
|             |                                                 |
| `wpath-len` | The length (in octets) of `wpath` including the |
|             | NUL terminator                                  |
|             |                                                 |
| `token-len` | The length (in octets) of `token` including the |
|             | NUL terminator                                  |
|             |                                                 |
| `wpath`     | The watch path, as specified in the `WATCH`     |
|             | operation                                       |
|             |                                                 |
| `token`     | The watch identifier token, as specified in the |
|             | `WATCH` operation                               |

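The handling of the two NUL-terminated strings in `WATCH_DATA` can be sketched
in Python as follows. The function names are hypothetical, and the byte order
is assumed to follow the header's endianness bit. Both lengths count the NUL
terminator, so an empty token still has a `token-len` of 1.

```python
import struct


def pack_watch_data(conn_id: int, wpath: bytes, token: bytes,
                    big_endian: bool = False) -> bytes:
    """Build a WATCH_DATA body from an unterminated path and token."""
    order = ">" if big_endian else "<"
    wpath_z = wpath + b"\0"
    token_z = token + b"\0"
    head = struct.pack(order + "IHH", conn_id, len(wpath_z), len(token_z))
    return head + wpath_z + token_z


def parse_watch_data(body: bytes, big_endian: bool = False):
    """Return (conn_id, wpath, token) with the NUL terminators removed."""
    order = ">" if big_endian else "<"
    conn_id, wpath_len, token_len = struct.unpack_from(order + "IHH", body, 0)
    wpath = body[8:8 + wpath_len]
    token = body[8 + wpath_len:8 + wpath_len + token_len]
    if len(wpath) != wpath_len or len(token) != token_len:
        raise ValueError("truncated WATCH_DATA body")
    if not wpath.endswith(b"\0") or not token.endswith(b"\0"):
        raise ValueError("missing NUL terminator")
    return conn_id, wpath[:-1], token[:-1]
```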
\pagebreak

### TRANSACTION_DATA

The image format will contain a `TRANSACTION_DATA` record for each transaction
that is pending on a connection for which there is a `CONNECTION_DATA` record
previously present.


```
    0       1       2       3    octet
+-------+-------+-------+-------+
| conn-id                       |
+-------------------------------+
| tx-id                         |
+-------------------------------+
```


| Field          | Description                                  |
|----------------|----------------------------------------------|
| `conn-id`      | The connection that issued the               |
|                | `TRANSACTION_START` operation [2]            |
|                |                                              |
| `tx-id`        | The transaction id passed back to the domain |
|                | by the `TRANSACTION_START` operation         |

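The following Python sketch encodes a `TRANSACTION_DATA` body and, on reading,
enforces the rule that the referenced `CONNECTION_DATA` record must already
have been seen. The set `known_conn_ids` is a hypothetical piece of reader
state holding the `conn-id` values read so far; the function names are
likewise illustrative, and the byte order is assumed to follow the header's
endianness bit.

```python
import struct


def pack_transaction_data(conn_id: int, tx_id: int,
                          big_endian: bool = False) -> bytes:
    """Build a TRANSACTION_DATA body: conn-id then tx-id."""
    order = ">" if big_endian else "<"
    return struct.pack(order + "II", conn_id, tx_id)


def parse_transaction_data(body: bytes, known_conn_ids,
                           big_endian: bool = False):
    """Return (conn_id, tx_id), rejecting references to unseen connections."""
    order = ">" if big_endian else "<"
    conn_id, tx_id = struct.unpack_from(order + "II", body, 0)
    if conn_id not in known_conn_ids:
        # The CONNECTION_DATA record must appear earlier in the stream.
        raise ValueError(f"TRANSACTION_DATA refers to unknown conn-id {conn_id}")
    return conn_id, tx_id
```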
\pagebreak

### NODE_DATA

For live update the image format will contain a `NODE_DATA` record for each
node in xenstore. For migration it will only contain a record for the nodes
relating to the domain being migrated. A `NODE_DATA` record may relate to
a _committed_ node (globally visible in xenstored) or a _pending_ node (created
or modified by a transaction for which there is also a `TRANSACTION_DATA`
record previously present).


```
    0       1       2       3    octet
+-------+-------+-------+-------+
| conn-id                       |
+-------------------------------+
| tx-id                         |
+---------------+---------------+
| path-len      | value-len     |
+---------------+---------------+
| access        | perm-count    |
+---------------+---------------+
| perm1                         |
+-------------------------------+
...
+-------------------------------+
| permN                         |
+---------------+---------------+
| path
...
| value
...
```


| Field        | Description                                    |
|--------------|------------------------------------------------|
| `conn-id`    | If this value is non-zero then this record     |
|              | relates to a pending transaction               |
|              |                                                |
| `tx-id`      | This value should be ignored if `conn-id` is   |
|              | zero. Otherwise it specifies the id of the     |
|              | pending transaction                            |
|              |                                                |
| `path-len`   | The length (in octets) of `path` including the |
|              | NUL terminator                                 |
|              |                                                |
| `value-len`  | The length (in octets) of `value` (which will  |
|              | be zero for a deleted node)                    |
|              |                                                |
| `access`     | This value should be ignored if this record    |
|              | does not relate to a pending transaction,      |
|              | otherwise it specifies the accesses made to    |
|              | the node and hence is a bitwise OR of:         |
|              |                                                |
|              | 0x0001: read                                   |
|              | 0x0002: written                                |
|              |                                                |
|              | The value will be zero for a deleted node      |
|              |                                                |
| `perm-count` | The number (N) of node permission specifiers   |
|              | (which will be 0 for a node deleted in a       |
|              | pending transaction)                           |
|              |                                                |
| `perm1..N`   | A list of zero or more node permission         |
|              | specifiers (see below)                         |
|              |                                                |
| `path`       | The absolute path of the node                  |
|              |                                                |
| `value`      | The node value (which may be empty or contain  |
|              | NUL octets)                                    |


A node permission specifier has the following format:


```
    0       1       2       3    octet
+-------+-------+-------+-------+
| perm  | pad   | domid         |
+-------+-------+---------------+
```

| Field   | Description                                         |
|---------|-----------------------------------------------------|
| `perm`  | One of the ASCII values `w`, `r`, `b` or `n` as     |
|         | specified for the `SET_PERMS` operation [2]         |
|         |                                                     |
| `domid` | The domain-id to which the permission relates       |

Note that `perm1` defines the domain owning the node. See [4] for more
explanation of node permissions.

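A `NODE_DATA` body, including its permission specifiers, could be assembled as
in this Python sketch. The names `pack_perm` and `pack_node_data` are
hypothetical, permissions are passed as `(perm, domid)` pairs with the owner
first, and the byte order is assumed to follow the header's endianness bit.
Only `path` carries a NUL terminator; `value` is stored as raw octets.

```python
import struct

ACCESS_READ = 0x0001
ACCESS_WRITTEN = 0x0002


def pack_perm(perm: str, domid: int, big_endian: bool = False) -> bytes:
    """One permission specifier: perm character, zero pad octet, domid."""
    if perm not in ("w", "r", "b", "n"):
        raise ValueError(f"invalid permission {perm!r}")
    order = ">" if big_endian else "<"
    return struct.pack(order + "cxH", perm.encode("ascii"), domid)


def pack_node_data(path: bytes, value: bytes, perms, conn_id: int = 0,
                   tx_id: int = 0, access: int = 0,
                   big_endian: bool = False) -> bytes:
    """Build a NODE_DATA body; conn_id 0 denotes a committed node."""
    order = ">" if big_endian else "<"
    path_z = path + b"\0"  # path-len includes the NUL terminator
    head = struct.pack(order + "IIHHHH", conn_id, tx_id,
                       len(path_z), len(value), access, len(perms))
    perm_bytes = b"".join(pack_perm(p, d, big_endian) for p, d in perms)
    return head + perm_bytes + path_z + value
```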
* * *

[1] See https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/designs/non-cooperative-migration.md

[2] See https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/misc/xenstore.txt

[3] See https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/include/public/xen.h;hb=HEAD#l612

[4] See https://wiki.xen.org/wiki/XenBus
