# Xenstore Migration

## Background

The design for *Non-Cooperative Migration of Guests*[1] explains that extra
save records are required in the migration stream to allow a guest running PV
drivers to be migrated without its co-operation. Moreover, the save records must
include details of registered xenstore watches as well as content; information
that cannot currently be recovered from `xenstored`, and hence some extension
to the xenstored implementations will also be required.

As a similar set of data is needed for transferring xenstore data from one
instance to another when live updating xenstored, this document proposes an
image format for a 'migration stream' suitable for both purposes.

## Proposal

The image format consists of a _header_ followed by 1 or more _records_. Each
record consists of a type and length field, followed by any data mandated by
the record type. At minimum there will be a single record of type `END`
(defined below).

### Header

The header identifies the stream as a `xenstore` stream, including the version
of the specification that it complies with.

All fields in this header must be in _big-endian_ byte order, regardless of
the setting of the endianness bit.


```
    0       1       2       3       4       5       6       7    octet
+-------+-------+-------+-------+-------+-------+-------+-------+
| ident                                                         |
+-------------------------------+-------------------------------+
| version                       | flags                         |
+-------------------------------+-------------------------------+
```


| Field     | Description                                       |
|-----------|---------------------------------------------------|
| `ident`   | 0x78656e73746f7265 ('xenstore' in ASCII)          |
|           |                                                   |
| `version` | 0x00000001 (the version of the specification)     |
|           |                                                   |
| `flags`   | 0 (LSB): Endianness: 0 = little, 1 = big          |
|           |                                                   |
|           | 1-31: Reserved (must be zero)                     |

### Records

Records immediately follow the header and have the following format:


```
    0       1       2       3       4       5       6       7    octet
+-------+-------+-------+-------+-------+-------+-------+-------+
| type                          | len                           |
+-------------------------------+-------------------------------+
| body
...
|       | padding (0 to 7 octets)                               |
+-------+-------------------------------------------------------+
```

NOTE: padding octets here and in all subsequent format specifications must be
      written as zero and should be ignored when the stream is read.


| Field  | Description                                          |
|--------|------------------------------------------------------|
| `type` | 0x00000000: END                                      |
|        | 0x00000001: GLOBAL_DATA                              |
|        | 0x00000002: CONNECTION_DATA                          |
|        | 0x00000003: WATCH_DATA                               |
|        | 0x00000004: TRANSACTION_DATA                         |
|        | 0x00000005: NODE_DATA                                |
|        | 0x00000006 - 0xFFFFFFFF: reserved for future use     |
|        |                                                      |
| `len`  | The length (in octets) of `body`                     |
|        |                                                      |
| `body` | The type-specific record data                        |

Some records will depend on other records in the migration stream. Records
upon which other records depend must always appear earlier in the stream.
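
The header and record framing described above can be sketched in Python as
follows. This is an illustrative sketch, not a reference implementation: the
function and constant names are hypothetical, and it assumes that the
endianness flag selects the byte order of the `type` and `len` fields and that
padding rounds each body up to a multiple of 8 octets, as the "0 to 7 octets"
padding in the diagram suggests.

```python
import struct

IDENT = b"xenstore"        # 0x78656e73746f7265 in ASCII
VERSION = 0x00000001
FLAG_BIG_ENDIAN = 0x1      # bit 0 (LSB) of `flags`
REC_END = 0x00000000


def pack_header(big_endian=False):
    """Build the 16-octet header; its fields are always big-endian."""
    flags = FLAG_BIG_ENDIAN if big_endian else 0
    return struct.pack(">8sII", IDENT, VERSION, flags)


def pack_record(rec_type, body=b"", big_endian=False):
    """Build one record: type, len, body, then zero padding.

    Assumption: padding aligns the body to an 8-octet boundary.
    """
    order = ">" if big_endian else "<"
    padding = b"\0" * (-len(body) % 8)
    return struct.pack(order + "II", rec_type, len(body)) + body + padding


def parse_stream(data):
    """Return a list of (type, body) tuples, up to and including END."""
    ident, version, flags = struct.unpack_from(">8sII", data, 0)
    if ident != IDENT:
        raise ValueError("not a xenstore stream")
    if version != VERSION:
        raise ValueError("unsupported version")
    if flags & ~FLAG_BIG_ENDIAN:
        raise ValueError("reserved flag bits must be zero")
    # Assumption: the endianness bit governs the record fields.
    order = ">" if flags & FLAG_BIG_ENDIAN else "<"

    offset = 16
    records = []
    while True:
        rec_type, length = struct.unpack_from(order + "II", data, offset)
        offset += 8
        body = data[offset:offset + length]
        if len(body) != length:
            raise ValueError("truncated record body")
        records.append((rec_type, body))
        # Skip the body and its padding; padding content is ignored.
        offset += length + (-length % 8)
        if rec_type == REC_END:
            return records
```

The smallest valid image under these assumptions is
`pack_header() + pack_record(REC_END)`, which is 24 octets long. A reader
built this way stops at the first `END` record and never inspects padding
octets, in line with the note that they should be ignored when the stream is
read.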

The various formats of the type-specific data are described in the following
sections:

\pagebreak

### END

The end record marks the end of the image, and is the final record
in the stream.

```
    0       1       2       3       4       5       6       7    octet
+-------+-------+-------+-------+-------+-------+-------+-------+
```


The end record contains no fields; its body length is 0.

\pagebreak

### GLOBAL_DATA

This record is only relevant for live update. It contains details of global
xenstored state that needs to be restored.

```
    0       1       2       3    octet
+-------+-------+-------+-------+
| rw-socket-fd                  |
+-------------------------------+
| ro-socket-fd                  |
+-------------------------------+
```


| Field          | Description                                  |
|----------------|----------------------------------------------|
| `rw-socket-fd` | The file descriptor of the socket accepting  |
|                | read-write connections                       |
|                |                                              |
| `ro-socket-fd` | The file descriptor of the socket accepting  |
|                | read-only connections                        |

xenstored will resume in the original process context. Hence `rw-socket-fd` and
`ro-socket-fd` simply specify the file descriptors of the sockets. Sockets
are not always used, however, and so -1 will be used to denote an unused
socket.


\pagebreak

### CONNECTION_DATA

For live update the image format will contain a `CONNECTION_DATA` record for
each connection to xenstore. For migration it will only contain a record for
the domain being migrated.


```
    0       1       2       3       4       5       6       7    octet
+-------+-------+-------+-------+-------+-------+-------+-------+
| conn-id                       | conn-type     | flags         |
+-------------------------------+---------------+---------------+
| conn-spec
...
+---------------+---------------+-------------------------------+
| in-data-len   | out-resp-len  | out-data-len                  |
+---------------+---------------+-------------------------------+
| data
...
```


| Field          | Description                                  |
|----------------|----------------------------------------------|
| `conn-id`      | A non-zero number used to identify this      |
|                | connection in subsequent connection-specific |
|                | records                                      |
|                |                                              |
| `conn-type`    | 0x0000: shared ring                          |
|                | 0x0001: socket                               |
|                | 0x0002 - 0xFFFF: reserved for future use     |
|                |                                              |
| `flags`        | A bit-wise OR of:                            |
|                | 0x0001: read-only                            |
|                |                                              |
| `conn-spec`    | See below                                    |
|                |                                              |
| `in-data-len`  | The length (in octets) of any data read      |
|                | from the connection not yet processed        |
|                |                                              |
| `out-resp-len` | The length (in octets) of a partial response |
|                | not yet written to the connection            |
|                |                                              |
| `out-data-len` | The length (in octets) of any pending data   |
|                | not yet written to the connection, including |
|                | a partial response (see `out-resp-len`)      |
|                |                                              |
| `data`         | Pending data: first in-data-len octets of    |
|                | read data, then out-data-len octets of       |
|                | written data (either or both may be empty)   |

For live update, the record for the connection on which the live update
command was issued will contain the response to that command in its pending,
not-yet-written data.

\pagebreak

The format of `conn-spec` is dependent upon `conn-type`.

For `shared ring` connections it is as follows:


```
    0       1       2       3       4       5       6       7    octet
+---------------+---------------+---------------+---------------+
| domid         | tdomid        | evtchn                        |
+-------------------------------+-------------------------------+
```


| Field     | Description                                       |
|-----------|---------------------------------------------------|
| `domid`   | The domain-id that owns the shared page           |
|           |                                                   |
| `tdomid`  | The domain-id that `domid` acts on behalf of if   |
|           | it has been subject to a `SET_TARGET`             |
|           | operation [2] or DOMID_INVALID [3] otherwise      |
|           |                                                   |
| `evtchn`  | The port number of the interdomain channel used   |
|           | by `domid` to communicate with xenstored          |
|           |                                                   |

The ABI guarantees that entry 1 in `domid`'s grant table will always
contain the GFN of the shared page, so the GFN is not recorded here.

For `socket` connections it is as follows:


```
+---------------+---------------+---------------+---------------+
| socket-fd                     | pad                           |
+-------------------------------+-------------------------------+
```


| Field       | Description                                     |
|-------------|-------------------------------------------------|
| `socket-fd` | The file descriptor of the connected socket     |

This type of connection is only relevant for live update, where xenstored
resumes in the original process context. Hence `socket-fd` simply specifies
the file descriptor of the socket connection.

\pagebreak

### WATCH_DATA

The image format will contain a `WATCH_DATA` record for each watch registered
by a connection for which a `CONNECTION_DATA` record is previously present.


```
    0       1       2       3    octet
+-------+-------+-------+-------+
| conn-id                       |
+---------------+---------------+
| wpath-len     | token-len     |
+---------------+---------------+
| wpath
...
| token
...
```


| Field       | Description                                     |
|-------------|-------------------------------------------------|
| `conn-id`   | The connection that issued the `WATCH`          |
|             | operation [2]                                   |
|             |                                                 |
| `wpath-len` | The length (in octets) of `wpath` including the |
|             | NUL terminator                                  |
|             |                                                 |
| `token-len` | The length (in octets) of `token` including the |
|             | NUL terminator                                  |
|             |                                                 |
| `wpath`     | The watch path, as specified in the `WATCH`     |
|             | operation                                       |
|             |                                                 |
| `token`     | The watch identifier token, as specified in the |
|             | `WATCH` operation                               |

\pagebreak

### TRANSACTION_DATA

The image format will contain a `TRANSACTION_DATA` record for each transaction
that is pending on a connection for which a `CONNECTION_DATA` record is
previously present.


```
    0       1       2       3    octet
+-------+-------+-------+-------+
| conn-id                       |
+-------------------------------+
| tx-id                         |
+-------------------------------+
```


| Field          | Description                                  |
|----------------|----------------------------------------------|
| `conn-id`      | The connection that issued the               |
|                | `TRANSACTION_START` operation [2]            |
|                |                                              |
| `tx-id`        | The transaction id passed back to the domain |
|                | by the `TRANSACTION_START` operation         |

\pagebreak

### NODE_DATA

For live update the image format will contain a `NODE_DATA` record for each
node in xenstore. For migration it will only contain a record for the nodes
relating to the domain being migrated. The `NODE_DATA` may be related to
a _committed_ node (globally visible in xenstored) or a _pending_ node (created
or modified by a transaction for which there is also a `TRANSACTION_DATA`
record previously present).


```
    0       1       2       3    octet
+-------+-------+-------+-------+
| conn-id                       |
+-------------------------------+
| tx-id                         |
+---------------+---------------+
| path-len      | value-len     |
+---------------+---------------+
| access        | perm-count    |
+---------------+---------------+
| perm1                         |
+-------------------------------+
...
+-------------------------------+
| permN                         |
+---------------+---------------+
| path
...
| value
...
```


| Field        | Description                                    |
|--------------|------------------------------------------------|
| `conn-id`    | If this value is non-zero then this record     |
|              | relates to a pending transaction               |
|              |                                                |
| `tx-id`      | This value should be ignored if `conn-id` is   |
|              | zero. Otherwise it specifies the id of the     |
|              | pending transaction                            |
|              |                                                |
| `path-len`   | The length (in octets) of `path` including the |
|              | NUL terminator                                 |
|              |                                                |
| `value-len`  | The length (in octets) of `value` (which will  |
|              | be zero for a deleted node)                    |
|              |                                                |
| `access`     | This value should be ignored if this record    |
|              | does not relate to a pending transaction,      |
|              | otherwise it specifies the accesses made to    |
|              | the node and hence is a bitwise OR of:         |
|              |                                                |
|              | 0x0001: read                                   |
|              | 0x0002: written                                |
|              |                                                |
|              | The value will be zero for a deleted node      |
|              |                                                |
| `perm-count` | The number (N) of node permission specifiers   |
|              | (which will be 0 for a node deleted in a       |
|              | pending transaction)                           |
|              |                                                |
| `perm1..N`   | A list of zero or more node permission         |
|              | specifiers (see below)                         |
|              |                                                |
| `path`       | The absolute path of the node                  |
|              |                                                |
| `value`      | The node value (which may be empty or contain  |
|              | NUL octets)                                    |


A node permission specifier has the following format:


```
    0       1       2       3    octet
+-------+-------+-------+-------+
| perm  | pad   | domid         |
+-------+-------+---------------+
```

| Field   | Description                                         |
|---------|-----------------------------------------------------|
| `perm`  | One of the ASCII values `w`, `r`, `b` or `n` as     |
|         | specified for the `SET_PERMS` operation [2]         |
|         |                                                     |
| `domid` | The domain-id to which the permission relates       |

Note that perm1 defines the domain owning the node. See [4] for more
explanation of node permissions.

* * *

[1] See https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/designs/non-cooperative-migration.md

[2] See https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/misc/xenstore.txt

[3] See https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/include/public/xen.h;hb=HEAD#l612

[4] See https://wiki.xen.org/wiki/XenBus