<!--

Although you may be viewing an alternate representation, this document
is sourced in Markdown, a light-duty markup scheme, and is optimized for
the [kramdown](http://kramdown.rubyforge.org/) transformer.

See the accompanying README. External link targets are referenced at the
end of this file.

-->

Specification for WebP Lossless Bitstream
=========================================

_Jyrki Alakuijala, Ph.D., Google, Inc., 2012-06-19_

Paragraphs marked as \[AMENDED\] were amended on 2014-09-16.

Abstract
--------

WebP lossless is an image format for lossless compression of ARGB
images. The lossless format stores and restores the pixel values
exactly, including the color values for zero-alpha pixels. The
format uses subresolution images, recursively embedded into the format
itself, for storing statistical data about the images, such as the used
entropy codes, spatial predictors, color space conversion, and color
table. LZ77, Huffman coding, and a color cache are used for compression
of the bulk data. Decoding speeds faster than PNG have been
demonstrated, as well as 25% denser compression than can be achieved
using today's PNG format.


* TOC placeholder
{:toc}


Nomenclature
------------

ARGB
: A pixel value consisting of alpha, red, green, and blue values.

ARGB image
: A two-dimensional array containing ARGB pixels.

color cache
: A small hash-addressed array to store recently used colors, to be able
  to recall them with shorter codes.

color indexing image
: A one-dimensional image of colors that can be indexed using a small
  integer (up to 256 within WebP lossless).

color transform image
: A two-dimensional subresolution image containing data about
  correlations of color components.

distance mapping
: Changes LZ77 distances to have the smallest values for pixels in 2D
  proximity.

entropy image
: A two-dimensional subresolution image indicating which entropy coding
  should be used in a respective square in the image, i.e., each pixel
  is a meta Huffman code.

Huffman code
: A classic way to do entropy coding where a smaller number of bits are
  used for more frequent codes.

LZ77
: A dictionary-based sliding window compression algorithm that either
  emits symbols or describes them as sequences of past symbols.

meta Huffman code
: A small integer (up to 16 bits) that indexes an element in the meta
  Huffman table.

predictor image
: A two-dimensional subresolution image indicating which spatial
  predictor is used for a particular square in the image.

prefix coding
: A way to entropy code larger integers that codes a few bits of the
  integer using an entropy code and codifies the remaining bits raw.
  This allows for the descriptions of the entropy codes to remain
  relatively small even when the range of symbols is large.

scan-line order
: A processing order of pixels, left-to-right, top-to-bottom, starting
  from the left-hand-top pixel and proceeding to the right. Once a row is
  completed, continue from the left-hand column of the next row.


1 Introduction
--------------

This document describes the compressed data representation of a WebP
lossless image. It is intended as a detailed reference for the
implementation of WebP lossless encoders and decoders.

In this document, we extensively use C programming language syntax to
describe the bitstream, and assume the existence of a function for
reading bits, `ReadBits(n)`. The bytes are read in the natural order of
the stream containing them, and the bits of each byte are read in
least-significant-bit-first order. When multiple bits are read at the
same time, the integer is constructed from the original data in the
original order.
The most significant bits of the returned integer are
also the most significant bits of the original data. Thus, the statement

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
b = ReadBits(2);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

is equivalent to the two statements below:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
b = ReadBits(1);
b |= ReadBits(1) << 1;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We assume that each color component (e.g., alpha, red, blue, and green) is
represented using an 8-bit byte. We define the corresponding type as
uint8. A whole ARGB pixel is represented by a type called uint32, an
unsigned integer consisting of 32 bits. In the code showing the behavior
of the transformations, the alpha value is codified in bits 31..24, red in
bits 23..16, green in bits 15..8, and blue in bits 7..0, but
implementations of the format are free to use another representation
internally.

Broadly, a WebP lossless image contains header data, transform
information, and actual image data. Headers contain the width and height
of the image. A WebP lossless image can go through four different types of
transformation before being entropy encoded. The transform information
in the bitstream contains the data required to apply the respective
inverse transforms.


2 RIFF Header
-------------

The beginning of the header has the RIFF container. This consists of the
following 21 bytes:

 1. String "RIFF".
 2. A little-endian, 32-bit value of the block length, the whole size
    of the block controlled by the RIFF header. Normally this equals
    the payload size (file size minus 8 bytes: 4 bytes for the 'RIFF'
    identifier and 4 bytes for storing the value itself).
 3. String "WEBP" (RIFF container name).
 4. String "VP8L" (chunk tag for lossless-encoded image data).
 5. A little-endian, 32-bit value of the number of bytes in the
    lossless stream.
 6. One-byte signature 0x2f.
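The container layout above can be sketched as a small validation routine. This is an illustrative sketch, not part of the format: the function name `CheckVp8lHeader` and the buffer handling are assumptions, and only the fixed 21-byte prefix is checked.

```c
#include <stdint.h>
#include <string.h>

// Reads a little-endian 32-bit value from a byte buffer.
static uint32_t ReadLE32(const uint8_t *p) {
  return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
         ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

// Returns 1 if the first 21 bytes of 'data' match the RIFF + VP8L layout
// described above, 0 otherwise. 'data_size' is the whole file size.
int CheckVp8lHeader(const uint8_t *data, size_t data_size) {
  if (data_size < 21) return 0;
  if (memcmp(data, "RIFF", 4) != 0) return 0;
  // The RIFF block length covers everything after the first 8 bytes.
  if (ReadLE32(data + 4) != data_size - 8) return 0;
  if (memcmp(data + 8, "WEBP", 4) != 0) return 0;
  if (memcmp(data + 12, "VP8L", 4) != 0) return 0;
  if (data[20] != 0x2f) return 0;  // one-byte signature
  return 1;
}
```

A real decoder would additionally cross-check the VP8L chunk size at bytes 16..19 against the remaining payload.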

The first 28 bits of the bitstream specify the width and height of the
image. Width and height are decoded as 14-bit integers as follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int image_width = ReadBits(14) + 1;
int image_height = ReadBits(14) + 1;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The 14-bit dynamics for image size limit the maximum size of a WebP
lossless image to 16384✕16384 pixels.

The alpha_is_used bit is a hint only, and should not impact decoding.
It should be set to 0 when all alpha values are 255 in the picture, and
1 otherwise.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int alpha_is_used = ReadBits(1);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The version_number is a 3-bit code that must be set to 0. Any other value
should be treated as an error. \[AMENDED\]

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int version_number = ReadBits(3);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


3 Transformations
-----------------

Transformations are reversible manipulations of the image data that can
reduce the remaining symbolic entropy by modeling spatial and color
correlations. Transformations can make the final compression more dense.

An image can go through four types of transformation. A 1 bit indicates
the presence of a transform. Each transform is allowed to be used only
once. The transformations are used only for the main-level ARGB image:
the subresolution images have no transforms, not even the 0 bit
indicating the end of transforms.

Typically, an encoder would use these transforms to reduce the Shannon
entropy in the residual image. Also, the transform data can be decided
based on entropy minimization.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
while (ReadBits(1)) {  // Transform present.
  // Decode transform type.
  enum TransformType transform_type = ReadBits(2);
  // Decode transform data.
  ...
}

// Decode actual image data (Section 4).
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If a transform is present, then the next two bits specify the transform
type. There are four types of transforms.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
enum TransformType {
  PREDICTOR_TRANSFORM      = 0,
  COLOR_TRANSFORM          = 1,
  SUBTRACT_GREEN           = 2,
  COLOR_INDEXING_TRANSFORM = 3,
};
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The transform type is followed by the transform data. Transform data
contains the information required to apply the inverse transform and
depends on the transform type. Next, we describe the transform data for
the different types.


### Predictor Transform

The predictor transform can be used to reduce entropy by exploiting the
fact that neighboring pixels are often correlated. In the predictor
transform, the current pixel value is predicted from the pixels already
decoded (in scan-line order) and only the residual value (actual -
predicted) is encoded. The _prediction mode_ determines the type of
prediction to use. We divide the image into squares, and all the pixels
in a square use the same prediction mode.

The first 3 bits of prediction data define the block width and height in
number of bits. The number of block columns, `block_xsize`, is used in
two-dimensional indexing.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int size_bits = ReadBits(3) + 2;
int block_width = (1 << size_bits);
int block_height = (1 << size_bits);
#define DIV_ROUND_UP(num, den) (((num) + (den) - 1) / (den))
int block_xsize = DIV_ROUND_UP(image_width, 1 << size_bits);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The transform data contains the prediction mode for each block of the
image. All the `block_width * block_height` pixels of a block use the same
prediction mode. The prediction modes are treated as pixels of an image
and encoded using the same techniques described in
[Chapter 4](#image-data).
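As a small worked sketch of the block grid above (the function name and sample dimensions are illustrative assumptions, not part of the format):

```c
// Rounds the integer division num / den upward.
#define DIV_ROUND_UP(num, den) (((num) + (den) - 1) / (den))

// Number of block columns for a given image width and size_bits,
// as in the snippet above. For example, a 500-pixel-wide image with
// size_bits = 4 (16x16 blocks) has DIV_ROUND_UP(500, 16) = 32 columns.
int BlockXSize(int image_width, int size_bits) {
  return DIV_ROUND_UP(image_width, 1 << size_bits);
}
```

The partial rightmost column of blocks is counted in full, which is why the division rounds up.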

For a pixel _x, y_, one can compute the respective filter block address
by:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int block_index = (y >> size_bits) * block_xsize +
                  (x >> size_bits);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There are 14 different prediction modes. In each prediction mode, the
current pixel value is predicted from one or more neighboring pixels
whose values are already known.

We choose the neighboring pixels (TL, T, TR, and L) of the current pixel
(P) as follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
O O O O O O O O O O O
O O O O O O O O O O O
O O O O TL T TR O O O O
O O O O L P X X X X X
X X X X X X X X X X X
X X X X X X X X X X X
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

where TL means top-left, T top, TR top-right, and L left pixel.
At the time of predicting a value for P, all pixels O, TL, T, TR, and L
have already been processed, and pixel P and all pixels X are unknown.

Given the above neighboring pixels, the different prediction modes are
defined as follows.

| Mode   | Predicted value of each channel of the current pixel     |
| ------ | -------------------------------------------------------- |
|  0     | 0xff000000 (represents solid black color in ARGB)        |
|  1     | L                                                        |
|  2     | T                                                        |
|  3     | TR                                                       |
|  4     | TL                                                       |
|  5     | Average2(Average2(L, TR), T)                             |
|  6     | Average2(L, TL)                                          |
|  7     | Average2(L, T)                                           |
|  8     | Average2(TL, T)                                          |
|  9     | Average2(T, TR)                                          |
| 10     | Average2(Average2(L, TL), Average2(T, TR))               |
| 11     | Select(L, T, TL)                                         |
| 12     | ClampAddSubtractFull(L, T, TL)                           |
| 13     | ClampAddSubtractHalf(Average2(L, T), TL)                 |


`Average2` is defined as follows for each ARGB component:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
uint8 Average2(uint8 a, uint8 b) {
  return (a + b) / 2;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The Select predictor is defined as follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
uint32 Select(uint32 L, uint32 T, uint32 TL) {
  // L = left pixel, T = top pixel, TL = top-left pixel.

  // ARGB component estimates for prediction.
  int pAlpha = ALPHA(L) + ALPHA(T) - ALPHA(TL);
  int pRed = RED(L) + RED(T) - RED(TL);
  int pGreen = GREEN(L) + GREEN(T) - GREEN(TL);
  int pBlue = BLUE(L) + BLUE(T) - BLUE(TL);

  // Manhattan distances to estimates for left and top pixels.
  int pL = abs(pAlpha - ALPHA(L)) + abs(pRed - RED(L)) +
           abs(pGreen - GREEN(L)) + abs(pBlue - BLUE(L));
  int pT = abs(pAlpha - ALPHA(T)) + abs(pRed - RED(T)) +
           abs(pGreen - GREEN(T)) + abs(pBlue - BLUE(T));

  // Return either left or top, the one closer to the prediction.
  if (pL < pT) {  // \[AMENDED\]
    return L;
  } else {
    return T;
  }
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The functions `ClampAddSubtractFull` and `ClampAddSubtractHalf` are
performed for each ARGB component as follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// Clamp the input value between 0 and 255.
int Clamp(int a) {
  return (a < 0) ? 0 : (a > 255) ?
      255 : a;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int ClampAddSubtractFull(int a, int b, int c) {
  return Clamp(a + b - c);
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int ClampAddSubtractHalf(int a, int b) {
  return Clamp(a + (a - b) / 2);
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There are special handling rules for some border pixels. If there is a
prediction transform, regardless of the mode \[0..13\] for these pixels,
the predicted value for the left-topmost pixel of the image is
0xff000000, the L pixel for all pixels on the top row, and the T pixel
for all pixels on the leftmost column.

Addressing the TR pixel for pixels on the rightmost column is
exceptional. The pixels on the rightmost column are predicted by using
the modes \[0..13\] just like pixels not on the border, but by using the
leftmost pixel on the same row as the current TR pixel. The TR-pixel
offset in memory is the same for border and non-border pixels.


### Color Transform

The goal of the color transform is to decorrelate the R, G, and B values
of each pixel. The color transform keeps the green (G) value as it is,
transforms the red (R) value based on the green value, and transforms
the blue (B) value based on the green value and then based on the red
value.

As is the case for the predictor transform, first the image is divided
into blocks, and the same transform mode is used for all the pixels in a
block. For each block, there are three types of color transform elements.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
typedef struct {
  uint8 green_to_red;
  uint8 green_to_blue;
  uint8 red_to_blue;
} ColorTransformElement;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The actual color transformation is done by defining a color transform
delta. The color transform delta depends on the `ColorTransformElement`,
which is the same for all the pixels in a particular block.
The delta is
added during the color transform. The inverse color transform then is
just subtracting those deltas.

The color transform function is defined as follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
void ColorTransform(uint8 red, uint8 blue, uint8 green,
                    ColorTransformElement *trans,
                    uint8 *new_red, uint8 *new_blue) {
  // Transformed values of red and blue components
  uint32 tmp_red = red;
  uint32 tmp_blue = blue;

  // Applying the transform is just adding the transform deltas
  tmp_red  += ColorTransformDelta(trans->green_to_red, green);
  tmp_blue += ColorTransformDelta(trans->green_to_blue, green);
  tmp_blue += ColorTransformDelta(trans->red_to_blue, red);

  *new_red = tmp_red & 0xff;
  *new_blue = tmp_blue & 0xff;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

`ColorTransformDelta` is computed using a signed 8-bit integer
representing a 3.5-fixed-point number and a signed 8-bit RGB color
channel (c) \[-128..127\]. It is defined as follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int8 ColorTransformDelta(int8 t, int8 c) {
  return (t * c) >> 5;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A conversion from the 8-bit unsigned representation (uint8) to the 8-bit
signed one (int8) is required before calling ColorTransformDelta().
It should be performed using 8-bit two's complement (that is: the uint8
range \[128..255\] is mapped to the \[-128..-1\] range of its converted
int8 value).

The multiplication is to be done using more precision (with at least
16-bit dynamics). The sign extension property of the shift operation
does not matter here: only the lowest 8 bits are used from the result,
and there the sign extension shifting and unsigned shifting are
consistent with each other.

Now we describe the contents of the color transform data so that
decoding can apply the inverse color transform and recover the original
red and blue values.
The first 3 bits of the color transform data contain the
width and height of the image block in number of bits, just like the
predictor transform:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int size_bits = ReadBits(3) + 2;
int block_width = 1 << size_bits;
int block_height = 1 << size_bits;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The remaining part of the color transform data contains
`ColorTransformElement` instances, corresponding to each block of the
image. `ColorTransformElement` instances are treated as pixels of an
image and encoded using the methods described in
[Chapter 4](#image-data).

During decoding, `ColorTransformElement` instances of the blocks are
decoded and the inverse color transform is applied on the ARGB values of
the pixels. As mentioned earlier, that inverse color transform is just
subtracting `ColorTransformElement` values from the red and blue
channels.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
void InverseTransform(uint8 red, uint8 green, uint8 blue,
                      ColorTransformElement *p,
                      uint8 *new_red, uint8 *new_blue) {
  // Applying the inverse transform is just subtracting the
  // color transform deltas
  red  -= ColorTransformDelta(p->green_to_red, green);
  blue -= ColorTransformDelta(p->green_to_blue, green);
  blue -= ColorTransformDelta(p->red_to_blue, red & 0xff);

  *new_red = red & 0xff;
  *new_blue = blue & 0xff;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


### Subtract Green Transform

The subtract green transform subtracts green values from the red and
blue values of each pixel. When this transform is present, the decoder
needs to add the green value to both the red and blue values. There is
no data associated with this transform.
The decoder applies the inverse transform as
follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
void AddGreenToBlueAndRed(uint8 green, uint8 *red, uint8 *blue) {
  *red  = (*red  + green) & 0xff;
  *blue = (*blue + green) & 0xff;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This transform is redundant, as it can be modeled using the color
transform, but it is still often useful. Since it can extend the
dynamics of the color transform and there is no additional data here,
the subtract green transform can be coded using fewer bits than a
full-blown color transform.


### Color Indexing Transform

If there are not many unique pixel values, it may be more efficient to
create a color index array and replace the pixel values by the array's
indices. The color indexing transform achieves this. (In the context of
WebP lossless, we specifically do not call this a palette transform
because a similar but more dynamic concept exists in WebP lossless
encoding: color cache.)

The color indexing transform checks for the number of unique ARGB values
in the image. If that number is below a threshold (256), it creates an
array of those ARGB values, which is then used to replace the pixel
values with the corresponding index: the green channel of the pixels is
replaced with the index, all alpha values are set to 255, and all red
and blue values are set to 0.

The transform data contains the color table size and the entries in the
color table. The decoder reads the color indexing transform data as
follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// 8-bit value for color table size
int color_table_size = ReadBits(8) + 1;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The color table is stored using the image storage format itself. The
color table can be obtained by reading an image, without the RIFF
header, image size, and transforms, assuming a height of one pixel and
a width of `color_table_size`.
The color table is always
subtraction-coded to reduce image entropy. The deltas of palette colors
typically contain much less entropy than the colors themselves, leading
to significant savings for smaller images. In decoding, every final
color in the color table can be obtained by adding the previous color
component values by each ARGB component separately and storing the
least significant 8 bits of the result.

The inverse transform for the image is simply replacing the pixel values
(which are indices to the color table) with the actual color table
values. The indexing is done based on the green component of the ARGB
color.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// Inverse transform
argb = color_table[GREEN(argb)];
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If the index is equal to or larger than `color_table_size`, the argb
color value should be set to 0x00000000 (transparent black). \[AMENDED\]

When the color table is small (equal to or less than 16 colors), several
pixels are bundled into a single pixel. The pixel bundling packs several
(2, 4, or 8) pixels into a single pixel, reducing the image width
respectively. Pixel bundling allows for a more efficient joint
distribution entropy coding of neighboring pixels and gives some
arithmetic coding-like benefits to the entropy code, but it can only be
used when there is a small number of unique values.

`color_table_size` specifies how many pixels are combined:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int width_bits;
if (color_table_size <= 2) {
  width_bits = 3;
} else if (color_table_size <= 4) {
  width_bits = 2;
} else if (color_table_size <= 16) {
  width_bits = 1;
} else {
  width_bits = 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

`width_bits` has a value of 0, 1, 2, or 3. A value of 0 indicates no
pixel bundling is to be done for the image.
A value of 1 indicates that two
pixels are combined, and each pixel has a range of \[0..15\]. A
value of 2 indicates that four pixels are combined, and each
pixel has a range of \[0..3\]. A value of 3 indicates that eight pixels
are combined, and each pixel has a range of \[0..1\], i.e., a
binary value.

The values are packed into the green component as follows:

 * `width_bits` = 1: for every x value where x ≡ 0 (mod 2), a green
   value at x is positioned into the 4 least significant bits of the
   green value at x / 2, and a green value at x + 1 is positioned into
   the 4 most significant bits of the green value at x / 2.
 * `width_bits` = 2: for every x value where x ≡ 0 (mod 4), a green
   value at x is positioned into the 2 least significant bits of the
   green value at x / 4, and green values at x + 1 to x + 3 are
   positioned in order into the more significant bits of the green
   value at x / 4.
 * `width_bits` = 3: for every x value where x ≡ 0 (mod 8), a green
   value at x is positioned into the least significant bit of the green
   value at x / 8, and green values at x + 1 to x + 7 are positioned in
   order into the more significant bits of the green value at x / 8.


4 Image Data
------------

Image data is an array of pixel values in scan-line order.

### 4.1 Roles of Image Data

We use image data in five different roles:

 1. ARGB image: Stores the actual pixels of the image.
 1. Entropy image: Stores the
    [meta Huffman codes](#decoding-of-meta-huffman-codes). The red and
    green components of a pixel define the meta Huffman code used in a
    particular block of the ARGB image.
 1. Predictor image: Stores the metadata for the [predictor
    transform](#predictor-transform). The green component of a pixel
    defines which of the 14 predictors is used within a particular block
    of the ARGB image.
 1. Color transform image.
    It is created by `ColorTransformElement` values
    (defined in [Color Transform](#color-transform)) for different blocks
    of the image. Each `ColorTransformElement` `'cte'` is treated as a
    pixel whose alpha component is `255`, red component is
    `cte.red_to_blue`, green component is `cte.green_to_blue`, and blue
    component is `cte.green_to_red`.
 1. Color indexing image: An array of size `color_table_size` (up to 256
    ARGB values) storing the metadata for the
    [color indexing transform](#color-indexing-transform). This is stored
    as an image of width `color_table_size` and height `1`.

### 4.2 Encoding of Image Data

The encoding of image data is independent of its role.

The image is first divided into a set of fixed-size blocks (typically
16x16 blocks). Each of these blocks is modeled using its own entropy
codes. Also, several blocks may share the same entropy codes.

**Rationale:** Storing an entropy code incurs a cost. This cost can be
minimized if statistically similar blocks share an entropy code, thereby
storing that code only once. For example, an encoder can find similar
blocks by clustering them using their statistical properties, or by
repeatedly joining a pair of randomly selected clusters when it reduces
the overall amount of bits needed to encode the image.

Each pixel is encoded using one of three possible methods:

 1. Huffman-coded literal: each channel (green, red, blue, and alpha) is
    entropy-coded independently;
 2. LZ77 backward reference: a sequence of pixels is copied from
    elsewhere in the image; or
 3. Color cache code: using a short multiplicative hash code (color
    cache index) of a recently seen color.

The following subsections describe each of these in detail.

#### 4.2.1 Huffman Coded Literals

The pixel is stored as Huffman-coded values of green, red, blue, and
alpha (in that order).
See [this section](#decoding-entropy-coded-image-data) for details.

#### 4.2.2 LZ77 Backward Reference

Backward references are tuples of _length_ and _distance code_:

 * Length indicates how many pixels in scan-line order are to be copied.
 * Distance code is a number indicating the position of a previously
   seen pixel, from which the pixels are to be copied. The exact mapping
   is described [below](#distance-mapping).

The length and distance values are stored using **LZ77 prefix coding**.

LZ77 prefix coding divides large integer values into two parts: the
_prefix code_ and the _extra bits_. The prefix code is stored using an
entropy code, while the extra bits are stored as they are (without an
entropy code).

**Rationale**: This approach reduces the storage requirement for the
entropy code. Also, large values are usually rare, so extra bits would be
used for very few values in the image. Thus, this approach results in
better compression overall.

The following table denotes the prefix codes and extra bits used for
storing different ranges of values.

Note: The maximum backward reference length is limited to 4096. Hence,
only the first 24 prefix codes (with the respective extra bits) are
meaningful for length values. For distance values, however, all the 40
prefix codes are valid.

| Value range     | Prefix code | Extra bits |
| --------------- | ----------- | ---------- |
| 1               | 0           | 0          |
| 2               | 1           | 0          |
| 3               | 2           | 0          |
| 4               | 3           | 0          |
| 5..6            | 4           | 1          |
| 7..8            | 5           | 1          |
| 9..12           | 6           | 2          |
| 13..16          | 7           | 2          |
| ...             | ...         | ...        |
| 3072..4096      | 23          | 10         |
| ...             | ...         | ...        |
| 524289..786432  | 38          | 18         |
| 786433..1048576 | 39          | 18         |

The pseudocode to obtain a (length or distance) value from the prefix
code is as follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
if (prefix_code < 4) {
  return prefix_code + 1;
}
int extra_bits = (prefix_code - 2) >> 1;
int offset = (2 + (prefix_code & 1)) << extra_bits;
return offset + ReadBits(extra_bits) + 1;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Distance Mapping:**
{:#distance-mapping}

As noted previously, a distance code is a number indicating the position
of a previously seen pixel, from which the pixels are to be copied. This
subsection defines the mapping between a distance code and the position
of a previous pixel.

Distance codes larger than 120 denote the pixel distance in scan-line
order, offset by 120.

The smallest distance codes \[1..120\] are special, and are reserved for
a close neighborhood of the current pixel. This neighborhood consists of
120 pixels:

 * Pixels that are 1 to 7 rows above the current pixel, and are up to 8
   columns to the left or up to 7 columns to the right of the current
   pixel. \[Total such pixels = `7 * (8 + 1 + 7) = 112`\].
 * Pixels that are in the same row as the current pixel, and are up to 8
   columns to the left of the current pixel. \[`8` such pixels\].
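The neighborhood size can be checked by restating the two bullet ranges above as loops (a sketch; the function name is illustrative):

```c
// Counts the pixels in the close neighborhood described above:
// rows 1..7 above the current pixel, columns -8..+7 around it,
// plus columns -8..-1 in the current row.
int CloseNeighborhoodSize(void) {
  int count = 0;
  int dx, dy;
  for (dy = 1; dy <= 7; ++dy) {
    for (dx = -8; dx <= 7; ++dx) ++count;  // 7 * 16 = 112 pixels
  }
  for (dx = -8; dx <= -1; ++dx) ++count;   // 8 pixels in the same row
  return count;
}
```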

The mapping between distance code `i` and the neighboring pixel offset
`(xi, yi)` is as follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
(0, 1), (1, 0), (1, 1), (-1, 1), (0, 2), (2, 0), (1, 2), (-1, 2),
(2, 1), (-2, 1), (2, 2), (-2, 2), (0, 3), (3, 0), (1, 3), (-1, 3),
(3, 1), (-3, 1), (2, 3), (-2, 3), (3, 2), (-3, 2), (0, 4), (4, 0),
(1, 4), (-1, 4), (4, 1), (-4, 1), (3, 3), (-3, 3), (2, 4), (-2, 4),
(4, 2), (-4, 2), (0, 5), (3, 4), (-3, 4), (4, 3), (-4, 3), (5, 0),
(1, 5), (-1, 5), (5, 1), (-5, 1), (2, 5), (-2, 5), (5, 2), (-5, 2),
(4, 4), (-4, 4), (3, 5), (-3, 5), (5, 3), (-5, 3), (0, 6), (6, 0),
(1, 6), (-1, 6), (6, 1), (-6, 1), (2, 6), (-2, 6), (6, 2), (-6, 2),
(4, 5), (-4, 5), (5, 4), (-5, 4), (3, 6), (-3, 6), (6, 3), (-6, 3),
(0, 7), (7, 0), (1, 7), (-1, 7), (5, 5), (-5, 5), (7, 1), (-7, 1),
(4, 6), (-4, 6), (6, 4), (-6, 4), (2, 7), (-2, 7), (7, 2), (-7, 2),
(3, 7), (-3, 7), (7, 3), (-7, 3), (5, 6), (-5, 6), (6, 5), (-6, 5),
(8, 0), (4, 7), (-4, 7), (7, 4), (-7, 4), (8, 1), (8, 2), (6, 6),
(-6, 6), (8, 3), (5, 7), (-5, 7), (7, 5), (-7, 5), (8, 4), (6, 7),
(-6, 7), (7, 6), (-7, 6), (8, 5), (7, 7), (-7, 7), (8, 6), (8, 7)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For example, distance code `1` indicates an offset of `(0, 1)` for the
neighboring pixel, that is, the pixel above the current pixel (0-pixel
difference in the X direction and 1-pixel difference in the Y
direction). Similarly, distance code `3` indicates the top-left pixel.

The decoder can convert a distance code `i` to a scan-line order
distance `dist` as follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~
(xi, yi) = distance_map[i]
dist = xi + yi * xsize
if (dist < 1) {
  dist = 1
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

where `distance_map` is the mapping noted above and `xsize` is the width
of the image in pixels.
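The pieces of this subsection can be put together in a short sketch. `ReadBits` is not simulated here: the caller passes the raw extra-bits value directly, and the offset table is truncated to its first six of the 120 entries; both are simplifications for illustration, and the function names are assumptions.

```c
typedef struct { int xi, yi; } PixelOffset;

// First six entries of the 120-entry offset table above (truncated).
static const PixelOffset kDistanceMap[] = {
  {0, 1}, {1, 0}, {1, 1}, {-1, 1}, {0, 2}, {2, 0},
};

// Decodes a length or distance value from its prefix code and the raw
// extra-bits value that would be read from the stream.
int PrefixToValue(int prefix_code, int extra_bits_value) {
  if (prefix_code < 4) return prefix_code + 1;
  const int extra_bits = (prefix_code - 2) >> 1;
  const int offset = (2 + (prefix_code & 1)) << extra_bits;
  return offset + extra_bits_value + 1;
}

// Converts a (1-based) distance code to a scan-line order distance for
// an image of width xsize. Codes above 120 are plain pixel distances
// offset by 120; smaller codes index the neighborhood table.
int DistanceCodeToDistance(int xsize, int distance_code) {
  if (distance_code > 120) return distance_code - 120;
  const PixelOffset o = kDistanceMap[distance_code - 1];  // truncated table
  const int dist = o.xi + o.yi * xsize;
  return (dist < 1) ? 1 : dist;
}
```

For instance, prefix code 23 with extra-bits value 1023 yields the maximum length 4096, and distance code 1 on a 10-pixel-wide image yields a scan-line distance of 10 (the pixel directly above).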


#### 4.2.3 Color Cache Coding

The color cache stores a set of colors that have been recently used in
the image.

**Rationale:** This way, the recently used colors can sometimes be
referred to more efficiently than emitting them using the other two
methods (described in [4.2.1](#huffman-coded-literals) and
[4.2.2](#lz77-backward-reference)).

Color cache codes are stored as follows. First, there is a 1-bit value
that indicates if the color cache is used. If this bit is 0, no color
cache codes exist, and they are not transmitted in the Huffman code that
decodes the green symbols and the length prefix codes. However, if this
bit is 1, the color cache size is read next:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int color_cache_code_bits = ReadBits(4);
int color_cache_size = 1 << color_cache_code_bits;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

`color_cache_code_bits` defines the size of the color cache (1 <<
`color_cache_code_bits`). The range of allowed values for
`color_cache_code_bits` is \[1..11\]. Compliant decoders must indicate a
corrupted bitstream for other values.

A color cache is an array of size `color_cache_size`. Each entry stores
one ARGB color. Colors are looked up by indexing them by
(0x1e35a7bd * `color`) >> (32 - `color_cache_code_bits`). Only one
lookup is done in a color cache; there is no conflict resolution.

At the beginning of decoding or encoding of an image, all entries in all
color cache values are set to zero. The color cache code is converted to
this color at decoding time. The state of the color cache is maintained
by inserting every pixel, be it produced by backward referencing or as
literals, into the cache in the order they appear in the stream.


5 Entropy Code
--------------

### 5.1 Overview

Most of the data is coded using a [canonical Huffman
code][canonical_huff].
Hence, the codes are transmitted by sending the _Huffman code lengths_, as
opposed to the actual _Huffman codes_.

In particular, the format uses **spatially-variant Huffman coding**. In
other words, different blocks of the image can potentially use different
entropy codes.

**Rationale**: Different areas of the image may have different
characteristics. So, allowing them to use different entropy codes provides
more flexibility and potentially better compression.

### 5.2 Details

The encoded image data consists of two parts:

 1. Meta Huffman codes
 1. Entropy-coded image data

#### 5.2.1 Decoding of Meta Huffman Codes

As noted earlier, the format allows the use of different Huffman codes for
different blocks of the image. _Meta Huffman codes_ are indexes identifying
which Huffman codes to use in different parts of the image.

Meta Huffman codes may be used _only_ when the image is being used in the
[role](#roles-of-image-data) of an _ARGB image_.

There are two possibilities for the meta Huffman codes, indicated by a
1-bit value:

 * If this bit is zero, there is only one meta Huffman code used everywhere
   in the image. No more data is stored.
 * If this bit is one, the image uses multiple meta Huffman codes. These
   meta Huffman codes are stored as an _entropy image_ (described below).

**Entropy image:**

The entropy image defines which Huffman codes are used in different parts
of the image, as described below.

The first 3 bits contain the `huffman_bits` value. The dimensions of the
entropy image are derived from `huffman_bits`:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int huffman_bits = ReadBits(3) + 2;
int huffman_xsize = DIV_ROUND_UP(xsize, 1 << huffman_bits);
int huffman_ysize = DIV_ROUND_UP(ysize, 1 << huffman_bits);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

where `DIV_ROUND_UP` is as defined [earlier](#predictor-transform).
The next bits contain an entropy image of width `huffman_xsize` and height
`huffman_ysize`.

**Interpretation of Meta Huffman Codes:**

For any given pixel (x, y), there is a set of five Huffman codes associated
with it. These codes are (in bitstream order):

 * **Huffman code #1**: used for the green channel, backward-reference
   length, and color cache.
 * **Huffman codes #2, #3, and #4**: used for the red, blue, and alpha
   channels, respectively.
 * **Huffman code #5**: used for the backward-reference distance.

From here on, we refer to this set as a **Huffman code group**.

The number of Huffman code groups in the ARGB image can be obtained by
finding the _largest meta Huffman code_ from the entropy image:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int num_huff_groups = max(entropy image) + 1;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

where `max(entropy image)` indicates the largest Huffman code stored in the
entropy image.

As each Huffman code group contains five Huffman codes, the total number of
Huffman codes is:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int num_huff_codes = 5 * num_huff_groups;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Given a pixel (x, y) in the ARGB image, we can obtain the corresponding
Huffman codes to be used as follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int position = (y >> huffman_bits) * huffman_xsize + (x >> huffman_bits);
int meta_huff_code = (entropy_image[position] >> 8) & 0xffff;
HuffmanCodeGroup huff_group = huffman_code_groups[meta_huff_code];
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

where we have assumed the existence of a `HuffmanCodeGroup` structure,
which represents a set of five Huffman codes. Also, `huffman_code_groups`
is an array of `HuffmanCodeGroup` (of size `num_huff_groups`).

The decoder then uses the Huffman code group `huff_group` to decode the
pixel (x, y), as explained in the
[next section](#decoding-entropy-coded-image-data).
#### 5.2.2 Decoding Entropy-coded Image Data

For the current position (x, y) in the image, the decoder first identifies
the corresponding Huffman code group (as explained in the last section).
Given the Huffman code group, the pixel is read and decoded as follows:

Read the next symbol S from the bitstream using Huffman code #1. \[See the
[next section](#decoding-the-code-lengths) for details on decoding the
Huffman code lengths\]. Note that S is any integer in the range `0` to
`(256 + 24 + ` [`color_cache_size`](#color-cache-code) ` - 1)`.

The interpretation of S depends on its value:

 1. if S < 256
    1. Use S as the green component.
    1. Read red from the bitstream using Huffman code #2.
    1. Read blue from the bitstream using Huffman code #3.
    1. Read alpha from the bitstream using Huffman code #4.
 1. if S < 256 + 24
    1. Use S - 256 as a length prefix code.
    1. Read extra bits for the length from the bitstream.
    1. Determine the backward-reference length L from the length prefix
       code and the extra bits read.
    1. Read a distance prefix code from the bitstream using Huffman
       code #5.
    1. Read extra bits for the distance from the bitstream.
    1. Determine the backward-reference distance D from the distance prefix
       code and the extra bits read.
    1. Copy L pixels (in scan-line order) from the sequence of pixels
       starting D pixels before the current position.
 1. if S >= 256 + 24
    1. Use S - (256 + 24) as the index into the color cache.
    1. Get the ARGB color from the color cache at that index.


**Decoding the Code Lengths:**
{:#decoding-the-code-lengths}

This section describes the details of reading a symbol from the bitstream
by decoding the Huffman code lengths.

The Huffman code lengths can be coded in two ways. The method used is
specified by a 1-bit value:

 * If this bit is 1, it is a _simple code length code_;
 * If this bit is 0, it is a _normal code length code_.
**(i) Simple Code Length Code:**

This variant is used in the special case when only 1 or 2 Huffman symbols
(in the range \[0, 255\]) have a non-zero code length, and that code length
is 1. All other Huffman code lengths are implicitly zeros.

The first bit indicates the number of symbols:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int num_symbols = ReadBits(1) + 1;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The first symbol is stored either using a 1-bit code for values of 0 and 1,
or using an 8-bit code for values in the range \[0, 255\]. The second
symbol, when present, is always coded as an 8-bit value.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int is_first_8bits = ReadBits(1);
symbols[0] = ReadBits(1 + 7 * is_first_8bits);
code_lengths[symbols[0]] = 1;
if (num_symbols == 2) {
  symbols[1] = ReadBits(8);
  code_lengths[symbols[1]] = 1;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Note:** Another special case is when _all_ Huffman code lengths are
_zeros_ (an empty Huffman code). For example, a Huffman code for distance
can be empty if there are no backward references. Similarly, Huffman codes
for alpha, red, and blue can be empty if all pixels within the same meta
Huffman code are produced using the color cache. However, this case doesn't
need special handling, as empty Huffman codes can be coded as those
containing a single symbol `0`.

**(ii) Normal Code Length Code:**

The code lengths of a Huffman code are read as follows: `num_code_lengths`
specifies the number of code lengths; the rest of the code lengths
(according to the order in `kCodeLengthCodeOrder`) are zeros.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int kCodeLengthCodes = 19;
int kCodeLengthCodeOrder[kCodeLengthCodes] = {
  17, 18, 0, 1, 2, 3, 4, 5, 16, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
};
int code_lengths[kCodeLengthCodes] = { 0 };  // All zeros.
int num_code_lengths = 4 + ReadBits(4);
for (int i = 0; i < num_code_lengths; ++i) {
  code_lengths[kCodeLengthCodeOrder[i]] = ReadBits(3);
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 * Code length code \[0..15\] indicates literal code lengths.
   * Value 0 means no symbols have been coded.
   * Values \[1..15\] indicate the bit length of the respective code.
 * Code 16 repeats the previous non-zero value \[3..6\] times, i.e.,
   3 + `ReadBits(2)` times. If code 16 is used before a non-zero
   value has been emitted, a value of 8 is repeated.
 * Code 17 emits a streak of zeros \[3..10\] long, i.e., 3 + `ReadBits(3)`
   times.
 * Code 18 emits a streak of zeros of length \[11..138\], i.e.,
   11 + `ReadBits(7)` times.


6 Overall Structure of the Format
---------------------------------

Below is a view into the format in Backus-Naur form. It does not cover
all details. End-of-image (EOI) is only implicitly coded into the number
of pixels (xsize * ysize).
#### Basic Structure

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<format> ::= <RIFF header><image size><image stream>
<image stream> ::= <optional-transform><spatially-coded image>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


#### Structure of Transforms

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<optional-transform> ::= (1-bit value 1; <transform> <optional-transform>) |
                         1-bit value 0
<transform> ::= <predictor-tx> | <color-tx> | <subtract-green-tx> |
                <color-indexing-tx>
<predictor-tx> ::= 2-bit value 0; <predictor image>
<predictor image> ::= 3-bit sub-pixel code; <entropy-coded image>
<color-tx> ::= 2-bit value 1; <color image>
<color image> ::= 3-bit sub-pixel code; <entropy-coded image>
<subtract-green-tx> ::= 2-bit value 2
<color-indexing-tx> ::= 2-bit value 3; <color-indexing image>
<color-indexing image> ::= 8-bit color count; <entropy-coded image>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


#### Structure of the Image Data

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<spatially-coded image> ::= <meta huffman><entropy-coded image>
<entropy-coded image> ::= <color cache info><huffman codes><lz77-coded image>
<meta huffman> ::= 1-bit value 0 |
                   (1-bit value 1; <entropy image>)
<entropy image> ::= 3-bit subsample value; <entropy-coded image>
<color cache info> ::= 1-bit value 0 |
                       (1-bit value 1; 4-bit value for color cache size)
<huffman codes> ::= <huffman code group> | <huffman code group><huffman codes>
<huffman code group> ::= <huffman code><huffman code><huffman code>
                         <huffman code><huffman code>
                         See "Interpretation of Meta Huffman Codes" to
                         understand what each of these five Huffman codes is
                         for.
1073<huffman code> ::= <simple huffman code> | <normal huffman code> 1074<simple huffman code> ::= see "Simple code length code" for details 1075<normal huffman code> ::= <code length code>; encoded code lengths 1076<code length code> ::= see section "Normal code length code" 1077<lz77-coded image> ::= ((<argb-pixel> | <lz77-copy> | <color-cache-code>) 1078 <lz77-coded image>) | "" 1079~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1080 1081A possible example sequence: 1082 1083~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1084<RIFF header><image size>1-bit value 1<subtract-green-tx> 10851-bit value 1<predictor-tx>1-bit value 0<meta huffman> 1086<color cache info><huffman codes> 1087<lz77-coded image> 1088~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1089 1090[canonical_huff]: http://en.wikipedia.org/wiki/Canonical_Huffman_code 1091