<!--

Although you may be viewing an alternate representation, this document
is sourced in Markdown, a light-duty markup scheme, and is optimized for
the [kramdown](http://kramdown.rubyforge.org/) transformer.

See the accompanying README. External link targets are referenced at the
end of this file.

-->

Specification for WebP Lossless Bitstream
=========================================

_Jyrki Alakuijala, Ph.D., Google, Inc., 2012-06-19_

Paragraphs marked as \[AMENDED\] were amended on 2014-09-16.

Abstract
--------

WebP lossless is an image format for lossless compression of ARGB
images. The lossless format stores and restores the pixel values
exactly, including the color values for zero alpha pixels. The
format uses subresolution images, recursively embedded into the format
itself, for storing statistical data about the images, such as the used
entropy codes, spatial predictors, color space conversion, and color
table. LZ77, Huffman coding, and a color cache are used for compression
of the bulk data. Decoding speeds faster than PNG have been
demonstrated, as well as 25% denser compression than can be achieved
using today's PNG format.


* TOC placeholder
{:toc}


Nomenclature
------------

ARGB
: A pixel value consisting of alpha, red, green, and blue values.

ARGB image
: A two-dimensional array containing ARGB pixels.

color cache
: A small hash-addressed array to store recently used colors, to be able
  to recall them with shorter codes.

color indexing image
: A one-dimensional image of colors that can be indexed using a small
  integer (up to 256 within WebP lossless).

color transform image
: A two-dimensional subresolution image containing data about
  correlations of color components.

distance mapping
: Changes LZ77 distances to have the smallest values for pixels in 2D
  proximity.

entropy image
: A two-dimensional subresolution image indicating which entropy coding
  should be used in a respective square in the image, i.e., each pixel
  is a meta Huffman code.

Huffman code
: A classic way to do entropy coding where a smaller number of bits are
  used for more frequent codes.

LZ77
: Dictionary-based sliding window compression algorithm that either
  emits symbols or describes them as sequences of past symbols.

meta Huffman code
: A small integer (up to 16 bits) that indexes an element in the meta
  Huffman table.

predictor image
: A two-dimensional subresolution image indicating which spatial
  predictor is used for a particular square in the image.

prefix coding
: A way to entropy code larger integers that codes a few bits of the
  integer using an entropy code and codifies the remaining bits raw.
  This allows for the descriptions of the entropy codes to remain
  relatively small even when the range of symbols is large.

scan-line order
: A processing order of pixels, left-to-right, top-to-bottom, starting
  from the top-left pixel and proceeding to the right. Once a row is
  completed, continue from the leftmost column of the next row.


1 Introduction
--------------

This document describes the compressed data representation of a WebP
lossless image. It is intended as a detailed reference for WebP lossless
encoder and decoder implementation.

In this document, we extensively use C programming language syntax to
describe the bitstream, and assume the existence of a function for
reading bits, `ReadBits(n)`. The bytes are read in the natural order of
the stream containing them, and the bits of each byte are read in
least-significant-bit-first order. When multiple bits are read at the
same time, the integer is constructed from the original data in the
original order. The most significant bits of the returned integer are
also the most significant bits of the original data. Thus, the statement

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
b = ReadBits(2);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

is equivalent to the two statements below:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
b = ReadBits(1);
b |= ReadBits(1) << 1;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
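
For concreteness, the following is a minimal sketch of such a bit
reader. The `BitReader` structure and its field names are illustrative,
not part of the format; the specification only prescribes the bit order
demonstrated above.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// Illustrative bit reader state; not part of the format.
typedef struct {
  const uint8_t* buf;   // Compressed stream bytes.
  size_t byte_pos;      // Next byte to read from.
  int bit_pos;          // Next bit within that byte (0 = least significant).
} BitReader;

// Reads n bits; the first bit read becomes the least significant bit
// of the result. Bounds checking is omitted for brevity.
uint32_t ReadBitsImpl(BitReader* br, int n) {
  uint32_t v = 0;
  for (int i = 0; i < n; ++i) {
    uint32_t bit = (br->buf[br->byte_pos] >> br->bit_pos) & 1;
    v |= bit << i;
    if (++br->bit_pos == 8) {   // Move on to the next byte.
      br->bit_pos = 0;
      ++br->byte_pos;
    }
  }
  return v;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~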

We assume that each color component (e.g., alpha, red, blue, and green)
is represented using an 8-bit byte. We define the corresponding type as
uint8. A whole ARGB pixel is represented by a type called uint32, an
unsigned integer consisting of 32 bits. In the code showing the behavior
of the transformations, the alpha value is codified in bits 31..24, red
in bits 23..16, green in bits 15..8, and blue in bits 7..0; however,
implementations of the format are free to use another representation
internally.
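
Given this layout, the per-channel accessors used in later code samples
(for example, `ALPHA()`, `RED()`, `GREEN()`, and `BLUE()` in the
`Select` predictor) can be written as simple shifts and masks. The
following is a sketch; `MAKE_ARGB` is an illustrative helper, not part
of the format.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// Channel accessors for the bit layout described above:
// alpha in bits 31..24, red in 23..16, green in 15..8, blue in 7..0.
#define ALPHA(argb) (((argb) >> 24) & 0xff)
#define RED(argb)   (((argb) >> 16) & 0xff)
#define GREEN(argb) (((argb) >> 8) & 0xff)
#define BLUE(argb)  ((argb) & 0xff)

// Reassemble a pixel from its four components.
#define MAKE_ARGB(a, r, g, b)                                  \
  ((((uint32)(a)) << 24) | (((uint32)(r)) << 16) |             \
   (((uint32)(g)) << 8) | ((uint32)(b)))
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~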

Broadly, a WebP lossless image contains header data, transform
information, and actual image data. Headers contain the width and height
of the image. A WebP lossless image can go through four different types
of transformation before being entropy encoded. The transform
information in the bitstream contains the data required to apply the
respective inverse transforms.


2 RIFF Header
-------------

The beginning of the header has the RIFF container. This consists of the
following 21 bytes:

   1. String "RIFF".
   2. A little-endian 32-bit value of the block length, the whole size
      of the block controlled by the RIFF header. Normally this equals
      the payload size (file size minus 8 bytes: 4 bytes for the 'RIFF'
      identifier and 4 bytes for storing the value itself).
   3. String "WEBP" (RIFF container name).
   4. String "VP8L" (chunk tag for lossless-encoded image data).
   5. A little-endian 32-bit value of the number of bytes in the
      lossless stream.
   6. One-byte signature 0x2f.

The first 28 bits of the bitstream specify the width and height of the
image. Width and height are decoded as 14-bit integers as follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int image_width = ReadBits(14) + 1;
int image_height = ReadBits(14) + 1;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The 14-bit dynamics for image size limit the maximum size of a WebP
lossless image to 16384 x 16384 pixels.

The alpha_is_used bit is a hint only and should not impact decoding.
It should be set to 0 when all alpha values are 255 in the picture, and
1 otherwise.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int alpha_is_used = ReadBits(1);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The version_number is a 3-bit code that must be set to 0. Any other
value should be treated as an error. \[AMENDED\]

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int version_number = ReadBits(3);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


3 Transformations
-----------------

Transformations are reversible manipulations of the image data that can
reduce the remaining symbolic entropy by modeling spatial and color
correlations. They can make the final compression denser.

An image can go through four types of transformation. A 1-bit value
indicates the presence of a transform. Each transform is allowed to be
used only once. The transformations are used only for the main-level
ARGB image; the subresolution images have no transforms, not even the
0 bit indicating the end of transforms.

Typically, an encoder would use these transforms to reduce the Shannon
entropy in the residual image. Also, the transform data can be decided
based on entropy minimization.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
while (ReadBits(1)) {  // Transform present.
  // Decode transform type.
  enum TransformType transform_type = ReadBits(2);
  // Decode transform data.
  ...
}

// Decode actual image data (Section 4).
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If a transform is present, then the next two bits specify the transform
type. There are four types of transforms.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
enum TransformType {
  PREDICTOR_TRANSFORM             = 0,
  COLOR_TRANSFORM                 = 1,
  SUBTRACT_GREEN                  = 2,
  COLOR_INDEXING_TRANSFORM        = 3,
};
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The transform type is followed by the transform data. Transform data
contains the information required to apply the inverse transform and
depends on the transform type. Next, we describe the transform data for
the different types.


### Predictor Transform

The predictor transform can be used to reduce entropy by exploiting the
fact that neighboring pixels are often correlated. In the predictor
transform, the current pixel value is predicted from the pixels already
decoded (in scan-line order) and only the residual value (actual -
predicted) is encoded. The _prediction mode_ determines the type of
prediction to use. We divide the image into squares, and all the pixels
in a square use the same prediction mode.

The first 3 bits of prediction data define the block width and height in
number of bits. The number of block columns, `block_xsize`, is used in
two-dimensional indexing.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int size_bits = ReadBits(3) + 2;
int block_width = (1 << size_bits);
int block_height = (1 << size_bits);
#define DIV_ROUND_UP(num, den) (((num) + (den) - 1) / (den))
int block_xsize = DIV_ROUND_UP(image_width, 1 << size_bits);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The transform data contains the prediction mode for each block of the
image. All the `block_width * block_height` pixels of a block use the
same prediction mode. The prediction modes are treated as pixels of an
image and encoded using the same techniques described in
[Chapter 4](#image-data).

For a pixel _x, y_, one can compute the respective filter block address
by:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int block_index = (y >> size_bits) * block_xsize +
                  (x >> size_bits);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There are 14 different prediction modes. In each prediction mode, the
current pixel value is predicted from one or more neighboring pixels
whose values are already known.

We choose the neighboring pixels (TL, T, TR, and L) of the current pixel
(P) as follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
O    O    O    O    O    O    O    O    O    O    O
O    O    O    O    O    O    O    O    O    O    O
O    O    O    O    TL   T    TR   O    O    O    O
O    O    O    O    L    P    X    X    X    X    X
X    X    X    X    X    X    X    X    X    X    X
X    X    X    X    X    X    X    X    X    X    X
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

where TL means top-left, T top, TR top-right, and L left.
At the time of predicting a value for P, all O, TL, T, TR, and L pixels
have already been processed, while pixel P and all X pixels are unknown.

Given the above neighboring pixels, the different prediction modes are
defined as follows.

| Mode   | Predicted value of each channel of the current pixel    |
| ------ | ------------------------------------------------------- |
|  0     | 0xff000000 (represents solid black color in ARGB)       |
|  1     | L                                                       |
|  2     | T                                                       |
|  3     | TR                                                      |
|  4     | TL                                                      |
|  5     | Average2(Average2(L, TR), T)                            |
|  6     | Average2(L, TL)                                         |
|  7     | Average2(L, T)                                          |
|  8     | Average2(TL, T)                                         |
|  9     | Average2(T, TR)                                         |
| 10     | Average2(Average2(L, TL), Average2(T, TR))              |
| 11     | Select(L, T, TL)                                        |
| 12     | ClampAddSubtractFull(L, T, TL)                          |
| 13     | ClampAddSubtractHalf(Average2(L, T), TL)                |

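Since only residuals are stored, a decoder reconstructs each channel by
adding the decoded residual to the prediction modulo 256. A minimal
sketch of that final step (the function name is illustrative):

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// Reconstruct one channel of a pixel from its coded residual.
// 'predicted' is the corresponding channel of the value produced by
// the block's prediction mode from the table above.
uint8 ApplyResidual(uint8 residual, uint8 predicted) {
  return (residual + predicted) & 0xff;  // Addition modulo 256.
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~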

`Average2` is defined as follows for each ARGB component:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
uint8 Average2(uint8 a, uint8 b) {
  return (a + b) / 2;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The Select predictor is defined as follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
uint32 Select(uint32 L, uint32 T, uint32 TL) {
  // L = left pixel, T = top pixel, TL = top-left pixel.

  // ARGB component estimates for prediction.
  int pAlpha = ALPHA(L) + ALPHA(T) - ALPHA(TL);
  int pRed = RED(L) + RED(T) - RED(TL);
  int pGreen = GREEN(L) + GREEN(T) - GREEN(TL);
  int pBlue = BLUE(L) + BLUE(T) - BLUE(TL);

  // Manhattan distances to estimates for left and top pixels.
  int pL = abs(pAlpha - ALPHA(L)) + abs(pRed - RED(L)) +
           abs(pGreen - GREEN(L)) + abs(pBlue - BLUE(L));
  int pT = abs(pAlpha - ALPHA(T)) + abs(pRed - RED(T)) +
           abs(pGreen - GREEN(T)) + abs(pBlue - BLUE(T));

  // Return either left or top, the one closer to the prediction.
  if (pL < pT) {     // \[AMENDED\]
    return L;
  } else {
    return T;
  }
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The functions `ClampAddSubtractFull` and `ClampAddSubtractHalf` are
performed for each ARGB component as follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// Clamp the input value between 0 and 255.
int Clamp(int a) {
  return (a < 0) ? 0 : (a > 255) ? 255 : a;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int ClampAddSubtractFull(int a, int b, int c) {
  return Clamp(a + b - c);
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int ClampAddSubtractHalf(int a, int b) {
  return Clamp(a + (a - b) / 2);
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There are special handling rules for some border pixels. If there is a
prediction transform, regardless of the mode \[0..13\] for these pixels,
the predicted value for the left-topmost pixel of the image is
0xff000000, the L-pixel for all pixels on the top row, and the T-pixel
for all pixels on the leftmost column.

Addressing the TR-pixel for pixels on the rightmost column is
exceptional. The pixels on the rightmost column are predicted by using
the modes \[0..13\] just like pixels not on the border, but by using the
leftmost pixel on the same row as the current TR-pixel. The TR-pixel
offset in memory is the same for border and non-border pixels.


### Color Transform

The goal of the color transform is to decorrelate the R, G, and B values
of each pixel. The color transform keeps the green (G) value as it is,
transforms the red (R) value based on the green value, and transforms
the blue (B) value based on the green value and then based on the red
value.

As is the case for the predictor transform, first the image is divided
into blocks, and the same transform mode is used for all the pixels in a
block. For each block, there are three types of color transform
elements.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
typedef struct {
  uint8 green_to_red;
  uint8 green_to_blue;
  uint8 red_to_blue;
} ColorTransformElement;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The actual color transformation is done by defining a color transform
delta. The color transform delta depends on the `ColorTransformElement`,
which is the same for all the pixels in a particular block. The delta is
added during the color transform. The inverse color transform then is
just subtracting those deltas.

The color transform function is defined as follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
void ColorTransform(uint8 red, uint8 blue, uint8 green,
                    ColorTransformElement *trans,
                    uint8 *new_red, uint8 *new_blue) {
  // Transformed values of red and blue components.
  uint32 tmp_red = red;
  uint32 tmp_blue = blue;

  // Applying the transform is just adding the transform deltas.
  tmp_red  += ColorTransformDelta(trans->green_to_red, green);
  tmp_blue += ColorTransformDelta(trans->green_to_blue, green);
  tmp_blue += ColorTransformDelta(trans->red_to_blue, red);

  *new_red = tmp_red & 0xff;
  *new_blue = tmp_blue & 0xff;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

`ColorTransformDelta` is computed using a signed 8-bit integer
representing a 3.5-fixed-point number and a signed 8-bit RGB color
channel (c) \[-128..127\]. It is defined as follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int8 ColorTransformDelta(int8 t, int8 c) {
  return (t * c) >> 5;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A conversion from the 8-bit unsigned representation (uint8) to the 8-bit
signed one (int8) is required before calling `ColorTransformDelta()`. It
should be performed using 8-bit two's complement (that is, the uint8
range \[128..255\] is mapped to the \[-128..-1\] range of its converted
int8 value).

The multiplication is to be done using more precision (with at least
16-bit dynamics). The sign extension property of the shift operation
does not matter here: only the lowest 8 bits are used from the result,
and there the sign extension shifting and unsigned shifting are
consistent with each other.
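
The following sketch spells out those two remarks, with the conversion
from uint8 to int8 made explicit and the product computed at `int`
precision. The helper name `ToSigned` is illustrative, not part of the
format.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// Reinterpret a uint8 as a two's complement int8:
// [0..127] maps to itself, [128..255] maps to [-128..-1].
int8 ToSigned(uint8 v) {
  return (v < 128) ? (int8)v : (int8)(v - 256);
}

// ColorTransformDelta with the intermediate steps spelled out.
int8 ColorTransformDelta(int8 t, int8 c) {
  int product = (int)t * (int)c;  // At least 16-bit precision.
  return (int8)(product >> 5);    // Only the low 8 bits are kept.
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~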

Now we describe the contents of the color transform data so that
decoding can apply the inverse color transform and recover the original
red and blue values. The first 3 bits of the color transform data
contain the width and height of the image block in number of bits, just
like the predictor transform:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int size_bits = ReadBits(3) + 2;
int block_width = 1 << size_bits;
int block_height = 1 << size_bits;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The remaining part of the color transform data contains
`ColorTransformElement` instances corresponding to each block of the
image. `ColorTransformElement` instances are treated as pixels of an
image and encoded using the methods described in
[Chapter 4](#image-data).

During decoding, `ColorTransformElement` instances of the blocks are
decoded and the inverse color transform is applied on the ARGB values of
the pixels. As mentioned earlier, that inverse color transform is just
subtracting `ColorTransformElement` values from the red and blue
channels.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
void InverseTransform(uint8 red, uint8 green, uint8 blue,
                      ColorTransformElement *p,
                      uint8 *new_red, uint8 *new_blue) {
  // Applying the inverse transform is just subtracting the
  // color transform deltas.
  red  -= ColorTransformDelta(p->green_to_red, green);
  blue -= ColorTransformDelta(p->green_to_blue, green);
  blue -= ColorTransformDelta(p->red_to_blue, red & 0xff);

  *new_red = red & 0xff;
  *new_blue = blue & 0xff;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


### Subtract Green Transform

The subtract green transform subtracts green values from the red and
blue values of each pixel. When this transform is present, the decoder
needs to add the green value to both the red and blue values. There is
no data associated with this transform. The decoder applies the inverse
transform as follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
void AddGreenToBlueAndRed(uint8 green, uint8 *red, uint8 *blue) {
  *red  = (*red  + green) & 0xff;
  *blue = (*blue + green) & 0xff;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This transform is redundant, as it can be modeled using the color
transform, but it is still often useful. Since it can extend the
dynamics of the color transform and there is no additional data here,
the subtract green transform can be coded using fewer bits than a
full-blown color transform.


### Color Indexing Transform

If there are not many unique pixel values, it may be more efficient to
create a color index array and replace the pixel values by the array's
indices. The color indexing transform achieves this. (In the context of
WebP lossless, we specifically do not call this a palette transform
because a similar but more dynamic concept exists in WebP lossless
encoding: color cache.)

The color indexing transform checks for the number of unique ARGB values
in the image. If that number is below a threshold (256), it creates an
array of those ARGB values, which is then used to replace the pixel
values with the corresponding index: the green channel of the pixels is
replaced with the index, all alpha values are set to 255, and all red
and blue values are set to 0.

The transform data contains the color table size and the entries in the
color table. The decoder reads the color indexing transform data as
follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// 8-bit value for the color table size.
int color_table_size = ReadBits(8) + 1;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The color table is stored using the image storage format itself. The
color table can be obtained by reading an image, without the RIFF
header, image size, and transforms, assuming a height of one pixel and
a width of `color_table_size`. The color table is always
subtraction-coded to reduce image entropy. The deltas of palette colors
typically contain much less entropy than the colors themselves, leading
to significant savings for smaller images. In decoding, every final
color in the color table is obtained by adding the components of the
previous color and the decoded delta, separately for each ARGB
component, and storing the least significant 8 bits of the result.
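
The following is a sketch of that accumulation, assuming the
subtraction-coded entries have already been decoded into `color_table`
in order; `ALPHA`, `RED`, `GREEN`, `BLUE`, and `MAKE_ARGB` are the
illustrative helpers sketched earlier.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// Undo the subtraction coding of the color table in place: each final
// entry is the previous final entry plus the decoded delta, computed
// per ARGB component modulo 256.
void UnDeltaColorTable(uint32 *color_table, int color_table_size) {
  for (int i = 1; i < color_table_size; ++i) {
    uint32 prev = color_table[i - 1];
    uint32 delta = color_table[i];
    uint8 a = (ALPHA(prev) + ALPHA(delta)) & 0xff;
    uint8 r = (RED(prev) + RED(delta)) & 0xff;
    uint8 g = (GREEN(prev) + GREEN(delta)) & 0xff;
    uint8 b = (BLUE(prev) + BLUE(delta)) & 0xff;
    color_table[i] = MAKE_ARGB(a, r, g, b);
  }
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~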

The inverse transform for the image is simply replacing the pixel values
(which are indices to the color table) with the actual color table
values. The indexing is done based on the green component of the ARGB
color.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// Inverse transform.
argb = color_table[GREEN(argb)];
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If the index is equal to or larger than `color_table_size`, the argb
color value should be set to 0x00000000 (transparent black). \[AMENDED\]

When the color table is small (equal to or less than 16 colors), several
pixels are bundled into a single pixel. The pixel bundling packs several
(2, 4, or 8) pixels into a single pixel, reducing the image width
respectively. Pixel bundling allows for a more efficient joint
distribution entropy coding of neighboring pixels and gives some
arithmetic coding-like benefits to the entropy code, but it can only be
used when there is a small number of unique values.

`color_table_size` specifies how many pixels are combined:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int width_bits;
if (color_table_size <= 2) {
  width_bits = 3;
} else if (color_table_size <= 4) {
  width_bits = 2;
} else if (color_table_size <= 16) {
  width_bits = 1;
} else {
  width_bits = 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

`width_bits` has a value of 0, 1, 2, or 3. A value of 0 indicates no
pixel bundling is to be done for the image. A value of 1 indicates that
two pixels are combined, and each pixel has a range of \[0..15\]. A
value of 2 indicates that four pixels are combined, and each pixel has
a range of \[0..3\]. A value of 3 indicates that eight pixels are
combined, and each pixel has a range of \[0..1\], i.e., a binary value.

The values are packed into the green component as follows:

  * `width_bits` = 1: for every x value where x ≡ 0 (mod 2), a green
    value at x is positioned into the 4 least significant bits of the
    green value at x / 2, and a green value at x + 1 is positioned into
    the 4 most significant bits of the green value at x / 2.
  * `width_bits` = 2: for every x value where x ≡ 0 (mod 4), a green
    value at x is positioned into the 2 least significant bits of the
    green value at x / 4, and green values at x + 1 to x + 3 are
    positioned in order into the more significant bits of the green
    value at x / 4.
  * `width_bits` = 3: for every x value where x ≡ 0 (mod 8), a green
    value at x is positioned into the least significant bit of the
    green value at x / 8, and green values at x + 1 to x + 7 are
    positioned in order into the more significant bits of the green
    value at x / 8.
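
A decoder can recover the sub-pixel index for an original x coordinate
from the bundled image by reversing this packing. A minimal sketch,
assuming the packed green values of one bundled row are available in
`bundled_row` (an illustrative name):

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// Recover the color table index of the original pixel at column x.
int GetBundledIndex(const uint8 *bundled_row, int x, int width_bits) {
  int xsub = x >> width_bits;            // Column in the bundled image.
  int bits_per_pixel = 8 >> width_bits;  // 8, 4, 2, or 1 bits.
  int shift = (x & ((1 << width_bits) - 1)) * bits_per_pixel;
  int mask = (1 << bits_per_pixel) - 1;
  return (bundled_row[xsub] >> shift) & mask;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~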


4 Image Data
------------

Image data is an array of pixel values in scan-line order.

### 4.1 Roles of Image Data

We use image data in five different roles:

  1. ARGB image: Stores the actual pixels of the image.
  1. Entropy image: Stores the
     [meta Huffman codes](#decoding-of-meta-huffman-codes). The red and green
     components of a pixel define the meta Huffman code used in a particular
     block of the ARGB image.
  1. Predictor image: Stores the metadata for the [Predictor
     Transform](#predictor-transform). The green component of a pixel defines
     which of the 14 predictors is used within a particular block of the
     ARGB image.
  1. Color transform image: Created by `ColorTransformElement` values
     (defined in [Color Transform](#color-transform)) for different blocks of
     the image. Each `ColorTransformElement` `'cte'` is treated as a pixel whose
     alpha component is `255`, red component is `cte.red_to_blue`, green
     component is `cte.green_to_blue`, and blue component is `cte.green_to_red`.
  1. Color indexing image: An array of size `color_table_size` (up to 256
     ARGB values) storing the metadata for the
     [Color Indexing Transform](#color-indexing-transform). This is stored as an
     image of width `color_table_size` and height `1`.

### 4.2 Encoding of Image Data

The encoding of image data is independent of its role.

The image is first divided into a set of fixed-size blocks (typically
16x16 blocks). Each of these blocks is modeled using its own entropy
codes. Also, several blocks may share the same entropy codes.

**Rationale:** Storing an entropy code incurs a cost. This cost can be
minimized if statistically similar blocks share an entropy code, thereby
storing that code only once. For example, an encoder can find similar
blocks by clustering them using their statistical properties, or by
repeatedly joining a pair of randomly selected clusters when doing so
reduces the overall number of bits needed to encode the image.

Each pixel is encoded using one of three possible methods:

  1. Huffman coded literal: each channel (green, red, blue, and alpha) is
     entropy-coded independently;
  2. LZ77 backward reference: a sequence of pixels is copied from elsewhere
     in the image; or
  3. Color cache code: using a short multiplicative hash code (color cache
     index) of a recently seen color.

The following subsections describe each of these in detail.

#### 4.2.1 Huffman Coded Literals

The pixel is stored as Huffman coded values of green, red, blue, and
alpha (in that order). See
[Section 5.2.2](#decoding-entropy-coded-image-data) for details.

#### 4.2.2 LZ77 Backward Reference

Backward references are tuples of _length_ and _distance code_:

  * Length indicates how many pixels in scan-line order are to be copied.
  * Distance code is a number indicating the position of a previously seen
    pixel, from which the pixels are to be copied. The exact mapping is
    described [below](#distance-mapping).

The length and distance values are stored using **LZ77 prefix coding**.

LZ77 prefix coding divides large integer values into two parts: the
_prefix code_ and the _extra bits_. The prefix code is stored using an
entropy code, while the extra bits are stored as they are (without an
entropy code).

**Rationale:** This approach reduces the storage requirement for the
entropy code. Also, large values are usually rare, so extra bits would
be used for very few values in the image. Thus, this approach results
in better compression overall.

The following table denotes the prefix codes and extra bits used for
storing different ranges of values.

Note: The maximum backward reference length is limited to 4096. Hence,
only the first 24 prefix codes (with the respective extra bits) are
meaningful for length values. For distance values, however, all the 40
prefix codes are valid.

| Value range     | Prefix code | Extra bits |
| --------------- | ----------- | ---------- |
| 1               | 0           | 0          |
| 2               | 1           | 0          |
| 3               | 2           | 0          |
| 4               | 3           | 0          |
| 5..6            | 4           | 1          |
| 7..8            | 5           | 1          |
| 9..12           | 6           | 2          |
| 13..16          | 7           | 2          |
| ...             | ...         | ...        |
| 3072..4096      | 23          | 10         |
| ...             | ...         | ...        |
| 524289..786432  | 38          | 18         |
| 786433..1048576 | 39          | 18         |

The pseudocode to obtain a (length or distance) value from the prefix
code is as follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
if (prefix_code < 4) {
  return prefix_code + 1;
}
int extra_bits = (prefix_code - 2) >> 1;
int offset = (2 + (prefix_code & 1)) << extra_bits;
return offset + ReadBits(extra_bits) + 1;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
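
For example, prefix code 6 gives `extra_bits = (6 - 2) >> 1 = 2` and
`offset = (2 + 0) << 2 = 8`, so the decoded value is
`8 + ReadBits(2) + 1`, i.e., a value in the range 9..12, which matches
the table above.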

**Distance Mapping:**
{:#distance-mapping}

As noted previously, a distance code is a number indicating the position
of a previously seen pixel, from which the pixels are to be copied. This
subsection defines the mapping between a distance code and the position
of a previous pixel.

Distance codes larger than 120 denote the pixel distance in scan-line
order, offset by 120.

The smallest distance codes \[1..120\] are special, and are reserved for
a close neighborhood of the current pixel. This neighborhood consists of
120 pixels:

  * Pixels that are 1 to 7 rows above the current pixel, and are up to 8
    columns to the left or up to 7 columns to the right of the current
    pixel. \[Total such pixels = `7 * (8 + 1 + 7) = 112`\].
  * Pixels that are in the same row as the current pixel, and are up to
    8 columns to the left of the current pixel. \[`8` such pixels\].

The mapping between distance code `i` and the neighboring pixel offset
`(xi, yi)` is as follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
(0, 1),  (1, 0),  (1, 1),  (-1, 1), (0, 2),  (2, 0),  (1, 2),  (-1, 2),
(2, 1),  (-2, 1), (2, 2),  (-2, 2), (0, 3),  (3, 0),  (1, 3),  (-1, 3),
(3, 1),  (-3, 1), (2, 3),  (-2, 3), (3, 2),  (-3, 2), (0, 4),  (4, 0),
(1, 4),  (-1, 4), (4, 1),  (-4, 1), (3, 3),  (-3, 3), (2, 4),  (-2, 4),
(4, 2),  (-4, 2), (0, 5),  (3, 4),  (-3, 4), (4, 3),  (-4, 3), (5, 0),
(1, 5),  (-1, 5), (5, 1),  (-5, 1), (2, 5),  (-2, 5), (5, 2),  (-5, 2),
(4, 4),  (-4, 4), (3, 5),  (-3, 5), (5, 3),  (-5, 3), (0, 6),  (6, 0),
(1, 6),  (-1, 6), (6, 1),  (-6, 1), (2, 6),  (-2, 6), (6, 2),  (-6, 2),
(4, 5),  (-4, 5), (5, 4),  (-5, 4), (3, 6),  (-3, 6), (6, 3),  (-6, 3),
(0, 7),  (7, 0),  (1, 7),  (-1, 7), (5, 5),  (-5, 5), (7, 1),  (-7, 1),
(4, 6),  (-4, 6), (6, 4),  (-6, 4), (2, 7),  (-2, 7), (7, 2),  (-7, 2),
(3, 7),  (-3, 7), (7, 3),  (-7, 3), (5, 6),  (-5, 6), (6, 5),  (-6, 5),
(8, 0),  (4, 7),  (-4, 7), (7, 4),  (-7, 4), (8, 1),  (8, 2),  (6, 6),
(-6, 6), (8, 3),  (5, 7),  (-5, 7), (7, 5),  (-7, 5), (8, 4),  (6, 7),
(-6, 7), (7, 6),  (-7, 6), (8, 5),  (7, 7),  (-7, 7), (8, 6),  (8, 7)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For example, distance code `1` indicates an offset of `(0, 1)` for the
neighboring pixel, that is, the pixel above the current pixel (0-pixel
difference in the X direction and 1-pixel difference in the Y
direction). Similarly, distance code `3` indicates the top-left pixel.

The decoder can convert a distance code `i` to a scan-line order
distance `dist` as follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~
(xi, yi) = distance_map[i]
dist = xi + yi * xsize
if (dist < 1) {
  dist = 1
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

where `distance_map` is the mapping noted above and `xsize` is the width
of the image in pixels.


#### 4.2.3 Color Cache Coding

The color cache stores a set of colors that have been recently used in
the image.

**Rationale:** This way, the recently used colors can sometimes be
referred to more efficiently than by emitting them using the other two
methods (described in [4.2.1](#huffman-coded-literals) and
[4.2.2](#lz77-backward-reference)).

Color cache codes are stored as follows. First, there is a 1-bit value
that indicates if the color cache is used. If this bit is 0, no color
cache codes exist, and they are not transmitted in the Huffman code that
decodes the green symbols and the length prefix codes. However, if this
bit is 1, the color cache size is read next:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int color_cache_code_bits = ReadBits(4);
int color_cache_size = 1 << color_cache_code_bits;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

`color_cache_code_bits` defines the size of the color cache (1 <<
`color_cache_code_bits`). The range of allowed values for
`color_cache_code_bits` is \[1..11\]. Compliant decoders must indicate a
corrupted bitstream for other values.

A color cache is an array of size `color_cache_size`. Each entry stores
one ARGB color. Colors are looked up by indexing them by
(0x1e35a7bd * `color`) >> (32 - `color_cache_code_bits`). Only one
lookup is done in the color cache; there is no conflict resolution.

At the beginning of decoding or encoding of an image, all entries in all
color cache values are set to zero. The color cache code is converted to
this color at decoding time. The state of the color cache is maintained
by inserting every pixel, be it produced by backward referencing or as a
literal, into the cache in the order it appears in the stream.
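
A minimal sketch of the cache operations described above, assuming the
structure and function names below (only the hash constant and the
index computation are prescribed by the format):

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
typedef struct {
  uint32 *colors;  // color_cache_size entries, initialized to zero.
  int code_bits;   // color_cache_code_bits.
} ColorCache;

// The multiplicative hash used to index the cache.
int CacheIndex(const ColorCache *cc, uint32 color) {
  return (int)((0x1e35a7bdU * color) >> (32 - cc->code_bits));
}

// Called for every decoded pixel, in stream order; there is no
// conflict resolution, the newest color simply overwrites the slot.
void CacheInsert(ColorCache *cc, uint32 color) {
  cc->colors[CacheIndex(cc, color)] = color;
}

// A color cache code (see Section 5.2.2) is a direct index.
uint32 CacheLookup(const ColorCache *cc, int index) {
  return cc->colors[index];
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~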


5 Entropy Code
--------------

### 5.1 Overview

Most of the data is coded using a [canonical Huffman
code][canonical_huff]. Hence, the codes are transmitted by sending the
_Huffman code lengths_, as opposed to the actual _Huffman codes_.

In particular, the format uses **spatially-variant Huffman coding**. In
other words, different blocks of the image can potentially use different
entropy codes.

**Rationale:** Different areas of the image may have different
characteristics. So, allowing them to use different entropy codes
provides more flexibility and potentially better compression.

### 5.2 Details

The encoded image data consists of two parts:

  1. Meta Huffman codes
  1. Entropy-coded image data

#### 5.2.1 Decoding of Meta Huffman Codes

As noted earlier, the format allows the use of different Huffman codes
for different blocks of the image. _Meta Huffman codes_ are indexes
identifying which Huffman codes to use in different parts of the image.

Meta Huffman codes may be used _only_ when the image is being used in
the [role](#roles-of-image-data) of an _ARGB image_.

There are two possibilities for the meta Huffman codes, indicated by a
1-bit value:

  * If this bit is zero, there is only one meta Huffman code used
    everywhere in the image. No more data is stored.
  * If this bit is one, the image uses multiple meta Huffman codes.
    These meta Huffman codes are stored as an _entropy image_
    (described below).

**Entropy image:**

The entropy image defines which Huffman codes are used in different
parts of the image, as described below.

The first 3 bits contain the `huffman_bits` value. The dimensions of the
entropy image are derived from `huffman_bits`:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int huffman_bits = ReadBits(3) + 2;
int huffman_xsize = DIV_ROUND_UP(xsize, 1 << huffman_bits);
int huffman_ysize = DIV_ROUND_UP(ysize, 1 << huffman_bits);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

where `DIV_ROUND_UP` is as defined [earlier](#predictor-transform).

The next bits contain an entropy image of width `huffman_xsize` and
height `huffman_ysize`.

**Interpretation of Meta Huffman Codes:**

For any given pixel (x, y), there is a set of five Huffman codes
associated with it. These codes are (in bitstream order):

  * **Huffman code #1**: used for the green channel, backward-reference
    lengths, and color cache codes.
  * **Huffman codes #2, #3, and #4**: used for the red, blue, and alpha
    channels, respectively.
  * **Huffman code #5**: used for backward-reference distances.

From here on, we refer to this set as a **Huffman code group**.

The number of Huffman code groups in the ARGB image can be obtained by
finding the _largest meta Huffman code_ from the entropy image:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int num_huff_groups = max(entropy image) + 1;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

where `max(entropy image)` indicates the largest Huffman code stored in
the entropy image.

As each Huffman code group contains five Huffman codes, the total number
of Huffman codes is:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int num_huff_codes = 5 * num_huff_groups;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Given a pixel (x, y) in the ARGB image, we can obtain the corresponding
Huffman codes to be used as follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int position = (y >> huffman_bits) * huffman_xsize + (x >> huffman_bits);
int meta_huff_code = (entropy_image[position] >> 8) & 0xffff;
HuffmanCodeGroup huff_group = huffman_code_groups[meta_huff_code];
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

where we have assumed the existence of a `HuffmanCodeGroup` structure,
which represents a set of five Huffman codes. Also, `huffman_code_groups`
is an array of `HuffmanCodeGroup` (of size `num_huff_groups`).

The decoder then uses the Huffman code group `huff_group` to decode the
pixel (x, y), as explained in the
[next section](#decoding-entropy-coded-image-data).

#### 5.2.2 Decoding Entropy-coded Image Data

For the current position (x, y) in the image, the decoder first
identifies the corresponding Huffman code group (as explained in the
last section). Given the Huffman code group, the pixel is read and
decoded as follows (a condensed sketch of the whole loop follows the
list below):

Read the next symbol S from the bitstream using Huffman code #1. \[See
the [next section](#decoding-the-code-lengths) for details on decoding
the Huffman code lengths.\] Note that S is any integer in the range `0`
to `(256 + 24 + ` [`color_cache_size`](#color-cache-coding)` - 1)`.

The interpretation of S depends on its value:

  1. if S < 256
     1. Use S as the green component.
     1. Read red from the bitstream using Huffman code #2.
     1. Read blue from the bitstream using Huffman code #3.
     1. Read alpha from the bitstream using Huffman code #4.
  1. if S < 256 + 24
     1. Use S - 256 as a length prefix code.
     1. Read extra bits for the length from the bitstream.
     1. Determine backward-reference length L from the length prefix
        code and the extra bits read.
     1. Read the distance prefix code from the bitstream using Huffman
        code #5.
     1. Read extra bits for the distance from the bitstream.
     1. Determine backward-reference distance D from the distance prefix
        code and the extra bits read.
     1. Copy L pixels (in scan-line order) from the sequence of pixels
        prior to them by D pixels.
  1. if S >= 256 + 24
     1. Use S - (256 + 24) as the index into the color cache.
     1. Get the ARGB color from the color cache at that index.
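
The following is the promised condensed sketch of this per-pixel loop.
`ReadSymbol(code)` stands for reading one symbol with the given Huffman
code; all other helper names (`GroupForPosition`, `DecodePrefix`,
`DecodeDistance`, `CopyPixels`, `EmitPixel`, `CacheLookup`) are
illustrative, not part of the format.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
while (pixels_decoded < xsize * ysize) {
  HuffmanCodeGroup g = GroupForPosition(x, y);     // Section 5.2.1.
  int S = ReadSymbol(g.code1);
  if (S < 256) {                                   // Literal pixel.
    uint8 green = S;
    uint8 red = ReadSymbol(g.code2);
    uint8 blue = ReadSymbol(g.code3);
    uint8 alpha = ReadSymbol(g.code4);
    EmitPixel(MAKE_ARGB(alpha, red, green, blue));
  } else if (S < 256 + 24) {                       // Backward reference.
    int length = DecodePrefix(S - 256);            // Prefix + extra bits.
    int dist_code = DecodePrefix(ReadSymbol(g.code5));
    int dist = DecodeDistance(dist_code);          // Distance mapping.
    CopyPixels(length, dist);                      // Scan-line order copy.
  } else {                                         // Color cache hit.
    EmitPixel(CacheLookup(&cache, S - (256 + 24)));
  }
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~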


**Decoding the Code Lengths:**
{:#decoding-the-code-lengths}

This section describes the details of reading a symbol from the
bitstream by decoding the Huffman code lengths.

The Huffman code lengths can be coded in two ways. The method used is
specified by a 1-bit value.

  * If this bit is 1, it is a _simple code length code_, and
  * If this bit is 0, it is a _normal code length code_.

**(i) Simple Code Length Code:**

This variant is used in the special case when only 1 or 2 Huffman code
lengths are non-zero, and are in the range of \[0, 255\]. All other
Huffman code lengths are implicitly zeros.

The first bit indicates the number of non-zero code lengths:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int num_code_lengths = ReadBits(1) + 1;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The first code length is stored either using a 1-bit code for values of
0 and 1, or using an 8-bit code for values in the range \[0, 255\]. The
second code length, when present, is coded as an 8-bit code.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int is_first_8bits = ReadBits(1);
code_lengths[0] = ReadBits(1 + 7 * is_first_8bits);
if (num_code_lengths == 2) {
  code_lengths[1] = ReadBits(8);
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Note:** Another special case is when _all_ Huffman code lengths are
_zeros_ (an empty Huffman code). For example, a Huffman code for
distance can be empty if there are no backward references. Similarly,
Huffman codes for alpha, red, and blue can be empty if all pixels within
the same meta Huffman code are produced using the color cache. However,
this case doesn't need special handling, as empty Huffman codes can be
coded as those containing a single symbol `0`.

**(ii) Normal Code Length Code:**

The code lengths of a Huffman code are read as follows: `num_code_lengths`
specifies the number of code lengths; the rest of the code lengths
(according to the order in `kCodeLengthCodeOrder`) are zeros.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int kCodeLengthCodes = 19;
int kCodeLengthCodeOrder[kCodeLengthCodes] = {
  17, 18, 0, 1, 2, 3, 4, 5, 16, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
};
int code_lengths[kCodeLengthCodes] = { 0 };  // All zeros.
int num_code_lengths = 4 + ReadBits(4);
for (i = 0; i < num_code_lengths; ++i) {
  code_lengths[kCodeLengthCodeOrder[i]] = ReadBits(3);
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  * Code length code \[0..15\] indicates literal code lengths.
    * Value 0 means no symbols have been coded.
    * Values \[1..15\] indicate the bit length of the respective code.
  * Code 16 repeats the previous non-zero value \[3..6\] times, i.e.,
    3 + `ReadBits(2)` times. If code 16 is used before a non-zero
    value has been emitted, a value of 8 is repeated.
  * Code 17 emits a streak of zeros \[3..10\], i.e., 3 + `ReadBits(3)`
    times.
  * Code 18 emits a streak of zeros of length \[11..138\], i.e.,
    11 + `ReadBits(7)` times.
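
The following sketch expands these codes into the final array of symbol
code lengths. It assumes a helper `ReadSymbol(code_length_code)` that
reads one symbol \[0..18\] using the Huffman code built from the
`code_lengths` read above; names and the absence of bounds checks are
illustrative.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// Expand literal lengths and repeat codes 16..18 into 'lengths',
// an array of 'num_symbols' entries.
int prev_length = 8;   // Default when code 16 appears first.
int pos = 0;
while (pos < num_symbols) {
  int code = ReadSymbol(code_length_code);
  if (code < 16) {            // Literal code length 0..15.
    lengths[pos++] = code;
    if (code != 0) prev_length = code;
  } else if (code == 16) {    // Repeat the previous non-zero length.
    int repeat = 3 + ReadBits(2);
    while (repeat-- > 0) lengths[pos++] = prev_length;
  } else if (code == 17) {    // Short streak of zeros.
    int repeat = 3 + ReadBits(3);
    while (repeat-- > 0) lengths[pos++] = 0;
  } else {                    // Code 18: long streak of zeros.
    int repeat = 11 + ReadBits(7);
    while (repeat-- > 0) lengths[pos++] = 0;
  }
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~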


6 Overall Structure of the Format
---------------------------------

Below is a view into the format in Backus-Naur form. It does not cover
all details. The end of the image (EOI) is only implicitly coded into
the number of pixels (xsize * ysize).


#### Basic Structure

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<format> ::= <RIFF header><image size><image stream>
<image stream> ::= <optional-transform><spatially-coded image>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


#### Structure of Transforms

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<optional-transform> ::= (1-bit value 1; <transform> <optional-transform>) |
                         1-bit value 0
<transform> ::= <predictor-tx> | <color-tx> | <subtract-green-tx> |
                <color-indexing-tx>
<predictor-tx> ::= 2-bit value 0; <predictor image>
<predictor image> ::= 3-bit sub-pixel code ; <entropy-coded image>
<color-tx> ::= 2-bit value 1; <color image>
<color image> ::= 3-bit sub-pixel code ; <entropy-coded image>
<subtract-green-tx> ::= 2-bit value 2
<color-indexing-tx> ::= 2-bit value 3; <color-indexing image>
<color-indexing image> ::= 8-bit color count; <entropy-coded image>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


#### Structure of the Image Data

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<spatially-coded image> ::= <meta huffman><entropy-coded image>
<entropy-coded image> ::= <color cache info><huffman codes><lz77-coded image>
<meta huffman> ::= 1-bit value 0 |
                   (1-bit value 1; <entropy image>)
<entropy image> ::= 3-bit subsample value; <entropy-coded image>
<color cache info> ::= 1-bit value 0 |
                       (1-bit value 1; 4-bit value for color cache size)
<huffman codes> ::= <huffman code group> | <huffman code group><huffman codes>
<huffman code group> ::= <huffman code><huffman code><huffman code>
                         <huffman code><huffman code>
                         See "Interpretation of Meta Huffman Codes" to
                         understand what each of these five Huffman codes is
                         for.
<huffman code> ::= <simple huffman code> | <normal huffman code>
<simple huffman code> ::= see "Simple Code Length Code" for details
<normal huffman code> ::= <code length code>; encoded code lengths
<code length code> ::= see section "Normal Code Length Code"
<lz77-coded image> ::= ((<argb-pixel> | <lz77-copy> | <color-cache-code>)
                       <lz77-coded image>) | ""
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A possible example sequence:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<RIFF header><image size>1-bit value 1<subtract-green-tx>
1-bit value 1<predictor-tx>1-bit value 0<meta huffman>
<color cache info><huffman codes>
<lz77-coded image>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

[canonical_huff]: http://en.wikipedia.org/wiki/Canonical_Huffman_code