Algorithms Featured

Implementing JPEG Image Compression Algorithm using MATLAB

In this tutorial, we will implement a image compression algorithm using MATLAB. JPEG stands for Joint Photographic Experts Group, which was a group of image processing experts that devised a standard for compressing images (ISO).

Ex10si0n Yan

Nov 14, 2022 • 9 min read

Hence, JPEG (or JPG) is not really a file format but rather an image compression standard. The JPEG standard is complicated with many different options and color space regulations. It was not widely adopted. A much simpler standard version was advocated at the same time, called JFIF. This is the image compression algorithm that most people mean when they say JPEG compression, and the one that we will be describing in this class. Note that the file extensions .jpeg and .jpg have stuck, even though the underneath algorithm is (strictly speaking) JFIF compression.

The Underlying Assumptions of the JPEG Algorithm

The JPEG algorithm is designed specifically for the human eye. It exploits the following biological properties of human sight:

We are more sensitive to the illuminocity of color, rather than the chromatric value of an image
We are not particularly sensitive to high-frequency content in images.

Hence, the algorithm can be neatly illustrated in the following diagram:

Matlab Implementation: Encoder

Assume img is the image we have read in. In JPEG compression, all compression processes should be carried on in YCbCr color space. In this color space, Cb is the blue component relative to the green component while Cr is the red component relative to the green component.

Here is the built-in Matlab code for converting the color space.

% RGB to YCbCr
Y = rgb2ycbcr(img);
Y_dsp = Y;

The YCbCr is a more convenient colorspace for image compression because it separates an image's illuminance and chromatic strength. Since our eyes are not particularly sensitive to chrominance, we can "downsample" that. Here, half the amount of "color" and generate the Y_dsp representing downsampled Y. The difference between original and downsampled image is imperceptible, as you can see from the result. (Image from pi.math.cornell.edu)

% Downsample Cb and Cr channels
Y_dsp(:,:,2) = 2 * round(Y_dsp(:,:,2)/2);
Y_dsp(:,:,3) = 2 * round(Y_dsp(:,:,3)/2);

Following the steps, we need to apply the following processes for each 8x8 blocks in the image. Y_dct is the matrix contains frequency domain of original image specifically for every 8x8 block. In another word, we apply discrete cosine transform (DCT) for every 8x8 block and store these 8x8 results matrices into Y_dct.

% seperate image as 8x8 blocks
[row, column, ~] = size(Y_dsp);
Y_dct = zeros(size(Y_dsp));

for channel = 1:3
    for j = 1:8:row - 7
        for k = 1:8:column - 7
        % DCT
        % quantization
        % zig-zag
        % entropy coding
        % encapsulation
        end
    end
end

Then we apply DCT inside the preceding tripe for loop, this process transform the original image from spatial domain to frequency domain:

% 2D DCT
Y_dct(j:j+7, k:k+7, channel) = dct2(Y_dsp(j:j+7, k:k+7, channel));

Quantization is a process in which we take a couple of values in a specific range and turns them into a discrete value. (also called discretization). It is intuitive that after DCT, quantization converts the higher frequency coefficients in the output matrix to 0.

Besides, the intensity of quantization determines the extend of data retaining for the image. Quantization is applied and how much of higher frequency information is lost, and this is the reason why JPEG is a lossy compression.

Since we have got the matrix in frequency domain, we could apply quantization using a quantization table Q.

Q = [16 11 10 16 24 40 51 61 ;
     12 12 14 19 26 28 60 55 ;
     14 13 16 24 40 57 69 56 ;
     14 17 22 29 51 87 80 62 ;
     18 22 37 56 68 109 103 77 ;
     24 35 55 64 81 104 113 92 ;
     49 64 78 87 103 121 120 101 ;
     72 92 95 98 112 100 103 99];

% Quantization using table Q
Y_dct(j:j+7, k:k+7, channel) = round(Y_dct(j:j+7, k:k+7, channel) ./ (Q * quality_coef));

After quantization, the image has been compressed a lot, however; in order to have a extreme compression, we should consider the potential of compressing the quantized data using entropy coding (i.e. Huffman Coding).

However, since Y_dct now is a 2D matrix, we need to convert it to 1D firstly. Yet, there is many ways of converting 2D array to 1D. Due to the characteristic of "DCTed" array - large numbers always lies on top-left, zig-zag is a better choice.

To implement zig-zag, we could declare a zig-zag table as follows.

zigzag = [1 2 6 7 15 16 28 29;
          3 5 8 14 17 27 30 43;
          4 9 13 18 26 31 42 44;
          10 12 19 25 32 41 45 54;
          11 20 24 33 40 46 53 55;
          21 23 34 39 47 52 56 61;
          22 35 38 48 51 57 60 62;
          36 37 49 50 58 59 63 64];

This table is designed by the process of zig-zagging, which is illustrated in the following diagram. Zig-zag ensures top-left first, then positive (non-negative) numbers in the generated 1D array should be lies on the staring points.

Here is the code for zig-zagging.

% Zigzag scan and convert to 1D array
Y_lin = zeros(1, 64);
for i = 1:8
    Y_lin(zigzag(i, :)) = Y_dct(j+i-1, k:k+7, channel);
end

Till now, we have turned the 2D Y_dct into Y_lin which represents linear 1D array. To better handling this 1D array, DC and AC are two essential part to be implemented.

DC is the different first digit of Y_lin between the preceding first digit for every adjacent 8x8 blocks. AC is the rest of digits (2, 64) in Y_lin for the current 8x8 block.

% Extract DC (first digit diff)
if first == 1
    DC_prev = Y_lin(1);
    DC = Y_lin(1);
    first = 0;
else
    DC = Y_lin(1) - DC_prev;
    DC_prev = Y_lin(1);
end

% Extract AC
AC = Y_lin(2:end);

AC should be applied with Run Length Encoding (RLE) if you are not familiar with this, please refer: filestore.aqa.org.uk(PDF). The code below is RLE codec.

function bit_stream = rle(input_bits, mode)
    bit_stream = [];
    % encoder
    if mode == "en"
        bit_stream = input_bits(1);
        count = 1;
        for i = 2:length(input_bits)
            if input_bits(i) == input_bits(i-1)
                count = count + 1;
            else
                bit_stream = [bit_stream, count, input_bits(i)];
                count = 1;
            end
        end
        bit_stream = [bit_stream, count];
    end
    
    % decoder
    if mode == "de"
        for i = 1:2:length(input_bits)
            bit_stream = [bit_stream repmat(input_bits(i), 1, input_bits(i+1))];
        end
    end
end

Run Length Codec

Hence, we could apply RLE to AC, and append BC, AC to a data payload.

% Run length encoding
AC = rle(AC, "en");

bit_string = [DC, AC];

Combine AC DC to data payload

Huffman encoding is a pretty concise and elaborate entropy encoding yields optimal prefix code that is commonly used for lossless data compression. Code for a simple implementation of Huffman encoding in Python could be found at:

Then we could apply built-in Huffman encoding to all payloads all at onces.

% Huffman encoding on all [DC + AC]
% bit_stream is concatenation of bit_strings

[unique_symbols, ~, symbol_indices] = unique(bit_stream);
symbol_counts = accumarray(symbol_indices(:), 1);
symbol_counts = symbol_counts / sum(symbol_counts);
dict = huffmandict(unique_symbols, symbol_counts);
bit_stream = huffmanenco(bit_stream, dict);

Huffman Encoding

Then we could store dict into header and bit_stream as compressed image data, they encapsulate them to a single file. That's it, we have implemented a JPEG encoder (Image to bitstream) now.

Matlab Implementation: Bit stream

In the generated data (say bit stream), it should store some information inside its header, such as, dictionary header_dict used for huffman encoding. Moreover, since AC was applied through RLE, and then we encapsulate it with DC inside a payload. There should be a header for maintaining how long for each payload, in this implementation, we uses header_ac_seperator to store this information. Besides, resolution of orginial image should also be stored because without it we cannot specify the image row-column ratio when rebuilding it, that make sense. Finally, quality coeffeicient should also be stored in the header since it is determined only when encoding.

Matlab Implementation: Decoder

Inversely, to employ JPEG decoder, just inverse these processes. Firstly, after receiving content for bit stream, we apply huffman decoding to it

bit_stream = huffmandeco(bin2dec(content), header_dict);
header_ac_seperator = rle(header_ac_seperator, "de");
block_row = header_resolution(1) / 8;
block_col = header_resolution(2) / 8;
img = zeros(header_resolution);

Then, refering header_ac_seperator, we could re-constructuct each payload [DC AC] which we have mentioned previously.

for i = 1:length(header_ac_seperator)
    cut = uint32(header_ac_seperator(i));
    DC = bit_stream(head);
    AC = bit_stream(head+1:head+cut);

    % RLE decoding
    AC = rle(AC, "de");

    % Calculate DC
    if head ~= 1
        DC = DC_prev + DC;
        DC_prev = DC;
    end

    block = [DC AC];
    
    % more codes ...

Then, we applied inverse zig-zag using the same zig-zagging sequence

% inverse zigzag, table zigzag from previous section
block = block(zigzag);

And inverse quatization.

% inverse quantization
block = block .* Q * quality_coef;

And inverse DCT.

% inverse DCT
block = idct2(block);
block = uint8(block);

We can rebuild each block in the image.

img(current_row:current_row+7, current_col:current_col+7, current_channel) = block;

To make the iteration works, some iteration variables should be changed during the loop

current_col = current_col + 8;

if current_col > block_col * 8
    current_col = 1;
    current_row = current_row + 8;
end

if current_row > block_row * 8
    current_row = 1;
    current_channel = current_channel + 1;
end

head = head + cut + 1;

Finally, we change the color space from YCbCr to RGB

img = ycbcr2rgb(img);

Run it and have a look.

We can see the image compression ratio in comparing original headless image data vs. compressed header and content:

   Compression coefficient: 1.1800
Before compression (bytes): 480000
 After compression (bvtes): 304188
         Compression ratio: 1.5780
           PSNR value (dB): 65.829

Bingo! We have achieved a JPEG decoder now by inversely designing the encoder. Take a look at the decoded compressed image.

We can see that in the compressed image, pixels on the edges of the guy is quite weird, it is because the mini-block-wise DCT sampling have different rounded value. To compared with the original image, the image different could be calculated by setting threshold as 3

diff = abs(uint8(original - compressed));
diff(diff>3) = 255;

Now we have visualized the different between original and compressed image.

Conclusion

We have implemented a JPEG codec in this tutorial, now you have learned:

YCbCr color space
Discrete Cosine Transform and Inverse Discrete Cosine Transform (to be added)
JPEG encoding process (image to blocks - DCT - quantization - zigzagging - entropy encoding - encapsulation)
JPEG decoding process (decapsulation - entropy decoding - inverse zigzagging - inverse quantization - iDCT - blocks to image)
Some of the JPEG headers

You can find the code at my Github via the following link: