The Fundamental Problem of Image Storage
A raw, uncompressed digital image stores a color value for every single pixel. A 12-megapixel camera photo, at 8 bits per channel for red, green, and blue, requires approximately 36 megabytes of raw storage. Transmitting and storing images at this size is impractical for most purposes. Compression algorithms exploit two properties of natural images to achieve dramatic size reductions: spatial redundancy (neighboring pixels tend to have similar values) and perceptual redundancy (human eyes cannot perceive all the detail that sensors capture).
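The 36 MB figure follows from simple arithmetic. The sketch below assumes a 4000 × 3000 sensor (one common 12-megapixel layout) and three 8-bit channels; `raw_size_bytes` is a hypothetical helper written for this illustration.

```python
def raw_size_bytes(width: int, height: int, channels: int = 3, bits_per_channel: int = 8) -> int:
    """Bytes needed to store width x height pixels with no compression at all."""
    return width * height * channels * bits_per_channel // 8


# Assumed example: a 4000 x 3000 (12-megapixel) RGB image at 8 bits per channel.
size = raw_size_bytes(4000, 3000)
print(f"{size:,} bytes = {size / 1e6:.0f} MB = {size / 2**20:.1f} MiB")
# 36,000,000 bytes = 36 MB = 34.3 MiB
```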
JPEG: Discrete Cosine Transform
JPEG (Joint Photographic Experts Group) compression was standardized in 1992 and remains the most widely used format for photographs. It uses the Discrete Cosine Transform (DCT) to convert pixel data into frequency components.
The process works in stages. First, the image is converted from RGB color space to YCbCr, which separates luminance (brightness, Y) from chrominance (color, Cb and Cr). Because human eyes are more sensitive to brightness variations than to color variations, the two chroma channels can be stored at reduced resolution with little perceptible loss, a technique called chroma subsampling. A common choice, 4:2:0, halves chroma resolution both horizontally and vertically, so each chroma channel keeps only a quarter of its samples.
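Both steps can be sketched in a few lines, assuming the full-range BT.601 equations used by the JFIF file format and plain 2×2 averaging for 4:2:0 subsampling. Real encoders differ in rounding, clamping, and how they filter chroma, and the function names here are illustrative rather than taken from any library.

```python
from typing import List, Tuple


def rgb_to_ycbcr(r: float, g: float, b: float) -> Tuple[float, float, float]:
    """Full-range BT.601 conversion as used by JFIF (inputs and outputs in 0..255)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def subsample_420(plane: List[List[float]]) -> List[List[float]]:
    """Average each 2x2 block, halving a chroma plane's resolution in both directions.
    Assumes even width and height to keep the sketch short."""
    return [
        [(plane[r][c] + plane[r][c + 1] + plane[r + 1][c] + plane[r + 1][c + 1]) / 4
         for c in range(0, len(plane[0]), 2)]
        for r in range(0, len(plane), 2)
    ]


# A neutral gray carries no color information: Cb and Cr both land at (about) 128.
y, cb, cr = rgb_to_ycbcr(200, 200, 200)
print(round(y, 6), round(cb, 6), round(cr, 6))
```

For a 4×4 image, Y stays 4×4 while Cb and Cr shrink to 2×2 each, so 24 samples are stored instead of 48.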
Next, each channel is divided into 8×8 pixel blocks. The DCT transforms each block's 64 sample values into 64 frequency coefficients. The top-left coefficient (the DC component) is proportional to the block's average value in that channel, while the remaining 63 coefficients (the AC components) represent progressively higher horizontal and vertical frequencies, that is, increasingly fine spatial detail.
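The transform can be written straight from its definition. This is a naive orthonormal 2D DCT-II, meant for clarity rather than speed (production encoders use fast factorized versions), and it assumes samples have already been level-shifted from 0..255 to -128..127 as baseline JPEG does.

```python
import math
from typing import List

N = 8
Block = List[List[float]]


def dct_2d(block: Block) -> Block:
    """Orthonormal 2D DCT-II of an 8x8 block, computed directly from the definition."""
    def scale(k: int) -> float:
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):            # vertical frequency (output row)
        for v in range(N):        # horizontal frequency (output column)
            total = 0.0
            for r in range(N):
                for c in range(N):
                    total += (block[r][c]
                              * math.cos((2 * r + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * c + 1) * v * math.pi / (2 * N)))
            coeffs[u][v] = scale(u) * scale(v) * total
    return coeffs


# A flat block of value 140, level-shifted by 128.
flat = [[140 - 128] * N for _ in range(N)]
coeffs = dct_2d(flat)
print(round(coeffs[0][0], 6))   # 96.0 (8 x the mean); every AC term is ~0
```

With this normalization the DC term is exactly eight times the block mean. A block that varies only from left to right puts all its energy in the top row of the coefficient grid, because every vertical-frequency term cancels.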
These frequency coefficients are then quantized: each one is divided by the step size at the same position in a quantization table and rounded to the nearest integer. High-frequency coefficients, which represent fine detail, are quantized aggressively (divided by large numbers), so many of them round to zero and the subtle variations they carried, which the eye barely notices, are discarded. Low-frequency coefficients, which represent broad areas of color, are quantized more gently.
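A minimal sketch of quantization and the matching decoder-side dequantization. It uses the example luminance table from Annex K of the JPEG standard (ITU-T T.81), which encoders are free to replace, and rounds halves away from zero; the coefficient values are made up for illustration.

```python
import math
from typing import List

# Example luminance table from Annex K of ITU-T T.81 (not mandatory for encoders).
LUMA_Q = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def round_half_away(x: float) -> int:
    """Round to the nearest integer, sending exact halves away from zero."""
    return int(math.floor(abs(x) + 0.5)) * (1 if x >= 0 else -1)


def quantize(coeffs: List[List[float]], table: List[List[int]]) -> List[List[int]]:
    return [[round_half_away(coeffs[u][v] / table[u][v]) for v in range(8)] for u in range(8)]


def dequantize(levels: List[List[int]], table: List[List[int]]) -> List[List[int]]:
    """What a decoder reconstructs; the rounding error cannot be recovered."""
    return [[levels[u][v] * table[u][v] for v in range(8)] for u in range(8)]


coeffs = [[0.0] * 8 for _ in range(8)]
coeffs[0][0] = 96.0   # DC term
coeffs[0][1] = 30.0   # low-frequency detail
coeffs[7][7] = 40.0   # highest-frequency detail
levels = quantize(coeffs, LUMA_Q)
print(levels[0][0], levels[0][1], levels[7][7])   # 6 3 0
```

The coefficient at (7, 7) started out larger than the one at (0, 1), yet it is the one that vanishes, because its step size is 99 rather than 11. After dequantization the (0, 1) coefficient comes back as 33, not 30.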
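The quantized coefficients are then losslessly entropy coded. They are read out in a zigzag order, from low to high frequency, so that the zeros produced by quantization cluster into long runs; those runs are run-length encoded, and the result is compressed with Huffman coding, which assigns shorter codes to more common values. The quality slider in JPEG encoders scales the quantization tables: lower quality means larger step sizes and more aggressive rounding.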
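Quality settings are not standardized, but a widely copied convention comes from the IJG libjpeg library (its `jpeg_quality_scaling` function). The sketch below follows that convention as a labeled assumption; other encoders map quality to tables differently. Quality 50 leaves the base table unchanged, lower values multiply it up, and higher values shrink it toward all ones.

```python
from typing import List


def scale_quant_table(base: List[List[int]], quality: int) -> List[List[int]]:
    """Scale a base table for a 1-100 quality setting, following the IJG libjpeg
    convention, with entries clamped to 1..255 for baseline compatibility."""
    quality = max(1, min(100, quality))
    factor = 5000 // quality if quality < 50 else 200 - 2 * quality   # a percentage
    return [[max(1, min(255, (q * factor + 50) // 100)) for q in row] for row in base]


first_row = [[16, 11, 10, 16, 24, 40, 51, 61]]   # first row of the Annex K luminance table
for quality in (10, 50, 90):
    print(quality, scale_quant_table(first_row, quality)[0])
# 10 [80, 55, 50, 80, 120, 200, 255, 255]
# 50 [16, 11, 10, 16, 24, 40, 51, 61]
# 90 [3, 2, 2, 3, 5, 8, 10, 12]
```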
This explains why JPEG creates blocky artifacts at low quality. Each 8×8 block is quantized independently, so coarse quantization can leave neighboring blocks at noticeably different levels. The boundary between them, which should be smooth, becomes an abrupt step, and the 8×8 grid becomes visible.
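A one-dimensional toy shows the effect. It models the most extreme quantization, keeping only each block's DC term (its average), on a smooth ramp split into two 8-sample blocks. This is a simplification of what JPEG does in two dimensions, and `keep_dc_only` is a hypothetical helper.

```python
from typing import List


def keep_dc_only(signal: List[float], block_size: int = 8) -> List[float]:
    """Replace every sample in each block by the block's mean (its DC term)."""
    out: List[float] = []
    for start in range(0, len(signal), block_size):
        block = signal[start:start + block_size]
        mean = sum(block) / len(block)
        out.extend([mean] * len(block))
    return out


ramp = list(range(16))            # 0, 1, ..., 15: every step is 1
approx = keep_dc_only(ramp)
print(approx[7], approx[8])       # 3.5 11.5
```

Every step in the original ramp is 1, but the reconstruction is flat inside each block and jumps by 8 at the boundary: the discontinuity that shows up as a block edge.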
PNG: DEFLATE and Filtering
PNG (Portable Network Graphics) was designed in 1995 as a lossless replacement for GIF. It achieves compression through two stages: filtering and DEFLATE compression.
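Before compression, PNG applies a filter to each row of pixels (a scanline). Filters work on bytes: each byte is predicted from the corresponding byte of neighboring pixels, and only the difference (modulo 256) is stored. PNG defines five filter types: None; Sub (each byte minus the byte of the pixel to its left); Up (each byte minus the byte of the pixel above); Average (each byte minus the floored mean of the left and upper bytes); and Paeth (each byte minus whichever of the left, upper, or upper-left bytes is closest to the estimate left plus above minus upper-left). The encoder may pick a different filter for every row. In smooth regions these differences are small numbers clustered near zero, so they are much more compressible than the original absolute values.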
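A sketch of the Sub, Up, and Paeth filters on one scanline, using the Paeth predictor as given in the PNG specification. It assumes `bpp` (bytes per complete pixel, e.g. 3 for 8-bit RGB) and, as PNG does at the image edges, treats missing neighbors as zero; `filter_row` is a hypothetical helper, not a libpng API.

```python
from typing import List


def paeth_predictor(a: int, b: int, c: int) -> int:
    """a = left, b = above, c = upper-left; return the neighbor closest to a + b - c."""
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    if pb <= pc:
        return b
    return c


def filter_row(kind: str, row: List[int], prior: List[int], bpp: int = 1) -> List[int]:
    """Apply the 'sub', 'up', or 'paeth' filter to one scanline of bytes.
    `prior` is the unfiltered row above (all zeros for the first row)."""
    out = []
    for i, x in enumerate(row):
        a = row[i - bpp] if i >= bpp else 0      # byte of the pixel to the left
        b = prior[i]                             # byte of the pixel above
        c = prior[i - bpp] if i >= bpp else 0    # byte of the pixel above-left
        if kind == "sub":
            pred = a
        elif kind == "up":
            pred = b
        elif kind == "paeth":
            pred = paeth_predictor(a, b, c)
        else:
            raise ValueError(f"unknown filter: {kind}")
        out.append((x - pred) % 256)             # differences wrap modulo 256
    return out


print(filter_row("sub", [10, 12, 14, 16], [0, 0, 0, 0]))   # [10, 2, 2, 2]
```

If the row above is `[9, 11, 13, 15]`, both Up and Paeth turn `[10, 12, 14, 16]` into `[1, 1, 1, 1]`: a ramp becomes a run of identical small values.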
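The filtered data is then compressed with DEFLATE (wrapped in a zlib stream), the algorithm also used in ZIP and gzip files. DEFLATE combines LZ77, which finds repeated byte sequences and replaces them with back-references to an earlier occurrence, with Huffman coding. Graphics, screenshots, and filtered smooth regions contain many exactly repeated sequences (flat areas, repeated patterns, and constant gradients that filter into runs of the same difference), and LZ77 back-references capture this redundancy efficiently. Photographs contain sensor noise, so exact repeats are much rarer.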
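Python's standard `zlib` module implements DEFLATE in the same zlib wrapper PNG uses, so the effect of filtering can be tried directly. The synthetic image (64 rows of 256 grayscale bytes, each row a ramp with its own start and slope) is an assumption made for illustration. As in PNG's image data, each filtered row is prefixed with a filter-type byte (1 = Sub).

```python
import zlib
from typing import List


def sub_filter(row: List[int]) -> List[int]:
    """PNG Sub filter for 1 byte per pixel: each byte minus its left neighbor, mod 256."""
    return [(row[i] - (row[i - 1] if i else 0)) % 256 for i in range(len(row))]


def unsub(filtered: List[int]) -> List[int]:
    """Inverse of sub_filter: add back the already-reconstructed left neighbor."""
    out: List[int] = []
    for i, d in enumerate(filtered):
        out.append((d + (out[i - 1] if i else 0)) % 256)
    return out


WIDTH, HEIGHT = 256, 64
rows = [[(y * 37 + x * (y % 5 + 1)) % 256 for x in range(WIDTH)] for y in range(HEIGHT)]

raw = bytes(v for row in rows for v in row)
filtered = b"".join(bytes([1]) + bytes(sub_filter(row)) for row in rows)

raw_z = zlib.compress(raw, 9)
filtered_z = zlib.compress(filtered, 9)
print(len(raw), len(raw_z), len(filtered_z))

# Decoding reverses both stages exactly: inflate, then undo the filter row by row.
stream = zlib.decompress(filtered_z)
decoded = [unsub(list(stream[i + 1:i + 1 + WIDTH])) for i in range(0, len(stream), WIDTH + 1)]
assert decoded == rows
```

Each filtered row here is a type byte, a start value, and then one slope value repeated 255 times, which LZ77 collapses into a single back-reference; on this data the filtered stream compresses to noticeably fewer bytes than the raw one.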
PNG compression is fully lossless: the original pixel values are restored exactly on decompression. This makes PNG ideal for graphics with sharp edges, text, logos, and any content where exact pixel values matter. For photographs, PNG files are typically much larger than JPEG files of similar visual quality, because lossless coding must preserve sensor noise that a lossy encoder would discard.
WebP: VP8 Prediction Coding
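WebP was developed by Google based on the VP8 video codec; its lossy mode stores a single VP8 keyframe. Like JPEG, it transforms blocks with a DCT-style transform and quantizes the coefficients, but it adds a step JPEG lacks: each block is first predicted from neighboring pixels that have already been decoded.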
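The image is divided into 16×16 macroblocks (the subsampled chroma planes use 8×8 blocks). For each macroblock, the encoder tries several prediction modes: extrapolating the column of pixels to the left across the block, extrapolating the row above down the block, a combination of both (called TrueMotion: left plus above minus the corner pixel), or a DC (flat) prediction using the average of those neighbors. Luma can alternatively be predicted as sixteen 4×4 sub-blocks, each with its own mode from a larger set that includes diagonal directions. The encoder selects the prediction that most closely matches the actual block content.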
Only the residual, the difference between the prediction and the actual block, is then transformed (with a 4×4 DCT-like integer transform), quantized, and encoded. Where the prediction is accurate, the residual is small and compresses extremely well. Prediction, together with VP8's arithmetic entropy coding, is a large part of why WebP often produces files roughly 25-35% smaller than JPEG at equivalent visual quality.
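The mode search can be sketched as follows. This is a simplified model, not libwebp's encoder: it implements the four whole-block modes described above (left, top, the combination known in VP8 as TrueMotion, and DC) for any block size, predicts from supplied neighbor pixels, and picks the mode with the smallest sum of absolute differences (SAD). Real encoders predict from reconstructed neighbors and weigh distortion against coded size; all names here are hypothetical.

```python
from typing import Dict, List, Tuple

Block = List[List[int]]


def clamp255(v: int) -> int:
    return max(0, min(255, v))


def predict_modes(above: List[int], left: List[int], corner: int) -> Dict[str, Block]:
    """Candidate predictions from the row above, the left column, and the above-left pixel."""
    n = len(above)
    dc = (sum(above) + sum(left) + n) // (2 * n)   # rounded mean of the 2n neighbors
    return {
        "DC": [[dc] * n for _ in range(n)],
        "V": [list(above) for _ in range(n)],        # row above copied downward
        "H": [[left[r]] * n for r in range(n)],      # left column copied rightward
        "TM": [[clamp255(left[r] + above[c] - corner) for c in range(n)] for r in range(n)],
    }


def choose_mode(block: Block, above: List[int], left: List[int], corner: int) -> Tuple[str, Block]:
    """Return the lowest-SAD mode and the residual (actual minus prediction)."""
    n = len(block)
    preds = predict_modes(above, left, corner)

    def sad(pred: Block) -> int:
        return sum(abs(block[r][c] - pred[r][c]) for r in range(n) for c in range(n))

    mode = min(preds, key=lambda m: sad(preds[m]))
    residual = [[block[r][c] - preds[mode][r][c] for c in range(n)] for r in range(n)]
    return mode, residual


# A 4x4 block whose rows continue the left neighbors, with a slight drift to the right.
left = [10, 50, 90, 130]
above = [200, 210, 220, 230]
block = [[v, v, v + 1, v + 1] for v in left]
mode, residual = choose_mode(block, above, left, corner=200)
print(mode, residual)   # H [[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 1, 1]]
```

The residual is nothing but zeros and ones, far cheaper to code than the raw values 10 to 131; in VP8 it would next go through the 4×4 transform and quantization.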
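Lossless WebP uses a completely different format that does not reuse VP8. It first applies reversible transforms, including a spatial predictor transform, and then codes the result with LZ77-style backward references (pointing to earlier regions of the image), Huffman coding, and a color cache that holds recently seen colors so that a repeated color can be sent as a short cache index. Unlike PNG, which chooses one filter per row, the predictor transform divides the image into square blocks and chooses one of 14 prediction modes for each block, letting it adapt to local two-dimensional structure.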
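The color cache idea can be sketched in a few lines. The hash is modeled on the multiplicative hash described in the WebP lossless bitstream specification (`0x1e35a7bd` times the 32-bit ARGB value, keeping the top `cache_bits` bits); the surrounding encoder is a simplification that ignores backward references and entropy coding, and all function names are hypothetical.

```python
from typing import List, Optional, Tuple

Symbol = Tuple[str, int]   # ("literal", argb) or ("cache", index)


def cache_index(argb: int, cache_bits: int) -> int:
    """Multiplicative hash of a 32-bit ARGB color into a table of 2**cache_bits slots."""
    return ((0x1E35A7BD * argb) & 0xFFFFFFFF) >> (32 - cache_bits)


def encode(pixels: List[int], cache_bits: int = 4) -> List[Symbol]:
    cache: List[Optional[int]] = [None] * (1 << cache_bits)
    symbols: List[Symbol] = []
    for argb in pixels:
        idx = cache_index(argb, cache_bits)
        if cache[idx] == argb:
            symbols.append(("cache", idx))      # repeated color: send only the short index
        else:
            symbols.append(("literal", argb))   # new or evicted color: send it in full
        cache[idx] = argb                       # every pixel updates the cache
    return symbols


def decode(symbols: List[Symbol], cache_bits: int = 4) -> List[int]:
    cache: List[Optional[int]] = [None] * (1 << cache_bits)
    pixels: List[int] = []
    for kind, value in symbols:
        argb = cache[value] if kind == "cache" else value
        cache[cache_index(argb, cache_bits)] = argb   # mirror the encoder's update
        pixels.append(argb)
    return pixels


RED, BLUE = 0xFFFF0000, 0xFF0000FF
pixels = [RED, RED, BLUE, RED, BLUE, BLUE]
symbols = encode(pixels)
print(symbols)
assert decode(symbols) == pixels
```

Because the decoder updates its cache exactly as the encoder did, a cache index is enough to recover the full color, and the round trip is lossless.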
Lossless vs Lossy: The Fundamental Distinction
Lossless compression preserves every bit of the original data: the decoded pixels are identical to the input, bit for bit. PNG and lossless WebP are lossless, so you can encode and decode any number of times and always get back the exact original.
Lossy compression discards data that is deemed perceptually unimportant, so the decoded image is only an approximation of the original. JPEG and lossy WebP are lossy, and every lossy encode can add new error. Decoding a JPEG, editing it, and saving it as JPEG again quantizes data that is already approximate, so repeated saves accumulate degradation (generation loss), especially when the quality setting or the image content changes between saves.
The practical implication: always keep original source files in lossless or raw format. Export to JPEG or lossy WebP only when you have a final version ready for distribution. Never use a JPEG as the source file for further editing.
Why Not All Algorithms Work Equally Well
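JPEG struggles with high-contrast boundaries for two reasons. A sharp edge puts much of its energy into high-frequency coefficients, which are exactly the ones coarse quantization removes, so the decoded edge rings with halos and "mosquito noise"; and because blocks are coded independently, the 8×8 grid can also become visible. This is why JPEG works well for photographs (which have mostly smooth, gradual color transitions) but looks poor for text, cartoons, and line art (which have many sharp edges).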
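A one-dimensional toy illustrates the ringing. It takes an 8-sample step edge, computes its orthonormal DCT, zeroes everything above the lowest four frequencies (a crude stand-in for coarse quantization, not JPEG's actual rounding), and inverts the transform. The helper names are illustrative.

```python
import math
from typing import List


def dct(x: List[float]) -> List[float]:
    """Orthonormal 1D DCT-II."""
    n = len(x)
    return [
        (math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
        * sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        for k in range(n)
    ]


def idct(coeffs: List[float]) -> List[float]:
    """Inverse of dct() (orthonormal DCT-III)."""
    n = len(coeffs)
    return [
        sum((math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
            * coeffs[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for k in range(n))
        for i in range(n)
    ]


edge = [0, 0, 0, 0, 255, 255, 255, 255]   # a hard black-to-white edge
coeffs = dct(edge)
low_pass = [c if k < 4 else 0.0 for k, c in enumerate(coeffs)]   # drop high frequencies
recon = idct(low_pass)
print([round(v) for v in recon])
```

The reconstruction overshoots past 255 and dips below 0 on either side of the edge. A decoder clamps those values to the valid range, but the samples near the edge still oscillate instead of staying flat, which is the halo seen around text in low-quality JPEGs.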
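PNG's filters do look at the row above (Up, Average, Paeth), but a single filter is chosen for each entire row and every prediction uses only the immediately adjacent left, upper, and upper-left bytes. This works well for flat areas, horizontal patterns, and simple gradients, but adapts less well to images whose structure changes within a row or follows complex diagonal or two-dimensional patterns.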
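The PNG specification defines five filter types (None, Sub, Up, Average, Paeth) and suggests a simple heuristic for choosing each row's filter: try all five and keep the one whose output has the smallest sum of absolute values, reading each byte as a signed difference. A sketch of that heuristic, with hypothetical function names and ties resolved toward the lower filter number:

```python
from typing import List, Tuple

FILTER_NAMES = ["None", "Sub", "Up", "Average", "Paeth"]


def paeth_predictor(a: int, b: int, c: int) -> int:
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    return b if pb <= pc else c


def apply_filter(ftype: int, row: List[int], prior: List[int], bpp: int) -> List[int]:
    """Filter one scanline with PNG filter type 0-4 (missing neighbors count as 0)."""
    out = []
    for i, x in enumerate(row):
        a = row[i - bpp] if i >= bpp else 0      # left
        b = prior[i]                             # above
        c = prior[i - bpp] if i >= bpp else 0    # upper-left
        pred = (0, a, b, (a + b) // 2, paeth_predictor(a, b, c))[ftype]
        out.append((x - pred) % 256)
    return out


def signed_cost(filtered: List[int]) -> int:
    """Sum of absolute values, reading each byte as a signed difference (-128..127)."""
    return sum(v if v < 128 else 256 - v for v in filtered)


def choose_filter(row: List[int], prior: List[int], bpp: int = 1) -> Tuple[int, List[int]]:
    """Try all five filters and keep the cheapest one for this row."""
    best = min(range(5), key=lambda t: signed_cost(apply_filter(t, row, prior, bpp)))
    return best, apply_filter(best, row, prior, bpp)


ftype, data = choose_filter([10, 12, 14, 16], [0, 0, 0, 0])
print(FILTER_NAMES[ftype], data)   # Sub [10, 2, 2, 2]
```

Because the whole row shares one filter type, a row whose content changes character partway along has to settle for whichever filter is least bad on average.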
WebP's 2D block prediction handles a broader range of content types, and WebP tends to outperform JPEG (in lossy mode) and PNG (in lossless mode) in quality/size trade-offs. Of the three formats covered here, WebP is usually the most efficient choice for web use, for photographs and graphics alike.