Why PDF Compression Is More Complex Than It Looks
Compressing a PDF is not a single operation. A PDF file contains multiple types of content — text, vector graphics, raster images, fonts, and structural metadata — each of which requires a different compression strategy. The "compress PDF" button in most tools applies a bundle of techniques simultaneously, but understanding each one helps you make smarter choices.
The most impactful element is almost always raster images. A single high-resolution photograph can account for 90% of a PDF's file size. Text and vector graphics compress extremely efficiently and rarely need aggressive treatment. Fonts, if embedded fully, can add several megabytes, but font subsetting usually handles this well.
Lossless Compression
Lossless compression reduces file size without discarding any data. The original content can be perfectly reconstructed from the compressed version. In PDF, lossless compression is applied to text streams, vector graphics, and certain image types using algorithms like Flate (essentially ZIP/DEFLATE) or LZW.
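As a minimal sketch of what "perfectly reconstructed" means, the snippet below runs a made-up content-stream fragment through Python's `zlib` module, which implements the same DEFLATE algorithm that Flate uses. The `content_stream` bytes are a hypothetical example, not taken from a real PDF.

```python
import zlib

# Hypothetical content-stream fragment: text-drawing operators repeat heavily.
content_stream = b"BT /F1 12 Tf 72 720 Td (Hello, world) Tj ET\n" * 200

# Level 9 trades CPU time for the smallest output zlib can produce.
compressed = zlib.compress(content_stream, 9)
restored = zlib.decompress(compressed)

print(f"original:   {len(content_stream)} bytes")
print(f"compressed: {len(compressed)} bytes")
print(f"round trip identical: {restored == content_stream}")
```

The restored bytes compare equal to the input, which is the defining property of lossless compression.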
For text content, lossless compression is always appropriate. There is no reason to lose data when compressing a character stream — the file size savings from lossy text compression would be negligible anyway, since text streams are already compact.
Images that originate as PNG are stored in a PDF as Flate-compressed streams, so they stay lossless. Screenshots, diagrams, charts, and any graphic with sharp edges and flat colors benefit from lossless compression because lossy algorithms introduce visible artifacts around high-contrast boundaries.
The trade-off with lossless compression is its ceiling: you can only shrink a file so much without discarding data. A 10 MB JPEG image inside a PDF will remain close to 10 MB after lossless recompression because JPEG data is already compressed.
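The ceiling is easy to observe. In the sketch below, seeded pseudo-random bytes stand in for already-compressed JPEG data (an assumption made for illustration: they are high-entropy bytes, not a real JPEG), and a repeated sentence stands in for an uncompressed text stream. `flate_ratio` is a helper name invented here.

```python
import random
import zlib

def flate_ratio(data: bytes) -> float:
    """Return compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data, 9)) / len(data)

rng = random.Random(0)  # fixed seed so the run is repeatable
# Stand-in for JPEG data: high-entropy bytes, NOT a real JPEG file.
already_compressed = bytes(rng.getrandbits(8) for _ in range(100_000))
# Stand-in for an uncompressed text stream.
plain_text = b"The quick brown fox jumps over the lazy dog. " * 2_000

print(f"text stream:        {flate_ratio(plain_text):.3f}")
print(f"high-entropy bytes: {flate_ratio(already_compressed):.3f}")
```

The text shrinks to a small fraction of its size, while the high-entropy input stays at roughly its original size because there is no redundancy left to remove.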
Lossy Compression
Lossy compression permanently discards some data to achieve much greater size reduction. Once applied, the original cannot be perfectly restored. For images, this typically means reducing color precision, blurring fine details, or introducing block-shaped artifacts.
JPEG is the most common lossy compression for photographic images. It works by transforming image data into frequency components and discarding the high-frequency components that human eyes are least sensitive to. At quality level 80 out of 100, JPEG typically reduces image data by 60-70% compared to lossless PNG with barely perceptible quality loss in natural photographs.
The problem arises with images that have sharp edges, text, or solid colors. JPEG compression creates ringing artifacts around text embedded in images and blurry boundaries where flat colors meet. For these images, lossless compression such as Flate is a better choice. WebP is not an option here, because PDF has no native WebP image filter.
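Both effects, the frequency transform and the trouble with sharp edges, can be seen in a simplified one-dimensional sketch. Real JPEG works on 8×8 two-dimensional blocks and quantizes coefficients rather than zeroing them outright, so treat this as an illustration of the principle only. The function names and the two sample signals are invented for this example.

```python
import math

def dct(signal):
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(signal)
    out = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        out.append(scale * sum(
            x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i, x in enumerate(signal)))
    return out

def idct(coeffs):
    """Inverse of dct() (orthonormal DCT-III)."""
    n = len(coeffs)
    out = []
    for i in range(n):
        out.append(sum(
            math.sqrt((1 if k == 0 else 2) / n) * c
            * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for k, c in enumerate(coeffs)))
    return out

def drop_high_frequencies(signal, keep):
    """Zero every coefficient after the first `keep`, then reconstruct."""
    coeffs = dct(signal)
    truncated = coeffs[:keep] + [0.0] * (len(coeffs) - keep)
    return idct(truncated)

def max_error(a, b):
    """Largest absolute difference between two equal-length sequences."""
    return max(abs(x - y) for x, y in zip(a, b))

smooth = [32 * i for i in range(8)]        # gentle gradient, like a photo
edge = [0, 0, 0, 0, 255, 255, 255, 255]    # hard edge, like text on white

smooth_out = drop_high_frequencies(smooth, keep=4)
edge_out = drop_high_frequencies(edge, keep=4)

print("smooth max error:", round(max_error(smooth, smooth_out), 2))
print("edge max error:  ", round(max_error(edge, edge_out), 2))
print("edge reconstruction:", [round(v) for v in edge_out])
```

With half the coefficients dropped, the gradient is reconstructed almost exactly, while the hard edge comes back with values that undershoot 0 and overshoot 255 on either side of the step. That oscillation is the one-dimensional counterpart of the ringing seen around text in JPEG images.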
Quality Levels and What They Mean
PDF compression tools typically offer quality presets. Understanding what each preset actually does helps set expectations.
**Screen / Low quality (72-96 DPI):** Images are downsampled to match screen resolution and compressed with aggressive JPEG quality settings (around 50-65). File size reduction is dramatic — often 70-90% smaller. Appropriate for PDFs that will only ever be viewed on screen and never printed. Text remains crisp because it is vector data, not raster.
**eBook / Medium quality (150 DPI):** A good balance for most use cases. Images are downsampled to 150 DPI and compressed with JPEG quality around 75-80. Reduction is typically 50-70%. Suitable for documents shared digitally where occasional printing at letter size is acceptable.
**Print / High quality (300 DPI):** Images are preserved at print resolution with mild compression (JPEG quality 85-90). Size reduction is modest, perhaps 20-40%, but output quality is generally indistinguishable from the original at normal viewing and printing sizes.
**Prepress / Maximum quality:** Minimal or no image compression. Suitable for professional printing workflows where color fidelity is critical. File sizes remain large.
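The preset names above closely mirror the `-dPDFSETTINGS` values of Ghostscript's `pdfwrite` device (`/screen`, `/ebook`, `/printer`, `/prepress`). Assuming Ghostscript is the tool in use, which the tiers above do not require, a small wrapper might look like the sketch below. The mapping is approximate: Ghostscript's internal JPEG settings will not match the quality numbers quoted above exactly. `build_gs_command` and `compress_pdf` are names invented here, and on Windows the executable is typically `gswin64c` rather than `gs`.

```python
import subprocess

# Ghostscript pdfwrite presets; the mapping to the tiers above is approximate.
GS_PRESETS = {
    "screen": "/screen",
    "ebook": "/ebook",
    "print": "/printer",
    "prepress": "/prepress",
}

def build_gs_command(src, dst, preset="ebook", gs_binary="gs"):
    """Build (but do not run) a Ghostscript command line for one preset."""
    if preset not in GS_PRESETS:
        raise ValueError(f"unknown preset: {preset!r}")
    return [
        gs_binary,
        "-sDEVICE=pdfwrite",
        f"-dPDFSETTINGS={GS_PRESETS[preset]}",
        "-dNOPAUSE", "-dBATCH", "-dQUIET",
        f"-sOutputFile={dst}",
        src,
    ]

def compress_pdf(src, dst, preset="ebook"):
    """Run Ghostscript; requires the gs executable on PATH."""
    subprocess.run(build_gs_command(src, dst, preset), check=True)

# Show the command that would be executed, without touching any files.
print(" ".join(build_gs_command("report.pdf", "report-ebook.pdf")))
```

Writing to a separate output path keeps the source file untouched, so several presets can be tried against the same original.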
When Quality Loss Is Visible
The visibility of quality loss depends on three factors: the original image content, the compression level applied, and how the output is used.
Natural photographs are forgiving. The human visual system is not very sensitive to the specific pixel values in a photograph, especially in areas of gradual tonal variation. Aggressive JPEG compression on photos is often invisible to casual observers at normal viewing sizes.
Images with text, fine lines, or sharp boundaries are unforgiving. A PDF containing scanned documents or presentations with text rendered as images will show clear degradation at moderate compression levels. If the PDF contains scanned pages, consider running OCR before compressing. The recognized text layer compresses without quality loss and stays searchable, even when the page images behind it are compressed more aggressively.
Color gradients and illustrations with flat areas can show banding or blockiness at high compression. For business graphics with solid fills and exact brand colors, lossless compression is safer.
Practical Guidelines
For most general-purpose PDFs — presentations, reports, brochures — the medium quality preset at 150 DPI is the right starting point. It typically achieves a 50-70% reduction while maintaining acceptable quality for both screen and light printing.
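Much of that saving comes from downsampling alone, because pixel count scales with the square of the resolution. The arithmetic below assumes a full-page image on US letter paper (8.5 × 11 inches) stored as uncompressed 8-bit RGB. Those dimensions and the helper names are illustrative choices, and the figures are pixel budgets before any JPEG compression is applied.

```python
def pixel_count(width_in, height_in, dpi):
    """Pixels in an image covering width_in x height_in inches at `dpi`."""
    return round(width_in * dpi) * round(height_in * dpi)

def raw_rgb_bytes(width_in, height_in, dpi):
    """Uncompressed size at 8 bits per channel, three channels."""
    return pixel_count(width_in, height_in, dpi) * 3

for dpi in (300, 150, 72):
    pixels = pixel_count(8.5, 11, dpi)
    mb = raw_rgb_bytes(8.5, 11, dpi) / 1_000_000
    print(f"{dpi:>3} DPI: {pixels:>9,} pixels, {mb:6.2f} MB raw")
```

Halving the resolution from 300 to 150 DPI leaves one quarter of the pixels, which is why the medium preset can remove so much data before JPEG quality is even considered.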
Always keep the original file. Lossy compression is irreversible, and the "optimal" setting varies by use case. Compress a copy, compare it visually with the original at 100% zoom, and only distribute the compressed version when you are satisfied with the quality.
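A small helper can make the size side of that comparison concrete. This is a sketch with invented function names. It only measures bytes saved and says nothing about visual quality, which still needs a side-by-side look.

```python
import os

def reduction_percent(original_size, compressed_size):
    """Percentage saved relative to the original size (negative if it grew)."""
    if original_size <= 0:
        raise ValueError("original_size must be positive")
    return (1 - compressed_size / original_size) * 100

def compare_files(original_path, compressed_path):
    """Return (before, after, percent saved) without modifying either file."""
    before = os.path.getsize(original_path)
    after = os.path.getsize(compressed_path)
    return before, after, reduction_percent(before, after)

# Example with plain numbers: a 10 MB file compressed to 3.5 MB.
print(f"{reduction_percent(10_000_000, 3_500_000):.1f}% smaller")
```

A negative result means the "compressed" copy is larger than the original, which can happen when a file was already well optimized.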
For PDFs that will be archived for legal or compliance purposes, prioritize quality over file size. A few extra megabytes is a trivial storage cost compared to the problems that arise when compressed documents are found to be unreadable years later.