Why PDF Files Get So Large in the First Place
The biggest culprit is scanning. An A4 page captured at 300 DPI is roughly 8.7 million pixels; at 600 DPI it is four times that. Office scanners frequently store those pixels with lossless or lightly compressed encodings, so a single page can weigh several megabytes and a modest contract bundle balloons past the limits of every mail server it meets. Born-digital PDFs have their own weight problems: full font programs embedded once per source application, high-resolution logos pasted onto every page, and layers of duplicate resources accumulated through years of edits and re-saves. The result is familiar to every office worker — a "couldn't send: attachment too large" bounce at the worst possible moment.
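The pixel figures above can be checked with a few lines of arithmetic. The sketch below assumes the ISO A4 size of 210 × 297 mm and uncompressed 24-bit colour (3 bytes per pixel); the function and field names are illustrative, not part of any scanner software.

```typescript
// Millimetres per inch, used to convert ISO paper sizes to inches.
const MM_PER_INCH = 25.4;

interface RasterSize {
  widthPx: number;
  heightPx: number;
  pixels: number;
  rawRgbBytes: number; // uncompressed 24-bit colour, 3 bytes per pixel (assumption)
}

// Pixel dimensions of a page scanned at a given resolution.
function rasterSize(widthMm: number, heightMm: number, dpi: number): RasterSize {
  const widthPx = Math.round((widthMm / MM_PER_INCH) * dpi);
  const heightPx = Math.round((heightMm / MM_PER_INCH) * dpi);
  const pixels = widthPx * heightPx;
  return { widthPx, heightPx, pixels, rawRgbBytes: pixels * 3 };
}

const a4At300 = rasterSize(210, 297, 300);
const a4At600 = rasterSize(210, 297, 600);

console.log(a4At300.pixels);                  // roughly 8.7 million
console.log(a4At600.pixels / a4At300.pixels); // roughly 4
console.log(a4At300.rawRgbBytes / 1e6);       // roughly 26 (MB before any compression)
```

Doubling the resolution doubles both dimensions, which is why the pixel count quadruples rather than doubles.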
How Local Rasterization Compression Works
This compressor uses the same strategy commercial scan software uses, executed entirely on your own machine. First, the pdf.js rendering engine (the one built into Firefox) paints each page onto an offscreen canvas at a controlled resolution. Second, that canvas is re-encoded as a JPEG, where quantization and chroma subsampling discard visual detail human eyes barely register, which is where the dramatic savings come from. Third, pdf-lib wraps the optimized images into a brand-new document with the original page dimensions, so the file looks and prints the same size as before. Every byte of this pipeline lives in your browser's memory: the document is never transmitted, and the before-and-after sizes reported at the end are measured locally.
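The sizing arithmetic behind steps one and three can be sketched as follows. It covers only the planning, not the rendering or encoding, and rests on one assumption: page sizes are expressed in PDF points (1/72 inch), so a render scale of dpi / 72 yields the target resolution. In pdf.js such a scale would typically be passed to `page.getViewport({ scale })`, and in pdf-lib the preserved size to `addPage([width, height])`. The names `planPage` and `RasterPlan` are hypothetical.

```typescript
// PDF user space defaults to 72 units ("points") per inch.
const POINTS_PER_INCH = 72;

interface PageSizePt {
  widthPt: number;
  heightPt: number;
}

interface RasterPlan {
  scale: number;          // render scale relative to 72 DPI
  canvasWidthPx: number;  // offscreen canvas size for step one
  canvasHeightPx: number;
  outputWidthPt: number;  // page size for step three, unchanged from the source
  outputHeightPt: number;
}

// Work out how large the canvas must be for a target resolution,
// while keeping the rebuilt page at its original physical size.
function planPage(page: PageSizePt, targetDpi: number): RasterPlan {
  if (targetDpi <= 0) {
    throw new RangeError("targetDpi must be positive");
  }
  const scale = targetDpi / POINTS_PER_INCH;
  return {
    scale,
    canvasWidthPx: Math.round(page.widthPt * scale),
    canvasHeightPx: Math.round(page.heightPt * scale),
    outputWidthPt: page.widthPt,
    outputHeightPt: page.heightPt,
  };
}

// US Letter (612 x 792 pt) rendered at 144 DPI.
const letterPlan = planPage({ widthPt: 612, heightPt: 792 }, 144);
console.log(letterPlan.scale, letterPlan.canvasWidthPx, letterPlan.canvasHeightPx);
// prints: 2 1224 1584
```

Because the output page keeps its size in points while the image shrinks in pixels, the compressed file prints at the same physical dimensions as the original.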
The three levels trade resolution and JPEG quality against file size. Light renders at high resolution with gentle compression — visually near-identical, suitable for documents that will be printed. Balanced is the recommended default for screen reading and email. Extreme prioritizes minimum size for upload caps and slow connections. One honest trade-off applies to all three: because pages are re-encoded as images, text in the output is no longer selectable or searchable. Keep the original as your working copy; treat the compressed file as the shipping copy.
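One way to picture the three levels is as a small preset table. The numbers below are illustrative assumptions only, since the text does not state the tool's actual settings. The quality values use the 0 to 1 range that the browser's `canvas.toBlob(callback, "image/jpeg", quality)` accepts.

```typescript
type Level = "light" | "balanced" | "extreme";

interface Preset {
  dpi: number;         // render resolution
  jpegQuality: number; // 0..1, as accepted by canvas.toBlob for JPEG
}

// Hypothetical values chosen for illustration; not the tool's real settings.
const PRESETS: Record<Level, Preset> = {
  light: { dpi: 200, jpegQuality: 0.85 },
  balanced: { dpi: 150, jpegQuality: 0.7 },
  extreme: { dpi: 96, jpegQuality: 0.5 },
};

// Pixel count grows with the square of the resolution, so this returns
// how many pixels a level produces relative to a baseline level.
function relativePixelCount(level: Level, baseline: Level = "light"): number {
  const ratio = PRESETS[level].dpi / PRESETS[baseline].dpi;
  return ratio * ratio;
}

console.log(relativePixelCount("balanced")); // 0.5625
console.log(relativePixelCount("extreme"));  // about 0.23
```

Under these assumed presets, dropping the resolution alone cuts the pixel count roughly in half at Balanced and to under a quarter at Extreme, before the lower JPEG quality saves anything further.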
Choosing the Right Level — and When Not to Compress
Match the level to the destination. Most corporate mail systems cap attachments at 10–25 MB: Balanced clears that for almost any scanned agreement, and Extreme handles hundred-page binders. Portals with hard upload limits (government filings, job applications, university submissions) are Extreme territory. Conversely, skip compression entirely when the recipient needs to search, copy or legally rely on the text layer, when the document will be processed by OCR or an e-filing system, or when it is the only copy you hold. If you are assembling a packet, merge first with Merge PDF and compress once at the end: a single pass over the final document avoids re-encoding pages that were already compressed, which would cost quality a second time.
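The guidance above can be written down as a small decision function. This is a sketch under assumptions: the `Job` fields, the 100-page threshold for a "binder", and the order in which the rules are checked are all illustrative choices, not behaviour of the tool.

```typescript
type Recommendation = "skip" | "light" | "balanced" | "extreme";

interface Job {
  needsTextLayer: boolean;    // recipient must search, copy or rely on the text
  willBeOcrOrEfiled: boolean; // downstream OCR or e-filing system
  isOnlyCopy: boolean;        // no separate original is kept
  willBePrinted: boolean;
  destination: "email" | "portal";
  pageCount: number;
}

// Assumed threshold for a "hundred-page binder".
const BINDER_PAGES = 100;

function recommendLevel(job: Job): Recommendation {
  // Reasons not to compress take priority over everything else.
  if (job.needsTextLayer || job.willBeOcrOrEfiled || job.isOnlyCopy) {
    return "skip";
  }
  // Hard upload caps and very long documents call for minimum size.
  if (job.destination === "portal" || job.pageCount >= BINDER_PAGES) {
    return "extreme";
  }
  // Documents headed for a printer keep the most detail.
  if (job.willBePrinted) {
    return "light";
  }
  return "balanced";
}

console.log(
  recommendLevel({
    needsTextLayer: false,
    willBeOcrOrEfiled: false,
    isOnlyCopy: false,
    willBePrinted: false,
    destination: "email",
    pageCount: 12,
  }),
); // "balanced"
```

Checking the "skip" conditions first reflects the point that no size saving justifies losing a text layer the recipient depends on.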
Lossy vs Lossless: Understanding the Trade-Off Honestly
Many PDF size reducers advertise dramatic savings without explaining where the bytes went, which is how users end up archiving a degraded file as their only copy. The honest taxonomy is simple. Lossless optimization — rewriting internal structures, deduplicating resources, packing objects into compressed streams — preserves every property of the document but typically saves only ten to twenty percent, because the heavy payload in a large PDF is image data that is already encoded. Lossy compression, the method used here, re-encodes that image data at a lower fidelity and is the only approach that reliably turns a fifty-megabyte scan into a two-megabyte attachment. The professional workflow follows from that distinction: treat the original as the archival master and never overwrite it; produce a compressed copy at the moment of sending, sized for the channel it must pass through; and if a recipient later needs the pristine version, generate a fresh copy from the master rather than re-compressing the compressed file, since quality losses compound across generations exactly as they do with re-saved JPEG photographs.
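The gap between the two approaches is easy to quantify with the figures quoted above. The helper names below are illustrative, and the inputs are the text's own example numbers rather than measurements.

```typescript
// Percentage of the original size that was removed.
function percentSaved(beforeMb: number, afterMb: number): number {
  if (beforeMb <= 0) {
    throw new RangeError("beforeMb must be positive");
  }
  return ((beforeMb - afterMb) / beforeMb) * 100;
}

// Size that remains after saving a given percentage.
function sizeAfterSaving(beforeMb: number, percent: number): number {
  return beforeMb * (1 - percent / 100);
}

const scanMb = 50;

// Lossless optimization at the quoted 10 to 20 percent.
console.log(sizeAfterSaving(scanMb, 10)); // about 45 MB
console.log(sizeAfterSaving(scanMb, 20)); // about 40 MB

// Lossy re-encoding from 50 MB down to 2 MB.
console.log(percentSaved(scanMb, 2)); // about 96 percent
```

On these numbers, a losslessly optimized 50 MB scan would still exceed a 25 MB mail cap, while the lossy copy fits with room to spare.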