


The big squeeze

12th April 2006 [MacUser]

Lossless data compression as a rule produces bigger files than lossy compression, but it's generally used where any data loss could be catastrophic; you don't examine every blade of grass in a photograph, but God help you if a figure in a spreadsheet gets knocked out of whack.

According to Darryl Lovato, one of the co-founders and chief technical officer of the company that brought us StuffIt, most lossless compression is based on prediction and pattern recognition. The prediction methods used are similar to the algorithms employed in the analysis of the stock market. Put simply, the system can look at a text file, say, and, if it sees the word 'Apple' appear often, can begin to assume that a word beginning 'Ap...' (particularly with that case change) is likely to be 'Apple'. That's only one of the techniques used; run-length encoding is another.
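As a rough illustration of the prediction idea, the sketch below counts how often each word follows a given prefix and turns that into a probability. The function names and the sample sentence are invented for this example and are not taken from StuffIt; the only point is that a well-predicted word costs fewer bits to store.

```python
import math
from collections import Counter

def completion_probabilities(text, prefix):
    """Estimate how likely each word is, given that it starts with `prefix`."""
    words = [w for w in text.split() if w.startswith(prefix)]
    counts = Counter(words)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

def cost_in_bits(probability):
    """Ideal code length, in bits, for an event with this probability."""
    return -math.log2(probability)

# Hypothetical sample text: 'Apple' appears three times, 'April' once.
sample = ("Apple makes the Mac and Apple makes the iPod "
          "but April is a month and Apple is a company")

probs = completion_probabilities(sample, "Ap")
for word, p in sorted(probs.items()):
    print(word, p, round(cost_in_bits(p), 2))
```

With this sample, 'Apple' is the likelier completion of 'Ap', so an encoder working on these estimates would spend under half a bit on it, against two bits for the rarer 'April'.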

Media files are much more space-hungry than text documents, though, and we can use lossy compression for these. It works on the principle that we won't mind sacrificing some detail; indeed, in many cases, we won't even notice. The human eye can discriminate around ten million colours. Most modern computers can - at least in theory - display almost 17 million colours. How many of us would notice if it displayed only half as many discrete colours? In reality, the tricks lossy encoders use are much more subtle and attuned to our perceptions, but the principle is the same whether you're saving a photo as a Jpeg, ripping a CD to MP3 or exporting a home movie to H.264.
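The 'half as many colours' thought experiment can be tried in a few lines. This is only a sketch of the crudest possible lossy step, assuming 8-bit red, green and blue channels and dropping the lowest bit of one of them; real encoders such as Jpeg work very differently, and the function names here are made up for the example.

```python
def reduce_depth(value, bits):
    """Keep only the top `bits` bits of an 8-bit channel value (0-255)."""
    shift = 8 - bits
    return (value >> shift) << shift

def quantise_pixel(pixel, bits=(8, 8, 7)):
    """Apply a per-channel bit depth to an (R, G, B) pixel."""
    return tuple(reduce_depth(v, b) for v, b in zip(pixel, bits))

def colour_count(bits):
    """Number of distinct colours available at the given bit depths."""
    return 2 ** sum(bits)

print(colour_count((8, 8, 8)))          # 16777216 colours at full depth
print(colour_count((8, 8, 7)))          # 8388608, exactly half as many
print(quantise_pixel((200, 100, 51)))   # (200, 100, 50)
```

The blue value moves from 51 to 50, a change few people would spot, and the original value cannot be recovered afterwards. That irreversibility is what makes the step lossy.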

Here's where it gets a little confusing, though: the border between the two approaches is much more blurred than you might think. The Gif image format is a lossless format, saving all data and introducing no compound compression errors if a file were to be saved and resaved. However, most Gif implementations can't display full colour, so the images have to be 'quantised' - a process of constraining something to a discrete set of values - usually by dithering dots of colour from a 256-colour palette. Quantisation is a lossy process by definition, so although the final data is saved in a lossless format, that data has already lost information on the way in. See?
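The distinction can be sketched as two separate steps. Below, each pixel is first snapped to the nearest colour in a palette (the lossy part), and the resulting palette indices are then stored losslessly. Several things here are assumptions made for brevity: the palette has four colours rather than 256, there is no dithering, and Python's zlib stands in for the LZW compression that Gif actually uses.

```python
import zlib

# Hypothetical four-colour palette: black, white, red, blue.
PALETTE = [(0, 0, 0), (255, 255, 255), (255, 0, 0), (0, 0, 255)]

def nearest_index(pixel, palette):
    """Return the index of the palette colour closest to `pixel`."""
    def distance(i):
        return sum((a - b) ** 2 for a, b in zip(pixel, palette[i]))
    return min(range(len(palette)), key=distance)

pixels = [(250, 10, 5), (3, 2, 1), (240, 240, 250), (10, 20, 200)]

# Lossy step: quantise every pixel to a palette index.
indices = bytes(nearest_index(p, PALETTE) for p in pixels)

# Lossless step: compress the indices, then restore them exactly.
stored = zlib.compress(indices)
restored_indices = zlib.decompress(stored)
restored_pixels = [PALETTE[i] for i in restored_indices]

print(restored_indices == indices)   # the stored data survives intact
print(restored_pixels == pixels)     # the original colours do not
```

The indices come back bit for bit however many times they are saved, but the colours they point to are no longer the ones in the original image.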

And then there's Jpeg, which, as you'll see below, also uses lossless compression algorithms.

StuffIt

The latest version of StuffIt, now from Smith Micro, claims to be able to compress Jpegs by a further 30% on average, with no further loss in quality. Given that Jpeg is a lossy format, any scepticism could be forgiven, but the theory at least is simple.

The last step of saving an image as a Jpeg is to encode the lossy data losslessly using Huffman coding, developed by MIT student David Huffman in 1951. Huffman's encoding solution was more efficient than his professor's, whose joint work on the Shannon-Fano coding method is now rarely put to any practical use.
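For readers who want to see Huffman coding at work, here is a minimal sketch that operates on a plain string rather than on Jpeg data. It repeatedly merges the two least frequent symbols, so that common symbols end up with short codes. The function names are illustrative and do not come from any Jpeg library.

```python
import heapq
from collections import Counter

def huffman_code(data):
    """Build a {symbol: bitstring} prefix code from symbol frequencies."""
    counts = Counter(data)
    if len(counts) == 1:
        # A lone symbol still needs one bit per occurrence.
        return {next(iter(counts)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: code so far}).
    heap = [(n, i, {sym: ""}) for i, (sym, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

def encode(data, code):
    """Concatenate the code for each symbol into one bitstring."""
    return "".join(code[s] for s in data)

def decode(bits, code):
    """Read bits until they match a code, emit the symbol, repeat."""
    inverse = {c: s for s, c in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)

message = "aaaaaaaabbbc"
code = huffman_code(message)
bits = encode(message, code)
print(len(bits))                      # 16 bits, against 24 for a fixed two-bit code
print(decode(bits, code) == message)  # True: nothing is lost
```

The common 'a' gets a one-bit code and the rarer 'b' and 'c' get two bits each, which is where the saving comes from.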

However, more than 50 years later, an even better solution was developed by the makers of StuffIt. Effectively, this reverses the last step of the Jpeg process to give the raw lossy data. It then applies a new lossless encoding algorithm to this data in place of the Huffman coding, resulting in smaller file sizes. The basic Jpeg compression remains untouched; all that's happened is that the lossy compressed data is losslessly compressed more efficiently for storage. As such, heavily compressed Jpegs can be even further squeezed down using StuffIt's methods.
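The article doesn't say what StuffIt's replacement algorithm is, so the sketch below makes no attempt to reproduce it. It only shows why there is room for improvement: Huffman coding must spend a whole number of bits on each symbol, whereas the theoretical minimum (the entropy of the data) can be well below that when one symbol dominates. The symbol counts are invented for the example.

```python
import heapq
import math

def huffman_bits(counts):
    """Total bits a Huffman code needs for data with these symbol counts.

    The total equals the sum of the weights of every merged node.
    (With a single distinct symbol this returns 0; a real coder would
    still spend one bit per symbol.)
    """
    heap = list(counts.values())
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged
        heapq.heappush(heap, merged)
    return total

def entropy_bits(counts):
    """Theoretical minimum bits for the same data, treating symbols as independent."""
    n = sum(counts.values())
    return sum(-c * math.log2(c / n) for c in counts.values())

# Hypothetical counts for 100 values, dominated by zeros.
counts = {0: 90, 1: 6, -1: 3, 2: 1}

print(huffman_bits(counts))           # 114
print(round(entropy_bits(counts)))    # about 60
```

In this made-up case Huffman coding needs 114 bits where roughly 60 would do in principle, and a coder that gets closer to that bound stores exactly the same data in less space.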

Although at the moment StuffIt only applies this technique to Jpegs, there's no reason why it can't be used to further compress music and movies, too.
