Using Zlib in Erlang

The simplest way to compress and decompress binaries in Erlang is zlib:compress and zlib:uncompress.
This works great for large data and on unreliable media.
For small data, however, the zlib header (2 bytes) and Adler-32 checksum (4 bytes) take up a relatively large share of the output, which is often unnecessary when the media is reliable.
The zlib library supports removing the header and checksum information.
Erlang exposes this through the zlib:zip and zlib:unzip functions.
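
To see the overhead for yourself, here is a minimal sketch that compresses the same binary both ways and returns the two sizes. The module name zlib_sizes and the function sizes/1 are made up for this illustration; only the zlib calls come from the standard library.

```erlang
-module(zlib_sizes).
-export([sizes/1]).

%% Returns {WrappedSize, RawSize} for the given binary.
sizes(Data) when is_binary(Data) ->
    Wrapped = zlib:compress(Data),  % zlib header + deflate data + checksum
    Raw = zlib:zip(Data),           % deflate data only
    %% Both forms must round-trip back to the original binary.
    Data = zlib:uncompress(Wrapped),
    Data = zlib:unzip(Raw),
    {byte_size(Wrapped), byte_size(Raw)}.
```

Calling zlib_sizes:sizes(<<"{\"id\":1}">>) in the shell shows how much of a tiny payload goes to the wrapper.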

But if your use case is a lot of small documents, you can do better by using the lower-level API with a preset dictionary:

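compress(Data, Dict) ->
	Z = zlib:open(),
	%% -15 window bits selects raw deflate: no zlib header or checksum
	ok = zlib:deflateInit(Z, best_compression, deflated, -15, 9, default),
	_DictId = zlib:deflateSetDictionary(Z, Dict),
	B1 = zlib:deflate(Z, Data),
	B2 = zlib:deflate(Z, <<>>, finish),
	ok = zlib:deflateEnd(Z),
	ok = zlib:close(Z),
	list_to_binary([B1, B2]).
uncompress(Data, Dict) ->
	Z = zlib:open(),
	ok = zlib:inflateInit(Z, -15),
	%% must be the same dictionary that was used to compress
	ok = zlib:inflateSetDictionary(Z, Dict),
	B1 = zlib:inflate(Z, Data),
	ok = zlib:inflateEnd(Z),
	ok = zlib:close(Z),
	list_to_binary(B1).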

For long documents, specifying a dictionary doesn’t make a huge difference, but if you are storing or transmitting a lot of short XML/JSON snippets, the savings can be noticeable.
If you don’t know what dictionary to use, just specify an empty binary to get the default behavior.

The zlib docs say this about specifying a dictionary:

The dictionary should consist of strings (byte sequences) that are likely to be encountered later in the data to be compressed, with the most commonly used strings preferably put towards the end of the dictionary. Using a dictionary is most useful when the data to be compressed is short and can be predicted with good accuracy; the data can then be compressed better than with the default empty dictionary.
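
As a rough illustration of that advice, the sketch below compresses one short JSON snippet with and without a dictionary and returns the sizes. Everything named here is hypothetical: the module zlib_dict_demo, the sample document, and the dictionary contents, which simply assume that "id" is the most common key and therefore goes last. Unlike the functions above, it skips the dictionary call entirely when none is given and uses try/after so the stream is closed even on errors.

```erlang
-module(zlib_dict_demo).
-export([demo/0]).

%% Hypothetical dictionary: key strings we expect in every snippet,
%% with the ones assumed to be most common placed towards the end.
dictionary() ->
    <<"\"created_at\":\"\"email\":\"\"name\":\"\"id\":">>.

%% Raw deflate, optionally primed with a dictionary.
raw_deflate(Data, Dict) ->
    Z = zlib:open(),
    try
        ok = zlib:deflateInit(Z, best_compression, deflated, -15, 9, default),
        case Dict of
            none -> ok;
            _ -> zlib:deflateSetDictionary(Z, Dict)
        end,
        iolist_to_binary(zlib:deflate(Z, Data, finish))
    after
        zlib:close(Z)
    end.

%% Raw inflate with the same dictionary that was used to compress.
raw_inflate(Data, Dict) ->
    Z = zlib:open(),
    try
        ok = zlib:inflateInit(Z, -15),
        ok = zlib:inflateSetDictionary(Z, Dict),
        iolist_to_binary(zlib:inflate(Z, Data))
    after
        zlib:close(Z)
    end.

%% Returns {OriginalSize, SizeWithoutDict, SizeWithDict}.
demo() ->
    Doc = <<"{\"id\":42,\"name\":\"Ada\",\"email\":\"ada@example.com\"}">>,
    Dict = dictionary(),
    Plain = raw_deflate(Doc, none),
    WithDict = raw_deflate(Doc, Dict),
    Doc = raw_inflate(WithDict, Dict),  % round-trip check
    {byte_size(Doc), byte_size(Plain), byte_size(WithDict)}.
```

Both sides have to agree on the exact dictionary bytes, so in practice it makes sense to version the dictionary alongside the stored data.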
