MongoDB Network Compression: A Win-Win

Rofl Facts · Nov 29, 2023

MongoDB's ability to compress data between the client and the server is a little-known feature. The CRM company Close has an excellent write-up on how compression reduced their network traffic from roughly 140 Mbps to 65 Mbps. As Close points out, with cloud data transfer rates starting at $0.01 per gigabyte and climbing from there, a simple configuration change can yield real savings.

The compressors supported by MongoDB are snappy, zlib, and zstd.

To enable compression from the client, all that's needed is to install the required compression library and then supply the compressor as an argument when connecting to MongoDB. For example:

client = MongoClient('mongodb://localhost', compressors='zstd')
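The compressors option can also be given in the connection string, and you can list several in order of preference; the driver uses the first one the server also supports. A quick sketch (localhost is just a placeholder):

from pymongo import MongoClient

# List compressors in preference order; the first one the server also
# supports is used (zstd here, falling back to snappy, then zlib).
client = MongoClient('mongodb://localhost/?compressors=zstd,snappy,zlib')

# Equivalent keyword-argument form:
client = MongoClient('mongodb://localhost', compressors='zstd,snappy,zlib')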

You can use the two tuneable Python scripts provided in this article, read-from-mongo.py and write-to-mongo.py, to observe the effects of network compression for yourself.

Setup

Client Configuration

At a minimum, adjust params.py to include your connection string. Additional tunables are the batch size for reads (100 records) and inserts (1 MB), as well as the number of megabytes to read and insert (10 MB by default):

# Read from Mongo
target_read_database = 'sample_airbnb'
target_read_collection = 'listingsAndReviews'
megabytes_to_read = 10
batch_size = 100 # Batch size in records (for reads)

# Write to Mongo
drop_collection = True # Drop collection on run
target_write_database = 'test'
target_write_collection = 'network-compression-test'
megabytes_to_insert = 10
batch_size_mb = 1 # Batch size of bulk insert in megabytes
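
For context, here is a minimal sketch of how a read test might consume these parameters. It is illustrative only, not the actual read-from-mongo.py source, and it assumes a conn_string entry in params.py:

from pymongo import MongoClient
import params  # the tunables shown above

# Illustrative sketch only -- the real script may differ.
client = MongoClient(params.conn_string)  # 'conn_string' name is an assumption
coll = client[params.target_read_database][params.target_read_collection]

bytes_read = 0
for doc in coll.find({}, batch_size=params.batch_size):
    bytes_read += len(str(doc))  # rough per-document size estimate
    if bytes_read >= params.megabytes_to_read * 1024 * 1024:
        break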

Compression Library

The python-snappy package is needed for Python’s snappy compression.

pip3 install python-snappy

The zstandard package is needed for zstd compression.

pip3 install zstandard

Python has built-in support for zlib compression.

Sample Data

My read-from-mongo.py script uses the Sample AirBnB Listings Dataset, but ANY dataset will work for this test.
The write-to-mongo.py script generates its own sample data using the Python module Faker.

pip3 install faker

Execution

Read from Mongo

Since cloud providers charge for data egress, it pays to minimize network traffic.
First, let's run the script with its default setting of no network compression:

✗ python3 read-from-mongo.py

MongoDB Network Compression Test
Network Compression: Off
Now: 2021-11-03 12:24:00.904843

Collection to read from: sample_airbnb.listingsAndReviews
Bytes to read: 10 MB
Bulk read size: 100 records

1 megabytes read at 307.7 kilobytes/second
2 megabytes read at 317.6 kilobytes/second
3 megabytes read at 323.5 kilobytes/second
4 megabytes read at 318.0 kilobytes/second
5 megabytes read at 327.1 kilobytes/second
6 megabytes read at 325.3 kilobytes/second
7 megabytes read at 326.0 kilobytes/second
8 megabytes read at 324.0 kilobytes/second
9 megabytes read at 322.7 kilobytes/second
10 megabytes read at 321.0 kilobytes/second

8600 records read in 31 seconds (276.0 records/second)

MongoDB Server Reported Megabytes Out: 188.278 MB

You have probably noticed that the reported megabytes out (188 MB) are more than 18 times our test size of 10 MB. This can be caused by a number of factors, including replication to secondary nodes, other workloads running on the server, and TCP packet overhead beyond the data itself. Just pay attention to the relative difference across the test runs.
The script accepts an optional compression argument, which must be one of snappy, zlib, or zstd. Now let's rerun the test with snappy, which is known to be fast, albeit at the expense of some compression:

✗ python3 read-from-mongo.py -c "snappy"

MongoDB Network Compression Test
Network Compression: snappy
Now: 2021-11-03 12:24:41.602969

Collection to read from: sample_airbnb.listingsAndReviews
Bytes to read: 10 MB
Bulk read size: 100 records

1 megabytes read at 500.8 kilobytes/second
2 megabytes read at 493.8 kilobytes/second
3 megabytes read at 486.7 kilobytes/second
4 megabytes read at 480.7 kilobytes/second
5 megabytes read at 480.1 kilobytes/second
6 megabytes read at 477.6 kilobytes/second
7 megabytes read at 488.4 kilobytes/second
8 megabytes read at 482.3 kilobytes/second
9 megabytes read at 482.4 kilobytes/second
10 megabytes read at 477.6 kilobytes/second

8600 records read in 21 seconds (410.7 records/second)

MongoDB Server Reported Megabytes Out: 126.55 MB

With snappy compression, the server reported 62 MB fewer bytes out, a reduction of 33%. Better yet, reading the 10 MB of data took 10 fewer seconds, a roughly 33% performance improvement as well!
Now let's try zlib, which can achieve higher compression at the cost of speed.
zlib also supports a configurable compression level, which I've set to 9 (maximum compression) for this test.
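For reference, the zlib level can be set via the zlibCompressionLevel connection option when creating the client; a brief sketch, independent of the script's own implementation:

# Request zlib compression at maximum level (9); -1 uses zlib's default.
client = MongoClient('mongodb://localhost', compressors='zlib', zlibCompressionLevel=9)

# Connection-string equivalent:
client = MongoClient('mongodb://localhost/?compressors=zlib&zlibCompressionLevel=9')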

✗ python3 read-from-mongo.py -c "zlib"

MongoDB Network Compression Test
Network Compression: zlib
Now: 2021-11-03 12:25:07.493369

Collection to read from: sample_airbnb.listingsAndReviews
Bytes to read: 10 MB
Bulk read size: 100 records

1 megabytes read at 362.0 kilobytes/second
2 megabytes read at 373.4 kilobytes/second
3 megabytes read at 394.8 kilobytes/second
4 megabytes read at 393.3 kilobytes/second
5 megabytes read at 398.1 kilobytes/second
6 megabytes read at 397.4 kilobytes/second
7 megabytes read at 402.9 kilobytes/second
8 megabytes read at 397.7 kilobytes/second
9 megabytes read at 402.7 kilobytes/second
10 megabytes read at 401.6 kilobytes/second

8600 records read in 25 seconds (345.4 records/second)

MongoDB Server Reported Megabytes Out: 67.705 MB

With zlib set to its maximum compression level, we achieved a 64% reduction in network egress, although the read took 4 seconds longer than with snappy. That still represents a 19% performance gain over using no compression at all.
Finally, let's test zstd, which is billed as combining the compression efficiency of zlib with the speed of snappy:

✗ python3 read-from-mongo.py -c "zstd"

MongoDB Network Compression Test
Network Compression: zstd
Now: 2021-11-03 12:25:40.075553

Collection to read from: sample_airbnb.listingsAndReviews
Bytes to read: 10 MB
Bulk read size: 100 records

1 megabytes read at 886.1 kilobytes/second
2 megabytes read at 798.1 kilobytes/second
3 megabytes read at 772.2 kilobytes/second
4 megabytes read at 735.7 kilobytes/second
5 megabytes read at 734.4 kilobytes/second
6 megabytes read at 714.8 kilobytes/second
7 megabytes read at 709.4 kilobytes/second
8 megabytes read at 698.5 kilobytes/second
9 megabytes read at 701.9 kilobytes/second
10 megabytes read at 693.9 kilobytes/second

8600 records read in 14 seconds (596.6 records/second)

MongoDB Server Reported Megabytes Out: 61.254 MB

Indeed, zstd lives up to its billing, delivering a 55% performance improvement along with a 68% reduction in reported megabytes out!

Write to Mongo

Cloud providers typically don't charge for data ingress. But given the significant performance gains on read workloads, what can we expect from writes?
The write-to-mongo.py script writes randomly generated documents to the database and collection specified in params.py (test.network-compression-test by default).
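As a rough sketch of how such documents might be produced with Faker (the field names here are illustrative, not the script's exact schema, and 'collection' is assumed to point at the configured target collection):

from faker import Faker

fake = Faker()

def fake_document():
    # Illustrative fields only; the real script's schema may differ.
    return {
        'name': fake.name(),
        'address': fake.address(),
        'email': fake.email(),
        'text': fake.paragraph()
    }

# Insert in batches (here 1,000 documents at a time).
collection.insert_many([fake_document() for _ in range(1000)])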
Let’s perform the test without compression once more:

python3 write-to-mongo.py

MongoDB Network Compression Test
Network Compression: Off
Now: 2021-11-03 12:47:03.658036

Bytes to insert: 10 MB
Bulk insert batch size: 1 MB

1 megabytes inserted at 614.3 kilobytes/second
2 megabytes inserted at 639.3 kilobytes/second
3 megabytes inserted at 652.0 kilobytes/second
4 megabytes inserted at 631.0 kilobytes/second
5 megabytes inserted at 640.4 kilobytes/second
6 megabytes inserted at 645.3 kilobytes/second
7 megabytes inserted at 649.9 kilobytes/second
8 megabytes inserted at 652.7 kilobytes/second
9 megabytes inserted at 654.9 kilobytes/second
10 megabytes inserted at 657.2 kilobytes/second

27778 records inserted in 15.0 seconds

MongoDB Server Reported Megabytes In: 21.647 MB

So, 27,778 records were inserted in 15 seconds. Let's run the same test with zstd compression:

✗ python3 write-to-mongo.py -c 'zstd'

MongoDB Network Compression Test
Network Compression: zstd
Now: 2021-11-03 12:48:16.485174

Bytes to insert: 10 MB
Bulk insert batch size: 1 MB

1 megabytes inserted at 599.4 kilobytes/second
2 megabytes inserted at 645.4 kilobytes/second
3 megabytes inserted at 645.8 kilobytes/second
4 megabytes inserted at 660.1 kilobytes/second
5 megabytes inserted at 669.5 kilobytes/second
6 megabytes inserted at 665.3 kilobytes/second
7 megabytes inserted at 671.0 kilobytes/second
8 megabytes inserted at 675.2 kilobytes/second
9 megabytes inserted at 675.8 kilobytes/second
10 megabytes inserted at 676.7 kilobytes/second

27778 records inserted in 15.0 seconds

MongoDB Server Reported Megabytes In: 8.179 MB

The reported megabytes in dropped by 62%. Our write performance, however, stayed the same. I suspect most of that is down to the time the Faker library spends generating the sample data. Still, it's a win: we got the extra compression without hurting performance.

Measurement

There are a couple of ways to measure network traffic. The scripts report the difference between the db.serverStatus() physicalBytesOut and physicalBytesIn readings taken at the beginning and end of each test run. As noted earlier, other network traffic on the server skews the numbers, but my tests consistently showed an improvement.
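
A minimal sketch of that measurement, assuming the server exposes network.physicalBytesOut in its serverStatus output:

def physical_bytes_out(client):
    # serverStatus reports cumulative physical (post-compression) bytes sent.
    return client.admin.command('serverStatus')['network']['physicalBytesOut']

before = physical_bytes_out(client)
# ... run the read or write workload here ...
after = physical_bytes_out(client)
print('MongoDB Server Reported Megabytes Out: %.3f MB' % ((after - before) / (1024 * 1024)))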

Another option is to use a network analysis tool like Wireshark, but that is beyond the scope of this article.
In summary, compression reduced network traffic by roughly 60%, in line with the savings Close observed. Better still, it noticeably improved read performance as well. That's a win-win.
