MongoDB: Bulk Write Operations

Rofl Facts
Dec 6, 2023

Overview

MongoDB lets clients perform write operations in bulk. Bulk write operations affect a single collection. Applications can choose the level of acknowledgement they require for bulk write operations.

The db.collection.bulkWrite() method can perform bulk insert, update, and delete operations.

MongoDB also supports bulk inserts through the db.collection.insertMany() method.

Ordered vs Unordered Operations

Both ordered and unordered bulk write operations are possible.

With an ordered list of operations, MongoDB executes the operations serially. If an error occurs while processing one of the write operations, MongoDB returns without processing any of the remaining write operations in the list.

With an unordered list of operations, MongoDB can execute the operations in parallel, but this behaviour is not guaranteed. If an error occurs while processing one of the write operations, MongoDB continues to process the remaining write operations in the list.

Executing an ordered list of operations on a sharded collection is generally slower than executing an unordered list, because with an ordered list each operation must wait for the previous one to finish.

By default, bulkWrite() performs ordered operations. To specify unordered write operations, set ordered: false in the options document.
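The difference in error handling can be illustrated with a toy model in plain JavaScript. This is only a sketch of the semantics described above, not how MongoDB is implemented; runBulk and the sample operations are hypothetical names invented for this illustration.

```javascript
// Toy model of ordered vs. unordered bulk semantics (not MongoDB code).
// Each "operation" is a function that either succeeds or throws.
function runBulk(operations, options = {}) {
  const ordered = options.ordered !== false; // ordered is the default
  const result = { succeeded: [], errors: [] };
  for (let i = 0; i < operations.length; i++) {
    try {
      operations[i]();
      result.succeeded.push(i);
    } catch (err) {
      result.errors.push({ index: i, message: err.message });
      if (ordered) break; // ordered: nothing after the first error runs
    }
  }
  return result;
}

const ok = () => {};
const fail = () => { throw new Error("duplicate key"); };
const ops = [ok, fail, ok];

// Ordered: stops at the failing operation, so only index 0 succeeds.
const orderedResult = runBulk(ops);
// Unordered: keeps going after the failure, so indexes 0 and 2 succeed.
const unorderedResult = runBulk(ops, { ordered: false });

console.log(orderedResult.succeeded, unorderedResult.succeeded);
```

In mongosh, the corresponding real call has the shape db.pizzas.bulkWrite( operations, { ordered: false } ), where operations is an array of write operation documents.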


bulkWrite() Methods

bulkWrite() supports the following write operations:

  • insertOne
  • updateOne
  • updateMany
  • replaceOne
  • deleteOne
  • deleteMany

Each write operation is passed to bulkWrite() as a document in an array.

Example

The example in this section uses the pizzas collection:

db.pizzas.insertMany( [
   { _id: 0, type: "pepperoni", size: "small", price: 4 },
   { _id: 1, type: "cheese", size: "medium", price: 7 },
   { _id: 2, type: "vegan", size: "large", price: 8 }
] )

The following bulkWrite() example runs these operations on the pizzas collection:

  • Adds two documents using insertOne.
  • Updates a document using updateOne.
  • Deletes a document using deleteOne.
  • Replaces a document using replaceOne.

try {
   db.pizzas.bulkWrite( [
      { insertOne: { document: { _id: 3, type: "beef", size: "medium", price: 6 } } },
      { insertOne: { document: { _id: 4, type: "sausage", size: "large", price: 10 } } },
      { updateOne: {
         filter: { type: "cheese" },
         update: { $set: { price: 8 } }
      } },
      { deleteOne: { filter: { type: "pepperoni" } } },
      { replaceOne: {
         filter: { type: "vegan" },
         replacement: { type: "tofu", size: "small", price: 4 }
      } }
   ] )
} catch( error ) {
   print( error )
}

Example output, which includes a summary of the completed operations:

{
   acknowledged: true,
   insertedCount: 2,
   insertedIds: { '0': 3, '1': 4 },
   matchedCount: 2,
   modifiedCount: 2,
   deletedCount: 1,
   upsertedCount: 0,
   upsertedIds: {}
}
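As a small illustration of how an application might read this result, the sketch below condenses the counts into one line. summarizeBulkResult is a hypothetical helper, not part of MongoDB, and it assumes only the fields shown in the output above.

```javascript
// Hypothetical helper: condense a bulkWrite() result document into one line.
function summarizeBulkResult(result) {
  if (!result.acknowledged) {
    return "write not acknowledged";
  }
  return [
    `inserted=${result.insertedCount}`,
    `matched=${result.matchedCount}`,
    `modified=${result.modifiedCount}`,
    `deleted=${result.deletedCount}`,
    `upserted=${result.upsertedCount}`
  ].join(" ");
}

// The result document from the example above.
const sampleResult = {
  acknowledged: true,
  insertedCount: 2,
  insertedIds: { '0': 3, '1': 4 },
  matchedCount: 2,
  modifiedCount: 2,
  deletedCount: 1,
  upsertedCount: 0,
  upsertedIds: {}
};

const summary = summarizeBulkResult(sampleResult);
console.log(summary); // inserted=2 matched=2 modified=2 deleted=1 upserted=0
```

Here matchedCount is 2 because the updateOne and replaceOne filters each matched one document.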

For additional examples, see the bulkWrite() examples in the MongoDB documentation.

Strategies for Bulk Inserts to a Sharded Collection

Large bulk insert operations, such as initial data inserts or routine data imports, can affect sharded cluster performance. For bulk inserts, consider the following strategies:

Pre-Split the Collection

If the sharded collection is empty, it has only one initial chunk, which resides on a single shard. MongoDB must then take time to receive data, create splits, and distribute the split chunks to the available shards. To avoid this performance cost, you can pre-split the collection, as described in Split Chunks in a Sharded Cluster.

Unordered Writes to mongos

To improve write performance to sharded clusters, use bulkWrite() with the optional ordered parameter set to false. mongos can then attempt to send the writes to multiple shards simultaneously. For empty collections, first pre-split the collection, as described in Split Chunks in a Sharded Cluster.
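One way to prepare a large import for unordered writes is to split the documents into batches of insertOne operations. The sketch below is a minimal illustration under that assumption; toInsertBatches, the batch size, and the sample documents are hypothetical and not taken from the MongoDB documentation.

```javascript
// Hypothetical helper: split documents into batches of insertOne operations.
function toInsertBatches(docs, batchSize) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < docs.length; i += batchSize) {
    const slice = docs.slice(i, i + batchSize);
    batches.push(slice.map((doc) => ({ insertOne: { document: doc } })));
  }
  return batches;
}

// Sample documents, invented for this illustration.
const newPizzas = [
  { type: "margherita", size: "small", price: 5 },
  { type: "hawaiian", size: "medium", price: 7 },
  { type: "mushroom", size: "large", price: 9 }
];

// Three documents with a batch size of 2 give two batches (2 + 1 operations).
const batches = toInsertBatches(newPizzas, 2);

// In mongosh, each batch could then be sent as an unordered bulk write:
// for (const batch of batches) {
//   db.pizzas.bulkWrite( batch, { ordered: false } );
// }
```

Batching keeps each bulkWrite() call to a bounded size; a suitable batch size depends on the workload and is not prescribed here.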

Avoid Monotonic Throttling

If your shard key increases monotonically during an insert, all inserted data goes to the last chunk in the collection, which always ends up on a single shard. The insert capacity of the cluster therefore never exceeds the insert capacity of that single shard.
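The effect can be seen in a toy model of range-based chunk routing. This is a simplified sketch, not MongoDB's routing code; chunkIndexFor and the chunk boundaries are assumptions made up for the illustration.

```javascript
// Toy model of range-based chunk routing (not MongoDB code).
// boundaries[i] is the inclusive lower bound of chunk i + 1;
// chunk 0 holds every key below boundaries[0].
function chunkIndexFor(key, boundaries) {
  let index = 0;
  while (index < boundaries.length && key >= boundaries[index]) {
    index++;
  }
  return index;
}

const boundaries = [1000, 2000, 3000]; // defines 4 chunks
const insertsPerChunk = [0, 0, 0, 0];

// Insert 100 keys that keep increasing past the highest boundary.
for (let key = 3000; key < 3100; key++) {
  insertsPerChunk[chunkIndexFor(key, boundaries)]++;
}

// Every insert landed in the last chunk: 0, 0, 0, 100.
console.log(insertsPerChunk.join(", "));
```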

If your insert volume is larger than what a single shard can process, and you cannot avoid a monotonically increasing shard key, consider the following modifications to your application:

  • Reverse the binary bits of the shard key. This preserves the information and avoids correlating insertion order with an increasing sequence of values.
  • Swap the first and last 16-bit words to “shuffle” the inserts.

Note: The following C++ example swaps the leading and trailing 16-bit words of generated BSON ObjectIds so that they no longer increase monotonically.

#include <utility>  // std::swap

using namespace mongo;

OID make_an_id() {
    OID x = OID::gen();
    const unsigned char *p = x.getData();
    // swap the leading and trailing 16-bit words of the 12-byte ObjectId
    std::swap( (unsigned short&) p[0], (unsigned short&) p[10] );
    return x;
}

void foo() {
    // create an object
    BSONObj o = BSON( "_id" << make_an_id() << "x" << 3 << "name" << "jane" );
    // now we may insert o into a sharded collection
}
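The other option, reversing the binary bits of the key, can be sketched in JavaScript for the simple case of an unsigned 32-bit integer key. The key width and the name reverseBits32 are assumptions for this illustration; a real shard key may have a different type and size.

```javascript
// Reverse the bits of an unsigned 32-bit integer.
// Assumes the key fits in 32 bits; wider keys need a different approach.
function reverseBits32(n) {
  let value = n >>> 0;  // coerce to unsigned 32-bit
  let reversed = 0;
  for (let i = 0; i < 32; i++) {
    // shift the result left and append the lowest remaining bit of value
    reversed = ((reversed << 1) | (value & 1)) >>> 0;
    value >>>= 1;
  }
  return reversed;
}

// Consecutive keys end up far apart after reversal.
const scattered = [1, 2, 3].map(reverseBits32);
console.log(scattered.join(", ")); // 2147483648, 1073741824, 3221225472

// Reversing twice returns the original key, so no information is lost.
const roundTrip = reverseBits32(reverseBits32(12345));
```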
