MongoDB - Safely Upserting Subdocuments Into Arrays

The Problem

Question: how do you safely upsert a new translation into the translations array?

Assume that the translations array should not be allowed to have more than one entry with the same language code.


{
   "_id" : ObjectId("abcdefg"),
   "category" : "CategoryName",
   "key" : "YourStringKey",
   "text" : "The text to translate",
   "translations" : [
      {
         "_id" : ObjectId("zyxwvu"),
         "language" : "es",
         "text" : "Spanish translation"
      }
   ]
}
               

How Would You Do It In SQL?

In SQL you'd have your translations as a separate table, maybe something like this:


CREATE TABLE MasterText (
   MasterText INTEGER UNSIGNED AUTO_INCREMENT PRIMARY KEY,
   Category VARCHAR(30) NOT NULL,
   Key VARCHAR(30) NOT NULL,
   SourceText TEXT
)

CREATE TABLE Translations (
   TranslationID INTEGER UNSIGNED AUTO_INCREMENT PRIMARY KEY,
   MasterTextID INTEGER UNSIGNED NOT NULL,
   TranslationLang VARCHAR(8) NOT NULL,
   TranslatedText TEXT,
   FOREIGN KEY (MasterTextID) REFERENCES MasterText(MasterTextID)

)
                  

SQL - Transactions

You could use a transaction to SELECT, check to see if you needed to insert or update, then do the write operation.


START TRANSACTION
SELECT TranslationID FROM Translations ... (see if the row exists)
// if the row already exists:
UPDATE Translations ...
// if it did not exist:
INSERT INTO Translations
COMMIT
END TRANSACTION
                  

SQL - Insert / Select

Or, if you preferred not to use transactions, you could avoid duplicate inserts by using a INSERT ... SELECT with a LEFT JOIN.


INSERT INTO Translations
       (MasterTextID, TranslationLang, TranslatedText)
SELECT mt.MasterTextID, 'fr', 'French translation'
  FROM MasterText mt
  LEFT JOIN Translations t
    ON mt.MasterTextID = t.MasterTextID
   AND t.TranslationLang = 'fr'
 WHERE mt.Category = 'CategoryName'
   AND mt.Key = 'YourStringKey'
   AND t.TranslationID IS NULL
                  

This ensures you won't insert more than one - even in a race condition. Then you could follow up with an update to finish the "upsert".

SQL - On Duplicate Key Update

Or, if you were using MySQL you could use "ON DUPLICATE KEY UPDATE".


INSERT INTO Translations
       (MasterTextID, TranslationLang, TranslatedText)
VALUES ($masterTextID, 'fr', 'French translation')
    ON DUPLICATE KEY
UPDATE TranslatedText = 'French translation'
                  

This Doesn't Work in MongoDB

  1. Transactions won't work because there are none. So no select/check if you should insert or update/then write.
  2. Left joins won't work because there are none. So no insert ... select where the existing row is null.
  3. On duplicate key update won't work because that's not a feature for MongoDB either.

So what do you do in MongoDB?

Wrong: Find/Insert or Update

Mongo does not offer transactions, so you can not atomically do a find and then do:


if (exists) {
   update();
} else {
   insert();
}
                  

Wrong: Update With Upsert

Mongo has the following feature:


db.translatablestrings.update(
   { category: 'CategoryName', key: 'YourStringKey' },
   { ... your update command },
   { upsert: true }
)
                  

The problem is that the "upsert" functionality in MongoDB is only made for upserting entire documents - not for subdocuments.

Wrong: $push

You can not simply do a $push. If two conflicting processes do a $push (somewhat-) simultaneously, you end up with two translations for the same language - which breaks your application's rules.


$ mongo safeupserts
db.translatablestrings.update(
   { category: 'CategoryName', key: 'YourStringKey' },
   { $push: { translations:
      { _id: new ObjectId(), language: 'fr', text: 'French translation' }
   } }
)
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
                  

This is the same problem you'd have if you did two inserts in SQL, but without a SELECT and a LEFT JOIN to check for duplicates.

Wrong: $push

The race condition of using $push can result in duplicates.


{
   "_id" : ObjectId("abcdefg"),
   "category" : "CategoryName",
   "key" : "YourStringKey",
   "text" : "The source text.",
   "translations" : [
      {
         "_id" : ObjectId("zyxwvu"),
         "language" : "es",
         "text" : "Spanish translation"
      },
      {
         "_id" : ObjectId("mnopqrs"),
         "language" : "fr",
         "text" : "French translation"
      },
      {
         "_id" : ObjectId("ghijlm"),
         "language" : "fr",
         "text" : "French translation"
      }
   ]
}
                  

Wrong: Version Field

Some approaches suggest using a select, then insert or update approach, and relying on a version field to guarantee that you don't overwrite something. For example:


var o = findObject(),
    t = o.translationFor('fr');

if (t) {
   var originalVersion = t._version;
   t._version++;
   t.text = 'French translation'
   // update WHERE DB version = originalVersion
   db.translatablestrings.update(
      { category: 'CategoryName', key: 'YourStringKey',
        'translations.language': 'fr', 'translations._version': originalVersion },
      { $set: { 'translations.$.text': 'French Translation', 'translations.$._version': t._version } }
   )
} else {
   t = new Translation('fr', 'French translation'); // has _version = 1
   db.translatablestrings.update(
      { category: 'CategoryName', key: 'YourStringKey',
        'translations.language': { $ne: 'fr' } },
      { $push: { translations: t } }
   )
}
                  

Right: Push, Check Success, Update

Here's the right approach. It guarantees you won't end up with duplicates even in a race condition.

Start by trying to push the subdocument, but only to a document that doesn't already have a matching subdocument.


db.translatablestrings.update(
   { category: 'CategoryName', key: 'YourStringKey', 'translations.language': { $ne: 'fr' } },
   { $push: { translations: { _id: new ObjectId(), language: 'fr', text: 'French translation' } } }
)
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
                  

Because there was not already a French translation, we successfully pushed one.

After First Push

If you did one successful push, but there was a race condition and a second operation also did a push, what would happen?


// to prove that this operation can run safely multiple times,
// note what happens when you re-run it:
db.translatablestrings.update(
   { category: 'CategoryName', key: 'YourStringKey', 'translations.language': { $ne: 'fr' } },
   { $push: { translations: { _id: new ObjectId(), language: 'fr', text: 'French translation' } } }
)
WriteResult({ "nMatched" : 0, "nUpserted" : 0, "nModified" : 0 })
                  

Nothing was matched because our document already had a subdocument with 'fr' as the language.

Putting "Up" in "Upsert"

So, what if one was already inserted and you need to now do an update?


// as always, we start by trying to push our new or updated translation
// if there's not already a translation for that language:
db.translatablestrings.update( ... $push: { ... } ... }) // same as previous slide
WriteResult({ "nMatched" : 0, "nUpserted" : 0, "nModified" : 0 })

if (result.nMatched === 0) {
   // now we know there's an existing document and we simply
   // do an update directly to the subdocument itself
   db.translatablestrings.update(
      { category: 'CategoryName', key: 'YourStringKey', 'translations.language': 'fr' },
      { $set: { 'translations.$.text': 'French translation' } }
   )
}
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
                  

And it worked! Running it over and over is safe.