improve migration doc
This commit is contained in:
		| @@ -33,7 +33,7 @@ clean: | |||||||
| html: | html: | ||||||
| 	$(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html | 	$(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html | ||||||
| 	@echo | 	@echo | ||||||
| 	@echo "Build finished. The HTML pages are in $(BUILDDIR)/html." | 	@echo "Build finished. Check $(BUILDDIR)/html/index.html" | ||||||
|  |  | ||||||
| dirhtml: | dirhtml: | ||||||
| 	$(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml | 	$(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml | ||||||
|   | |||||||
| @@ -14,6 +14,6 @@ User Guide | |||||||
|    gridfs |    gridfs | ||||||
|    signals |    signals | ||||||
|    text-indexes |    text-indexes | ||||||
|  |    migration | ||||||
|    logging-monitoring |    logging-monitoring | ||||||
|    mongomock |    mongomock | ||||||
|    migration |  | ||||||
|   | |||||||
| @@ -1,6 +1,6 @@ | |||||||
| ============================== | =================== | ||||||
| Handling migration | Documents migration | ||||||
| ============================== | =================== | ||||||
|  |  | ||||||
| The structure of your documents and their associated mongoengine schemas are likely | The structure of your documents and their associated mongoengine schemas are likely | ||||||
| to change over the lifetime of an application. This section provides guidance and | to change over the lifetime of an application. This section provides guidance and | ||||||
| @@ -10,7 +10,10 @@ Due to the very flexible nature of mongodb, migrations of models aren't trivial | |||||||
| for people that know about `alembic` for `sqlalchemy`, there is unfortunately no equivalent | for people that know about `alembic` for `sqlalchemy`, there is unfortunately no equivalent | ||||||
| library that will manage the migration in an automatic fashion for mongoengine. | library that will manage the migration in an automatic fashion for mongoengine. | ||||||
|  |  | ||||||
| First of all, let's take a simple example of model change and review the different option you | Example 1: Addition of a field | ||||||
|  | ============================== | ||||||
|  |  | ||||||
|  | Let's start by taking a simple example of a model change and review the different option you | ||||||
| have to deal with the migration. | have to deal with the migration. | ||||||
|  |  | ||||||
| Let's assume we start with the following schema and save an instance: | Let's assume we start with the following schema and save an instance: | ||||||
| @@ -20,13 +23,13 @@ Let's assume we start with the following schema and save an instance: | |||||||
|     class User(Document): |     class User(Document): | ||||||
|         name = StringField() |         name = StringField() | ||||||
|  |  | ||||||
|     User(name=username).save() |     User(name="John Doe").save() | ||||||
|  |  | ||||||
|     # print the objects as they exist in mongodb |     # print the objects as they exist in mongodb | ||||||
|     print(User.objects().as_pymongo())    # [{u'_id': ObjectId('5d06b9c3d7c1f18db3e7c874'), u'name': u'John'}] |     print(User.objects().as_pymongo())    # [{u'_id': ObjectId('5d06b9c3d7c1f18db3e7c874'), u'name': u'John Doe'}] | ||||||
|  |  | ||||||
| On the next version of your application, let's now assume that a new field `enabled` gets added to the | On the next version of your application, let's now assume that a new field `enabled` gets added to the | ||||||
| existing User model with a `default=True`. Thus you simply update the `User` class to the following: | existing ``User`` model with a `default=True`. Thus you simply update the ``User`` class to the following: | ||||||
|  |  | ||||||
| .. code-block:: python | .. code-block:: python | ||||||
|  |  | ||||||
| @@ -34,23 +37,24 @@ existing User model with a `default=True`. Thus you simply update the `User` cla | |||||||
|         name = StringField(required=True) |         name = StringField(required=True) | ||||||
|         enabled = BooleaField(default=True) |         enabled = BooleaField(default=True) | ||||||
|  |  | ||||||
| Without migration, we now reload an object from the database into the `User` class and checks its `enabled` | Without applying any migration, we now reload an object from the database into the ``User`` class | ||||||
| attribute: | and checks its `enabled` attribute: | ||||||
|  |  | ||||||
| .. code-block:: python | .. code-block:: python | ||||||
|  |  | ||||||
|     assert User.objects.count() == 1 |     assert User.objects.count() == 1 | ||||||
|     user = User.objects().first() |     user = User.objects().first() | ||||||
|     assert user.enabled is True |     assert user.enabled is True | ||||||
|     print(User.objects(enabled=True).count())    # 0 ! uh? |     assert User.objects(enabled=True).count() == 0    # uh? | ||||||
|     print(User.objects(enabled=False).count())   # 0 ! uh? |     assert User.objects(enabled=False).count() == 0   # uh? | ||||||
|  |  | ||||||
|     # but this is consistent with what we have in database |     # this is consistent with what we have in the database | ||||||
|  |     # in fact, 'enabled' does not exist | ||||||
|     print(User.objects().as_pymongo().first())    # {u'_id': ObjectId('5d06b9c3d7c1f18db3e7c874'), u'name': u'John'} |     print(User.objects().as_pymongo().first())    # {u'_id': ObjectId('5d06b9c3d7c1f18db3e7c874'), u'name': u'John'} | ||||||
|     assert User.objects(enabled=None).count() == 1 |     assert User.objects(enabled=None).count() == 1 | ||||||
|  |  | ||||||
| As you can see, even if the document wasn't updated, mongoengine applies the default value seemlessly when it | As you can see, even if the document wasn't updated, mongoengine applies the default value seamlessly when it | ||||||
| loads the pymongo dict into a `User` instance. At first sight it looks like you don't need to migrate the | loads the pymongo dict into a ``User`` instance. At first sight it looks like you don't need to migrate the | ||||||
| existing documents when adding new fields but this actually leads to inconsistencies when it comes to querying. | existing documents when adding new fields but this actually leads to inconsistencies when it comes to querying. | ||||||
|  |  | ||||||
| In fact, when querying, mongoengine isn't trying to account for the default value of the new field and so | In fact, when querying, mongoengine isn't trying to account for the default value of the new field and so | ||||||
| @@ -62,9 +66,202 @@ as a standalone script: | |||||||
|  |  | ||||||
| .. code-block:: python | .. code-block:: python | ||||||
|  |  | ||||||
|  |     # Use mongoengine to set a default value for a given field | ||||||
|     User.objects().update(enabled=True) |     User.objects().update(enabled=True) | ||||||
|     # or |     # or use pymongo | ||||||
|     user_coll = User._get_collection() |     user_coll = User._get_collection() | ||||||
|     user_coll.update_many({}, {'$set': {'enabled': True}}) |     user_coll.update_many({}, {'$set': {'enabled': True}}) | ||||||
|  |  | ||||||
|  |  | ||||||
|  | Example 2: Inheritance change | ||||||
|  | ============================= | ||||||
|  |  | ||||||
|  | Let's consider the following example: | ||||||
|  |  | ||||||
|  | .. code-block:: python | ||||||
|  |  | ||||||
|  |     class Human(Document): | ||||||
|  |         name = StringField() | ||||||
|  |         meta = {"allow_inheritance": True} | ||||||
|  |  | ||||||
|  |     class Jedi(Human): | ||||||
|  |         dark_side = BooleanField() | ||||||
|  |         light_saber_color = StringField() | ||||||
|  |  | ||||||
|  |     Jedi(name="Darth Vader", dark_side=True, light_saber_color="red").save() | ||||||
|  |     Jedi(name="Obi Wan Kenobi", dark_side=False, light_saber_color="blue").save() | ||||||
|  |  | ||||||
|  |     assert Human.objects.count() == 2 | ||||||
|  |     assert Jedi.objects.count() == 2 | ||||||
|  |  | ||||||
|  |     # Let's check how these documents got stored in mongodb | ||||||
|  |     print(Jedi.objects.as_pymongo()) | ||||||
|  |     # [ | ||||||
|  |     #   {'_id': ObjectId('5fac4aaaf61d7fb06046e0f9'), '_cls': 'Human.Jedi', 'name': 'Darth Vader', 'dark_side': True, 'light_saber_color': 'red'}, | ||||||
|  |     #   {'_id': ObjectId('5fac4ac4f61d7fb06046e0fa'), '_cls': 'Human.Jedi', 'name': 'Obi Wan Kenobi', 'dark_side': False, 'light_saber_color': 'blue'} | ||||||
|  |     # ] | ||||||
|  |  | ||||||
|  | As you can observe, when you use inheritance, MongoEngine stores a field named '_cls' behind the scene to keep | ||||||
|  | track of the Document class. | ||||||
|  |  | ||||||
|  | Let's now take the scenario that you want to refactor the inheritance schema and: | ||||||
|  | - Have the Jedi's with dark_side=True/False become GoodJedi's/DarkSith | ||||||
|  | - get rid of the 'dark_side' field | ||||||
|  |  | ||||||
|  | move to the following schemas: | ||||||
|  |  | ||||||
|  | .. code-block:: python | ||||||
|  |  | ||||||
|  |     # unchanged | ||||||
|  |     class Human(Document): | ||||||
|  |         name = StringField() | ||||||
|  |         meta = {"allow_inheritance": True} | ||||||
|  |  | ||||||
|  |     # attribute 'dark_side' removed | ||||||
|  |     class GoodJedi(Human): | ||||||
|  |         light_saber_color = StringField() | ||||||
|  |  | ||||||
|  |     # new class | ||||||
|  |     class BadSith(Human): | ||||||
|  |         light_saber_color = StringField() | ||||||
|  |  | ||||||
|  | MongoEngine doesn't know about the change or how to map them with the existing data | ||||||
|  | so if you don't apply any migration, you will observe a strange behavior, as if the collection was suddenly | ||||||
|  | empty. | ||||||
|  |  | ||||||
|  | .. code-block:: python | ||||||
|  |  | ||||||
|  |     # As a reminder, the documents that we inserted | ||||||
|  |     # have the _cls field = 'Human.Jedi' | ||||||
|  |  | ||||||
|  |     # Following has no match | ||||||
|  |     # because the query that is used behind the scene is | ||||||
|  |     # filtering on {'_cls': 'Human.GoodJedi'} | ||||||
|  |     assert GoodJedi.objects().count() == 0 | ||||||
|  |  | ||||||
|  |     # Following has also no match | ||||||
|  |     # because it is filtering on {'_cls': {'$in': ('Human', 'Human.GoodJedi', 'Human.BadSith')}} | ||||||
|  |     # which has no match | ||||||
|  |     assert Human.objects.count() == 0 | ||||||
|  |     assert Human.objects.first() is None | ||||||
|  |  | ||||||
|  |     # If we bypass MongoEngine and make use of underlying driver (PyMongo) | ||||||
|  |     # we can see that the documents are there | ||||||
|  |     humans_coll = Human._get_collection() | ||||||
|  |     assert humans_coll.count_documents({}) == 2 | ||||||
|  |     # print first document | ||||||
|  |     print(humans_coll.find_one()) | ||||||
|  |     # {'_id': ObjectId('5fac4aaaf61d7fb06046e0f9'), '_cls': 'Human.Jedi', 'name': 'Darth Vader', 'dark_side': True, 'light_saber_color': 'red'} | ||||||
|  |  | ||||||
|  | As you can see, first obvious problem is that we need to modify '_cls' values based on existing values of | ||||||
|  | 'dark_side' documents. | ||||||
|  |  | ||||||
|  | .. code-block:: python | ||||||
|  |  | ||||||
|  |     humans_coll = Human._get_collection() | ||||||
|  |     old_class = 'Human.Jedi' | ||||||
|  |     good_jedi_class = 'Human.GoodJedi' | ||||||
|  |     bad_sith_class = 'Human.BadSith' | ||||||
|  |     humans_coll.update_many({'_cls': old_class, 'dark_side': False}, {'$set': {'_cls': good_jedi_class}}) | ||||||
|  |     humans_coll.update_many({'_cls': old_class, 'dark_side': True}, {'$set': {'_cls': bad_sith_class}}) | ||||||
|  |  | ||||||
|  | Let's now check if querying improved in MongoEngine: | ||||||
|  |  | ||||||
|  | .. code-block:: python | ||||||
|  |  | ||||||
|  |     assert GoodJedi.objects().count() == 1  # Hoorah! | ||||||
|  |     assert BadSith.objects().count() == 1   # Hoorah! | ||||||
|  |     assert Human.objects.count() == 2       # Hoorah! | ||||||
|  |  | ||||||
|  |     # let's now check that documents load correctly | ||||||
|  |     jedi = GoodJedi.objects().first() | ||||||
|  |     # raises FieldDoesNotExist: The fields "{'dark_side'}" do not exist on the document "Human.GoodJedi" | ||||||
|  |  | ||||||
|  | In fact we only took care of renaming the _cls values but we havn't removed the 'dark_side' fields | ||||||
|  | which does not exist anymore on the GoodJedi's and BadSith's models. | ||||||
|  | Let's remove the field from the collections: | ||||||
|  |  | ||||||
|  | .. code-block:: python | ||||||
|  |  | ||||||
|  |     humans_coll = Human._get_collection() | ||||||
|  |     humans_coll.update_many({}, {'$unset': {'dark_side': 1}}) | ||||||
|  |  | ||||||
|  | .. note:: We did this migration in 2 different steps for the sake of example but it could have been combined | ||||||
|  |     with the migration of the _cls fields: :: | ||||||
|  |  | ||||||
|  |         humans_coll.update_many( | ||||||
|  |             {'_cls': old_class, 'dark_side': False}, | ||||||
|  |             { | ||||||
|  |                 '$set': {'_cls': good_jedi_class}, | ||||||
|  |                 '$unset': {'dark_side': 1} | ||||||
|  |             } | ||||||
|  |         ) | ||||||
|  |  | ||||||
|  |  | ||||||
|  | And verify that the documents now load correctly: | ||||||
|  |  | ||||||
|  | .. code-block:: python | ||||||
|  |  | ||||||
|  |     jedi = GoodJedi.objects().first() | ||||||
|  |     assert jedi.name == "Obi Wan Kenobi" | ||||||
|  |  | ||||||
|  |     sith = BadSith.objects().first() | ||||||
|  |     assert sith.name == "Darth Vader" | ||||||
|  |  | ||||||
|  |  | ||||||
|  | An other way of dealing with this migration is to iterate over | ||||||
|  | the documents and update/replace them one by one. This is way slower but | ||||||
|  | it is often useful for complex migrations of Document models. | ||||||
|  |  | ||||||
|  | .. code-block:: python | ||||||
|  |  | ||||||
|  |     for doc in humans_coll.find(): | ||||||
|  |         if doc['_cls'] == 'Human.Jedi': | ||||||
|  |             doc['_cls'] =  'Human.BadSith' if doc['dark_side'] else 'Human.GoodJedi' | ||||||
|  |             doc.pop('dark_side') | ||||||
|  |             humans_coll.replace_one({'_id': doc['_id']}, doc) | ||||||
|  |  | ||||||
|  | .. warning:: Be aware of this `flaw <https://groups.google.com/g/mongodb-user/c/AFC1ia7MHzk>`_ if you modify documents while iterating | ||||||
|  |  | ||||||
|  | Recommendations | ||||||
|  | =============== | ||||||
|  |  | ||||||
|  | - Write migration scripts whenever you do changes to the model schemas | ||||||
|  | - Using :class:`~mongoengine.DynamicDocument` or ``meta = {"strict": False}`` may help to avoid some migrations or to have the 2 versions of your application to co-exist. | ||||||
|  | - Write post-processing checks to verify that migrations script worked. See below | ||||||
|  |  | ||||||
|  | Post-processing checks | ||||||
|  | ====================== | ||||||
|  |  | ||||||
|  | The following recipe can be used to sanity check a Document collection after you applied migration. | ||||||
|  | It does not make any assumption on what was migrated, it will fetch 1000 objects randomly and | ||||||
|  | run some quick checks on the documents to make sure the document looks OK. As it is, it will fail | ||||||
|  | on the first occurrence of an error but this is something that can be adapted based on your needs. | ||||||
|  |  | ||||||
|  | .. code-block:: python | ||||||
|  |  | ||||||
|  |     def get_random_oids(collection, sample_size): | ||||||
|  |         pipeline = [{"$project": {'_id': 1}}, {"$sample": {"size": sample_size}}] | ||||||
|  |         return [s['_id'] for s in collection.aggregate(pipeline)] | ||||||
|  |  | ||||||
|  |     def get_random_documents(DocCls, sample_size): | ||||||
|  |         doc_collection = DocCls._get_collection() | ||||||
|  |         random_oids = get_random_oids(doc_collection, sample_size) | ||||||
|  |         return DocCls.objects(id__in=random_oids) | ||||||
|  |  | ||||||
|  |     def check_documents(DocCls, sample_size): | ||||||
|  |         for doc in get_random_documents(DocCls, sample_size): | ||||||
|  |             # general validation (types and values) | ||||||
|  |             doc.validate() | ||||||
|  |  | ||||||
|  |             # load all subfields, | ||||||
|  |             # this may trigger additional queries if you have ReferenceFields | ||||||
|  |             # so it may be slow | ||||||
|  |             for field in doc._fields: | ||||||
|  |                 try: | ||||||
|  |                     getattr(doc, field) | ||||||
|  |                 except Exception: | ||||||
|  |                     LOG.warning(f"Could not load field {field} in Document {doc.id}") | ||||||
|  |                     raise | ||||||
|  |  | ||||||
|  |     check_documents(Human, sample_size=1000) | ||||||
|   | |||||||
		Reference in New Issue
	
	Block a user