How I implement entity-level version control and why
We don’t need versioning…oh wait, we do need versioning
If you are not sure whether you need version control for a specific entity type in your system, then you probably do. If not now, then at some inconvenient point in the near future. But you can’t justify the cost of implementing version control upfront, because you might not need it, and that wouldn’t be very agile. So what should you do? I’ve been through this loop a couple of times and arrived at a fairly straightforward solution that you can roll out as and when you need it. It’s based on the following assumptions:
- The content being versioned is mostly text fields
- You don’t need multiple concurrent drafts
- Version control is annoying to implement, and systems without it are easier to develop, report on, debug, build interfaces for, etc.
- Building version control up-front slows down development and leads to more re-work because you are trying to implement the basic workflow and version integration at the same time. Building and verifying the non-versioned system first is preferable to going ‘all-in’.
- New versions of entities are created less frequently than they are accessed (more reads than writes)
- Most of the time you don’t care about anything but the current version
- You don’t want other APIs to know about the existence of other versions
- You need to render a history list of previous versions
- Other entities in the system will reference (via foreign keys) the entity in question
Step 1. Don’t implement versioning
Build your entity API and storage without caring about versions. Focus on getting the behaviour, workflow and feature set figured out before you worry about version control.
Example: The Document Entity
We shall create a Model for the Document entity without version control. It has three content fields: name, summary and body.
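As a rough sketch of what the unversioned entity might look like, here it is as a plain SQLite table driven from Python’s standard sqlite3 module rather than a Django Model (an assumption made so the example runs on its own). The table name, the create_document helper and the NOT NULL/default choices are illustrative and not taken from the original system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row

# Plain, unversioned storage: no version columns and no version table yet.
conn.execute(
    """
    CREATE TABLE document (
        id      INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        summary TEXT NOT NULL DEFAULT '',
        body    TEXT NOT NULL DEFAULT ''
    )
    """
)


def create_document(name, summary, body):
    """Insert a document and return its id (hypothetical helper)."""
    cur = conn.execute(
        "INSERT INTO document (name, summary, body) VALUES (?, ?, ?)",
        (name, summary, body),
    )
    conn.commit()
    return cur.lastrowid


doc_id = create_document("Doc1", "A doc...", "Lorem Ip...")
row = conn.execute("SELECT * FROM document WHERE id = ?", (doc_id,)).fetchone()
print(dict(row))
```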
Now confirm that this meets the business requirements! It’s a lot easier to refactor things before adding versioning.
Step 2. Adding the Version table
Create a Version table which shadows the fields that need to be versioned. Do not copy all the attributes. Just the ones that you actually want to version.
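In addition, we add a current boolean field that will be true only for the current editable version of the document, plus a foreign key back to the document. You can set a conditional unique constraint in the database (unique on the document column, applied only to rows where current is true) to ensure that at most one ‘current’ version exists for each document. A plain unique constraint on the combined document and current columns would be too strict, because it would also limit each document to a single non-current row.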
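The version table and the ‘only one current version per document’ rule could be sketched as follows, again assuming SQLite through Python’s sqlite3 module rather than Django. The flag is called is_current here because CURRENT is an SQL keyword; that name, the index name and the add_version helper are all illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE document (
        id INTEGER PRIMARY KEY,
        name TEXT, summary TEXT, body TEXT
    );
    -- Shadow table: only the fields that are actually versioned.
    CREATE TABLE document_version (
        id          INTEGER PRIMARY KEY,
        document_id INTEGER NOT NULL REFERENCES document(id),
        name        TEXT,
        summary     TEXT,
        body        TEXT,
        version_num INTEGER NOT NULL,
        is_current  INTEGER NOT NULL DEFAULT 0
    );
    -- Partial unique index: at most one current row per document,
    -- any number of non-current (historical) rows.
    CREATE UNIQUE INDEX one_current_version_per_document
        ON document_version (document_id)
        WHERE is_current = 1;
    """
)


def add_version(document_id, version_num, is_current):
    """Insert a version row (hypothetical helper for this demo)."""
    conn.execute(
        "INSERT INTO document_version (document_id, name, version_num, is_current)"
        " VALUES (?, 'Doc1', ?, ?)",
        (document_id, version_num, int(is_current)),
    )


conn.execute("INSERT INTO document (id, name) VALUES (1, 'Doc1')")
add_version(1, 2, False)  # historical rows are unrestricted
add_version(1, 3, False)
add_version(1, 4, True)   # the single current draft

try:
    add_version(1, 5, True)  # a second current draft for the same document
    second_current_rejected = False
except sqlite3.IntegrityError:
    second_current_rejected = True

print(second_current_rejected)
```

In Django the same rule would typically be expressed as a conditional UniqueConstraint on the version model.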
We also need to add a version_num field to the Document Model class. It records the version number of the content that is currently published.
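Because the versioning is added after the fact, version_num has to be retrofitted onto a table that already holds data. A minimal sketch of that migration, assuming SQLite and a default of 1 for existing rows (in Django this would be an ordinary schema migration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The table as it existed before versioning was introduced.
conn.execute(
    "CREATE TABLE document ("
    "id INTEGER PRIMARY KEY, name TEXT, summary TEXT, body TEXT)"
)
conn.execute(
    "INSERT INTO document (name, summary, body)"
    " VALUES ('Doc1', 'A doc...', 'Lorem Ip...')"
)
conn.commit()

# Retrofit the column; every existing document becomes version 1.
conn.execute(
    "ALTER TABLE document ADD COLUMN version_num INTEGER NOT NULL DEFAULT 1"
)

version_num = conn.execute(
    "SELECT version_num FROM document WHERE name = 'Doc1'"
).fetchone()[0]
print(version_num)
```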
Step 3. How it works
To create a new document, we just add a new Document model, setting the fields to the initial values we desire and setting version_num to 1. We do not need to create a version model yet.
+--------------------------+
|      Document Model      |
+--------------------------+
| id          | 1          |
+--------------------------+
| name        | Doc1       |
+--------------------------+
| summary     | A doc...   |
+--------------------------+
| body        | Lorem Ip...|
+--------------------------+
| version_num | 1          |
+--------------------------+
To create a new version, we create a draft: a new DocumentVersion model linked to the document, with the name, summary and body fields copied from it. We set its version_num to one more than the version_num on the Document model, and we set the current flag to true.
+--------------------------+
|  DocumentVersion Model   |
+--------------------------+
| id          | 1          |
+--------------------------+
| name        | Doc1       |
+--------------------------+
| summary     | A doc...   |
+--------------------------+
| body        | Lorem Ip...|
+--------------------------+
| version_num | 2          |
+--------------------------+
| current     | true       |
+--------------------------+
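Creating the draft can be sketched as a single INSERT ... SELECT that copies the versioned fields and bumps the version number. This assumes SQLite via Python’s sqlite3 module; the schema, the is_current column name (standing in for the current flag) and the start_draft helper are illustrative.

```python
import sqlite3

SCHEMA = """
CREATE TABLE document (
    id INTEGER PRIMARY KEY,
    name TEXT, summary TEXT, body TEXT,
    version_num INTEGER NOT NULL DEFAULT 1
);
CREATE TABLE document_version (
    id INTEGER PRIMARY KEY,
    document_id INTEGER NOT NULL REFERENCES document(id),
    name TEXT, summary TEXT, body TEXT,
    version_num INTEGER NOT NULL,
    is_current INTEGER NOT NULL DEFAULT 0
);
"""


def start_draft(conn, document_id):
    """Copy a document's versioned fields into a new current draft row."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            """
            INSERT INTO document_version
                (document_id, name, summary, body, version_num, is_current)
            SELECT id, name, summary, body, version_num + 1, 1
            FROM document
            WHERE id = ?
            """,
            (document_id,),
        )
        if cur.rowcount != 1:
            raise LookupError(f"no document with id {document_id}")
    return cur.lastrowid


conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO document (id, name, summary, body)"
    " VALUES (1, 'Doc1', 'A doc...', 'Lorem Ip...')"
)
conn.commit()

draft_id = start_draft(conn, 1)
draft = conn.execute(
    "SELECT * FROM document_version WHERE id = ?", (draft_id,)
).fetchone()
print(draft["version_num"], bool(draft["is_current"]))
```

The Document row itself is left untouched, so readers of the published content see no change while the draft exists.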
To edit the draft version, we simply allow the user to directly manipulate the name, summary and body fields on the model we just created. The system can always locate the current draft for a given document ID by searching for the row in the version table with a matching document ID and current == True. You can even code this up as a helper method on the Document Model class (assuming Django).
+--------------------------+
|  DocumentVersion Model   |
+--------------------------+
| id          | 1          |
+--------------------------+
| name        | EDIT       |
+--------------------------+
| summary     | Edited doc.|
+--------------------------+
| body        | Edit Ip... |
+--------------------------+
| version_num | 2          |
+--------------------------+
| current     | true       |
+--------------------------+
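The lookup helper and the edit step might look like this. It is a sketch using plain functions over SQLite rather than a method on a Django Model, and get_current_draft, edit_draft and the is_current column name are assumptions made for the example.

```python
import sqlite3

SCHEMA = """
CREATE TABLE document (
    id INTEGER PRIMARY KEY,
    name TEXT, summary TEXT, body TEXT,
    version_num INTEGER NOT NULL DEFAULT 1
);
CREATE TABLE document_version (
    id INTEGER PRIMARY KEY,
    document_id INTEGER NOT NULL REFERENCES document(id),
    name TEXT, summary TEXT, body TEXT,
    version_num INTEGER NOT NULL,
    is_current INTEGER NOT NULL DEFAULT 0
);
"""


def get_current_draft(conn, document_id):
    """Return the current draft row for a document, or None if there is none."""
    return conn.execute(
        "SELECT * FROM document_version"
        " WHERE document_id = ? AND is_current = 1",
        (document_id,),
    ).fetchone()


def edit_draft(conn, document_id, name, summary, body):
    """Overwrite the versioned fields on the current draft only."""
    draft = get_current_draft(conn, document_id)
    if draft is None:
        raise LookupError(f"document {document_id} has no current draft")
    with conn:
        conn.execute(
            "UPDATE document_version"
            " SET name = ?, summary = ?, body = ? WHERE id = ?",
            (name, summary, body, draft["id"]),
        )


conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO document (id, name, summary, body, version_num)"
    " VALUES (1, 'Doc1', 'A doc...', 'Lorem Ip...', 1)"
)
conn.execute(
    "INSERT INTO document_version"
    " (document_id, name, summary, body, version_num, is_current)"
    " VALUES (1, 'Doc1', 'A doc...', 'Lorem Ip...', 2, 1)"
)
conn.commit()

edit_draft(conn, 1, "EDIT", "Edited doc.", "Edit Ip...")
draft = get_current_draft(conn, 1)
published = conn.execute("SELECT * FROM document WHERE id = 1").fetchone()
print(draft["name"], "|", published["name"])
```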
To publish the draft version, copy the name, summary, body and version_num fields from the DocumentVersion Model to the associated Document Model. In the same transaction, set the current flag on the DocumentVersion to false:
+--------------------------+    +--------------------------+
|      Document Model      |    |  DocumentVersion Model   |
+--------------------------+    +--------------------------+
| id          | 1          |    | id          | 1          |
+--------------------------+    +--------------------------+
| name        | EDIT       |<-- | name        | EDIT       |
+--------------------------+    +--------------------------+
| summary     | Edited doc.|<-- | summary     | Edited doc.|
+--------------------------+    +--------------------------+
| body        | Edit Ip... |<-- | body        | Edit Ip... |
+--------------------------+    +--------------------------+
| version_num | 2          |<-- | version_num | 2          |
+--------------------------+    +--------------------------+
                                | current     | false      |
                                +--------------------------+
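A sketch of the publish step, with both writes inside one transaction so the Document row and the draft flag cannot get out of step. It assumes SQLite via Python’s sqlite3 module; publish_draft and the is_current column name are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE document (
    id INTEGER PRIMARY KEY,
    name TEXT, summary TEXT, body TEXT,
    version_num INTEGER NOT NULL DEFAULT 1
);
CREATE TABLE document_version (
    id INTEGER PRIMARY KEY,
    document_id INTEGER NOT NULL REFERENCES document(id),
    name TEXT, summary TEXT, body TEXT,
    version_num INTEGER NOT NULL,
    is_current INTEGER NOT NULL DEFAULT 0
);
"""


def publish_draft(conn, document_id):
    """Copy the current draft onto the document and clear its current flag."""
    with conn:  # both UPDATEs commit together, or neither does
        draft = conn.execute(
            "SELECT * FROM document_version"
            " WHERE document_id = ? AND is_current = 1",
            (document_id,),
        ).fetchone()
        if draft is None:
            raise LookupError(f"document {document_id} has no current draft")
        conn.execute(
            "UPDATE document"
            " SET name = ?, summary = ?, body = ?, version_num = ?"
            " WHERE id = ?",
            (draft["name"], draft["summary"], draft["body"],
             draft["version_num"], document_id),
        )
        conn.execute(
            "UPDATE document_version SET is_current = 0 WHERE id = ?",
            (draft["id"],),
        )
    return draft["version_num"]


conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO document (id, name, summary, body, version_num)"
    " VALUES (1, 'Doc1', 'A doc...', 'Lorem Ip...', 1)"
)
conn.execute(
    "INSERT INTO document_version"
    " (document_id, name, summary, body, version_num, is_current)"
    " VALUES (1, 'EDIT', 'Edited doc.', 'Edit Ip...', 2, 1)"
)
conn.commit()

published_version = publish_draft(conn, 1)
doc = conn.execute("SELECT * FROM document WHERE id = 1").fetchone()
print(doc["name"], doc["version_num"])
```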
To delete a current draft, you can simply delete the record from the DocumentVersion table. No other work is required.
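Discarding a draft can be sketched as a single DELETE restricted to the current row, so published history is never touched. SQLite and the names discard_draft and is_current are assumptions for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE document (
    id INTEGER PRIMARY KEY,
    name TEXT, summary TEXT, body TEXT,
    version_num INTEGER NOT NULL DEFAULT 1
);
CREATE TABLE document_version (
    id INTEGER PRIMARY KEY,
    document_id INTEGER NOT NULL REFERENCES document(id),
    name TEXT, summary TEXT, body TEXT,
    version_num INTEGER NOT NULL,
    is_current INTEGER NOT NULL DEFAULT 0
);
"""


def discard_draft(conn, document_id):
    """Delete the current draft, if any. Returns True when a row was removed."""
    with conn:
        cur = conn.execute(
            "DELETE FROM document_version"
            " WHERE document_id = ? AND is_current = 1",
            (document_id,),
        )
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO document (id, name, summary, body, version_num)"
    " VALUES (1, 'EDIT', 'Edited doc.', 'Edit Ip...', 2)"
)
conn.executemany(
    "INSERT INTO document_version"
    " (document_id, name, summary, body, version_num, is_current)"
    " VALUES (1, ?, 'Edited doc.', 'Edit Ip...', ?, ?)",
    [("EDIT", 2, 0),        # already published
     ("ABANDONED", 3, 1)],  # draft we want to throw away
)
conn.commit()

removed = discard_draft(conn, 1)
remaining = [
    r[0] for r in conn.execute(
        "SELECT version_num FROM document_version WHERE document_id = 1"
    )
]
print(removed, remaining)
```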
To render a history view of the Document, you can just output the contents of the DocumentVersion table for rows matching the document ID in question.
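The history list is then one filtered query on the version table. The sketch below assumes SQLite and an illustrative version_history helper; it returns the is_current flag so the interface can label an unpublished draft. Note that, following the steps as written, the content of version 1 only ever lives on the Document row, so it shows up here only if you also snapshot it when the first draft is created (not shown).

```python
import sqlite3

SCHEMA = """
CREATE TABLE document_version (
    id INTEGER PRIMARY KEY,
    document_id INTEGER NOT NULL,
    name TEXT, summary TEXT, body TEXT,
    version_num INTEGER NOT NULL,
    is_current INTEGER NOT NULL DEFAULT 0
);
"""


def version_history(conn, document_id):
    """Return the version rows for one document, newest first. No joins."""
    rows = conn.execute(
        "SELECT version_num, name, summary, is_current"
        " FROM document_version"
        " WHERE document_id = ?"
        " ORDER BY version_num DESC",
        (document_id,),
    ).fetchall()
    return [dict(row) for row in rows]


conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO document_version"
    " (document_id, name, summary, body, version_num, is_current)"
    " VALUES (?, ?, ?, '', ?, ?)",
    [
        (1, "Doc1 second", "Second pass", 2, 0),
        (1, "Doc1 third", "Third pass", 3, 0),
        (1, "Doc1 draft", "Work in progress", 4, 1),
        (2, "Other doc", "Unrelated", 2, 0),
    ],
)
conn.commit()

history = version_history(conn, 1)
for entry in history:
    label = " (draft)" if entry["is_current"] else ""
    print(f"v{entry['version_num']}: {entry['name']}{label}")
```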
To render the current published version, simply render the fields from the Document Model.
To render the current draft version, simply render the fields from the current DocumentVersion Model. The field names and types are the same, so you should be able to interchange them fairly readily.
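Because both records expose the same field names, one rendering function can serve the published and the draft view. A minimal sketch, using plain dicts to stand in for the two models (the render function and its output format are illustrative):

```python
def render(record):
    """Render anything that exposes name, summary and body."""
    return f"{record['name']}\n{record['summary']}\n\n{record['body']}"


# Stand-ins for a Document row and its current DocumentVersion row.
published = {
    "id": 1,
    "name": "Doc1",
    "summary": "A doc...",
    "body": "Lorem Ip...",
    "version_num": 1,
}
draft = {
    "id": 1,
    "name": "EDIT",
    "summary": "Edited doc.",
    "body": "Edit Ip...",
    "version_num": 2,
    "current": True,
}

published_view = render(published)
draft_view = render(draft)
print(published_view)
print("-" * 20)
print(draft_view)
```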
Advantages
- The original entity table (e.g. Document table) always contains the latest published content. There is no need for other APIs or modules to know anything about versions or have to filter out drafts etc.
- Other entities can maintain references to the entity without having to update them when new versions are published
- Retrieving history is a simple filtered query, no joins required.
- The EntityVersion table always contains the full history
- Conceptually easy to follow and debug
- Readily accommodates more complex edit workflows like approval steps etc.
- If necessary you could truncate the Version table without impacting the rest of the application
- Very simple logic
- Easy to integrate ‘after the fact’
Disadvantages
- Needs safeguards for concurrent editing
- Cannot easily support multiple ‘current’ drafts (although this is rarely necessary)