UUID is a relatively new data type in the tech world. We will discuss its pros and cons.
Based on real experience. The information is based on building multi-million enterprise applications.
Prologue
We began developing enterprise-level software five years ago. For that, we evaluated the database engine to use. We chose PostgreSQL among MySQL, MariaDB, MS-SQL, and PostgreSQL.
Choosing a DB engine is important to the discussion. More on this in a bit.
The next important decision was to decide on a datatype for the primary key. The options were
Each option has its own pros and cons. So we must carefully consider which one suits our needs the best before implementing it. Today, through this post, I am sharing our evaluation criteria and the results we achieved.
In this post, I am using UUID (Universal Unique IDentifier) to refer to both UUID and GUID. A GUID (Global Unique Identifier) is a Microsoft implementation of the UUID.
Our experience
Following were the benefits and drawbacks of using a UUID data type.
Pros
This is helpful when we need to synchronise data from various sources. Certain applications need internal integrations. ( we did ).
During such integrations, string patterns or incremental integers will give a PK conflict. However, UUIDs being unique will NOT.
A client can generate a UUID without a server connection. Later, this value can be passed to the database to store.
Whereas in the case of an autogenerated integer value. Only the DB can generate the primary key. Therefore, an extra call is required to retrieve the primary key.
This means no one can predict the values of the column. And so cannot read data based on guesswork.
Cons
This is where DB engines become important.
PostgreSQL and MS-SQL servers have UUID as a native data type. But MySQL, MariaDB, and Oracle do not.
If a database engine doesn’t support UUID. Then maintaining it as the primary key has an impact on speed due to string comparison.
https://www.w3resource.com/sql/data-type.php
Because UUID is not a JSON datatype, there is a higher cost in terms of CPU and
memory. This occurs when serializing/ deserializing a UUID.
https://www.geeksforgeeks.org/json-data-types/
If the database server does not support it natively, more storage is needed.
While working with native data types, storage is 16 bytes. With varchars, storage is 36 bytes.
We can’t get an insertion order by ordering on this field.
These Ids aren’t actually human-readable. Therefore, debugging can be a challenge.
For instance, suppose we use UUID as a PK for simple master data such as country or city. When we use 2-3 characters of code instead of a UUID, the code is much easier to read.
What to do ?
We have discussed the principles of using UUIDs. Let us discuss some practical aspects now.
The following table describes considerations while choosing data types for DB primary keys.
Datatype |
Description |
---|---|
UUID |
●Synchronization needed.
Software that has multiple systems creating data. Such data is subsequently synchronized to a central database.
●Acceptable storage/ performance costs.
The application is able to take a bit of CPU and memory hit.
●Eventing
UUIDs are useful when there is a need to generate events for Kafka ( or another messaging system ). They help distribute information to underlying consumers. And help with concurrent insertion into the database.
|
Integer |
●A commonly used pattern.
●Fast.
●The database is the sole owner of the identifier.
Where the record identifier is solely generated by the database. Whenever any record is inserted into a table. An auto-generated integer value is added as a PK for the record by the database engine.
●Transactional tables
No synchronization is required from any other system.
|
String |
●Master data tables
When the data itself has 2-10 character long unique keys, which can be a PK. A countries and cities table, for example.
●DB undecided.
The database engine has yet to be determined. Furthermore, software must support a database engine that lacks native UUID support.
●As key length grows, it may be less performant.
|
Wrapping up
Our software is deployed with two big clients. We have all 3 data types as PK. (each for a different use case ).
Hope this information was helpful. Feel free to reach out for questions. Thanks.