UUIDs - Good, Bad and the Ugly

UUID is a relatively new data type in the tech world. We will discuss its pros and cons.

Based on real experience. The information is based on building multi-million enterprise applications.

Prologue

We began developing enterprise-level software five years ago. For that, we evaluated the database engine to use. We chose PostgreSQL among MySQL, MariaDB, MS-SQL, and PostgreSQL.

Choosing a DB engine is important to the discussion. More on this in a bit.

The next important decision was to decide on a datatype for the primary key. The options were

●Integer

●UUID

●String

Each option has its own pros and cons. So we must carefully consider which one suits our needs the best before implementing it. Today, through this post, I am sharing our evaluation criteria and the results we achieved.

In this post, I am using UUID (Universal Unique IDentifier) to refer to both UUID and GUID. A GUID (Global Unique Identifier) is a Microsoft implementation of the UUID.

Our experience

Following were the benefits and drawbacks of using a UUID data type.

Pros

1.Does not generate PrimaryKey-Conflict.

This is helpful when we need to synchronise data from various sources. Certain applications need internal integrations. ( we did ).

During such integrations, string patterns or incremental integers will give a PK conflict. However, UUIDs being unique will NOT.

2.Generation without a server connection.

A client can generate a UUID without a server connection. Later, this value can be passed to the database to store.

Whereas in the case of an autogenerated integer value. Only the DB can generate the primary key. Therefore, an extra call is required to retrieve the primary key.

3.Provides column-level security.

This means no one can predict the values of the column. And so cannot read data based on guesswork.

Cons

1.Not an ANSI data type.

This is where DB engines become important.

PostgreSQL and MS-SQL servers have UUID as a native data type. But MySQL, MariaDB, and Oracle do not.

If a database engine doesn’t support UUID. Then maintaining it as the primary key has an impact on speed due to string comparison.

https://www.w3resource.com/sql/data-type.php

2.Higher CPU cost.

Because UUID is not a JSON datatype, there is a higher cost in terms of CPU and

memory. This occurs when serializing/ deserializing a UUID.

https://www.geeksforgeeks.org/json-data-types/

3.Higher Memory cost.

If the database server does not support it natively, more storage is needed.

While working with native data types, storage is 16 bytes. With varchars, storage is 36 bytes.

4.Insertion order unavailable.

We can’t get an insertion order by ordering on this field.

5.Not human readable.

These Ids aren’t actually human-readable. Therefore, debugging can be a challenge.

For instance, suppose we use UUID as a PK for simple master data such as country or city. When we use 2-3 characters of code instead of a UUID, the code is much easier to read.

What to do ?

We have discussed the principles of using UUIDs. Let us discuss some practical aspects now.

The following table describes considerations while choosing data types for DB primary keys.

. Datatype	. Description
. . . . . . . UUID	. ●Synchronization needed. . Software that has multiple systems creating data. Such data is subsequently synchronized to a central database. . ●Acceptable storage/ performance costs. . The application is able to take a bit of CPU and memory hit. . ●Eventing . UUIDs are useful when there is a need to generate events for Kafka ( or another messaging system ). They help distribute information to underlying consumers. And help with concurrent insertion into the database. .
. . . . . Integer	. ●A commonly used pattern. . ●Fast. . ●The database is the sole owner of the identifier. . Where the record identifier is solely generated by the database. Whenever any record is inserted into a table. An auto-generated integer value is added as a PK for the record by the database engine. . ●Transactional tables . No synchronization is required from any other system. .
. . . . String	. ●Master data tables . When the data itself has 2-10 character long unique keys, which can be a PK. A countries and cities table, for example. . ●DB undecided. . The database engine has yet to be determined. Furthermore, software must support a database engine that lacks native UUID support. . ●As key length grows, it may be less performant. .

Datatype

Description

UUID

●Synchronization needed.

Software that has multiple systems creating data. Such data is subsequently synchronized to a central database.

●Acceptable storage/ performance costs.

The application is able to take a bit of CPU and memory hit.

●Eventing

UUIDs are useful when there is a need to generate events for Kafka ( or another messaging system ). They help distribute information to underlying consumers. And help with concurrent insertion into the database.

Integer

●A commonly used pattern.

●Fast.

●The database is the sole owner of the identifier.

Where the record identifier is solely generated by the database. Whenever any record is inserted into a table. An auto-generated integer value is added as a PK for the record by the database engine.

●Transactional tables

No synchronization is required from any other system.

String

●Master data tables

When the data itself has 2-10 character long unique keys, which can be a PK. A countries and cities table, for example.

●DB undecided.

The database engine has yet to be determined. Furthermore, software must support a database engine that lacks native UUID support.

●As key length grows, it may be less performant.

Wrapping up

Our software is deployed with two big clients. We have all 3 data types as PK. (each for a different use case ).

●For distributed and synchronized data – we use UUIDs

●For data with natural keys ( State, City, Address ) – we use Strings.

●In utility data ( logging, tracking ) – we use Integers.

Hope this information was helpful. Feel free to reach out for questions. Thanks.

– Shalini Garg