PostgreSQL id column

Problem

You use PostgreSQL as database. Your model includes an id column type Serial with a sequence to emulate an auto increase like with MySQL.

You´re doing inserts and suddenly your PostgreSQL id column throws primary key errors.

Solution

You have two options:

  • 1. Never set the id column
  • 2. Sync the sequence with the highest id value

Option 1 (Never set the id column):

Make sure that your code never sets the id column in a PostgreSQL table. But that is easier said than done.

E.g. the Python Framework Django provides the bulk_create() method on a model. This speeds up insertion but doesn’t sync the sequence of the id column. So you have to sync it after the execution.

Also SQL Scripts could set the id column in PostgreSQL explicitly, e.g. when you copy entries into another table.

In both cases you need Option 2

Option 2 (Sync the sequence with the highest id value):

Your second option is to sync the sequence with the highest id value used. For a one off shot this can be done manually with (P)SQL and I will show you how.

For a regular requirement in a Python Django bulk_create, I will show you the Python code how to do it too.

SQL

Assuming you used the field id, the following will do the sync:

SELECT setval(pg_get_serial_sequence('"$tablename"','id'), coalesce(max("id"), 1), max("id") IS NOT null) FROM "$tablename";

Just replace $tablename with the actual table name, e.g. product_product:

SELECT setval(pg_get_serial_sequence('"product_product"','id'), coalesce(max("id"), 1), max("id") IS NOT null) FROM "product_product";

All symbols are required.

Python Django Framework

Wrapped in a method can use the following:

def reset_sequence():
    sequence_sql = connection.ops.sequence_reset_sql(no_style(), [$model_name])
    with connection.cursor() as cursor:
        for sql in sequence_sql:
            cursor.execute(sql)

Just replace $model_name with the Model in question, e.g. for the Model ProductAvailability:

def reset_sequence():
    sequence_sql = connection.ops.sequence_reset_sql(no_style(), [ProductAvailability])
    with connection.cursor() as cursor:
        for sql in sequence_sql:
            cursor.execute(sql)

Explanation

PostgreSQL id columns work a bit different. As long as no value is set for the id column the sequence attached to it (comparable with a function) is executed and the data is in sync with the sequence.

But when an entry is saved that has an explicitly set value for the id column PostgreSQL does not execute the sequence. This means the function is out of sync with the existing data, which will likely result in a primary key violation.

Background

A serial id like 1,2,3,4,5, …, 100, 101, … as the primary key in a database is a decision that is not supported by all users of relational databases. There are multiple reasons for it:

Information disclosure

Depending on requirements you don´t are not allowed to expose too much information. For example imagine a social network says they have 10 Million users and you can easily verify as a user that there are only 20000.

Easily scrapable

Let say you have a public user profile link with the id in it. Now someone can make external calls and collect the public data of all users.

And it increases your server costs as you need to serve those requests.

I stop here. You see there is a reason that a PostgreSQL id column should be a configurable function to use the kind of key you require.

You just need to be aware of the quirks this brings when you want a simple auto increment primary key.

Let me know if it helped you.

Best,

Frank

Sources:

https://django.readthedocs.io/en/stable/ref/databases.html#manually-specifying-values-of-auto-incrementing-primary-keys

https://www.clever-cloud.com/blog/engineering/2015/05/20/why-auto-increment-is-a-terrible-idea/

Leave a Reply