What I learned from DjangoConEU 2021

Disclaimer: This is a summary as per my understanding of the talks. The original content of each talk is property of their respective authors. The intent is not to copy, but to summarize what I have learned.

DjangoCon Europe 2021 was intense. There were many things to learn or re-learn. It reminded me how much I already knew, and of things I had tried that did not work out as expected. Overall it was a good investment of 3 days.

Here's what I learned this year. There were many good talks apart from the ones listed below. These are some that interest me.

Tl;dr

  1. Server side rendered HTML is making a comeback with technologies like Hotwire, HTMX and Unicorn.
  2. Django handles default values itself. The database does not.
  3. Migrations should be treated with respect.
  4. Use configuration classes to override settings for clients. Also use Interface-Implementation design to handle client variations in a complex project.
  5. Postgres Indexes when used properly are VERY powerful.

Talks

Unlocking the Full Potential of PostgreSQL Indexes in Django

by Haki Benita

Really good talk. A detailed discussion of the pros and cons of various types of PostgreSQL indexes with respect to Django. A must watch presentation.

To enable checking of SQL queries in development

In your configuration enable logging for SQL

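LOGGING = {
    "version": 1,  # required by logging.config.dictConfig
    "disable_existing_loggers": False,
    "handlers": {"console": {"class": "logging.StreamHandler"}},
    "loggers": {
        "django.db.backends": {
            "handlers": ["console"],
            "level": "DEBUG",
        },
    },
}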

This will start logging all queries to the console/log so you can see what's going on with your ORM. Note that Django only logs queries when DEBUG is True.

Django and its Index types?

B-Tree indexes are the default in Django and cover 90% of the cases. When you do need that extra 10% kick, use EXPLAIN to figure out what needs optimization.
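From Django, EXPLAIN can be run through QuerySet.explain(). A small helper sketch; show_plan and the ShortUrl model in the usage comment are hypothetical names, not something from the talk:

```python
def show_plan(queryset, analyze=False):
    """Return the database's execution plan for a queryset as text."""
    # QuerySet.explain() passes extra keyword options to the database's
    # EXPLAIN statement; analyze=True is understood by PostgreSQL.
    options = {"analyze": True} if analyze else {}
    return queryset.explain(**options)


# Usage inside a Django project (ShortUrl is a hypothetical model):
# print(show_plan(ShortUrl.objects.filter(key="abc123"), analyze=True))
```

On PostgreSQL, analyze=True actually executes the query and reports real timings, so it is best kept to development or a production-like copy.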

The optimization is not always only for the speed of a lookup. It is a balance between speed and space efficiency. For that we need to look beyond B-Tree.

The talk walked through building a URL shortener application to explore various scenarios and the appropriate index for each.

Index types

  • B-Tree indexes (default)
  • Covering indexes
  • Partial indexes
  • Function based indexes
  • Hash indexes
  • BRIN Indexes

B-Tree Index

These are the default and cover 90% of the cases easily without any intervention. When you create an index it is created as a B-Tree index.

Covering Index/Inclusive Index

https://docs.djangoproject.com/en/3.2/ref/models/indexes/#include

Introduced in Django 3.2. Also known as an inclusive index. It stores additional columns within the index so that a table lookup is not needed. Best when a query needs that extra field and nothing else from the table.

Index(name='covering_index', fields=['headline'], include=['pub_date'])

This makes the index a bit larger, and the included columns cannot be used for filtering in a query. It can also stand in for a composite index when the extra column is only read.

Partial Index

When we only need an index for some particular values. This is normally used for nullable columns when we check for nulls in the query. In that case it does not make sense to build an index with all values.

To do this, add a condition when creating the index. In the presentation, Haki showed that just by doing this a 7MB index shrank to 88kB.

UniqueConstraint(fields=['user'], condition=Q(status='DRAFT'), name='unique_draft_user')

Function Based Index

For example, finding a URL by its domain. If a query needs to call a function in SQL, we can use a function-based index: the function result is computed when a row is written and stored in the index, so a lookup on that expression avoids a table scan that calls the function for every row. The extra computation makes writes a little slower.

Hash Index

An index keyed on a hash of the column value. This shrinks the key from, say, a long URL to a short hash, which saves space.

BRIN Index

Block Range Index - A BRIN index stores summary data per range of table blocks. These are space efficient but a little slower than B-Tree indexes.

Getting started with React, GraphQL, and Django

by Aaron Bassett

https://strawberry.rocks/

In my opinion, GraphQL represents a powerful shift from REST based APIs to a more organized structure. GraphQL reduces the dependency of frontend developers on backend developers, so the architecture can evolve more completely without the two blocking each other.

This presentation introduced the Strawberry project for adding GraphQL support to Django. Strawberry seems pretty powerful and quite simple.

Overall, the steps to use Strawberry seem simple.

  1. Create a type to declare data structure for use in a GraphQL API

    import strawberry

    @strawberry.type
    class User:
        name: str
        age: int

  2. Create a query that consumes the data structure

    @strawberry.type
    class Query:
        @strawberry.field
        def user(self) -> User:
            return User(name="Patrick", age=100)

  3. Use these to define Schema

    schema = strawberry.Schema(query=Query)

Will be trying out soon.

Telepath - adding the missing link between Django and rich client apps

by Matt Westcott

https://wagtail.github.io/telepath/

Telepath is a Django library for exchanging data between Python and JavaScript, allowing you to build apps with rich client-side interfaces while keeping the business logic in server-side code.

Telepath looks interesting. It seems to be trying to standardize data transfer and consumption between Django and JavaScript using JSON. It might work for simple cases, but I am not sure how well it will work in complex cases. I think removing JSON from the equation is a better answer (see HTMX/Unicorn).

Might be very useful for defining Dynamic fields in some of our projects. In KuberWMS we use a similar approach using django-fobi.

(A) SQL for Django

by Stefan Baerisch

lname = "Adams"
Person.objects.raw('SELECT * FROM myapp_person WHERE last_name = %s', [lname])

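Note that this is NOT the same as 'SELECT * FROM myapp_person WHERE last_name = %s' % lname. With string formatting the value becomes part of the SQL text, which opens the door to SQL injection. With the params argument the database driver escapes it.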
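The difference is easy to demonstrate. A minimal sketch using the standard library's sqlite3 instead of Django so that it runs anywhere; sqlite3 uses ? as its placeholder where Django uses %s, and the person table and the malicious value are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (last_name TEXT)")
conn.executemany("INSERT INTO person VALUES (?)", [("Adams",), ("Baker",)])

# A value an attacker might submit instead of a real last name.
lname = "nobody' OR '1'='1"

# Unsafe: the value is pasted into the SQL text and changes the query.
unsafe_sql = "SELECT last_name FROM person WHERE last_name = '%s'" % lname
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safe: the value is passed separately and treated purely as data.
safe_rows = conn.execute(
    "SELECT last_name FROM person WHERE last_name = ?", [lname]
).fetchall()

print(len(unsafe_rows), len(safe_rows))
```

The formatted query returns every row in the table, while the parameterized one returns none because nobody has that literal last name.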

Similarly

from django.db import connection

def my_custom_sql(self):
    with connection.cursor() as cursor:
        cursor.execute("UPDATE bar SET foo = 1 WHERE baz = %s", [self.baz])
        cursor.execute("SELECT foo FROM bar WHERE baz = %s", [self.baz])
        row = cursor.fetchone()
    return row

Using database views in Django

Another approach is to use database views with Django. This was missed in the talk. Since we use this, I offered the approach to the audience:

Unless you need to save data, another option is to create database views and then unmanaged models to fetch that data. We use this in some of our projects and it helps a lot with complex queries.

This gives us the benefit of a consistent ORM interface with the desired performance of an optimized query. To do this, follow these steps:

  1. Create the raw query you need. Test that in postgres.
  2. Create an empty migration and edit the migration to create a view with raw SQL.
  3. Create an unmanaged django model with corresponding fields as in view.

Writing Safe Database Migrations

by Markus Holtermann

Good talk on migrations. Most of the things in this talk were what we currently do.

  • Apply migrations before you deploy - We do that.
  • Only go forward and never look back - Yeah, we do that.
  • Only add nullable fields - We do not completely agree with this one. It is done in two steps, but I strongly believe in schema correctness, so that the intention of a field is also visible in its schema. Moreover, this is reflected in forms and APIs. While you can achieve the forms part via blank=False, it's not the same. For example, we calculate a total in code. We don't want to ask the user for it, so blank is True. But it is required, so null is False in the database.
  • Populate default values via management command - Another one we do not agree with. Management commands should not be used for one-off tasks. For those, use django-extensions scripts or run the code in shell_plus. However, to do this you will anyway need to either run broken code on the server while values are updating or wait for a direct SQL update to complete. Unless it's a service that cannot be interrupted, we prefer updates via a migration.
  • Add indexes concurrently - Good idea. I need to try this.
  • Use meaningful index name - We do that.
  • Have working backups of your database - Doh. How could you not?
  • Test on production like data - We do that.

And run migrations with verbose output:

python manage.py migrate -v 2

Migrations and understanding Django's relationship with its database

by David Wobrock

Excellent presentation by David. This covered a deep understanding of migrations and how to keep them sane for a long term project. This is definitely something we need to improve in our infrastructure.

A few pointers from this presentation:

To see what SQL command a migration will run

python manage.py sqlmigrate wms 0027

Default values are handled on django/python side and not at database side

This was a SHOCKER! I even confirmed it with a migration that was meant to set a default. The SQL it generated was empty:

class Migration(migrations.Migration):
    dependencies = [
        ("wms", "0023_auto_20210305_2052"),
    ]

    operations = [
        migrations.AlterField(
            model_name="item",
            name="state",
            field=django_fsm.FSMField(
                choices=[
                    ("NEW", "New"),
                    ....
                    ("SCRAP", "Scrap"),
                ],
                default="NEW",
                max_length=50,
                protected=True,
            ),
        ),
    ]

And the corresponding SQL:

> python manage.py sqlmigrate wms 0024_auto_20210326_1419
BEGIN;
--
-- Alter field state on item
--
COMMIT;

To handle this, David suggested that if you truly want database side defaults, create them in the migration itself. His package does that for us: https://github.com/3YOURMIND/django-add-default-value

Handling Migration conflicts

Conflicts happen. Especially when you have multiple people on the project. To handle them we have two options:

  1. Rename one of the migrations and change its dependency as well. After all, it is simply Python code. Prefer to do that on the migration that is not applied yet. If it is already applied, you will also need to edit the recorded migration name in the database (the django_migrations table).
  2. Create a merge migration (python manage.py makemigrations --merge).

While we have done this in the past, from here onwards it is option 1 for me.

Upgrading database in production

This is tricky especially if you have a complex database structure with tenant schemas. A few things to remember here are:

  1. If you need to do this, plan for downtime.
  2. Or you can have multiple versions of the database. This only works if you have a read heavy database with no writes or cloning.
  3. Prefer backward compatible schema changes with nullable columns. And don't drop columns.

Checking for incompatible database migrations

An excellent package that helps in keeping things sane is django-migration-linter. Several strategies were suggested to keep migrations backward compatible. This is a must watch presentation.

To use the package, add it to INSTALLED_APPS and run the management command.

python manage.py lintmigrations

Deleting unused columns

Try to keep them as long as you can. Use the django-deprecate-fields package to mark fields as unused first and delete them at a later date.

https://github.com/3YOURMIND/django-deprecate-fields

Squashing Migrations multiple times

In the face-to-face session we also discussed how, or if, we can squash migrations a second time. When I tried this we saw an error. It asked us to unsquash the migrations and then squash them again, which was not easily possible. David suggested the following:

  1. Make sure you delete the old migrations after the squash.
  2. Remove the squashed migration's reference to the old ones (the replaces attribute). For this you will need to edit the squashed migration file.
  3. Now the squashed migration is almost like a non-squashed one. Squash again.

Regenerating Migrations for an existing project

Again in the face-to-face session we discussed whether it is possible to delete old migrations and regenerate them. He had not tried that, but I think I read somewhere that we need to delete them (keeping manually written migrations in mind) and then regenerate. However, we will then need to reset/prime the existing database by running fake migration commands (migrate --fake).

Topic for another blog.

Spreading our tentacles - taking a Django app global

by Dr. Frederike Jaeger

Probably one of the most profound talks of the conference. It reminded me how to architect an application for multiple clients when the clients have different (but similar) needs. I am quite familiar with interface based design and design patterns. But sometimes it just does not hit you till someone tells you how they did it.

The overall structure followed a top down approach. Flow of information was in one direction mostly.

  1. Interfaces - Views, APIs, Management Commands
  2. Application - Use Cases
  3. Domain - Operational Queries
  4. Data - Models

Plugins - They follow a hierarchy and talk directly to the domain. The domain does not know anything about implementation details.
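A minimal sketch of how configuration classes and an interface with per-client implementations can fit together. All names here (PricingPlugin, ClientAConfig, quote and so on) are hypothetical and not taken from the talk:

```python
from abc import ABC, abstractmethod


class PricingPlugin(ABC):
    """Interface owned by the domain layer."""

    @abstractmethod
    def unit_price(self, units: int) -> float:
        """Return the price per unit for an order of the given size."""


class DefaultPricing(PricingPlugin):
    def unit_price(self, units: int) -> float:
        return 1.0


class BulkDiscountPricing(PricingPlugin):
    """Client-specific implementation: half price from 100 units."""

    def unit_price(self, units: int) -> float:
        return 0.5 if units >= 100 else 1.0


class BaseConfig:
    """Defaults shared by all clients."""

    currency = "EUR"
    pricing_plugin = DefaultPricing


class ClientAConfig(BaseConfig):
    """Overrides only what differs for this client."""

    pricing_plugin = BulkDiscountPricing


def quote(config, units: int) -> float:
    # Domain code depends on the interface, never on a concrete plugin.
    plugin = config.pricing_plugin()
    return units * plugin.unit_price(units)


print(quote(BaseConfig, 100), quote(ClientAConfig, 100))
```

The domain function quote never imports a client-specific class; the configuration decides which implementation is plugged in.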

HTMX: Frontend Revolution

by Thomas Güttler

https://htmx.org/

This was a refresher. We have just started using HTMX in one of our production projects and this presentation confirmed that we are on the right track.

Important things I learned:

  1. HTMX automatically applies itself to new HTML.
  2. HTMX has many examples that tell how to do things.
  3. HTMX has an extensible API.

I am sold. If only I could use this for our cascaded dropdowns to replace the old autocomplete-light, that would be groovy.
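For the cascaded dropdown case, the server side could be as small as a function that returns an HTML fragment. A sketch with made-up data and URL; hx-get and hx-target are real HTMX attributes, everything else is hypothetical:

```python
from html import escape

# Hypothetical data; in a real project this would come from the ORM.
CITIES = {"IN": ["Delhi", "Mumbai"], "DE": ["Berlin", "Hamburg"]}


def render_city_options(country_code: str) -> str:
    """Return an HTML fragment of <option> tags for the chosen country."""
    cities = CITIES.get(country_code, [])
    return "\n".join(
        f'<option value="{escape(city)}">{escape(city)}</option>' for city in cities
    )


# The parent dropdown asks the server for new options whenever it changes
# (change is the default HTMX trigger for a select) and swaps the response
# into the child dropdown:
#
#   <select name="country" hx-get="/cities/" hx-target="#id_city">...</select>
#   <select name="city" id="id_city"></select>
#
# A Django view for /cities/ would return
# HttpResponse(render_city_options(request.GET.get("country", ""))).

print(render_city_options("DE"))
```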

The Future of Web Software Is HTML-over-WebSockets

How to create a full-stack, reactive website in Django with absolutely no JavaScript

by Adam Hill

Another amazing talk about HTML over the wire and the tools to make it happen. Adam is the author of Unicorn, a powerful way to build reactive pages in Django with HTML sent over plain request/response.

Unicorn uses WSGI, HTTP, JSON and HTML.

The talk covered various technologies addressing the same purpose:

  1. Sockpuppet - Using websockets
  2. Django Reactor - Using websockets
  3. Unicorn - No websockets

Unicorn uses morphdom, the same library that is used by many other similar packages.

Serving Files with Django

by Jochen Wersdörfer

A good talk discussing the pros and cons of using Django directly (async) for serving files. One advantage was removing nginx from the middle. The second was access control, which is not easy to get right when sensitive data is present in an application.

Access control of user files is a difficult and important topic. I know of two approaches:

  1. Using nginx with an X-Sendfile style header (X-Accel-Redirect in nginx) - The Django application checks the relevant access rights and approves or rejects the request through the header. This has some problems: a) it is difficult to implement, and b) it is difficult to test. In my face-to-face chat on Jitsi, Jochen implied that this is risky and we should not do it. It is difficult to get the nginx configuration right and to keep it correct in the long run. Instead, let Django handle the uploaded media files directly. I agree!
  2. Using Django in an async configuration to handle media files - This makes sense. However, it has an impact on application performance and is not suggested for applications that have a large number of consumers.

Other interesting talks

Hacking Django Channels for Fun (and Profit)

Very interesting talk about the basics of Django Channels. The default capacity of a channel's message buffer is 100. Messages beyond that are dropped silently. For larger buffers, change this in the settings file.
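A sketch of such a settings entry, assuming the Redis channel layer (channels_redis) is in use; the host and the capacity value of 1500 are arbitrary examples:

```python
# settings.py
CHANNEL_LAYERS = {
    "default": {
        "BACKEND": "channels_redis.core.RedisChannelLayer",
        "CONFIG": {
            "hosts": [("127.0.0.1", 6379)],
            # Messages a channel may hold before new ones are dropped.
            "capacity": 1500,
        },
    },
}
```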

Load Testing a Django Application using LocustIO

Interesting talk. I learned about LocustIO for load testing. Maybe we will use that.

We're all part of this: Jazzband 5 years later

An amazing talk by Jannis Leidel, who has spent a long time bringing Jazzband and several Django packages together under a common umbrella. It is truly amazing how his efforts have helped the Django community.

Django Unstuck: Suggestions for common challenges in your projects

While this mostly discussed common sense things that we already do in our projects, one thing that stood out for me was adding the apps path to sys.path: sys.path.append(str(BASE_DIR / "apps"))

https://github.com/shezi/django-unstuck/


Resources

  • GraphQL
    • https://strawberry.rocks/
    • Django Integration
    • https://github.com/aaronbassett/DjangoConEU-rgd (Demo Code)
  • Telepath
    • Telepath
    • Wagtail Widgets using Telepath
  • Migrations
    • Modernize a Django Index Definition with Zero Downtime

Packages

  1. https://github.com/gradam/django-better-admin-arrayfield - Array field editing in admin
  2. https://django-postgres-copy.readthedocs.io/en/latest/ - Django postgres copy
  3. https://github.com/3YOURMIND/django-migration-linter - Django Migrations Linter
  4. https://github.com/3YOURMIND/django-add-default-value - Add default value to database
  5. https://github.com/3YOURMIND/django-deprecate-fields - Django Deprecate Field
  6. https://github.com/charettes/django-syzygy/ - Tool to make migrations easier to manage.
  7. https://github.com/carltongibson/django-sphinx-view - Serve Django Sphinx docs with django
  8. https://github.com/defrex/django-encrypted-fields - Encrypted fields
    1. https://gitlab.com/lansharkconsulting/django/django-encrypted-model-fields/
    2. https://gitlab.com/guywillett/django-searchable-encrypted-fields

Books & Videos