Skip to content

Audio denormalization / Performance enhancement in music API

Agate requested to merge audio-denormalization into master

So, I've been thinking about how we can reduce the load on DB when querying artists/albums/tracks that are playable by a given user (this is a common query, see #865 (closed)).

Basically, I've found two ways of enhancing the performance of those endpoints:

  1. Denormalize the audio permission logic
  2. Avoid SQL JOINS as much as possible

Denormalization

THe current permission matrix is quite complex. When we want to display playable artists, tracks and albums, we need to filter first all the uploads the user has the persmission to play:

  • Uploads in public libraries
  • Uploads in followed libraries
  • Uploads in personal liraries
  • Uploads in internal libraries (if the user is logged in)

Doing so result in long and complex SQL queries (as shown in #865 (closed)). Another approach is to maintain an intermediate table with this information (all tracks available to a user), and update this table whenever uploads are created/deleted, etc.

This is achived by my first commit.

Todo:

  • Update the table when:
    • Uploads are created
    • Uploads are deleted (automatic, via cascade)
    • Library is followed
    • Library is unfollowed
    • Library follow is deleted
    • Library visibility is updated
  • Setup a feature flag to ensure this can be disabled if there are any issues (MUSIC_USE_DENORMALIZATION)
  • Data migration to setup the table on existing deployments
  • Django command (rebuild_music_permissions) to populate/reset the table completely

Avoid SQL joins

A typically recommended django optimization is to use select_related() on queryset to retrieve associated objects in a single query. This is usually effective because it means fewer roundtrips to the database, and no N+1 query issues. However, under the hood, it requires a JOIN which is expensive and can lead to less performant query plans.

I've found out that using prefetch_related() is more efficient under some scenarios. It requires an additional SQL query to retrieve the linked objects (so 2 queries if you want tracks and associated albums), but both queries together execute faster than the single, bigger one with the JOIN.

This is implemented in the second commit.

Performance testing

To ensure the impact of both patch, I've run some load testing on open.audio. The test can be found in the library.py file and is executed by https://locust.io/.

Test characteristics:

  • 20 simultaneous clients
  • 120s total time
  • 3 urls queried:
    • /api/v1/albums?playable=True
    • /api/v1/artists?playable=True
    • /api/v1/tracks?playable=True

Test launch:

  • pip install locustio
  • export JWT_TOKEN="<token>" for logged in session
  • locust -f api/tests/loadtesting/library.py -H https://open.audio -c 20 -r 3 --no-web -t 120

Glossary:

  • Master branch: test run with current code from the 0.20 release
  • ORM patch branch: test run with patch to use prefetch_related() instead of select_related(), from commit 3be4fd3d
  • Denorm patch branch: test run with patch use an intermediate table to store available tracks for each user, from commit dd747965
  • Requests: number of requests executed during the tests, higher is better
  • Median: median response time (in milliseconds), lower is better

Aggregated results (all endpoints)

Branch Logged in Requests Median (ms)
Master N 430 3200
Master Y 406 3500
ORM patch N 597 1700
ORM patch Y 486 2400
Denorm patch N 547 1900
Denorm patch Y 522 2200
Both patches N 653 1200
Both patches Y 673 1300

As you can see, both patches, when applied separately have a positive impact on response time and amount of handled requests. When used in conjunction, the improvement is even better, since the number of handled requests increased by Component: Subsonic%, and the median response time was divided by 2.5.

Albums endpoint

URL: /api/v1/albums?playable=True

Branch Logged in Requests Min (ms) Max (ms) Median (ms) Avg (ms)
Master N 148 1653 10088 4300 4473
Master Y 138 1213 8311 4800 4618
ORM patch N 182 745 7151 2600 2730
ORM patch Y 158 732 11259 3500 3963
Denorm patch N 162 687 6840 3000 3036
Denorm patch Y 194 706 13003 2900 3149
Both patches N 200 542 7915 2000 2456
Both patches Y 224 580 4714 2000 2138

Artists endpoint

URL: /api/v1/artists?playable=True

Branch Logged in Requests Min (ms) Max (ms) Median (ms) Avg (ms)
Master N 147 571 5125 2300 2526
Master Y 130 584 5276 2700 2671
ORM patch N 201 410 3224 1400 1506
ORM patch Y 157 497 7516 2000 2171
Denorm patch N 187 311 4102 1400 1663
Denorm patch Y 160 332 11480 1800 1843
Both patches N 243 256 6249 930 1125
Both patches Y 228 366 3657 1000 1117

Tracks endpoint

URL: /api/v1/tracks?playable=True

Branch Logged in Requests Min (ms) Max (ms) Median (ms) Avg (ms)
Master N 135 745 5765 3100 3128
Master Y 138 704 7481 3700 3731
ORM patch N 214 424 3661 1400 1556
ORM patch Y 171 374 8697 2100 2291
Denorm patch N 198 367 4923 1800 2024
Denorm patch Y 168 508 11727 2200 2311
Both patches N 210 292 6881 970 1247
Both patches Y 221 352 3609 1100 1170
Edited by Agate

Merge request reports