Content tagged postgresql

posted on 2015-04-25 13:47:23+01:00

After working a lot more with PostgreSQL, I happened to stumble upon a few things, mostly related to query optimisation, that at least for me weren't quite readily apparent. Please note that this is all with regards to PostgreSQL 9.3, but unless otherwise noted it should still be the case for 9.4 as well.

Slow `DELETE`s

Obviously, indexes are key to good query performance. If you're already using EXPLAIN, with, or without ANALYZE, chances are good you know how and why your queries perform like they do. However I encountered a problem with a DELETE query, where the output from EXPLAIN was as expected, i.e. it was using the optimal plan, but still the performance was abysmal; a query to delete about 300 elements in bulk, like DELETE FROM table WHERE id IN (...);, was quite fast to remove the elements (as tested from a separately running psql), but still the query took about three minutes(!) to complete.

In this scenario the table in question was about a million rows long, had a primary index on the id column and was referenced from three other tables, which also had foreign key constraints set up; no other triggers were running on any of the tables involved.

The postgres server process in question wasn't doing anything interesting, it was basically taking up one core with semop calls as reported by strace, no other I/O was observed though.

At that point I finally turned to #postgresql. Since the information I could share wasn't very helpful, there were no immediate replies, but one hint finally helped me to fix this problem. It turns out that the (kinda obvious) solution was to check for missing indexes on foreign key constraints from other tables. With two out of three indexes missing I resumed to CREATE INDEX CONCURRENTLY...; and about five minutes later the DELETE was now running in a few milliseconds.

The part where I was really frustrated here is that none of the available statistics guided me in the search; EXPLAIN ANALYZE apparently doesn't include the runtime for foreign key constraints and they don't show up in other places as well. In hindsight this is something that I should've checked earlier (and from now on I will), but it's also a weakness of the analyze framework not to help the developer to see the slowdowns involved in a situation like this.

Common Table Expressions

Refactoring queries to reuse results with WITH queries is absolutely worth it and improved the runtime of rather large queries by a large factor. This is something that can be seen from the query plan, so when you're using the same expressions twice, start looking into this and see if it helps both for readability (don't repeat yourself) and performance.

JSON result construction

We need nested JSON output in a few cases. This means (in 9.3, there are some better functions available in 9.4) that a combination of row_to_json(row(json_agg(...))) was necessary to get proper nesting of aggregated sub-objects, as well as wrapping the end result in another object, because the output had to be formatted as a JSON object (with curly brackets) instead of a JSON array (with rectangular brackets).

Technicalities aside the JSON support is pretty good and since the initial code was written I've discovered that since we in many cases don't actually have multiple results (for json_agg), not using that method will again significantly improve performance.

That means instead of something like the following:

SELECT row_to_json(row(json_agg(...))) FROM ... JOIN ... GROUP BY id;

, where the input to json_agg is a single result from the JOIN, we can write the following instead:

SELECT row_to_json(row(ARRAY[...])) FROM ... JOIN ...;

, which, if you examine the output of EXPLAIN, means no sorting because of the GROUP BY clause. The convenience of json_agg here doesn't really justify the significant slowdown caused by the aggregation function.

Note that the array constructed via ARRAY[] is properly converted to JSON, so the end result is again proper JSON.

This blog covers work, unix, tachikoma, scala, sbt, redis, postgresql, no-ads, lisp, kotlin, jvm, java, hardware, go, git, emacs, docker

View content from 2014-08, 2014-11, 2014-12, 2015-01, 2015-02, 2015-04, 2015-06, 2015-08, 2015-11, 2016-08, 2016-09, 2016-10, 2016-11, 2017-06, 2017-07, 2017-12, 2018-04, 2018-07, 2018-08, 2018-12, 2020-04, 2021-03

Content tagged postgresql

Slow DELETEs

Common Table Expressions

JSON result construction

Slow `DELETE`s