Kevin German
Originally published on kaleman.netlify.app
#postgres #database #sql #views
In this article, you will learn how to use some hidden PostgreSQL views to get useful information about the queries running on your PostgreSQL server.
The problem
Have you ever tried to identify performance issues in your application? Some of them live in the code (a map with thousands of items, maybe...), but sometimes the performance problem comes from somewhere else: poorly written SQL queries.
As a developer, sooner or later you will have to deal with SQL. And you will probably have to work with queries that other people wrote, or even queries that you yourself created in the past.
The problem is that without the right tools and information, it's very difficult to identify a slow query. Why?
Some queries are slower with more data
For example, consider a simple query that joins multiple tables. In your local environment, with probably 10 users, the query won't be slow (and if it is, it's much easier to spot!).
Some queries require an index
Indexing is probably the main cause of performance issues. Both its absence and its presence can cause problems. With a small dataset, you can't see whether a query needs an index or not. Worse (or better, depending on how you look at it), PostgreSQL may skip an existing index entirely if the dataset is small enough that a sequential (i.e., row-by-row) scan is cheaper.
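A quick way to see which plan the planner actually picks is EXPLAIN. A minimal sketch, using a hypothetical users table with an indexed email column:

-- Hypothetical table; any table with an indexed column will do.
EXPLAIN SELECT * FROM users WHERE email = 'someone@example.com';

-- On a tiny table the plan may report "Seq Scan on users" even though
-- the index exists; with a few hundred thousand rows, the same query
-- typically flips to an "Index Scan".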
If the problem only shows up in a production environment, it is very difficult to identify, and there is a high chance that the end user will discover it before you do.
This approach (waiting for the user to tell you the app is slow) is very reactive: you must wait for the problem to occur before working on a solution. But what if we could have that information before the problem occurs?
This scenario is why some PostgreSQL views exist. These maintenance views are a gold mine for developers who want to track the performance of their queries. Let's talk more about them!
The solution: PostgreSQL maintenance views
PostgreSQL has many views for this purpose. Some of them give us statistics about disk I/O and network usage. Others let us see replication statistics and the like. Here we'll talk about three views that can help you track down query problems: pg_stat_user_tables, pg_stat_user_indexes, and pg_stat_statements.
pg_stat_user_tables
This view shows statistics about each table per schema (there is one row per table) and provides information such as the number of sequential scans PostgreSQL has performed on the table, how many select/insert operations are performed on it, and so on.
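A plain select is enough to take a first look. This is just a sketch pulling a few of the columns this view exposes:

SELECT relname,      -- table name
       seq_scan,     -- number of sequential scans on the table
       seq_tup_read, -- rows read by those sequential scans
       idx_scan,     -- number of index scans on the table
       n_tup_ins,    -- rows inserted
       n_tup_upd,    -- rows updated
       n_tup_del     -- rows deleted
FROM pg_stat_user_tables;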
In my local database, the first row showed 1 sequential scan, and that scan returned 939 rows. There were also 2 index scans, and they returned 2 rows. The numbers are low because I'm using a local database; in a production database they should be much higher.
From this view, on top of all this useful information, we can answer something really interesting: which of my tables need an index? You can easily answer that question by looking at the seq_scan and seq_tup_read columns!
SELECT
  schemaname,
  relname,
  seq_scan,
  seq_tup_read,
  seq_tup_read / seq_scan AS avg,
  idx_scan
FROM pg_stat_user_tables
WHERE seq_scan > 0
ORDER BY seq_tup_read DESC
LIMIT 25;
Running this query returns the tables sorted by how many rows sequential scans have read from them.
As you can see, it's a good idea to add an index to these tables because they've recently been used in sequential scans. With more data and more execution time, this query gives you a good overview of how your tables are behaving.
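Once you've confirmed that a table's hot sequential scans are filtering on a particular column, adding the index is a one-liner. Table and column names here are hypothetical:

-- CONCURRENTLY builds the index without blocking writes to the table,
-- at the cost of a slower build (and it can't run inside a transaction
-- block).
CREATE INDEX CONCURRENTLY idx_users_email ON users (email);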
pg_stat_user_indexes
While adding indexes solves many problems, they're not the holy grail, and they come at a price: disk space. The results are good, yes, we all agree on that. But worse than having no index is having a useless one. Why? First, it takes up disk space on your database server; indexes on large tables can be very expensive and grow very, very large. Second, the index needs to be updated on every write to the table. Recalculating a useless index is like paying for food you don't eat!
So if you add an index, make sure it makes sense.
But what if you're working on a code base and database schema that you didn't design? Is this the end of the world? Absolutely not! PostgreSQL views to the rescue again! The pg_stat_user_indexes view can show you the frequency of use of your indexes, along with the space they occupy.
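As before, a plain select gives a first impression; a minimal sketch using the columns this view exposes:

SELECT relname,       -- table name
       indexrelname,  -- index name
       idx_scan,      -- number of scans that used this index
       idx_tup_read,  -- index entries returned by those scans
       idx_tup_fetch  -- live table rows fetched via this index
FROM pg_stat_user_indexes;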
Looking at this view on my database, some of my primary keys had not been used at all. But that still doesn't give us many details, because we don't know how much space each index occupies! We can get that information by using the pg_relation_size function with the indexrelid from our results.
SELECT
  schemaname,
  relname,
  indexrelname,
  idx_scan,
  pg_size_pretty(pg_relation_size(indexrelid)) AS size_idx,
  pg_size_pretty(SUM(pg_relation_size(indexrelid))
                 OVER (ORDER BY idx_scan, indexrelid)) AS total
FROM pg_stat_user_indexes
ORDER BY 6;
The output of this query shows indexes that have not been used in a while, along with their space consumption. This can give you an idea of what indexes to look for.
Note that the result of this query does not mean that you should drop all unused indexes. You should always investigate why the index is not in use before deleting it!
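If, after investigating, you conclude an index really is dead weight, dropping it is straightforward. The index name below is hypothetical:

-- CONCURRENTLY removes the index without blocking reads and writes on
-- the table (and, like CREATE INDEX CONCURRENTLY, it can't run inside
-- a transaction block).
DROP INDEX CONCURRENTLY idx_users_email;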
pg_stat_statements
This is probably the most useful of the three. It's hard to understand why this view isn't enabled by default! It must be enabled in the PostgreSQL configuration before you can use it.
Enabling it
To enable this view, we need to add it to the shared_preload_libraries list. Since I'm using Docker and Docker Compose to manage my database, I can just add an option to the start command so it looks like this:
postgres:
  container_name: postgres
  image: postgres:10
  restart: always
  ports:
    - "5432:5432"
  environment:
    - POSTGRES_PASSWORD=${PG_PASSWORD:-postgres}
    - PGDATA='/var/lib/postgresql/data'
  command:
    - "postgres"
    - "-c"
    - "shared_preload_libraries=pg_stat_statements"
After that, when you start PostgreSQL again, the library will be loaded along with the DBMS.
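If you're not running PostgreSQL through Docker, the same setting goes into postgresql.conf; a restart is still required, because preload libraries are only read at server start:

# postgresql.conf
shared_preload_libraries = 'pg_stat_statements'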
Creating the extension
After the library is loaded, you need to enable it as an extension. You can do this by running the following query:
CREATE EXTENSION pg_stat_statements;
If this query doesn't return an error, you're done! Let's confirm this by running:
SELECT * FROM pg_stat_statements;
From this view we can get very good information about the performance of our queries. For example, we have the number of calls for a specific query, the mean_time of execution across all calls, and even the stddev_time (standard deviation) of the calls, to see whether the queries have a consistent execution time or how much they vary.
In this view, you can even see how many rows a query returned, whether those rows came from cache or disk, and so on!
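For example, here's a sketch that estimates a per-query cache hit ratio from the shared_blks_hit and shared_blks_read columns (names as of PostgreSQL 10; newer versions rename some of the timing columns, but not these):

SELECT substring(query, 1, 40) AS query,
       rows,
       shared_blks_hit,   -- blocks served from the buffer cache
       shared_blks_read,  -- blocks that had to be read from disk
       round(100.0 * shared_blks_hit
             / nullif(shared_blks_hit + shared_blks_read, 0), 2) AS hit_percent
FROM pg_stat_statements
ORDER BY shared_blks_read DESC
LIMIT 10;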
With all this information, it's easy to get a list of the most expensive queries and see why they are expensive.
SELECT
  round((100 * total_time / SUM(total_time) OVER ())::numeric, 2) AS percent,
  round(total_time::numeric, 2) AS total,
  calls,
  round(mean_time::numeric, 2) AS mean,
  stddev_time,
  substring(query, 1, 40) AS query
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 10;
With this query, you now have a list of the top 10 most expensive queries: how much total time they took, how often they were called, and the average time and variance of those calls.
That way you can keep track of which queries are taking the longest and try to fix them (or at least understand why they're working the way they do).
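One more thing that's handy while iterating on a fix: the extension also ships a function to wipe the statistics collected so far, so you can measure a clean before/after:

-- Clears everything pg_stat_statements has gathered up to this point.
SELECT pg_stat_statements_reset();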
Conclusion
Using PostgreSQL to monitor PostgreSQL is very useful and can direct you to the right place to understand your application's performance and any issues you may be having.
I hope you enjoyed the article and learned something from it!
Note: This article was also published on my blog. Still trying to find a good domain name for it, lol.