PostgreSQL is designed especially to work with large datasets, and thus Postgres is a perfect match for Data Science. In this article, we’ll go through some of my personally frequently used and useful Postgre commands.
There are many PostgreSQL tutorials on the net that describe the basic commands. But when you dive deeper into the work, practical questions arise that require advanced teams.
Such commands, or snippets, are rarely documented. Let’s look at a few examples, useful for both developers, data scientists, and database administrators.
📃Getting information about a PostgreSQL database📃
👉 Database size
To get the physical size of the database files (storage), we use the following query:
The result will be presented as a number of the kind
current_database()- a function that returns the name of the current database. Instead, you can enter a name in the text:
In order to get information in human-readable form, we use the function
As a result, we get information in the form
👉 List of PostgreSQL tables
Sometimes you want to get a list of database tables. To do this, use the following query:
information_schema is a standard database schema that contains collections of views, such as tables, fields, etc. Table views contain information about all tables in a database.
The query below will select all tables from the specified schema of the current database:
In the last condition,
INyou can specify the name of a specific schema.
👉 PostgreSQL table size
By analogy with getting the size of the database, the size of the table data can be calculated using the corresponding function:
pg_relation_sizereturns the space that the specified layer of the given table or index occupies on disk.
👉 The name of the largest table in PostgreSQL
In order to display a list of tables in the current database sorted by table size, execute the following query:
In order to display information about the largest table, we restrict the query with
relname- the name of the table, index, view, etc.
relpages- the size of the representation of this table on disk in the number of pages (by default one page is equal to 8 KB).
pg_class- a system table that contains information about the relationships between the database tables.
👉 List of connected users
To find out the name, IP and port used by the connected users, run the following query:
👉 User activity
To find out the connection activity of a specific user, we use the following query:
🧾Working with data and table fields🧾
👉 Removing duplicate lines
If it so happens that the table does not have a primary key, then there are likely to be duplicated among the records. If for such a table, especially a large one, it is necessary to set constraints to check the integrity, then remove the following elements:
- duplicate lines,
- situations when one or more columns are duplicated (if these columns are supposed to be used as a primary key).
Consider a table with customer data, where a whole row is duplicated (the second in a row).
The following query will help to remove all duplicates:
The field that is unique to each record is
ctidhidden by default, but it is present in every table.
The last request is resource-intensive, so be careful when executing it on a production project.
Now consider the case where field values are repeated.
If it is possible to delete duplicates without saving all the data, we will execute the following query:
If the data is important, then first you need to find records with duplicates:
Before deleting such records, you can move to a temporary table or replace the value in them
The general form of a request to delete the records described above is as follows:
👉 Change field type safely
The question may arise about including such a task on this list. After all, in PostgreSQL, it is very easy to change the type of a field using the command
ALTER. Let's take a look at the customer table again as an example.
customer_iduses a string data type
varchar. This is an error because this field is supposed to store customer IDs, which are in integer format
integer. Use is
varcharunjustified. Let's try to correct this misunderstanding using the command
But (!) as a result of execution we get an error:
ERROR: column “customer_id” cannot be cast automatically to type integer
SQL state: 42804
Hint: Specify a USING expression to perform the conversion.
This means that you cannot just take and change the type of a field if there is data in the table. Since the type was used
varchar, the DBMS cannot determine whether the value belongs to
integer. Although the data corresponds exactly to this type. In order to clarify this point, the error message suggests using an expression
USINGto correctly convert our data to
As a result, everything went without errors:
Please note that when used,
USINGin addition to a specific expression, it is possible to use functions, other fields, and operators.
For example, let’s convert the field
varchar, but with the data format conversion:
As a result, the table will look like this:
👉 Finding “lost” values in PostgreSQL
Be careful when using sequences as the primary key (!): when assigning, some elements of the sequence are accidentally skipped, as a result of working with the table, some records are deleted. These values can be used again, but they are difficult to find in large tables.
Let’s consider two search options.
1️⃣ Method One
Let’s execute the following query to find the beginning of the interval with the “lost” value:
The result values
If you need to find not only the first occurrence but all missing values, we use the following (resource-intensive!) query:
As a result, we see the following results:
2️⃣ Method Two
Get the name of the sequence associated with
And we find all the missing identifiers:
👉 Counting the number of rows in a table
The number of lines is calculated by the standard function
count, but it can be used with additional conditions.
A total number of rows in the table:
The number of lines provided that the specified field does not contain
The number of unique lines for the specified field:
👉 Using transactions in PostgreSQL
A transaction combines a sequence of actions into one operation. Its peculiarity is that if there is an error in the execution of a transaction, none of the results of actions will be saved in the database.
Let’s start a transaction using the command
In order to roll back all operations located after
BEGIN, use the command
And to apply — a command
👉 Viewing and Completing Executable Requests
In order to get information about requests, run the following command:
In order to stop a specific request, execute the following command, indicating the process id (PID):
In order to terminate the request, run:
📝Working with configuration
👉 Finding and Changing the Location of a Cluster Instance
A situation is possible when several PostgreSQL instances are configured on one operating system, which “sit” on different ports. In this case, finding a path to the physical location of each instance is a rather nerve-racking task. In order to get this information, we will execute the following query for any database of the cluster of interest:
Let’s change the location to something else using the command:
But a reboot is required (!) for the changes to take effect.
👉 Getting a list of available data types
Let’s get a list of available data types using the command:
SELECT typname, typlen from pg_type where typtype='b';
typnameis the name of the data type.
typlenis the size of the data type.
👉 Changing DBMS settings without rebooting
PostgreSQL settings are located in special files like
pg_hba.conf. After changing these files, the DBMS needs to get the settings again. To do this, the database server is restarted. It is clear that you have to do this, but on the production version of the project, which is used by thousands of users, this is very undesirable. Therefore, PostgreSQL has a function with which you can apply changes without restarting the server:
But, unfortunately, it does not apply to all parameters. In some cases, a reboot is required to apply the settings.
We’ve covered commands to help make things easier for developers and DBAs using PostgreSQL. But these are not all possible techniques. If you have encountered interesting problems, write about them in the comments. Let’s share a useful experience!
If you found this article helpful, click the💚 or 👏 button below or share the article on Facebook so your friends can benefit from it too.
My other stories about SQL: