PostgreSQL — 15 Most Useful Commands with Examples

PostgreSQL is designed especially to work with large datasets, and thus Postgres is a perfect match for Data Science. In this article, we’ll go through some of my personally frequently used and useful Postgre commands.

PostgreSQL — 15 Most Useful Commands with Examples
PostgreSQL — 15 Most Useful Commands with Examples
Source: iconfinder

There are many PostgreSQL tutorials on the net that describe the basic commands. But when you dive deeper into the work, practical questions arise that require advanced teams.

Such commands, or snippets, are rarely documented. Let’s look at a few examples, useful for both developers, data scientists, and database administrators.

📃Getting information about a PostgreSQL database📃

👉 Database size

To get the physical size of the database files (storage), we use the following query:

The result will be presented as a number of the kind 41809016.

current_database()- a function that returns the name of the current database. Instead, you can enter a name in the text:

In order to get information in human-readable form, we use the function pg_size_pretty:

As a result, we get information in the form 40 Mb.

👉 List of PostgreSQL tables

Sometimes you want to get a list of database tables. To do this, use the following query:

information_schema is a standard database schema that contains collections of views, such as tables, fields, etc. Table views contain information about all tables in a database.

The query below will select all tables from the specified schema of the current database:

In the last condition, INyou can specify the name of a specific schema.

👉 PostgreSQL table size

By analogy with getting the size of the database, the size of the table data can be calculated using the corresponding function:

The function pg_relation_sizereturns the space that the specified layer of the given table or index occupies on disk.

👉 The name of the largest table in PostgreSQL

In order to display a list of tables in the current database sorted by table size, execute the following query:

In order to display information about the largest table, we restrict the query with LIMIT:

relname- the name of the table, index, view, etc.
relpages- the size of the representation of this table on disk in the number of pages (by default one page is equal to 8 KB).
pg_class- a system table that contains information about the relationships between the database tables.

👉 List of connected users

To find out the name, IP and port used by the connected users, run the following query:

👉 User activity

To find out the connection activity of a specific user, we use the following query:

🧾Working with data and table fields🧾

👉 Removing duplicate lines

If it so happens that the table does not have a primary key, then there are likely to be duplicated among the records. If for such a table, especially a large one, it is necessary to set constraints to check the integrity, then remove the following elements:

  • duplicate lines,
  • situations when one or more columns are duplicated (if these columns are supposed to be used as a primary key).

Consider a table with customer data, where a whole row is duplicated (the second in a row).

PostgreSQL — duplicated rows
PostgreSQL — duplicated rows
Duplicated rows. Source: author

The following query will help to remove all duplicates:

The field that is unique to each record is ctidhidden by default, but it is present in every table.

The last request is resource-intensive, so be careful when executing it on a production project.

Now consider the case where field values ​​are repeated.

PostgreSQL — repeated values
PostgreSQL — repeated values
Repeated values case. Source: author

If it is possible to delete duplicates without saving all the data, we will execute the following query:

If the data is important, then first you need to find records with duplicates:

PostgreSQL — removing duplicates
PostgreSQL — removing duplicates
Result. Source: author

Before deleting such records, you can move to a temporary table or replace the value in them customer_idwith another.

The general form of a request to delete the records described above is as follows:

👉 Change field type safely

The question may arise about including such a task on this list. After all, in PostgreSQL, it is very easy to change the type of a field using the command ALTER. Let's take a look at the customer table again as an example.

The field customer_iduses a string data type varchar. This is an error because this field is supposed to store customer IDs, which are in integer format integer. Use is varcharunjustified. Let's try to correct this misunderstanding using the command ALTER:

But (!) as a result of execution we get an error:

ERROR: column “customer_id” cannot be cast automatically to type integer
SQL state: 42804
Hint: Specify a USING expression to perform the conversion.

This means that you cannot just take and change the type of a field if there is data in the table. Since the type was used varchar, the DBMS cannot determine whether the value belongs to integer. Although the data corresponds exactly to this type. In order to clarify this point, the error message suggests using an expression USINGto correctly convert our data to integer:

As a result, everything went without errors:

PostgreSQL table with changed field table. Source: author
PostgreSQL table with changed field table. Source: author
PostgreSQL table with changed field table. Source: author

Please note that when used, USINGin addition to a specific expression, it is possible to use functions, other fields, and operators.

For example, let’s convert the field customer_idback to varchar, but with the data format conversion:

As a result, the table will look like this:

PostgreSQL table with changed field table. Source: author
PostgreSQL table with changed field table. Source: author
PostgreSQL table with changed field table. Source: author

👉 Finding “lost” values in PostgreSQL

Be careful when using sequences as the primary key (!): when assigning, some elements of the sequence are accidentally skipped, as a result of working with the table, some records are deleted. These values ​​can be used again, but they are difficult to find in large tables.

Example PostgreSQL table. Source: author
Example PostgreSQL table. Source: author
Example PostgreSQL table. Source: author

Let’s consider two search options.

1️⃣ Method One
Let’s execute the following query to find the beginning of the interval with the “lost” value:

The result values 5, 9and 11.

If you need to find not only the first occurrence but all missing values, we use the following (resource-intensive!) query:

As a result, we see the following results: 5, 9and 6.

2️⃣ Method Two
Get the name of the sequence associated with customer_id:

And we find all the missing identifiers:

👉 Counting the number of rows in a table

The number of lines is calculated by the standard function count, but it can be used with additional conditions.

A total number of rows in the table:

The number of lines provided that the specified field does not contain NULL:

The number of unique lines for the specified field:

👉 Using transactions in PostgreSQL

A transaction combines a sequence of actions into one operation. Its peculiarity is that if there is an error in the execution of a transaction, none of the results of actions will be saved in the database.

Let’s start a transaction using the command BEGIN.

In order to roll back all operations located after BEGIN, use the command ROLLBACK.

And to apply — a command COMMIT.

👉 Viewing and Completing Executable Requests

In order to get information about requests, run the following command:

In order to stop a specific request, execute the following command, indicating the process id (PID):

In order to terminate the request, run:

📝Working with configuration

👉 Finding and Changing the Location of a Cluster Instance

A situation is possible when several PostgreSQL instances are configured on one operating system, which “sit” on different ports. In this case, finding a path to the physical location of each instance is a rather nerve-racking task. In order to get this information, we will execute the following query for any database of the cluster of interest:

Let’s change the location to something else using the command:

But a reboot is required (!) for the changes to take effect.

👉 Getting a list of available data types

Let’s get a list of available data types using the command:

SELECT typname, typlen from pg_type where typtype='b';

typnameis the name of the data type.
typlenis the size of the data type.

👉 Changing DBMS settings without rebooting

PostgreSQL settings are located in special files like postgresql.confand pg_hba.conf. After changing these files, the DBMS needs to get the settings again. To do this, the database server is restarted. It is clear that you have to do this, but on the production version of the project, which is used by thousands of users, this is very undesirable. Therefore, PostgreSQL has a function with which you can apply changes without restarting the server:

But, unfortunately, it does not apply to all parameters. In some cases, a reboot is required to apply the settings.

✨Outcome ✨

We’ve covered commands to help make things easier for developers and DBAs using PostgreSQL. But these are not all possible techniques. If you have encountered interesting problems, write about them in the comments. Let’s share a useful experience!

Read More

If you found this article helpful, click the💚 or 👏 button below or share the article on Facebook so your friends can benefit from it too.

My other stories about SQL:

Written by

Bioinformatician at Oncobox Inc. (@oncobox). Research Associate at Moscow Institute of Physics and Technology (@mipt_eng).

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store