ᚱᛗ
© 2022
Powered by Hugo

Indexes

Table of Contents

Indexes

Concept and Purpose

Indexes are a common way to enhance database performance. An index allows the database server to find and retrieve specific rows much faster than it could do without an index. But indexes also add overhead to the database system as a whole, so they should be used sensibly.

After an index is created, the system has to keep it synchronized with the table. This adds overhead to data manipulation operations. Therefore indexes that are seldom or never used in queries should be removed. It is the task of the database programmer to foresee which indexes will be useful.

Indexes are special tables that, unlike normal data tables, are kept in a specific order. Instead of containing all of the data about an entity, however, an index contains only the column (or columns) used to locate rows in the data table, along with a pointer that indicates where the rows are physically located. Therefore, the role of indexes is to facilitate the retrieval of a subset of a table’s rows and columns without the need to inspect every row in the table.

Indexes can also benefit UPDATE and DELETE commands with search conditions. Indexes can moreover be used in join searches. Thus, an index defined on a column that is part of a join condition can also significantly speed up queries with joins.

Once an index is created, no further intervention is required: the system will update the index when the table is modified, and it will use the index in queries when it thinks doing so would be more efficient than a sequential table scan. But you might have to run the ANALYZE command regularly to update statistics to allow the query planner to make educated decisions.

Motivating Example

Assume the following table:

CREATE TABLE test1 (
    id integer,
    content varchar
);

In this table, assume one of the most popular queries is the following:

SELECT content FROM test1 WHERE id = constant;

With no advance preparation, every time this type of query runs the system would have to scan the entire test1 table, row by row, to find all matching entries—ie. the search would take place in linear time. Whereas with an index, it might only have to walk a few levels deep into a search tree.

Creating and Removing an Index

Indexes can be added to and removed from tables at any time:

-- creating an index on the id column
CREATE INDEX test1_id_index ON test1 (id);
-- dropping an index
DROP INDEX [ CONCURRENTLY ] [ IF EXISTS ] name [, ...] [ CASCADE | RESTRICT ]

B-trees

By default, the CREATE INDEX command creates B-tree indexes, which fit the most common situations. B-trees can handle equality and range queries on data that can be sorted into some ordering. In particular, the PostgreSQL query planner will consider using a B-tree index whenever an indexed column is involved in a comparison using one of these operators: < <= = >= >

Constructs equivalent to combinations of these operators, such as BETWEEN and IN, can also be implemented with a B-tree index search. Also, an IS NULL or IS NOT NULL condition on an index column can be used with a B-tree index.

Multi-column Indexes

An index can be defined on more than one column of a table. For example, if you have a table of this form:

CREATE TABLE test2 (
  major int,
  minor int,
  name varchar
);

And you frequently issue queries like:

SELECT name FROM test2 WHERE major = constant AND minor = constant;

Then it might be appropriate to define an index on the columns major and minor together, e.g.:

CREATE INDEX test2_mm_idx ON test2 (major, minor);

A multicolumn B-tree index can be used with query conditions that involve any subset of the index’s columns, but the index is most efficient when there are constraints on the leading (leftmost) columns.

The exact rule is that equality constraints on leading columns, plus any inequality constraints on the first column that does not have an equality constraint, will be used to limit the portion of the index that is scanned. Constraints on columns to the right of these columns are checked in the index, so they save visits to the table proper, but they do not reduce the portion of the index that has to be scanned.

For example, given an index on (a, b, c) and a query condition WHERE a = 5 AND b >= 42 AND c < 77, the index would have to be scanned from the first entry with a = 5 and b = 42 up through the last entry with a = 5. Index entries with c >= 77 would be skipped, but they’d still have to be scanned through. This index could in principle be used for queries that have constraints on b and/or c with no constraint on a — but the entire index would have to be scanned, so in most cases the planner would prefer a sequential table scan over using the index.

NOTE: Multicolumn indexes should be used sparingly. In most situations, an index on a single column is sufficient and saves space and time. Indexes with more than three columns are unlikely to be helpful unless the usage of the table is extremely stylized.