3. MySQL Rules - Index Rules - Alibaba Java Coding Guidelines (English)

1. [Mandatory] A unique index must be used wherever the business logic requires a field to be unique.

Note: The negative impact of a unique index on insert efficiency is negligible, while it improves query speed significantly. Additionally, even if thorough checks are done at the application layer, by Murphy’s Law dirty data will still be produced sooner or later as long as there is no unique index.
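As an illustration of the rule above, here is a minimal sketch in MySQL syntax. The table user_coupon, its columns, and the index name uk_user_coupon are hypothetical and chosen only for this example; the assumed business rule is that a user can claim a given coupon at most once.

```sql
-- Hypothetical table: one row per (user, coupon) pair.
CREATE TABLE user_coupon (
  id        BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  user_id   BIGINT UNSIGNED NOT NULL,
  coupon_id BIGINT UNSIGNED NOT NULL,
  PRIMARY KEY (id),
  UNIQUE KEY uk_user_coupon (user_id, coupon_id)
);

INSERT INTO user_coupon (user_id, coupon_id) VALUES (42, 1001);

-- Even if an application-level check is skipped or loses a race,
-- this second insert is rejected with a duplicate-key error
-- (MySQL error 1062), so no dirty row is written.
INSERT INTO user_coupon (user_id, coupon_id) VALUES (42, 1001);
```

The application still has to handle the duplicate-key error, but the database now guarantees uniqueness regardless of how many code paths write to the table.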

2. [Mandatory] JOIN is not allowed if more than three tables are involved. Columns to be joined must have exactly the same data type, and they must be indexed.

Note: Indexing and SQL performance should be considered even if only 2 tables are joined.
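A sketch of a two-table join that follows this rule, assuming hypothetical tables t_user and t_order. The joined columns share the same type (BIGINT UNSIGNED) and both are indexed: t_user.id by the primary key and t_order.user_id by idx_user_id.

```sql
CREATE TABLE t_user (
  id  BIGINT UNSIGNED NOT NULL,
  age INT             NOT NULL,
  PRIMARY KEY (id)
);

CREATE TABLE t_order (
  id      BIGINT UNSIGNED NOT NULL,
  user_id BIGINT UNSIGNED NOT NULL,  -- same type as t_user.id
  amount  DECIMAL(10, 2)  NOT NULL,
  PRIMARY KEY (id),
  KEY idx_user_id (user_id)          -- index on the join column
);

SELECT u.id, o.id AS order_id, o.amount
FROM t_user u
JOIN t_order o ON o.user_id = u.id
WHERE u.id = 42;
```

If user_id were declared as a string type while t_user.id is numeric, MySQL would have to convert values during the comparison, which can prevent idx_user_id from being used.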

3. [Mandatory] Index length must be specified when adding an index on varchar columns. The index length should be set according to the distribution of the data.

Note: Normally, for character columns, an index with a length of 20 can distinguish more than 90% of the data. The discrimination can be calculated by count(distinct left(column_name, index_length)) / count(*).
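The calculation in the note can be run for several candidate lengths at once. The following is a sketch only; the table t_customer, the column address, and the index name idx_address are hypothetical, and the right length depends on the actual data.

```sql
-- Hypothetical table with a long string column.
CREATE TABLE t_customer (
  id      BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  address VARCHAR(255)    NOT NULL,
  PRIMARY KEY (id)
);

-- Compare the discrimination of candidate prefix lengths with that
-- of the full column (values closer to 1 distinguish more rows).
SELECT
  COUNT(DISTINCT LEFT(address, 10)) / COUNT(*) AS sel_10,
  COUNT(DISTINCT LEFT(address, 20)) / COUNT(*) AS sel_20,
  COUNT(DISTINCT address)           / COUNT(*) AS sel_full
FROM t_customer;

-- If sel_20 is already close to sel_full, index only the
-- first 20 characters of the column.
ALTER TABLE t_customer ADD INDEX idx_address (address(20));
```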

4. [Mandatory] LIKE ‘%…’ and LIKE ‘%…%’ are not allowed when searching with pagination. A search engine can be used if such a search is really needed.

Note: Index files have the leftmost-prefix matching characteristic of B-Trees. An index cannot be applied if the left prefix of the value is not determined.
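To make the leftmost-prefix point concrete, here is a small sketch. The table t_article and the index idx_title are hypothetical.

```sql
CREATE TABLE t_article (
  id    BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  title VARCHAR(200)    NOT NULL,
  PRIMARY KEY (id),
  KEY idx_title (title(20))
);

-- The left prefix 'mysql' is fixed, so MySQL can seek into
-- idx_title and scan only the matching range.
SELECT id, title FROM t_article WHERE title LIKE 'mysql%' LIMIT 0, 20;

-- The pattern starts with a wildcard, so there is no fixed prefix
-- to seek to; idx_title cannot narrow the search and the rows have
-- to be examined one by one.
SELECT id, title FROM t_article WHERE title LIKE '%mysql%' LIMIT 0, 20;
```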

5. [Recommended] Make use of the index order when using ORDER BY clauses. The columns in the ORDER BY clause should be the last columns of a composite index. This avoids a filesort, which hurts query performance.

Positive example: where a=? and b=? order by c; Index is: a_b_c

Counter example: The index order cannot be used for sorting if the query condition contains a range, e.g., where a>10 order by b; the index a_b cannot be used for the sort.
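Both cases can be checked with EXPLAIN. This is a sketch with a hypothetical table t_demo; the comments describe what is typically expected on a populated table, but the optimizer's choice depends on the data and the MySQL version.

```sql
CREATE TABLE t_demo (
  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  a  INT NOT NULL,
  b  INT NOT NULL,
  c  INT NOT NULL,
  PRIMARY KEY (id),
  KEY idx_a_b_c (a, b, c)
);

-- a and b are fixed by equality, so the matching index entries are
-- already ordered by c. The Extra column is expected NOT to contain
-- "Using filesort".
EXPLAIN SELECT id FROM t_demo WHERE a = 1 AND b = 2 ORDER BY c;

-- a is a range, so the matching entries are ordered by a first and
-- only then by b. The Extra column is expected to contain
-- "Using filesort".
EXPLAIN SELECT id FROM t_demo WHERE a > 10 ORDER BY b;
```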

6. [Recommended] Use a covering index so that a query can be answered from the index alone, avoiding an additional lookup of the table row after the index search.

Note: If we need to check the title of Chapter 11 of a book, do we need to turn to the page where Chapter 11 starts? No, because the table of contents already includes the title, so it serves as a covering index.

Positive example: Index types include primary key index, unique index and common index. A covering index is not another index type; it describes the effect of a query being fully served by an index. In the EXPLAIN result, “Using index” appears in the Extra column when this happens.
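A minimal sketch of a covered and a non-covered query, using a hypothetical table t_trade and index idx_user_status_amount. The expected EXPLAIN output is described for a populated table and may vary with data and version.

```sql
CREATE TABLE t_trade (
  id      BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  user_id BIGINT UNSIGNED NOT NULL,
  status  TINYINT         NOT NULL,
  amount  DECIMAL(10, 2)  NOT NULL,
  remark  VARCHAR(255)    NULL,
  PRIMARY KEY (id),
  KEY idx_user_status_amount (user_id, status, amount)
);

-- Every column the query touches is stored in the index, so the
-- table rows need not be read: Extra should show "Using index".
EXPLAIN SELECT status, amount FROM t_trade WHERE user_id = 42;

-- remark is not part of the index, so each match needs an extra
-- lookup of the table row: "Using index" is not expected here.
EXPLAIN SELECT status, amount, remark FROM t_trade WHERE user_id = 42;
```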

7. [Recommended] Use a late (deferred) join or a sub-query to optimize queries that paginate through many pages.

Note: MySQL does not skip the first offset rows. It retrieves offset+N rows in total, then discards the first offset rows and returns N rows. This is very inefficient when the offset is large. The solution is either to limit the number of pages that can be returned, or to rewrite the SQL statement when the page number exceeds a predefined threshold.

Positive example: Firstly locate the required id range quickly, then join:
select a.* from table1 a, (select id from table1 where some_condition LIMIT 100000, 20) b where a.id=b.id;
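For contrast, the sketch below places the plain paginated query next to a rewritten form on a concrete table. The table t_post, its columns, and the condition status = 1 are hypothetical; an ORDER BY is added so that the pages are deterministic.

```sql
CREATE TABLE t_post (
  id     BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  status TINYINT         NOT NULL,
  title  VARCHAR(200)    NOT NULL,
  body   TEXT            NOT NULL,
  PRIMARY KEY (id),
  KEY idx_status (status)
);

-- Plain form: MySQL reads 100020 full rows (including body)
-- and throws away the first 100000.
SELECT * FROM t_post WHERE status = 1 ORDER BY id LIMIT 100000, 20;

-- Rewritten form: the inner query walks only the narrow index to
-- find the 20 ids, and full rows are fetched just for those ids.
SELECT a.*
FROM t_post a
JOIN (
  SELECT id FROM t_post WHERE status = 1 ORDER BY id LIMIT 100000, 20
) b ON a.id = b.id
ORDER BY a.id;
```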

8. [Recommended] The goal of SQL performance optimization is for the type column of the EXPLAIN result to reach at least RANGE, preferably REF, and CONST if possible.

Counter example: Watch out for the type INDEX in the EXPLAIN result. It means a full scan of the index file, which is very slow and performs nearly as badly as a full table scan.

CONST: There is at most one matching row, which the optimizer reads at the start of the query. It is very fast.

REF: A normal (non-unique) index is used.

RANGE: A given range of the index is retrieved. It can be used when a key column is compared to a constant by using any of the =, <>, >, >=, <, <=, IS NULL, <=>, BETWEEN, or IN() operators.
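The access types above can be observed with EXPLAIN. The sketch below uses a hypothetical table t_member; the type noted in each comment is what is typically expected on a populated table, and the optimizer may choose differently depending on statistics and version.

```sql
CREATE TABLE t_member (
  id      BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  city_id INT             NOT NULL,
  age     INT             NOT NULL,
  name    VARCHAR(64)     NOT NULL,
  PRIMARY KEY (id),
  KEY idx_city_id (city_id),
  KEY idx_age (age)
);

-- Lookup by primary key with a constant: expected type "const".
EXPLAIN SELECT * FROM t_member WHERE id = 1;

-- Equality on a non-unique index: expected type "ref".
EXPLAIN SELECT * FROM t_member WHERE city_id = 10;

-- Bounded condition on an indexed column: expected type "range".
EXPLAIN SELECT * FROM t_member WHERE age BETWEEN 20 AND 30;

-- No usable condition, but the query can be answered from the
-- index: expected type "index", i.e. a full index scan.
EXPLAIN SELECT city_id FROM t_member;
```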

9. [Recommended] Put the most discriminative column leftmost when adding a composite index.

Positive example: For the clause where a=? and b=?, if the data of column a is nearly unique, adding the index idx_a is enough.

Note: When both equality and non-equality (range) conditions exist in the query, put the column of the equality condition first when adding an index. For example, for where a>? and b=?, b should be the first column of the index, even if column a is more discriminative.
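The note can be sketched as follows, with a hypothetical table t_event and index idx_b_a.

```sql
CREATE TABLE t_event (
  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  a  INT NOT NULL,
  b  INT NOT NULL,
  PRIMARY KEY (id),
  -- Equality column b first, range column a second.
  KEY idx_b_a (b, a)
);

-- With idx_b_a, b = 7 selects one contiguous part of the index and
-- a > 100 is a range inside that part, so both conditions bound the
-- scan. With an index on (a, b), only a > 100 would bound the scan
-- and b could merely filter the entries found.
SELECT id FROM t_event WHERE a > 100 AND b = 7;
```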

10. [For Reference] Avoid the following misunderstandings when adding indexes:
1) It is false that every query needs its own index.
2) It is false that indexes consume storage space and significantly degrade update and insert operations.
3) It is false that uniqueness should always be enforced at the application layer by “check and insert”.