Natural sorting 
Natural sorting 

Natural sorting 

(walk you through various natural sorting techniques in MySQL using the ORDER BY clause.)

The intuitive reading of numbers differs from how MySQL represents data internally, so natural sorting may be needed to display data in a more readable way. Natural sorting follows these principles:

  • Sort numerical values first
  • Follow with text values
  • Ignore leading zeros and letter case
  • Sort symbols and special characters using ASCII or Unicode values

Here's a table that illustrates the natural sorting principles:

Original Values
Naturally Sorted
Explanation
001 15 5 200
200 15 5 001
Numbers are sorted first, leading zeros ignored
Apple apple Banana
Apple apple Banana
Text values follow numbers, case is ignored in sorting
@data #info $value
#info $value @data
Special characters are sorted by ASCII/Unicode values

This demonstrates how natural sorting organizes data in a more intuitive and readable way, different from MySQL's default internal representation.

💡

Before exploring natural sorting, we need to understand two key concepts: REGEXP and CAST.

REGEXP (regular expression) is a pattern-matching syntax used to define text patterns for computer processing, and it comes in 6 different types:

Name
Description
REGEXP
Return 1 if a string matches a regular expression or 0 otherwise.
REGEXP_INSTR()
Return the starting index of a substring that matches a regular expression.
REGEXP_LIKE()
Test if a string matches a regular expression.
REGEXP_REPLACE()
Replace substrings that match a regular expression.
REGEXP_SUBSTR()
Extract a substring that matches a regular expression.
RLIKE
Determine if a string matches a regular expression.

For the purpose of explaining the natural sorting, I will used the REGEXP_SUBSTR()

The CAST function in MySQL is used to convert a value from one data type to another. It's particularly useful when you need to ensure data consistency or perform operations that require specific data types. The basic syntax is:

CAST(expression AS datatype)
  • expression: The value or result you want to convert.
  • data_type: The target data type you want to convert the value into.

To better understand the working of the natural sorting, a items table will be created with the following details

CREATE TABLE items (
    id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(255) NOT NULL
);

Second, insert some rows into the items table:

INSERT INTO items(name)
VALUES ('1'),
       ('1C'),
       ('10Z'),
       ('2A'),
       ('2'),
       ('3C'),
       ('20D');

Third, query data from the items table sorted by the values in the name column:

SELECT name
FROM items
ORDER BY
name;

The output

+------+
| name |
+------+
| 1    |
| 10Z  |
| 1C   |
| 2    |
| 20D  |
| 2A   |
| 3C   |
+------+
7 rows in set (0.00 sec)

Looking at the output, the result is not what we expected as detail in the natural sorting, and unfortunately, MYSQL does not provide any built in natural sorting index or function.

💡

To perform a natural sorting, you can use various techniques. The following example shows how to use regular expressions to perform natural sorting.

MySQL natural sorting using regular expressions

First, split the name column into two parts using the REGEXP_SUBSTR() function:

SELECT
  name, 
  CAST(REGEXP_SUBSTR(name, '^\\d+') AS SIGNED) prefix, 
  REGEXP_SUBSTR(name, '\\D$') suffix 
FROM
  items;
+------+--------+--------+
| name | prefix | suffix |
+------+--------+--------+
| 1    |      1 | NULL   |
| 1C   |      1 | C      |
| 10Z  |     10 | Z      |
| 2A   |      2 | A      |
| 2    |      2 | NULL   |
| 3C   |      3 | C      |
| 20D  |     20 | D      |
+------+--------+--------+
7 rows in set (0.00 sec)

As a result, the prefix column stores the number part and suffix column stores the alphabetical part of the values in the name column.

Second, sort the values in the name column by the prefix and suffix columns:

SELECT 
  name, 
  CAST(
    REGEXP_SUBSTR(name, '^\\d+') AS SIGNED
  ) prefix, 
  REGEXP_SUBSTR(name, '\\D$') suffix 
FROM 
  items 
ORDER BY 
  prefix, 
  suffix;
+------+--------+--------+
| name | prefix | suffix |
+------+--------+--------+
| 1    |      1 | NULL   |
| 1C   |      1 | C      |
| 2    |      2 | NULL   |
| 2A   |      2 | A      |
| 3C   |      3 | C      |
| 10Z  |     10 | Z      |
| 20D  |     20 | D      |
+------+--------+--------+
7 rows in set (0.01 sec)
Code Explanation

If you don’t want to have the prefix and suffix columns in the output, you can remove it as follows:

SELECT 
  name 
FROM 
  items 
ORDER BY 
  CAST(
    REGEXP_SUBSTR(name, '^\\d+') AS SIGNED
  ), 
  REGEXP_SUBSTR(name, '\\D$')
+------+
| name |
+------+
| 1    |
| 1C   |
| 2    |
| 2A   |
| 3C   |
| 10Z  |
| 20D  |
+------+
7 rows in set (0.00 sec)