(walk you through various natural sorting techniques in MySQL using the ORDER BY clause.)
The intuitive reading of numbers differs from how MySQL represents data internally, so natural sorting may be needed to display data in a more readable way. Natural sorting follows these principles:
- Sort numerical values first
- Follow with text values
- Ignore leading zeros and letter case
- Sort symbols and special characters using ASCII or Unicode values
Here's a table that illustrates the natural sorting principles:
Original Values | Naturally Sorted | Explanation |
001
15
5
200 | 200
15
5
001 | Numbers are sorted first, leading zeros ignored |
Apple
apple
Banana | Apple
apple
Banana | Text values follow numbers, case is ignored in sorting |
@data
#info
$value | #info
$value
@data | Special characters are sorted by ASCII/Unicode values |
This demonstrates how natural sorting organizes data in a more intuitive and readable way, different from MySQL's default internal representation.
Before exploring natural sorting, we need to understand two key concepts: REGEXP
and CAST.
REGEXP (regular expression) is a pattern-matching syntax used to define text patterns for computer processing, and it comes in 6 different types:
Name | Description |
REGEXP | Return 1 if a string matches a regular expression or 0 otherwise. |
REGEXP_INSTR() | Return the starting index of a substring that matches a regular expression. |
REGEXP_LIKE() | Test if a string matches a regular expression. |
REGEXP_REPLACE() | Replace substrings that match a regular expression. |
REGEXP_SUBSTR() | Extract a substring that matches a regular expression. |
RLIKE | Determine if a string matches a regular expression. |
For the purpose of explaining the natural sorting, I will used the REGEXP_SUBSTR()
The CAST function in MySQL is used to convert a value from one data type to another. It's particularly useful when you need to ensure data consistency or perform operations that require specific data types. The basic syntax is:
CAST(expression AS datatype)
expression
: The value or result you want to convert.data_type
: The target data type you want to convert the value into.
To better understand the working of the natural sorting, a items
table will be created with the following details
CREATE TABLE items (
id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(255) NOT NULL
);
Second, insert some rows into the items
table:
INSERT INTO items(name)
VALUES ('1'),
('1C'),
('10Z'),
('2A'),
('2'),
('3C'),
('20D');
Third, query data from the items
table sorted by the values in the name
column:
SELECT name
FROM items
ORDER BY
name;
The output
+------+
| name |
+------+
| 1 |
| 10Z |
| 1C |
| 2 |
| 20D |
| 2A |
| 3C |
+------+
7 rows in set (0.00 sec)
Looking at the output, the result is not what we expected as detail in the natural sorting, and unfortunately, MYSQL does not provide any built in natural sorting index or function.
To perform a natural sorting, you can use various techniques. The following example shows how to use regular expressions to perform natural sorting.
MySQL natural sorting using regular expressions
First, split the name column into two parts using the REGEXP_SUBSTR() function:
SELECT
name,
CAST(REGEXP_SUBSTR(name, '^\\d+') AS SIGNED) prefix,
REGEXP_SUBSTR(name, '\\D$') suffix
FROM
items;
+------+--------+--------+
| name | prefix | suffix |
+------+--------+--------+
| 1 | 1 | NULL |
| 1C | 1 | C |
| 10Z | 10 | Z |
| 2A | 2 | A |
| 2 | 2 | NULL |
| 3C | 3 | C |
| 20D | 20 | D |
+------+--------+--------+
7 rows in set (0.00 sec)
As a result, the prefix
column stores the number part and suffix
column stores the alphabetical part of the values in the name
column.
Second, sort the values in the name
column by the prefix
and suffix
columns:
SELECT
name,
CAST(
REGEXP_SUBSTR(name, '^\\d+') AS SIGNED
) prefix,
REGEXP_SUBSTR(name, '\\D$') suffix
FROM
items
ORDER BY
prefix,
suffix;
+------+--------+--------+
| name | prefix | suffix |
+------+--------+--------+
| 1 | 1 | NULL |
| 1C | 1 | C |
| 2 | 2 | NULL |
| 2A | 2 | A |
| 3C | 3 | C |
| 10Z | 10 | Z |
| 20D | 20 | D |
+------+--------+--------+
7 rows in set (0.01 sec)
If you don’t want to have the prefix
and suffix
columns in the output, you can remove it as follows:
SELECT
name
FROM
items
ORDER BY
CAST(
REGEXP_SUBSTR(name, '^\\d+') AS SIGNED
),
REGEXP_SUBSTR(name, '\\D$')
+------+
| name |
+------+
| 1 |
| 1C |
| 2 |
| 2A |
| 3C |
| 10Z |
| 20D |
+------+
7 rows in set (0.00 sec)