MySQL: Charset, Collation and UCA
This note is about sorting or filtering characters.
Collations
Collations like utf8mb4_unicode_520_ci and utf8mb4_0900_ai_ci are based on Unicode Collation Algorithm (UCA). The number in the collation defines the UCA version:
0900
: UCA Version 9.0.0 http://www.unicode.org/Public/UCA/9.0.0/allkeys.txt520
: UCA Version 5.2.0 http://www.unicode.org/Public/UCA/5.2.0/allkeys.txt
Case and accent sensitive:
ci
: case insensitive.cs
: case sensitiveai
: accent insensitiveas
: accent sensitive
Sample snippet code
CREATE TABLE collation_ex (
id int NOT NULL AUTO_INCREMENT,
0900_ai_ci varchar(50) CHARACTER SET utf8mb4
COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
unicode_520_ci varchar(50) CHARACTER SET utf8mb4
COLLATE utf8mb4_unicode_520_ci DEFAULT NULL,
general_ci varchar(50) CHARACTER SET utf8mb4
COLLATE utf8mb4_general_ci DEFAULT NULL,
PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
SELECT 'е'='ё' collate utf8mb4_unicode_520_ci; # 1
SELECT 'е'='ё' collate utf8mb4_0900_ai_ci; # 1
SELECT 'е'='ё' collate utf8mb4_0900_as_ci; # 0
Read more
https://lefred.be/content/mysql-character-sets-unicode-and-uca-compliant-collations/
Last updated