Oracle allows text to be sorted according to the linguistic rules of each language.

It’s important to understand that each writing system follows different collation rules.

For example, in Norwegian, there are three characters that don’t exist in the standard Latin alphabet. In the Norwegian alphabet, these letters appear at the end, in this order: Æ, Ø, Å.

In languages like Spanish and Catalan, characters such as Ç and Ñ also have specific positions: they come right after C and N, respectively.

However, despite being quite similar, Spanish and Catalan have important differences. For instance, in Catalan, the combination “CH” is not considered a separate letter, whereas in Spanish, it traditionally was. Therefore, when sorting text, each language will require a different approach.

 SQL> select c1 nombre
  2	 from (values
  3	 ('Maa'), ('Mab'), ('Maca'), ('Maco'), ('Macu'), ('Mach'), ('Mad')
  4	 )
  5	 t ( c1 )
  6	order by c1 collate spanish;

NOMBRE
----------
Maa
Mab
Maca
Maco
Macu
Mach   -- The letters CH are sorted after C in Spanish
Mad

7 rows selected.

SQL> select c1 nombre
  2	 from (values
  3	 ('Maa'), ('Mab'), ('Maca'), ('Maco'), ('Macu'), ('Mach'), ('Mad')
  4	 )
  5	 t ( c1 )
  6	order by c1 collate catalan;

NOMBRE
----------
Maa
Mab
Maca
Mach   -- In Catalan, "CH" is not considered a letter and it's sorted between G and I.
Maco
Macu
Mad

7 rows selected.

Not only that, but Oracle also allows collation to be case-insensitive and accent-insensitive, based on how these elements are positioned within the alphabet.

This means it’s not necessary to use character conversion functions like UPPER(), LOWER(), TRANSLATE(), or REPLACE() to retrieve a set of rows sorted according to Spanish, Catalan, or English rules.

Let’s look at an example.

Suppose we have the following list of values, which includes uppercase and lowercase letters, accented characters (both acute and grave), the letter Ç (ce with cedilla), and Ñ (eñe):

‘Maçia’, ‘Mañia’, ‘Mabia’, ‘Macia’, ‘Mania’, ‘MAria’, ‘Maria’, ‘MÁria’, ‘MÉria’, ‘MÍria’, ‘MÓria’, ‘MÚria’, ‘maria’, ‘mária’, ‘méria’, ‘míria’, ‘mória’, ‘múria’, ‘Maxia’, ‘MEria’, ‘Meria’, ‘meria’, ‘MIria’, ‘Miria’, ‘miria’, ‘MOria’, ‘Moria’, ‘moria’, ‘MUria’, ‘Muria’, ‘muria’

Sorting with LOWER(name) won’t work correctly. Neither will using REGEXP or similar functions.

You don’t need them.

In the following example —without even needing a physical table— we generate a dynamic table from the list of values.
Using COLLATE with the SPANISH_AI collation, we indicate that we want to sort based on Spanish linguistic rules, while being accent-insensitive and case-insensitive.

select c1 nombre
 from (values 
    ('Maçia'), ('Mañia'), ('Mabia'), ('Macia'), ('Mania'), ('MAria'), ('Maria'),
    ('MÁria'), ('MÉria'), ('MÍria'), ('MÓria'), ('MÚria'), ('maria'), ('mária'),
    ('méria'), ('míria'), ('mória'), ('múria'), ('Maxia'), ('MEria'), ('Meria'),
    ('meria'), ('MIria'), ('Miria'), ('miria'), ('MOria'), ('Moria'), ('moria'),
    ('MUria'), ('Muria'), ('muria')
    )    t ( c1 ) 
order by c1 collate spanish_ai;
	
NOMBRE
----------
Mabia
Maçia   -- Right placed of Ç, between B and C.
Macia
Mania
Mañia   -- Right placed Ñ just after N.
Maria
MAria
Mária   -- A, a, Á, á, they are considered and sorted at same level
mária
maria
Maxia
meria
Meria
méria
MÉria
MEria
MÍria
míria
miria
Miria
MIria
MÓria
moria
MOria
mória
Moria
muria
MÚria
múria
Muria
MUria

31 rows selected.

Need a specific sorting logic? Here's what to do.

If you need to sort based on a criterion that is neither orthographic nor linguistic, you should use a numbered column to define the order or position of each row relative to the others — the classic SORT_NUMBER column, or something similar, to apply an arbitrary numeric sort.

If you want a specific character to be sorted in a custom way, my recommendation is to create a custom collation or modify one of the standard ones for the target language.

To do that, you can use the Oracle Locale Builder utility.

[oracle@localhost ~]$ lbuilder

This utility allows you to load the local definition files and configure how you want Oracle to behave based on the language, territory, character set, and linguistic collation.

For example, I’m going to create a custom Spanish collation called SPANISH_AMI, short for “a mi modo” (my way).
In this collation, I want the digits 0 and 3 to be sorted alongside the letters O and E, respectively.

This is achieved by modifying the primary and secondary collation levels, assigning those characters to the desired positions.

I save the changes in Oracle’s default NLS directory, which is $ORACLE_HOME/nls/data, and now the collations can access this new custom collation through the newly created files containing our personalized rules.

And now, when I use the ORDER BY clause and specify that I want the sorting to follow my custom Spanish collation —“my way”— the number 3 is sorted alongside the letter E, and the 0 alongside the letter O.

SQL> select c1 nombre
  2  	 from (values
  3  	 ('Maa'), ('Mae'), ('Mai'), ('Mao'), ('Mau'), ('Ma0'), ('Ma1'), 
  4  	 ('Ma2'), ('Ma3'), ('Ma4'), ('Ma5') )
  5  	 t ( c1 )
  6  order by c1 collate spanish_ami;

NOM
---
Maa
Mae
Ma3   --> 3 is sorted after E
Mai
Mao
Ma0   --> 0 is sorted after O
Mau
Ma1
Ma2
Ma4
Ma5

11 rows selected.

Multilingual sorts in Oracle

Need a specific sorting logic? Here's what to do.

Subscribe to the mail list and join us at the Café Database Discord server.

Listo!

Submit a Comment Cancel reply