
Sorting with Foreign Languages

How to sort multibyte strings properly with PHP’s sort() function.

The sort() function works well for strings made of standard ASCII characters. However, when the strings contain special (multibyte) characters, such as accented letters, the default sorting can return an undesirable result.

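For example, calling sort() on an array with the values 'Frans', 'Frédéric', and 'Froni' puts 'Frédéric' last. By default, sort() compares these strings byte by byte, and in UTF-8 the character é is stored as two bytes (0xC3 0xA9) whose first byte is larger than the byte for o (0x6F):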

<?php
 $a = ['Frédéric', 'Froni', 'Frans'];
 sort ($a);
 print_r($a);
 //Array ( [0] => Frans [1] => Froni [2] => Frédéric )
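To see where that byte order comes from, the following sketch prints the raw bytes involved and runs the same comparison with strcmp(), which also compares strings byte by byte. It assumes the file is saved as UTF-8, like the examples in this post.

```php
<?php
// Assumes this file is saved as UTF-8, so 'é' is stored as two bytes.
echo bin2hex('é'), PHP_EOL;      // c3a9
echo dechex(ord('o')), PHP_EOL;  // 6f

// ord() only looks at the first byte of the string, 0xC3 for 'é'.
echo ord('é') > ord('o') ? 'é sorts after o' : 'é sorts before o', PHP_EOL;

// 'Frédéric' and 'Froni' first differ at byte index 2 (0xC3 vs 0x6F),
// so a byte-wise comparison places 'Frédéric' after 'Froni'.
var_dump(strcmp('Frédéric', 'Froni') > 0); // bool(true)
```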

To fix this issue, pass the SORT_LOCALE_STRING flag as the second parameter of the sort() function. With this flag, sort() compares array elements as strings according to the current locale, so you also need to set a suitable locale with the setlocale() function before sorting.

Example: Sorting Multibyte Characters Encoded in UTF-8 (French language)

<?php
 $a = ['Frédéric', 'Froni', 'Frans'];

 // Specify the French locale encoded in UTF-8
 // (the exact locale name depends on the system)
 setlocale(LC_ALL, "fr_FR.utf8");

 sort($a, SORT_LOCALE_STRING);
 print_r($a);
 //Array ( [0] => Frans [1] => Frédéric [2] => Froni )

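The result of the preceding code is, as desired, Frans, Frédéric, Froni. This assumes the fr_FR.utf8 locale is installed on the system: if it is not, setlocale() returns false and leaves the current locale unchanged, so you may get the default byte order instead.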
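Because locale names differ between systems and setlocale() can fail, a more defensive approach is to try several names and check the return value. The sketch below uses a hypothetical helper, sortWithLocale() (not part of PHP), which changes only LC_COLLATE (the category that governs string collation) instead of LC_ALL, and restores the previous setting afterwards, since the PHP manual notes that locale information is maintained per process, not per thread. Note that setlocale() accepting a name does not guarantee good collation data; the final order still depends on the system's C library.

```php
<?php
/**
 * Hypothetical helper (not a PHP built-in): sorts $items with
 * SORT_LOCALE_STRING using the first locale name setlocale() accepts,
 * then restores the previous collation locale.
 * Returns null if none of the given locale names is available.
 */
function sortWithLocale(array $items, string $locale, string ...$fallbacks): ?array
{
    // Passing '0' only queries the current LC_COLLATE setting.
    $previous = setlocale(LC_COLLATE, '0');

    // setlocale() tries each name in order and returns false if none works.
    if (setlocale(LC_COLLATE, $locale, ...$fallbacks) === false) {
        return null;
    }

    // SORT_LOCALE_STRING compares elements as strings using the current locale.
    sort($items, SORT_LOCALE_STRING);

    // Restore the previous collation so other code is not affected.
    if ($previous !== false) {
        setlocale(LC_COLLATE, $previous);
    }

    return $items;
}

$sorted = sortWithLocale(['Frédéric', 'Froni', 'Frans'], 'fr_FR.utf8', 'fr_FR.UTF-8', 'fr_FR');
if ($sorted === null) {
    echo 'No French locale is installed.', PHP_EOL;
} else {
    print_r($sorted);
}
```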

Compare both results (with and without setting the SORT_LOCALE_STRING flag and locale):

Default sorting:
[0] => Frans [1] => Froni [2] => Frédéric

Locale based sorting:
[0] => Frans [1] => Frédéric [2] => Froni
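Both parts matter: the flag on its own does not help while the collation locale is still the default "C" locale. The sketch below sets LC_COLLATE to "C" explicitly (a locale that is always available) and uses SORT_LOCALE_STRING; on common C libraries such as glibc and musl, collation in the "C" locale is plain byte order, so the result matches the default sorting.

```php
<?php
// Use the "C" locale for collation; it is always available.
setlocale(LC_COLLATE, 'C');

$a = ['Frédéric', 'Froni', 'Frans'];

// The flag is set, but "C" collation compares bytes,
// so 'Frédéric' still ends up after 'Froni'.
sort($a, SORT_LOCALE_STRING);
print_r($a);
//Array ( [0] => Frans [1] => Froni [2] => Frédéric )
```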

Read the setlocale() documentation at https://php.net/manual/function.setlocale.php.

