Multi languages and sort order

The problem

In Swedish we have the letters åäöÅÄÖ which are placed at the end of the alphabet. Correct Swedish sort order is to have these letters at the end of lists, but in collation utf8_general_ci, which is the default collation used by Drupal, the sort order is not "the Swedish way".

  1. Create a database with the same default collation as Drupal uses. mysql> create database demo character set utf8 collate utf8_general_ci;

  2. Use the database. mysql> use demo;

  3. Create a table with one text field. mysql> create table demo_table (demo_field varchar(20));

  4. Insert some data, in this case Swedish names for berries. Except 'önskebär' which I invented for the example. mysql> insert into demo_table (demo_field) values ('annanas'),('rönnbär'),('krusbär'),('havtorn'),('äpple'),('åkerbär'),('odon'),('önskebär');

  5. First we order by demo_field using the utf8_general_ci collation. mysql> select * from demo_table order by demo_field; +------------+ | demo_field | +------------+ | åkerbär | | annanas | | äpple | | havtorn | | krusbär | | odon | | önskebär | | rönnbär | +------------+ To a Swedish eye this looks like (almost) no order at all!

  6. Then we order by demo_field using the utf8_swedish_ci collation. mysql> select * from demo_table order by demo_field collate utf8_swedish_ci; +------------+ | demo_field | +------------+ | annanas | | havtorn | | krusbär | | odon | | rönnbär | | åkerbär | | äpple | | önskebär | +------------+

Ah! That's much better!

Okay, you may say, why don't you use utf8_swedish_ci for all the text fields? I do, for Swedish sites. But that's not possible for multi language sites where each language want to use its own collation.

My idea

Make the brand new database abstraction layer in Drupal 7 aware of current language and place appropriate "COLLATE " in the "ORDER BY" clause. The beauty is that no module (including Views) would have to handle sort order problems because of wrong collation.

This was until I talked to David Strauss at DrupalCamp Stockholm about my idea. He pointed out, totally true, the possible performance problems on big web sites using my idea. On big web sites you want to use indexes in your database to greatly increase performance. Every index has its own collation. My solution can not use existing indexes with the wrong collation. Thanks David for a interesting discussion even though you found some problems with my idea in the matter of seconds!

Conclusion

Maybe my solution is not fit for big web sites, but there could be a useful module for smaller sites. If I get the time it could be me creating it. :-)

(Photo: hisks@sxc)