A Guide to Using Myanmar Unicode

Myanmar Sorting (Collation)

Sorting Myanmar text is quite complicated. The algorithm developed here splits each Myanmar syllable into five parts.

The algorithm is complicated because each part can be composed of several characters, and because the final has a higher sorting precedence than the vowel. For more information, please see this paper on Myanmar Collation.

In Glibc there does not appear to be an easy way to rearrange the order of collation elements. As a result, the algorithm implemented here combines the vowel and final components into one (very large) set of collation elements to get the desired sort order. A Perl script, genMyCollate.pl, is used to generate this set.
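The idea of merging vowel and final into a single set of collation elements can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the logic of genMyCollate.pl: the inventories VOWELS and FINALS are hypothetical placeholders rather than real Myanmar characters, and the function names are made up for this example. It shows how giving every (vowel, final) pair its own weight, enumerated final-first, yields an ordering in which the final takes precedence over the vowel without any reordering step at comparison time.

```python
from itertools import product

# Hypothetical placeholder inventories, each listed in its own sort order.
# Real Myanmar vowels and finals are deliberately not used here.
VOWELS = ["v1", "v2", "v3"]
FINALS = ["", "f1", "f2"]  # "" stands for "no final"


def build_combined_weights(vowels, finals):
    """Assign one weight to every (vowel, final) pair, ordered final-first."""
    weights = {}
    # product() varies its last argument fastest, so listing finals first
    # makes the final the more significant component of the weight.
    for weight, (final, vowel) in enumerate(product(finals, vowels)):
        weights[(vowel, final)] = weight
    return weights


COMBINED = build_combined_weights(VOWELS, FINALS)


def sort_key(syllable):
    """Look up the single combined weight for a (vowel, final) pair."""
    return COMBINED[syllable]


syllables = [("v2", "f1"), ("v1", "f2"), ("v3", ""), ("v1", "f1")]
ordered = sorted(syllables, key=sort_key)
print(ordered)
```

The combined table grows as the product of the two inventories (9 entries here), which is consistent with the set of collation elements becoming very large once the real vowel and final combinations are enumerated.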

The collation algorithm and the Myanmar NLP Lab my-MM locale for Linux can be downloaded here.

A variant of the algorithm has also been developed for ICU. It takes a very similar approach to the Glibc algorithm, using a large number of collating elements to achieve the correct order. You can download it here.

The Myanmar collation algorithm has been submitted to the Common Locale Data Repository in LDML format. Apparently MySQL can use the LDML format for collation (this is the second of the options described on the MySQL blog for adding a new Unicode collation), though I haven't tried it myself.

The discussion of the Myanmar Dictionary Order in “Burmese: an introduction to the script” by John Okell* was very helpful in this work.
