Khmer (Cambodian) Spell Checker

Overview and Immediate Plans

Khmer is the official language of Cambodia. It does not use spaces to separate individual words, which makes spell checking hard unless word boundaries are provided. Users can supply them by placing invisible spaces between words, but for existing documents this means entering every segmentation point manually, which is time-consuming. There is another problem: if no alternative to this approach is found, Khmer will eventually be written with spaces between words. That would mean changing our language to meet the requirements of technology. Were this to happen, Khmer, which represents not only the language but also the people, culture and history, would lose one of its unique characteristics. From the perspective of human responsibility, if the Khmer people fail to offer an alternative solution to the problem, we should not be proud of being Khmer in the 21st century.

Therefore, the immediate objective of this project is to develop an effective, portable and free word segmentation algorithm for everyone. So far, a number of Python prototypes have been released; to download them, go to this project's download site. The prototype bundles a word segmentation algorithm with spell checking functionality and a Graphical User Interface, which makes it hard for others to integrate the algorithm with their own applications. Moreover, it has not yet met its performance target of 95 percent accuracy. So, this project will work on the following tasks in the immediate future:

These objectives may differ from the initial project description of khspell, which was to integrate khspell with hunspell, which is used by Open Office. The initial description had to be revised to the current objectives after it was realised that Open Office can spell check Khmer as long as a customised Khmer dictionary and word segmentation points are provided. This means that effective Khmer word segmentation is the most important goal for the future of the computerised Khmer language. Whether this algorithm makes its way into Open Office is less important. If this implementation is useful, effective and better than any other available solution, it will eventually be available in Open Office.
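To make the idea of "word segmentation points" concrete, here is a minimal Python sketch of one common baseline, greedy longest-match against a word list. It is an illustration only and is not the algorithm used in the khspell prototypes. The function names, the toy dictionary and the choice of the zero-width space (U+200B) as the invisible boundary character are all assumptions made for this example.

```python
ZWSP = "\u200b"  # zero-width space, assumed here as the invisible word boundary


def segment(text, dictionary):
    """Split text into words by greedy longest match (illustrative baseline only)."""
    max_len = max(map(len, dictionary), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary word matches.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                break
        else:
            j = i + 1  # nothing matched: emit a single character as its own token
        words.append(text[i:j])
        i = j
    return words


def mark_boundaries(text, dictionary):
    """Return text with an invisible space inserted at every segmentation point."""
    return ZWSP.join(segment(text, dictionary))


# Toy dictionary (hypothetical); a real one would hold many thousands of entries.
toy_dictionary = {"ខ្ញុំ", "ស្រឡាញ់", "ភាសា", "ខ្មែរ"}
marked = mark_boundaries("ខ្ញុំស្រឡាញ់ភាសាខ្មែរ", toy_dictionary)
```

A greedy matcher like this can choose the wrong boundary when a long dictionary word overlaps the start of the next word, and it has no way to handle misspelled or unknown words beyond falling back to single characters. That is one reason a simple baseline may fall short of an accuracy target such as the 95 percent mentioned above.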

Philosophy

To promote science, research and technology in solving the remaining problems in Khmer computational linguistics, once and for all, through the Open Source and Free Software community.


Copyright (C) 2006 by Puthick Hok
Last update: 25th July 2006. If you have any comments, please drop me an email. I would like to hear from you. My address is puthick "AT" users.sourceforge.net. This address will forward your mail to my daily mail box.