araSearch - an adaptive search engine user interface based on n-grams
Highly inflectional languages often yield poor information retrieval performance, because current search engines match query terms directly against the words in the text. To solve or at least ease this problem, search engines need to recognize different variants of the same word. Detecting all word form variations of the query terms is considered essential for good retrieval results; otherwise, vast amounts of information may be unavailable to the user.
araSearch is an adaptive user interface that acts as a front end to the actual Google search engine, using the Google WebServices. It is based on an n-gram similarity measure that accounts for textual variations, with special attention given to the Arabic language. araSearch was designed as a language-independent system, i.e. it is able to handle any language. Furthermore, araSearch guides users by helping them formulate their queries. The araSearch web site was developed using JSP and Java servlets and is hosted on a Tomcat server.
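To illustrate how an n-gram similarity measure can catch word form variations, here is a minimal sketch. It is not the araSearch implementation: the choice of character bigrams, the Dice coefficient, the 0.5 threshold, and the function names `char_ngrams`, `dice_similarity`, and `find_variants` are all assumptions made for this example.

```python
def char_ngrams(word, n=2):
    """Return the set of contiguous character n-grams of a word."""
    if len(word) < n:
        # Words shorter than n are treated as a single gram (assumption).
        return {word} if word else set()
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def dice_similarity(a, b, n=2):
    """Dice coefficient over the n-gram sets of two words (assumed measure)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


def find_variants(query_term, vocabulary, threshold=0.5, n=2):
    """Return (word, score) pairs scoring at least threshold, best first."""
    scored = [(w, dice_similarity(query_term, w, n)) for w in vocabulary]
    matches = [(w, s) for w, s in scored if s >= threshold]
    return sorted(matches, key=lambda pair: -pair[1])


# Hypothetical vocabulary: "the book", "office", "book"
vocabulary = ["الكتاب", "مكتب", "كتاب"]
for word, score in find_variants("كتاب", vocabulary):
    print(word, round(score, 2))
```

With these assumptions, the definite form الكتاب shares all three bigrams of كتاب and passes the threshold, while مكتب shares only one bigram and is filtered out. Because the comparison works on raw character sequences rather than language-specific rules, the same functions apply unchanged to words in other languages.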
- Farag Ahmed and Andreas Nürnberger, Evaluation of n-gram conflation approaches for Arabic text retrieval. Journal of the American Society for Information Science and Technology (JASIST), Volume 60, Issue 7 (July 2009), USA, pp. 1448-1465.
- Farag Ahmed and Andreas Nürnberger, araSearch: Improving Arabic text retrieval via detection of word form variations. In: Proceedings of the 1st International Conference on Information Systems and Economic Intelligence (SIIE'2008) (Best Paper Award).