multi Searcher - Contextual Interactive Cross-Lingual Information Retrieval
The multi Searcher tool addresses the following research question: can we expect people to obtain information from text in languages they cannot read or understand? To this end, multi Searcher provides users with interactive contextual information that describes the translation in their own language, which gives them a degree of confidence in the translation. The user is thus treated as an integral part of the retrieval process. The tool lets users interactively select relevant terms from the contextual information in order to improve the translation and, with it, the cross-lingual information retrieval (CLIR) process.
multi Searcher deals with several CLIR issues. The first is translation ambiguity: a word in one language can have several possible translations, with different meanings, in another language. The second is the user's lack of knowledge of the target language. Here, the tool supports the user by providing interactive contextual information that describes the translation in the user's own language. Arabic and English were selected as test languages because the language resources required for Arabic (a dictionary and parallel corpora aligned at sentence level) were available.
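With word-by-word dictionary translation, the ambiguity problem compounds across a query: every ambiguous term multiplies the number of candidate translations. The following minimal Python sketch shows this effect. The `DICTIONARY` entries and the `candidate_translations` helper are hypothetical illustrations and are not part of multi Searcher. The English-French pair is used only because its ambiguity is easy to read.

```python
from itertools import product

# Hypothetical toy dictionary (English -> French), chosen only because the
# ambiguity is easy to see; multi Searcher itself works with Arabic and English.
DICTIONARY = {
    "bank": ["banque", "rive"],         # financial institution vs. river bank
    "charge": ["frais", "accusation"],  # fee vs. accusation
}


def candidate_translations(query, dictionary):
    """Return every word-by-word translation of `query`.

    Each ambiguous term multiplies the number of candidates. Terms missing
    from the dictionary are passed through unchanged, and the source word
    order is kept (a simplification).
    """
    options = [dictionary.get(term, [term]) for term in query.lower().split()]
    return [" ".join(combo) for combo in product(*options)]


print(candidate_translations("bank charge", DICTIONARY))
# ['banque frais', 'banque accusation', 'rive frais', 'rive accusation']
```

Only one of the four candidates matches the intended meaning, "bank fees". The contextual information described above is meant to help the user identify that candidate.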
Once the user's query has been translated, the translation is looked up in the index of target-language documents to obtain the relevant documents (the contextual information) for that translation. The parallel corpora are then queried to obtain the equivalent documents in the source language. Some of the retrieved documents may be very similar, which would produce duplicate contextual information. The source-language documents are therefore automatically grouped into clusters, and contextual information is selected only once from each cluster.
As shown in Fig. 1, the selected contextual information is not presented to the user as raw text. Instead, each term is colored according to its type, and every term except stop words can be selected as a disambiguating term. The user's query terms are shown in green. Terms the tool suggests because they frequently co-occur in the context of the query are shown in bold, underlined blue. All remaining terms are shown in blue, and stop words, which cannot be selected, are shown in black.
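The retrieval, grouping and term-classification steps above can be sketched in Python under several assumptions. The corpus is a list of sentence-aligned (source, target) pairs, so hits on the target side map directly to their source-side equivalents. Near-duplicates are grouped by a simple greedy Jaccard-similarity rule, because the description does not specify the clustering method. The "suggested" terms are simply the most frequent non-query, non-stop-word terms in the selected context. All function names, the stop-word list, the similarity threshold and the toy corpus are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical stop-word list for the user's language (English here for
# readability; the Arabic-English setup would need an Arabic list).
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is"}

# Display style per term class, following the description above.
STYLE = {
    "query": "green",
    "suggested": "bold, underlined blue",
    "other": "blue",
    "stopword": "black (not selectable)",
}


def tokenize(text):
    """Lower-case whitespace tokenization (a deliberate simplification)."""
    return text.lower().split()


def build_index(parallel_corpus):
    """Inverted index over the target side of (source, target) sentence pairs."""
    index = defaultdict(set)
    for i, (_, target) in enumerate(parallel_corpus):
        for term in tokenize(target):
            index[term].add(i)
    return index


def retrieve_context(translated_query, parallel_corpus, index):
    """Find target sentences containing every translated term and return
    their aligned source-language sentences (the contextual information)."""
    terms = tokenize(translated_query)
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return [parallel_corpus[i][0] for i in sorted(hits)]


def jaccard(a, b):
    """Jaccard similarity of the token sets of two sentences."""
    x, y = set(tokenize(a)), set(tokenize(b))
    return len(x & y) / len(x | y) if x | y else 1.0


def group_near_duplicates(sentences, threshold=0.6):
    """Greedy grouping: a sentence joins a group if it is at least `threshold`
    similar to that group's representative, else it starts a new group.
    Returns one representative (the first member) per group."""
    representatives = []
    for s in sentences:
        if all(jaccard(s, rep) < threshold for rep in representatives):
            representatives.append(s)
    return representatives


def classify_terms(context, query_terms, top_k=2):
    """Label every token of the context with one of the STYLE classes.
    'suggested' = the top_k most frequent non-query, non-stop-word terms,
    a stand-in for the tool's co-occurrence-based suggestions."""
    query = {t.lower() for t in query_terms}
    counts = Counter(t for s in context for t in tokenize(s)
                     if t not in query and t not in STOP_WORDS)
    suggested = {t for t, _ in counts.most_common(top_k)}
    labelled = []
    for s in context:
        row = []
        for t in tokenize(s):
            if t in query:
                label = "query"
            elif t in STOP_WORDS:
                label = "stopword"
            elif t in suggested:
                label = "suggested"
            else:
                label = "other"
            row.append((t, label))
        labelled.append(row)
    return labelled


# Toy English-French corpus aligned at sentence level (illustrative only).
corpus = [
    ("the bank approved the loan", "la banque a approuvé le prêt"),
    ("the bank approved a loan", "la banque a approuvé un prêt"),
    ("we sat on the river bank", "nous étions assis sur la rive"),
]
context = retrieve_context("banque prêt", corpus, build_index(corpus))
# context == ["the bank approved the loan", "the bank approved a loan"]
selected = group_near_duplicates(context)
# selected == ["the bank approved the loan"]  (Jaccard similarity 0.8)
labelled = classify_terms(selected, ["bank", "loan"])
# labelled[0][2] == ("approved", "suggested")
```

The sketch mirrors the rule of selecting contextual information only once per cluster by taking the first member of each group as its representative. A real system would more likely use a proper clustering algorithm and a co-occurrence measure computed over the whole corpus.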
- Farag Ahmed and Andreas Nürnberger, multi Searcher: Can we Support People to get Information from Text they can't Read or Understand? In: Proceedings of the 33rd Annual ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2010), 19-23 July 2010, pp. 837-838, Geneva, Switzerland.