Stemming

Stemming is the process of reducing inflected (or sometimes derived) words to their stem, base, or root form. For example, an English stemmer reduces both "searching" and "searched" to "search".

By default, StarTeam search uses a Porter stemmer for word analysis and indexing. The current JVM language setting determines which stemmer is applied:

Language "en": English Stemmer
Language "fr": French Stemmer
Language "pt": Portuguese Stemmer
Language "de": German Stemmer
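
The mapping above can be pictured as a simple lookup keyed by language code. The following is a minimal sketch, not StarTeam's actual implementation: the class StemmerSelector and the method stemmerFor are hypothetical names, and it assumes that the "JVM language variable" corresponds to the language of the JVM's default locale (normally initialised from the user.language system property). The source does not say what happens for languages outside the list, so the sketch simply returns an empty result for them.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, for illustration only.
class StemmerSelector {

    // Language codes and stemmer names as listed in this section.
    private static final Map<String, String> STEMMERS = Map.of(
            "en", "English Stemmer",
            "fr", "French Stemmer",
            "pt", "Portuguese Stemmer",
            "de", "German Stemmer");

    // Returns the stemmer name for a language code, or an empty Optional
    // when the code is not in the list (behaviour in that case is an assumption).
    static Optional<String> stemmerFor(String languageCode) {
        return Optional.ofNullable(STEMMERS.get(languageCode));
    }

    public static void main(String[] args) {
        // Assumption: the JVM language setting is the default locale's language.
        String language = Locale.getDefault().getLanguage();
        System.out.println(language + " -> "
                + stemmerFor(language).orElse("(no stemmer listed)"));
    }
}
```

Starting the JVM with a different language, for example with -Duser.language=fr, would change the value this sketch looks up.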

Chinese and Japanese locales

For Chinese and Japanese locales, StarTeam search uses Lucene's CJKAnalyzer by default. To use a different analyzer, edit starteam-search-configs.xml. For example, to use Lucene's SmartChineseAnalyzer, an analyzer for Simplified Chinese or mixed Chinese-English text, make the following change:

<Analyzers>
    <Analyzer name="zh" value="org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer"/>
</Analyzers>

The value org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer is the fully qualified name of a class in the Lucene library.
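
To make the structure of the setting concrete, the sketch below reads an Analyzers fragment with the JDK's built-in XML parser and collects each language code with its analyzer class name. It is an illustration under stated assumptions, not a description of how StarTeam itself loads the file: the class AnalyzerConfigReader and the method readAnalyzers are hypothetical, and the input is only the fragment shown above, since the rest of starteam-search-configs.xml is not documented here.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, for illustration only.
class AnalyzerConfigReader {

    // Parses an <Analyzers> fragment and returns language code -> analyzer class name.
    static Map<String, String> readAnalyzers(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            NodeList nodes = root.getElementsByTagName("Analyzer");
            Map<String, String> analyzers = new LinkedHashMap<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element analyzer = (Element) nodes.item(i);
                analyzers.put(analyzer.getAttribute("name"), analyzer.getAttribute("value"));
            }
            return analyzers;
        } catch (Exception e) {
            // Malformed XML or parser setup failure.
            throw new IllegalStateException("Could not read analyzer configuration", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Analyzers>"
                + "<Analyzer name=\"zh\" "
                + "value=\"org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer\"/>"
                + "</Analyzers>";
        // Print each configured language code with its analyzer class name.
        readAnalyzers(xml).forEach((lang, cls) -> System.out.println(lang + " -> " + cls));
    }
}
```

The name attribute carries the language code ("zh" in the example) and the value attribute carries the analyzer class name, so a mistyped class name in the file would only show up when the class is actually looked up.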
