Package org.languagetool
Class Language
java.lang.Object
org.languagetool.Language
- Direct Known Subclasses:
DynamicLanguage, LanguageBuilder.ExtendedLanguage, NoopLanguage, SimpleSentenceTokenizer.AnyLanguage
Base class for any supported language (English, German, etc.). Language classes are detected at runtime by searching the classpath for files named META-INF/org/languagetool/language-module.properties. Those file(s) need to contain a key languageClasses which specifies the fully qualified class name(s), e.g. org.languagetool.language.English. Use commas to specify more than one class.
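The lookup described above can be illustrated with a minimal sketch. It only shows how a comma-separated languageClasses value might be read from the text of such a properties file. The class LanguageModuleSketch, the method parseLanguageClasses, and the class name com.example.MyLanguage are hypothetical and not part of LanguageTool; the actual discovery code may differ.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper, not part of LanguageTool.
final class LanguageModuleSketch {

  // Reads the "languageClasses" key from the given properties text and
  // returns the class names, split on commas and trimmed.
  static List<String> parseLanguageClasses(String propertiesText) {
    Properties props = new Properties();
    try {
      props.load(new StringReader(propertiesText));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    String value = props.getProperty("languageClasses", "");
    List<String> classNames = new ArrayList<>();
    for (String part : value.split(",")) {
      String name = part.trim();
      if (!name.isEmpty()) {
        classNames.add(name);
      }
    }
    return classNames;
  }

  public static void main(String[] args) {
    // com.example.MyLanguage is a made-up second entry for illustration.
    String text = "languageClasses=org.languagetool.language.English, com.example.MyLanguage\n";
    System.out.println(parseLanguageClasses(text));
  }
}
```

A file without the key yields an empty list in this sketch, so callers can skip modules that declare no languages.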
Subclasses should typically use lazy initialization for anything that's costly to set up. This improves startup time for the LanguageTool stand-alone version.
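One way a subclass might follow this advice is sketched below. The class LazyInitSketch is hypothetical, and a String stands in for a costly object such as a tagger. This is an assumption about how lazy initialization could look, not how any particular LanguageTool language class is written.

```java
// Hypothetical illustration of lazy initialization, not LanguageTool code.
final class LazyInitSketch {

  private String costlyResource;  // stands in for e.g. a tagger
  private int loads = 0;          // counts how often the resource was built

  // Builds the resource on first access only; later calls reuse it.
  synchronized String getCostlyResource() {
    if (costlyResource == null) {
      costlyResource = load();
    }
    return costlyResource;
  }

  private String load() {
    loads++;
    return "costly resource";
  }

  synchronized int loadCount() {
    return loads;
  }

  public static void main(String[] args) {
    LazyInitSketch holder = new LazyInitSketch();
    holder.getCostlyResource();
    holder.getCostlyResource();
    System.out.println("loads: " + holder.loadCount());
  }
}
```

Nothing is built in the constructor, so creating the object stays cheap even if the resource is never requested.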
-
Field Summary
Fields
Modifier and Type, Field:
private static final Disambiguator DEMO_DISAMBIGUATOR
private static final Tagger DEMO_TAGGER
private final UnifierConfiguration disambiguationUnifierConfig
private final Pattern ignoredCharactersRegex
private boolean noLmWarningPrinted
private List<AbstractPatternRule> patternRules
private static final SentenceTokenizer SENTENCE_TOKENIZER
private final UnifierConfiguration unifierConfig
private static final WordTokenizer WORD_TOKENIZER
-
Constructor Summary
Constructors
Language()
-
Method Summary
Modifier and Type, Method: Description
boolean equals: Considers languages as equal if their language code, including the country and variant codes, are equal.
boolean equalsConsiderVariantsIfSpecified(Language otherLanguage): Return true if this is the same language as the given one, considering country variants only if set for both languages.
@Nullable Chunker getChunker: Get this language's chunker implementation or null.
getCommonWordsPath: A file with common words, either in the classpath or as a filename in the file system.
abstract String[] getCountries: Get this language's country options, e.g. US (as in en-US) or PL (as in pl-PL).
getDefaultDisabledRulesForVariant: Get disabled rules different from the default ones for this language variant.
getDefaultEnabledRulesForVariant: Get enabled rules different from the default ones for this language variant.
@Nullable Language getDefaultLanguageVariant: Languages that have country variants need to override this to select their most common variant.
getDisambiguationUnifier: Get this language's feature unifier used for disambiguation.
getDisambiguationUnifierConfiguration
getDisambiguator: Get this language's part-of-speech disambiguator implementation.
getIgnoredCharactersRegex
@Nullable LanguageModel getLanguageModel(File indexDir)
getLocale: Get this language's Java locale, not considering the country code.
getLocaleWithCountryAndVariant: Get this language's Java locale, considering language code and country code (if any).
getMaintainedState: Information about whether the support for this language in LanguageTool is actively maintained.
abstract @Nullable Contributor[] getMaintainers: Get the name(s) of the maintainer(s) for this language or null.
abstract String getName(): Get this language's name in English, e.g. English or German (Germany).
protected List<AbstractPatternRule> getPatternRules: Get the pattern rules as defined in the files returned by getRuleFileNames().
@Nullable Chunker getPostDisambiguationChunker: Get this language's chunker implementation or null.
int getPriorityForId: Returns a priority for Rule or Category Id (default: 0).
List<Rule> getRelevantLanguageModelCapableRules(ResourceBundle messages, @Nullable LanguageModel languageModel, UserConfig userConfig, Language motherTongue, List<Language> altLanguages): Get a list of rules that can optionally use a LanguageModel.
List<Rule> getRelevantLanguageModelRules(ResourceBundle messages, LanguageModel languageModel): Get a list of rules that require a LanguageModel.
getRelevantNeuralNetworkModels(ResourceBundle messages, File modelDir): Get a list of rules that load trained neural networks.
abstract List<Rule> getRelevantRules(ResourceBundle messages, UserConfig userConfig, Language motherTongue, List<Language> altLanguages): Get the rules classes that should run for texts in this language.
List<Rule> getRelevantRulesGlobalConfig(ResourceBundle messages, GlobalConfig globalConfig, UserConfig userConfig, Language motherTongue, List<Language> altLanguages): Get the rules classes that should run for texts in this language.
List<Rule> getRelevantWord2VecModelRules(ResourceBundle messages, Word2VecModel word2vecModel): Get a list of rules that require a Word2VecModel.
getRuleFileNames: Get the location of the rule file(s) in a form like /org/languagetool/rules/de/grammar.xml, i.e. a path in the classpath.
getSentenceTokenizer: Get this language's sentence tokenizer implementation.
abstract String getShortCode: Get this language's character code, e.g. en for English.
final String getShortCodeWithCountryAndVariant: Get the short name of the language with country and variant (if any), if it is a single-country language.
@Nullable Synthesizer getSynthesizer: Get this language's part-of-speech synthesizer implementation or null.
getTagger: Get this language's part-of-speech tagger implementation.
final String getTranslatedName(ResourceBundle messages): Get the name of the language translated to the current locale, if available.
getUnifier: Get this language's feature unifier.
getUnifierConfiguration
@Nullable String getVariant: Get this language's variant, e.g. valencia (as in ca-ES-valencia) or null.
@Nullable Word2VecModel getWord2VecModel(File indexDir)
getWordTokenizer: Get this language's word tokenizer implementation.
private boolean hasCountry()
int hashCode()
boolean hasNGramFalseFriendRule(Language motherTongue): Return true if language has ngram-based false friend rule returned by getRelevantLanguageModelCapableRules(java.util.ResourceBundle, org.languagetool.languagemodel.LanguageModel, org.languagetool.UserConfig, org.languagetool.Language, java.util.List<org.languagetool.Language>).
final boolean hasVariant(): Whether this class has at least one subclass that implements variants of this language.
protected LanguageModel initLanguageModel(File indexDir, LanguageModel languageModel)
boolean isExternal(): For internal use only.
boolean isHiddenFromGui()
boolean isSpellcheckOnlyLanguage(): Whether this language supports spell checking only and no advanced grammar and style checking.
private boolean isTheDefaultVariant()
final boolean isVariant(): Whether this is a country variant of another language, i.e. whether it doesn't directly extend Language, but a subclass of Language.
final String toString()
-
Field Details
-
DEMO_DISAMBIGUATOR
-
DEMO_TAGGER
-
SENTENCE_TOKENIZER
-
WORD_TOKENIZER
-
unifierConfig
-
disambiguationUnifierConfig
-
ignoredCharactersRegex
-
patternRules
-
noLmWarningPrinted
private boolean noLmWarningPrinted
-
-
Constructor Details
-
Language
public Language()
-
-
Method Details
-
getShortCode
Get this language's character code, e.g. en for English. For most languages this is a two-letter code according to ISO 639-1, but for those languages that don't have a two-letter code, a three-letter code according to ISO 639-2 is returned. The country parameter (e.g. "US"), if any, is not returned.
- Since:
- 3.6
-
getName
Get this language's name in English, e.g. English or German (Germany).
- Returns:
- language name
-
getCountries
Get this language's country options, e.g. US (as in en-US) or PL (as in pl-PL).
- Returns:
- String[] - array of country options for the language.
-
getMaintainers
Get the name(s) of the maintainer(s) for this language or null.
-
getRelevantRules
public abstract List<Rule> getRelevantRules(ResourceBundle messages, UserConfig userConfig, Language motherTongue, List<Language> altLanguages) throws IOException
Get the rules classes that should run for texts in this language.
- Throws:
IOException
- Since:
- 4.3
-
getCommonWordsPath
A file with common words, either in the classpath or as a filename in the file system.
- Since:
- 4.5
-
getVariant
Get this language's variant, e.g. valencia (as in ca-ES-valencia) or null. Attention: not to be confused with the "country" option.
- Returns:
- variant for the language or null
- Since:
- 2.3
-
getDefaultEnabledRulesForVariant
Get enabled rules different from the default ones for this language variant.
- Returns:
- enabled rules for the language variant.
- Since:
- 2.4
-
getDefaultDisabledRulesForVariant
Get disabled rules different from the default ones for this language variant.
- Returns:
- disabled rules for the language variant.
- Since:
- 2.4
-
getLanguageModel
- Parameters:
indexDir - directory with a '3grams' subdirectory which contains a Lucene index with 3gram occurrence counts
- Returns:
- a LanguageModel or null if this language doesn't support one
- Throws:
IOException
- Since:
- 2.7
-
initLanguageModel
-
getRelevantLanguageModelRules
public List<Rule> getRelevantLanguageModelRules(ResourceBundle messages, LanguageModel languageModel) throws IOException
Get a list of rules that require a LanguageModel. Returns an empty list for languages that don't have such rules.
- Throws:
IOException
- Since:
- 2.7
-
getRelevantLanguageModelCapableRules
public List<Rule> getRelevantLanguageModelCapableRules(ResourceBundle messages, @Nullable LanguageModel languageModel, UserConfig userConfig, Language motherTongue, List<Language> altLanguages) throws IOException
Get a list of rules that can optionally use a LanguageModel. Returns an empty list for languages that don't have such rules.
- Parameters:
languageModel - null if no language model is available
- Throws:
IOException
- Since:
- 4.5
-
getWord2VecModel
- Parameters:
indexDir - directory with subdirectories like 'en', each containing dictionary.txt and final_embeddings.txt
- Returns:
- a Word2VecModel or null if this language doesn't support one
- Throws:
IOException
- Since:
- 4.0
-
getRelevantWord2VecModelRules
public List<Rule> getRelevantWord2VecModelRules(ResourceBundle messages, Word2VecModel word2vecModel) throws IOException
Get a list of rules that require a Word2VecModel. Returns an empty list for languages that don't have such rules.
- Throws:
IOException
- Since:
- 4.0
-
getRelevantNeuralNetworkModels
Get a list of rules that load trained neural networks. Returns an empty list for languages that don't have such rules.
- Since:
- 4.4
-
getRelevantRulesGlobalConfig
public List<Rule> getRelevantRulesGlobalConfig(ResourceBundle messages, GlobalConfig globalConfig, UserConfig userConfig, Language motherTongue, List<Language> altLanguages) throws IOException
Get the rules classes that should run for texts in this language.
- Throws:
IOException
- Since:
- 4.6
-
getLocale
Get this language's Java locale, not considering the country code.
-
getLocaleWithCountryAndVariant
Get this language's Java locale, considering language code and country code (if any).
- Since:
- 2.1
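The difference between the two locale getters can be sketched with plain java.util.Locale. The class LocaleSketch and its methods are hypothetical and only illustrate the kind of Locale each getter might correspond to; whether LanguageTool builds its locales this way is an assumption.

```java
import java.util.Locale;

// Hypothetical illustration, not LanguageTool code.
final class LocaleSketch {

  // Locale from the language code only, ignoring any country.
  static Locale languageOnly(String shortCode) {
    return new Locale.Builder().setLanguage(shortCode).build();
  }

  // Locale from language code plus optional country and variant
  // (pass null for parts that are not set).
  static Locale withCountryAndVariant(String shortCode, String country, String variant) {
    Locale.Builder builder = new Locale.Builder().setLanguage(shortCode);
    if (country != null) {
      builder.setRegion(country);
    }
    if (variant != null) {
      builder.setVariant(variant);
    }
    return builder.build();
  }

  public static void main(String[] args) {
    System.out.println(languageOnly("en").toLanguageTag());
    System.out.println(withCountryAndVariant("en", "US", null).toLanguageTag());
  }
}
```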
-
getRuleFileNames
Get the location of the rule file(s) in a form like /org/languagetool/rules/de/grammar.xml, i.e. a path in the classpath. The files must exist or an exception will be thrown, unless the filename contains the string -test-.
-
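The existence rule for rule files described for getRuleFileNames can be sketched as follows. RuleFileCheckSketch and ruleFileAvailable are hypothetical names, the paths under /hypothetical/ are made up, and the choice of IllegalStateException is an assumption; the source only states that some exception is thrown.

```java
// Hypothetical illustration of the documented rule, not LanguageTool code.
final class RuleFileCheckSketch {

  // Returns true if the classpath resource exists. A missing resource is
  // tolerated (false) only when its name contains "-test-"; otherwise an
  // exception is thrown.
  static boolean ruleFileAvailable(String classpathLocation) {
    boolean exists = RuleFileCheckSketch.class.getResource(classpathLocation) != null;
    if (!exists && !classpathLocation.contains("-test-")) {
      throw new IllegalStateException("Rule file not found in classpath: " + classpathLocation);
    }
    return exists;
  }

  public static void main(String[] args) {
    // Missing, but exempt because of "-test-" in the name.
    System.out.println(ruleFileAvailable("/hypothetical/rules/xx/grammar-test-1.xml"));
  }
}
```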
getDefaultLanguageVariant
Languages that have country variants need to override this to select their most common variant.
- Returns:
- default country variant or null
- Since:
- 1.8
-
getDisambiguator
Get this language's part-of-speech disambiguator implementation.
-
getTagger
Get this language's part-of-speech tagger implementation. The tagger must not be null, but it can be a trivial pseudo-tagger that only assigns null tags.
-
getSentenceTokenizer
Get this language's sentence tokenizer implementation.
-
getWordTokenizer
Get this language's word tokenizer implementation.
-
getChunker
Get this language's chunker implementation or null.
- Since:
- 2.3
-
getPostDisambiguationChunker
Get this language's chunker implementation or null.
- Since:
- 2.9
-
getSynthesizer
Get this language's part-of-speech synthesizer implementation or null.
-
getUnifier
Get this language's feature unifier.
- Returns:
- Feature unifier for analyzed tokens.
-
getDisambiguationUnifier
Get this language's feature unifier used for disambiguation. Note: it might be different from the normal rule unifier.
- Returns:
- Feature unifier for analyzed tokens.
-
getUnifierConfiguration
- Since:
- 2.3
-
getDisambiguationUnifierConfiguration
- Since:
- 2.3
-
getTranslatedName
Get the name of the language translated to the current locale, if available. Otherwise, get the untranslated name.
-
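The fallback behaviour described for getTranslatedName can be sketched with a plain ResourceBundle lookup. TranslatedNameSketch, the demo bundle, and the use of the short code "en" as the message key are assumptions made for illustration; the actual key scheme is not stated in this documentation.

```java
import java.util.ListResourceBundle;
import java.util.MissingResourceException;
import java.util.ResourceBundle;

// Hypothetical illustration, not LanguageTool code.
final class TranslatedNameSketch {

  // Returns the translation stored under the given key, or the
  // untranslated name if the bundle has no such key.
  static String translatedName(ResourceBundle messages, String key, String untranslatedName) {
    try {
      return messages.getString(key);
    } catch (MissingResourceException e) {
      return untranslatedName;
    }
  }

  // Small in-memory bundle standing in for a German message file.
  static ResourceBundle demoBundle() {
    return new ListResourceBundle() {
      @Override
      protected Object[][] getContents() {
        return new Object[][] { { "en", "Englisch" } };
      }
    };
  }

  public static void main(String[] args) {
    System.out.println(translatedName(demoBundle(), "en", "English"));
    System.out.println(translatedName(demoBundle(), "xx", "Untranslated"));
  }
}
```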
getShortCodeWithCountryAndVariant
Get the short name of the language with country and variant (if any), if it is a single-country language. For generic language classes, get only a two- or three-character code.
- Since:
- 3.6
-
getPatternRules
Get the pattern rules as defined in the files returned by getRuleFileNames().
- Throws:
IOException
- Since:
- 2.7
-
toString
-
isVariant
public final boolean isVariant()
Whether this is a country variant of another language, i.e. whether it doesn't directly extend Language, but a subclass of Language.
- Since:
- 1.8
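The condition in the isVariant description ("doesn't directly extend Language, but a subclass of Language") can be expressed with a superclass check. All class names below are hypothetical stand-ins, and the real isVariant() may be implemented differently; this only illustrates the documented condition.

```java
// Hypothetical illustration, not LanguageTool code.
final class VariantSketch {

  // Stand-in for the Language base class.
  abstract static class BaseLanguage {
    // True when the direct superclass is not the base class itself,
    // i.e. the class extends some other language class.
    final boolean isVariant() {
      return getClass().getSuperclass() != BaseLanguage.class;
    }
  }

  // Extends the base class directly: not a variant.
  static class EnglishSketch extends BaseLanguage { }

  // Extends a language class: a variant.
  static class AmericanEnglishSketch extends EnglishSketch { }

  public static void main(String[] args) {
    System.out.println(new EnglishSketch().isVariant());
    System.out.println(new AmericanEnglishSketch().isVariant());
  }
}
```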
-
hasVariant
public final boolean hasVariant()
Whether this class has at least one subclass that implements variants of this language.
- Since:
- 1.8
-
isExternal
public boolean isExternal()
For internal use only. Overwritten to return true for languages that have been loaded from an external file after start up.
-
equalsConsiderVariantsIfSpecified
Return true if this is the same language as the given one, considering country variants only if set for both languages. For example: en = en, en = en-GB, en-GB = en-GB, but en-US != en-GB
- Since:
- 1.8
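The comparison examples given for equalsConsiderVariantsIfSpecified (en = en, en = en-GB, en-GB = en-GB, en-US != en-GB) can be reproduced with a minimal sketch. LangCodeSketch is a hypothetical value class holding only a language code and an optional country; it is not the LanguageTool implementation.

```java
// Hypothetical illustration, not LanguageTool code.
final class LangCodeSketch {

  final String code;     // e.g. "en"
  final String country;  // e.g. "GB", or null when no country is set

  LangCodeSketch(String code, String country) {
    this.code = code;
    this.country = country;
  }

  boolean hasCountry() {
    return country != null;
  }

  // Same language code is required; countries are compared only when
  // both sides specify one.
  boolean equalsConsiderVariantsIfSpecified(LangCodeSketch other) {
    if (!code.equals(other.code)) {
      return false;
    }
    if (hasCountry() && other.hasCountry()) {
      return country.equals(other.country);
    }
    return true;
  }

  public static void main(String[] args) {
    LangCodeSketch en = new LangCodeSketch("en", null);
    LangCodeSketch enGb = new LangCodeSketch("en", "GB");
    LangCodeSketch enUs = new LangCodeSketch("en", "US");
    System.out.println(en.equalsConsiderVariantsIfSpecified(enGb));
    System.out.println(enUs.equalsConsiderVariantsIfSpecified(enGb));
  }
}
```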
-
hasCountry
private boolean hasCountry()
-
getIgnoredCharactersRegex
- Returns:
- Return compiled regular expression to ignore inside tokens
- Since:
- 2.9
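How a compiled "ignored characters" pattern might be applied to a token is sketched below. The pattern used here (matching only the soft hyphen, U+00AD) and the class IgnoredCharsSketch are assumptions for illustration; the pattern a given language actually returns may differ.

```java
import java.util.regex.Pattern;

// Hypothetical illustration, not LanguageTool code.
final class IgnoredCharsSketch {

  // Assumed pattern: matches the soft hyphen (U+00AD).
  static final Pattern IGNORED = Pattern.compile("[\u00AD]");

  // Removes every ignored character from inside the token.
  static String stripIgnored(String token) {
    return IGNORED.matcher(token).replaceAll("");
  }

  public static void main(String[] args) {
    System.out.println(stripIgnored("co\u00ADoperate"));
  }
}
```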
-
getMaintainedState
Information about whether the support for this language in LanguageTool is actively maintained. If not, the user interface might show a warning.
- Since:
- 3.3
-
isHiddenFromGui
public boolean isHiddenFromGui()
-
isTheDefaultVariant
private boolean isTheDefaultVariant()
-
getPriorityForId
Returns a priority for Rule or Category Id (default: 0). Positive integers have higher priority. Negative integers have lower priority.
- Since:
- 3.6
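The priority convention for getPriorityForId (default 0, positive is higher, negative is lower) can be illustrated by ordering a few ids. PrioritySketch, its priority table, and the rule ids are hypothetical; how LanguageTool consumes the priorities is not described here, so the sorting step is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not LanguageTool code.
final class PrioritySketch {

  // Made-up ids and priorities; ids not listed get the default of 0.
  static final Map<String, Integer> PRIORITIES = Map.of(
      "HYPOTHETICAL_TYPO_RULE", 10,
      "HYPOTHETICAL_STYLE_RULE", -5);

  static int getPriorityForId(String id) {
    return PRIORITIES.getOrDefault(id, 0);
  }

  // Returns a copy of the ids ordered from highest to lowest priority.
  static List<String> sortByPriority(List<String> ids) {
    List<String> sorted = new ArrayList<>(ids);
    sorted.sort((a, b) -> Integer.compare(getPriorityForId(b), getPriorityForId(a)));
    return sorted;
  }

  public static void main(String[] args) {
    System.out.println(sortByPriority(
        List.of("HYPOTHETICAL_STYLE_RULE", "OTHER_RULE", "HYPOTHETICAL_TYPO_RULE")));
  }
}
```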
-
isSpellcheckOnlyLanguage
public boolean isSpellcheckOnlyLanguage()
Whether this language supports spell checking only and no advanced grammar and style checking.
- Since:
- 4.5
-
hasNGramFalseFriendRule
Return true if language has ngram-based false friend rule returned by getRelevantLanguageModelCapableRules(java.util.ResourceBundle, org.languagetool.languagemodel.LanguageModel, org.languagetool.UserConfig, org.languagetool.Language, java.util.List<org.languagetool.Language>).
- Since:
- 4.6
-
equals
Considers languages as equal if their language code, including the country and variant codes, are equal.
-
hashCode
public int hashCode()
-