Package org.languagetool.languagemodel
Class LuceneSingleIndexLanguageModel
java.lang.Object
org.languagetool.languagemodel.BaseLanguageModel
org.languagetool.languagemodel.LuceneSingleIndexLanguageModel
- All Implemented Interfaces:
AutoCloseable, LanguageModel
Information about ngram occurrences, taken from Lucene indexes (one index per ngram level).
This is not a real language model as it only returns information
about occurrence counts but has no probability calculation, especially
not for the case with 0 occurrences.
- Since:
- 3.2
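To illustrate what "occurrence counts only, one index per ngram level" means in practice, here is a minimal, self-contained sketch. It does not use Lucene or LanguageTool: the class `NgramCountSketch`, its `add` method, and the in-memory maps are hypothetical stand-ins for the per-level indexes. Throwing an exception when a level has no data is a choice made for this sketch, not documented behavior.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in, not the LanguageTool class: one in-memory map per
// ngram level plays the role of one Lucene index per ngram level.
class NgramCountSketch implements AutoCloseable {

  private final Map<Integer, Map<String, Long>> countsByLevel = new HashMap<>();

  // Record an occurrence count for an ngram; its length selects the level.
  void add(List<String> tokens, long count) {
    countsByLevel
        .computeIfAbsent(tokens.size(), level -> new HashMap<>())
        .put(String.join(" ", tokens), count);
  }

  // Return the raw occurrence count. An unseen ngram yields 0: there is no
  // smoothing and no probability estimate.
  long getCount(List<String> tokens) {
    Map<String, Long> level = countsByLevel.get(tokens.size());
    if (level == null) {
      // Sketch-only assumption: fail loudly when a level has no data at all.
      throw new IllegalArgumentException("No data for ngram size " + tokens.size());
    }
    return level.getOrDefault(String.join(" ", tokens), 0L);
  }

  // Release the per-level data, mirroring the AutoCloseable contract.
  @Override
  public void close() {
    countsByLevel.clear();
  }
}
```

Like the documented class, the stand-in implements AutoCloseable, so it can be used in a try-with-resources block. Turning the raw counts into a usable score is left to the caller or to a layer on top, such as the inherited getPseudoProbability methods.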
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
protected static class | LuceneSingleIndexLanguageModel.LuceneSearcher |
-
Field Summary
Fields
Modifier and Type | Field | Description
private static final Map<File, LuceneSingleIndexLanguageModel.LuceneSearcher> | dirToSearcherMap |
 | indexes |
private final Map<Integer, LuceneSingleIndexLanguageModel.LuceneSearcher> | luceneSearcherMap |
private final long | maxNgram |
private final File | topIndexDir |
Fields inherited from interface org.languagetool.languagemodel.LanguageModel
GOOGLE_SENTENCE_END, GOOGLE_SENTENCE_START
-
Constructor Summary
Constructors
Constructor | Description
LuceneSingleIndexLanguageModel(int maxNgram) |
LuceneSingleIndexLanguageModel(File topIndexDir) |
-
Method Summary
Modifier and Type | Method | Description
private void | addIndex |
static void | clearCaches | Only used internally.
void | close() |
protected void | doValidateDirectory(File topIndexDir) |
 | getCachedLuceneSearcher(File indexDir) |
long | getCount | Get the occurrence count for token.
long | getCount | Get the occurrence count for the given token sequence.
private long | getCount(org.apache.lucene.index.Term term, LuceneSingleIndexLanguageModel.LuceneSearcher luceneSearcher) |
 | getLuceneSearcher(int ngramSize) |
long | getTotalTokenCount() |
 | toString() |
static void | validateDirectory(File topIndexDir) | Throws a RuntimeException if the given directory does not seem to be a valid ngram top directory with sub directories 1grams etc.
Methods inherited from class org.languagetool.languagemodel.BaseLanguageModel
getPseudoProbability, getPseudoProbabilityStupidBackoff
-
Field Details
-
dirToSearcherMap
-
indexes
-
luceneSearcherMap
-
topIndexDir
-
maxNgram
private final long maxNgram
-
-
Constructor Details
-
LuceneSingleIndexLanguageModel
- Parameters:
topIndexDir - a directory that contains at least a sub directory called 3grams, which is a Lucene index with ngram occurrences as created by org.languagetool.dev.FrequencyIndexCreator.
-
LuceneSingleIndexLanguageModel
-
-
Method Details
-
validateDirectory
Throws a RuntimeException if the given directory does not seem to be a valid ngram top directory with sub directories 1grams etc.
- Since:
- 3.0
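The kind of check described here can be sketched with plain java.io.File. This is a hypothetical helper, not the LanguageTool implementation: the class name `NgramDirCheckSketch`, the method `validate`, and the assumption that levels 1 to 3 are required are all illustrative. The documentation itself only mentions "1grams etc." and, for the constructor, "3grams".

```java
import java.io.File;

// Hypothetical helper, not the LanguageTool implementation: checks that a
// top directory has the expected per-level sub directories.
class NgramDirCheckSketch {

  // Assumption for this sketch: levels 1 to 3 are expected.
  static final int MAX_LEVEL = 3;

  // Throw a RuntimeException if the layout does not look like
  // topIndexDir/1grams, topIndexDir/2grams, topIndexDir/3grams.
  static void validate(File topIndexDir) {
    if (!topIndexDir.isDirectory()) {
      throw new RuntimeException("Not a directory: " + topIndexDir);
    }
    for (int level = 1; level <= MAX_LEVEL; level++) {
      File sub = new File(topIndexDir, level + "grams");
      if (!sub.isDirectory()) {
        throw new RuntimeException("Missing sub directory: " + sub);
      }
    }
  }
}
```

Failing early with an unchecked exception, as the documented method does, surfaces a misconfigured data path at startup instead of at the first lookup. The sketch only checks that the sub directories exist; it does not verify that they contain a valid Lucene index.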
-
clearCaches
Only used internally.
- Since:
- 3.2
-
doValidateDirectory
-
addIndex
-
getCount
Description copied from class: BaseLanguageModel
Get the occurrence count for the given token sequence.
- Specified by:
getCount in class BaseLanguageModel
-
getCount
Description copied from class: BaseLanguageModel
Get the occurrence count for token.
- Specified by:
getCount in class BaseLanguageModel
-
getTotalTokenCount
public long getTotalTokenCount()
- Specified by:
getTotalTokenCount in class BaseLanguageModel
-
getLuceneSearcher
-
getCachedLuceneSearcher
-
getCount
private long getCount(org.apache.lucene.index.Term term, LuceneSingleIndexLanguageModel.LuceneSearcher luceneSearcher)
-
close
public void close()
-
toString
-