Package morfologik.stemming
Class DictionaryMetadata
java.lang.Object
morfologik.stemming.DictionaryMetadata
Description of attributes, their types and default values.
-
Field Summary
Fields
Modifier and Type | Field | Description
private final EnumMap<DictionaryAttribute,String> | attributes | All attributes.
private final EnumMap<DictionaryAttribute,Boolean> | boolAttributes | All "enabled" boolean attributes.
private Charset | charset |
private static Map<DictionaryAttribute,String> | DEFAULT_ATTRIBUTES | Default attribute values.
private EncoderType | encoderType | Sequence encoder.
private String | encoding | Encoding used for converting bytes to characters and vice versa.
private LinkedHashMap<Character,List<Character>> | equivalentChars | Equivalent characters (treated similarly as equivalent chars with and without diacritics).
private LinkedHashMap<String,String> | inputConversion | Conversion pairs for input conversion, for example to replace ligatures.
private Locale | locale |
static final String | METADATA_FILE_EXTENSION | Expected metadata file extension.
private LinkedHashMap<String,String> | outputConversion | Conversion pairs for output conversion, for example to replace ligatures.
private LinkedHashMap<String,List<String>> | replacementPairs | Replacement pairs for non-obvious candidate search in a speller dictionary.
private static EnumSet<DictionaryAttribute> | REQUIRED_ATTRIBUTES | Required attributes.
private byte | separator | A separator character between fields (stem, lemma, form).
private char | separatorChar |
-
Constructor Summary
Constructors
Constructor | Description
DictionaryMetadata(attrs) | Create an instance from an attribute map.
-
Method Summary
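Modifier and Type | Method | Description
static DictionaryMetadataBuilder | builder() |
static String | getExpectedMetadataFileName(String dictionaryFile) | Returns the expected name of the metadata file, based on the name of the dictionary file.
static Path | getExpectedMetadataLocation(Path dictionary) |
byte | getSeparator() |
char | getSeparatorAsChar() |
boolean | isConvertingCase() |
boolean | isFrequencyIncluded() |
boolean | isIgnoringAllUppercase() |
boolean | isIgnoringCamelCase() |
boolean | isIgnoringDiacritics() |
boolean | isIgnoringNumbers() |
boolean | isIgnoringPunctuation() |
boolean | isSupportingRunOnWords() |
static DictionaryMetadata | read(InputStream metadataStream) | Read dictionary metadata from a property file (stream).
void | write | Write dictionary attributes (metadata).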
-
Field Details
-
DEFAULT_ATTRIBUTES
Default attribute values. -
REQUIRED_ATTRIBUTES
Required attributes. -
separator
private byte separator
A separator character between fields (stem, lemma, form). The character must be within byte range (FSA uses bytes internally). -
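The byte-range constraint can be checked before a separator is used. The following is a minimal sketch using only the JDK; `SeparatorCheck` and `toSeparatorByte` are hypothetical names that are not part of Morfologik, and rejecting any character that does not encode to exactly one byte is an assumption about how the constraint might be enforced.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Morfologik.
final class SeparatorCheck {
  /** Encodes a candidate separator and insists that it occupies exactly one byte. */
  static byte toSeparatorByte(char separator, Charset charset) {
    try {
      ByteBuffer encoded = charset.newEncoder().encode(CharBuffer.wrap(new char[] {separator}));
      if (encoded.remaining() != 1) {
        throw new IllegalArgumentException(
            "Separator must encode to a single byte in " + charset.name());
      }
      return encoded.get();
    } catch (CharacterCodingException e) {
      // The character cannot be represented in the charset at all.
      throw new IllegalArgumentException("Separator cannot be encoded in " + charset.name(), e);
    }
  }

  public static void main(String[] args) {
    System.out.println(toSeparatorByte('+', StandardCharsets.UTF_8)); // prints 43
  }
}
```

A character such as ł takes two bytes in UTF-8, so this sketch would reject it as a separator for a UTF-8 dictionary.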
separatorChar
private char separatorChar -
encoding
Encoding used for converting bytes to characters and vice versa. -
charset
-
locale
-
replacementPairs
Replacement pairs for non-obvious candidate search in a speller dictionary. -
inputConversion
Conversion pairs for input conversion, for example to replace ligatures. -
outputConversion
Conversion pairs for output conversion, for example to replace ligatures. -
equivalentChars
Equivalent characters, treated as interchangeable in the same way as characters with and without diacritics. For example, Polish ł can be specified as equivalent to l. This is similar to the Hunspell MAP feature in affix files. -
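As an illustration of the data shape only, the sketch below shows how a map from a character to its list of equivalents could be consulted. `EquivalentCharsSketch` and `equivalent` are hypothetical names, and the lookup direction (key to listed alternatives, not the reverse) is an assumption rather than documented Morfologik behavior.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not part of Morfologik.
final class EquivalentCharsSketch {
  /** Returns true if b is the same as a, or is listed as an equivalent of a. */
  static boolean equivalent(Map<Character, List<Character>> equivalentChars, char a, char b) {
    if (a == b) {
      return true;
    }
    List<Character> alternatives = equivalentChars.get(a);
    return alternatives != null && alternatives.contains(b);
  }

  public static void main(String[] args) {
    Map<Character, List<Character>> map = new LinkedHashMap<>();
    map.put('l', Collections.singletonList('\u0142')); // l is equivalent to ł
    System.out.println(equivalent(map, 'l', '\u0142')); // prints true
    System.out.println(equivalent(map, 'l', 'k'));      // prints false
  }
}
```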
attributes
All attributes. -
boolAttributes
All "enabled" boolean attributes. -
encoderType
Sequence encoder. -
METADATA_FILE_EXTENSION
Expected metadata file extension.
-
Constructor Details
-
DictionaryMetadata
Create an instance from an attribute map.
Parameters:
attrs - A set of DictionaryAttribute keys and their associated values.
-
Method Details
-
getAttributes
Returns:
All metadata attributes.
-
getEncoding
-
getSeparator
public byte getSeparator() -
getLocale
-
getInputConversionPairs
-
getOutputConversionPairs
-
getReplacementPairs
-
getEquivalentChars
-
isFrequencyIncluded
public boolean isFrequencyIncluded() -
isIgnoringPunctuation
public boolean isIgnoringPunctuation() -
isIgnoringNumbers
public boolean isIgnoringNumbers() -
isIgnoringCamelCase
public boolean isIgnoringCamelCase() -
isIgnoringAllUppercase
public boolean isIgnoringAllUppercase() -
isIgnoringDiacritics
public boolean isIgnoringDiacritics() -
isConvertingCase
public boolean isConvertingCase() -
isSupportingRunOnWords
public boolean isSupportingRunOnWords() -
getDecoder
Returns:
A new CharsetDecoder for the encoding.
-
getEncoder
Returns:
A new CharsetEncoder for the encoding.
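To show what obtaining a fresh decoder and encoder for a named encoding looks like with the JDK alone, here is a minimal sketch. `CodecSketch` and its methods are hypothetical names; configuring both to report malformed or unmappable input (rather than silently replace it) is an assumption, not something this page states.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

// Hypothetical illustration, not part of Morfologik.
final class CodecSketch {
  /** Creates a new decoder for the named encoding; errors are reported, not replaced. */
  static CharsetDecoder newDecoder(String encoding) {
    return Charset.forName(encoding).newDecoder()
        .onMalformedInput(CodingErrorAction.REPORT)
        .onUnmappableCharacter(CodingErrorAction.REPORT);
  }

  /** Creates a new encoder for the named encoding; errors are reported, not replaced. */
  static CharsetEncoder newEncoder(String encoding) {
    return Charset.forName(encoding).newEncoder()
        .onMalformedInput(CodingErrorAction.REPORT)
        .onUnmappableCharacter(CodingErrorAction.REPORT);
  }

  /** Encodes text to bytes and decodes it back, using a fresh encoder and decoder. */
  static String roundTrip(String text, String encoding) throws CharacterCodingException {
    ByteBuffer bytes = newEncoder(encoding).encode(CharBuffer.wrap(text));
    return newDecoder(encoding).decode(bytes).toString();
  }

  public static void main(String[] args) throws CharacterCodingException {
    System.out.println(roundTrip("abc", "UTF-8")); // prints abc
  }
}
```

Encoders and decoders are stateful and not thread-safe, which is one reason an accessor might hand out a new instance on each call.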
-
getSequenceEncoderType
Returns:
The sequence encoder type.
-
getSeparatorAsChar
public char getSeparatorAsChar()
Returns:
The separator byte converted to a single char.
Throws:
RuntimeException - if this conversion is for some reason impossible (the byte is a surrogate pair, or the FSA's encoding is not available).
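The byte-to-char conversion described above can be sketched with a charset decoder. This is an illustration under stated assumptions, not the library's code: `SeparatorAsCharSketch` and `separatorAsChar` are hypothetical names, and the charset is passed in explicitly instead of being read from dictionary metadata.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration, not part of Morfologik.
final class SeparatorAsCharSketch {
  /** Decodes one byte and fails unless the result is exactly one char. */
  static char separatorAsChar(byte separator, Charset charset) {
    try {
      CharBuffer chars = charset.newDecoder().decode(ByteBuffer.wrap(new byte[] {separator}));
      if (chars.remaining() != 1) {
        throw new RuntimeException("Separator byte does not decode to a single char.");
      }
      return chars.get();
    } catch (CharacterCodingException e) {
      // The lone byte is not a complete, valid sequence in this charset.
      throw new RuntimeException("Separator byte cannot be decoded in " + charset.name(), e);
    }
  }

  public static void main(String[] args) {
    System.out.println(separatorAsChar((byte) '+', StandardCharsets.UTF_8)); // prints +
  }
}
```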
-
builder
Returns:
A shortcut returning a DictionaryMetadataBuilder.
-
getExpectedMetadataFileName
Returns the expected name of the metadata file, based on the name of the dictionary file. The expected name is resolved by truncating any file extension of dictionaryFile and appending METADATA_FILE_EXTENSION.
Parameters:
dictionaryFile - The name of the dictionary (*.dict) file.
Returns:
The expected name of the metadata file.
-
getExpectedMetadataLocation
Parameters:
dictionary - The location of the dictionary file.
Returns:
The expected location of the metadata file.
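The name and location resolution described for these two methods can be sketched with plain JDK calls. `MetadataLocationSketch` is a hypothetical class; the extension value `"info"`, the dot inserted before it, and the choice to place the metadata file next to the dictionary are all assumptions made for illustration, since this page does not state them.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical illustration, not part of Morfologik.
final class MetadataLocationSketch {
  // Assumed value for illustration only; see METADATA_FILE_EXTENSION for the real one.
  static final String ASSUMED_EXTENSION = "info";

  /** Truncates any extension of the dictionary file name and appends the metadata extension. */
  static String expectedMetadataFileName(String dictionaryFile) {
    int dot = dictionaryFile.lastIndexOf('.');
    String base = dot >= 0 ? dictionaryFile.substring(0, dot) : dictionaryFile;
    return base + "." + ASSUMED_EXTENSION;
  }

  /** Resolves the metadata file as a sibling of the dictionary file (an assumption). */
  static Path expectedMetadataLocation(Path dictionary) {
    String name = dictionary.getFileName().toString();
    return dictionary.resolveSibling(expectedMetadataFileName(name));
  }

  public static void main(String[] args) {
    System.out.println(expectedMetadataFileName("polish.dict")); // prints polish.info
    System.out.println(expectedMetadataLocation(Paths.get("dicts", "polish.dict")));
  }
}
```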
-
read
Read dictionary metadata from a property file (stream).
Parameters:
metadataStream - The stream with metadata.
Returns:
DictionaryMetadata read from the stream (property file).
Throws:
IOException - Thrown if an I/O exception occurs.
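Because the metadata is a property file, reading it can be approximated with `java.util.Properties`. The sketch below is not the library's implementation: `MetadataReadSketch` and `readAttributes` are hypothetical, decoding the stream as UTF-8 is an assumption, and the keys used in `main` (`fsa.dict.separator`, `fsa.dict.encoding`) are assumed example keys that do not appear on this page.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical illustration, not part of Morfologik.
final class MetadataReadSketch {
  /** Loads key=value pairs from a property stream into a sorted map. */
  static Map<String, String> readAttributes(InputStream metadataStream) throws IOException {
    Properties properties = new Properties();
    // Assumption: the metadata file is UTF-8 encoded.
    properties.load(new InputStreamReader(metadataStream, StandardCharsets.UTF_8));
    Map<String, String> attributes = new TreeMap<>();
    for (String key : properties.stringPropertyNames()) {
      attributes.put(key, properties.getProperty(key));
    }
    return attributes;
  }

  public static void main(String[] args) throws IOException {
    String file = "fsa.dict.separator=+\nfsa.dict.encoding=UTF-8\n"; // assumed example keys
    Map<String, String> attributes =
        readAttributes(new ByteArrayInputStream(file.getBytes(StandardCharsets.UTF_8)));
    System.out.println(attributes.get("fsa.dict.separator")); // prints +
  }
}
```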
-
write
Write dictionary attributes (metadata).
Parameters:
writer - The writer to write to.
Throws:
IOException - Thrown when an I/O error occurs.
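Writing attributes back out in property-file form can be sketched the same way. `MetadataWriteSketch` and `writeAttributes` are hypothetical names, and using `Properties.store` is an assumption about the output format; note that `store` also emits a comment header with a timestamp and does not preserve insertion order.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical illustration, not part of Morfologik.
final class MetadataWriteSketch {
  /** Writes attributes as key=value lines in java.util.Properties format. */
  static void writeAttributes(Map<String, String> attributes, Writer writer) throws IOException {
    Properties properties = new Properties();
    for (Map.Entry<String, String> entry : attributes.entrySet()) {
      properties.setProperty(entry.getKey(), entry.getValue());
    }
    properties.store(writer, "Dictionary metadata (sketch)");
  }

  public static void main(String[] args) throws IOException {
    Map<String, String> attributes = new LinkedHashMap<>();
    attributes.put("fsa.dict.encoding", "UTF-8"); // assumed example key
    StringWriter out = new StringWriter();
    writeAttributes(attributes, out);
    System.out.print(out);
  }
}
```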
-