public final class CharsToNameCanonicalizer extends Object
For optimal performance, the usage pattern should be one where matches
are very common (especially after "warm-up") and, as with most hash-based
maps/sets, where hash codes are uniformly distributed. Collisions
are also slightly more expensive than with HashMap or HashSet, since hash codes
are not used in resolving collisions; that is, an equals() comparison is
done against all symbols in the same bucket index.
Finally, rehashing is also more expensive: hash codes are not
stored, so rehashing requires all entries' hash codes to be recalculated.
The reason for not storing hash codes is reduced memory usage, in the hope
of better memory locality.
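The trade-off described above can be illustrated with a minimal sketch. It is not this class's implementation: `HashlessSymbolTable`, its `intern` method, the initial size, the 75% fill threshold and the hash function are all hypothetical choices made for the example. The point is only that chain nodes carry no cached hash, so lookups compare against every symbol in the bucket and a rehash has to recompute each entry's hash.

```java
/** Hypothetical sketch: a chained symbol table that does not store hash codes. */
final class HashlessSymbolTable {
    /** Chain node: holds only the symbol and the next link, no cached hash. */
    private static final class Bucket {
        final String symbol;
        final Bucket next;
        Bucket(String symbol, Bucket next) { this.symbol = symbol; this.next = next; }
    }

    private Bucket[] buckets = new Bucket[4]; // length is always a power of two
    private int size;

    /** Returns the canonical String for the given characters, adding it if absent. */
    String intern(char[] buf, int start, int len) {
        int index = hash(buf, start, len) & (buckets.length - 1);
        // No stored hash to pre-filter with: compare against every symbol in the chain.
        for (Bucket b = buckets[index]; b != null; b = b.next) {
            if (matches(b.symbol, buf, start, len)) {
                return b.symbol;
            }
        }
        if (size >= buckets.length - (buckets.length >> 2)) { // about 75% full
            rehash();
            index = hash(buf, start, len) & (buckets.length - 1);
        }
        String symbol = new String(buf, start, len);
        buckets[index] = new Bucket(symbol, buckets[index]);
        size++;
        return symbol;
    }

    int size() { return size; }

    /** Doubles the table; every entry's hash is recomputed from its characters. */
    private void rehash() {
        Bucket[] old = buckets;
        buckets = new Bucket[old.length * 2];
        for (Bucket head : old) {
            for (Bucket b = head; b != null; b = b.next) {
                char[] chars = b.symbol.toCharArray();
                int index = hash(chars, 0, chars.length) & (buckets.length - 1);
                buckets[index] = new Bucket(b.symbol, buckets[index]);
            }
        }
    }

    /** Simple multiplicative hash, chosen only for the example. */
    private static int hash(char[] buf, int start, int len) {
        int h = 0;
        for (int i = start; i < start + len; i++) {
            h = 31 * h + buf[i];
        }
        return h;
    }

    private static boolean matches(String s, char[] buf, int start, int len) {
        if (s.length() != len) return false;
        for (int i = 0; i < len; i++) {
            if (s.charAt(i) != buf[start + i]) return false;
        }
        return true;
    }
}
```

Once the table is warmed up, repeated calls to `intern` for the same characters return the same String instance and allocate nothing, which is the case the class description recommends optimizing for.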
The usual usage pattern is to create a single "master" instance and either use that instance sequentially, or create derived "child" instances which, after use, are asked to return any symbol additions to the master instance. In either case the benefit is that the symbol table gets initialized so that further uses are more efficient, as eventually all needed symbols will already be in the symbol table. At that point no more symbol String allocations are needed, nor are changes to the symbol table itself.
Note that while individual symbol table instances are NOT thread-safe (much like generic collection classes), concurrently used "child" instances can be freely used without synchronization. However, the master table can only be used concurrently with child instances if access to the master instance is read-only (i.e. no modifications are made).
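The master/child pattern can be sketched as follows. This is a simplified illustration under stated assumptions, not the library's code: `SharedSymbols`, its `Child` class and the use of `HashMap` with String keys are hypothetical, and the merge rule (a child's table replaces the master's only if it holds more symbols) is chosen for the example. Method names such as `makeChild`, `findSymbol`, `maybeDirty` and `release` only echo the ones listed on this page; their signatures here are simplified.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch of the master/child pattern; not the library's implementation. */
final class SharedSymbols {
    /** Master state: a snapshot that is never mutated once it has been published. */
    private final AtomicReference<Map<String, String>> snapshot =
            new AtomicReference<>(new HashMap<>());

    /** Creates a child that starts from the master's current snapshot. */
    Child makeChild() {
        return new Child(this, snapshot.get());
    }

    int size() {
        return snapshot.get().size();
    }

    /** Accepts a child's table only if it has more symbols than the current one. */
    private void mergeChild(Map<String, String> childTable) {
        Map<String, String> current = snapshot.get();
        if (childTable.size() > current.size()) {
            snapshot.compareAndSet(current, childTable);
        }
    }

    /** Child instance: meant for use by one thread at a time, so it needs no locking. */
    static final class Child {
        private final SharedSymbols parent;
        private Map<String, String> table;
        private boolean dirty; // true once this child owns a private, modified copy

        private Child(SharedSymbols parent, Map<String, String> shared) {
            this.parent = parent;
            this.table = shared;
        }

        /** Returns the canonical instance for the name, adding it if absent. */
        String findSymbol(String name) {
            String canonical = table.get(name);
            if (canonical != null) {
                return canonical;
            }
            if (!dirty) {
                // Copy-on-write: the shared snapshot itself is never modified.
                table = new HashMap<>(table);
                dirty = true;
            }
            table.put(name, name);
            return name;
        }

        boolean maybeDirty() {
            return dirty;
        }

        /** Offers any additions back to the master; the table may be shared afterwards. */
        void release() {
            if (dirty) {
                parent.mergeChild(table);
                dirty = false;
            }
        }
    }
}
```

In this sketch a child that found every symbol already present never copies anything and `release()` is a no-op, which corresponds to the warmed-up state described above. Additions made by a child whose table is not accepted are simply dropped and would be re-added by a later child.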
Modifier and Type | Field and Description |
---|---|
static int | HASH_MULT |
Modifier and Type | Method and Description |
---|---|
int | _hashToIndex(int rawHash) Helper method that takes in a "raw" hash value, shuffles it as necessary, and truncates to be used as the index. |
int | bucketCount() Method for checking number of primary hash buckets this symbol table uses. |
int | calcHash(char[] buffer, int start, int len) Implementation of a hashing method for variable length Strings. |
int | calcHash(String key) |
int | collisionCount() Method mostly needed by unit tests; calculates number of entries that are in collision list. |
static CharsToNameCanonicalizer | createRoot() Method called to create root canonicalizer for a JsonFactory instance. |
protected static CharsToNameCanonicalizer | createRoot(int hashSeed) |
String | findSymbol(char[] buffer, int start, int len, int h) |
int | hashSeed() |
CharsToNameCanonicalizer | makeChild(int flags) "Factory" method; will create a new child instance of this symbol table. |
int | maxCollisionLength() Method mostly needed by unit tests; calculates length of the longest collision chain. |
boolean | maybeDirty() |
void | release() |
protected void | reportTooManyCollisions(int maxLen) |
int | size() |
public static final int HASH_MULT
public static CharsToNameCanonicalizer createRoot()
Method called to create root canonicalizer for a JsonFactory instance. Root instance is never used directly; its main use is for storing and sharing underlying symbol arrays as needed.
protected static CharsToNameCanonicalizer createRoot(int hashSeed)
public CharsToNameCanonicalizer makeChild(int flags)
"Factory" method; will create a new child instance of this symbol table.
Note: while this method is synchronized, it is generally not safe both to use makeChild/mergeChild and to use the instance actively. Instead, a separate 'root' instance should be used on which only makeChild/mergeChild are called, while the instance itself is not used as a symbol table.
public void release()
public int size()
public int bucketCount()
public boolean maybeDirty()
public int hashSeed()
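public int collisionCount()
Method mostly needed by unit tests; calculates number of entries that are in collision list. Value can be at most (size() - 1), but should usually be much lower, ideally 0.
public int maxCollisionLength()
Method mostly needed by unit tests; calculates length of the longest collision chain. This should typically be a low number, but may be up to size() - 1 in the pathological case.
public String findSymbol(char[] buffer, int start, int len, int h)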
public int _hashToIndex(int rawHash)
public int calcHash(char[] buffer, int start, int len)
Implementation of a hashing method for variable length Strings.
len - Length of String; has to be at least 1 (caller guarantees this pre-condition)
public int calcHash(String key)
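As a rough illustration of how a seeded multiplicative hash and a "shuffle, then truncate" index step fit together, consider the sketch below. Everything in it is an assumption made for the example: this page does not state the value of HASH_MULT (33 is used here), the shift constants in `hashToIndex` are illustrative, and `NameHashSketch` with its static methods is a hypothetical stand-in whose signatures differ from the instance methods documented here.

```java
/** Hypothetical sketch of seeded name hashing; constants are assumptions. */
final class NameHashSketch {
    static final int HASH_MULT = 33; // assumed value, not taken from this page

    /** Hashes len characters of buffer starting at start, beginning from the seed. */
    static int calcHash(int seed, char[] buffer, int start, int len) {
        int hash = seed;
        for (int i = start, end = start + len; i < end; i++) {
            hash = (hash * HASH_MULT) + buffer[i];
        }
        return hash;
    }

    /** String variant; produces the same value as the char[] variant. */
    static int calcHash(int seed, String key) {
        int hash = seed;
        for (int i = 0; i < key.length(); i++) {
            hash = (hash * HASH_MULT) + key.charAt(i);
        }
        return hash;
    }

    /**
     * Mixes high and low bits of the raw hash, then masks it down to a bucket index.
     * bucketCount must be a power of two for the mask to work.
     */
    static int hashToIndex(int rawHash, int bucketCount) {
        rawHash += (rawHash >>> 15);
        rawHash ^= (rawHash << 7);
        rawHash += (rawHash >>> 3);
        return rawHash & (bucketCount - 1);
    }
}
```

With a seed of 0, `NameHashSketch.calcHash(0, "ab")` evaluates to 97 * 33 + 98 = 3299. A different seed yields different raw hashes for the same names, which is the usual motivation for a per-root hash seed.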
protected void reportTooManyCollisions(int maxLen)
Copyright © 2008-2017 FasterXML. All Rights Reserved.