com.fasterxml.jackson.core.sym

Class CharsToNameCanonicalizer



  • public final class CharsToNameCanonicalizer
    extends Object
    This class is a specialized, type-safe Map from char array to String value. Specialization here means type-safety, specific access patterns (the key is a char array, the value is an optionally interned String, and values are added on access if necessary), and support for concurrent use, provided that the well-defined mechanisms for obtaining concurrently usable instances are followed. The main use of the class is to store symbol table information for things like compilers and parsers, especially when the number of symbols (keywords) is limited.

    For optimal performance, matches should be very common (especially after "warm-up") and, as with most hash-based maps and sets, hash codes should be uniformly distributed. Collisions are slightly more expensive than with HashMap or HashSet, since hash codes are not used in resolving collisions; that is, an equals() comparison is done against every symbol at the same bucket index.
    Finally, rehashing is also more expensive: hash codes are not stored, so rehashing requires recalculating the hash code of every entry. Hash codes are left out to reduce memory usage, in the hope of better memory locality.
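
    The trade-off described above can be illustrated with a minimal sketch. It is a hypothetical model, not the class's actual implementation: the class name NoStoredHashTable, the initial bucket count, the multiplier 31 and the resize threshold are all assumptions chosen for illustration. Buckets hold only the strings, so a lookup compares against every entry in the bucket, and a resize recomputes every hash.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model (not Jackson code): a symbol set that stores no hash codes. */
final class NoStoredHashTable {
    private List<List<String>> buckets = newBuckets(4); // bucket count stays a power of two
    private int size;
    int equalsCalls;      // full comparisons performed during lookups
    int rehashedEntries;  // hash recalculations caused by resizing

    private static List<List<String>> newBuckets(int n) {
        List<List<String>> b = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            b.add(new ArrayList<>());
        }
        return b;
    }

    // Recomputed from the characters every time; nothing is cached per entry.
    private static int hash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = h * 31 + s.charAt(i);
        }
        return h;
    }

    private int indexFor(String s) {
        return hash(s) & (buckets.size() - 1);
    }

    /** Returns the canonical instance for s, adding s on first access. */
    String canonicalize(String s) {
        List<String> bucket = buckets.get(indexFor(s));
        for (String existing : bucket) {
            equalsCalls++; // no stored hash to pre-filter with, so always compare
            if (existing.equals(s)) {
                return existing;
            }
        }
        if (size >= buckets.size()) { // resize threshold is an arbitrary choice
            rehash();
            bucket = buckets.get(indexFor(s));
        }
        bucket.add(s);
        size++;
        return s;
    }

    private void rehash() {
        List<List<String>> old = buckets;
        buckets = newBuckets(old.size() * 2);
        for (List<String> chain : old) {
            for (String s : chain) {
                rehashedEntries++; // every entry's hash is recalculated
                buckets.get(indexFor(s)).add(s);
            }
        }
    }

    int size() {
        return size;
    }
}
```

    In this model a repeated lookup of an existing symbol returns the stored instance, which is the common case the class is tuned for; the counters make the extra comparison and rehashing work visible.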

    The usual usage pattern is to create a single "master" instance and either use that instance sequentially, or create derived "child" instances which, after use, are asked to return any symbol additions to the master instance. In either case the benefit is that the symbol table gets populated so that further uses are more efficient, as eventually all symbols needed will already be in the symbol table. At that point no more symbol String allocations are needed, nor any changes to the symbol table itself.

    Note that while individual SymbolTable instances are NOT thread-safe (much like generic collection classes), separate "child" instances can be used concurrently without synchronization. However, the master table can be used concurrently with child instances only if access to the master instance is read-only (i.e. no modifications are made).

    • Method Detail

      • createRoot

        public static CharsToNameCanonicalizer createRoot()
        Method called to create root canonicalizer for a JsonFactory instance. Root instance is never used directly; its main use is for storing and sharing underlying symbol arrays as needed.
      • makeChild

        public CharsToNameCanonicalizer makeChild(int flags)
        "Factory" method; will create a new child instance of this symbol table. It will be a copy-on-write instance, i.e. it will initially use a read-only view of the parent's data, and will create its own copy only when changes are needed.

        Note: while this method is synchronized, it is generally not safe both to call makeChild/mergeChild on an instance AND to use that instance actively as a symbol table. Instead, a separate 'root' instance should be used on which only makeChild/mergeChild are called, and which is not itself used as a symbol table.

      • release

        public void release()
        Method called by the using code to indicate it is done with this instance. This lets instance merge accumulated changes into parent (if need be), safely and efficiently, and without calling code having to know about parent information.
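
        The root/child lifecycle described for createRoot, makeChild and release can be sketched with a simplified model. This is a hypothetical illustration, not the library's code: SymbolTableSketch uses a HashMap in place of the real symbol arrays, its makeChild takes no flags, its findSymbol takes a String rather than a char array, and the merge policy (the root adopts the child's table only if it is larger) is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of the root/child lifecycle; not Jackson code. */
final class SymbolTableSketch {
    private final SymbolTableSketch parent;  // null for the root
    private Map<String, String> symbols;     // shared with the parent until the first write
    private boolean dirty;                   // true once this instance owns a private copy

    private SymbolTableSketch(SymbolTableSketch parent, Map<String, String> symbols) {
        this.parent = parent;
        this.symbols = symbols;
    }

    static SymbolTableSketch createRoot() {
        return new SymbolTableSketch(null, new HashMap<>());
    }

    /** The child starts out reading the parent's current table without copying it. */
    synchronized SymbolTableSketch makeChild() {
        return new SymbolTableSketch(this, symbols);
    }

    /** Returns the canonical String for name, adding it if missing. */
    String findSymbol(String name) {
        String existing = symbols.get(name);
        if (existing != null) {
            return existing;
        }
        if (!dirty) {
            symbols = new HashMap<>(symbols); // copy-on-write: the shared table is never mutated
            dirty = true;
        }
        symbols.put(name, name);
        return name;
    }

    boolean maybeDirty() {
        return dirty;
    }

    /** Hands accumulated additions back to the parent, if there are any. */
    void release() {
        if (parent != null && dirty) {
            parent.mergeChild(symbols);
            dirty = false; // the table is shared again, so the next write must copy
        }
    }

    // Assumed merge policy: keep whichever table holds more symbols.
    private synchronized void mergeChild(Map<String, String> childSymbols) {
        if (childSymbols.size() > symbols.size()) {
            symbols = childSymbols;
        }
    }

    synchronized int size() {
        return symbols.size();
    }
}
```

        In this sketch the root is only ever touched through makeChild and the merge triggered by release, which matches the advice to keep the root out of active symbol lookups.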
      • size

        public int size()
      • bucketCount

        public int bucketCount()
        Method for checking number of primary hash buckets this symbol table uses.
        Since:
        2.1
      • maybeDirty

        public boolean maybeDirty()
      • hashSeed

        public int hashSeed()
      • collisionCount

        public int collisionCount()
        Method mostly needed by unit tests; calculates number of entries that are in collision list. Value can be at most (size() - 1), but should usually be much lower, ideally 0.
        Since:
        2.1
      • maxCollisionLength

        public int maxCollisionLength()
        Method mostly needed by unit tests; calculates the length of the longest collision chain. This should typically be a low number, but may be up to size() - 1 in the pathological case.
        Since:
        2.1
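
        The bounds stated for collisionCount and maxCollisionLength can be checked against a simple model. The sketch below is hypothetical: it assumes each bucket is a list whose first element is the primary entry and whose remaining elements form the collision chain, which is not necessarily how the class lays out its data.

```java
import java.util.List;

/** Hypothetical model: collision statistics over buckets given as lists. */
final class CollisionStats {
    /** Number of entries that are not the first (primary) entry of their bucket. */
    static int collisionCount(List<List<String>> buckets) {
        int count = 0;
        for (List<String> bucket : buckets) {
            count += Math.max(0, bucket.size() - 1);
        }
        return count;
    }

    /** Length of the longest chain beyond the primary entry; 0 if there are no collisions. */
    static int maxCollisionLength(List<List<String>> buckets) {
        int max = 0;
        for (List<String> bucket : buckets) {
            max = Math.max(max, bucket.size() - 1);
        }
        return max;
    }
}
```

        When every symbol lands in one bucket, both values equal size() - 1, which is the pathological case mentioned above.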
      • findSymbol

        public String findSymbol(char[] buffer,
                                 int start,
                                 int len,
                                 int h)
      • _hashToIndex

        public int _hashToIndex(int rawHash)
        Helper method that takes in a "raw" hash value, shuffles it as necessary, and truncates to be used as the index.
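
        A minimal sketch of the "shuffle, then truncate" idea follows. It is an illustration under stated assumptions, not the method's actual body: the shift constants are illustrative, and the table size is passed in as a hypothetical parameter (assumed to be a power of two) rather than read from instance state.

```java
/** Hypothetical sketch: mix the bits of a raw hash, then mask it down to a bucket index. */
final class HashToIndexSketch {
    /**
     * Returns an index in [0, tableSize). The shifts fold higher-order bits into the
     * low bits so that hashes differing only in their upper bits can still land in
     * different buckets. The constants are illustrative assumptions.
     */
    static int hashToIndex(int rawHash, int tableSize) {
        if (tableSize <= 0 || (tableSize & (tableSize - 1)) != 0) {
            throw new IllegalArgumentException("tableSize must be a positive power of two");
        }
        int h = rawHash;
        h += (h >>> 15);
        h ^= (h << 7);
        h += (h >>> 3);
        return h & (tableSize - 1); // truncate to the index range
    }
}
```

        Without the mixing step, the raw hashes 0x10000 and 0x20000 would both mask to index 0 in a 64-bucket table; with it, they map to different indices.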
      • calcHash

        public int calcHash(char[] buffer,
                            int start,
                            int len)
        Implementation of a hashing method for variable-length Strings. In most cases the caller is expected to calculate the hash during parsing rather than here; sometimes, however, it also needs to be calculated for an already parsed String.
        Parameters:
        len - Length of String; has to be at least 1 (caller guarantees this pre-condition)
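
        As a rough illustration of hashing a character range so that the result agrees with hashing the equivalent String, consider the sketch below. It is a hypothetical example, not the method's actual algorithm: the multiplier 33 is an assumption, and the explicit seed parameter stands in for per-table state such as the value reported by hashSeed().

```java
/** Hypothetical sketch of a seeded multiplicative hash over a char range. */
final class CalcHashSketch {
    static final int HASH_MULT = 33; // assumed multiplier, chosen for illustration

    /** Hashes buffer[start .. start + len - 1]; the caller guarantees len >= 1. */
    static int calcHash(char[] buffer, int start, int len, int seed) {
        int hash = seed;
        for (int i = start, end = start + len; i < end; i++) {
            hash = hash * HASH_MULT + buffer[i];
        }
        return hash;
    }

    /** Same calculation for an already constructed String. */
    static int calcHash(String key, int seed) {
        int hash = seed;
        for (int i = 0; i < key.length(); i++) {
            hash = hash * HASH_MULT + key.charAt(i);
        }
        return hash;
    }
}
```

        Because both overloads run the same recurrence, a parser can hash characters as it scans them and still look up the same bucket that a String-based call would reach.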
      • calcHash

        public int calcHash(String key)
      • reportTooManyCollisions

        protected void reportTooManyCollisions(int maxLen)
        Since:
        2.1
      • verifyInternalConsistency

        protected void verifyInternalConsistency()
        Diagnostics method that will verify that internal data structures are consistent; not meant as user-facing method but only for test suites and possible troubleshooting.
        Since:
        2.10

Copyright © 2008–2020 FasterXML. All rights reserved.