Class HLLUtil


  • public final class HLLUtil
    extends Object
    Static functions for computing constants and parameters used in the HLL algorithm.
    • Constructor Summary

      Constructors 
      Constructor Description
      HLLUtil()  
    • Method Summary

      All Methods Static Methods Concrete Methods 
      Modifier and Type Method Description
      static double alphaMSquared​(int m)
      Computes the 'alpha-m-squared' constant used by the HyperLogLog algorithm.
      static double largeEstimator​(int log2m, int registerSizeInBits, double estimator)
      The "large range correction" formula from the HyperLogLog algorithm, adapted for 64 bit hashes.
      static double largeEstimatorCutoff​(int log2m, int registerSizeInBits)
      The cutoff for using the "large range correction" formula, from the HyperLogLog algorithm, adapted for 64 bit hashes.
      static long pwMaxMask​(int registerSizeInBits)
      Computes a mask that prevents overflow of HyperLogLog registers.
      static int registerBitSize​(long expectedUniqueElements)
      Computes the bit-width of HLL registers necessary to estimate a set of the specified cardinality.
      static double smallEstimator​(int m, int numberOfZeroes)
      The "small range correction" formula from the HyperLogLog algorithm.
      static double smallEstimatorCutoff​(int m)
      The cutoff for using the "small range correction" formula, in the HyperLogLog algorithm.
    • Constructor Detail

      • HLLUtil

        public HLLUtil()
    • Method Detail

      • registerBitSize

        public static int registerBitSize​(long expectedUniqueElements)
        Computes the bit-width of HLL registers necessary to estimate a set of the specified cardinality.
        Parameters:
        expectedUniqueElements - an upper bound on the number of unique elements that are expected. This must be greater than zero.
        Returns:
        a register size in bits (i.e. log2(log2(n)))
      • alphaMSquared

        public static double alphaMSquared​(int m)
        Computes the 'alpha-m-squared' constant used by the HyperLogLog algorithm.
        Parameters:
        m - this must be a power of two, cannot be less than 16 (24), and cannot be greater than 65536 (216).
        Returns:
        gamma times registerCount squared where gamma is based on the value of registerCount.
        Throws:
        IllegalArgumentException - if registerCount is less than 16.
      • pwMaxMask

        public static long pwMaxMask​(int registerSizeInBits)
        Computes a mask that prevents overflow of HyperLogLog registers.
        Parameters:
        registerSizeInBits - the size of the HLL registers, in bits.
        Returns:
        mask a long mask to prevent overflow of the registers
        See Also:
        registerBitSize(long)
      • smallEstimatorCutoff

        public static double smallEstimatorCutoff​(int m)
        The cutoff for using the "small range correction" formula, in the HyperLogLog algorithm.
        Parameters:
        m - the number of registers in the HLL. m in the paper.
        Returns:
        the cutoff for the small range correction.
        See Also:
        smallEstimator(int, int)
      • smallEstimator

        public static double smallEstimator​(int m,
                                            int numberOfZeroes)
        The "small range correction" formula from the HyperLogLog algorithm. Only appropriate if both the estimator is smaller than
        (5/2) * m
        and there are still registers that have the zero value.
        Parameters:
        m - the number of registers in the HLL. m in the paper.
        numberOfZeroes - the number of registers with value zero. V in the paper.
        Returns:
        a corrected cardinality estimate.
      • largeEstimatorCutoff

        public static double largeEstimatorCutoff​(int log2m,
                                                  int registerSizeInBits)
        The cutoff for using the "large range correction" formula, from the HyperLogLog algorithm, adapted for 64 bit hashes.
        Parameters:
        log2m - log-base-2 of the number of registers in the HLL. b in the paper.
        registerSizeInBits - the size of the HLL registers, in bits.
        Returns:
        the cutoff for the large range correction.
        See Also:
        largeEstimator(int, int, double), Blog post with section on 64 bit hashes and "large range correction" cutoff
      • largeEstimator

        public static double largeEstimator​(int log2m,
                                            int registerSizeInBits,
                                            double estimator)
        The "large range correction" formula from the HyperLogLog algorithm, adapted for 64 bit hashes. Only appropriate for estimators whose value exceeds the return of largeEstimatorCutoff(int, int).
        Parameters:
        log2m - log-base-2 of the number of registers in the HLL. b in the paper.
        registerSizeInBits - the size of the HLL registers, in bits.
        estimator - the original estimator ("E" in the paper).
        Returns:
        a corrected cardinality estimate.
        See Also:
        Blog post with section on 64 bit hashes and "large range correction"