Class HLLUtil
- java.lang.Object
-
- org.apache.ignite.internal.processors.query.stat.hll.util.HLLUtil
-
public final class HLLUtil extends Object
Static functions for computing constants and parameters used in the HLL algorithm.
-
-
Constructor Summary
Constructors Constructor Description HLLUtil()
-
Method Summary
All Methods Static Methods Concrete Methods Modifier and Type Method Description static doublealphaMSquared(int m)Computes the 'alpha-m-squared' constant used by the HyperLogLog algorithm.static doublelargeEstimator(int log2m, int registerSizeInBits, double estimator)The "large range correction" formula from the HyperLogLog algorithm, adapted for 64 bit hashes.static doublelargeEstimatorCutoff(int log2m, int registerSizeInBits)The cutoff for using the "large range correction" formula, from the HyperLogLog algorithm, adapted for 64 bit hashes.static longpwMaxMask(int registerSizeInBits)Computes a mask that prevents overflow of HyperLogLog registers.static intregisterBitSize(long expectedUniqueElements)Computes the bit-width of HLL registers necessary to estimate a set of the specified cardinality.static doublesmallEstimator(int m, int numberOfZeroes)The "small range correction" formula from the HyperLogLog algorithm.static doublesmallEstimatorCutoff(int m)The cutoff for using the "small range correction" formula, in the HyperLogLog algorithm.
-
-
-
Method Detail
-
registerBitSize
public static int registerBitSize(long expectedUniqueElements)
Computes the bit-width of HLL registers necessary to estimate a set of the specified cardinality.- Parameters:
expectedUniqueElements- an upper bound on the number of unique elements that are expected. This must be greater than zero.- Returns:
- a register size in bits (i.e.
log2(log2(n)))
-
alphaMSquared
public static double alphaMSquared(int m)
Computes the 'alpha-m-squared' constant used by the HyperLogLog algorithm.- Parameters:
m- this must be a power of two, cannot be less than 16 (24), and cannot be greater than 65536 (216).- Returns:
- gamma times
registerCountsquared where gamma is based on the value ofregisterCount. - Throws:
IllegalArgumentException- ifregisterCountis less than 16.
-
pwMaxMask
public static long pwMaxMask(int registerSizeInBits)
Computes a mask that prevents overflow of HyperLogLog registers.- Parameters:
registerSizeInBits- the size of the HLL registers, in bits.- Returns:
- mask a
longmask to prevent overflow of the registers - See Also:
registerBitSize(long)
-
smallEstimatorCutoff
public static double smallEstimatorCutoff(int m)
The cutoff for using the "small range correction" formula, in the HyperLogLog algorithm.- Parameters:
m- the number of registers in the HLL. m in the paper.- Returns:
- the cutoff for the small range correction.
- See Also:
smallEstimator(int, int)
-
smallEstimator
public static double smallEstimator(int m, int numberOfZeroes)The "small range correction" formula from the HyperLogLog algorithm. Only appropriate if both the estimator is smaller than(5/2) * m
and there are still registers that have the zero value.- Parameters:
m- the number of registers in the HLL. m in the paper.numberOfZeroes- the number of registers with value zero. V in the paper.- Returns:
- a corrected cardinality estimate.
-
largeEstimatorCutoff
public static double largeEstimatorCutoff(int log2m, int registerSizeInBits)The cutoff for using the "large range correction" formula, from the HyperLogLog algorithm, adapted for 64 bit hashes.- Parameters:
log2m- log-base-2 of the number of registers in the HLL. b in the paper.registerSizeInBits- the size of the HLL registers, in bits.- Returns:
- the cutoff for the large range correction.
- See Also:
largeEstimator(int, int, double), Blog post with section on 64 bit hashes and "large range correction" cutoff
-
largeEstimator
public static double largeEstimator(int log2m, int registerSizeInBits, double estimator)The "large range correction" formula from the HyperLogLog algorithm, adapted for 64 bit hashes. Only appropriate for estimators whose value exceeds the return oflargeEstimatorCutoff(int, int).- Parameters:
log2m- log-base-2 of the number of registers in the HLL. b in the paper.registerSizeInBits- the size of the HLL registers, in bits.estimator- the original estimator ("E" in the paper).- Returns:
- a corrected cardinality estimate.
- See Also:
- Blog post with section on 64 bit hashes and "large range correction"
-
-