Class HLL
- java.lang.Object
-
- org.apache.ignite.internal.processors.query.stat.hll.HLL
-
- All Implemented Interfaces:
Cloneable
public class HLL extends Object implements Cloneable
A probabilistic set of hashedlongelements. Useful for computing the approximate cardinality of a stream of data in very small storage. A modified version of the 'HyperLogLog' data structure and algorithm is used, which combines both probabilistic and non-probabilistic techniques to improve the accuracy and storage requirements of the original algorithm. More specifically, initializing and storing a newHLLwill allocate a sentinel value symbolizing the empty set (HLLType.EMPTY). After adding the first few values, a sorted list of unique integers is stored in aHLLType.EXPLICIThash set. When configured, accuracy can be sacrificed for memory footprint: the values in the sorted list are "promoted" to a "HLLType.SPARSE" map-based HyperLogLog structure. Finally, when enough registers are set, the map-based HLL will be converted to a bit-packed "HLLType.FULL" HyperLogLog structure. This data structure is interoperable with the implementations found at:- postgresql-hll, and
- js-hll
-
-
Field Summary
Fields Modifier and Type Field Description static intMAXIMUM_EXPLICIT_THRESHOLDstatic intMAXIMUM_EXPTHRESH_PARAMstatic intMAXIMUM_LOG2M_PARAMMaximum value for the log-base-2 of the number of registers in the HLL.static intMAXIMUM_REGWIDTH_PARAMMaximum values for the register width of the HLL.static intMINIMUM_EXPTHRESH_PARAMstatic intMINIMUM_LOG2M_PARAMMinimum value for the log-base-2 of the number of registers in the HLL.static intMINIMUM_REGWIDTH_PARAMMinimum value for the register width of the HLL.
-
Constructor Summary
Constructors Constructor Description HLL(int log2m, int regwidth)Construct an empty HLL with the givenlog2mandregwidth.HLL(int log2m, int regwidth, int expthresh, boolean sparseon, HLLType type)NOTE: Arguments here are named and structured identically to those in the PostgreSQL implementation, which can be found here.
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method Description voidaddRaw(long rawValue)AddsrawValuedirectly to the HLL.longcardinality()Computes the cardinality of the HLL.voidclear()Clears the HLL.HLLclone()Create a deep copy of this HLL.static HLLfromBytes(byte[] bytes)Deserializes the HLL (intoBytes(ISchemaVersion)format) serialized intobytes.HLLTypegetType()byte[]toBytes()Serializes the HLL to an array of bytes in correspondence with the format of the default schema version,SerializationUtil.DEFAULT_SCHEMA_VERSION.byte[]toBytes(ISchemaVersion schemaVersion)Serializes the HLL to an array of bytes in correspondence with the format of the specified schema version.voidunion(HLL other)Computes the union of HLLs and stores the result in this instance.
-
-
-
Field Detail
-
MINIMUM_LOG2M_PARAM
public static final int MINIMUM_LOG2M_PARAM
Minimum value for the log-base-2 of the number of registers in the HLL.- See Also:
- Constant Field Values
-
MAXIMUM_LOG2M_PARAM
public static final int MAXIMUM_LOG2M_PARAM
Maximum value for the log-base-2 of the number of registers in the HLL.- See Also:
- Constant Field Values
-
MINIMUM_REGWIDTH_PARAM
public static final int MINIMUM_REGWIDTH_PARAM
Minimum value for the register width of the HLL.- See Also:
- Constant Field Values
-
MAXIMUM_REGWIDTH_PARAM
public static final int MAXIMUM_REGWIDTH_PARAM
Maximum values for the register width of the HLL.- See Also:
- Constant Field Values
-
MINIMUM_EXPTHRESH_PARAM
public static final int MINIMUM_EXPTHRESH_PARAM
- See Also:
- Constant Field Values
-
MAXIMUM_EXPTHRESH_PARAM
public static final int MAXIMUM_EXPTHRESH_PARAM
- See Also:
- Constant Field Values
-
MAXIMUM_EXPLICIT_THRESHOLD
public static final int MAXIMUM_EXPLICIT_THRESHOLD
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
HLL
public HLL(int log2m, int regwidth, int expthresh, boolean sparseon, HLLType type)NOTE: Arguments here are named and structured identically to those in the PostgreSQL implementation, which can be found here.- Parameters:
log2m- log-base-2 of the number of registers used in the HyperLogLog algorithm. Must be at least 4 and at most 30.regwidth- number of bits used per register in the HyperLogLog algorithm. Must be at least 1 and at most 8.expthresh- tunes when theHLLType.EXPLICITtoHLLType.SPARSEpromotion occurs, based on the set's cardinality. Must be at least -1 and at most 18.expthreshvalueMeaning -1 Promote at whatever cutoff makes sense for optimal memory usage. ('auto' mode) 0 Skip EXPLICITrepresentation in hierarchy.1-18 Promote at 2expthresh - 1 cardinality sparseon- Flag indicating if theHLLType.SPARSErepresentation should be used.type- the type in the promotion hierarchy which this instance should start at. This cannot benull.
-
HLL
public HLL(int log2m, int regwidth)Construct an empty HLL with the givenlog2mandregwidth. This is equivalent to callingHLL(log2m, regwidth, -1, true, HLLType.EMPTY).- Parameters:
log2m- log-base-2 of the number of registers used in the HyperLogLog algorithm. Must be at least 4 and at most 30.regwidth- number of bits used per register in the HyperLogLog algorithm. Must be at least 1 and at most 8.- See Also:
HLL(int, int, int, boolean, HLLType)
-
-
Method Detail
-
getType
public HLLType getType()
- Returns:
- the type in the promotion hierarchy of this instance. This will
never be
null.
-
addRaw
public void addRaw(long rawValue)
AddsrawValuedirectly to the HLL.- Parameters:
rawValue- the value to be added. It is very important that this value already be hashed with a strong (but not necessarily cryptographic) hash function. For instance, the Murmur3 implementation in Google's Guava library is an excellent hash function for this purpose and, for seeds greater than zero, matches the output of the hash provided in the PostgreSQL implementation.
-
cardinality
public long cardinality()
Computes the cardinality of the HLL.- Returns:
- the cardinality of HLL. This will never be negative.
-
clear
public void clear()
Clears the HLL. The HLL will have cardinality zero and will act as if no elements have been added. NOTE: UnlikeaddRaw(long),cleardoes NOT handle transitions betweenHLLTypes - a probabilistic type will remain probabilistic after being cleared.
-
union
public void union(HLL other)
Computes the union of HLLs and stores the result in this instance.- Parameters:
other- the otherHLLinstance to union into this one. This cannot benull.
-
toBytes
public byte[] toBytes()
Serializes the HLL to an array of bytes in correspondence with the format of the default schema version,SerializationUtil.DEFAULT_SCHEMA_VERSION.- Returns:
- the array of bytes representing the HLL. This will never be
nullor empty.
-
toBytes
public byte[] toBytes(ISchemaVersion schemaVersion)
Serializes the HLL to an array of bytes in correspondence with the format of the specified schema version.- Parameters:
schemaVersion- the schema version dictating the serialization format- Returns:
- the array of bytes representing the HLL. This will never be
nullor empty.
-
fromBytes
public static HLL fromBytes(byte[] bytes)
Deserializes the HLL (intoBytes(ISchemaVersion)format) serialized intobytes.- Parameters:
bytes- the serialized bytes of new HLL- Returns:
- the deserialized HLL. This will never be
null. - See Also:
toBytes(ISchemaVersion)
-
clone
public HLL clone() throws CloneNotSupportedException
Create a deep copy of this HLL.- Overrides:
clonein classObject- Throws:
CloneNotSupportedException- See Also:
Object.clone()
-
-