Class BPlusTree<L,​T extends L>

  • All Implemented Interfaces:
    IgniteTree<L,​T>
    Direct Known Subclasses:
    CacheDataTree, InlineIndexTree, MetastorageTree, PendingEntriesTree

    public abstract class BPlusTree<L,​T extends L>
    extends DataStructure
    implements IgniteTree<L,​T>

    Abstract B+Tree

    B+Tree is a block-based tree structure. Each block is represented with the page (PageIO) and contains a single tree node. There are two types of pages/nodes: BPlusInnerIO and BPlusLeafIO.

    Every page in the tree contains a list of items. Item is just a fixed-size binary payload. Inner nodes and leaves may have different item sizes. There's a limit on how many items each page can hold. It is defined by a BPlusIO.getMaxCount(long, int) method of the corresponding IO. There should be no empty pages in trees, so:

    • a leaf page must have from 1 to max items
    • an inner page must have from 0 to max items (an inner page with 0 items is a routing page, it still has 1 pointer to 1 child, it's not considered an empty page; see below)

    Items might have different meaning depending on the type of the page. In case of leaves, every item must describe a key and a value. In case of inner nodes, items describe only keys if canGetRowFromInner is false, or a key and a value otherwise. Items in every page are sorted according to the order dscribed by compare(BPlusIO, long, int, Object) method. Specifics of the data stored in items are defined in the implementation and generally don't matter.

    All pages in the tree are divided into levels. Leaves are always at the level 0. Levels of inner pages are thus positive. Each level represents a singly linked list - each page has a link to the forward page at the same level. It can be retrieved by calling BPlusIO.getForward(long). This link must be a zero if there's no forward page. Forward links on level 0 allow iterating tree's keys and values effectively without traversing any inner nodes (AbstractForwardCursor). Forward links in inner nodes have different purpose, more on that later.

    Leaves have no links other than forward links. But inner nodes also have links to their children nodes. Every inner node can be viewed like the following structure:

    
           item(0)     item(1)        ...          item(N-1)
     link(0)     link(1)     link(2)  ...  link(N-1)       link(N)
     
    There are N items and N+1 links. Each link points to page of a lower level. For example, pages on level 2 always point to pages of level 1. For an item i left subtree is defined by link(i) and right subtree is defined by link(i+1) (BPlusInnerIO.getLeft(long, int) and BPlusInnerIO.getRight(long, int)). All items in the left subtree are less or equal to the original item (basic property for the trees).

    There's one more important property of these links: forward(left(i)) == right(i). It is called the triangle invariant. More information on B+Tree structure can easily be found online. Following documentation concentrates more on specifics of this particular B+Tree implementation.

    General operations

    This implementation allows for concurrent reads and update. Given that each page locks individually, there are general rules to avoid deadlocks.
    • Pages within a level always locked from left to right.
    • If there's already a lock on the page of level X then no locks should be acquired on levels less than X. In other words, locks are aquired from the bottom to the top (in the direction from leaves to root). The only exception to this rule is the allocation of a new page on a lower level that no one sees yet.
    All basic operations fit into a similar pattern. First, the search is performed (BPlusTree.Get). It goes recursively from the root to the leaf (if it's needed). On each level several outcomes are possible.
    • Exact value is found on the leaf level and operation can be completed.
    • Insertion point is found and recursive procedure continues on the lower level.
    • Insertion point is not found due to concurrent modifications, but retry in the same node is possible.
    • Insertion point is not found due to concurrent modifications, but retry in the same node is impossible.
    All these options, and more, are described in the class BPlusTree.Result. Please refer to its usages for specifics of each operation. Once the path and the leaf for put/remove is found, the operation is then performed from the bottom to the top. Specifics are described in corresponding classes (BPlusTree.Put, BPlusTree.Remove).

    Maintained invariants

    1. Triangle invariant (see above), used to detect concurrent tree structure changes
    2. Each key existing in an inner page also exists in exactly one leaf, as its rightmost key
    3. For each leaf that is not the rightmost leaf in the tree (i.e. its forwardId is not 0), its rightmost key exists in exactly one of its ancestor blocks.

      The invariant is maintained using special cases in insert with split, replace and remove scenarios.

    Invariants that are NOT maintained

    1. Classic B-Tree (and B+Tree as well) makes sure that each non-root node is at least half-full. This implementation does NOT maintain this invariant.

    Merge properties

    When a key is removed from a leaf node, the node might become empty and hence a mandatory merge happens. If the parent is a routing page (see below), another mandatory merge will happen. (Mandatory merges are the ones that must happen to maintain the tree invariants). This procedure may propagate a few levels up if there is a chain of routing pages as ancestors.

    After all mandatory merges happen, we try to go up and make another merge (by merging the reached ancestor and its sibling, if they fit in one block). Such a merge is called a regular merge in the code. It is not mandatory to maintain invariants, but it improves tree structure from the point of view of performance. If first regular merge is successful, the attempt will be repeated one level higher, and so on.

    Routing pages

    An inner (i.e. non-leaf) page is called a routing page if it contains zero items (hence, zero keys), but it still contains one pointer to a child one level below. (This is valid because an inner page contains one pointer more than item count.)

    An inner page becomes a routing page when removing last item from it (as a consequence to one of its children becoming empty due to a removal somewhere below), AND due to inability to merge the page with its sibling because the sibling is full.

    A confusion might arise between routing pages and empty pages. A routing page does not contain any items, but it does contain a pointer to its single child, so it is not treated as an empty page (and we keep such pages in the tree).

    • Field Detail

      • suspendFailureDiagnostic

        public static final ThreadLocal<Boolean> suspendFailureDiagnostic
      • IGNITE_BPLUS_TREE_LOCK_RETRIES_DEFAULT

        public static final int IGNITE_BPLUS_TREE_LOCK_RETRIES_DEFAULT
        See Also:
        Constant Field Values
      • metaPageId

        protected final long metaPageId
    • Constructor Detail

      • BPlusTree

        protected BPlusTree​(String name,
                            int cacheGrpId,
                            String grpName,
                            PageMemory pageMem,
                            @Nullable
                            @Nullable IgniteWriteAheadLogManager wal,
                            AtomicLong globalRmvId,
                            long metaPageId,
                            ReuseList reuseList,
                            byte pageFlag,
                            @Nullable
                            @Nullable FailureProcessor failureProcessor,
                            PageLockTrackerManager pageLockTrackerManager,
                            PageIoResolver pageIoRslvr,
                            @Nullable
                            @Nullable PageHandlerWrapper<BPlusTree.Result> hndWrapper)
        Parameters:
        name - Tree name.
        cacheGrpId - Cache ID.
        grpName - Cache group name.
        pageMem - Page memory.
        wal - Write ahead log manager.
        globalRmvId - Remove ID.
        metaPageId - Meta page ID.
        reuseList - Reuse list.
        pageFlag - Default flag value for allocated pages.
        failureProcessor - if the tree is corrupted.
        pageLockTrackerManager - Page lock tracker manager.
    • Method Detail

      • enableSequentialWriteMode

        public void enableSequentialWriteMode()
        Flag for enabling single-threaded append-only tree creation.
      • initTree

        protected final void initTree​(boolean initNew,
                                      int inlineSize)
                               throws IgniteCheckedException
        Initialize new tree.
        Parameters:
        initNew - True if new tree should be created.
        inlineSize - Inline size.
        Throws:
        IgniteCheckedException - If failed.
      • find

        public final GridCursor<T> find​(L lower,
                                        L upper,
                                        Object x)
                                 throws IgniteCheckedException
        Returns a cursor from lower to upper bounds inclusive.
        Specified by:
        find in interface IgniteTree<L,​T extends L>
        Parameters:
        lower - Lower bound or null if unbounded.
        upper - Upper bound or null if unbounded.
        x - Implementation specific argument, null always means that we need to return full detached data row.
        Returns:
        Cursor.
        Throws:
        IgniteCheckedException - If failed.
      • find

        public GridCursor<T> find​(L lower,
                                  L upper,
                                  boolean lowIncl,
                                  boolean upIncl,
                                  BPlusTree.TreeRowClosure<L,​T> c,
                                  BPlusTree.TreeRowFactory<L,​T> rowFactory,
                                  Object x)
                           throws IgniteCheckedException
        Parameters:
        lower - Lower bound or null if unbounded.
        upper - Upper bound or null if unbounded.
        lowIncl - true if lower bound is inclusive.
        upIncl - true if upper bound is inclusive.
        c - Filter closure.
        rowFactory - Row factory or (@code null} for default factory.
        x - Implementation specific argument, null always means that we need to return full detached data row.
        Returns:
        Cursor.
        Throws:
        IgniteCheckedException - If failed.
      • findOne

        public final <R> R findOne​(L row,
                                   Object x)
                            throws IgniteCheckedException
        Parameters:
        row - Lookup row for exact match.
        x - Implementation specific argument, null always means that we need to return full detached data row.
        Returns:
        Found result or null
        Throws:
        IgniteCheckedException - If failed.
      • findOne

        public final T findOne​(L row)
                        throws IgniteCheckedException
        Description copied from interface: IgniteTree
        Returns the value to which the specified key is mapped, or null if this tree contains no mapping for the key.
        Specified by:
        findOne in interface IgniteTree<L,​T extends L>
        Parameters:
        row - Lookup row for exact match.
        Returns:
        Found row.
        Throws:
        IgniteCheckedException - If failed.
      • treeName

        public static String treeName​(String instance,
                                      String type)
        Parameters:
        instance - Instance name.
        type - Tree type.
        Returns:
        Tree name.
      • interruptAll

        public static void interruptAll()
        Interrupt all operations on all threads and all indexes.
      • remove

        public List<L> remove​(L lower,
                              L upper,
                              int limit)
                       throws IgniteCheckedException
        Parameters:
        lower - Lower bound (inclusive).
        upper - Upper bound (inclusive).
        limit - Limit of processed entries by single call, 0 for no limit.
        Returns:
        Removed rows.
        Throws:
        IgniteCheckedException - If failed.
      • removex

        protected boolean removex​(L lower,
                                  L upper,
                                  Object x,
                                  int limit)
                           throws IgniteCheckedException
        Parameters:
        lower - Lower bound (inclusive).
        upper - Upper bound (inclusive).
        x - Implementation specific argument.
        limit - Limit of processed entries by single call, 0 or negative value for no limit.
        Returns:
        True if removed at least one row.
        Throws:
        IgniteCheckedException - If failed.
      • size

        public final long size()
                        throws IgniteCheckedException
        Returns number of elements in the tree by scanning pages of the bottom (leaf) level. Since a concurrent access is permitted, there is no guarantee about momentary consistency: the method may miss updates made in already scanned pages.
        Specified by:
        size in interface IgniteTree<L,​T extends L>
        Returns:
        Number of elements in the tree.
        Throws:
        IgniteCheckedException - If failed.
      • size

        public long size​(@Nullable
                         @Nullable BPlusTree.TreeRowClosure<L,​T> filter)
                  throws IgniteCheckedException
        Returns number of elements in the tree that match the filter by scanning through the pages of the leaf level. Since a concurrent access to the tree is permitted, there is no guarantee about momentary consistency: the method may not see updates made in already scanned pages.
        Parameters:
        filter - The filter to use or null to count all elements.
        Returns:
        Number of either all elements in the tree or the elements that match the filter.
        Throws:
        IgniteCheckedException - If failed.
      • put

        public final T put​(T row)
                    throws IgniteCheckedException
        Put value in this tree.
        Specified by:
        put in interface IgniteTree<L,​T extends L>
        Parameters:
        row - Value to be associated with the specified key.
        Returns:
        The previous value associated with key.
        Throws:
        IgniteCheckedException - If failed.
      • temporaryReleaseLock

        protected void temporaryReleaseLock()
        Releases the lock that is held by long tree destroy process for a short period of time and acquires it again, allowing other processes to acquire it.
      • maxLockHoldTime

        protected long maxLockHoldTime()
        Maximum time for which tree destroy process is allowed to hold the lock, after this time exceeds, temporaryReleaseLock() is called and hold time is reset.
        Returns:
        Time, in milliseconds.
      • destroy

        public final long destroy()
                           throws IgniteCheckedException
        Destroys tree. This method is allowed to be invoked only when the tree is out of use (no concurrent operations are trying to read or update the tree after destroy beginning).
        Returns:
        Number of pages recycled from this tree. If the tree was destroyed by someone else concurrently returns 0, otherwise it should return at least 2 (for meta page and root page), unless this tree is used as metadata storage, or -1 if we don't have a reuse list and did not do recycling at all.
        Throws:
        IgniteCheckedException - If failed.
      • destroy

        public final long destroy​(@Nullable
                                  @Nullable IgniteInClosure<L> c,
                                  boolean forceDestroy)
                           throws IgniteCheckedException
        Destroys tree. This method is allowed to be invoked only when the tree is out of use (no concurrent operations are trying to read or update the tree after destroy beginning).
        Parameters:
        c - Visitor closure. Visits only leaf pages.
        forceDestroy - Whether to proceed with destroying, even if tree is already marked as destroyed (see markDestroyed()).
        Returns:
        Number of pages recycled from this tree. If the tree was destroyed by someone else concurrently returns 0, otherwise it should return at least 2 (for meta page and root page), unless this tree is used as metadata storage, or -1 if we don't have a reuse list and did not do recycling at all.
        Throws:
        IgniteCheckedException - If failed.
      • destroyDownPages

        protected long destroyDownPages​(LongListReuseBag bag,
                                        long pageId,
                                        int lvl,
                                        @Nullable
                                        @Nullable IgniteInClosure<L> c,
                                        AtomicLong lockHoldStartTime,
                                        long lockMaxTime,
                                        Deque<GridTuple3<Long,​Long,​Long>> lockedPages)
                                 throws IgniteCheckedException
        Recursively destroys tree pages. Should be initially called with id of root page as pageId and root level as lvl.
        Parameters:
        bag - Reuse bag.
        pageId - Page id.
        lvl - Current level of tree.
        c - Visitor closure. Visits only leaf pages.
        lockHoldStartTime - When lock has been aquired last time.
        lockMaxTime - Maximum time to hold the lock.
        lockedPages - Deque of locked pages. Is used to release write-locked pages when temporary releasing checkpoint read lock.
        Returns:
        Count of destroyed pages.
        Throws:
        IgniteCheckedException - If failed.
      • markDestroyed

        public boolean markDestroyed()
        Returns:
        True if state was changed.
      • destroyed

        public boolean destroyed()
        Returns:
        True if marked as destroyed.
      • getFirstPageIds

        protected Iterable<Long> getFirstPageIds​(long pageAddr)
        Parameters:
        pageAddr - Meta page address.
        Returns:
        First page IDs.
      • latestInnerIO

        public final BPlusInnerIO<L> latestInnerIO()
        Returns:
        Latest version of inner page IO.
      • latestLeafIO

        public final BPlusLeafIO<L> latestLeafIO()
        Returns:
        Latest version of leaf page IO.
      • getRow

        public final T getRow​(BPlusIO<L> io,
                              long pageAddr,
                              int idx)
                       throws IgniteCheckedException
        Get a full detached data row.
        Parameters:
        io - IO.
        pageAddr - Page address.
        idx - Index.
        Returns:
        Full detached data row.
        Throws:
        IgniteCheckedException - If failed.
      • getRow

        public abstract T getRow​(BPlusIO<L> io,
                                 long pageAddr,
                                 int idx,
                                 Object x)
                          throws IgniteCheckedException
        Get data row. Can be called on inner page only if canGetRowFromInner is true.
        Parameters:
        io - IO.
        pageAddr - Page address.
        idx - Index.
        x - Implementation specific argument, null always means that we need to return full detached data row.
        Returns:
        Data row.
        Throws:
        IgniteCheckedException - If failed.
      • getLockRetries

        protected int getLockRetries()
        Returns:
        Return number of retries.
      • read

        protected final <X,​R> R read​(long pageId,
                                           PageHandler<X,​R> h,
                                           X arg,
                                           int intArg,
                                           R lockFailed)
                                    throws IgniteCheckedException
        Parameters:
        pageId - Page ID.
        h - Handler.
        arg - Argument.
        intArg - Argument of type int.
        lockFailed - Result in case of lock failure due to page recycling.
        Returns:
        Handler result.
        Throws:
        IgniteCheckedException - If failed.
      • read

        protected final <X,​R> R read​(long pageId,
                                           long page,
                                           PageHandler<X,​R> h,
                                           X arg,
                                           int intArg,
                                           R lockFailed)
                                    throws IgniteCheckedException
        Parameters:
        pageId - Page ID.
        page - Page pointer.
        h - Handler.
        arg - Argument.
        intArg - Argument of type int.
        lockFailed - Result in case of lock failure due to page recycling.
        Returns:
        Handler result.
        Throws:
        IgniteCheckedException - If failed.
      • statisticsHolder

        protected IoStatisticsHolder statisticsHolder()
        Returns:
        Statistics holder to track IO operations.
      • corruptedTreeException

        protected CorruptedTreeException corruptedTreeException​(String msg,
                                                                Throwable cause,
                                                                int grpId,
                                                                long... pageIds)
        Construct the exception and invoke failure processor.
        Parameters:
        msg - Message.
        cause - Cause.
        grpId - Group id.
        pageIds - Pages ids.
        Returns:
        New CorruptedTreeException instance.
      • processFailure

        protected void processFailure​(FailureType failureType,
                                      Throwable e)
        Processes failure with failure processor.
        Parameters:
        failureType - Failure type.
        e - Exception.
      • getMetaPageId

        public long getMetaPageId()
        Returns meta page id.
        Returns:
        Meta page id.
      • lockRetryErrorMessage

        protected String lockRetryErrorMessage​(String op)
        Create an error message when reaching the maximum number of repetitions to capture a lock in the B+Tree.
        Parameters:
        op - Operation name, for example: GET, PUT.
        Returns:
        Error message.