Class ThriftStreamingProvider<T>

  • Type Parameters:
    T - The type of data in each batch (ColumnarRowView or ArrowResultChunk)
    All Implemented Interfaces:
    AutoCloseable

    public class ThriftStreamingProvider<T>
    extends Object
    implements AutoCloseable
    Type-safe streaming provider for Thrift-based results.

    This generic provider works with any data type through pluggable processors. It provides proactive prefetching with a sliding window to maximize throughput while bounding memory usage.

    Key features:

    • Type-safe: getData() returns the correct type without casting
    • Prefetching: Background thread fetches ahead of consumer
    • Memory-bounded: Sliding window limits batches in memory
    • Pluggable: Processors handle type-specific data conversion
    • Method Detail

      • forColumnar

        public static ThriftStreamingProvider<ColumnarRowView> forColumnar​(ThriftBatchFetcher fetcher,
                                                                           TFetchResultsResp initialResponse,
                                                                           int maxBatchesInMemory,
                                                                           int timeoutSeconds)
                                                                    throws DatabricksSQLException
        Creates a streaming provider for Thrift Columnar results.
        Parameters:
        fetcher - The batch fetcher for retrieving data from server
        initialResponse - The initial Thrift fetch response
        maxBatchesInMemory - Maximum batches to keep in memory (sliding window)
        timeoutSeconds - Timeout waiting for batch to be ready
        Returns:
        A type-safe provider that produces ColumnarRowView data
        Throws:
        DatabricksSQLException - if initialization fails
      • forInlineArrow

        public static ThriftStreamingProvider<ArrowResultChunk> forInlineArrow​(ThriftBatchFetcher fetcher,
                                                                               TFetchResultsResp initialResponse,
                                                                               StatementId statementId,
                                                                               int maxBatchesInMemory,
                                                                               int timeoutSeconds)
                                                                        throws DatabricksSQLException
        Creates a streaming provider for Inline Arrow results.
        Parameters:
        fetcher - The batch fetcher for retrieving data from server
        initialResponse - The initial Thrift fetch response
        statementId - The statement ID for chunk creation
        maxBatchesInMemory - Maximum batches to keep in memory (sliding window)
        timeoutSeconds - Timeout waiting for batch to be ready
        Returns:
        A type-safe provider that produces ArrowResultChunk data
        Throws:
        DatabricksSQLException - if initialization fails
      • hasNextBatch

        public boolean hasNextBatch()
        Checks if there are more batches available.
        Returns:
        true if more batches may be available
      • nextBatch

        public boolean nextBatch()
                          throws DatabricksSQLException
        Moves to the next batch. Releases the previous batch.

        This method automatically skips empty batches (rowCount == 0), continuing to advance until a non-empty batch is found or no more batches are available. This ensures consumers never see empty batches and matches the behavior of lazy result implementations.

        Returns:
        true if moved to a non-empty batch, false if no more batches
        Throws:
        DatabricksSQLException - if an error occurred during prefetch
      • getCurrentBatch

        public StreamingBatch<T> getCurrentBatch()
                                          throws DatabricksSQLException
        Gets the current batch with type-safe data access.

        No casting required - returns StreamingBatch<T> with correctly typed data!

        Returns:
        The current batch, or null if before first batch
        Throws:
        DatabricksSQLException - if the batch cannot be retrieved
      • getTotalRowsFetched

        public long getTotalRowsFetched()
        Gets total rows fetched so far.
        Returns:
        The total row count
      • getBatchesInMemory

        public int getBatchesInMemory()
        Gets number of batches currently in memory.
        Returns:
        The batch count
      • isEndOfStreamReached

        public boolean isEndOfStreamReached()
        Checks if the end of stream has been reached.
        Returns:
        true if all batches have been fetched from the server