Class StreamingChunkProvider

  • All Implemented Interfaces:
    ChunkProvider

    public class StreamingChunkProvider
    extends Object
    implements ChunkProvider
    A streaming chunk provider that fetches chunk links proactively and downloads chunks in parallel.

    Key features:

    • No dependency on the total chunk count: streams until the end of the data
    • Proactive link prefetching with configurable window
    • Memory-bounded parallel downloads
    • Automatic link refresh on expiration
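Automatic link refresh usually amounts to checking a link's expiry time before a download starts. The sketch below assumes each link carries an expiration `Instant` and treats a link that expires within a safety margin as stale. `ChunkLink`, `LinkExpiry`, `isStale`, and the margin are hypothetical names chosen for illustration; they are not the driver's API.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical link type (assumption): the real driver's link class may differ.
record ChunkLink(long chunkIndex, String url, Instant expiresAt) {}

final class LinkExpiry {
    private LinkExpiry() {}

    // A link counts as stale if it expires within the safety margin, so a download
    // started now is unlikely to outlive the URL it uses.
    static boolean isStale(ChunkLink link, Instant now, Duration safetyMargin) {
        return !now.plus(safetyMargin).isBefore(link.expiresAt());
    }
}
```

A caller following this model would check `isStale` before each download and request a fresh link from the link fetcher when it returns true.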

    This provider uses two key windows:

    • Link prefetch window: How many links to fetch ahead of consumption
    • Download window: How many chunks to keep in memory (downloading or ready)
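The interplay of the two windows can be modeled as index arithmetic over the reader's position. This is a minimal sketch, assuming chunks are numbered from 0 and that `nextToConsume` is the index of the next chunk the reader will take; `WindowPlanner` and its methods are hypothetical and only illustrate the bounds, not the provider's internals.

```java
final class WindowPlanner {
    private final int linkPrefetchWindow;
    private final int maxChunksInMemory;

    WindowPlanner(int linkPrefetchWindow, int maxChunksInMemory) {
        if (linkPrefetchWindow < 1 || maxChunksInMemory < 1) {
            throw new IllegalArgumentException("windows must be positive");
        }
        this.linkPrefetchWindow = linkPrefetchWindow;
        this.maxChunksInMemory = maxChunksInMemory;
    }

    // Exclusive upper bound on chunk indices whose links should already be fetched:
    // the reader's position plus the prefetch window.
    long linkTarget(long nextToConsume) {
        return nextToConsume + linkPrefetchWindow;
    }

    // A download for `candidate` may start only if its link is known and the chunk
    // falls inside the memory window [nextToConsume, nextToConsume + maxChunksInMemory).
    boolean mayStartDownload(long candidate, long nextToConsume, long linksKnown) {
        boolean hasLink = candidate < linksKnown;
        boolean withinMemoryBound = candidate < nextToConsume + maxChunksInMemory;
        return hasLink && withinMemoryBound;
    }
}
```

In this model, a prefetch window smaller than the download window would leave some download slots idle while waiting for links, which is presumably why links are fetched ahead of consumption.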

    • Constructor Detail

      • StreamingChunkProvider

        public StreamingChunkProvider(ChunkLinkFetcher linkFetcher,
                                      IDatabricksHttpClient httpClient,
                                      CompressionCodec compressionCodec,
                                      StatementId statementId,
                                      int maxChunksInMemory,
                                      int linkPrefetchWindow,
                                      int chunkReadyTimeoutSeconds,
                                      double cloudFetchSpeedThreshold,
                                      ChunkLinkFetchResult initialLinks)
                               throws DatabricksParsingException
        Creates a new StreamingChunkProvider.
        Parameters:
        linkFetcher - Fetcher for chunk links
        httpClient - HTTP client for downloads
        compressionCodec - Codec for decompressing chunk data
        statementId - Statement ID for logging and chunk creation
        maxChunksInMemory - Maximum chunks to keep in memory (download window)
        linkPrefetchWindow - How many links to fetch ahead
        chunkReadyTimeoutSeconds - Timeout, in seconds, to wait for a chunk to become ready
        cloudFetchSpeedThreshold - Download speed threshold; slower downloads are logged as warnings
        initialLinks - Initial links provided with the result data, which avoid an extra fetch; may be null
        Throws:
        DatabricksParsingException
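The `chunkReadyTimeoutSeconds` and `cloudFetchSpeedThreshold` parameters suggest a bounded wait for each chunk plus a throughput check after each download. The sketch below shows both under stated assumptions: a chunk's readiness is modeled as a `CompletableFuture`, and speed is computed in megabytes per second, an assumed unit since this page does not state one. `ChunkWait`, `awaitReady`, and `isSlow` are hypothetical names.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class ChunkWait {
    private ChunkWait() {}

    // Blocks until the chunk's download completes, but no longer than the timeout;
    // throws TimeoutException if the chunk is not ready in time.
    static <T> T awaitReady(CompletableFuture<T> chunk, int chunkReadyTimeoutSeconds)
            throws InterruptedException, ExecutionException, TimeoutException {
        return chunk.get(chunkReadyTimeoutSeconds, TimeUnit.SECONDS);
    }

    // Assumed unit: MB/s. Returns true when a download was slower than the
    // threshold, i.e. when a warning would be logged.
    static boolean isSlow(long bytes, long elapsedNanos, double thresholdMbPerSec) {
        if (elapsedNanos <= 0) {
            return false; // too fast to measure; never flag as slow
        }
        double seconds = elapsedNanos / 1_000_000_000.0;
        double mbPerSec = (bytes / (1024.0 * 1024.0)) / seconds;
        return mbPerSec < thresholdMbPerSec;
    }
}
```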

    • Method Detail

      • hasNextChunk

        public boolean hasNextChunk()
        Description copied from interface: ChunkProvider
        Checks if there are more chunks available to iterate over.
        Specified by:
        hasNextChunk in interface ChunkProvider
        Returns:
        true if there are additional chunks to be retrieved; false otherwise.
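Because the total count is not known up front, a streaming `hasNextChunk` cannot simply compare a position against a total. A minimal sketch, assuming the provider tracks how many chunks have been discovered, how many have been consumed, and whether the end of the stream has been seen; `StreamingCursor` and its fields are hypothetical.

```java
final class StreamingCursor {
    private long discovered;      // chunks whose links have been fetched so far
    private long consumed;        // chunks handed to the reader so far
    private boolean endOfStream;  // set once a fetch reports that no more chunks remain

    void onLinksFetched(int count, boolean hasMore) {
        discovered += count;
        endOfStream = !hasMore;
    }

    void onChunkConsumed() {
        if (consumed >= discovered) {
            throw new IllegalStateException("no discovered chunk to consume");
        }
        consumed++;
    }

    // More chunks exist if a discovered chunk is still unconsumed, or if the stream
    // has not ended yet (more links may still arrive). Before the first fetch this
    // returns true; an empty result shows up as a fetch with count 0 and hasMore false.
    boolean hasNextChunk() {
        return consumed < discovered || !endOfStream;
    }
}
```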
      • close

        public void close()
        Description copied from interface: ChunkProvider
        Closes the chunk provider and releases any resources associated with it. After calling this method, the chunk provider should not be used again.
        Specified by:
        close in interface ChunkProvider
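For a provider that downloads in parallel, releasing resources typically means cancelling in-flight downloads, shutting down the worker pool, and making repeated calls harmless. A minimal sketch, assuming downloads run on an `ExecutorService` and are tracked as `Future`s; `DownloadPool` is a hypothetical name, not the provider's actual structure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class DownloadPool {
    private final ExecutorService executor = Executors.newFixedThreadPool(2);
    private final List<Future<?>> inFlight = new ArrayList<>();
    private boolean closed;

    synchronized Future<?> submit(Runnable download) {
        if (closed) {
            throw new IllegalStateException("provider is closed");
        }
        inFlight.removeIf(Future::isDone); // drop finished downloads to avoid unbounded growth
        Future<?> f = executor.submit(download);
        inFlight.add(f);
        return f;
    }

    // Idempotent: only the first call cancels downloads and stops the pool.
    synchronized void close() {
        if (closed) {
            return;
        }
        closed = true;
        for (Future<?> f : inFlight) {
            f.cancel(true); // interrupts downloads that are still running
        }
        inFlight.clear();
        executor.shutdownNow();
    }

    boolean isExecutorShutdown() {
        return executor.isShutdown();
    }
}
```

Synchronizing `submit` and `close` on the same monitor keeps a download from being registered after `close` has already cancelled the in-flight set.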
      • getChunkCount

        public long getChunkCount()
        Returns the total chunk count only when all chunks have been discovered.

        In streaming mode, the total chunk count is unknown until the end of the stream is reached. This method returns -1 while chunks are still being discovered and the actual count once all chunks have been discovered.

        Specified by:
        getChunkCount in interface ChunkProvider
        Returns:
        the total chunk count if all chunks have been discovered, or -1 if still streaming
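The -1 sentinel contract can be modeled with a counter and an end-of-stream flag. This is a sketch under assumptions: each link fetch is taken to report how many new chunk links it returned and whether more remain (`hasMore`), and `ChunkCountTracker` is a hypothetical helper, not part of the driver.

```java
final class ChunkCountTracker {
    private long discovered;
    private boolean endOfStream;

    // Called after each link fetch; hasMore == false marks the end of the stream.
    void onLinksFetched(int count, boolean hasMore) {
        if (endOfStream) {
            throw new IllegalStateException("stream already ended");
        }
        discovered += count;
        endOfStream = !hasMore;
    }

    // Mirrors the documented contract: -1 while streaming, the total once the end is seen.
    long getChunkCount() {
        return endOfStream ? discovered : -1;
    }
}
```

Callers should therefore treat a negative result as "not yet known" rather than as an empty result.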