Class ArrowUtil


  • public final class ArrowUtil
    extends Object
    Utility class for Arrow operations.

    Provides methods for:

    • Converting Thrift/Hive schemas to Arrow schemas and serializing them
    • Creating Arrow IPC byte streams from Thrift responses
    • Processing Arrow batches with decompression

    This class consolidates the Arrow handling logic shared by the streaming and lazy inline Arrow result handlers.

    • Method Detail

      • getSerializedSchema

        public static byte[] getSerializedSchema(TGetResultSetMetadataResp metadata)
                                          throws DatabricksParsingException
        Gets the serialized Arrow schema from Thrift metadata.

        If the metadata contains a pre-serialized Arrow schema, it is returned directly. Otherwise, the Hive schema is converted to Arrow format and serialized.

        Parameters:
        metadata - The Thrift result set metadata
        Returns:
        The serialized Arrow schema bytes
        Throws:
        DatabricksParsingException - if schema conversion or serialization fails
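The fallback described above (return the pre-serialized schema when present, otherwise convert and serialize) can be sketched as follows. This is a minimal illustration, not the driver's implementation: `Metadata` is a hypothetical stand-in for `TGetResultSetMetadataResp`, `convertAndSerialize` is a placeholder for the real Hive-to-Arrow conversion, and treating `null` as "no pre-serialized schema" is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

final class SerializedSchemaSketch {

    /** Hypothetical stand-in for the Thrift metadata object. */
    static final class Metadata {
        final byte[] arrowSchema;        // pre-serialized Arrow schema, or null if absent (assumption)
        final List<String> hiveColumns;  // simplified Hive schema: column names only

        Metadata(byte[] arrowSchema, List<String> hiveColumns) {
            this.arrowSchema = arrowSchema;
            this.hiveColumns = hiveColumns;
        }
    }

    /** Returns the pre-serialized schema when present, otherwise converts the Hive schema. */
    static byte[] getSerializedSchema(Metadata metadata) {
        if (metadata.arrowSchema != null) {
            // Fast path: the server already supplied serialized Arrow schema bytes.
            return metadata.arrowSchema;
        }
        // Slow path: derive the schema from the Hive description and serialize it.
        return convertAndSerialize(metadata.hiveColumns);
    }

    /** Placeholder conversion: joins column names instead of building a real Arrow schema. */
    static byte[] convertAndSerialize(List<String> hiveColumns) {
        return String.join(",", hiveColumns).getBytes(StandardCharsets.UTF_8);
    }
}
```

Because the result depends only on the metadata, a caller can compute it once and reuse the bytes for later fetch responses.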
      • hiveSchemaToArrowSchema

        public static org.apache.arrow.vector.types.pojo.Schema hiveSchemaToArrowSchema(TTableSchema hiveSchema)
                                                                                 throws DatabricksParsingException
        Converts a Hive TTableSchema to an Arrow Schema.
        Parameters:
        hiveSchema - The Hive table schema from Thrift
        Returns:
        The equivalent Arrow schema
        Throws:
        DatabricksParsingException - if conversion fails
      • columnDescToArrowField

        public static org.apache.arrow.vector.types.pojo.Field columnDescToArrowField(TColumnDesc columnDesc)
                                                                               throws SQLException
        Creates an Arrow Field from a Thrift column descriptor.
        Parameters:
        columnDesc - The Thrift column descriptor
        Returns:
        The equivalent Arrow field
        Throws:
        SQLException - if type mapping fails
      • createArrowByteStream

        public static ByteArrayInputStream createArrowByteStream(byte[] cachedSchema,
                                                                 TFetchResultsResp response,
                                                                 Class<?> callerClass)
                                                          throws DatabricksParsingException
        Creates a ByteArrayInputStream containing Arrow IPC data from the response.

        This method combines the cached schema with decompressed Arrow batches to create a complete Arrow IPC stream that can be parsed by Arrow readers.

        Parameters:
        cachedSchema - The serialized Arrow schema bytes (should be cached from the first response)
        response - The Thrift fetch response containing Arrow batches
        callerClass - The calling class for logging context
        Returns:
        ByteArrayInputStream containing the Arrow IPC data
        Throws:
        DatabricksParsingException - if processing fails
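The combination step described above can be sketched as a plain byte concatenation: schema bytes first, then each batch in order. This is a minimal sketch under stated assumptions. `ArrowStreamSketch` and its `List<byte[]>` parameter are hypothetical, the batches are assumed to be already decompressed, and the exact layout the real method produces is not confirmed by this page.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.List;

final class ArrowStreamSketch {

    /**
     * Concatenates the cached schema bytes and the (already decompressed) batch
     * bytes into one in-memory stream, schema first.
     */
    static ByteArrayInputStream createArrowByteStream(byte[] cachedSchema,
                                                      List<byte[]> decompressedBatches) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // A stream reader needs the schema before any record batch.
        out.write(cachedSchema, 0, cachedSchema.length);
        for (byte[] batch : decompressedBatches) {
            out.write(batch, 0, batch.length);
        }
        return new ByteArrayInputStream(out.toByteArray());
    }
}
```

With the real driver, the returned stream would typically be handed to an Arrow stream reader; the sketch only shows how the bytes are ordered.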
      • getTotalRowsInResponse

        public static long getTotalRowsInResponse(TFetchResultsResp response)
        Gets the total row count from all Arrow batches in the response.
        Parameters:
        response - The Thrift fetch response
        Returns:
        The total number of rows across all batches
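The row total is a sum over the per-batch row counts. A minimal sketch, assuming each batch exposes a row count: `Batch` is a hypothetical stand-in for the Thrift Arrow batch type, and returning 0 for a missing batch list is an assumption rather than documented behavior.

```java
import java.util.List;

final class RowCountSketch {

    /** Hypothetical stand-in for one Arrow batch in a fetch response. */
    static final class Batch {
        final long rowCount;

        Batch(long rowCount) {
            this.rowCount = rowCount;
        }
    }

    /** Sums the row counts of all batches; a null list counts as zero rows (assumption). */
    static long getTotalRows(List<Batch> batches) {
        if (batches == null) {
            return 0L;
        }
        long total = 0L;
        for (Batch batch : batches) {
            total += batch.rowCount;
        }
        return total;
    }
}
```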
      • getColumnInfoList

        public static List<ColumnInfo> getColumnInfoList(TGetResultSetMetadataResp resultManifest)
                                                  throws DatabricksSQLException
        Extracts column information from Thrift result set metadata.

        Converts each column descriptor in the Thrift schema to a ColumnInfo object.

        Parameters:
        resultManifest - The Thrift result set metadata containing schema information
        Returns:
        A list of ColumnInfo objects, or an empty list if the schema is null
        Throws:
        DatabricksSQLException - if a column descriptor cannot be converted
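The null-schema contract and the per-column conversion can be sketched as below. This is an illustration only: `ColumnDesc` and `ColumnInfo` here are simplified hypothetical stand-ins for the Thrift and driver types, and the fields they carry are assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ColumnInfoSketch {

    /** Hypothetical stand-in for a Thrift column descriptor. */
    static final class ColumnDesc {
        final String name;
        final String typeName;

        ColumnDesc(String name, String typeName) {
            this.name = name;
            this.typeName = typeName;
        }
    }

    /** Hypothetical stand-in for the driver's ColumnInfo. */
    static final class ColumnInfo {
        final String name;
        final String typeText;

        ColumnInfo(String name, String typeText) {
            this.name = name;
            this.typeText = typeText;
        }
    }

    /** Maps each descriptor to a ColumnInfo; a null schema yields an empty list. */
    static List<ColumnInfo> getColumnInfoList(List<ColumnDesc> schemaColumns) {
        if (schemaColumns == null) {
            return Collections.emptyList();
        }
        List<ColumnInfo> result = new ArrayList<>(schemaColumns.size());
        for (ColumnDesc column : schemaColumns) {
            result.add(new ColumnInfo(column.name, column.typeName));
        }
        return result;
    }
}
```

Returning an empty list rather than null lets callers iterate over the result without a null check.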