Class OCGraph

java.lang.Object
com.ocient.jdbc.graph.OCGraph

public class OCGraph extends Object
The OCGraph class provides the same operations as the Spark GraphX library, but for an Ocient database. Graphs are represented as tables in the database, with vertices and edges stored in separate tables. A vertices table must contain a column named "id" that is not nullable and is of type BIGINT. An edges table must contain two columns named "srcid" and "destid" that are not nullable and are of type BIGINT. Besides these columns, the vertices and edges tables can contain any number of additional columns with any data types.
  • Constructor Details

    • OCGraph

      public OCGraph()
  • Method Details

    • subgraph

      public static void subgraph(String inputSchema, String inputVerticesTable, String inputEdgesTable, String resultSchema, String resultVerticesTable, String resultEdgesTable, String vertexFilter, String edgeFilter, ArrayList<String> resultVerticesIndexes, ArrayList<String> resultEdgesIndexes, Statement stmt) throws SQLException
      Creates a subgraph consisting of new vertex and edge tables based on filtering criteria applied to existing input graph tables.

      This method performs the following steps:

      1. Validates input parameters (non-null, non-empty where applicable).
      2. Checks for potential conflicts where result tables might overwrite input tables.
      3. Creates the structure for the result vertex and edge tables, copying the schema from the corresponding input tables and applying specified indexes.
      4. Populates the result vertex table by selecting vertices from the input vertex table that satisfy the vertexFilter.
      5. Populates the result edge table by selecting edges from the input edge table that satisfy the edgeFilter AND whose source and destination vertices BOTH exist in the newly created (filtered) result vertex table.

      Important Filter Logic:

      • The vertexFilter is a standard SQL WHERE clause applied to the input vertex table.
      • The edgeFilter is a standard SQL WHERE clause condition. Within this filter string:
        • Reference columns from the source vertex using the alias a (e.g., a.vertex_property).
        • Reference columns from the edge itself using the alias b (e.g., b.edge_property).
        • Reference columns from the destination vertex using the alias c (e.g., c.vertex_property).
        An edge is only included in the final result edge table if it passes this filter *and* its corresponding source and destination vertices are present in the *result* vertex table (i.e., they passed the vertexFilter).

      Atomicity and Cleanup: The operation attempts to be atomic in the sense that if any step after table creation fails (e.g., due to an SQL error during insertion or an invalid filter), it will attempt to drop the newly created result vertex and edge tables to avoid leaving partial results. Input tables are never modified.

      Parameters:
      inputSchema - The schema name containing the original graph tables. Must not be null or empty.
      inputVerticesTable - The name of the table containing the original vertices. Must not be null or empty.
      inputEdgesTable - The name of the table containing the original edges. Must not be null or empty.
      resultSchema - The schema name where the resulting subgraph tables will be created. Must not be null or empty.
      resultVerticesTable - The name for the new table that will store the filtered vertices. Must not be null or empty. This table should not conflict with input table names.
      resultEdgesTable - The name for the new table that will store the filtered edges. Must not be null or empty. This table should not conflict with input table names.
      vertexFilter - An SQL WHERE clause condition (without the "WHERE" keyword) used to filter vertices from the inputVerticesTable. Use "1=1" for no filtering. Must not be null.
      edgeFilter - An SQL WHERE clause condition (without the "WHERE" keyword) used to filter edges. Use aliases a, b, and c to refer to the source vertex, edge, and destination vertex columns, respectively (e.g., "a.prop > 5 AND b.weight < 10"). Use "1=1" for no filtering. Must not be null.
      resultVerticesIndexes - A list of column names from the vertex table on which to create indexes in the resultVerticesTable. Can be empty. Must not be null.
      resultEdgesIndexes - A list of column names from the edge table on which to create indexes in the resultEdgesTable. Can be empty. Must not be null.
      stmt - The JDBC Statement object used to execute SQL commands against the database. Must not be null.
      Throws:
      SQLException - If any database access error occurs during table creation, data insertion, or cleanup, or if the provided filters result in invalid SQL.
    • joinVertices

      public static void joinVertices(String inputSchema, String inputVerticesTable, String modificationSchema, String modificationVerticesTable, String resultSchema, String resultVerticesTable, ArrayList<String> resultAttributeExpressions, ArrayList<String> resultVerticesIndexes, Statement stmt) throws SQLException
      Joins two sets of vertices (input and modification) into a result table.

      This method performs a join operation based on the vertex id. It first creates the result table structure based on the input vertices table and specified indexes. Then, it performs two main insertion operations:

      1. Inserts all vertices from the inputVerticesTable that are NOT present (by id) in the modificationVerticesTable. All columns are copied directly.
      2. Inserts vertices that ARE present (by id) in BOTH the inputVerticesTable and the modificationVerticesTable. For these vertices, the id is taken from the input table, and the remaining attribute columns are calculated based on the provided resultAttributeExpressions.

      Usage of resultAttributeExpressions:

      This parameter is crucial for defining the attribute values in the result table when a vertex exists in both the input and modification sets. Each string in the list represents a SQL expression that calculates a single column value for the resulting vertex.

      • Available Columns and Aliases: Within these expressions, you can refer to columns from both the inputVerticesTable and the modificationVerticesTable. To distinguish between them, the query internally uses aliases:
        • The inputVerticesTable is aliased as a.
        • The modificationVerticesTable is aliased as b.
        Therefore, you must prefix column names with these aliases (e.g., a.attribute_name, b.other_attribute) to ensure clarity and avoid ambiguity, especially if columns share the same name in both tables[9]. The join condition is a.id = b.id.
      • Expression Structure and AS alias: Each expression string provided in the list will be placed directly into the SELECT list of the second INSERT statement. For example: SELECT a.id, expression1, expression2, ... FROM .... While the function itself doesn't enforce it, it is highly recommended that each expression ends with AS alias_name (e.g., a.value + b.delta AS final_value, b.status AS current_status)[4][5][8][10].
      • Order: The order of expressions in the resultAttributeExpressions list determines the order of the corresponding columns in the result table (following the initial id column). Ensure this order matches the column order established when the resultVerticesTable was created.

      Transactional Behavior: If any SQL operation within this method fails (e.g., due to invalid syntax in expressions, constraint violations, or database errors), it attempts to drop the partially created resultVerticesTable to avoid leaving incomplete data. The original SQLException is then re-thrown.

      Parameters:
      inputSchema - The schema containing the primary input vertices table. Cannot be null or empty.
      inputVerticesTable - The name of the primary input vertices table (e.g., containing the original state). Cannot be null or empty.
      modificationSchema - The schema containing the modification vertices table. Cannot be null or empty.
      modificationVerticesTable - The name of the modification vertices table (e.g., containing changes or updates). Cannot be null or empty.
      resultSchema - The schema where the result table will be created. Cannot be null or empty.
      resultVerticesTable - The name of the table to be created containing the joined vertices. Cannot be null or empty, and must not conflict with input table names.
      resultAttributeExpressions - An ordered list of SQL expressions defining the attribute columns for vertices present in both input and modification tables. The list cannot be null or empty. See details above regarding column referencing (use aliases a and b) and the recommended use of AS alias_name for each expression.
      resultVerticesIndexes - A list of column names on which indexes should be created in the resultVerticesTable. Cannot be null. Can be empty if no indexes are needed.
      stmt - The JDBC Statement object used to execute the SQL commands. Cannot be null.
      Throws:
      SQLException - If any database access error occurs, if table/schema names are invalid, if resultAttributeExpressions is empty, or if an expression is syntactically incorrect. Also thrown if cleanup (dropping the result table on failure) fails.
    • mapVertices

      public static void mapVertices(String inputSchema, String inputVerticesTable, String resultSchema, String resultVerticesTable, ArrayList<String> resultColumnExpressions, ArrayList<String> resultVerticesIndexes, Statement stmt) throws SQLException
      Transforms vertex attributes from an input table to a new result table.

      This function creates a new table (resultSchema.resultVerticesTable) containing vertices with transformed attributes based on the data in the input table ( inputSchema.inputVerticesTable). The transformation is defined by the SQL expressions provided in resultColumnExpressions.

      The function performs the following steps:

      1. Performs sanity checks on input parameters.
      2. Checks for potential naming conflicts between input and result tables.
      3. Creates the result table structure using CREATE TABLE ... AS SELECT ... WHERE 1=0. The mandatory id column from the input table is always included as the first column.
      4. Creates indexes on the specified columns (resultVerticesIndexes) in the result table.
      5. Populates the result table by selecting and transforming data from the input table using INSERT INTO ... SELECT ....

      Transactional Behavior: If any step after the initial creation of the result table fails (e.g., index creation, data insertion), the function attempts to drop the newly created result table (resultVerticesName) to leave the database in a consistent state.

      Using resultColumnExpressions

      This parameter defines the columns of the result table, beyond the mandatory id column which is always included automatically as the first column. Each string in the list represents a column definition in the SELECT list of the underlying SQL queries.

      Referencing Input Columns

      Expressions within resultColumnExpressions can refer to any column present in the input vertices table (inputSchema.inputVerticesTable). You should refer to these input columns using their actual names as they exist in the input table. No special aliasing is required simply to reference an input column. For example, if the input table has columns named "id", "property1", and "value", your expressions can directly use property1 and value.

      Defining Output Columns and Aliases (AS alias)

      Each string in resultColumnExpressions becomes part of the SELECT clause.
      • Simple Column Selection: If an expression is just the name of a column from the input table (e.g., "property1"), that column will be included in the result table with the same name.
      • Transformations and Expressions: You can use any valid SQL expression (e.g., "value * 2", "CONCAT(first_name, ' ', last_name')", "CASE WHEN status = 'A' THEN 1 ELSE 0 END").
      • Using AS alias: It is highly recommended to use an alias using AS alias_name for each expression, especially for complex expressions or when you want to rename a column.
        • If you provide a complex expression without an alias (e.g., "value * 2"), the database will assign a default name to the resulting column, which can be unpredictable and difficult to work with later.
        • Using an alias provides a clear, predictable name for the column in the result table. Example: "value * 2 AS double_value".
        • Even for simple column selections, you can use an alias to rename the column in the result table. Example: "property1 AS vertex_property".
        Therefore, while not strictly mandatory for simple column names, ending each expression string with AS your_desired_column_name is the best practice for clarity, predictability, and compatibility with subsequent operations (like defining indexes using resultVerticesIndexes).
      Important: Do not include the id column in resultColumnExpressions; it is automatically handled and placed as the first column in the result table. The order of expressions in the list determines the order of the corresponding columns in the result table (following the id column).

      Example usage for resultColumnExpressions:

      
       ArrayList<String> expressions = new ArrayList<>();
       expressions.add("name"); // Selects 'name' column, result column is named 'name'
       expressions.add("age AS vertex_age"); // Selects 'age' column, result column is named 'vertex_age'
       expressions.add("salary * 1.1 AS increased_salary"); // Calculates new salary, result column 'increased_salary'
       expressions.add("CASE category WHEN 'A' THEN 1 ELSE 0 END AS category_code"); // Transforms category
       
      Parameters:
      inputSchema - The schema name of the source vertices table. Cannot be null or empty.
      inputVerticesTable - The name of the source vertices table. Cannot be null or empty. Must contain an 'id' column and any columns referenced in resultColumnExpressions.
      resultSchema - The schema name for the target (result) vertices table. Cannot be null or empty.
      resultVerticesTable - The name for the target (result) vertices table. This table will be created. Cannot be null or empty. Must not conflict with input tables.
      resultColumnExpressions - An ordered list of SQL expressions defining the columns (beyond the mandatory id) for the result table. See method description for detailed usage. Cannot be null; elements cannot be null or empty.
      resultVerticesIndexes - An ordered list of column names in the result table on which to create indexes. These names must match the final column names in the result table (either id, names derived from resultColumnExpressions, or aliases provided via AS). Cannot be null; elements cannot be null or empty.
      stmt - The Statement object used to execute SQL commands. Must be valid and connected. Cannot be null.
      Throws:
      SQLException - if any database operation (CREATE TABLE, CREATE INDEX, INSERT) fails.
    • mapEdges

      public static void mapEdges(String inputSchema, String inputEdgesTable, String resultSchema, String resultEdgesTable, ArrayList<String> resultColumnExpressions, ArrayList<String> resultEdgesIndexes, Statement stmt) throws SQLException
      Transforms edge attributes by creating a new edges table based on specified column expressions derived from an existing edges table.

      This method creates a new table (`resultSchema.resultEdgesTable`) with the structure defined by the `resultColumnExpressions` plus mandatory `srcid` and `destid` columns. It then populates this new table by selecting data from the `inputSchema.inputEdgesTable` and applying the specified transformations. Indexes are created on the result table as specified.

      If the operation fails at any point after the initial `CREATE TABLE` statement for the result table is issued (e.g., during index creation or data insertion), the method will attempt to drop the partially created `resultEdgesTable` to ensure a cleaner state.

      ## Understanding `resultColumnExpressions`

      This parameter is crucial for defining the content and structure of the new edges table, beyond the mandatory `srcid` and `destid` columns which are always included.

      ### Available Columns for Expressions Each string expression in the `resultColumnExpressions` list is evaluated within the context of a `SELECT ... FROM inputSchema.inputEdgesTable` statement. Therefore, your expressions can refer to **any column present in the original `inputSchema.inputEdgesTable`**.

      ### Referencing Input Columns You should refer to the columns from the `inputEdgesTable` using their **original names**. No special prefixes or table aliases are required *for the source columns* within the expression itself. For example, if the input table has columns `weight`, `type`, and `cost`, you can write expressions like: - `"weight * 2"` - `"LOWER(type)"` - `"cost / 100.0"` - `"CASE WHEN status = 'ACTIVE' THEN 1 ELSE 0 END"`

      ### Defining Result Column Names (AS alias) **Each expression string should define the name for the resulting column using the `AS` keyword followed by an alias.** This alias becomes the column name in the new `resultEdgesTable`. The database needs this alias to know what to name the column resulting from your expression in the `CREATE TABLE AS SELECT ...` and `INSERT INTO ... SELECT ...` statements.

      **Example:** If your input table has columns `weight` and `label`, and you want the result table to have `srcid`, `destid`, a column named `double_weight` (calculated as weight * 2), and a column named `upper_label` (the uppercase version of the label), your `resultColumnExpressions` list should look like this:

      ``` ArrayList expressions = new ArrayListinvalid input: '<'>(List.of( "weight * 2 AS double_weight", "UPPER(label) AS upper_label" )); ```

      The final result table (`resultSchema.resultEdgesTable`) will then have the columns: `srcid`, `destid`, `double_weight`, `upper_label`.

      **Note:** Do *not* include expressions for `srcid` or `destid` in this list; they are automatically handled and preserved with their original names.

      Parameters:
      inputSchema - The schema of the source table containing the original edge data. Cannot be null or empty.
      inputEdgesTable - The name of the source table containing the original edge data. Cannot be null or empty.
      resultSchema - The schema for the new table that will store the transformed edge data. Cannot be null or empty.
      resultEdgesTable - The name for the new table that will store the transformed edge data. Cannot be null or empty.
      resultColumnExpressions - An ArrayList of SQL expressions defining the columns of the result table, *in addition* to the implicitly included `srcid` and `destid` columns. Each expression must refer to columns in the `inputEdgesTable` and should use `AS alias` to define the result column name. The list cannot be null, but can be empty if only `srcid` and `destid` are needed. Individual expressions cannot be null or empty.
      resultEdgesIndexes - An ArrayList of column names (from the *result* table, including `srcid`, `destid`, or any names defined via `AS` in `resultColumnExpressions`) on which to create indexes in the `resultEdgesTable`. The list cannot be null, but can be empty. Individual column names for indexing cannot be null or empty.
      stmt - The JDBC Statement object used to execute the SQL queries. Cannot be null.
      Throws:
      SQLException - If any database error occurs during table creation, index creation, data insertion, or dropping the table on failure.
    • mapTriplets

      public static void mapTriplets(String inputSchema, String inputVerticesTable, String inputEdgesTable, String resultSchema, String resultEdgesTable, ArrayList<String> resultColumnExpressions, ArrayList<String> resultEdgesIndexes, Statement stmt) throws SQLException
      Transforms edge attributes by incorporating information from the source and destination vertices (the "triplet").

      This method creates a new edge table (`resultEdgesTable`) based on joining the input edge table (`inputEdgesTable`) with the input vertex table (`inputVerticesTable`) for both source and destination vertices. It allows defining custom columns in the new edge table using SQL expressions that can refer to columns from the source vertex, the original edge, and the destination vertex.

      The operation is designed to be somewhat transactional within its scope: if any step (table creation, index creation, data insertion) fails, it attempts to drop the partially created `resultEdgesTable` to avoid leaving incomplete artifacts. ## Understanding `resultColumnExpressions`

      This parameter is crucial for defining the schema and content (beyond `srcid` and `destid`) of the resulting edge table. Each string in the `resultColumnExpressions` list represents a SQL expression that will be included in the `SELECT` list used to create and populate the new table.

      ### Available Columns and Required Aliases

      When constructing your expressions, you have access to columns from three sources, which *must* be referenced using specific aliases:

      • Source Vertex (`a`): Columns from the `inputVerticesTable` corresponding to the source of the edge. Reference these using the alias `a`. For example: a.vertex_property, a.name. The join condition is a.id = b.srcid.
      • Edge (`b`): Columns from the `inputEdgesTable`. Reference these using the alias `b`. For example: b.edge_weight, b.type. The columns b.srcid and b.destid are automatically included in the result table's primary selection and do not need to be added via `resultColumnExpressions`.
      • Destination Vertex (`c`): Columns from the `inputVerticesTable` corresponding to the destination of the edge. Reference these using the alias `c`. For example: c.vertex_property, c.status. The join condition is c.id = b.destid.
      It is assumed that the `inputVerticesTable` has an `id` column and the `inputEdgesTable` has `srcid` and `destid` columns representing the foreign keys to the vertex `id`.

      ### Using `AS alias` for Result Column Names

      It is highly recommended that every expression in `resultColumnExpressions` ends with an `AS result_column_name` clause.

      • Why? When creating a table using `CREATE TABLE AS SELECT ...`, the database needs names for the columns being created. While simple expressions like `a.name` implicitly use `name` as the column name, complex expressions (e.g., a.value + c.value) or functions (e.g., upper(b.type)) will result in default column names if an alias is not provided.
      • Clarity and Control: Using `AS alias` gives you explicit control over the column names in your `resultEdgesTable`. This is essential for predictability and for referencing these columns later (e.g., in the `resultEdgesIndexes` list).
      • Requirement: For any expression that is not a simple column reference (e.g., arithmetic, function calls, concatenation), an `AS alias` is effectively mandatory to ensure a valid and predictable column name.
      ### Examples of `resultColumnExpressions`:
      
       ArrayList<String> expressions = new ArrayList<>();
       // Include the edge weight directly
       expressions.add("b.weight AS edge_weight");
       // Combine properties from source and destination vertices
       expressions.add("a.category || '-' || c.category AS vertex_categories");
       // Calculate a value based on source vertex and edge
       expressions.add("a.value * b.multiplier AS calculated_value");
       // Include the name of the source vertex
       expressions.add("a.name AS source_vertex_name");
       
      In this example, the resulting table `resultEdgesTable` would have columns: `srcid`, `destid`, `edge_weight`, `vertex_categories`, `calculated_value`, and `source_vertex_name`.
      Parameters:
      inputSchema - The schema containing the input vertex and edge tables. Cannot be null or empty.
      inputVerticesTable - The name of the input table containing vertex data. Must contain an 'id' column. Cannot be null or empty.
      inputEdgesTable - The name of the input table containing edge data. Must contain 'srcid' and 'destid' columns referencing vertex 'id's. Cannot be null or empty.
      resultSchema - The schema where the result edge table will be created. Cannot be null or empty.
      resultEdgesTable - The name of the new edge table to be created. This table will contain `srcid`, `destid`, and columns defined by `resultColumnExpressions`. Cannot be null or empty.
      resultColumnExpressions - A list of SQL expressions defining additional columns for the result table. Each expression should reference columns using aliases `a` (source vertex), `b` (edge), `c` (destination vertex) and should specify a result column name using `AS alias`. Cannot be null. List elements cannot be null/empty.
      resultEdgesIndexes - A list of column names in the `resultEdgesTable` on which to create indexes. These names should correspond to the aliases provided in `resultColumnExpressions` or the standard 'srcid', 'destid'. Cannot be null. List elements cannot be null/empty.
      stmt - The Statement object used to execute the SQL queries. Cannot be null.
      Throws:
      SQLException - If any database error occurs during table creation, index creation, data insertion, or cleanup.
    • createTripletsView

      public static void createTripletsView(String inputSchema, String inputVerticesTable, String inputEdgesTable, String resultSchema, String resultTripletsView, Statement stmt) throws SQLException
      Creates a database view representing graph triplets (source vertex, edge, destination vertex).

      This method constructs and executes a CREATE VIEW SQL statement. The resulting view joins the specified edges table with the vertices table twice: once for the source vertex (based on edges.srcid = vertices.id) and once for the destination vertex (based on edges.destid = vertices.id).

      The view includes all columns from the original edges table, along with all columns from the vertices table (excluding the id column itself). Vertex columns are prefixed with src_ for the source vertex and dest_ for the destination vertex to avoid naming conflicts and clarify origin.

      Important: If any part of the view creation process fails (e.g., due to SQL errors, insufficient permissions, or invalid input), the method attempts to drop the view using safeDropView to ensure a clean state.

      Preconditions:

      • The input Statement object must be valid, open, and connected to the target database.
      • All schema and table/view name strings must be non-null and non-empty.
      • The inputVerticesTable must exist in the inputSchema and contain an id column along with other attribute columns.
      • The inputEdgesTable must exist in the inputSchema and contain srcid and destid columns that reference the id column of the inputVerticesTable.
      • The proposed resultTripletsView name must not conflict with the input table names within their respective schemas (checked internally by checkTableConflict).
      Parameters:
      inputSchema - The schema containing the source vertices and edges tables. Cannot be null or empty.
      inputVerticesTable - The name of the table containing vertex data. Must include an 'id' column. Cannot be null or empty.
      inputEdgesTable - The name of the table containing edge data. Must include 'srcid' and 'destid' columns referencing inputVerticesTable.id. Cannot be null or empty.
      resultSchema - The schema where the resulting triplets view will be created. Cannot be null or empty.
      resultTripletsView - The name for the new view to be created. Cannot be null or empty.
      stmt - The JDBC Statement object used to execute the SQL commands. Cannot be null.
      Throws:
      SQLException - If a database access error occurs during query execution (e.g., table not found, syntax error in generated SQL, insufficient permissions).
    • createTripletsTable

      public static void createTripletsTable(String inputSchema, String inputVerticesTable, String inputEdgesTable, String resultSchema, String resultTripletsTable, ArrayList<String> resultTripletsIndexes, Statement stmt) throws SQLException
      Creates a materialized triplet table by joining vertex and edge data from input tables.

      This method constructs a new table in the specified resultSchema named resultTripletsTable. This table represents graph triplets (source vertex, edge, destination vertex). It achieves this by performing a SQL join:

      1. It selects all columns from the inputEdgesTable.
      2. It joins the inputEdgesTable with the inputVerticesTable on inputEdgesTable.srcid = inputVerticesTable.id to get the source vertex attributes. All vertex attributes (except 'id') are included in the result table, prefixed with "src_".
      3. It joins the inputEdgesTable with the inputVerticesTable again on inputEdgesTable.destid = inputVerticesTable.id to get the destination vertex attributes. All vertex attributes (except 'id') are included in the result table, prefixed with "dest_".
      The method first creates the table structure, then creates the specified indexes on the new table, and finally populates the table with the data from the join query.

      Input parameters are checked for null or emptiness. It also checks if the target table name conflicts with any of the input table names.

      If any step during the table creation, index creation, or population fails (e.g., due to an SQLException), the method attempts to drop the partially created resultTripletsTable to ensure atomicity as much as possible within the operation's scope.

      Parameters:
      inputSchema - The schema containing the source vertex and edge tables. Must not be null or empty.
      inputVerticesTable - The name of the source table containing vertex data. It is expected to have an 'id' column and potentially other attribute columns. Must not be null or empty.
      inputEdgesTable - The name of the source table containing edge data. It is expected to have 'srcid' and 'destid' columns referencing vertex IDs, and potentially other edge attribute columns. Must not be null or empty.
      resultSchema - The schema where the resulting triplets table will be created. Must not be null or empty.
      resultTripletsTable - The desired name for the output triplets table. Must not be null or empty.
      resultTripletsIndexes - A list of column names *in the resulting triplets table* on which B-tree indexes should be created. These can include original edge columns or the prefixed vertex columns (e.g., "src_attributeName", "dest_attributeName"). The list itself must not be null, and each column name within the list must not be null or empty.
      stmt - The Statement object used to execute the SQL DDL (CREATE TABLE, CREATE INDEX) and DML (INSERT) commands. Must not be null.
      Throws:
      SQLException - If any database access error occurs during sanity checks (metadata retrieval), table creation, index creation, data insertion, or the cleanup attempt (dropping the table).
    • reverseEdges

      public static void reverseEdges(String inputSchema, String inputEdgesTable, String resultSchema, String resultEdgesTable, ArrayList<String> resultEdgesIndexes, Statement stmt) throws SQLException
      Reverses the direction of edges stored in a database table.

      This method reads edges from an input table specified by inputSchema and inputEdgesTable. It assumes the input table has columns named srcid (source ID) and destid (destination ID). It creates a new result table specified by resultSchema and resultEdgesTable.

      The result table will have the same columns as the input table, but the values in the srcid and destid columns will be swapped for each edge. All other columns present in the input table are copied verbatim to the result table.

      The method also creates indexes on the specified columns (resultEdgesIndexes) in the newly created result table.

      Important: This operation is designed to be somewhat atomic. If any step during the table creation or data population fails (indicated by an SQLException), the method will attempt to drop the newly created result table (resultSchema.resultEdgesTable) to avoid leaving partially created or populated tables. Input validation failures (e.g., null parameters) may occur before table creation begins.

      Parameters:
      inputSchema - The schema name of the input table containing the original edges. Cannot be null or empty.
      inputEdgesTable - The name of the input table containing the original edges. Must contain columns named srcid and destid (case-insensitive). Cannot be null or empty.
      resultSchema - The schema name for the new table where reversed edges will be stored. Cannot be null or empty.
      resultEdgesTable - The name for the new table where reversed edges will be stored. Cannot be null or empty. The combination of resultSchema and resultEdgesTable must not conflict with the input table name.
      resultEdgesIndexes - A list of column names on which to create indexes in the result table. The list itself cannot be null, but it can be empty if no indexes are desired. Column names should exist in the input table structure.
      stmt - The JDBC Statement object used to execute SQL commands against the database. Cannot be null.
      Throws:
      SQLException - If any database access error occurs during table creation, metadata retrieval, data insertion, index creation, or the cleanup (table drop) attempt.
    • filterEdges

      public static void filterEdges(String inputSchema, String inputEdgesTable, String resultSchema, String resultEdgesTable, String edgeFilter, ArrayList<String> resultEdgesIndexes, Statement stmt) throws SQLException
      Filters edges from a source table into a new result table based on a provided SQL predicate.

      This method performs the following steps:

      1. Performs sanity checks on input parameters.
      2. Checks if the intended result table name conflicts with input tables.
      3. Creates the result table (`resultSchema.resultEdgesTable`) with the same structure as the input table (`inputSchema.inputEdgesTable`).
      4. Creates specified indexes on the result table.
      5. Constructs and executes an SQL `INSERT INTO ... SELECT * FROM ... WHERE ...` statement to populate the result table with rows from the input table that satisfy the edgeFilter.
      6. If any step fails (e.g., due to an SQLException), it attempts to drop the potentially partially created result table to ensure cleanup.
      ## The `edgeFilter` Parameter Explained

      The edgeFilter parameter is a crucial part of this function, allowing you to specify *which* edges should be copied from the input table to the result table. Here's how it works:

      - **Purpose:** It provides the filtering logic as a standard SQL condition. - **Content:** It should contain only the conditional expression that normally comes *after* the `WHERE` keyword in a SQL query. - **`WHERE` Keyword:** **Do NOT include the word `WHERE`** in the edgeFilter string. The function automatically prepends ` WHERE ` before adding your filter condition to the SQL query, but only if the filter is not empty or effectively "no filter" (`"1=1"`). - **Column References:** The conditions in edgeFilter must reference columns that exist in the **input edges table** (`inputSchema.inputEdgesTable`). - **Table Aliases:** The underlying SQL query generated by this function is structured like: `INSERT INTO result_table SELECT * FROM input_table WHERE [your_edgeFilter];` Since this query selects directly from the `input_table` without defining any table aliases in the `FROM` clause, you should **refer to columns directly by their names**. You do not need (and should not use) table aliases (like `t1.col_name`) within the edgeFilter string. - **No Filter:** If you want to copy all edges without any filtering, pass the string `"1=1"` as the edgeFilter. The function recognizes this and will effectively omit the `WHERE` clause. Note that `null` is not allowed (enforced by a check).

      ### Examples of `edgeFilter` values:

      • Filter by edge weight: "weight > 0.5"
      • Filter by type and creation date: "type = 'mention' AND created_ts > '2024-01-01'"
      • Filter based on a source node ID: "src_id = 12345"
      • No filtering (copy all edges): "1=1"
      Parameters:
      inputSchema - The schema name of the source table containing edges. Cannot be null or empty.
      inputEdgesTable - The table name of the source table containing edges. Cannot be null or empty.
      resultSchema - The schema name for the new result table where filtered edges will be stored. Cannot be null or empty.
      resultEdgesTable - The table name for the new result table where filtered edges will be stored. Cannot be null or empty. This table will be created by the function.
      edgeFilter - The SQL predicate string (condition) used to filter edges. Should reference columns in the input table directly by name. Do NOT include the `WHERE` keyword. Use "1=1" for no filter. Cannot be null.
      resultEdgesIndexes - A list of column names (which must exist in the input table) on which to create indexes in the result table. Cannot be null (can be an empty list).
      stmt - The JDBC Statement object used to execute the SQL commands. Cannot be null.
      Throws:
      SQLException - If any database error occurs during table creation, index creation, data insertion, or cleanup (dropping the table on failure). Also thrown by potential underlying utility methods if they interact with the DB.
    • filterVertices

      public static void filterVertices(String inputSchema, String inputVerticesTable, String inputEdgesTable, String resultSchema, String resultVerticesTable, String resultEdgesTable, String vertexFilter, ArrayList<String> resultVerticesIndexes, ArrayList<String> resultEdgesIndexes, Statement stmt) throws SQLException
      Creates a new subgraph by filtering vertices based on a specified condition and retaining only the edges that connect the resulting vertices.

      This method performs the following steps:

      1. Performs sanity checks on input parameters.
      2. Checks for potential conflicts where result tables might overwrite input tables.
      3. Creates new empty tables (resultVerticesTable, resultEdgesTable) in the resultSchema, replicating the structure and specified indexes of the corresponding input tables.
      4. Populates the resultVerticesTable by selecting vertices from the inputVerticesTable that satisfy the vertexFilter condition.
      5. Populates the resultEdgesTable by selecting edges from the inputEdgesTable where *both* the source vertex (srcid) and the destination vertex (destid) are present in the newly populated resultVerticesTable.

      Transactional Behavior: If any SQLException occurs after the result tables have been created but before the operation completes successfully, the method will attempt to drop the resultVerticesTable and resultEdgesTable to clean up potentially incomplete results.

      Assumptions:

      • The input vertices table (inputVerticesTable) contains a column named id which serves as the vertex identifier.
      • The input edges table (inputEdgesTable) contains columns named srcid and destid representing the source and destination vertex IDs, respectively, which correspond to the id column in the vertices table.
      Parameters:
      inputSchema - The schema containing the source graph tables. Cannot be null or empty.
      inputVerticesTable - The name of the table containing the source vertices. Cannot be null or empty.
      inputEdgesTable - The name of the table containing the source edges. Cannot be null or empty.
      resultSchema - The schema where the resulting filtered graph tables will be created. Cannot be null or empty.
      resultVerticesTable - The name for the new table that will store the filtered vertices. Cannot be null or empty.
      resultEdgesTable - The name for the new table that will store the filtered edges. Cannot be null or empty.
      vertexFilter - An SQL predicate string used to filter rows from the inputVerticesTable. This predicate is applied directly within a WHERE clause constructed internally like: SELECT * FROM inputVerticesTable WHERE [vertexFilter].
      • Do NOT include the WHERE keyword in this string; the method adds it automatically.
      • The filter condition can reference any column available in the inputVerticesTable directly by its name.
      • Table aliases are NOT required (and generally should not be used) within this filter string, as the context is a simple select from the single input vertices table.
      • Example: "population > 10000 AND type = 'city'"
      • Example: "ST_Area(geom) > 500"
      • Use "1=1" if no vertex filtering is desired (i.e., keep all vertices initially).
      • Cannot be null.
      resultVerticesIndexes - A list of column names from the inputVerticesTable on which indexes should be created in the resultVerticesTable. Cannot be null. Can be empty if no indexes are needed.
      resultEdgesIndexes - A list of column names from the inputEdgesTable on which indexes should be created in the resultEdgesTable. Cannot be null. Can be empty if no indexes are needed.
      stmt - The JDBC Statement object used to execute the SQL commands against the database. Cannot be null.
      Throws:
      SQLException - If any database error occurs during table creation, data insertion, index creation, or table dropping.
    • inDegrees

      public static void inDegrees(String inputSchema, String inputEdgesTable, String resultSchema, String resultInDegreesTable, ArrayList<String> resultIndexes, Statement stmt) throws SQLException
      Calculates the in-degree for each destination node in a given edge table and stores the results (node ID and its in-degree count) in a newly created table.

      This method performs the following steps:

      1. Performs sanity checks on all input parameters to ensure they are not null or empty where required.
      2. Checks for potential naming conflicts between the input tables and the target result table.
      3. Constructs the fully qualified names for input and result tables (schema.table).
      4. Creates the result table (`resultSchema.resultInDegreesTable`) with columns `id` (BIGINT) and `in_degree` (BIGINT).
      5. Creates indexes on the specified columns ('id' and/or 'in_degree') in the result table based on the `resultIndexes` list.
      6. Populates the result table by executing an SQL query that counts the occurrences of each `destid` in the input edge table (`inputSchema.inputEdgesTable`) and inserts the `destid` (as `id`) and the count (as `in_degree`).

      Error Handling: The method includes a safety mechanism. If any part of the operation (table creation, index creation, or data population) fails and throws an exception, it attempts to drop the potentially partially created result table (`resultSchema.resultInDegreesTable`) before propagating the exception. This helps prevent leaving incomplete artifacts in the database.

      Parameters:
      inputSchema - The schema name of the input table containing edges. Must not be null or empty.
      inputEdgesTable - The name of the input table containing edges. This table is expected to have a column named destid representing the destination node ID of an edge. Must not be null or empty.
      resultSchema - The schema name for the output table where in-degree results will be stored. Must not be null or empty.
      resultInDegreesTable - The name for the output table where in-degree results will be stored. Must not be null or empty.
      resultIndexes - An ArrayList of strings specifying which columns in the result table should be indexed. Valid entries are "id" and "in_degree" (case-insensitive). The list itself must not be null, and its elements must not be null or empty. An empty list means no specific indexes will be created beyond potential defaults.
      stmt - The JDBC Statement object used to execute the SQL queries. Must not be null.
      Throws:
      SQLException - If any database access error occurs during table creation, index creation, data insertion, or if invalid arguments are provided (e.g., invalid index column name, table name conflict, SQL syntax errors, connection issues).
    • outDegrees

      public static void outDegrees(String inputSchema, String inputEdgesTable, String resultSchema, String resultOutDegreesTable, ArrayList<String> resultIndexes, Statement stmt) throws SQLException
      Calculates the out-degree for each node in a graph represented by an edge table and stores the results in a new database table.

      This method performs the following steps:

      1. Performs sanity checks on all input parameters.
      2. Checks for potential naming conflicts between input and result tables.
      3. Creates a new result table (resultOutDegreesTable) in the specified resultSchema with columns id (BIGINT) and out_degree (BIGINT).
      4. Optionally creates indexes on the 'id' and/or 'out_degree' columns of the result table based on the resultIndexes list.
      5. Populates the result table by querying the inputEdgesTable, grouping by the source ID (assumed to be 'srcid'), and counting the occurrences to determine the out-degree.
      The operation aims to be somewhat atomic: if any step (table creation, index creation, or data population) fails and throws an SQLException, this method attempts to drop the potentially incomplete result table before propagating the exception.
      Parameters:
      inputSchema - The schema name where the input edge table resides. Cannot be null or empty.
      inputEdgesTable - The name of the input table containing graph edges. Expected to have at least a source ID column (typically 'srcid'). Cannot be null or empty.
      resultSchema - The schema name where the result table will be created. Cannot be null or empty.
      resultOutDegreesTable - The name of the table to be created for storing out-degrees (node ID and count). Cannot be null or empty. Must not conflict with input table names.
      resultIndexes - A list of column names ('id' or 'out_degree', case-insensitive) on which to create indexes in the result table. Indexes can improve query performance. The list cannot be null; elements cannot be null/empty and must be 'id' or 'out_degree'.
      stmt - The JDBC Statement object used to execute SQL commands against the database. Cannot be null. The statement should be associated with an active connection with sufficient privileges to create tables, create indexes, and query data.
      Throws:
      SQLException - If any database access error occurs during sanity checks (e.g., invalid index column name), table creation, index creation, data population, or the cleanup attempt (dropping the table on failure). This includes issues like invalid SQL syntax, insufficient permissions, connection problems, or if the specified tables/schemas do not exist or have unexpected structures.
    • degrees

      public static void degrees(String inputSchema, String inputEdgesTable, String resultSchema, String resultDegreesTable, ArrayList<String> resultIndexes, Statement stmt) throws SQLException
      Calculates the total degree (combined in-degree and out-degree) for each node in a graph represented by an edges table and stores the results in a new table.

      The input edges table is expected to have columns named srcid and destid, both representing node IDs (assumed to be BIGINT).

      This method performs the following steps:

      1. Performs sanity checks on input parameters (non-null, non-empty).
      2. Checks for potential naming conflicts between input and result tables.
      3. Creates the result table (resultSchema.resultDegreesTable) with columns id BIGINT NOT NULL and degree BIGINT NOT NULL.
      4. Optionally creates indexes on the 'id' or 'degree' columns of the result table as specified in resultIndexes.
      5. Populates the result table by counting all occurrences of each node ID in both the 'srcid' and 'destid' columns of the input edges table.

      Error Handling: The operation attempts to be transactional within its scope. If any step (table creation, index creation, data insertion) fails and throws an SQLException, the method will attempt to drop the resultDegreesTable to avoid leaving a partially created or populated table in the database. This cleanup attempt might also fail but is performed in a finally block.

      Parameters:
      inputSchema - The schema name of the input table containing graph edges. Cannot be null or empty.
      inputEdgesTable - The name of the input table containing graph edges. Must contain 'srcid' and 'destid' columns. Cannot be null or empty.
      resultSchema - The schema name for the output table where node degrees will be stored. Cannot be null or empty.
      resultDegreesTable - The name for the output table to be created. This table will store node IDs and their calculated degrees. Cannot be null or empty.
      resultIndexes - An ArrayList of column names on which to create indexes in the result table. Valid values are "id" and "degree" (case-insensitive). Cannot be null, but can be empty. Invalid column names will cause an SQLException.
      stmt - The JDBC Statement object used to execute the SQL commands. Cannot be null.
      Throws:
      SQLException - If any database access error occurs during table/index creation, data insertion, or cleanup. Also thrown if input validation fails (null/empty parameters, invalid index column name, table name conflicts).
    • fromEdges

      public static void fromEdges(String inputSchema, String inputEdgesTable, String resultSchema, String resultVerticesTable, ArrayList<String> resultColumnExpressions, ArrayList<String> resultVerticesIndexes, Statement stmt) throws SQLException
      Creates a vertices table from an existing edges table by extracting unique source and destination IDs.

      This function identifies all unique identifiers present in the srcid and destid columns of the input edges table. It then creates a new table (the result vertices table) containing these unique IDs. Optionally, additional columns can be generated in the result table based on SQL expressions provided in resultColumnExpressions. Indexes can also be created on specified columns in the result table.

      If any step of the operation fails (e.g., SQL error during table creation, index creation, or data insertion), the function will attempt to drop the partially created result vertices table to ensure atomicity.

      ## How `resultColumnExpressions` Works

      The resultColumnExpressions parameter allows you to define additional columns in the output vertices table beyond the primary id column. Each string in this list should be a valid SQL expression that computes a value for a new column.

      ### Available Columns for Expressions

      When defining your expressions, you can refer to the unique vertex ID generated by the function. Internally, the function creates a subquery like: (SELECT srcid AS id FROM input_edges UNION DISTINCT SELECT destid AS id FROM input_edges) ids

      This means the *only* column directly available from this internal source for your expressions is the unique vertex ID itself.

      ### Referencing the Vertex ID

      You *must* refer to the unique vertex ID using the alias ids.id. For example, an expression might be "ids.id % 100 AS bucket_id".

      ### Column Aliases (`AS alias`)

      Each expression provided in the resultColumnExpressions list **must** end with AS alias_name. This alias defines the name of the new column that will be created in the result vertices table. The database uses this alias to name the column generated by your expression. For example:

      • "LOWER(CAST(ids.id AS VARCHAR(255))) AS id_lowercase"
      • "CASE WHEN ids.id > 1000 THEN 'high' ELSE 'low' END AS category"
      • "ids.id AS original_id" (if you simply want a copy with a different name)
      The resulting table will always have the primary id column (derived from srcid and destid), plus one column for each expression provided in resultColumnExpressions, named according to the alias you specify.
      Parameters:
      inputSchema - The schema name of the input edges table. Must not be null or empty. The edges table is expected to have columns named srcid and destid.
      inputEdgesTable - The name of the input edges table. Must not be null or empty. Expected to contain srcid and destid columns representing edge endpoints.
      resultSchema - The schema name for the new vertices table to be created. Must not be null or empty.
      resultVerticesTable - The name for the new vertices table to be created. Must not be null or empty. This table will contain unique vertex IDs and any additional computed columns.
      resultColumnExpressions - An ArrayList of strings, where each string is a SQL expression used to compute an additional column for the result vertices table. Each expression *must* use the alias ids.id to refer to the unique vertex ID and *must* end with AS column_alias to name the resulting column. If empty or null, no additional columns besides 'id' are created. Individual expressions cannot be null or empty.
      resultVerticesIndexes - An ArrayList of strings specifying the names of columns in the *result* vertices table on which to create indexes. This can include the primary 'id' column and any columns created via resultColumnExpressions (using their specified aliases). If empty or null, no indexes are created. Individual column names cannot be null or empty.
      stmt - The Statement object used to execute SQL commands against the database. Must not be null.
      Throws:
      SQLException - If a database access error occurs, if any input SQL (table names, expressions, index columns) is invalid, or if the underlying JDBC driver encounters an issue. If an exception occurs during the creation process, an attempt will be made to drop the result table.
    • innerJoinVertices

      public static void innerJoinVertices(String inputSchema, String inputVerticesTable, String otherSchema, String otherVerticesTable, String resultSchema, String resultVerticesTable, ArrayList<String> resultColumnExpressions, ArrayList<String> resultVerticesIndexes, Statement stmt) throws SQLException
      Performs an SQL INNER JOIN operation between two vertex tables based on their id columns, creating a new result table containing specified columns from both input tables.

      The join condition is always a.id = b.id, where a is an alias for the first input table (inputSchema.inputVerticesTable) and b is an alias for the second input table (otherSchema.otherVerticesTable).

      The function first creates the structure of the result table using CREATE TABLE AS SELECT ... WHERE 1=0, then creates specified indexes, and finally populates the table using INSERT INTO ... SELECT .... This approach ensures the table structure is defined correctly before data insertion.

      Important: The id column from the first input table (aliased as a) is automatically included as the first column in the result table and will be named id. You do not need to specify a.id in the resultColumnExpressions.

      Atomicity: If any step of the operation (table creation, index creation, data insertion) fails, the function attempts to drop the partially created result table to leave the database in a cleaner state. However, this is a best-effort cleanup and does not guarantee full transactional atomicity across different database systems or complex failure scenarios. ## How to use resultColumnExpressions

      The resultColumnExpressions list defines the additional columns to be included in the result table, beyond the mandatory id column from the first table. Each string in this list should be a valid SQL expression that can appear in a SELECT clause.

      ### Available Columns for Expressions

      Within each expression string, you can refer to columns from the two input tables using specific aliases:

      • Columns from the first input table (inputSchema.inputVerticesTable) must be referenced using the alias a. Example: a.property_name.
      • Columns from the second input table (otherSchema.otherVerticesTable) must be referenced using the alias b. Example: b.other_property.
      ### Using Aliases with AS

      It is strongly recommended that each expression explicitly defines the resulting column name using the AS alias syntax.

      • Syntax: expression AS desired_column_name
      • Example: a.name AS vertex_name, b.status AS vertex_status, a.value + b.offset AS calculated_value
      If you omit AS alias, the database system will automatically assign a name to the column. Explicitly using AS ensures predictable column names in the resultVerticesTable, which is crucial for subsequent queries and for defining indexes using the resultVerticesIndexes parameter.

      ### Examples of resultColumnExpressions:

      
       ArrayList<String> columns = new ArrayList<>();
       columns.add("a.name AS name"); // Select 'name' from the first table, call it 'name' in the result
       columns.add("b.type AS type");   // Select 'type' from the second table, call it 'type'
       columns.add("a.value * b.factor AS weighted_value"); // Calculate a value, call it 'weighted_value'
       columns.add("b.creation_date"); // Select 'creation_date' from second table. Strongly recommend adding 'AS creation_date'
       
      Parameters:
      inputSchema - Schema of the first input vertices table. Cannot be null or empty.
      inputVerticesTable - Name of the first input vertices table (aliased as 'a' internally). Must contain an 'id' column. Cannot be null or empty.
      otherSchema - Schema of the second input vertices table. Cannot be null or empty.
      otherVerticesTable - Name of the second input vertices table (aliased as 'b' internally). Must contain an 'id' column used for joining. Cannot be null or empty.
      resultSchema - Schema for the output table. Cannot be null or empty.
      resultVerticesTable - Name of the table to be created with the join results. Cannot be null or empty. This table will be dropped if it exists before the operation starts.
      resultColumnExpressions - An ArrayList of SQL expressions defining the columns (beyond the implicit 'a.id') to include in the result table. See detailed explanation above regarding aliases 'a' and 'b', and the recommended use of 'AS alias'. Cannot be null; elements cannot be null or empty.
      resultVerticesIndexes - An ArrayList of column names in the result table on which to create indexes. These names should match the aliases defined in resultColumnExpressions or the implicit 'id' column. Cannot be null; elements cannot be null or empty.
      stmt - The JDBC Statement object to use for executing SQL commands. Cannot be null.
      Throws:
      SQLException - If any database error occurs during table creation, indexing, data insertion, or cleanup. This could include syntax errors in resultColumnExpressions, missing tables/columns, constraint violations, or connection issues.
    • outerJoinVertices

      public static void outerJoinVertices(String inputSchema, String inputVerticesTable, String otherSchema, String otherVerticesTable, String resultSchema, String resultVerticesTable, ArrayList<String> resultColumnExpressions, ArrayList<String> resultVerticesIndexes, Statement stmt) throws SQLException
      Performs a LEFT OUTER JOIN between two vertex tables based on their 'id' columns, storing the result in a new table.

      The operation joins the inputVerticesTable (left side, alias 'a') with the otherVerticesTable (right side, alias 'b') using the condition a.id = b.id. All rows from the left table ('a') are included in the result. If a row in 'a' has no matching 'id' in 'b', the columns selected from 'b' will be NULL in the result row.

      The result table is created first with the correct structure, then indexes are added, and finally, the data is inserted. If any step of this process fails, an attempt is made to drop the partially created result table to ensure atomicity.

      Result Columns and resultColumnExpressions

      The result table will always contain the 'id' column from the left table ('a'). The resultColumnExpressions list defines any additional columns to be included in the result table. These expressions are used directly in the SELECT list of the underlying SQL query.

      Available Columns for Expressions:

      Within each expression string in resultColumnExpressions, you can refer to columns from both input tables using specific aliases:

      • Alias a: Refers to columns from the left table, specified by inputSchema and inputVerticesTable. Example: a.name, a.value.
      • Alias b: Refers to columns from the right table, specified by otherSchema and otherVerticesTable. Example: b.property, b.score.

      Remember that any column selected via alias b might be NULL in the output if there was no matching row in the right table for a given row from the left table.

      Using Aliases (AS alias):

      Each string in resultColumnExpressions effectively becomes an item in the SQL SELECT list. Therefore, each expression must define the name for the resulting column in the output table.

      • For simple column selections, the database implicitly uses the column name. For example, "a.name" results in a column named name.
      • For clarity, consistency, and especially for any expression more complex than a simple column reference (e.g., using functions like COALESCE, arithmetic operations, or selecting columns with the same name from both tables), you must use the AS keyword to explicitly assign a unique alias (name) to the resulting column.

      Examples of resultColumnExpressions elements:

      • "a.name AS vertex_name" - Selects 'name' from the left table, names the result column 'vertex_name'.
      • "b.score AS property_score" - Selects 'score' from the right table, names the result column 'property_score'. (Will be NULL if no match).
      • "COALESCE(a.value, b.default_value, 0) AS effective_value" - A complex expression using values from both tables, explicitly named 'effective_value'.
      • "a.category" - Selects 'category' from the left table. Might result in a column named 'category' (safer to use AS).
      • "b.status" - Selects 'status' from the right table. Might result in a column named 'status' (safer to use AS).

      Recommendation: It is strongly recommended to use AS alias for all expressions in resultColumnExpressions to ensure predictable column names in the result table and avoid potential ambiguity or database-specific behavior.

      Parameters:
      inputSchema - The schema of the primary (left) input vertices table. Cannot be null or empty.
      inputVerticesTable - The name of the primary (left) input vertices table (aliased as 'a'). Must contain an 'id' column. Cannot be null or empty.
      otherSchema - The schema of the secondary (right) input vertices table to join with. Cannot be null or empty.
      otherVerticesTable - The name of the secondary (right) input vertices table (aliased as 'b'). Must contain an 'id' column. Cannot be null or empty.
      resultSchema - The schema where the resulting joined table will be created. Cannot be null or empty.
      resultVerticesTable - The name for the new table that will store the results of the join. Cannot be null or empty, and must not conflict with input tables.
      resultColumnExpressions - An ArrayList of strings, where each string is a SQL expression defining a column to include in the result table (beyond the mandatory 'a.id'). See documentation section "Result Columns and resultColumnExpressions" for details on available columns (using aliases 'a' and 'b') and the mandatory use of 'AS alias' for complex expressions. Cannot be null; elements cannot be null or empty.
      resultVerticesIndexes - An ArrayList of strings specifying the names of columns in the result table on which to create indexes. These column names must match the 'id' column or the aliases defined in resultColumnExpressions. Cannot be null; elements cannot be null or empty.
      stmt - The JDBC Statement object used to execute the SQL commands. Cannot be null.
      Throws:
      SQLException - If any database operation (table creation, index creation, data insertion, table drop on failure) fails.
    • mask

      public static void mask(String inputSchema, String inputVerticesTable, String inputEdgesTable, String otherSchema, String otherVerticesTable, String otherEdgesTable, String resultSchema, String resultVerticesTable, String resultEdgesTable, ArrayList<String> resultVerticesIndexes, ArrayList<String> resultEdgesIndexes, Statement stmt) throws SQLException
      Creates a subgraph by retaining vertices and edges from an input graph that are also present in an 'other' graph, mimicking the behavior of GraphX's mask operation.

      The operation proceeds as follows:

      1. Result tables for vertices and edges are created with the same columns as the corresponding input tables. Specified indexes are also created on these result tables.
      2. Vertices from the input graph are included in the result vertex table if a vertex with the same ID exists in the 'other' vertex table. Vertex attributes are copied directly from the input vertex table.
      3. Edges from the input graph are included in the result edge table if:
        • An edge with the same source ID (srcid) and destination ID (destid) exists in the 'other' edge table.
        • Both the source vertex and the destination vertex of the edge (referenced by srcid and destid) are present in the newly created result vertex table. This ensures the consistency of the resulting subgraph.
        Edge attributes are copied directly from the input edge table.

      Input validation checks are performed for all parameters, and potential naming conflicts between source and result tables are checked before execution begins.

      Error Handling: If any step of the operation fails after the result tables have been initially created (e.g., due to an SQL error during data population), the function attempts to safely drop the potentially partially populated result vertex and edge tables to avoid leaving incomplete artifacts in the database.

      Parameters:
      inputSchema - The schema containing the input graph tables. Cannot be null or empty.
      inputVerticesTable - The name of the table containing the input vertices. Cannot be null or empty.
      inputEdgesTable - The name of the table containing the input edges. Cannot be null or empty.
      otherSchema - The schema containing the 'other' graph tables used for masking. Cannot be null or empty.
      otherVerticesTable - The name of the table containing the 'other' graph's vertices. Cannot be null or empty.
      otherEdgesTable - The name of the table containing the 'other' graph's edges. Cannot be null or empty.
      resultSchema - The schema where the resulting subgraph tables will be created. Cannot be null or empty.
      resultVerticesTable - The name for the table that will store the resulting vertices. Cannot be null or empty.
      resultEdgesTable - The name for the table that will store the resulting edges. Cannot be null or empty.
      resultVerticesIndexes - A list of column names on which to create indexes for the result vertices table. Cannot be null. Can be empty if no indexes are desired.
      resultEdgesIndexes - A list of column names on which to create indexes for the result edges table. Cannot be null. Can be empty if no indexes are desired.
      stmt - The JDBC Statement object used to execute SQL commands against the database. Cannot be null.
      Throws:
      SQLException - If any database access error occurs during table creation, data population, index creation, or cleanup operations.
    • groupEdges

      public static void groupEdges(String inputSchema, String inputEdgesTable, String resultSchema, String resultEdgesTable, ArrayList<String> resultColumnExpressions, ArrayList<String> resultEdgesIndexes, Statement stmt) throws SQLException
      Groups duplicate edges from an input table into a new result table using SQL aggregation.

      This method creates a new table (`resultEdgesTable`) containing unique pairs of `srcid` and `destid` from the `inputEdgesTable`. For each unique pair, it calculates additional columns based on the aggregate expressions provided in `resultColumnExpressions`. These expressions use SQL aggregate functions (e.g., COUNT, SUM, AVG, MIN, MAX) to combine data from the original rows that share the same `srcid` and `destid`.

      The process involves:

      1. Performing sanity checks on input parameters.
      2. Checking for potential naming conflicts between input and result tables.
      3. Creating the result table structure based on `srcid`, `destid`, and the specified result column aggregate expressions.
      4. Creating specified indexes on the result table.
      5. Populating the result table by selecting and grouping data from the input table using the specified aggregate expressions.

      If any step of the table creation or population fails, the method attempts to drop the potentially partially created `resultEdgesTable` to ensure atomicity and avoid leaving incomplete artifacts.

      Using resultColumnExpressions

      The resultColumnExpressions list allows you to define additional columns for the resultEdgesTable. These columns are computed by aggregating data from the rows in the inputEdgesTable that share the same srcid and destid.

      Expression Content and Available Data

      Each string in the resultColumnExpressions list becomes part of the SELECT clause in the underlying SQL query. Because the query uses GROUP BY srcid, destid, standard SQL rules apply: any expression in the select list must either be an aggregate function or one of the grouping columns (srcid or destid).

      Therefore, each expression you provide **must use an SQL aggregate function** (e.g., COUNT, SUM, AVG, MIN, MAX). These aggregate functions operate on the values from the original inputEdgesTable rows that are grouped together for a given unique pair of srcid and destid. The aggregate functions can operate on any column present in the inputEdgesTable.

      Column Aliases in Expressions

      You do not need to use specific aliases when referring to the *input* columns within your aggregate function. For example, if the input table has a column named weight, you can directly use SUM(weight).

      Requirement for AS alias

      Each expression string provided in resultColumnExpressions MUST end with AS alias. This alias defines the name of the corresponding computed column in the newly created resultEdgesTable. The database requires this alias to name the result of the aggregate function.

      For example, if your inputEdgesTable has columns srcid, destid, weight, and timestamp, valid expressions could be:

      • "COUNT(*) AS edge_count" (to count the number of original edges grouped)
      • "SUM(weight) AS total_weight" (to sum the weights of the grouped edges)
      • "AVG(weight) AS average_weight"
      • "MAX(timestamp) AS latest_timestamp"
      An expression like "weight AS original_weight" would be **invalid** in standard SQL in this context, because weight is not used within an aggregate function and it's not one of the GROUP BY columns (srcid, destid).
      Parameters:
      inputSchema - The schema name of the input edges table. Cannot be null or empty.
      inputEdgesTable - The name of the input table containing edges. This table MUST have columns named srcid and destid. It should also contain any other columns referenced within the aggregate functions in resultColumnExpressions. Cannot be null or empty.
      resultSchema - The schema name for the result table. Cannot be null or empty.
      resultEdgesTable - The name of the table to be created, which will store the grouped edges. Cannot be null or empty. This table will contain `srcid`, `destid`, and columns defined by the aliases in `resultColumnExpressions`.
      resultColumnExpressions - An ArrayList of strings, where each string is a SQL aggregate expression (e.g., COUNT(*), SUM(column_name)) operating on columns from the inputEdgesTable. Each expression MUST include an AS alias clause to name the resulting column in the resultEdgesTable. Cannot be null, but can be empty. If empty, the result table will only have `srcid` and `destid`.
      resultEdgesIndexes - An ArrayList of strings, where each string is a column name in the *result* table (i.e., `srcid`, `destid`, or one of the aliases defined in `resultColumnExpressions`) on which a database index should be created. Cannot be null, but can be empty.
      stmt - The JDBC Statement object used to execute SQL queries. Cannot be null.
      Throws:
      SQLException - If any database error occurs during table creation, index creation, data insertion, or cleanup.
    • aggregateMessages

      public static void aggregateMessages(String inputSchema, String inputVerticesTable, String inputEdgesTable, String resultSchema, String resultVerticesTable, String sendToSourceExpr, String sendToDestExpr, String aggregateExpr, ArrayList<String> resultVerticesIndexes, Statement stmt) throws SQLException
      Aggregates messages passed between connected vertices in a graph and stores the results in a new table.

      This method simulates a message-passing process on a graph defined by input vertices and edges. For each edge connecting a source vertex 'a' and a destination vertex 'c' via edge 'b', messages can be generated based on sendToSourceExpr (message sent to 'a') and/or sendToDestExpr (message sent to 'c'). These generated messages (internally aliased as 'msg') are then grouped by the target vertex ID ('id') and aggregated using the aggregateExpr. The final result, containing the vertex ID and the aggregated message, is stored in the specified result table.

      The process involves:

      1. Generating messages based on the send expressions for each edge.
      2. Grouping these messages by the target vertex ID.
      3. Applying the aggregation expression to the grouped messages.
      4. Creating the result table and inserting the aggregated results.
      5. Creating specified indexes on the result table.

      If the operation fails at any point after starting table creation, it attempts to drop the potentially partially created result table to ensure cleanup.

      Important Note on Expressions and Aliases: Within the sendToSourceExpr and sendToDestExpr, the context is always a single edge traversal. You can refer to columns from the source vertex using the alias a, columns from the edge itself using the alias b, and columns from the destination vertex using the alias c. For example, to access a property named 'value' from the source vertex, you would use a.value. To access the edge weight (assuming an edge column named 'weight'), you would use b.weight.

      The aggregateExpr operates on the messages generated by the send expressions. These generated messages are internally collected under the column name msg. Therefore, your aggregateExpr typically needs to reference msg, e.g., SUM(msg), MAX(msg).

      Parameters:
      inputSchema - The schema containing the input graph tables. Cannot be null or empty.
      inputVerticesTable - The name of the table containing the graph vertices. Must have at least an 'id' column. Cannot be null or empty.
      inputEdgesTable - The name of the table containing the graph edges. Must have 'srcid' and 'destid' columns referencing vertex IDs. Cannot be null or empty.
      resultSchema - The schema where the result table will be created. Cannot be null or empty.
      resultVerticesTable - The name of the table to be created, containing the aggregated results. This table will have an 'id' column (the vertex ID) and a column for the aggregated message (named based on the alias in aggregateExpr, or 'aggregated_message' by default). Cannot be null or empty. Must not conflict with input table names.
      sendToSourceExpr - A SQL expression defining the message to be sent *to the source* vertex ('a') of an edge ('b'). This expression is evaluated in the context of the join: inputVerticesTable a JOIN inputEdgesTable b ON a.id = b.srcid JOIN inputVerticesTable c ON b.destid = c.id. You can use aliases a, b, and c to refer to columns from the source vertex, edge, and destination vertex, respectively (e.g., c.property + b.weight). If this expression evaluates to NULL for an edge, no message is sent to the source for that edge. Can be null or empty, but at least one of sendToSourceExpr or sendToDestExpr must be non-empty. The result of this expression becomes the 'msg' value associated with the source vertex's 'id' ( b.srcid).
      sendToDestExpr - A SQL expression defining the message to be sent *to the destination* vertex ('c') of an edge ('b'). Evaluated in the same join context as sendToSourceExpr (a JOIN b JOIN c). You can use aliases a, b, and c. (e.g., a.property * b.weight). If this expression evaluates to NULL for an edge, no message is sent to the destination for that edge. Can be null or empty, but at least one of sendToSourceExpr or sendToDestExpr must be non-empty. The result of this expression becomes the 'msg' value associated with the destination vertex's 'id' (b.destid).
      aggregateExpr - A SQL aggregate expression defining how messages arriving at the *same* vertex ID are combined. This expression *must* operate on the generated message values, which are internally available under the column name msg. Example: SUM(msg), MIN(msg). You can optionally provide an alias for the result column using standard SQL 'AS' syntax (e.g., "SUM(msg) AS total_value"). If no alias is provided, the result column will be named "aggregated_message". This alias is used as the column name in the resultVerticesTable. Cannot be null or empty. Must be a valid SQL aggregate function call.
      resultVerticesIndexes - An ArrayList of column names in the resultVerticesTable on which to create indexes. Allowed values are "id" and the alias used for the aggregateExpr result (either explicitly provided via 'AS' or the default "aggregated_message"). Cannot be null. List elements cannot be null/empty.
      stmt - The JDBC Statement object to use for executing the SQL commands. Cannot be null.
      Throws:
      SQLException - If any database error occurs during table creation, insertion, or index creation. Also thrown if required arguments are null/empty, if both send expressions are empty, if specified index columns are invalid, if the result table name conflicts with input tables, or if the provided SQL expressions are invalid.
    • collectNeighbors

      public static void collectNeighbors(String inputSchema, String inputVerticesTable, String inputEdgesTable, String resultSchema, String resultVerticesTable, OCGraph.EdgeDirection direction, ArrayList<String> resultVerticesIndexes, Statement stmt) throws SQLException
      Collects neighbor information for each vertex in a graph stored in database tables and stores the results in a new table.

      This method queries a graph represented by vertex and edge tables. Based on the specified OCGraph.EdgeDirection, it identifies neighbors for each vertex. The neighbor information (ID and all other columns from the input vertices table) is packaged into a tuple. These tuples are then aggregated into an array for each source vertex.

      A new result table is created with two primary columns:

      • 'id': The ID of the source vertex.
      • A column (typically named 'neighbors'): An array containing tuples representing the neighbors. Each tuple structure mirrors the input vertices table row (id + other columns).
      Indexes can optionally be created on the 'id' column and the neighbor array column in the result table.

      If any step of the operation fails *after* the result table has been initially created (e.g., during index creation or data insertion), the method attempts to drop the potentially incomplete or incorrectly populated result table to ensure atomicity.

      Parameters:
      inputSchema - The schema containing the input vertices and edges tables. Cannot be null or empty.
      inputVerticesTable - The name of the table containing the input vertices. Must contain at least an 'id' column. Cannot be null or empty.
      inputEdgesTable - The name of the table containing the input edges. Must contain 'srcid' and 'destid' columns referencing vertex IDs. Cannot be null or empty.
      resultSchema - The schema where the result table will be created. Cannot be null or empty.
      resultVerticesTable - The name of the table to create for storing the results. This table should not conflict with input tables. Cannot be null or empty.
      direction - The direction of edges to follow (IN, OUT, or BOTH) when collecting neighbors. See OCGraph.EdgeDirection. Cannot be null.
      resultVerticesIndexes - A list containing the names of columns in the *result* table on which to create indexes. Valid names are 'id' and the name of the aggregated neighbor array column (determined dynamically, typically 'neighbors'). Cannot be null; list elements must be valid column names.
      stmt - The JDBC Statement object used to execute SQL queries. Cannot be null.
      Throws:
      SQLException - If a database access error occurs during query execution, table creation, index creation, or data insertion. Also thrown if input arguments are invalid (e.g., null/empty strings, invalid index column names specified in resultVerticesIndexes, table name conflicts, non-existent input tables/columns) or if an internal error occurs (e.g., unexpected EdgeDirection).
      See Also:
    • collectEdges

      public static void collectEdges(String inputSchema, String inputVerticesTable, String inputEdgesTable, String resultSchema, String resultVerticesTable, OCGraph.EdgeDirection direction, ArrayList<String> resultVerticesIndexes, Statement stmt) throws SQLException
      Collects edge information associated with each vertex in a graph stored in database tables and stores the results in a new table.

      This method queries a graph represented by vertex and edge tables. For each vertex, it identifies connected edges based on the specified OCGraph.EdgeDirection. All columns from the identified edges in the input edges table are packaged into a tuple. These edge tuples are then aggregated into an array for each vertex.

      A new result table is created with two primary columns:

      • 'id': The ID of the vertex (the source or destination, depending on the edge direction).
      • 'edges': An array containing tuples, where each tuple represents a connected edge and includes all columns from the original edge row in the input edges table.
      Indexes can optionally be created on the 'id' column and the 'edges' array column in the result table.

      If any step of the operation fails *after* the result table has been initially created (e.g., during index creation or data insertion), the method attempts to drop the potentially incomplete or incorrectly populated result table to ensure atomicity.

      Parameters:
      inputSchema - The schema containing the input vertices and edges tables. Cannot be null or empty.
      inputVerticesTable - The name of the table containing the input vertices. Must contain at least an 'id' column. Cannot be null or empty.
      inputEdgesTable - The name of the table containing the input edges. Must contain 'srcid' and 'destid' columns referencing vertex IDs, and potentially other edge attribute columns. Cannot be null or empty, and must contain at least one column.
      resultSchema - The schema where the result table will be created. Cannot be null or empty.
      resultVerticesTable - The name of the table to create for storing the results. This table should not conflict with input tables. Cannot be null or empty. The resulting table will contain an 'id' column and an 'edges' column.
      direction - The direction of edges to collect relative to each vertex (IN, OUT, or BOTH). See OCGraph.EdgeDirection. Cannot be null.
      resultVerticesIndexes - A list containing the names of columns in the *result* table on which to create indexes. Valid names are 'id' and 'edges'. Cannot be null; list elements must be valid column names.
      stmt - The JDBC Statement object used to execute SQL queries. Cannot be null.
      Throws:
      SQLException - If a database access error occurs during query execution, table creation, index creation, or data insertion. Also thrown if input arguments are invalid (e.g., null/empty strings, invalid index column names specified in resultVerticesIndexes, table name conflicts, non-existent input tables/columns, input edge table having no columns) or if an internal error occurs (e.g., unexpected EdgeDirection).
      See Also:
    • labelPropagation

      public static void labelPropagation(String inputSchema, String inputVerticesTable, String inputEdgesTable, String resultSchema, String resultVerticesTable, int maxIterations, ArrayList<String> resultVerticesIndexes, Statement stmt) throws SQLException
      Performs the Label Propagation Algorithm (LPA) iteratively on graph data stored in database tables, assigning community labels to vertices.

      The algorithm initializes each vertex's label to its own ID. Then, for a specified number of iterations, each vertex updates its label to the most frequent label among its neighbors. Ties are broken by choosing the numerically smaller label ID. The process uses SQL queries executed via the provided Statement.

      Temporary tables are created within the resultSchema to store intermediate results for each iteration. These tables are named uniquely using a UUID suffix (e.g., lpa_temp_xxxxxxxx_iterN). The final results are stored in the specified resultVerticesTable within the resultSchema.

      This method includes robust cleanup logic:

      • Temporary tables are dropped upon successful completion.
      • If an error occurs during execution, all created temporary tables AND the potentially partially created result table are dropped to ensure a clean state.
      Indexes can be optionally created on the final resultVerticesTable for faster lookups on the specified columns ('id' or 'label').

      Note: Isolated vertices (those with no edges) will retain their initial label (their own ID). Input tables are assumed to exist and be accessible. The target result table should not conflict with input table names.

      Parameters:
      inputSchema - The name of the database schema containing the input vertex and edge tables. Cannot be null or empty.
      inputVerticesTable - The name of the table containing vertex data. Expected to have at least an id column (BIGINT). Cannot be null or empty.
      inputEdgesTable - The name of the table containing edge data. Expected to have srcid and destid columns representing the edges (BIGINT). Cannot be null or empty.
      resultSchema - The name of the database schema where the result table and temporary iteration tables will be created. Cannot be null or empty.
      resultVerticesTable - The name of the table to be created in the resultSchema storing the final vertex labels. This table will have columns id (BIGINT NOT NULL) and label (BIGINT). Cannot be null or empty.
      maxIterations - The maximum number of iterations to perform. Must be >= 1.
      resultVerticesIndexes - An ArrayList of column names ("id" and/or "label") on which to create indexes in the final resultVerticesTable. If empty or null, no indexes are created besides potential primary keys defined by the database. Cannot be null, but can be empty. Column names within the list cannot be null/empty and must be either "id" or "label" (case-insensitive).
      stmt - A Statement object connected to the target database, used to execute all SQL commands. Must not be null and should be associated with an active connection. The caller is responsible for managing the Statement's lifecycle (creation, closing).
      Throws:
      SQLException - If any database access error occurs during table creation, data manipulation (INSERT, SELECT), index creation, or table dropping. Also thrown if input/result table name conflicts exist, or if an invalid column name is provided in resultVerticesIndexes.
    • pregel

      public static void pregel(String inputSchema, String inputVerticesTable, String inputEdgesTable, String resultSchema, String resultVerticesTable, String initializerExpr, String sendToSourceExpr, String sendToDestExpr, String aggregateExpr, String updaterExpr, int maxIterations, ArrayList<String> resultVerticesIndexes, Statement stmt) throws SQLException
      Performs a Pregel-like graph computation using SQL.

      This method implements the core logic of a vertex-centric graph processing algorithm, often referred to as Pregel. It iteratively computes vertex states based on messages passed along edges. The computation involves initializing vertex states, sending messages between vertices based on their state and edge connections, aggregating incoming messages for each vertex, and updating the vertex state based on its current state and aggregated messages.

      The process creates and manages temporary tables for intermediate states and messages, ensuring they are cleaned up even if the operation fails. The final result is stored in the specified result table.

      Parameters:
      inputSchema - The schema containing the input vertex and edge tables. Cannot be null or empty.
      inputVerticesTable - The name of the table containing the initial vertices. Must have at least an 'id' column. Cannot be null or empty.
      inputEdgesTable - The name of the table containing the graph edges. Must have 'srcid' and 'destid' columns corresponding to vertex IDs. Cannot be null or empty.
      resultSchema - The schema where the result table and temporary tables will be created. Cannot be null or empty.
      resultVerticesTable - The name of the table to store the final vertex states. This table will be created or replaced. Cannot be null or empty. It will contain 'id' and 'result' columns.
      initializerExpr - An SQL expression defining the initial state for each vertex.
      • Purpose: Computes the starting value for the 'state' column for every vertex.
      • Context: Used in a SELECT id, <initializerExpr> FROM inputVerticesTable statement.
      • Available Columns: Can reference any column from the inputVerticesTable.
      • Aliases: No specific table aliases are needed within the expression itself.
      • Syntax: Any valid SQL expression that returns a single value per vertex. The result of this expression will implicitly be named 'state' (the function appends AS state if not present).
      • Example: "CASE WHEN type = 'seed' THEN 1.0 ELSE 0.0 END"
      sendToSourceExpr - An SQL expression defining the message to send *from* the source vertex *to* the destination vertex, where the message is associated with the source vertex ID in the intermediate message table. Can be null or empty if no messages are sent in this direction.
      • Purpose: Computes a message based on the states of connected vertices and the edge connecting them. This message is conceptually sent *along* the edge but grouped by the *source* vertex ID (`b.srcid`) in the temporary message table.
      • Context: Used in a SELECT b.srcid AS id, (<sendToSourceExpr>) AS msg FROM currentStateTable a JOIN inputEdgesTable b ON a.id = b.srcid JOIN currentStateTable c ON b.destid = c.id WHERE (<sendToSourceExpr>) IS NOT NULL.
      • Available Columns: Must use aliases:
        • a: Represents the source vertex's current state table row (columns: id, state).
        • b: Represents the edge table row (columns: srcid, destid, potentially others).
        • c: Represents the destination vertex's current state table row (columns: id, state).
      • Aliases: You *must* use the aliases a, b, and c to refer to the tables (e.g., a.state, c.state, b.weight).
      • Syntax: Any valid SQL expression returning a single value (the message). Crucially, if no message should be sent for a given edge/vertex combination in this direction, this expression *must* evaluate to NULL. Returning NULL prevents a message row from being created for that specific edge interaction.
      • Early Termination: If *all* evaluations of both sendToSourceExpr and sendToDestExpr result in NULL during an iteration, no messages are generated, and the computation terminates early. Structure your expression to return NULL when a message is not needed to potentially speed up convergence.
      • Example: "CASE WHEN a.state > c.state THEN a.state / 2 ELSE NULL END" (Sends half the source state if it's greater than the destination state, associated with the source ID).
      sendToDestExpr - An SQL expression defining the message to send *from* the source vertex *to* the destination vertex, where the message is associated with the destination vertex ID in the intermediate message table. Can be null or empty if no messages are sent in this direction.
      • Purpose: Computes a message based on the states of connected vertices and the edge connecting them. This message is conceptually sent *along* the edge and grouped by the *destination* vertex ID (`b.destid`) in the temporary message table.
      • Context: Used in a SELECT b.destid AS id, (<sendToDestExpr>) AS msg FROM currentStateTable a JOIN inputEdgesTable b ON a.id = b.srcid JOIN currentStateTable c ON b.destid = c.id WHERE (<sendToDestExpr>) IS NOT NULL. (Potentially combined with sendToSourceExpr via UNION ALL).
      • Available Columns: Must use aliases (same as sendToSourceExpr):
        • a: Represents the source vertex's current state table row (columns: id, state).
        • b: Represents the edge table row (columns: srcid, destid, potentially others).
        • c: Represents the destination vertex's current state table row (columns: id, state).
      • Aliases: You *must* use the aliases a, b, and c (e.g., a.state, c.state, b.weight).
      • Syntax: Any valid SQL expression returning a single value (the message). Must evaluate to NULL if no message should be sent for a given edge/vertex combination in this direction.
      • Early Termination: Same as sendToSourceExpr. If all message expressions evaluate to NULL, computation stops.
      • Example: "a.state * b.weight" (Sends the source state multiplied by edge weight, associated with the destination ID).
      aggregateExpr - An SQL aggregation expression to combine incoming messages for a vertex. Cannot be null or empty.
      • Purpose: Combines all messages ('msg') received by a single vertex ('id') in an iteration into one value.
      • Context: Used in a SELECT id, <aggregateExpr> FROM messageTable GROUP BY id. The message table contains columns id (the recipient vertex) and msg (the message content from send expressions).
      • Available Columns: Can reference the msg column from the temporary message table generated by the send expressions.
      • Aliases: No specific table aliases are needed within the expression (just use msg). The result of the aggregation will implicitly be named aggregated_message (the function appends AS aggregated_message if not present). The subquery containing this aggregation is aliased as m in the next step.
      • Syntax: Must be a valid SQL aggregation function applied to the msg column (e.g., SUM(msg), ARRAY_AGG(msg), MIN(msg), MAX(msg)).
      • Example: "SUM(msg)"
      updaterExpr - An SQL expression to compute the next state of a vertex based on its current state and the aggregated messages it received. Cannot be null or empty.
      • Purpose: Calculates the new 'state' for a vertex for the next iteration.
      • Context: Used in a SELECT s.id, (<updaterExpr>) AS state FROM currentStateTable s LEFT JOIN aggregatedMessagesSubquery m ON s.id = m.id.
      • Available Columns: Must use aliases:
        • s: Represents the vertex's current state table row (columns: id, state).
        • m: Represents the aggregated messages subquery row for that vertex (columns: id, aggregated_message). Note: m.aggregated_message will be NULL if the vertex received no messages in this iteration due to the LEFT JOIN. Handle this possibility (e.g., using COALESCE).
      • Aliases: You *must* use the aliases s and m (e.g., s.state, m.aggregated_message).
      • Syntax: Any valid SQL expression that computes the new state. The result becomes the 'state' column for the next iteration. Handle potential NULL values for m.aggregated_message (e.g., COALESCE(m.aggregated_message, 0) if combining numerically).
      • Example: "COALESCE(m.aggregated_message, s.state)" (Keep current state if no messages received, otherwise use aggregated message as new state). Or "0.15 + 0.85 * COALESCE(m.aggregated_message, 0)" (PageRank-style update).
      maxIterations - The maximum number of iterations to perform. Must be >= 1. Computation may stop earlier if no messages are sent or if the vertex states do not change between iterations.
      resultVerticesIndexes - A list of column names on which to create indexes in the final resultVerticesTable. Valid column names are "id" and "result". Cannot be null. Indexes can speed up subsequent queries on the result table.
      stmt - The JDBC Statement object used to execute the SQL commands. Cannot be null.
      Throws:
      SQLException - If any database error occurs during table creation, manipulation, or querying, or if input validation fails (e.g., invalid index column names).
      See Also:
    • connectedComponents

      public static void connectedComponents(String inputSchema, String inputVerticesTable, String inputEdgesTable, String resultSchema, String resultVerticesTable, int maxIterations, ArrayList<String> resultVerticesIndexes, Statement stmt) throws SQLException
      Finds the connected components of a graph stored in database tables using a Pregel-based algorithm.

      This method configures and executes a general-purpose Pregel computation (via the pregel method) specifically for the connected components problem. The algorithm works by label propagation:

      • Each vertex initially considers itself to be in its own component (component ID = vertex ID).
      • In each iteration, vertices send their current component ID to their neighbors.
      • Vertices update their component ID to the minimum ID they have seen (either their own previous ID or one received from a neighbor).
      This process continues until no vertex updates its component ID, or the specified maximum number of iterations is reached. The component ID for each vertex ultimately becomes the minimum vertex ID within its connected component.

      The input graph is defined by vertex and edge tables within a specified schema. The results are stored in a new table within a specified schema. This result table will contain two columns: 'id' (the vertex ID) and 'result' (the calculated component ID for that vertex). Optional indexes can be created on the result table for performance.

      Parameters:
      inputSchema - The schema name for the input graph tables.
      inputVerticesTable - The table name containing the vertices of the input graph. Expected to have an 'id' column.
      inputEdgesTable - The table name containing the edges of the input graph. Expected to have 'srcid' and 'destid' columns referencing vertex IDs.
      resultSchema - The schema name where the result table will be created.
      resultVerticesTable - The table name for storing the connected components results. It will have 'id' and 'result' columns.
      maxIterations - The maximum number of Pregel iterations to perform. The algorithm might converge in fewer iterations. Must be >= 1.
      resultVerticesIndexes - A list of column names (e.g., "id", "result") on which to create indexes in the result table for faster subsequent lookups. Can be empty if no indexes are needed.
      stmt - The JDBC Statement object used to execute SQL commands against the database. Cannot be null.
      Throws:
      SQLException - If a database access error occurs during the Pregel execution or while creating/populating the result table.
      See Also:
    • stronglyConnectedComponents

      public static void stronglyConnectedComponents(String inputSchema, String inputVerticesTable, String inputEdgesTable, String resultSchema, String resultVerticesTable, ArrayList<String> resultVerticesIndexes, Statement stmt) throws SQLException
      Computes the Strongly Connected Components (SCCs) of a directed graph represented by vertex and edge tables in a database.

      This method implements a recursive algorithm to partition the graph and identify sets of vertices where every vertex is reachable from every other vertex within the same set. It operates directly on database tables using the provided JDBC Statement.

      Temporary tables are created within the specified resultSchema during execution to store intermediate results. These tables are automatically dropped upon completion or if an error occurs. Ensure the provided database user has permissions to create and drop tables in the resultSchema.

      The final result is populated into the resultVerticesTable within the resultSchema. This table maps each vertex ID to a representative ID for its component.

      The result table (resultVerticesTable) created by this function will have the following structure:

      • id BIGINT NOT NULL: The identifier of a vertex from the input graph[1].
      • component BIGINT NOT NULL: The identifier of the strongly connected component to which the vertex belongs. This identifier is the minimum vertex ID found within that specific component[1].
      Parameters:
      inputSchema - The name of the database schema containing the input graph tables. Must not be null or empty[1].
      inputVerticesTable - The name of the table containing the vertices of the graph, located in the inputSchema. This table is expected to have at least an 'id' column (assumed BIGINT based on usage) uniquely identifying each vertex[1]. Must not be null or empty[1].
      inputEdgesTable - The name of the table containing the directed edges of the graph, located in the inputSchema. This table is expected to have 'srcid' and 'destid' columns (assumed BIGINT based on usage) representing the source and destination vertex IDs for each edge[1]. Must not be null or empty[1].
      resultSchema - The name of the database schema where the final result table and all temporary tables will be created. Must not be null or empty[1].
      resultVerticesTable - The desired name for the output table that will store the mapping of each vertex ID to its component ID. This table will be created in the resultSchema[1]. It must not conflict with input table names[1]. Must not be null or empty[1].
      resultVerticesIndexes - A list specifying which columns ('id' and/or 'component') in the resultVerticesTable should have database indexes created on them for potentially faster lookups. Only "id" and "component" (case-insensitive) are valid entries[1]. Must not be null[1].
      stmt - The active JDBC Statement object used to execute all SQL commands against the database. Must not be null[1].
      Throws:
      SQLException - If any database access error occurs, including issues with table creation/deletion, query execution, or if invalid parameters (e.g., invalid index column names, null/empty schema/table names) are provided[1].
    • shortestPaths

      public static void shortestPaths(String inputSchema, String inputVerticesTable, String inputEdgesTable, String resultSchema, String resultTable, List<Long> landmarks, String edgeWeightColumn, int maxIterations, ArrayList<String> resultIndexes, Statement stmt) throws SQLException
      Computes the shortest paths from all vertices in the graph to a specified set of landmark vertices.

      This method implements a shortest path algorithm based on iterative relaxation (similar to Bellman-Ford, but adapted for multiple destinations/landmarks simultaneously) to find the shortest distance from every vertex in the graph to each of the provided landmarks. It assumes that all edge weights are non-negative. If no edge weight column is specified, edges are assumed to have a uniform weight of 1.0.

      The computation proceeds iteratively, refining distance estimates in each step. It terminates either when no shorter paths are found in an iteration or when the specified maximum number of iterations (`maxIterations`) is reached.

      The results are stored in a specified database table. Only paths with finite distances (i.e., reachable landmarks from a source vertex) are included in the final result table. Temporary tables are created during the computation and are cleaned up afterwards. ## Result Table Columns The table created at `resultSchema.resultTable` will have the following columns:

      • srcid (BIGINT): The ID of the source vertex.
      • destid (BIGINT): The ID of the landmark vertex (the destination of the path).
      • distance (DOUBLE PRECISION): The computed shortest distance from srcid to destid.
      Parameters:
      inputSchema - The schema containing the input vertices and edges tables. Cannot be null or empty.
      inputVerticesTable - The name of the table containing the vertex IDs. Expected column: `id` (BIGINT). Cannot be null or empty.
      inputEdgesTable - The name of the table containing the graph edges. Expected columns: `srcid` (BIGINT), `destid` (BIGINT), and optionally the weight column specified by `edgeWeightColumn`. Cannot be null or empty.
      resultSchema - The schema where the result table will be created. Cannot be null or empty.
      resultTable - The name of the table to store the shortest path results. Cannot be null or empty. This table must not conflict with input table names.
      landmarks - A non-null, non-empty list of vertex IDs (BIGINT) to be used as landmarks. Shortest paths will be computed *to* these vertices. Individual landmark IDs within the list cannot be null.
      edgeWeightColumn - The name of the column in the `inputEdgesTable` containing non-negative edge weights (numeric type). If null or empty, a default weight of 1.0 is assumed for all edges.
      maxIterations - The maximum number of iterations for the relaxation algorithm. Must be a positive integer (>= 1).
      resultIndexes - A non-null list of column names ('srcid', 'destid', 'distance') on which to create indexes in the result table for potentially faster querying. Can be empty if no indexes are desired. Column names must be valid ('srcid', 'destid', or 'distance').
      stmt - The JDBC Statement object used to execute SQL queries. Cannot be null.
      Throws:
      SQLException - if any input parameters are invalid (null, empty, non-positive, etc.), if the landmarks list is empty or contains nulls, if invalid index columns are specified, if the result table name conflicts with input table names, or if a database access error occurs during query execution, table manipulation, or cleanup.
    • staticPageRank

      public static void staticPageRank(String inputSchema, String inputVerticesTable, String inputEdgesTable, String resultSchema, String resultVerticesTable, int numIterations, double resetProb, ArrayList<String> resultVerticesIndexes, Long personalizationSrcId, Statement stmt) throws SQLException
      Computes static PageRank scores for vertices in a graph, optionally personalized towards a specific source vertex, and stores the results in a new vertices table.

      ## Algorithm Overview: 1. **Validation:** Checks that all input parameters (schemas, tables, statement, numerical constraints) are valid. Throws SQLException on validation failure. 2. **Initialization:** - Creates uniquely named temporary tables for storing ranks during iterations. - Calculates the total number of vertices (N). - If the graph is empty (N=0), creates an empty result table with the correct schema and returns. - Initializes the rank for each vertex: - **Standard PageRank:** Each vertex starts with a rank of `1.0 / N`. - **Personalized PageRank:** The `personalizationSrcId` vertex starts with a rank of `1.0`, and all other vertices start with `0.0`. 3. **Iteration:** Performs `numIterations` iterations of the PageRank update rule: - **OPTIMIZATION:** Instead of copying tables each iteration, this implementation swaps the roles of two temporary tables (`current` and `previous`) by swapping the variables that hold their names. This is a near-instantaneous metadata change. - The core PageRank formula implemented is: `NewRank(v) = ( (1 - resetProb) / N ) + resetProb * Sum( Rank(u) / OutDegree(u) )` for all vertices `u` that link to `v`. - **Dangling Node Handling:** Rank from dangling nodes (vertices with no outgoing edges) is collected and redistributed evenly among all nodes: `DanglingContribution = resetProb * (TotalDanglingRank / N)` - The SQL implementation combines these steps using Common Table Expressions (CTEs). 4. **Final Result:** - After the final iteration, the computed ranks (from the final 'current ranks' table) are joined with the original input vertices table (`inputVerticesTable`). - The result, containing all original vertex columns plus the new `pagerank` column, is stored in the specified `resultVerticesTable`. 5. **Indexing:** Creates indexes on the `resultVerticesTable` as specified by `resultVerticesIndexes`. 6. **Cleanup:** Ensures that all temporary intermediate tables created during the process are dropped, both on successful completion and in case of exceptions during execution.

      ## Result Table Columns: The created `resultVerticesTable` in `resultSchema` will contain:

      • All columns present in the original `inputVerticesTable`.
      • A new column named `pagerank` of type `DOUBLE PRECISION`, containing the calculated static PageRank score for each vertex.
      Parameters:
      inputSchema - The database schema containing the input graph tables. Cannot be null or empty.
      inputVerticesTable - The name of the table containing vertex information. Must contain at least an 'id' column representing the unique vertex identifier (assumed to be compatible with edge table IDs). Cannot be null or empty.
      inputEdgesTable - The name of the table containing the directed edges of the graph. Must contain at least 'srcid' (source vertex ID) and 'destid' (destination vertex ID) columns. Cannot be null or empty.
      resultSchema - The database schema where the result table will be created. Cannot be null or empty.
      resultVerticesTable - The name for the new table that will store the results. This table will contain all columns from `inputVerticesTable` plus a 'pagerank' column. Cannot be null or empty, and must not conflict with input tables.
      numIterations - The number of PageRank iterations to perform. Must be a positive integer.
      resetProb - The reset probability, also known as the damping factor (often denoted as 'alpha' or 'd'). Represents the probability at each step of continuing to follow graph links (as opposed to teleporting to a random vertex). Must be between 0.0 and 1.0 (inclusive). A typical value is 0.85.
      resultVerticesIndexes - An optional list of column names from the `resultVerticesTable` on which to create database indexes after the table is populated. Pass an empty list if no indexes are needed. Cannot be null.
      personalizationSrcId - If not null, computes Personalized PageRank biased towards this vertex ID. If null, computes standard (non-personalized) PageRank where the teleportation is uniform.
      stmt - The JDBC Statement object used to execute SQL commands against the database. Cannot be null.
      Throws:
      SQLException - If any input parameters are invalid, if a database error occurs during execution (e.g., table creation/access failure, SQL syntax error), or if an internal error occurs (like an intermediate table unexpectedly disappearing).
    • dynamicPageRank

      public static void dynamicPageRank(String inputSchema, String inputVerticesTable, String inputEdgesTable, String resultSchema, String resultVerticesTable, double tolerance, double resetProb, ArrayList<String> resultVerticesIndexes, Long personalizationSrcId, Statement stmt) throws SQLException
      Computes PageRank scores for vertices in a graph, optionally personalized, by iterating until the scores converge within a specified tolerance. Results are stored in a new vertices table. Intermediate tables are cleaned up.

      ## Algorithm Overview: 1. **Validation:** Checks that all input parameters (schemas, tables, statement, numerical constraints) are valid. Throws SQLException on validation failure. 2. **Initialization:** - Creates uniquely named temporary tables for storing ranks during iterations. - Calculates the total number of vertices (N). - If the graph is empty (N=0), creates an empty result table with the correct schema and returns. - Initializes the rank for each vertex: - **Standard PageRank:** Each vertex starts with a rank of `1.0 / N`. - **Personalized PageRank:** The `personalizationSrcId` vertex starts with a rank of `1.0`, and all other vertices start with `0.0`. 3. **Iteration until Convergence:** Performs iterations of the PageRank update rule. - **OPTIMIZATION:** Instead of copying tables each iteration, this implementation swaps the roles of two temporary tables (`current` and `previous`) by swapping the variables that hold their names. This is a near-instantaneous metadata change. - **Convergence Check:** After each iteration, it calculates the sum of the absolute differences (`delta`) between the rank of each vertex in the current iteration and the previous iteration. The loop continues as long as `delta` is greater than the specified `tolerance`. - **Update Rule:** New ranks are computed based on the ranks from the previous iteration. 4. **Final Result:** - Once convergence is reached, the final computed ranks are joined with the original input vertices table (`inputVerticesTable`). - The result is stored in the specified `resultVerticesTable`. 5. **Indexing:** Creates indexes on the `resultVerticesTable` as specified by `resultVerticesIndexes`. 6. **Cleanup:** Ensures that all temporary intermediate tables created during the process are dropped.

      Parameters:
      inputSchema - The database schema containing the input graph tables. Cannot be null or empty.
      inputVerticesTable - The name of the table containing vertex information. Must contain at least an 'id' column. Cannot be null or empty.
      inputEdgesTable - The name of the table containing the directed edges. Must contain 'srcid' and 'destid' columns. Cannot be null or empty.
      resultSchema - The database schema where the result table will be created. Cannot be null or empty.
      resultVerticesTable - The name for the new table storing results (original vertex columns + 'pagerank'). Cannot be null or empty, and must not conflict with input tables.
      tolerance - The convergence tolerance. Iterations stop when the sum of absolute differences between vertex ranks in consecutive iterations is less than or equal to this value. Must be non-negative.
      resetProb - The reset probability (damping factor alpha), typically 0.85.
      resultVerticesIndexes - An optional list of column names from `resultVerticesTable` for index creation. Pass an empty list if no indexes are needed. Cannot be null.
      personalizationSrcId - If not null, computes Personalized PageRank biased towards this vertex ID. If null, computes standard PageRank.
      stmt - The JDBC Statement object for executing SQL commands. Cannot be null.
      Throws:
      SQLException - If any input parameters are invalid, if a database error occurs (SQL execution, table access), or if an internal error occurs.