Class OCGraph
OCGraph class provides the same operations as the Spark GraphX library, but for an
Ocient database. Graphs are represented as tables in the database, with vertices and edges stored
in separate tables. A vertices table must contain a column named "id" that is not nullable and is
of type BIGINT. An edges table must contain two columns named "srcid" and "destid" that are not
nullable and are of type BIGINT. Besides these columns, the vertices and edges tables can contain
any number of additional columns with any data types.-
Nested Class Summary
Nested ClassesModifier and TypeClassDescriptionstatic enumSpecifies the direction of edges to follow relative to a source vertex when traversing a graph or collecting neighbors.static classProvides utility methods for performing Triangle Counting operations on graph data stored in a database. -
Constructor Summary
Constructors -
Method Summary
Modifier and TypeMethodDescriptionstatic voidaggregateMessages(String inputSchema, String inputVerticesTable, String inputEdgesTable, String resultSchema, String resultVerticesTable, String sendToSourceExpr, String sendToDestExpr, String aggregateExpr, ArrayList<String> resultVerticesIndexes, Statement stmt) Aggregates messages passed between connected vertices in a graph and stores the results in a new table.static voidcollectEdges(String inputSchema, String inputVerticesTable, String inputEdgesTable, String resultSchema, String resultVerticesTable, OCGraph.EdgeDirection direction, ArrayList<String> resultVerticesIndexes, Statement stmt) Collects edge information associated with each vertex in a graph stored in database tables and stores the results in a new table.static voidcollectNeighbors(String inputSchema, String inputVerticesTable, String inputEdgesTable, String resultSchema, String resultVerticesTable, OCGraph.EdgeDirection direction, ArrayList<String> resultVerticesIndexes, Statement stmt) Collects neighbor information for each vertex in a graph stored in database tables and stores the results in a new table.static voidconnectedComponents(String inputSchema, String inputVerticesTable, String inputEdgesTable, String resultSchema, String resultVerticesTable, int maxIterations, ArrayList<String> resultVerticesIndexes, Statement stmt) Finds the connected components of a graph stored in database tables using a Pregel-based algorithm.static voidcreateTripletsTable(String inputSchema, String inputVerticesTable, String inputEdgesTable, String resultSchema, String resultTripletsTable, ArrayList<String> resultTripletsIndexes, Statement stmt) Creates a materialized triplet table by joining vertex and edge data from input tables.static voidcreateTripletsView(String inputSchema, String inputVerticesTable, String inputEdgesTable, String resultSchema, String resultTripletsView, Statement stmt) Creates a database view representing graph triplets (source vertex, edge, destination vertex).static voiddegrees(String inputSchema, String inputEdgesTable, String resultSchema, String resultDegreesTable, ArrayList<String> resultIndexes, Statement stmt) Calculates the total degree (combined in-degree and out-degree) for each node in a graph represented by an edges table and stores the results in a new table.static voiddynamicPageRank(String inputSchema, String inputVerticesTable, String inputEdgesTable, String resultSchema, String resultVerticesTable, double tolerance, double resetProb, ArrayList<String> resultVerticesIndexes, Long personalizationSrcId, Statement stmt) Computes PageRank scores for vertices in a graph, optionally personalized, by iterating until the scores converge within a specified tolerance.static voidfilterEdges(String inputSchema, String inputEdgesTable, String resultSchema, String resultEdgesTable, String edgeFilter, ArrayList<String> resultEdgesIndexes, Statement stmt) Filters edges from a source table into a new result table based on a provided SQL predicate.static voidfilterVertices(String inputSchema, String inputVerticesTable, String inputEdgesTable, String resultSchema, String resultVerticesTable, String resultEdgesTable, String vertexFilter, ArrayList<String> resultVerticesIndexes, ArrayList<String> resultEdgesIndexes, Statement stmt) Creates a new subgraph by filtering vertices based on a specified condition and retaining only the edges that connect the resulting vertices.static voidfromEdges(String inputSchema, String inputEdgesTable, String resultSchema, String resultVerticesTable, ArrayList<String> resultColumnExpressions, ArrayList<String> resultVerticesIndexes, Statement stmt) Creates a vertices table from an existing edges table by extracting unique source and destination IDs.static voidgroupEdges(String inputSchema, String inputEdgesTable, String resultSchema, String resultEdgesTable, ArrayList<String> resultColumnExpressions, ArrayList<String> resultEdgesIndexes, Statement stmt) Groups duplicate edges from an input table into a new result table using SQL aggregation.static voidinDegrees(String inputSchema, String inputEdgesTable, String resultSchema, String resultInDegreesTable, ArrayList<String> resultIndexes, Statement stmt) Calculates the in-degree for each destination node in a given edge table and stores the results (node ID and its in-degree count) in a newly created table.static voidinnerJoinVertices(String inputSchema, String inputVerticesTable, String otherSchema, String otherVerticesTable, String resultSchema, String resultVerticesTable, ArrayList<String> resultColumnExpressions, ArrayList<String> resultVerticesIndexes, Statement stmt) Performs an SQLINNER JOINoperation between two vertex tables based on theiridcolumns, creating a new result table containing specified columns from both input tables.static voidjoinVertices(String inputSchema, String inputVerticesTable, String modificationSchema, String modificationVerticesTable, String resultSchema, String resultVerticesTable, ArrayList<String> resultAttributeExpressions, ArrayList<String> resultVerticesIndexes, Statement stmt) Joins two sets of vertices (input and modification) into a result table.static voidlabelPropagation(String inputSchema, String inputVerticesTable, String inputEdgesTable, String resultSchema, String resultVerticesTable, int maxIterations, ArrayList<String> resultVerticesIndexes, Statement stmt) Performs the Label Propagation Algorithm (LPA) iteratively on graph data stored in database tables, assigning community labels to vertices.static voidmapEdges(String inputSchema, String inputEdgesTable, String resultSchema, String resultEdgesTable, ArrayList<String> resultColumnExpressions, ArrayList<String> resultEdgesIndexes, Statement stmt) Transforms edge attributes by creating a new edges table based on specified column expressions derived from an existing edges table.static voidmapTriplets(String inputSchema, String inputVerticesTable, String inputEdgesTable, String resultSchema, String resultEdgesTable, ArrayList<String> resultColumnExpressions, ArrayList<String> resultEdgesIndexes, Statement stmt) Transforms edge attributes by incorporating information from the source and destination vertices (the "triplet").static voidmapVertices(String inputSchema, String inputVerticesTable, String resultSchema, String resultVerticesTable, ArrayList<String> resultColumnExpressions, ArrayList<String> resultVerticesIndexes, Statement stmt) Transforms vertex attributes from an input table to a new result table.static voidmask(String inputSchema, String inputVerticesTable, String inputEdgesTable, String otherSchema, String otherVerticesTable, String otherEdgesTable, String resultSchema, String resultVerticesTable, String resultEdgesTable, ArrayList<String> resultVerticesIndexes, ArrayList<String> resultEdgesIndexes, Statement stmt) Creates a subgraph by retaining vertices and edges from an input graph that are also present in an 'other' graph, mimicking the behavior of GraphX'smaskoperation.static voidoutDegrees(String inputSchema, String inputEdgesTable, String resultSchema, String resultOutDegreesTable, ArrayList<String> resultIndexes, Statement stmt) Calculates the out-degree for each node in a graph represented by an edge table and stores the results in a new database table.static voidouterJoinVertices(String inputSchema, String inputVerticesTable, String otherSchema, String otherVerticesTable, String resultSchema, String resultVerticesTable, ArrayList<String> resultColumnExpressions, ArrayList<String> resultVerticesIndexes, Statement stmt) Performs a LEFT OUTER JOIN between two vertex tables based on their 'id' columns, storing the result in a new table.static voidpregel(String inputSchema, String inputVerticesTable, String inputEdgesTable, String resultSchema, String resultVerticesTable, String initializerExpr, String sendToSourceExpr, String sendToDestExpr, String aggregateExpr, String updaterExpr, int maxIterations, ArrayList<String> resultVerticesIndexes, Statement stmt) Performs a Pregel-like graph computation using SQL.static voidreverseEdges(String inputSchema, String inputEdgesTable, String resultSchema, String resultEdgesTable, ArrayList<String> resultEdgesIndexes, Statement stmt) Reverses the direction of edges stored in a database table.static voidshortestPaths(String inputSchema, String inputVerticesTable, String inputEdgesTable, String resultSchema, String resultTable, List<Long> landmarks, String edgeWeightColumn, int maxIterations, ArrayList<String> resultIndexes, Statement stmt) Computes the shortest paths from all vertices in the graph to a specified set of landmark vertices.static voidstaticPageRank(String inputSchema, String inputVerticesTable, String inputEdgesTable, String resultSchema, String resultVerticesTable, int numIterations, double resetProb, ArrayList<String> resultVerticesIndexes, Long personalizationSrcId, Statement stmt) Computes static PageRank scores for vertices in a graph, optionally personalized towards a specific source vertex, and stores the results in a new vertices table.static voidstronglyConnectedComponents(String inputSchema, String inputVerticesTable, String inputEdgesTable, String resultSchema, String resultVerticesTable, ArrayList<String> resultVerticesIndexes, Statement stmt) Computes the Strongly Connected Components (SCCs) of a directed graph represented by vertex and edge tables in a database.static voidsubgraph(String inputSchema, String inputVerticesTable, String inputEdgesTable, String resultSchema, String resultVerticesTable, String resultEdgesTable, String vertexFilter, String edgeFilter, ArrayList<String> resultVerticesIndexes, ArrayList<String> resultEdgesIndexes, Statement stmt) Creates a subgraph consisting of new vertex and edge tables based on filtering criteria applied to existing input graph tables.
-
Constructor Details
-
OCGraph
public OCGraph()
-
-
Method Details
-
subgraph
public static void subgraph(String inputSchema, String inputVerticesTable, String inputEdgesTable, String resultSchema, String resultVerticesTable, String resultEdgesTable, String vertexFilter, String edgeFilter, ArrayList<String> resultVerticesIndexes, ArrayList<String> resultEdgesIndexes, Statement stmt) throws SQLException Creates a subgraph consisting of new vertex and edge tables based on filtering criteria applied to existing input graph tables.This method performs the following steps:
- Validates input parameters (non-null, non-empty where applicable).
- Checks for potential conflicts where result tables might overwrite input tables.
- Creates the structure for the result vertex and edge tables, copying the schema from the corresponding input tables and applying specified indexes.
- Populates the result vertex table by selecting vertices from the input vertex table that
satisfy the
vertexFilter. - Populates the result edge table by selecting edges from the input edge table that satisfy
the
edgeFilterAND whose source and destination vertices BOTH exist in the newly created (filtered) result vertex table.
Important Filter Logic:
- The
vertexFilteris a standard SQL WHERE clause applied to the input vertex table. - The
edgeFilteris a standard SQL WHERE clause condition. Within this filter string:- Reference columns from the source vertex using the alias
a(e.g.,a.vertex_property). - Reference columns from the edge itself using the alias
b(e.g.,b.edge_property). - Reference columns from the destination vertex using the alias
c(e.g.,c.vertex_property).
vertexFilter). - Reference columns from the source vertex using the alias
Atomicity and Cleanup: The operation attempts to be atomic in the sense that if any step after table creation fails (e.g., due to an SQL error during insertion or an invalid filter), it will attempt to drop the newly created result vertex and edge tables to avoid leaving partial results. Input tables are never modified.
- Parameters:
inputSchema- The schema name containing the original graph tables. Must not be null or empty.inputVerticesTable- The name of the table containing the original vertices. Must not be null or empty.inputEdgesTable- The name of the table containing the original edges. Must not be null or empty.resultSchema- The schema name where the resulting subgraph tables will be created. Must not be null or empty.resultVerticesTable- The name for the new table that will store the filtered vertices. Must not be null or empty. This table should not conflict with input table names.resultEdgesTable- The name for the new table that will store the filtered edges. Must not be null or empty. This table should not conflict with input table names.vertexFilter- An SQL WHERE clause condition (without the "WHERE" keyword) used to filter vertices from theinputVerticesTable. Use"1=1"for no filtering. Must not be null.edgeFilter- An SQL WHERE clause condition (without the "WHERE" keyword) used to filter edges. Use aliasesa,b, andcto refer to the source vertex, edge, and destination vertex columns, respectively (e.g.,"a.prop > 5 AND b.weight < 10"). Use"1=1"for no filtering. Must not be null.resultVerticesIndexes- A list of column names from the vertex table on which to create indexes in theresultVerticesTable. Can be empty. Must not be null.resultEdgesIndexes- A list of column names from the edge table on which to create indexes in theresultEdgesTable. Can be empty. Must not be null.stmt- The JDBCStatementobject used to execute SQL commands against the database. Must not be null.- Throws:
SQLException- If any database access error occurs during table creation, data insertion, or cleanup, or if the provided filters result in invalid SQL.
-
joinVertices
public static void joinVertices(String inputSchema, String inputVerticesTable, String modificationSchema, String modificationVerticesTable, String resultSchema, String resultVerticesTable, ArrayList<String> resultAttributeExpressions, ArrayList<String> resultVerticesIndexes, Statement stmt) throws SQLException Joins two sets of vertices (input and modification) into a result table.This method performs a join operation based on the vertex
id. It first creates the result table structure based on the input vertices table and specified indexes. Then, it performs two main insertion operations:- Inserts all vertices from the
inputVerticesTablethat are NOT present (byid) in themodificationVerticesTable. All columns are copied directly. - Inserts vertices that ARE present (by
id) in BOTH theinputVerticesTableand themodificationVerticesTable. For these vertices, theidis taken from the input table, and the remaining attribute columns are calculated based on the providedresultAttributeExpressions.
Usage of
resultAttributeExpressions:This parameter is crucial for defining the attribute values in the result table when a vertex exists in both the input and modification sets. Each string in the list represents a SQL expression that calculates a single column value for the resulting vertex.
- Available Columns and Aliases: Within these expressions, you can refer to columns
from both the
inputVerticesTableand themodificationVerticesTable. To distinguish between them, the query internally uses aliases:- The
inputVerticesTableis aliased asa. - The
modificationVerticesTableis aliased asb.
a.attribute_name,b.other_attribute) to ensure clarity and avoid ambiguity, especially if columns share the same name in both tables[9]. The join condition isa.id = b.id. - The
- Expression Structure and
AS alias: Each expression string provided in the list will be placed directly into theSELECTlist of the secondINSERTstatement. For example:SELECT a.id, expression1, expression2, ... FROM .... While the function itself doesn't enforce it, it is highly recommended that each expression ends withAS alias_name(e.g.,a.value + b.delta AS final_value,b.status AS current_status)[4][5][8][10]. - Order: The order of expressions in the
resultAttributeExpressionslist determines the order of the corresponding columns in the result table (following the initialidcolumn). Ensure this order matches the column order established when theresultVerticesTablewas created.
Transactional Behavior: If any SQL operation within this method fails (e.g., due to invalid syntax in expressions, constraint violations, or database errors), it attempts to drop the partially created
resultVerticesTableto avoid leaving incomplete data. The originalSQLExceptionis then re-thrown.- Parameters:
inputSchema- The schema containing the primary input vertices table. Cannot be null or empty.inputVerticesTable- The name of the primary input vertices table (e.g., containing the original state). Cannot be null or empty.modificationSchema- The schema containing the modification vertices table. Cannot be null or empty.modificationVerticesTable- The name of the modification vertices table (e.g., containing changes or updates). Cannot be null or empty.resultSchema- The schema where the result table will be created. Cannot be null or empty.resultVerticesTable- The name of the table to be created containing the joined vertices. Cannot be null or empty, and must not conflict with input table names.resultAttributeExpressions- An ordered list of SQL expressions defining the attribute columns for vertices present in both input and modification tables. The list cannot be null or empty. See details above regarding column referencing (use aliasesaandb) and the recommended use ofAS alias_namefor each expression.resultVerticesIndexes- A list of column names on which indexes should be created in theresultVerticesTable. Cannot be null. Can be empty if no indexes are needed.stmt- The JDBCStatementobject used to execute the SQL commands. Cannot be null.- Throws:
SQLException- If any database access error occurs, if table/schema names are invalid, ifresultAttributeExpressionsis empty, or if an expression is syntactically incorrect. Also thrown if cleanup (dropping the result table on failure) fails.
- Inserts all vertices from the
-
mapVertices
public static void mapVertices(String inputSchema, String inputVerticesTable, String resultSchema, String resultVerticesTable, ArrayList<String> resultColumnExpressions, ArrayList<String> resultVerticesIndexes, Statement stmt) throws SQLException Transforms vertex attributes from an input table to a new result table.This function creates a new table (
resultSchema.resultVerticesTable) containing vertices with transformed attributes based on the data in the input table (inputSchema.inputVerticesTable). The transformation is defined by the SQL expressions provided inresultColumnExpressions.The function performs the following steps:
- Performs sanity checks on input parameters.
- Checks for potential naming conflicts between input and result tables.
- Creates the result table structure using
CREATE TABLE ... AS SELECT ... WHERE 1=0. The mandatoryidcolumn from the input table is always included as the first column. - Creates indexes on the specified columns (
resultVerticesIndexes) in the result table. - Populates the result table by selecting and transforming data from the input table using
INSERT INTO ... SELECT ....
Transactional Behavior: If any step after the initial creation of the result table fails (e.g., index creation, data insertion), the function attempts to drop the newly created result table (
resultVerticesName) to leave the database in a consistent state.Using
This parameter defines the columns of the result table, beyond the mandatoryresultColumnExpressionsidcolumn which is always included automatically as the first column. Each string in the list represents a column definition in theSELECTlist of the underlying SQL queries.Referencing Input Columns
Expressions withinresultColumnExpressionscan refer to any column present in the input vertices table (inputSchema.inputVerticesTable). You should refer to these input columns using their actual names as they exist in the input table. No special aliasing is required simply to reference an input column. For example, if the input table has columns named"id","property1", and"value", your expressions can directly useproperty1andvalue.Defining Output Columns and Aliases (AS alias)
Each string inresultColumnExpressionsbecomes part of theSELECTclause.- Simple Column Selection: If an expression is just the name of a column from the
input table (e.g.,
"property1"), that column will be included in the result table with the same name. - Transformations and Expressions: You can use any valid SQL expression (e.g.,
"value * 2","CONCAT(first_name, ' ', last_name')","CASE WHEN status = 'A' THEN 1 ELSE 0 END"). - Using
AS alias: It is highly recommended to use an alias usingAS alias_namefor each expression, especially for complex expressions or when you want to rename a column.- If you provide a complex expression without an alias (e.g.,
"value * 2"), the database will assign a default name to the resulting column, which can be unpredictable and difficult to work with later. - Using an alias provides a clear, predictable name for the column in the result
table. Example:
"value * 2 AS double_value". - Even for simple column selections, you can use an alias to rename the column in the
result table. Example:
"property1 AS vertex_property".
AS your_desired_column_nameis the best practice for clarity, predictability, and compatibility with subsequent operations (like defining indexes usingresultVerticesIndexes). - If you provide a complex expression without an alias (e.g.,
idcolumn inresultColumnExpressions; it is automatically handled and placed as the first column in the result table. The order of expressions in the list determines the order of the corresponding columns in the result table (following theidcolumn).Example usage for
resultColumnExpressions:ArrayList<String> expressions = new ArrayList<>(); expressions.add("name"); // Selects 'name' column, result column is named 'name' expressions.add("age AS vertex_age"); // Selects 'age' column, result column is named 'vertex_age' expressions.add("salary * 1.1 AS increased_salary"); // Calculates new salary, result column 'increased_salary' expressions.add("CASE category WHEN 'A' THEN 1 ELSE 0 END AS category_code"); // Transforms category- Parameters:
inputSchema- The schema name of the source vertices table. Cannot be null or empty.inputVerticesTable- The name of the source vertices table. Cannot be null or empty. Must contain an 'id' column and any columns referenced inresultColumnExpressions.resultSchema- The schema name for the target (result) vertices table. Cannot be null or empty.resultVerticesTable- The name for the target (result) vertices table. This table will be created. Cannot be null or empty. Must not conflict with input tables.resultColumnExpressions- An ordered list of SQL expressions defining the columns (beyond the mandatoryid) for the result table. See method description for detailed usage. Cannot be null; elements cannot be null or empty.resultVerticesIndexes- An ordered list of column names in the result table on which to create indexes. These names must match the final column names in the result table (eitherid, names derived fromresultColumnExpressions, or aliases provided viaAS). Cannot be null; elements cannot be null or empty.stmt- TheStatementobject used to execute SQL commands. Must be valid and connected. Cannot be null.- Throws:
SQLException- if any database operation (CREATE TABLE, CREATE INDEX, INSERT) fails.
-
mapEdges
public static void mapEdges(String inputSchema, String inputEdgesTable, String resultSchema, String resultEdgesTable, ArrayList<String> resultColumnExpressions, ArrayList<String> resultEdgesIndexes, Statement stmt) throws SQLException Transforms edge attributes by creating a new edges table based on specified column expressions derived from an existing edges table.This method creates a new table (`resultSchema.resultEdgesTable`) with the structure defined by the `resultColumnExpressions` plus mandatory `srcid` and `destid` columns. It then populates this new table by selecting data from the `inputSchema.inputEdgesTable` and applying the specified transformations. Indexes are created on the result table as specified.
If the operation fails at any point after the initial `CREATE TABLE` statement for the result table is issued (e.g., during index creation or data insertion), the method will attempt to drop the partially created `resultEdgesTable` to ensure a cleaner state.
## Understanding `resultColumnExpressions`
This parameter is crucial for defining the content and structure of the new edges table, beyond the mandatory `srcid` and `destid` columns which are always included.
### Available Columns for Expressions Each string expression in the `resultColumnExpressions` list is evaluated within the context of a `SELECT ... FROM inputSchema.inputEdgesTable` statement. Therefore, your expressions can refer to **any column present in the original `inputSchema.inputEdgesTable`**.
### Referencing Input Columns You should refer to the columns from the `inputEdgesTable` using their **original names**. No special prefixes or table aliases are required *for the source columns* within the expression itself. For example, if the input table has columns `weight`, `type`, and `cost`, you can write expressions like: - `"weight * 2"` - `"LOWER(type)"` - `"cost / 100.0"` - `"CASE WHEN status = 'ACTIVE' THEN 1 ELSE 0 END"`
### Defining Result Column Names (AS alias) **Each expression string should define the name for the resulting column using the `AS` keyword followed by an alias.** This alias becomes the column name in the new `resultEdgesTable`. The database needs this alias to know what to name the column resulting from your expression in the `CREATE TABLE AS SELECT ...` and `INSERT INTO ... SELECT ...` statements.
**Example:** If your input table has columns `weight` and `label`, and you want the result table to have `srcid`, `destid`, a column named `double_weight` (calculated as weight * 2), and a column named `upper_label` (the uppercase version of the label), your `resultColumnExpressions` list should look like this:
``` ArrayList
expressions = new ArrayListinvalid input: '<'>(List.of( "weight * 2 AS double_weight", "UPPER(label) AS upper_label" )); ``` The final result table (`resultSchema.resultEdgesTable`) will then have the columns: `srcid`, `destid`, `double_weight`, `upper_label`.
**Note:** Do *not* include expressions for `srcid` or `destid` in this list; they are automatically handled and preserved with their original names.
- Parameters:
inputSchema- The schema of the source table containing the original edge data. Cannot be null or empty.inputEdgesTable- The name of the source table containing the original edge data. Cannot be null or empty.resultSchema- The schema for the new table that will store the transformed edge data. Cannot be null or empty.resultEdgesTable- The name for the new table that will store the transformed edge data. Cannot be null or empty.resultColumnExpressions- An ArrayList of SQL expressions defining the columns of the result table, *in addition* to the implicitly included `srcid` and `destid` columns. Each expression must refer to columns in the `inputEdgesTable` and should use `AS alias` to define the result column name. The list cannot be null, but can be empty if only `srcid` and `destid` are needed. Individual expressions cannot be null or empty.resultEdgesIndexes- An ArrayList of column names (from the *result* table, including `srcid`, `destid`, or any names defined via `AS` in `resultColumnExpressions`) on which to create indexes in the `resultEdgesTable`. The list cannot be null, but can be empty. Individual column names for indexing cannot be null or empty.stmt- The JDBCStatementobject used to execute the SQL queries. Cannot be null.- Throws:
SQLException- If any database error occurs during table creation, index creation, data insertion, or dropping the table on failure.
-
mapTriplets
public static void mapTriplets(String inputSchema, String inputVerticesTable, String inputEdgesTable, String resultSchema, String resultEdgesTable, ArrayList<String> resultColumnExpressions, ArrayList<String> resultEdgesIndexes, Statement stmt) throws SQLException Transforms edge attributes by incorporating information from the source and destination vertices (the "triplet").This method creates a new edge table (`resultEdgesTable`) based on joining the input edge table (`inputEdgesTable`) with the input vertex table (`inputVerticesTable`) for both source and destination vertices. It allows defining custom columns in the new edge table using SQL expressions that can refer to columns from the source vertex, the original edge, and the destination vertex.
The operation is designed to be somewhat transactional within its scope: if any step (table creation, index creation, data insertion) fails, it attempts to drop the partially created `resultEdgesTable` to avoid leaving incomplete artifacts. ## Understanding `resultColumnExpressions`
This parameter is crucial for defining the schema and content (beyond `srcid` and `destid`) of the resulting edge table. Each string in the `resultColumnExpressions` list represents a SQL expression that will be included in the `SELECT` list used to create and populate the new table.
### Available Columns and Required Aliases
When constructing your expressions, you have access to columns from three sources, which *must* be referenced using specific aliases:
- Source Vertex (`a`): Columns from the `inputVerticesTable` corresponding to the
source of the edge. Reference these using the alias `a`. For example:
a.vertex_property,a.name. The join condition isa.id = b.srcid. - Edge (`b`): Columns from the `inputEdgesTable`. Reference these using the alias
`b`. For example:
b.edge_weight,b.type. The columnsb.srcidandb.destidare automatically included in the result table's primary selection and do not need to be added via `resultColumnExpressions`. - Destination Vertex (`c`): Columns from the `inputVerticesTable` corresponding to
the destination of the edge. Reference these using the alias `c`. For example:
c.vertex_property,c.status. The join condition isc.id = b.destid.
### Using `AS alias` for Result Column Names
It is highly recommended that every expression in `resultColumnExpressions` ends with an `AS result_column_name` clause.
- Why? When creating a table using `CREATE TABLE AS SELECT ...`, the database needs
names for the columns being created. While simple expressions like `a.name` implicitly
use `name` as the column name, complex expressions (e.g.,
a.value + c.value) or functions (e.g.,upper(b.type)) will result in default column names if an alias is not provided. - Clarity and Control: Using `AS alias` gives you explicit control over the column names in your `resultEdgesTable`. This is essential for predictability and for referencing these columns later (e.g., in the `resultEdgesIndexes` list).
- Requirement: For any expression that is not a simple column reference (e.g., arithmetic, function calls, concatenation), an `AS alias` is effectively mandatory to ensure a valid and predictable column name.
In this example, the resulting table `resultEdgesTable` would have columns: `srcid`, `destid`, `edge_weight`, `vertex_categories`, `calculated_value`, and `source_vertex_name`.ArrayList<String> expressions = new ArrayList<>(); // Include the edge weight directly expressions.add("b.weight AS edge_weight"); // Combine properties from source and destination vertices expressions.add("a.category || '-' || c.category AS vertex_categories"); // Calculate a value based on source vertex and edge expressions.add("a.value * b.multiplier AS calculated_value"); // Include the name of the source vertex expressions.add("a.name AS source_vertex_name");- Parameters:
inputSchema- The schema containing the input vertex and edge tables. Cannot be null or empty.inputVerticesTable- The name of the input table containing vertex data. Must contain an 'id' column. Cannot be null or empty.inputEdgesTable- The name of the input table containing edge data. Must contain 'srcid' and 'destid' columns referencing vertex 'id's. Cannot be null or empty.resultSchema- The schema where the result edge table will be created. Cannot be null or empty.resultEdgesTable- The name of the new edge table to be created. This table will contain `srcid`, `destid`, and columns defined by `resultColumnExpressions`. Cannot be null or empty.resultColumnExpressions- A list of SQL expressions defining additional columns for the result table. Each expression should reference columns using aliases `a` (source vertex), `b` (edge), `c` (destination vertex) and should specify a result column name using `AS alias`. Cannot be null. List elements cannot be null/empty.resultEdgesIndexes- A list of column names in the `resultEdgesTable` on which to create indexes. These names should correspond to the aliases provided in `resultColumnExpressions` or the standard 'srcid', 'destid'. Cannot be null. List elements cannot be null/empty.stmt- TheStatementobject used to execute the SQL queries. Cannot be null.- Throws:
SQLException- If any database error occurs during table creation, index creation, data insertion, or cleanup.
- Source Vertex (`a`): Columns from the `inputVerticesTable` corresponding to the
source of the edge. Reference these using the alias `a`. For example:
-
createTripletsView
public static void createTripletsView(String inputSchema, String inputVerticesTable, String inputEdgesTable, String resultSchema, String resultTripletsView, Statement stmt) throws SQLException Creates a database view representing graph triplets (source vertex, edge, destination vertex).This method constructs and executes a
CREATE VIEWSQL statement. The resulting view joins the specified edges table with the vertices table twice: once for the source vertex (based onedges.srcid = vertices.id) and once for the destination vertex (based onedges.destid = vertices.id).The view includes all columns from the original edges table, along with all columns from the vertices table (excluding the
idcolumn itself). Vertex columns are prefixed withsrc_for the source vertex anddest_for the destination vertex to avoid naming conflicts and clarify origin.Important: If any part of the view creation process fails (e.g., due to SQL errors, insufficient permissions, or invalid input), the method attempts to drop the view using
safeDropViewto ensure a clean state.Preconditions:
- The input
Statementobject must be valid, open, and connected to the target database. - All schema and table/view name strings must be non-null and non-empty.
- The
inputVerticesTablemust exist in theinputSchemaand contain anidcolumn along with other attribute columns. - The
inputEdgesTablemust exist in theinputSchemaand containsrcidanddestidcolumns that reference theidcolumn of theinputVerticesTable. - The proposed
resultTripletsViewname must not conflict with the input table names within their respective schemas (checked internally bycheckTableConflict).
- Parameters:
inputSchema- The schema containing the source vertices and edges tables. Cannot be null or empty.inputVerticesTable- The name of the table containing vertex data. Must include an 'id' column. Cannot be null or empty.inputEdgesTable- The name of the table containing edge data. Must include 'srcid' and 'destid' columns referencinginputVerticesTable.id. Cannot be null or empty.resultSchema- The schema where the resulting triplets view will be created. Cannot be null or empty.resultTripletsView- The name for the new view to be created. Cannot be null or empty.stmt- The JDBCStatementobject used to execute the SQL commands. Cannot be null.- Throws:
SQLException- If a database access error occurs during query execution (e.g., table not found, syntax error in generated SQL, insufficient permissions).
- The input
-
createTripletsTable
public static void createTripletsTable(String inputSchema, String inputVerticesTable, String inputEdgesTable, String resultSchema, String resultTripletsTable, ArrayList<String> resultTripletsIndexes, Statement stmt) throws SQLException Creates a materialized triplet table by joining vertex and edge data from input tables.This method constructs a new table in the specified
resultSchemanamedresultTripletsTable. This table represents graph triplets (source vertex, edge, destination vertex). It achieves this by performing a SQL join:- It selects all columns from the
inputEdgesTable. - It joins the
inputEdgesTablewith theinputVerticesTableoninputEdgesTable.srcid = inputVerticesTable.idto get the source vertex attributes. All vertex attributes (except 'id') are included in the result table, prefixed with "src_". - It joins the
inputEdgesTablewith theinputVerticesTableagain oninputEdgesTable.destid = inputVerticesTable.idto get the destination vertex attributes. All vertex attributes (except 'id') are included in the result table, prefixed with "dest_".
Input parameters are checked for null or emptiness. It also checks if the target table name conflicts with any of the input table names.
If any step during the table creation, index creation, or population fails (e.g., due to an SQLException), the method attempts to drop the partially created
resultTripletsTableto ensure atomicity as much as possible within the operation's scope.- Parameters:
inputSchema- The schema containing the source vertex and edge tables. Must not be null or empty.inputVerticesTable- The name of the source table containing vertex data. It is expected to have an 'id' column and potentially other attribute columns. Must not be null or empty.inputEdgesTable- The name of the source table containing edge data. It is expected to have 'srcid' and 'destid' columns referencing vertex IDs, and potentially other edge attribute columns. Must not be null or empty.resultSchema- The schema where the resulting triplets table will be created. Must not be null or empty.resultTripletsTable- The desired name for the output triplets table. Must not be null or empty.resultTripletsIndexes- A list of column names *in the resulting triplets table* on which B-tree indexes should be created. These can include original edge columns or the prefixed vertex columns (e.g., "src_attributeName", "dest_attributeName"). The list itself must not be null, and each column name within the list must not be null or empty.stmt- TheStatementobject used to execute the SQL DDL (CREATE TABLE, CREATE INDEX) and DML (INSERT) commands. Must not be null.- Throws:
SQLException- If any database access error occurs during sanity checks (metadata retrieval), table creation, index creation, data insertion, or the cleanup attempt (dropping the table).
- It selects all columns from the
-
reverseEdges
public static void reverseEdges(String inputSchema, String inputEdgesTable, String resultSchema, String resultEdgesTable, ArrayList<String> resultEdgesIndexes, Statement stmt) throws SQLException Reverses the direction of edges stored in a database table.This method reads edges from an input table specified by
inputSchemaandinputEdgesTable. It assumes the input table has columns namedsrcid(source ID) anddestid(destination ID). It creates a new result table specified byresultSchemaandresultEdgesTable.The result table will have the same columns as the input table, but the values in the
srcidanddestidcolumns will be swapped for each edge. All other columns present in the input table are copied verbatim to the result table.The method also creates indexes on the specified columns (
resultEdgesIndexes) in the newly created result table.Important: This operation is designed to be somewhat atomic. If any step during the table creation or data population fails (indicated by an
SQLException), the method will attempt to drop the newly created result table (resultSchema.resultEdgesTable) to avoid leaving partially created or populated tables. Input validation failures (e.g., null parameters) may occur before table creation begins.- Parameters:
inputSchema- The schema name of the input table containing the original edges. Cannot be null or empty.inputEdgesTable- The name of the input table containing the original edges. Must contain columns namedsrcidanddestid(case-insensitive). Cannot be null or empty.resultSchema- The schema name for the new table where reversed edges will be stored. Cannot be null or empty.resultEdgesTable- The name for the new table where reversed edges will be stored. Cannot be null or empty. The combination ofresultSchemaandresultEdgesTablemust not conflict with the input table name.resultEdgesIndexes- A list of column names on which to create indexes in the result table. The list itself cannot be null, but it can be empty if no indexes are desired. Column names should exist in the input table structure.stmt- The JDBCStatementobject used to execute SQL commands against the database. Cannot be null.- Throws:
SQLException- If any database access error occurs during table creation, metadata retrieval, data insertion, index creation, or the cleanup (table drop) attempt.
-
filterEdges
public static void filterEdges(String inputSchema, String inputEdgesTable, String resultSchema, String resultEdgesTable, String edgeFilter, ArrayList<String> resultEdgesIndexes, Statement stmt) throws SQLException Filters edges from a source table into a new result table based on a provided SQL predicate.This method performs the following steps:
- Performs sanity checks on input parameters.
- Checks if the intended result table name conflicts with input tables.
- Creates the result table (`resultSchema.resultEdgesTable`) with the same structure as the input table (`inputSchema.inputEdgesTable`).
- Creates specified indexes on the result table.
- Constructs and executes an SQL `INSERT INTO ... SELECT * FROM ... WHERE ...` statement to
populate the result table with rows from the input table that satisfy the
edgeFilter. - If any step fails (e.g., due to an
SQLException), it attempts to drop the potentially partially created result table to ensure cleanup.
The
edgeFilterparameter is a crucial part of this function, allowing you to specify *which* edges should be copied from the input table to the result table. Here's how it works:- **Purpose:** It provides the filtering logic as a standard SQL condition. - **Content:** It should contain only the conditional expression that normally comes *after* the `WHERE` keyword in a SQL query. - **`WHERE` Keyword:** **Do NOT include the word `WHERE`** in the
edgeFilterstring. The function automatically prepends ` WHERE ` before adding your filter condition to the SQL query, but only if the filter is not empty or effectively "no filter" (`"1=1"`). - **Column References:** The conditions inedgeFiltermust reference columns that exist in the **input edges table** (`inputSchema.inputEdgesTable`). - **Table Aliases:** The underlying SQL query generated by this function is structured like: `INSERT INTO result_table SELECT * FROM input_table WHERE [your_edgeFilter];` Since this query selects directly from the `input_table` without defining any table aliases in the `FROM` clause, you should **refer to columns directly by their names**. You do not need (and should not use) table aliases (like `t1.col_name`) within theedgeFilterstring. - **No Filter:** If you want to copy all edges without any filtering, pass the string `"1=1"` as theedgeFilter. The function recognizes this and will effectively omit the `WHERE` clause. Note that `null` is not allowed (enforced by a check).### Examples of `edgeFilter` values:
- Filter by edge weight:
"weight > 0.5" - Filter by type and creation date:
"type = 'mention' AND created_ts > '2024-01-01'" - Filter based on a source node ID:
"src_id = 12345" - No filtering (copy all edges):
"1=1"
- Parameters:
inputSchema- The schema name of the source table containing edges. Cannot be null or empty.inputEdgesTable- The table name of the source table containing edges. Cannot be null or empty.resultSchema- The schema name for the new result table where filtered edges will be stored. Cannot be null or empty.resultEdgesTable- The table name for the new result table where filtered edges will be stored. Cannot be null or empty. This table will be created by the function.edgeFilter- The SQL predicate string (condition) used to filter edges. Should reference columns in the input table directly by name. Do NOT include the `WHERE` keyword. Use "1=1" for no filter. Cannot be null.resultEdgesIndexes- A list of column names (which must exist in the input table) on which to create indexes in the result table. Cannot be null (can be an empty list).stmt- The JDBCStatementobject used to execute the SQL commands. Cannot be null.- Throws:
SQLException- If any database error occurs during table creation, index creation, data insertion, or cleanup (dropping the table on failure). Also thrown by potential underlying utility methods if they interact with the DB.
-
filterVertices
public static void filterVertices(String inputSchema, String inputVerticesTable, String inputEdgesTable, String resultSchema, String resultVerticesTable, String resultEdgesTable, String vertexFilter, ArrayList<String> resultVerticesIndexes, ArrayList<String> resultEdgesIndexes, Statement stmt) throws SQLException Creates a new subgraph by filtering vertices based on a specified condition and retaining only the edges that connect the resulting vertices.This method performs the following steps:
- Performs sanity checks on input parameters.
- Checks for potential conflicts where result tables might overwrite input tables.
- Creates new empty tables (
resultVerticesTable,resultEdgesTable) in theresultSchema, replicating the structure and specified indexes of the corresponding input tables. - Populates the
resultVerticesTableby selecting vertices from theinputVerticesTablethat satisfy thevertexFiltercondition. - Populates the
resultEdgesTableby selecting edges from theinputEdgesTablewhere *both* the source vertex (srcid) and the destination vertex (destid) are present in the newly populatedresultVerticesTable.
Transactional Behavior: If any
SQLExceptionoccurs after the result tables have been created but before the operation completes successfully, the method will attempt to drop theresultVerticesTableandresultEdgesTableto clean up potentially incomplete results.Assumptions:
- The input vertices table (
inputVerticesTable) contains a column namedidwhich serves as the vertex identifier. - The input edges table (
inputEdgesTable) contains columns namedsrcidanddestidrepresenting the source and destination vertex IDs, respectively, which correspond to theidcolumn in the vertices table.
- Parameters:
inputSchema- The schema containing the source graph tables. Cannot be null or empty.inputVerticesTable- The name of the table containing the source vertices. Cannot be null or empty.inputEdgesTable- The name of the table containing the source edges. Cannot be null or empty.resultSchema- The schema where the resulting filtered graph tables will be created. Cannot be null or empty.resultVerticesTable- The name for the new table that will store the filtered vertices. Cannot be null or empty.resultEdgesTable- The name for the new table that will store the filtered edges. Cannot be null or empty.vertexFilter- An SQL predicate string used to filter rows from theinputVerticesTable. This predicate is applied directly within aWHEREclause constructed internally like:SELECT * FROM inputVerticesTable WHERE [vertexFilter].- Do NOT include the
WHEREkeyword in this string; the method adds it automatically. - The filter condition can reference any column available in the
inputVerticesTabledirectly by its name. - Table aliases are NOT required (and generally should not be used) within this filter string, as the context is a simple select from the single input vertices table.
- Example:
"population > 10000 AND type = 'city'" - Example:
"ST_Area(geom) > 500" - Use
"1=1"if no vertex filtering is desired (i.e., keep all vertices initially). - Cannot be null.
- Do NOT include the
resultVerticesIndexes- A list of column names from theinputVerticesTableon which indexes should be created in theresultVerticesTable. Cannot be null. Can be empty if no indexes are needed.resultEdgesIndexes- A list of column names from theinputEdgesTableon which indexes should be created in theresultEdgesTable. Cannot be null. Can be empty if no indexes are needed.stmt- The JDBCStatementobject used to execute the SQL commands against the database. Cannot be null.- Throws:
SQLException- If any database error occurs during table creation, data insertion, index creation, or table dropping.
-
inDegrees
public static void inDegrees(String inputSchema, String inputEdgesTable, String resultSchema, String resultInDegreesTable, ArrayList<String> resultIndexes, Statement stmt) throws SQLException Calculates the in-degree for each destination node in a given edge table and stores the results (node ID and its in-degree count) in a newly created table.This method performs the following steps:
- Performs sanity checks on all input parameters to ensure they are not null or empty where required.
- Checks for potential naming conflicts between the input tables and the target result table.
- Constructs the fully qualified names for input and result tables (schema.table).
- Creates the result table (`resultSchema.resultInDegreesTable`) with columns `id` (BIGINT) and `in_degree` (BIGINT).
- Creates indexes on the specified columns ('id' and/or 'in_degree') in the result table based on the `resultIndexes` list.
- Populates the result table by executing an SQL query that counts the occurrences of each `destid` in the input edge table (`inputSchema.inputEdgesTable`) and inserts the `destid` (as `id`) and the count (as `in_degree`).
Error Handling: The method includes a safety mechanism. If any part of the operation (table creation, index creation, or data population) fails and throws an exception, it attempts to drop the potentially partially created result table (`resultSchema.resultInDegreesTable`) before propagating the exception. This helps prevent leaving incomplete artifacts in the database.
- Parameters:
inputSchema- The schema name of the input table containing edges. Must not be null or empty.inputEdgesTable- The name of the input table containing edges. This table is expected to have a column nameddestidrepresenting the destination node ID of an edge. Must not be null or empty.resultSchema- The schema name for the output table where in-degree results will be stored. Must not be null or empty.resultInDegreesTable- The name for the output table where in-degree results will be stored. Must not be null or empty.resultIndexes- AnArrayListof strings specifying which columns in the result table should be indexed. Valid entries are "id" and "in_degree" (case-insensitive). The list itself must not be null, and its elements must not be null or empty. An empty list means no specific indexes will be created beyond potential defaults.stmt- The JDBCStatementobject used to execute the SQL queries. Must not be null.- Throws:
SQLException- If any database access error occurs during table creation, index creation, data insertion, or if invalid arguments are provided (e.g., invalid index column name, table name conflict, SQL syntax errors, connection issues).
-
outDegrees
public static void outDegrees(String inputSchema, String inputEdgesTable, String resultSchema, String resultOutDegreesTable, ArrayList<String> resultIndexes, Statement stmt) throws SQLException Calculates the out-degree for each node in a graph represented by an edge table and stores the results in a new database table.This method performs the following steps:
- Performs sanity checks on all input parameters.
- Checks for potential naming conflicts between input and result tables.
- Creates a new result table (
resultOutDegreesTable) in the specifiedresultSchemawith columnsid(BIGINT) andout_degree(BIGINT). - Optionally creates indexes on the 'id' and/or 'out_degree' columns of the result table
based on the
resultIndexeslist. - Populates the result table by querying the
inputEdgesTable, grouping by the source ID (assumed to be 'srcid'), and counting the occurrences to determine the out-degree.
SQLException, this method attempts to drop the potentially incomplete result table before propagating the exception.- Parameters:
inputSchema- The schema name where the input edge table resides. Cannot be null or empty.inputEdgesTable- The name of the input table containing graph edges. Expected to have at least a source ID column (typically 'srcid'). Cannot be null or empty.resultSchema- The schema name where the result table will be created. Cannot be null or empty.resultOutDegreesTable- The name of the table to be created for storing out-degrees (node ID and count). Cannot be null or empty. Must not conflict with input table names.resultIndexes- A list of column names ('id' or 'out_degree', case-insensitive) on which to create indexes in the result table. Indexes can improve query performance. The list cannot be null; elements cannot be null/empty and must be 'id' or 'out_degree'.stmt- The JDBCStatementobject used to execute SQL commands against the database. Cannot be null. The statement should be associated with an active connection with sufficient privileges to create tables, create indexes, and query data.- Throws:
SQLException- If any database access error occurs during sanity checks (e.g., invalid index column name), table creation, index creation, data population, or the cleanup attempt (dropping the table on failure). This includes issues like invalid SQL syntax, insufficient permissions, connection problems, or if the specified tables/schemas do not exist or have unexpected structures.
-
degrees
public static void degrees(String inputSchema, String inputEdgesTable, String resultSchema, String resultDegreesTable, ArrayList<String> resultIndexes, Statement stmt) throws SQLException Calculates the total degree (combined in-degree and out-degree) for each node in a graph represented by an edges table and stores the results in a new table.The input edges table is expected to have columns named
srcidanddestid, both representing node IDs (assumed to be BIGINT).This method performs the following steps:
- Performs sanity checks on input parameters (non-null, non-empty).
- Checks for potential naming conflicts between input and result tables.
- Creates the result table (
resultSchema.resultDegreesTable) with columnsid BIGINT NOT NULLanddegree BIGINT NOT NULL. - Optionally creates indexes on the 'id' or 'degree' columns of the result table as
specified in
resultIndexes. - Populates the result table by counting all occurrences of each node ID in both the 'srcid' and 'destid' columns of the input edges table.
Error Handling: The operation attempts to be transactional within its scope. If any step (table creation, index creation, data insertion) fails and throws an SQLException, the method will attempt to drop the
resultDegreesTableto avoid leaving a partially created or populated table in the database. This cleanup attempt might also fail but is performed in a finally block.- Parameters:
inputSchema- The schema name of the input table containing graph edges. Cannot be null or empty.inputEdgesTable- The name of the input table containing graph edges. Must contain 'srcid' and 'destid' columns. Cannot be null or empty.resultSchema- The schema name for the output table where node degrees will be stored. Cannot be null or empty.resultDegreesTable- The name for the output table to be created. This table will store node IDs and their calculated degrees. Cannot be null or empty.resultIndexes- An ArrayList of column names on which to create indexes in the result table. Valid values are "id" and "degree" (case-insensitive). Cannot be null, but can be empty. Invalid column names will cause an SQLException.stmt- The JDBCStatementobject used to execute the SQL commands. Cannot be null.- Throws:
SQLException- If any database access error occurs during table/index creation, data insertion, or cleanup. Also thrown if input validation fails (null/empty parameters, invalid index column name, table name conflicts).
-
fromEdges
public static void fromEdges(String inputSchema, String inputEdgesTable, String resultSchema, String resultVerticesTable, ArrayList<String> resultColumnExpressions, ArrayList<String> resultVerticesIndexes, Statement stmt) throws SQLException Creates a vertices table from an existing edges table by extracting unique source and destination IDs.This function identifies all unique identifiers present in the
srcidanddestidcolumns of the input edges table. It then creates a new table (the result vertices table) containing these unique IDs. Optionally, additional columns can be generated in the result table based on SQL expressions provided inresultColumnExpressions. Indexes can also be created on specified columns in the result table.If any step of the operation fails (e.g., SQL error during table creation, index creation, or data insertion), the function will attempt to drop the partially created result vertices table to ensure atomicity.
## How `resultColumnExpressions` Works
The
resultColumnExpressionsparameter allows you to define additional columns in the output vertices table beyond the primaryidcolumn. Each string in this list should be a valid SQL expression that computes a value for a new column.### Available Columns for Expressions
When defining your expressions, you can refer to the unique vertex ID generated by the function. Internally, the function creates a subquery like:
(SELECT srcid AS id FROM input_edges UNION DISTINCT SELECT destid AS id FROM input_edges) idsThis means the *only* column directly available from this internal source for your expressions is the unique vertex ID itself.
### Referencing the Vertex ID
You *must* refer to the unique vertex ID using the alias
ids.id. For example, an expression might be"ids.id % 100 AS bucket_id".### Column Aliases (`AS alias`)
Each expression provided in the
resultColumnExpressionslist **must** end withAS alias_name. This alias defines the name of the new column that will be created in the result vertices table. The database uses this alias to name the column generated by your expression. For example:"LOWER(CAST(ids.id AS VARCHAR(255))) AS id_lowercase""CASE WHEN ids.id > 1000 THEN 'high' ELSE 'low' END AS category""ids.id AS original_id"(if you simply want a copy with a different name)
idcolumn (derived fromsrcidanddestid), plus one column for each expression provided inresultColumnExpressions, named according to the alias you specify.- Parameters:
inputSchema- The schema name of the input edges table. Must not be null or empty. The edges table is expected to have columns namedsrcidanddestid.inputEdgesTable- The name of the input edges table. Must not be null or empty. Expected to containsrcidanddestidcolumns representing edge endpoints.resultSchema- The schema name for the new vertices table to be created. Must not be null or empty.resultVerticesTable- The name for the new vertices table to be created. Must not be null or empty. This table will contain unique vertex IDs and any additional computed columns.resultColumnExpressions- An ArrayList of strings, where each string is a SQL expression used to compute an additional column for the result vertices table. Each expression *must* use the aliasids.idto refer to the unique vertex ID and *must* end withAS column_aliasto name the resulting column. If empty or null, no additional columns besides 'id' are created. Individual expressions cannot be null or empty.resultVerticesIndexes- An ArrayList of strings specifying the names of columns in the *result* vertices table on which to create indexes. This can include the primary 'id' column and any columns created viaresultColumnExpressions(using their specified aliases). If empty or null, no indexes are created. Individual column names cannot be null or empty.stmt- TheStatementobject used to execute SQL commands against the database. Must not be null.- Throws:
SQLException- If a database access error occurs, if any input SQL (table names, expressions, index columns) is invalid, or if the underlying JDBC driver encounters an issue. If an exception occurs during the creation process, an attempt will be made to drop the result table.
-
innerJoinVertices
public static void innerJoinVertices(String inputSchema, String inputVerticesTable, String otherSchema, String otherVerticesTable, String resultSchema, String resultVerticesTable, ArrayList<String> resultColumnExpressions, ArrayList<String> resultVerticesIndexes, Statement stmt) throws SQLException Performs an SQLINNER JOINoperation between two vertex tables based on theiridcolumns, creating a new result table containing specified columns from both input tables.The join condition is always
a.id = b.id, whereais an alias for the first input table (inputSchema.inputVerticesTable) andbis an alias for the second input table (otherSchema.otherVerticesTable).The function first creates the structure of the result table using
CREATE TABLE AS SELECT ... WHERE 1=0, then creates specified indexes, and finally populates the table usingINSERT INTO ... SELECT .... This approach ensures the table structure is defined correctly before data insertion.Important: The
idcolumn from the first input table (aliased asa) is automatically included as the first column in the result table and will be namedid. You do not need to specifya.idin theresultColumnExpressions.Atomicity: If any step of the operation (table creation, index creation, data insertion) fails, the function attempts to drop the partially created result table to leave the database in a cleaner state. However, this is a best-effort cleanup and does not guarantee full transactional atomicity across different database systems or complex failure scenarios. ## How to use resultColumnExpressions
The
resultColumnExpressionslist defines the additional columns to be included in the result table, beyond the mandatoryidcolumn from the first table. Each string in this list should be a valid SQL expression that can appear in aSELECTclause.### Available Columns for Expressions
Within each expression string, you can refer to columns from the two input tables using specific aliases:
- Columns from the first input table (
inputSchema.inputVerticesTable) must be referenced using the aliasa. Example:a.property_name. - Columns from the second input table (
otherSchema.otherVerticesTable) must be referenced using the aliasb. Example:b.other_property.
It is strongly recommended that each expression explicitly defines the resulting column name using the
AS aliassyntax.- Syntax:
expression AS desired_column_name - Example:
a.name AS vertex_name,b.status AS vertex_status,a.value + b.offset AS calculated_value
AS alias, the database system will automatically assign a name to the column. Explicitly usingASensures predictable column names in theresultVerticesTable, which is crucial for subsequent queries and for defining indexes using theresultVerticesIndexesparameter.### Examples of resultColumnExpressions:
ArrayList<String> columns = new ArrayList<>(); columns.add("a.name AS name"); // Select 'name' from the first table, call it 'name' in the result columns.add("b.type AS type"); // Select 'type' from the second table, call it 'type' columns.add("a.value * b.factor AS weighted_value"); // Calculate a value, call it 'weighted_value' columns.add("b.creation_date"); // Select 'creation_date' from second table. Strongly recommend adding 'AS creation_date'- Parameters:
inputSchema- Schema of the first input vertices table. Cannot be null or empty.inputVerticesTable- Name of the first input vertices table (aliased as 'a' internally). Must contain an 'id' column. Cannot be null or empty.otherSchema- Schema of the second input vertices table. Cannot be null or empty.otherVerticesTable- Name of the second input vertices table (aliased as 'b' internally). Must contain an 'id' column used for joining. Cannot be null or empty.resultSchema- Schema for the output table. Cannot be null or empty.resultVerticesTable- Name of the table to be created with the join results. Cannot be null or empty. This table will be dropped if it exists before the operation starts.resultColumnExpressions- An ArrayList of SQL expressions defining the columns (beyond the implicit 'a.id') to include in the result table. See detailed explanation above regarding aliases 'a' and 'b', and the recommended use of 'AS alias'. Cannot be null; elements cannot be null or empty.resultVerticesIndexes- An ArrayList of column names in the result table on which to create indexes. These names should match the aliases defined inresultColumnExpressionsor the implicit 'id' column. Cannot be null; elements cannot be null or empty.stmt- The JDBC Statement object to use for executing SQL commands. Cannot be null.- Throws:
SQLException- If any database error occurs during table creation, indexing, data insertion, or cleanup. This could include syntax errors inresultColumnExpressions, missing tables/columns, constraint violations, or connection issues.
- Columns from the first input table (
-
outerJoinVertices
public static void outerJoinVertices(String inputSchema, String inputVerticesTable, String otherSchema, String otherVerticesTable, String resultSchema, String resultVerticesTable, ArrayList<String> resultColumnExpressions, ArrayList<String> resultVerticesIndexes, Statement stmt) throws SQLException Performs a LEFT OUTER JOIN between two vertex tables based on their 'id' columns, storing the result in a new table.The operation joins the
inputVerticesTable(left side, alias 'a') with theotherVerticesTable(right side, alias 'b') using the conditiona.id = b.id. All rows from the left table ('a') are included in the result. If a row in 'a' has no matching 'id' in 'b', the columns selected from 'b' will be NULL in the result row.The result table is created first with the correct structure, then indexes are added, and finally, the data is inserted. If any step of this process fails, an attempt is made to drop the partially created result table to ensure atomicity.
Result Columns and
resultColumnExpressionsThe result table will always contain the 'id' column from the left table ('a'). The
resultColumnExpressionslist defines any additional columns to be included in the result table. These expressions are used directly in theSELECTlist of the underlying SQL query.Available Columns for Expressions:
Within each expression string in
resultColumnExpressions, you can refer to columns from both input tables using specific aliases:- Alias
a: Refers to columns from the left table, specified byinputSchemaandinputVerticesTable. Example:a.name,a.value. - Alias
b: Refers to columns from the right table, specified byotherSchemaandotherVerticesTable. Example:b.property,b.score.
Remember that any column selected via alias
bmight beNULLin the output if there was no matching row in the right table for a given row from the left table.Using Aliases (
AS alias):Each string in
resultColumnExpressionseffectively becomes an item in the SQLSELECTlist. Therefore, each expression must define the name for the resulting column in the output table.- For simple column selections, the database implicitly uses the column name. For example,
"a.name"results in a column namedname. - For clarity, consistency, and especially for any expression more complex than a simple
column reference (e.g., using functions like
COALESCE, arithmetic operations, or selecting columns with the same name from both tables), you must use theASkeyword to explicitly assign a unique alias (name) to the resulting column.
Examples of
resultColumnExpressionselements:"a.name AS vertex_name"- Selects 'name' from the left table, names the result column 'vertex_name'."b.score AS property_score"- Selects 'score' from the right table, names the result column 'property_score'. (Will be NULL if no match)."COALESCE(a.value, b.default_value, 0) AS effective_value"- A complex expression using values from both tables, explicitly named 'effective_value'."a.category"- Selects 'category' from the left table. Might result in a column named 'category' (safer to useAS)."b.status"- Selects 'status' from the right table. Might result in a column named 'status' (safer to useAS).
Recommendation: It is strongly recommended to use
AS aliasfor all expressions inresultColumnExpressionsto ensure predictable column names in the result table and avoid potential ambiguity or database-specific behavior.- Parameters:
inputSchema- The schema of the primary (left) input vertices table. Cannot be null or empty.inputVerticesTable- The name of the primary (left) input vertices table (aliased as 'a'). Must contain an 'id' column. Cannot be null or empty.otherSchema- The schema of the secondary (right) input vertices table to join with. Cannot be null or empty.otherVerticesTable- The name of the secondary (right) input vertices table (aliased as 'b'). Must contain an 'id' column. Cannot be null or empty.resultSchema- The schema where the resulting joined table will be created. Cannot be null or empty.resultVerticesTable- The name for the new table that will store the results of the join. Cannot be null or empty, and must not conflict with input tables.resultColumnExpressions- An ArrayList of strings, where each string is a SQL expression defining a column to include in the result table (beyond the mandatory 'a.id'). See documentation section "Result Columns and resultColumnExpressions" for details on available columns (using aliases 'a' and 'b') and the mandatory use of 'AS alias' for complex expressions. Cannot be null; elements cannot be null or empty.resultVerticesIndexes- An ArrayList of strings specifying the names of columns in the result table on which to create indexes. These column names must match the 'id' column or the aliases defined inresultColumnExpressions. Cannot be null; elements cannot be null or empty.stmt- The JDBC Statement object used to execute the SQL commands. Cannot be null.- Throws:
SQLException- If any database operation (table creation, index creation, data insertion, table drop on failure) fails.
- Alias
-
mask
public static void mask(String inputSchema, String inputVerticesTable, String inputEdgesTable, String otherSchema, String otherVerticesTable, String otherEdgesTable, String resultSchema, String resultVerticesTable, String resultEdgesTable, ArrayList<String> resultVerticesIndexes, ArrayList<String> resultEdgesIndexes, Statement stmt) throws SQLException Creates a subgraph by retaining vertices and edges from an input graph that are also present in an 'other' graph, mimicking the behavior of GraphX'smaskoperation.The operation proceeds as follows:
- Result tables for vertices and edges are created with the same columns as the corresponding input tables. Specified indexes are also created on these result tables.
- Vertices from the input graph are included in the result vertex table if a vertex with the same ID exists in the 'other' vertex table. Vertex attributes are copied directly from the input vertex table.
- Edges from the input graph are included in the result edge table if:
- An edge with the same source ID (
srcid) and destination ID (destid) exists in the 'other' edge table. - Both the source vertex and the destination vertex of the edge (referenced by
srcidanddestid) are present in the newly created result vertex table. This ensures the consistency of the resulting subgraph.
- An edge with the same source ID (
Input validation checks are performed for all parameters, and potential naming conflicts between source and result tables are checked before execution begins.
Error Handling: If any step of the operation fails after the result tables have been initially created (e.g., due to an SQL error during data population), the function attempts to safely drop the potentially partially populated result vertex and edge tables to avoid leaving incomplete artifacts in the database.
- Parameters:
inputSchema- The schema containing the input graph tables. Cannot be null or empty.inputVerticesTable- The name of the table containing the input vertices. Cannot be null or empty.inputEdgesTable- The name of the table containing the input edges. Cannot be null or empty.otherSchema- The schema containing the 'other' graph tables used for masking. Cannot be null or empty.otherVerticesTable- The name of the table containing the 'other' graph's vertices. Cannot be null or empty.otherEdgesTable- The name of the table containing the 'other' graph's edges. Cannot be null or empty.resultSchema- The schema where the resulting subgraph tables will be created. Cannot be null or empty.resultVerticesTable- The name for the table that will store the resulting vertices. Cannot be null or empty.resultEdgesTable- The name for the table that will store the resulting edges. Cannot be null or empty.resultVerticesIndexes- A list of column names on which to create indexes for the result vertices table. Cannot be null. Can be empty if no indexes are desired.resultEdgesIndexes- A list of column names on which to create indexes for the result edges table. Cannot be null. Can be empty if no indexes are desired.stmt- The JDBCStatementobject used to execute SQL commands against the database. Cannot be null.- Throws:
SQLException- If any database access error occurs during table creation, data population, index creation, or cleanup operations.
-
groupEdges
public static void groupEdges(String inputSchema, String inputEdgesTable, String resultSchema, String resultEdgesTable, ArrayList<String> resultColumnExpressions, ArrayList<String> resultEdgesIndexes, Statement stmt) throws SQLException Groups duplicate edges from an input table into a new result table using SQL aggregation.This method creates a new table (`resultEdgesTable`) containing unique pairs of `srcid` and `destid` from the `inputEdgesTable`. For each unique pair, it calculates additional columns based on the aggregate expressions provided in `resultColumnExpressions`. These expressions use SQL aggregate functions (e.g., COUNT, SUM, AVG, MIN, MAX) to combine data from the original rows that share the same `srcid` and `destid`.
The process involves:
- Performing sanity checks on input parameters.
- Checking for potential naming conflicts between input and result tables.
- Creating the result table structure based on `srcid`, `destid`, and the specified result column aggregate expressions.
- Creating specified indexes on the result table.
- Populating the result table by selecting and grouping data from the input table using the specified aggregate expressions.
If any step of the table creation or population fails, the method attempts to drop the potentially partially created `resultEdgesTable` to ensure atomicity and avoid leaving incomplete artifacts.
Using resultColumnExpressions
The
resultColumnExpressionslist allows you to define additional columns for theresultEdgesTable. These columns are computed by aggregating data from the rows in theinputEdgesTablethat share the samesrcidanddestid.Expression Content and Available Data
Each string in theresultColumnExpressionslist becomes part of theSELECTclause in the underlying SQL query. Because the query usesGROUP BY srcid, destid, standard SQL rules apply: any expression in the select list must either be an aggregate function or one of the grouping columns (srcidordestid).Therefore, each expression you provide **must use an SQL aggregate function** (e.g.,
COUNT,SUM,AVG,MIN,MAX). These aggregate functions operate on the values from the originalinputEdgesTablerows that are grouped together for a given unique pair ofsrcidanddestid. The aggregate functions can operate on any column present in theinputEdgesTable.Column Aliases in Expressions
You do not need to use specific aliases when referring to the *input* columns within your aggregate function. For example, if the input table has a column namedweight, you can directly useSUM(weight).Requirement for
Each expression string provided inAS aliasresultColumnExpressionsMUST end withAS alias. This alias defines the name of the corresponding computed column in the newly createdresultEdgesTable. The database requires this alias to name the result of the aggregate function.For example, if your
inputEdgesTablehas columnssrcid,destid,weight, andtimestamp, valid expressions could be:"COUNT(*) AS edge_count"(to count the number of original edges grouped)"SUM(weight) AS total_weight"(to sum the weights of the grouped edges)"AVG(weight) AS average_weight""MAX(timestamp) AS latest_timestamp"
"weight AS original_weight"would be **invalid** in standard SQL in this context, becauseweightis not used within an aggregate function and it's not one of theGROUP BYcolumns (srcid,destid).- Parameters:
inputSchema- The schema name of the input edges table. Cannot be null or empty.inputEdgesTable- The name of the input table containing edges. This table MUST have columns namedsrcidanddestid. It should also contain any other columns referenced within the aggregate functions inresultColumnExpressions. Cannot be null or empty.resultSchema- The schema name for the result table. Cannot be null or empty.resultEdgesTable- The name of the table to be created, which will store the grouped edges. Cannot be null or empty. This table will contain `srcid`, `destid`, and columns defined by the aliases in `resultColumnExpressions`.resultColumnExpressions- An ArrayList of strings, where each string is a SQL aggregate expression (e.g.,COUNT(*),SUM(column_name)) operating on columns from theinputEdgesTable. Each expression MUST include anAS aliasclause to name the resulting column in theresultEdgesTable. Cannot be null, but can be empty. If empty, the result table will only have `srcid` and `destid`.resultEdgesIndexes- An ArrayList of strings, where each string is a column name in the *result* table (i.e., `srcid`, `destid`, or one of the aliases defined in `resultColumnExpressions`) on which a database index should be created. Cannot be null, but can be empty.stmt- The JDBC Statement object used to execute SQL queries. Cannot be null.- Throws:
SQLException- If any database error occurs during table creation, index creation, data insertion, or cleanup.
-
aggregateMessages
public static void aggregateMessages(String inputSchema, String inputVerticesTable, String inputEdgesTable, String resultSchema, String resultVerticesTable, String sendToSourceExpr, String sendToDestExpr, String aggregateExpr, ArrayList<String> resultVerticesIndexes, Statement stmt) throws SQLException Aggregates messages passed between connected vertices in a graph and stores the results in a new table.This method simulates a message-passing process on a graph defined by input vertices and edges. For each edge connecting a source vertex 'a' and a destination vertex 'c' via edge 'b', messages can be generated based on
sendToSourceExpr(message sent to 'a') and/orsendToDestExpr(message sent to 'c'). These generated messages (internally aliased as 'msg') are then grouped by the target vertex ID ('id') and aggregated using theaggregateExpr. The final result, containing the vertex ID and the aggregated message, is stored in the specified result table.The process involves:
- Generating messages based on the send expressions for each edge.
- Grouping these messages by the target vertex ID.
- Applying the aggregation expression to the grouped messages.
- Creating the result table and inserting the aggregated results.
- Creating specified indexes on the result table.
If the operation fails at any point after starting table creation, it attempts to drop the potentially partially created result table to ensure cleanup.
Important Note on Expressions and Aliases: Within the
sendToSourceExprandsendToDestExpr, the context is always a single edge traversal. You can refer to columns from the source vertex using the aliasa, columns from the edge itself using the aliasb, and columns from the destination vertex using the aliasc. For example, to access a property named 'value' from the source vertex, you would usea.value. To access the edge weight (assuming an edge column named 'weight'), you would useb.weight.The
aggregateExproperates on the messages generated by the send expressions. These generated messages are internally collected under the column namemsg. Therefore, youraggregateExprtypically needs to referencemsg, e.g.,SUM(msg),MAX(msg).- Parameters:
inputSchema- The schema containing the input graph tables. Cannot be null or empty.inputVerticesTable- The name of the table containing the graph vertices. Must have at least an 'id' column. Cannot be null or empty.inputEdgesTable- The name of the table containing the graph edges. Must have 'srcid' and 'destid' columns referencing vertex IDs. Cannot be null or empty.resultSchema- The schema where the result table will be created. Cannot be null or empty.resultVerticesTable- The name of the table to be created, containing the aggregated results. This table will have an 'id' column (the vertex ID) and a column for the aggregated message (named based on the alias inaggregateExpr, or 'aggregated_message' by default). Cannot be null or empty. Must not conflict with input table names.sendToSourceExpr- A SQL expression defining the message to be sent *to the source* vertex ('a') of an edge ('b'). This expression is evaluated in the context of the join:inputVerticesTable a JOIN inputEdgesTable b ON a.id = b.srcid JOIN inputVerticesTable c ON b.destid = c.id. You can use aliasesa,b, andcto refer to columns from the source vertex, edge, and destination vertex, respectively (e.g.,c.property + b.weight). If this expression evaluates to NULL for an edge, no message is sent to the source for that edge. Can be null or empty, but at least one ofsendToSourceExprorsendToDestExprmust be non-empty. The result of this expression becomes the 'msg' value associated with the source vertex's 'id' (b.srcid).sendToDestExpr- A SQL expression defining the message to be sent *to the destination* vertex ('c') of an edge ('b'). Evaluated in the same join context assendToSourceExpr(a JOIN b JOIN c). You can use aliasesa,b, andc. (e.g.,a.property * b.weight). If this expression evaluates to NULL for an edge, no message is sent to the destination for that edge. Can be null or empty, but at least one ofsendToSourceExprorsendToDestExprmust be non-empty. The result of this expression becomes the 'msg' value associated with the destination vertex's 'id' (b.destid).aggregateExpr- A SQL aggregate expression defining how messages arriving at the *same* vertex ID are combined. This expression *must* operate on the generated message values, which are internally available under the column namemsg. Example:SUM(msg),MIN(msg). You can optionally provide an alias for the result column using standard SQL 'AS' syntax (e.g.,"SUM(msg) AS total_value"). If no alias is provided, the result column will be named "aggregated_message". This alias is used as the column name in theresultVerticesTable. Cannot be null or empty. Must be a valid SQL aggregate function call.resultVerticesIndexes- An ArrayList of column names in theresultVerticesTableon which to create indexes. Allowed values are "id" and the alias used for theaggregateExprresult (either explicitly provided via 'AS' or the default "aggregated_message"). Cannot be null. List elements cannot be null/empty.stmt- The JDBC Statement object to use for executing the SQL commands. Cannot be null.- Throws:
SQLException- If any database error occurs during table creation, insertion, or index creation. Also thrown if required arguments are null/empty, if both send expressions are empty, if specified index columns are invalid, if the result table name conflicts with input tables, or if the provided SQL expressions are invalid.
-
collectNeighbors
public static void collectNeighbors(String inputSchema, String inputVerticesTable, String inputEdgesTable, String resultSchema, String resultVerticesTable, OCGraph.EdgeDirection direction, ArrayList<String> resultVerticesIndexes, Statement stmt) throws SQLException Collects neighbor information for each vertex in a graph stored in database tables and stores the results in a new table.This method queries a graph represented by vertex and edge tables. Based on the specified
OCGraph.EdgeDirection, it identifies neighbors for each vertex. The neighbor information (ID and all other columns from the input vertices table) is packaged into a tuple. These tuples are then aggregated into an array for each source vertex.A new result table is created with two primary columns:
- 'id': The ID of the source vertex.
- A column (typically named 'neighbors'): An array containing tuples representing the neighbors. Each tuple structure mirrors the input vertices table row (id + other columns).
If any step of the operation fails *after* the result table has been initially created (e.g., during index creation or data insertion), the method attempts to drop the potentially incomplete or incorrectly populated result table to ensure atomicity.
- Parameters:
inputSchema- The schema containing the input vertices and edges tables. Cannot be null or empty.inputVerticesTable- The name of the table containing the input vertices. Must contain at least an 'id' column. Cannot be null or empty.inputEdgesTable- The name of the table containing the input edges. Must contain 'srcid' and 'destid' columns referencing vertex IDs. Cannot be null or empty.resultSchema- The schema where the result table will be created. Cannot be null or empty.resultVerticesTable- The name of the table to create for storing the results. This table should not conflict with input tables. Cannot be null or empty.direction- The direction of edges to follow (IN, OUT, or BOTH) when collecting neighbors. SeeOCGraph.EdgeDirection. Cannot be null.resultVerticesIndexes- A list containing the names of columns in the *result* table on which to create indexes. Valid names are 'id' and the name of the aggregated neighbor array column (determined dynamically, typically 'neighbors'). Cannot be null; list elements must be valid column names.stmt- The JDBCStatementobject used to execute SQL queries. Cannot be null.- Throws:
SQLException- If a database access error occurs during query execution, table creation, index creation, or data insertion. Also thrown if input arguments are invalid (e.g., null/empty strings, invalid index column names specified inresultVerticesIndexes, table name conflicts, non-existent input tables/columns) or if an internal error occurs (e.g., unexpectedEdgeDirection).- See Also:
-
collectEdges
public static void collectEdges(String inputSchema, String inputVerticesTable, String inputEdgesTable, String resultSchema, String resultVerticesTable, OCGraph.EdgeDirection direction, ArrayList<String> resultVerticesIndexes, Statement stmt) throws SQLException Collects edge information associated with each vertex in a graph stored in database tables and stores the results in a new table.This method queries a graph represented by vertex and edge tables. For each vertex, it identifies connected edges based on the specified
OCGraph.EdgeDirection. All columns from the identified edges in the input edges table are packaged into a tuple. These edge tuples are then aggregated into an array for each vertex.A new result table is created with two primary columns:
- 'id': The ID of the vertex (the source or destination, depending on the edge direction).
- 'edges': An array containing tuples, where each tuple represents a connected edge and includes all columns from the original edge row in the input edges table.
If any step of the operation fails *after* the result table has been initially created (e.g., during index creation or data insertion), the method attempts to drop the potentially incomplete or incorrectly populated result table to ensure atomicity.
- Parameters:
inputSchema- The schema containing the input vertices and edges tables. Cannot be null or empty.inputVerticesTable- The name of the table containing the input vertices. Must contain at least an 'id' column. Cannot be null or empty.inputEdgesTable- The name of the table containing the input edges. Must contain 'srcid' and 'destid' columns referencing vertex IDs, and potentially other edge attribute columns. Cannot be null or empty, and must contain at least one column.resultSchema- The schema where the result table will be created. Cannot be null or empty.resultVerticesTable- The name of the table to create for storing the results. This table should not conflict with input tables. Cannot be null or empty. The resulting table will contain an 'id' column and an 'edges' column.direction- The direction of edges to collect relative to each vertex (IN, OUT, or BOTH). SeeOCGraph.EdgeDirection. Cannot be null.resultVerticesIndexes- A list containing the names of columns in the *result* table on which to create indexes. Valid names are 'id' and 'edges'. Cannot be null; list elements must be valid column names.stmt- The JDBCStatementobject used to execute SQL queries. Cannot be null.- Throws:
SQLException- If a database access error occurs during query execution, table creation, index creation, or data insertion. Also thrown if input arguments are invalid (e.g., null/empty strings, invalid index column names specified inresultVerticesIndexes, table name conflicts, non-existent input tables/columns, input edge table having no columns) or if an internal error occurs (e.g., unexpectedEdgeDirection).- See Also:
-
labelPropagation
public static void labelPropagation(String inputSchema, String inputVerticesTable, String inputEdgesTable, String resultSchema, String resultVerticesTable, int maxIterations, ArrayList<String> resultVerticesIndexes, Statement stmt) throws SQLException Performs the Label Propagation Algorithm (LPA) iteratively on graph data stored in database tables, assigning community labels to vertices.The algorithm initializes each vertex's label to its own ID. Then, for a specified number of iterations, each vertex updates its label to the most frequent label among its neighbors. Ties are broken by choosing the numerically smaller label ID. The process uses SQL queries executed via the provided
Statement.Temporary tables are created within the
resultSchemato store intermediate results for each iteration. These tables are named uniquely using a UUID suffix (e.g.,lpa_temp_xxxxxxxx_iterN). The final results are stored in the specifiedresultVerticesTablewithin theresultSchema.This method includes robust cleanup logic:
- Temporary tables are dropped upon successful completion.
- If an error occurs during execution, all created temporary tables AND the potentially partially created result table are dropped to ensure a clean state.
resultVerticesTablefor faster lookups on the specified columns ('id' or 'label').Note: Isolated vertices (those with no edges) will retain their initial label (their own ID). Input tables are assumed to exist and be accessible. The target result table should not conflict with input table names.
- Parameters:
inputSchema- The name of the database schema containing the input vertex and edge tables. Cannot be null or empty.inputVerticesTable- The name of the table containing vertex data. Expected to have at least anidcolumn (BIGINT). Cannot be null or empty.inputEdgesTable- The name of the table containing edge data. Expected to havesrcidanddestidcolumns representing the edges (BIGINT). Cannot be null or empty.resultSchema- The name of the database schema where the result table and temporary iteration tables will be created. Cannot be null or empty.resultVerticesTable- The name of the table to be created in theresultSchemastoring the final vertex labels. This table will have columnsid(BIGINT NOT NULL) andlabel(BIGINT). Cannot be null or empty.maxIterations- The maximum number of iterations to perform. Must be >= 1.resultVerticesIndexes- AnArrayListof column names ("id" and/or "label") on which to create indexes in the finalresultVerticesTable. If empty or null, no indexes are created besides potential primary keys defined by the database. Cannot be null, but can be empty. Column names within the list cannot be null/empty and must be either "id" or "label" (case-insensitive).stmt- AStatementobject connected to the target database, used to execute all SQL commands. Must not be null and should be associated with an active connection. The caller is responsible for managing the Statement's lifecycle (creation, closing).- Throws:
SQLException- If any database access error occurs during table creation, data manipulation (INSERT, SELECT), index creation, or table dropping. Also thrown if input/result table name conflicts exist, or if an invalid column name is provided inresultVerticesIndexes.
-
pregel
public static void pregel(String inputSchema, String inputVerticesTable, String inputEdgesTable, String resultSchema, String resultVerticesTable, String initializerExpr, String sendToSourceExpr, String sendToDestExpr, String aggregateExpr, String updaterExpr, int maxIterations, ArrayList<String> resultVerticesIndexes, Statement stmt) throws SQLException Performs a Pregel-like graph computation using SQL.This method implements the core logic of a vertex-centric graph processing algorithm, often referred to as Pregel. It iteratively computes vertex states based on messages passed along edges. The computation involves initializing vertex states, sending messages between vertices based on their state and edge connections, aggregating incoming messages for each vertex, and updating the vertex state based on its current state and aggregated messages.
The process creates and manages temporary tables for intermediate states and messages, ensuring they are cleaned up even if the operation fails. The final result is stored in the specified result table.
- Parameters:
inputSchema- The schema containing the input vertex and edge tables. Cannot be null or empty.inputVerticesTable- The name of the table containing the initial vertices. Must have at least an 'id' column. Cannot be null or empty.inputEdgesTable- The name of the table containing the graph edges. Must have 'srcid' and 'destid' columns corresponding to vertex IDs. Cannot be null or empty.resultSchema- The schema where the result table and temporary tables will be created. Cannot be null or empty.resultVerticesTable- The name of the table to store the final vertex states. This table will be created or replaced. Cannot be null or empty. It will contain 'id' and 'result' columns.initializerExpr- An SQL expression defining the initial state for each vertex.- Purpose: Computes the starting value for the 'state' column for every vertex.
- Context: Used in a
SELECT id, <initializerExpr> FROM inputVerticesTablestatement. - Available Columns: Can reference any column from the
inputVerticesTable. - Aliases: No specific table aliases are needed within the expression itself.
- Syntax: Any valid SQL expression that returns a single value per vertex. The
result of this expression will implicitly be named 'state' (the function appends
AS stateif not present). - Example:
"CASE WHEN type = 'seed' THEN 1.0 ELSE 0.0 END"
sendToSourceExpr- An SQL expression defining the message to send *from* the source vertex *to* the destination vertex, where the message is associated with the source vertex ID in the intermediate message table. Can be null or empty if no messages are sent in this direction.- Purpose: Computes a message based on the states of connected vertices and the edge connecting them. This message is conceptually sent *along* the edge but grouped by the *source* vertex ID (`b.srcid`) in the temporary message table.
- Context: Used in a
SELECT b.srcid AS id, (<sendToSourceExpr>) AS msg FROM currentStateTable a JOIN inputEdgesTable b ON a.id = b.srcid JOIN currentStateTable c ON b.destid = c.id WHERE (<sendToSourceExpr>) IS NOT NULL. - Available Columns: Must use aliases:
a: Represents the source vertex's current state table row (columns:id,state).b: Represents the edge table row (columns:srcid,destid, potentially others).c: Represents the destination vertex's current state table row (columns:id,state).
- Aliases: You *must* use the aliases
a,b, andcto refer to the tables (e.g.,a.state,c.state,b.weight). - Syntax: Any valid SQL expression returning a single value (the message).
Crucially, if no message should be sent for a given edge/vertex combination in this
direction, this expression *must* evaluate to
NULL. ReturningNULLprevents a message row from being created for that specific edge interaction. - Early Termination: If *all* evaluations of both
sendToSourceExprandsendToDestExprresult inNULLduring an iteration, no messages are generated, and the computation terminates early. Structure your expression to returnNULLwhen a message is not needed to potentially speed up convergence. - Example:
"CASE WHEN a.state > c.state THEN a.state / 2 ELSE NULL END"(Sends half the source state if it's greater than the destination state, associated with the source ID).
sendToDestExpr- An SQL expression defining the message to send *from* the source vertex *to* the destination vertex, where the message is associated with the destination vertex ID in the intermediate message table. Can be null or empty if no messages are sent in this direction.- Purpose: Computes a message based on the states of connected vertices and the edge connecting them. This message is conceptually sent *along* the edge and grouped by the *destination* vertex ID (`b.destid`) in the temporary message table.
- Context: Used in a
SELECT b.destid AS id, (<sendToDestExpr>) AS msg FROM currentStateTable a JOIN inputEdgesTable b ON a.id = b.srcid JOIN currentStateTable c ON b.destid = c.id WHERE (<sendToDestExpr>) IS NOT NULL. (Potentially combined withsendToSourceExprviaUNION ALL). - Available Columns: Must use aliases (same as
sendToSourceExpr):a: Represents the source vertex's current state table row (columns:id,state).b: Represents the edge table row (columns:srcid,destid, potentially others).c: Represents the destination vertex's current state table row (columns:id,state).
- Aliases: You *must* use the aliases
a,b, andc(e.g.,a.state,c.state,b.weight). - Syntax: Any valid SQL expression returning a single value (the message). Must
evaluate to
NULLif no message should be sent for a given edge/vertex combination in this direction. - Early Termination: Same as
sendToSourceExpr. If all message expressions evaluate toNULL, computation stops. - Example:
"a.state * b.weight"(Sends the source state multiplied by edge weight, associated with the destination ID).
aggregateExpr- An SQL aggregation expression to combine incoming messages for a vertex. Cannot be null or empty.- Purpose: Combines all messages ('msg') received by a single vertex ('id') in an iteration into one value.
- Context: Used in a
SELECT id, <aggregateExpr> FROM messageTable GROUP BY id. The message table contains columnsid(the recipient vertex) andmsg(the message content from send expressions). - Available Columns: Can reference the
msgcolumn from the temporary message table generated by the send expressions. - Aliases: No specific table aliases are needed within the expression (just use
msg). The result of the aggregation will implicitly be namedaggregated_message(the function appendsAS aggregated_messageif not present). The subquery containing this aggregation is aliased asmin the next step. - Syntax: Must be a valid SQL aggregation function applied to the
msgcolumn (e.g.,SUM(msg),ARRAY_AGG(msg),MIN(msg),MAX(msg)). - Example:
"SUM(msg)"
updaterExpr- An SQL expression to compute the next state of a vertex based on its current state and the aggregated messages it received. Cannot be null or empty.- Purpose: Calculates the new 'state' for a vertex for the next iteration.
- Context: Used in a
SELECT s.id, (<updaterExpr>) AS state FROM currentStateTable s LEFT JOIN aggregatedMessagesSubquery m ON s.id = m.id. - Available Columns: Must use aliases:
s: Represents the vertex's current state table row (columns:id,state).m: Represents the aggregated messages subquery row for that vertex (columns:id,aggregated_message). Note:m.aggregated_messagewill beNULLif the vertex received no messages in this iteration due to theLEFT JOIN. Handle this possibility (e.g., usingCOALESCE).
- Aliases: You *must* use the aliases
sandm(e.g.,s.state,m.aggregated_message). - Syntax: Any valid SQL expression that computes the new state. The result
becomes the 'state' column for the next iteration. Handle potential
NULLvalues form.aggregated_message(e.g.,COALESCE(m.aggregated_message, 0)if combining numerically). - Example:
"COALESCE(m.aggregated_message, s.state)"(Keep current state if no messages received, otherwise use aggregated message as new state). Or"0.15 + 0.85 * COALESCE(m.aggregated_message, 0)"(PageRank-style update).
maxIterations- The maximum number of iterations to perform. Must be >= 1. Computation may stop earlier if no messages are sent or if the vertex states do not change between iterations.resultVerticesIndexes- A list of column names on which to create indexes in the finalresultVerticesTable. Valid column names are"id"and"result". Cannot be null. Indexes can speed up subsequent queries on the result table.stmt- The JDBC Statement object used to execute the SQL commands. Cannot be null.- Throws:
SQLException- If any database error occurs during table creation, manipulation, or querying, or if input validation fails (e.g., invalid index column names).- See Also:
-
connectedComponents
public static void connectedComponents(String inputSchema, String inputVerticesTable, String inputEdgesTable, String resultSchema, String resultVerticesTable, int maxIterations, ArrayList<String> resultVerticesIndexes, Statement stmt) throws SQLException Finds the connected components of a graph stored in database tables using a Pregel-based algorithm.This method configures and executes a general-purpose Pregel computation (via the
pregelmethod) specifically for the connected components problem. The algorithm works by label propagation:- Each vertex initially considers itself to be in its own component (component ID = vertex ID).
- In each iteration, vertices send their current component ID to their neighbors.
- Vertices update their component ID to the minimum ID they have seen (either their own previous ID or one received from a neighbor).
The input graph is defined by vertex and edge tables within a specified schema. The results are stored in a new table within a specified schema. This result table will contain two columns: 'id' (the vertex ID) and 'result' (the calculated component ID for that vertex). Optional indexes can be created on the result table for performance.
- Parameters:
inputSchema- The schema name for the input graph tables.inputVerticesTable- The table name containing the vertices of the input graph. Expected to have an 'id' column.inputEdgesTable- The table name containing the edges of the input graph. Expected to have 'source' and 'target' columns referencing vertex IDs.resultSchema- The schema name where the result table will be created.resultVerticesTable- The table name for storing the connected components results. It will have 'id' and 'result' columns.maxIterations- The maximum number of Pregel iterations to perform. The algorithm might converge in fewer iterations. Must be >= 1.resultVerticesIndexes- A list of column names (e.g., "id", "result") on which to create indexes in the result table for faster subsequent lookups. Can be empty or null if no indexes are needed.stmt- The JDBCStatementobject used to execute SQL commands against the database. Cannot be null.- Throws:
SQLException- If a database access error occurs during the Pregel execution or while creating/populating the result table.- See Also:
-
stronglyConnectedComponents
public static void stronglyConnectedComponents(String inputSchema, String inputVerticesTable, String inputEdgesTable, String resultSchema, String resultVerticesTable, ArrayList<String> resultVerticesIndexes, Statement stmt) throws SQLException Computes the Strongly Connected Components (SCCs) of a directed graph represented by vertex and edge tables in a database.This method implements a recursive algorithm to partition the graph and identify sets of vertices where every vertex is reachable from every other vertex within the same set. It operates directly on database tables using the provided JDBC Statement.
Temporary tables are created within the specified
resultSchemaduring execution to store intermediate results. These tables are automatically dropped upon completion or if an error occurs. Ensure the provided database user has permissions to create and drop tables in theresultSchema.The final result is populated into the
resultVerticesTablewithin theresultSchema. This table maps each vertex ID to a representative ID for its component.The result table (
resultVerticesTable) created by this function will have the following structure:- id BIGINT NOT NULL: The identifier of a vertex from the input graph[1].
- component BIGINT NOT NULL: The identifier of the strongly connected component to which the vertex belongs. This identifier is the minimum vertex ID found within that specific component[1].
- Parameters:
inputSchema- The name of the database schema containing the input graph tables. Must not be null or empty[1].inputVerticesTable- The name of the table containing the vertices of the graph, located in theinputSchema. This table is expected to have at least an 'id' column (assumed BIGINT based on usage) uniquely identifying each vertex[1]. Must not be null or empty[1].inputEdgesTable- The name of the table containing the directed edges of the graph, located in theinputSchema. This table is expected to have 'srcid' and 'destid' columns (assumed BIGINT based on usage) representing the source and destination vertex IDs for each edge[1]. Must not be null or empty[1].resultSchema- The name of the database schema where the final result table and all temporary tables will be created. Must not be null or empty[1].resultVerticesTable- The desired name for the output table that will store the mapping of each vertex ID to its component ID. This table will be created in theresultSchema[1]. It must not conflict with input table names[1]. Must not be null or empty[1].resultVerticesIndexes- A list specifying which columns ('id' and/or 'component') in theresultVerticesTableshould have database indexes created on them for potentially faster lookups. Only "id" and "component" (case-insensitive) are valid entries[1]. Must not be null[1].stmt- The active JDBCStatementobject used to execute all SQL commands against the database. Must not be null[1].- Throws:
SQLException- If any database access error occurs, including issues with table creation/deletion, query execution, or if invalid parameters (e.g., invalid index column names, null/empty schema/table names) are provided[1].
-
shortestPaths
public static void shortestPaths(String inputSchema, String inputVerticesTable, String inputEdgesTable, String resultSchema, String resultTable, List<Long> landmarks, String edgeWeightColumn, int maxIterations, ArrayList<String> resultIndexes, Statement stmt) throws SQLException Computes the shortest paths from all vertices in the graph to a specified set of landmark vertices.This method implements a shortest path algorithm based on iterative relaxation (similar to Bellman-Ford, but adapted for multiple destinations/landmarks simultaneously) to find the shortest distance from every vertex in the graph to each of the provided landmarks. It assumes that all edge weights are non-negative. If no edge weight column is specified, edges are assumed to have a uniform weight of 1.0.
The computation proceeds iteratively, refining distance estimates in each step. It terminates either when no shorter paths are found in an iteration or when the specified maximum number of iterations (`maxIterations`) is reached.
The results are stored in a specified database table. Only paths with finite distances (i.e., reachable landmarks from a source vertex) are included in the final result table. Temporary tables are created during the computation and are cleaned up afterwards. ## Result Table Columns The table created at `resultSchema.resultTable` will have the following columns:
srcid(BIGINT): The ID of the source vertex.destid(BIGINT): The ID of the landmark vertex (the destination of the path).distance(DOUBLE PRECISION): The computed shortest distance fromsrcidtodestid.
- Parameters:
inputSchema- The schema containing the input vertices and edges tables. Cannot be null or empty.inputVerticesTable- The name of the table containing the vertex IDs. Expected column: `id` (BIGINT). Cannot be null or empty.inputEdgesTable- The name of the table containing the graph edges. Expected columns: `srcid` (BIGINT), `destid` (BIGINT), and optionally the weight column specified by `edgeWeightColumn`. Cannot be null or empty.resultSchema- The schema where the result table will be created. Cannot be null or empty.resultTable- The name of the table to store the shortest path results. Cannot be null or empty. This table must not conflict with input table names.landmarks- A non-null, non-empty list of vertex IDs (BIGINT) to be used as landmarks. Shortest paths will be computed *to* these vertices. Individual landmark IDs within the list cannot be null.edgeWeightColumn- The name of the column in the `inputEdgesTable` containing non-negative edge weights (numeric type). If null or empty, a default weight of 1.0 is assumed for all edges.maxIterations- The maximum number of iterations for the relaxation algorithm. Must be a positive integer (>= 1).resultIndexes- A non-null list of column names ('srcid', 'destid', 'distance') on which to create indexes in the result table for potentially faster querying. Can be empty if no indexes are desired. Column names must be valid ('srcid', 'destid', or 'distance').stmt- The JDBCStatementobject used to execute SQL queries. Cannot be null.- Throws:
SQLException- if any input parameters are invalid (null, empty, non-positive, etc.), if the landmarks list is empty or contains nulls, if invalid index columns are specified, if the result table name conflicts with input table names, or if a database access error occurs during query execution, table manipulation, or cleanup.
-
staticPageRank
public static void staticPageRank(String inputSchema, String inputVerticesTable, String inputEdgesTable, String resultSchema, String resultVerticesTable, int numIterations, double resetProb, ArrayList<String> resultVerticesIndexes, Long personalizationSrcId, Statement stmt) throws SQLException Computes static PageRank scores for vertices in a graph, optionally personalized towards a specific source vertex, and stores the results in a new vertices table.## Algorithm Overview: 1. **Validation:** Checks that all input parameters (schemas, tables, statement, numerical constraints) are valid. Throws
SQLExceptionon validation failure. 2. **Initialization:** - Creates uniquely named temporary tables for storing ranks during iterations. - Calculates the total number of vertices (N). - If the graph is empty (N=0), creates an empty result table with the correct schema and returns. - Initializes the rank for each vertex: - **Standard PageRank:** Each vertex starts with a rank of `1.0 / N`. - **Personalized PageRank:** The `personalizationSrcId` vertex starts with a rank of `1.0`, and all other vertices start with `0.0`. 3. **Iteration:** Performs `numIterations` iterations of the PageRank update rule: - In each iteration `i`, the ranks calculated in iteration `i-1` (stored in a 'previous ranks' table) are used to compute the new ranks (stored in a 'current ranks' table). - The core PageRank formula implemented is: `NewRank(v) = ( (1 - resetProb) / N ) + resetProb * Sum( Rank(u) / OutDegree(u) )` for all vertices `u` that link to `v`. - **Dangling Node Handling:** Rank from dangling nodes (vertices with no outgoing edges) is collected and redistributed evenly among all nodes: `DanglingContribution = resetProb * (TotalDanglingRank / N)` - The SQL implementation combines these steps using Common Table Expressions (CTEs): - `EdgeContributions`: Calculates the `Sum( Rank(u) / OutDegree(u) )` part for each destination vertex. - `DanglingRank`: Calculates the total rank of all dangling nodes. - The final SELECT combines the base probability `(1 - resetProb) / N`, the edge contributions, and the redistributed dangling rank contribution. - Intermediate 'previous' and 'current' rank tables are swapped/managed between iterations. 4. **Final Result:** - After the final iteration, the computed ranks (from the final 'current ranks' table) are joined with the original input vertices table (`inputVerticesTable`). - The result, containing all original vertex columns plus the new `pagerank` column, is stored in the specified `resultVerticesTable`. 5. **Indexing:** Creates indexes on the `resultVerticesTable` as specified by `resultVerticesIndexes`. 6. **Cleanup:** Ensures that all temporary intermediate tables created during the process are dropped, both on successful completion and in case of exceptions during execution.## Result Table Columns: The created `resultVerticesTable` in `resultSchema` will contain:
- All columns present in the original `inputVerticesTable`.
- A new column named `pagerank` of type `DOUBLE PRECISION`, containing the calculated static PageRank score for each vertex.
- Parameters:
inputSchema- The database schema containing the input graph tables. Cannot be null or empty.inputVerticesTable- The name of the table containing vertex information. Must contain at least an 'id' column representing the unique vertex identifier (assumed to be compatible with edge table IDs). Cannot be null or empty.inputEdgesTable- The name of the table containing the directed edges of the graph. Must contain at least 'srcid' (source vertex ID) and 'destid' (destination vertex ID) columns. Cannot be null or empty.resultSchema- The database schema where the result table will be created. Cannot be null or empty.resultVerticesTable- The name for the new table that will store the results. This table will contain all columns from `inputVerticesTable` plus a 'pagerank' column. Cannot be null or empty, and must not conflict with input tables.numIterations- The number of PageRank iterations to perform. Must be a positive integer.resetProb- The reset probability, also known as the damping factor (often denoted as 'alpha' or 'd'). Represents the probability at each step of continuing to follow graph links (as opposed to teleporting to a random vertex). Must be between 0.0 and 1.0 (inclusive). A typical value is 0.85.resultVerticesIndexes- An optional list of column names from the `resultVerticesTable` on which to create database indexes after the table is populated. Pass an empty list if no indexes are needed. Cannot be null.personalizationSrcId- If not null, computes Personalized PageRank biased towards this vertex ID. If null, computes standard (non-personalized) PageRank where the teleportation is uniform.stmt- The JDBCStatementobject used to execute SQL commands against the database. Cannot be null.- Throws:
SQLException- If any input parameters are invalid, if a database error occurs during execution (e.g., table creation/access failure, SQL syntax error), or if an internal error occurs (like an intermediate table unexpectedly disappearing).
-
dynamicPageRank
public static void dynamicPageRank(String inputSchema, String inputVerticesTable, String inputEdgesTable, String resultSchema, String resultVerticesTable, double tolerance, double resetProb, ArrayList<String> resultVerticesIndexes, Long personalizationSrcId, Statement stmt) throws SQLException Computes PageRank scores for vertices in a graph, optionally personalized, by iterating until the scores converge within a specified tolerance. Results are stored in a new vertices table. Intermediate tables are cleaned up.## Algorithm Overview: 1. **Validation:** Checks that all input parameters (schemas, tables, statement, numerical constraints) are valid. Throws
SQLExceptionon validation failure. 2. **Initialization:** - Creates uniquely named temporary tables for storing ranks during iterations. - Calculates the total number of vertices (N). - If the graph is empty (N=0), creates an empty result table with the correct schema and returns. - Initializes the rank for each vertex: - **Standard PageRank:** Each vertex starts with a rank of `1.0 / N`. - **Personalized PageRank:** The `personalizationSrcId` vertex starts with a rank of `1.0`, and all other vertices start with `0.0`. 3. **Iteration until Convergence:** Performs iterations of the PageRank update rule. - Unlike static PageRank which runs for a fixed number of iterations[2], this method iterates until the PageRank scores stabilize. - **Convergence Check:** After each iteration, it calculates the sum of the absolute differences (`delta`) between the rank of each vertex in the current iteration and the previous iteration. The loop continues as long as `delta` is greater than the specified `tolerance`[6]. - **Update Rule:** In each iteration, new ranks are computed based on the ranks from the previous iteration using the standard PageRank formula adjusted for personalization and dangling nodes: `NewRank(v) = BaseProbability + resetProb * Sum( Rank(u) / OutDegree(u) )` (for neighbors u linking to v) `+ RedistributedDanglingRank` - The `BaseProbability` and contributions are slightly adjusted in the SQL depending on whether personalization is active to correctly handle the reset probability distribution. - **Dangling Node Handling:** Rank from dangling nodes (vertices with no outgoing edges) is collected and redistributed evenly among all nodes[6]. - Intermediate 'previous' and 'current' rank tables are swapped/managed between iterations. 4. **Final Result:** - Once convergence is reached (`delta invalid input: '<'= tolerance`), the final computed ranks are joined with the original input vertices table (`inputVerticesTable`). - The result, containing all original vertex columns plus the new `pagerank` column, is stored in the specified `resultVerticesTable`. 5. **Indexing:** Creates indexes on the `resultVerticesTable` as specified by `resultVerticesIndexes`. 6. **Cleanup:** Ensures that all temporary intermediate tables created during the process are dropped, both on successful completion and in case of exceptions.## Clarification on "Dynamic": Note: While sometimes referred to as 'dynamic', this implementation calculates PageRank until convergence on the graph state provided at execution time[2][6]. It does not perform incremental updates based on real-time graph changes (like some 'online' or 'streaming' PageRank algorithms designed for constantly evolving graphs[3][4]). The dynamic aspect here refers to the convergence-based termination condition.
## Result Table Columns: The created `resultVerticesTable` in `resultSchema` will contain:
- All columns present in the original `inputVerticesTable`.
- A new column named `pagerank` of type `DOUBLE PRECISION`, containing the calculated PageRank score for each vertex after convergence.
- Parameters:
inputSchema- The database schema containing the input graph tables. Cannot be null or empty.inputVerticesTable- The name of the table containing vertex information. Must contain at least an 'id' column. Cannot be null or empty.inputEdgesTable- The name of the table containing the directed edges. Must contain 'srcid' and 'destid' columns. Cannot be null or empty.resultSchema- The database schema where the result table will be created. Cannot be null or empty.resultVerticesTable- The name for the new table storing results (original vertex columns + 'pagerank'). Cannot be null or empty, and must not conflict with input tables.tolerance- The convergence tolerance. Iterations stop when the sum of absolute differences between vertex ranks in consecutive iterations is less than or equal to this value. Must be non-negative. A smaller value leads to higher accuracy but potentially more iterations[6].resetProb- The reset probability (damping factor alpha), typically 0.85[2][6]. Probability of following a link vs. teleporting. Must be between 0.0 and 1.0.resultVerticesIndexes- An optional list of column names from `resultVerticesTable` for index creation. Pass an empty list if no indexes are needed. Cannot be null.personalizationSrcId- If not null, computes Personalized PageRank biased towards this vertex ID[2]. If null, computes standard PageRank.stmt- The JDBCStatementobject for executing SQL commands. Cannot be null.- Throws:
SQLException- If any input parameters are invalid, if a database error occurs (SQL execution, table access), or if an internal error occurs (e.g., intermediate table issues, failure to calculate delta).
-