Specification: Ballerina SQL Library

Owners: @daneshk @niveathika
Reviewers: @daneshk
Created: 2022/01/13
Updated: 2023/03/07
Edition: Swan Lake

Introduction

This is the specification for the SQL standard library of Ballerina language, which provides the generic interface and functionality to interact with a SQL database.

The SQL library specification has evolved and may continue to evolve in the future. The released versions of the specification can be found under the relevant GitHub tag.

If you have any feedback or suggestions about the library, start a discussion via a GitHub issue or in the Discord server. Based on the outcome of the discussion, the specification and implementation can be updated. Community feedback is always welcome. Any accepted proposal, which affects the specification is stored under /docs/proposals. Proposals under discussion can be found with the label type/proposal in GitHub.

The conforming implementation of the specification is released and included in the distribution. Any deviation from the specification is considered a bug.

Overview
Client
2.1. Handle connection pools
2.2. Closing the Client
Queries and Values
3.1. ParameterizedQuery and Values
3.2. ParameterizedCallQuery and Parameters
3.3. Query concatenation
Database operations
4.1. Query
4.2. Query row
4.3. Execute
4.4. Batch execute
4.5. Call
Metadata operations
5.1. Schema client
5.2. Metadata types
5.3. Metadata methods
Errors

1. Overview

This specification elaborates on the generic Client interface used in ballerina SQL connectors to interface with a relational database such as MySQL, MSSQL, Postgresql and OracleDB.

Client supports five database operations as follows,

Executes the query, which may return multiple results.
Executes the query, which is expected to return at most one row of the result.
Executes the SQL query. Only the metadata of the execution is returned.
Executes the SQL query with multiple sets of parameters in a batch. Only the metadata of the execution is returned.
Executes a SQL query, which calls a stored procedure. This can either return results or nil.

All the above operations make use of ParameterizedQuery object, backtick surrounded string template to pass SQL statements to the database. ParameterizedQuery supports passing of Ballerina basic types or Typed SQL Values such as CharValue, BigIntValue, etc. to indicate parameter types in SQL statements.

2. Client

Each client represents a pool of connections to the database. The pool of connections is maintained throughout the lifetime of the client.

2.1. Handle connection pools

Configuration available for tweaking the connection pool properties:

# The properties, which are used to configure a DB connection pool.
# Default values of the fields can be set through the configuration API.
#
# + maxOpenConnections - The maximum number of open connections that the pool is allowed to have.
#                        Includes both idle and in-use connections. The default value is 15. This can be changed through
#                        the configuration API with the `ballerina.sql.maxOpenConnections` key
# + maxConnectionLifeTime - The maximum lifetime (in seconds) of a connection in the pool. The default value is 1800
#                           seconds (30 minutes). This can be changed through the configuration API with the
#                           `ballerina.sql.maxConnectionLifeTime` key. A value of 0 indicates an unlimited maximum
#                           lifetime (infinite lifetime)
# + minIdleConnections - The minimum number of idle connections that the pool tries to maintain. The default
#                        value is the same as `maxOpenConnections` and it can be changed through the configuration
#                        API with the `ballerina.sql.minIdleConnections` key
public type ConnectionPool record {|
    int maxOpenConnections = maxOpenConnections;
    decimal maxConnectionLifeTime = maxConnectionLifeTime;
    int minIdleConnections = minIdleConnections;
|};

There are three possible scenarios for connection pool handling,

Global, shareable, default connection pool

If the poolOptions field is NOT provided when creating the database client, a globally-shareable pool will be created for the database connection unless a connection pool matching with the properties provided already exists. The JDBC module sample below shows how the global connection pool is used.
```
jdbc:Client|sql:Error dbClient = new ("jdbc:mysql://localhost:3306/testdb", "root", "root");
```

Client-owned, un-sharable connection pool

If the connectionPool field is defined inline when creating the database client with the sql:ConnectionPool type, an un-sharable connection pool will be created. The JDBC module sample below shows how the global connection pool is used.

jdbc:Client|sql:Error dbClient =  new (url = "jdbc:mysql://localhost:3306/testdb",
                                            connectionPool = { maxOpenConnections: 5 });

Local, shareable connection pool

If a record of the sql:ConnectionPool type is created and reused in the configuration of multiple clients, for each set of clients that connects to the same database instance with the same set of properties, a shared connection pool will be created. The JDBC module sample below shows how the global connection pool is used.

sql:ConnectionPool connPool = { maxOpenConnections: 5 };

jdbc:Client|sql:Error dbClient1 = new (url = "jdbc:mysql://localhost:3306/testdb", connectionPool = connPool);
jdbc:Client|sql:Error dbClient2 = new (url = "jdbc:mysql://localhost:3306/testdb", connectionPool = connPool);
jdbc:Client|sql:Error dbClient3 = new (url = "jdbc:mysql://localhost:3306/testdb", connectionPool = connPool);

2.2. Close the client

Once all the database operations are performed, the client can be closed by invoking the close() operation. This will close the corresponding connection pool if it is not shared by any other database clients.

# Closes the SQL client and shuts down the connection pool.
#
# + return - Possible error when closing the client
public isolated function close() returns Error?;

3. Queries and values

3.1. `ParameterizedQuery` and values

The sql:ParameterizedQuery is used to construct the SQL query to be executed by the client. It is backtick string template which allows dynamic values for query parameters.

Query with constant values

sql:ParameterizedQuery query = `SELECT * FROM students WHERE id < 10 AND age > 12`;

Query with dynamic values

int[] ids = [10, 50];
int age = 12;
sql:ParameterizedQuery query = `SELECT * FROM students WHERE id < ${ids[0]} AND age > ${age}`;

All primitive Ballerina types are supported for parameter values in sql:ParameterizedQuery. Following mapping is used to map ballerina types to SQL types,

Ballerina type	SQL type
String	VARCHAR
int	INTEGER
boolean	BOOLEAN
float	REAL
decimal	DECIMAL
byte[]	BINARY
xml	SQLXML
record {}	STRUCT
Array	ARRAY

In addition to the above time module constructs are used to represent SQL Time data types.

`time` constructs	SQL type
time:Date	DATE
time:TimeOfDay	TIME
time:Civil	DATETIME
time:Utc	TIMESTAMP

Furthermore, typed values are used to map values to a specific SQL data type such as BIGINT,

BigIntValue value = new(1000000000000);
sql:ParameterizedQuery query = `SELECT * FROM students WHERE id < ${value}`;

List of typed values:

VarcharValue
NVarcharValue
CharValue
NCharValue
TextValue
ClobValue
NClobValue
SmallIntValue
IntegerValue
BigIntValue
NumericValue
DecimalValue
RealValue
FloatValue
DoubleValue
BitValue
BooleanValue
BinaryValue
VarBinaryValue
BlobValue
DateValue
TimeValue
DateTimeValue
TimestampValue
RefValue
StructValue
RowValue

3.2. `ParameterizedCallQuery` and parameters

The sql:ParameterizedCallQuery is used to construct the SQL CALL Statement to be executed by the client. It is backtick string template which allows dynamic values for query parameters. In addition to Values supported by sql:ParameterizedQuery, sql:ParameterizedCallQuery supports following SQL Parameters,

InOutParameter
Typed OutParameters

These types can be used to retrieve values from SQL stored procedures using the get() method.

# Parses returned Char SQL value to a ballerina value.
#
# + typeDesc - The `typedesc` of the type to which the result needs to be returned
# + return - The result in the `typeDesc` type, or an `sql:Error`
public isolated function get(typedesc<anydata> typeDesc = <>) returns typeDesc|Error;

InOutParameter parameter = new ("varchar value");

// Execute the DB call method

string procedureParam = parameter.get();

Type of the returned value is inferred from LHS of the expression.

In addition to the above parameters, it has CursorOutParameter to retrieve the result set from the SQL stored procedure.

# Parses returned SQL result set values to a ballerina stream value.
#
# + typeDesc - The `typedesc` of the record to which the result needs to be returned
# + return - Stream of records in the `rowType` type
public isolated function get(typedesc<record {}> typeDesc = <>) returns stream<record {}, Error?>;

CursorOutParameter cursor = new;

// Execute the DB call method

stream<record{}, sql:Error?> resultStream = cursor.get();

Note: In the case of a stored procedure query that returns a result set along with the mentioned parameters, the get() method of these parameters should only be invoked after consuming the result set. Otherwise, consuming the result set fails with a 'Error when iterating the SQL result. The result set is closed.' error.

3.3. Query concatenation

sql:ParameterizedQuery can be concatenated using util methods such as sql:queryConcat() and sql:arrayFlattenQuery() which makes it easier to create a dynamic/constant complex query.

The sql:queryConcat() is used to create a parameterized query by concatenating a set of parameterized queries.

int id = 10;
int age = 12;
sql:ParameterizedQuery query = `SELECT * FROM students`;
sql:ParameterizedQuery query1 = ` WHERE id < ${id} AND age > ${age}`;
sql:ParameterizedQuery sqlQuery = sql:queryConcat(query, query1);

The query with the IN operator can be created using the sql:ParameterizedQuery(). Here the values in the array are flattened with a comma separator to add it in the query.

int[] ids = [1, 2, 3];
sql:ParameterizedQuery query = `SELECT count(*) as total FROM DataTable 
                                WHERE row_id in (${ids[0]}, ${ids[1]}, ${ids[2]})`;

The util function sql:arrayFlattenQuery() can be used to make the array flatten easier. It makes the inclusion of varying array elements into the query easier by flattening the array to return a parameterized query.

int[] ids = [1, 2];
sql:ParameterizedQuery sqlQuery = sql:queryConcat(`SELECT * FROM DataTable WHERE id IN (`, 
                                          sql:arrayFlattenQuery(ids), `)`);

4. Database operations

The client supports five remote methods, each for one Database operation

4.1. Query

query() remote method execute the SQL query and returns multiple results.

# Executes the query, which may return multiple results.
#
# + sqlQuery - The SQL query
# + rowType - The `typedesc` of the record to which the result needs to be returned
# + return - Stream of records in the `rowType` type
remote isolated function query(ParameterizedQuery sqlQuery, typedesc<record {}> rowType = <>) 
                        returns stream <rowType, Error?>;

Here the returned stream can consist of following types of records,

Open record
The property name in the open record type will be the same as how the column is defined in the database.

sql:ParameterizedQuery query = `SELECT * FROM students WHERE id < ${id} AND age > ${age}`;
stream<record {}, sql:Error?> resultStream = dbClient->query(query);

Typed record
A ballerina record type is created to represent the returned result set. the SELECT query is executed via the query remote method of the client. Once the query is executed, each data record can be retrieved by looping the result set. The stream returned by the select operation holds a pointer to the actual data in the database, and it loads data from the table only when it is accessed. This stream can be iterated only once.

This record can be defined as an open or a closed record according to the requirement. If an open record is defined, the returned stream type will include both defined fields in the record and additional database columns fetched by the SQL query that are not defined in the record. Additional column names added to the returned record as in the SQL query. If the record is defined as a close record, only defined fields in the record are returned or gives an error when additional columns present in the SQL query.

Note the mapping of the database column to the returned record's property is case-insensitive if it is defined in the record(i.e., the ID column in the result can be mapped to the id property in the record).

type Student record {
    int id;
    int age;
    string name;
};

int id = 10;
int age = 12;
sql:ParameterizedQuery query = `SELECT * FROM students WHERE id < ${id} AND age > ${age}`;
stream<Student, sql:Error?> resultStream = dbClient->query(query);

sql:Column annotation can be used to map database columns to Typed record fields of different name. This annotation should be attached to record fields.

type Student record {
    int id;
    @sql:Column { name: "first_name" }
    string firstName;
    @sql:Column { name: "last_name" }
    string lastName
};

The above annotation will map the database column first_name to the Ballerina record field firstName. If the query() function does not return first_name column, the field will not be populated.

Multiple table columns can be matched to a single Ballerina record within a returned record. For instance if the query returns data from multiple tables as follows, schema Both TEACHERS.id and TEACHERS.name can be grouped to another Typed record such as Teacher type.

public type Students record {|
    int id;
    string name;
    string? age;
    float? gpa;
    Teachers teachers;
|}
type Teachers record {|
    int id;
    string name;
|}

In the above scenario also, sql:Column annotation can be used to rename field name such as,

public type Students record {|
    int id;
    string name;
    string? age;
    float? gpa;
    @sql:Column { name: "teachers" }
    Teacher teacher;
|}

The returned stream needs to be closed properly to release resources. The stream is automatically closed if either it is iterated fully or consists of an error. If result is accessed one by one using next() method, it should be closed after the required results are accessed.

# Releases the associated resources such as the database connection, results, etc.
#
# + return - An `sql:Error` if any error occurred while cleanup
public isolated function close() returns Error?;

Usage:

check resultStream.close();

4.2. Query row

queryRow() remote method executes the SQL query return at most one row of the result.

# Executes the query, which is expected to return at most one row of the result.
# If the query does not return any results, `sql:NoRowsError` is returned.
#
# + sqlQuery - The SQL query
# + returnType - The `typedesc` of the record to which the result needs to be returned.
#                It can be a basic type if the query result contains only one column
# + return - Result in the `returnType` type or an `sql:Error`
remote isolated function queryRow(ParameterizedQuery sqlQuery, typedesc<anydata> returnType = <>)
                                            returns returnType|Error;

The provided return type(inferred from LHS) can be of 2 types,

Ballerina record Returns only the first row retrieved by the query as a record.

int id = 10;
sql:ParameterizedQuery query = `SELECT * FROM students WHERE id = ${id}`;
Student retrievedStudent = check dbClient->queryRow(query);

Ballerina primitive type Return the value of the first column of the first row retrieved by the query.

int age = 12;
sql:ParameterizedQuery query = `SELECT COUNT(*) FROM students WHERE age < ${age}`;
int youngStudents = check dbClient->queryRow(query);

sql:NoRowsError is returned if the query does not return at most of one result.

4.3. Execute

execute() remote method executes the SQL query return metadata of the execution.

# Executes the SQL query. Only the metadata of the execution is returned (not the results from the query).
#
# + sqlQuery - The SQL query
# + return - Metadata of the query execution as an `sql:ExecutionResult` or an `sql:Error`
remote isolated function execute(ParameterizedQuery sqlQuery) returns ExecutionResult|Error;

The metadata is returned as sql:ExecutionResult record,

# Metadata of the query execution.
#
# + affectedRowCount - Number of rows affected by the execution of the query. It may be one of the following,  
#                      (1) A number greater than or equal to zero, the count of affected rows after the successful 
#                          execution of the query  
#                      (2) A value of the `SUCCESS_NO_INFO`, the count of affected rows is unknown after the successful 
#                          execution of the query  
#                      (3) A value of the `EXECUTION_FAILED`, the query execution failed
# + lastInsertId - The ID generated by the database in response to a query execution. This can be `()` in case 
#                  the database does not support this feature
public type ExecutionResult record {
    int? affectedRowCount;
    string|int? lastInsertId;
};

This sample demonstrates modifying data by executing an UPDATE statement via the execute remote method of the client.

int age = 23;
sql:ParameterizedQuery query = `UPDATE students SET name = 'John' WHERE age = ${age}`;
sql:ExecutionResult result = check dbClient->execute(query);

4.4. Batch execute

batchExecute() remote method executes the SQL query with multiple sets of parameters in a batch. To optimize memory utilisation, the inputs would be executed in batch sizes of 1000.

# Executes the SQL query with multiple sets of parameters in a batch. Only the metadata of the execution is returned
# (not the results from the query).
# If one of the commands in the batch fails, an `sql:BatchExecuteError` will be returned. However, the driver may
# or may not continue to process the remaining commands in the batch after a failure.
#
# + sqlQueries - The SQL query with multiple sets of parameters
# + return - Metadata of the query execution as an `sql:ExecutionResult[]` or an `sql:Error`
remote isolated function batchExecute(ParameterizedQuery[] sqlQueries) returns ExecutionResult[]|Error;

This sample demonstrates how to insert multiple records with a single INSERT statement that is executed via the batchExecute remote method of the client. This is done by creating a table with multiple records and parameterized SQL query as same as the above execute operations.

var data = [
  { name: "John", age: 25  },
  { name: "Peter", age: 24 },
  { name: "jane", age: 22 }
];

sql:ParameterizedQuery[] batch = from var row in data
                                 select `INSERT INTO students ('name', 'age')
                                           VALUES (${row.name}, ${row.age})`;
sql:ExecutionResult[] result = check dbClient->batchExecute(batch);

4.5. Call

call() remote method executes a SQL query, which calls a stored procedure. This can either return results or nil.

# Executes an SQL query, which calls a stored procedure. This may or may not return results.
#
# + sqlQuery - The SQL query
# + rowTypes - `typedesc` array of the records to which the results need to be returned
# + return - Summary of the execution and results are returned in an `sql:ProcedureCallResult`, or an `sql:Error`
remote isolated function call(ParameterizedCallQuery sqlQuery, typedesc<record {}>[] rowTypes = [])
returns ProcedureCallResult|Error;

The returned results are returned as of type sql:ProcedureCallResult,

# Represents the results from the `call` method holding the returned results or metadata of the query execution.
#
# + executionResult - Summary of the query execution
# + queryResult - Results from the SQL query
# + customResultIterator - Any custom result iterator to be used to override the default behaviour
public class ProcedureCallResult {
    public ExecutionResult? executionResult = ();
    public stream<record {}, Error?>? queryResult = ();
    public CustomResultIterator? customResultIterator;

    public isolated function init(CustomResultIterator? customResultIterator = ());

    # Updates `executionResult` or `queryResult` field with the succeeding result in the result list. 
    # This will also close the current result when called.
    #
    # + return - True if the next result is `queryResult`
    public isolated function getNextQueryResult() returns boolean|Error;

    # Releases the associated resources such as the database connection, results, etc.
    #
    # + return - An `sql:Error` if any error occurred while cleanup
    public isolated function close();
}

Here getNextQueryResult() can be used to update the pointer to iterate through the returned result sets.

This sample demonstrates how to execute a stored procedure with a single INSERT statement that is executed via the call remote method of the client.

int uid = 10;
sql:IntegerOutParameter insertId = new;

sql:ProcedureCallResult result = 
                         check dbClient->call(`call InsertPerson(${uid}, ${insertId})`);
stream<record{}, sql:Error?>? resultStr = result.queryResult;
if resultStr is stream<record{}, sql:Error?> {
    check from record{} value in resultStr
        do {
          // Can perform operations using the record 'result'.
        };
}
check result.close();

The close operation needs to be explicitly invoked on the sql:ProcedureCallResult to release the connection resources and avoid a connection leak as shown above.

5. Metadata operations

5.1. Schema client

The SchemaClient is used to query the database to retrieve the relevant metadata. On initialization, the user would have to provide credentials to access the relevant metadata tables. It is also required to provide the name of the database regarding which the metadata should be retrieved.

isolated client class MockSchemaClient {
    *SchemaClient;

    public function init(string host, int port, string user, string password, string database) returns Error? {
        self.database = database;
        self.dbClient = check new(url, user, password);
    }
}

This client will be extended by each SQL connector to customize the implementation as necessary.

5.2. Metadata types

New record-types would be introduced to represent the metadata which may be retrieved.

TableDefinition
ColumnDefinition
CheckConstraintDefinition
ReferentialConstraintDefinition
RoutineDefinition
ParameterDefinition

These record types contain only fields which are common to relational databases. However, they may be inherited and then extended by each SQL connector to provide support for additional database-specific fields.

Table definition

# Represents a table in the database.
#
# + name - The name of the table
# + 'type - Whether the table is a base table or a view
# + columns - The columns included in the table
public type TableDefinition record {
    string name;
    TableType 'type;
    ColumnDefinition[] columns?;
};

The columns field is optional as there are use cases where the user would not want to retrieve information regarding the columns of a table.

The TableType type is an enum, which can take one of two values:

BASE_TABLE
VIEW

Column definition

# Represents a column in a table.
#
# + name - The name of the column  
# + 'type - The SQL data-type associated with the column
# + defaultValue - The default value of the column  
# + nullable - Whether the column is nullable  
# + referentialConstraints - Referential constraints (foreign key relationships) associated with the column
# + checkConstraints - Check constraints associated with the column
public type ColumnDefinition record {
    string name;
    string 'type;
    anydata? defaultValue;
    boolean nullable;
    ReferentialConstraint[] referentialConstraints?;
    CheckConstraint[] checkConstraints?;
};

The referentialConstraints and checkConstraints fields are optional as there may be use cases where the user does not want to retrieve this information.

Referential constraint

# Represents a referential constraint (foriegn key constraint).
# 
# + name - The name of the constraint
# + tableName - The name of the table which contains the referenced column
# + columnName - The name of the referenced column
# + updateRule - The action taken when an update statement violates the constraint
# + deleteRule - The action taken when a delete statement violates the constraint
public type ReferentialConstraint record {
    string name;
    string tableName;
    string columnName;
    ReferentialRule updateRule;
    ReferentialRule deleteRule; 
};

The ReferentialRule type is an enum with four possible values:

NO_ACTION
RESTRICT
CASCADE
SET_NULL
SET_DEFAULT

Check constraint

# Represents a check constraint.
# 
# + name - The name of the constraint
# + clause - The actual text of the SQL definition statement
public type CheckConstraint record {
    string name;
    string clause;
};

Routine definition

# Represents a routine.
# 
# + name - The name of the routine
# + 'type - The type of the routine (procedure or function)
# + returnType - If the routine returns a value, the return data-type. Else ()
# + parameters - The parameters associated with the routine
public type RoutineDefinition record {
    string name;
    RoutineType 'type;
    string? returnType;
    ParameterDefinition[] parameters;
};

The RoutineType type is an enum which can take one of two values

PROCEDURE
FUNCTION

Parameter definition

# Represents a routine parameter.
# 
# + mode - The mode of the parameter (IN, OUT, INOUT)
# + name - The name of the parameter
# + type - The data-type of the parameter
public type ParameterDefinition record {
    ParameterMode mode;
    string name;
    string type;
};

The ParameterMode type is an enum which can take one of three values

IN
OUT
INOUT

5.3. Metadata methods

The SchemaClient contains six methods which may be used to retrieve metadata.

listTables()
getTableInfo()
listRoutines()
getRoutineInfo()

All of these methods will be implemented by each SQL connector.

Retrieve all tables

remote isolated function listTables() returns string[]|Error;

This would fetch the names of all the tables in the database.

Retrieve a single table

remote isolated function getTableInfo(string tableName, ColumnRetrievalOptions include = COLUMNS_ONLY) returns TableDefinition|Error;

public enum ColumnRetrievalOptions {
    NO_COLUMNS,
    COLUMNS_ONLY,
    COLUMNS_WITH_CONSTRAINTS
}

This would fetch all relevant information from a given table from the database. Based on the option provided for the include parameter, relevant column information would also be retrieved.

Retrieve all routines

remote isolated function listRoutines() returns string[]|Error;

This would fetch the names of all the routines created in the database.

Retrieve a single routine

remote isolated function getRoutineInfo(string name) returns RoutineDefinition|Error;

This would fetch all relevant information regarding the provided routine (including the parameter data).

6. Errors

sql package consists of following Errors,

.
└── Error                            # Generic error type for the `sql` module. 
    ├── DatabaseError                # Error caused by an issue related to database accessibility, erroneous queries, etc
    ├── BatchExecuteError            # Error that occurs during the execution of batch queries.
    ├── NoRowsError                  # Error when a query retrieves does not retrieve any rows when at least one row is expected.
    └── ApplicationError             # Error originating from application-level configurations.
        └── DataError                # Error during the processing of the parameters or returned results.
            ├── TypeMismatchError    # Error when a query retrieves a result that differs from the supported result type.
            ├── ConversionError      # Error when result is corrupted and cannot be casted to expected result type.
            ├── FieldMismatchError   # Error when a query retrieves a result that cannot be mapped to the expected record type.
            └── UnsupportedTypeError # Error that occurs when an unsupported parameter type is added to the query.

All errors include detailed error message.

Include error details along with this sql:DatabaseError as follows.

# Represents the properties belonging to an `sql:DatabaseError`.
#
# + errorCode - SQL error code
# + sqlState - SQL state
public type DatabaseErrorDetail record {
    int errorCode;
    string? sqlState;
};

sql:BatchExecuteError includes detail of the metadata of the executed queries before the error occurred.

# Represents the properties belonging to an `sql:BatchExecuteError`.
#
# + errorCode - SQL error code
# + sqlState - SQL state
# + executionResults - Metadata of the query executions
public type BatchExecuteErrorDetail record {
    int errorCode;
    string? sqlState;
    ExecutionResult[] executionResults;
};