This guide walks you through using query expressions (a.k.a. language integrated queries) on data to filter, sort, and join with different data sets to produce new data.
Ballerina has first-class support for writing SQL-like queries to process data. Language-integrated queries can process any Ballerina iterable.
Set up the prerequisites
To run this guide, you need the following prerequisites:
- Ballerina 2201.0.0 (Swan Lake) or greater
- A text editor
Tip: Preferably, Visual Studio Code with the Ballerina extension installed.
- A command terminal
Create the Ballerina package
Ballerina uses packages to group code. You need to create a Ballerina package and write the business logic in it. In the terminal, execute the command below to create the Ballerina package for the implementation.
Info: For more information on Ballerina packages, see Organize Ballerina code.
$ bal new query_expressions
You view the output below.
Created new package 'query_expressions' at query_expressions.
This creates a directory named query_expressions
with the default module along with a sample code for the service as shown below.
. ├── Ballerina.toml └── main.bal 0 directories, 2 files
Create the dataset
To keep things simple, an in-memory table is used to store the COVID-19 dataset. To create the dataset, replace the main.bal
file with the code below.
public type CovidEntry record {| readonly string iso_code; string country; decimal cases; decimal deaths; decimal recovered; decimal active; |}; public final table<CovidEntry> key(iso_code) covidTable = table [ {iso_code: "AFG", country: "Afghanistan", cases: 159303, deaths: 7386, recovered: 146084, active: 5833}, {iso_code: "SL", country: "Sri Lanka", cases: 598536, deaths: 5243, recovered: 568637, active: 14656}, {iso_code: "US", country: "USA", cases: 69808350, deaths: 880976, recovered: 43892277, active: 25035097}, {iso_code: "IND", country: "India", cases: 80808350, deaths: 980976, recovered: 33892279, active: 35035095} ];
In this code:
- Each record of type
CovidEntry
in the table represents the COVID-19 data related to a particular country. - The
iso_code
is used to uniquely identify a country and other fields are self-explanatory.
Filter the data
Create the filterCountriesByCases
function
To define a function, which will filter out the records, which have values higher than 100,000 for the cases
field, add the code below to the main.bal
file.
public function filterCountriesByCases(table<CovidEntry> dataTable, decimal noOfCases) returns string[] { string[] filteredCountries = from CovidEntry entry in dataTable where entry.cases > noOfCases select entry.country; return filteredCountries; }
In this code:
- The
filterCountriesByCases
function uses a query expression to iterate the records in thedataTable
table and filter only the records, which have more thannoOfCases
cases. Awhere
clause with a condition is used to filter. - All records, which satisfy the condition in the
where
clause will be selected.
Create the main
function
To call the filterCountriesByCases
function from inside the main
function, add the code below to the main.bal
file.
public function main() { string[] countries = filterCountriesByCases(covidTable, 10000000); io:println("Countries with more than 10 million cases: ", countries); }
In this code,
- The
filterCountriesByCases
function is called andcovidTable
and10000000
are provided as parameters so that the function will filter the countries, which have more than 10000000 COVID-19 cases. - The next line prints the result of the function.
The complete code of the filtering function
Below is the complete code after adding the filtering function.
public function filterCountriesByCases(table<CovidEntry> dataTable, decimal noOfCases) returns string[] { string[] filteredCountries = from CovidEntry entry in dataTable where entry.cases > noOfCases select entry.country; return filteredCountries; } public function main() { string[] countries = filterCountriesByCases(covidTable, 10000000); io:println("Countries with more than 10 million cases: ", countries); }
Run the package for filtering
In the terminal, navigate to the query_expressions
directory, and execute the command below to run the service package.
$ bal run
Info: The console should have warning logs related to the isolatedness of resources. It is a built-in service concurrency safety feature of Ballerina.
You view the output below.
Compiling source .../query_expressions:0.1.0 Running executable Countries with more than 10 million cases: ["USA","India"]
Sort countries by COVID-19 deaths
Create the findCountriesByHighestNoOfDeaths
function
To define a new function to find the top three countries with the highest number of COVID-19 deaths, add the code below to the main.bal
file.
public function findCountriesByHighestNoOfDeaths(table<CovidEntry> dataTable, int n) returns [string, decimal][] { [string, decimal][] countriesWithDeaths = from CovidEntry entry in dataTable order by entry.deaths descending limit n select [entry.country, entry.deaths]; return countriesWithDeaths; }
In this code,
- You use another query to sort and retrieve a limited number of records from the table.
- The
findCountriesByHighestNoOfDeaths
function uses a query expression to find the top three countries with the highest COVID-19 deaths. - The
order by
clause is used to sort the records in the table indescending
order and thelimit
clause to limit the number of output records of the query ton
. - The query produces an array of tuples of type
[string, decimal]
as the result. Each tuple contains the country name and the number of reported deaths. - The produced array is in descending order by the number of deaths.
Update the main
function for sorting
To call the findCountriesByHighestNoOfDeaths
function from within the main
function to find the top three countries by the number of deaths, add the code below to the main
function of the main.bal
file.
[string, decimal][] countriesWithDeaths = findCountriesByHighestNoOfDeaths(covidTable, 3); io:println("Countries with highest deaths:", countriesWithDeaths);
The complete code with the sorting function
Below is the complete code after adding the sorting function.
public function filterCountriesByCases(table<CovidEntry> dataTable, decimal noOfCases) returns string[] { string[] filteredCountries = from CovidEntry entry in dataTable where entry.cases > noOfCases select entry.country; return filteredCountries; } public function findCountriesByHighestNoOfDeaths(table<CovidEntry> dataTable, int n) returns [string, decimal][] { [string, decimal][] countriesWithDeaths = from CovidEntry entry in dataTable order by entry.deaths descending limit n select [entry.country, entry.deaths]; return countriesWithDeaths; } public function main() { string[] countries = filterCountriesByCases(covidTable, 10000000); io:println("Countries with more than 10 million cases: ", countries); [string, decimal][] countriesWithDeaths = findCountriesByHighestNoOfDeaths(covidTable, 3); io:println("Countries with highest deaths:", countriesWithDeaths); }
Run the package for sorting
In the terminal, navigate to the query_expressions
directory, and execute the command below to run the service package.
$ bal run
You view the output below.
Compiling source .../query_expressions:0.1.0 Running executable Countries with more than 10 million cases: ["USA","India"] Countries with highest deaths:[["India",980976],["USA",880976],["Afghanistan",7386]]
Join with another data source
Create the findRecoveredPatientsOfCountries
function
Using query expressions, you can join two collections and produce a new collection. The join
operation is similar to the SQL join
operation. To join the covidTable
with a string array, which contains three countries, add the code below to the main.bal
file.
public function findRecoveredPatientsOfCountries(table<CovidEntry> dataTable, string[] countries) returns [string, decimal][] { [string, decimal][] countriesWithRecovered = from CovidEntry entry in dataTable join string country in countries on entry.country equals country select [entry.country, entry.recovered]; return countriesWithRecovered; }
In this code:
- The
findRecoveredPatientsOfCountries
function uses a query expression to join thedataTable
table with an array of strings namedcountries
. The join condition is provided after theon
keyword. - For every record in the
dataTable
, all the elements in thecountries
array will be joined. - The output array of tuples will have the country and the number of recovered patients only if the condition after the
on
keyword is satisfied for that particular pair of table records and element of the array being joined.
Update the main
function for joining
To call the findRecoveredPatientsOfCountries
function at the end to get the number of recovered patients, add the code below to the main
function of the main.bal
file.
Info: You will get the number of recovered patients in the USA, India, and Afghanistan.
string[] c = ["USA", "India", "Afghanistan"]; [string, decimal][] countriesWithRecovered = findRecoveredPatientsOfCountries(covidTable, c); io:println("Countries with number of Recovered patients:", countriesWithRecovered);
The complete code with the joining function
Below is the complete code after adding the joining function.
public function filterCountriesByCases(table<CovidEntry> dataTable, decimal noOfCases) returns string[] { string[] filteredCountries = from CovidEntry entry in dataTable where entry.cases > noOfCases select entry.country; return filteredCountries; } public function findCountriesByHighestNoOfDeaths(table<CovidEntry> dataTable, int n) returns [string, decimal][] { [string, decimal][] countriesWithDeaths = from CovidEntry entry in dataTable order by entry.deaths descending limit n select [entry.country, entry.deaths]; return countriesWithDeaths; } public function findRecoveredPatientsOfCountries(table<CovidEntry> dataTable, string[] countries) returns [string, decimal][] { [string, decimal][] countriesWithRecovered = from CovidEntry entry in dataTable join string country in countries on entry.country equals country select [entry.country, entry.recovered]; return countriesWithRecovered; } public function main() { string[] countries = filterCountriesByCases(covidTable, 10000000); io:println("Countries with more than 10 million cases: ", countries); [string, decimal][] countriesWithDeaths = findCountriesByHighestNoOfDeaths(covidTable, 3); io:println("Countries with highest deaths:", countriesWithDeaths); string[] c = ["USA", "India", "Afghanistan"]; [string, decimal][] countriesWithRecovered = findRecoveredPatientsOfCountries(covidTable, c); io:println("Countries with number of Recovered patients:", countriesWithRecovered); }
Run the package for joining
In the terminal, navigate to the query_expressions
directory, and execute the command below to run the service package.
$ bal run
You view the output below.
Compiling source .../query_expressions:0.1.0 Running executable Countries with more than 10 million cases: ["USA","India"] Countries with highest deaths:[["India",980976],["USA",880976],["Afghanistan",7386]] Countries with number of Recovered patients:[["Afghanistan",146084],["USA",43892277],["India",33892279]]
Find discrepancies in the dataset
Create the printErroneousData
function
This example shows how you can use the let
clause to maintain an intermediate state while iterating a collection using query expression and how to use that intermediate state for further processing. For example, in this dataset, the total number of reported cases should be equal to the sum of the number of deaths, recovered, and active. If they are not equal, an error should have occurred while the dataset is populated.
To define a function called printErroneousData
to find any erroneous records in the dataset, add the code below to the main.bal
file.
public function printErroneousData(table<CovidEntry> dataTable) { string[] countries = from CovidEntry entry in dataTable let decimal sum = entry.recovered + entry.deaths + entry.active where entry.cases != sum select entry.country; if countries.length() > 0 { io:println("Found erroneous entries for countries: ", countries); } }
In this code:
- If there is any record in which the number of reported
cases
is not equal to the sum ofrecovered
,active
, anddeaths
, this function will print it. - Here, you use the
sum
variable to hold the results of intermediate calculations in the query expression.
Note: Even though this particular example uses a separate variable to demonstrate the usage of the
let
clause, you can do the calculations inline from within thewhere
clause also.
Call the printErroneousData
function
To call the findRecoveredPatientsOfCountries
function at the end to get the number of recovered patients, add the code below to the main
function of the main.bal
file.
printErroneousData(covidTable);
The complete code
Below is the complete code after adding the finding function.
public function filterCountriesByCases(table<CovidEntry> dataTable, decimal noOfCases) returns string[] { string[] filteredCountries = from CovidEntry entry in dataTable where entry.cases > noOfCases select entry.country; return filteredCountries; } public function findCountriesByHighestNoOfDeaths(table<CovidEntry> dataTable, int n) returns [string, decimal][] { [string, decimal][] countriesWithDeaths = from CovidEntry entry in dataTable order by entry.deaths descending limit n select [entry.country, entry.deaths]; return countriesWithDeaths; } public function findRecoveredPatientsOfCountries(table<CovidEntry> dataTable, string[] countries) returns [string, decimal][] { [string, decimal][] countriesWithRecovered = from CovidEntry entry in dataTable join string country in countries on entry.country equals country select [entry.country, entry.recovered]; return countriesWithRecovered; } public function printErroneousData(table<CovidEntry> dataTable) { string[] countries = from CovidEntry entry in dataTable let decimal sum = entry.recovered + entry.deaths + entry.active where entry.cases != sum select entry.country; if countries.length() > 0 { io:println("Found erroneous entries for countries: ", countries); } } public function main() { string[] countries = filterCountriesByCases(covidTable, 10000000); io:println("Countries with more than 10 million cases: ", countries); [string, decimal][] countriesWithDeaths = findCountriesByHighestNoOfDeaths(covidTable, 3); io:println("Countries with highest deaths:", countriesWithDeaths); string[] c = ["USA", "India", "Afghanistan"]; [string, decimal][] countriesWithRecovered = findRecoveredPatientsOfCountries(covidTable, c); io:println("Countries with number of Recovered patients:", countriesWithRecovered); printErroneousData(covidTable); }
Run the package
In the terminal, navigate to the query_expressions
directory, and execute the command below to run the service package.
$ bal run
You view the output below.
Compiling source .../query_expressions:0.1.0 Running executable Countries with more than 10 million cases: ["USA","India"] Countries with highest deaths:[["India",980976],["USA",880976],["Afghanistan",7386]] Countries with number of Recovered patients:[["Afghanistan",146084],["USA",43892277],["India",33892279]] Found erroneous entries for countries: ["Sri Lanka","India"]