Specification: Ballerina Avro Library

Authors: @Nuvindu
Reviewers: @ThisaruGuruge
Created: 2024/04/04
Updated: 2024/04/04
Edition: Swan Lake

Introduction

The Ballerina Avro module is designed to provide an easy way to convert data to bytes according to an Avro schema and to convert serialized bytes to a specific Ballerina type.

The Avro library specification has evolved and may continue to evolve in the future. The released versions of the specification can be found under the relevant GitHub tag.

If you have any feedback or suggestions about the library, start a discussion via a GitHub issue or in the Discord server. Based on the outcome of the discussion, the specification and implementation can be updated. Community feedback is always welcome. Any accepted proposal that affects the specification is stored under /docs/proposals. Proposals under discussion can be found with the label type/proposal in GitHub.

The conforming implementation of the specification is released and included in the distribution. Any deviation from the specification is considered a bug.

Contents

  1. Overview
  2. Initialize the Avro instance
  3. Serialize data into bytes
  4. Deserialize bytes to a specific Ballerina type
  5. The avro:Error type

1. Overview

This specification elaborates on serializing data to byte[], as well as deserializing a byte[] to a specific Ballerina type.

The Avro module provides the following functionalities.

  1. Serialize data into bytes
  2. Deserialize bytes to a specific Ballerina type

2. Initialize the Avro instance

The avro:Schema instance must be initialized before any data can be serialized or deserialized.

2.1 The init method

The init method initializes the avro:Schema instance. It has a single parameter named schema, which accepts an Avro schema as a string. The method returns an avro:Error if initialization fails.

avro:Schema schema = check new ("avro-schema-string");
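As a minimal sketch of the init call with a concrete schema, the following program builds an avro:Schema from a record schema written as a JSON string. The schema contents (the Student record, its namespace, and its fields) are illustrative assumptions and are not defined by this specification.

```ballerina
import ballerina/avro;
import ballerina/io;

public function main() returns error? {
    // Illustrative Avro record schema, written as a JSON string.
    string schemaString = string `{
        "namespace": "example.avro",
        "type": "record",
        "name": "Student",
        "fields": [
            {"name": "id", "type": "int"},
            {"name": "name", "type": "string"}
        ]
    }`;

    // `check` returns the avro:Error to the caller if initialization fails.
    avro:Schema schema = check new (schemaString);

    // The instance is now ready for toAvro and fromAvro calls.
    io:println("Schema initialized");
}
```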

3. Serialize data into bytes

This section describes the details of serializing Ballerina data into byte arrays.

3.1 The toAvro API

The toAvro API can be used to serialize data into byte[].

byte[] serializedData = check schema.toAvro("avro-data");
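A small sketch of a complete toAvro call is shown below. It assumes a schema that describes a single Avro int; the schema string and the value are chosen for illustration only.

```ballerina
import ballerina/avro;
import ballerina/io;

public function main() returns error? {
    // Schema describing one Avro int value (illustrative).
    avro:Schema schema = check new (string `{"type": "int"}`);

    int value = 5;

    // Serialize the Ballerina int according to the schema.
    byte[] serializedData = check schema.toAvro(value);

    io:println("Serialized length: ", serializedData.length());
}
```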

3.1.1 API parameters

3.1.1.1 The data parameter

The data parameter accepts the Ballerina value to be serialized into a byte array. The supported Ballerina types are listed in the mapping below.

3.1.1.1.1 Map Avro types to Ballerina Types

The following table summarizes how Avro types are mapped to corresponding Ballerina types. These rules are applicable when serializing/deserializing Ballerina data according to an Avro schema.

  Avro Type        Ballerina Type
  null             nil
  boolean          boolean
  int, long        int
  float, double    float
  bytes            byte[]
  string           string
  record           record
  enum             enum
  array            array
  map              map
  fixed            byte[]

Note: The Ballerina int type can represent integers up to 64 bits in size using the two's complement representation. Therefore, it can handle both int (32-bit signed integer) and long (64-bit signed integer) Avro types.

Note: The Ballerina float type supports the IEEE 754-2008 64-bit binary (radix 2) floating-point number standard. Therefore, it can handle both float (32-bit single precision IEEE 754 floating-point number) and double (64-bit double precision IEEE 754 floating-point number) Avro types.
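The mapping can be illustrated with a record whose fields cover several rows of the table. The sketch below is an assumption-laden example: the Reading record, its field names, and the schema are hypothetical, and the comments mark which Avro type each Ballerina field is intended to correspond to.

```ballerina
import ballerina/avro;
import ballerina/io;

// Hypothetical record; each field mirrors one field of the schema below.
type Reading record {
    int id;         // Avro int
    int timestamp;  // Avro long
    float ratio;    // Avro float
    float value;    // Avro double
    boolean valid;  // Avro boolean
    string label;   // Avro string
};

public function main() returns error? {
    avro:Schema schema = check new (string `{
        "type": "record",
        "name": "Reading",
        "fields": [
            {"name": "id", "type": "int"},
            {"name": "timestamp", "type": "long"},
            {"name": "ratio", "type": "float"},
            {"name": "value", "type": "double"},
            {"name": "valid", "type": "boolean"},
            {"name": "label", "type": "string"}
        ]
    }`);

    Reading reading = {
        id: 1,
        timestamp: 1712188800000,
        ratio: 0.5,
        value: 21.25,
        valid: true,
        label: "sensor-a"
    };

    // Both directions use the same schema and the same mapping rules.
    byte[] bytes = check schema.toAvro(reading);
    Reading decoded = check schema.fromAvro(bytes);
    io:println(decoded);
}
```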

3.1.2 Return type

The function returns a byte[] on success, or an avro:Error if the conversion fails.

4. Deserialize bytes to a specific Ballerina type

The Avro module provides an API to deserialize a given byte[] to a given Ballerina type.

4.1 The fromAvro API

The fromAvro API facilitates the deserialization of Avro byte[] into a given Ballerina type.

string deserializedData = check schema.fromAvro(data);
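To show where the data argument typically comes from, here is a hedged round-trip sketch: a string is serialized with toAvro and the resulting bytes are passed to fromAvro. The schema and the sample value are assumptions made for illustration.

```ballerina
import ballerina/avro;
import ballerina/io;

public function main() returns error? {
    // Schema describing one Avro string value (illustrative).
    avro:Schema schema = check new (string `{"type": "string"}`);

    // Produce Avro bytes to deserialize.
    byte[] data = check schema.toAvro("avro-data");

    // The target type (string) is taken from the variable's declared type.
    string deserializedData = check schema.fromAvro(data);

    io:println(deserializedData);
}
```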

4.1.1 API parameters

4.1.1.1 The data parameter

The data parameter is an Avro byte[] that needs to be converted to a Ballerina type.

4.1.1.2 The targetType parameter

The targetType parameter accepts the type descriptor of the target Ballerina type.

4.1.2 Return type

On success, the return type is inferred from the user-specified type. If the conversion fails, an avro:Error is returned.
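The two ways of supplying the target type can be sketched as follows. This example assumes that targetType is an inferable typedesc parameter, so it can either be omitted (and inferred from the expected type) or passed explicitly as a second argument. The Student record and its schema are hypothetical.

```ballerina
import ballerina/avro;
import ballerina/io;

// Hypothetical record matching the schema below.
type Student record {
    int id;
    string name;
};

public function main() returns error? {
    avro:Schema schema = check new (string `{
        "type": "record",
        "name": "Student",
        "fields": [
            {"name": "id", "type": "int"},
            {"name": "name", "type": "string"}
        ]
    }`);

    Student student = {id: 1, name: "John"};
    byte[] data = check schema.toAvro(student);

    // Target type inferred from the declared type of the variable.
    Student inferred = check schema.fromAvro(data);

    // Target type passed explicitly (assumes targetType accepts a typedesc argument).
    Student explicit = check schema.fromAvro(data, Student);

    io:println(inferred);
    io:println(explicit);
}
```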

5. The avro:Error type

The avro:Error type represents all the errors related to the Avro module. This is a subtype of the Ballerina error type.
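Because avro:Error is a subtype of error, it can be handled with an ordinary type test instead of check. The following is a sketch only: the invalid schema string is an illustrative input that initialization is expected to reject, and the printed messages are not prescribed by this specification.

```ballerina
import ballerina/avro;
import ballerina/io;

public function main() {
    // This string is deliberately not a valid Avro schema.
    avro:Schema|avro:Error schema = new ("not-a-valid-avro-schema");

    if schema is avro:Error {
        // message() is available on every Ballerina error value.
        io:println("Schema initialization failed: ", schema.message());
        return;
    }

    // From here on, schema is narrowed to avro:Schema.
    byte[]|avro:Error bytes = schema.toAvro("hello");
    if bytes is avro:Error {
        io:println("Serialization failed: ", bytes.message());
    }
}
```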