Skip to main content

Extend and reuse an existing AirByte destination connector

AirByte is an open-source ELT (Extract, Load, and Transformation) application. It heavily uses containerization for the deployment of its various components. On the local machine, we need docker to run it.

AirByte has an impressive list of source and destination connectors available.

One of my use case data destinations is the ClickHouse data warehouse and its destination connector is not yet (2021-12-08) available.

As per the documentation, It seems that creating a destination connector is a non-trivial job. It's a great idea to build an open-source ClickHouse destination connector. However, I tried avoiding the temptation to create one because of the required effort.

AirByte has a MySql destination connector available. ClickHouse provides a MySQL connector for access from any MySQL client. We need to configure Clickhouse to give support for the MySQL connector. Accessing ClickHouse from AirByte using its MySQL destination connector looks promising. However, when I tried connecting it from AirByte a small snag prevented it from making a connection.

Error Public Key Retrieval is not allowed is thrown by AirByte.

The solution for this error is to pass a property allowPublicKeyRetrieval with TRUE value to the underlying JDBC driver (AirByte uses JDBC MySQL drive). Unfortunately, the AirByte setting doesn't have any field to take these additional JDBC driver properties in the MySQL destination connector. However, we can copy, extend and deploy an existing Connector as per our need and add them into AirByte as a new source.

I git clone AirByte git repo at a local directory. All official code for source and destination connectors can be found in the directory airbyte/airbyte-integrations/connectors.

I went ahead inside destination-mysql and made some changes to the appropriate Java Source file to accommodate allowPublicKeyRetrieval=true properties.

Inside destination-mysql we have Dockerfile, I made the change in the

LABEL io.airbyte.name=geekmj/destination-mysql

It will help in getting a docker image with geekmj/destination-mysql name.

To build the docker image I run the following command:


# Must be run from the Airbyte project root
./gradlew :airbyte-integrations:connectors:destination-<name>:build

I published the generated image to Docker Repository.

In AirByte settings > Destinations I added a new connector with the appropriate name.

While creating a new destination I was able to see the newly added destination connector now.

When I tried connecting to ClickHouse now I started seeing a new error.

	
 SQL Error [62] [00000]: Code: 62. DB::Exception: Syntax error: failed at position 54 ('FULL'): FULL TABLES FROM test. Expected one of: TABLES, CLUSTER, CHANGED, GRANTS, CREATE, ACCESS, QUOTA, SETTINGS, CURRENT ROLES, PRIVILEGES, PROCESSLIST, CLUSTERS, DATABASES, CURRENT QUOTA, ENABLED ROLES, CREATE, DICTIONARIES, USERS, ROLES, SETTINGS PROFILES, PROFILES, ROW POLICIES, POLICIES, QUOTAS. (SYNTAX_ERROR) (version 21.11.4.14 (official build))
  Code: 62. DB::Exception: Syntax error: failed at position 54 ('FULL'): FULL TABLES FROM test. Expected one of: TABLES, CLUSTER, CHANGED, GRANTS, CREATE, ACCESS, QUOTA, SETTINGS, CURRENT ROLES, PRIVILEGES, PROCESSLIST, CLUSTERS, DATABASES, CURRENT QUOTA, ENABLED ROLES, CREATE, DICTIONARIES, USERS, ROLES, SETTINGS PROFILES, PROFILES, ROW POLICIES, POLICIES, QUOTAS. (SYNTAX_ERROR) (version 21.11.4.14 (official build))

I realized when we try connecting to ClickHouse using the JDBC MySQL connector it doesn't work as expected. While I can see the connection is established with the server but table level queries started failing.

Hence I decided to create a ClickHouse destination connector from scratch using the JDBC Clickhouse driver.

Comments

Popular posts from this blog

Understanding Type Checking

A few examples of types in the context of programming language can be integer, float, character, string, array, etc.  When a program executes then data flow between instructions and values of specific types are assigned to a variable after some operation. It's important for the system to verify if the correct types are used as operands in operations. For e.g. In a sum operation, the expectation for operands to be of numeric type. The program's execution should fail in the case there is inconsistency. We can classify programming languages into two categories based as per their ability to cater to type safety: Dynamically Typed Language Statically Typed Language

Setting Clickhouse column data warehouse at Google Cloud Compute Engine VM

I didn't have a Google Cloud account associated with my email, so I signed up for one. It needs a valid Credit Card and mobile number to check if you are human. On successful sign up I get 300$ to spend within 3 months. Creating a free forever Google Cloud Compute Engine VM As per Google Cloud documentation you can have 1 non-preemptible e2-micro VM instance (1GB 2vCPU, 30GB Disk, etc.) per month free forever in some regions with some restrictions. I wanted the following stuff in my VM before I can install Clickhouse on to that: Ubuntu 20.x LTS SSH access from my machine Enabling SSH-based access to Google Compute Engine VM Step 1 Created an ssh private and public key on my mac using the following command ssh-keygen -t rsa -f ~/.ssh/gcloud-ssh-key -C mrityunjay -b 2048 Step 2 Copied the public key from the console using the following command: cat ~/.ssh/gcloud-ssh-key.pub output ssh-rsa <Gibrish :)> mrityunjay Step 3 I went to Google Cloud Console > Co...