Setting up a ClickHouse columnar data warehouse on a Google Cloud Compute Engine VM

I didn't have a Google Cloud account associated with my email, so I signed up for one. Sign-up requires a valid credit card and a mobile number to verify that you are human.

On successful sign-up I got $300 in credit to spend within 3 months.

Creating a free forever Google Cloud Compute Engine VM

As per the Google Cloud documentation, you can run one non-preemptible e2-micro VM instance (1 GB memory, 2 vCPUs, 30 GB disk, etc.) per month, free forever, in some regions and with some restrictions.

I wanted the following on my VM before installing ClickHouse on it:

  1. Ubuntu 20.x LTS
  2. SSH access from my machine

Enabling SSH-based access to Google Compute Engine VM

Step 1

I created an SSH private and public key pair on my Mac using the following command:

ssh-keygen -t rsa -f ~/.ssh/gcloud-ssh-key -C mrityunjay -b 2048

Step 2

I printed the public key in the terminal with the following command and copied it:

cat ~/.ssh/gcloud-ssh-key.pub

Output

ssh-rsa <Gibberish :)> mrityunjay

Step 3

I went to Google Cloud Console > Compute Engine > Metadata > SSH Keys Section.

I clicked on Edit > Add Item and pasted the previously copied public SSH key and saved the item.

The newly added SSH key then appeared in the list.
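The comment at the end of the public key matters later: as far as I understand, the Cloud Console takes the Linux username for the key from that comment, which is why the ssh login further down uses mrityunjay. Here is a minimal sketch that prints the comment field of a public key file. The key_comment helper and the dummy key are made up for illustration and are not part of any Google tooling.

```shell
# Print the comment (third field) of an OpenSSH public key line.
key_comment() {
  awk '{ print $3 }' "$1"
}

# Demonstrate on a placeholder key file; the key body is a dummy string.
pub="$(mktemp)"
echo 'ssh-rsa AAAAB3NzaC1yc2EPLACEHOLDER mrityunjay' > "$pub"
key_comment "$pub"   # prints: mrityunjay
```

On the real key this would be key_comment ~/.ssh/gcloud-ssh-key.pub.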

Creating a new VM

Creating a new VM is a breeze with easy-to-use options on the screen in the Console.

I made sure of the following:

  1. Selected the us-west1 region.
  2. Changed the disk to 30 GB general purpose.
  3. Chose Ubuntu 20 LTS.
  4. Pasted the Google Cloud SSH public key.

VM successfully created!

Accessing my VM

Command

ssh -i ~/.ssh/gcloud-ssh-key mrityunjay@xx.yyy.xxxx.xx

And I was inside the Ubuntu 20.x machine.

Setting up ClickHouse

I found a great article on DigitalOcean documentation for setting up ClickHouse on Ubuntu 20.x LTS.

Most of my setup follows instructions from the article. God bless the author.

Adding the Yandex-managed ClickHouse APT repository


sudo apt-key adv --keyserver keyserver.ubuntu.com --recv E0C56BD4


echo "deb http://repo.yandex.ru/clickhouse/deb/stable/ main/" | sudo tee /etc/apt/sources.list.d/clickhouse.list

sudo apt update

Installing clickhouse-server and clickhouse-client packages


sudo apt install clickhouse-server clickhouse-client


I provided a secure password for the default user of ClickHouse.

Setup done!

ClickHouse Administration

Starting ClickHouse on Ubuntu 20.x LTS


sudo service clickhouse-server start

Checking the status of ClickHouse


sudo service clickhouse-server status


Connecting to ClickHouse


clickhouse-client --password

On success, I saw a shell that invited me to try some commands.

At heart, ClickHouse is like an RDBMS with a twist.

Executing a few commands

Creating a new database

create database test;

Output

Query id: 9dc02359-02da-480e-b538-96d63d1155ad

Ok.

0 rows in set. Elapsed: 0.005 sec.

Using the database

use test

Creating a new table

CREATE TABLE test1
(
    `id` UInt64,
    `name` String
)
ENGINE = MergeTree
PRIMARY KEY id
ORDER BY id

Output


Query id: dd2cd818-780b-4c63-9c8f-695a83121719

Ok.

0 rows in set. Elapsed: 0.010 sec.

Displaying all tables


show tables

Output

Query id: 2adb7b7e-3f53-40cf-8b2d-263c27a487a0

┌─name──┐
│ test1 │
└───────┘

1 rows in set. Elapsed: 0.013 sec. 
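To go one step further than creating the table, a couple of rows can be inserted and read back. This is a minimal sketch, assuming the test database and test1 table from above. The sample rows and the temporary file are made up for illustration. It only writes the SQL to a file; the clickhouse-client call is left as a comment so nothing runs against the server by accident.

```shell
# Write a small SQL script that exercises the test1 table created above.
sql_file="$(mktemp)"
cat > "$sql_file" <<'SQL'
INSERT INTO test.test1 (id, name) VALUES (1, 'alpha'), (2, 'beta');
SELECT id, name FROM test.test1 ORDER BY id;
SQL

# Show the script for review.
cat "$sql_file"

# To run it on the VM (prompts for the default user's password):
#   clickhouse-client --password --multiquery < "$sql_file"
```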


Okay, for this tree in my garden, I can stop trying out ClickHouse commands here.

Accessing ClickHouse from another machine

To access ClickHouse from another machine, I changed the ClickHouse config so that the server listens on all interfaces.

sudo nano /etc/clickhouse-server/config.xml

Then I uncommented the line <!-- <listen_host>::</listen_host> --> so that it reads <listen_host>::</listen_host>.

Restarting the server
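The same edit can be done without opening an editor. This is a minimal sketch using sed, assuming the commented line sits on a single line in the file, as in the snippet above. The uncomment_listen_host helper and the sample file are hypothetical. The demo runs on a throwaway copy, not on the real config.

```shell
# Uncomment the wildcard listen_host line in a ClickHouse config file.
# A backup with a .bak suffix is left next to the edited file.
uncomment_listen_host() {
  sed -i.bak 's|<!-- *\(<listen_host>::</listen_host>\) *-->|\1|' "$1"
}

# Demonstrate on a throwaway sample file rather than the real config.
sample="$(mktemp)"
printf '%s\n' '    <!-- <listen_host>::</listen_host> -->' > "$sample"
uncomment_listen_host "$sample"
cat "$sample"
```

On the VM this would be run with sudo against /etc/clickhouse-server/config.xml, followed by the restart.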


sudo service clickhouse-server restart

ClickHouse listens on two ports by default: 8123 (HTTP interface) and 9000 (native TCP protocol, used by clickhouse-client).

I opened both in the Google Cloud Compute Engine firewall options.

With that, I was good to go with accessing the ClickHouse DB from another system.
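The firewall step can also be done from the command line with gcloud instead of the Console. This is a minimal sketch, not what I actually ran. The rule name, the client address, and the use of the default network are all assumptions to adjust. The command is only printed so it can be reviewed first.

```shell
# Hypothetical values: pick your own rule name and the address you connect from.
rule_name="allow-clickhouse"
client_cidr="203.0.113.10/32"   # placeholder from the documentation range

# Build the command as a string so it can be reviewed before running it.
firewall_cmd="gcloud compute firewall-rules create ${rule_name} \
--network=default --direction=INGRESS \
--allow=tcp:8123,tcp:9000 --source-ranges=${client_cidr}"

echo "$firewall_cmd"

# After the rule is in place, the HTTP interface can be probed from the client:
#   curl http://<VM external IP>:8123/ping
```

Restricting the source range to a known client address is a deliberate choice here, since opening these ports to every address exposes the database to the whole internet. A healthy server answers the ping request with Ok.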

Allowing MySQL Connector

ClickHouse supports the MySQL wire protocol. It can be enabled with the mysql_port setting in the configuration file:


<!-- Compatibility with MySQL protocol.
     ClickHouse will pretend to be MySQL for applications connecting to this port.
-->
<mysql_port>9004</mysql_port>

Uncomment or add the mysql_port line above in /etc/clickhouse-server/config.xml, then restart the server after saving.

Allowing PostgreSQL Connector


<!-- Compatibility with PostgreSQL protocol.
     ClickHouse will pretend to be PostgreSQL for applications connecting to this port.
-->
<postgresql_port>9005</postgresql_port>

Uncomment or add the postgresql_port line above in /etc/clickhouse-server/config.xml to enable the PostgreSQL wire protocol, then restart the server after saving.
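Before restarting, it helps to confirm that a port setting is really active and not still wrapped in a comment. This is a minimal sketch, assuming each setting sits on its own line as in the snippets above. The active_port helper and the sample file are hypothetical.

```shell
# Print the port configured for a given tag, but only if the line is active
# (not wrapped in an XML comment). Handles the one-tag-per-line layout only.
active_port() {
  sed -n "s|^[[:space:]]*<$2>\([0-9][0-9]*\)</$2>[[:space:]]*\$|\1|p" "$1"
}

# Demonstrate on a throwaway sample: MySQL enabled, PostgreSQL still commented out.
sample_cfg="$(mktemp)"
printf '%s\n' \
  '<mysql_port>9004</mysql_port>' \
  '<!-- <postgresql_port>9005</postgresql_port> -->' > "$sample_cfg"

active_port "$sample_cfg" mysql_port        # prints 9004
active_port "$sample_cfg" postgresql_port   # prints nothing

# Example client invocations once the ports are enabled and reachable:
#   mysql --protocol tcp -h <host> -P 9004 -u default -p
#   psql -h <host> -p 9005 -U default default
```

To connect from another machine, ports 9004 and 9005 would also have to be opened in the firewall, just like 8123 and 9000.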
