Using Kafka Connect With Oracle Streaming Service And Autonomous DB

Posted By: Todd Sharp on 12/12/2019 5:08 GMT

Tagged: Cloud, Containers, Microservices, APIs

Whether you are using a framework like Micronaut to consume and produce messages or using the Kafka SDK itself, Oracle Streaming Service (OSS) is an easy and less expensive way to handle messaging within your application infrastructure. You don't have to turn up your own Kafka cluster and worry about the cost and maintenance that goes along with that. You just create a stream and get to work with producing and consuming messages.

Sometimes you need a bit more though. Messaging is certainly important in a microservice architecture, and until now OSS only handled the 'transport' part of the equation - IE: producing and consuming - which meant your application was responsible for both the source and the destination part of the communication. That all changes today with the ability to utilize Kafka Connect with OSS.

What's Kafka Connect? Glad you asked! Kafka Connect is an open source framework for connecting Kafka (or, in our case - OSS) with external sources. Things like object stores, databases, key-value stores, etc. There are two terms you should be familiar with when it comes to Kafka Connect: source connectors and sink connectors. Source connectors allow you to ingest data from an external source. Sink connectors let you deliver data to an external source.

Pretty cool stuff, really. Think about it - you can actually have a stream that receives a message every time a record is inserted into a database table. Or, you could post records to a table just by producing a message to a topic! But enough talk, let's take a look at how to actually make this happen. It's not difficult - we'll walk through every step below.

Preparing For Kafka Connect Integration

We're going to create a source connector to ingest data from a table in our Autonomous Transaction Processing (ATP) instance in this tutorial. But before we dig into the integration bits we need to do a bit of prep work. It would be a good idea to create a project directory somewhere on your machine to store some of the miscellaneous bits and bytes that we'll be working with. We'll refer to that directory as /projects/connect-demo from here on out - just make sure to substitute your own path as necessary.

Autonomous DB Setup

Let's create a schema and table to use for testing the integration with Autonomous DB and grab our wallet credentials. If you don't have an instance, follow my guide to get up and running quickly with Autonomous DB. You can even use an "always free" ATP instance if you'd like - it will work just fine.

First up, let's connect to the running instance with SQL Developer (or the free SQL Developer Web) and create a new user with a few permissions:

	CREATE USER connectdemo IDENTIFIED BY "Str0ngP@$$word1234";
	GRANT CONNECT, RESOURCE TO connectdemo;
	GRANT UNLIMITED TABLESPACE TO connectdemo;

view raw create-user.sql hosted with ❤ by GitHub

Next, create a minimal table.

	CREATE TABLE TEST
	(
	ID NUMBER(10,0) GENERATED BY DEFAULT ON NULL AS IDENTITY,
	USERNAME VARCHAR2(50) NOT NULL,
	FIRST_NAME VARCHAR2(50) NOT NULL,
	MIDDLE_NAME VARCHAR2(50),
	LAST_NAME VARCHAR2(50) NOT NULL,
	AGE NUMBER(5,0) DEFAULT 0 NOT NULL,
	CREATED_ON TIMESTAMP(9) NOT NULL,
	CONSTRAINT TEST_PK PRIMARY KEY
	(
	ID
	)
	ENABLE
	);

view raw create-table.sql hosted with ❤ by GitHub

OK, let's grab some other bits!

Download Dependencies

We'll need to download three things into our project directory:

Place the contents from the Oracle driver zip file in /projects/connect-demo/drivers and the contents of the Kafka JDBC Connector zip in /projects/connect-demo/kafka-jdbc/connector. Next, we'll grab our wallet and place it in /projects/connect-demo/wallet. You can download it via the console UI, or via the CLI. To do it quickly via the CLI:

	# list your Autonomous instances

	oci db autonomous-database list

	# find your instance in the list and get the OCID
	# substitute the OCID below and run:

	oci db autonomous-database generate-wallet \
	--autonomous-database-id \
	ocid1.autonomousdatabase.oc1.phx... \
	--file /projects/connect-demo/wallet/wallet.zip \
	--password Str0ngP@$$word

view raw download-wallet.sh hosted with ❤ by GitHub

Note: don't forget to unzip the wallet so that the contents of the wallet are in /projects/connect-demo/wallet.

OK, that's all the 1's and 0's that we need from elsewhere to move forward. Let's dip our toes into the stream pool!

Creating A Stream Pool

Right, so there are two pieces we need to create next. A Stream Pool and a Connect Configuration. You can do this via code with the OCI SDK which comes in all your favorite flavors, but I find it easier to create it via the console UI. First thing you'll need to do is head to the Streaming portion of the console by clicking 'Analytics' -> 'Streaming' from the console burger menu:

Next, in the left hand menu of the streaming landing page, select 'Stream Pools'.

Then click 'Create Stream Pool'.

Give it a name and check 'Auto Create Topics'. This will ensure that Kafka Connect can create topics as it needs to and is equivalent to the Kafka setting 'auto.create.topics.enable'.

Click 'Create Stream Pool' and in a few seconds your pool will become 'Active'. Once it's active, click the 'View Kafka Connection Settings' button.

The next dialog will contain some information that we'll need to copy down for later use. Take note though, you may want to substitute a different username in the SASL Connection String (see the Create A Streams User of this post for more information).

The pool is now warm and ready, so let's cook up a Connect Configuration.

Creating A Connect Configuration

Next, click 'Kafka Connect Configuration' from the sidebar and click the button to create one.

The only thing to do here is give it a name.

Once it's created, copy down the Connect configuration OCID as well as the Kafka Connect Storage Topics.

Done. Let's move on!

Configuring And Launching Kafka Connect

We're now ready to launch Kafka Connect and create our Source Connector to listen to our TEST table. We're going to use the Debezium Connect Docker image to keep things simple and containerized, but you can certainly use the official Kafka Connect Docker image or the binary version. Before we can launch the Docker image, we'll need to set up a property file that will be used to configure Connect. We'll need some of the values that we collected earlier, so keep those handy. We'll also need our streaming username which we collected from our Stream Pool above (see the SASL Connection String) and our auth token. See this post for more info on dedicated streaming users and how to generate an auth token for that user.

Create a file called /projects/connect-demo/connect-distributed.properties and populate it as such, substituting your actual values wherever you see <bracketed> values.

	group.id=connect-demo-group

	bootstrap.servers=<streamPoolBootstrapServer>
	sasl.mechanism=PLAIN
	security.protocol=SASL_SSL
	sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="<tenancy>/<username>/<streamPoolId>" password="<authToken>";

	producer.sasl.mechanism=PLAIN
	producer.security.protocol=SASL_SSL
	producer.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="<tenancy>/<username>/<streamPoolId>" password="<authToken>";

	consumer.sasl.mechanism=PLAIN
	consumer.security.protocol=SASL_SSL
	consumer.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="<tenancy>/<username>/<streamPoolId>" password="<authToken>";

	config.storage.replication.factor=1
	config.storage.partitions=1
	config.storage.topic=<connectConfigOCID>-config

	status.storage.replication.factor=1
	status.storage.partitions=1
	status.storage.topic=<connectConfigOCID>-status

	offset.storage.replication.factor=1
	offset.storage.partitions=1
	offset.storage.topic=<connectConfigOCID>-offset
	offset.flush.interval.ms=10000
	offset.flush.timeout.ms=5000

	key.converter=org.apache.kafka.connect.storage.StringConverter
	value.converter=org.apache.kafka.connect.json.JsonConverter
	key.converter.schemas.enable=false
	value.converter.schemas.enable=true

	internal.key.converter=org.apache.kafka.connect.json.JsonConverter
	internal.value.converter=org.apache.kafka.connect.json.JsonConverter

	task.shutdown.graceful.timeout.ms=10000

view raw connect-distributed.properties hosted with ❤ by GitHub

We need a way to get our dependencies into the container, so create /projects/connect-demo/Dockerfile which we will base on the debezium/connect image.

	FROM debezium/connect:0.10
	USER root
	RUN mkdir /wallet
	USER kafka
	COPY driver/* /kafka/libs/
	COPY kafka-connect-jdbc/lib/* /kafka/libs/
	COPY wallet/* /wallet/

view raw Dockerfile hosted with ❤ by GitHub

Now we'll build the Docker image. I create a Bash script to build and run the image just to make substituting the necessary topic names easier, but you can just run these commands manually and substitute the values directly if you'd prefer. Note that we're mounting the /projects/connect-demo/connect-distributed.properties file into the Docker container.

	#!/usr/bin/env bash
	CONNECT_CONFIG_ID=<connectConfigOCID>

	CONFIG_STORAGE_TOPIC=$CONNECT_CONFIG_ID-config
	OFFSET_STORAGE_TOPIC=$CONNECT_CONFIG_ID-offset
	STATUS_STORAGE_TOPIC=$CONNECT_CONFIG_ID-status

	docker build -t connect .

	docker run -it --rm --name connect -p 8083:8083 \
	-e GROUP_ID=connect-demo-group \
	-e BOOTSTRAP_SERVERS=<streamPoolBootstrapServer> \
	-e CONFIG_STORAGE_TOPIC=$CONFIG_STORAGE_TOPIC \
	-e OFFSET_STORAGE_TOPIC=$OFFSET_STORAGE_TOPIC \
	-e STATUS_STORAGE_TOPIC=$STATUS_STORAGE_TOPIC \
	-v `pwd -P`/connect-distributed.properties:/kafka/config.orig/connect-distributed.properties \
	connect

view raw connect-demo.sh hosted with ❤ by GitHub

Now we can launch the Connect instance by running the Bash script. It'll take about 30-45 seconds to get up and running, depending on your machine and your connection. Once it's running, we can utilize the REST API to create our connector, but first we'll need a JSON config file to describe our connector.

	{
	"name": "oss-atp-connector",
	"config": {
	"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
	"tasks.max": "100",
	"connection.url": "jdbc:oracle:thin:@demodb_high?TNS_ADMIN=/wallet",
	"connection.user": "coannectdemo",
	"connection.password": "Str0ngP@$$word1234",
	"mode": "incrementing",
	"incrementing.column.name": "ID",
	"topic.prefix": "demo-stream-",
	"table.whitelist": "TEST",
	"numeric.mapping": "best_fit"
	}
	}

view raw connector-config.json hosted with ❤ by GitHub

Note a few things above. Our connection URL should look familiar if you've worked with ATP and JDBC in the past. It references the chosen entry from the tnsnames.ora file in your wallet and then passes the location to the wallet (the location within the Docker container, which we placed in the root of the file system if you remember above). The user and password are the schema credentials that we created above. The entry for topic.prefix is what will be used to prefix each topic created for the tables in table.whitelist.

Now we can POST our config to the REST API to create the source connector:

curl -iX POST -H "Accept:application/json" -H "Content-Type:application/json" -d @connector-config.json http://localhost:8083/connectors

view raw create-connector.sh hosted with ❤ by GitHub

To list all connectors, perform a GET request:

curl -i http://localhost:8083/connectors

view raw list-connectors.sh hosted with ❤ by GitHub

To delete a connector, perform a DELETE request:

curl -i -X DELETE http://localhost:8083/connectors/[connector-name]

view raw delete-connector.sh hosted with ❤ by GitHub

For further operations, refer to the Connect REST API documentation.

Once you have created your connector, a topic for each whitelisted table will be created and shortly become available named with the specified topic prefix and the table(s) name.

Testing The Integration

When your stream is ready, you can insert some records into the table and commit the transaction:

+INSERT INTO TEST (username, first_name, middle_name, last_name, age, created_on)
+VALUES ('todd', 'Todd', null, 'Sharp', 42, sysdate);
+INSERT INTO TEST (username, first_name, middle_name, last_name, age, created_on)
+VALUES ('gvenzl', 'Gerald', null, 'Venzl', 30, sysdate);
+INSERT INTO TEST (username, first_name, middle_name, last_name, age, created_on)
+VALUES ('aalmiray', 'Andres', null, 'Almiray', 40, sysdate);

view raw insert.sql hosted with ❤ by GitHub

Click into the stream in the console and take a look at the recent messages by clicking 'Load Messages' and you'll see a message for each record that was inserted into the TEST table.

Click on the value to view details.

And that's it!

Summary

In this post we created a test schema and table in ATP, created a Stream Pool and Connect Configuration, launched an instance of Kafka Connect via the Debezium Docker image and created a source connector on Kafka Connect for our ATP table. We inserted records into the table and observed those records published as messages in the stream.

Kafka Connect integration is extremely powerful and can be used in any microservice architecture on the Oracle Cloud.

Photo by Bob Canning on Unsplash

Note: Comments are currently closed on this blog. Disqus is simply too bloated to justify its use with the low volume of comments on this blog. Please visit my contact page if you have something to say!

recursive.codes

recursive.codes

recursive.codes

Using Kafka Connect With Oracle Streaming Service And Autonomous DB

Posted By: Todd Sharp on 12/12/2019 5:08 GMT

Tagged: Cloud, Containers, Microservices, APIs

Preparing For Kafka Connect Integration

Autonomous DB Setup

Download Dependencies

Creating A Stream Pool

Creating A Connect Configuration

Configuring And Launching Kafka Connect

Testing The Integration

Summary

Related Posts