Streaming Events from SQL Server to Event Hub in Azure using Debezium

Streaming SQL Server CDC to Azure Event Hub
In this article, we will see how to:
- Install Java
- Create an Azure Event Hubs namespace
- Set up Kafka Connect with the Debezium SQL Server connector
- Set up SQL Server Management Studio and enable CDC (Change Data Capture) on a table
- Stream events from SQL Server to Azure Event Hub
Java Installation
Make sure you have Java installed on your machine.
Run java -version in your terminal to check.

To set up the JRE, download the installer for your system.
Run the .exe file and follow the installation setup. Once the installation is complete, add the path to the Java bin folder to your system's Path environment variable. To do so, search for Environment Variables in the Start Menu and open it, select the Environment Variables option at the bottom right, select Path, and click Edit. Add the path and click OK. Once done, you may restart your system.
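If you prefer to set the variable from a terminal instead, here is a minimal PowerShell sketch; the install path below is a placeholder, so adjust it to wherever your JRE actually landed.

# Append the Java bin folder to the user Path (hypothetical install path - adjust to yours)
$javaBin = "C:\Program Files\Java\jre1.8.0_301\bin"
$userPath = [Environment]::GetEnvironmentVariable("Path", "User")
[Environment]::SetEnvironmentVariable("Path", "$userPath;$javaBin", "User")
# Open a new terminal and verify
java -version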



Creating an Event Hub Namespace:
Follow Microsoft's official documentation to create an Event Hubs namespace:
https://docs.microsoft.com/en-us/azure/event-hubs/event-hubs-create
Setting up Kafka Connect and Debezium
What is Kafka Connect?
Kafka Connect is a tool for scalably and reliably streaming data between Apache Kafka and other data systems. It makes it simple to quickly define connectors that move large data sets into and out of Kafka.
What is Debezium?
Debezium is built on top of Apache Kafka and provides Kafka Connect compatible connectors that monitor specific database management systems. Start it up, point it at your databases, and your apps can start responding to all of the inserts, updates, and deletes that other apps commit to your databases.
Kafka libs:
https://archive.apache.org/dist/kafka/2.2.1/kafka_2.12-2.2.1.tgz
Extract the files from above download into a folder named kafka.
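On Windows 10 and later (as on Linux and macOS), you can extract the archive straight from the terminal; run this from the folder containing the download.

tar -xzf kafka_2.12-2.2.1.tgz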
Debezium SQL Server connector libs:
Extract the files from the connector download and copy the .jar files from debezium-connector-sqlserver into the kafka\kafka_2.12-2.2.1\libs folder.
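As a sketch, assuming the connector archive was extracted next to the Kafka folder (both paths here are hypothetical, so adjust them to your layout):

# PowerShell: copy the Debezium connector jars into Kafka's libs folder
Copy-Item -Path "E:\debezium-connector-sqlserver\*.jar" -Destination "E:\kafka\kafka_2.12-2.2.1\libs"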
Configure Kafka connect distributed properties:
Create a file (or replace the file) named connect-distributed.properties inside kafka\kafka_2.12-2.2.1\config and copy the following contents into that file.
# replace "namespace" with your Event Hubs namespace host name
bootstrap.servers=namespace.servicebus.windows.net:9093
group.id=connect-cluster-group
# connect internal topic names, auto-created if not exists
config.storage.topic=connect-cluster-configs
offset.storage.topic=connect-cluster-offsets
status.storage.topic=connect-cluster-status
# internal topic replication factors - auto 3x replication in Azure Storage
config.storage.replication.factor=1
offset.storage.replication.factor=1
status.storage.replication.factor=1
rest.advertised.host.name=connect
offset.flush.interval.ms=10000
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
internal.key.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
internal.key.converter.schemas.enable=false
internal.value.converter.schemas.enable=false
# required EH Kafka security settings
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="$ConnectionString" password="PASSWORD";
producer.security.protocol=SASL_SSL
producer.sasl.mechanism=PLAIN
producer.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="$ConnectionString" password="PASSWORD";
consumer.security.protocol=SASL_SSL
consumer.sasl.mechanism=PLAIN
consumer.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="$ConnectionString" password="PASSWORD";
# path to the libs directory within the Kafka release (use forward slashes; backslashes are escape characters in .properties files)
plugin.path=E:/kafka/kafka_2.12-2.2.1/libs
Copy the host name from your Azure Event Hubs namespace and use it as the bootstrap.servers value.

Go to Shared access policies, select your policy, and copy the Connection string-primary key; replace "PASSWORD" with it in all three sasl.jaas.config entries in connect-distributed.properties. The connection string looks something like:
"Endpoint=sb://namespace.servicebus.windows.net/;SharedAccessKeyName=RootManageSharedAccessKey;SharedAccessKey=mkengtewngwngwk/gkewekbg="
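If you have the Azure CLI installed, you can also fetch the connection string from a terminal; the resource group and namespace names below are placeholders.

az eventhubs namespace authorization-rule keys list --resource-group <your-resource-group> --namespace-name <your-namespace> --name RootManageSharedAccessKey --query primaryConnectionString -o tsv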

Also, replace the plugin.path value with the path to the libs folder in your Kafka installation.
Save connect-distributed.properties inside config folder.
Open PowerShell, navigate to the kafka\kafka_2.12-2.2.1 directory, and run the following command:
./bin/windows/connect-distributed.bat ./config/connect-distributed.properties

Hurray! Now we have Kafka Connect running. Moving forward, we'll set up SQL Server CDC and configure it to post change events to Event Hub.
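Before moving on, you can sanity-check the worker through its REST API, which listens on port 8083 by default; once the Debezium jars are on the plugin path, the second call should list the SQL Server connector.

# worker metadata (version and commit id)
curl http://localhost:8083/
# installed plugins - should include io.debezium.connector.sqlserver.SqlServerConnector
curl http://localhost:8083/connector-plugins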
Setting up SSMS and enabling CDC (Change Data Capture) on Table
What is SQL Server?
Microsoft SQL Server is a relational database management system developed by Microsoft. Its primary function is storing and retrieving data as requested by other software applications, which may run either on the same computer or on another computer across a network.
Downloading SQL Server
Download and install SQL Server 2019 Developer edition, along with SQL Server Management Studio (SSMS).
Once you have downloaded and installed SSMS, create a login and grant it the sysadmin role. Make sure that SQL Server Agent is running, since CDC relies on Agent jobs to capture changes; if it is not, right-click on it in SSMS and select Start.
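For reference, a minimal T-SQL sketch for such a login; the login name and password are placeholders, so substitute your own.

-- hypothetical login; Debezium will connect with these credentials
CREATE LOGIN debezium WITH PASSWORD = 'StrongPassword!1';
ALTER SERVER ROLE sysadmin ADD MEMBER debezium;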

Also, make sure that TCP/IP connections are enabled. To check, press Win+R and run:
compmgmt.msc
Under Services and Applications, open SQL Server Configuration Manager, go to SQL Server Network Configuration, and enable TCP/IP. Restart the SQL Server service for the change to take effect.

Enabling CDC on table
Create a database and a table, then run the scripts below to enable CDC for your table:
1. To enable CDC for your database, click New Query in SSMS and run:
USE MyDB
GO
EXEC sys.sp_cdc_enable_db
GO
2. To enable CDC for your table:
USE MyDB
GO
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name = N'users',
    @role_name = N'MyRole',
    @supports_net_changes = 1
GO
You can provide any name for @role_name. Note that @supports_net_changes = 1 requires the table to have a primary key (or a unique index specified via @index_name).
3. Check if CDC is enabled:
USE MyDB
GO
EXEC sys.sp_cdc_help_change_data_capture
GO
Now, create a file named sqlserverproperties.json and copy the following contents into it:
{
  "name": "sql-server-connection",
  "config": {
    "connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
    "database.hostname": "localhost",
    "database.port": "1433",
    "database.user": "user",
    "database.password": "user",
    "database.dbname": "cdc",
    "database.server.name": "userserver",
    "table.whitelist": "dbo.users",
    "database.history": "io.debezium.relational.history.MemoryDatabaseHistory"
  }
}
Change database.user, database.password, database.dbname, and table.whitelist to match your SQL Server settings, and save the file to the kafka folder. Debezium writes change events to a topic (and hence an Event Hub) named <database.server.name>.<schema>.<table>, in this case userserver.dbo.users.
Open bash, and run the following command to post your configurations.
curl -X POST -H "Content-Type: application/json" --data @sqlserverproperties.json http://localhost:8083/connectors
To check the connector's status, run:
curl -s http://localhost:8083/connectors/sql-server-connection/status
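If everything is wired up correctly, the response should report the connector and its task as RUNNING, roughly along these lines (abridged):

{
  "name": "sql-server-connection",
  "connector": { "state": "RUNNING", "worker_id": "connect:8083" },
  "tasks": [ { "id": 0, "state": "RUNNING", "worker_id": "connect:8083" } ]
}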
Now that you have successfully set up everything, let's make some changes to the table and look at the events in Azure Event Hubs. Run some DML against the tracked table as sketched below, then go to Event Hubs, select userserver.dbo.users, then Process Data and Explore.
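The column names in this sketch are hypothetical; adapt them to your users table.

USE MyDB
GO
-- any insert/update/delete on the tracked table produces change events
INSERT INTO dbo.users (id, name) VALUES (1, 'alice');
UPDATE dbo.users SET name = 'bob' WHERE id = 1;
DELETE FROM dbo.users WHERE id = 1;
GO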


You can see the change events captured from your table in the input preview.

So, IT’S DONE!!