<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>CDC Archives - Big Data Processing</title>
	<atom:link href="https://bigdataproc.com/tag/cdc/feed/" rel="self" type="application/rss+xml" />
	<link>https://bigdataproc.com/tag/cdc/</link>
	<description>Big Data Solution for GCP, AWS, Azure and on-prem</description>
	<lastBuildDate>Sun, 15 Jan 2023 04:53:46 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.3.2</generator>
	<item>
		<title>GCP &#8211; Bring CDC data using Debezium</title>
		<link>https://bigdataproc.com/gcp-bring-cdc-data-using-debezium/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=gcp-bring-cdc-data-using-debezium</link>
					<comments>https://bigdataproc.com/gcp-bring-cdc-data-using-debezium/#comments</comments>
		
		<dc:creator><![CDATA[Gaurang]]></dc:creator>
		<pubDate>Wed, 09 Mar 2022 19:14:45 +0000</pubDate>
				<category><![CDATA[GCP]]></category>
		<category><![CDATA[CDC]]></category>
		<category><![CDATA[Debezium]]></category>
		<category><![CDATA[gcp]]></category>
		<category><![CDATA[kubernetes]]></category>
		<category><![CDATA[pubsub]]></category>
		<guid isPermaLink="false">https://bigdataproc.com/?p=327</guid>

					<description><![CDATA[<p>Installing Debezium on GCP Kubernetes to capture CDC data from multiple RDBMS sources and push it to Google Pub/Sub. </p>
<div class="more-link-wrapper"><a class="more-link" href="https://bigdataproc.com/gcp-bring-cdc-data-using-debezium/">Continue reading<span class="screen-reader-text">GCP &#8211; Bring CDC data using Debezium</span></a></div>
<p>The post <a rel="nofollow" href="https://bigdataproc.com/gcp-bring-cdc-data-using-debezium/">GCP &#8211; Bring CDC data using Debezium</a> appeared first on <a rel="nofollow" href="https://bigdataproc.com">Big Data Processing </a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>A few months back, our organization decided to go with GCP for our data platform, so we started evaluating tools to bring in data from different RDBMS sources. Our goal was to find a tool that captures change data (CDC) from the multiple sources we have (MySQL, Oracle, SQL Server, Db2 on mainframe) and brings it to either Cloud Storage or BigQuery. </p>



<p>At the time of writing, GCP doesn&#8217;t have a tool that satisfies our requirements. It has Datastream, but that only supports Oracle and MySQL. While searching outside GCP, I stumbled upon Debezium, an open source tool that captures CDC from multiple RDBMS sources and puts the data on Kafka or Pub/Sub topics in real time. Even better: we were looking for a batch solution and found a streaming one. </p>



<p>In this blog post, I will explain in detail how to deploy Debezium on a GCP Kubernetes cluster and connect it to GCP Cloud SQL and GCP Pub/Sub topics. </p>



<h1 class="wp-block-heading">Deploying Debezium on GCP Kubernetes</h1>



<h2 class="wp-block-heading">Create Service Accounts</h2>



<p>To deploy Debezium on Kubernetes, we first need to create an IAM service account. Create a service account with the following roles. </p>



<ul><li>Cloud SQL Client</li><li>Pub/Sub Publisher</li></ul>
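


<p>As a rough sketch, the service account and its two role bindings could be created with gcloud as shown here. The project ID <strong>dataframework</strong> and the account name <strong>sa-debzium-k8</strong> are taken from the cluster creation command used later in this post, and the display name is an assumption, so adjust all three to your environment. </p>



<pre class="wp-block-code"><code># Assumed values: replace with your own project ID and account name.
PROJECT_ID="dataframework"
SA_NAME="sa-debzium-k8"
SA_EMAIL="${SA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"

# Create the IAM service account (the display name is an assumption).
gcloud iam service-accounts create "${SA_NAME}" --project="${PROJECT_ID}" --display-name="Debezium on GKE"

# Grant the two roles listed above: Cloud SQL Client and Pub/Sub Publisher.
for ROLE in roles/cloudsql.client roles/pubsub.publisher; do
  gcloud projects add-iam-policy-binding "${PROJECT_ID}" --member="serviceAccount:${SA_EMAIL}" --role="${ROLE}"
done</code></pre>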



<p>You will also need to create a Kubernetes service account for the pods. Use the following command to create it. </p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">kubectl apply -f - &lt;&lt;EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: debezium-sa
EOF</pre>



<h2 class="wp-block-heading">Create MySQL User</h2>



<p>Assuming you already have a Cloud SQL (MySQL) instance running, let&#8217;s create a user with the access needed to read the transaction logs. </p>



<pre class="EnlighterJSRAW" data-enlighter-language="sql" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">CREATE USER 'replication_user'@'%' IDENTIFIED BY 'secret';
GRANT SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'replication_user'@'%';</pre>



<h2 class="wp-block-heading">Create GCP Kubernetes cluster for Debezium </h2>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">gcloud beta container clusters create "debezium-poc" --scopes=sql-admin,pubsub --region "us-east1" --service-account=sa-debzium-k8@dataframework.iam.gserviceaccount.com
</pre>



<p>Once the cluster is created, we need to deploy the pods and their configuration. </p>



<h3 class="wp-block-heading">Deploying config-map</h3>



<p><strong>mysql_config_map.yaml</strong></p>



<pre class="EnlighterJSRAW" data-enlighter-language="yaml" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">apiVersion: v1
kind: ConfigMap
metadata:
  name: debezium-mysql
  labels:
    app: debezium-mysql
data:
  application.properties: |-
      debezium.sink.type=pubsub
      debezium.source.connector.class=io.debezium.connector.mysql.MySqlConnector
      debezium.source.offset.storage.file.filename=data/offsets.dat
      debezium.source.offset.flush.interval.ms=0
      debezium.source.database.hostname=localhost
      debezium.source.database.port=3306
      debezium.source.database.user=replication_user
      debezium.source.database.password=secret
      debezium.source.database.server.id=184054
      debezium.source.database.server.name=dpmysql
      debezium.source.database.history = io.debezium.relational.history.FileDatabaseHistory
      debezium.source.database.history.file.filename = history_file.txt</pre>



<p>In the configuration above, change the user and password to match the user you created. <strong>server.name</strong> can be anything that makes sense to you for the source. <strong>server.id</strong> must be a number that is unique among the servers and replication clients of your MySQL instance, so any unused number will do. </p>



<p>To deploy the config map, run the following command. </p>



<pre class="wp-block-code"><code>kubectl apply -f mysql_config_map.yaml</code></pre>



<h3 class="wp-block-heading">Deploying StatefulSet</h3>



<p>The StatefulSet consists of two containers. </p>



<ul><li>debezium-server &#8211; At the time of writing, <strong>1.7.0.Final</strong> is the latest version available, so I am using it. However, you can use whichever version is the latest. </li><li>cloud-sql-proxy &#8211; This is required to connect to the Cloud SQL instance from Kubernetes. </li></ul>



<p><strong>mysql_statefulset.yaml</strong></p>



<pre class="EnlighterJSRAW" data-enlighter-language="yaml" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: debezium-mysql
  labels:
    app: debezium-mysql
spec:
  replicas: 1
  serviceName: debezium-mysql
  selector:
    matchLabels:
      app: debezium-mysql
  template:
    metadata:
      labels:
        app: debezium-mysql
        version: v1
    spec:
      serviceAccountName: debezium-sa
      securityContext:
        fsGroup: 185 # The Debezium container runs as the jboss user, whose ID is 185.
      containers:
        - name: debezium-server
          image: debezium/server:1.7.0.Final
          volumeMounts:
            - name: debezium-config-volume
              mountPath: /debezium/conf
            - name: debezium-data-volume
              mountPath: /debezium/data
        - name: cloud-sql-proxy
          image: gcr.io/cloudsql-docker/gce-proxy:1.27.1
          command: 
            - /cloud_sql_proxy
            - -instances=dataframework:us-east1:dpmysql-public=tcp:3306
          securityContext:
            runAsNonRoot: true
      volumes:
        - name: debezium-config-volume
          configMap:
            name: debezium-mysql
  volumeClaimTemplates:
    - metadata:
        name: debezium-data-volume
      spec:
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 10Mi
</pre>



<p>To deploy the StatefulSet, run the following command. </p>



<pre class="wp-block-code"><code>kubectl apply -f mysql_statefulset.yaml</code></pre>



<h3 class="wp-block-heading">Deploying Service</h3>



<p>The last thing we need to deploy is a service for our pods. </p>



<p><strong>mysql_cdc_service.yaml</strong></p>



<pre class="EnlighterJSRAW" data-enlighter-language="yaml" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">apiVersion: v1
kind: Service
metadata:
  name: debezium-mysql
  labels:
    app: debezium-mysql
spec:
  type: ClusterIP
  ports:
    - port: 8080
      targetPort: 8080
      protocol: TCP
      name: http
  clusterIP: None
  selector:
    app: debezium-mysql</pre>



<p>To deploy the service, run the following command. </p>



<pre class="wp-block-code"><code>kubectl apply -f mysql_cdc_service.yaml</code></pre>
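


<p>With the config map, StatefulSet and service applied, a quick check along these lines can confirm that the pod came up. This is a minimal sketch: the pod name assumes the usual StatefulSet naming (the StatefulSet name followed by an ordinal), and what you see in the logs will depend on your setup. </p>



<pre class="wp-block-code"><code># StatefulSet pods are named after the StatefulSet plus an ordinal; with one replica that is ordinal 0.
POD="debezium-mysql-0"

# Show the pod status; both containers should be reported as ready.
kubectl get pod "${POD}"

# Inspect the Debezium server logs for connection or permission errors.
kubectl logs "${POD}" -c debezium-server

# Inspect the Cloud SQL proxy logs as well.
kubectl logs "${POD}" -c cloud-sql-proxy</code></pre>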



<h2 class="wp-block-heading">Create PubSub Topic</h2>



<p>Now that we have deployed Debezium on Kubernetes, all we need is a Pub/Sub topic to capture the changes. </p>



<p>The topic name should be in the format: <strong>&lt;server_name&gt;.&lt;database_name&gt;.&lt;table_name&gt;</strong></p>



<ul><li>server_name &#8211; the value of the <strong>debezium.source.database.server.name</strong> property in your config map </li><li>database_name &#8211; the MySQL database name </li><li>table_name &#8211; the MySQL table name </li></ul>
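


<p>To make the naming rule concrete, here is a small sketch that composes a topic name and creates the topic with gcloud. <strong>dpmysql</strong> is the server name from the config map above, while <strong>inventory</strong> and <strong>customers</strong> are hypothetical database and table names used only for illustration. Since the name includes the table, each table you capture would presumably need its own topic. </p>



<pre class="wp-block-code"><code># Compose the topic name as server_name.database_name.table_name.
topic_name() {
  printf '%s.%s.%s\n' "$1" "$2" "$3"
}

# "inventory" and "customers" are hypothetical; use your own database and table names.
TOPIC="$(topic_name dpmysql inventory customers)"
echo "${TOPIC}"

# Create the topic in the currently configured gcloud project.
gcloud pubsub topics create "${TOPIC}"</code></pre>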



]]></content:encoded>
					
					<wfw:commentRss>https://bigdataproc.com/gcp-bring-cdc-data-using-debezium/feed/</wfw:commentRss>
			<slash:comments>1</slash:comments>
		
		
			</item>
	</channel>
</rss>
