<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>GCP Archives - Big Data Processing</title>
	<atom:link href="https://bigdataproc.com/category/gcp/feed/" rel="self" type="application/rss+xml" />
	<link>https://bigdataproc.com/category/gcp/</link>
	<description>Big Data Solution for GCP, AWS, Azure and on-prem</description>
	<lastBuildDate>Mon, 08 Jul 2024 14:07:12 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	
	<item>
		<title>GCP Cloud Composer &#8211; Configure Hashicorp Vault to store connection and variables</title>
		<link>https://bigdataproc.com/gcp-cloud-composer-configure-hashicorp-vault-to-store-connection-and-variables/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=gcp-cloud-composer-configure-hashicorp-vault-to-store-connection-and-variables</link>
					<comments>https://bigdataproc.com/gcp-cloud-composer-configure-hashicorp-vault-to-store-connection-and-variables/#respond</comments>
		
		<dc:creator><![CDATA[Gaurang]]></dc:creator>
		<pubDate>Mon, 15 Jul 2024 22:00:00 +0000</pubDate>
				<category><![CDATA[Airflow]]></category>
		<category><![CDATA[GCP]]></category>
		<category><![CDATA[airflow]]></category>
		<category><![CDATA[cloud composer]]></category>
		<category><![CDATA[gcp]]></category>
		<category><![CDATA[vault]]></category>
		<guid isPermaLink="false">https://bigdataproc.com/?p=494</guid>

					<description><![CDATA[<p>Connect Airflow to HashiCorp Vault to store Airflow connections and variables. </p>
<div class="more-link-wrapper"><a class="more-link" href="https://bigdataproc.com/gcp-cloud-composer-configure-hashicorp-vault-to-store-connection-and-variables/">Continue reading<span class="screen-reader-text">GCP Cloud Composer &#8211; Configure Hashicorp Vault to store connection and variables</span></a></div>
<p>The post <a href="https://bigdataproc.com/gcp-cloud-composer-configure-hashicorp-vault-to-store-connection-and-variables/">GCP Cloud Composer &#8211; Configure Hashicorp Vault to store connection and variables</a> appeared first on <a href="https://bigdataproc.com">Big Data Processing </a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>Airflow supports multiple secrets backends for storing connections and variables. For a long time we used Airflow&#8217;s own backend, but I recently migrated all of our connections to HashiCorp Vault and made it our secrets backend. In this post I will walk you through the setup step by step. </p>



<h2 class="wp-block-heading">Configure HashiCorp Vault</h2>



<h3 class="wp-block-heading">Create mount point</h3>



<p>We run multiple Composer (Airflow) environments, so the strategy we used is a single mount point named <code>airflow</code> with a different path for each Airflow instance. You can choose a different layout to match your organization&#8217;s standards and requirements. Run the following command to create the secrets mount point: </p>



<pre class="EnlighterJSRAW" data-enlighter-language="bash" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">vault secrets enable -path=airflow -version=2 kv</pre>



<h3 class="wp-block-heading">Create role </h3>



<p>Vault provides multiple ways to authenticate; we are going to use the AppRole method, so let&#8217;s create a role. After creating it, generate a secret ID for the role (<code>vault write -f auth/approle/role/gcp_composer_role/secret-id</code>) and store it somewhere safe: you will need it when we configure the Vault connection in Airflow. </p>



<pre class="EnlighterJSRAW" data-enlighter-language="bash" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="create approle" data-enlighter-group="">vault write auth/approle/role/gcp_composer_role \
    role_id=gcp_composer_role \
    secret_id_ttl=0 \
    secret_id_num_uses=0 \
    token_num_uses=0 \
    token_ttl=24h \
    token_max_ttl=24h \
    token_policies=gcp_composer_policy</pre>



<h3 class="wp-block-heading">Create Policy </h3>



<p>The role needs to be associated with a policy; a policy is simply a grant (access rights). Run the following to create a policy that gives <code>read</code> and <code>list</code> permission on the <code>airflow</code> path we created earlier: </p>



<pre class="EnlighterJSRAW" data-enlighter-language="bash" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">vault policy write gcp_composer_policy - &lt;&lt;EOF
path "airflow/*" {
  capabilities = ["read", "list"]
}
EOF</pre>



<p>We are now all set on the Vault side. Let&#8217;s change the Airflow configuration to start using Vault. </p>



<h2 class="wp-block-heading">Configure Airflow (GCP Cloud Composer)</h2>



<p>Navigate to your Airflow instance and override the following two settings: </p>



<ul class="wp-block-list">
<li><strong>secrets.backend </strong>= <code>airflow.providers.hashicorp.secrets.vault.VaultBackend</code></li>



<li><strong>secrets.backend_kwargs </strong></li>
</ul>



<pre class="EnlighterJSRAW" data-enlighter-language="json" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">{
"mount_point": "airflow", 
"connections_path": "dev-composer/connections" , 
"variables_path": null, 
"config_path": null, 
"url": "&lt;your_vault_url>", 
"auth_type": "approle", 
"role_id":"gcp_composer_role", 
"secret_id":"&lt;your_secret_id>"
}</pre>



<p><strong>connections_path</strong>: the path under which your Airflow connections are stored. For us it&#8217;s the Composer environment name followed by <code>connections</code>; with a single Airflow instance you could store everything directly under <code>connections</code>.  <br><strong>variables_path</strong>:  set to null because I am storing variables in Airflow. If you want to store variables in Vault as well, just provide a path. <br><strong>config_path</strong>: same as variables, I am keeping config in Airflow. <br><strong>url</strong>: replace with your Vault URL. <br><strong>auth_type</strong>:  we are using AppRole to authenticate with Vault, as discussed above. <br><strong>role_id</strong>:  the role we created above; if you used a different name, replace it here. <br><strong>secret_id</strong>:  the secret ID we generated for the role.</p>
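<p>To make the lookup concrete, here is a minimal Python sketch of how these settings combine into the path a connection is read from (the function name is mine; the real logic lives in <code>airflow.providers.hashicorp.secrets.vault.VaultBackend</code>):</p>

```python
def vault_secret_path(mount_point: str, connections_path: str, conn_id: str) -> str:
    """Compose the KV path the Vault backend reads for a connection.

    Simplified sketch of the lookup, not the provider's actual code.
    """
    return f"{mount_point}/{connections_path}/{conn_id}"

# With the backend_kwargs above, bigquery_default resolves to:
path = vault_secret_path("airflow", "dev-composer/connections", "bigquery_default")
print(path)  # airflow/dev-composer/connections/bigquery_default
```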



<h2 class="wp-block-heading">How to store connections</h2>



<p>To store a connection, create a path named after the connection and put a JSON document with the proper keys and values at that path. For example, the default BigQuery connection would look like this: </p>



<p><strong>mount point:</strong>  airflow<br><strong>Path</strong>: dev-composer/connections/bigquery_default</p>



<pre class="EnlighterJSRAW" data-enlighter-language="json" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">{
  "conn_type": "google_cloud_platform",
  "description": "",
  "extra": "{\"extra__google_cloud_platform__project\": \"your_project\", \"extra__google_cloud_platform__key_path\": \"\", \"extra__google_cloud_platform__key_secret_name\": \"\", \"extra__google_cloud_platform__keyfile_dict\": \"\", \"extra__google_cloud_platform__num_retries\": 5, \"extra__google_cloud_platform__scope\": \"\"}",
  "host": "",
  "login": "",
  "password": null,
  "port": null,
  "schema": ""
}</pre>
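<p>Before writing a secret like the one above to Vault, it can be handy to sanity-check it for typos in the key names. A small sketch (the key list mirrors the fields shown above; it is a convenience list for this check, not an official schema):</p>

```python
import json

# Top-level keys a connection secret can carry (mirrors the sample above).
KNOWN_KEYS = {"conn_type", "description", "extra", "host", "login",
              "password", "port", "schema"}

def check_connection_secret(raw: str) -> list:
    """Return any unrecognized top-level keys before writing the secret to Vault."""
    return sorted(set(json.loads(raw)) - KNOWN_KEYS)

secret = '{"conn_type": "google_cloud_platform", "host": "", "extra": "{}"}'
print(check_connection_secret(secret))               # []
print(check_connection_secret('{"con_type": "x"}'))  # ['con_type'] -- a typo caught
```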



<h2 class="wp-block-heading">How to store variables</h2>



<p>For variables, the key in the JSON must always be <code>value</code>, as shown below: </p>



<p><strong>mount point</strong>: airflow <br><strong>path</strong>: dev-composer/variables/raw_project_name</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">{
  "value": "raw_project_id"
}</pre>
<p>The post <a href="https://bigdataproc.com/gcp-cloud-composer-configure-hashicorp-vault-to-store-connection-and-variables/">GCP Cloud Composer &#8211; Configure Hashicorp Vault to store connection and variables</a> appeared first on <a href="https://bigdataproc.com">Big Data Processing </a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://bigdataproc.com/gcp-cloud-composer-configure-hashicorp-vault-to-store-connection-and-variables/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>GCP Cloud Composer &#8211; Things to consider before implementation</title>
		<link>https://bigdataproc.com/gcp-cloud-composer-things-to-consider-before-implementation/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=gcp-cloud-composer-things-to-consider-before-implementation</link>
					<comments>https://bigdataproc.com/gcp-cloud-composer-things-to-consider-before-implementation/#respond</comments>
		
		<dc:creator><![CDATA[Gaurang]]></dc:creator>
		<pubDate>Fri, 05 Jul 2024 14:36:14 +0000</pubDate>
				<category><![CDATA[Airflow]]></category>
		<category><![CDATA[GCP]]></category>
		<category><![CDATA[airflow]]></category>
		<category><![CDATA[gcp]]></category>
		<guid isPermaLink="false">https://bigdataproc.com/?p=492</guid>

					<description><![CDATA[<p>When planning to use Google Cloud Composer (Airflow in GCP), there are a few essential considerations to address before setup. While these can be configured post-setup, it would&#8230;</p>
<div class="more-link-wrapper"><a class="more-link" href="https://bigdataproc.com/gcp-cloud-composer-things-to-consider-before-implementation/">Continue reading<span class="screen-reader-text">GCP Cloud Composer &#8211; Things to consider before implementation</span></a></div>
<p>The post <a href="https://bigdataproc.com/gcp-cloud-composer-things-to-consider-before-implementation/">GCP Cloud Composer &#8211; Things to consider before implementation</a> appeared first on <a href="https://bigdataproc.com">Big Data Processing </a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>When planning to use <strong>Google Cloud Composer</strong> (Airflow in GCP), there are a few essential considerations to address before setup. While these can be configured post-setup, doing so later is tedious and time-consuming. </p>



<h2 class="wp-block-heading">TimeZone for scheduler </h2>



<p>The default timezone for the scheduler is UTC. This means if you schedule a DAG to run at 5 PM, it will run at 5 PM UTC, not your local time. Converting this for each DAG deployment is impractical, so it&#8217;s advisable to change the default scheduling timezone.<br>To change this:</p>



<ul class="wp-block-list">
<li>Navigate to your Airflow instance</li>



<li>Go to Airflow configuration overrides. </li>



<li>Click Edit and choose the timezone you want for <strong><code>core.default_timezone</code></strong></li>
</ul>
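<p>The UTC offset math can be illustrated with a few lines of standard-library Python (<code>America/New_York</code> is just an example; substitute the timezone you set for <code>core.default_timezone</code>):</p>

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# A run scheduled for 5 PM under the default UTC scheduler timezone...
run_utc = datetime(2024, 7, 5, 17, 0, tzinfo=ZoneInfo("UTC"))

# ...actually fires at 1 PM local time in New York (UTC-4 in July).
local = run_utc.astimezone(ZoneInfo("America/New_York"))
print(local.strftime("%H:%M %Z"))  # 13:00 EDT
```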



<h2 class="wp-block-heading">Where to store airflow connection and variables </h2>



<p>By default,&nbsp;<strong>Airflow</strong>&nbsp;(<strong>Google Cloud Composer</strong>) stores connections and variables within Airflow itself. However, it supports multiple external backends, including GCP, AWS, and HashiCorp Vault. Airflow neither versions these connections and variables nor provides granular access control over them, making it prudent to store them externally. Organizational standards also often require storing all secrets and certificates in a single system.</p>



<p>In our setup, we chose to store connections in HashiCorp Vault due to their sensitive nature, while non-sensitive variables remained in Airflow.</p>



<p>One key point to note: Airflow treats an external secrets backend as an additional backend. If it cannot find a variable or connection in the external backend (e.g., Vault), it falls back to searching within Airflow itself.</p>
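<p>A toy sketch of that fallback order (illustrative only; the function and store names are mine, not Airflow internals):</p>

```python
def resolve_variable(key, vault_store, airflow_store):
    """Consult the external backend (Vault) first, then fall back to
    Airflow's own metadata database -- a sketch of the lookup order."""
    if key in vault_store:
        return vault_store[key]
    return airflow_store.get(key)  # fallback to Airflow itself

vault = {"raw_project_name": "raw_project_id"}
airflow_db = {"raw_project_name": "stale_value", "batch_size": "500"}

print(resolve_variable("raw_project_name", vault, airflow_db))  # from Vault
print(resolve_variable("batch_size", vault, airflow_db))        # falls back to Airflow
```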



<h2 class="wp-block-heading">Default Role assignment for All Airflow Users</h2>



<p><strong>Airflow</strong>&nbsp;has built-in RBAC with five main roles: Public, Viewer, User, Op, and Admin. The default role assigned to all users in GCP is &#8216;Op&#8217;.</p>



<p>If this role doesn&#8217;t fit your organizational needs, create a custom role and change the default role assignment.</p>



<p>In our scenario, the &#8216;op&#8217; role includes permissions to create and maintain connections and variables. Since we maintain all connections in HashiCorp Vault, we didn&#8217;t want duplicates created within Airflow. Therefore, we created a custom role without these permissions and set it as the default role for all users. To change the default role, override <code><strong>webserver.rbac_user_registration_role</strong></code> to the custom role.</p>



<p>By addressing these configurations early on, you can streamline your use of&nbsp;<strong>Google Cloud Composer</strong>&nbsp;and&nbsp;<strong>Airflow</strong>&nbsp;in GCP, ensuring efficient and secure operations.</p>
<p>The post <a href="https://bigdataproc.com/gcp-cloud-composer-things-to-consider-before-implementation/">GCP Cloud Composer &#8211; Things to consider before implementation</a> appeared first on <a href="https://bigdataproc.com">Big Data Processing </a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://bigdataproc.com/gcp-cloud-composer-things-to-consider-before-implementation/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>GCP Security &#8211; Finding Zero Trust Policy issues using IAM Policy Recommender</title>
		<link>https://bigdataproc.com/gcp-security-finding-zero-trust-policy-issues-using-iam-policy-recommander/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=gcp-security-finding-zero-trust-policy-issues-using-iam-policy-recommander</link>
					<comments>https://bigdataproc.com/gcp-security-finding-zero-trust-policy-issues-using-iam-policy-recommander/#respond</comments>
		
		<dc:creator><![CDATA[Gaurang]]></dc:creator>
		<pubDate>Fri, 05 Apr 2024 12:56:02 +0000</pubDate>
				<category><![CDATA[GCP]]></category>
		<category><![CDATA[gcp]]></category>
		<category><![CDATA[google cloud platform]]></category>
		<category><![CDATA[security]]></category>
		<guid isPermaLink="false">https://bigdataproc.com/?p=486</guid>

					<description><![CDATA[<p>In GCP (Google Cloud Platform), fix your zero trust policy issues with GCP's IAM Recommender. The IAM Recommender helps you identify users or service accounts with permissions they no longer use. </p>
<div class="more-link-wrapper"><a class="more-link" href="https://bigdataproc.com/gcp-security-finding-zero-trust-policy-issues-using-iam-policy-recommander/">Continue reading<span class="screen-reader-text">GCP Security &#8211; Finding Zero Trust Policy issues using IAM policy Recommander</span></a></div>
<p>The post <a href="https://bigdataproc.com/gcp-security-finding-zero-trust-policy-issues-using-iam-policy-recommander/">GCP Security &#8211; Finding Zero Trust Policy issues using IAM policy Recommander</a> appeared first on <a href="https://bigdataproc.com">Big Data Processing </a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>In our previous blog posts, we explored leveraging Google Recommender for cost optimization. Now, let&#8217;s dive into identifying security issues within your Google Cloud Platform (GCP) environment using Google Recommender. If you missed the previous <a href="https://bigdataproc.com/unlocking-gcp-cost-optimization-using-recommendation-and-bigquery-a-finsops-guide/">post</a> on redirecting recommendations to BigQuery, I highly recommend giving it a read, as it lays out the groundwork for our current discussion.</p>



<p>Adhering to the principles of a zero trust policy, it&#8217;s crucial to ensure that individuals or service accounts only possess the permissions they truly require. Google Recommender plays a pivotal role in this aspect. By examining policy insights, if it&#8217;s flagged that a principal holds unnecessary permissions within their role, the IAM Recommender steps in to evaluate whether these permissions can be revoked or if there&#8217;s a more suitable role available. If revocation is possible, the IAM Recommender generates a recommendation to revoke the role. Alternatively, if there&#8217;s a better-suited role, it suggests replacing the existing one with the suggested role. This replacement could entail a new custom role, an existing custom role, or predefined roles.</p>



<p>If you&#8217;ve already redirected all recommendations to BigQuery, you can run the following query to gain insights into any surplus permissions held by individuals or service accounts. Furthermore, it will provide recommendations regarding roles that may need to be removed or replaced with more stringent alternatives.</p>



<h2 class="wp-block-heading">SQL to find GCP IAM recommendations </h2>



<pre class="EnlighterJSRAW" data-enlighter-language="sql" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">SELECT
  cloud_entity_type,
  cloud_entity_id,
  recommendation_details,
  recommender_subtype,
  JSON_VALUE(recommendation_details, "$.overview.member") AS user,
  JSON_VALUE(recommendation_details, "$.overview.removedRole") AS existing_role,
  JSON_QUERY_ARRAY(recommendation_details, "$.overview.addedRoles") AS new_role,
  priority,
  JSON_VALUE(recommendation_details, "$.overview.minimumObservationPeriodInDays") AS minimumObservationPeriodInDays
FROM
  `your_project.recommendations.recommendations_export`
WHERE
  recommender = "google.iam.policy.Recommender"
  AND state = "ACTIVE"
  AND TIMESTAMP_TRUNC(_PARTITIONTIME, DAY) = (
    SELECT
      TIMESTAMP_TRUNC(MAX(_PARTITIONTIME), DAY)
    FROM
      your_project.recommendations.recommendations_export
  )</pre>



<ul class="wp-block-list">
<li><strong>minimumObservationPeriodInDays</strong>: Additionally, it&#8217;s worth noting that the IAM Recommender only begins generating role recommendations once it has gathered a certain amount of permission usage data. By default, the minimum observation period is set to 90 days. However, for project-level role recommendations, you have the flexibility to manually adjust it to 30 or 60 days. If you wish to modify this setting, you can do so by visiting the following link: <a href="https://cloud.google.com/policy-intelligence/docs/configure-role-recommendations">Configure Role Recommendations</a>.</li>



<li><strong>cloud_entity_type</strong>:  shows whether the issue is at the org, folder, or project level. </li>



<li><strong>cloud_entity_id</strong>:  the ID of the org, folder, or project. You can use this ID in the GCP console to search for the particular entity.</li>



<li><strong>recommender_subtype</strong>:  shows whether to remove the role, replace it with another role, or whether a service account is using a default role. </li>



<li><strong>user</strong>: the principal (user or service account) for which the recommendation was generated. </li>



<li><strong>existing_role</strong>:  the existing role. </li>



<li><strong>new_role</strong>: the role you should replace the existing role with; when recommender_subtype is remove_role, this is empty. </li>



<li><strong>priority</strong>:  the priority of the particular recommendation.</li>
</ul>
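<p>If you prefer to post-process rows outside SQL, the same fields can be pulled from <code>recommendation_details</code> in Python (the sample payload below is illustrative, shaped like the fields the query extracts):</p>

```python
import json

def summarize_iam_recommendation(details_json: str) -> dict:
    """Extract the same fields the SQL above pulls with JSON_VALUE /
    JSON_QUERY_ARRAY from a recommendation_details document."""
    overview = json.loads(details_json).get("overview", {})
    return {
        "user": overview.get("member"),
        "existing_role": overview.get("removedRole"),
        "new_role": overview.get("addedRoles", []),
    }

# Illustrative payload, not a real export row:
sample = json.dumps({"overview": {
    "member": "serviceAccount:etl@your_project.iam.gserviceaccount.com",
    "removedRole": "roles/editor",
    "addedRoles": ["roles/bigquery.jobUser"],
}})
print(summarize_iam_recommendation(sample))
```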
<p>The post <a href="https://bigdataproc.com/gcp-security-finding-zero-trust-policy-issues-using-iam-policy-recommander/">GCP Security &#8211; Finding Zero Trust Policy issues using IAM policy Recommander</a> appeared first on <a href="https://bigdataproc.com">Big Data Processing </a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://bigdataproc.com/gcp-security-finding-zero-trust-policy-issues-using-iam-policy-recommander/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Optimize Your GCP Cloud Costs: Identifying Compute Engine Resources for Scale Down</title>
		<link>https://bigdataproc.com/optimize-your-gcp-cloud-costs-identifying-compute-engine-resources-for-scale-down/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=optimize-your-gcp-cloud-costs-identifying-compute-engine-resources-for-scale-down</link>
					<comments>https://bigdataproc.com/optimize-your-gcp-cloud-costs-identifying-compute-engine-resources-for-scale-down/#respond</comments>
		
		<dc:creator><![CDATA[Gaurang]]></dc:creator>
		<pubDate>Thu, 21 Mar 2024 14:09:26 +0000</pubDate>
				<category><![CDATA[FinOps]]></category>
		<category><![CDATA[GCP]]></category>
		<category><![CDATA[finsops]]></category>
		<category><![CDATA[gcp]]></category>
		<guid isPermaLink="false">https://bigdataproc.com/?p=483</guid>

					<description><![CDATA[<p>In our journey towards optimal cloud cost management, harnessing the power of data becomes paramount. In our latest exploration, we introduce a potent tool: a BigQuery query designed to uncover Compute Engine resources ripe for scale down, paving the way for significant cost savings.</p>
<p>Building upon our previous discussion on centralizing recommendations within BigQuery, this query serves as a cornerstone in our strategy. By focusing on Compute Engine resources and leveraging the insights provided, organizations can make informed decisions to optimize their cloud costs effectively.</p>
<p>Stay tuned as we delve deeper into this powerful tool, unraveling its potential to revolutionize your cloud cost optimization efforts. Every adjustment counts in the pursuit of efficiency and savings, and with our BigQuery-powered approach, the possibilities are limitless.</p>
<div class="more-link-wrapper"><a class="more-link" href="https://bigdataproc.com/optimize-your-gcp-cloud-costs-identifying-compute-engine-resources-for-scale-down/">Continue reading<span class="screen-reader-text">Optimize Your GCP Cloud Costs: Identifying Compute Engine Resources for Scale Down</span></a></div>
<p>The post <a href="https://bigdataproc.com/optimize-your-gcp-cloud-costs-identifying-compute-engine-resources-for-scale-down/">Optimize Your GCP Cloud Costs: Identifying Compute Engine Resources for Scale Down</a> appeared first on <a href="https://bigdataproc.com">Big Data Processing </a>.</p>
]]></description>
										<content:encoded><![CDATA[



<p>In our ongoing exploration of cloud cost optimization, we&#8217;re constantly seeking ways to maximize efficiency and minimize expenditure. In our <a href="https://bigdataproc.com/unlocking-gcp-cost-optimization-using-recommendation-and-bigquery-a-finsops-guide/">previous blog post</a>, we discussed the importance of centralizing recommendations within BigQuery to streamline cost analysis. If you haven&#8217;t had the chance to read that article, I highly recommend doing so, as it lays the groundwork for the strategy we&#8217;re about to delve into.</p>



<p>Now, let&#8217;s dive into a powerful BigQuery query designed to uncover Compute Engine resources prime for scale down, further enhancing your cost optimization efforts.</p>



<h2 class="wp-block-heading">Unveiling the Query</h2>



<pre class="EnlighterJSRAW" data-enlighter-language="sql" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">SELECT
  r,
  SPLIT(r, "/")[4] AS project_name,
  ARRAY_REVERSE(SPLIT(r, "/"))[0] AS resource_name,
  recommender_subtype AS action,
  description,
  primary_impact.cost_projection.cost_in_local_currency.units AS cost_savings_per_month
FROM
  &lt;your_project>.recommendations.recommendations_export,
  UNNEST(target_resources) AS r
WHERE
  recommender_subtype = "CHANGE_MACHINE_TYPE"
  AND TIMESTAMP_TRUNC(_PARTITIONTIME, DAY) = (
  SELECT
    TIMESTAMP_TRUNC(MAX(_PARTITIONTIME), DAY)
  FROM
    &lt;your_project>.recommendations.recommendations_export)
</pre>



<h2 class="wp-block-heading">Breaking Down the Query</h2>



<ol class="wp-block-list">
<li><strong>SELECT</strong>: The query selects essential fields like the resource and project names, recommendation action, description, and projected cost savings per month.</li>



<li><strong>FROM</strong>: It sources data from <code>recommendations.recommendations_export</code>, extracting target resources using <code>UNNEST</code>.</li>



<li><strong>WHERE</strong>: Filters recommendations to focus solely on those advocating for changing machine types. Additionally, it ensures we&#8217;re working with the latest data partition.</li>
</ol>
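<p>The two SPLIT expressions are easy to mirror in Python if you want to sanity-check them against a sample resource name (the resource name below is illustrative):</p>

```python
def parse_target_resource(r: str):
    """Mirror the SQL's SPLIT(r, "/")[4] and ARRAY_REVERSE(SPLIT(r, "/"))[0]
    on a full resource name from target_resources."""
    parts = r.split("/")
    return parts[4], parts[-1]  # (project_name, resource_name)

# Illustrative resource name in the shape the export uses:
r = "//compute.googleapis.com/projects/your_project/zones/us-central1-a/instances/etl-worker-1"
project, resource = parse_target_resource(r)
print(project, resource)  # your_project etl-worker-1
```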



<h2 class="wp-block-heading">What It Means for You</h2>



<p>By running this query, you gain insights into Compute Engine resources where adjusting machine types could lead to substantial cost savings. Each recommendation is accompanied by a description and projected monthly savings, empowering you to make informed decisions about scaling down resources without sacrificing performance.</p>



<h2 class="wp-block-heading">Conclusion</h2>



<p>Cost optimization in the cloud isn&#8217;t just about cutting corners; it&#8217;s about strategic resource allocation. With the provided BigQuery query, identifying Compute Engine resources ripe for scale down becomes a streamlined process. Embrace data-driven decision-making to optimize your cloud costs effectively.</p>



<p>Stay tuned for more insights and tips on maximizing the value of your cloud investments!</p>



<p>Remember, when it comes to cloud cost optimization, every adjustment counts. Start uncovering opportunities for savings today with our BigQuery-powered approach. Your bottom line will thank you.</p>



<p>Would you like to dive deeper into any specific aspect or have further queries? Feel free to reach out!</p>
<p>The post <a href="https://bigdataproc.com/optimize-your-gcp-cloud-costs-identifying-compute-engine-resources-for-scale-down/">Optimize Your GCP Cloud Costs: Identifying Compute Engine Resources for Scale Down</a> appeared first on <a href="https://bigdataproc.com">Big Data Processing </a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://bigdataproc.com/optimize-your-gcp-cloud-costs-identifying-compute-engine-resources-for-scale-down/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Unlocking GCP Cost Optimization Using Recommendation and BigQuery: A FinsOps Guide</title>
		<link>https://bigdataproc.com/unlocking-gcp-cost-optimization-using-recommendation-and-bigquery-a-finsops-guide/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=unlocking-gcp-cost-optimization-using-recommendation-and-bigquery-a-finsops-guide</link>
					<comments>https://bigdataproc.com/unlocking-gcp-cost-optimization-using-recommendation-and-bigquery-a-finsops-guide/#respond</comments>
		
		<dc:creator><![CDATA[Gaurang]]></dc:creator>
		<pubDate>Tue, 05 Mar 2024 14:56:41 +0000</pubDate>
				<category><![CDATA[FinOps]]></category>
		<category><![CDATA[GCP]]></category>
		<category><![CDATA[finops]]></category>
		<category><![CDATA[gcp]]></category>
		<guid isPermaLink="false">https://bigdataproc.com/?p=466</guid>

					<description><![CDATA[<p>Identify idle resources in your GCP cloud using Google recommendations and BigQuery to optimize your costs. </p>
<div class="more-link-wrapper"><a class="more-link" href="https://bigdataproc.com/unlocking-gcp-cost-optimization-using-recommendation-and-bigquery-a-finsops-guide/">Continue reading<span class="screen-reader-text">Unlocking GCP Cost Optimization Using Recommendation and BigQuery: A FinsOps Guide</span></a></div>
<p>The post <a href="https://bigdataproc.com/unlocking-gcp-cost-optimization-using-recommendation-and-bigquery-a-finsops-guide/">Unlocking GCP Cost Optimization Using Recommendation and BigQuery: A FinsOps Guide</a> appeared first on <a href="https://bigdataproc.com">Big Data Processing </a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>Recently, I was working on cost optimization on the Google Cloud Platform (GCP). Google offers recommendations for cost savings and security at the project level. However, managing these recommendations across numerous projects can be arduous, particularly in scenarios like mine where we oversee approximately 100 projects, with limited access to many. Fortunately, redirecting all recommendations to BigQuery and leveraging SQL&#8217;s analytical capabilities proved to be a game-changer. Additionally, configuring Looker Studio facilitated streamlined visualization.</p>



<p>In this blog post, I&#8217;ll illustrate the process of redirecting GCP recommendations to Google BigQuery and uncovering cost-saving recommendations specifically for idle resources. </p>



<h2 class="wp-block-heading">Redirecting GCP Recommendations to BigQuery </h2>



<ul class="wp-block-list">
<li>You will need a service account created at the org level with the following roles:
<ul class="wp-block-list">
<li>roles/bigquery.dataEditor</li>



<li>roles/recommender.exporter</li>
</ul>
</li>



<li>Choose a project to send the GCP recommendations to. In our case we created a separate project for billing and recommendations; a separate project makes access control easier. </li>



<li>Now navigate to Google BigQuery, open Data transfers from the left-side menu, and click CREATE TRANSFER.  </li>



<li>Choose Recommender v1 from the options, as shown in the screenshot below, and fill out the other information</li>



<li>Once the data transfer has executed successfully, you will see the following two tables in the dataset:
<ul class="wp-block-list">
<li>insights_export</li>



<li>recommendations_export</li>
</ul>
</li>
</ul>


<div class="wp-block-image">
<figure class="aligncenter size-large"><img fetchpriority="high" decoding="async" width="1024" height="895" src="https://bigdataproc.com/wp-content/uploads/2024/03/image-1024x895.png" alt="" class="wp-image-467" srcset="https://bigdataproc.com/wp-content/uploads/2024/03/image-1024x895.png 1024w, https://bigdataproc.com/wp-content/uploads/2024/03/image-300x262.png 300w, https://bigdataproc.com/wp-content/uploads/2024/03/image-768x671.png 768w, https://bigdataproc.com/wp-content/uploads/2024/03/image.png 1256w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure></div>


<h2 class="wp-block-heading">Analyse Data For GCP for Cost Saving </h2>



<p>The following query lists all the idle resources that could be deleted or shut down to save cost. It shows the project name, resource name, action to be taken, description, and how much cost you will be saving in your local currency. </p>



<pre class="EnlighterJSRAW" data-enlighter-language="sql" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">SELECT
  r,
  SPLIT(r, "/")[4] AS project_name,
  ARRAY_REVERSE(SPLIT(r, "/"))[0] AS resource_name,
  recommender_subtype AS action,
  description,
  primary_impact.cost_projection.cost_in_local_currency.units AS cost_savings_per_month
FROM
  your_project.your_dataset.recommendations_export,
  UNNEST(target_resources) AS r
WHERE
  TIMESTAMP_TRUNC(_PARTITIONTIME, DAY) = (SELECT TIMESTAMP_TRUNC(MAX(_PARTITIONTIME), DAY) FROM your_project.your_dataset.recommendations_export)
  AND primary_impact.category = "COST"
  AND state = "ACTIVE"
  AND recommender LIKE "%IdleResourceRecommender"</pre>



<h2 class="wp-block-heading">Visualizing Data </h2>



<p>To track whether we are actually implementing these suggestions, I also created a dashboard using Looker Studio. </p>



<h3 class="wp-block-heading">Query for the Looker Studio dashboard</h3>



<p>The following query returns the cost optimization recommendations from the last 30 days.</p>



<pre class="EnlighterJSRAW" data-enlighter-language="sql" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">SELECT
  r,
  SPLIT(r, "/")[4] AS project_name,
  ARRAY_REVERSE(SPLIT(r, "/"))[0] AS resource_name,
  recommender_subtype AS action,
  description,
  primary_impact.cost_projection.cost_in_local_currency.units AS cost_savings_per_month,
  state,
  date(last_refresh_time) AS date
FROM
  your_project.your_dataset.recommendations_export,
  UNNEST(target_resources) AS r
WHERE
  TIMESTAMP_TRUNC(_PARTITIONTIME, DAY) > TIMESTAMP(current_date() - 30)
  AND primary_impact.category = "COST"
  AND state = "ACTIVE"
  AND recommender LIKE "%IdleResourceRecommender"</pre>



<h3 class="wp-block-heading">Dashboard </h3>



<p>This dashboard provides valuable insights indicating that our efforts towards cost optimization are bearing fruit. The noticeable decrease in overall recommendations signifies successful implementation of our strategies, affirming that we are indeed on the right track.</p>


<div class="wp-block-image">
<figure class="aligncenter size-large"><img decoding="async" width="1024" height="773" src="https://bigdataproc.com/wp-content/uploads/2024/03/Screenshot-2024-03-05-at-9.42.25 AM-1024x773.png" alt="" class="wp-image-474" srcset="https://bigdataproc.com/wp-content/uploads/2024/03/Screenshot-2024-03-05-at-9.42.25 AM-1024x773.png 1024w, https://bigdataproc.com/wp-content/uploads/2024/03/Screenshot-2024-03-05-at-9.42.25 AM-300x227.png 300w, https://bigdataproc.com/wp-content/uploads/2024/03/Screenshot-2024-03-05-at-9.42.25 AM-768x580.png 768w, https://bigdataproc.com/wp-content/uploads/2024/03/Screenshot-2024-03-05-at-9.42.25 AM-1536x1160.png 1536w, https://bigdataproc.com/wp-content/uploads/2024/03/Screenshot-2024-03-05-at-9.42.25 AM-2048x1547.png 2048w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure></div><p>The post <a href="https://bigdataproc.com/unlocking-gcp-cost-optimization-using-recommendation-and-bigquery-a-finsops-guide/">Unlocking GCP Cost Optimization Using Recommendation and BigQuery: A FinsOps Guide</a> appeared first on <a href="https://bigdataproc.com">Big Data Processing </a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://bigdataproc.com/unlocking-gcp-cost-optimization-using-recommendation-and-bigquery-a-finsops-guide/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Distcp to Copy your HDFS data to GCP Cloud Storage</title>
		<link>https://bigdataproc.com/distcp-to-copy-your-hdfs-data-to-gcp-cloud-storage/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=distcp-to-copy-your-hdfs-data-to-gcp-cloud-storage</link>
					<comments>https://bigdataproc.com/distcp-to-copy-your-hdfs-data-to-gcp-cloud-storage/#respond</comments>
		
		<dc:creator><![CDATA[Gaurang]]></dc:creator>
		<pubDate>Thu, 22 Feb 2024 17:53:13 +0000</pubDate>
				<category><![CDATA[GCP]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[hive]]></category>
		<guid isPermaLink="false">https://bigdataproc.com/?p=461</guid>

					<description><![CDATA[<p>Copy HDFS data from on-prem to cloud storage using distcp</p>
<div class="more-link-wrapper"><a class="more-link" href="https://bigdataproc.com/distcp-to-copy-your-hdfs-data-to-gcp-cloud-storage/">Continue reading<span class="screen-reader-text">Distcp to Copy your HDFS data to GCP Cloud Storage</span></a></div>
<p>The post <a href="https://bigdataproc.com/distcp-to-copy-your-hdfs-data-to-gcp-cloud-storage/">Distcp to Copy your HDFS data to GCP Cloud Storage</a> appeared first on <a href="https://bigdataproc.com">Big Data Processing </a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>A while back, I found myself deeply immersed in a Hadoop migration project where our cloud platform of choice was Google Cloud Platform (GCP). Our mission? To seamlessly transition data from on-premises infrastructure to the cloud. Due to various constraints, utilizing hardware wasn&#8217;t a viable option. Thus, I embarked on a quest to explore multiple software solutions to tackle this challenge.</p>



<p>For one-off migrations, Spark emerged as a favorable choice. It facilitated direct data migration to BigQuery, bypassing the intermediary step of storing it in cloud storage. However, there was a caveat: Spark lacked the ability to detect changes, necessitating a full refresh each time. This approach proved less than ideal, especially when dealing with substantial datasets.</p>



<p>My gaze then turned to Cloudera BDR, but alas, it didn&#8217;t support integration with Google Cloud. Left with no alternative, I delved into Distcp. In this blog post, I&#8217;ll guide you through the setup process for Distcp, enabling seamless data transfer from an on-prem HDFS cluster to Google Cloud Storage.</p>



<h2 class="wp-block-heading">Service Account Setup</h2>



<p>To begin, create a GCP service account with read/write permissions on the designated Google Cloud Storage bucket, and obtain the JSON key associated with it. This key needs to be distributed to all nodes involved in the migration. For instance, I store it at <code>/tmp/sa-datamigonpremtobigquery.json</code>. Also make sure the user that will run the distcp command has read access to this path. </p>
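<p>A missing or malformed key on even one node will make the job fail midway, so it can help to validate the key file before launching distcp. The following is an illustrative sketch, assuming the standard fields of a service account JSON key (<code>type</code>, <code>project_id</code>, <code>client_email</code>, <code>private_key</code>); the demo key below is fake.</p>

```python
import json
import os
import tempfile

REQUIRED_FIELDS = ("type", "project_id", "client_email", "private_key")

def check_sa_keyfile(path):
    """Fail fast if the service-account JSON key is missing or malformed."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"key file not found: {path}")
    with open(path) as f:
        key = json.load(f)
    missing = [field for field in REQUIRED_FIELDS if field not in key]
    if missing:
        raise ValueError(f"key file is missing fields: {missing}")
    if key["type"] != "service_account":
        raise ValueError(f"unexpected key type: {key['type']}")
    return key["client_email"]

# Demo with a fake key (illustrative values only, not a real credential).
fake_key = {
    "type": "service_account",
    "project_id": "my-project",
    "client_email": "distcp-sa@my-project.iam.gserviceaccount.com",
    "private_key": "fake-key-material",
}
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(fake_key, f)
email = check_sa_keyfile(f.name)
os.remove(f.name)
print(email)
```

Running this on each node (against the real key path) before kicking off the transfer surfaces configuration mistakes early.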



<h2 class="wp-block-heading">HDFS.conf</h2>



<p>Store the following file as <code>hdfs.conf</code> on the edge node in your home directory, and replace the value of <strong>fs.gs.project.id</strong> with your project ID.</p>



<pre class="EnlighterJSRAW" data-enlighter-language="xml" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">&lt;configuration>
  &lt;property>
    &lt;name>fs.AbstractFileSystem.gs.impl&lt;/name>
    &lt;value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS&lt;/value>
    &lt;description>The AbstractFileSystem for 'gs:' URIs.&lt;/description>
  &lt;/property>
  &lt;property>
    &lt;name>fs.gs.project.id&lt;/name>
    &lt;value>your-project-id&lt;/value>
    &lt;description>
      Optional. Google Cloud Project ID with access to GCS buckets.
      Required only for list buckets and create bucket operations.
    &lt;/description>
  &lt;/property>
  &lt;property>
    &lt;name>google.cloud.auth.type&lt;/name>
    &lt;value>SERVICE_ACCOUNT_JSON_KEYFILE&lt;/value>
    &lt;description>
      Authentication type to use for GCS access.
    &lt;/description>
  &lt;/property>
  &lt;property>
    &lt;name>google.cloud.auth.service.account.json.keyfile&lt;/name>
    &lt;value>/tmp/sa-datamigonpremtobigquery.json&lt;/value>
    &lt;description>
      The JSON keyfile of the service account used for GCS
      access when google.cloud.auth.type is SERVICE_ACCOUNT_JSON_KEYFILE.
    &lt;/description>
  &lt;/property>

  &lt;property>
    &lt;name>fs.gs.checksum.type&lt;/name>
    &lt;value>CRC32C&lt;/value>
    &lt;description>
          https://cloud.google.com/architecture/hadoop/validating-data-transfers
  &lt;/description>
  &lt;/property>

  &lt;property>
    &lt;name>dfs.checksum.combine.mode&lt;/name>
    &lt;value>COMPOSITE_CRC&lt;/value>
    &lt;description>
          https://cloud.google.com/architecture/hadoop/validating-data-transfers
  &lt;/description>
  &lt;/property>
&lt;/configuration>
</pre>



<h2 class="wp-block-heading">Executing Transfer</h2>



<pre class="EnlighterJSRAW" data-enlighter-language="raw" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">hadoop --debug distcp --conf hdfs.conf -pc -update -v -log hdfs:///tmp/distcp_log hdfs:///tmp/ gs://raw-bucket/ </pre>
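<p>If you run the transfer repeatedly (for example from a scheduler), it can be convenient to build the command programmatically so the flags stay consistent across runs. A small sketch reusing the exact options from the command above; the paths and bucket name are placeholders.</p>

```python
def distcp_args(src, dest, conf="hdfs.conf", log_dir="hdfs:///tmp/distcp_log"):
    """Assemble the distcp invocation shown above as an argv list."""
    return [
        "hadoop", "distcp",
        "--conf", conf,   # pick up the GCS connector settings from hdfs.conf
        "-pc",            # preserve checksum type (needed for CRC32C validation)
        "-update",        # copy only files that are new or changed
        "-v",             # verbose logging
        "-log", log_dir,  # where distcp writes its job log
        src, dest,
    ]

cmd = distcp_args("hdfs:///tmp/", "gs://raw-bucket/")
print(" ".join(cmd))
```

Passing the list to <code>subprocess.run(cmd, check=True)</code> avoids shell-quoting issues with paths.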
<p>The post <a href="https://bigdataproc.com/distcp-to-copy-your-hdfs-data-to-gcp-cloud-storage/">Distcp to Copy your HDFS data to GCP Cloud Storage</a> appeared first on <a href="https://bigdataproc.com">Big Data Processing </a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://bigdataproc.com/distcp-to-copy-your-hdfs-data-to-gcp-cloud-storage/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Data Mesh implementation with GCP BigQuery</title>
		<link>https://bigdataproc.com/data-mesh-implementation-with-gcp-bigquery/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=data-mesh-implementation-with-gcp-bigquery</link>
					<comments>https://bigdataproc.com/data-mesh-implementation-with-gcp-bigquery/#respond</comments>
		
		<dc:creator><![CDATA[Gaurang]]></dc:creator>
		<pubDate>Thu, 08 Feb 2024 14:27:35 +0000</pubDate>
				<category><![CDATA[bigquery]]></category>
		<category><![CDATA[GCP]]></category>
		<category><![CDATA[gcp]]></category>
		<guid isPermaLink="false">https://bigdataproc.com/?p=443</guid>

					<description><![CDATA[<p>Implement Data Mesh using GCP Bigquery Analytics Hub.</p>
<div class="more-link-wrapper"><a class="more-link" href="https://bigdataproc.com/data-mesh-implementation-with-gcp-bigquery/">Continue reading<span class="screen-reader-text">Data Mesh implementation with GCP BigQuery</span></a></div>
<p>The post <a href="https://bigdataproc.com/data-mesh-implementation-with-gcp-bigquery/">Data Mesh implementation with GCP BigQuery</a> appeared first on <a href="https://bigdataproc.com">Big Data Processing </a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>In today&#8217;s data-driven landscape, organizations often have multiple teams comprising domain experts who create their own unique products. However, enabling other teams to leverage these products efficiently requires a structured approach. Enter data mesh—a paradigm that decentralizes data ownership and processing. In this guide, we&#8217;ll explore how to implement data mesh using Google Cloud Platform (GCP) BigQuery, empowering teams to manage their data products seamlessly.</p>



<h2 class="wp-block-heading">Setting up Domain-Specific GCP Projects </h2>



<p>Begin by assigning a dedicated GCP project to each domain or business team. This ensures that teams have the autonomy to develop and manage their data products within their respective environments. By segregating projects based on domains, teams can focus on their specific requirements without interfering with others&#8217; workflows.</p>



<h2 class="wp-block-heading">Development and Promotion Workflow</h2>



<p>Within their assigned GCP projects, domain teams develop their data products tailored to their expertise. These products undergo rigorous testing and refinement in the development environment. However, it&#8217;s crucial to avoid publishing directly from the development environment to prevent potential disruptions for subscribers. Frequent changes in the development phase can lead to compatibility issues and operational challenges for downstream users.</p>



<h2 class="wp-block-heading">Promotion to Higher Environments</h2>



<p>Once a data product is deemed ready for consumption, it&#8217;s promoted to higher environments, typically housed in different GCP projects. This transition ensures that only validated and stable versions of products are made available to subscribers. By segregating development and production environments, organizations can maintain data integrity and stability while minimizing disruptions to subscriber workflows.</p>



<h2 class="wp-block-heading">Publishing Data Products</h2>



<p>When promoting a data product to a higher environment, a team lead assumes the responsibility of publishing it. This involves orchestrating a seamless transition and ensuring that subscribers can access the updated version without interruptions.</p>



<p>Make sure the <strong>Analytics Hub API</strong> is enabled before proceeding.  </p>



<p>Follow these steps to publish your product:</p>



<ol class="wp-block-list">
<li>Navigate to GCP BigQuery and access Analytics Hub. Click on &#8220;Create Exchange.&#8221;</li>



<li>Depending on the nature of your product, provide the necessary details and proceed by clicking &#8220;Create Exchange.&#8221;<br><img decoding="async" width="500" height="517" class="wp-image-444" style="width: 500px;" src="https://bigdataproc.com/wp-content/uploads/2024/02/Screenshot-2024-02-07-at-1.26.52-PM.png" alt="create exchange to hold data products in GCP bigquery Analytics Hub" srcset="https://bigdataproc.com/wp-content/uploads/2024/02/Screenshot-2024-02-07-at-1.26.52-PM.png 567w, https://bigdataproc.com/wp-content/uploads/2024/02/Screenshot-2024-02-07-at-1.26.52-PM-290x300.png 290w" sizes="(max-width: 500px) 100vw, 500px" /></li>



<li class="has-regular-font-size">At this stage, you&#8217;ll have the option to configure permissions for administration, publishing, subscription, and viewing of the listing. You can either set these permissions now or configure them later.<br><img loading="lazy" decoding="async" width="500" height="410" class="wp-image-445" style="width: 500px;" src="https://bigdataproc.com/wp-content/uploads/2024/02/Screenshot-2024-02-07-at-1.30.45-PM.png" alt="GCP Bigquery Analytics Hub Exchange Permission setting" srcset="https://bigdataproc.com/wp-content/uploads/2024/02/Screenshot-2024-02-07-at-1.30.45-PM.png 558w, https://bigdataproc.com/wp-content/uploads/2024/02/Screenshot-2024-02-07-at-1.30.45-PM-300x246.png 300w" sizes="auto, (max-width: 500px) 100vw, 500px" /></li>



<li class="has-regular-font-size">Once the exchange is created, proceed to create a listing for it. A listing is the dataset you wish to publish; currently, GCP BigQuery only supports publishing datasets, and which datasets you can select depends on the location you chose for the exchange. <br><img loading="lazy" decoding="async" width="500" height="169" class="wp-image-446" style="width: 500px;" src="https://bigdataproc.com/wp-content/uploads/2024/02/Screenshot-2024-02-07-at-1.35.22-PM.png" alt="GCP bigquery Analytics hub listing. " srcset="https://bigdataproc.com/wp-content/uploads/2024/02/Screenshot-2024-02-07-at-1.35.22-PM.png 951w, https://bigdataproc.com/wp-content/uploads/2024/02/Screenshot-2024-02-07-at-1.35.22-PM-300x102.png 300w, https://bigdataproc.com/wp-content/uploads/2024/02/Screenshot-2024-02-07-at-1.35.22-PM-768x260.png 768w" sizes="auto, (max-width: 500px) 100vw, 500px" /></li>



<li class="has-regular-font-size">Provide the required Listing Details and Listing Contact Information corresponding to your data product. Once completed, you&#8217;ll be able to publish the dataset through the listing.<br><br><img loading="lazy" decoding="async" width="500" height="123" class="wp-image-447" style="width: 500px;" src="https://bigdataproc.com/wp-content/uploads/2024/02/Screenshot-2024-02-07-at-1.41.15-PM.png" alt="Listing ready to publish" srcset="https://bigdataproc.com/wp-content/uploads/2024/02/Screenshot-2024-02-07-at-1.41.15-PM.png 1837w, https://bigdataproc.com/wp-content/uploads/2024/02/Screenshot-2024-02-07-at-1.41.15-PM-300x74.png 300w, https://bigdataproc.com/wp-content/uploads/2024/02/Screenshot-2024-02-07-at-1.41.15-PM-1024x253.png 1024w, https://bigdataproc.com/wp-content/uploads/2024/02/Screenshot-2024-02-07-at-1.41.15-PM-768x189.png 768w, https://bigdataproc.com/wp-content/uploads/2024/02/Screenshot-2024-02-07-at-1.41.15-PM-1536x379.png 1536w" sizes="auto, (max-width: 500px) 100vw, 500px" /><img loading="lazy" decoding="async" width="500" height="146" class="wp-image-450" style="width: 500px;" src="https://bigdataproc.com/wp-content/uploads/2024/02/Screenshot-2024-02-07-at-1.54.48-PM.png" alt="Implement data mesh using GCP BigQuery Analytics Hub" srcset="https://bigdataproc.com/wp-content/uploads/2024/02/Screenshot-2024-02-07-at-1.54.48-PM.png 942w, https://bigdataproc.com/wp-content/uploads/2024/02/Screenshot-2024-02-07-at-1.54.48-PM-300x88.png 300w, https://bigdataproc.com/wp-content/uploads/2024/02/Screenshot-2024-02-07-at-1.54.48-PM-768x224.png 768w" sizes="auto, (max-width: 500px) 100vw, 500px" /></li>
</ol>



<h2 class="wp-block-heading">Subscribing to the Data Products (listings)</h2>



<ul class="wp-block-list">
<li>Once the data product is published, other users can search for the listing and subscribe to it. A subscriber needs permission to create a new dataset in the project where they want to receive the data product. </li>
</ul>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:100%"><div class="wp-block-image">
<figure class="aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="506" src="https://bigdataproc.com/wp-content/uploads/2024/02/Screenshot-2024-02-07-at-2.02.09-PM-1-1024x506.png" alt="" class="wp-image-453" srcset="https://bigdataproc.com/wp-content/uploads/2024/02/Screenshot-2024-02-07-at-2.02.09-PM-1-1024x506.png 1024w, https://bigdataproc.com/wp-content/uploads/2024/02/Screenshot-2024-02-07-at-2.02.09-PM-1-300x148.png 300w, https://bigdataproc.com/wp-content/uploads/2024/02/Screenshot-2024-02-07-at-2.02.09-PM-1-768x380.png 768w, https://bigdataproc.com/wp-content/uploads/2024/02/Screenshot-2024-02-07-at-2.02.09-PM-1.png 1117w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure></div></div>
</div>



<p></p>
<p>The post <a href="https://bigdataproc.com/data-mesh-implementation-with-gcp-bigquery/">Data Mesh implementation with GCP BigQuery</a> appeared first on <a href="https://bigdataproc.com">Big Data Processing </a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://bigdataproc.com/data-mesh-implementation-with-gcp-bigquery/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>GCP Bigquery &#8211; Find Partition column for each table</title>
		<link>https://bigdataproc.com/gcp-bigquery-find-partition-column-for-each-table/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=gcp-bigquery-find-partition-column-for-each-table</link>
					<comments>https://bigdataproc.com/gcp-bigquery-find-partition-column-for-each-table/#respond</comments>
		
		<dc:creator><![CDATA[Gaurang]]></dc:creator>
		<pubDate>Fri, 22 Sep 2023 13:25:25 +0000</pubDate>
				<category><![CDATA[bigquery]]></category>
		<category><![CDATA[GCP]]></category>
		<category><![CDATA[gcp]]></category>
		<guid isPermaLink="false">https://bigdataproc.com/?p=434</guid>

					<description><![CDATA[<p>Find the partition column for each BigQuery table in your GCP project to improve query performance and reduce cost.  </p>
<div class="more-link-wrapper"><a class="more-link" href="https://bigdataproc.com/gcp-bigquery-find-partition-column-for-each-table/">Continue reading<span class="screen-reader-text">GCP Bigquery &#8211; Find Partition column for each table</span></a></div>
<p>The post <a href="https://bigdataproc.com/gcp-bigquery-find-partition-column-for-each-table/">GCP Bigquery &#8211; Find Partition column for each table</a> appeared first on <a href="https://bigdataproc.com">Big Data Processing </a>.</p>
]]></description>
										<content:encoded><![CDATA[
<h2 class="wp-block-heading">What Are Partition Columns?</h2>



<p>In BigQuery, tables can be partitioned based on a particular column. Partitioning involves dividing a large table into smaller, more manageable pieces or partitions. Each partition contains a subset of the data and is stored separately. The key idea behind partitioning is to allow BigQuery to scan and process only the partitions that are relevant to a query, rather than scanning the entire table. This can lead to dramatic improvements in query performance and cost savings.</p>



<h2 class="wp-block-heading">Why Are Partition Columns Important?</h2>



<h3 class="wp-block-heading">1. Query Performance</h3>



<p>Partitioning a table based on a meaningful column, such as a timestamp or date, can significantly speed up query execution. When you run a query that includes a filter condition on the partitioning column, BigQuery can prune irrelevant partitions, scanning only the data that meets the criteria. This reduces query execution time, especially when dealing with large datasets.</p>



<p>For example, if you have a daily time-series dataset and partition it by date, querying data for a specific date range becomes much faster because BigQuery only needs to scan the partitions corresponding to that range.</p>
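<p>The pruning behaviour can be illustrated with a toy model: if each partition holds one day of rows, a date-range filter only ever touches the partitions inside the range. The data below is invented purely for illustration.</p>

```python
from datetime import date

# Toy model: each partition holds one day's rows, keyed by date.
partitions = {
    date(2023, 9, 1): ["row-a", "row-b"],
    date(2023, 9, 2): ["row-c"],
    date(2023, 9, 3): ["row-d", "row-e"],
}

def scan(partitions, start, end):
    """Return (partition dates scanned, rows read) for a date-range filter."""
    scanned = {d: rows for d, rows in partitions.items() if start <= d <= end}
    rows = [r for part in scanned.values() for r in part]
    return sorted(scanned), rows

days, rows = scan(partitions, date(2023, 9, 2), date(2023, 9, 3))
print(days)  # only 2 of the 3 partitions are touched, so only 3 of 5 rows are read
```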



<h3 class="wp-block-heading">2. Cost Efficiency</h3>



<p>Improved query performance isn&#8217;t the only benefit of partition columns. It also translates into cost savings. BigQuery charges you based on the amount of data processed during a query. By scanning fewer partitions, you reduce the amount of data processed, leading to lower query costs. This cost reduction can be substantial for organizations with large datasets and frequent queries.</p>



<h2 class="wp-block-heading">Query to Find the Partition Column for All BigQuery Tables</h2>



<p>And so, I was looking for a way to find the partition column for each table in our project, so we could improve query performance and reduce cost.  </p>



<p>The following query shows the partition column for each BigQuery table in your GCP project. </p>



<pre class="EnlighterJSRAW" data-enlighter-language="sql" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">SELECT
  table_catalog AS project_id,
  table_schema AS dataset,
  table_name,
  partition_column
FROM (
  SELECT
    table_catalog,
    table_schema,
    table_name,
    CASE
      WHEN is_partitioning_column = "YES" THEN column_name
    ELSE
    NULL
  END
    AS partition_column,
    ROW_NUMBER() OVER(PARTITION BY table_catalog, table_schema, table_name ORDER BY is_partitioning_column DESC) AS rnk
  FROM
    &lt;your_project>.&lt;your_region>.INFORMATION_SCHEMA.COLUMNS
  )a
WHERE
  a.rnk = 1</pre>
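<p>The same deduplication the query performs (one row per table, keeping the partitioning column when one exists) can be expressed in Python, which is handy if you export the <code>INFORMATION_SCHEMA.COLUMNS</code> rows and post-process them elsewhere. A sketch with a made-up row shape mirroring the columns used above.</p>

```python
def partition_columns(columns):
    """Map (dataset, table) -> partition column (or None), mirroring the query above.

    `columns` mimics INFORMATION_SCHEMA.COLUMNS rows: dicts with table_schema,
    table_name, column_name, and is_partitioning_column ("YES"/"NO").
    """
    result = {}
    for col in columns:
        key = (col["table_schema"], col["table_name"])
        if col["is_partitioning_column"] == "YES":
            result[key] = col["column_name"]
        else:
            result.setdefault(key, None)  # keep unpartitioned tables visible
    return result

# Illustrative rows only.
cols = [
    {"table_schema": "sales", "table_name": "orders", "column_name": "order_date", "is_partitioning_column": "YES"},
    {"table_schema": "sales", "table_name": "orders", "column_name": "amount", "is_partitioning_column": "NO"},
    {"table_schema": "sales", "table_name": "customers", "column_name": "id", "is_partitioning_column": "NO"},
]
print(partition_columns(cols))
```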
<p>The post <a href="https://bigdataproc.com/gcp-bigquery-find-partition-column-for-each-table/">GCP Bigquery &#8211; Find Partition column for each table</a> appeared first on <a href="https://bigdataproc.com">Big Data Processing </a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://bigdataproc.com/gcp-bigquery-find-partition-column-for-each-table/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>GCP &#8211; Create Custom Bigquery Linage using DataCatalog Python API</title>
		<link>https://bigdataproc.com/gcp-create-custom-bigquery-linage-using-datacatalog-python-api/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=gcp-create-custom-bigquery-linage-using-datacatalog-python-api</link>
					<comments>https://bigdataproc.com/gcp-create-custom-bigquery-linage-using-datacatalog-python-api/#comments</comments>
		
		<dc:creator><![CDATA[Gaurang]]></dc:creator>
		<pubDate>Mon, 31 Jul 2023 15:33:46 +0000</pubDate>
				<category><![CDATA[bigquery]]></category>
		<category><![CDATA[GCP]]></category>
		<category><![CDATA[dataplex]]></category>
		<category><![CDATA[gcp]]></category>
		<category><![CDATA[linage]]></category>
		<guid isPermaLink="false">https://bigdataproc.com/?p=402</guid>

					<description><![CDATA[<p>This blog post shows how to create custom lineage for your BigQuery tables using the Dataplex custom lineage Python client, when those tables are ingested or modified by an external system. </p>
<div class="more-link-wrapper"><a class="more-link" href="https://bigdataproc.com/gcp-create-custom-bigquery-linage-using-datacatalog-python-api/">Continue reading<span class="screen-reader-text">GCP &#8211; Create Custom Bigquery Linage using DataCatalog Python API</span></a></div>
<p>The post <a href="https://bigdataproc.com/gcp-create-custom-bigquery-linage-using-datacatalog-python-api/">GCP &#8211; Create Custom Bigquery Linage using DataCatalog Python API</a> appeared first on <a href="https://bigdataproc.com">Big Data Processing </a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>In our GCP (Google Cloud Platform) data warehousing workflow, we rely on BigQuery for storing and analyzing data. However, the data ingestion process involves a different service that does not automatically show lineage in BigQuery. To address this limitation, I developed a Python utility that creates custom lineage for ingestion jobs using the Dataplex custom lineage Python client.</p>



<p>Custom lineage creation involves three key tasks, each serving an essential purpose:</p>



<ol class="wp-block-list">
<li><strong>Create a Lineage Process: </strong>This step allows us to define a name for the lineage process. Leveraging GCP Cloud Composer, I often use the DAG name as the process name, facilitating seamless linking of the ingestion tables to their respective processes.</li>



<li><strong>Create the Run:</strong> For every execution of the above process, we should create a new run. I assign the task ID as the run name, ensuring a unique identifier for each run.</li>



<li><strong>Create a Lineage Event:</strong> In the final task, I specify the source and target mapping along with associated details, effectively establishing the lineage relationship between the datasets.</li>
</ol>



<p> </p>


<div class="wp-block-image">
<figure class="aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="313" src="https://bigdataproc.com/wp-content/uploads/2023/07/GCP-Bigquery-Custom-Linage-Process-1024x313.png" alt="Image depicting the GCP BigQuery Custom Lineage Process." class="wp-image-412" srcset="https://bigdataproc.com/wp-content/uploads/2023/07/GCP-Bigquery-Custom-Linage-Process-1024x313.png 1024w, https://bigdataproc.com/wp-content/uploads/2023/07/GCP-Bigquery-Custom-Linage-Process-300x92.png 300w, https://bigdataproc.com/wp-content/uploads/2023/07/GCP-Bigquery-Custom-Linage-Process-768x235.png 768w, https://bigdataproc.com/wp-content/uploads/2023/07/GCP-Bigquery-Custom-Linage-Process-1536x470.png 1536w, https://bigdataproc.com/wp-content/uploads/2023/07/GCP-Bigquery-Custom-Linage-Process-2048x627.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption">Exploring the data lineage process using GCP BigQuery and dataplex custom lineage python client.</figcaption></figure></div>

<div class="wp-block-image">
<figure class="aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="395" src="https://bigdataproc.com/wp-content/uploads/2023/07/GCP-Bigquery-Custom-Linage-Runs-1024x395.png" alt="Bigquery Linage Runs" class="wp-image-411" srcset="https://bigdataproc.com/wp-content/uploads/2023/07/GCP-Bigquery-Custom-Linage-Runs-1024x395.png 1024w, https://bigdataproc.com/wp-content/uploads/2023/07/GCP-Bigquery-Custom-Linage-Runs-300x116.png 300w, https://bigdataproc.com/wp-content/uploads/2023/07/GCP-Bigquery-Custom-Linage-Runs-768x296.png 768w, https://bigdataproc.com/wp-content/uploads/2023/07/GCP-Bigquery-Custom-Linage-Runs.png 1202w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption">Create Bigquery Custom Linage Runs using Dataplex Custom Linage Python Client</figcaption></figure></div>

<div class="wp-block-image">
<figure class="aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="429" src="https://bigdataproc.com/wp-content/uploads/2023/07/GCP-Bigquery-Custom-Linage-Run-Details-1024x429.png" alt="Bigquery Custom Run Details" class="wp-image-410" srcset="https://bigdataproc.com/wp-content/uploads/2023/07/GCP-Bigquery-Custom-Linage-Run-Details-1024x429.png 1024w, https://bigdataproc.com/wp-content/uploads/2023/07/GCP-Bigquery-Custom-Linage-Run-Details-300x126.png 300w, https://bigdataproc.com/wp-content/uploads/2023/07/GCP-Bigquery-Custom-Linage-Run-Details-768x322.png 768w, https://bigdataproc.com/wp-content/uploads/2023/07/GCP-Bigquery-Custom-Linage-Run-Details-1536x644.png 1536w, https://bigdataproc.com/wp-content/uploads/2023/07/GCP-Bigquery-Custom-Linage-Run-Details.png 1550w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure></div>


<p>You can find the complete code snippet on GitHub:</p>



<p><a href="https://gist.github.com/Gaurang033/01ab9d4cedfb1049dd23dd30cd88cdad" target="_blank" rel="noreferrer noopener">https://gist.github.com/Gaurang033/01ab9d4cedfb1049dd23dd30cd88cdad</a></p>



<h2 class="wp-block-heading">Install Dependencies </h2>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">google-cloud-datacatalog-lineage==0.2.3</pre>



<h2 class="wp-block-heading">Create Custom Lineage Process</h2>



<p>For a process you can also add custom attributes; as an example I have added owner, framework, and service. </p>



<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="false" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">def create_linage_process(project_id, process_display_name):
    parent = f"projects/{project_id}/locations/northamerica-northeast1"
    process = Process()
    process.display_name = process_display_name
    process.attributes = {
        "owner": "gaurangnshah@gmail.com",
        "framework": "file_ingestion_framework",
        "service": "databricks"
    }

    response = client.create_process(parent=parent, process=process)
    return response.name</pre>



<h2 class="wp-block-heading">Create Custom Lineage Run</h2>



<p>The following code creates a custom run for the lineage process we created above. </p>



<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">def create_run(process_id, start_time, end_time, state, run_display_name):
    run = lineage_v1.Run()
    run.start_time = start_time
    run.end_time = end_time
    run.state = state
    run.display_name = run_display_name
    run.attributes = {
        "owner": "gaurang",
        "purpose": "Testing Linage"
    }

    request = lineage_v1.CreateRunRequest(parent=process_id, run=run)
    response = client.create_run(request=request)
    logger.info(f"New run Created {response.name}")
    return response.name</pre>



<h2 class="wp-block-heading">Create Custom Lineage Event</h2>



<p>Once the lineage run is created, you need to attach an event to it. An event is simply a source-to-target mapping. For both the source and the target you must use a fully qualified name with the proper protocol. The following page lists all supported protocols for source and target FQNs.</p>



<p><a href="https://cloud.google.com/data-catalog/docs/fully-qualified-names" target="_blank" rel="noreferrer noopener">https://cloud.google.com/data-catalog/docs/fully-qualified-names</a></p>
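<p>For BigQuery tables, the fully qualified name follows the <code>bigquery:project.dataset.table</code> pattern described on that page. A tiny helper keeps FQNs consistent across events (illustrative; verify the format against the linked documentation for other source systems):</p>

```python
def bigquery_table_fqn(project, dataset, table):
    """Build the fully qualified name used in lineage events for a BigQuery table."""
    return f"bigquery:{project}.{dataset}.{table}"

target = bigquery_table_fqn("my-project", "sales", "orders")
print(target)  # bigquery:my-project.sales.orders
```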



<p></p>



<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">def create_lineage_event(run_id, source_fqdn, target_fqdn, start_time, end_time):
    source = lineage_v1.EntityReference()
    target = lineage_v1.EntityReference()
    source.fully_qualified_name = source_fqdn
    target.fully_qualified_name = target_fqdn
    links = [EventLink(source=source, target=target)]
    lineage_event = LineageEvent(links=links, start_time=start_time, end_time=end_time)

    request = lineage_v1.CreateLineageEventRequest(parent=run_id, lineage_event=lineage_event)
    response = client.create_lineage_event(request=request)
    logger.info(f"Lineage event created: {response.name}")
    return response.name</pre>



<h2 class="wp-block-heading">Update Custom Lineage Process</h2>



<p>For us, it&#8217;s the same process that ingests new files into the table, so rather than creating a new process every time, I simply update the existing process by adding a new run and lineage event.</p>



<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">def create_or_update_custom_linage_for_ingestion(project_id, process_display_name, source, target, start_time,
                                                 end_time, state, run_display_name):
    # Reuse the existing process when one with this display name already exists
    process_id = _get_process_id(project_id, process_display_name)
    if process_id is None:
        process_id = create_linage_process(project_id, process_display_name=process_display_name)
    run_id = create_run(process_id=process_id, start_time=start_time, end_time=end_time, state=state,
                        run_display_name=run_display_name)
    create_lineage_event(run_id=run_id, start_time=start_time, end_time=end_time, source_fqdn=source,
                         target_fqdn=target)


def _get_process_id(project_id, process_display_name):
    parent = f"projects/{project_id}/locations/northamerica-northeast1"
    processes = client.list_processes(parent=parent)
    for process in processes:
        if process.display_name == process_display_name:
            return process.name
    return None


def _convert_to_proto_timestamp(timestamp):
    return timestamp.strftime('%Y-%m-%dT%H:%M:%S.%f')[:-3] + "Z"</pre>
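<p>The <code>_convert_to_proto_timestamp</code> helper simply formats a Python datetime as an RFC 3339 string with millisecond precision, which is what the Lineage API expects for start and end times. A standalone sketch of the same logic:</p>

```python
from datetime import datetime

def convert_to_proto_timestamp(timestamp):
    # Truncate microseconds to milliseconds and append the UTC "Z" suffix.
    return timestamp.strftime('%Y-%m-%dT%H:%M:%S.%f')[:-3] + "Z"

print(convert_to_proto_timestamp(datetime(2023, 6, 4, 12, 30, 15, 123456)))
# 2023-06-04T12:30:15.123Z
```

<p>Note that the helper appends <code>Z</code> without converting the datetime, so pass it timestamps that are already in UTC.</p>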



<h2 class="wp-block-heading">How To Run? </h2>



<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">if __name__ == '__main__':
    project_id = "&lt;your_project_id>"
    process_display_name = "INGESTION_DAG_NAME"  ## DAG NAME
    source = "path:gs://&lt;your_bucket_name>/test_schema/test_20230604.csv"
    target = "bigquery:&lt;project_id>.gaurang.test_custom_linage"

    start_time = datetime.now() - timedelta(hours=3)
    process_start_time = _convert_to_proto_timestamp(start_time)  # DAG start time
    process_end_time = _convert_to_proto_timestamp(datetime.now())  # DAG end time

    state = "COMPLETED"
    run_display_name = "TASK_RUN_ID"
    create_or_update_custom_linage_for_ingestion(project_id, process_display_name, source, target, process_start_time,
                                                 process_end_time, state, run_display_name)</pre>
<p>The post <a href="https://bigdataproc.com/gcp-create-custom-bigquery-linage-using-datacatalog-python-api/">GCP &#8211; Create Custom Bigquery Linage using DataCatalog Python API</a> appeared first on <a href="https://bigdataproc.com">Big Data Processing </a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://bigdataproc.com/gcp-create-custom-bigquery-linage-using-datacatalog-python-api/feed/</wfw:commentRss>
			<slash:comments>2</slash:comments>
		
		
			</item>
		<item>
		<title>Google Bigquery &#8211; Find Query Cost By User</title>
		<link>https://bigdataproc.com/google-bigquery-find-query-cost-by-user/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=google-bigquery-find-query-cost-by-user</link>
					<comments>https://bigdataproc.com/google-bigquery-find-query-cost-by-user/#respond</comments>
		
		<dc:creator><![CDATA[Gaurang]]></dc:creator>
		<pubDate>Fri, 02 Jun 2023 18:02:30 +0000</pubDate>
				<category><![CDATA[bigquery]]></category>
		<category><![CDATA[GCP]]></category>
		<category><![CDATA[looker studio]]></category>
		<category><![CDATA[gcp]]></category>
		<guid isPermaLink="false">https://bigdataproc.com/?p=393</guid>

					<description><![CDATA[<p> Discover how to optimize BigQuery costs on Google Cloud Platform (GCP) by identifying users responsible for high query execution expenses. Learn effective strategies, including the use of BigQuery labels and the Job Information Schema, to educate users on cost-efficient query execution and achieve desired data analysis outcomes.</p>
<div class="more-link-wrapper"><a class="more-link" href="https://bigdataproc.com/google-bigquery-find-query-cost-by-user/">Continue reading<span class="screen-reader-text">Google Bigquery &#8211; Find Query Cost By User</span></a></div>
<p>The post <a href="https://bigdataproc.com/google-bigquery-find-query-cost-by-user/">Google Bigquery &#8211; Find Query Cost By User</a> appeared first on <a href="https://bigdataproc.com">Big Data Processing </a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>The organization I am currently consulting has recently migrated to the Google Cloud Platform (GCP) to leverage its powerful services, including Google BigQuery for efficient big data analysis. However, we have observed a significant increase in query execution costs and deemed it necessary to investigate the users or teams responsible for these expenses. By identifying the high spenders, we can provide them with valuable insights on optimizing query execution to minimize costs. It&#8217;s important to note that these users are transitioning from an on-premises environment where a CAPEX model was in place, and they may not be fully aware of the cost implications associated with every query on GCP&#8217;s BigQuery. We aim to educate them on optimizing their queries to achieve the desired output while effectively minimizing expenses.</p>



<p>In this blog post, we will explore effective strategies to identify which teams or users are driving up costs.</p>



<h2 class="wp-block-heading">Use BigQuery Labels for Cost Attribution</h2>



<p> To track query costs accurately, one option is to employ BigQuery labels. Although this method requires users to set labels manually before executing queries, it provides granular cost attribution. However, relying solely on users&#8217; compliance may not always yield optimal results.</p>
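<p>A minimal sketch of the idea (the label keys and values here are made-up, and the commented-out client call assumes the google-cloud-bigquery library): labels attached to a job are surfaced in the billing export and in the <code>INFORMATION_SCHEMA.JOBS</code> views, so costs can then be grouped per team.</p>

```python
# Hypothetical label-building helper; BigQuery labels allow only lowercase
# letters, digits, dashes and underscores, so normalize free-form names first.
def build_job_labels(team, pipeline):
    def normalize(value):
        return "".join(c if c.isalnum() or c in "-_" else "-" for c in value.lower())
    return {"team": normalize(team), "pipeline": normalize(pipeline)}

labels = build_job_labels("Data Science", "daily_ingest")
print(labels)  # {'team': 'data-science', 'pipeline': 'daily_ingest'}

# With the BigQuery client (run inside a GCP environment):
# from google.cloud import bigquery
# client = bigquery.Client()
# job_config = bigquery.QueryJobConfig(labels=labels)
# client.query("SELECT 1", job_config=job_config)
```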



<h2 class="wp-block-heading">Leverage BigQuery Job Information Schema </h2>



<p>BigQuery maintains detailed information for each job execution, including user details, slot utilization, and data processed. By querying the job information schema, you can calculate the query analysis cost per user accurately.</p>



<p> Ensure that the following permissions are granted to run this query: </p>



<ul class="wp-block-list">
<li>bigquery.resourceViewer</li>



<li>bigquery.metadataViewer</li>
</ul>



<pre class="EnlighterJSRAW" data-enlighter-language="sql" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">SELECT
  user_email,
  SUM(total_cost) total_cost_per_user
FROM (
  SELECT
    reservation_id,
    user_email,
    CASE
      WHEN reservation_id IS NULL THEN (SUM(total_bytes_processed)/1024/1024/1024/1024)*5 -- 5 USD per TB processed
      WHEN reservation_id IS NOT NULL AND reservation_id &lt;> "default-pipeline" THEN (SUM(total_slot_ms)/(1000*60*60))*0.069 -- 0.069 USD per slot hour for northamerica-northeast1
    END AS total_cost
  FROM
    `region-northamerica-northeast1`.INFORMATION_SCHEMA.JOBS_BY_ORGANIZATION
  WHERE
    DATE(creation_time) >= "2023-05-01" -- change the date filter as needed
  GROUP BY
    reservation_id,
    user_email )
GROUP BY
  user_email
ORDER BY
  total_cost_per_user DESC</pre>
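<p>For reference, the two pricing formulas from the query above can be written out directly (same illustrative rates: 5 USD per TiB scanned on demand, 0.069 USD per slot hour; check the current pricing page for your edition and region):</p>

```python
ON_DEMAND_USD_PER_TIB = 5.0   # on-demand rate used in the query above
SLOT_HOUR_USD = 0.069         # slot-hour rate for northamerica-northeast1

def on_demand_cost(total_bytes_processed):
    # bytes -> TiB, then multiply by the per-TiB rate
    return total_bytes_processed / 1024**4 * ON_DEMAND_USD_PER_TIB

def slot_cost(total_slot_ms):
    # slot milliseconds -> slot hours, then multiply by the slot-hour rate
    return total_slot_ms / (1000 * 60 * 60) * SLOT_HOUR_USD

print(on_demand_cost(2 * 1024**4))  # 10.0 -> a query scanning 2 TiB
print(slot_cost(3_600_000))         # 0.069 -> one full slot hour
```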



<h2 class="wp-block-heading">Understand the Limitations of Cost Calculation using Information Schema</h2>



<p>If your organization is utilizing on-demand pricing in BigQuery, the cost calculated through the information schema will closely align with the cost report. </p>



<p>However, if your organization is using auto-scaling slots, cost calculation through the information schema may not provide accurate results. While the information schema captures slot utilization during query execution, it doesn&#8217;t account for slots used during scale-up, scale-down, or the cooldown period. As a result, there may be discrepancies between the cost derived from the information schema and the actual cost shown in the cost report. This difference becomes more prominent for queries with shorter execution times (within 1 minute).</p>



<h2 class="wp-block-heading">Looker Studio Reports for Quick Analysis and Visualization</h2>



<p>To streamline the process of extracting query cost information, consider creating Looker Studio reports. These reports offer date filters, enabling quick access to the desired information. Additionally, Looker Studio reports provide a visual representation of query costs, facilitating a better understanding of cost trends and patterns.</p>


<div class="wp-block-image">
<figure class="aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="565" src="https://bigdataproc.com/wp-content/uploads/2023/06/bq_cost_per_user-1024x565.jpg" alt="" class="wp-image-399" srcset="https://bigdataproc.com/wp-content/uploads/2023/06/bq_cost_per_user-1024x565.jpg 1024w, https://bigdataproc.com/wp-content/uploads/2023/06/bq_cost_per_user-300x165.jpg 300w, https://bigdataproc.com/wp-content/uploads/2023/06/bq_cost_per_user-768x423.jpg 768w, https://bigdataproc.com/wp-content/uploads/2023/06/bq_cost_per_user.jpg 1090w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure></div><p>The post <a href="https://bigdataproc.com/google-bigquery-find-query-cost-by-user/">Google Bigquery &#8211; Find Query Cost By User</a> appeared first on <a href="https://bigdataproc.com">Big Data Processing </a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://bigdataproc.com/google-bigquery-find-query-cost-by-user/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
	</channel>
</rss>
