<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>looker studio Archives - Big Data Processing</title>
	<atom:link href="https://bigdataproc.com/tag/looker-studio/feed/" rel="self" type="application/rss+xml" />
	<link>https://bigdataproc.com/tag/looker-studio/</link>
	<description>Big Data Solution for GCP, AWS, Azure and on-prem</description>
	<lastBuildDate>Thu, 26 Oct 2023 13:08:40 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	
	<item>
		<title>Google Bigquery &#8211; Find Query Cost By User</title>
		<link>https://bigdataproc.com/google-bigquery-find-query-cost-by-user/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=google-bigquery-find-query-cost-by-user</link>
					<comments>https://bigdataproc.com/google-bigquery-find-query-cost-by-user/#respond</comments>
		
		<dc:creator><![CDATA[Gaurang]]></dc:creator>
		<pubDate>Fri, 02 Jun 2023 18:02:30 +0000</pubDate>
				<category><![CDATA[bigquery]]></category>
		<category><![CDATA[GCP]]></category>
		<category><![CDATA[looker studio]]></category>
		<category><![CDATA[gcp]]></category>
		<guid isPermaLink="false">https://bigdataproc.com/?p=393</guid>

					<description><![CDATA[<p> Discover how to optimize BigQuery costs on Google Cloud Platform (GCP) by identifying users responsible for high query execution expenses. Learn effective strategies, including the use of BigQuery labels and the Job Information Schema, to educate users on cost-efficient query execution and achieve desired data analysis outcomes.</p>
<div class="more-link-wrapper"><a class="more-link" href="https://bigdataproc.com/google-bigquery-find-query-cost-by-user/">Continue reading<span class="screen-reader-text">Google Bigquery &#8211; Find Query Cost By User</span></a></div>
<p>The post <a href="https://bigdataproc.com/google-bigquery-find-query-cost-by-user/">Google Bigquery &#8211; Find Query Cost By User</a> appeared first on <a href="https://bigdataproc.com">Big Data Processing </a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>The organization I am currently consulting for recently migrated to the Google Cloud Platform (GCP) to leverage its services, including Google BigQuery for efficient big data analysis. Since the migration, we have observed a significant increase in query execution costs and needed to identify the users and teams responsible for these expenses. By identifying the high spenders, we can give them concrete guidance on optimizing query execution to minimize costs. It&#8217;s worth noting that these users are transitioning from an on-premises environment with a CAPEX model, so they may not be fully aware that every query on GCP&#8217;s BigQuery carries a cost. Our goal is to help them write queries that achieve the desired output while minimizing expenses.</p>



<p>In this blog post, we will explore effective strategies to identify which teams or users are driving up costs.</p>



<h2 class="wp-block-heading">Use BigQuery Labels for Cost Attribution</h2>



<p>To track query costs accurately, one option is to use BigQuery labels. Although this method requires users to set labels manually before executing queries, it provides granular cost attribution. However, relying solely on users&#8217; compliance rarely yields complete coverage.</p>
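<p>For illustration, assuming queries are tagged with a hypothetical <code>team</code> label, a sketch of a per-label cost rollup against the job information schema could look like the following (the label key, region, and on-demand rate are assumptions to adapt to your environment):</p>



<pre class="EnlighterJSRAW" data-enlighter-language="sql" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">-- Sketch: approximate on-demand cost per "team" label (hypothetical label key)
SELECT
  l.value AS team,
  (SUM(total_bytes_processed)/1024/1024/1024/1024)*5 AS approx_cost_usd -- 5 USD per TiB processed
FROM
  `region-northamerica-northeast1`.INFORMATION_SCHEMA.JOBS_BY_ORGANIZATION,
  UNNEST(labels) AS l
WHERE
  l.key = "team"
  AND DATE(creation_time) >= "2023-05-01" -- change the filter
GROUP BY
  team
ORDER BY
  approx_cost_usd DESC</pre>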



<h2 class="wp-block-heading">Leverage BigQuery Job Information Schema </h2>



<p>BigQuery maintains detailed information for each job execution, including user details, slot utilization, and data processed. By querying the job information schema, you can calculate the query analysis cost per user accurately.</p>



<p> Ensure that the following permissions are granted to run this query: </p>



<ul class="wp-block-list">
<li>bigquery.resourceViewer</li>



<li>bigquery.metadataViewer</li>
</ul>



<pre class="EnlighterJSRAW" data-enlighter-language="sql" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">SELECT
  user_email,
  SUM(total_cost) AS total_cost_per_user
FROM (
  SELECT
    reservation_id,
    user_email,
    CASE
      WHEN reservation_id IS NULL THEN (SUM(total_bytes_processed)/1024/1024/1024/1024)*5 -- 5 USD per TiB processed (on-demand)
      WHEN reservation_id IS NOT NULL AND reservation_id &lt;> "default-pipeline" THEN (SUM(jbo.total_slot_ms)/(1000*60*60))*0.069 -- 0.069 USD per slot hour for northamerica-northeast1
    END AS total_cost
  FROM
    `region-northamerica-northeast1`.INFORMATION_SCHEMA.JOBS_BY_ORGANIZATION jbo
  WHERE
    DATE(creation_time) >= "2023-05-01" -- change the filter to your desired date range
  GROUP BY
    reservation_id,
    user_email )
GROUP BY
  user_email
ORDER BY
  total_cost_per_user DESC</pre>



<h2 class="wp-block-heading">Understand the Limitations of Cost Calculation using Information Schema</h2>



<p>If your organization is utilizing on-demand pricing in BigQuery, the cost calculated through the information schema will closely align with the cost report. </p>



<p>However, if your organization is using auto-scaling slots, cost calculation through the information schema may not provide accurate results. While the information schema captures slot utilization during query execution, it doesn&#8217;t account for slots used during scale-up, scale-down, or the cooldown period. As a result, there may be discrepancies between the cost computed from the information schema and the actual cost shown in the cost report. This difference becomes more prominent for queries with shorter execution times (within one minute).</p>



<h2 class="wp-block-heading">Looker Studio Reports for Quick Analysis and Visualization</h2>



<p>To streamline the process of extracting query cost information, consider creating Looker Studio reports. These reports offer date filters, enabling quick access to the desired information. Additionally, Looker Studio reports provide a visual representation of query costs, facilitating a better understanding of cost trends and patterns.</p>


<div class="wp-block-image">
<figure class="aligncenter size-large"><img fetchpriority="high" decoding="async" width="1024" height="565" src="https://bigdataproc.com/wp-content/uploads/2023/06/bq_cost_per_user-1024x565.jpg" alt="" class="wp-image-399" srcset="https://bigdataproc.com/wp-content/uploads/2023/06/bq_cost_per_user-1024x565.jpg 1024w, https://bigdataproc.com/wp-content/uploads/2023/06/bq_cost_per_user-300x165.jpg 300w, https://bigdataproc.com/wp-content/uploads/2023/06/bq_cost_per_user-768x423.jpg 768w, https://bigdataproc.com/wp-content/uploads/2023/06/bq_cost_per_user.jpg 1090w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure></div><p>The post <a href="https://bigdataproc.com/google-bigquery-find-query-cost-by-user/">Google Bigquery &#8211; Find Query Cost By User</a> appeared first on <a href="https://bigdataproc.com">Big Data Processing </a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://bigdataproc.com/google-bigquery-find-query-cost-by-user/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Airflow Operational Dashboard using Bigquery and Looker Studio</title>
		<link>https://bigdataproc.com/airflow-operational-dashboard-using-bigquery-and-looker-studio/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=airflow-operational-dashboard-using-bigquery-and-looker-studio</link>
					<comments>https://bigdataproc.com/airflow-operational-dashboard-using-bigquery-and-looker-studio/#comments</comments>
		
		<dc:creator><![CDATA[Gaurang]]></dc:creator>
		<pubDate>Wed, 10 May 2023 16:53:50 +0000</pubDate>
				<category><![CDATA[Airflow]]></category>
		<category><![CDATA[airflow]]></category>
		<category><![CDATA[bigquery]]></category>
		<category><![CDATA[gcp]]></category>
		<category><![CDATA[looker studio]]></category>
		<guid isPermaLink="false">https://bigdataproc.com/?p=372</guid>

<description><![CDATA[<p>Create an operational dashboard for Airflow (Cloud Composer) DAGs using BigQuery and Looker Studio.</p>
<div class="more-link-wrapper"><a class="more-link" href="https://bigdataproc.com/airflow-operational-dashboard-using-bigquery-and-looker-studio/">Continue reading<span class="screen-reader-text">Airflow Operational Dashboard using Bigquery and Looker Studio</span></a></div>
<p>The post <a href="https://bigdataproc.com/airflow-operational-dashboard-using-bigquery-and-looker-studio/">Airflow Operational Dashboard using Bigquery and Looker Studio</a> appeared first on <a href="https://bigdataproc.com">Big Data Processing </a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>In our daily operations, we rely on Airflow (Cloud Composer) to run hundreds of DAGs. While we have integrated it with ServiceNow and SMTP for Airflow notifications, we found that these measures were insufficient in providing us with valuable insights. We needed a way to track the number of failed DAGs over a specific period, identify which DAGs were failing more frequently, and gain a comprehensive understanding of our workflow performance.</p>



<p>To address these challenges, we decided to create a Looker Studio dashboard by leveraging the power of BigQuery and redirecting Airflow (Cloud Composer) logs through a log sink. By storing our logs in BigQuery, we gained access to a wealth of data that allowed us to generate informative charts and visualizations. In this blog post, I will guide you through the step-by-step process of setting up this invaluable solution.</p>



<h2 class="wp-block-heading">Create a Log Sink </h2>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow">
<p>To begin the process, navigate to the log router and create a sink using the following query. When configuring the sink, select the desired target as a BigQuery dataset where you want all the logs to be redirected. Additionally, it is recommended to choose a partitioned table, as this will optimize query performance, facilitate the cleanup of older logs, and reduce costs. By partitioning the table, BigQuery will scan less data when querying specific time ranges, resulting in faster results and more efficient resource usage.</p>



<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">"Marking task as"
resource.type="cloud_composer_environment"
log_name: "airflow-worker"
labels.workflow!="airflow_monitoring"</pre>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow">
<figure class="wp-block-image size-large"><img decoding="async" width="569" height="809" src="https://bigdataproc.com/wp-content/uploads/2023/05/image.png" alt="log sink to redirect airflow logs to bigquery for looker studio dashboard" class="wp-image-374" srcset="https://bigdataproc.com/wp-content/uploads/2023/05/image.png 569w, https://bigdataproc.com/wp-content/uploads/2023/05/image-211x300.png 211w" sizes="(max-width: 569px) 100vw, 569px" /></figure>



<p></p>
</div>
</div>



<p>Upon successful creation of the log sink, you will see an <strong>airflow_worker</strong> table created in the dataset you specified during the log sink configuration.</p>
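<p>Because the sink target is a partitioned table, BigQuery can also clean up older logs for you automatically. A minimal sketch (the project and dataset names are placeholders, and 90 days is an arbitrary retention period):</p>



<pre class="EnlighterJSRAW" data-enlighter-language="sql" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">-- Sketch: expire log partitions automatically after 90 days
ALTER TABLE `your-project.your_dataset.airflow_worker`
SET OPTIONS (partition_expiration_days = 90);</pre>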



<h2 class="wp-block-heading">Write a Query to Get Insights</h2>



<p>The following query retrieves the data from the airflow_worker table and does the following:</p>



<ul class="wp-block-list"><li><strong>Adjusting timestamps to the &#8220;America/Toronto&#8221; timezone:</strong> Airflow (Cloud Composer) logs are stored with UTC timestamps by default, but our scheduler is set to the &#8220;America/Toronto&#8221; timezone. To ensure consistency, the query converts the timestamps to &#8220;America/Toronto&#8221;. Keep in mind that you may need to modify this part based on your own timezone settings.</li><li><strong>Retrieving status information:</strong> The status of each Airflow DAG is captured in the <code>textPayload</code> column. The query uses substring matching to extract one of three possible statuses: &#8220;success,&#8221; &#8220;fail,&#8221; or &#8220;skip.&#8221; This allows us to easily identify the execution outcome of each DAG run.<br>Since status information is only available at the task level, a DAG is considered failed if any task within it has failed. If you choose to show information at the task level, you will need to modify this query.</li></ul>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:100%">
<pre class="EnlighterJSRAW" data-enlighter-language="sql" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">SELECT
  *
FROM (
  SELECT
    DATE(timestamp, "America/Toronto") AS execution_date, -- log timestamp converted to America/Toronto
    DATETIME(timestamp, "America/Toronto") AS execution_timestamp,
    labels.workflow,
    DATE(CAST(labels.execution_date AS TIMESTAMP), "America/Toronto") AS schedule_date, -- schedule converted to America/Toronto
    DATETIME(CAST(labels.execution_date AS TIMESTAMP), "America/Toronto") AS schedule_timestamp,
    labels.try_number,
    CASE
      WHEN CONTAINS_SUBSTR(textPayload, "Success") THEN "success"
      WHEN CONTAINS_SUBSTR(textPayload, "SKIPPED") THEN "skip"
      ELSE "fail"
    END AS status,
    -- rank rows so that a failed task (0) sorts first within each DAG run
    ROW_NUMBER() OVER(PARTITION BY labels.workflow, CAST(labels.execution_date AS TIMESTAMP)
    ORDER BY
      CASE
        WHEN CONTAINS_SUBSTR(textPayload, "Success") THEN 1
        WHEN CONTAINS_SUBSTR(textPayload, "SKIPPED") THEN 2
        ELSE 0
      END
      ) AS rnk
  FROM
    <code data-enlighter-language="raw" class="EnlighterJSRAW">ss-org-logging-project.airflow_dags_status.airflow_worker</code> )
WHERE
  rnk = 1</pre>



<h2 class="wp-block-heading">Dashboard</h2>



<p>Unfortunately, due to my organization&#8217;s policy, I can&#8217;t share my Looker Studio dashboard outside the organization, so you will have to create your own. I am including screenshots of my dashboard for your reference.</p>
</div>
</div>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow">
<div class="wp-block-image"><figure class="aligncenter size-full is-resized"><img decoding="async" src="https://bigdataproc.com/wp-content/uploads/2023/05/image-1.png" alt="airflow daily status" class="wp-image-378" width="904" height="583" srcset="https://bigdataproc.com/wp-content/uploads/2023/05/image-1.png 1205w, https://bigdataproc.com/wp-content/uploads/2023/05/image-1-300x193.png 300w, https://bigdataproc.com/wp-content/uploads/2023/05/image-1-1024x660.png 1024w, https://bigdataproc.com/wp-content/uploads/2023/05/image-1-768x495.png 768w" sizes="(max-width: 904px) 100vw, 904px" /><figcaption>showing airflow daily status.</figcaption></figure></div>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://bigdataproc.com/wp-content/uploads/2023/05/image-2-1024x662.png" alt="airflow prod weekly status" class="wp-image-379" width="768" height="497" srcset="https://bigdataproc.com/wp-content/uploads/2023/05/image-2-1024x662.png 1024w, https://bigdataproc.com/wp-content/uploads/2023/05/image-2-300x194.png 300w, https://bigdataproc.com/wp-content/uploads/2023/05/image-2-768x496.png 768w, https://bigdataproc.com/wp-content/uploads/2023/05/image-2.png 1204w" sizes="auto, (max-width: 768px) 100vw, 768px" /><figcaption>Airflow Prod Weekly Failed DAG List</figcaption></figure></div>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://bigdataproc.com/wp-content/uploads/2023/05/image-3-1024x670.png" alt="" class="wp-image-380" width="768" height="503" srcset="https://bigdataproc.com/wp-content/uploads/2023/05/image-3-1024x670.png 1024w, https://bigdataproc.com/wp-content/uploads/2023/05/image-3-300x196.png 300w, https://bigdataproc.com/wp-content/uploads/2023/05/image-3-768x502.png 768w, https://bigdataproc.com/wp-content/uploads/2023/05/image-3.png 1196w" sizes="auto, (max-width: 768px) 100vw, 768px" /></figure></div>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://bigdataproc.com/wp-content/uploads/2023/05/image-4-1024x544.png" alt="" class="wp-image-381" width="768" height="408" srcset="https://bigdataproc.com/wp-content/uploads/2023/05/image-4-1024x544.png 1024w, https://bigdataproc.com/wp-content/uploads/2023/05/image-4-300x159.png 300w, https://bigdataproc.com/wp-content/uploads/2023/05/image-4-768x408.png 768w, https://bigdataproc.com/wp-content/uploads/2023/05/image-4.png 1200w" sizes="auto, (max-width: 768px) 100vw, 768px" /><figcaption>Airflow Prod DAG Failure List</figcaption></figure></div>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://bigdataproc.com/wp-content/uploads/2023/05/image-5-1024x526.png" alt="" class="wp-image-382" width="768" height="395" srcset="https://bigdataproc.com/wp-content/uploads/2023/05/image-5-1024x526.png 1024w, https://bigdataproc.com/wp-content/uploads/2023/05/image-5-300x154.png 300w, https://bigdataproc.com/wp-content/uploads/2023/05/image-5-768x394.png 768w, https://bigdataproc.com/wp-content/uploads/2023/05/image-5.png 1190w" sizes="auto, (max-width: 768px) 100vw, 768px" /></figure></div>
</div>
</div>



<p></p>
<p>The post <a href="https://bigdataproc.com/airflow-operational-dashboard-using-bigquery-and-looker-studio/">Airflow Operational Dashboard using Bigquery and Looker Studio</a> appeared first on <a href="https://bigdataproc.com">Big Data Processing </a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://bigdataproc.com/airflow-operational-dashboard-using-bigquery-and-looker-studio/feed/</wfw:commentRss>
			<slash:comments>1</slash:comments>
		
		
			</item>
	</channel>
</rss>
