When planning to use Google Cloud Composer (Airflow in GCP), there are a few essential considerations to address before setup. These settings can be changed after setup, but doing so is tedious and time-consuming.
Time zone for the scheduler
The default time zone for the scheduler is UTC. This means that if you schedule a DAG to run at 5 PM, it will run at 5 PM UTC, not 5 PM in your local time. Converting the time by hand for every DAG deployment is impractical and error-prone, so it's advisable to change the default scheduling time zone.
To change this:
- Navigate to your Airflow (Cloud Composer) instance.
- Go to Airflow configuration overrides.
- Click Edit and set core.default_timezone to the time zone you want, given as an IANA name such as Europe/Amsterdam.
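To make the offset concrete, here is a minimal sketch using only Python's standard library (it is not Airflow code). The zone America/New_York and the fixed date are illustrative assumptions, not part of the setup described above, and the zoneinfo module needs time zone data to be available on the machine.

```python
from datetime import datetime
from zoneinfo import ZoneInfo


def utc_run_time_in_local(hour: int, local_tz: str) -> datetime:
    """Return the local wall-clock time of a run scheduled at `hour`:00 UTC."""
    # Arbitrary fixed date (assumption) so the result is reproducible.
    scheduled_utc = datetime(2024, 1, 15, hour, 0, tzinfo=ZoneInfo("UTC"))
    return scheduled_utc.astimezone(ZoneInfo(local_tz))


# A DAG scheduled for "5 PM" with the UTC default...
local = utc_run_time_in_local(17, "America/New_York")

# ...actually fires at noon for a team in New York.
print(local.isoformat())  # 2024-01-15T12:00:00-05:00
```

With the default time zone set to the team's own zone, a 5 PM schedule means 5 PM locally and this conversion is no longer needed.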
Where to store Airflow connections and variables
By default, Airflow (Google Cloud Composer) stores connections and variables within Airflow itself. However, it supports multiple external backends, including GCP, AWS, and HashiCorp Vault. Airflow neither versions these connections and variables nor provides granular access control over them, so it is prudent to store them externally. In addition, organizational standards often require that all secrets and certificates be stored in a single system.
In our setup, we chose to store connections in HashiCorp Vault due to their sensitive nature, while non-sensitive variables remained in Airflow.
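As a rough sketch of what such a split can look like, the snippet below builds the two configuration overrides for the HashiCorp provider's VaultBackend. The Vault URL and mount point are placeholders we made up for illustration, authentication settings are left out entirely, and the exact values will differ in your environment.

```python
import json

# Keyword arguments passed to the Vault secrets backend.
# The URL and mount point are hypothetical placeholders.
backend_kwargs = {
    "url": "https://vault.example.com:8200",  # assumed Vault address
    "mount_point": "airflow",                 # assumed KV mount
    "connections_path": "connections",        # connections are read from Vault
    "variables_path": None,                   # None: Vault is not queried for variables
}

# (section, key) pairs as they would be entered as Airflow configuration overrides.
overrides = {
    ("secrets", "backend"): "airflow.providers.hashicorp.secrets.vault.VaultBackend",
    ("secrets", "backend_kwargs"): json.dumps(backend_kwargs),
}

for (section, key), value in overrides.items():
    print(f"{section}.{key} = {value}")
```

Setting variables_path to None (null in the JSON) is one way to keep variables in Airflow while connections come from Vault.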
One key point to note: Airflow treats a newly configured backend as an additional backend, not a replacement. If it cannot find a variable or connection in the external backend (e.g., Vault), it falls back to searching within Airflow itself.
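The fallback behaviour can be pictured with a small, simplified model. This is not Airflow's implementation: the two dictionaries and the get_connection helper are hypothetical stand-ins, and the real lookup chain has more steps (for example, environment variables).

```python
from typing import Optional

# Hypothetical stand-ins for the two stores.
vault = {"warehouse_db": "connection-uri-held-in-vault"}
airflow_db = {
    "warehouse_db": "stale-duplicate-in-airflow",
    "legacy_api": "connection-uri-held-in-airflow",
}


def get_connection(conn_id: str) -> Optional[str]:
    """Return the first match, searching the external backend before Airflow."""
    for store in (vault, airflow_db):
        if conn_id in store:
            return store[conn_id]
    return None


print(get_connection("warehouse_db"))  # connection-uri-held-in-vault
print(get_connection("legacy_api"))    # connection-uri-held-in-airflow
print(get_connection("missing"))       # None
```

The second call is the case to watch: a connection that exists only inside Airflow is still resolved, so leftover or duplicate entries there can be picked up without anyone noticing.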
Default role assignment for all Airflow users
Airflow has built-in role-based access control (RBAC) with five main roles: Public, Viewer, User, Op, and Admin. The default role assigned to all users in GCP is ‘Op’.
If this role doesn’t fit your organizational needs, create a custom role and change the default role assignment.
In our scenario, the ‘op’ role includes permissions to create and maintain connections and variables. Since we maintain all connections in HashiCorp Vault, we didn’t want duplicates created within Airflow. Therefore, we created a custom role without these permissions and set it as the default role for all users. To change the default role, override webserver.rbac_user_registration_role and set it to the name of the custom role.
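The idea behind the custom role can be sketched as a filter over permission pairs. The list below is an illustrative subset, not the full Op permission set, and keeping read access while dropping write access is our assumption about how such a role might be scoped. The role itself would still be created through the Airflow UI or CLI.

```python
# Illustrative subset of (action, resource) pairs; the real Op role has many more.
op_permissions = {
    ("can_read", "DAGs"),
    ("can_edit", "DAGs"),
    ("can_read", "Connections"),
    ("can_create", "Connections"),
    ("can_edit", "Connections"),
    ("can_delete", "Connections"),
    ("can_read", "Variables"),
    ("can_create", "Variables"),
    ("can_edit", "Variables"),
    ("can_delete", "Variables"),
}

# Resources that should not be created or modified from within Airflow.
PROTECTED_RESOURCES = {"Connections", "Variables"}
WRITE_ACTIONS = {"can_create", "can_edit", "can_delete"}

# Keep everything except write access to the protected resources.
custom_role_permissions = {
    (action, resource)
    for action, resource in op_permissions
    if not (resource in PROTECTED_RESOURCES and action in WRITE_ACTIONS)
}

for action, resource in sorted(custom_role_permissions):
    print(action, resource)
```

Starting from a copy of an existing role and removing permissions tends to be less error-prone than building a role from nothing, because everything the role could already do stays intact.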
By addressing these configurations early on, you can streamline your use of Google Cloud Composer and Airflow in GCP, ensuring efficient and secure operations.