Airflow does not implement OAuth itself but relies on Flask-OAuthlib, so there is no single place that explains how to configure Airflow to work with Keycloak. Hence this post; I hope it will be useful to someone.
What is..
Airflow: A tool for workflow management. It is GUI based: the user can control and track the progress of a workflow, while backend components handle the triggering of the workflow artifacts. This post uses version 1.10.10.
Keycloak: Provides OIDC and OAuth interfaces for SSO, user federation and authorization (and much more).
Why
Airflow supports creating users via the CLI, which serves well for testing. In production, however, a standard login and authorization interface should be used, and since OIDC and OAuth are the most prevalent ones, it makes sense to configure Airflow with Keycloak.
How
Configuring Keycloak
Download Keycloak from: https://www.keycloak.org/downloads
Copy the tar file to any directory on a Linux machine and extract its contents
Go to the bin directory and start the Keycloak server with the command below
./standalone.sh -b "0.0.0.0" -bmanagement "0.0.0.0"
Call the addUser script to create an admin user
Log in to the Admin console and create a realm, or connect to an existing realm if one has already been created.
Create a client for Airflow or use an existing client.
The Credentials tab contains the client id and password. Note them down, since they will be used later in the Airflow configuration.
Create a user or use existing users
Note: Airflow calls the userinfo endpoint to get the user details. For this to work, the id attribute on the user must be mapped to the id token attribute.
This attribute needs to be mapped in the client's mapper configuration.
This configures a client and a user in Keycloak. The HTTPS setup is covered later in the post.
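To make the attribute requirement concrete, here is a minimal sketch of what happens to the userinfo response after login. It assumes that the 'google' provider handling reads the `id`, `given_name`, `family_name` and `email` fields from the response; that is my reading of the login code, so check it against your installed version. The helper name `user_from_userinfo` and the sample payload are made up for illustration.

```python
def user_from_userinfo(userinfo):
    """Map a Keycloak userinfo payload to the user fields needed at login.

    Hypothetical helper: it mirrors what the 'google' provider handling is
    assumed to do, and fails loudly when the 'id' claim is missing.
    """
    if not userinfo.get("id"):
        raise KeyError("userinfo has no 'id' claim; map it on the Keycloak client")
    return {
        "username": "google_" + str(userinfo["id"]),
        "first_name": userinfo.get("given_name", ""),
        "last_name": userinfo.get("family_name", ""),
        "email": userinfo.get("email", ""),
    }


# Sample payload (invented values), as Keycloak might return it once the
# 'id' attribute has been mapped on the client.
sample = {
    "id": "42",
    "given_name": "Ada",
    "family_name": "Lovelace",
    "email": "ada@example.com",
}
print(user_from_userinfo(sample)["username"])  # google_42
```

If the `id` mapping is missing, the sketch raises an error instead of silently producing a user without a usable name, which makes a missing mapper easier to spot.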
Configuring Airflow
Install the following packages
pip install Flask-OIDC
pip install Flask-OAuthlib
Make the following changes in webserver.py (webserver_config.py in a default Airflow installation)
import os
os.environ['OAUTHLIB_INSECURE_TRANSPORT'] = '1'
from airflow import configuration as conf
from flask_appbuilder.security.manager import AUTH_DB
from flask_appbuilder.security.manager import AUTH_OAUTH
basedir = os.path.abspath(os.path.dirname(__file__))
AUTH_TYPE = AUTH_OAUTH
AUTH_USER_REGISTRATION = True
AUTH_USER_REGISTRATION_ROLE = "Admin"
OAUTH_PROVIDERS = [{
    'name': 'google',
    'token_key': 'access_token',
    'icon': 'fa-google',
    'remote_app': {
        'base_url': '<keycloak host and port>/auth/realms/<your realm>/protocol/openid-connect/',
        'request_token_params': {
            'scope': 'email profile'
        },
        'access_token_url': '<keycloak host and port>/auth/realms/<your realm>/protocol/openid-connect/token',
        'authorize_url': '<keycloak host and port>/auth/realms/<your realm>/protocol/openid-connect/auth',
        'consumer_key': '<client id in keycloak>',
        'consumer_secret': '<client password>'
    }
}]
Note the following:
os.environ['OAUTHLIB_INSECURE_TRANSPORT'] = '1'
This allows OAuth to work over a plain http connection to Keycloak.
'name': 'google'
The provider names that Airflow's login code recognizes are hardcoded, and Keycloak is not one of them. google is one of the supported names, so we use it even though all the URLs point to Keycloak.
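The three Keycloak URLs in the configuration share the same realm prefix, so they can be derived from the host and the realm name. The following is a small sketch of that idea; `keycloak_endpoints` is a hypothetical helper, and the host `http://localhost:8080` and realm `demo` are placeholder values, not something from my setup.

```python
def keycloak_endpoints(host, realm):
    """Build the OpenID Connect URLs used in OAUTH_PROVIDERS for one realm."""
    base = "{}/auth/realms/{}/protocol/openid-connect/".format(host.rstrip("/"), realm)
    return {
        "base_url": base,
        "access_token_url": base + "token",
        "authorize_url": base + "auth",
    }


# Placeholder host and realm, for illustration only.
endpoints = keycloak_endpoints("http://localhost:8080", "demo")
print(endpoints["access_token_url"])
# http://localhost:8080/auth/realms/demo/protocol/openid-connect/token
```

The returned dictionary could be merged into the 'remote_app' entry, which keeps the host and realm in one place when you later switch from http to https.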
Start the webserver. On the login page, click on the ‘G’ icon and then Sign In.
This takes you to the Keycloak login page, where you key in the user credentials.
On successful login, you should be taken back to the Airflow home page with the list of DAGs.
Handling HTTPS
Keycloak standalone comes with a self-signed certificate. If you look in the standalone/configuration directory, there is a standalone.xml that points to application.keystore.
Airflow cannot verify this certificate, so we need to do the following:
Create a CA certificate and use it to sign a server certificate for Keycloak. This answer on stackoverflow was helpful. Note that the DNS entry used when creating the server certificate should point to the Keycloak host.
Convert the PEM private key and certificate to JKS for configuring in Keycloak
openssl pkcs12 -inkey serverkey.pem -in servercert.pem -export -out keys.pkcs12 -nodes
keytool -importkeystore -srckeystore keys.pkcs12 -srcstoretype pkcs12 -destkeystore generated.jks
Configure this generated JKS keystore in Keycloak, in place of the application.keystore referenced in standalone.xml
The CA certificate should be configured in the webserver.py of Airflow as follows
# os.environ['OAUTHLIB_INSECURE_TRANSPORT'] = '1'  # not needed once Keycloak is served over https
os.environ['SSL_CERT_FILE']='<path to CA cert file>'
Change http to https in all the Keycloak URLs in webserver.py
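Before restarting Airflow, it can help to check from plain Python that the CA certificate really validates the Keycloak server certificate. Below is a minimal sketch using only the standard library. The function names are hypothetical, and it passes the CA file explicitly instead of relying on SSL_CERT_FILE, so it checks the certificate chain but not how Airflow picks the file up.

```python
import ssl
import urllib.error
import urllib.request


def ca_context(ca_file):
    """Create an SSL context that trusts only the given CA certificate file."""
    return ssl.create_default_context(cafile=ca_file)


def can_reach(url, ca_file):
    """Return True if `url` answers over TLS with a certificate signed by `ca_file`."""
    try:
        with urllib.request.urlopen(url, context=ca_context(ca_file), timeout=10):
            return True
    except urllib.error.HTTPError:
        # The server answered with an HTTP error status, so the TLS handshake worked.
        return True
    except (ssl.SSLError, OSError) as exc:
        # Covers certificate verification failures, a missing CA file and network errors.
        print("TLS check failed:", exc)
        return False


# Example call (fill in your own values before running):
# can_reach("https://<keycloak host and port>/auth/realms/<your realm>/.well-known/openid-configuration",
#           "<path to CA cert file>")
```

A False result with a "certificate verify failed" message usually means the CA file does not match the certificate Keycloak is serving, or the host name in the URL is not covered by the server certificate.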
Enjoy!
Problems
Refreshing the OAuth token needs to be handled separately; the framework does not do it.
Logging out of Airflow does not log the user out of Keycloak. The session is still active, so the logout API needs to be invoked explicitly. See: https://github.com/apache/airflow/issues/11305
All users get the same role, AUTH_USER_REGISTRATION_ROLE, so RBAC needs to be handled differently.
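For the first two problems, here is a rough sketch of the requests involved. It only builds the HTTP requests and does not send them. The refresh request follows the standard OAuth 2.0 refresh_token grant; the logout request assumes that Keycloak's logout endpoint under the same realm path accepts the refresh token as a form parameter, which you should verify against your Keycloak version. The function names and the sample values are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request


def _form_post(url, form):
    """Build (not send) a form-encoded POST request."""
    return Request(url, data=urlencode(form).encode("ascii"), method="POST")


def refresh_request(base_url, client_id, client_secret, refresh_token):
    """Request that exchanges a refresh token for a new access token."""
    return _form_post(base_url + "token", {
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
        "client_id": client_id,
        "client_secret": client_secret,
    })


def logout_request(base_url, client_id, client_secret, refresh_token):
    """Request that asks Keycloak to end the session behind the refresh token.

    Assumption: the logout endpoint accepts these form parameters.
    """
    return _form_post(base_url + "logout", {
        "refresh_token": refresh_token,
        "client_id": client_id,
        "client_secret": client_secret,
    })


# Placeholder values, for illustration only.
base = "https://keycloak.example.com/auth/realms/demo/protocol/openid-connect/"
req = refresh_request(base, "airflow", "client-secret", "a-refresh-token")
print(req.get_method(), req.full_url)
```

Either request can then be sent with urllib.request.urlopen. Where to call them from inside Airflow is a separate question that this sketch does not answer.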
I am using Airflow 2.1.0 and Keycloak with a self-signed certificate and, no matter what I do, after the Keycloak login I get the error "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate". Do you have any suggestions?