Superset Integration
Your Hydrolix stack includes Superset, an open-source data visualization platform. You can use Superset to query, visualize, and analyze data ingested into Hydrolix.
Accessing your stack's Superset
Your stack's Superset instance runs its web interface on port 9088. To visit it in your web browser, go to https://YOUR-HYDROLIX-HOSTNAME.hydrolix.live:9088.
The first time you visit your stack's Superset, log in with username admin and password admin. For releases newer than 2.11.7, the default password is $host-rogue-derby. You should then update that account's password by selecting Settings > Info from Superset's main menu bar, then clicking the 🔒 Reset Password button. If you wish, you can also create non-admin accounts via Settings > List Users.
Connecting Superset to Hydrolix
Superset ships with a default connection to Hydrolix already created:
This connection uses the SQLAlchemy driver and connects through a local NGINX proxy that load-balances requests to the head service.
The connection URL to use is:
clickhouse+native://127.0.0.1/sample_project
This instructs Superset to connect locally through the load balancer and to use the default project sample_project.
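If you want to double-check this connection outside the Superset UI, the following sketch (illustrative only, assuming the clickhouse-sqlalchemy driver that Superset uses is installed and that you run it from the Superset container, where 127.0.0.1 reaches the local NGINX proxy) opens the same URL with SQLAlchemy and runs a trivial query:
from sqlalchemy import create_engine, text

# Same URL Superset uses: the local NGINX proxy and the default project sample_project
engine = create_engine("clickhouse+native://127.0.0.1/sample_project")
with engine.connect() as conn:
    # A trivial query; a result of 1 means the proxy and head service are reachable
    print(conn.execute(text("SELECT 1")).scalar())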
If you find yourself unable to establish a connection here, please contact Hydrolix support.
Working with your data in Superset
Once Superset recognizes your Hydrolix stack as a datasource, you can use it to power every Superset feature from simple SQL queries to complex visualizations.
To do this, when setting up a new query or dataset within Superset, select your Hydrolix-based database as its datasource:
Note that Superset attaches the label clickhouse to the name you chose for your Hydrolix datasource. This references the ClickHouse SQL driver that Superset uses to communicate with Hydrolix's query engine.
Customise your Superset
By default, Hydrolix uses a standard configuration for Superset:
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
# This file is included in the final Docker image and SHOULD be overridden when
# deploying the image to prod. Settings configured here are intended for use in local
# development environments. Also note that superset_config_docker.py is imported
# as a final step as a means to override "defaults" configured here
#
import logging
import os
from cachelib.redis import RedisCache
from celery.schedules import crontab
logger = logging.getLogger()
def get_env_variable(var_name, default=None):
    """Get the environment variable or raise exception."""
    try:
        return os.environ[var_name]
    except KeyError:
        if default is not None:
            return default
        else:
            error_msg = "The environment variable {} was missing, abort...".format(var_name)
            raise EnvironmentError(error_msg)
DATABASE_DIALECT = get_env_variable("DATABASE_DIALECT")
DATABASE_USER = get_env_variable("DATABASE_USER")
DATABASE_PASSWORD = get_env_variable("DATABASE_PASSWORD")
DATABASE_HOST = get_env_variable("DATABASE_HOST")
DATABASE_PORT = get_env_variable("DATABASE_PORT")
DATABASE_DB = get_env_variable("DATABASE_DB")
SECRET_KEY = get_env_variable("SUPERSET_SECRET_KEY")
# The SQLAlchemy connection string.
SQLALCHEMY_DATABASE_URI = "%s://%s:%s@%s:%s/%s" % (
    DATABASE_DIALECT,
    DATABASE_USER,
    DATABASE_PASSWORD,
    DATABASE_HOST,
    DATABASE_PORT,
    DATABASE_DB,
)
REDIS_HOST = get_env_variable("REDIS_HOST")
REDIS_PORT = get_env_variable("REDIS_PORT")
REDIS_CELERY_DB = get_env_variable("REDIS_CELERY_DB", 0)
REDIS_RESULTS_DB = get_env_variable("REDIS_RESULTS_DB", 1)
class CeleryConfig(object):
    BROKER_URL = f"redis://{REDIS_HOST}:{REDIS_PORT}/{REDIS_CELERY_DB}"
    CELERY_IMPORTS = ("superset.sql_lab", "superset.tasks")
    CELERY_RESULT_BACKEND = f"redis://{REDIS_HOST}:{REDIS_PORT}/{REDIS_RESULTS_DB}"
    CELERYD_LOG_LEVEL = "DEBUG"
    CELERYD_PREFETCH_MULTIPLIER = 1
    CELERY_ACKS_LATE = False
    CELERYBEAT_SCHEDULE = {
        "reports.scheduler": {
            "task": "reports.scheduler",
            "schedule": crontab(minute="*", hour="*"),
        },
        "reports.prune_log": {
            "task": "reports.prune_log",
            "schedule": crontab(minute=10, hour=0),
        },
    }
CELERY_CONFIG = CeleryConfig
FEATURE_FLAGS = {"ALERT_REPORTS": True}
ALERT_REPORTS_NOTIFICATION_DRY_RUN = True
WEBDRIVER_BASEURL = "http://superset:8088/"
# The base URL for the email report hyperlinks.
WEBDRIVER_BASEURL_USER_FRIENDLY = WEBDRIVER_BASEURL
SQLLAB_CTAS_NO_LIMIT = True
RESULTS_BACKEND = RedisCache(host=REDIS_HOST, port=REDIS_PORT, db=REDIS_RESULTS_DB)
CACHE_CONFIG = {
    "CACHE_TYPE": "redis",
    "CACHE_DEFAULT_TIMEOUT": 60 * 60 * 8,  # seconds
    "CACHE_KEY_PREFIX": "superset_results",
    "CACHE_REDIS_URL": f"redis://{REDIS_HOST}:{REDIS_PORT}/{REDIS_RESULTS_DB}",
}
DATA_CACHE_CONFIG = {
    "CACHE_TYPE": "redis",
    "CACHE_DEFAULT_TIMEOUT": 60 * 60 * 8,  # seconds
    "CACHE_KEY_PREFIX": "superset_results",
    "CACHE_REDIS_URL": f"redis://{REDIS_HOST}:{REDIS_PORT}/{REDIS_RESULTS_DB}",
}
ENABLE_PROXY_FIX = True
PROXY_FIX_CONFIG = {"x_for": 1, "x_proto": 1, "x_host": 1, "x_port": 1, "x_prefix": 0}
FEATURE_FLAGS = {"DASHBOARD_NATIVE_FILTERS": True, "ENABLE_TEMPLATE_PROCESSING": True}
try:
    import superset_config_local
    from superset_config_local import *  # noqa
    logger.info(f"Loaded your Docker configuration at [{superset_config_local.__file__}]")
except ImportError:
    logger.info("Using default Docker config...")
except Exception:
    logger.exception("error loading local config")
You can update this configuration file to include the specific features that fit your needs.
For example, you can add the following variable:
MAPBOX_API_KEY = ''
and set it to your Mapbox API key to enable maps in your dashboards.
You may also want to use a different authentication mechanism such as OAuth.
You can create your own configuration file and completely override the one provided by Hydrolix using hdxctl.
An example would be:
hdxctl files set hdxcli-xxxx superset_config.py superset
Don't override the whole config
Superset lets you add extra functionality this way, but you don't need to provide the full configuration: you can set only the variables you need.
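For example, a minimal sketch of an override file might set only the values you care about (the key value below is a placeholder):
# Minimal superset_config.py override: everything not listed here keeps
# the Hydrolix defaults shown above.
MAPBOX_API_KEY = ''  # placeholder: set to your own Mapbox API key
SUPERSET_WEBSERVER_TIMEOUT = 300  # optional: raise the chart/dashboard timeout (seconds, see below)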
This doesn't update the cluster
The command only saves the configuration file to S3. To apply this configuration, you need to update your cluster so that Superset restarts and uses the new settings.
hdxctl update hdxcli-abcde hdx-xyz
Setting up OAuth
This is an example setup using Google OAuth:
- Go to https://console.developers.google.com/apis/credentials
- Click Create Credentials, then click OAuth Client ID in the drop-down menu
- Enter the following:
- Application Type: Web Application
- Name: Superset
- Authorized JavaScript Origins:
https://$host.hydrolix.live
- Authorized Redirect URLs:
https://$host.hydrolix.live:9088/oauth-authorized/google
- Click Create
- Copy the Client ID and Client Secret from the ‘OAuth Client’ modal
- On the host running HDXCTL, create a new file (superset_config.py) like the following:
from flask_appbuilder.security.manager import AUTH_OID, AUTH_REMOTE_USER, AUTH_DB, AUTH_LDAP, AUTH_OAUTH
CSRF_ENABLED = True
AUTH_USER_REGISTRATION = True
AUTH_USER_REGISTRATION_ROLE = 'Admin'
AUTH_TYPE = AUTH_OAUTH
OAUTH_PROVIDERS = [
    {
        'name': 'google',
        'icon': 'fa-google',
        'token_key': 'access_token',
        'remote_app': {
            'client_id': 'XXXXXXXXXXXXXXX.apps.googleusercontent.com',
            'client_secret': 'YYYYYYYYYYYYYYY',
            'api_base_url': 'https://www.googleapis.com/oauth2/v2/',
            'client_kwargs': {
                'scope': 'email profile'
            },
            'request_token_url': None,
            'access_token_url': 'https://accounts.google.com/o/oauth2/token',
            'authorize_url': 'https://accounts.google.com/o/oauth2/auth',
        }
    }
]
This instructs Superset to allow new users to sign up using the OAuth client ID created in the Google Cloud console. New users will be registered with the Admin role; you can change this by modifying AUTH_USER_REGISTRATION_ROLE in the configuration.
- Save your superset configuration file:
hdxctl files set hdxcli-xxxx superset_config.py superset
- Update your cluster to take into account the new configuration file:
hdxctl update hdxcli-xxxx hdx-xyz
Changing default timeout
By default, Superset is configured with a 60-second timeout for charts and dashboards, but sometimes you may be requesting more data and this timeout may be too short.
You can change the default timeout by editing the configuration file to include the following variable:
SUPERSET_WEBSERVER_TIMEOUT = 300
Once you have modified your configuration file, set it with hdxctl files set, as in the example above.
After setting your configuration, you also need to update the worker timeout. You can do that with hdxctl:
hdxctl update --superset-timeout 300 hdxcli-xxxx hdx-xyz
Using Query Pools with User Identification
Hydrolix allows you to select alternate query pools with the query option hdx_query_pool_name. You can also add a comment to the query with hdx_query_admin_comment, using HTTP headers, query parameters, or a custom SETTINGS clause.
In this example, we route users who have the soc role to a dedicated query pool. We also automatically add the name of the user generating the query, whether it comes from SQL Lab or a dashboard.
import re
def SQL_QUERY_MUTATOR(sql, username, security_manager, database):
    ## Enable query modification only for SELECT statements
    select_search = re.search(r'^SELECT', sql, flags=re.I)
    if select_search:
        ## Get current user information
        user = security_manager.find_user(username=username)
        ## Get the user's current roles
        user_roles = security_manager.get_user_roles(user)
        ## Default query pool, overridden if the user has the "soc" role
        pool = "query-peer0"
        for user_role in user_roles:
            if user_role.name == "soc":
                pool = "soc"
                break
        ## Check if the query already contains a SETTINGS clause
        settings_search = re.search(r'.*(SETTINGS\s+hdx_.*)\s+LIMIT', sql, flags=re.I | re.M)
        to_ret = ""
        ## If the query already has settings, extract them and add a comma so our
        ## extra settings can be appended to the existing clause
        if settings_search:
            settings = settings_search.group(1)
            query = re.sub(settings, '', sql, flags=re.I | re.M)
            to_ret += f"{query} {settings}, "
        else:
            to_ret += f"{sql} SETTINGS "
        ## Add the user information comment and query pool setting to the query
        to_ret += f"hdx_query_admin_comment='User: {user}', hdx_query_pool_name='{pool}'"
        return to_ret
    else:
        return sql
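As a rough illustration of what the mutator produces, the sketch below calls it with hypothetical stand-in objects for Superset's security manager (the Fake* classes are illustrative only, not real Superset APIs, and the snippet assumes it is pasted into the same configuration file so SQL_QUERY_MUTATOR is in scope):
## Illustrative only: hypothetical stubs standing in for Superset's security manager
class FakeRole:
    def __init__(self, name):
        self.name = name

class FakeUser:
    def __str__(self):
        return "alice"

class FakeSecurityManager:
    def find_user(self, username):
        return FakeUser()
    def get_user_roles(self, user):
        return [FakeRole("soc")]

print(SQL_QUERY_MUTATOR(
    "SELECT count() FROM sample_project.sample_table LIMIT 100",
    "alice", FakeSecurityManager(), None))
## Prints roughly: SELECT count() FROM sample_project.sample_table LIMIT 100 SETTINGS
##   hdx_query_admin_comment='User: alice', hdx_query_pool_name='soc'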
Providing user information in the comment setting allows you to monitor active queries and see who is running them. This information appears in our query monitoring dashboard:
Getting more help
If you need more help with having your Superset and Hydrolix installations working together, or you'd just like to learn more about this integration, please contact Hydrolix support.