Your Hydrolix stack includes Superset, an open-source data visualization platform. You can use Superset to query, visualize, and analyze data ingested into Hydrolix.

Accessing your stack's Superset

Your stack's Superset instance runs its web interface at port 9088. To visit it in your web browser, go to https://YOUR-HYDROLIX-HOSTNAME.hydrolix.live:9088.

The first time you visit your stack's Superset, log in with username admin. For releases up to 2.11.7 the default password is admin; for later releases the default password is $host-rogue-derby. You should then update that account's password by selecting Settings > Info from Superset's main menu bar and clicking the 🔒 Reset Password button. If you wish, you can also create non-admin accounts via Settings > List Users.

Connecting Superset to Hydrolix

Superset ships with a default connection to Hydrolix already created:

This connection uses the ClickHouse SQLAlchemy driver and connects through a local NGINX proxy, which load-balances requests to the head service.
The connection URL to use is:

clickhouse+native://127.0.0.1/sample_project

This instructs Superset to connect locally through the load balancer and to use the default project, sample_project.
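As a sketch of how such a URL is assembled, assuming the clickhouse+native scheme provided by the ClickHouse SQLAlchemy dialect (the helper function name and the alternate host below are illustrative, not part of the stack):

```python
# Build a Superset/SQLAlchemy connection URL for a Hydrolix project.
# The "clickhouse+native" scheme comes from the ClickHouse SQLAlchemy
# dialect that Superset uses to talk to Hydrolix's query engine.
def hydrolix_connection_url(project: str, host: str = "127.0.0.1") -> str:
    """Return a SQLAlchemy URL pointing Superset at a Hydrolix project."""
    return f"clickhouse+native://{host}/{project}"

# The default connection shipped with the stack:
print(hydrolix_connection_url("sample_project"))
# clickhouse+native://127.0.0.1/sample_project
```

Swapping in a different project name is enough to point Superset at another Hydrolix project through the same local load balancer.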

If you find yourself unable to establish a connection here, please contact Hydrolix support.

Working with your data in Superset

Once Superset recognizes your Hydrolix stack as a datasource, you can use it to power every Superset feature from simple SQL queries to complex visualizations.

To do this, when setting up a new query or dataset within Superset, select your Hydrolix-based database as its datasource:

Selecting a Hydrolix-based datasource

Note that Superset attaches the label clickhouse to the name you chose for your Hydrolix datasource. This refers to the ClickHouse SQL driver that Superset uses to communicate with Hydrolix's query engine.

Customizing your Superset configuration

By default, Hydrolix uses a standard configuration for Superset:

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.
#
# This file is included in the final Docker image and SHOULD be overridden when
# deploying the image to prod. Settings configured here are intended for use in local
# development environments. Also note that superset_config_docker.py is imported
# as a final step as a means to override "defaults" configured here
#
import logging
import os

from cachelib.redis import RedisCache
from celery.schedules import crontab

logger = logging.getLogger()


def get_env_variable(var_name, default=None):
    """Get the environment variable or raise exception."""
    try:
        return os.environ[var_name]
    except KeyError:
        if default is not None:
            return default
        else:
            error_msg = "The environment variable {} was missing, abort...".format(var_name)
            raise EnvironmentError(error_msg)


DATABASE_DIALECT = get_env_variable("DATABASE_DIALECT")
DATABASE_USER = get_env_variable("DATABASE_USER")
DATABASE_PASSWORD = get_env_variable("DATABASE_PASSWORD")
DATABASE_HOST = get_env_variable("DATABASE_HOST")
DATABASE_PORT = get_env_variable("DATABASE_PORT")
DATABASE_DB = get_env_variable("DATABASE_DB")
SECRET_KEY = get_env_variable("SUPERSET_SECRET_KEY")

# The SQLAlchemy connection string.
SQLALCHEMY_DATABASE_URI = "%s://%s:%[email protected]%s:%s/%s" % (
    DATABASE_DIALECT,
    DATABASE_USER,
    DATABASE_PASSWORD,
    DATABASE_HOST,
    DATABASE_PORT,
    DATABASE_DB,
)

REDIS_HOST = get_env_variable("REDIS_HOST")
REDIS_PORT = get_env_variable("REDIS_PORT")
REDIS_CELERY_DB = get_env_variable("REDIS_CELERY_DB", 0)
REDIS_RESULTS_DB = get_env_variable("REDIS_RESULTS_DB", 1)


class CeleryConfig(object):
    BROKER_URL = f"redis://{REDIS_HOST}:{REDIS_PORT}/{REDIS_CELERY_DB}"
    CELERY_IMPORTS = ("superset.sql_lab", "superset.tasks")
    CELERY_RESULT_BACKEND = f"redis://{REDIS_HOST}:{REDIS_PORT}/{REDIS_RESULTS_DB}"
    CELERYD_LOG_LEVEL = "DEBUG"
    CELERYD_PREFETCH_MULTIPLIER = 1
    CELERY_ACKS_LATE = False
    CELERYBEAT_SCHEDULE = {
        "reports.scheduler": {
            "task": "reports.scheduler",
            "schedule": crontab(minute="*", hour="*"),
        },
        "reports.prune_log": {
            "task": "reports.prune_log",
            "schedule": crontab(minute=10, hour=0),
        },
    }


CELERY_CONFIG = CeleryConfig

FEATURE_FLAGS = {"ALERT_REPORTS": True}
ALERT_REPORTS_NOTIFICATION_DRY_RUN = True
WEBDRIVER_BASEURL = "http://superset:8088/"
# The base URL for the email report hyperlinks.
WEBDRIVER_BASEURL_USER_FRIENDLY = WEBDRIVER_BASEURL

SQLLAB_CTAS_NO_LIMIT = True

RESULTS_BACKEND = RedisCache(host=REDIS_HOST, port=REDIS_PORT, db=REDIS_RESULTS_DB)

CACHE_CONFIG = {
    "CACHE_TYPE": "redis",
    "CACHE_DEFAULT_TIMEOUT": 60 * 60 * 8,  # seconds
    "CACHE_KEY_PREFIX": "superset_results",
    "CACHE_REDIS_URL": f"redis://{REDIS_HOST}:{REDIS_PORT}/{REDIS_RESULTS_DB}",
}

DATA_CACHE_CONFIG = {
    "CACHE_TYPE": "redis",
    "CACHE_DEFAULT_TIMEOUT": 60 * 60 * 8,  # seconds
    "CACHE_KEY_PREFIX": "superset_results",
    "CACHE_REDIS_URL": f"redis://{REDIS_HOST}:{REDIS_PORT}/{REDIS_RESULTS_DB}",
}

ENABLE_PROXY_FIX = True
PROXY_FIX_CONFIG = {"x_for": 1, "x_proto": 1, "x_host": 1, "x_port": 1, "x_prefix": 0}

# Merge additional flags into FEATURE_FLAGS rather than reassigning it,
# so the ALERT_REPORTS flag set above is preserved.
FEATURE_FLAGS.update({"DASHBOARD_NATIVE_FILTERS": True, "ENABLE_TEMPLATE_PROCESSING": True})


try:
    import superset_config_local
    from superset_config_local import *  # noqa

    logger.info(f"Loaded your Docker configuration at " f"[{superset_config_local.__file__}]")
except ImportError:
    logger.info("Using default Docker config...")
except Exception:
    logger.exception("error loading local config")

You can update this configuration file to enable the specific features you need.

For example, you can set the variable MAPBOX_API_KEY to a valid Mapbox API key to enable map visualizations in your dashboards.
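A minimal override file might look like the following sketch. The key value is a placeholder, and the extra feature flag is only an illustration; include just the variables you want to change:

```python
# superset_config.py -- minimal override uploaded with hdxctl.
# Only the settings defined here change; everything else keeps
# the defaults from the stack's standard configuration.

# Placeholder value -- substitute your real Mapbox API key.
MAPBOX_API_KEY = "pk.your-mapbox-token-here"

# Illustrative example of enabling an additional Superset feature flag.
FEATURE_FLAGS = {"DASHBOARD_CROSS_FILTERS": True}
```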

You may also want to use a different authentication mechanism, such as OAuth.

You can also create your own configuration file and completely override the one provided by Hydrolix, using hdxctl.

An example would be:

hdxctl files set hdxcli-xxxx superset_config.py superset

📘

Don't override the whole config

You don't need to supply the full configuration file: Superset layers your additions on top of the defaults, so include only the variables you want to set.

🚧

This doesn't update the cluster

The command only saves the configuration file to S3. To apply this configuration to Superset, you need to update your cluster so that Superset restarts and uses the new settings.
hdxctl update hdxcli-abcde hdx-xyz

Setting up OAuth

This is an example setup using Google OAuth:

  • Go to https://console.developers.google.com/apis/credentials
  • Click Create Credentials, then click OAuth Client ID in the drop-down menu
  • Enter the following:
    • Application Type: Web Application
    • Name: Superset
    • Authorized JavaScript Origins: https://$host.hydrolix.live
    • Authorized Redirect URLs: https://$host.hydrolix.live:9088/oauth-authorized/google
  • Click Create
    • Copy the Client ID and Client Secret from the ‘OAuth Client’ modal
  • On the host running HDXCTL, create a new file (superset_config.py) like the following:
from flask_appbuilder.security.manager import AUTH_OAUTH

CSRF_ENABLED = True
AUTH_USER_REGISTRATION = True
AUTH_USER_REGISTRATION_ROLE = 'Admin'
AUTH_TYPE = AUTH_OAUTH
OAUTH_PROVIDERS = [
    {
        'name': 'google',
        'icon': 'fa-google',
        'token_key': 'access_token',
        'remote_app': {
            'client_id': 'XXXXXXXXXXXXXXX.apps.googleusercontent.com',
            'client_secret': 'YYYYYYYYYYYYYYY',
            'api_base_url': 'https://www.googleapis.com/oauth2/v2/',
            'client_kwargs': {
                'scope': 'email profile'
            },
            'request_token_url': None,
            'access_token_url': 'https://accounts.google.com/o/oauth2/token',
            'authorize_url': 'https://accounts.google.com/o/oauth2/auth',
        }
    }
]

This instructs Superset to allow new users to sign up using the OAuth client created in the Google Cloud console. New users are registered with the Admin role; you can assign a different role by modifying the configuration.
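For example, a one-line sketch of that change, assuming you want Superset's built-in Gamma role (broad read access, no administrative rights) for self-registered users:

```python
# Grant newly self-registered OAuth users the built-in Gamma role
# (read-mostly access) instead of Admin.
AUTH_USER_REGISTRATION_ROLE = 'Gamma'
```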

  • Save your superset configuration file:
hdxctl files set hdxcli-xxxx superset_config.py superset
  • Update your cluster to take into account the new configuration file:
hdxctl update hdxcli-xxxx hdx-xyz

Getting more help

If you need more help getting your Superset and Hydrolix installations working together, or you'd just like to learn more about this integration, please contact Hydrolix support.

