Row-level Access Control

Implement data access controls on tables, using filter criteria attached to roles

Overview

Row-level access control is a core data security feature of the Hydrolix platform. Using row policies and roles, data administrators can implement granular rules for users accessing their datasets.

Data administrators use the Config API to configure row policies on a table and associate roles to accounts.

Changes take place immediately. The query system requests an account's combined row policies, which it then enforces by installing a ClickHouse row policy on the table engine reading the Hydrolix raw partitions.

Use cases

Row-level access controls are useful under a variety of circumstances. They

  • allow multiple distinct user populations to be served from the same table, each with a different row policy
  • decrease data ingestion complexity when data streams arrive from a single source, but serve user populations with distinct data access permissions
  • decrease management effort for table settings and transforms when tables are structurally identical, differing only in data access permissions
  • decrease application or query complexity for cross-table joins where multiple tables were formerly in use
  • support compatibility by retaining default visibility on all rows when no row policies are in effect

Concepts

Hydrolix supports a fully-featured account permissions system that allows definition of roles. Multiple roles can be assigned to users and service accounts, to describe rich access permissions.

Row-level access control is implemented by defining row policies and associating them with roles.

Hydrolix row policies

  • apply to a single table; consider raw and summary data tables, see Policy design
  • can be attached to multiple roles; this allows granular data access rules, see Roles and row policies
  • contain arbitrary SQL filters determining row visibility, see Filters
  • combine predictably and deterministically, see Combining row policies

When an authorized account initiates a query, the final expression of row policies for the account's UUID is installed into the query engine.

Only rows passing this final, combined filter expression are visible to this query.

Limitations

Enforcement occurs in the query subsystem of a Hydrolix cluster.

All query interfaces apply identical access control logic as defined in this page. See Search Tools for a listing of interfaces.

Applications outside the cluster with access to raw Hydrolix partitions aren't subject to row-level access controls. For example, this feature isn't implemented in the Hydrolix Connector for Apache Spark because the client has direct access to the raw partitions.

Policy design

When designing a policy to enforce data access rules, consider these three aspects of the feature:

  • Each table requires its own row policy definitions

    All defined filters are configured and enforced on a single table. Don't forget summary tables derived from a raw table. For multiple tables in the same project, define row policies for each.

  • Row policies combine across one or more roles

    All row policies associated with each role are grouped according to rules described in Combining row policies.

  • Principle of least privilege: read-only

    Assigning read-only permissions prevents opportunities for permission escalation. Here is a non-exhaustive list of techniques for row-level access permission circumvention. Avoid these by assigning read-only permissions and applying the principle of least privilege.

⚠️

Use of Hydrolix row-level access control works best and makes the most sense for read-only accounts.

Accounts with any of the following permissions could change the column definitions or contents of a table:

  • permissions to modify the existing table
  • permissions to modify transforms
  • permissions to run ALTER on existing tables
  • permissions to create summary tables
  • permissions to modify or delete row policies
  • permissions to change their own permissions

Roles are intentionally composable in an additive fashion to express rich access rules. Roles can be empty, contain either RBAC permissions or row policies, or both. Here are two policy strategies describing how to compose roles, permissions, and row policies in the context of positive permissions;

Policy strategy: Access control on sensitive tables

Accounts using the default read_only role include permissions to view contents of any project or table in the cluster.

A viable strategy is to install appropriate row policies on a specific table with sensitive data, but allow other access to continue as before.

The technique demonstrated in the Row-level Access How-to shows row policies in an otherwise empty role. This assumes the querying accounts also have the read_only role.

Policy strategy: Tightest access control

An alternate strategy takes advantage of the ability to construct an RBAC policy allowing permissions to a specific table. Applying that policy to a role with the desired row policies enforces tightest control.

This type of policy design limits data access in the Hydrolix cluster to the visible rows in the specified table.

Logical diagram

This diagram illustrates the results of combining two roles, each with row policies. One of the row policies, section_sports_eu, demonstrates two restrictive and two non-restrictive filters.

Roles and row policies

This approach enables flexible, multi-dimensional access control, ideal for environments where users need visibility across multiple departments or regions without compromising security boundaries.

When row policies from different roles are combined into the final row policy expression, they're joined with a SQL logical OR operation.

This allows roles to compose for greater access. See also Combining row policies.

Filters

Filters are arbitrary SQL expressions which evaluate to a boolean or a uint8.

A uint8 is converted to a boolean using toBool meaning that 0 is FALSE and anything else is TRUE.

A row is accessible if the filter evaluates to true.

If the filter evaluates to false for a row, that row is inaccessible to the user's query. The query system won't return the row from the Hydrolix raw partition data to the user's query.

When multiple row policies or multiple roles with row policies are in effect, the filters are combined. See Combining row policies.

Filter verification and constraints

When the Config API receives a request to create or modify a filter, it calls the core database engine table function hdx_verify_filter_expr with the candidate filter. This ensures that the database engine can enforce these constraints on the filter:

  • the primary column of a table can't be used in a filter
  • aggregation functions are forbidden: the filter must operate only on data in the row
  • non-aggregation functions are acceptable
  • alias columns can't be used, as they don't exist in the underlying partitons
  • when referring to renamed columns, only the current column name is valid
  • column names in the SQL filter must exist in the autoview, the union of all columns defined in all transforms

Combining row policies

Each Hydrolix row policy consists of a filter and the boolean restrictive field.

The following rules describe how row policies combine.

Row policies from a single role combine into an intermediate expression.

  • The filters from all restrictive row policies, restrictive=True, are joined with the SQL logical AND operator into a single expression
  • The filters from all non-restrictive row policies, restrictive=False, are joined with SQL logical OR operator into a single expression
  • The resulting non-restrictive and restrictive expressions are joined with SQL logical AND operator

Row policies from different roles combine into a final expression.

  • Intermediate policy expressions for each role are built
  • All intermediate expressions are joined with the SQL logical OR operator

The logical diagram illustrates the rules of row policy and role combination using a table similar to that shown in the Row-level Access How-to.

See also Policy design.

Query-time operation

When the query head receives a query over any interface, it uses the authenticated account's UUID to look up applicable row policies. It installs the final expression of the combined row policies for the account in question, then continues execution of the query, dispatching instructions to the query peers, and performing the final aggregation and sorting of the responses.

The query head refreshes its view of row policies using private Config API endpoint /config/v1/roles/row_policies/ on a frequency specified by the tunable user_acl_refresh_interval_secs, by default 30 seconds.

Query peers, responsible for reading the raw partitions, initialize the ClickHouse engine with true for the internal function setEnabledUsersWithoutRowPoliciesCanReadRows. This allows smooth forward and backward compatibility for tables without any row policies.

Related material