Data Access Policy Design

Implement data access controls using the composable features of row- and column-level access control. Use calculated columns for data masking.

This page describes how the features work together.

Design guidance⚓︎

When designing a policy to enforce data access rules, consider

Every table is distinct: Every single summary and raw table needs its own column and row-policies. For multiple tables in the same project, define row filters and blocked columns for each table.
Scope of permissions: Permissions can be scoped to tables, projects, organizations and global. For maximal control, apply the principle of least privilege.

Assigning scoped permissions, especially read-only permissions, prevents opportunities for permission escalation. See limitations.
Row policy merging rules: Row policies merge using a logical OR into a final filter. This final expression is the union of all intermediate filter expressions across multiple roles. See combining row policies.
Column policy merging rules: Column policies merge across multiple roles into a set of allowed columns. The allowed columns set is the complement of the intersection of all blocked columns across all roles. See combining column policies.
Data masking: Implement data masking using calculated columns to create a derived, masked column and column-level access control to block access to the original, unmasked column.
More roles mean more access: The Hydrolix access and permissions system is designed so that adding more roles to an account increases access.
Enforcement at query time: The query system periodically retrieves and reloads row policies from private Config API endpoint /config/v1/roles/row_policies/. When handling a query, it retrieves users permissions and column policies from private endpoint /config/v1/users/sqlperms/.
System boundaries: See Limitations

Strategies⚓︎

Data access features are intentionally designed to be composable and additive. Roles are the central object for applying and managing data access.

Adding more roles to an account increases access.

Data administrators design and construct roles from sets of permissions like select_sql, row policy filters, and blocked columns to implement data access controls.

This section describes two common data access strategies.

Strict composition⚓︎

Define table-scoped roles for each desired level of data access.

This strategy capitalizes on the additive nature of access permissions and depends on default-deny behavior. More roles and permissions means more access.

In this approach, the data administrator

creates a role with table-scoped permissions, minimally select_sql
creates row and column policies associated with the role
assigns the role to any accounts requiring this level of access
ensures none of these accounts have write or modification permissions; see the warning in limitations

Employ this strategy for clearest access controls over different kinds of tables and data. The clarity of controls is offset by a proliferation of roles.

Sensitive tables only⚓︎

Define a broader set of read permissions to all tables in a project or all projects in a cluster. Refine access controls by applying specific row and column policies to sensitive tables.

In this approach, the data administrator

creates a base role with broader scoped permissions, minimally select_sql on project or organization, even read_only global preset is possible
creates another role with table-scoped permissions, which should include select_sql
creates row and column policies associated with the latter role
assigns the role to any accounts requiring this level of access
ensures none of these accounts have write or modification permissions; see the warning in limitations

Accounts with these roles have general read access rights. For specific tables, policies constrain access rights.

Employ this strategy for simplest expression of access controls over similar tables and data.

The Row-level Access Control page demonstrates this approach with multiple row policies in an otherwise empty role which combine with a role like the global preset read_only.

Entity relationship diagrams⚓︎

Simplified entity relationship diagram⚓︎

erDiagram
  Account ||--o{ Role : can-contain-multiple
  Role ||--o{ Policy : can-contain-multiple
  Role ||--o{ RowPolicy : can-contain-multiple
  Role ||--o{ ColumnPolicy : can-contain-multiple
  ColumnPolicy ||--o{ Column : blocks-multiple
  RowPolicy ||--|| RowPolicyFilter : holds-multiple
  Policy ||--o{ ScopedPermission : can-contain-multiple

Full entity relationship diagram⚓︎

Full entity relationship diagram

erDiagram
  Account ||--o{ Role : holds
  Account {
    string name
    type User_or_Service
  }
  Role ||--o{ Policies : can-contain-multiple
  Role {
    int id
    string name UK
  }
  Role ||--o{ RowPolicies : can-contain-multiple
  Role ||--o{ ColumnPolicies : can-contain-multiple
  ColumnPolicies ||--o{ ColumnPolicy : can-contain-multiple
  ColumnPolicy {
    string name UK
    string table
  }
  ColumnPolicy ||--o{ Column : blocks
  Column {
    string name UK
  }
  RowPolicies ||--o{ RowPolicy : can-contain-multiple
  RowPolicy {
    string name UK
    string table
    string filter
    boolean restrictive
  }
  Policies ||--o{ ScopedPermission : can-contain-multiple
  ScopedPermission ||--|| Scope : contains
  Scope {
    uuid scope_id PK
    string scope_name
    string scope_type
  }
  ScopedPermission ||--o{ Permissions : contains
  Permissions ||--|| Permission : contains
  Permission {
    string name
  }

Limitations⚓︎

Enforcement occurs in the query system of a Hydrolix cluster.

All query interfaces apply identical access control logic as defined in this page. See Search Tools for a listing of interfaces.

Applications outside the cluster with access to raw Hydrolix partitions aren't subject to column- and row-level access controls and can't make use of the calculated columns feature. For example, these features aren't supported in the Hydrolix Connector for Apache Spark because the client has direct access to the raw partitions.

Hydrolix data access controls work best for read-only accounts.

Accounts with any of the following permissions could change column definitions, ingestion behavior, or contents of a table:

permissions to modify the existing table, transform, or views
permissions to run ALTER on existing tables
permissions to create summary tables
permissions to modify or delete column and row policies
permissions to change their own permissions