FeatureQL for the Impatient

This guide helps SQL practitioners understand FeatureQL by comparing it with familiar SQL concepts.

Core concepts

Features are functions

In SQL, we create one table from other tables. In FeatureQL, we define features as formulas based on other features.

Unlike tables, when we persist features with CREATE FEATURES AS, no data is evaluated. Only their definition is stored in the feature registry, similar to a view or function in SQL.

Think of features as database functions with automatic parameter detection. When we reference other features, they automatically become inputs to our formula.

Features are pure functions: their output depends only on their inputs. The exception is EXTERNAL functions, which fetch external data.

We can evaluate features by binding values to their inputs using the BIND functions.

Simple example step by step

Let's create a customer segment.

First, the minimal setup:

Note: We define in the WITH section the features that we do not want to return in the result set.

Now let's create some mockup external data:

We can now map features to existing table columns by using the EXTERNAL_SQL function:

Then finally add our business logic:

Feature organization

Catalog vs local definitions

Catalog features (persisted globally):

  • Defined with the CREATE FEATURES AS statement
  • Available everywhere in the organization
  • Namespaced (e.g., fm.finance.revenue, fm.marketing.campaign_roi)
  • Have granular permissions

Local features (in WITH or SELECT clause):

  • Defined in our query
  • Override catalog features if same name
  • Useful for prototyping

In practice, core features like CUSTOMER_ID, CUSTOMER_LAST_LOGIN_DATE have been created in the catalog by the data team with this:

So end-users can just write:

Entities and INPUT relationship

ENTITY() creates a semantic label for a data domain. INPUT() parameters can reference entities by type annotation:

CUSTOMERS := ENTITY(),
CUSTOMER_ID := INPUT(BIGINT#CUSTOMERS),  -- CUSTOMER_ID references the CUSTOMERS entity as a BIGINT key
sql

This clarify the nature of features, prevents mixing incompatible features (e.g., accidentally joining customer features with product IDs) and allow automated binding with ADD_FIELDS().

Working with features

Type system

  • Automatic type inference: Types are inferred for pure transformations
  • Explicit types at boundaries: Only INPUT() and EXTERNAL_XXX() need explicit type declarations
  • No coercion: Cast explicitly when needed with ::TYPE or CAST(x AS TYPE)
  • Advanced types: Use ARRAY, ROW and ARRAY(ROW(...)) all the time

Metaprogramming

@echo and @literal compute values at compilation time.

This is useful for computed literal parameters without having to use external ways of generating text such as jinja templates.

QUERY := @literal(`SELECT * FROM dim_customers WHERE date = '` || CURRENT_DATE() || `'`),
sql

Hybrid queries

Mix SQL and FeatureQL to integrate with existing SQL based clients.

Array of rows operations

FeatureQL promotes ECM (Entity Centric Modeling) that stores facts as arrays or arrays of rows within each dimension table (also known as One Big Table modeling).

FeatureQL has then strong support for ARRAY and ARRAY of ROWs that are the core data structures of ECM.

More details on Arrays of rows patterns .

Operations on fields

OperationSyntaxDescription
AccessORDERS[amount]Extract one field from an array of rows as a array of values (can also extract multiple fields as an array of ROWs)
ZipZIP(<array1>, <array2>)Zips two arrays as an array of rows (can also zip a single array to get a array of rows)
CombineARRAY_MERGE(<array_of_rows1>, <array_of_rows2>)Combines array of rows
UnwrapUNWRAP(<array_of_rows>)Unwraps an array of rows with one field as a array of values (convenient when the name of the field is not named)
TransformTRANSFORM(<array_of_rows>, <subquery>)Applies transformations to an array of rows (map, filter, aggregate...) as if it was a table
EnrichADD_FIELDS(<feature> TO <array_of_rows>)Adds features as fields to an array of rows by joining on an existing field

Note: Operations on fields are similar to operations on ROW except for ZIP and TRANSFORM.

Operations on rows

OperationSyntaxDescription
ExtractORDERS[1] or ORDERS[1:3]Get the first or first to third row as a array of rows
ConcatenateARRAY_CONCAT(<array_of_rows1>, <array_of_rows2>)Concatenates the rows of the two arrays of rows (we can also deduplicate on a field)

Note: Operations on rows are similar to operations on ARRAY.

Transform operations

You can apply a FeatureQL query to any array of rows as if it was a table (equivalent to a correlated subquery in SQL) using the TRANSFORM function.

This query can:

  • SELECT: generate new fields from existing fields
  • WHERE: filter pre or post aggregation
  • GROUP BY: aggregate metrics on one or more fields
  • ORDER BY: sort by any post-aggregation field
  • LIMIT/OFFSET: limit the post-aggregation result

The query must be self-contained (no external data references) and use only pure transformations. If you need external data, add it to your array first with ARRAY_MERGE() or ADD_FIELDS() before applying TRANSFORM(). You can also chain multiple TRANSFORM functions to apply transformations sequentially.

SQL equivalent comparison

Main patterns

What we wantSQLFeatureQL
Get dataSELECT * FROM customers WHERE id IN (1,2)CUSTOMER_DETAILS := EXTERNAL_SQL(...) then BIND_TABLE(ARRAY[1,2])
Join on PKSELECT * FROM t1 JOIN t2 ON t1.id = t2.idFeatures with same INPUT automatically align
Join on FKSELECT o.order_id, c.customer_name FROM orders o JOIN customers c ON o.order_customer_id = c.customer_idADD_FIELDS(CUSTOMER_NAME TO ZIP(ORDER_CUSTOMER_ID))[customer_name] gets attributes of the foreign key
Group bySELECT SUM(price) as sum_price GROUP BY order_countrySELECT SUM(price) GROUP BY order_country as sum_price - the group by is part of the feature definition

Key differences from SQL

  1. Primary key thinking: INPUT() represents your row identifier (like a primary key)
  2. No implicit behavior: No NULL coalescing, no type coercion
  3. Dependency tracking: Reference a feature = automatic dependency
  4. Nested data model: Arrays of rows instead of normalized tables
  5. Functional purity: Features are formulas, not stored data

Advanced patterns

Historical data and time travel

  • SCD_AT_TIME(): Get the state of a slowly changing dimension (SCD type 2) at a certain point in time
  • ACCUMULATION_AT_TIME(): Get the state of an accumulation of facts at a certain point in time

Testing and reusability

  • VARIANT(): Create feature variations (e.g., test different thresholds, swap sources, etc.)
  • MACRO(): Define reusable functions in FeatureQL
  • UDF(): Define reusable functions in the language of the backend database

Developer tools

SHOW FEATURES;                    -- Browse catalog
SHOW CREATE FEATURES <name>;      -- See definition
SHOW FUNCTION <function>;         -- Get function docs
SHOW DOCS;                        -- Browse documentation
sql
Last update at: 2025/11/06 07:00:15
Last updated: 2025-11-06 07:00:51