Skip to main content
Skip to main content
Edit this page

BigQuery and ClickHouse: equivalent concepts

The tables below map each BigQuery concept to its ClickHouse equivalent — what to use instead, and where the model differs. For function-by-function SQL syntax mapping, see the BigQuery → ClickHouse SQL translation reference. For the end-to-end migration walkthrough, see Migrating from BigQuery to ClickHouse Cloud.

Resource hierarchy

How the platform organizes accounts, logical containers for data, and where compute is provisioned.

BigQueryClickHouseNotes
OrganizationOrganizationRoot node of the hierarchy in both.
ProjectService (region-scoped); warehouse for grouping services with shared storage and independent computeA ClickHouse service is one storage + one compute pool. Use a warehouse to group services that share storage but scale compute independently.
DatasetDatabaseLogical container that organizes tables and scopes access.
FolderWarehouse grouping, or separate services per workloadClickHouse has no folder primitive — grouping is at the service / warehouse level.
IAM permissionsConsole roles plus SQL grantsTwo-layer access: roles in console.clickhouse.cloud plus SQL grants in the database. Console users can also be granted DB roles for SQL Console use.

Compute, capacity, pricing

How processing is allocated to a query, sized, and billed.

BigQueryClickHouseNotes
SlotReplica (whole node); queries parallelize across replicasA replica is the unit of compute in a ClickHouse service; queries run across all replicas of the service. See the callout below for the granularity difference with BigQuery slots.
Slot reservationVertical and horizontal autoscaling bounds; warehouses for workload isolationClickHouse uses bounds-based autoscaling rather than guaranteed-capacity reservations.
QuotasWorkload classes plus per-query limitsCovers memory, CPU, concurrency, and I/O scheduling. The two quota models don't map row-by-row, but concept-level coverage is similar.
On-demand pricing (per TB scanned)Compute-time (replica-hours) plus storage and transfer — see billing overviewThe two pricing models are not directly comparable.
Logical vs physical storage billingCompressed storage only — see billing overviewClickHouse Cloud bills compressed storage. The logical-vs-physical distinction does not apply.
Slot vs replica

A BigQuery slot is much finer-grained than a ClickHouse replica — it's closer to a CPU thread within a replica than to a whole replica. Both are the unit of compute being allocated to a query, but with very different sizing.

Storage and tables

How tables are stored: engines, schema, partitioning, snapshots, and access primitives.

In ClickHouse, a table's behavior is set at creation time: the engine (MergeTree family) determines merge and storage semantics, and ORDER BY / PARTITION BY / TTL clauses configure physical layout and retention. Many BigQuery per-feature settings map to a clause in the ClickHouse CREATE TABLE statement.

BigQueryClickHouseNotes
TableMergeTree-family tableEngine choice determines storage and merge behavior — pick by access pattern (MergeTree for append-mostly facts, ReplacingMergeTree for upserts, AggregatingMergeTree for pre-aggregations).
Column schema modes (NULLABLE, REQUIRED, REPEATED)Nullable(T) for optional; omit for required; Array(T) for repeated; Array(Tuple(...)) or Nested for repeated recordsIn ClickHouse, columns are non-nullable unless wrapped with Nullable(T). Nullability has a small storage and query cost, so use it only when the column actually needs nulls.
Schema evolution (add / drop / modify columns)ALTER TABLE ... ADD / DROP / MODIFY COLUMNSame DDL surface as BigQuery. Many column changes are metadata-only.
PartitioningPARTITION BY clause on the tablePartitions are defined at table creation; a partition expression determines how rows are grouped into parts on disk.
ClusteringORDER BY columns in the table definitionDefined as part of the table; data is physically sorted on disk by the ORDER BY columns.
External tables / BigLakes3 / gcs / azureBlobStorage table functions for direct file access; Iceberg engine for open catalogsObject storage and open-table formats are read directly through these functions and engines. ClickHouse does not provide a unified-governance layer over external storage.
Object tables (SQL access to unstructured files)s3 / gcs table functions over binary formatsClickHouse treats unstructured-object access as a special case of external file reading via table functions, not as a dedicated table type.
Apache IcebergIceberg engine (read-only)Reads Iceberg tables stored in S3, Azure, HDFS, or local storage; writes are not supported. See the engine page for the current list of supported features.
Default table / partition / dataset expirationTTL clause on the table, column, or partitionBoth support automatic deletion of data older than a configured window. TTL can be set at table creation or via ALTER TABLE ... MODIFY TTL.
Table snapshotService-level backupSee callout below — granularity differs significantly.
Time travelPoint-in-time backup restore into a new serviceBackups are service-scoped, not table-scoped, so the restore unit is the whole service rather than a single table at a moment in time.
Authorized viewsView with SQL SECURITY DEFINER (runs with the view-owner's privileges)See CREATE VIEW for the syntax and the INVOKER / DEFINER / NONE modes.
Row-level securityRow policy — a WHERE-style expression evaluated per userRow policies apply transparently to every query against the table.
Wildcard tables (_TABLE_SUFFIX)Merge table engine (persistent grouping) or merge() function (inline)Same idea, different syntax. Merge is a persistent table-of-tables; merge() is inline without creating one.
Table cloneCREATE TABLE ... AS SELECT copy, or backup restore into a new serviceClickHouse has no copy-on-write primitive — every copy reads the source data fully.
Backups

ClickHouse Cloud backups are per-service. Restoring a backup creates a new service — a single table cannot be restored back into the original service. Plan accordingly if your current workflow relies on per-table snapshots.

Query model and performance

How queries run and are accelerated — indexes, materialized views, caches, and streaming inputs.

Query acceleration in ClickHouse comes from three layers: primary-key ordering (a sparse index over the on-disk sort order), secondary indexes on non-key columns, and materialized views — incremental or refreshable. The rows below map BigQuery's acceleration features onto these primitives.

BigQueryClickHouseNotes
Primary key (advisory)Primary key — drives the on-disk sort order and the sparse primary indexNeither system enforces uniqueness; the optimizer uses the key to prune granules, avoid re-sorts, and short-circuit LIMIT.
Foreign key (advisory)Wide tables or dictionaries for lookupsClickHouse doesn't accept foreign-key declarations even as advisory hints.
Search indexFull-text indexToken index over string columns.
Vector indexVector ANN indexApproximate nearest-neighbor lookups over embedding columns.
Materialized viewIncremental MV (updates on every insert) or refreshable MV (runs on a schedule)ClickHouse supports two MV models — see callout.
Scheduled queryRefreshable MV — runs the query on a schedule and maintains its result tableRefreshable MVs replace the scheduled-query-into-target-table pattern.
Streaming insertsNative INSERT over HTTP or the native protocol for direct ingest; ClickPipes for managed streamingClickPipes covers Kafka, Kinesis, Pub/Sub, MySQL, Postgres, and object storage.
Continuous queriesStreaming table engine (Kafka, Pub/Sub, etc.) feeding a materialized view that writes to a destination tableSame end-to-end model: ingest → transform → write.
Dry runEXPLAIN ESTIMATE — reports rows, parts, and marks the query would readOther EXPLAIN variants (PLAN, PIPELINE, SYNTAX) cover deeper plan inspection.
Federated queries (Spanner, Cloud SQL, AlloyDB)External OLTP attached via database engine (PostgreSQL, MySQL, MongoDB, SQLite)Distinct from external tables in object storage — these attach a live source so its tables are queryable directly.
Cached resultsQuery cacheBoth transparently reuse results of recently executed queries.
Sessions / multi-statement queriesPer-statement execution; multi-step state managed in the client or an orchestratorClickHouse has no per-session variables or shared state.

Secondary indexes

Indexes on non-primary-key columns, used when queries filter by columns outside the sort order:

  • Bloom-filter — equality lookups (=, IN)
  • Token-bloom — substring search on tokenized text
  • Minmax — range pruning by per-part min/max
Materialized view update model

ClickHouse has two MV models: incremental MVs update on every base-table insert (cost proportional to the insert) and refreshable MVs run on a schedule. BigQuery materialized views correspond to the refreshable model. Use incremental for high-throughput aggregations, refreshable for periodic snapshots.

SQL and functions

The query-language surface: SQL coverage, UDFs, and the built-in function library.

ClickHouse SQL covers the standard SELECT / JOIN / GROUP BY / window-function surface. Function-by-function mapping (date, JSON, string, regex, window) lives in the BigQuery → ClickHouse SQL translation reference; the rows below are concept-level only.

BigQueryClickHouseNotes
Standard SQLClickHouse SQL — same SELECT / JOIN / GROUP BY, with lambdas and aggregate combinators as additional language featuresCompatible at the level of basic SQL. Lambdas and combinators are the two extensions worth getting familiar with.
Aggregate functionsAggregate functions composable with combinators (-Array, -Map, -ForEach, -If, …)Combinators compose any aggregate with any input shape.
Array functions, UNNESTArray functions and lambdasCommon patterns: arrayFilter, arrayMap, arrayZip, arrayReduce.
SQL UDFsCREATE FUNCTION (SQL expression)Same model — function from a SQL expression.
JavaScript UDFsExecutable UDF shelling out to a Python, shell, or other scriptDifferent language and execution model, similar role.
Stored proceduresClient-side or orchestrator-side procedural logic (dbt, Airflow)ClickHouse has no procedural SQL.
Multi-statement transactionsPer-insert and per-DDL atomic guarantees; application-layer grouping for multi-write batchesMulti-statement transactions are on the roadmap.
Sketches (HLL, approximate quantiles)uniqHLL12, quantileTDigest, quantileDDSketch, and others — composable via -State/-Merge combinatorsApproximate aggregates that serialize as state and merge across queries.

Security and governance

Access control, encryption, masking, and network boundaries.

Authorized views and row-level security are listed under Storage and tables.

BigQueryClickHouseNotes
Policy tags / column-level access controlColumn-level grants on specific columns of a tableGrants apply at the column level. BigQuery's centralized taxonomy/policy-tag governance has no direct equivalent.
Data maskingViews, row policies, or function-based transforms — see data masking patternsNo column-mask primitive yet; patterns are SQL-level.
Customer-managed encryption keys (CMEK)CMEK on the serviceBYOK in AWS KMS, with rotation and revocation.
AEAD / SQL-level encryption functionsEncryption functions (encrypt / decrypt)Covers AES-128/256-CBC/GCM and AEAD modes.
Differential privacyExternal noise application, or via a UDFNo built-in differential privacy in ClickHouse.
VPC Service ControlsPrivateLink (AWS / Azure) and IP allowlists for ingress restrictionBoundary semantics are narrower than VPC SC.

Data sharing

Cross-organization data exchange and clean-room patterns.

BigQueryClickHouseNotes
Analytics Hub / data exchanges / listingsRead access to a shared database, or a dedicated service with consumer-specific row policiesClickHouse has no in-product data marketplace; sharing uses standard access primitives.
Data clean roomsRow policies and authorized views — assembled per use caseNo managed clean-room product.

Operations and ecosystem

Day-2 concerns: ingestion, ML/BI integration, observability, metadata, and disaster recovery.

ClickHouse surfaces operational state through system.* tables (queries, sessions, replication, parts, metrics) and the cloud console; managed ingestion is handled by ClickPipes; ML, BI, and notebook workflows are typically handled in external systems that read from ClickHouse.

BigQueryClickHouseNotes
BigQuery MLExternal training and serving (notebooks, Spark, Vertex AI, feature stores) reading from ClickHouse; see AI/ML in Cloud for managed-side featuresClickHouse has no in-database ML — the typical pattern is to use ClickHouse as the analytical store and run training elsewhere.
BI EngineDirect querying — ClickHouse is a column-oriented analytical engineClickHouse has no separate caching layer to provision for BI workloads; queries run against the storage engine directly.
OMNI / cross-cloud federated queryOne ClickHouse service per supported region where the data lives, with cross-region replication as neededPattern is one service per cloud, not federated queries across clouds.
Data sources / file formatsFile-format and connector libraryManaged connectors (ClickPipes) for sources like Kafka, Pub/Sub, MySQL, Postgres, and object storage; SQL table functions for ad-hoc reads of files in object storage.
Query jobs (ID, history, cancel)system.query_log and system.processes for inspection; KILL QUERY to cancelSame information, exposed through system tables instead of a job API.
INFORMATION_SCHEMANative system.* tables for ClickHouse-specific detail, or the ANSI information_schema views for tool compatibilityBoth surfaces available.
Data Transfer ServiceClickPipes — scheduled and streaming ingestion from SaaS, storage, and OLTP sourcesCovers the same scheduling and source-coverage role.
Audit logsCloud audit log and system tablesBoth systems log admin and query activity.
Change data capture ingestionClickPipes for Postgres, MySQL, or KafkaManaged CDC from OLTP and streaming sources into ClickHouse tables.
BigQuery Studio notebooks / BigQuery DataFramesJupyter with clickhouse-connect or another client libraryNo in-product notebook environment or pandas-compatible in-DB API; notebook-side libraries cover the same workflow.
Data Canvas / managed data preparationsSQL Console and ClickPipes; visual data-prep in an external orchestratorSQL Console is the UI counterpart; ClickPipes covers managed ingestion.
Gemini in BigQuery (SQL generation, code completion)Ask-AI button in docs and consoleLLM assistance is surfaced through Ask-AI; ClickHouse has no in-query assistant.
Knowledge Catalog / data lineage / data qualitysystem.* tables for metadata; external tools (dbt, DataHub) for lineage and qualityClickHouse exposes metadata via system tables rather than a managed catalog product.
Cross-region replication / managed disaster recoveryMulti-AZ HA within a region (automatic); cross-region replication via Replicated*MergeTree engines or the Enterprise tier's advanced DR featuresMulti-AZ HA is on by default within a region. Cross-region replication is configurable; latency between regions affects write performance.