Top 10 Data Lakehouse Software in Australia for 2026

Australia verdict (TL;DR)

Verified 2026-05-27

Australia is a Databricks- and Snowflake-led lakehouse market at the ASX 200 and Tier 1 enterprise level. Snowflake is particularly strong in Australian retail, telco, and financial services (CBA, NAB, Telstra, Coles, Woolworths), and Databricks is dominant where ML and AI workloads sit alongside analytics. Canva, Atlassian, Afterpay (Block), and SafetyCulture set the Australian product-SaaS reference architecture; Canva is one of the largest Snowflake customers globally and Atlassian publishes substantial public detail on its Databricks lakehouse. Microsoft Fabric OneLake is growing fast in Australian Microsoft enterprise. AWS Sydney (ap-southeast-2) and Azure Australia East (Sydney) plus Australia Central (Canberra) are the residency defaults; both clouds hold IRAP PROTECTED assessment for Commonwealth and Defence scope. There is no Australian-headquartered lakehouse vendor of meaningful scale; the honest assessment is that this is a US-vendor-dominated category in Australia, mediated by a strong Australian SI and hosting partner ecosystem that includes Macquarie Cloud Services and AC3.

Picks for Australia

ASX 200 enterprise SQL lakehouse: Snowflake + Polaris Catalog. Default at CBA-, NAB-, Telstra-, Coles-, and Woolworths-tier Australian enterprises. Canva is one of the largest Snowflake customers globally. Runs on AWS ap-southeast-2 (Sydney) with Polaris and native Iceberg.
Australian ML-heavy lakehouse: Databricks Lakehouse Platform. Default where ML and AI workloads sit alongside analytics. Adopted at Atlassian, REA Group, and Xero. Databricks has a Sydney engineering presence. Unity Catalog supplies lineage for APP compliance, including APP 8 residency, on Azure Australia East or AWS Sydney.
Australian Microsoft enterprise lakehouse: Microsoft Fabric OneLake. Fastest-growing in Australian M365 E5 enterprise (state governments, utilities, mid-market). OneLake runs on Azure Australia East (Sydney) and Australia Central (Canberra), with bundle economics inside M365.
Australian AWS-anchored product company: AWS Lake Formation + Iceberg. S3 Tables plus Glue Catalog on AWS ap-southeast-2 (Sydney). IRAP PROTECTED-assessed; lowest friction for AWS-native Australian SaaS and scale-ups.
Australian GCP lakehouse: Google BigLake. BigQuery plus Iceberg on GCP australia-southeast1 (Sydney). Common at Australian adtech, B2B SaaS, and select retail on GCP.
Commonwealth and Defence-classified lakehouse: Apache Iceberg. Self-hosted Iceberg on AWS Sydney or Azure Australia Central (Canberra) with IRAP PROTECTED assessment, where federation across multiple engines and full control of the data plane matter more than vendor convenience.

Market context

How the data lakehouse market looks in Australia

Australia's lakehouse market in 2026 is led by Snowflake and Databricks at ASX 200 and Tier 1 enterprise, with strong product-SaaS reference architectures from Australian-origin technology companies setting market expectations. Canva (one of the largest Snowflake customers globally), Atlassian (substantial public detail on its Databricks lakehouse architecture), Afterpay (now Block), REA Group, SEEK, Xero (Wellington-headquartered but heavy Australian engineering presence), and SafetyCulture have been visible reference architectures that anchor Australian buyer expectations. The Atlassian heritage in particular has cultural influence on Australian tech procurement: buyers expect transparent pricing, fast time-to-value, and self-service evaluation, which Snowflake and Databricks both deliver more naturally than legacy DWH incumbents.

ASX 200 enterprise deployment is split: Snowflake is dominant at Australian retail (Coles, Woolworths, Wesfarmers), telco (Telstra, Optus, TPG), and financial services (CBA, NAB, Westpac, Macquarie analytics teams) where the SQL-first cloud DWH-to-lakehouse path matches existing skills; Databricks is dominant where ML and AI workloads sit alongside analytics (financial crime detection, fraud, supply chain optimisation). Microsoft Fabric OneLake is the fastest-growing platform in Australian Microsoft enterprise, particularly at state governments (NSW, Victoria, Queensland, WA agencies on M365 E5), utilities, and the ASX 300 mid-market.

The Australian SI partner ecosystem mediates a large fraction of lakehouse procurement: Deloitte Australia, EY Australia, PwC Australia, KPMG Australia, Accenture Australia at large enterprise; Mantel Group, Servian (now Cognizant), Eliiza, Versent, AC3, Macquarie Cloud Services, and Boab AI at the Australian-native specialist tier. Macquarie Cloud Services in particular provides Australian sovereign hosting for AWS-, Azure-, and self-managed workloads with Australian data centre residency, which matters for federal and state scope. Atlassian-origin influence on Australian tech procurement biases toward Snowflake and Databricks over Oracle Analytics, Teradata, or SAP Datasphere.

AWS Sydney (ap-southeast-2) and Azure Australia East (Sydney) plus Australia Central (Canberra) are the residency defaults; both hold IRAP PROTECTED-level assessment for Commonwealth and Defence scope. GCP australia-southeast1 Sydney and australia-southeast2 Melbourne hold OFFICIAL: Sensitive IRAP assessment. There is no Australian-headquartered lakehouse vendor of meaningful scale comparable to Databricks or Snowflake; the honest assessment is that this is a US-vendor-dominated category in Australia, mediated by strong Australian channel and SI partners.
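As a rough illustration, the residency and assessment levels described above could be encoded as a pre-procurement check. This is a minimal sketch: the `cloud:region` key format, the function name, and the two-level ranking are assumptions made for illustration, and the assessment levels simply restate this article rather than any authoritative register.

```python
# IRAP assessment levels as stated in this article. Validate against current
# vendor and government documentation before relying on them.
IRAP_BY_REGION = {
    "aws:ap-southeast-2": "PROTECTED",                  # AWS Sydney
    "azure:australiaeast": "PROTECTED",                 # Azure Australia East (Sydney)
    "azure:australiacentral": "PROTECTED",              # Azure Australia Central (Canberra)
    "gcp:australia-southeast1": "OFFICIAL: Sensitive",  # GCP Sydney
    "gcp:australia-southeast2": "OFFICIAL: Sensitive",  # GCP Melbourne
}

# Hypothetical ordering: a higher number satisfies any lower requirement.
LEVEL_RANK = {"OFFICIAL: Sensitive": 1, "PROTECTED": 2}


def meets_irap(region_key: str, required: str = "PROTECTED") -> bool:
    """Return True if the region is listed above and meets the required level."""
    level = IRAP_BY_REGION.get(region_key)
    if level is None:
        return False  # not one of the Australian regions named in this article
    return LEVEL_RANK[level] >= LEVEL_RANK[required]


print(meets_irap("aws:ap-southeast-2"))        # True
print(meets_irap("gcp:australia-southeast1"))  # False
```

A check like this covers only the underlying cloud region; the lakehouse vendor's own control plane still needs separate validation.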

Compliance & local rules

Privacy Act 1988 and the Australian Privacy Principles (APPs), administered by the OAIC: personal data ingested into a lakehouse is in scope; APP 8 (cross-border disclosure) requires the disclosing entity to take reasonable steps to ensure overseas recipients comply with the APPs or remain accountable for breaches, which drives Australian preference for AWS Sydney, Azure Australia East and Central, and GCP Sydney residency.

Notifiable Data Breaches scheme (NDB, in force since 2018): eligible breaches must be notified to the OAIC and affected individuals as soon as practicable; the 2024-2025 Privacy Act reform trajectory has shifted practical expectations toward 72-hour notification windows for serious breaches, which makes lineage and audit logging features (Unity Catalog, Polaris, Glue Catalog audit logs) procurement-relevant.

IRAP (Information Security Registered Assessors Program) at the PROTECTED classification: lakehouse used on Commonwealth-classified scope must run on IRAP PROTECTED-assessed infrastructure; AWS Sydney and Azure Australia Central Canberra hold PROTECTED-level IRAP; Databricks, Snowflake, AWS Lake Formation and S3 Tables, Microsoft Fabric OneLake, and BigLake inherit underlying cloud IRAP status but project teams must validate scope at procurement, particularly the data plane and any control plane components that may sit outside the Australian region.

ACSC Essential Eight: the ACSC cyber baseline expected of Commonwealth and most state agency suppliers; SSO via SAML or OIDC, MFA, application control, and patching maturity are procurement checks.

Security of Critical Infrastructure (SOCI) Act amendments 2022 and 2023: where the lakehouse touches designated critical infrastructure (energy, water, transport, communications, financial services, food and grocery, defence industry, higher education and research, health care and medical, space technology), the asset owner has Risk Management Program (CIRMP) obligations that propagate to lakehouse supplier contracts.

Consumer Data Right (CDR) administered by ACCC and OAIC: open banking, open energy, and emerging open finance data flows into and out of the lakehouse must comply with CDR data minimisation, consent, and accredited data recipient rules; relevant for Australian banks, fintechs, energy retailers, and aggregators.

APRA CPS 234 Information Security and CPS 230 Operational Risk Management: regulated APRA entities (banks, insurers, super funds) must meet specific information security and operational resilience standards for cloud arrangements; APRA expects board-level oversight and material service provider notification.

ASIC market integrity rules for capital markets data.

ATO Single Touch Payroll (STP) and Standard Business Reporting integration for finance lakehouse use cases.

Modern Slavery Act 2018: applies to vendor procurement disclosure for entities over AUD 100m revenue; all major lakehouse vendors publish Modern Slavery statements.

At a glance

Quick comparison, ranked for Australia

Product	Best for	Starts at	10-emp/mo*	G2	Geo
2 Snowflake + Polaris Catalog	Mid-market through global enterprise	$0	$0	4.5	Global
1 Databricks Lakehouse Platform	Mid-market through global enterprise	$0	$0	4.5	Global
5 Microsoft Fabric OneLake	Microsoft-anchored mid-enterprise through global enterprise	$263	$263	4.4	Global
3 AWS Lake Formation + Iceberg	AWS-anchored teams of any size	$0	$0	4.2	Global
4 Google BigLake	GCP-anchored teams of any size	$0	$0	4.4	Global
6 Apache Iceberg	Engineering-led teams of any size	$0	$0	4.6	Global
7 Delta Lake	Engineering-led teams, Databricks-anchored	$0	$0	4.5	Global
9 Dremio	Engineering-led lakehouse teams	Quote	-	4.4	Global
10 Starburst	Engineering-led federated query teams	$0	$0	4.4	Global
8 Apache Hudi + Onehouse	Streaming-first engineering teams	$0	$0	4.3	Global

*10-employee monthly cost = base fee + (per-employee × 10) using the lowest published tier. For opaque-pricing vendors, no value is shown.
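The footnote formula can be sketched in a few lines of Python. The function name is illustrative; only the formula and the $263 Fabric figure come from the table above.

```python
def ten_employee_monthly_cost(base_fee: float, per_employee_fee: float,
                              employees: int = 10) -> float:
    """Base fee plus the per-employee fee times headcount, per the footnote."""
    return base_fee + per_employee_fee * employees


# Microsoft Fabric OneLake: $263 base on the lowest published tier,
# with no per-employee charge.
print(ten_employee_monthly_cost(263, 0))  # 263
```

Consumption-priced products show $0 under this formula because their lowest published tier carries no fixed monthly fee, which should not be read as zero expected spend.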

Verified local pricing

What buyers in Australia actually pay

Median annual deal size by employee band, in AUD. Crowdsourced from anonymized buyer disclosures.

Product	Employee band	Median annual (AUD)	Sample	Notes
Snowflake + Polaris Catalog	ASX 200 enterprise 500-5,000 employees	A$280,000	38	Enterprise tier with Iceberg via Polaris; AWS Sydney; AUD billing via Snowflake Australia; multi-year capacity commitment common
Databricks Lakehouse Platform	ASX 200 enterprise 500-5,000 employees	A$320,000	31	Azure Australia East or AWS Sydney; DBU consumption; AUD via Databricks Australia or Azure EA; Unity Catalog standard
Microsoft Fabric OneLake	AU M365 E5 enterprise 500-5,000 employees	A$180,000	42	F64 Fabric capacity inside M365 E5 bundle; Azure Australia East Sydney; AUD via Microsoft Australia
AWS Lake Formation + Iceberg	AU SaaS or scale-up 100-1,000 employees	A$110,000	26	Combined Glue plus Lake Formation plus S3 Tables plus Athena; AWS Sydney; AUD billed
Google BigLake	AU GCP-native scale-up 100-1,000 employees	A$95,000	22	BigQuery editions plus Cloud Storage; GCP australia-southeast1 Sydney; AUD billed via GCP Australia

Local challengers

Australia-built or Australia-strong vendors worth knowing

Not yet ranked in our global top 10, but credible options for Australian buyers and worth a shortlist.

No Australian-headquartered lakehouse vendor of meaningful scale

The Australian lakehouse market is dominated by US-headquartered vendors (Databricks, Snowflake, AWS, Microsoft, Google) consumed via Australian-region deployments. That is the honest assessment for the category.

Canva (Sydney) as Snowflake reference architecture

Canva is one of the largest Snowflake customers globally; published Snowflake reference architecture has influenced Australian product-SaaS lakehouse design. Not a vendor, a reference customer that shapes market expectations.

Atlassian (Sydney) as Databricks reference architecture

Atlassian publishes substantial detail on its Databricks lakehouse architecture. Its Australian-origin engineering culture influences Australian tech procurement, biasing buyers toward modern cloud platforms over legacy DWH incumbents.

Macquarie Cloud Services (Sydney)

Sydney-headquartered Australian cloud hosting provider with sovereign Australian data centre residency. Common substrate for federal and state government cloud workloads and self-managed Iceberg or Delta deployments where Commonwealth-only residency matters.

Mantel Group, Eliiza, Versent, AC3, Boab AI (Australian data SI partners)

Australian-native data and AI consulting partners that mediate a large fraction of Databricks, Snowflake, and Fabric lakehouse implementations at ASX 200 and state government scope. Not lakehouse vendors but the practical Australian implementation channel.

The Australia ranking

All 10, ranked for Australia

Same intelligence as the global ranking (vendor trust, review patterns, verified pricing, compliance), reordered for the Australian market.

Snowflake + Polaris Catalog

Cloud-neutral managed lakehouse with native Iceberg and open-sourced Polaris Catalog.

Founded 2012 · Bozeman, MT · public · 200-100,000+ employees

G2 4.5 (680)

Capterra 4.5

From $0 /mo

◐ Partial disclosure

Snowflake (NYSE:SNOW) made a genuine strategic shift toward open lakehouse architecture in 2024: native Iceberg tables reached read/write parity with internal tables, and the Polaris Catalog was open-sourced in Jun 2024 as an Apache Iceberg REST catalog implementation. The honest reading is that this is a real bet on Iceberg interop, partly defensive against Databricks-on-Delta and partly offensive into the open-format buyer segment. The trade-off: whether enterprise customers actually benefit depends on which catalog they pick, and Snowflake credit-based pricing remains easy to overspend without governance. Best fit for SQL-first enterprises wanting open format with managed SaaS.

Best for

Cloud-neutral enterprises (500+ employees) wanting lakehouse semantics in Iceberg without operating a separate engine, with a strong preference for managed SaaS and SQL workloads.

Worst for

Heavy AI/ML training shops (Databricks better), single-cloud teams that could just use BigLake or Lake Formation, or buyers who reject credit-based pricing.

Strengths

Native Iceberg tables GA with read/write parity to internal tables
Polaris Catalog open-sourced Jun 2024 as Apache Iceberg REST catalog
Cloud-neutral: native on AWS, Azure, GCP
Snowpark for Python/Java/Scala in-lakehouse processing
Strong governance, masking, and row-level security

Weaknesses

Credit-based pricing easy to overspend without strict governance
External Iceberg catalogs require careful planning; performance trade-offs vs internal tables
May 2024 customer credential incident still discussed in deals

Pricing tiers

partial

Tier	Details	Starts at
Standard	On-demand $2/credit; storage $23/TB/month compressed	$0 /mo
Enterprise	On-demand $3/credit; multi-cluster warehouses, masking	$0 /mo
Business Critical	On-demand $4/credit; HIPAA, PCI, customer-managed keys	$0 /mo
Virtual Private Snowflake (VPS)	Dedicated metadata service for regulated industries	Quote

Watch for

· Compute credit overruns from un-suspended warehouses
· External Iceberg query has different perf characteristics than internal tables
· Cross-region data egress
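To make the credit arithmetic concrete, here is a minimal estimate using the on-demand rates quoted in the pricing tiers above. The function name and the workload figures (1,000 credits, 10 TB) are hypothetical, and the sketch ignores egress, serverless features, and negotiated capacity discounts.

```python
# On-demand rates quoted in the pricing tiers above (USD).
CREDIT_PRICE = {"standard": 2.0, "enterprise": 3.0, "business_critical": 4.0}
STORAGE_PER_TB_MONTH = 23.0


def snowflake_monthly_estimate(edition: str, credits: float,
                               storage_tb: float) -> float:
    """Compute spend (credits x rate) plus compressed storage, in USD."""
    return credits * CREDIT_PRICE[edition] + storage_tb * STORAGE_PER_TB_MONTH


# Hypothetical workload: 1,000 credits and 10 TB on the Enterprise edition.
print(snowflake_monthly_estimate("enterprise", 1000, 10))  # 3230.0
```

Because the compute term scales linearly with credits, a warehouse left running while idle adds directly to the bill, which is why the first item in the list above matters.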

Key features

+Native Iceberg tables (managed and external)
+Polaris Catalog (open-source Apache Iceberg REST catalog)
+Snowpark for Python/Java/Scala
+Time Travel and Zero-Copy Cloning
+Snowpipe streaming ingestion
+Secure Data Sharing and Marketplace

400+ integrations

dbt, Fivetran, Tableau, Power BI, Apache Iceberg, Airbyte, Hightouch

Geography

Global

Databricks Lakehouse Platform

Delta Lake-native lakehouse with Unity Catalog and Mosaic AI; Iceberg-aware after Tabular acquisition.

Founded 2013 · San Francisco, CA · private · 200-100,000+ employees

G2 4.5 (580)

Capterra 4.6

From $0 /mo

◐ Partial disclosure

Databricks is the enterprise lakehouse leader, unifying data engineering, analytics, and ML/AI training on Delta Lake + Unity Catalog. The Jun 2024 acquisition of Tabular (the Iceberg-creator-led startup) for a reported $1B+ creates obvious tension because Databricks is the lead maintainer of Delta Lake, the rival format to Iceberg; the public position is that Databricks will support both via Delta UniForm and through ongoing Iceberg contribution. Databricks was valued privately at $43B in Sept 2023 and at a reported $62B in subsequent rounds; a 2026 IPO is widely expected but not confirmed. Trade-offs: DBU pricing complexity, and SQL-only buyers often find Snowflake simpler.

Best for

Mid-market and enterprise data teams (200-50,000 employees) running serious ML training plus analytics, where lakehouse governance and AI workflow integration matter more than pure SQL simplicity.

Worst for

SQL-only BI shops (Snowflake or BigQuery simpler), Iceberg-purist buyers wary of Databricks owning Delta Lake, or small teams without dedicated data engineering.

Strengths

Delta Lake as the open default plus Delta UniForm Iceberg interop
Unity Catalog unifies governance across analytics, ML, and lakehouse tables
Best-in-class for AI/ML training and feature engineering via Mosaic AI
Tabular acquisition brought Iceberg-creator engineering talent in-house
Photon vectorized engine narrows SQL gap to dedicated warehouses

Weaknesses

DBU pricing complexity, plus separate cloud infra costs charged by hyperscaler
Delta vs Iceberg neutrality is contested given Databricks owns Delta Lake project
Unity Catalog migration painful for legacy Hive metastore customers

Pricing tiers

partial

Tier	Details	Starts at
Standard (Jobs)	From $0.15/DBU; basic Spark workloads	$0 /mo
Premium	From $0.40/DBU; SQL warehouses, Unity Catalog, audit logs	$0 /mo
Enterprise	From $0.65/DBU; HIPAA, PCI, customer-managed keys	$0 /mo
Mosaic AI Model Training	Foundation model training and serving; custom quote	Quote

Watch for

· Cloud infra (EC2/Azure VMs/GCE) billed by hyperscaler, not Databricks
· Photon premium DBU multiplier on SQL warehouses
· Mosaic AI inference and training billed separately
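The two-part bill described above (DBUs paid to Databricks, infrastructure paid to the hyperscaler) can be sketched as follows. The rates are the "from" prices in the pricing tiers; the function name and the workload numbers are hypothetical, and Photon multipliers and Mosaic AI charges are deliberately left out.

```python
# "From" prices per DBU quoted in the pricing tiers above (USD).
DBU_RATE = {"standard": 0.15, "premium": 0.40, "enterprise": 0.65}


def databricks_monthly_estimate(tier: str, dbus: float,
                                cloud_infra_usd: float) -> float:
    """DBU charge from Databricks plus the separately billed cloud infrastructure."""
    return dbus * DBU_RATE[tier] + cloud_infra_usd


# Hypothetical workload: 10,000 DBUs on Premium plus $2,500 of VM cost.
print(databricks_monthly_estimate("premium", 10_000, 2_500))  # 6500.0
```

Keeping the infrastructure term explicit is the point of the sketch: a quote that covers only DBUs understates total spend.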

Key features

+Delta Lake (open table format)
+Delta UniForm (Iceberg metadata interop)
+Unity Catalog governance
+Photon vectorized SQL engine
+Mosaic AI (training, fine-tuning, serving)
+Lakehouse Federation across S3/ADLS/GCS
+Delta Sharing (open data sharing protocol)

350+ integrations

dbt, Fivetran, Tableau, Power BI, Hugging Face, LangChain, Apache Iceberg

Geography

Global

Microsoft Fabric OneLake

Microsoft unified lakehouse store: Delta-native, with Iceberg via shortcuts and Power BI bundle economics.

Founded 2023 · Redmond, WA · public · 500-100,000+ employees

G2 4.4 (380)

Capterra 4.4

From $263 /mo

◐ Partial disclosure

OneLake is the unified data lake layer underneath Microsoft Fabric, announced in May 2023 as part of Microsoft Fabric and using Delta Lake as the native open format. The 2024-2025 additions of OneLake shortcuts to Iceberg tables (in S3, ADLS, and elsewhere) and the broader Fabric Iceberg interop make OneLake the closest thing to a multi-format lakehouse store from Microsoft. The honest framing: OneLake wins deals through Power BI Premium bundle pricing and Microsoft 365 procurement leverage, not because the underlying engine is best-in-class. Capacity Unit (CU) pricing complexity remains the main cost-forecasting issue.

Best for

Microsoft 365 + Power BI Premium-anchored enterprises (500-100,000+ employees) where Fabric capacity comes effectively-free with existing M365 E5 commitments.

Worst for

Non-Microsoft-anchored teams, organizations rejecting Capacity Unit pricing, or buyers wanting best-in-class engine performance over bundle economics.

Strengths

OneLake as Delta Lake-native unified analytics store
OneLake shortcuts allow read of Iceberg tables in S3, ADLS, elsewhere
Power BI Premium bundle, often effectively-free with E5 commitments
Copilot integrated across the Fabric suite
One SKU covers lakehouse + warehouse + BI + ETL + real-time

Weaknesses

Wins on bundle economics, not core engine quality
Capacity Unit (CU) pricing complexity
Iceberg support via shortcuts is read-mostly vs full lakehouse semantics

Pricing tiers

partial

Tier	Details	Starts at
F2 (smallest)	2 CU; pay-as-you-go	$263 /mo
F64	64 CU; common mid-size enterprise capacity	$8,400 /mo
F2048	2,048 CU; very large enterprise capacity	$269,000 /mo
Bundled with Power BI Premium	F64 effectively included with P1 commitments at many enterprises	Quote

Watch for

· OneLake storage billed separately at ADLS rates
· Cross-region data egress
· Mirroring usage can spike CU consumption
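A quick way to read the capacity prices in the pricing tiers above is to divide each monthly price by its Capacity Units. The dictionary and function names below are illustrative; the CU counts and prices are the ones listed in this article.

```python
# (Capacity Units, USD per month) as listed in the pricing tiers above.
FABRIC_CAPACITY = {"F2": (2, 263), "F64": (64, 8400), "F2048": (2048, 269000)}


def price_per_cu(sku: str) -> float:
    """Monthly price divided by the number of Capacity Units in the SKU."""
    cu, monthly_usd = FABRIC_CAPACITY[sku]
    return monthly_usd / cu


for sku in FABRIC_CAPACITY:
    print(sku, round(price_per_cu(sku), 2))
# F2 131.5
# F64 131.25
# F2048 131.35
```

On these list prices the rate is roughly $131 per CU at every size, so a larger SKU buys more capacity, not a volume discount; any saving comes from bundling or commitments.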

Key features

+OneLake (Delta Lake-native unified store)
+OneLake shortcuts (Iceberg read in S3/ADLS)
+Fabric Lakehouse (Spark + SQL endpoint)
+Fabric Warehouse (T-SQL warehouse)
+Power BI native integration
+Copilot in Fabric
+Mirroring (Snowflake, Cosmos, Azure SQL)

250+ integrations

Power BI, Microsoft 365, Azure ML, Apache Iceberg (via shortcuts), Snowflake (mirroring), Delta Lake

Geography

Global

AWS Lake Formation + Iceberg

AWS-native lakehouse: Glue Catalog, Lake Formation governance, and S3 Tables for Iceberg.

Founded 2018 · Seattle, WA · public · 50-100,000+ employees

G2 4.2 (140)

Capterra 4.2

From $0 /mo

● Transparent pricing

AWS Lake Formation is the AWS-native lakehouse governance layer over S3, with AWS Glue Data Catalog as the metadata store and Lake Formation managing fine-grained access controls. The S3 Tables announcement at re:Invent 2024 made Iceberg a first-class S3 bucket type, removing the need for a separate Iceberg metastore for many AWS-native pipelines. The lakehouse engines on top are Athena, EMR, Redshift Spectrum, and Glue ETL. Strengths: deep AWS integration, IAM-native access, and Iceberg-native S3. Trade-offs: best-fit narrows sharply when not AWS-anchored, governance UX is more workmanlike than Unity Catalog, and pricing fragments across Glue, Lake Formation, S3 Tables, and the chosen query engine.

Best for

AWS-anchored organizations (any size) where S3 is already the data plane and the team wants to add Iceberg + governance without leaving AWS.

Worst for

Multi-cloud or non-AWS teams, organizations wanting a single integrated lakehouse vendor (Databricks or Snowflake), or buyers wanting opinionated governance UX.

Strengths

Iceberg-native S3 Tables (2024 GA) removes need for separate metastore
AWS Glue Data Catalog as the metadata layer with broad AWS integration
Lake Formation fine-grained access on rows, columns, and tags
IAM-native authentication and tag-based access control
Query engine flexibility: Athena, EMR, Redshift Spectrum, Glue ETL

Weaknesses

Best-fit narrows sharply when not AWS-anchored
Governance UX more workmanlike than Unity Catalog or Polaris
Pricing fragments across Glue, Lake Formation, S3 Tables, query engine

Pricing tiers

public

Tier	Details	Starts at
Glue Data Catalog	$1/100k objects/month; first 1M free	$0 /mo
Lake Formation	No additional charge; underlying services billed separately	$0 /mo
S3 Tables	Storage at S3 standard rates; per-request fees	$0 /mo
Athena (query)	$5/TB scanned; or capacity reservation	$0 /mo

Watch for

· S3 Tables compaction and maintenance request fees
· Glue ETL DPU consumption
· Cross-region data egress
· Athena/EMR/Redshift Spectrum billed separately as compute
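Because the bill is spread across services, it can help to estimate each line separately. The sketch below covers only the two rates quoted in the pricing tiers above (Glue Data Catalog and Athena). The function names are illustrative, and pro-rating catalog objects in place of whole 100k blocks is a simplifying assumption; S3 Tables storage, request fees, and Glue ETL are not modelled.

```python
def glue_catalog_monthly(objects: int) -> float:
    """$1 per 100k catalog objects per month, with the first 1M objects free."""
    billable = max(0, objects - 1_000_000)
    return billable / 100_000 * 1.0


def athena_query_cost(tb_scanned: float) -> float:
    """$5 per TB scanned on the on-demand Athena rate."""
    return tb_scanned * 5.0


# Hypothetical month: 1.5M catalog objects and 40 TB scanned.
print(glue_catalog_monthly(1_500_000))  # 5.0
print(athena_query_cost(40))            # 200.0
```

In this hypothetical month the query engine outweighs the catalog by a wide margin, which matches the article's point that compute is billed separately and is where most spend accrues.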

Key features

+AWS S3 Tables (Iceberg-native S3 buckets)
+AWS Glue Data Catalog
+Lake Formation fine-grained access controls
+Tag-based access control
+Cross-account data sharing
+Native Apache Iceberg support
+Integration with Athena, EMR, Redshift Spectrum

200+ integrations

Apache Iceberg, Athena, EMR, Redshift, Glue ETL, QuickSight, SageMaker

Geography

Global

Google BigLake

BigQuery engine over open table formats: Iceberg, Hudi, and Delta on Cloud Storage.

Founded 2022 · Mountain View, CA · public · 50-100,000+ employees

G2 4.4 (110)

Capterra 4.4

From $0 /mo

● Transparent pricing

BigLake is Google Cloud's lakehouse layer that lets BigQuery (and other GCP engines including Dataproc Spark and Dataflow) query Apache Iceberg, Apache Hudi, and Delta Lake tables on Cloud Storage with the same governance model as native BigQuery tables. The fit: GCP-anchored teams who already use BigQuery as the analytics engine and want to add lakehouse semantics over open formats without operating a separate platform. Strengths: tightest integration with BigQuery, Looker, and Vertex AI; native Iceberg, Hudi, and Delta support; and serverless query economics. Trade-offs: best-fit narrows sharply when not GCP-anchored, and cross-cloud egress economics favor staying inside GCP.

Best for

GCP-anchored organizations (any size) wanting lakehouse semantics on Iceberg/Hudi/Delta with BigQuery as the primary engine, plus tight Looker and Vertex AI integration.

Worst for

Multi-cloud or AWS/Azure-anchored organizations, teams that need a single integrated lakehouse vendor across clouds, or buyers without existing BigQuery investment.

Strengths

Native Iceberg, Hudi, and Delta Lake support on Cloud Storage
Same governance model as BigQuery (Policy Tags, BigQuery IAM)
BigQuery serverless query economics extend to open tables
BigQuery Omni for cross-cloud query against AWS S3 and Azure
Tight integration with Vertex AI and Looker

Weaknesses

Best-fit narrows sharply when not GCP-anchored
Cross-cloud egress economics favor staying inside GCP
External table query has different perf characteristics than native BigQuery

Pricing tiers

public

Tier	Details	Starts at
On-demand	$6.25/TB scanned on BigQuery; Cloud Storage at standard rates	$0 /mo
BigQuery Editions Standard	$0.04/slot-hour; capacity reservations	$0 /mo
BigQuery Editions Enterprise	$0.06/slot-hour; CMEK, VPC-SC	$0 /mo
BigQuery Editions Enterprise Plus	$0.10/slot-hour; cross-region replication	$0 /mo

Watch for

· Cloud Storage class tiering
· BI Engine memory reservation
· Cross-region or cross-cloud egress
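The choice between on-demand and Editions pricing comes down to a break-even scan volume. The sketch below uses the rates quoted in the pricing tiers above; the function name, the 100-slot reservation, and the 730-hour month are assumptions for illustration, and it treats the reservation as running continuously with no autoscaling.

```python
# Rates quoted in the pricing tiers above (USD).
ON_DEMAND_PER_TB = 6.25
SLOT_HOUR = {"standard": 0.04, "enterprise": 0.06, "enterprise_plus": 0.10}


def break_even_tb(edition: str, slots: int, hours: float) -> float:
    """TB scanned per period at which on-demand cost equals the reservation cost."""
    reservation_cost = slots * hours * SLOT_HOUR[edition]
    return reservation_cost / ON_DEMAND_PER_TB


# Hypothetical: 100 Standard slots held for a 730-hour month.
print(round(break_even_tb("standard", 100, 730), 1))  # 467.2
```

Under these assumptions, a team scanning well below that volume each month is likely better off on-demand, while heavier scanning favors a reservation.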

Key features

+Native Apache Iceberg, Hudi, and Delta Lake support
+BigQuery engine over Cloud Storage tables
+BigLake Metastore (Iceberg-compatible)
+BigQuery Omni cross-cloud query
+Policy Tags for column-level access
+Vertex AI integration

200+ integrations

Looker, Vertex AI, dbt, Fivetran, Apache Iceberg, Apache Hudi, Delta Lake

Geography

Global

Apache Iceberg

The winning open table format of 2025-2026 by hyperscaler buy-in.

Founded 2017 · Distributed (originated at Netflix) · public · 50-100,000+ employees

G2 4.6 (90)

From $0 /mo

● Transparent pricing

Apache Iceberg is the open table format originated at Netflix in 2017, donated to the Apache Software Foundation, and now the de facto winner of the open-table-format war in 2025-2026 on the strength of hyperscaler buy-in. AWS (S3 Tables, Athena, EMR, Redshift), Google (BigLake, BigQuery), Microsoft (Fabric via shortcuts), and Snowflake all support Iceberg as a first-class format. Databricks acquired Tabular (the company founded by Iceberg creators Ryan Blue and Daniel Weeks) in Jun 2024 for a reported $1B+, which brought core Iceberg engineering talent into the Delta Lake-stewarding company; the public position is dual-format support. The honest read: pick Iceberg unless you are deep on Databricks.

Best for

Engineering-led organizations of any size committing to open-format lakehouse architecture, particularly multi-engine or multi-cloud teams who want to avoid table-format lock-in.

Worst for

Teams deep on Databricks where Delta Lake is the path of least resistance, or shops that prefer fully managed lakehouse SKUs over assembling components.

Strengths

De facto open-table-format winner by hyperscaler buy-in
ACID transactions, time travel, schema evolution, hidden partitioning
Iceberg REST catalog spec standardized (Polaris, Nessie, Glue support it)
Vendor-neutral by design and Apache-governed
Strong contributor diversity across AWS, Apple, Netflix, Stripe, Tabular

Weaknesses

Catalog choice (Polaris, Unity, Glue, Nessie) is the real lock-in decision
Maintenance operations (compaction, snapshot expiry) require operational discipline
Tabular acquisition by Databricks creates uncertainty about long-term neutrality

Pricing tiers

public

Tier	Details	Starts at
Apache Iceberg	Apache 2.0; unlimited use; community support	$0 /mo
Commercial managed offerings	Snowflake Polaris, AWS S3 Tables, Tabular (Databricks), Dremio, Cloudera, Onehouse	Quote

Key features

+ACID transactions on object storage
+Time travel and snapshot isolation
+Schema evolution (add, drop, rename columns)
+Hidden partitioning and partition evolution
+Iceberg REST catalog spec
+Multi-engine read/write (Spark, Trino, Flink, Presto, Snowflake, BigQuery)

50+ integrations

Snowflake, Databricks, AWS Glue, BigLake, Trino, Spark, Flink, Dremio, Starburst

Geography

Global

Delta Lake

Databricks-led open table format with Iceberg interop via Delta UniForm.

Founded 2019 · Distributed (Databricks-stewarded) · public · 50-100,000+ employees

G2 4.5 (60)

From $0 /mo

● Transparent pricing

Delta Lake is the open table format created at Databricks, open-sourced under the Linux Foundation in 2019, and the native format for the Databricks Lakehouse Platform. It remains strong inside Databricks (Unity Catalog assumes Delta as the default) and has hedged for the Iceberg-dominant 2025-2026 landscape via Delta UniForm (2024), which writes Iceberg metadata in parallel so external engines can read Delta tables as if they were Iceberg. The honest framing: if Databricks is your primary engine, Delta is the right format; if you want format neutrality across hyperscalers, Iceberg is winning. Microsoft Fabric OneLake also uses Delta natively, which keeps Delta relevant outside Databricks.

Best for

Organizations standardized on Databricks or Microsoft Fabric where Delta is the path of least resistance, with Delta UniForm available for occasional Iceberg interop.

Worst for

Multi-engine shops choosing one format, or organizations on AWS/GCP-native lakehouse stacks where Iceberg has stronger first-party support.

Strengths

Native format for Databricks Lakehouse and Microsoft Fabric OneLake
Delta UniForm writes Iceberg metadata for cross-engine read
Mature ecosystem inside Databricks and Spark
ACID transactions, time travel, schema evolution
Delta Sharing as open data sharing protocol

Weaknesses

Hyperscaler buy-in (AWS, GCP) is weaker than for Iceberg
Databricks-led project governance raises neutrality questions for non-Databricks shops
Delta UniForm Iceberg interop is one-way (write Delta, read Iceberg) at most engines

Pricing tiers

public

Tier	Details	Starts at
Delta Lake	Apache 2.0; unlimited use; community support	$0 /mo
Commercial managed	Databricks, Microsoft Fabric, Onehouse all offer managed Delta	Quote

Key features

+ACID transactions on object storage
+Time travel and version control
+Schema evolution and enforcement
+Delta UniForm (Iceberg metadata interop)
+Delta Sharing (open data sharing protocol)
+Native Databricks and Microsoft Fabric integration

40+ integrations

Databricks, Microsoft Fabric, Apache Spark, Trino, Presto, Apache Flink

Geography

Global

Dremio

Lakehouse-native query engine on Iceberg with Project Nessie Git-for-data catalog.

Founded 2015 · Santa Clara, CA · private · 100-5,000+ employees

G2 4.4 (95)

Capterra 4.4

Custom quote

◐ Partial disclosure

Dremio is the lakehouse-native query engine purpose-built for SQL on Apache Iceberg tables in S3/ADLS/GCS, with Project Nessie as the Git-for-data catalog. The fit: teams that want to separate storage from compute vendor, run their data in Iceberg in their own object store, and use Dremio as the engine without committing to Databricks or Snowflake compute. Dremio raised a $160M Series E in Jan 2022 at a $2B valuation, bringing total funding to roughly $410M; no significant funding rounds have been publicly disclosed since. Strengths: Iceberg-first engineering, Nessie data versioning, and reflections (acceleration layer) for sub-second BI. Trade-offs: smaller market presence than Databricks/Snowflake, narrower ecosystem.

Best for

Engineering-led teams (100-5,000 employees) committing to an Iceberg lakehouse architecture who want to choose their compute vendor independently of their storage and use a query engine outside the Databricks/Snowflake duopoly.

Worst for

Buyers wanting fully managed integrated lakehouse + ML platform (Databricks), heavy AI/ML training shops, or teams without dedicated data engineering capacity.

Strengths

Iceberg-first lakehouse query engine
Project Nessie for Git-for-data versioning and branching
Reflections (materialized view acceleration) for sub-second BI
Apache Arrow-based engine with strong query performance
Bring-your-own-cloud and bring-your-own-object-store model

Weaknesses

Smaller market presence than Databricks or Snowflake
No significant funding round publicly disclosed since 2022
Narrower BI and partner ecosystem than the leaders

Pricing tiers

partial

Cloud Standard · Managed Dremio on AWS/Azure; usage-based · Quote

Cloud Enterprise · Advanced governance, SSO, dedicated support · Quote

Software (self-hosted) · On-prem or BYOC; subscription-based · Quote

Key features

+Iceberg-native query engine
+Project Nessie (Git-for-data catalog)
+Reflections (materialized view acceleration)
+Apache Arrow-based execution
+Lakehouse semantics: ACID, time travel, branching
+SQL over S3/ADLS/GCS

80+ integrations

Apache Iceberg · Project Nessie · Tableau · Power BI · dbt · AWS S3 · ADLS

Geography

Global


#10

Starburst

Managed Trino with multi-format lakehouse support and Stargate federation.

Founded 2017 · Boston, MA · private · 100-10,000+ employees

G2 4.4 (115)

Capterra 4.5

From $0 /mo

◐ Partial disclosure


Starburst is the commercial company behind Trino (the open-source distributed SQL query engine, formerly PrestoSQL). It offers Starburst Galaxy (SaaS) and Starburst Enterprise (self-hosted) as managed Trino with multi-format lakehouse support (Iceberg, Delta, Hudi) and Stargate federation across data sources. Series D $250M raised in Feb 2022 at $3.35B valuation; no major funding round publicly disclosed since. Strengths: federated query across lakehouse plus operational data sources (Postgres, MySQL, Mongo, etc.), Trino community heritage, and multi-format support. Trade-offs: smaller than Databricks/Snowflake, and the primary value is federation rather than a one-stop lakehouse.

Best for

Engineering-led teams (100-10,000 employees) with federation requirements across lakehouse plus operational data sources, who value Trino open-source heritage and multi-format support.

Worst for

Buyers wanting a fully managed, integrated lakehouse + ML platform (Databricks), or teams that only need single-format Iceberg query (Dremio or BigLake fit better).

Strengths

Managed Trino with multi-format support (Iceberg, Delta, Hudi)
Stargate federation across 50+ data sources (lakehouse plus operational)
Strong open-source Trino heritage and community
Bring-your-own-cloud and BYO-object-store model
Galaxy SaaS plus self-hosted Enterprise options

Weaknesses

Smaller than Databricks/Snowflake on managed enterprise share
No major funding round disclosed since Feb 2022 ($3.35B valuation)
Primary value is federation; not a one-stop lakehouse platform

Pricing tiers

partial

Galaxy Free · Limited cluster; community support · $0 /mo

Galaxy Standard · Pay-as-you-go cluster pricing; usage-based · Quote

Galaxy Enterprise · Advanced governance, SSO, dedicated support · Quote

Starburst Enterprise (self-hosted) · Subscription; on-prem or BYOC · Quote

Key features

+Managed Trino (SaaS Galaxy and self-hosted Enterprise)
+Multi-format support: Iceberg, Delta, Hudi
+Stargate federation across 50+ sources
+Caching and acceleration layer
+Role-based access control and data products
+Bring-your-own-cloud model

50+ integrations

Apache Iceberg · Trino · Tableau · Power BI · dbt · Looker · AWS S3 · ADLS

Geography

Global


Apache Hudi + Onehouse

Streaming-first open table format from Uber, with Onehouse as commercial managed offering.

Founded 2017 · Distributed (originated at Uber); Onehouse: Sunnyvale, CA · private · 50-50,000+ employees

G2 4.3 (35)

From $0 /mo

◐ Partial disclosure


Apache Hudi is the open table format originated at Uber in 2016-2017 and donated to the Apache Software Foundation, designed from day one for streaming-first and record-update-heavy workloads (CDC, real-time ingestion, frequent upserts). Onehouse is the commercial managed offering founded by Hudi creator Vinoth Chandar in 2021, with a multi-format strategy (Hudi, Iceberg, Delta via Apache XTable). The honest framing: Hudi has lost the broader open-table-format war to Iceberg on hyperscaler buy-in, but retains a defensible niche in streaming-first and CDC-heavy workloads where its incremental processing model is genuinely differentiating. Best fit for Uber-origin shops and streaming-heavy data engineering teams.

Best for

Streaming-first data engineering teams (50-50,000 employees) with heavy CDC, frequent upserts, or real-time ingestion requirements where Hudi incremental processing is differentiating.

Worst for

Batch-heavy analytics shops (Iceberg or Delta fit better), or teams wanting broadest hyperscaler-native support without operational engineering work.

Strengths

Streaming-first and CDC-heavy workload specialization
Record-level updates and deletes natively supported
Onehouse managed offering with Hudi creator on engineering team
Apache XTable for cross-format (Hudi/Iceberg/Delta) interop
Used in production at Uber, Walmart, Robinhood, Notion

Weaknesses

Lost broader format war to Iceberg on hyperscaler buy-in
Smaller ecosystem and contributor base than Iceberg or Delta
Best-fit narrowed to streaming/CDC workloads

Pricing tiers

partial

Apache Hudi · Apache 2.0; unlimited use; community support · $0 /mo

Onehouse Free · Community tier; limited capacity · $0 /mo

Onehouse Cloud · Managed Hudi + multi-format; usage-based · Quote

Onehouse Enterprise · Dedicated support, enterprise governance · Quote

Key features

+Streaming-first incremental processing
+Record-level updates (Copy-on-Write and Merge-on-Read)
+Time travel and snapshot isolation
+Apache XTable for cross-format interop
+Native Spark, Flink, Presto, Trino support
+Onehouse managed cloud
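
For readers unfamiliar with the two record-level update modes named above, the difference can be sketched in a few lines. This is a toy in-memory model, not Hudi's API: the function `copy_on_write_upsert`, the class `MergeOnReadTable`, and the sample records are all hypothetical. The assumption illustrated is the general pattern: Copy-on-Write rewrites the base file when an update arrives, while Merge-on-Read appends the update to a log and merges it when the table is read or compacted.

```python
def copy_on_write_upsert(base_file, updates):
    """Copy-on-Write: produce a new base file with the updates already applied."""
    rewritten = dict(base_file)   # copy the whole file
    rewritten.update(updates)     # apply upserts at write time
    return rewritten              # readers see merged data with no extra work


class MergeOnReadTable:
    """Merge-on-Read: cheap appends at write time, merge at read time."""

    def __init__(self, base_file):
        self.base_file = dict(base_file)
        self.log = []             # appended update batches, not yet merged

    def upsert(self, updates):
        self.log.append(dict(updates))   # write path only appends

    def read(self):
        merged = dict(self.base_file)
        for batch in self.log:           # read path pays the merge cost
            merged.update(batch)
        return merged

    def compact(self):
        self.base_file = self.read()     # fold the log into a new base file
        self.log = []


# Records are keyed by a record key, as in a CDC stream of account balances.
base = {"u1": {"balance": 10}, "u2": {"balance": 20}}
updates = {"u2": {"balance": 25}, "u3": {"balance": 5}}

cow_result = copy_on_write_upsert(base, updates)

mor_table = MergeOnReadTable(base)
mor_table.upsert(updates)
mor_result = mor_table.read()
```

Both paths return the same rows; the trade-off is where the merge cost lands. Frequent small upserts favor the append-only write path, while read-heavy tables favor paying the rewrite up front.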

40+ integrations

Apache Spark · Apache Flink · Trino · Presto · AWS Glue · EMR · Apache XTable

Geography

Global


Frequently asked questions

The questions buyers actually ask before they sign.

Which lakehouse vendors hold IRAP PROTECTED assessment for Australian Commonwealth and Defence workloads?

The underlying cloud infrastructure carries the IRAP assessment; a lakehouse service running on it is covered only to the extent that the specific service falls within the assessed scope. AWS Sydney (ap-southeast-2) and Azure Australia Central (Canberra) hold PROTECTED-level IRAP assessment; GCP Sydney holds OFFICIAL: Sensitive. Databricks (on Azure Australia Central), Snowflake (on AWS Sydney), AWS Lake Formation and S3 Tables (in Sydney), and Microsoft Fabric OneLake (on Azure Australia East and Central) can all be deployed in IRAP-assessed environments, but the project team must validate the specific service scope, data plane region, and control plane region with their IRAP assessor at procurement. Commonwealth and Defence buyers should request the most recent IRAP assessment report from each vendor and confirm that the data plane region remains within the assessed Australian region throughout the contract.

How does APP 8 cross-border disclosure affect lakehouse vendor selection in Australia?

APP 8 (cross-border disclosure of personal information) requires an Australian entity, before it discloses personal information to an overseas recipient, to take reasonable steps to ensure the recipient does not breach the APPs; under section 16C of the Privacy Act 1988 the disclosing entity generally remains accountable if the recipient does breach them. In practice this drives Australian preference for AWS Sydney, Azure Australia East and Central, and GCP Sydney as the lakehouse residency regions. All major lakehouse vendors support Australian region deployment for data plane storage and compute, but control plane components (metadata services, billing, telemetry) may transit through other regions; check each vendor's data flow diagram and contractual residency commitments. Snowflake, Databricks (on Azure with EU data boundary equivalents available), AWS Lake Formation, Microsoft Fabric, and BigLake all provide documented Australian residency configurations.
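
The data-plane versus control-plane check described above can be made mechanical once a vendor's data flow diagram is in hand. The following is a minimal sketch under stated assumptions: the allowlist covers only the regions named in this article, using the cloud providers' region identifiers, and the component names and example deployment are hypothetical placeholders rather than any vendor's actual architecture.

```python
# Assumption: only the Australian regions named in this article are allowlisted.
ASSUMED_AUSTRALIAN_REGIONS = {
    "aws": {"ap-southeast-2"},                       # Sydney
    "azure": {"australiaeast", "australiacentral"},  # Sydney, Canberra
    "gcp": {"australia-southeast1"},                 # Sydney
}


def residency_gaps(cloud, component_regions):
    """Return the components whose region is outside the allowlist for `cloud`."""
    allowed = ASSUMED_AUSTRALIAN_REGIONS[cloud]
    return sorted(
        component
        for component, region in component_regions.items()
        if region not in allowed
    )


# Hypothetical deployment transcribed from a vendor data flow diagram.
example_deployment = {
    "data_plane_storage": "ap-southeast-2",
    "data_plane_compute": "ap-southeast-2",
    "control_plane_metadata": "us-west-2",
    "telemetry": "us-east-1",
}

gaps = residency_gaps("aws", example_deployment)
print(gaps)  # ['control_plane_metadata', 'telemetry']
```

A non-empty result is not automatically an APP 8 problem; it is the list of components to raise with the vendor and to cover in the contractual residency commitments.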

Should an Australian fintech evaluate Snowflake or Databricks as the primary lakehouse?

It depends on workload profile. Snowflake is the better default for SQL-first analytics, BI workloads, regulated reporting, and CDR (Consumer Data Right) data flows where the engineering team is BI- and SQL-heavy; Australian financial services adoption skews toward Snowflake on AWS Sydney. Databricks is the better default where machine learning, real-time fraud detection, and AI workloads sit alongside analytics, where the engineering team is Python-heavy, or where the existing data lake is on Azure ADLS Gen2; Australian-origin product-SaaS companies like Atlassian skew toward Databricks on Azure. Many ASX 200 financial services firms run both: Snowflake as the SQL-first analytics platform with Iceberg interop, Databricks for ML and AI workloads, with Iceberg or Delta as the shared table format. The 2026 evaluation pressure on both vendors is Apache Iceberg interop maturity, which has converged significantly: Snowflake via Polaris and Databricks via Unity Catalog plus Iceberg both now support open-format strategies that reduce vendor lock-in.

Lakehouse vs data warehouse: is there a real architectural difference in 2026?

Yes, though the categories are converging. A traditional cloud data warehouse stores data in a proprietary format under the vendor compute layer (Snowflake internal tables, BigQuery native storage, Redshift managed storage), which means the vendor owns both the storage and the query engine and you cannot move workloads between engines without a copy. A lakehouse stores data in an open table format (Iceberg, Delta, Hudi) on object storage (S3, GCS, ADLS) that you control, with table semantics (ACID, schema, time travel) layered on top of Parquet, so multiple engines can read the same tables. In 2026 Snowflake, Databricks, BigQuery, and others sell both modes, which is why the architectural distinction matters less than the operational one: are your tables in a format you can move?

Apache Iceberg vs Delta Lake vs Apache Hudi: which open table format should we pick?

Apache Iceberg is winning the open-table-format war in 2025-2026 on the strength of hyperscaler buy-in: AWS S3 Tables, Google BigLake, Microsoft Fabric (via shortcuts), and Snowflake all support Iceberg as a first-class format. Delta Lake remains strong inside Databricks and Microsoft Fabric OneLake, with Delta UniForm providing Iceberg metadata interop for cross-engine read. Apache Hudi, which originated at Uber, retains a defensible niche in streaming-first and CDC-heavy workloads. The 2026 default recommendation for new lakehouse deployments is Iceberg unless you are deep on Databricks (use Delta with UniForm) or have heavy streaming/CDC requirements (consider Hudi via Onehouse).

Does the Databricks-Tabular acquisition (Jun 2024) hurt Iceberg neutrality?

It creates obvious strategic tension because Databricks is the lead maintainer of Delta Lake, the rival format to Iceberg, and the Jun 2024 acquisition of Tabular (the Iceberg-creator-led startup founded by Ryan Blue and Daniel Weeks) for a reported $1B+ brought the Iceberg founders into the Delta-stewarding company. The public position from Databricks is that Iceberg and Delta will coexist via Delta UniForm interop and continued Iceberg contributions; the real position deserves watching through 2026 via contribution patterns to the Apache Iceberg project. Pragmatically, Iceberg has enough hyperscaler and Apache Software Foundation governance momentum that no single vendor can capture it, but the buyer takeaway is to treat Iceberg neutrality as something to verify in 2026-2027 rather than assume.

Is Snowflake genuinely going open with the Polaris Catalog OSS pivot?

It is a real strategic shift, but the buyer benefit depends on catalog choice. In Jun 2024 Snowflake announced Polaris Catalog, an implementation of the Iceberg REST catalog specification that it then open-sourced and contributed to the Apache Software Foundation as Apache Polaris, and Snowflake native Iceberg tables reached read/write parity with internal tables in 2025. The honest read: Snowflake is hedging against a future where customers want format portability, and Polaris gives Snowflake a credible neutral catalog story. The benefit to buyers is real if you use Polaris as your catalog and Snowflake as one of several engines (Trino, Dremio, Spark) reading the same Iceberg tables. The benefit is limited if you stay on Snowflake internal tables, which remain the default for new deployments.

When should we choose a lakehouse over a warehouse-only architecture?

Choose a lakehouse when: (1) you have substantial ML, AI, or unstructured-data workloads that need to share the same data as your BI/SQL workloads; (2) you want to avoid storage lock-in to a single warehouse vendor and value the ability to query the same tables from multiple engines; (3) your data volumes are large enough (typically 50TB+) that the object-storage cost advantage of lakehouse storage matters relative to managed warehouse storage. Choose warehouse-only when: (1) your workload is SQL-first BI with minimal ML; (2) data volumes are modest enough that operational simplicity beats format flexibility; (3) you want a single integrated vendor for storage, compute, and governance without component assembly.

What is the cost reality of a lakehouse at petabyte scale?

Object-storage costs (S3, GCS, ADLS) at petabyte scale are typically $20-23/TB/month, materially below managed warehouse storage of $40-50/TB/month, which is the structural reason large enterprises move to lakehouse architectures. The compute economics are similar to warehouse compute (Databricks DBUs, Snowflake credits, BigQuery slots, AWS Athena per-TB-scanned) and depend heavily on query patterns. The hidden cost at petabyte scale is metadata operations: Iceberg snapshot expiry, compaction, and small-file management require operational discipline, which is why managed Iceberg services (AWS S3 Tables, Snowflake Polaris, Tabular/Databricks, Onehouse) charge a premium over raw object storage. Realistic total-cost-of-ownership at petabyte scale: 30-50% savings versus pure managed warehouse, partially offset by engineering time on metadata operations.
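
To put the storage figures above in dollar terms, the arithmetic for one petabyte is worked through below. It is a sketch that assumes a decimal petabyte (1,000 TB) and uses only the per-TB ranges quoted in this answer; it covers storage alone, so the resulting percentages are not the total-cost-of-ownership figure, which also includes compute and engineering time.

```python
TB_PER_PB = 1_000  # assumption: decimal petabyte, for round numbers

OBJECT_STORAGE_USD_PER_TB = (20, 23)     # USD per TB per month, as quoted above
WAREHOUSE_STORAGE_USD_PER_TB = (40, 50)  # USD per TB per month, as quoted above


def monthly_cost_range(terabytes, usd_per_tb_range):
    """Return the (low, high) monthly storage cost in USD."""
    low, high = usd_per_tb_range
    return terabytes * low, terabytes * high


object_low, object_high = monthly_cost_range(TB_PER_PB, OBJECT_STORAGE_USD_PER_TB)
warehouse_low, warehouse_high = monthly_cost_range(TB_PER_PB, WAREHOUSE_STORAGE_USD_PER_TB)

# Smallest saving: priciest object storage against cheapest warehouse storage.
min_saving = 1 - object_high / warehouse_low
# Largest saving: cheapest object storage against priciest warehouse storage.
max_saving = 1 - object_low / warehouse_high

print(f"Object storage:    ${object_low:,} to ${object_high:,} per month")
print(f"Warehouse storage: ${warehouse_low:,} to ${warehouse_high:,} per month")
print(f"Storage-only saving: {min_saving:.1%} to {max_saving:.1%}")
```

On these inputs, object storage comes to $20,000 to $23,000 per month against $40,000 to $50,000 for managed warehouse storage. The storage-only saving is larger than the 30-50% total-cost figure because compute costs are broadly similar and metadata operations add engineering time.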

Which Iceberg catalog should we choose: Polaris, Unity Catalog, AWS Glue, or Nessie?

The catalog is increasingly the lock-in decision that matters more than engine choice. Snowflake Polaris (Apache project, vendor-neutral by governance) is the right pick if you want an open standard catalog with multi-engine read/write and have no strong Databricks or AWS commitment. Databricks Unity Catalog is the right pick if Databricks is your primary engine and you value governance integration with ML workflows; Iceberg support has been added, but Delta remains the native default. AWS Glue Data Catalog is the right pick if AWS is your data plane and you use Athena, EMR, or Redshift as engines. Project Nessie (Dremio-led, open source) is the right pick if you want Git-for-data semantics (branching, merging, tags) on top of Iceberg. The 2026 advice: pick the catalog deliberately because switching catalogs later is non-trivial.

How does query engine + storage separation work in practice?

In a lakehouse, your data lives in object storage (S3/GCS/ADLS) as Parquet files organized by an open table format (Iceberg/Delta/Hudi). The catalog (Polaris, Unity, Glue, Nessie) stores the table metadata: what columns exist, where snapshots are, what files belong to which partition. Multiple query engines (Trino via Starburst, Dremio, Spark on Databricks, Snowflake, BigQuery via BigLake, Athena, ClickHouse) can read the same tables by talking to the catalog and reading the underlying Parquet. This separation lets you run analytical queries on one engine and ML training on another against the same data, and switch engines without copying data. The trade-off in practice is that managed-warehouse engines (Snowflake internal tables, BigQuery native) often outperform lakehouse engines on the same data because they control the storage layout; the lakehouse performance gap narrowed in 2025-2026 but has not fully closed.
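
The flow described above (engine asks the catalog for metadata, then reads the files the metadata points to) can be modeled in a few lines. This is a toy in-memory sketch, not any vendor's API: the `Catalog`, `Engine`, `Snapshot`, and `TableMetadata` classes, the `sales.orders` table, and the file paths are all hypothetical, and a Python dict stands in for the object store.

```python
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    snapshot_id: int
    data_files: list  # object-store paths of the Parquet files in this snapshot


@dataclass
class TableMetadata:
    columns: list
    snapshots: list = field(default_factory=list)


class Catalog:
    """Toy stand-in for a catalog such as Polaris, Unity, Glue, or Nessie."""

    def __init__(self):
        self._tables = {}

    def register(self, name, metadata):
        self._tables[name] = metadata

    def load_table(self, name):
        return self._tables[name]


class Engine:
    """Toy query engine: asks the catalog for file paths, then reads the files."""

    def __init__(self, name, catalog, object_store):
        self.name = name
        self.catalog = catalog
        self.object_store = object_store

    def scan(self, table_name, snapshot_id=None):
        metadata = self.catalog.load_table(table_name)
        if snapshot_id is None:
            snapshot = metadata.snapshots[-1]  # latest snapshot
        else:  # time travel to an earlier snapshot
            snapshot = next(s for s in metadata.snapshots if s.snapshot_id == snapshot_id)
        rows = []
        for path in snapshot.data_files:
            rows.extend(self.object_store[path])
        return rows


# One shared object store (a dict standing in for S3/GCS/ADLS) and one catalog.
object_store = {
    "s3://lake/orders/part-0.parquet": [{"order_id": 1, "amount": 40.0}],
    "s3://lake/orders/part-1.parquet": [{"order_id": 2, "amount": 60.0}],
}
catalog = Catalog()
catalog.register(
    "sales.orders",
    TableMetadata(
        columns=["order_id", "amount"],
        snapshots=[
            Snapshot(1, ["s3://lake/orders/part-0.parquet"]),
            Snapshot(2, ["s3://lake/orders/part-0.parquet", "s3://lake/orders/part-1.parquet"]),
        ],
    ),
)

# Two engines share the same catalog and files; nothing is copied between them.
sql_engine = Engine("sql", catalog, object_store)
ml_engine = Engine("ml", catalog, object_store)

total_revenue = sum(row["amount"] for row in sql_engine.scan("sales.orders"))
training_features = [[row["amount"]] for row in ml_engine.scan("sales.orders")]
revenue_at_snapshot_1 = sum(
    row["amount"] for row in sql_engine.scan("sales.orders", snapshot_id=1)
)
```

Swapping an engine in this model means constructing another `Engine` against the same catalog and object store, which is the property that makes the catalog, rather than the engine, the harder component to replace.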

Final word

Looking at a different market? See the global Data Lakehouse ranking, or pick another country at the top of this page.

Last updated 2026-05-27. Local pricing reverified quarterly.