Z Zendikt
Editorial deep-dive · 10 products · Verified 2026-05-27

Top 10 Data Lakehouse Software for 2026

Independent 2026 ranking of data lakehouse platforms and open table formats. Databricks vs Snowflake, Iceberg vs Delta vs Hudi, honest pricing, real residency picks.

Verdict (TL;DR)

Databricks and Snowflake are the two enterprise-grade lakehouse platforms, both now pretending to be format-neutral after Databricks acquired Tabular (the company founded by Iceberg's creators) for a reported $1B+ in Jun 2024 and Snowflake announced Polaris Catalog the same month, open-sourcing it in Jul 2024. Apache Iceberg is winning the open-table-format war on hyperscaler and platform buy-in (AWS, GCP, Microsoft, Snowflake); Delta Lake remains strongest inside Databricks, with Delta UniForm providing Iceberg interop; Apache Hudi is the streaming-first specialist most common in Uber-origin shops. AWS Lake Formation, Google BigLake, and Microsoft Fabric OneLake are the hyperscaler-native lakehouse offerings, each tightest to its own object store. Dremio and Starburst are the query-engine specialists for buyers separating storage from compute. The honest 2026 view: pick Iceberg as your open table format unless you are deep on Databricks, and treat catalog choice (Polaris, Unity Catalog, Glue, Nessie) as the lock-in decision that matters more than the engine.

Best for your specific use case

  • Enterprise lakehouse + AI/ML platform: Databricks Lakehouse Platform. Delta Lake-native, with Unity Catalog governance, Mosaic AI for training, and Iceberg-creator talent in-house via the Tabular acquisition. The high-end default if you are doing serious ML alongside analytics.
  • Cloud-neutral SQL lakehouse on Iceberg: Snowflake + Polaris Catalog. Native Iceberg tables are GA; Polaris was announced in Jun 2024 and open-sourced in Jul 2024 as an Apache Iceberg REST catalog. The right answer for SQL-first enterprises wanting an open format with managed SaaS.
  • AWS-anchored lakehouse on S3: AWS Lake Formation + Iceberg. Glue Data Catalog plus S3 Tables (Iceberg-native S3 buckets, 2024 GA) plus Lake Formation governance. Lowest friction if S3 is already the data plane.
  • GCP-anchored multi-format lakehouse: Google BigLake. BigQuery query engine over Iceberg, Hudi, and Delta on Cloud Storage. Best when BigQuery is already the analytics layer and you want lakehouse semantics without a separate platform.
  • Microsoft 365 + Power BI lakehouse: Microsoft Fabric OneLake. OneLake is Delta Lake-native and now supports Iceberg via shortcuts. Wins through M365 E5/Fabric capacity bundle economics, not engine quality.
  • Open table format standard: Apache Iceberg. The open-format winner of 2025-2026 by hyperscaler buy-in. Use it as the table format spec, then pick your catalog (Polaris, Unity, Glue, Nessie) and engine separately.
  • Delta-Iceberg interop: Delta Lake. Still the right format inside Databricks. Delta UniForm (2024) writes Iceberg metadata alongside Delta so external engines can read the tables. Strong if Databricks is the primary engine.
  • Streaming-first lakehouse: Apache Hudi + Onehouse. Uber-origin format optimized for incremental ingest and record-level updates. Onehouse is the managed offering. Niche but well-fitted for CDC-heavy streaming pipelines.
  • Open lakehouse query engine on Iceberg: Dremio. Lakehouse-native query engine with the Nessie catalog (Git-for-data). Strong for teams wanting to query Iceberg on S3/ADLS/GCS without committing to Databricks or Snowflake compute.
  • Trino-based federated lakehouse query: Starburst. Managed Trino with Iceberg + Delta + Hudi support and Stargate federation. Best for federated query across lakehouse plus operational data sources.

A data lakehouse is the unified architecture that combines data lake storage (S3, GCS, ADLS) with warehouse-grade SQL, ACID transactions, and governance, powered by open table formats (Apache Iceberg, Delta Lake, Apache Hudi) that layer table semantics on top of Parquet files. The category is distinct from pure cloud data warehouses (covered separately) where storage is proprietary and compute is bundled; the lakehouse promise is that your data sits in your object store, in an open format you control, and any compliant engine can read it.

The structural story for 2026 is twofold. First, Apache Iceberg has effectively won the open-table-format war on the strength of hyperscaler buy-in: AWS S3 Tables (2024 GA), Google BigLake, Microsoft Fabric OneLake (via shortcuts), and Snowflake all support Iceberg as a first-class format. Delta Lake remains dominant inside Databricks but has hedged via Delta UniForm interop that writes Iceberg metadata in parallel. Apache Hudi retains a streaming-first niche around CDC-heavy and record-update-heavy workloads.

Second, the Jun 2024 announcements reshaped the politics. Databricks acquired Tabular (the company founded by the original Iceberg creators) for a reported $1B+, and the same month Snowflake announced Polaris Catalog, which it open-sourced in Jul 2024 and then donated to the Apache Software Foundation. The public position from both companies is that Iceberg and Delta will coexist; whether that holds is worth watching through 2026 via contribution patterns to the Apache Iceberg and Delta Lake projects.

The honest framing for buyers: the engine is increasingly a commodity, the catalog is increasingly the lock-in. Choose your catalog (Polaris, Unity Catalog, AWS Glue, Project Nessie) deliberately because it controls who can read your tables and how access is governed.

At a glance

Quick comparison

Product | Best for | Starts at | 10-emp/mo* | G2 | Geo
1 Databricks Lakehouse Platform | Mid-market through global enterprise | $0 | $0 | 4.5 | Global
2 Snowflake + Polaris Catalog | Mid-market through global enterprise | $0 | $0 | 4.5 | Global
3 AWS Lake Formation + Iceberg | AWS-anchored teams of any size | $0 | $0 | 4.2 | Global
4 Google BigLake | GCP-anchored teams of any size | $0 | $0 | 4.4 | Global
5 Microsoft Fabric OneLake | Microsoft-anchored mid-enterprise through global enterprise | $263 | $263 | 4.4 | Global
6 Apache Iceberg | Engineering-led teams of any size | $0 | $0 | 4.6 | Global
7 Delta Lake | Engineering-led teams, Databricks-anchored | $0 | $0 | 4.5 | Global
8 Apache Hudi + Onehouse | Streaming-first engineering teams | $0 | $0 | 4.3 | Global
9 Dremio | Engineering-led lakehouse teams | Quote | - | 4.4 | Global
10 Starburst | Engineering-led federated query teams | $0 | $0 | 4.4 | Global

*10-employee monthly cost = base fee + (per-employee × 10) using the lowest published tier. For opaque-pricing vendors, no value is shown.
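
The footnote formula can be sketched in a few lines of Python. This is a minimal illustration, not the site's calculator: the function name and parameters are hypothetical, and the only figure taken from the table is the Microsoft Fabric $263 floor. For usage-priced products, a $0 result presumably means there is no fixed monthly fee, not that usage is free.

```python
def monthly_cost(base_fee: float, per_employee_fee: float, employees: int = 10) -> float:
    """List-price floor: base fee plus the per-employee fee for each employee."""
    if employees < 0:
        raise ValueError("employees must be non-negative")
    return base_fee + per_employee_fee * employees


# Microsoft Fabric F2 from the table: $263 base, no per-employee component.
print(monthly_cost(base_fee=263, per_employee_fee=0))  # 263
```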

      Migration matrix

      How hard is it to switch?

      Switching cost is the lock-in tax. Read row → column: “If I'm on X today, how painful is moving to Y?” Estimates are based on data export quality and reported migration time.

      From ↓ / To → | Databricks Lakehouse Platform | Snowflake + Polaris Catalog | AWS Lake Formation + Iceberg | Google BigLake | Microsoft Fabric OneLake | Apache Iceberg | Delta Lake | Apache Hudi + Onehouse | Dremio | Starburst
      Databricks Lakehouse Platform | - | OK 4 | Hard 7 | Hard 7 | OK 4 | Hard 7 | Hard 7 | Medium 6 | Medium 6 | Hard 7
      Snowflake + Polaris Catalog | OK 4 | - | Hard 7 | Hard 7 | OK 4 | Hard 7 | Hard 7 | Medium 6 | Medium 6 | Hard 7
      AWS Lake Formation + Iceberg | Hard 7 | Hard 7 | - | Medium 6 | Hard 7 | Medium 6 | Medium 6 | Medium 5 | Medium 5 | Medium 6
      Google BigLake | Hard 7 | Hard 7 | Medium 6 | - | Hard 7 | Medium 6 | Medium 6 | Medium 5 | Medium 5 | Medium 6
      Microsoft Fabric OneLake | OK 4 | OK 4 | Hard 7 | Hard 7 | - | Hard 7 | Hard 7 | Medium 6 | Medium 6 | Hard 7
      Apache Iceberg | Hard 7 | Hard 7 | Medium 6 | Medium 6 | Hard 7 | - | Medium 6 | Medium 5 | Medium 5 | Medium 6
      Delta Lake | Hard 7 | Hard 7 | Medium 6 | Medium 6 | Hard 7 | Medium 6 | - | Medium 5 | Medium 5 | Medium 6
      Apache Hudi + Onehouse | Medium 6 | Medium 6 | Medium 5 | Medium 5 | Medium 6 | Medium 5 | Medium 5 | - | OK 4 | Medium 5
      Dremio | Medium 6 | Medium 6 | Medium 5 | Medium 5 | Medium 6 | Medium 5 | Medium 5 | OK 4 | - | Medium 5
      Starburst | Hard 7 | Hard 7 | Medium 6 | Medium 6 | Hard 7 | Medium 6 | Medium 6 | Medium 5 | Medium 5 | -
      Easy (0–2) OK (3–4) Medium (5–6) Hard (7–8) Very hard (9–10)
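
The legend maps each 0-10 score to a difficulty band. A small sketch of that mapping, with a hypothetical function name, makes the band edges explicit:

```python
def difficulty_label(score: int) -> str:
    """Map a 0-10 switching-cost score to the legend's difficulty band."""
    if not 0 <= score <= 10:
        raise ValueError("score must be between 0 and 10")
    if score <= 2:
        return "Easy"
    if score <= 4:
        return "OK"
    if score <= 6:
        return "Medium"
    if score <= 8:
        return "Hard"
    return "Very hard"


# Scores taken from the matrix: Databricks → Snowflake is 4, Databricks → AWS is 7.
print(difficulty_label(4))  # OK
print(difficulty_label(7))  # Hard
```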
      The ranking

      All 10, ranked and reviewed

      Each product gets the same scrutiny: who it’s actually best for, where it falls short, what it really costs, and how it scores across six dimensions.

      #1

      Databricks Lakehouse Platform

      Delta Lake-native lakehouse with Unity Catalog and Mosaic AI; Iceberg-aware after Tabular acquisition.

      Founded 2013 · San Francisco, CA · private · 200-100,000+ employees
      G2 4.5 (580)
      Capterra 4.6
      From $0 /mo
      ◐ Partial disclosure

      Databricks is the enterprise lakehouse leader, unifying data engineering, analytics, and ML/AI training on Delta Lake + Unity Catalog. The Jun 2024 acquisition of Tabular (the Iceberg-creator-led startup) for a reported $1B+ creates obvious tension because Databricks is the lead maintainer of Delta Lake, the rival format to Iceberg; the public position is that Databricks will support both via Delta UniForm and through ongoing Iceberg contribution. The Sept 2023 round valued the company at $43B, with a reported $62B in a subsequent round; a 2026 IPO is widely expected but not confirmed. Trade-offs: DBU pricing complexity, and SQL-only buyers often find Snowflake simpler.

      Best for

      Mid-market and enterprise data teams (200-50,000 employees) running serious ML training plus analytics, where lakehouse governance and AI workflow integration matter more than pure SQL simplicity.

      Worst for

      SQL-only BI shops (Snowflake or BigQuery simpler), Iceberg-purist buyers wary of Databricks owning Delta Lake, or small teams without dedicated data engineering.

      Strengths

      • Delta Lake as the open default plus Delta UniForm Iceberg interop
      • Unity Catalog unifies governance across analytics, ML, and lakehouse tables
      • Best-in-class for AI/ML training and feature engineering via Mosaic AI
      • Tabular acquisition brought Iceberg-creator engineering talent in-house
      • Photon vectorized engine narrows SQL gap to dedicated warehouses

      Weaknesses

      • DBU pricing complexity, plus separate cloud infra costs charged by hyperscaler
      • Delta vs Iceberg neutrality is contested given Databricks owns Delta Lake project
      • Unity Catalog migration painful for legacy Hive metastore customers

      Pricing tiers

      partial
      • Standard (Jobs)
        From $0.15/DBU; basic Spark workloads
        $0 /mo
      • Premium
        From $0.40/DBU; SQL warehouses, Unity Catalog, audit logs
        $0 /mo
      • Enterprise
        From $0.65/DBU; HIPAA, PCI, customer-managed keys
        $0 /mo
      • Mosaic AI Model Training
        Foundation model training and serving; custom quote
        Quote
      Watch for
      • Cloud infra (EC2/Azure VMs/GCE) billed by hyperscaler, not Databricks
      • Photon premium DBU multiplier on SQL warehouses
      • Mosaic AI inference and training billed separately
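
Because the DBU charge and the cloud bill arrive separately, a rough estimate has to add the two. The sketch below uses the "from" DBU rates listed above as floors; the function name and the workload figures are hypothetical, and it does not model the Photon multiplier or Mosaic AI charges.

```python
# List-price floors quoted above, in USD per DBU.
DBU_RATE_USD = {"standard": 0.15, "premium": 0.40, "enterprise": 0.65}


def databricks_monthly_estimate(tier: str, dbus: float, cloud_infra_usd: float) -> float:
    """Databricks DBU charge plus the hyperscaler bill for the same workload."""
    dbu_charge = DBU_RATE_USD[tier] * dbus
    return round(dbu_charge + cloud_infra_usd, 2)


# Hypothetical workload: 10,000 DBUs on Premium plus $3,000 of VM spend.
print(databricks_monthly_estimate("premium", dbus=10_000, cloud_infra_usd=3_000))  # 7000.0
```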

      Key features

      • +Delta Lake (open table format)
      • +Delta UniForm (Iceberg metadata interop)
      • +Unity Catalog governance
      • +Photon vectorized SQL engine
      • +Mosaic AI (training, fine-tuning, serving)
      • +Lakehouse Federation across S3/ADLS/GCS
      • +Delta Sharing (open data sharing protocol)
      350+ integrations
      dbt, Fivetran, Tableau, Power BI, Hugging Face, LangChain, Apache Iceberg
      Geography
      Global
      #2

      Snowflake + Polaris Catalog

      Cloud-neutral managed lakehouse with native Iceberg and open-sourced Polaris Catalog.

      Founded 2012 · Bozeman, MT · public · 200-100,000+ employees
      G2 4.5 (680)
      Capterra 4.5
      From $0 /mo
      ◐ Partial disclosure

      Snowflake (NYSE:SNOW) made a genuine strategic shift toward open lakehouse architecture in 2024: native Iceberg tables reached read/write parity with internal tables, and Polaris Catalog, announced in Jun 2024, was open-sourced in Jul 2024 as an Apache Iceberg REST catalog implementation. The honest reading is that this is a real bet on Iceberg interop, partly defensive against Databricks-on-Delta and partly offensive into the open-format buyer segment. The trade-off: whether enterprise customers actually benefit depends on which catalog they pick, and Snowflake's credit-based pricing remains easy to overspend on without governance. Best fit for SQL-first enterprises wanting an open format with managed SaaS.

      Best for

      Cloud-neutral enterprises (500+ employees) wanting lakehouse semantics in Iceberg without operating a separate engine, with a strong preference for managed SaaS and SQL workloads.

      Worst for

      Heavy AI/ML training shops (Databricks better), single-cloud teams that could just use BigLake or Lake Formation, or buyers who reject credit-based pricing.

      Strengths

      • Native Iceberg tables GA with read/write parity to internal tables
      • Polaris Catalog announced Jun 2024 and open-sourced Jul 2024 as an Apache Iceberg REST catalog
      • Cloud-neutral: native on AWS, Azure, GCP
      • Snowpark for Python/Java/Scala in-lakehouse processing
      • Strong governance, masking, and row-level security

      Weaknesses

      • Credit-based pricing easy to overspend without strict governance
      • External Iceberg catalogs require careful planning; performance trade-offs vs internal tables
      • May 2024 customer credential incident still discussed in deals

      Pricing tiers

      partial
      • Standard
        On-demand $2/credit; storage $23/TB/month compressed
        $0 /mo
      • Enterprise
        On-demand $3/credit; multi-cluster warehouses, masking
        $0 /mo
      • Business Critical
        On-demand $4/credit; HIPAA, PCI, customer-managed keys
        $0 /mo
      • Virtual Private Snowflake (VPS)
        Dedicated metadata service for regulated industries
        Quote
      Watch for
      • Compute credit overruns from un-suspended warehouses
      • External Iceberg query has different perf characteristics than internal tables
      • Cross-region data egress
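
To see how an un-suspended warehouse turns into an overrun, it helps to multiply it out. This is a rough sketch using the on-demand credit prices and the $23/TB storage rate listed above; the function name is hypothetical, applying the storage rate to every edition is an assumption, and so is the burn rate of one credit per hour in the example.

```python
# On-demand prices quoted above, in USD per credit.
CREDIT_PRICE_USD = {"standard": 2.0, "enterprise": 3.0, "business_critical": 4.0}
STORAGE_USD_PER_TB_MONTH = 23.0  # assumed to apply to every edition


def snowflake_monthly_estimate(edition: str, credits: float, storage_tb: float) -> float:
    """Compute credits plus compressed storage, ignoring egress and serverless features."""
    compute = CREDIT_PRICE_USD[edition] * credits
    storage = STORAGE_USD_PER_TB_MONTH * storage_tb
    return compute + storage


# Hypothetical: a warehouse burning 1 credit/hour, never suspended for 30 days.
always_on_credits = 1 * 24 * 30  # 720 credits
print(snowflake_monthly_estimate("standard", always_on_credits, storage_tb=10))  # 1670.0
```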

      Key features

      • +Native Iceberg tables (managed and external)
      • +Polaris Catalog (open-source Apache Iceberg REST catalog)
      • +Snowpark for Python/Java/Scala
      • +Time Travel and Zero-Copy Cloning
      • +Snowpipe streaming ingestion
      • +Secure Data Sharing and Marketplace
      400+ integrations
      dbt, Fivetran, Tableau, Power BI, Apache Iceberg, Airbyte, Hightouch
      Geography
      Global
      #3

      AWS Lake Formation + Iceberg

      AWS-native lakehouse: Glue Catalog, Lake Formation governance, and S3 Tables for Iceberg.

      Founded 2018 · Seattle, WA · public · 50-100,000+ employees
      G2 4.2 (140)
      Capterra 4.2
      From $0 /mo
      ● Transparent pricing

      AWS Lake Formation is the AWS-native lakehouse governance layer over S3, with AWS Glue Data Catalog as the metadata store and Lake Formation managing fine-grained access controls. The S3 Tables announcement at re:Invent 2024 made Iceberg a first-class S3 bucket type, removing the need for a separate Iceberg metastore for many AWS-native pipelines. The lakehouse engines on top are Athena, EMR, Redshift Spectrum, and Glue ETL. Strengths: deep AWS integration, IAM-native access, and Iceberg-native S3. Trade-offs: best-fit narrows sharply when not AWS-anchored, governance UX is more workmanlike than Unity Catalog, and pricing fragments across Glue, Lake Formation, S3 Tables, and the chosen query engine.

      Best for

      AWS-anchored organizations (any size) where S3 is already the data plane and the team wants to add Iceberg + governance without leaving AWS.

      Worst for

      Multi-cloud or non-AWS teams, organizations wanting a single integrated lakehouse vendor (Databricks or Snowflake), or buyers wanting opinionated governance UX.

      Strengths

      • Iceberg-native S3 Tables (2024 GA) removes need for separate metastore
      • AWS Glue Data Catalog as the metadata layer with broad AWS integration
      • Lake Formation fine-grained access on rows, columns, and tags
      • IAM-native authentication and tag-based access control
      • Query engine flexibility: Athena, EMR, Redshift Spectrum, Glue ETL

      Weaknesses

      • Best-fit narrows sharply when not AWS-anchored
      • Governance UX more workmanlike than Unity Catalog or Polaris
      • Pricing fragments across Glue, Lake Formation, S3 Tables, query engine

      Pricing tiers

      public
      • Glue Data Catalog
        $1/100k objects/month; first 1M free
        $0 /mo
      • Lake Formation
        No additional charge; underlying services billed separately
        $0 /mo
      • S3 Tables
        Storage at S3 standard rates; per-request fees
        $0 /mo
      • Athena (query)
        $5/TB scanned; or capacity reservation
        $0 /mo
      Watch for
      • S3 Tables compaction and maintenance request fees
      • Glue ETL DPU consumption
      • Cross-region data egress
      • Athena/EMR/Redshift Spectrum billed separately as compute
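
The fragmentation is easier to reason about when each meter is priced on its own. The sketch below covers only the two rates listed above (Glue Data Catalog objects and Athena scans); function names and the example volumes are hypothetical, and S3 Tables, Glue ETL, and egress are left out.

```python
GLUE_FREE_OBJECTS = 1_000_000      # first 1M catalog objects are free
GLUE_USD_PER_100K_OBJECTS = 1.0    # per month, above the free tier
ATHENA_USD_PER_TB_SCANNED = 5.0


def glue_catalog_cost(objects: int) -> float:
    """Monthly Glue Data Catalog storage charge above the free tier."""
    billable = max(0, objects - GLUE_FREE_OBJECTS)
    return billable / 100_000 * GLUE_USD_PER_100K_OBJECTS


def athena_cost(tb_scanned: float) -> float:
    """On-demand Athena charge for the data scanned."""
    return ATHENA_USD_PER_TB_SCANNED * tb_scanned


# Hypothetical month: 3M catalog objects and 40 TB scanned by Athena.
print(glue_catalog_cost(3_000_000) + athena_cost(40))  # 220.0
```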

      Key features

      • +AWS S3 Tables (Iceberg-native S3 buckets)
      • +AWS Glue Data Catalog
      • +Lake Formation fine-grained access controls
      • +Tag-based access control
      • +Cross-account data sharing
      • +Native Apache Iceberg support
      • +Integration with Athena, EMR, Redshift Spectrum
      200+ integrations
      Apache Iceberg, Athena, EMR, Redshift, Glue ETL, QuickSight, SageMaker
      Geography
      Global
      #4

      Google BigLake

      BigQuery engine over open table formats: Iceberg, Hudi, and Delta on Cloud Storage.

      Founded 2022 · Mountain View, CA · public · 50-100,000+ employees
      G2 4.4 (110)
      Capterra 4.4
      From $0 /mo
      ● Transparent pricing

      BigLake is Google Cloud's lakehouse layer that lets BigQuery (and other GCP engines, including Dataproc Spark and Dataflow) query Apache Iceberg, Apache Hudi, and Delta Lake tables on Cloud Storage with the same governance model as native BigQuery tables. The fit: GCP-anchored teams who already use BigQuery as the analytics engine and want to add lakehouse semantics over open formats without operating a separate platform. Strengths: tightest integration with BigQuery, Looker, and Vertex AI; native Iceberg, Hudi, and Delta support; and serverless query economics. Trade-offs: best-fit narrows sharply when not GCP-anchored, and cross-cloud egress economics favor staying inside GCP.

      Best for

      GCP-anchored organizations (any size) wanting lakehouse semantics on Iceberg/Hudi/Delta with BigQuery as the primary engine, plus tight Looker and Vertex AI integration.

      Worst for

      Multi-cloud or AWS/Azure-anchored organizations, teams that need a single integrated lakehouse vendor across clouds, or buyers without existing BigQuery investment.

      Strengths

      • Native Iceberg, Hudi, and Delta Lake support on Cloud Storage
      • Same governance model as BigQuery (Policy Tags, BigQuery IAM)
      • BigQuery serverless query economics extend to open tables
      • BigQuery Omni for cross-cloud query against AWS S3 and Azure
      • Tight integration with Vertex AI and Looker

      Weaknesses

      • Best-fit narrows sharply when not GCP-anchored
      • Cross-cloud egress economics favor staying inside GCP
      • External table query has different perf characteristics than native BigQuery

      Pricing tiers

      public
      • On-demand
        $6.25/TB scanned on BigQuery; Cloud Storage at standard rates
        $0 /mo
      • BigQuery Editions Standard
        $0.04/slot-hour; capacity reservations
        $0 /mo
      • BigQuery Editions Enterprise
        $0.06/slot-hour; CMEK, VPC-SC
        $0 /mo
      • BigQuery Editions Enterprise Plus
        $0.10/slot-hour; cross-region replication
        $0 /mo
      Watch for
      • Cloud Storage class tiering
      • BI Engine memory reservation
      • Cross-region or cross-cloud egress
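
Choosing between on-demand and an Editions reservation comes down to a break-even scan volume. A minimal sketch, using the $6.25/TB and $0.04/slot-hour figures listed above; the function names and the 100-slot, 730-hour example are hypothetical, and the comparison ignores autoscaling and commitment discounts.

```python
ON_DEMAND_USD_PER_TB = 6.25
STANDARD_USD_PER_SLOT_HOUR = 0.04


def reservation_cost(slots: int, hours: float, rate: float = STANDARD_USD_PER_SLOT_HOUR) -> float:
    """Cost of holding a fixed number of slots for the given number of hours."""
    return slots * hours * rate


def break_even_tb(slots: int, hours: float, rate: float = STANDARD_USD_PER_SLOT_HOUR) -> float:
    """TB scanned per period at which on-demand costs the same as the reservation."""
    return reservation_cost(slots, hours, rate) / ON_DEMAND_USD_PER_TB


# Hypothetical: 100 Standard slots held for a 730-hour month.
print(round(break_even_tb(100, 730), 1))  # 467.2
```

Below that scan volume on-demand is cheaper under these assumptions; above it the reservation wins.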

      Key features

      • +Native Apache Iceberg, Hudi, and Delta Lake support
      • +BigQuery engine over Cloud Storage tables
      • +BigLake Metastore (Iceberg-compatible)
      • +BigQuery Omni cross-cloud query
      • +Policy Tags for column-level access
      • +Vertex AI integration
      200+ integrations
      Looker, Vertex AI, dbt, Fivetran, Apache Iceberg, Apache Hudi, Delta Lake
      Geography
      Global
      #5

      Microsoft Fabric OneLake

      Microsoft's unified lakehouse store: Delta-native, with Iceberg via shortcuts and Power BI bundle economics.

      Founded 2023 · Redmond, WA · public · 500-100,000+ employees
      G2 4.4 (380)
      Capterra 4.4
      From $263 /mo
      ◐ Partial disclosure

      OneLake is the unified data lake layer underneath Microsoft Fabric, announced in May 2023 as part of Microsoft Fabric and using Delta Lake as the native open format. The 2024-2025 additions of OneLake shortcuts to Iceberg tables (in S3, ADLS, and elsewhere) and the broader Fabric Iceberg interop make OneLake the closest thing to a multi-format lakehouse store from Microsoft. The honest framing: OneLake wins deals through Power BI Premium bundle pricing and Microsoft 365 procurement leverage, not because the underlying engine is best-in-class. Capacity Unit (CU) pricing complexity remains the main cost-forecasting issue.

      Best for

      Microsoft 365 + Power BI Premium-anchored enterprises (500-100,000+ employees) where Fabric capacity comes effectively-free with existing M365 E5 commitments.

      Worst for

      Non-Microsoft-anchored teams, organizations rejecting Capacity Unit pricing, or buyers wanting best-in-class engine performance over bundle economics.

      Strengths

      • OneLake as Delta Lake-native unified analytics store
      • OneLake shortcuts allow read of Iceberg tables in S3, ADLS, elsewhere
      • Power BI Premium bundle, often effectively-free with E5 commitments
      • Copilot integrated across the Fabric suite
      • One SKU covers lakehouse + warehouse + BI + ETL + real-time

      Weaknesses

      • Wins on bundle economics, not core engine quality
      • Capacity Unit (CU) pricing complexity
      • Iceberg support via shortcuts is read-mostly vs full lakehouse semantics

      Pricing tiers

      partial
      • F2 (smallest)
        2 CU; pay-as-you-go
        $263 /mo
      • F64
        64 CU; common mid-size enterprise capacity
        $8,400 /mo
      • F2048
        2,048 CU; very large enterprise capacity
        $269,000 /mo
      • Bundled with Power BI Premium
        F64 effectively included with P1 commitments at many enterprises
        Quote
      Watch for
      • OneLake storage billed separately at ADLS rates
      • Cross-region data egress
      • Mirroring usage can spike CU consumption
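
Dividing each listed price by its Capacity Unit count shows that the three tiers above scale almost linearly, at roughly $131 per CU per month. The sketch below uses only those three figures; the helper name is hypothetical.

```python
# (Capacity Units, listed USD per month) for the tiers quoted above.
FABRIC_SKUS = {"F2": (2, 263), "F64": (64, 8_400), "F2048": (2_048, 269_000)}


def usd_per_cu(sku: str) -> float:
    """Listed monthly price divided by the SKU's Capacity Units."""
    capacity_units, monthly_usd = FABRIC_SKUS[sku]
    return round(monthly_usd / capacity_units, 2)


for sku in FABRIC_SKUS:
    print(sku, usd_per_cu(sku))
# F2 131.5
# F64 131.25
# F2048 131.35
```

The forecasting difficulty therefore lies less in the SKU price than in predicting how many CUs a workload will consume.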

      Key features

      • +OneLake (Delta Lake-native unified store)
      • +OneLake shortcuts (Iceberg read in S3/ADLS)
      • +Fabric Lakehouse (Spark + SQL endpoint)
      • +Fabric Warehouse (T-SQL warehouse)
      • +Power BI native integration
      • +Copilot in Fabric
      • +Mirroring (Snowflake, Cosmos, Azure SQL)
      250+ integrations
      Power BI, Microsoft 365, Azure ML, Apache Iceberg (via shortcuts), Snowflake (mirroring), Delta Lake
      Geography
      Global
      #6

      Apache Iceberg

      The winning open table format of 2025-2026 by hyperscaler buy-in.

      Founded 2017 · Distributed (originated at Netflix) · public · 50-100,000+ employees
      G2 4.6 (90)
      From $0 /mo
      ● Transparent pricing

      Apache Iceberg is the open table format originated at Netflix in 2017, donated to the Apache Software Foundation, and now the de facto winner of the open-table-format war in 2025-2026 on the strength of hyperscaler buy-in. AWS (S3 Tables, Athena, EMR, Redshift), Google (BigLake, BigQuery), Microsoft (Fabric via shortcuts), and Snowflake all support Iceberg as a first-class format. Databricks acquired Tabular (the company founded by Iceberg creators Ryan Blue and Daniel Weeks) in Jun 2024 for a reported $1B+, which brought core Iceberg engineering talent into the Delta Lake-stewarding company; the public position is dual-format support. The honest read: pick Iceberg unless you are deep on Databricks.

      Best for

      Engineering-led organizations of any size committing to open-format lakehouse architecture, particularly multi-engine or multi-cloud teams who want to avoid table-format lock-in.

      Worst for

      Teams deep on Databricks where Delta Lake is the path of least resistance, or shops that prefer fully managed lakehouse SKUs over assembling components.

      Strengths

      • De facto open-table-format winner by hyperscaler buy-in
      • ACID transactions, time travel, schema evolution, hidden partitioning
      • Iceberg REST catalog spec standardized (Polaris, Nessie, Glue support it)
      • Vendor-neutral by design and Apache-governed
      • Strong contributor diversity across AWS, Apple, Netflix, Stripe, Tabular

      Weaknesses

      • Catalog choice (Polaris, Unity, Glue, Nessie) is the real lock-in decision
      • Maintenance operations (compaction, snapshot expiry) require operational discipline
      • Tabular acquisition by Databricks creates uncertainty about long-term neutrality

      Pricing tiers

      public
      • Apache Iceberg
        Apache 2.0; unlimited use; community support
        $0 /mo
      • Commercial managed offerings
        Snowflake Polaris, AWS S3 Tables, Tabular (Databricks), Dremio, Cloudera, Onehouse
        Quote

      Key features

      • +ACID transactions on object storage
      • +Time travel and snapshot isolation
      • +Schema evolution (add, drop, rename columns)
      • +Hidden partitioning and partition evolution
      • +Iceberg REST catalog spec
      • +Multi-engine read/write (Spark, Trino, Flink, Presto, Snowflake, BigQuery)
      50+ integrations
      Snowflake, Databricks, AWS Glue, BigLake, Trino, Spark, Flink, Dremio, Starburst
      Geography
      Global
      #7

      Delta Lake

      Databricks-led open table format with Iceberg interop via Delta UniForm.

      Founded 2019 · Distributed (Databricks-stewarded) · public · 50-100,000+ employees
      G2 4.5 (60)
      From $0 /mo
      ● Transparent pricing

      Delta Lake is the open table format created at Databricks, open-sourced under the Linux Foundation in 2019, and the native format for the Databricks Lakehouse Platform. It remains strong inside Databricks (Unity Catalog assumes Delta as the default) and has hedged for the Iceberg-dominant 2025-2026 landscape via Delta UniForm (2024), which writes Iceberg metadata in parallel so external engines can read Delta tables as if they were Iceberg. The honest framing: if Databricks is your primary engine, Delta is the right format; if you want format neutrality across hyperscalers, Iceberg is winning. Microsoft Fabric OneLake also uses Delta natively, which keeps Delta relevant outside Databricks.

      Best for

      Organizations standardized on Databricks or Microsoft Fabric where Delta is the path of least resistance, with Delta UniForm available for occasional Iceberg interop.

      Worst for

      Multi-engine shops choosing one format, or organizations on AWS/GCP-native lakehouse stacks where Iceberg has stronger first-party support.

      Strengths

      • Native format for Databricks Lakehouse and Microsoft Fabric OneLake
      • Delta UniForm writes Iceberg metadata for cross-engine read
      • Mature ecosystem inside Databricks and Spark
      • ACID transactions, time travel, schema evolution
      • Delta Sharing as open data sharing protocol

      Weaknesses

      • Hyperscaler buy-in (AWS, GCP) is weaker than for Iceberg
      • Databricks-led project governance raises neutrality questions for non-Databricks shops
      • Delta UniForm Iceberg interop is one-way (write Delta, read Iceberg) at most engines

      Pricing tiers

      public
      • Delta Lake
        Apache 2.0; unlimited use; community support
        $0 /mo
      • Commercial managed
        Databricks, Microsoft Fabric, Onehouse all offer managed Delta
        Quote

      Key features

      • +ACID transactions on object storage
      • +Time travel and version control
      • +Schema evolution and enforcement
      • +Delta UniForm (Iceberg metadata interop)
      • +Delta Sharing (open data sharing protocol)
      • +Native Databricks and Microsoft Fabric integration
      40+ integrations
      Databricks, Microsoft Fabric, Apache Spark, Trino, Presto, Apache Flink
      Geography
      Global
      #8

      Apache Hudi + Onehouse

      Streaming-first open table format from Uber, with Onehouse as commercial managed offering.

      Founded 2017 · Distributed (originated at Uber); Onehouse: Sunnyvale, CA · private · 50-50,000+ employees
      G2 4.3 (35)
      From $0 /mo
      ◐ Partial disclosure

      Apache Hudi is the open table format originated at Uber in 2016-2017 and donated to the Apache Software Foundation, designed from day one for streaming-first and record-update-heavy workloads (CDC, real-time ingestion, frequent upserts). Onehouse is the commercial managed offering founded by Hudi creator Vinoth Chandar in 2021, with a multi-format strategy (Hudi, Iceberg, Delta via Apache XTable). The honest framing: Hudi has lost the broader open-table-format war to Iceberg on hyperscaler buy-in, but retains a defensible niche in streaming-first and CDC-heavy workloads where its incremental processing model is genuinely differentiating. Best fit for Uber-origin shops and streaming-heavy data engineering teams.

      Best for

      Streaming-first data engineering teams (50-50,000 employees) with heavy CDC, frequent upserts, or real-time ingestion requirements where Hudi incremental processing is differentiating.

      Worst for

      Batch-heavy analytics shops (Iceberg or Delta fit better), or teams wanting broadest hyperscaler-native support without operational engineering work.

      Strengths

      • Streaming-first and CDC-heavy workload specialization
      • Record-level updates and deletes natively supported
      • Onehouse managed offering with Hudi creator on engineering team
      • Apache XTable for cross-format (Hudi/Iceberg/Delta) interop
      • Used in production at Uber, Walmart, Robinhood, Notion

      Weaknesses

      • Lost broader format war to Iceberg on hyperscaler buy-in
      • Smaller ecosystem and contributor base than Iceberg or Delta
      • Best-fit narrowed to streaming/CDC workloads

      Pricing tiers

      Disclosure: partial
      • Apache Hudi
        Apache 2.0; unlimited use; community support
        $0 /mo
      • Onehouse Free
        Community tier; limited capacity
        $0 /mo
      • Onehouse Cloud
        Managed Hudi + multi-format; usage-based
        Quote
      • Onehouse Enterprise
        Dedicated support, enterprise governance
        Quote

      Key features

      • Streaming-first incremental processing
      • Record-level updates (Copy-on-Write and Merge-on-Read)
      • Time travel and snapshot isolation
      • Apache XTable for cross-format interop
      • Native Spark, Flink, Presto, Trino support
      • Onehouse managed cloud
      40+ integrations
      Apache Spark · Apache Flink · Trino · Presto · AWS Glue · EMR · Apache XTable
      Geography
      Global
      #9

      Dremio

      Lakehouse-native query engine on Iceberg with Project Nessie Git-for-data catalog.

      Founded 2015 · Santa Clara, CA · private · 100-5,000+ employees
      G2 4.4 (95)
      Capterra 4.4
      Custom quote
      ◐ Partial disclosure
      Visit Dremio

      Dremio is the lakehouse-native query engine purpose-built for SQL on Apache Iceberg tables in S3/ADLS/GCS, with Project Nessie as the Git-for-data catalog. The fit: teams that want to decouple storage from their compute vendor, keep their data as Iceberg in their own object store, and use Dremio as the engine without committing to Databricks or Snowflake compute. Dremio raised a $160M Series E in Jan 2022 at a $2B valuation, bringing total funding to roughly $410M; no significant funding rounds have been publicly disclosed since. Strengths: Iceberg-first engineering, Nessie data versioning, and reflections (acceleration layer) for sub-second BI. Trade-offs: smaller market presence than Databricks/Snowflake, narrower ecosystem.
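      Git-for-data is easier to picture with a toy model. The sketch below is not the Nessie API: `ToyCatalog`, its methods, and the metadata paths are hypothetical. It only illustrates the idea that a branch is a named set of table-to-metadata pointers, so work on a branch stays invisible to `main` until it is merged.

```python
class ToyCatalog:
    """Hypothetical in-memory catalog: branch name -> {table: metadata pointer}."""

    def __init__(self) -> None:
        self.branches = {"main": {}}

    def create_branch(self, name: str, source: str = "main") -> None:
        # A branch starts as a copy of the source's pointers; no data files are copied.
        self.branches[name] = dict(self.branches[source])

    def commit(self, branch: str, table: str, metadata_path: str) -> None:
        # Committing swaps the pointer for one table on one branch.
        self.branches[branch][table] = metadata_path

    def merge(self, source: str, target: str = "main") -> None:
        # Simplified merge: source pointers win. A real catalog would detect conflicts.
        self.branches[target].update(self.branches[source])


catalog = ToyCatalog()
catalog.commit("main", "sales", "s3://bucket/sales/v1.metadata.json")
catalog.create_branch("etl")
catalog.commit("etl", "sales", "s3://bucket/sales/v2.metadata.json")

print(catalog.branches["main"]["sales"])  # still v1 while work is on the branch
catalog.merge("etl")
print(catalog.branches["main"]["sales"])  # v2 after the merge
```

      The point of the sketch is that branching and merging move pointers only, which is why an ETL job can be validated on a branch before readers of `main` ever see it.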

      Best for

      Engineering-led teams (100-5,000 employees) committing to an Iceberg lakehouse architecture who want to decouple storage from their compute vendor and use a query engine outside the Databricks/Snowflake duopoly.

      Worst for

      Buyers wanting fully managed integrated lakehouse + ML platform (Databricks), heavy AI/ML training shops, or teams without dedicated data engineering capacity.

      Strengths

      • Iceberg-first lakehouse query engine
      • Project Nessie for Git-for-data versioning and branching
      • Reflections (materialized view acceleration) for sub-second BI
      • Apache Arrow-based engine with strong query performance
      • Bring-your-own-cloud and bring-your-own-object-store model

      Weaknesses

      • Smaller market presence than Databricks or Snowflake
      • No significant funding round publicly disclosed since 2022
      • Narrower BI and partner ecosystem than the leaders

      Pricing tiers

      Disclosure: partial
      • Cloud Standard
        Managed Dremio on AWS/Azure; usage-based
        Quote
      • Cloud Enterprise
        Advanced governance, SSO, dedicated support
        Quote
      • Software (self-hosted)
        On-prem or BYOC; subscription-based
        Quote

      Key features

      • Iceberg-native query engine
      • Project Nessie (Git-for-data catalog)
      • Reflections (materialized view acceleration)
      • Apache Arrow-based execution
      • Lakehouse semantics: ACID, time travel, branching
      • SQL over S3/ADLS/GCS
      80+ integrations
      Apache Iceberg · Project Nessie · Tableau · Power BI · dbt · AWS S3 · ADLS
      Geography
      Global
      #10

      Starburst

      Managed Trino with multi-format lakehouse support and Stargate federation.

      Founded 2017 · Boston, MA · private · 100-10,000+ employees
      G2 4.4 (115)
      Capterra 4.5
      From $0 /mo
      ◐ Partial disclosure
      Visit Starburst

      Starburst is the commercial company behind Trino (the open-source distributed SQL query engine, formerly PrestoSQL), offering Starburst Galaxy (SaaS) and Starburst Enterprise (self-hosted) as managed Trino with multi-format lakehouse support (Iceberg, Delta, Hudi) and Stargate federation across data sources. Series D $250M raised in Feb 2022 at $3.35B valuation; no major funding round publicly disclosed since. Strengths: federated query across lakehouse plus operational data sources (Postgres, MySQL, Mongo, etc.), Trino community heritage, and multi-format support. Trade-offs: smaller than Databricks/Snowflake, primary value is federation rather than being a one-stop lakehouse.

      Best for

      Engineering-led teams (100-10,000 employees) with federation requirements across lakehouse plus operational data sources, who value Trino open-source heritage and multi-format support.

      Worst for

      Buyers wanting fully managed integrated lakehouse + ML platform (Databricks), or teams that only need single-format Iceberg query (Dremio or BigLake fit).

      Strengths

      • Managed Trino with multi-format support (Iceberg, Delta, Hudi)
      • Stargate federation across 50+ data sources (lakehouse plus operational)
      • Strong open-source Trino heritage and community
      • Bring-your-own-cloud and BYO-object-store model
      • Galaxy SaaS plus self-hosted Enterprise options

      Weaknesses

      • Smaller than Databricks/Snowflake on managed enterprise share
      • No major funding round disclosed since Feb 2022 ($3.35B valuation)
      • Primary value is federation; not a one-stop lakehouse platform

      Pricing tiers

      Disclosure: partial
      • Galaxy Free
        Limited cluster; community support
        $0 /mo
      • Galaxy Standard
        Pay-as-you-go cluster pricing; usage-based
        Quote
      • Galaxy Enterprise
        Advanced governance, SSO, dedicated support
        Quote
      • Starburst Enterprise (self-hosted)
        Subscription; on-prem or BYOC
        Quote

      Key features

      • Managed Trino (SaaS Galaxy and self-hosted Enterprise)
      • Multi-format support: Iceberg, Delta, Hudi
      • Stargate federation across 50+ sources
      • Caching and acceleration layer
      • Role-based access control and data products
      • Bring-your-own-cloud model
      50+ integrations
      Apache Iceberg · Trino · Tableau · Power BI · dbt · Looker · AWS S3 · ADLS
      Geography
      Global
      Buying guide

      6 steps to pick the right data lakehouse

      1. Decide format first

        Pick Apache Iceberg unless you are deep on Databricks (then Delta with UniForm) or have heavy streaming/CDC requirements (then Hudi via Onehouse). The format determines which catalogs and engines you can use.

      2. Pick the catalog deliberately

        The catalog (Polaris, Unity Catalog, AWS Glue, Nessie) is the real lock-in decision in 2026. Choose based on governance needs, engine ecosystem, and the team that owns it.

      3. Audit your existing data plane

        If you are already heavily on S3/AWS, Lake Formation + S3 Tables is lower friction than Databricks or Snowflake. If on GCP, BigLake. If on Azure with M365 E5, OneLake. Hyperscaler lock-in is real but bundle economics often win.

      4. Separate engine selection from storage

        Once your tables are in Iceberg (or Delta with UniForm) in your object store, you can pick query engines independently: Snowflake for managed SQL, Databricks for ML, Dremio or Starburst for federated query, Athena for ad-hoc.

      5. Model the petabyte trajectory

        Lakehouse cost advantages materialize at scale (typically 50TB+ where object-storage savings compound). Below that, warehouse simplicity often wins on operational economics. Model your 3-year data volume before committing.

      6. Negotiate format-portability clauses

        For managed lakehouse contracts (Snowflake, Databricks, Dremio, Starburst), negotiate explicit clauses confirming your data remains in open format in your object store and can be read by other engines. This is the operational substance of lakehouse versus warehouse.
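      The format, data-plane, and scale steps above reduce to simple rules, sketched here as a rough checklist in code. The function names are hypothetical, and the precedence when a team is both deep on Databricks and streaming-heavy is an assumption (the guide does not rank the two exceptions); the 50TB threshold is the guide's "typical" figure, not a hard cutoff.

```python
def pick_table_format(deep_on_databricks: bool, heavy_streaming_cdc: bool) -> str:
    """Step 1 as a rule: Iceberg by default, with two stated exceptions."""
    if deep_on_databricks:  # assumption: this exception is checked first
        return "Delta Lake with UniForm"
    if heavy_streaming_cdc:
        return "Apache Hudi via Onehouse"
    return "Apache Iceberg"


# Step 3: the lower-friction hyperscaler-native option per existing data plane.
NATIVE_OFFERING = {
    "aws": "Lake Formation + S3 Tables",
    "gcp": "BigLake",
    "azure": "OneLake (the guide's case: Azure with M365 E5)",
}


def lowest_friction_platform(cloud: str) -> str:
    return NATIVE_OFFERING.get(cloud.lower(), "no hyperscaler-native default in this guide")


def lakehouse_scale_reached(projected_tb_in_3_years: float, threshold_tb: float = 50) -> bool:
    """Step 5: storage savings typically start to compound around 50TB+."""
    return projected_tb_in_3_years >= threshold_tb


print(pick_table_format(deep_on_databricks=False, heavy_streaming_cdc=False))
print(lowest_friction_platform("AWS"))
print(lakehouse_scale_reached(projected_tb_in_3_years=12))
```

      Treat the output as a starting point for discussion: catalog choice (step 2) and contract terms (step 6) depend on governance and negotiation, which do not reduce to a lookup.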

      Frequently asked questions

      The questions buyers actually ask before they sign a data lakehouse contract.

      Lakehouse vs data warehouse: is there a real architectural difference in 2026?
      Yes, though the categories are converging. A traditional cloud data warehouse stores data in a proprietary format under the vendor compute layer (Snowflake internal tables, BigQuery native storage, Redshift managed storage), which means the vendor owns both the storage and the query engine and you cannot move workloads between engines without a copy. A lakehouse stores data in an open table format (Iceberg, Delta, Hudi) on object storage (S3, GCS, ADLS) that you control, with table semantics (ACID, schema, time travel) layered on top of Parquet, so multiple engines can read the same tables. In 2026 Snowflake, Databricks, BigQuery, and others sell both modes, which is why the architectural distinction matters less than the operational one: are your tables in a format you can move?
      Apache Iceberg vs Delta Lake vs Apache Hudi: which open table format should we pick?
      Apache Iceberg is winning the open-table-format war in 2025-2026 on the strength of hyperscaler buy-in: AWS S3 Tables, Google BigLake, Microsoft Fabric (via shortcuts), and Snowflake all support Iceberg as a first-class format. Delta Lake remains strong inside Databricks and Microsoft Fabric OneLake, with Delta UniForm providing Iceberg metadata interop for cross-engine reads. Apache Hudi, which originated at Uber, retains a defensible niche in streaming-first and CDC-heavy workloads. The 2026 default recommendation for new lakehouse deployments is Iceberg unless you are deep on Databricks (use Delta with UniForm) or have heavy streaming/CDC requirements (consider Hudi via Onehouse).
      Does the Databricks-Tabular acquisition (Jun 2024) hurt Iceberg neutrality?
      It creates obvious strategic tension because Databricks is the lead maintainer of Delta Lake, the rival format to Iceberg, and the Jun 2024 acquisition of Tabular (the Iceberg-creator-led startup founded by Ryan Blue and Daniel Weeks) for a reported $1B+ brought the Iceberg founders into the Delta-stewarding company. The public position from Databricks is that Iceberg and Delta will coexist via Delta UniForm interop and continued Iceberg contributions; the real position deserves watching through 2026 via contribution patterns to the Apache Iceberg project. Pragmatically, Iceberg has enough hyperscaler and Apache Foundation governance momentum that no single vendor can capture it, but the buyer takeaway is to treat Iceberg neutrality as something to verify in 2026-2027 rather than assume.
      Is Snowflake genuinely going open with the Polaris Catalog OSS pivot?
      It is a real strategic shift but the buyer benefit depends on catalog choice. In Jun 2024 Snowflake open-sourced Polaris Catalog as an Apache project implementing the Iceberg REST catalog specification, and Snowflake native Iceberg tables reached read/write parity with internal tables in 2025. The honest read: Snowflake is hedging against a future where customers want format portability, and Polaris gives Snowflake a credible neutral catalog story. The benefit to buyers is real if you use Polaris as your catalog and Snowflake as one of several engines (Trino, Dremio, Spark) reading the same Iceberg tables. The benefit is limited if you stay on Snowflake internal tables, which remain the default for new deployments.
      When should we choose a lakehouse over a warehouse-only architecture?
      Choose a lakehouse when: (1) you have substantial ML, AI, or unstructured-data workloads that need to share the same data as your BI/SQL workloads; (2) you want to avoid storage lock-in to a single warehouse vendor and value the ability to query the same tables from multiple engines; (3) your data volumes are large enough (typically 50TB+) that the object-storage cost advantage of lakehouse storage matters relative to managed warehouse storage. Choose warehouse-only when: (1) your workload is SQL-first BI with minimal ML; (2) data volumes are modest enough that operational simplicity beats format flexibility; (3) you want a single integrated vendor for storage, compute, and governance without component assembly.
      What is the cost reality of a lakehouse at petabyte scale?
      Object-storage costs (S3, GCS, ADLS) at petabyte scale are typically $20-23/TB/month, materially below managed warehouse storage of $40-50/TB/month, which is the structural reason large enterprises move to lakehouse architectures. The compute economics are similar to warehouse compute (Databricks DBUs, Snowflake credits, BigQuery slots, AWS Athena per-TB-scanned) and depend heavily on query patterns. The hidden cost at petabyte scale is metadata operations: Iceberg snapshot expiry, compaction, and small-file management require operational discipline, which is why managed Iceberg services (AWS S3 Tables, Snowflake Polaris, Tabular/Databricks, Onehouse) charge a premium over raw object storage. Realistic total-cost-of-ownership at petabyte scale: 30-50% savings versus pure managed warehouse, partially offset by engineering time on metadata operations.
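      To make the storage figures concrete, here is a small worked calculation for 1 PB using only the per-TB ranges quoted above. It is a sketch under stated assumptions: a decimal petabyte (1,000 TB), list-style rates with no tiering or discounts, and storage only (no compute, requests, or egress). The function names are illustrative.

```python
TB_PER_PB = 1000  # decimal petabyte; an assumption to keep the numbers round

OBJECT_STORAGE_USD_PER_TB = (20, 23)     # monthly range quoted above
WAREHOUSE_STORAGE_USD_PER_TB = (40, 50)  # monthly range quoted above


def monthly_storage_cost(petabytes: float, usd_per_tb: float) -> float:
    return petabytes * TB_PER_PB * usd_per_tb


def storage_savings_range(petabytes: float) -> tuple[float, float]:
    """Return (worst-case, best-case) fractional savings on storage alone."""
    obj_low, obj_high = (monthly_storage_cost(petabytes, r) for r in OBJECT_STORAGE_USD_PER_TB)
    wh_low, wh_high = (monthly_storage_cost(petabytes, r) for r in WAREHOUSE_STORAGE_USD_PER_TB)
    worst = 1 - obj_high / wh_low  # priciest object storage vs cheapest warehouse
    best = 1 - obj_low / wh_high   # cheapest object storage vs priciest warehouse
    return worst, best


worst, best = storage_savings_range(petabytes=1)
print(f"Object storage, 1 PB: ${monthly_storage_cost(1, 20):,.0f}-${monthly_storage_cost(1, 23):,.0f}/month")
print(f"Warehouse storage, 1 PB: ${monthly_storage_cost(1, 40):,.0f}-${monthly_storage_cost(1, 50):,.0f}/month")
print(f"Storage-only savings: {worst:.1%} to {best:.1%}")
```

      Under these assumptions storage alone comes out 42.5% to 60% cheaper, which is consistent with a lower 30-50% total-cost figure once similar compute spend and metadata-operations engineering time are added back in.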
      Which Iceberg catalog should we choose: Polaris, Unity Catalog, AWS Glue, or Nessie?
      The catalog is increasingly the lock-in decision that matters more than engine choice. Snowflake Polaris (Apache project, vendor-neutral by governance) is the right pick if you want an open standard catalog with multi-engine read/write and have no strong Databricks or AWS commitment. Databricks Unity Catalog is the right pick if Databricks is your primary engine and you value governance integration with ML workflows; Iceberg support is added but Delta is the native default. AWS Glue Data Catalog is the right pick if AWS is your data plane and you use Athena, EMR, or Redshift as engines. Project Nessie (Dremio-led, open source) is the right pick if you want Git-for-data semantics (branching, merging, tags) on top of Iceberg. The 2026 advice: pick the catalog deliberately because switching catalogs later is non-trivial.
      How does query engine + storage separation work in practice?
      In a lakehouse, your data lives in object storage (S3/GCS/ADLS) as Parquet files organized by an open table format (Iceberg/Delta/Hudi). The catalog (Polaris, Unity, Glue, Nessie) stores the table metadata: what columns exist, where snapshots are, what files belong to which partition. Multiple query engines (Trino via Starburst, Dremio, Spark on Databricks, Snowflake, BigQuery via BigLake, Athena, ClickHouse) can read the same tables by talking to the catalog and reading the underlying Parquet. This separation lets you run analytical queries on one engine and ML training on another against the same data, and switch engines without copying data. The trade-off in practice is that managed-warehouse engines (Snowflake internal tables, BigQuery native) often outperform lakehouse engines on the same data because they control the storage layout; the lakehouse perf gap has narrowed in 2025-2026 but has not fully closed.
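      The flow described above (the catalog resolves a table to files, the engines read those files) can be sketched with plain dictionaries. Everything here is a hypothetical stand-in: the `OBJECT_STORE` and `CATALOG` structures, the paths, and the two "engines" are illustrative and much simpler than real Iceberg metadata or Parquet readers.

```python
# Hypothetical object store: path -> rows (stand-in for Parquet files).
OBJECT_STORE = {
    "s3://lake/orders/part-0.parquet": [{"id": 1, "amount": 40}, {"id": 2, "amount": 60}],
    "s3://lake/orders/part-1.parquet": [{"id": 3, "amount": 25}],
}

# Hypothetical catalog: table -> schema, current snapshot, files per snapshot.
CATALOG = {
    "orders": {
        "columns": ["id", "amount"],
        "current_snapshot": 2,
        "snapshots": {
            1: ["s3://lake/orders/part-0.parquet"],
            2: ["s3://lake/orders/part-0.parquet", "s3://lake/orders/part-1.parquet"],
        },
    }
}


def scan(table: str, snapshot_id=None):
    """What any engine does: ask the catalog for the file list, then read the files."""
    meta = CATALOG[table]
    if snapshot_id is None:
        snapshot_id = meta["current_snapshot"]
    for path in meta["snapshots"][snapshot_id]:
        yield from OBJECT_STORE[path]


def sql_engine_total(table: str) -> int:
    # Stand-in for a BI query such as SELECT SUM(amount).
    return sum(row["amount"] for row in scan(table))


def ml_engine_features(table: str) -> list:
    # Stand-in for a training job pulling feature vectors from the same table.
    return [[row["amount"]] for row in scan(table)]


print(sql_engine_total("orders"))                 # 125
print(ml_engine_features("orders"))               # same rows, no copy made
print(len(list(scan("orders", snapshot_id=1))))   # 2 rows when reading snapshot 1
```

      Both "engines" share one `scan` path through the catalog, which is the property that lets a team swap or add engines without copying data; the older snapshot read shows how time travel falls out of the same metadata.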

      Glossary

      Lakehouse
      Unified data architecture combining data lake storage (S3/GCS/ADLS) with warehouse-grade SQL, ACID transactions, and governance, powered by open table formats.
      Open table format
      A specification (Apache Iceberg, Delta Lake, Apache Hudi) that layers table semantics (schema, ACID, time travel) on top of Parquet files in object storage.
      Catalog
      The metadata service that tracks tables, schemas, and snapshots in a lakehouse. Examples: Snowflake Polaris, Databricks Unity Catalog, AWS Glue, Project Nessie.
      Delta UniForm
      Databricks 2024 feature that writes Iceberg metadata alongside Delta tables so external engines can read Delta tables as if they were Iceberg.
      Polaris Catalog
      Snowflake-originated open-source Iceberg REST catalog, released as an Apache project in Jun 2024.
      Iceberg REST catalog spec
      Standard API specification for Iceberg catalogs, allowing different catalog implementations (Polaris, Nessie, Glue) to be interchangeable for compliant clients.

      Final word

      See the full intelligence profile for any product on this page, including verified pricing, vendor trust scores, and review patterns. Browse the Data Lakehouse category page →

      Last updated 2026-05-27. Pricing data is reverified quarterly.