United Kingdom verdict (TL;DR)
Verified 2026-05-27. UK lakehouse adoption mirrors the US duopoly, with regional sovereignty caveats. Databricks and Snowflake dominate UK fintech and enterprise; Microsoft Fabric OneLake is strong at UK Microsoft-anchored enterprises and growing in NHS-adjacent analytics via the NHS-Microsoft Azure relationship. AWS Lake Formation and BigLake are adopted by UK AWS- and GCP-anchored teams, respectively. Apache Iceberg adoption is high among UK fintechs that value format portability. UK GDPR (post-Brexit, ICO-governed) drives UK-region residency on AWS eu-west-2, Azure UK South, and GCP europe-west2 for regulated workloads. UK lakehouse buyers have no native UK-built champions; this is honestly a US-vendor-dominated category in the UK.
Picks for United Kingdom
- UK fintech lakehouse (Monzo, Revolut, Wise-tier): Databricks Lakehouse Platform. The default at engineering-heavy UK fintechs, on AWS eu-west-2 London or Azure UK South, with Unity Catalog for UK GDPR data lineage.
- UK enterprise SQL lakehouse: Snowflake + Polaris Catalog. Native Iceberg with Polaris Catalog on AWS eu-west-2 London and Azure UK South; strong UK fintech and B2B SaaS adoption.
- UK Microsoft enterprise and NHS-adjacent: Microsoft Fabric OneLake. OneLake on Azure UK South; the NHS-Microsoft strategic relationship makes Fabric the path of least resistance for NHS data analytics.
- UK AWS-anchored lakehouse: AWS Lake Formation + Iceberg. S3 Tables + Glue Catalog on AWS eu-west-2 London; ICO-aligned UK GDPR DPA available.
- UK GCP-anchored lakehouse: Google BigLake. GCP europe-west2 London; BigQuery + Iceberg for UK media, retail, and adtech analytics.
How the data lakehouse market looks in United Kingdom
The UK lakehouse market in 2026 is led by Databricks and Snowflake across the fintech and B2B SaaS verticals, with Microsoft Fabric OneLake growing at UK Microsoft-anchored enterprises and via the NHS-Microsoft Azure strategic relationship.
UK fintech is structurally the most data-mature vertical in the UK economy and has been an early lakehouse adopter. Monzo, Revolut, Wise, Checkout.com, and the broader London fintech cohort have moved to lakehouse architectures alongside their existing Snowflake warehouse deployments, with Apache Iceberg increasingly the default for new tables where format portability is valued. Databricks adoption is strong at the data-engineering-heavy UK fintechs where ML workloads run alongside BI.
NHS England (which absorbed NHS Digital in Feb 2023) is the dominant UK public-sector data buyer, and the NHS-Microsoft Azure strategic relationship means Microsoft Fabric (and the underlying OneLake) is the path of least resistance for NHS clinical and operational data analytics. NHS DSPT (Data Security and Protection Toolkit) compliance is pre-mapped to Azure UK Gov regions.
Post-Brexit UK GDPR is operationally similar to EU GDPR for lakehouse buyers: UK region residency (AWS eu-west-2 London, Azure UK South/UK West, GCP europe-west2 London), ICO-approved DPAs, and deletion-on-request capability across lakehouse tables. There are no UK-headquartered lakehouse vendors of meaningful scale; this is honestly a US-vendor-dominated category in the UK.
- UK GDPR (post-Brexit, ICO-governed): drives demand for UK data residency options and requires DPAs, deletion-on-request, and data subject access rights; all major lakehouse vendors offer UK-region deployment and UK GDPR-compliant DPAs.
- NHS DSPT: required for NHS data processors; Microsoft Fabric/Azure (via the NHS strategic relationship) and select Databricks and Snowflake configurations can satisfy DSPT.
- UK Cyber Essentials Plus: increasingly required for UK government and NHS contracts.
- PRA SS2/21: the Prudential Regulation Authority's supervisory statement on outsourcing and third-party risk management applies to lakehouse deployments at PRA-regulated firms (FCA-regulated firms follow the FCA's cloud-outsourcing guidance, FG16/5); it requires risk assessment, exit plans, and operational resilience documentation.
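Deletion-on-request deserves explicit testing on open table formats: a row-level delete in Iceberg or Delta commits a new snapshot, but older snapshots kept for time travel still reference the data files that contain the erased record. Physical erasure only happens once those snapshots are expired and unreferenced files are cleaned up (for example with Iceberg's `expire_snapshots` procedure or Delta's `VACUUM`). The plain-Python sketch below models that sequence, assuming copy-on-write deletes; the `SnapshotTable` class and its methods are hypothetical illustrations, not any vendor's API.

```python
from dataclasses import dataclass, field


@dataclass
class SnapshotTable:
    """Hypothetical in-memory model of a snapshot-based table with copy-on-write deletes."""

    files: dict = field(default_factory=dict)      # data file name -> record ids stored in it
    snapshots: list = field(default_factory=list)  # each snapshot: frozenset of live file names

    def append(self, file_name, record_ids):
        self.files[file_name] = set(record_ids)
        live = self.snapshots[-1] if self.snapshots else frozenset()
        self.snapshots.append(live | {file_name})

    def delete_record(self, record_id):
        # Rewrite affected files without the record and commit a new snapshot;
        # the old files stay on storage because earlier snapshots still reference them.
        live = set(self.snapshots[-1])
        for name in sorted(live):
            if record_id in self.files[name]:
                new_name = f"{name}.v{len(self.snapshots)}"
                self.files[new_name] = self.files[name] - {record_id}
                live.discard(name)
                live.add(new_name)
        self.snapshots.append(frozenset(live))

    def expire_snapshots(self, keep_last=1):
        # Drop older snapshots, then physically delete files no retained snapshot references.
        self.snapshots = self.snapshots[-keep_last:]
        referenced = set().union(*self.snapshots)
        for name in list(self.files):
            if name not in referenced:
                del self.files[name]

    def visible(self, record_id):
        return any(record_id in self.files[name] for name in self.snapshots[-1])

    def on_storage(self, record_id):
        return any(record_id in ids for ids in self.files.values())


table = SnapshotTable()
table.append("f1.parquet", ["alice", "bob"])
table.delete_record("alice")
print(table.visible("alice"), table.on_storage("alice"))  # False True
table.expire_snapshots(keep_last=1)
print(table.visible("alice"), table.on_storage("alice"))  # False False
```

The gap between the two printed lines is the compliance risk: the record is gone from queries immediately but stays on object storage until retention-driven cleanup runs.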
Quick comparison, ranked for United Kingdom
| Global rank | Product | Best for | Starts at | 10-employee/mo* | Pricing transparency | G2 | Geo |
|---|---|---|---|---|---|---|---|
| 1 | Databricks Lakehouse Platform | Mid-market through global enterprise | $0 | $0 | Partial | 4.5 | Global |
| 2 | Snowflake + Polaris Catalog | Mid-market through global enterprise | $0 | $0 | Partial | 4.5 | Global |
| 5 | Microsoft Fabric OneLake | Microsoft-anchored mid-enterprise through global enterprise | $263 | $263 | Partial | 4.4 | Global |
| 3 | AWS Lake Formation + Iceberg | AWS-anchored teams of any size | $0 | $0 | Public | 4.2 | Global |
| 4 | Google BigLake | GCP-anchored teams of any size | $0 | $0 | Public | 4.4 | Global |
| 6 | Apache Iceberg | Engineering-led teams of any size | $0 | $0 | Public | 4.6 | Global |
| 7 | Delta Lake | Engineering-led teams, Databricks-anchored | $0 | $0 | Public | 4.5 | Global |
| 9 | Dremio | Engineering-led lakehouse teams | Quote | - | Partial | 4.4 | Global |
| 10 | Starburst | Engineering-led federated query teams | $0 | $0 | Partial | 4.4 | Global |
| 8 | Apache Hudi + Onehouse | Streaming-first engineering teams | $0 | $0 | Partial | 4.3 | Global |
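*10-employee monthly cost = base fee + (per-employee price × 10) using the lowest published tier. Usage-based charges (DBUs, credits, slot-hours, bytes scanned, storage) are excluded, so most usage-priced products show $0. For opaque-pricing vendors, no value is shown.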
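The footnote's formula is simple enough to pin down as code. A minimal sketch, assuming a hypothetical `ten_employee_monthly_cost` helper in which `None` stands for a quote-only vendor; like the table, it leaves usage-based charges out entirely.

```python
from typing import Optional


# Hypothetical helper mirroring the table footnote; not part of any vendor tooling.
def ten_employee_monthly_cost(
    base_fee: Optional[float], per_employee: float = 0, employees: int = 10
) -> Optional[float]:
    """Base fee + (per-employee price x headcount) for the lowest published tier.

    Returns None for opaque (quote-only) pricing, matching the "-" in the table.
    Usage-based charges such as DBUs, credits, or bytes scanned are not modeled.
    """
    if base_fee is None:
        return None
    return base_fee + per_employee * employees


print(ten_employee_monthly_cost(263))   # Microsoft Fabric F2 list price: 263
print(ten_employee_monthly_cost(None))  # Dremio (quote only): None
```

Because every product in the table prices on usage or capacity rather than headcount, the per-employee term is zero throughout; the column mainly separates fixed-capacity SKUs (Fabric) from pure pay-as-you-go products.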
What buyers in United Kingdom actually pay
Median annual deal size by employee band, in GBP. Crowdsourced from anonymized buyer disclosures.
| Product | Employee band | Median annual (GBP) | Sample | Notes |
|---|---|---|---|---|
| Databricks Lakehouse Platform | 200-2,000 employees (UK fintech/SaaS) | £112,000 | 48 | Premium tier; AWS eu-west-2 London or Azure UK South; GBP via reseller |
| Snowflake + Polaris Catalog | 200-2,000 employees | £84,000 | 41 | Enterprise with Iceberg; UK region; GBP billed via reseller |
| Microsoft Fabric OneLake | 500-5,000 employees (M365 E5) | £72,000 | 38 | F64 Fabric capacity bundled with M365 E5; Azure UK South |
| Google BigLake | 100-1,000 employees | £46,000 | 29 | BigQuery editions plus Cloud Storage; GCP europe-west2 London |
United Kingdom-built or United Kingdom-strong vendors worth knowing
Not yet ranked in our global top 10, but credible options for United Kingdom buyers and worth a shortlist.
No UK-headquartered lakehouse vendor of meaningful scale
The UK lakehouse market is dominated by US-headquartered vendors (Databricks, Snowflake, AWS, Google, Microsoft). UK fintech and tech buyers consume these via UK-region deployments rather than UK-built alternatives. This is the honest assessment for the category.
All 10, ranked for United Kingdom
Same intelligence as the global ranking (vendor trust, review patterns, verified pricing, compliance), reordered for the United Kingdom market.
Databricks Lakehouse Platform
Delta Lake-native lakehouse with Unity Catalog and Mosaic AI; Iceberg-aware after Tabular acquisition.
Databricks is the enterprise lakehouse leader, unifying data engineering, analytics, and ML/AI training on Delta Lake + Unity Catalog. The Jun 2024 acquisition of Tabular (the startup led by Iceberg's creators) for a reported $1B+ creates obvious tension, because Databricks is the lead maintainer of Delta Lake, Iceberg's rival format; the public position is that Databricks will support both via Delta UniForm and ongoing Iceberg contributions. Databricks was privately valued at $43B in Sept 2023 and at a reported $62B in Dec 2024, with a 2026 IPO widely expected but not confirmed. Trade-offs: DBU pricing complexity, and SQL-only buyers often find Snowflake simpler.
Mid-market and enterprise data teams (200-50,000 employees) running serious ML training plus analytics, where lakehouse governance and AI workflow integration matter more than pure SQL simplicity.
SQL-only BI shops (Snowflake or BigQuery simpler), Iceberg-purist buyers wary of Databricks owning Delta Lake, or small teams without dedicated data engineering.
Strengths
- Delta Lake as the open default plus Delta UniForm Iceberg interop
- Unity Catalog unifies governance across analytics, ML, and lakehouse tables
- Best-in-class for AI/ML training and feature engineering via Mosaic AI
- Tabular acquisition brought Iceberg-creator engineering talent in-house
- Photon vectorized engine narrows SQL gap to dedicated warehouses
Weaknesses
- DBU pricing complexity, plus separate cloud infra costs charged by hyperscaler
- Delta vs Iceberg neutrality is contested given Databricks owns Delta Lake project
- Unity Catalog migration painful for legacy Hive metastore customers
Pricing tiers (pricing transparency: partial)
- Standard (Jobs): from $0.15/DBU; basic Spark workloads ($0/mo base fee)
- Premium: from $0.40/DBU; SQL warehouses, Unity Catalog, audit logs ($0/mo base fee)
- Enterprise: from $0.65/DBU; HIPAA, PCI, customer-managed keys ($0/mo base fee)
- Mosaic AI Model Training: foundation model training and serving (custom quote)
Cost watch-outs
- Cloud infra (EC2/Azure VMs/GCE) billed by the hyperscaler, not Databricks
- Photon premium DBU multiplier on SQL warehouses
- Mosaic AI inference and training billed separately
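Because DBUs and infrastructure arrive on two separate invoices, a first-pass budget has to add them together. The sketch below does that with the list DBU rates from the tiers above; the helper name, the DBU volume, and the infrastructure figure in the usage line are hypothetical, and Photon multipliers, Mosaic AI charges, and negotiated discounts are deliberately left out.

```python
# List rates per DBU from the pricing tiers above (USD); real rates vary by cloud and workload type.
DBU_RATE_USD = {"standard": 0.15, "premium": 0.40, "enterprise": 0.65}


def databricks_monthly_estimate(dbus: float, tier: str, cloud_infra_usd: float) -> dict:
    """Hypothetical estimator: Databricks DBU charges plus separately billed hyperscaler infra."""
    dbu_charges = dbus * DBU_RATE_USD[tier]
    return {
        "databricks_invoice": dbu_charges,
        "hyperscaler_invoice": cloud_infra_usd,
        "total": dbu_charges + cloud_infra_usd,
    }


# Hypothetical workload: 10,000 Premium DBUs plus $2,500 of VMs on the cloud provider's invoice.
estimate = databricks_monthly_estimate(10_000, "premium", 2_500)
print({name: round(value, 2) for name, value in estimate.items()})
```

Keeping the two invoices as separate keys mirrors how the bill actually arrives, which makes it easier to reconcile against finance data later.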
Key features
- Delta Lake (open table format)
- Delta UniForm (Iceberg metadata interop)
- Unity Catalog governance
- Photon vectorized SQL engine
- Mosaic AI (training, fine-tuning, serving)
- Lakehouse Federation across S3/ADLS/GCS
- Delta Sharing (open data sharing protocol)
Snowflake + Polaris Catalog
Cloud-neutral managed lakehouse with native Iceberg and open-sourced Polaris Catalog.
Snowflake (NYSE:SNOW) made a genuine strategic shift toward open lakehouse architecture in 2024: native Iceberg tables reached read/write parity with internal tables, and the Polaris Catalog was open-sourced in Jun 2024 as an Apache Iceberg REST catalog implementation. The honest reading is that this is a real bet on Iceberg interop, partly defensive against Databricks-on-Delta and partly offensive into the open-format buyer segment. The trade-off: whether enterprise customers actually benefit depends on which catalog they pick, and Snowflake's credit-based pricing remains easy to overspend without governance. Best fit for SQL-first enterprises wanting open format with managed SaaS.
Cloud-neutral enterprises (500+ employees) wanting lakehouse semantics in Iceberg without operating a separate engine, with a strong preference for managed SaaS and SQL workloads.
Heavy AI/ML training shops (Databricks better), single-cloud teams that could just use BigLake or Lake Formation, or buyers who reject credit-based pricing.
Strengths
- Native Iceberg tables GA with read/write parity to internal tables
- Polaris Catalog open-sourced Jun 2024 as Apache Iceberg REST catalog
- Cloud-neutral: native on AWS, Azure, GCP
- Snowpark for Python/Java/Scala in-lakehouse processing
- Strong governance, masking, and row-level security
Weaknesses
- Credit-based pricing easy to overspend without strict governance
- External Iceberg catalogs require careful planning; performance trade-offs vs internal tables
- May 2024 customer credential incident still discussed in deals
Pricing tiers (pricing transparency: partial)
- Standard: on-demand $2/credit; storage $23/TB/month compressed ($0/mo base fee)
- Enterprise: on-demand $3/credit; multi-cluster warehouses, masking ($0/mo base fee)
- Business Critical: on-demand $4/credit; HIPAA, PCI, customer-managed keys ($0/mo base fee)
- Virtual Private Snowflake (VPS): dedicated metadata service for regulated industries (quote)
Cost watch-outs
- Compute credit overruns from un-suspended warehouses
- External Iceberg queries have different performance characteristics than internal tables
- Cross-region data egress
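Un-suspended warehouses are the classic overrun because virtual warehouses bill for every second they run, whether or not queries arrive. The sketch below assumes the standard warehouse credit rates (X-Small = 1 credit/hour, doubling with each size), per-second billing with a 60-second minimum each time a warehouse resumes, and the on-demand Enterprise price of $3/credit from the tiers above. Cloud services credits, serverless features, and storage are not modeled, and the helper names are hypothetical.

```python
# Standard warehouse credit consumption per hour; each size doubles the previous one.
CREDITS_PER_HOUR = {"XS": 1, "S": 2, "M": 4, "L": 8, "XL": 16}


def credits_for_run(size: str, seconds_running: float) -> float:
    """Credits for one continuous run, billed per second with a 60-second minimum."""
    billable_seconds = max(seconds_running, 60)
    return CREDITS_PER_HOUR[size] * billable_seconds / 3600


def monthly_compute_usd(size: str, hours_per_day: float, days: int, usd_per_credit: float) -> float:
    """Hypothetical estimator assuming one continuous run of `hours_per_day` hours per day."""
    daily_credits = credits_for_run(size, hours_per_day * 3600)
    return daily_credits * days * usd_per_credit


# A Medium warehouse on Enterprise ($3/credit): suspended after an 8-hour workday vs left running 24/7.
workday = monthly_compute_usd("M", 8, 30, 3.0)
always_on = monthly_compute_usd("M", 24, 30, 3.0)
print(workday, always_on)  # 2880.0 8640.0
```

The difference between the two figures is the cost of 16 idle hours a day, which is why auto-suspend settings are usually the first governance control buyers put in place.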
Key features
- Native Iceberg tables (managed and external)
- Polaris Catalog (open-source Apache Iceberg REST catalog)
- Snowpark for Python/Java/Scala
- Time Travel and Zero-Copy Cloning
- Snowpipe streaming ingestion
- Secure Data Sharing and Marketplace
Microsoft Fabric OneLake
Microsoft unified lakehouse store: Delta-native, with Iceberg via shortcuts and Power BI bundle economics.
OneLake is the unified data lake layer underneath Microsoft Fabric, announced in May 2023 as part of Microsoft Fabric and using Delta Lake as the native open format. The 2024-2025 additions of OneLake shortcuts to Iceberg tables (in S3, ADLS, and elsewhere) and the broader Fabric Iceberg interop make OneLake the closest thing to a multi-format lakehouse store from Microsoft. The honest framing: OneLake wins deals through Power BI Premium bundle pricing and Microsoft 365 procurement leverage, not because the underlying engine is best-in-class. Capacity Unit (CU) pricing complexity remains the main cost-forecasting issue.
Microsoft 365 + Power BI Premium-anchored enterprises (500-100,000+ employees) where Fabric capacity comes effectively free with existing M365 E5 commitments.
Non-Microsoft-anchored teams, organizations rejecting Capacity Unit pricing, or buyers wanting best-in-class engine performance over bundle economics.
Strengths
- OneLake as Delta Lake-native unified analytics store
- OneLake shortcuts allow read of Iceberg tables in S3, ADLS, elsewhere
- Power BI Premium bundle, often effectively free with E5 commitments
- Copilot integrated across the Fabric suite
- One SKU covers lakehouse + warehouse + BI + ETL + real-time
Weaknesses
- Wins on bundle economics, not core engine quality
- Capacity Unit (CU) pricing complexity
- Iceberg support via shortcuts is read-mostly vs full lakehouse semantics
Pricing tiers (pricing transparency: partial)
- F2 (smallest): 2 CU; pay-as-you-go ($263/mo)
- F64: 64 CU; common mid-size enterprise capacity ($8,400/mo)
- F2048: 2,048 CU; very large enterprise capacity ($269,000/mo)
- Bundled with Power BI Premium: F64 effectively included with P1 commitments at many enterprises (quote)
Cost watch-outs
- OneLake storage billed separately at ADLS rates
- Cross-region data egress
- Mirroring usage can spike CU consumption
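Fabric F-SKU pay-as-you-go pricing scales with capacity units, so the tiers above can be approximated from the F2 list price alone. A minimal sketch, assuming strictly linear per-CU pricing derived from F2's $263/mo; the F64 and F2048 figures listed above are rounded, so the estimates land close to, not exactly on, them. OneLake storage (billed at ADLS rates) and reservation discounts are not included, and the function name is hypothetical.

```python
F2_MONTHLY_USD = 263                   # smallest SKU list price from the tiers above
USD_PER_CU_MONTH = F2_MONTHLY_USD / 2  # F2 = 2 capacity units


def fabric_capacity_monthly_usd(capacity_units: int) -> float:
    """Hypothetical estimator: pay-as-you-go F-SKU monthly cost, assuming linear per-CU pricing."""
    return capacity_units * USD_PER_CU_MONTH


for sku in (2, 64, 2048):
    print(f"F{sku}: ~${fabric_capacity_monthly_usd(sku):,.0f}/mo")
```

This prints about $8,416 for F64 and $269,312 for F2048, against the rounded $8,400 and $269,000 listed above, which is close enough for a first capacity-sizing conversation.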
Key features
- OneLake (Delta Lake-native unified store)
- OneLake shortcuts (Iceberg read in S3/ADLS)
- Fabric Lakehouse (Spark + SQL endpoint)
- Fabric Warehouse (T-SQL warehouse)
- Power BI native integration
- Copilot in Fabric
- Mirroring (Snowflake, Cosmos, Azure SQL)
AWS Lake Formation + Iceberg
AWS-native lakehouse: Glue Catalog, Lake Formation governance, and S3 Tables for Iceberg.
AWS Lake Formation is the AWS-native lakehouse governance layer over S3, with AWS Glue Data Catalog as the metadata store and Lake Formation managing fine-grained access controls. The S3 Tables announcement at AWS re:Invent 2024 made Iceberg a first-class S3 bucket type, removing the need for a separate Iceberg metastore for many AWS-native pipelines. The lakehouse engines on top are Athena, EMR, Redshift Spectrum, and Glue ETL. Strengths: deep AWS integration, IAM-native access, and Iceberg-native S3. Trade-offs: best fit narrows sharply when not AWS-anchored, governance UX is more workmanlike than Unity Catalog, and pricing fragments across Glue, Lake Formation, S3 Tables, and the chosen query engine.
AWS-anchored organizations (any size) where S3 is already the data plane and the team wants to add Iceberg + governance without leaving AWS.
Multi-cloud or non-AWS teams, organizations wanting a single integrated lakehouse vendor (Databricks or Snowflake), or buyers wanting opinionated governance UX.
Strengths
- Iceberg-native S3 Tables (2024 GA) removes need for separate metastore
- AWS Glue Data Catalog as the metadata layer with broad AWS integration
- Lake Formation fine-grained access on rows, columns, and tags
- IAM-native authentication and tag-based access control
- Query engine flexibility: Athena, EMR, Redshift Spectrum, Glue ETL
Weaknesses
- Best-fit narrows sharply when not AWS-anchored
- Governance UX more workmanlike than Unity Catalog or Polaris
- Pricing fragments across Glue, Lake Formation, S3 Tables, query engine
Pricing tiers (pricing transparency: public)
- Glue Data Catalog: $1 per 100,000 objects/month; first 1M objects free ($0/mo base fee)
- Lake Formation: no additional charge; underlying services billed separately ($0/mo base fee)
- S3 Tables: storage at S3 Standard rates plus per-request fees ($0/mo base fee)
- Athena (query): $5/TB scanned, or capacity reservation ($0/mo base fee)
Cost watch-outs
- S3 Tables compaction and maintenance request fees
- Glue ETL DPU consumption
- Cross-region data egress
- Athena/EMR/Redshift Spectrum billed separately as compute
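Because AWS bills the lakehouse in fragments, a rough estimate has to sum each service separately. The sketch below covers only the two line items with simple published rates in the tiers above: Glue Data Catalog object storage at $1 per 100,000 objects beyond the first million, and Athena at $5 per TB scanned. S3 and S3 Tables storage and request fees, Glue ETL DPUs, catalog request charges, and any per-query minimums are deliberately omitted, and the function names and example inputs are hypothetical.

```python
GLUE_FREE_OBJECTS = 1_000_000  # first million catalog objects free each month
GLUE_USD_PER_100K_OBJECTS = 1.0
ATHENA_USD_PER_TB_SCANNED = 5.0


def glue_catalog_usd(objects_stored: int) -> float:
    billable = max(0, objects_stored - GLUE_FREE_OBJECTS)
    return billable / 100_000 * GLUE_USD_PER_100K_OBJECTS


def athena_usd(tb_scanned: float) -> float:
    return tb_scanned * ATHENA_USD_PER_TB_SCANNED


def lakehouse_fragment_usd(objects_stored: int, tb_scanned: float) -> dict:
    """Hypothetical estimator summing the two modeled line items; the rest of the AWS bill is excluded."""
    parts = {"glue_catalog": glue_catalog_usd(objects_stored), "athena": athena_usd(tb_scanned)}
    parts["total"] = sum(parts.values())
    return parts


print(lakehouse_fragment_usd(objects_stored=1_500_000, tb_scanned=40))
# {'glue_catalog': 5.0, 'athena': 200.0, 'total': 205.0}
```

In practice the query-engine line usually dominates, which is why scan reduction (partition pruning, compaction) matters more than catalog size for most AWS-native lakehouses.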
Key features
- AWS S3 Tables (Iceberg-native S3 buckets)
- AWS Glue Data Catalog
- Lake Formation fine-grained access controls
- Tag-based access control
- Cross-account data sharing
- Native Apache Iceberg support
- Integration with Athena, EMR, Redshift Spectrum
Google BigLake
BigQuery engine over open table formats: Iceberg, Hudi, and Delta on Cloud Storage.
BigLake is Google Cloud's lakehouse layer that lets BigQuery (and other GCP engines including Dataproc Spark and Dataflow) query Apache Iceberg, Apache Hudi, and Delta Lake tables on Cloud Storage with the same governance model as native BigQuery tables. The fit: GCP-anchored teams who already use BigQuery as the analytics engine and want to add lakehouse semantics over open formats without operating a separate platform. Strengths: tightest integration with BigQuery, Looker, and Vertex AI; native Iceberg, Hudi, and Delta support; and serverless query economics. Trade-offs: best fit narrows sharply when not GCP-anchored, and cross-cloud egress economics favor staying inside GCP.
GCP-anchored organizations (any size) wanting lakehouse semantics on Iceberg/Hudi/Delta with BigQuery as the primary engine, plus tight Looker and Vertex AI integration.
Multi-cloud or AWS/Azure-anchored organizations, teams that need a single integrated lakehouse vendor across clouds, or buyers without existing BigQuery investment.
Strengths
- Native Iceberg, Hudi, and Delta Lake support on Cloud Storage
- Same governance model as BigQuery (Policy Tags, BigQuery IAM)
- BigQuery serverless query economics extend to open tables
- BigQuery Omni for cross-cloud query against AWS S3 and Azure
- Tight integration with Vertex AI and Looker
Weaknesses
- Best-fit narrows sharply when not GCP-anchored
- Cross-cloud egress economics favor staying inside GCP
- External table query has different perf characteristics than native BigQuery
Pricing tiers (pricing transparency: public)
- On-demand: $6.25/TiB scanned on BigQuery; Cloud Storage at standard rates ($0/mo base fee)
- BigQuery Editions Standard: $0.04/slot-hour; capacity reservations ($0/mo base fee)
- BigQuery Editions Enterprise: $0.06/slot-hour; CMEK, VPC-SC ($0/mo base fee)
- BigQuery Editions Enterprise Plus: $0.10/slot-hour; cross-region replication ($0/mo base fee)
Cost watch-outs
- Cloud Storage class tiering
- BI Engine memory reservation
- Cross-region or cross-cloud egress
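The on-demand versus Editions choice comes down to a break-even: a slot commitment costs the same regardless of bytes scanned, so it pays off once monthly scan volume exceeds its cost divided by the on-demand rate (which BigQuery quotes per TiB). A sketch using the list rates above; it ignores free-tier allowances, autoscaling granularity, and Cloud Storage costs, and the function name and the 100-slot, 730-hour scenario are hypothetical.

```python
ON_DEMAND_USD_PER_TIB = 6.25
EDITION_USD_PER_SLOT_HOUR = {"standard": 0.04, "enterprise": 0.06, "enterprise_plus": 0.10}


def break_even_tib(slots: int, hours: float, edition: str) -> float:
    """Hypothetical helper: TiB scanned at which on-demand spend equals `slots` held for `hours`."""
    capacity_usd = slots * hours * EDITION_USD_PER_SLOT_HOUR[edition]
    return capacity_usd / ON_DEMAND_USD_PER_TIB


# Hypothetical: 100 Enterprise-edition slots held for a 730-hour month.
print(round(break_even_tib(100, 730, "enterprise"), 1))  # 700.8
```

Above roughly 700 TiB of scanning a month, those 100 slots would cost less than on-demand, provided the workload's concurrency actually fits in them.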
Key features
- Native Apache Iceberg, Hudi, and Delta Lake support
- BigQuery engine over Cloud Storage tables
- BigLake Metastore (Iceberg-compatible)
- BigQuery Omni cross-cloud query
- Policy Tags for column-level access
- Vertex AI integration
Apache Iceberg
The winning open table format of 2025-2026 by hyperscaler buy-in.
Apache Iceberg is the open table format that originated at Netflix in 2017, was donated to the Apache Software Foundation, and is now the de facto winner of the open-table-format war in 2025-2026 on the strength of hyperscaler buy-in. AWS (S3 Tables, Athena, EMR, Redshift), Google (BigLake, BigQuery), Microsoft (Fabric via shortcuts), and Snowflake all support Iceberg as a first-class format. Databricks acquired Tabular (the company founded by Iceberg creators Ryan Blue and Daniel Weeks) in Jun 2024 for a reported $1B+, which brought core Iceberg engineering talent into the Delta Lake-stewarding company; the public position is dual-format support. The honest read: pick Iceberg unless you are deep on Databricks.
Engineering-led organizations of any size committing to open-format lakehouse architecture, particularly multi-engine or multi-cloud teams who want to avoid table-format lock-in.
Teams deep on Databricks where Delta Lake is the path of least resistance, or shops that prefer fully managed lakehouse SKUs over assembling components.
Strengths
- De facto open-table-format winner by hyperscaler buy-in
- ACID transactions, time travel, schema evolution, hidden partitioning
- Iceberg REST catalog spec standardized (Polaris, Nessie, Glue support it)
- Vendor-neutral by design and Apache-governed
- Strong contributor diversity across AWS, Apple, Netflix, Stripe, Tabular
Weaknesses
- Catalog choice (Polaris, Unity, Glue, Nessie) is the real lock-in decision
- Maintenance operations (compaction, snapshot expiry) require operational discipline
- Tabular acquisition by Databricks creates uncertainty about long-term neutrality
Pricing tiers (pricing transparency: public)
- Apache Iceberg: Apache 2.0; unlimited use; community support ($0/mo)
- Commercial managed offerings: Snowflake Polaris, AWS S3 Tables, Tabular (Databricks), Dremio, Cloudera, Onehouse (quote)
Key features
- ACID transactions on object storage
- Time travel and snapshot isolation
- Schema evolution (add, drop, rename columns)
- Hidden partitioning and partition evolution
- Iceberg REST catalog spec
- Multi-engine read/write (Spark, Trino, Flink, Presto, Snowflake, BigQuery)
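Hidden partitioning means the table's partition spec stores a transform of a source column, for example `day(event_ts)`, so writers derive partition values themselves and readers filter on `event_ts` directly while the planner converts that predicate into a partition range. The plain-Python sketch below illustrates only that pruning step; it is not PyIceberg or any engine's planner, and the column, field, and file names are hypothetical.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def day_transform(ts: datetime) -> int:
    """Iceberg-style `day` transform: whole days since 1970-01-01 UTC."""
    return (ts - EPOCH).days


def utc(*args) -> datetime:
    return datetime(*args, tzinfo=timezone.utc)


# Hypothetical manifest entries: the writer stored day(event_ts) for each data file.
DATA_FILES = [
    {"path": "data/f1.parquet", "event_ts_day": day_transform(utc(2026, 5, 1, 9))},
    {"path": "data/f2.parquet", "event_ts_day": day_transform(utc(2026, 5, 2, 14))},
    {"path": "data/f3.parquet", "event_ts_day": day_transform(utc(2026, 5, 3, 20))},
]


def plan_scan(start: datetime, end: datetime) -> list:
    """Prune files for `WHERE event_ts BETWEEN start AND end`; the query never names a partition column."""
    lo, hi = day_transform(start), day_transform(end)
    return [f["path"] for f in DATA_FILES if lo <= f["event_ts_day"] <= hi]


print(plan_scan(utc(2026, 5, 2, 0), utc(2026, 5, 2, 23, 59)))  # ['data/f2.parquet']
```

Because the transform lives in table metadata rather than in user queries, the spec can later change (partition evolution, say from `day` to `hour`) without rewriting the SQL that reads the table.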
Delta Lake
Databricks-led open table format with Iceberg interop via Delta UniForm.
Delta Lake is the open table format created at Databricks, open-sourced under the Linux Foundation in 2019, and the native format for the Databricks Lakehouse Platform. It remains strong inside Databricks (Unity Catalog assumes Delta as the default) and has hedged for the Iceberg-dominant 2025-2026 landscape via Delta UniForm (2024), which writes Iceberg metadata in parallel so external engines can read Delta tables as if they were Iceberg. The honest framing: if Databricks is your primary engine, Delta is the right format; if you want format neutrality across hyperscalers, Iceberg is winning. Microsoft Fabric OneLake also uses Delta natively, which keeps Delta relevant outside Databricks.
Organizations standardized on Databricks or Microsoft Fabric where Delta is the path of least resistance, with Delta UniForm available for occasional Iceberg interop.
Multi-engine shops choosing one format, or organizations on AWS/GCP-native lakehouse stacks where Iceberg has stronger first-party support.
Strengths
- Native format for Databricks Lakehouse and Microsoft Fabric OneLake
- Delta UniForm writes Iceberg metadata for cross-engine read
- Mature ecosystem inside Databricks and Spark
- ACID transactions, time travel, schema evolution
- Delta Sharing as open data sharing protocol
Weaknesses
- Hyperscaler buy-in (AWS, GCP) is weaker than for Iceberg
- Databricks-led project governance raises neutrality questions for non-Databricks shops
- Delta UniForm Iceberg interop is one-way (write Delta, read Iceberg) at most engines
Pricing tiers (pricing transparency: public)
- Delta Lake: Apache 2.0; unlimited use; community support ($0/mo)
- Commercial managed: Databricks, Microsoft Fabric, Onehouse all offer managed Delta (quote)
Key features
- ACID transactions on object storage
- Time travel and version control
- Schema evolution and enforcement
- Delta UniForm (Iceberg metadata interop)
- Delta Sharing (open data sharing protocol)
- Native Databricks and Microsoft Fabric integration
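Delta's ACID and time-travel features rest on an ordered transaction log (`_delta_log/`) of JSON commit files, in which each commit records actions such as `add` and `remove` for data files; replaying commits up to a version yields that version's file set. The sketch below shows only that replay step under simplifying assumptions: real actions carry more fields (size, partition values, statistics), logs also hold `metaData`, `protocol`, and `commitInfo` actions, and readers start from Parquet checkpoints rather than version 0. The commit contents and helper name are hypothetical.

```python
import json

# Hypothetical newline-delimited JSON commits, in version order (version 0 first).
COMMITS = [
    '{"add": {"path": "part-000.parquet"}}\n{"add": {"path": "part-001.parquet"}}',
    '{"remove": {"path": "part-000.parquet"}}\n{"add": {"path": "part-002.parquet"}}',
]


def active_files(commits: list, version=None) -> list:
    """Replay add/remove actions through `version` (inclusive; latest if None)."""
    last = len(commits) - 1 if version is None else version
    live = set()
    for commit in commits[: last + 1]:
        for line in commit.splitlines():
            action = json.loads(line)
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
            # Other action types (metaData, protocol, commitInfo) do not change the file set here.
    return sorted(live)


print(active_files(COMMITS))             # ['part-001.parquet', 'part-002.parquet']
print(active_files(COMMITS, version=0))  # ['part-000.parquet', 'part-001.parquet']
```

Time travel is the second call: reading an older version is just replaying fewer commits, which works only while the removed files have not yet been vacuumed.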
Dremio
Lakehouse-native query engine on Iceberg with Project Nessie Git-for-data catalog.
Dremio is the lakehouse-native query engine purpose-built for SQL on Apache Iceberg tables in S3/ADLS/GCS, with Project Nessie as the Git-for-data catalog. The fit: teams that want to separate storage from compute vendor, run their data in Iceberg in their own object store, and use Dremio as the engine without committing to Databricks or Snowflake compute. Dremio raised a $160M Series E in Jan 2022 at a $2B valuation (about $410M raised in total); no significant funding rounds have been publicly disclosed since. Strengths: Iceberg-first engineering, Nessie data versioning, and reflections (acceleration layer) for sub-second BI. Trade-offs: smaller market presence than Databricks/Snowflake, narrower ecosystem.
Engineering-led teams (100-5,000 employees) committing to Iceberg lakehouse architecture who want to separate storage from compute vendor and use a query engine outside the Databricks/Snowflake duopoly.
Buyers wanting fully managed integrated lakehouse + ML platform (Databricks), heavy AI/ML training shops, or teams without dedicated data engineering capacity.
Strengths
- Iceberg-first lakehouse query engine
- Project Nessie for Git-for-data versioning and branching
- Reflections (materialized view acceleration) for sub-second BI
- Apache Arrow-based engine with strong query performance
- Bring-your-own-cloud and bring-your-own-object-store model
Weaknesses
- Smaller market presence than Databricks or Snowflake
- No significant funding round publicly disclosed since 2022
- Narrower BI and partner ecosystem than the leaders
Pricing tiers (pricing transparency: partial)
- Cloud Standard: managed Dremio on AWS/Azure; usage-based (quote)
- Cloud Enterprise: advanced governance, SSO, dedicated support (quote)
- Software (self-hosted): on-prem or BYOC; subscription-based (quote)
Key features
- Iceberg-native query engine
- Project Nessie (Git-for-data catalog)
- Reflections (materialized view acceleration)
- Apache Arrow-based execution
- Lakehouse semantics: ACID, time travel, branching
- SQL over S3/ADLS/GCS
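Nessie applies Git semantics to the catalog: each commit maps table names to Iceberg metadata locations, and branches are named pointers to commits, so an ETL job can write on a branch and publish atomically by merging. The toy model below shows only branch isolation and a fast-forward merge; it is a hypothetical illustration, not the Nessie API or client, and real Nessie also handles non-fast-forward merges and conflict detection.

```python
class ToyCatalog:
    """Hypothetical Git-for-data catalog: commits map table names to metadata locations."""

    def __init__(self):
        self.commits = {0: {"parent": None, "tables": {}}}
        self.branches = {"main": 0}
        self._next_id = 1

    def create_branch(self, name: str, source: str = "main") -> None:
        self.branches[name] = self.branches[source]

    def commit(self, branch: str, table: str, metadata_location: str) -> int:
        head = self.branches[branch]
        tables = {**self.commits[head]["tables"], table: metadata_location}
        commit_id = self._next_id
        self._next_id += 1
        self.commits[commit_id] = {"parent": head, "tables": tables}
        self.branches[branch] = commit_id
        return commit_id

    def tables(self, branch: str) -> dict:
        return self.commits[self.branches[branch]]["tables"]

    def fast_forward(self, source: str, target: str = "main") -> bool:
        """Move `target` to `source`'s head if target's head is an ancestor of it."""
        commit_id = self.branches[source]
        while commit_id is not None:
            if commit_id == self.branches[target]:
                self.branches[target] = self.branches[source]
                return True
            commit_id = self.commits[commit_id]["parent"]
        return False


catalog = ToyCatalog()
catalog.create_branch("etl")
catalog.commit("etl", "sales", "sales/metadata/v2.metadata.json")
print(catalog.tables("main"))  # {}
catalog.fast_forward("etl")
print(catalog.tables("main"))  # {'sales': 'sales/metadata/v2.metadata.json'}
```

Readers on `main` never see the half-finished ETL state; the merge swaps every changed table pointer in one step.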
Starburst
Managed Trino with multi-format lakehouse support and Stargate federation.
Starburst is the commercial company behind Trino (the open-source distributed SQL query engine, formerly PrestoSQL), offering Starburst Galaxy (SaaS) and Starburst Enterprise (self-hosted) as managed Trino with multi-format lakehouse support (Iceberg, Delta, Hudi) and Stargate federation across data sources. Series D $250M raised in Feb 2022 at $3.35B valuation; no major funding round publicly disclosed since. Strengths: federated query across lakehouse plus operational data sources (Postgres, MySQL, Mongo, etc.), Trino community heritage, and multi-format support. Trade-offs: smaller than Databricks/Snowflake, primary value is federation rather than being a one-stop lakehouse.
Engineering-led teams (100-10,000 employees) with federation requirements across lakehouse plus operational data sources, who value Trino open-source heritage and multi-format support.
Buyers wanting fully managed integrated lakehouse + ML platform (Databricks), or teams that only need single-format Iceberg query (Dremio or BigLake fit).
Strengths
- Managed Trino with multi-format support (Iceberg, Delta, Hudi)
- Stargate federation across 50+ data sources (lakehouse plus operational)
- Strong open-source Trino heritage and community
- Bring-your-own-cloud and BYO-object-store model
- Galaxy SaaS plus self-hosted Enterprise options
Weaknesses
- Smaller than Databricks/Snowflake on managed enterprise share
- No major funding round disclosed since Feb 2022 ($3.35B valuation)
- Primary value is federation; not a one-stop lakehouse platform
Pricing tiers (pricing transparency: partial)
- Galaxy Free: limited cluster; community support ($0/mo)
- Galaxy Standard: pay-as-you-go cluster pricing; usage-based (quote)
- Galaxy Enterprise: advanced governance, SSO, dedicated support (quote)
- Starburst Enterprise (self-hosted): subscription; on-prem or BYOC (quote)
Key features
- Managed Trino (SaaS Galaxy and self-hosted Enterprise)
- Multi-format support: Iceberg, Delta, Hudi
- Stargate federation across 50+ sources
- Caching and acceleration layer
- Role-based access control and data products
- Bring-your-own-cloud model
Apache Hudi + Onehouse
Streaming-first open table format from Uber, with Onehouse as commercial managed offering.
Apache Hudi is the open table format that originated at Uber in 2016-2017 and was donated to the Apache Software Foundation, designed from day one for streaming-first and record-update-heavy workloads (CDC, real-time ingestion, frequent upserts). Onehouse is the commercial managed offering founded by Hudi creator Vinoth Chandar in 2021, with a multi-format strategy (Hudi, Iceberg, Delta via Apache XTable). The honest framing: Hudi has lost the broader open-table-format war to Iceberg on hyperscaler buy-in, but retains a defensible niche in streaming-first and CDC-heavy workloads where its incremental processing model is genuinely differentiating. Best fit for Uber-origin shops and streaming-heavy data engineering teams.
Streaming-first data engineering teams (50-50,000 employees) with heavy CDC, frequent upserts, or real-time ingestion requirements where Hudi incremental processing is differentiating.
Batch-heavy analytics shops (Iceberg or Delta fit better), or teams wanting broadest hyperscaler-native support without operational engineering work.
Strengths
- Streaming-first and CDC-heavy workload specialization
- Record-level updates and deletes natively supported
- Onehouse managed offering with Hudi creator on engineering team
- Apache XTable for cross-format (Hudi/Iceberg/Delta) interop
- Used in production at Uber, Walmart, Robinhood, Notion
Weaknesses
- Lost broader format war to Iceberg on hyperscaler buy-in
- Smaller ecosystem and contributor base than Iceberg or Delta
- Best-fit narrowed to streaming/CDC workloads
Pricing tiers (pricing transparency: partial)
- Apache Hudi: Apache 2.0; unlimited use; community support ($0/mo)
- Onehouse Free: community tier; limited capacity ($0/mo)
- Onehouse Cloud: managed Hudi + multi-format; usage-based (quote)
- Onehouse Enterprise: dedicated support, enterprise governance (quote)
Key features
- Streaming-first incremental processing
- Record-level updates (Copy-on-Write and Merge-on-Read)
- Time travel and snapshot isolation
- Apache XTable for cross-format interop
- Native Spark, Flink, Presto, Trino support
- Onehouse managed cloud
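Hudi's two table types differ in when updates are reconciled: Copy-on-Write rewrites the affected base files at write time, while Merge-on-Read appends changes to log files and merges them with the base file when the table is read (until compaction folds them in). The sketch below models only the read-time merge, assuming each record has a key and an ordering field (analogous to Hudi's record key and precombine field); the `is_deleted` flag and the function name are hypothetical, not Hudi's payload classes.

```python
def merge_on_read(base_rows, log_records, key="id", ordering="ts"):
    """Hypothetical MoR read: merge base-file rows with log records, highest ordering value wins per key."""
    merged = {row[key]: row for row in base_rows}
    for rec in log_records:
        current = merged.get(rec[key])
        if current is None or rec[ordering] >= current[ordering]:
            merged[rec[key]] = rec  # newer change wins; deletes are kept as tombstones for now
    live = [row for row in merged.values() if not row.get("is_deleted")]
    return sorted(live, key=lambda row: row[key])


base = [{"id": 1, "ts": 1, "v": "a"}, {"id": 2, "ts": 1, "v": "b"}]
log = [
    {"id": 1, "ts": 3, "v": "a-updated"},
    {"id": 1, "ts": 2, "v": "a-stale"},  # arrives late with an older ts, so it is ignored
    {"id": 2, "ts": 2, "is_deleted": True},
    {"id": 3, "ts": 1, "v": "c"},
]
print(merge_on_read(base, log))
# [{'id': 1, 'ts': 3, 'v': 'a-updated'}, {'id': 3, 'ts': 1, 'v': 'c'}]
```

Keeping deletes as tombstones until the final filter means a late-arriving older record cannot resurrect a deleted key, which is the property CDC pipelines depend on.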
Frequently asked questions
The questions buyers actually ask before they sign.
Does our UK lakehouse need to store data in a UK region for UK GDPR compliance?
Which lakehouse is best for NHS-adjacent UK clinical data analytics?
Lakehouse vs data warehouse: is there a real architectural difference in 2026?
Apache Iceberg vs Delta Lake vs Apache Hudi: which open table format should we pick?
Does the Databricks-Tabular acquisition (Jun 2024) hurt Iceberg neutrality?
Is Snowflake genuinely going open with the Polaris Catalog OSS pivot?
When should we choose a lakehouse over a warehouse-only architecture?
What is the cost reality of a lakehouse at petabyte scale?
Which Iceberg catalog should we choose: Polaris, Unity Catalog, AWS Glue, or Nessie?
How does query engine + storage separation work in practice?
Final word
Looking at a different market? See the global Data Lakehouse ranking, or pick another country at the top of this page.
Last updated 2026-05-27. Local pricing reverified quarterly. Found something inaccurate? Tell us.