March 22, 2024
Power BI Face Off: Databricks vs. Microsoft FabricAtScale has a strong history of collaboration and innovation with Microsoft, and was the first semantic layer platform to introduce native support for Microsoft’s Data Analysis Expressions (DAX) query language. This support lets Power BI users connect to AtScale in live connection mode, establishing high performance, live query access to datasets managed on cloud data platforms, such as Databricks. AtScale serves as the critical bridge, connecting best-in-class analytics platforms such as Power BI and Excel to the world’s most powerful cloud data platforms.
The Modern Data Stack
The data landscape has shifted, with centralized Lakehouse platforms becoming increasingly popular due to their ability to support high speed and cost effective analytical workloads in the cloud. However, data teams face challenges in how to efficiently stitch together their ecosystem of services and utilities and make data assets accessible to decision-makers. The term the “Modern Data Stack,” describes architectures that address these challenges, ensuring seamless integration of business users with their organization’s data in the cloud.
Andreessen Horowitz’s “Emerging Architectures for Modern Data Infrastructure” provides a clear representation of modern data stacks, illustrating the flow of data from various sources through ingestion and transport services into core platforms. The Lakehouse platform provides the storage, query and processing capabilities necessary to build a world class analytics infrastructure, while Business Intelligence, Artificial Intelligence and Machine Learning tools are the typical chose for creating analytical products and outputs.
Power BI’s Role in the Modern Data Stack
While Databricks is the centerpiece of most organizations’s Modern Data Stack, PowerBI has become a frontrunner as the tool of choice for data analysis. Its ease of use and popularity amongst business users lead to its consistent recognition as a Leader in the 2023 Gartner Magic Quadrant for Analytics and Business Intelligence Platforms. A report published by Databricks further confirms PowerBI as the most popular data and AI product of 2023 on top of the Databricks Lakehouse.
The Power BI disconnect
Despite its popularity, Power BI’s architecture and position in the Modern Data Stack are not so straightforward. Although PowerBI provides a “DirectQuery” option to connect directly to cloud data sources without the need to import data into a separate data model or dataset, it suffers from a number of disadvantages:
- The Power BI query must be translated into a native source system query (i.e Databricks SQL) retrieved, and then transformed into a model understandable to Power BI.
- Due to its multidimensional architecture, Power BI often needs to issue dozens of SQL queries against the native source system to construct a simple visualization.
- DirectQuery is real time, but slow and unable to scale to large data volumes.
To address these limitations, Microsoft recommends adopting “Import Mode” where data is imported into Power BI and stored within a tabular semantic model (Azure Analysis Services). However, this approach comes with its own set of challenges. It creates a data silo outside the centralized cloud data platform, hampering the full utilization of elastic compute capabilities from Databricks. Additionally, any changes in the source data necessitate dataset refreshes, adding complexity to the process.
- When users interact with an Import Mode report, the local tabular data model is queried directly in-memory and users are working on a data silo outside of their centralized cloud data platform.
- Once the data is stored and processed locally in Power BI, end users experience a low latency BI experience on subset of their data, but are restricted from making use of the elastic compute provided by Databricks.
If you choose to use Import Mode, the Power BI engine must first copy the data into the dataset during refresh before being made available to the business user. Any changes at the source in Databricks require a dataset refresh.
This incohesive experience doesn’t align with the Unified Data Infrastructure as defined in the “Modern Data Stack” as organizations are not able to leverage their investment in Databricks. While attempting to bridge the gap between the Data Warehouse and business users, Import Mode has instead turned these elastic data platforms into data sources and bypassed their elastic compute in favor of in-memory processing which is limited to 400GB of in-memory data.
Direct Lake to the Rescue?
Microsoft’s recent announcement of Fabric, which aims to simplify the Power BI architecture making and create a more cohesive data story. At the heart of the announcement Microsoft introduced “Direct Lake,” an approach that tries to bypass the need for Databricks. With Direct Lake, Power BI users can load data directly from OneLake, without having to query a Data Warehouse or Lakehouse.
Direct Lake promises the speed of Import Mode combined with near real time data availability similar to DirectQuery. However, using Direct Lake requires a provisioned a Fabric Lakehouse with One Lake as the storage layer. This introduces known limitations regarding the size of data that can be analyzed and manual configuration in PowerBI. For larger data volumes, Microsoft’s official documentation recommends using Direct Query as a fallback mode. This reintroduces Synapse as a SQL endpoint to process PowerBI queries on One Lake data. While Direct Lake is still not Generally Available, Microsoft will continue to build native capabilities around OneLake that bypass organizations existing investments in Databricks an analytics platforms.
With this latest announcement, Microsoft has embedded a multidimensional interface / semantic layer to their Fabric Lakehouse. This enables Power BI to query an XMLA endpoint, which is the communication protocol of Azure Analysis Services, while using DAX as the query dialect. It appears that Microsoft has embedded a semantic layer deeply into Fabric and in doing so, they have validated the approach that AtScale has been offering for the last 10 years.
Power BI + AtScale + Modern Data Stack
Recognizing Power BI’s widespread adoption among the Global 2000, AtScale remains committed to enhancing its Microsoft Analytics cloud integrations. With AtScale’s native DAX support, Power BI can harness the richness of its native formula expression language in exactly the same way as with Azure Analysis Services. This unique capability empowers Power BI users to navigate metadata and use visualizations in their reports without requiring Power BI needing to translate DAX into a SQL dialect or other query languages, thus avoiding associated performance impacts.
AtScale offers a compelling solution that delivers on the promises of Direct Lake, while leveraging your existing investments in Databricks. By using AtScale’s live connection mode, Power BI users can access up-to-date data at full fidelity, without the need for pre-aggregation or data engineering work. This ensures a seamless and high performance experience for decision makers and information workers who demand the the speed-of-thought performance that Power BI offers.
The seamless integration of AtScale and Power BI within the Modern Data Stack represents a significant milestone in the world of data analytics. As organizations seek to leverage the potential of cloud data platforms, this collaboration serves as a key bridge, facilitating real-time, high-performance data access and empowering businesses to make informed decisions swiftly and effectively.
ANALYST REPORT