June 21, 2022
Rethinking Feature Stores to Better Align AI/ML to Business ImpactIn today’s data-driven business world, the ability to quickly and accurately query vast data warehouses is more crucial than ever. However, as the volume of questions and data increases, the bottleneck of human analysts becomes a significant challenge. While reliable, traditional data analysis methods are time-consuming and often inconsistent from one analyst to another. Enter the integration of AtScale’s Semantic Layer and Generative AI, a breakthrough solution that promises to revolutionize how businesses interact with their data.
The Challenge of Text-to-SQL
Businesses are adopting data-driven cultures faster than ever, generating vast amounts of data every day while simultaneously asking more complex questions about this data. To facilitate this, businesses are exploring the idea of converting a natural language question into database-readable SQL using Natural Language Processing (NLP) technologies built on large language models (LLMs). While these NLP technologies have made significant strides, this task, known as Natural Language Querying (NLQ), remains a formidable challenge.
Generative AI models, like OpenAI’s GPT-4, have shown promise in generating human-like text, but they struggle with the nuances of enterprise SQL generation. The complexity of generating accurate joins, defining business KPIs, and translating business terminology into database tables and columns are significant hurdles. This is where AtScale’s Semantic Layer and Query Engine come into play.
AtScale’s Semantic Layer and Query Engine
AtScale’s technology addresses the limitations of existing NLQ solutions by providing a robust Semantic Layer and Query Engine that bridges the gap between business questions and accurate data retrieval.
The Semantic Layer
The AtScale Semantic Layer serves as a centralized repository for business logic, KPIs, and table relationships. It allows different teams to leverage consistent KPI definitions, ensuring that data-driven insights are reliable and comparable across the organization. Feeding the metadata of the Semantic Layer into the LLM provides the necessary context to generate accurate SQL queries. This allows the LLM to translate from business terminology into what we call AtScale Inbound Queries.
The Query Engine
The AtScale Query Engine simplifies the interaction between BI tools and data warehouses. It translates AtScale Inbound Queries from BI tools like Power BI, Excel, or Tableau into the SQL dialect required by the underlying data warehouse. The Inbound Queries all reference a single logical table comprising the KPIs and metadata of the semantic layer as columns. Having the LLM create Inbound Queries enables us to use this same translation logic, ensuring that queries are executed efficiently and accurately while removing the need for the LLM to infer joins or create metrics from scratch.
Experimental Setup and Results
To demonstrate the effectiveness of integrating AtScale’s Semantic Layer and Query Engine with an LLM, an experiment was conducted using the TPC-DS benchmark dataset, a data warehouse performance testing standard.
Experimental Design
- Dataset: The TPC-DS dataset contains millions of rows across two dozen tables.
- Question Set: A set of 40 natural language questions, varying in complexity, was created to evaluate performance.
- Control System: The LLM was provided with only the schemas and primary/foreign keys of the underlying data warehouse tables.
- Evaluation System: The LLM was provided with the AtScale Semantic Layer metadata and used the AtScale Query Engine for query execution.
- Feedback Loop: If SQL syntax errors are detected, the LLM is prompted for a new response, it was given 3 chances.
Results
The integration of AtScale’s Semantic Layer and Query Engine led to a dramatic improvement in performance. The control system, which had access only to the table schemas and primary/foreign keys, achieved an accuracy of 20%. In contrast, the evaluation system, supported by the Semantic Layer and Query Engine, achieved an impressive accuracy of 92.5%.
Performance by Complexity:
- Low Question Complexity, Low Schema Complexity: 100% accuracy with Semantic Layer vs. 60% without.
- Low Question Complexity, High Schema Complexity: 100% accuracy with Semantic Layer vs. 10% without.
- High Question Complexity, Low Schema Complexity: 100% accuracy with Semantic Layer vs. 10% without.
- High Question Complexity, High Schema Complexity: 70% accuracy with Semantic Layer vs. 0% without.
Key Takeaways
Integrating AtScale’s Semantic Layer and Query Engine with Generative AI models transforms the NLQ landscape. By providing the necessary business context and simplifying query generation, this approach makes NLQ solutions viable for everyday business use.
Benefits:
- Consistency and Accuracy: The Semantic Layer ensures that all queries are based on consistent definitions and business logic, reducing errors and inconsistencies.
- Efficiency: The Query Engine streamlines the interaction between the LLM and data warehouses, ensuring queries are executed efficiently.
- Scalability: This solution can handle complex queries involving multiple tables and advanced KPIs, making it suitable for large-scale business operations.
Future Directions
While the results are promising, there is room for further improvement. Future research will focus on enhancing the metadata delivery to the LLM and refining the feedback loop for query correction. Providing fine-tuning datasets for the LLM could improve performance, especially for more complex queries.
AtScale’s Semantic Layer and Query Engine, combined with advanced Generative AI models, offer a powerful solution for enabling natural language prompting for business intelligence tasks. By downloading the full white paper, readers can explore the detailed methodology, experimental setup, and comprehensive results that underscore this transformative approach.
Download the full white paper here to dive deeper into how AtScale redefines data interaction in the business world.
WHITEPAPER