
5 cutting-edge solutions for building a universal semantic layer for enterprise data: breaking the barriers of BI tools

Business leaders recognize the power of data-driven decisions. Yet an ocean of organizational data and low trust in its sources keep them from tapping the true value of that data. Leaders are looking for solutions, but find that the reports generated by BI applications lack the specific decision intelligence they need. These findings are part of a 2023 global study conducted by Oracle and Seth Stephens-Davidowitz, a New York Times bestselling author.

The frustration stems from the complexity of organizing, preparing and making data ready for consumption by various analytical and BI tools. In large enterprises, the proliferation of data sources and legacy silos adds to the chaos. Departments use multiple tools in the consumption layer, each with its own data definitions, so reports are inconsistent and hence not trusted. A simple example is the term “revenue”, which may be calculated in completely different ways in finance and sales reports pulled from two different BI tools.
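To make the “revenue” example concrete, here is a minimal Python sketch. The order fields and both definitions are hypothetical, chosen only to show how two teams can report different numbers from the same rows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    gross: float      # invoiced amount
    discount: float   # discount granted at sale time
    refunded: float   # amount refunded later


# Illustrative data, not taken from any real system.
ORDERS = [
    Order(gross=1000.0, discount=100.0, refunded=0.0),
    Order(gross=500.0, discount=0.0, refunded=200.0),
]


def sales_revenue(orders):
    """Hypothetical sales-team definition: gross bookings."""
    return sum(o.gross for o in orders)


def finance_revenue(orders):
    """Hypothetical finance definition: net of discounts and refunds."""
    return sum(o.gross - o.discount - o.refunded for o in orders)


if __name__ == "__main__":
    print("sales report:  ", sales_revenue(ORDERS))
    print("finance report:", finance_revenue(ORDERS))
```

Both functions are labeled “revenue” in their respective reports, yet they disagree on the same two orders. That disagreement is what erodes trust.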

As enterprise data complexities grow, the universal semantic layer (USL) has emerged as a mature solution for this challenge. A USL is a separate layer in the stack that integrates various data sources such as lakes, warehouses or tables. Then, using common and universally defined business semantics, it creates a single source of truth that is accessible to all the tools in the consumption layer. It also maintains common business descriptions for all data elements and converts complex underlying data structures into terms and constructs that are intuitive for business users. In addition, a USL references the underlying data and applies calculations and business rules centrally, without altering the source system. It also enables centralized data governance and role-based access control, with auditing.
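The idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any of the products below are built: the class names `Metric` and `SemanticLayer`, the roles and the sample rows are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

Row = Dict[str, float]


@dataclass
class Metric:
    name: str
    description: str                       # shared business description
    formula: Callable[[List[Row]], float]  # the single, central calculation
    allowed_roles: Set[str]                # role-based access control


@dataclass
class SemanticLayer:
    source_rows: List[Row]                 # referenced, never modified
    metrics: Dict[str, Metric] = field(default_factory=dict)
    audit_log: List[str] = field(default_factory=list)

    def define(self, metric: Metric) -> None:
        self.metrics[metric.name] = metric

    def query(self, metric_name: str, role: str) -> float:
        metric = self.metrics[metric_name]
        allowed = role in metric.allowed_roles
        # Every request is recorded, whether it succeeds or not.
        status = "ok" if allowed else "denied"
        self.audit_log.append(f"{role}:{metric_name}:{status}")
        if not allowed:
            raise PermissionError(f"{role} may not read {metric_name}")
        return metric.formula(self.source_rows)


# Hypothetical source data and one governed metric.
ROWS = [{"gross": 1000.0, "refunded": 0.0}, {"gross": 500.0, "refunded": 200.0}]
layer = SemanticLayer(source_rows=ROWS)
layer.define(Metric(
    name="revenue",
    description="Gross sales minus refunds",
    formula=lambda rows: sum(r["gross"] - r["refunded"] for r in rows),
    allowed_roles={"finance", "sales"},
))
```

Any tool that asks the layer for `revenue` gets the same number from the same formula, and the source rows are only read.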

A USL empowers business users to perform self-service analytics on large datasets. By leveraging technical advancements in OLAP and AI, a USL can enhance processing speed while reducing cloud storage and compute costs through techniques like pre-aggregation and caching. Let’s take a look at five products that offer the capability to build a USL for enterprise data.
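Pre-aggregation and caching are easy to illustrate in miniature. The sketch below assumes a tiny hypothetical fact table: raw rows are scanned once into per-(region, year) totals, and repeated questions are then answered from an in-process cache provided by Python's standard `functools.lru_cache`.

```python
from collections import defaultdict
from functools import lru_cache

# Hypothetical fact rows: (region, year, amount).
FACTS = (
    ("EMEA", "2023", 120.0),
    ("EMEA", "2024", 150.0),
    ("APAC", "2023", 80.0),
    ("APAC", "2024", 95.0),
    ("EMEA", "2024", 30.0),
)

SCAN_COUNT = 0  # counts how often the raw rows are scanned


def build_preaggregate(facts):
    """Scan the raw rows once and store totals per (region, year)."""
    global SCAN_COUNT
    SCAN_COUNT += 1
    totals = defaultdict(float)
    for region, year, amount in facts:
        totals[(region, year)] += amount
    return dict(totals)


# Built ahead of query time, like a pre-computed aggregate table.
PREAGG = build_preaggregate(FACTS)


@lru_cache(maxsize=128)
def revenue_by_region(region):
    """Answer from the small pre-aggregate; repeat calls hit the cache."""
    return sum(total for (r, _year), total in PREAGG.items() if r == region)
```

Queries never touch `FACTS` again after the initial build, which is the source of the compute savings; the trade-off is that the aggregate must be refreshed when the underlying data changes.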

Denodo

Denodo is a data virtualization platform that connects to multiple data sources through data blending, federation, selective materialization, full replication, streaming and runtime joins. It combines these sources to simulate a semantic layer. For comprehensive control and auditing of data activity, the application of governance and security policies is centralized in Denodo.

Denodo’s logical data abstraction layer represents data assets in an abstracted form, detached from their source systems. It features a smart query accelerator, a multi-source query optimizer, an MPP engine and AI-driven acceleration. However, queries are executed at the source and the results are aggregated afterward, which makes it slow, with performance degrading on larger datasets or complex queries. Since this is an in-memory system, processing costs rise steeply as the number of queries grows.
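The general federation pattern described here (run a query at each source, then combine the partial results) can be sketched as follows. This is a generic illustration, not Denodo's implementation: two in-memory SQLite databases stand in for separate source systems, and the table and function names are hypothetical.

```python
import sqlite3


def make_source(rows):
    """Create an in-memory SQLite database standing in for one source."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn


def federated_total_by_region(sources):
    """Run a partial aggregate at each source, then merge the results."""
    merged = {}
    for conn in sources:
        # Step 1: the GROUP BY executes inside the source system.
        partials = conn.execute(
            "SELECT region, SUM(amount) FROM sales GROUP BY region"
        )
        # Step 2: partial sums are combined in the virtualization layer.
        for region, subtotal in partials:
            merged[region] = merged.get(region, 0.0) + subtotal
    return merged


# Two hypothetical sources holding different slices of the data.
WAREHOUSE = make_source([("EMEA", 100.0), ("APAC", 50.0)])
LAKE = make_source([("EMEA", 25.0), ("AMER", 70.0)])

if __name__ == "__main__":
    print(federated_total_by_region([WAREHOUSE, LAKE]))
```

Every query pays for a round trip to each source plus the merge step, which is why latency in this pattern tends to track the slowest source and the size of the intermediate results.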

Denodo started as a data integration platform/ETL tool and gradually grew its stack into the semantic layer space. However, it lacks the maturity and advanced capabilities that modern enterprises need to deal with growing data volumes, complexities and concurrent users. Its architecture fails to deliver on the promise of high-speed analytics at optimized querying costs.

Kyvos

Kyvos is a Gen AI-powered semantic layer that offers high-speed access to on-premises as well as on-cloud enterprise data for AI and analytics initiatives. It has been benchmarked to manage vast data volumes and query loads, delivering sub-second queries on billions of rows for thousands of concurrent users, with no performance degradation. As a universal semantic layer, the platform has a comprehensive and modern suite of features, such as AI-powered aggregation, caching and AI-driven self-tuning to optimize resource consumption.

Kyvos’ key strength is the price-performance of its querying models and the scaling of analytics, with a claimed 3x reduction in cost alongside 100x faster performance. The platform processes data in advance into optimized data models that users can query repeatedly, which speeds up queries and reduces compute costs at runtime.

From an analytics standpoint, Kyvos’ advanced OLAP features on cloud, including unbalanced and ragged hierarchies, parent-child hierarchies, alternate hierarchies and custom rollups, enable users to analyze data from a big-picture perspective down to granular detail.
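For readers unfamiliar with these OLAP terms, here is a small Python sketch of a rollup over a parent-child hierarchy whose branches have different depths. It is a generic illustration with hypothetical node names and figures, not Kyvos code.

```python
from collections import defaultdict

# Parent-child hierarchy: each member points to its parent.
# Branches are uneven: Berlin sits three levels down, Japan only two.
PARENT = {
    "Global": None,
    "EMEA": "Global",
    "APAC": "Global",
    "Germany": "EMEA",
    "Berlin": "Germany",
    "Japan": "APAC",
}

# Amounts may be booked at any level, including a non-leaf such as EMEA.
BOOKED = {"Berlin": 40.0, "Japan": 60.0, "EMEA": 10.0}


def rollup(parent, values):
    """Add each booked amount to its member and to every ancestor."""
    totals = defaultdict(float)
    for member, amount in values.items():
        current = member
        while current is not None:
            totals[current] += amount
            current = parent[current]
    return dict(totals)


if __name__ == "__main__":
    for member, total in sorted(rollup(PARENT, BOOKED).items()):
        print(member, total)
```

Because the rollup walks parent links rather than fixed levels, the same code handles branches of any depth, which is what lets a user drill from the global total down to a single city.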

Kyvos facilitates the seamless building of semantic models on incremental data, leveraging its distributed architecture and horizontal scalability. This approach allows any number of columns, dimensions or measures to be added to the model without imposing limitations or causing disruptions. Kyvos also offers advanced multidimensional modeling, conversational analytics with rich context and a native LangChain connector for building AI-powered apps.

Dremio

Dremio is a data lakehouse platform for SQL-based analytics, based on community-driven standards like Apache Iceberg and Apache Arrow. Dremio promises high-performance, self-service access to data anywhere (on-premises, hybrid or in the cloud) with no data movement. Dremio's data catalog is built on Apache Iceberg, an open table format, and offers automated data optimization features. The platform also facilitates the modernization of legacy Hadoop infrastructure into a more scalable and flexible lakehouse platform.

Organizations can build a distributed data architecture with a single solution for data mesh using Dremio. It acts as a universal semantic layer by unifying data access, establishing semantic consistency and ensuring unified security and governance across heterogeneous data sources and formats.

Dremio, however, incurs high querying costs and faces performance issues while handling large datasets, complex queries and high-concurrency workloads due to its distributed computing and in-memory processing.

AtScale

AtScale provides a universal semantic layer for defining business metrics independently from BI tools and cloud data platforms. AtScale’s semantic models are based on Semantic Modeling Language (SML), a YAML-based modeling language. It aims to deliver interactive query response times while minimizing load and concurrency on operational data. It also centralizes governance and control, while enabling federated analytics consumption.

The platform allows users to interact directly with data using their tool of choice, promoting a culture of self-service analytics. It provides composable analytics features enabling distributed teams to create, govern and share business definitions using an object-oriented approach. Complex models can be assembled using plug-and-play semantics from a controlled library of semantic objects that include dimensions, metrics and calculations.
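The composable, library-based approach can be pictured with a short Python sketch. It does not use SML syntax or any AtScale API; the object names, expressions and the `compose_model` helper are hypothetical and only illustrate assembling models from a shared, governed set of definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SemanticObject:
    name: str
    kind: str        # "dimension", "metric" or "calculation"
    expression: str  # illustrative expression text only


# A controlled library that every team draws from.
LIBRARY = {obj.name: obj for obj in (
    SemanticObject("region", "dimension", "customer.region"),
    SemanticObject("order_date", "dimension", "orders.order_date"),
    SemanticObject("revenue", "metric", "SUM(orders.amount)"),
    SemanticObject("order_count", "metric", "COUNT(orders.id)"),
    SemanticObject("avg_order_value", "calculation", "revenue / order_count"),
)}


def compose_model(name, object_names, library=LIBRARY):
    """Assemble a model from library objects; reject unknown names."""
    missing = [n for n in object_names if n not in library]
    if missing:
        raise KeyError(f"not in the governed library: {missing}")
    return {"model": name, "objects": tuple(library[n] for n in object_names)}


# Two teams build different models that share one revenue definition.
SALES_MODEL = compose_model("sales", ["region", "revenue"])
FINANCE_MODEL = compose_model("finance", ["order_date", "revenue", "avg_order_value"])
```

Since both models reference the same `revenue` object rather than copying its formula, a change to the definition in the library reaches every model that uses it.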

Some users have reported that AtScale is challenging to implement. It is built to work with Apache Hadoop, and this dependence is an issue for organizations that may need to tweak their tech stack to accommodate the implementation.

Starburst

Starburst is an open data lakehouse platform offering the performance of a data warehouse and the scale of a data lake on a fully managed, multi-cloud platform. The platform supports deployment in various cloud environments such as AWS, Azure and Google Cloud, providing a single point of access to all data.

This lakehouse analytics platform is built on top of Trino, previously known as PrestoSQL. Users can provision, deploy and manage optimized Trino clusters and automatically scale them based on workload needs. It offers features such as a universal interface for faster data discovery, built-in access controls and data governance, and self-service capabilities.

Starburst data products work only partially as a semantic layer, offering curated datasets for specific use cases. In addition, Starburst is a compute-intensive, in-memory solution. It does not meet enterprise requirements for a unified and standardized view of large-scale data across departments. Instead of a single semantic layer, Starburst requires creating multiple SQL queries and views to support data at scale, which results in high maintenance effort and costs.

 

 
