The Semantic Layer is The Infrastructure Enterprise AI Actually Requires

28 May

When enterprise AI doesn't perform reliably in production, the AI model is rarely the problem. The data underneath it is: inconsistent definitions across systems, identifiers and schemas that do not align, and missing relationships between facts that should connect. AI has made the cost of these gaps difficult to ignore.

Global companies like IKEA, AstraZeneca, and Netflix are running ontologies and knowledge graphs as core infrastructure. For example, IKEA describes not just its products but also their use contexts. An AI can understand how different kinds of sofas, sofa cushions, and throws are related because the ontology captures the meaning and business rules behind those relationships, such as that a cushion or throw makes a sofa more comfortable, and which kind of cushion is right for a particular sofa.

Semantic layers are gaining momentum for good reason. They are one of the critical prerequisites for enterprise AI that actually works in production and creates business value.

While there are different approaches to implementing a semantic layer, with the arrival of AI its main function has shifted from defining business metrics to formalising knowledge of the business environment. Ontologies and knowledge graphs are the ideal tools for describing that knowledge and connecting your data to it.

Mikko Koho

**Principal Consultant in Data & AI, PhD**
Mikko Koho has over 15 years of experience in data engineering, knowledge graphs, and AI, with a PhD specialising in semantics. He works with large organisations across industries to distil actionable business insights from complex, disparate data and build the knowledge infrastructure their AI initiatives require.

Seen at a Client Site, Recognised Everywhere

I recently visited a client who was trying to solve exactly this. They had a large number of siloed databases, poor documentation, and no reliable way of connecting data between the systems. They were starting their data descriptions from scratch and wondering how to make their data work for AI.

In many organisations, the same pattern plays out. Data is fragmented, so an ad-hoc solution is developed to pull together the data needed from different systems.

Without proper knowledge infrastructure, maintenance grows unmanageable and the project is quietly abandoned. Then a new technology wave arrives and they start over, which again works. Until it doesn't.

Bringing Meaning to The Data

Enterprise data is spread across systems built at different times, by different teams, for different purposes. Each system has its own schema, its own data modelling and naming conventions, and its own implicit assumptions about what the data means. A CRM defines a customer one way, an ERP defines it another, and a support system has a third definition. These three definitions are usually compatible but not identical, and the differences matter.

The deeper problem is that much of the meaning is not explicitly recorded anywhere. Consider a database field called order_closed_date populated on every order record.

In the order management system, "closed" usually means the goods have been delivered and the order is fulfilled.
In the finance system, "closed" can mean the invoice has been paid in full.
In the CRM, an opportunity being "closed" only signals that it has stopped progressing – closed-won, closed-lost, or simply abandoned.

Three legitimate definitions, all stored as a date in a field with effectively the same name. The database schema does not record which definition applies, and the only way to know is through organisational knowledge.

The same is true of column names more generally: a column called customer_id in one database and cust_no in another could refer to the same thing, but the certainty comes from convention, not from anything machine-readable.
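One way to make this meaning explicit is to record, for each physical column, which business concept it carries. The sketch below is illustrative only: the system names, the CRM column name `closed_date`, the assignment of `customer_id` and `cust_no` to a CRM and an ERP, the concept labels such as `OrderFulfilledDate`, and the `FieldMeaning` structure are all assumptions made for the example, not part of any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldMeaning:
    """Machine-readable meaning of one physical column (hypothetical structure)."""
    concept: str     # business concept the column represents
    definition: str  # short human-readable definition


# Hypothetical catalogue keyed by (system, column). All names are illustrative.
FIELD_CATALOGUE = {
    ("order_management", "order_closed_date"): FieldMeaning(
        "OrderFulfilledDate", "Goods delivered and order fulfilled"),
    ("finance", "order_closed_date"): FieldMeaning(
        "InvoicePaidDate", "Invoice paid in full"),
    ("crm", "closed_date"): FieldMeaning(
        "OpportunityClosedDate", "Opportunity stopped progressing"),
    ("crm", "customer_id"): FieldMeaning(
        "CustomerIdentifier", "Key identifying a customer"),
    ("erp", "cust_no"): FieldMeaning(
        "CustomerIdentifier", "Key identifying a customer"),
}


def same_concept(field_a, field_b):
    """Return True if two (system, column) pairs carry the same business concept."""
    return FIELD_CATALOGUE[field_a].concept == FIELD_CATALOGUE[field_b].concept


if __name__ == "__main__":
    # Different column names, same concept.
    print(same_concept(("crm", "customer_id"), ("erp", "cust_no")))
    # Same column name, different concepts.
    print(same_concept(("order_management", "order_closed_date"),
                       ("finance", "order_closed_date")))
```

With a catalogue like this, whether two columns can be joined or compared becomes a lookup rather than a guess based on their names.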

Without explicit, machine-readable meaning, an AI system has to guess. When it answers a question whose relevant data spans multiple systems, it infers meaning from what it can see in the query results, such as column names and sample values, and tries to work out how the systems connect and how to stitch their results together. This works often enough to be impressive in demos, and not often enough to be trustworthy in production.

From Metrics to Meaning

The semantic layer is the response to this problem. It harmonises the underlying source databases by using the concepts relevant to your enterprise.

Traditionally, the term referred to the place where business metrics, measures, and dimensions are defined consistently, so that every report and dashboard calculates KPIs the same way. But it was designed to serve a human analyst, a BI tool, or an SQL query, not to capture the structure of the business itself: what entities exist, how they relate, and what rules govern those relationships. In other words, the semantics. It operates over the metrics, not over the underlying meaning of the values themselves. It can ensure that revenue is calculated the same way everywhere, but it cannot tell you why revenue is down in Q2.

The modern semantic layer needs to support AI systems and thus go further. The focus shifts from measures over tables to entities and relationships in a graph, built on two foundations.

An ontology is the formal, machine-readable definition of your business environment. It defines what kinds of entities exist (e.g., Customer, Contract, Product, Order, Supplier, Site, Asset, Employee), what properties those entities have, and the rules that govern how they relate. A Customer has a Contract. A Contract covers a Product. An Order is fulfilled from an Inventory location, requires certain Components, and is sourced from a Supplier. These are explicit assertions about how the business works, expressed in a form a machine can act on.
A knowledge graph is data expressed as nodes (entities) and edges (relationships), connecting your data to the ontology. A company is a node. A contract is a node. The edge between them is a fact. The whole graph is connected, traversable, and queryable.
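To make the two foundations concrete, here is a minimal sketch in plain Python. The ontology is a set of allowed (subject type, relationship, object type) rules taken from the examples above, and the knowledge graph is a set of facts that must conform to those rules. The node names (`acme_corp`, `contract_001`, `sofa_cushion`), the relationship labels, and the helper functions are hypothetical. In practice, ontologies are often expressed in standards such as RDF and OWL; plain sets are used here only to keep the example self-contained.

```python
# Ontology: which relationships are allowed between which entity types.
ONTOLOGY = {
    ("Customer", "has", "Contract"),
    ("Contract", "covers", "Product"),
    ("Order", "sourced_from", "Supplier"),
}

# Entity type of each node in the graph (illustrative names).
NODE_TYPES = {
    "acme_corp": "Customer",
    "contract_001": "Contract",
    "sofa_cushion": "Product",
}


def add_fact(graph, subject, predicate, obj):
    """Add an edge to the graph only if the ontology allows it."""
    triple_type = (NODE_TYPES.get(subject), predicate, NODE_TYPES.get(obj))
    if triple_type not in ONTOLOGY:
        raise ValueError(f"Ontology does not allow {triple_type}")
    graph.add((subject, predicate, obj))


def related(graph, subject, predicate):
    """Return the nodes reachable from `subject` via `predicate`, sorted."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


def build_example_graph():
    """Build a tiny graph: a customer, its contract, and the product covered."""
    graph = set()
    add_fact(graph, "acme_corp", "has", "contract_001")
    add_fact(graph, "contract_001", "covers", "sofa_cushion")
    return graph


if __name__ == "__main__":
    g = build_example_graph()
    print(related(g, "acme_corp", "has"))
```

An attempt to add a fact the ontology does not define, such as a Customer directly "covering" a Product, is rejected instead of silently entering the graph.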

Why the Semantic Layer Changes What AI Can Do

Generative AI and AI agents are probabilistic by nature. When exposed to data without an explicit semantic foundation, they infer meaning from what they see, like column names and individual values. They can also confidently get it wrong, which quickly degrades the trustworthiness of an enterprise AI system.

A semantic layer provides the deterministic grounding that the probabilistic models need. A metrics layer supports lookup: what is X. An ontology additionally supports context and reasoning: why did X happen, and what should we do about it. When an AI system retrieves data and reasons over an ontology-powered semantic layer, the answers become explainable. Every recommendation traces a chain through named entities and explicit relationships, rather than emerging from what the AI has learned from its inputs.
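The idea of an answer that traces a chain through named entities can be sketched as a simple graph search. The example below is an assumption-laden illustration, not a description of any specific product: the edge list, the relationship label `supplied_by`, and the function `explain_path` are hypothetical. It uses a breadth-first search to return the explicit chain of facts linking two entities.

```python
from collections import deque

# Illustrative facts; entity and relationship names are hypothetical.
EDGES = [
    ("acme_corp", "has", "contract_001"),
    ("contract_001", "covers", "product_x"),
    ("product_x", "supplied_by", "supplier_y"),
]


def explain_path(edges, start, goal):
    """Breadth-first search returning the chain of facts from start to goal.

    Returns a list of (subject, predicate, object) triples, or None if the
    two entities are not connected by following edges forwards.
    """
    outgoing = {}
    for s, p, o in edges:
        outgoing.setdefault(s, []).append((p, o))

    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for p, o in outgoing.get(node, []):
            if o not in visited:
                visited.add(o)
                queue.append((o, path + [(node, p, o)]))
    return None


if __name__ == "__main__":
    for fact in explain_path(EDGES, "acme_corp", "supplier_y"):
        print(fact)
```

Each step in the returned chain is a fact that can be inspected and audited, which is what makes the resulting answer explainable.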

“For agentic AI, systems that not only answer questions but propose decisions and take actions, this matters even more. Agents need the structured facts of the semantic layer to make autonomous reasoning grounded, consistent, and auditable.”

AI makes visible the cost of not having formal, machine-readable organisational knowledge. A wrong answer from a report can be caught in review. A wrong answer from an AI agent acting on incomplete or ambiguous data is much harder to catch and to explain.

The principle is simple: the AI model plays to its strengths, language understanding and generation. The semantic layer takes responsibility for the parts that have to be correct.

Start Small, Build Iteratively

The pattern that works in practice is to pick one business problem where the cost of disconnected data is concrete and measurable, such as customer attrition with hidden warning signs, working capital trapped in mismatched orders and inventory, or regulatory reporting that takes weeks of manual reconciliation.

Build the ontology fragment that covers that problem.
Map two or three source systems.
Stand up the graph.
Deliver the use case, and then expand.
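The mapping step in the list above can be sketched as follows. This is a deliberately simplified illustration: the sample rows, the `MAPPINGS` table, the relationship labels, and the assumption that both systems use the same key values for the same customer are all hypothetical. Real identifier reconciliation is usually harder than this.

```python
# Sample rows from two hypothetical source systems.
CRM_ROWS = [{"customer_id": "C-100", "name": "Acme Corp"}]
ERP_ROWS = [{"cust_no": "C-100", "order_no": "SO-1"}]

# Mapping: which column in each system holds the shared customer key.
MAPPINGS = {
    "crm": {"customer_key": "customer_id"},
    "erp": {"customer_key": "cust_no"},
}


def customer_node(system, row):
    """Turn a source row into a shared customer node identifier."""
    return f"customer/{row[MAPPINGS[system]['customer_key']]}"


def build_graph(crm_rows, erp_rows):
    """Map rows from both systems into one set of (subject, predicate, object) facts."""
    triples = set()
    for row in crm_rows:
        triples.add((customer_node("crm", row), "has_name", row["name"]))
    for row in erp_rows:
        order_node = f"order/{row['order_no']}"
        triples.add((customer_node("erp", row), "placed", order_node))
    return triples


if __name__ == "__main__":
    for triple in sorted(build_graph(CRM_ROWS, ERP_ROWS)):
        print(triple)
```

Because both mappings resolve to the same customer node, the name from the CRM and the order from the ERP end up attached to one entity, and a later use case can reuse that node rather than re-integrating the sources.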

Each subsequent use case shares the entities and relationships already modelled, so the second one costs less than the first, and the third less than the second. The economics invert: in traditional integration, each new project gets harder, but with a semantic layer, each new project gets easier.

Data Engineering, Data & AI, AI Engineering


https://www.linkedin.com/in/mikko-koho/