DDD as an Anti-Hallucination Architecture: Guardrails for Agentic AI in MeshSync

Agentic AI changes the economics of software implementation. It can generate code quickly, follow local conventions, write tests, propose migrations, and explore unfamiliar parts of a system without waiting for a human to manually traverse every file.

It also changes the risk profile.

The failure mode is not only that an agent may hallucinate an API call or invent a class that does not exist. The deeper problem is that an agent can produce code that looks plausible while quietly damaging the architecture: a domain object importing persistence annotations, a query handler writing to a repository, a REST resource reaching into another bounded context, a dynamic Map<String, Object> sneaking into a contract, or a generated worker payload becoming an informal agreement again.

In other words, hallucination is not just a model problem. It is a boundary problem.

That is the design principle behind the current backend work. MeshSync is being rebuilt as a Quarkus multi-module backend using Domain-Driven Design, pragmatic CQRS, and hexagonal architecture. The goal is not to make AI agents magically correct. The goal is to give them rails: a system where mistakes become visible, localized, and testable before they become product behavior.

This article reviews the MeshSync Java backend architecture from that angle: how DDD boundaries, application contracts, architecture tests, worker contracts, and domain-level gates reduce the space in which an AI agent can hallucinate a clean implementation.

flowchart LR
    A[Agentic AI contribution] --> B[Layer guidelines]
    B --> C[DDD modules and bounded contexts]
    C --> D[CQRS commands and queries]
    D --> E[Ports, adapters, and typed contracts]
    E --> F[Architecture tests]
    F --> G[Human review]
    G --> H[Clean implementation]
    F -. violations .-> A

The useful loop is not “AI writes, human hopes”. It is “AI writes inside constraints, tests reject architectural drift, humans review the remaining decision”.

The Real Risk Is Plausible Drift

Most conversations about AI hallucination focus on factual errors: a non-existent method, a dependency that does not expose the described API, a wrong import, or a test that asserts behavior the system does not actually need.

Those are real issues, but they are often easy to catch. The compiler, type checker, or test runner usually complains.

Architecture drift is quieter.

An AI agent can place the right behavior in the wrong layer. It can solve the immediate task by coupling two bounded contexts directly. It can use a primitive string where the domain already has a value object. It can put a transaction boundary in a presentation resource because that looks familiar from another framework. It can serialize a domain entity because the JSON response needs the same fields. It can generate a generic abstraction before the product has enough concrete cases to justify it.

Every one of those choices may compile. Some may even pass a narrow unit test. The implementation is “working”, but the system has become harder to reason about.

For MeshSync, that matters because the product surface is broad: storage providers, asset libraries, metadata, collections, tags, marketplaces, subscriptions, identity, production workflows, communication, analytics, worker orchestration, IP checks, recommendations, and generated contracts. A small amount of drift repeated across modules becomes expensive very quickly.

The answer is not to forbid AI assistance. The answer is to make the architecture executable.

DDD Gives The Agent A Map

The Java backend is structured as a multi-module Gradle project. The important detail is not the number of modules; it is that each module represents a boundary with ownership.

Current modules include shared kernel, worker backend client, catalog, storage, library, search, collection, tag, marketplace, subscription, identity, production, communication, analytics, boot, and architecture tests.

That gives an agent a first question before writing code: which bounded context owns this behavior?

Without that question, agentic work tends to become location-driven: find a similar file, copy the pattern, patch until tests pass. With DDD, the first decision is semantic. A pricing recommendation belongs to analytics. A storage provider catalog belongs to storage. A thumbnail queue adapter may belong to collection or catalog depending on the use case. Identity does not become a convenient dependency just because the current endpoint knows a user ID.

The layer taxonomy is equally explicit:

| Layer | Owns | Must avoid |
| --- | --- | --- |
| `domain/` | Aggregates, value objects, domain events, specifications, repository contracts | Frameworks, persistence, JSON, HTTP, dependency injection |
| `application/` | Commands, queries, use-case handlers, DTOs, ports | Quarkus, JPA, REST annotations, direct infrastructure dependencies |
| `infrastructure/` | Persistence, queue clients, external services, framework wiring | Presentation dependencies and hidden domain mutation |
| `presentation/` | REST resources, request/response mapping, OpenAPI annotations | Domain entities, repositories, direct handler injection |

This is not documentation as decoration. The architecture tests enforce it.

For AI-assisted implementation, that matters because the model no longer has to infer every rule from taste. The codebase can reject violations mechanically.

Domain Purity Is A Guardrail

The domain layer is where the backend becomes resistant to hallucination.

In MeshSync, domain classes are pure Java. They do not import Quarkus, Jakarta persistence, Jackson, OpenAPI annotations, or dependency injection. Domain identifiers are value objects. Repositories are interfaces. Events implement a shared domain event contract.

That sounds like classic DDD discipline, but it has a newer benefit: it prevents AI from smuggling infrastructure assumptions into the business model.

If an agent tries to make a domain object easier to persist by adding @Entity, the architecture suite rejects it. If it tries to make a domain object easier to serialize by adding Jackson annotations, the suite rejects it. If it reaches from a domain object into an application handler or framework service, the dependency rule rejects it.

This is one of the most useful anti-hallucination patterns: the domain should not know the tool that helped implement it.

The model can propose implementation. The framework can host runtime behavior. The database can store state. But the domain stays the place where product truth is expressed in explicit types and transitions.
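To make the import rule concrete, here is a minimal sketch of a source-scanning check for domain purity. It is not the MeshSync rule itself: the class name `DomainPurityRule` and the exact prefix list are assumptions for illustration, and the real suite is described as combining ArchUnit with source scanning.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical source-scanning rule: domain sources must not import framework packages. */
final class DomainPurityRule {

    // Assumed prefixes for illustration; the real suite's list is not shown in the article.
    private static final List<String> FORBIDDEN_PREFIXES = List.of(
            "io.quarkus.",
            "jakarta.",
            "com.fasterxml.jackson.",
            "org.eclipse.microprofile.openapi.");

    private DomainPurityRule() {
    }

    /** Returns every forbidden import statement found in one domain source file. */
    static List<String> violations(String javaSource) {
        List<String> found = new ArrayList<>();
        for (String rawLine : javaSource.split("\\R")) {
            String line = rawLine.strip();
            if (!line.startsWith("import ")) {
                continue;
            }
            String imported = line.substring("import ".length()).strip();
            if (imported.startsWith("static ")) {
                imported = imported.substring("static ".length()).strip();
            }
            for (String prefix : FORBIDDEN_PREFIXES) {
                if (imported.startsWith(prefix)) {
                    found.add(line);
                    break;
                }
            }
        }
        return found;
    }
}
```

A test in the architecture module could walk every file under a `domain/` package and fail when `violations` returns a non-empty list. The agent then gets a precise message about the offending import instead of a review comment days later.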

flowchart TB
    subgraph Domain[Domain core]
      A[Aggregates]
      B[Value objects]
      C[Domain events]
      D[Specifications]
    end

    subgraph Outer[Outer concerns]
      E[JPA and Panache]
      F[JAX-RS]
      G[OpenAPI]
      H[Queues and worker clients]
      I[JSON mapping]
    end

    E --> J[Infrastructure adapters]
    H --> J
    I --> J
    F --> K[Presentation DTOs]
    G --> K
    J --> A
    J --> B
    K --> J
    A -. forbidden imports .-> L[Architecture tests]
    B -. forbidden imports .-> L
    C -. aggregate ID contract .-> L

The dependency direction keeps product rules from becoming a side effect of persistence, HTTP, or worker integration choices.

CQRS Reduces Ambiguous Intent

MeshSync uses pragmatic CQRS. That does not mean separate read and write databases everywhere. It means use-case segregation: commands and queries are modeled separately where it helps, while sharing the same domain and persistence model unless a module has a concrete reason to split further.

This is a good fit for agentic implementation because it removes a common ambiguity: is this code trying to change the system, or read from it?

Commands implement command contracts. Queries implement query contracts. Handlers have predictable names. Presentation resources dispatch through CommandBus and QueryBus, rather than injecting handlers directly. Command dispatch is transactional, so aggregate writes and outbox logs can share a unit of work. Query handlers are checked so they do not call write methods such as save, delete, persist, update, or flush.

For a human, those rules clarify the design. For an AI agent, they narrow the implementation search space.

If the task is to accept a recommendation, the shape is a command. If the task is to list recommendations, the shape is a query. If a REST resource needs behavior, it dispatches through a bus. If an operation mutates an aggregate, it should not hide inside a query handler.

The architecture suite checks those expectations. This turns CQRS from a naming convention into a behavioral guardrail.
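The command and query split described above can be sketched in a few lines of plain Java. Everything here is a simplified assumption: the marker interfaces, the registry-based buses, and the example `AcceptRecommendation` and `ListRecommendations` contracts are illustrative, and the transactional wiring the real command bus provides is left out. Identifiers are plain strings only to keep the sketch short.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Marker contracts: R is the result type of the use case. */
interface Command<R> { }

interface Query<R> { }

interface CommandHandler<C extends Command<R>, R> {
    R handle(C command);
}

interface QueryHandler<Q extends Query<R>, R> {
    R handle(Q query);
}

/** Write side. The real bus is described as transactional; that wiring is omitted here. */
final class CommandBus {
    private final Map<Class<?>, CommandHandler<?, ?>> handlers = new HashMap<>();

    <C extends Command<R>, R> void register(Class<C> type, CommandHandler<C, R> handler) {
        handlers.put(type, handler);
    }

    @SuppressWarnings("unchecked")
    <C extends Command<R>, R> R dispatch(C command) {
        CommandHandler<C, R> handler = (CommandHandler<C, R>) handlers.get(command.getClass());
        if (handler == null) {
            throw new IllegalStateException("No command handler for " + command.getClass().getSimpleName());
        }
        return handler.handle(command);
    }
}

/** Read side: same dispatch shape, separate type so reads and writes cannot be confused. */
final class QueryBus {
    private final Map<Class<?>, QueryHandler<?, ?>> handlers = new HashMap<>();

    <Q extends Query<R>, R> void register(Class<Q> type, QueryHandler<Q, R> handler) {
        handlers.put(type, handler);
    }

    @SuppressWarnings("unchecked")
    <Q extends Query<R>, R> R dispatch(Q query) {
        QueryHandler<Q, R> handler = (QueryHandler<Q, R>) handlers.get(query.getClass());
        if (handler == null) {
            throw new IllegalStateException("No query handler for " + query.getClass().getSimpleName());
        }
        return handler.handle(query);
    }
}

// Hypothetical use cases, named after the examples in the text.
record AcceptRecommendation(String recommendationId) implements Command<String> { }

final class AcceptRecommendationHandler implements CommandHandler<AcceptRecommendation, String> {
    @Override
    public String handle(AcceptRecommendation command) {
        return "accepted:" + command.recommendationId();
    }
}

record ListRecommendations(String ownerId) implements Query<List<String>> { }

final class ListRecommendationsHandler implements QueryHandler<ListRecommendations, List<String>> {
    @Override
    public List<String> handle(ListRecommendations query) {
        return List.of("rec-1", "rec-2");
    }
}
```

A presentation resource would hold only the two buses. It cannot reach a handler directly, so the question "does this endpoint change state?" is answered by which bus it calls.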

Strong Typing Limits Fiction

Dynamic contracts are one of the places where AI hallucination becomes expensive. A Map<String, Object> can hold anything, which means the agent can invent fields and the code still compiles. A raw Object return type defers truth until runtime. Primitive pagination parameters can spread subtle inconsistency across modules. Local variables declared with var make generated code harder to review when type names carry domain meaning.

MeshSync pushes in the opposite direction.

The architecture tests reject raw Object in domain, application, and presentation contracts. They reject generic object maps such as Map<String, Object> and Map<String, ?> in contract surfaces. Pagination contracts must use PageRequest instead of loose int page and int size. Domain identifiers ending in Id must be value objects. The codebase also enforces explicit local variable types by rejecting var in main Java sources.

That last rule may feel strict. In this context, it is intentional.

When an agent writes code, explicit types are review affordances. They make the resulting diff easier to scan. They expose whether the agent understood the domain object it is manipulating. They make it harder for broad generic shapes to spread simply because they are convenient.

The pattern is simple: when a concept matters, name it in the type system.

| Risky shape | Cleaner shape |
| --- | --- |
| `String userId` everywhere | `UserId` at domain boundaries |
| `Map<String, Object> metadata` | Explicit DTO or value object |
| `Object payload` | Typed payload variant |
| `int page, int size` | `PageRequest` |
| Unstructured enum strings | String enum contract with parsing |

This does not eliminate mistakes, but it makes mistakes visible in the places where code review and tests can reason about them.
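As an illustration of the table, here is one way the two most common shapes might look as Java records. The UUID backing of `UserId`, the zero-based page index, and the `MAX_SIZE` limit are assumptions made for this sketch, not details taken from the MeshSync codebase.

```java
import java.util.Objects;
import java.util.UUID;

/** Identifier as a value object instead of a bare String. UUID backing is an assumption. */
record UserId(UUID value) {
    UserId {
        Objects.requireNonNull(value, "value");
    }

    /** Parses external input once, at the boundary; invalid input fails immediately. */
    static UserId parse(String raw) {
        return new UserId(UUID.fromString(raw));
    }
}

/** Pagination contract replacing loose int page and int size parameters. */
record PageRequest(int page, int size) {
    static final int MAX_SIZE = 100; // assumed upper bound for illustration

    PageRequest {
        if (page < 0) {
            throw new IllegalArgumentException("page must be zero or greater");
        }
        if (size < 1 || size > MAX_SIZE) {
            throw new IllegalArgumentException("size must be between 1 and " + MAX_SIZE);
        }
    }

    /** Offset of the first element on this page, computed as long to avoid overflow. */
    long offset() {
        return (long) page * size;
    }
}
```

The validation lives in one place. An agent that writes `new PageRequest(0, 0)` gets an immediate failure, while an agent passing two loose integers could swap them and still compile.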

Architecture Tests Are The Second Reviewer

The most important implementation choice in meshsync-backend-java is the dedicated meshsync-architecture-tests module.

It contains ArchUnit and source-scanning rules for layer dependencies, annotations, forbidden imports, naming conventions, CQRS behavior, ports and adapters, contract conventions, object typing, primitive contracts, strong typing, technical remark reporting, and traceability reporting.

That suite is the anti-hallucination mechanism.

Not because tests can understand product intent perfectly. They cannot. But they can catch the predictable ways an implementation assistant breaks architecture while trying to be helpful.

Examples of enforced rules include:

- Domain classes must not depend on infrastructure, presentation, application, Quarkus, Jakarta, Jackson, or OpenAPI.
- Application classes must remain framework-free outside the boot composition root.
- Presentation must not import domain entities or domain repositories directly.
- Presentation must dispatch through CommandBus and QueryBus instead of direct handler injection.
- Query handlers must not call write methods.
- Command bus dispatch methods must be transactional.
- Domain events must implement the shared event contract and expose an aggregate identifier accessor.
- Event publishers must not invent synthetic aggregate IDs.
- Application ports must be interfaces.
- JPA entities must live in infrastructure persistence.
- JAX-RS resources must live in presentation.
- Contracts must avoid raw object types and generic dynamic maps.

This turns architecture into a CI-enforced interface between human intent and machine-generated code.
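One of the listed rules, "query handlers must not call write methods", is simple enough to sketch as a source scan. This is a deliberately crude illustration, not the project's implementation: the `QueryHandler` class-name suffix is an assumed naming convention, and a regular expression will also match calls inside comments or on unrelated objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical rule: sources of query handlers must not call repository write methods. */
final class QueryHandlerWriteRule {

    // Write method names as listed in the article: save, delete, persist, update, flush.
    private static final Pattern WRITE_CALL =
            Pattern.compile("\\.(save|delete|persist|update|flush)\\s*\\(");

    private QueryHandlerWriteRule() {
    }

    /** Returns one message per write call found; non-query classes are ignored. */
    static List<String> violations(String className, String javaSource) {
        List<String> found = new ArrayList<>();
        if (!className.endsWith("QueryHandler")) { // assumed naming convention
            return found;
        }
        Matcher matcher = WRITE_CALL.matcher(javaSource);
        while (matcher.find()) {
            found.add(className + " calls write method: " + matcher.group(1));
        }
        return found;
    }
}
```

A bytecode-level check (for example with ArchUnit, which the suite is said to use) would be more precise. The point of the sketch is only that the rule is mechanical enough to run on every change.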

flowchart LR
    A[Implementation diff] --> B[Unit tests]
    A --> C[Integration tests]
    A --> D[Architecture tests]
    A --> E[OpenAPI contract checks]

    D --> F{Boundary respected?}
    E --> G{External API preserved?}
    B --> H{Behavior correct?}
    C --> I{Runtime path works?}

    F --> J[Reviewable change]
    G --> J
    H --> J
    I --> J

Unit tests ask whether behavior works. Architecture tests ask whether the behavior was placed in the system correctly.

Ports And Adapters Keep Workers Outside The Domain

MeshSync still depends on worker systems. Some work belongs outside the Java backend: thumbnail generation, technical metadata, IP checks, metadata enrichment, search, and other processing tasks.

The Java backend handles that through ports and adapters.

The IP check flow is a useful example. The application defines an IpCheckWorkerQueuePort. The infrastructure adapter uses the generated worker backend client, generated message names, and generated queue names to dispatch work. The domain does not know about the worker client, Redis, queue names, retries, backoff, or webhook URLs.

That separation matters for AI guardrails because worker integrations are a common hallucination surface. An agent can easily invent a queue name, change a payload shape, or place retry semantics inside a domain object if the codebase does not make the integration boundary obvious.

In MeshSync, the adapter owns the integration mechanics:

- Generated QueueNames and MessageTypes provide the worker contract surface.
- Processing defaults define priority, attempts, backoff type, and backoff delay.
- A webhook URL is assembled from configuration.
- The adapter maps the domain result and request DTO into a worker message.
- Worker dispatch failures become application exceptions or domain error state, not silent assumptions.

The worker can be probabilistic or heuristic. The backend boundary cannot be.

That is the key distinction. AI workers may produce uncertain analysis; the backend should receive that uncertainty as explicit data, normalize it, validate it, and decide what domain state transition is allowed.
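The adapter responsibilities listed in this section can be sketched as follows. Only `IpCheckWorkerQueuePort` is a name from the article. The DTO fields, the `WorkerClient` interface, the queue and message-type strings, and the webhook path are all placeholders standing in for the generated worker contract, which is not shown here.

```java
import java.time.Duration;

/** Application port: declared in application/, implemented in infrastructure/. */
interface IpCheckWorkerQueuePort {
    void enqueue(IpCheckRequest request);
}

/** Hypothetical request DTO; the real fields are not shown in the article. */
record IpCheckRequest(String modelId) { }

/** Processing defaults named in the article: priority, attempts, backoff type, backoff delay. */
record ProcessingDefaults(int priority, int attempts, String backoffType, Duration backoffDelay) { }

/** Hypothetical message envelope standing in for the generated worker contract. */
record WorkerMessage(String queueName, String messageType, IpCheckRequest payload,
                     ProcessingDefaults options, String webhookUrl) { }

/** Stand-in for the generated worker backend client. */
interface WorkerClient {
    void send(WorkerMessage message);
}

/** Dispatch failures surface as an explicit application exception. */
final class WorkerDispatchException extends RuntimeException {
    WorkerDispatchException(String message, Throwable cause) {
        super(message, cause);
    }
}

/** Infrastructure adapter: the only place that knows queue names, retries, and webhooks. */
final class IpCheckWorkerQueueAdapter implements IpCheckWorkerQueuePort {
    // Placeholders: the real values would come from the generated QueueNames and MessageTypes.
    static final String QUEUE_NAME = "ip-check";
    static final String MESSAGE_TYPE = "ip-check.requested";

    private final WorkerClient client;
    private final ProcessingDefaults defaults;
    private final String webhookBaseUrl;

    IpCheckWorkerQueueAdapter(WorkerClient client, ProcessingDefaults defaults, String webhookBaseUrl) {
        this.client = client;
        this.defaults = defaults;
        this.webhookBaseUrl = webhookBaseUrl;
    }

    @Override
    public void enqueue(IpCheckRequest request) {
        String webhookUrl = webhookBaseUrl + "/webhooks/ip-check"; // assumed path
        WorkerMessage message = new WorkerMessage(QUEUE_NAME, MESSAGE_TYPE, request, defaults, webhookUrl);
        try {
            client.send(message);
        } catch (RuntimeException e) {
            throw new WorkerDispatchException("Could not enqueue IP check for " + request.modelId(), e);
        }
    }
}
```

Because queue names and retry settings appear only in the adapter, an agent working in the domain or application layer has nothing to invent: the port exposes a single typed method.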

AI Suggestions Are Not Domain Authority

The recommendation model in analytics captures the right posture toward AI output.

The aggregate is explicitly described as an AI-generated recommendation, but it is not treated as truth. It has a lifecycle: PENDING, ACTIVE, ACCEPTED, DISMISSED, and EXPIRED. It has a confidence score constrained between 0.0 and 1.0, rounded to two decimals. It has display thresholds and labels. It has an expiration date. It records resolution time and dismiss reason.

That is a product decision embedded in the domain: AI can suggest, but the suggestion must live inside a governed state machine.

A recommendation cannot be accepted unless it is active. It cannot be dismissed unless it is active. It can expire from pending or active. Confidence is a value object, not a free-floating double. The user action is part of the model.

This is the same idea as human-in-the-loop, but implemented as domain behavior rather than a slide in an architecture deck.

stateDiagram-v2
    [*] --> PENDING
    PENDING --> ACTIVE: activate
    PENDING --> EXPIRED: expire
    ACTIVE --> ACCEPTED: user accepts
    ACTIVE --> DISMISSED: user dismisses
    ACTIVE --> EXPIRED: ttl elapsed
    ACCEPTED --> [*]
    DISMISSED --> [*]
    EXPIRED --> [*]

The AI output becomes a recommendation only after it enters a domain lifecycle with confidence, ownership, expiry, and user resolution.

This is one of the cleanest guardrails in the backend: AI is not allowed to be the final actor. It produces candidates. The domain decides which states exist and which transitions are legal.
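A compact sketch of that lifecycle, under stated assumptions: the class and method names are invented for illustration, the rounding mode is a guess (the article only says "rounded to two decimals"), and fields such as expiration date, display thresholds, and resolution time are omitted to keep the transitions readable.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

/** Confidence as a value object: bounded to [0.0, 1.0] and kept at two decimals. */
record Confidence(BigDecimal value) {
    Confidence {
        if (value == null || value.compareTo(BigDecimal.ZERO) < 0 || value.compareTo(BigDecimal.ONE) > 0) {
            throw new IllegalArgumentException("Confidence must be between 0.0 and 1.0");
        }
        value = value.setScale(2, RoundingMode.HALF_UP); // rounding mode is an assumption
    }

    static Confidence of(double raw) {
        return new Confidence(BigDecimal.valueOf(raw));
    }
}

enum RecommendationStatus { PENDING, ACTIVE, ACCEPTED, DISMISSED, EXPIRED }

/** Simplified aggregate: only the transitions from the state diagram are legal. */
final class Recommendation {
    private final Confidence confidence;
    private RecommendationStatus status = RecommendationStatus.PENDING;
    private String dismissReason;

    Recommendation(Confidence confidence) {
        this.confidence = confidence;
    }

    void activate() {
        requireStatus(RecommendationStatus.PENDING, "activate");
        status = RecommendationStatus.ACTIVE;
    }

    void accept() {
        requireStatus(RecommendationStatus.ACTIVE, "accept");
        status = RecommendationStatus.ACCEPTED;
    }

    void dismiss(String reason) {
        requireStatus(RecommendationStatus.ACTIVE, "dismiss");
        dismissReason = reason;
        status = RecommendationStatus.DISMISSED;
    }

    /** Expiry is allowed from PENDING or ACTIVE only. */
    void expire() {
        if (status != RecommendationStatus.PENDING && status != RecommendationStatus.ACTIVE) {
            throw new IllegalStateException("Cannot expire a recommendation in status " + status);
        }
        status = RecommendationStatus.EXPIRED;
    }

    RecommendationStatus status() {
        return status;
    }

    Confidence confidence() {
        return confidence;
    }

    String dismissReason() {
        return dismissReason;
    }

    private void requireStatus(RecommendationStatus expected, String action) {
        if (status != expected) {
            throw new IllegalStateException("Cannot " + action + " a recommendation in status " + status);
        }
    }
}
```

Whatever an AI worker sends, the only way to reach ACCEPTED is through `activate()` followed by a user-driven `accept()`. The worker can supply the confidence, but it cannot choose the state.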

IP Checks Show Guardrails As Product Policy

The IP check flow is even more direct.

An external worker can analyze a model and return a status, confidence, risk level, trademark risk, copyright risk, matched brands, matched franchises, similar model count, heuristic results, and suggested changes. That output is useful, but it is not blindly trusted.

The backend maps webhook payloads into typed completion DTOs. It accepts known worker statuses such as clear, flagged, and error; it normalizes broader completion statuses such as completed, success, and failed; it derives flagged status from risk and recommendation when necessary; and it rejects unsupported worker statuses.
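The normalization step might look roughly like the sketch below. The status strings come from the paragraph above, but the mapping targets are assumptions: the article does not say which outcome `completed`, `success`, or `failed` resolve to, and the `riskSuggestsFlag` parameter is a stand-in for whatever risk and recommendation logic the real mapper applies.

```java
import java.util.Locale;

enum IpCheckOutcome { CLEAN, FLAGGED, ERROR }

/** Hypothetical normalizer: turns a raw worker status string into a typed outcome. */
final class WorkerStatusNormalizer {

    private WorkerStatusNormalizer() {
    }

    /**
     * @param rawStatus        status string from the webhook payload
     * @param riskSuggestsFlag assumed summary of "risk and recommendation" from the payload
     */
    static IpCheckOutcome normalize(String rawStatus, boolean riskSuggestsFlag) {
        if (rawStatus == null) {
            throw new IllegalArgumentException("Missing worker status");
        }
        String status = rawStatus.strip().toLowerCase(Locale.ROOT);
        return switch (status) {
            case "clear" -> IpCheckOutcome.CLEAN;
            case "flagged" -> IpCheckOutcome.FLAGGED;
            case "error", "failed" -> IpCheckOutcome.ERROR; // "failed" to ERROR is an assumption
            // Broad completion statuses carry no verdict, so it is derived from risk (assumed).
            case "completed", "success" -> riskSuggestsFlag ? IpCheckOutcome.FLAGGED : IpCheckOutcome.CLEAN;
            default -> throw new IllegalArgumentException("Unsupported worker status: " + rawStatus);
        };
    }
}
```

The important property is the `default` branch: an unknown status stops at the boundary instead of flowing into the aggregate as a string.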

Then the aggregate validates scores and applies domain transitions.

IpCheckResult enforces score bounds. It records pending, in-progress, clean, flagged, and error outcomes. It raises completion and flagged events. It records findings and suggested changes. It also models acknowledgement explicitly.

Most importantly, publishing is gated by domain policy:

| IP check condition | Publish behavior |
| --- | --- |
| No result | Allowed |
| Pending or in progress | Blocked until completion |
| Error | Allowed with warning |
| High or critical recommendation to block | Blocked |
| Medium risk warning not acknowledged | Blocked until acknowledgement |
| Warning acknowledged | Allowed |
| Clean result | Allowed |

That is the guardrail pattern in miniature.

The worker can classify. The domain owns the release decision. A high-risk result cannot be bypassed by acknowledgement. A medium-risk result requires explicit acknowledgement. An error does not silently become a block or a pass; it becomes a specific policy outcome.

This is how a system should absorb uncertain AI output: as evidence, not authority.
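The publish table translates almost directly into a pure function. The sketch below is an interpretation, not the MeshSync implementation: the type names are invented, "high or critical recommendation to block" is read as a flagged result with HIGH or CRITICAL risk, and any other flagged result is treated as a warning that needs acknowledgement.

```java
enum IpCheckStatus { PENDING, IN_PROGRESS, CLEAN, FLAGGED, ERROR }

enum RiskLevel { LOW, MEDIUM, HIGH, CRITICAL }

enum PublishDecision {
    ALLOWED,
    ALLOWED_WITH_WARNING,
    BLOCKED_UNTIL_COMPLETION,
    BLOCKED_UNTIL_ACKNOWLEDGEMENT,
    BLOCKED
}

/** Hypothetical read-only view of the aggregate state the policy needs. */
record IpCheckSnapshot(IpCheckStatus status, RiskLevel riskLevel, boolean acknowledged) { }

final class PublishPolicy {

    private PublishPolicy() {
    }

    /** Pass null when no IP check result exists for the model. */
    static PublishDecision decide(IpCheckSnapshot check) {
        if (check == null) {
            return PublishDecision.ALLOWED; // table row: no result
        }
        return switch (check.status()) {
            case PENDING, IN_PROGRESS -> PublishDecision.BLOCKED_UNTIL_COMPLETION;
            case ERROR -> PublishDecision.ALLOWED_WITH_WARNING;
            case CLEAN -> PublishDecision.ALLOWED;
            case FLAGGED -> decideFlagged(check);
        };
    }

    private static PublishDecision decideFlagged(IpCheckSnapshot check) {
        // High and critical risk cannot be bypassed by acknowledgement.
        if (check.riskLevel() == RiskLevel.HIGH || check.riskLevel() == RiskLevel.CRITICAL) {
            return PublishDecision.BLOCKED;
        }
        // Assumption: every other flagged result behaves like the medium-risk warning row.
        return check.acknowledged()
                ? PublishDecision.ALLOWED
                : PublishDecision.BLOCKED_UNTIL_ACKNOWLEDGEMENT;
    }
}
```

Because the switch covers every enum constant, adding a new status forces a compile error here until someone decides what the publish behavior should be.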

Traceability Keeps The Agent Honest

MeshSync also uses RequirementCoverage annotations and generated traceability reports. Tests and source evidence can reference requirement IDs and explain what they cover. The architecture test suite checks that known coverage references appear in the traceability report.

There is a similar mechanism for technical remarks. The codebase has reporting around TechnicalRemark annotations, placeholder reasons, and generated remark reports. There are remediation rules that reject unresolved technical remark annotations in main source, enforce snake_case JPA column names, keep certain application packages from accumulating vague services, and prevent native SQL or direct textual entity-manager queries in production code.

For AI-assisted development, this is subtle but powerful.

Agents are good at filling gaps with plausible implementation. Traceability forces a different question: what requirement or architecture decision is this code evidence for?

That makes hallucination harder to hide. If a change claims to implement a requirement, there should be a requirement reference. If it introduces technical debt, the debt should be visible and reported. If it changes an API, OpenAPI compatibility checks should make that visible. If it moves behavior across layers, architecture tests should fail.

The workflow becomes less about trusting the agent and more about demanding receipts from the implementation.
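A small sketch of how such receipts could be collected. `RequirementCoverage` is named in the article, but its attributes are not, so `requirementIds` and `description` are assumptions, as are the report class, the example test class, and the requirement IDs.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Assumed shape of the annotation; the real attributes are not shown in the article. */
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.METHOD})
@interface RequirementCoverage {
    String[] requirementIds();

    String description() default "";
}

/** Hypothetical report builder: maps each requirement ID to the methods claiming to cover it. */
final class TraceabilityReport {

    private TraceabilityReport() {
    }

    static Map<String, List<String>> collect(Class<?>... types) {
        Map<String, List<String>> evidenceByRequirement = new TreeMap<>();
        for (Class<?> type : types) {
            for (Method method : type.getDeclaredMethods()) {
                RequirementCoverage coverage = method.getAnnotation(RequirementCoverage.class);
                if (coverage == null) {
                    continue;
                }
                for (String id : coverage.requirementIds()) {
                    evidenceByRequirement.computeIfAbsent(id, key -> new ArrayList<>())
                            .add(type.getSimpleName() + "#" + method.getName());
                }
            }
        }
        // Reflection order is unspecified, so sort for a stable report.
        evidenceByRequirement.values().forEach(Collections::sort);
        return evidenceByRequirement;
    }

    /** Requirement IDs that were expected but have no evidence in the report. */
    static Set<String> missing(Set<String> expectedIds, Map<String, List<String>> report) {
        Set<String> missing = new TreeSet<>(expectedIds);
        missing.removeAll(report.keySet());
        return missing;
    }
}

/** Example evidence; the requirement IDs are invented for illustration. */
final class ExamplePublishGateTest {
    @RequirementCoverage(requirementIds = "REQ-IP-001", description = "Pending checks block publishing")
    void blocksPublishingWhilePending() {
    }

    @RequirementCoverage(requirementIds = {"REQ-IP-001", "REQ-IP-002"})
    void allowsPublishingAfterAcknowledgement() {
    }
}
```

An architecture test could then fail the build whenever `missing(...)` is non-empty, which turns "this change implements requirement X" from a claim in a commit message into something the build verifies.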

API Contracts Protect The Migration Boundary

The Java backend is also a migration story. MeshSync has an existing NestJS backend, and the Quarkus backend must preserve external API compatibility while internals change.

The contract testing documentation is explicit about that boundary: preserve paths, HTTP methods, operation IDs where present, status codes, auth requirements, request schemas, response schemas, pagination shape, and error response shape. OpenAPI annotations stay in presentation. Domain and application contracts remain framework-free. OpenAPI diffing is the compatibility gate, and breaking changes require documented migration exceptions.

This is another anti-hallucination guardrail.

An agent may implement an endpoint that feels RESTful but breaks the current client contract. It may rename a field to something more idiomatic in Java. It may narrow a schema because the domain value object is stricter than the existing public API. It may return a different error shape because a framework mapper made that easy.

Contract tests keep the agent from confusing internal cleanliness with external compatibility.

That distinction matters. DDD is not permission to break users. It is a way to make the inside of the system coherent while preserving the outside contracts deliberately.
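Real OpenAPI diffing compares full schemas, but the shape of the compatibility gate can be shown with a much smaller stand-in. The sketch below checks only two of the properties listed above, operations (method plus path) and status codes. The `ApiOperation` and `ContractCompatibility` types are hypothetical and do not correspond to any OpenAPI library.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical, heavily reduced view of one operation in an API description. */
record ApiOperation(String method, String path, Set<Integer> statusCodes) {
    String key() {
        return method + " " + path;
    }
}

final class ContractCompatibility {

    private ContractCompatibility() {
    }

    /** Reports baseline operations or status codes that the candidate no longer offers. */
    static List<String> breakingChanges(List<ApiOperation> baseline, List<ApiOperation> candidate) {
        Map<String, ApiOperation> candidateByKey = new HashMap<>();
        for (ApiOperation operation : candidate) {
            candidateByKey.put(operation.key(), operation);
        }
        List<String> breaking = new ArrayList<>();
        for (ApiOperation expected : baseline) {
            ApiOperation actual = candidateByKey.get(expected.key());
            if (actual == null) {
                breaking.add("Missing operation: " + expected.key());
                continue;
            }
            Set<Integer> lost = new TreeSet<>(expected.statusCodes());
            lost.removeAll(actual.statusCodes());
            if (!lost.isEmpty()) {
                breaking.add("Lost status codes for " + expected.key() + ": " + lost);
            }
        }
        return breaking;
    }
}
```

Additions in the candidate are ignored on purpose: new endpoints do not break existing callers, while a renamed path or a dropped status code does.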

What This Architecture Does Not Do

It is worth being precise about the promise.

This architecture does not make AI hallucination impossible. It does not guarantee that every generated implementation is strategically correct. It does not replace human review. It does not remove the need for domain knowledge, product judgment, or security review.

What it does is reduce the blast radius.

If an agent invents a repository dependency across bounded contexts, a rule can catch it. If it places JPA annotations in the domain, a rule can catch it. If it uses a dynamic object map in a contract, a rule can catch it. If it writes from a query handler, a rule can catch it. If it bypasses command bus transactions, a rule can catch it. If it treats AI output as immediate product truth, the domain model should make that impossible or at least visible.

The strongest systems do not assume contributors are always correct. They make incorrect contribution paths narrow.

Lessons For Agentic AI Implementation

The MeshSync backend points to a broader pattern for teams using AI agents seriously.

First, write layer rules for agents, not only for humans. Humans can infer taste from context. Agents need sharp boundaries and executable feedback.

Second, keep the domain framework-free. If an AI-generated change cannot express a business rule without importing infrastructure, the model probably has not found the right design yet.

Third, prefer typed contracts over dynamic payloads. The more the system says in types, the less the agent can hide in plausible runtime shapes.

Fourth, enforce architecture with tests. Documentation helps, but CI-enforced rules change behavior.

Fifth, treat AI outputs as suggestions that enter domain workflows. Confidence, expiry, acknowledgement, ownership, and policy gates should be modeled explicitly.

Sixth, preserve external contracts with compatibility tests. A clean internal architecture is not clean if it breaks callers accidentally.

Seventh, make traceability part of the implementation loop. Requirements, ADRs, technical remarks, tests, and code evidence should remain connected.

The Clean Implementation Loop

The most important outcome is not that the Java backend uses DDD. Plenty of systems claim that.

The important outcome is that the architecture gives both humans and agents the same implementation loop:

1. Choose the bounded context.
2. Choose the layer.
3. Express the use case as a command, query, port, adapter, aggregate behavior, or DTO.
4. Keep the contract explicit.
5. Run behavioral, contract, and architecture tests.
6. Review the remaining product decision.

That loop is what makes agentic AI useful without letting it become architectural entropy.

MeshSync needs automation. It needs worker systems, AI-assisted analysis, generated contracts, marketplace intelligence, and fast implementation cycles. But the more automation enters the system, the more important the boundaries become.

DDD is often presented as a modeling discipline. In this backend, it is also a control system for AI-assisted software engineering.

The agent may write code. The architecture decides where that code is allowed to live.