As a Senior Software Engineer in banking, I treat code governance as a non-negotiable requirement, dictated by strict quality and compliance standards.
When I started integrating AI-assisted development tools into the team's workflow, the benefits were obvious: speed, contextual suggestions, boilerplate generation. But the critical issue wasn't the quality of the generated code — it was its consistency with project standards.
Without a structured context, AI tends to produce technically valid but architecturally inconsistent solutions — diverging patterns, poorly distributed responsibilities, ignored conventions.
The challenge wasn't generation, but governance: how to confine the AI within our standards instead of suffering its operational anarchy.
The starting idea: best practices shouldn't be delegated to the AI, but defined at the architectural level. From there, I structured the project so that constraints are intrinsic, ensuring compliance with standards before the AI writes the first line of code.
LLMs operate on statistical patterns derived from millions of repositories. Most open-source code doesn't follow enterprise standards. The result is that generated code tends to:
- mix responsibilities that the architecture keeps separate,
- follow generic open-source conventions instead of project standards,
- duplicate state and utilities that already exist elsewhere.
The code compiles, tests pass, but technical debt accumulates — debt that surfaces months later, when you scale or start new development.
Instead of correcting generated code after the fact, we built a system where AI operates within predefined rails: enforced architecture, typed validation, generators for boilerplate, catalogued errors, domain-specialized agents.
In practice, a playground for agents — with well-defined rules.
The approach was iterative: over the course of a few weeks, each constraint was refined based on problematic patterns that emerged in reviews.
In a React application without explicit constraints, it's common for the AI to produce monolithic components that fetch data, manage local state, validate input, and render UI. That's not a model bug: it's the most probable output given the repositories it was trained on.
The app was split into three layers with unidirectional dependencies:
Service Layer (src/service/) — Business logic, API calls, queries and mutations. No React component imports.
Pages Layer (src/pages/) — UI components and orchestration logic. Can import from Service Layer, never the reverse.
Store Layer (src/store/) — Pure UI state: modals, drawers, temporary selections. Server state lives exclusively in the data fetching library cache, with no replicas in the local store.
With this structure, when AI needs to add a feature, it knows exactly where to place each piece.
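How the "never the reverse" rule gets enforced isn't shown here; a minimal sketch, assuming ESLint's no-restricted-imports rule in a flat config (file name and globs are illustrative), could look like this:

```ts
// eslint.config.ts: hypothetical excerpt, file name and globs are illustrative
export default [
  {
    // Service layer never imports from pages or store
    files: ["src/service/**/*.ts"],
    rules: {
      "no-restricted-imports": [
        "error",
        { patterns: ["**/pages/**", "**/store/**"] },
      ],
    },
  },
  {
    // Store layer holds pure UI state and never reaches into the service layer
    files: ["src/store/**/*.ts"],
    rules: {
      "no-restricted-imports": ["error", { patterns: ["**/service/**"] }],
    },
  },
]
```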
Every endpoint has a validation schema. Types are always inferred from the schema, never defined manually.
```ts
// service/screens/products/constants/schemas.ts
export const ProductSchema = schema.object({
  id: schema.string().uuid(),
  name: schema.string().min(1),
  price: schema.number().positive(),
})

export type Product = InferType<typeof ProductSchema>
```

Every query validates the response before returning it:
```ts
// service/screens/products/queries/queryProductList.ts
export const productListQueryConfig = defineQuery({
  cacheKey: PRODUCTS_CACHE_KEYS.LIST(),
  queryFn: async () => {
    const response = await httpClient.get(PRODUCTS_URLS.list)
    return schema.array(ProductSchema).parse(response.data)
  },
})
```

The AI can't invent fields that don't exist in the schema: made-up properties fail type-checking, a schema change that isn't propagated through the code breaks the build, and an API response that drifts from the schema fails validation at the boundary instead of leaking into the UI.
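Components consume the query through a thin hook wrapper (the agent checklist later in the article requires one per query). A minimal sketch, assuming a React Query-style useQuery and that defineQuery simply returns the cacheKey and queryFn it was given:

```ts
// service/screens/products/hooks/useProductList.ts: hypothetical wrapper
import { useQuery } from "@tanstack/react-query"

import { productListQueryConfig } from "../queries/queryProductList"

// Components call this hook and never touch the HTTP client directly.
export function useProductList() {
  return useQuery({
    queryKey: productListQueryConfig.cacheKey,
    queryFn: productListQueryConfig.queryFn,
  })
}
```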
For error handling, the basic code-plus-message approach doesn't scale. We built a catalog where each error has a schema defining its optional payload, a severity, and a mapping to an i18n key.
```ts
const errorCatalog = {
  VALIDATION_FAILED: {
    shape: schema.object({
      missingFields: schema.array(schema.string()).optional(),
    }),
    severity: "error",
  },
  RATE_LIMITED: {
    shape: schema.object({ retryAfter: schema.number() }),
    severity: "warning",
  },
}
```

The catalog type enforces that every error code maps to an existing translation key — new errors without a translation generate a TypeScript compile-time error.
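The mapping check itself isn't shown above; one way to get that compile-time guarantee, assuming a flat map of translation keys, is TypeScript's satisfies operator:

```ts
// Hypothetical sketch: key names are illustrative
type ErrorCode = keyof typeof errorCatalog

export const errorTranslationKeys = {
  VALIDATION_FAILED: "errors.validation_failed",
  RATE_LIMITED: "errors.rate_limited",
  // a catalog entry without a matching translation key fails to compile
} satisfies Record<ErrorCode, string>
```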
Cache keys follow the same principle: typed factories, not free strings.
```ts
const PRODUCTS_LIST_QUERY = () => ["products", "list"] as const

export const PRODUCTS_CACHE_KEYS = { LIST: PRODUCTS_LIST_QUERY }
```

Creating a new feature requires dozens of files. Without automation, every developer — or agent — might interpret the structure differently, or waste tokens generating boilerplate.
```bash
npm run gen:page     # new page with complete structure
npm run gen:feature  # feature with service layer
npm run gen:service  # service layer only
```

Generated files contain correct import aliases and typed placeholders that cause TypeScript errors until replaced with the real implementation — impossible to forget anything.
The workflow changes: instead of asking AI "create a dashboard feature", you run the generator. The AI starts from a correct scaffold, not templates to interpret.
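The generator tooling itself is interchangeable; as a sketch, gen:feature could be wired up with something like Plop (the tool choice, template paths, and naming helpers here are illustrative):

```ts
// plopfile.ts: hypothetical sketch of the gen:feature generator
import type { NodePlopAPI } from "plop"

export default function (plop: NodePlopAPI) {
  plop.setGenerator("feature", {
    description: "Feature with service layer",
    prompts: [{ type: "input", name: "name", message: "Feature name?" }],
    actions: [
      {
        type: "add",
        path: "src/service/screens/{{camelCase name}}/constants/schemas.ts",
        templateFile: "plop-templates/schemas.ts.hbs",
      },
      {
        type: "add",
        path: "src/service/screens/{{camelCase name}}/queries/query{{pascalCase name}}List.ts",
        templateFile: "plop-templates/queryList.ts.hbs",
      },
    ],
  })
}
```

In a setup like this, the npm scripts above would simply alias the corresponding generators.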
Agents and skills were separated by domain: agents define behavior and decision rules, skills codify operational procedures and implementation patterns.
Each agent contains explicit anti-patterns with explanation, reference patterns, and a post-implementation checklist.
```markdown
## Anti-patterns
❌ useState for errors → use query.error
❌ Manual types → use InferType<>
❌ Inline queries → use defineQuery()

## Checklist
- [ ] Validation schema defined
- [ ] Cache key factory updated
- [ ] Hook wrapper created
- [ ] Error derived from query state
```

Pre-commit hooks. Linting isn't enough if it's not automatic. I configured pre-commit hooks that run formatting and linting automatically on every staged file. Non-compliant code doesn't enter the repository — regardless of who wrote it, human or AI.
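The hook tooling isn't specified above; a common setup, sketched here with husky and lint-staged (commands and globs are illustrative), runs the checks on staged files only:

```js
// lint-staged.config.mjs: hypothetical config, invoked by a .husky/pre-commit
// hook that runs `npx lint-staged` on every commit
export default {
  "*.{ts,tsx}": ["eslint --fix", "prettier --write"],
  "*.{json,md}": ["prettier --write"],
}
```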
Instead of separate documentation that becomes obsolete, the project has a contract file containing architecture, conventions, import aliases, commands, examples. When the AI starts working, it reads this file and operates according to the defined rules.
It's a living document: when the AI produces inconsistent code, I identify the missing pattern and add it to the contract.
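For a sense of what that looks like, here is a condensed, hypothetical excerpt (the real file is longer, and the file name and alias shown here are illustrative):

```markdown
<!-- Hypothetical condensed excerpt: the real file name, aliases and length differ -->
## Architecture
- src/service/: business logic, API calls, queries and mutations; never imports React components
- src/pages/: UI and orchestration; may import from the service layer, never the reverse
- src/store/: pure UI state only; server state lives in the query cache

## Import aliases
- @service/* maps to src/service/* (relative ../../ paths across layers are forbidden)

## Commands
- npm run gen:feature: scaffold a new feature with service layer

## Error handling
- Derive errors from query state; never mirror them in useState
```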
Hallucinations cluster into three categories.
Patterns from training data. The AI suggests structures it has seen elsewhere — utils/ folders inside every feature, direct fetch() in components, manual types instead of inferring from schema. The solution isn't repeating rules in prompts, but making them non-bypassable.
```markdown
❌ DO NOT create feature-specific utils folders
✅ Shared utilities go in service/shared/utilities/
```
Imports and aliases. The AI uses relative paths instead of configured aliases when these aren't visible in context. Documenting all aliases with examples in the contract resolves the issue stably.
Duplicated state. Recurring pattern: useState to manage errors from mutations or queries.
```tsx
// ❌ Problematic pattern — duplicated state, possible misalignment
const [error, setError] = useState(null);
const { mutate } = useMutation({
  onError: (err) => setError(err)
});

// ✅ Correct pattern — derive, don't duplicate
const { mutate, error, isError } = useMutation({...});

{isError && <ErrorBanner message={error.message} />}
```

The AI produces this because it's the most common pattern in training data. Showing both examples in the contract eliminates it systematically.
Code reviews went back to focusing on business logic. Comments related to inconsistent patterns or misplaced files dropped drastically — from a recurring problem to a rare exception.
New developers have an operational guide from day one. The onboarding time on project structure compressed significantly.
Duplicate utilities disappeared. Schema validation on every API response catches data misalignments before they reach the UI.
This approach has a cost. Building generators, configuring agents, keeping the contract updated takes time — time not spent on features. Pre-commit hooks occasionally slow down commits. The contract risks becoming too long if not periodically reviewed.
The return is in medium-term maintainability: less technical debt, faster reviews, quicker onboarding. But it's an upfront investment that needs to be justified by project scale and expected lifespan.
Define the architecture and make it non-bypassable. The chosen pattern doesn't matter; what matters is that it's documented, enforced via linting and hooks, and that every file has one unambiguous place to live.
Every external data point passes through a schema. API responses, form inputs, environment variables: everything validated at the entry point (a sketch for environment variables follows these takeaways).
A generator for every new feature makes a difference. Even a single one that scaffolds the base structure with the correct patterns reduces variation.
Separate instructions by domain. Separate agents or files for the service layer, UI, and testing let the AI load only the relevant context.
Automate enforcement. Pre-commit hooks, automatic linting, import restrictions: constraints that can be bypassed will be bypassed.
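The one entry point not shown earlier is environment variables; a minimal sketch reusing the same schema style (variable names and the process.env source are illustrative):

```ts
// config/env.ts: hypothetical entry-point validation for environment variables
const EnvSchema = schema.object({
  API_BASE_URL: schema.string().min(1),
  AUTH_CLIENT_ID: schema.string().min(1),
})

export type Env = InferType<typeof EnvSchema>

// Fail fast at startup instead of surfacing an undefined value deep in the app.
export const env: Env = EnvSchema.parse(process.env)
```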
The paradox of this approach is that the more structural constraints a project has, the more useful AI becomes. Without having to decide where to put a file or what to name a variable, it focuses on solving the actual problem.
It's not about limiting the tool. It's about giving it a context where its statistical probabilities converge toward our conventions.
The question isn't "should I use AI to write code?", but "how do I structure the project so that generated code is still maintainable six months from now?".
Have you applied similar patterns in your project? What difficulties have you encountered integrating AI into your development workflow?