From Zero to Agentic: How I Took a Native Engineering Team Up the AI Fluency Curve
Max Clarke
When I started leading the native team at Leonardo.Ai, AI adoption looked like this: a few people asking ChatGPT questions, a few more using Copilot autocomplete, and nobody who'd really clocked what had changed since Sonnet 3.5.
That's not a criticism. Surface-level adoption was the norm everywhere. But I'd spent the previous chapter of my career building an AI-first software development tool, so I had a different vantage point. I could see where things were heading. I just didn't think it was my job to force anyone there.
The Philosophy: Create Conditions, Don't Mandate Tools
My approach to tooling adoption for new paradigms is simple: let engineers use what they like. I'm not in the trenches writing code with them every day. Far be it from me to tell someone their workflow is wrong when they're shipping fine.
But "let people figure it out" isn't a leadership strategy either. It's an abdication of one. The EM's job isn't to mandate tools - it's to read the moment, create the right conditions, and remove friction so the team can move itself.
For the first few months, that meant watching. Paying attention to what people were using, what they were struggling with, and whether anyone had noticed the gap between where AI tooling was and where it was about to go.
Most hadn't. And that was fine - until it wasn't.
Reading the Moment
When Sonnet 4 dropped and the agentic loop gained real traction, the step change became undeniable. This wasn't a better autocomplete. This was a fundamentally different way of working - agents that could reason about your codebase, plan multi-step changes, and execute them with minimal hand-holding.
I raised it with my manager and the other EMs. The consensus was clear: this needed more than a Slack message. We decided to do a company-wide presentation on what had changed and why it mattered, followed by structured training.
The key decision here was who would lead the training. We could have brought in an external group. We could have put together a generic training module. Instead, we found people within the company who were already seeing results - practitioners from each discipline who could show what was actually working, not what theoretically should.
Full support from my manager Karim made this possible. And we managed to get Anthropic to present Claude Code to us as part of it, which didn't hurt.
Peer-Led Beats Top-Down
This turned out to be one of the most important decisions in the whole process.
When an outside consultant tells you "AI will transform your workflow," it's easy to nod along and change nothing. When your colleague - someone who works on the same codebase, deals with the same constraints - shows you what they shipped last week using an agent, it lands differently.
People saw their peers talking about what was working: the specific prompts, the specific workflows, the specific problems they'd solved. It wasn't theoretical. It was "I did this yesterday and it saved me two hours." That kind of credibility can't be manufactured by a training programme.
After the training sessions, I canvassed for feedback and started the AI tools guild - a cross-functional group that meets on a regular cadence to share knowledge. Not a one-off event, but an ongoing conversation. The guild gave people a place to share wins, surface problems, and build on each other's approaches.
The Messy Middle
Here's the part most AI adoption stories skip: the bit where everything is annoying and half-broken.
As more team members started experimenting, the problems came fast:
- AI code review noise. People were running AI review tools that flooded PRs with low-signal suggestions. Reviewers were drowning in automated comments that weren't helpful.
- PR slop. Code that was clearly AI-generated without enough human oversight - structurally fine but missing context, over-engineered, or subtly wrong.
- Too much code to review. Agents can generate a lot of code quickly. That's a feature until your reviewer is staring at a 1,500-line diff.
- Useless generated code. Agents confidently producing solutions that didn't account for the codebase's actual patterns and constraints.
Every team hits these walls. The question is whether you solve them for the team or with the team.
We tackled them together in our weekly native engineering meeting. Different people owned different problems. The solutions were pragmatic, not prescriptive:
AI review stays local. We decided AI code review should run locally before a PR is created - the author resolves real issues, and nobody else sees the noise. This was a simple process change that eliminated the biggest source of friction overnight.
More time planning, less time prompting. PR slop was almost always a planning problem. If the prompt was vague, the output was vague. We started investing more upfront in specifying what we wanted - which, it turns out, is just good engineering practice with or without AI.
Shared context, iterated together. We invested heavily in our CLAUDE.md and AGENTS.md files, treating them as living documents that encoded our codebase's patterns and constraints. We created slash commands and shared tooling for common workflows. When an agent kept making the same mistake, someone would update the context files and the whole team benefited.
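To make that concrete, a context-file entry of this kind tends to encode conventions and known failure modes rather than prompts. The excerpt below is a hypothetical sketch - the headings, rules, and the `make record-snapshots` command are invented for illustration, not taken from our actual AGENTS.md:

```markdown
<!-- AGENTS.md (hypothetical excerpt) -->
## Conventions
- UI state lives in view models; never mutate views directly from async callbacks.
- Use the shared networking wrapper for HTTP calls rather than adding new ad-hoc clients.

## Common agent mistakes to avoid
- Snapshot tests are re-recorded via `make record-snapshots`, never by hand-editing the images.
- Prefer extending an existing feature module over creating a new top-level directory.
```

The value compounds because the file is shared: one person's fix for a repeated agent mistake becomes everyone's fix on the next run.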
The team's existing culture of helping each other and finding the best solutions together made all of this work. If you don't have that culture, this is harder. But it's also a reason to build it - AI adoption rewards collaborative teams disproportionately.
Meeting People Where They Are
There was no outright resistance on the team. But there were people who were slower to adopt, and that's worth understanding.
The sceptics weren't the least experienced engineers. They were often the fastest - people who were already highly productive and felt like waiting for an agent was slower than just doing it themselves. And honestly? For some tasks, they were right. The early agentic tools weren't good at everything.
I didn't pressure anyone. Instead, I suggested a zero-risk experiment to each of my reports: fire off a task before heading to lunch. Review the result when you get back. If it's good - great, you just saved some time. If not, no problem, you didn't waste anything. Same thing before going into meetings.
That framing mattered. There's no downside. You're not betting your afternoon on an agent working perfectly. You're running an experiment during time you weren't going to use anyway. The worst case is you delete the output.
Over time, seeing peers push decent code quickly and celebrating wins in our weekly meeting did the rest. Adoption isn't a switch you flip - it's a gradient, and people move along it at their own pace when the conditions are right.
What Agent-First Looks Like Now
Six months on, the team works in a way that would have been unrecognisable at the start.
The baseline is that most engineers use agents for at least prototyping, debugging, code review, and codebase understanding. It goes up from there.
Those of us who are fully agent-first work like this: we give an agent a well-scoped prompt plus the context it needs - MCP connections to the Jira ticket, gh CLI access, shared skills and slash commands. We ask for a plan first, review it, then let it implement. We use the best models available - cost isn't much of a concern compared to engineering salaries, and that's a position held from the top down at Leonardo.Ai.
The biggest workflow shift was git worktrees for parallel agents. Worktrees were completely new to most of us - they let you check out multiple branches simultaneously in separate directories, so you can run agents in parallel without them stepping on each other. We fire off tasks on worktrees - tasks small enough to specify precisely and one-shot. That's a skill in itself: learning to decompose work into agent-sized chunks.
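Mechanically, the setup is just a few git commands. A minimal sketch, with illustrative paths and branch names (the temp directory stands in for wherever your clone lives):

```shell
# One worktree per agent task, so parallel agents never share a checkout.
set -e
base=$(mktemp -d)           # stand-in for your repo's parent directory
git init -q "$base/main"    # stand-in for an existing clone
cd "$base/main"
git -c user.email=dev@example.com -c user.name=Dev \
    commit -q --allow-empty -m "initial commit"

# Each task gets its own directory and its own branch.
git worktree add -q "$base/task-a" -b feature/task-a
git worktree add -q "$base/task-b" -b feature/task-b

# All three checkouts share one object store.
git worktree list
```

When an agent finishes, `git worktree remove <path>` cleans up the checkout while keeping the branch around for the PR.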
Things we'd never have thought to delegate six months ago - a time-consuming rebase, conflict resolution, snapshot test re-recording, pushing and updating the PR description - are now agent tasks. We adopted PR templates so agents know how to create PRs consistently. And we lean heavily into AGENTS.md iteration when new issues pop up; when the agent keeps getting something wrong, the fix is almost always better context, not a better prompt.
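A PR template for this purpose can be as simple as a few headed sections the agent is told to fill in. This is a hypothetical sketch, not our actual template - though the `.github/pull_request_template.md` location is GitHub's standard convention:

```markdown
<!-- .github/pull_request_template.md (hypothetical sketch) -->
## What changed
<!-- One or two sentences; link the Jira ticket. -->

## Why
<!-- The problem or requirement, not a restatement of the diff. -->

## How to test
<!-- Exact steps or commands a reviewer can run. -->

## Risk / rollback
<!-- What could break, and how to revert safely. -->
```

The point isn't the specific sections - it's that a template turns "write a good PR description" from a judgment call into a checklist an agent can reliably satisfy.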
We also built guidelines for using these tools safely - because with great agentic power comes great potential for your agent to post salary data in Slack.
The EM's Playbook
If I had to distil this into a framework, it's four moves:
Watch first. Don't mandate tools before you understand where your team is and what they actually need. Your job is to read the moment, not create it prematurely.
Mobilise through peers. When the moment comes, get practitioners to lead - people who've seen results in your actual codebase, your actual constraints. Credibility is everything.
Build cadence, not events. A guild, a weekly discussion slot, a shared channel - whatever creates ongoing conversation. One-off training fades. Regular knowledge sharing compounds.
Solve problems with the team, not for them. When adoption hits friction (and it will), resist the urge to hand down edicts. Let the team own the solutions. They'll be better solutions, and the team will actually follow them.
The tools will keep changing. The models will keep improving. But the leadership challenge stays the same: how do you create an environment where a team adopts new paradigms willingly, effectively, and sustainably?
You don't do it by mandating a tool. You do it by building the conditions where the team moves itself.
This post is part of a series on building AI-augmented engineering teams. Previously: how I built security guidelines that engineers actually follow. Next up: how we stopped guessing about prompts and models.