From Developers to Operators: What Agentic Engineering Requires from Your Organization

Robert von Massow

Jun 1, 2026 · AI & ML · 10 min read

Discussions about agentic software engineering often revolve around what the models can do. That is understandable, but it misses the point.

The real question is: what kind of environment do you need for this to work? And who is responsible when it doesn’t?

Because in practice, things are already breaking in predictable ways. Developers use agents to generate code, tests pass, and changes get shipped. Everything looks fine. Until it isn’t. When something goes wrong, the explanation is often surprisingly consistent:

“The AI generated it.”

At first glance, that sounds like a technical limitation. It isn’t. It’s an organizational one. Responsibility has quietly diffused. In practice, no one really knows who is accountable anymore.

To understand why, it helps to be explicit about what we mean by “the system”. In this context, the system is not just the codebase. It is the entire engineering environment: code, infrastructure, processes, and the people operating them. Agentic software engineering introduces autonomous components into this system: agents that can execute parts of the development process.

We’ve seen similar patterns elsewhere. Autonomous driving is a good example. The interesting question there is not just whether the system works, but who is accountable when it fails. The driver? The manufacturer? The software? Agentic software engineering introduces the same ambiguity, just inside your organization.

What’s changing is not just how software is written. It’s how this system behaves as a whole. Engineers are no longer only implementing logic. They are directing systems that do. In effect, developers are becoming operators of their own small, autonomous “agent teams”. And like any operator, they are accountable for the outcome.

This shift is not just conceptual. It shows up in day-to-day work. Less time writing code, more time reviewing, validating, and guiding. That is a different kind of work. Not necessarily what engineers were trained for, and not always what they expected. Which makes this a management problem as well.

If organizations expect higher delivery speed through agents, they also have to accept that a significant part of engineering time moves into oversight and decision-making. Not away from it.

The problem is that most engineering environments are not designed for reliable machine execution. And most organizations are not designed to clearly assign responsibility when that execution is delegated.

To make this work, you need three things in place:

Systems that can be reliably executed by machines
Clear ownership and accountability
Engineers who understand that responsibility cannot be delegated to the agent

Most teams focus on the first. And underestimate the other two.

Machine-Operable Systems: Can an Agent Actually Work in Your Environment?

Before getting into responsibility, there is a more basic question: can an agent even operate effectively in your environment?

Most engineering systems today are built for humans. They rely on experience, conventions, and a fair amount of implicit knowledge. That works because humans fill in the gaps. Agents don’t. They don’t infer intent or resolve ambiguity. They work with whatever signals you give them, nothing more. Which means ambiguity is not just inconvenient, it’s a failure mode.

In a lot of organizations, important parts of the development process are implicit. Setup steps are never written down, tests fail “sometimes” but are still considered fine, services behave slightly differently depending on where you run them, and dependencies only make sense once you’ve been around for a while. A human developer can deal with that. An agent can’t. It will just proceed with incomplete information and produce something that looks plausible.

Agents don’t fail loudly. They fail convincingly.


For agentic workflows to work at all, your environment needs to be fully automatable and reproducible. One-command setup, deterministic builds, deterministic test runs, no hidden local configuration, no “this only works on my machine”. None of this is new, but it becomes non-negotiable. If your system cannot be executed reliably by a machine, it will not be executed reliably by an agent.
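A reproducible build can be checked mechanically instead of assumed: run it twice and compare what comes out. The following is a minimal Python sketch, assuming your build can be wrapped as a callable that returns its artifacts as bytes. `digest`, `is_reproducible`, `clean_build`, and `stamped_build` are hypothetical names for illustration, not part of any build tool.

```python
import hashlib
import itertools
from typing import Callable, Dict

# Hypothetical: a "build" is any zero-argument callable returning
# artifact name -> artifact bytes. In practice it would wrap your real build command.
Build = Callable[[], Dict[str, bytes]]


def digest(artifacts: Dict[str, bytes]) -> str:
    """Hash all artifacts in sorted name order so dict ordering cannot matter."""
    h = hashlib.sha256()
    for name in sorted(artifacts):
        # Name, a separator, then a fixed-length content hash keeps entries unambiguous.
        h.update(name.encode("utf-8") + b"\0")
        h.update(hashlib.sha256(artifacts[name]).digest())
    return h.hexdigest()


def is_reproducible(build: Build, runs: int = 2) -> bool:
    """Run the build several times; reproducible means every run yields the same digest."""
    return len({digest(build()) for _ in range(runs)}) == 1


def clean_build() -> Dict[str, bytes]:
    return {"app.bin": b"compiled output", "manifest.txt": b"v1"}


_build_counter = itertools.count()


def stamped_build() -> Dict[str, bytes]:
    # Embeds a per-run value, like a timestamp or build number baked into the artifact.
    return {"app.bin": b"compiled output", "BUILD_INFO": f"build {next(_build_counter)}".encode()}


print(is_reproducible(clean_build))    # True
print(is_reproducible(stamped_build))  # False
```

In a real pipeline the callable would invoke the actual build and read the output directory. The point is the comparison: if two clean runs disagree, an agent is working against a moving target.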

Testing also changes its role. In traditional development, tests are a safety net. With agents, they define what “correct” means. Agents optimize against whatever signals you give them, whether those signals are good or not. If your tests are flaky, incomplete, or misleading, the agent will optimize against exactly that. A human might question a test. An agent won’t.

Tests are no longer just verification. They are the control system.
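Flaky tests can be surfaced before an agent starts optimizing against them, for example by running each test several times and treating mixed outcomes as their own category. A minimal sketch, assuming tests are zero-argument functions that raise `AssertionError` on failure. `classify` and the `check_*` functions are hypothetical; a real setup would more likely use the rerun support of your existing test runner, where available.

```python
import itertools
from typing import Callable, Dict

CheckFn = Callable[[], None]  # passes if it returns, fails on AssertionError


def passes(check: CheckFn) -> bool:
    try:
        check()
        return True
    except AssertionError:
        return False


def classify(checks: Dict[str, CheckFn], runs: int = 5) -> Dict[str, str]:
    """Run each check `runs` times and label it pass, fail, or flaky."""
    labels = {}
    for name, check in checks.items():
        outcomes = {passes(check) for _ in range(runs)}
        if outcomes == {True}:
            labels[name] = "pass"
        elif outcomes == {False}:
            labels[name] = "fail"
        else:
            labels[name] = "flaky"  # mixed outcomes: not a usable signal for an agent
    return labels


def check_stable():
    assert 1 + 1 == 2


def check_broken():
    assert sum([1, 2]) == 4


_flip = itertools.cycle([True, False])


def check_flaky():
    # Simulates a test that depends on timing or shared state.
    assert next(_flip)


print(classify({"stable": check_stable, "broken": check_broken, "flaky": check_flaky}))
# {'stable': 'pass', 'broken': 'fail', 'flaky': 'flaky'}
```

A test labeled flaky is not a signal an agent can learn from. Fixing or quarantining it is a precondition for agentic work, not a cleanup task for later.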

The same applies to architecture. Agents do best in systems that are easy to reason about, with clear boundaries, explicit interfaces, and well-defined responsibilities. They struggle in systems with hidden side effects, tight coupling, and undocumented assumptions. This is not about monolith vs. microservices. It’s about whether the system is understandable. If a machine cannot understand the boundaries, it cannot operate safely within them.
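The difference between hidden side effects and an explicit interface is easiest to see side by side. In this hypothetical Python example, `convert_implicit` reads an environment variable and writes to a module-level cache, neither of which its signature mentions, while `convert` receives its only external dependency through a `RateSource` protocol. All names are illustrative.

```python
import os
from dataclasses import dataclass
from typing import Dict, Protocol

_rate_cache: Dict[str, float] = {}


def convert_implicit(amount: float, currency: str) -> float:
    # Hidden input (environment variable) and hidden side effect (module-level cache):
    # nothing in the signature tells a human or an agent about either.
    rate = float(os.environ[f"RATE_{currency}"])
    _rate_cache[currency] = rate
    return round(amount * rate, 2)


class RateSource(Protocol):
    """Explicit boundary: the only way conversion depends on the outside world."""

    def rate(self, currency: str) -> float: ...


@dataclass(frozen=True)
class FixedRates:
    """Deterministic implementation, usable in tests and agent runs alike."""

    rates: Dict[str, float]

    def rate(self, currency: str) -> float:
        return self.rates[currency]


def convert(amount: float, currency: str, source: RateSource) -> float:
    # Every input is a parameter; the function has no side effects of its own.
    return round(amount * source.rate(currency), 2)


print(convert(100.0, "EUR", FixedRates({"EUR": 1.1})))  # 110.0
```

Anyone, human or agent, can read the signature of `convert` and know everything it depends on. The implicit version only works if you already know which environment variables to set.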

Most teams have decent observability at the system level. Logs, metrics, traces. What’s usually missing is visibility into decisions. What was the goal? What did the agent try? Why this solution? Without that, you’re debugging results without understanding how they were produced.

You’re not just debugging systems anymore. You’re debugging decisions.
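Visibility into decisions does not have to start with new infrastructure. It can start as a structured record written alongside each agent run. The schema below is an assumption, not a standard: a hypothetical `DecisionRecord` that captures the goal, what was tried, what was chosen and why, and who accepted it, serialized as one JSON log line.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class DecisionRecord:
    """Hypothetical schema: what an agent run should leave behind besides its diff."""

    goal: str                                          # what was asked
    attempts: List[str] = field(default_factory=list)  # what the agent tried
    chosen: Optional[str] = None                       # which attempt was kept
    rationale: Optional[str] = None                    # why this one
    accepted_by: Optional[str] = None                  # the human who accepted it

    def to_log_line(self) -> str:
        # One JSON object per line, so it can sit next to existing logs and traces.
        return json.dumps(asdict(self), sort_keys=True)


record = DecisionRecord(goal="Fix intermittent timeout in order sync")
record.attempts.append("Retry failed requests up to 3 times")
record.attempts.append("Raise client timeout from 5s to 30s")
record.chosen = record.attempts[0]
record.rationale = "Failures cluster at upstream restarts; a longer timeout only hides them."
record.accepted_by = "alice"
print(record.to_log_line())
```

The fields mirror the questions above. Whatever the exact format, the record has to be written when the decision is made; reconstructing it after something breaks is much harder.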

At this point, the shift becomes clear. You are no longer building systems only for humans. You are building systems that need to work for both humans and machines. A lot of teams only realize this once they start using agents, and then things start to break in ways that are hard to explain. It often looks like a model problem. It isn’t.

Even if your system is fully machine-operable, that only gets you so far. It makes agentic workflows possible, nothing more. The harder question comes next.

Organizational Accountability: Who Owns the Outcome?

A machine-operable system makes agentic workflows possible. It does not make them safe.

So the question shifts. It’s no longer about whether it works. It’s about who owns the outcome.

This is where things usually start to get fuzzy. Agentic systems create the impression that execution and responsibility move together. In practice, they don’t.

It usually plays out the same way. An engineer uses an agent to generate code, tests pass, the change is shipped, something breaks. And the explanation is:

“The AI generated it.”

That sounds reasonable at first, but it isn’t an explanation. It’s a sign that ownership was never clearly defined. An agent doesn’t act independently. It executes within a context that someone defined. Prompts, constraints, validation steps: all of that is human input.

The agent executed. Someone decided to trust the result.

That’s the point where responsibility sits.

The important question is not who wrote the code. It’s who decided it was good enough. Responsibility lives at these decision points: defining the task, accepting the output, deciding to ship. If those are not clearly owned, responsibility diffuses. Engineers assume the system is good enough, managers assume the process is under control, and in the end no one really owns the decision.

Externally, the organization is still accountable. That doesn’t change. Internally, it becomes unclear who is accountable for what.
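The decision points named earlier (defining the task, accepting the output, deciding to ship) can be given explicit owners and checked before a change ships. A minimal sketch with hypothetical names throughout: `DECISION_POINTS`, `AGENT_ACCOUNTS`, `Change`, and `can_ship` are illustrations, not an existing tool’s API. An agent account is deliberately not accepted as an owner.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical labels for the three decision points.
DECISION_POINTS = ("define_task", "accept_output", "approve_ship")

# Hypothetical agent account names: they can execute, but cannot own a decision.
AGENT_ACCOUNTS = {"coding-agent", "review-bot"}


@dataclass
class Change:
    change_id: str
    owners: Dict[str, str] = field(default_factory=dict)  # decision point -> person


def unowned_decisions(change: Change) -> List[str]:
    """Decision points with no owner, or whose 'owner' is an agent account."""
    return [
        point
        for point in DECISION_POINTS
        if not change.owners.get(point) or change.owners[point] in AGENT_ACCOUNTS
    ]


def can_ship(change: Change) -> bool:
    return not unowned_decisions(change)


change = Change(
    "change-42",
    {"define_task": "alice", "accept_output": "coding-agent", "approve_ship": "bob"},
)
print(unowned_decisions(change))  # ['accept_output']
print(can_ship(change))           # False
```

The check itself is trivial. What matters is that it forces someone to answer, for every change, who decided it was good enough.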

There is another layer to this that is easy to miss. Specifications are shifting.

Traditionally, behavior is defined through explicit artifacts. Requirements, interfaces, code. With agents, a significant part of that moves into prompts. When an engineer prompts an agent, they are not just requesting code. They are defining expected behavior. In effect, they are writing a specification.

The problem is that this specification is rarely recorded. It lives in chat sessions. It is not versioned, not reviewed, not shared. Two engineers working on the same system can operate with different assumptions, use different prompts, and end up implementing slightly different behavior. Sometimes even conflicting behavior.

The system still compiles. Tests may still pass. But the underlying definition of behavior is no longer consistent.

The specification is no longer in one place. It’s spread across conversations.


That makes accountability harder, not easier.
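If prompts act as specifications, one option is to treat the behavior-defining part of them like any other spec: write it down, keep it under version control, and check it for drift. This sketch assumes a hypothetical `PromptSpec` record stored in the repository. The fingerprint covers component, intent, and constraints but not the author, so two engineers asking for the same behavior produce the same fingerprint, and two asking for different behavior do not.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class PromptSpec:
    """Hypothetical record of the behavior a prompt asks for, kept in version control."""

    component: str
    intent: str
    constraints: Tuple[str, ...]
    author: str

    def fingerprint(self) -> str:
        # Hash only what defines behavior (not who wrote it); sort constraints so order doesn't matter.
        canonical = json.dumps(
            {"component": self.component, "intent": self.intent,
             "constraints": sorted(self.constraints)},
            sort_keys=True,
        )
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def diverging_components(specs: Iterable[PromptSpec]) -> List[str]:
    """Components for which more than one distinct behavior definition is in use."""
    seen: Dict[str, Set[str]] = {}
    for spec in specs:
        seen.setdefault(spec.component, set()).add(spec.fingerprint())
    return sorted(name for name, prints in seen.items() if len(prints) > 1)


specs = [
    PromptSpec("refunds", "Refund the full order amount", ("only within 30 days",), "alice"),
    PromptSpec("refunds", "Refund the full order amount", ("only within 14 days",), "bob"),
    PromptSpec("invoices", "Render invoice as PDF", ("A4 paper",), "alice"),
    PromptSpec("invoices", "Render invoice as PDF", ("A4 paper",), "carol"),
]
print(diverging_components(specs))  # ['refunds']
```

Here two engineers asked for refunds with different time limits. Both changes could compile and pass their own tests; the divergence only becomes visible once the specifications are recorded somewhere they can be compared.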

Agents increase speed. That part is obvious. What’s less obvious is that they also increase the number of implicit decisions. To maintain control, those decisions need to become visible. Intent needs to be traceable. Ownership needs to be explicit. Decisions need to be reviewable, not just outcomes.

Without that, you lose the ability to reason about your own system.

This is where management comes in. If you push for more speed, encourage the use of agents, but don’t redefine ownership, you end up with a system where execution gets faster and responsibility becomes less clear. At the same time, the work itself changes. Less writing, more reviewing, more deciding. That is oversight work, whether it is called that or not.

If you want faster delivery, you have to accept the cost of validation.

Ignoring that trade-off doesn’t remove it. It just hides it.

The principle itself is simple.

You can delegate execution to an agent. You cannot delegate responsibility.

Teams that make this explicit tend to make progress. The others move fast for a while, until something breaks and no one can really explain why.

And even if responsibility is clearly defined, one question remains. Are your engineers actually prepared to work this way?

From Developers to Operators

This is where the shift becomes visible. Not in architecture diagrams, but in daily work.

Engineers used to spend most of their time writing code. With agents, a growing part of that is delegated. Code is generated, tests are suggested, changes are proposed. The job doesn’t disappear, but it changes shape.

More time goes into defining tasks, guiding execution, reviewing output, and deciding what is acceptable.

You’re no longer just writing code. You’re running a system that writes code.

That sounds abstract, but it shows up in very concrete ways. Less time in the editor, more time evaluating results. Less implementation, more judgment.

Working with agents feels less like using a tool and more like working with a team. A team that is fast, productive, and occasionally wrong in ways that are hard to spot. Like any junior team, it needs clear instructions, boundaries, and feedback. The quality of the outcome depends less on the agent itself, and more on how it is directed.
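Clear instructions and boundaries can be written down as part of the task instead of being implied in a chat. A minimal sketch, assuming a hypothetical `TaskBrief` with a goal, glob patterns for the paths the agent may modify, and acceptance checks. The hypothetical `out_of_bounds` function flags any changed file outside those patterns, so a reviewer sees it before accepting the result.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class TaskBrief:
    """Hypothetical task definition handed to an agent: intent plus explicit boundaries."""

    goal: str
    allowed_paths: Tuple[str, ...]  # glob patterns the agent may modify
    acceptance: Tuple[str, ...]     # checks a reviewer expects to pass


def out_of_bounds(brief: TaskBrief, changed_files: Sequence[str]) -> List[str]:
    """Files the agent touched that fall outside every allowed pattern."""
    # fnmatchcase is case-sensitive on all platforms; note that '*' also matches '/'.
    return [
        path
        for path in changed_files
        if not any(fnmatchcase(path, pattern) for pattern in brief.allowed_paths)
    ]


brief = TaskBrief(
    goal="Add partial refunds to the billing service",
    allowed_paths=("services/billing/*", "tests/billing/*"),
    acceptance=("billing unit tests pass", "no schema migrations"),
)
changed = ["services/billing/refund.py", "tests/billing/test_refund.py", "infra/prod.tf"]
print(out_of_bounds(brief, changed))  # ['infra/prod.tf']
```

The patterns use Python’s `fnmatchcase`, where `*` also matches path separators. A stricter setup might use a glob matcher with `**` semantics; the idea stays the same.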

One of the more subtle shifts is how engineers relate to the output. It’s easy to treat it as something external. Something the AI produced. But that framing breaks down quickly.

You are not responsible for the code alone. You are responsible for the system that produced it.

That includes how the task was framed, what signals were given, and why the result was accepted.

This also changes what the job feels like. Less time writing, more time reviewing. Less building, more deciding. That is a different kind of work. It requires stronger abstraction skills, better judgment, and clearer communication of intent. It’s not what many engineers were originally trained for, and not always what they expected.

Organizations need to adapt to that as well. If you expect higher throughput from agents, you have to account for where the time goes. More of it goes into oversight, validation, and decision-making. Not less.

If you ignore that, you create pressure for speed without supporting the work required to maintain quality. The result is predictable: validation gets squeezed, and quality erodes.

To work effectively in this model, engineers need to expand their skill set. Framing problems clearly, defining intent precisely, evaluating outcomes critically. In other words, they need to think more like operators.

Which brings things back to where this started.

Agentic software engineering is often framed as a model problem. It isn’t. It’s a system problem. And an organizational one.

If you want it to work, three things need to be in place:

Systems that can be executed reliably by machines
Clear ownership and accountability
Engineers who are prepared to operate, not just implement

Most teams focus on the first.

The ones that succeed will figure out the other two.

Because in the end, the question is not what agents can do.

It’s whether your organization is ready to work with them.

About the author

Robert von Massow

With over a decade of hands-on AWS experience and certifications spanning Developer to Security Specialty, Robert works as a Cloud Consultant at superluminar. Here he shares stories and insights from his work — from serious AWS challenges to playful experiments and everything in between.