What does software development look like when agents write 100% of the code?

15 Jun, 2026

2026 has been an inflection point for agentic coding. In just two years the capabilities of models and harnesses went from a toy autocomplete to being powerful enough to generate an overwhelming portion of the code for production systems. If capabilities continue improving from here, the entire SDLC will be redefined around agents as first-class participants. We are already seeing experiments in the wild to build these new kinds of "software factories" where code is autonomously written, tested, and deployed end to end by long-running background agents ^{1, 2, 3, 4}.

As best practices emerge, the new meta of software engineering will converge on building systems that can scale high-quality, AI-generated code. Below are some of the broad trends that point in this direction, and how they’re shaping the Bastion roadmap.

Decoupling agents from your local machine

Once you've learnt how to use one coding agent, it's only a matter of time until you start spinning up multiple agents to work in parallel. Tools like git worktrees have been especially popular in enabling multiple agents to work simultaneously on the same local codebase. But at some point, you realise that source code isolation is not enough. Coding agents on the same machine can still collide on ports, processes, local state, and other system dependencies. Not to mention the security implications of allowing multiple agents to run untrusted bash commands on your machine.

Additionally, as models improve, these agents are becoming capable of executing tasks over increasingly long durations. That means the work only progresses for as long as your local machine stays online.

At this point it's much more effective to give coding agents their own computers that are decoupled from yours and strongly isolated. This eliminates a whole class of collisions and race conditions between agents operating in parallel, and allows work on long-running tasks to continue even when you're offline.

Micromanaging agents doesn't scale

However, even if you could infinitely scale the number of parallel background agents, you will always end up being the blocker if you have to babysit every single session. Human cognition hasn't changed, and constant context switching is a well-known productivity killer.

Let's say you start a prompt for one agent. While that's loading you start a prompt for another. You do this N times. Like an event loop, you then cycle through the sessions to see which agents are blocked and give them the context they need to continue. This is not scalable, and if you're trying to micromanage agents, you'll burn out eventually.

Real productivity is unlocked when the human work is done upstream of agents: when product vision is clear, architecture is decided, and deliverables have been broken down into sequential and parallel units of work with detailed specs. By the time your coding agents start, they have all the context required to effectively one-shot a task.
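To make "sequential and parallel units of work" concrete, here is a minimal sketch of turning a dependency map into waves of tasks that can be handed to agents in parallel. The unit names in SPEC and the plan_waves helper are hypothetical, made up for illustration; the only real API used is graphlib from the Python standard library.

```python
from graphlib import TopologicalSorter

# Hypothetical work breakdown: each unit maps to the units it depends on.
SPEC = {
    "schema": set(),
    "api": {"schema"},
    "ui": {"api"},
    "emails": {"api"},
    "e2e-tests": {"ui", "emails"},
}


def plan_waves(deps):
    """Group units into waves; units in the same wave have no
    unfinished dependencies and could run in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the spec is circular
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)  # mark the whole wave finished, unlocking the next
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(plan_waves(SPEC), start=1):
        print(f"wave {number}: {', '.join(wave)}")
```

In this toy breakdown, "ui" and "emails" land in the same wave, so two agents could pick them up at once, while "e2e-tests" waits for both.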

Closing the loop

But it's still hard not to feel like you need to watch over every agent session when you think about all the times you've had to take multiple turns because the agent created a bug or missed a major edge case. However, the difference between an agent that needs to be constantly steered and an agent that can one-shot a PR is whether it is operating in an open or a closed loop.

An agent that is forced to check its work against a comprehensive suite of unit, integration, and end-to-end tests produces significantly better results over multiple iterations.
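As a rough illustration of what "closed loop" means here, the sketch below feeds test failures back into the next attempt until the checks pass or an iteration budget runs out. Everything in it is an assumption made for the example: toy_agent is a stand-in for a real coding agent, run_tests is a stand-in for a real test suite, and closed_loop is not taken from any particular harness.

```python
from typing import Callable, List, Optional, Tuple

Candidate = Callable[[int], int]                # the "code" the agent produces
Agent = Callable[[str, List[str]], Candidate]   # (task, failures) -> candidate
Check = Callable[[Candidate], List[str]]        # candidate -> failure messages


def closed_loop(task: str, agent: Agent, check: Check,
                max_iters: int = 5) -> Tuple[Optional[Candidate], int]:
    """Ask the agent for a candidate, verify it, and feed failures back."""
    failures: List[str] = []
    for attempt in range(1, max_iters + 1):
        candidate = agent(task, failures)
        failures = check(candidate)
        if not failures:
            return candidate, attempt  # verified: safe to hand to a human
    return None, max_iters  # budget exhausted: escalate instead of looping


def run_tests(fn: Candidate) -> List[str]:
    """Stand-in for a real test suite: returns one message per failing case."""
    cases = [(3, 3), (-3, 3), (0, 0)]
    return [f"abs_value({x}) returned {fn(x)}, expected {want}"
            for x, want in cases if fn(x) != want]


def toy_agent(task: str, failures: List[str]) -> Candidate:
    """Hypothetical agent: the first draft is buggy, and it only
    produces the fix once it is shown failing tests."""
    if not failures:
        return lambda x: x
    return lambda x: -x if x < 0 else x


if __name__ == "__main__":
    result, attempts = closed_loop("implement abs_value", toy_agent, run_tests)
    print("verified" if result else "gave up", "after", attempts, "attempt(s)")
```

With max_iters=1 the same agent behaves like an open loop: its first draft is returned unverified or rejected, and a human has to supply the missing feedback by hand.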

Testing has always been one of those things developers know they should do, but often avoid until the regressions become more painful than writing the tests. In agentic engineering, this trade-off doesn't exist. Agents are great at following red/green TDD. And over time, the tests they build provide a core guardrail that maintains your confidence in, and the velocity of, the agent's output.

However, tests are only one part of closing the loop. A whole suite of tools is starting to emerge that enables agents to act independently. Need to verify a new frontend? Use agent-browser. Need to test email notifications? Use agentmail. Software for agents is becoming a highly valuable category for closing the loop.

What happens to the craft of building software?

It is understandable to feel as if the craft of software engineering is being taken away. But as agents compress the implementation time from weeks to hours, the scarce skill shifts toward deciding what should be built, how it should behave, and how its correctness should be verified. As your proficiency with these tools improves, you start to spend a bigger portion of your time solving the harder problems. This is where the real craft lies.

However, there is an argument to be made for the engineers who care a lot about how code looks. There is no doubt a sense of satisfaction from writing clean code that is easy to parse. For these folks, it's a lot harder to remove themselves from the implementation. There is a constant urge to nitpick every function, line, and variable name these agents produce.

But as more of the codebase is generated by agents, line-by-line nitpicking becomes another human bottleneck. Evaluation of code quality must shift from taste at the implementation layer to constraints at the system layer. Your confidence should come from deterministic processes: test coverage, lint rules, type checks, runtime observability, and harnesses that steer the agent’s output.
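One way to picture "constraints at the system layer" is a merge gate that runs a fixed set of commands and only trusts exit codes. The sketch below is an assumption-laden example, not a real pipeline: the commands in DEMO_GATES are stand-ins that invoke the Python interpreter directly, and in practice you would substitute your project's actual test, lint, and type-check commands.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class GateResult:
    name: str
    passed: bool
    output: str


def run_gate(name: str, command: List[str]) -> GateResult:
    """Run one check; a zero exit code is the only thing that counts as a pass."""
    proc = subprocess.run(command, capture_output=True, text=True)
    return GateResult(name, proc.returncode == 0,
                      (proc.stdout + proc.stderr).strip())


def run_gates(gates: Dict[str, List[str]]) -> List[GateResult]:
    return [run_gate(name, command) for name, command in gates.items()]


def mergeable(results: List[GateResult]) -> bool:
    """The change is acceptable only if every gate passed."""
    return all(result.passed for result in results)


# Stand-in commands (assumptions for the demo): one gate passes, one fails.
DEMO_GATES = {
    "tests": [sys.executable, "-c", "assert 1 + 1 == 2"],
    "lint": [sys.executable, "-c",
             "import sys; sys.exit('unused import on line 3')"],
}

if __name__ == "__main__":
    results = run_gates(DEMO_GATES)
    for result in results:
        status = "pass" if result.passed else "FAIL"
        print(f"{result.name}: {status} {result.output}")
    print("mergeable:", mergeable(results))
```

The captured output of a failing gate is also exactly the kind of feedback an agent can be given on its next attempt, so the same gates serve both the human's confidence and the agent's loop.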

Over time, these systems you build to increase your confidence and velocity in AI-generated code become the software factory. Sustainable productivity is achieved when you remove yourself from the individual sessions and start operating at a layer above.

Humans build the factories that produce the code

As you get better at building these systems, your focus shifts from reviewing PRs to reviewing specs, context, and verification processes. Assuming those are done right, the PR review feels more like a final skim before merging to production. And if something doesn't seem right, it's now arguably a lot easier to nuke the PR, tune the inputs, and let the agent run it back.

If we project this trend forward, we are going to see a whole new ecosystem of developer tools and platforms to ensure agents are generating code under the correct conditions. A large portion of software engineering will be spec reviews, context plumbing, and closing the loop that allows agents to autonomously verify their work based on comprehensive test suites, tooling, and monitors.

The codebase may become the domain of agents, but the factory remains the domain of humans. The engineer’s job moves up a layer.