Open ChatGPT, Claude, or any of their cousins and you get the same thing: a text box and a blinking cursor. You type, it answers, you read. It is the most familiar interface in software, and for three years it has been the face of the entire AI industry.
It is also a graphical user interface. Nothing more.
The chatbot is to AI what the desktop was to the personal computer: the visible surface you click around in. And just as the Windows desktop was never the actual operating system, the chat window was never the actual intelligence. Something has to schedule the work, hold the memory, route the tools, and keep the parallel tasks from stepping on each other. For most of the chatbot era, almost nothing did. The session ended and everything vanished.
That layer is now being built, and it has a name. The agent operating system.
Why "operating system" is the right word
It is tempting to dismiss this as marketing. Every new category in software eventually gets called an OS by someone trying to sell it. But in this case the analogy is precise, because the problems are the same problems an OS was invented to solve.
In March 2024, a group of researchers published a paper titled AIOS: LLM Agent Operating System. Their argument was simple. Agents built on large language models were each reinventing the same plumbing, and doing it badly. Every agent framework had its own ad hoc way of managing context, storing memory, calling tools, and deciding what to run next. None of them shared resources. The result was inefficient and, when agents had unrestricted access to model and tool resources, sometimes harmful.
Their fix was to lift those concerns out of the agents and into a kernel. The AIOS kernel sits as an abstraction layer above the regular OS kernel and provides a fixed set of services to every agent running on it:
- An LLM core that runs the model
- A scheduler that orders and dispatches work
- A context manager that handles the model's working window
- A memory manager for short-term state
- A storage manager for persistence
- A tool manager for external calls
Agents talk to the kernel through an SDK, and the kernel turns their requests into a chain of scheduled "syscalls" dispatched to the right module. If that vocabulary sounds borrowed, that is the point. The paper reported up to 2.1x faster execution for agents served this way, for the same reason a real OS beats a pile of programs each fighting over the CPU: someone is finally in charge of the resources.
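A toy version of the syscall pattern helps make the idea concrete. This is a minimal sketch under stated assumptions, not the AIOS SDK. `Syscall`, `Kernel`, `register`, `submit` and `run` are hypothetical names, and the single first-in, first-out policy stands in for the paper's much more involved scheduler. The point it shows is structural: agents never call a module directly. They enqueue requests, and one dispatch loop owns every shared resource.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Syscall:
    """A request from one agent to one kernel module (hypothetical shape)."""
    agent: str
    module: str          # e.g. "llm", "memory", "storage", "tool"
    payload: Any
    result: Any = None


class Kernel:
    """Toy kernel: agents submit syscalls, a FIFO scheduler dispatches them."""

    def __init__(self) -> None:
        self.modules: dict[str, Callable[[Any], Any]] = {}
        self.queue: deque[Syscall] = deque()

    def register(self, name: str, handler: Callable[[Any], Any]) -> None:
        # Each kernel module (LLM core, memory, tools...) is just a handler here.
        self.modules[name] = handler

    def submit(self, call: Syscall) -> None:
        self.queue.append(call)

    def run(self) -> list[Syscall]:
        # First in, first out: a single loop decides what touches which module.
        done = []
        while self.queue:
            call = self.queue.popleft()
            call.result = self.modules[call.module](call.payload)
            done.append(call)
        return done


memory: dict[str, str] = {}
kernel = Kernel()
kernel.register("llm", lambda prompt: f"echo: {prompt}")  # stand-in for a model call
kernel.register("memory", lambda kv: memory.setdefault(kv[0], kv[1]))

kernel.submit(Syscall("agent_a", "llm", "summarize the logs"))
kernel.submit(Syscall("agent_b", "memory", ("user", "prefers short answers")))
for call in kernel.run():
    print(call.agent, call.module, call.result)
```

Swapping the FIFO loop for a priority or round-robin policy only changes `run`. The agents do not need to change, and that separation is what the kernel buys.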
That is the whole thesis of the agent OS in one sentence. The intelligence is not the operating system. The intelligence is one component the operating system schedules.
The parts every agent OS has
Strip away the branding and every agent operating system, research prototype or shipping product, is assembling the same parts. They map onto concepts that have been in operating systems for fifty years.
| Operating system concept | Agent OS equivalent |
|---|---|
| CPU scheduler | Agent loop and task scheduler |
| RAM | Context window management |
| Filesystem | Persistent memory store |
| Applications | Skills and tools |
| Processes | Sub-agents running in parallel |
| Cron | Proactive scheduled tasks |
| Device drivers | Tool and data connectors (MCP) |
| System calls | The agent SDK |
Once you see the table, the products in this space stop looking like competitors selling different things and start looking like different implementations of the same operating system. That is exactly what happened with Unix, Windows, and macOS: same primitives, different opinions about how to expose them.
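The Cron row is the easiest one to make concrete. What separates a proactive agent from a chatbot is a loop that fires work on a timer with nobody typing. The following is a minimal sketch, assuming a fixed-interval policy and an injected clock value (`now`) so that it can be tested. `IntervalScheduler`, `every` and `tick` are hypothetical names, not any product's API.

```python
from __future__ import annotations

import heapq
from dataclasses import dataclass, field
from typing import Callable


@dataclass(order=True)
class Job:
    next_run: float                      # only this field is used for ordering
    name: str = field(compare=False)
    interval: float = field(compare=False)
    action: Callable[[], None] = field(compare=False)


class IntervalScheduler:
    """Run jobs on fixed intervals with no user prompt (a toy stand-in for cron)."""

    def __init__(self) -> None:
        self.jobs: list[Job] = []  # min-heap keyed on next_run

    def every(self, seconds: float, name: str, action: Callable[[], None],
              start: float = 0.0) -> None:
        # Intervals must be positive, or tick() would loop forever.
        heapq.heappush(self.jobs, Job(start + seconds, name, seconds, action))

    def tick(self, now: float) -> list[str]:
        """Run every job that is due at time `now`; return the names that ran."""
        ran = []
        while self.jobs and self.jobs[0].next_run <= now:
            job = heapq.heappop(self.jobs)
            job.action()
            ran.append(job.name)
            job.next_run += job.interval  # reschedule; overdue jobs catch up
            heapq.heappush(self.jobs, job)
        return ran
```

A real runtime would call `tick(time.time())` from a long-lived process and use cron expressions rather than bare intervals. The heap keeps "what runs next" cheap to answer no matter how many jobs are registered.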
Hermes: an OS you can self-host
The clearest shipping example is Hermes Agent, released by Nous Research in February 2026 under the MIT License. It is not pitched as a chatbot or an IDE plugin. It is meant to be installed on your own server and left running.
Read its feature list against the table above and the mapping is almost one to one:
- Memory is the filesystem. Hermes keeps persistent memory locally in a `~/.hermes/` directory. It retains your preferences, projects, and environment across sessions. The longer it runs, the more it knows, and none of that context resets when a conversation ends.
- Skills are the applications. When Hermes solves a hard problem, it writes a reusable skill document so it does not have to solve it again. Skills use a portable `SKILL.md` format compatible with an open standard, which makes them searchable and shareable between agents. It ships with dozens of built-in skills and creates more on its own.
- Sub-agents are the processes. Hermes runs parallel sub-agents, each with its own isolated conversation and terminal, working on separate streams at the same time.
- The scheduler is cron. A built-in scheduler runs unattended jobs like reports, backups, and briefings on a timer, without anyone prompting it.
- The gateway is the I/O layer. A single process connects Hermes to Telegram, Discord, Slack, WhatsApp, Signal, and the command line, so you can start something in one place and continue it in another.
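Skills only work as "applications" if the runtime can find the right one. The sketch below shows how a skill index might do that. It assumes that each skill lives in its own folder as a `SKILL.md` file opening with a `---`-fenced front-matter block that contains `name` and `description` fields, as in the open Agent Skills format. It is not Hermes's loader: a real one would use a proper YAML parser and better search. `Skill`, `parse_front_matter`, `index_skills` and `search` are hypothetical names.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Skill:
    name: str
    description: str
    path: Path


def parse_front_matter(text: str) -> dict[str, str]:
    """Read simple `key: value` pairs between the leading '---' fences."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def index_skills(root: Path) -> list[Skill]:
    """Find every SKILL.md under root; skip files without a name."""
    skills = []
    for path in sorted(root.rglob("SKILL.md")):
        meta = parse_front_matter(path.read_text(encoding="utf-8"))
        if "name" in meta:
            skills.append(Skill(meta["name"], meta.get("description", ""), path))
    return skills


def search(skills: list[Skill], query: str) -> list[Skill]:
    """Return skills whose name or description shares a word with the query."""
    words = set(re.findall(r"[a-z0-9]+", query.lower()))
    return [
        s for s in skills
        if words & set(re.findall(r"[a-z0-9]+", f"{s.name} {s.description}".lower()))
    ]
```

Only the short front matter is read at index time. The body of a matching skill can be loaded into context when it is actually needed, which keeps a large skill library cheap.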
As one writeup put it, the "AI agent" label is too small for this. A single agent is an application. What Hermes provides is the layer that hosts applications. The line that sticks: chatbots respond, an agent OS operates.
Crucially, Hermes is deliberately model agnostic. It runs on Nous Portal, OpenRouter, any OpenAI-compatible endpoint, or a local model. The operating system does not care which brain you plug in, which is the second tell that this is infrastructure and not a product feature.
Claude and OpenAI are building the same thing from the top down
Hermes builds the OS from the bottom up, as open infrastructure. The large labs are arriving at the same place from the top down, by turning their chatbots into platforms.
Anthropic stopped describing Claude as a chat product some time ago. The Claude Agent SDK is, in their own framing, the infrastructure behind Claude Code exposed as a library: the agent loop, the built-in tools, and the context management, handed to developers so they do not rebuild it. In November 2025 they added the ability for Claude to discover, learn, and execute tools dynamically, aimed at a future where a single agent works across thousands of tools. Those are operating system concerns. Tool discovery is device enumeration. Context management is memory management. The agent loop is the scheduler.
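Dynamic tool discovery matters because thousands of tool definitions will not fit in a context window. The runtime therefore searches a registry and loads only the few definitions relevant to the task. This is a minimal sketch of that idea and does not reproduce Anthropic's implementation. It assumes plain word-overlap ranking (a production system might use BM25 or embeddings), and `ToolDef`, `ToolRegistry` and `discover` are hypothetical names.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


def _words(text: str) -> set[str]:
    # Split on anything that is not a lowercase letter or digit.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass(frozen=True)
class ToolDef:
    name: str
    description: str


class ToolRegistry:
    def __init__(self, tools: list[ToolDef]) -> None:
        self.tools = tools

    def discover(self, task: str, k: int = 3) -> list[ToolDef]:
        """Return at most k tools, ranked by words shared with the task."""
        task_words = _words(task)
        scored = []
        for tool in self.tools:
            score = len(task_words & _words(f"{tool.name} {tool.description}"))
            if score:
                scored.append((score, tool.name, tool))
        scored.sort(key=lambda t: (-t[0], t[1]))  # best first, ties by name
        return [tool for _, _, tool in scored[:k]]


registry = ToolRegistry([
    ToolDef("calendar_read", "Read events from the user's calendar"),
    ToolDef("sql_query", "Run a read-only SQL query against the analytics database"),
    ToolDef("web_search", "Search the web and return result snippets"),
])
# Only these definitions would be placed in the model's context for this task.
print([t.name for t in registry.discover("query the analytics database", k=1)])
```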
OpenAI made the shift explicit at DevDay in October 2025 with AgentKit and the Apps SDK. AgentKit bundles a visual agent builder, a connector registry, prebuilt UI components, and evaluation tooling. The Apps SDK turns ChatGPT into a host for third-party applications that run inside it. Commentators described it bluntly as repositioning ChatGPT from a conversational interface into an application platform and an operating system for intelligent agents. The text box is becoming the login screen for something much larger behind it.
Two labs, two strategies, one destination. The chatbot is being demoted to a window, and a runtime is growing up behind the glass.
Memory is the part that changes everything
If you had to pick the single feature that separates a chatbot from an agent operating system, it is memory.
A chatbot is stateless by design. Each session starts from zero. You re-explain who you are, what you are working on, and what you decided last time. This is the equivalent of a computer that forgets every file the moment you close the lid.
Every serious agent OS treats memory as a first-class layer, not an add-on. AIOS has a dedicated memory manager and storage manager in its kernel. Hermes writes everything to a persistent local store and grows more useful the longer it runs. Anthropic and OpenAI both shipped memory features that carry context across conversations. The pattern is identical across all of them: persistence is being pulled down into the platform so individual agents do not each have to solve it.
This is the same move that filesystems represented for early computers. Before the filesystem, a program that wanted to keep data invented its own scheme for writing to disk. The filesystem turned storage into a shared service with a common interface, and applications got smaller and more capable as a result. Agent memory is following the same arc. It is becoming a service the OS provides rather than a problem each agent solves alone.
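At its simplest, memory as a platform service is a store that outlives the session. The sketch below assumes a flat key-value model persisted to a JSON file. The directory and `memory.json` file name are illustrative and do not reflect Hermes's actual on-disk layout. `MemoryStore`, `remember` and `recall` are likewise hypothetical names.

```python
from __future__ import annotations

import json
from pathlib import Path


class MemoryStore:
    """Key-value memory persisted to a JSON file so it survives restarts."""

    def __init__(self, directory: Path) -> None:
        directory.mkdir(parents=True, exist_ok=True)
        self.path = directory / "memory.json"
        # Load whatever earlier sessions left behind, or start empty.
        self.data: dict[str, str] = (
            json.loads(self.path.read_text(encoding="utf-8")) if self.path.exists() else {}
        )

    def remember(self, key: str, value: str) -> None:
        self.data[key] = value
        # Write to a temp file, then rename over the original, so a crash
        # mid-write never leaves a half-written memory file behind.
        tmp = self.path.with_suffix(".tmp")
        tmp.write_text(json.dumps(self.data, indent=2), encoding="utf-8")
        tmp.replace(self.path)

    def recall(self, key: str, default: str | None = None) -> str | None:
        return self.data.get(key, default)
```

Constructing a second `MemoryStore` on the same directory (a new session) sees everything the first one wrote. That is exactly the property a stateless chatbot lacks.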
MCP is the driver model
An operating system is only as useful as the hardware it can talk to, and it talks to hardware through drivers. Without a driver model, every application would need to know the intimate details of every printer, disk, and network card. The driver layer is what lets an application say "write this file" and not care what is underneath.
Agents have exactly this problem with tools and data. An agent that wants to read your calendar, query a database, or search the web needs a way to do it that does not require a bespoke integration for every single source. That is what the Model Context Protocol provides. Introduced by Anthropic in late 2024, MCP is an open standard that the docs describe as a USB-C port for AI applications: one consistent way to connect an agent to external tools, data sources, and workflows.
It is the driver model of the agent OS. An agent speaks MCP, and any tool that exposes an MCP server becomes usable without custom glue. The protocol is now supported across Claude, ChatGPT, and most major development tools, which is precisely how a driver standard wins. Adoption by the ecosystem matters more than technical elegance, because the value is in not having to write the integration twice.
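Under the hood, MCP messages are JSON-RPC 2.0. A client enumerates a server's tools with `tools/list` and invokes one with `tools/call`, passing a tool `name` and its `arguments`. The sketch below only builds those two request messages. It skips the `initialize` handshake a real client performs first and the transport (such as stdio) that carries the messages. The `query_database` tool and its `sql` argument are hypothetical. In practice you would use an official MCP SDK rather than hand-rolling messages.

```python
from __future__ import annotations

import itertools
import json

_ids = itertools.count(1)  # JSON-RPC request ids must be unique per session


def jsonrpc_request(method: str, params: dict | None = None) -> str:
    """Serialize one JSON-RPC 2.0 request, the message format MCP is built on."""
    msg: dict = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return json.dumps(msg)


# Ask the server what tools it exposes (the "device enumeration" step).
list_tools = jsonrpc_request("tools/list")
# Invoke one of them by name; this tool and its arguments are hypothetical.
call_tool = jsonrpc_request(
    "tools/call",
    {"name": "query_database", "arguments": {"sql": "SELECT count(*) FROM orders"}},
)
print(list_tools)
print(call_tool)
```

The agent code never mentions the database driver, the calendar API or the search vendor. It speaks these two verbs, and the server behind them does the device-specific work.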
AIOS made this explicit in its computer-use design: the tool manager was extended with an MCP server so agents could reach external systems through a standard interface inside a sandbox. The kernel handles the protocol. The agent just asks.
So what does the full stack look like?
Put the pieces together and a recognizable layer cake emerges, with the same shape as the computing stack everyone already knows.
- The hardware layer is the models. Claude, the GPT family, open-weight models, whatever runs the inference. Interchangeable, like CPUs.
- The driver layer is MCP and the tool connectors, giving the system a standard way to reach the outside world.
- The kernel is the agent OS itself: scheduler, memory manager, context manager, tool manager. AIOS as a research blueprint, Hermes as a shipping implementation, the labs' SDKs as proprietary variants.
- The applications are skills and sub-agents, the actual units of work.
- The GUI is the chatbot. The window you look through, not the machine behind it.
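The application layer is where the process analogy is most literal. Sub-agents run concurrently, and each one keeps a private conversation history that the others cannot touch. In this minimal sketch, asyncio coroutines stand in for real processes and terminals, and `asyncio.sleep` stands in for model and tool calls. `SubAgent` and `orchestrate` are hypothetical names.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass, field


@dataclass
class SubAgent:
    name: str
    history: list[str] = field(default_factory=list)  # private to this sub-agent

    async def run(self, task: str) -> str:
        self.history.append(f"task: {task}")
        await asyncio.sleep(0.01)  # stand-in for model and tool calls
        result = f"{self.name} finished {task!r}"
        self.history.append(result)
        return result


async def orchestrate(tasks: dict[str, str]) -> dict[str, str]:
    """Run one isolated sub-agent per task concurrently; collect results by name."""
    agents = {name: SubAgent(name) for name in tasks}
    # gather() runs the coroutines concurrently and returns results in input order.
    results = await asyncio.gather(*(agents[n].run(t) for n, t in tasks.items()))
    return dict(zip(tasks, results))


print(asyncio.run(orchestrate({"research": "survey vector DBs", "ops": "rotate API keys"})))
```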
The chatbot was never going away. The desktop did not disappear when operating systems matured. It just stopped being mistaken for the whole computer. The same correction is happening now. The text box is sliding into its proper place as one interface among many, and the interesting engineering is moving down into the kernel.
What this means if you are building
If you are building agents, the lesson from the operating system era is direct. You do not have to build the kernel yourself, but you cannot avoid its problems. Context management, persistent memory, scheduling, tool routing, and process isolation show up the moment your agent does anything beyond a single turn. The choice is whether you reimplement them per project or adopt a layer that provides them.
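Context management is usually the first of these problems to bite. If the context window is the agent's RAM, the simplest eviction policy keeps the system prompt and as many of the newest messages as fit. This is a minimal sketch with loudly flagged assumptions. Tokens are approximated as whitespace-separated words (a real runtime would use the model's tokenizer). Evicted messages are simply dropped, whereas a real agent OS might summarize them or page them out to its memory store. `approx_tokens` and `fit_context` are hypothetical names.

```python
from __future__ import annotations


def approx_tokens(text: str) -> int:
    # Assumption: about one token per whitespace-separated word.
    return len(text.split())


def fit_context(system: str, messages: list[str], budget: int) -> list[str]:
    """Keep the system prompt plus the newest messages that fit in `budget` tokens."""
    used = approx_tokens(system)
    kept: list[str] = []
    for msg in reversed(messages):  # walk from newest to oldest
        cost = approx_tokens(msg)
        if used + cost > budget:
            break  # everything older than this is evicted
        kept.append(msg)
        used += cost
    return [system] + list(reversed(kept))  # restore chronological order


print(fit_context("sys", ["one two three", "four five", "six"], budget=4))
```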
The one piece that benefits most from being shared is the model layer. An agent OS should stay model agnostic, the same way a real OS runs on more than one chip. That is the gap a gateway fills. Requesty sits at the hardware layer of this stack and gives your agent OS a single endpoint that routes across providers, fails over when one goes down, and caches across the whole fleet, so the kernel above it never has to care which model is answering. You get $10 free to wire it into whatever runtime you are building.
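What "routes across providers, fails over, and caches" means mechanically can be shown with plain functions. This is not Requesty's API. It is a generic, minimal sketch in which each provider is a stand-in callable that takes a prompt and returns text, and in which the cache is keyed on the exact prompt only. A real gateway works over HTTP, tracks provider health, and includes model and sampling parameters in its cache key. `Router` and `complete` are hypothetical names.

```python
from __future__ import annotations

from typing import Callable

Provider = Callable[[str], str]  # takes a prompt, returns a completion


class Router:
    """Try providers in order, fall back on failure, cache identical prompts."""

    def __init__(self, providers: list[tuple[str, Provider]]) -> None:
        self.providers = providers
        self.cache: dict[str, str] = {}

    def complete(self, prompt: str) -> str:
        if prompt in self.cache:
            return self.cache[prompt]  # served without calling any provider
        errors = []
        for name, call in self.providers:
            try:
                answer = call(prompt)
            except Exception as exc:  # provider down, rate-limited, timed out...
                errors.append(f"{name}: {exc}")
                continue
            self.cache[prompt] = answer
            return answer
        raise RuntimeError("all providers failed: " + "; ".join(errors))


def flaky(prompt: str) -> str:
    raise ConnectionError("503")


router = Router([("primary", flaky), ("backup", lambda p: f"backup says: {p}")])
print(router.complete("hello"))  # primary fails, so the backup answers
```

The kernel above only ever calls `complete`, so swapping or reordering models never touches agent code. That is the model-agnostic property the section argues for.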
The chatbot was the GUI. The agent operating system is the machine. We spent three years staring at the window. The next few years are about what runs behind it.
Frequently asked questions
- What is an agent operating system?
- An agent operating system is a runtime layer that sits between large language models and the agents built on top of them, providing the same core services a traditional OS provides to applications: scheduling, memory management, storage, tool access, and process isolation. The 2024 research paper AIOS first formalized this by proposing an OS kernel that isolates LLM resources and services from agent applications. The idea is that agents should not each reimplement memory, scheduling, and tool routing. Those belong in a shared layer below them, the same way file systems and process scheduling belong in an OS rather than in every application.
- How is an agent OS different from a chatbot?
- A chatbot is an interface. You type, it responds, and the context disappears when the session ends. An agent operating system is a persistent runtime. It remembers across sessions, schedules work that runs without you asking, coordinates multiple sub-agents in parallel, and connects to tools and data through standard protocols. The chatbot is the graphical layer that humans see. The agent OS is the kernel underneath that runs the work.
- What is Hermes Agent?
- Hermes Agent is an open-source autonomous agent released by Nous Research in February 2026 under the MIT License. It is designed to be self-hosted on your own server with persistent local memory in a ~/.hermes/ directory, automated skill creation in a portable SKILL.md format, parallel sub-agents, a built-in cron scheduler, and a single gateway that connects to chat platforms like Telegram, Discord, Slack, WhatsApp, and Signal. It maps cleanly onto operating system concepts: memory as the filesystem, skills as applications, sub-agents as processes, and the scheduler as cron.
- What role does the Model Context Protocol (MCP) play in an agent OS?
- MCP is the driver interface of the agent operating system. It is an open standard introduced by Anthropic in late 2024 that gives agents a single, consistent way to connect to external tools, data sources, and workflows. Just as an OS uses device drivers so applications do not need to know the details of every piece of hardware, MCP lets an agent connect to databases, file systems, search engines, and APIs without custom integration code for each one. It is now supported across Claude, ChatGPT, and most major development tools.
- Do I need an agent OS to build production agents?
- Not always, but the components show up whether you name them or not. Any serious agent needs to manage context windows, persist memory across runs, schedule background work, route tool calls, and isolate parallel tasks. You can build those yourself or adopt a runtime that provides them. The practical question is the same one teams faced with operating systems decades ago: write your own scheduler and memory manager, or build on a shared layer and spend your time on the application. A gateway like Requesty handles the model routing, failover, and caching piece of that stack so your agent OS can stay model agnostic.