Capability is not autonomy
One of the most common confusions in artificial intelligence is between what a system can do and what it can be left to do. These sound like the same property. They are not, and the difference is where most of the unsolved work in this field lives.
Capability is performance on demand. You ask, the system answers, and the quality of the answer tells you how capable the system is. Nearly every measure of progress we have is a measure of capability: benchmarks, evaluations, leaderboards, demonstrations. All of them put a question in front of a model and grade what comes back.
Autonomy is different. Autonomy is what happens when nobody is standing there to ask, to grade, or to correct. It is the property of being safe to leave alone with something that matters.
The two come apart constantly. A model can draft a contract better than most lawyers and still be a system you would never let send an email unsupervised. A model can find a subtle bug in twenty thousand lines of code and still, given a repository and a week, produce nothing a team could merge. We have built systems with extraordinary capability and almost no autonomy, and the strange thing is how rarely this is named as the actual gap.
I find employment a useful lens, since it is the oldest institution humans have for delegating work to other minds. When you hire someone, you are not buying their answers. You are buying their ability to hold a piece of the world steady without you. The new hire who knows everything but must be told what to do each morning is not an employee. They are a very expensive reference book.
What makes a person employable is mostly invisible in an interview. They remember what happened last month and act on it. They notice when something looks wrong and stop. They know what they do not know, and they ask. They feel the difference between a decision they can make alone and one they cannot. And when they fail, the failure has an owner, which means it gets corrected rather than repeated.
None of this is intelligence in the way the field measures intelligence. All of it is what autonomy is made of.
This is why I am skeptical that autonomy arrives as a side effect of scale. Larger models are more capable, and capability matters; an agent cannot do work it does not understand. But the structures that turn capability into autonomy (memory that persists and stays coherent, judgment about when to act and when to stop, accountability that makes errors converge instead of compound) are not properties of models. They are properties of systems built around models. They have to be designed, tested, and earned.
That is the work of Artemis Labs. Not to make models smarter; others are doing that well. Our work is to build what has to exist around intelligence before it can be trusted with anything real.
The field will keep producing more capable models, and we will keep using them. The question we care about is older and harder: when can you go home and leave the work running?
Artemis Labs