OpenAI's Atlas Agent Is a Promising Intern Who Can't Work a Full Shift

We watched a seasoned tech journalist put OpenAI's new Atlas browser through its paces, and the results are exactly what you'd expect from a technology that's still finding its footing: occasionally brilliant, frequently confused, and perpetually running out of stamina.

Kyle Orland at Ars Technica spent a day treating Atlas' Agent Mode like an unpaid intern, assigning it the kinds of soul-crushing digital tasks that make us question our career choices. The setup is straightforward—you tell the AI what you want done on the web, and it clicks, scrolls, and types its way through the task while you theoretically do something more valuable with your time.

The Good: It Actually Works Sometimes

The most impressive result? Atlas managed to monitor a Pittsburgh radio station's live stream, identify songs as they played, and automatically build a Spotify playlist. That's three different interfaces, multiple authentication steps, and real-time audio processing. Orland gave it a 9/10, with points deducted only because you can't leave it running in the background all day—which is exactly when you'd want this kind of automation.

The agent also excelled at email drudgery, scanning a week's worth of PR contacts and building a formatted spreadsheet without explicit instructions. It correctly identified the right Gmail account, constructed an appropriate search query, and extracted structured data from unstructured emails. This is the kind of task that makes grown professionals weep with boredom, and Atlas handled it competently.

Even the weird stuff worked. Asked to build a Neocities fan page for the Star Trek character Tuvix (complete with accusations of murder against Captain Janeway), Atlas delivered a functional Web 1.0 site in two minutes. The prose was diplomatic where Orland wanted accusations, and the images broke because the agent hotlinked instead of uploading files, but the fundamental task succeeded.

The Bad: It's Like Watching Paint Dry, Then Run Out

Here's the problem that surfaced in nearly every test: "technical constraints on session length" means the agent just... stops. In the middle of tasks. Without finishing.

Asked to scan 164 emails for PR contacts, it processed 12 and quit. Told to find and download Mac game demos from Steam, it spent ten minutes stuck in a loop, eventually found the demos, and never attempted a single download. These aren't edge cases; they're the central use case for automation. We want agents precisely because tasks are long and repetitive.

According to Orland's testing, published October 23, 2025, the agent scored a median of 7.5/10 across six varied tasks. That's respectable for a preview feature, but the variance tells the real story. It achieved a 9/10 on radio playlist creation and a 1/10 on game demo downloads. You can't deploy something that might completely fail depending on which website it encounters.

What This Means for Marketing Work

Let's be direct about what this technology represents for our world: a partial solution to tasks nobody wants to do, hampered by limitations that make it unreliable for the very automation we need most.

Could Atlas scan competitor websites for pricing updates? Probably, until it hits that session timeout. Could it monitor social media for brand mentions and compile reports? Sure, for about four minutes. Could it handle the repetitive data entry that consumes three hours of someone's week? Only if that three hours can be broken into six-minute chunks that you manually restart.

The agent's ability to interpret instructions, navigate interfaces, and adapt to obstacles is genuinely impressive. The fact that it understood "find WYEP on Radio Garden" but smartly pivoted to wyep.org when that failed shows contextual reasoning that goes beyond simple automation. When it accidentally clicked an EVE Online ad while trying to complete the radio task, it recognized the error and corrected course.

But impressive technology that can't complete tasks is just expensive theater.

The Honest Assessment

OpenAI is clearly positioning this as the future of web automation, and they're probably right about the direction. The execution just isn't there yet. Atlas performs like a smart but easily distracted assistant who needs constant supervision—which defeats the purpose of automation.

The most telling detail in Orland's testing? He had to keep prompting the agent to continue working, even on tasks it was handling competently. It played 2048 for four minutes and stopped despite an unfinished board. It added two songs to the Spotify playlist and called it done. These look less like bugs than like deliberate session limits, a design choice that appears to prioritize compute costs over user value.

For marketers evaluating whether to integrate agentic AI into workflows, this testing offers a clear answer: wait. Not because the technology is fundamentally flawed, but because it can't yet deliver on its core promise. Automation that requires supervision isn't automation—it's just a different kind of work.

We'll be watching to see if OpenAI extends those session lengths, improves the agent's ability to complete tasks without prompting, and solves the background processing limitation. Until then, Atlas is a fascinating preview of a future that's still several updates away.

If you're trying to determine which AI tools actually deliver ROI versus which are just impressive demos, our growth strategists cut through the hype. Let's talk about building AI strategies that work today, not tomorrow.
