The Most AI Company Ever
What happens when you staff a company with only AI agents?
Big news: I’m officially unemployed. Stay tuned for details on what’s coming next. All I’ll say is that I’m very excited. In the meantime, I’m taking some time off to recharge, so expect a few special editions of the newsletter.
In other exciting news, my wife and I woke up early last weekend to cheer on my sister-in-law while she ran her first half-marathon. I think our sign gave her and the other runners an appropriate level of encouragement because she crushed it.

Don’t worry, no pups were harmed in the making of that sign. Anywho, in the cyber and AI world today, we’re covering:
What happens when you staff a company entirely with AI agents?
The oh-days (zero days) of summer
Don’t cross the mouse
-Jason
P.S. While I was rooting for my sister-in-law in her half-marathon, robots ran their own half-marathon.
AI Spotlight
A company of robots

A bunch of professor nerds decided to channel their inner tech-bro CEO vibes and create a fake company staffed entirely by AI agents. They called the company TheAgentCompany. Cut them some slack, they’re nerds, not marketing experts.
The outcome in their report was…well…what you might expect.
They tested various tasks that employees would do in any company. These ranged from software engineering and project management to financial management and all the mundane administrative tasks, like tracking down who in the company knows wtf is going on so you can unblock something you’re working on.
The AI agents were powered by various LLMs, including Anthropic’s Claude, OpenAI’s GPT-4o, Google’s Gemini, Amazon’s Nova, Meta’s Llama, and Alibaba’s Qwen. Each agent was given access to a web browser, a code editor, and a Linux terminal. The company had all the core essentials for an employee: an intranet, task-management software, a messaging application, and Microsoft Word and cloud-drive alternatives.
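For the curious, here’s roughly what one of these agents boils down to under the hood. This is a minimal sketch in Python, not the researchers’ actual code: call_llm is a hypothetical stand-in for whatever model API you’d plug in, and only the terminal tool is wired up to keep things short.

# Minimal, hypothetical agent loop (not TheAgentCompany's actual code).
# call_llm is a placeholder for a real model client; the real benchmark also
# wires up a browser, a code editor, and the company's intranet and chat.
import subprocess

def run_terminal(command: str) -> str:
    """Run a shell command and return its output, like the agents' Linux terminal tool."""
    result = subprocess.run(command, shell=True, capture_output=True, text=True, timeout=60)
    return result.stdout + result.stderr

def call_llm(messages: list[dict]) -> dict:
    """Hypothetical model call: returns either a tool request or a final answer."""
    raise NotImplementedError("swap in your model provider's client here")

def run_agent(task: str, max_steps: int = 20) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = call_llm(messages)
        if reply.get("tool") == "terminal":
            # Feed the command's output back so the model can decide its next step.
            observation = run_terminal(reply["command"])
            messages.append({"role": "tool", "content": observation})
        else:
            # No tool requested: the agent thinks the task is done.
            return reply.get("content", "")
    return "Gave up after too many steps."

The loop is the whole trick: ask the model what to do next, run the tool it asks for, feed the result back, and repeat until it says it’s done or you run out of patience.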
From there, the nerds gave the agents specific job roles and provided specific tasks that a human in that role would typically tackle day-to-day. This wasn’t done lightly...per the paper, “it took 20 computer science students, software engineers, and project managers over 2 months, consuming approximately 3,000 person hours in total.”
So, how did the AI agents perform? Like a 1st-grade school play, it was terrible…but the kids (er, AI agents) had fun, so that’s all that really matters, right? The best-performing agent, Claude, only completed 24% of the assigned tasks. The others…much worse.


Time for a PIP? The common failure points for all of the agents were:
Lack of common sense: Like Brad from finance, the AI agents were dumb and lacked domain expertise. These were basic things, like not knowing that a “.docx” file is a Microsoft Word file.
Lack of social skills: The agents didn’t follow advice or guidance from other bots, like who to talk to next. Instead of following the next steps, they considered the task done. Even AI agents are lazy.
Browsing incompetence: Like your grandma in the early 2000s, the AI agents could not figure out how to navigate the Internet. They routinely got stuck on, of all things, pop-ups, and wanted to follow every single one of them. I can only presume another contributing factor was that they were looking for cat memes…no judgment there.
Creating shortcuts: In the most dad move ever, the agents tried to be clever and come up with shortcuts that just made the trip longer. In one instance, the agent just renamed another user to the name of the person they needed to contact.
Okay, bad start…but…in the not-so-distant future, I expect a much better outcome. The technology keeps getting better. Just this week, my wife, who is in the medical field, started testing out an AI copilot to help with notes and filling out documentation. She’s all in on it after only a few days. The time savings are huge. It doesn’t remove the human. It supercharges them.
Each month, the AI tech will continue to improve, and we’re going to find that while it may replace some functions, it will ultimately be a productivity boon, allowing people and companies to do so much more.
I’m bullish on the future.

Security Deep Dive
The oh-days of summer

Google’s Threat Intelligence Group (GTIG) tracked 75 new zero-day vulnerabilities, or “oh-days,” as we nerds like to call them, that attackers exploited in 2024. That’s a roughly 23% decrease from 2023, when GTIG saw 98 zero-day vulnerabilities. That’s not good news or bad news. It’s just the power of observation and math.
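If you want to double-check that math, it’s a quick back-of-the-napkin calculation (the only inputs are the two GTIG figures above):

# Year-over-year change in exploited zero-days, per the GTIG figures.
zero_days_2023 = 98
zero_days_2024 = 75
decrease = (zero_days_2023 - zero_days_2024) / zero_days_2023
print(f"{decrease:.0%}")  # prints 23%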
Let’s level-set. A zero-day is a vulnerability threat actors exploit in the wild before a patch is publicly available. This means that anyone using the affected technology could be exposed before a fix even exists.
The top targets for end-user technology vulnerabilities were unsurprising. Windows vulnerabilities continued to lead the charts. However, Safari vulnerabilities jumped to second place in 2024. That isn’t all that surprising given the popularity of Macs and the fact that some iOS exploit chains pair Safari vulnerabilities with other bugs to gain full control over the device.

Who’s behind all of these oh-days? No shocker here either. State-sponsored groups led the charge. They’re well-resourced and smart. They can spend time and money on finding vulnerabilities that others can’t.
What’s interesting is that non-state-sponsored, financially motivated groups were behind five oh-days last year. That’s about the same share of oh-days as in 2023, but it shows these actors are investing in either purchasing oh-days or doing the research themselves.

So, what’s the takeaway? Not much, to be honest. Software companies continue to ship code with vulnerabilities, but the smaller number of exploited zero-days could suggest vendors are getting better at locking their software down. State-sponsored hackers continue to lead the charge on zero-days, but those financially motivated threat actors are still contributing.

Security & AI News
What Else is Happening?
🐭 Don’t mess with the mouse. A former Disney employee was ordered to pay almost $700K in restitution and will spend three years in jail. His crime? He hacked into a Disney application used to create restaurant menus and modified those menus to change allergen information and add profanity. Although that’s terrifying, hilariously, he was caught because he changed the font on some menus to Wingdings, which crashed the application.
⏹️ An Indian court ordered Proton Mail blocked in India. The order stems from allegations that mean-spirited people are using the privacy-focused email provider to send “obscene, abusive, and vulgar language, AI-generated deepfake imagery, and other sexually explicit content.” This isn’t the first time an order like this has happened, and it won’t be the last.
🤯 Microsoft CEO Satya Nadella said that 20-30% of Microsoft’s code was written with AI. This is on par with what Google CEO Sundar Pichai said of Google’s code. We’re entering an exciting new time of vibe coding that will supercharge app development and destroy application security teams due to the rapid increase in insecure code.
🇮🇷 Iran claims to have stopped a “widespread and complex” cyber attack that targeted its infrastructure. Details are slim on what happened.
🙃 The latest Verizon DBIR report dropped last week. I haven’t read it yet because it’s just shy of 42,000 words, with an estimated reading time of 210 minutes and a speaking time of 323 minutes. Why do so many security vendors feel they must stuff an encyclopedia of information into a single report? I’m not saying it isn’t worthwhile, but it might be too much when I have to block off my calendar just to read it.

Verizon DBIR Report stats courtesy of pdfwordcounter.io
If you enjoyed this, forward it to a fellow cyber nerd.
If you’re that fellow cyber nerd, subscribe here.
See you next week, nerd!