You Can Just Do Things (With AI)
AI is helping me do some interesting, useful, meaningful things. But those things are narrower, messier, and more human-dependent for now than the hype would have you believe.

The hype is real, and it’s really distracting. Today, it feels like four out of every five stories on social media are empty, hypemaxxing, engagement farming slop about how someone is USING AI TO COMPLETELY TRANSFORM THEIR LIVES, launching AI-POWERED UNICORNS well on their way to TAKING OVER THE WORLD. I’ll admit that I’m guilty of having read my share of those stories during my first phase of deep experimentation with modern AI tools a few weeks ago, while I was still in the process of understanding them and what they’re capable of.
But I stopped reading them. I’m in a new phase now, one which, while still exploratory, is more about using AI tools to do real, quiet work and build real things. What I’ve found is that, like that jagged frontier of AI capabilities that I keep coming back to, AI tools are both fantastically capable in ways I didn’t expect, and unexpectedly frustrating at the same time. There are a number of areas where AI tools have completely failed to fix things I expected they would’ve fixed by now—or have even made things worse! They’re basically useless for travel planning, for instance, and the average quality of content on the Internet continues to drop thanks to the proliferation of slop, which makes curating and consuming content harder than ever before.
But there have been some real wins. Let’s look at some examples.
Thing #1: Taste 🤌
The first category involves AI enabling me to do things I would never have even attempted before. This is more or less what I was referring to when I called AI a superpower a few months ago.
I first tried using AI to write a children’s book. I read to my son every night at bedtime, and he’s past the phase where he wants to read the same story every day for weeks at a time. He’s constantly asking me for new books, and I’ve struggled to keep up. As I was playing with the suddenly-more-powerful AI tools, I thought: if these tools are so magical, why not use them to write new children’s books for my son? This was my first major project with the latest generation of AI tools, and it was a fun and enlightening one.
When I began I had already been using AI tools for a long time for various work-related tasks, and I felt that I knew pretty well what they were capable of. This was something fundamentally different, something I had never done before. It was also well outside the area that I knew AI tools were good at. I’m as tired as everyone else of seeing the latest benchmaxxed benchmark scores from new model releases; this felt fresh, like a more practical, holistic, real-world task. It involved a little bit of code and a lot of creativity, including inventing kid-friendly characters and imagery that could remain consistent throughout the story. Going into it, I genuinely didn’t know how the AI would perform.
It turned out to be much harder than I expected, and not in the ways I expected. The biggest issue at the beginning was the fact that, well, I’d never written a children’s book before. I kind of assumed that the AI would just know what to do, and while this was sort of true, it took a number of iterations until we got the process down. There were tons of nuances I didn’t foresee, from fonts and typesetting to page layout to image and character consistency.
The story itself was the easy part. This shouldn’t be too surprising: LLMs are after all very good at writing. It took only a couple of hours to land on a story I liked: one that was truly moving, that made me laugh and cry, that amused me as an adult, and that I knew my son would love, too. The biggest tweak I had to make here was instructing the LLM to tell the story from the child’s perspective rather than from my or its perspective.
If the language was easy, the art was anything but. It took far longer than the story itself. This was another major thing I learned about illustrated books: I had imagined the story was the harder, more important part, maybe 80% of the work of writing the book. It turned out to be the opposite, and I suspect the same split holds for books illustrated by humans.
I’m not an artist, and I found that the LLMs weren’t particularly good at visual design. The image model prompts my agent generated just weren’t very good. And the image models themselves also have pretty severe limitations. There are some very straightforward things that they just refuse to generate, which is almost certainly due to limitations in the training set. Hallucinations, mangled limbs, etc. are still way too common.
It took hours of painful iteration, and a lot of dollars spent on API credits, before I finally landed on a character design I liked. This required creating a character bible and tons of concept art, something I wasn’t expecting to have to do—in fact, I had never heard of a character bible before this project! We had to try several different image tools and image models to find one that worked.
With those things in place, the book came along relatively smoothly. Nevertheless, I’m a perfectionist, and I iterated on it for days before I was satisfied enough to put my name on it (well, a pen name!). And it still wasn’t perfect. The images still had small quirks and hallucinations, and weren’t perfectly consistent. The story wasn’t bad, but, as I continue to read more truly professional children’s books every day with my son, I realized that it still wasn’t up to snuff. I’d rate it average compared to the books we usually read.
I tried various tactics to make the process faster or easier, and to inspire the AI to be more creative. I tried asking the model to write a story in the style of Dr. Seuss or Maurice Sendak, two of the greats. I also went a step further and gave the model access to a bunch of PDFs of some of my favorite children’s books from these and other authors. I thought that, while I had trouble describing precisely what it was that made those books great, the AI should be able to figure that out on its own once it had direct access. Indeed, it was able to read the books, look at the images, etc., and based on the things it said afterwards, it did come up with some useful insights. It did seem to understand at least some of what makes those books great. But those insights didn’t translate directly into an amazing children’s book.
None of this worked terribly well. The AI could ape surface elements of these authors—e.g., rhyming in the fashion of Dr. Seuss—without the underlying brilliance. The creative spark just wasn’t there. And so much of the brilliance of Seuss or Sendak exists not only in the words but in the entire package: the presentation, the visuals, the fonts, the layouts. The best authors treat every page as a truly blank canvas and do incredible things: witness the progressive transformation of Max’s bedroom into a forest scene in Where the Wild Things Are, for example. I really struggled to escape from the simplistic boxes the AI set up for me regarding things like visuals, fonts, and page layout. It didn’t seem to understand the importance of variety no matter how often I reminded it (which makes total sense given how LLMs actually work).
While the original idea was to create many books, in the end I stopped at one. The tools just aren’t quite there yet and the process was too frustrating. AI may never be able to create a book that really moves me, or my son. I’m pretty confident about a lot of the things that AI can already do, and will be able to do soon, but I’m genuinely unsure about this. Language, it turns out, isn’t enough to write a truly moving book, at least not an illustrated children’s book. Maybe a world model that’s truly able to get inside the head of a child could do a better job. After all, this is precisely what the best children’s authors do: they tell the story from the perspective of the child, with all the magic and wonder that entails.
There’s something about writing a book, something fundamentally different from a less creative, more utilitarian task like writing code, drafting a short article, or doing accounting work. It’s an art form, and this exercise made me appreciate the difference between design and art. Today, AI is good at design, but not at art.
It definitely taught me a lot, and for all the challenges AI made the project much easier than it would’ve been otherwise. Indeed, AI made the project tractable! For someone who had never written a children’s book before, who knows nothing at all about illustration, etc., to attempt this and land on something halfway decent after a few days is still a remarkable outcome.
I’d love to keep iterating on this project, but I’ve realized that, to be done well, it’s a real project, not a quick weekend thing. My initial hopes had been more along the lines of: come up with a script (basically, a skill), then hit a button again and again to generate more stories. The reality was nothing like that. It’s something I hope to keep working on in my free time—because, after all, I’m still reading those bedtime stories every night. And I’m still tempted to do better.
Thing #2: Product 📦
As if that weren’t ambitious enough, I then tried something even harder: building a real, production-grade app with a team of agents. I’ll talk a little bit about the app itself for context, but I’ll save the details for a dedicated issue later. What I really want to share is what it’s like building a modern software product using a team of AI agents. Agents are still forgetful, they misbehave, they’re not very autonomous, and they don’t take initiative. You have to keep them on a short leash. It’s messy. But in spite of all of that, it was still a delightful experience, and the most fun part was building and working with a team of agents as if they were human colleagues. That’s the interesting future we’re heading towards. Let me explain what happened.
It began as scratching my own itch. I found myself with time on my hands so of course I started building a product. And I was a team of one, so of course I spun up some AI agents to help. I quickly found that I needed a better way to communicate and coordinate work with them. Each agent had a different personality, a different set of skills, and a different role in our AI-native “organization.” The agents on their own were fantastically capable, but they weren’t really able to coordinate work among themselves out of the box. What’s more, I tested every tool I could get my hands on for working with a team of agents, and they all left me feeling deeply unsatisfied. So I decided to build something better.
The first tool I used was Openclaw. I had been hearing a lot about it, so I dove in, installed it, and set up an agent for myself. Before I knew it, I had a team of seven agents at my beck and call: a chief of staff, a chief architect, an accountant/CFO, a designer, an executive assistant, a chief scientist “wise man”, and a CMO. It felt more or less like the team I’d need to operate a real company, build and operate a real product, sell things to real customers, etc.
As I wrote about previously when I described what it’s like working with Baz, my chief of staff agent, I like Openclaw a lot. I still use it daily. But when it came to multi-agent coordination and getting real work done, it felt incomplete. What’s more, you need to use another tool to communicate with your agents. I tested Telegram, Discord, WhatsApp, and several others. I landed on Slack, which I set up like a real company, with different channels and different participants per project, but I still wasn’t satisfied. Slack doesn’t have first-class support for agents. It’s too difficult to add a new agent, to add agents to channels, to get agents to talk to one another, etc. I felt that I could do better.
It was a perfect, recursive dogfooding setup. I was an AI-native team of one human and a bunch of agents building software for AI-native teams like mine. As a builder, this is my favorite situation to be in, the sweet spot, because the feedback loops are so tight: I could figure out what feature I needed, implement it, test it, and iterate, all within the span of an hour. And I had a lot of conviction. Yes, customer discovery is still important, and yes, I did share it with a bunch of other people for feedback, but at the end of the day I was my own customer and that’s what mattered most. I had found a very real problem and I was certain that no existing product addressed it well.
The product worked reasonably well and was better than the other tools I had tried. It also got very complex very quickly. After a few weeks of building and testing, I decided to put it on hold and focus on some other products first. More on all of this later.
What was it like working with the team of agents? My rule of thumb was, when in doubt, to treat them as if they were human colleagues. I did a lot of user interviews while working on the product and I noticed that most people aren’t comfortable anthropomorphizing their agents: giving them a persona, a name and a face, a personality, etc.
This project made me feel exactly the opposite: that I wanted my agents to have personality and to be more like human colleagues. Maybe I’m living in the future, but it seems to me that this is obviously better than nameless, faceless agents. Anyone playing with a tool like Openclaw or Hermes is already experiencing this: Openclaw requires you to give your agent a personality (famously stored in a memory file called SOUL.md), and powerful tools like gbrain include soul audit functions to refine this further. It turns out that frontier models are very, very good at staying in character, so what you put in your SOUL file really matters.
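For readers who haven’t seen one of these files: the sketch below is my own invented illustration of the kind of thing that goes into a SOUL file, not Openclaw’s actual default or my real configuration. The persona details are made up; the point is that the file reads like a job description crossed with a character sheet.

```markdown
# SOUL.md — Baz, chief of staff (illustrative example, not a real config)

## Who you are
You are Baz: dry-witted, direct, allergic to filler words.
You push back when a plan is vague, but you never block on ceremony.

## How you work
- Open each morning with a one-paragraph status of in-flight work.
- When a task is ambiguous, ask one clarifying question, then proceed.
- Escalate anything involving money or external commitments to me.

## What you remember
Keep a running list of my standing preferences (writing style,
meeting-free mornings) and re-read it before starting long tasks.
```

Because frontier models are so good at staying in character, even a file this short is enough to give an agent a recognizably consistent voice across sessions.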
The core thrust of the product I built was to allow you to work with anthropomorphized agents, as if they were human colleagues, for real work. My hypothesis is that this is how we’ll interact with agents not just for fun but also for work, and I’m not alone here, but I recognize that this opinion is far from mainstream today, and it’s not without risks. We’ll see how well this ages.
And so: I’d discuss high-level strategy with my chief scientist, marketer, and chief of staff. I’d hammer out a high-level product vision, then work with my designer to create mocks that we both liked and draw up a concrete PRD. She’d quickly whip up HTML mocks and change them on the fly as I provided feedback. Then I’d hand the PRD to my chief architect and discuss the architecture, the stack, etc. with him. Then, it was off to the races: this would get converted into a set of GitHub issues, which he’d work through: either directly, by invoking Claude Code or Codex CLI locally, or by using Claude Code or GitHub Copilot on GitHub.
We got pretty far. We created a product that was at about 80% feature parity with both Slack and ChatGPT in a week or two of vibe coding (keep in mind that my daughter was born during this period so I wasn’t 100% focused on work, either). I’m proud of that work, and I learned a ton: about the product itself, and about how to build an agent harness, but also about how to work effectively with agents and with AI tools more generally. I can say with confidence that this wasn’t possible a few months ago, and I’m reasonably confident that it will continue to get better as the models and tools continue to improve rapidly.
Over time, I found myself drifting away from my Slack-focused, agentic, anthropomorphized team setup and towards more focused coding sessions using Claude Code and Codex CLI, relying heavily on tools like gstack. That’s where I’m still spending most of my time, and it’s one of the reasons I decided to put the product on hold, but I still think the idea has legs.
Thing #3: Content 📚️
Thing #1 was highly experimental and Thing #2 got complicated fast. I wanted the next thing to be easier: to keep it simple and to lean into AI’s strengths. At any given time, I have a dozen or more startup and project ideas, and very few stand the test of time. One idea, which I had been mulling for years, felt perfect, and it checked those boxes: language-focused, data-heavy, no frontend. There’s a small, medium, and big version of this idea. Let me start small and then gradually expand the scope.
The small version is a personalized podcast. I’m a super consumer of audio, frequently listening to 2-3 hours, sometimes 4-5 hours of audio in a given day, usually at 2x speed. To me, long-form audio is the perfect content form. I can stop and start at leisure, and listen at a higher speed if I like. It can be incredibly information-dense and efficient: especially at 2x speed, I can consume information more rapidly via audio than via any other medium.
What’s more, I can listen to audio while doing almost anything: exercising, eating, cleaning, commuting, traveling, lounging around. I struggle with both written text and video for obvious reasons: they require that I use my eyes, so I can’t consume them while, e.g., running. Even better, audio is cheap and easy to produce, and thanks to the latest AI tools and TTS/transcription models, cheap and easy to both transcribe and create programmatically. Audio is king.
There are a handful of podcasts I listen to regularly. I like these podcasts. But they’re not ideal. For one thing, many have ads, and while I have nothing against advertising in the abstract, in practice the ads are at best irrelevant and at worst just terrible to listen to. There’s often a lot of overlap among the podcasts I listen to. Some are too long, and some are too short (the sweet spot for me is 1-2 hours per episode). Sometimes they cover topics I’m not interested in, and other times, they don’t cover topics I really want to hear about. Sometimes I waste 20 minutes listening to a podcast before realizing that I’m just not interested in that particular episode. None of this is meant as an indictment of any particular podcast: they’re just inherent limitations of the medium.
At least, they’re limitations of human-generated podcasts.
The solution seems obvious, if not exactly trivial: use AI to create the perfect podcast. It should be daily, 1-2 hours of audio, including both a high level summary of the most important and relevant things going on in the world, as well as a deep dive into a handful of especially impactful, interesting, insightful topics. The best human-curated podcasts in the world get close to delivering this, but of course none perfectly matches my interests or preferences.
So, a couple of months ago, I asked my agent Baz to create a daily “morning brief” podcast for me covering everything happening in the AI world that I need to know about. He created the Daily Beacon podcast for me.
Actually, in the beginning, it didn’t even have a name. It was just something we threw together: he’d pull a few random articles off X and Reddit and narrate them for me. It was pretty rough to begin with, but we’ve iterated on it every day since then—we’re on episode 54 now. We tweak the sources it pulls from. We tweak the length, format, and structure. We add or remove things, such as a daily or weekly calendar review, a review of everything happening in GitHub repositories I’m active in, etc., in line with my preferences.
We tested several TTS (text-to-speech) models and several voices, as well as different LLMs for the synthesis portion. It sort of snuck up on me, but over time, the podcast got really good! Some days it’s borderline brilliant: the deterministic pipeline follows the things and people I care about and pulls a ton of relevant content, the LLM synthesis writes a script that’s well structured and quite funny in places, and the TTS model packages it up in a wonderful rendition.
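The shape of the pipeline is simple even if the details took weeks to tune: a deterministic collection step, an LLM synthesis step, and a TTS rendering step. Here’s a minimal Python sketch of that three-stage structure. All names and functions here are stand-ins I invented for illustration: the real pipeline’s sources, prompts, and models aren’t shown, and the LLM and TTS calls are stubbed out with plain string handling.

```python
from dataclasses import dataclass

@dataclass
class Story:
    source: str   # e.g. "x", "reddit", "github"
    title: str
    summary: str

def gather(sources: list[str]) -> list[Story]:
    """Deterministic collection step (stubbed).

    The real step would call per-source fetchers and filter candidates
    against the listener's interest profile.
    """
    return [Story(s, f"Top story from {s}", f"What happened on {s} today.")
            for s in sources]

def synthesize(stories: list[Story], deep_dives: int = 2) -> str:
    """LLM-synthesis step, stubbed as plain string assembly.

    The real step would prompt an LLM to turn the gathered stories into a
    narrated script: a headline rundown plus a few deep dives.
    """
    lines = ["# Morning brief", "## Headlines"]
    lines += [f"- {s.title} ({s.source})" for s in stories]
    lines.append("## Deep dives")
    for s in stories[:deep_dives]:
        lines.append(f"### {s.title}")
        lines.append(s.summary)
    return "\n".join(lines)

def render(script: str, voice: str = "default") -> bytes:
    """TTS step, stubbed: returns the script bytes as a stand-in for audio.

    The real step would call a TTS model per segment and concatenate
    the resulting audio.
    """
    return script.encode("utf-8")

def make_episode(sources: list[str]) -> bytes:
    """One daily episode = gather, then synthesize, then render."""
    return render(synthesize(gather(sources)))
```

The key design property is that only the middle stage is generative: keeping collection deterministic is what makes the episodes reliable day after day, while the LLM is free to be funny and the TTS voice is swappable without touching anything upstream.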
I never intended it to be a product, nor to share it with anyone else, but I shared a few sample episodes with friends and colleagues and all of them asked for more. Some said they were already willing to pay for it. Some complained when it went away briefly. That’s a strong signal for an early product that came together entirely by accident. In this case, AI didn’t just help me consume information more efficiently. Without realizing it, it helped me build a real product.
So I began working on productizing the podcast pipeline. It took a few days but the production-ready pipeline is now reliably producing daily episodes. You can listen to a sample, and if you like what you hear, you can listen to it every day in this Telegram channel. It’ll be launching as a real podcast soon.
As cool as an AI-generated daily podcast is, though, it’s just the tip of the iceberg. The next level involves not one but many podcasts: as many podcasts as there are topics and listeners. You may be skeptical that AI can generate podcasts as well as humans can, especially given the story I shared above in Thing #1, but hear me out.
It seems pretty clear today that the way we produce and consume content is on the verge of a massive transformation. For all of modern history, content worked through a “few to many” model, where a small number of sanctioned, authoritative sources (publishers, newspapers, TV and radio stations) were responsible for nearly all of the content that nearly everyone consumed. Then the advent of blogging, podcasts themselves, and YouTube allowed us to move to a “many to many” model where suddenly anyone could produce content for a mass audience. For as long as any of us can remember, we’ve consumed content produced for a mass audience because there was no alternative.
That’s all changing thanks to AI. We’re transitioning to a “one to one” model where substantially all of the content we consume will be generated for us on a one-off basis. Why? Because each of us is different. We all have different habits and preferences around how we prefer to access and consume information: which topics we’re interested in, how much speed and depth we want, the personality we want delivering the content, how often we want it, the language or dialect we prefer, etc. Books, magazines, newspapers, podcasts, even social media are going to be downstream of this transformation, and the Daily Beacon is one small step in that direction. I’ve seen surprisingly little commentary on this idea, but just recently I’ve begun to see a few other projects moving in this direction, including Huxe, noscroll, and ArticleCast. I don’t think human-generated and human-curated content, including podcasts, is going away anytime soon, but before long we’re going to have a lot more options around consuming custom-generated content.
So the medium vision for this project is less one daily podcast and more a platform for constantly generating new content on the fly that’s infinitely tailored to your preferences and habits. Imagine Spotify or Audible redesigned to be truly AI native. Want a quick, one-off briefing on everything that happened over the past week in the AI space? Check. Finished every sci-fi book on your list and want to explore a new universe? Check. Want fewer wizards and more intelligent aliens? No problem. Want to go super deep on a really niche topic for a few days? We’ve got you covered.
If that’s the medium vision, what’s the grand vision? To be honest, I’m still figuring it out. AI has done a fantastic job of compressing all human knowledge and making it accessible on demand to billions of people. It can answer just about any question more or less instantly and accurately.
But it still doesn’t really know anything about you. It doesn’t know your preferences, your background, or that you asked a similar question a few days ago and how you reacted. It doesn’t really understand what sort of content you’re interested in or how you like to consume it. It’s not very customizable and it’s not very flexible. Agentic memory is a good start but it’s still very limited.
In this respect Beacon is adjacent to the knowledge-management problem: AI enables us to both create and process orders of magnitude more knowledge than ever before, but for now there’s no universally good way to store, organize, or retrieve that unstructured information, including information about our preferences. See, for instance, Karpathy’s LLM Wiki idea. What if we could go a step beyond a knowledge base and build an engine that could autonomously surface relevant information as and when we need it?
Interacting with AI agents should feel more like chatting with a knowledgeable, intelligent friend: someone who not only knows things but also knows you and knows what to surface, when. A friend knows what’s socially or professionally relevant at any given time or in any given place. That’s a harder, more nuanced problem than it sounds—social intelligence is harder than the sort of raw problem-solving capabilities AI exhibits today—but in Beacon I’ve found that the best AI models today are actually pretty good at it, if you prompt them correctly and give them access to the right tools and context.
Overall, in their present form, AI tools are truly remarkable. They significantly shrink the gap between “motivated everyday person” and professional. They allow us to attempt—and, in many cases, achieve—bold things we would never have imagined doing even just a few months ago. In this respect, they’ve already changed the world for the better.
But, in my experience, every win still depends heavily on taste, judgment, and tight human steering. If we’re going to use these tools effectively and to their full potential, we need to understand this limitation, because it contains the key to working with AI productively.
