These days, it can feel like AI is everywhere, taking over the world and capable of almost anything. With countless AI startups, projects, and products flooding the market, it’s easy to get overwhelmed. As an open-minded yet inherently skeptical tech enthusiast, I’ve tested many major AI products, including ChatGPT and Copilot, and I’ve been reflecting lately on the current state of AI. While these tools have their merits, I’ve found that AI, for all its power and potential, still falls short in several frustrating ways. Here are three basic yet crucial tasks that today’s AI tools still seem incapable of handling.
Thing #1: Agent 🎧
AI's most obvious potential is in acting as a semi-autonomous agent, working on our behalf and keeping our interests in mind. Ideally, such an agent would only need occasional feedback or intervention when stuck or when human input would make a significant difference. Imagine an AI that handles mundane tasks, freeing up our time for more meaningful activities. I suspect that, assuming AI tools are mature enough and that concerns about things like privacy can be adequately addressed, most people would be happy to have such an agent working on their behalf and many would probably pay money for such a tool. This is the most obvious, desirable thing that AI tools aren’t yet capable of doing.
I’m intentionally being nonspecific here in using the word agent because there are many potential use cases, but the holy grail is removing drudgery that’s difficult or expensive to outsource to another person. My life is full of things like this and I’ll bet yours is, too. Planning travel is the most obvious. I don’t trust another person to do this for me because I’m very particular about my travel, I have certain routines and preferences that would be very difficult to explain, and I’ve gotten into trouble by outsourcing this task before.
Other things a personal agent should do are managing my calendar and planning meetings, touching base with friends, planning simple dinners and get-togethers with family and friends, suggesting or curating content such as must-read articles, podcasts, and audiobooks, automatically replying to some kinds of emails, etc. If such a tool worked pretty well I could see it giving me back one or two hours a day, which would be insanely valuable, and I’d pay well for such a tool, as I’m sure would millions of others.
It’s theoretically possible to train a person to do many or most of these tasks. There’s no shortage of virtual assistant startups. But I haven’t done this for several reasons. One is that, as with travel, I’m very particular about the way these things are done and I really don’t trust another person with most of them. (The thought of a virtual assistant pretending to be me and drafting emails to friends, family, colleagues and counterparts from my email account is truly terrifying.) The reality is that it would take longer and cost more to train someone to do even a subset of these things well than it would for me to do them myself. And there’s quite a breadth of tasks, so I’m skeptical that one person could handle all of them well.
In theory, there's no reason an AI agent couldn't handle these tasks. While we seem close to this reality, I have yet to encounter an AI tool that comes close to fulfilling these needs. There are a number of startups with “hybrid” products (i.e., AI-assisted with humans in the loop) that have appeared recently that promise to remove this sort of drudgery but the reviews are mixed and I remain deeply skeptical. This sort of application is a great litmus test for how far AI tools have come, how far they still have to go, and for the definition of intelligence.
There are a few reasons tools like this don’t yet exist but a big limitation is access to data. To be effective this tool would need—in addition to knowing me and my preferences—to be able to perform a web search, review the results, filter out useless information, and then summarize and present the results well. It would need to be able to transact, too, booking tickets and sending emails and text messages and then sending me a useful summary and list of action items. Tools like ChatGPT are just beginning to gain the first ability but I don’t think the second is around the corner just yet. Again, I don’t see a fundamental reason this can’t happen, but I think people are afraid to give AI tools the “keys to the kingdom” in the sense of letting them transact on one’s behalf due to issues like hallucination (and, you know, fear of AI takeover). It’s an interesting, revealing question—what would have to happen for you to feel comfortable giving an AI agent access to your bank account, credit card, email or phone contacts?
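To make the shape of such a tool concrete, here is a minimal, purely hypothetical sketch of the search-filter-summarize-transact loop described above, with a human-approval gate on anything irreversible. Every function name and data shape here is an invented assumption for illustration, not a real agent framework or API:

```python
# Hypothetical sketch of the agent pipeline described above:
# search -> filter -> summarize -> (human-approved) transact.
# All function names and data shapes are invented for illustration.

from dataclasses import dataclass

@dataclass
class Action:
    description: str   # e.g. "Book flight UA123"
    reversible: bool   # irreversible actions need explicit approval

def search(query: str) -> list[str]:
    # Stand-in for a real web search; returns raw results.
    return [f"result about {query}", "unrelated spam"]

def filter_results(results: list[str], interests: list[str]) -> list[str]:
    # Keep only results matching the user's stated interests.
    return [r for r in results if any(i in r for i in interests)]

def summarize(results: list[str]) -> str:
    # A real agent would call an LLM here; we just join the results.
    return "; ".join(results)

def plan_actions(summary: str) -> list[Action]:
    # Turn the summary into proposed follow-up actions.
    return [Action(description=f"Draft email about: {summary}", reversible=True)]

def run_agent(query: str, interests: list[str], approve) -> list[str]:
    results = filter_results(search(query), interests)
    summary = summarize(results)
    executed = []
    for action in plan_actions(summary):
        # The "keys to the kingdom" gate: never transact without
        # human sign-off on anything irreversible.
        if action.reversible or approve(action):
            executed.append(action.description)
    return executed

done = run_agent("flights to Lisbon", ["flights"], approve=lambda a: False)
print(done)
```

The interesting design question is exactly where that `approve` gate sits: too strict and the agent saves you no time, too loose and it spends your money on a hallucination.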
The other obvious issue here is interoperability and privacy. To be effective this tool would also need access to index, search, and indeed to be trained on basically all of my content across many apps, sites, and services. I have no interest in giving an agent running on someone else’s server access to these data, and I’m not yet able to run something like this locally on my own device. And even if I did want to give it access to these data, they live siloed in many walled gardens controlled by many different companies that don’t have APIs and won’t make integration easy. In many ways these two limitations are the Achilles heel of centralized AI agents and the asymmetry that might let decentralized, local “small” AI agents win over their “large” cloud-based peers.
Thing #2: Content Curation 🎨
I’ve written a number of times about the Cambrian explosion in content that we’re living through. Despite how overwhelming it feels now, we’re likely just at the beginning of this content boom. It started a generation ago, driven by the rise of Web2 publishing tools like blogs and social media. And it’s about to get much crazier as AI tools are more widely adopted for content generation.
To say that curation tools have failed to keep up with this explosion is an understatement. We used to rely on relatively simplistic tools like newspapers, magazines, movie theaters, record stores, book reviews and book clubs to curate content for us, but those tools are totally outmoded (and have been for years). And it shows: traditional media are in freefall, even the most popular newspapers and magazines of yesteryear are failing left, right, and sideways, and the few remaining players aren’t doing a very good job, either. To be sure there are still a few good human curators around—these days you can mostly find them on places like Substack—but as the breadth and volume of content continues to grow they’ll have no choice but to focus more and to become more and more niche relative to the broad universe of available content.
This seems like an obvious place for mature AI tools to step in and fill the gap! The first attempt to do something like this was probably Google News which launched a remarkable 22 years ago. It still exists but like most Google products it hasn’t evolved or kept up with the times. It merely aggregates headlines with no context or customization beyond one’s geography. In particular, it can’t perform analysis or deliver summaries.
What I’m looking for isn’t Google News. I want an AI-powered news agent that delivers exactly the news I care about, and only the news I care about, in bite-size format. I don’t want a daily newsletter; I get a dozen of those every day and never read them, mostly because I don’t like content being pushed to me (as opposed to my accessing content when I like) and because their content is almost never relevant. I don’t want an app run by a media organization like CNN, Fox News, or NYT constantly sending me biased ads in the form of push notifications. I certainly don’t want a TV station or newspaper full of content I don’t care about.
This problem doesn't seem insurmountable, does it? It feels like the technology is already here. Surprisingly, though, as far as I know, such a tool doesn't exist yet, despite some experimental attempts.
Why not? I don’t know for sure but my best guess is that the companies that own the content in question are loath to license it to AI companies and projects, and that no workable business model has yet been found (Google News also got in trouble for similar reasons). It reminds me a lot of the early days of MP3s and Napster when the music industry completely failed to understand the rise of digital and clung to their old, broken business model as long as they could. If no workable business model and product emerges then “pirate AI” projects should capitalize on the situation just as Napster did—in other words, the market should force a resolution.
ChatGPT can do many things and it’s gotten more powerful and flexible over time. It may not be the best tool for any specific task but it does a lot of things reasonably well. It does appear capable of performing real-time web searches and providing up-to-the-minute information. I imagine that, with a bit of tweaking and training, it could probably perform this task reasonably well. The hard part here is learning someone’s preferences—the topics they do and don’t care about, the format, sources, length, tone of voice, timing, etc. they prefer. (News apps and aggregators including Google News have been trying to learn preferences for many years and for some reason they’ve never managed to do this well.) ChatGPT isn’t capable of this yet; its context window isn’t big enough.
With respect to content curation news is just the most obvious type of content, and might be the first domino to fall. I’d love to see AI curators for long form content, art, music, apps and projects, products, people I should meet, etc. I’m sure all of this is coming. But I’m shocked that none of it is possible yet today. I’d trade three dozen AI profile photo generators for one of these services!
Thing #3: Innovation 💡
It's crucial to recognize the strengths and weaknesses of AI tools. Large Language Models (LLMs), like the one powering ChatGPT, excel at imitation but struggle with true originality. They predict text based on patterns in their training data, making them great at mimicking but not at generating genuinely novel ideas.
An LLM is quite literally an imitation machine: it predicts the next token on the basis of the prompt, some context window, and the previous tokens it has output, nothing more and nothing less. I don’t mean to downplay the power of this idea and this technology—some useful, surprising, even startling behaviors emerge, and we don’t fully understand how LLMs are capable of all the things they can do. They’re certainly capable of mixing ideas and concepts from multiple fields and disciplines and presenting them in novel, creative ways, and this can be the basis for some types of innovation. But in my opinion they’re not yet capable of true or radical innovation, i.e., of generating truly novel ideas from scratch.
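That “imitation machine” loop can be made concrete with a toy sketch. The probability table below is invented for illustration; a real LLM computes these distributions with a neural network conditioned on the whole context window, but the generation loop has the same shape:

```python
# Toy sketch of autoregressive next-token prediction.
# The bigram probability table is invented for illustration;
# a real LLM computes a distribution over its whole vocabulary
# with a neural network, conditioned on the full context.

def next_token(context: tuple[str, ...]) -> str:
    # Greedy decoding: pick the most likely continuation of the
    # most recent token, given this made-up probability table.
    table = {
        "the": {"cat": 0.6, "dog": 0.4},
        "cat": {"sat": 0.7, "ran": 0.3},
        "sat": {"down": 0.9, "up": 0.1},
    }
    options = table.get(context[-1], {"<end>": 1.0})
    return max(options, key=options.get)

def generate(prompt: str, max_tokens: int = 5) -> list[str]:
    tokens = tuple(prompt.split())
    for _ in range(max_tokens):
        tok = next_token(tokens)
        if tok == "<end>":
            break
        tokens += (tok,)   # each output token becomes part of the context
    return list(tokens)

print(generate("the"))  # → ['the', 'cat', 'sat', 'down']
```

The key point the sketch makes: everything the model emits is a continuation of patterns already in its table, which is why pure imitation is so good at mimicry and so bad at de novo novelty.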
While it seems that AI technology today is nearly capable of the two applications I wrote about above, and while I wouldn’t be surprised to see them emerge soon, as I wrote previously I don’t think AI tools are capable of generating truly original, thoughtful, creative content yet. A good artist copies; a great artist copies, then synthesizes original work on the basis of that copying. LLMs are great at the first but they aren’t yet capable of such truly original synthesis.
But it’s fun to think about what would be possible if they were capable of this, and what may someday be possible when they get there. A great litmus test would be asking an LLM to generate truly novel business ideas, with the goal of building a successful business, making money, even dominating a market.
Don’t get me wrong: I find these tools to be great sparring partners. Ethan Mollick has an excellent term for this, co-intelligence, and his research suggests that AIs perform best and are the most helpful on creative tasks when humans engage in back-and-forth dialog with them. You can suggest personas, ask them questions, ask them to review and rate your own ideas and suggest improvements, etc.
But asking them to perform de novo innovation won’t yield very good results. By definition and by inherent limitation they’ll spit out whimsical yet predictable ideas that other people and other tools would also come up with. You can ask them to reconsider and be more creative, but in response they’re more likely to hallucinate and suggest things that aren’t possible. And this is one class of task where popular techniques like fine-tuning and RAG won’t help at all because by definition there’s no available reference material for something truly novel.
Another obvious example would be to ask an LLM to write a book or even a long form article. It’ll happily comply, but you’ll get something that sounds like a mashup of, yes, every book or long form article you’ve ever read, which isn’t exactly the definition of novelty and innovation. It might do a little bit better if you ask it to architect something, whether an app or a building, but that’s because innovation matters less in this class of design.
De novo innovation is really, really hard. I’ve always found this to be the case in my own life and career. It’s not that hard to be creative within tight bounds, e.g., finding clever ways to do one’s ordinary work tasks more cheaply or more efficiently, finding many ways to word a sentence, or thinking of many uses for an object. But “blue ocean,” blank-slate innovation challenges, like “create a product that redefines a market” or “write a best-selling book or a hit song that changes society,” aren’t so straightforward. I think this is one reason I’ve always been a big-picture thinker and pursued moonshots—because they’re by definition the hardest thing you can do, and I like a challenge.
What will happen when AI tools begin competing with one another along these lines? These tools are obviously getting better by the day and it’s not that hard to imagine that in a few years’ time they may work less like imitation engines and be more capable of true innovation.
But that day isn’t today and I don’t expect it’ll come so soon. Getting there may require nothing less than inventing AGI, and in fact the ability to truly innovate (rather than imitate) might be one of the bases on which we judge something to be AGI. If and when AI achieves this significant breakthrough we might be able to call that AGI and we might be able to turn over the reins to AI entirely!
Until then I’m not planning to quit my day job, and I’m not going to give up on innovation or moonshots.