What’s a bot OS?

First, some examples of bots

WeChat is a chat platform from Chinese tech titan Tencent. It has hundreds of millions of users, and people rely on it for banking, transportation, and more. Because of its popularity, many chat pundits say, “WeChat is years ahead of us because they have bots.” But if you bother to install and use it, those “bots” turn out to be just micro-applications.

[Image caption: Nope, that chat looks like your old-fashioned app. Sorry.]
[Image caption: Invite a bot to your next call and confuse everyone!]
[Image caption: Disney’s still trying to sort out the copyright on this.]
[Image caption: I suck at chess.]
[Image caption: This bot makes as much sense as most tech pundits.]

Why the bot hype?

The chatbot hype machine is going full force, and that means a lot of confusion and lofty valuations. But I think there are two big reasons bots are gaining traction: they bypass the barriers to entry and “app exhaustion” that limit the growth of traditional apps, and they let developers test and push code constantly.

Bots bypass garden walls

That barrier to entry is important. All of these host environments—Skype chat, Facebook Messenger, WeChat, WhatsApp, Slack, Twitter, and so on—have different capabilities. A Twitter bot can only interact within the constraints of a default tweet (text and URLs), but that also means the barrier to entry is tiny. Anyone can use the Twitter API to make a chatbot.

Bots can be updated constantly

In the software industry, the old-fashioned approach of installing software on a desktop has largely been replaced by hosted Software-as-a-Service, paid for by the month or by the seat. Microsoft Office and the Adobe Suite are now sold as online services.

Another swing of the pendulum

There’s a pendulum in computing: Mainframes centralized it; client-server computing pushed it to the edge; the web centralized it; the app pushed it to the edge; and now bots are centralizing it once again.

A taxonomy of bots

Clearly, there are different types of bots, and a conversation about “bots” that doesn’t recognize this is too generalized to be useful:

  • If you want to display a rich UI like the one in the WeChat example, everyone in the chatroom needs to support it.
  • If one person’s client can’t, it has to fall back to some default mode.
  • I can only play chess with Ben because Messenger supports images.
  • If you’re speaking out loud while driving a car, you can’t read and tap a touchscreen.
[Image caption: What senses you use, what the UX is like, and how it instantiates.]
[Image caption: Here’s how you launch an app that winds up in chat (like Lucky Money). Doesn’t look very chatty.]
[Image caption: That seems like an awfully limited summary.]
[Image caption: These bots are really overrated.]

Chat platform becomes operating system

That means the chat host environment starts to look a lot like an operating system, launching bots, managing what they can do, killing them if they get out of hand, and allowing context switching between them. Which brings me to the point of all this: What does a Chat OS look like?

  • If I have a favorite food bot (say a personal life coach — sidenote, life coach bots are going to be a killer app, IMHO) then it will get the request.
  • If I’m using an ad-backed model, bots will bid for the right to suggest some food (“how about Thai?”) Yes, I can hear many of you cringing.
  • In a social model there will be some negotiation between the participants in a group chat (“does everyone have Uber? No? What about Lyft?”) Nothing like peer pressure to encourage mass installs.
  • What data does the bot know, that the OS is allowing it to have? The bot might want to know things like the user’s location; or the permissions of others in the chat room; or everyone’s names for a reservation, or payment information. This is like a transient OAuth, federating permissions between the host and the bot for the purpose and duration of the interaction.
  • What format can the bot use? If it’s a chatroom that permits rich HTML5 micro-apps then use that. But there may be constraints: Perhaps not everyone has the most recent version; or maybe someone is participating while driving, using voice. So the bot may have to fall back to a less-engaging, less-efficient, lowest-common-denominator level of interaction for some or all users.
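
To make those last two questions a little more concrete, here is a minimal Python sketch of how a chat host might answer them. Everything in it is an assumption for illustration: the format names, the `Participant` and `Grant` types, and the scope strings are hypothetical and don't correspond to any real platform's API. The first half picks the richest format everyone in the room supports, falling back per user when there is no common ground; the second half models a permission grant that is only valid for one conversation and expires on its own.

```python
from dataclasses import dataclass

# Hypothetical format names, listed richest first. Not taken from any real platform.
PREFERENCE = ["html5", "image", "text", "voice"]


@dataclass
class Participant:
    name: str
    formats: set  # formats this person's client (or situation) can handle


def pick_format(participants, preference=PREFERENCE):
    """Return the richest format that every participant supports, or None."""
    for fmt in preference:
        if all(fmt in p.formats for p in participants):
            return fmt
    return None


def pick_format_per_user(participants, preference=PREFERENCE):
    """Fallback for mixed rooms: the richest format each individual can handle."""
    return {
        p.name: next((f for f in preference if f in p.formats), None)
        for p in participants
    }


@dataclass(frozen=True)
class Grant:
    """A transient permission: scoped to one bot, one conversation, one time window."""
    bot: str
    scopes: frozenset
    conversation_id: str
    expires_at: float  # seconds, on whatever clock the host uses

    def allows(self, scope, conversation_id, now):
        return (
            scope in self.scopes
            and conversation_id == self.conversation_id
            and now < self.expires_at
        )


room = [
    Participant("Alistair", {"html5", "image", "text", "voice"}),
    Participant("Ben", {"image", "text"}),
]
shared = pick_format(room)  # richest format both clients support

grant = Grant(
    bot="Rideshare",
    scopes=frozenset({"location", "participant_names"}),
    conversation_id="chat-42",
    expires_at=1000.0,
)
can_see_location = grant.allows("location", "chat-42", now=999.0)
```

Passing `now` in explicitly, rather than reading a clock inside `allows`, is just a choice to keep the sketch easy to test. A real host would presumably also need revocation and per-participant consent, which this leaves out.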

A concrete example

Here’s a hypothetical use case of this in action.

  1. I add three transportation bots to my host OS: Uber, Lyft, and a fictional one called Rideshare. When I do so, I grant these bots permission to offer to help with transportation. They’re then registered with the host OS (the chat platform), which will notify them of relevant messages to allow them to infer what’s going on and interrupt when they think it’s useful to do so. To do their jobs, the various ridesharing bots need to know the pickup location; number of passengers; and who will be paying. It might also be nice to know the destination.
  2. A few days later, I’m at work, chatting with two other people, all at the same location, and someone says, “let’s go to the party at Mike’s.”
  3. The OS recognizes “let’s go to” as a transportation construct.
  4. The OS notifies the registered transportation bots—Rideshare, Lyft, and Uber, in my case—and manages some kind of bidding process for the “best” bot, based on factors like price, past use, distance to be travelled, climate (bike, walk, or car) and so on. This is the equivalent of paid ads in search results for chat, and I would bet good money on it being a competitive part of the bot ecosystem, with affiliate payments for services subsidizing personal “agents.”
  5. The selected bot looks at what information it already has: It knows my location from location services, shared by the host OS; and the number of passengers it can assume from the people in the chat thread.
  6. The selected bot also looks at what else it needs to know. This may be disambiguation: it knows that everyone in the chatroom has a shared contact named Mike, but also that there is a bar called Mike’s that several people in the group have been to before.
  7. The bot might use a more advanced visual interface if all users can support it, to confirm the information. We haven’t really explored multi-user social interfaces like this yet—Rideshare might show a map and let everyone touch where they want to go, and hold some kind of voting mini-game like “where should we go next?”
  8. If it must use plaintext, then it will start a conversation to acquire the information it needs or to disambiguate and confirm things. This is where conversational nuance comes in, with the bot chiming in: “Hey, everyone, this is Alistair’s Rideshare bot. When you say Mike’s did you mean Mike Smith, or the bar Mike’s on Main Street?”
  9. Once it has all of the information needed, it will take action, possibly with a confirmation step and a payment step.
  10. After the transaction takes place, it may provide additional information (“The ride is here!”; “The driver wants to know where you are!”; and even “Alistair’s Rideshare rating has now dropped by one star.”)
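
Steps 3 through 6 above can be sketched in a few lines of Python. This is a toy under stated assumptions, not a description of how any chat platform works: the phrase table, the `Bot` type, the single numeric `bid` (standing in for price, past use, distance, and so on), and the slot names are all hypothetical. The host matches a phrase to an intent, picks the highest-bidding registered bot for that intent, and the bot works out what it still has to ask.

```python
from dataclasses import dataclass

# Hypothetical phrase-to-intent table; a real host would need something far richer.
INTENT_PATTERNS = {"let's go to": "transportation"}

# What a ridesharing bot needs before it can act, per step 1 of the example.
REQUIRED_SLOTS = ("pickup", "passengers", "payer")


@dataclass
class Bot:
    name: str
    intent: str
    bid: float  # hypothetical score the host uses to rank competing bots


def detect_intent(message):
    """Return the intent for the first matching phrase, or None."""
    text = message.lower().replace("’", "'")  # treat curly and straight apostrophes alike
    for phrase, intent in INTENT_PATTERNS.items():
        if phrase in text:
            return intent
    return None


def select_bot(bots, intent):
    """Pick the highest-bidding bot registered for this intent, or None."""
    candidates = [b for b in bots if b.intent == intent]
    return max(candidates, key=lambda b: b.bid, default=None)


def missing_slots(known):
    """List the required facts the bot still has to ask the humans for."""
    return [slot for slot in REQUIRED_SLOTS if known.get(slot) is None]


registered = [
    Bot("Uber", "transportation", 0.6),
    Bot("Lyft", "transportation", 0.7),
    Bot("Rideshare", "transportation", 0.9),
]

intent = detect_intent("Let’s go to the party at Mike’s")
winner = select_bot(registered, intent)

# The host shares location; passenger count comes from the chat thread.
to_ask = missing_slots({"pickup": "office", "passengers": 3})
```

Whatever ends up in `to_ask` is what drives the conversation in step 8: the bot only speaks up about what it can't already infer. Disambiguating the destination (Mike Smith versus the bar on Main Street) would be a separate step that this sketch doesn't attempt.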

The emergence of a chat OS

Today, the bot world is the Wild West. The Facebook chess bot looks like an old MS-DOS game, complete with text commands. There isn’t even the equivalent of top-level domains and DNS for bots; instead, we’ve got dozens of directories reminiscent of early-day Yahoo directories.



Writer, speaker, accelerant. Intersection of tech & society. Strata, Startupfest, Bitnorth, FWD50. Lean Analytics, Tilt the Windmill, HBS, Just Evil Enough.

Alistair Croll
