{"id":817,"date":"2011-10-05T08:45:42","date_gmt":"2011-10-05T14:45:42","guid":{"rendered":"http:\/\/www.jeffwofford.com\/?p=817"},"modified":"2022-08-19T10:58:30","modified_gmt":"2022-08-19T16:58:30","slug":"how-siri-works","status":"publish","type":"post","link":"https:\/\/www.jeffwofford.com\/wp\/?p=817","title":{"rendered":"How Siri Works"},"content":{"rendered":"<p><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" class=\"alignright size-full wp-image-828\" title=\"Apple Siri\" src=\"https:\/\/i0.wp.com\/www.jeffwofford.com\/wp\/wp-content\/uploads\/2011\/10\/apple-siri-logo1.png?resize=125%2C149\" alt=\"\" width=\"125\" height=\"149\">Once again someone has offered us incredible artificial intelligence, and once again we are bracing for disappointment. It happened with handwriting recognition on the Newton, which proved to be slow and clumsy. It happened with the not-as-smart-as-they-first-appeared creatures of Lionhead&#8217;s&nbsp;<em><a title=\"Lionhead Studios: Black and White\" href=\"http:\/\/lionhead.com\/Games\/BW\/\">Black and White<\/a><\/em>. And remember the <a title=\"YouTube: Kinect Debut\" href=\"http:\/\/www.youtube.com\/watch?v=Mf44bWQr3jc\">Kinect debut video<\/a> showing a kid interacting with an on-screen villain effortlessly, the AI character perfectly intoning the kid&#8217;s name? Kinect brought some of the innovations promised in that early teaser, but clearly the video implied a level of sophistication and polish that turned to vapor in the end.<\/p>\n<p>But it&#8217;s Apple this time, with Siri on the iPhone 4S. 
And although Apple has screwed up before\u2014witness the aforementioned Newton\u2014if anyone has the motivation, the resources, and the smarts to get AI right, the iPhone dev team is it.<\/p>\n<p><!--more-->Having programmed&nbsp;and taught artificial intelligence in video games for almost twenty years, I am deeply skeptical\u2014you might almost say cynical\u2014about claims to offer a truly useful and usable intelligent agent. Ordinary people\u2014those who don&#8217;t study AI\u2014have big hopes (and fears) about AI, and marketers prey on these fantasies. In reality AI is, on the whole, a hoax. Virtually everything we call &#8220;AI&#8221; today is either a theatrical display of essentially scripted behavior (that&#8217;s how most&nbsp;game AI works), a massive database (such as Google Suggestions and expert systems) or a vague and decidedly unintelligent jumble of neural networks and genetic algorithms.&nbsp;So-called &#8220;artificially intelligent&#8221; programs are generally either too limited or too clumsy to be useful in helping ordinary people do ordinary tasks. So will Siri be different?<\/p>\n<p>Despite my skepticism, I actually think the answer is &#8220;yes.&#8221; I think Siri will do more or less what Apple <a title=\"Apple October 4, 2011 Special Event Video\" href=\"http:\/\/events.apple.com.edgesuite.net\/11piuhbvdlbkvoih10\/event\/index.html\">promised yesterday<\/a>.<\/p>\n<p>The reason it will work is that it actually has fairly modest ambitions\u2014more modest than they first appear.<\/p>\n<p>Take a close look at the <a title=\"Siri\" href=\"http:\/\/www.apple.com\/iphone\/features\/siri.html\">Siri<\/a> site. What exactly can you ask Siri to do? 
Apple gives you a list:<\/p>\n<ul>\n<li>Ask for a reminder.<\/li>\n<li>Ask to send a text.<\/li>\n<li>Ask about the weather.<\/li>\n<li>Ask for information (from Yelp, Wolfram|Alpha, or Wikipedia).<\/li>\n<li>Ask to set a meeting.<\/li>\n<li>Ask to send an email.<\/li>\n<li>Ask for a number.<\/li>\n<li>Ask to set an alarm.<\/li>\n<li>Ask for directions.<\/li>\n<li>Ask about stocks.<\/li>\n<li>Ask to set the timer.<\/li>\n<li>Ask Siri about Siri.<\/li>\n<\/ul>\n<p>The last item simply gets Siri to repeat this very list.<\/p>\n<p>Now if you consider the list closely, what you&#8217;ll notice is that it is not as open-ended as it first appears. Siri can&#8217;t understand just&nbsp;<em>anything.<\/em> It can do a certain set of key tasks. In a nutshell:<\/p>\n<ul>\n<li>Interact with the calendar.<\/li>\n<li>Search contacts.<\/li>\n<li>Read and write messages (text and email).<\/li>\n<li>Interact with the Maps app and location services.<\/li>\n<li>Forward search phrases to certain pre-defined data providers (Yahoo! Weather, Yahoo! Finance, Yelp, Wolfram|Alpha, or Wikipedia).<\/li>\n<\/ul>\n<p>This is still an impressive and\u2014most importantly\u2014wildly <em>useful<\/em> set of functions. But it is a limited, focused set. And that&#8217;s what makes me think Siri&#8217;s &#8220;AI&#8221; may actually work.<\/p>\n<p>Looking at it from a programmer&#8217;s perspective, it seems to me that Siri consists of three layers: a speech-to-text analyzer, a grammar analyzer, and a set of service providers. If all three of these work well, then Siri will be fun and helpful. If one of them is as troubled as traditional intelligent agents have tended to be, then Siri will go the same way those other agents went\u2014tumbling into the trash heap of misguided innovations.<\/p>\n<p>A speech-to-text analyzer is a piece of software that takes audio and turns it into text. Simple as that. 
Except it&#8217;s not so simple\u2014systems like <a title=\"Dragon Speech Recognition\" href=\"http:\/\/nuance.com\/dragon\/index.htm\">Dragon<\/a> have been refining this process for years. It&#8217;s really hard to get right, and I&#8217;ve never seen an analyzer that didn&#8217;t jumble a significant portion of what I say. (If you&#8217;ve got a Mac, you can experience the joy of being constantly misunderstood by a computer by playing with your &#8220;Speech Recognition&#8221; settings. Try a game of chess using nothing but speech. It&#8217;ll miss your move as often as not.)<\/p>\n<p>Siri, however, has a much easier job than Dragon or your Mac&#8217;s Speech Recognition facility. And that, again, is because <em>its job is limited and focused.<\/em> It doesn&#8217;t have to understand just anything you might say. It only has to understand words and sentences that pertain to appointments, contacts, messages, and maps. This makes it easier for Siri to pick out what you&#8217;re saying, because there are only so many things that you&#8217;re allowed to talk about.<\/p>\n<p>Another advantage is physical. A phone has a much better chance of hearing your voice up-close than a computer does. Phone microphone technology already incorporates a degree of noise cancellation. So your phone is more likely to be able to hear you clearly, even in the midst of noise, than your computer is.<\/p>\n<p>Despite these advantages, Siri is likely to misunderstand much more than it seemed to during yesterday&#8217;s Apple presentation. Did you notice how carefully Scott Forstall asked, &#8220;What Is The Weather Like Today?&#8221; Each word clearly articulated. Contractions fastidiously avoided. Reading from a script. Siri understood him well, but note that this was in a quiet room\u2014no TV going in the background, no car humming, no coworkers laughing, no kids arguing. 
I think it&#8217;s possible that Siri&#8217;s voice recognition could learn to understand my voice pretty darned reliably even under those conditions. But I wouldn&#8217;t be surprised if it often gets me wrong, sometimes with disastrous results. Just think how much fun it will be when I say, &#8220;Send a text to Andrea that says &#8216;I love you,'&#8221; and Siri hears, &#8220;Send a text to Andrew that says &#8216;I love you.'&#8221; I look forward to seeing how reliable it really is.<\/p>\n<p>The job of the speech-to-text analyzer is to turn your voice into written text. Text on its own, however, is just a jumble of letters to a computer. An additional piece of software is needed to turn the text into something useful. Siri needs to recognize that the string &#8220;send a message&#8230;&#8221; maps to the action of creating a new text message. It needs to understand that the phrase &#8220;my son&#8221; refers to the contact &#8220;Liam Wofford.&#8221; It needs to connect the word &#8220;here&#8221; with your current GPS position. This complex mapping of strings to functions is the job of a lexical and grammatical analyzer.<\/p>\n<p>This is a tough job. In the &#8217;80s there was a game company called <a title=\"Infocom - History\" href=\"http:\/\/www.infocom-if.org\/company\/company.html\">Infocom<\/a> that dramatically raised the bar on how computers understand text. Before Infocom, text-based games could only understand two-word phrases. &#8220;Hit ball.&#8221; &#8220;Eat mushroom.&#8221; Infocom gave their games the ability to understand whole sentences, complete with nouns, verbs, objects\u2014even prepositional phrases. You could tell the game, &#8220;Hit the ball with the wooden bat,&#8221; and it would reply, &#8220;You swing with all your might and knock the ball out of the park!&#8221; It was amazing, and it made for some terrific games.<\/p>\n<p>Siri has taken that kind of grammatical analysis to a new level. 
But despite the gap of almost thirty years, Siri is inches\u2014not light-years\u2014beyond <em><a title=\"Infocom's Zork - playable online\" href=\"http:\/\/thcnet.net\/zork\/\">Zork<\/a><\/em>. Grammatical analysis still comes down to searching a string for certain key phrases and using those phrases to build up a simple model of what the user wants to do and what he or she wants to do it to. Again, Siri&#8217;s limited focus on appointments, contacts, messages, and maps makes this technically viable.<\/p>\n<p>What makes Siri&#8217;s grammatical analysis impressive is its integration with other aspects of the phone. One of the most exciting parts of the demonstration was when Scott Forstall told Siri (at 79:45 in the <a title=\"Apple - Siri Demo\" href=\"http:\/\/events.apple.com.edgesuite.net\/11piuhbvdlbkvoih10\/event\/index.html\">linked video<\/a>), &#8220;Remind me to call my wife when I leave work.&#8221; Along with understanding that &#8220;leave work&#8221; means moving outside a defined GPS area, Siri had to know that &#8220;my wife&#8221; mapped to Scott&#8217;s wife\u2014an entry in his Contacts list.<\/p>\n<p>But how did Siri learn who Scott&#8217;s wife was? The demo didn&#8217;t show us, but I have a suspicion about how it works.<\/p>\n<p>The Mac Address Book has long had an entry for setting up relationships between contacts. I can indicate who my spouse is in Address Book. I suspect that the iPhone Contacts app will gain similar new fields in iOS 5. Siri will use this information to create the mapping between the phrases &#8220;my husband,&#8221; &#8220;my wife,&#8221; and &#8220;my spouse&#8221; and the person whom you&#8217;ve identified as your spouse. This mapping will no doubt be mechanical, not &#8220;insightful.&#8221; Siri won&#8217;t understand who your spouse is\u2014it&#8217;ll just record a string-to-Contact mapping. For example, you might be able to say &#8220;my husband&#8221; and have Siri find your wife. 
As far as I know, Address Book doesn&#8217;t keep information about the sex of each person, so Siri will probably treat all &#8220;spouse&#8221; words as identical. Let&#8217;s try it when it comes out.<\/p>\n<p>Will Siri be able to recognize the phrase, &#8220;my boyfriend&#8221; or &#8220;my girlfriend&#8221;? Perhaps. What about arbitrary terms of endearment, like &#8220;my pookums&#8221; or &#8220;honeybuns&#8221;? Again, it&#8217;s quite possible. Address Book has an option for &#8220;Custom&#8230;&#8221; in the relationship field. You can add a custom label &#8220;pookums&#8221; and indicate your spouse or girlfriend or dog or whatever there. Now if Siri hears you say &#8220;pookums,&#8221; Siri can recognize that contact.<\/p>\n<p>What I hope you&#8217;re seeing is that what Siri does isn&#8217;t science fiction and it certainly isn&#8217;t magic. It is the old and still-developing technology of speech-to-text analysis and the old and fairly mature technology of simple grammatical analysis and string matching.<\/p>\n<p>And then there&#8217;s the third component, which is the set of services that Siri can send your commands to. This is the most modest and familiar part of the system. You already have a calendar app and you can press buttons to view and create appointments. Siri will push those buttons for you, in essence. You already have a maps app and you can search and find directions there. Siri will enter your search text for you, and can toggle traffic on and off by voice rather than by button. You already have Wikipedia and you can type search terms into it. Now Siri can type your search terms for you.<\/p>\n<p>At this level, Siri isn&#8217;t doing anything you can&#8217;t already do. It&#8217;s just doing it hands free, by voice. 
This, clearly, is the big benefit of Siri, even if it&#8217;s not the most technically interesting part of the system.<\/p>\n<p>Whether Siri is successful will depend fundamentally on the quality of its speech-to-text analyzer. If it can understand me, it will work. The grammatical analysis and service-providing parts of the system are relatively modest in terms of technical difficulty and I suspect Apple has these in hand. I don&#8217;t want to trivialize these technologies\u2014judging from the demo, Apple has done its&nbsp;usual, remarkable job of building a slick and natural-feeling user experience, and that takes tremendous skill and effort. But whether Siri becomes the model for how humans interact with computers in the future or whether it gets laughed off the stage of technical innovation like so many AI systems that have come before hinges on whether it can tell the difference between &#8220;Andrew&#8221; and &#8220;Andrea&#8221;\u2014especially when I&#8217;m in a crowded coffee shop, speaking with a Southern drawl, with a stuffed-up nose from a bad cold.<\/p>\n<p>I hope it does work. I&#8217;ve wanted this functionality for years\u2014decades. I want it in my house and my car as well, but if I can get it on my phone the rest will follow.<\/p>\n<p>Apple also deserves credit for doing some delightful things to amplify the &#8220;theatrics&#8221; of Siri&#8217;s AI. By that I mean that they&#8217;ve cooked up little alternative phrases and responses that make Siri seem smarter than it (she?) is. For instance, she&#8217;ll say, &#8220;Let me check on that,&#8221; or &#8220;Let me think,&#8221; when a traditional computer would spin a spinner or just say &#8220;Loading&#8230;&#8221; In another demo video (see below) a woman asks whether it will be chilly in Napa Valley. (Actually she asks first about San Francisco, then changes the location to Napa Valley without having to repeat the question. Nice.) 
Siri replies, &#8220;Doesn&#8217;t seem like it.&#8221; That&#8217;s a very nice alias for &#8220;No.&#8221; It doesn&#8217;t take any more &#8220;smarts&#8221; to say &#8220;Doesn&#8217;t seem like it&#8221; than &#8220;No,&#8221; but it sounds a lot smarter. More natural. That&#8217;s what I mean by &#8220;theatrics&#8221;\u2014making the computer seem smarter by changing the way it expresses output. Again, I don&#8217;t want to trivialize what Apple has done\u2014theatrics are important and getting them right is non-trivial. But it&#8217;s important to keep a realistic view of how intelligent Siri really is.<\/p>\n<p>Here&#8217;s Apple&#8217;s Siri demo trailer.<\/p>\n<p>http:\/\/www.youtube.com\/watch?v=rNsrl86inpo<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Once again someone has offered us incredible artificial intelligence, and once again we are bracing for disappointment. It happened with handwriting recognition on the Newton, which proved to be slow and clumsy. It happened with the not-as-smart-as-they-first-appeared creatures of Lionhead&#8217;s&nbsp;Black and White. 
And remember the Kinect debut video showing a kid interacting with an on-screen &hellip; <\/p>\n<p class=\"link-more\"><a href=\"https:\/\/www.jeffwofford.com\/wp\/?p=817\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;How Siri Works&#8221;<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[15,7],"tags":[],"class_list":["post-817","post","type-post","status-publish","format-standard","hentry","category-programming","category-technology"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.jeffwofford.com\/wp\/index.php?rest_route=\/wp\/v2\/posts\/817","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.jeffwofford.com\/wp\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.jeffwofford.com\/wp\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.jeffwofford.com\/wp\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.jeffwofford.com\/wp\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=817"}],"version-history":[{"count":16,"href":"https:\/\/www.jeffwofford.com\/wp\/index.php?rest_route=\/wp\/v2\/posts\/817\/revisions"}],"predecessor-version":[{"id":2082,"href":"https:\/\/www.jeffwofford.com\/wp\/index.php?rest_route=\/wp\/v2\/posts\/817\/revisions\/2082"}],"wp:attachment":[{"href":"https:\/\/www.jeffwofford.com\/wp\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=817"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.jeffwofford.com\/wp\/i
ndex.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=817"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.jeffwofford.com\/wp\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=817"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}