Siri 2.0 - Apple and generative AI

wrylachlan

Ars Legatus Legionis
12,769
Subscriptor
On the earnings call yesterday Tim Cook revealed what had been widely rumored - that Apple has been putting “tremendous effort” towards generative AI and that they’d release the fruits of those efforts later this year. He gave no detail beyond that but I think it’s safe to assume that those generative AI efforts will feed into a new Siri 2.0.

Just by virtue of being built into the iPhone, with its total ubiquity, a GenAI Siri will quickly become one of the most used GenAIs on the planet. There have also been rumors about Apple backing the money truck up to the New York Times and others to let its GenAI access up-to-the-minute news, which could be a major differentiator.

So will this be another Apple half-assing Siri episode or will this be a meaningfully useful implementation of GenAI? And will it have the sorts of problems that other GenAIs have had with hallucinations and the ability to be manipulated into saying hateful/biased things?
 
  • Like
Reactions: mklein

cateye

Ars Legatus Legionis
11,760
Moderator
I'm just glad they're hinting at some kind of reinvention or reinvigoration of Siri. I don't really have the knowledge to understand whether an LLM is the "right" solution to that problem, but if the market's obsession with conversational AI is what finally gets Apple to have a CTJ about Siri's failings, great.

However, there's something about Cook's statements during the call that feels uncharacteristically reactionary; Apple rarely responds directly to trends, and yet this feels like exactly that. So it makes me wonder how focused the effort will be, or whether it will feel like an LLM bolted onto the same dear old senile Siri we've come to loathe.
 
I don't see a chat-bot (or BS generator) making Siri any less useless to me. Maybe transformer models could help improve speech synthesis or speech-to-text (if they're not already being used), but the problem with Siri remains its unreliability for anything more complicated than setting timers. Generative AI can be useful where you don't really care about the details of the result, such as when generating ad copy, but that won't help me control a computer.
 
I'm quite excited. A locally-processed AI with a focus on data security, interfacing with all that shit stored on my iPhone, unlocked by a simple iOS update right in my pocket? Bring it on, Tim!

I don't think Apple was caught back-footed here. They have been buying AI companies left and right since 2017. But they sure had to speed up their roadmaps ...
 
  • Like
Reactions: Tagbert

Tinlad

Seniorius Lurkius
47
Subscriptor++
And will it have the sorts of problems that other GenAIs have had with hallucinations and the ability to be manipulated into saying hateful/biased things?
After reading the Ars article on BMW's "hallucination-free AI" I wonder if they could go down a similar route with Siri; an LLM backed by Apple-vetted knowledge such that the outputs are somewhat predictable (and aligned with Apple's corporate sensibilities). The promoted comment on the article gives a top-line overview of the technology (Retrieval Augmented Generation).

It would strike me as a solution that's in line with the "freedom within the walled garden" approach that Apple so often offers.
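
For anyone curious what that pattern actually looks like, here's a toy Python sketch of RAG. The `embed` and `complete` functions are placeholder stand-ins, not anything Apple has announced:

```python
# Toy sketch of Retrieval Augmented Generation (RAG). `embed` and
# `complete` are hypothetical placeholders, not Apple's actual stack.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder encoder: maps text to a vector (a real system would
    use a sentence encoder)."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(384)

def complete(prompt: str) -> str:
    """Placeholder LLM call; a real system would query a language model."""
    return "[answer grounded in the provided context]"

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k vetted documents most similar to the query."""
    q = embed(query)
    def score(d: str) -> float:
        v = embed(d)
        return float(v @ q / (np.linalg.norm(v) * np.linalg.norm(q)))
    return sorted(docs, key=score, reverse=True)[:k]

def answer(query: str, vetted_docs: list[str]) -> str:
    """Only retrieved, vetted passages ever reach the model."""
    context = "\n".join(retrieve(query, vetted_docs))
    return complete(f"Answer using ONLY this context. If it isn't there, "
                    f"say you don't know.\n\nContext:\n{context}\n\n"
                    f"Question: {query}")
```

The key property is that the model only ever sees the vetted passages, which is what keeps the outputs predictable (and on-brand for Apple).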
 

Honeybog

Ars Scholae Palatinae
2,075
I'm with educated_foo on this. The best outcome is that, Apple being Apple, this will be a very conservative implementation, but generative AI is still just statistically based BS.

Siri’s current problem is that it fails to recognize intent too often, and even when it does recognize intent, it then has no way of acting on it and fails out of scope.

Generative AI has the opposite problem, in that it never fails on out-of-scope queries (unless specifically programmed to) and will always try to return something, no matter how stupid or incorrect.

Siri’s current problems are 100% programming issues and Apple has had over a decade to address them. They receive logs, they know when Siri fails on a request, and it would have been trivial to have a team dedicated to continually reviewing logs and building in functionality for the most common requests. Generative AI doesn’t fix this, it just covers it up.
 

Honeybog

Ars Scholae Palatinae
2,075
After reading the Ars article on BMW's "hallucination-free AI" I wonder if they could go down a similar route with Siri; an LLM backed by Apple-vetted knowledge such that the outputs are somewhat predictable (and aligned with Apple's corporate sensibilities).

They could, but then the question would be why they haven’t bothered previously. All Siri ever needed to be really effective was for Apple to devote significant resources to creating in-house training data and building out new domains. I think Apple probably will curate their training sets to a much greater degree than others are, but I don’t see them suddenly changing course after so long.
 
  • Like
Reactions: Ashe

wrylachlan

Ars Legatus Legionis
12,769
Subscriptor
Generative models need not be just conversational. Rabbit AI is probably a good example of what to expect from a Siri 2.0: an AI that can take instructions and do things in apps for you.

The balancing act for Apple is that no app maker wants Siri totally disintermediating them from their customers. When you’re using a piece of their functionality they want you to know it so they get credit/mindshare for it. But the optimal user workflow for a GenAI is just to tell it what you want done and have it done. I’ll be interested to see how Apple squares the circle there.
 

ZnU

Ars Legatus Legionis
11,694
They could, but then the question would be why they haven’t bothered previously. All Siri ever needed to be really effective was for Apple to devote significant resources to creating in-house training data and building out new domains.

This is an illusion, I think. None of the pre-LLM assistants are very useful, even the Amazon/Google efforts that seemingly had more resources thrown at them. The whole approach is fundamentally unfit for purpose. Natural language is too open-ended for systems that require manual domain-specific modeling to ever be able to respond to more than a tiny fraction of reasonable requests — especially when you get into multipart requests, due to combinatorial explosion. Because voice interfaces also have extremely poor discoverability, this leaves users in a situation where most things they could ask for won't work, and they have no good way of knowing in advance which things will work. This cashes out as users just remembering two or three things the system can handle (set timers, place calls) and sticking to those, turning these "assistants" into little more than the simple voice command features that preceded them.

LLMs solve the basic problem here by eliminating the requirement to manually model domains and the relationships between them. An LLM with hooks into the right systems can reasonably respond to an interesting fraction of natural language requests. There's no reason to assume Apple's underinvestment in a basically flawed approach will translate into underinvestment in an approach that has substantial promise.
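
To make the contrast concrete, here's a rough Python sketch of the "LLM with hooks" idea. The tool names and the `llm_choose_tool` stub are hypothetical; the point is that nobody hand-writes a grammar per domain:

```python
# Sketch of the "LLM with hooks" approach: free-form requests get mapped
# onto a registry of callable tools instead of hand-modeled domains.
# The tools and the `llm_choose_tool` stub are hypothetical.

TOOLS = {
    "set_timer":  lambda minutes: f"Timer set for {minutes} min",
    "send_text":  lambda to, body: f"Texted {to}: {body!r}",
    "play_music": lambda artist: f"Playing {artist}",
}

def llm_choose_tool(request: str) -> dict:
    """Stand-in for a model that emits a structured tool call."""
    # A real LLM would produce something like this for the request below:
    return {"name": "set_timer", "args": {"minutes": 10}}

def handle(request: str) -> str:
    call = llm_choose_tool(request)
    return TOOLS[call["name"]](**call["args"])

print(handle("set a ten minute timer"))   # Timer set for 10 min
```

Adding a new domain means registering another tool, not modeling every phrasing and combination a user might utter, which is exactly where the old approach hit combinatorial explosion.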
 

OrangeCream

Ars Legatus Legionis
55,362
On the earnings call yesterday Tim Cook revealed what had been widely rumored - that Apple has been putting “tremendous effort” towards generative AI and that they’d release the fruits of those efforts later this year. He gave no detail beyond that but I think it’s safe to assume that those generative AI efforts will feed into a new Siri 2.0.
I don't assume that at all, having taken genAI classes and worked with genAI in a couple of demo projects.
Just by virtue of being built into the iPhone, with its total ubiquity, a GenAI Siri will quickly become one of the most used GenAIs on the planet. There have also been rumors about Apple backing the money truck up to the New York Times and others to let its GenAI access up-to-the-minute news, which could be a major differentiator.
Where was this reported btw? I want to read the source.

My best guess is that genAI will be integrated into the camera system, their synthetic voice, maybe their avatars used in the AVP, and possibly used as filters in FaceTime (like their current comic/ink/watercolor filters).
So will this be another Apple half-assing Siri episode or will this be a meaningfully useful implementation of GenAI?
Yes. I don't think it will be applied to Siri (unless it's her voice), and I think its use will be meaningful precisely because it won't be applied to Siri.
And will it have the sorts of problems that other GenAIs have had with hallucinations and the ability to be manipulated into saying hateful/biased things?
That won't be a problem if it's not used to make things up. The reason it works with images is that Apple's camera stack takes dozens of pictures before synthesizing a final output. Each separate picture can be processed by genAI, and there's a consensus mechanism available that means they can't take a fence and turn it into a swastika, for example.
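
As a toy illustration of that consensus idea (emphatically not Apple's actual pipeline), a per-pixel median across a burst votes out anything only one frame "saw":

```python
# Toy consensus across a burst of frames: a per-pixel median keeps only
# values most frames agree on, so a detail "hallucinated" in a single
# frame gets voted out. NOT Apple's actual camera pipeline.
import numpy as np

burst = np.stack([np.full((4, 4), 100.0) for _ in range(5)])
burst[2, 1, 1] = 255.0       # one frame invents a bright pixel

consensus = np.median(burst, axis=0)
print(consensus[1, 1])       # 100.0 -- the outlier frame is rejected
```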
 

OrangeCream

Ars Legatus Legionis
55,362
After reading the Ars article on BMW's "hallucination-free AI" I wonder if they could go down a similar route with Siri; an LLM backed by Apple-vetted knowledge such that the outputs are somewhat predictable (and aligned with Apple's corporate sensibilities). The promoted comment on the article gives a top-line overview of the technology (Retrieval Augmented Generation).

It would strike me as a solution that's in line with the "freedom within the walled garden" approach that Apple so often offers.
It seems more appropriate for Apple to use it for something like Sherlock; it's easy to confirm that the output is sanitized, and it's easy to confirm that the words that appear in the summary also appear in the source:
To create a summary of a text file:
In the Finder, hold down the Control key and click the icon of the text file, then choose "Summarize File to Clipboard" from the contextual menu.
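
That sanitization check is trivial for an extractive summarizer like the old Summarize service, since every output sentence must occur verbatim in the source. A quick sketch of the idea:

```python
# Sketch of the check implied above: with an extractive summarizer,
# every summary sentence must appear verbatim in the source text,
# so verifying the output is a simple membership test.
def is_extractive(summary: str, source: str) -> bool:
    """True iff every summary sentence appears verbatim in the source."""
    sentences = [s.strip() for s in summary.split(".") if s.strip()]
    return all(s in source for s in sentences)

source = "Siri sets timers. Siri places calls. Siri reads the weather."
print(is_extractive("Siri sets timers. Siri places calls.", source))  # True
print(is_extractive("Siri writes poems.", source))                    # False
```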
 

OrangeCream

Ars Legatus Legionis
55,362
Here's a transcript I found:

He doesn't actually say much:
That includes artificial intelligence where we continue to spend a tremendous amount of time and effort, and we're excited to share the details of our ongoing work in that space later this year.

And really, it took a whole-of-company effort to bring it this far. In terms of generative AI, which, I'd guess, is your focus, we have a lot of work going on internally as I've alluded to before. Our MO, if you will, has always been to do work and then talk about work and not to get out in front of ourselves. And so, we're going to hold that to this as well. But we've got some things that we are incredibly excited about that we'll be talking about later this year.

Let me just say that I think there's a huge opportunity for Apple with GenAI and AI. And without getting into more details and getting out in front of myself.


So I will reiterate my beliefs, given my experiences with genAI. This won't look like a chatbot; it will be integrated into existing substructures where synthesis is already a product feature and genAI can be used to great effect. Cameras, videos, avatars, audio output, text summaries, notifications, calendar entries, and reminders all seem prime for this kind of technology.
 
Last edited:

wrylachlan

Ars Legatus Legionis
12,769
Subscriptor
LLMs solve the basic problem here by eliminating the requirement to manually model domains and the relationships between them. An LLM with hooks into the right systems can reasonably respond to an interesting fraction of natural language requests. There's no reason to assume Apple's underinvestment in a basically flawed approach will translate into underinvestment in an approach that has substantial promise.
Ish… another way of thinking about it might be “Apple didn’t squeeze all the juice out of the last SotA approach so what makes us think they’ll be able to squeeze all the juice out of this one?”

Don’t get me wrong. I’m bullish on GenAI. And I’m also pretty appreciative of Apple’s general approach to AI - building it deeply across their products rather than embedding all their AI work in a few flashy marketing-forward ‘look at how good we are at AI’ features.

But…

Siri was the public face of their AI work and they could never be bothered to do much with it. To me that says something unflattering about the company’s seriousness in the field.

Cautiously optimistic but I don’t think this is anything like a sure thing.
 

wrylachlan

Ars Legatus Legionis
12,769
Subscriptor
This won't look like a chatbot; it will be integrated into existing substructures where synthesis is already a product feature and genAI can be used to great effect. Cameras, videos, avatars, audio output, text summaries, notifications, calendar entries, and reminders all seem prime for this kind of technology.
Will it be deeply integrated into multiple systems? Yes. That’s Apple’s MO when it comes to ML writ large. But I’d be absolutely shocked if it wasn’t also part of a major update to (or replacement for) Siri.
 
  • Like
Reactions: Tagbert

OrangeCream

Ars Legatus Legionis
55,362
Will it be deeply integrated into multiple systems? Yes. That’s Apple’s MO when it comes to ML writ large. But I’d be absolutely shocked if it wasn’t also part of a major update to (or replacement for) Siri.
I would be absolutely shocked if Siri got updated beyond a more natural-sounding voice and more natural-sounding speech. It would require a dramatic leap in technology for 'safe' generative text to be possible. Unless it's something like this:

In which case I can see them integrating all Apple KB articles and such, and any other content Apple has 100% control over.

And that I wouldn't call a 'major update to Siri'.
 
  • Like
Reactions: Honeybog

OrangeCream

Ars Legatus Legionis
55,362
Siri was the public face of their AI work and they could never be bothered to do much with it. To me that says something unflattering about the company’s seriousness in the field.
As a longtime investor I feel like they did exactly what they needed with Siri.

In other words, it would be like adding new features to the camera app just so they can advertise it: macro flower mode, fireworks mode, sunset mode, starry night sky mode, etc. They pick something worth doing, like portrait mode, and focus on it instead of adding half a dozen half-useless features.
Cautiously optimistic but I don’t think this is anything like a sure thing.
If you set your expectations appropriately then it becomes far less speculative.
 
I would be absolutely shocked if Siri got updated beyond a more natural-sounding voice and more natural-sounding speech. It would require a dramatic leap in technology for 'safe' generative text to be possible.
Gen AI is absolutely a slam-dunk for intent recognition, where Siri is particularly poor, and hallucinations aren't a blocker.

Ultimately it'll be used to answer questions and take action too, but I can easily see Apple lagging several years behind MS and Google just to ensure they offer a good experience.
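
To illustrate why hallucinations aren't a blocker for intent recognition specifically: you can constrain the model to a closed label set and fall back to "unknown" for anything off-menu. A rough sketch (the `llm` callable is a placeholder for any completion API):

```python
# Constrained intent recognition: the model can only "answer" from a
# closed set, so a hallucinated label degrades to a safe fallback.
# The `llm` callable is a hypothetical stand-in, not a real API.
INTENTS = {"set_timer", "send_message", "play_music", "unknown"}

def classify_intent(utterance: str, llm) -> str:
    prompt = (f"Classify this request as exactly one of {sorted(INTENTS)}.\n"
              f"Request: {utterance}\nIntent:")
    label = llm(prompt).strip().lower()
    return label if label in INTENTS else "unknown"   # off-menu => safe

# With a fake model that answers sensibly:
print(classify_intent("wake me in 10 minutes", llm=lambda p: "set_timer"))
```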
 

Honeybog

Ars Scholae Palatinae
2,075
None of the pre-LLM assistants are very useful, even the Amazon/Google efforts that seemingly had more resources thrown at them.

I agree-ish. Amazon poured a ton of resources into Lex—and I’m positive that it underpins the BMW AI in the article above (at least, I’d be willing to bet it shares more DNA with Lex than with the newly announced Alexa AI)—but the time and expense required of customers seem to have made it a non-starter after generative AI kicked off. Still, if my bank had to use AI, I’d prefer the Lex approach to OpenAI or similar.

Natural language is too open-ended for systems that require manual domain-specific modeling to ever be able to respond to more than a tiny fraction of reasonable requests

I guess that depends on whether you see that as a bug or a feature. I want my windows to open and my pen to write; I don’t need them to offer the totality of possible actions. I’d rather my phone AI assistant be consciously designed around problems to solve, even at the expense of writing a paragraph in the voice of a 15th-century doctor.

especially when you get into multipart requests, due to combinatorial explosion.

Earlier LLMs were working on that, although I’d grant that you’d still need to model each individual request.

Because voice interfaces also have extremely poor discoverability, this leaves users in a situation where most things they could ask for won't work, and they have no good way of knowing in advance which things will work. This cashes out as users just remembering two or three things the system can handle (set timers, place calls) and sticking to those, turning these "assistants" into little more than the simple voice command features that preceded them.

We’re in total agreement here. I’d note that Apple deserves a ton of blame for this, though. When Siri first rolled out, there was a button specifically for showing potential queries (gone now), and, up until about 5 years ago, they’d make a point of highlighting new functions.

That said, I think it’s also the case that there is only so much that people actually want their assistants to do. Gen AI can try (and fail) to do anything, but how much value does that actually add?
 

OrangeCream

Ars Legatus Legionis
55,362
Gen AI is absolutely a slam-dunk for intent recognition, where Siri is particularly poor, and hallucinations aren't a blocker.

Ultimately it'll be used to answer questions and take action too, but I can easily see Apple lagging several years behind MS and Google just to ensure they offer a good experience.
Your definition of lagging is 100% the opposite of mine.

Alexa could do more cool stuff than Siri, but since it was unsustainable it ultimately got degraded, whereas Siri kept getting annual updates and enhancements.
 
Siri is a fairly rare example of an Apple service that launched competitively but didn't markedly improve over time. Alexa and Google Assistant received huge investment but, like you say, ultimately didn't generate revenue, so they were degraded. The problem with those early attempts at voice assistants was that their functionality was far too limited. LLMs have the opposite problem: zero limits, but also poor fences on accuracy.

My feeling, and I actually work in cognitive AI (rapidly pivoting to gen AI), is that adding guardrails is vastly easier than expanding functionality. Gen AI will get there. Apple probably won't be the first out of the gate, but those that are will endure some severe embarrassment.
 
Gen AI is progressing faster than you might think. You can already run a heavily quantized model on an M1 MacBook Air, no discrete GPU. I wouldn't be surprised if phones run local models in the near future. Speed and accuracy will of course be poor at the start, but with the open-source models released by FB evolving at light speed, they won't take a lifetime to get good.
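
For example, with llama-cpp-python you can already do something like this on an M1 Air (the GGUF filename below is a placeholder for whatever quantized model file you've downloaded):

```python
# Sketch: a heavily quantized 7B model running locally via
# llama-cpp-python, CPU-only. The model filename is a placeholder.
from llama_cpp import Llama

llm = Llama(model_path="llama-2-7b-chat.Q4_K_M.gguf", n_ctx=2048)
out = llm("Q: Name three things a voice assistant should do. A:",
          max_tokens=64)
print(out["choices"][0]["text"])
```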

At any rate, RabbitOS runs in the cloud.
 

wrylachlan

Ars Legatus Legionis
12,769
Subscriptor
I don’t know why?

Unless you’re talking about 20-year timeframes. In that case, sure, I can see Apple being involved in something like that for the Siri of 2030, when on-device AI HW is the equivalent of a 4090 GPU.
Why would you assume that Apple would restrict themselves to on device with a putative GenAI Siri when they don’t have that restriction with current Siri???
 
  • Like
Reactions: Tagbert

OrangeCream

Ars Legatus Legionis
55,362
Why would you assume that Apple would restrict themselves to on device with a putative GenAI Siri when they don’t have that restriction with current Siri???

Why do you assume Apple would be willing to pay for the constant operation of GenAI Siri when they don't pay for (much) Cloud Siri? Are you saying GenAI Siri is going to be so useful that it warrants end users paying for iCloud+ subscriptions and the constant operating costs of GenAI Siri?
 
  • Like
Reactions: Rocketpilot

wrylachlan

Ars Legatus Legionis
12,769
Subscriptor
Why do you assume Apple would be willing to pay for the constant operation of GenAI Siri when they don't pay for (much) Cloud Siri? Are you saying GenAI Siri is going to be so useful that it warrants end users paying for iCloud+ subscriptions and the constant operating costs of GenAI Siri?
My assumption is that if GenAI Siri is a meaningful differentiator for iPhone, Apple will have zero problem paying for the cloud services necessary to make it run. That said, they will also do their damnedest to make sure that as much as possible runs on device, that the cloud computing component is as computationally efficient as possible, and they’ll improve it over time.

I’d argue that the utility from a fully conversational Siri that works well would be a major differentiator and would command enough margin to pay for itself in device sales.
 

OrangeCream

Ars Legatus Legionis
55,362
Google will do it first and (once successful and generally useful, assuming that it is) that will be a substantial competitive advantage for Android.
Why do you assume they’ll make it a competitive advantage? You can argue they could have made Google Wallet, NFC Payments, and P2P cash transactions a competitive advantage for Android, but they obviously didn’t:



The same could be said of Google Glass:



Or Watch:



I don’t see why you think Google is capable of giving Android a competitive advantage. That would mean Google would be able to grow the Pixel ASP and market share simultaneously.
 

OrangeCream

Ars Legatus Legionis
55,362
My assumption is that if GenAI Siri is a meaningful differentiator for iPhone, Apple will have zero problem paying for the cloud services necessary to make it run.
If it’s that meaningful then I agree. It will sell more iPhones and boost the ASP of iPhones and probably work with AirPods at that point.

That said, they will also do their damnedest to make sure that as much as possible runs on device, that the cloud computing component is as computationally efficient as possible, and they’ll improve it over time.

I’d argue that the utility from a fully conversational Siri that works well would be a major differentiator and would command enough margin to pay for itself in device sales.
The difference is I don’t see that being feasible in a cost-effective manner in the near future. I just don’t see any real work showing a truly conversational AI being made.

I will agree that if Apple manages to create Jarvis from Iron Man, it will change everything. I just don’t think I’ll see it in my lifetime.
 

OrangeCream

Ars Legatus Legionis
55,362
Hell, you could go on for another three pages listing failed Google initiatives. The real question here is whether you think gen AI is bullshit like cryptocurrency. I don't. It's the real thing-- it isn't even half-baked yet, but it's already incredibly useful and transformative, in both positive and negative ways. But it's real.
I agree it’s real. I disagree Google can succeed, and I disagree that Apple will create Jarvis. I very clearly explained how I envision Apple using it in the next year.
 

OrangeCream

Ars Legatus Legionis
55,362
I try my damnedest not to prognosticate the future. Remember when gen AI was going to revolutionize search, and Google was suddenly scared of Bing? It's transformative for sure, but what it'll transform and precisely how, that's tough to say!
It's like predicting the weather, I think. I know what the current state is, I have a pretty good idea of how fast HW is growing, and given existing SW I can make a reasonably good guess at where the AI slots in without creating something that doesn't yet exist.

So that's why I'm fairly certain what the limits of the technology are.

Even the AVP, as an example, is a logical extension of every existing Apple technology today. The only wildly unique and new capabilities are the iris scanning, hand tracking, and eye tracking. No other product has them. So if you think they'll do something sci-fi, 90% of it has to be real today, first.
 
The HTC Vive Pro 2 had eye tracking; I believe it launched last year. The Quest 1 had hand tracking, although it wasn't very good. I believe Apple was the first with iris scanning in a consumer device. The biggest technical advancement in the Vision Pro is the displays, using cutting-edge tech not even officially released yet; at 3500 ppi they offer much greater clarity (and are incredibly expensive).
 

OrangeCream

Ars Legatus Legionis
55,362
The HTC Vive Pro 2 had eye tracking; I believe it launched last year.
I meant no Apple product had eye tracking, iris scanning, or hand tracking.
The Quest 1 had hand tracking, although it wasn't very good. I believe Apple was the first with iris scanning in a consumer device. The biggest technical advancement in the Vision Pro is the displays, using cutting-edge tech not even officially released yet; at 3500 ppi they offer much greater clarity (and are incredibly expensive).
Yeah, that wasn’t my point. I was trying to say that Apple had proved out the dot illuminators, lidar, depth mapping, positional sound, gyros, accelerometers, microphones, and cameras across multiple products for the decade-plus leading up to the AVP.

And that’s how I envision Apple’s future genAI work. Almost everything they plug genAI into will be a feature or product where genAI will be an enhancement and improvement, and not the product itself. You’re not paying for genAI; you’re paying for the improved photos that the genAI synthesizes from a half dozen pictures. You’re not paying for genAI; you’re paying for the improved Siri speech to text, text to speech, and sentiment analysis. You’re paying for the ability to summarize an email into two sentences, and then attach it to the calendar entry and notification created from the email.

Generation doesn’t mean it has to ‘make stuff up’; it can take existing input and transform it into a different version of the same information.
 

wrylachlan

Ars Legatus Legionis
12,769
Subscriptor
The difference is I don’t see that being feasible in a cost-effective manner in the near future.
I think your priors may be off by an order of magnitude or more. Most people talk at something on the order of 100-150 words per minute. Anything more than, say, 30 seconds I would consider a very long prompt for a phone AI, where most requests are like “Call Joe”. But let’s say you do have a longer back-and-forth with the AI - call it 2 minutes. And you want to do that kind of long interaction 10 times a day with a daily memory. You’ve still used an order of magnitude fewer tokens than the kind of GenAI usage where compute cost is meaningful.
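
Back-of-envelope, with my assumptions spelled out (~130 words per minute, and a common rule-of-thumb ratio of ~1.3 tokens per word):

```python
# Rough daily token budget for heavy conversational Siri use, under the
# assumptions above (all numbers are my guesses, not Apple's).
words_per_min = 130          # typical speaking rate
tokens_per_word = 1.3        # rule-of-thumb token/word ratio
minutes_per_exchange = 2     # a "long" back-and-forth
exchanges_per_day = 10

tokens_per_day = (words_per_min * tokens_per_word
                  * minutes_per_exchange * exchanges_per_day)
print(tokens_per_day)        # 3380.0 -- a few thousand tokens per day
```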

This isn’t the kind of GenAI where you paste in your college essay and ask it to make suggestions. This is a GenAI tuned to extremely short, direct prompts to use functionality on the phone. And Apple has been structuring their intents system for years, so the universe of what the GenAI can do is well understood and finite.

I’d argue that the sort of ‘drop an extensive spec in here and have it produce code’ features of ChatGPT and the like are actually more intensive than an Iron Man Jarvis-like implementation that takes instructions at the rate of 100-150 tokens per minute.