Hopefully Apple can think outside the box here and pay to license access to many existing AI models, so they can focus on aggregating that network of intelligence into an AI+ experience.
They’ve been using AI and I think you just don’t know it:

Apple has been investing a lot of on-die real estate in AI and has not really taken advantage of it. Have they been working on something that we have not thought of?
A lot of people associate AI with a ChatGPT-style answerbot, but that is not necessarily the most useful way to deploy AI for Apple and its customers. I’d rather see it used to enhance how we use the apps and abilities of the phone and give a richer way to interact with those features.

They’ve been using AI and I think you just don’t know it:
On-device Panoptic Segmentation for Camera Using Transformers
Camera (in iOS and iPadOS) relies on a wide range of scene-understanding technologies to develop images. In particular, pixel-level…
machinelearning.apple.com
Those applications enable great user experiences, like searching for a picture in the Photos app, measuring the size of a room with RoomPlan, or ARKit semantic features, as referenced in our research highlight 3D Parametric Room Representation with RoomPlan.

Deploying Attention-Based Vision Transformers to Apple Neural Engine
Motivated by the effective implementation of transformer architectures in natural language processing, machine learning researchers…
machinelearning.apple.com
The framework uses a device’s sensors, trained ML models, and RealityKit’s rendering capabilities to capture the physical surroundings of an interior room.

RoomPlan | Apple Developer Documentation
Create a 3D model of a room by interactively guiding people to scan their physical environment using a device’s camera.
developer.apple.com
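For anyone curious what the RoomPlan API referenced above actually looks like in code, here is a minimal Swift sketch, assuming iOS 16+ and a LiDAR-equipped device; the class name and print statements are just illustrative:

```swift
import UIKit
import RoomPlan

// Minimal RoomPlan sketch: RoomCaptureView drives the camera/LiDAR UI and the
// on-device ML models; the delegate receives the captured room when scanning ends.
final class RoomScanViewController: UIViewController, RoomCaptureSessionDelegate {
    private var captureView: RoomCaptureView!

    override func viewDidLoad() {
        super.viewDidLoad()
        captureView = RoomCaptureView(frame: view.bounds)
        captureView.captureSession.delegate = self
        view.addSubview(captureView)

        // Start scanning with the default configuration.
        captureView.captureSession.run(configuration: RoomCaptureSession.Configuration())
    }

    func captureSession(_ session: RoomCaptureSession,
                        didEndWith data: CapturedRoomData,
                        error: Error?) {
        // Turn the raw capture into a parametric model of walls, doors, windows, and objects.
        let builder = RoomBuilder(options: [.beautifyObjects])
        Task {
            if let room = try? await builder.capturedRoom(from: data) {
                print("Detected \(room.walls.count) walls and \(room.objects.count) objects")
            }
        }
    }
}
```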
The iPhone's keyboard on iOS 17 leverages a transformer model, similar to what OpenAI (the company behind ChatGPT) uses in its own language models, to learn from what you type on your keyboard to better predict what you might say next, whether it's a name, phrase or curse word.

What You Need to Know About the Improved Autocorrect on iOS 17
For starters, cursing is now integrated into your iPhone keyboard.
www.cnet.com
As previously mentioned, every time you use Face ID to unlock your iPhone or iPad, your device uses the Neural Engine. When you send an animated Memoji message, the Neural Engine is interpreting your facial expressions.

What Is the Apple Neural Engine and What Does It Do?
You likely hear about the Neural Engine without really knowing what Apple uses it for. Let's dig deep into this crucial technology.
www.macobserver.com
Apple has included Neural Engine chips in all iPhones since the iPhone X, and these provide the computing power behind Memoji, Face ID, and the newly unveiled Live Voicemail.
With the power of the Neural Engine, Live Voicemail transcription is handled on-device and remains entirely private.

iOS 17 makes iPhone more personal and intuitive
Apple today announced iOS 17, a major release that upgrades the communications experience across Phone, FaceTime, and Messages.
www.apple.com
Autocorrect receives a comprehensive update with a transformer language model, a state-of-the-art on-device machine learning language model for word prediction — improving the experience and accuracy for users every time they type.
In Photos, the People album uses on-device machine learning to recognize more photos of a user’s favourite people, as well as cats and dogs.
The first of the iOS 15 features that screamed “Machine Learning!” to me was Live Text. Live Text is a feature in iOS 15 that enables your iPhone to read text in your Photos app.

iOS 15 Features Finally Take Advantage Of The iPhone's Neural Engine
While watching the WWDC21 keynote a few weeks ago, I started to notice a recurring theme. And that theme is that several of the upcoming iOS 15 features…
appletoolbox.com
Another of the new iOS 15 features that uses the Neural Engine is object recognition in Photos. This feature works similarly to Live Text, except that it recognizes objects rather than text. The example Apple used is that you can point your iPhone camera at a dog, and your iPhone will not only recognize that it’s a dog but also which breed of dog it is.
Notifications will now be grouped in a Notification Summary, so you don’t see less important notifications crowding up your Lock Screen all day. You can customize the Notification Summary feature, or let the Neural Engine handle it for you.
In Maps in iOS 15, you’ll be able to point your camera around while walking. That will allow you to see AR directions projected on your environment. Say you’re trying to get to the movies and aren’t sure which road to take. You’ll be able to point your iPhone around and see directions highlighted on the streets and buildings around you.
Long story short: any time your phone or Mac is doing object recognition, text recognition, speech recognition, text summaries, image analysis, voice recognition, text prediction, maps prediction, calendar predictions, or speech-to-text, you'll be using the ANE. Those are pretty classic AI/ML tasks.
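As a concrete example of one item on that list, here is roughly what Live Text-style text recognition looks like through the public Vision framework (a hedged sketch, not Apple's internal implementation); Vision schedules the work on the Neural Engine where it can:

```swift
import Vision
import UIKit

// Recognize text regions in an image and hand back the top candidate string for each.
func recognizeText(in image: UIImage, completion: @escaping ([String]) -> Void) {
    guard let cgImage = image.cgImage else { return completion([]) }

    let request = VNRecognizeTextRequest { request, _ in
        let observations = (request.results as? [VNRecognizedTextObservation]) ?? []
        let lines = observations.compactMap { $0.topCandidates(1).first?.string }
        completion(lines)
    }
    request.recognitionLevel = .accurate   // trade speed for accuracy
    request.usesLanguageCorrection = true

    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    DispatchQueue.global(qos: .userInitiated).async {
        try? handler.perform([request])
    }
}
```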
I mean, processing audio data is fast. It wouldn’t take even one month to process that data, and it can be processed in parallel once the model is trained.

iOS 17.4 has a (great) new feature that provides automatic transcripts for Podcasts. Some Podcasts only have transcripts for episodes from the past few weeks, but others, like the BBC’s In Our Time podcast, have everything transcribed going back to 2002.
It seems like a massive undertaking to transcribe twenty years of even just one weekly podcast, and I kind of wonder if Apple didn’t do this in part to generate some really massive training sets.
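For a sense of why that back catalog is tractable, on-device speech-to-text is already exposed through the public Speech framework. This sketch is purely illustrative (it is not Apple's Podcasts pipeline) and assumes speech-recognition permission has already been granted:

```swift
import Speech

// Transcribe one audio file locally; episodes could be fanned out in parallel.
func transcribe(fileAt url: URL) {
    guard let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-GB")),
          recognizer.isAvailable else { return }

    let request = SFSpeechURLRecognitionRequest(url: url)
    if recognizer.supportsOnDeviceRecognition {
        // Keep the whole job on-device; no audio leaves the machine.
        request.requiresOnDeviceRecognition = true
    }

    _ = recognizer.recognitionTask(with: request) { result, _ in
        if let result, result.isFinal {
            print(result.bestTranscription.formattedString)
        }
    }
}
```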
It’s not thinking, so it’s not making a judgement. It’s not even synthesizing citations, technically. It’s just a really long autocorrect string where words are pasted together in statistically likely manners.

I have seen, personally, the widely reported issue of ChatGPT making up legal citations. It appears to think it's OK to synthesize citations that look a lot like real ones. I don't trust anything it tells me about law that I don't already know. It is a very dangerous tool if used to orient oneself in an area of law that is directly adjacent to a well-known one, which is something lawyers do ALL THE TIME.
That seems very likely to me with the caveat that ‘advanced’ is doing some heavy lifting there and ‘series of’ is exactly right but subtle. My guess is that if you put the average ChatGPT skeptic in a Time Machine and sent them forward 100 years to when we know substantially more about the brain, they would be shocked at how ChatGPT-like our cognition is.

And what is thinking but an emergent behavior?
It may just be that the language center of our brain is an advanced series of autocorrects.
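Purely to illustrate the "really long autocorrect" framing above, and not how GPT-class models are actually built, here is a toy next-word predictor in Swift; the corpus and the output are made up:

```swift
// Toy "statistical autocomplete": pick whichever word most often follows the current one.
let corpus = "the cat sat on the mat the cat ate the fish"
    .split(separator: " ")
    .map { String($0) }

// Count bigram frequencies: how often each word follows each other word.
var bigrams: [String: [String: Int]] = [:]
for (current, next) in zip(corpus, corpus.dropFirst()) {
    bigrams[current, default: [:]][next, default: 0] += 1
}

// "Generate" text by repeatedly appending the statistically likeliest next word.
var word = "the"
var output = [word]
for _ in 0..<5 {
    guard let next = bigrams[word]?.max(by: { $0.value < $1.value })?.key else { break }
    output.append(next)
    word = next
}
print(output.joined(separator: " "))   // e.g. "the cat sat on the cat" (ties break arbitrarily)
```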
- Photo retouching.
- Voice memo transcription.
- Suggested replies to emails and messages.
- Auto-generated emojis based on the content of a user's messages, providing all-new emoji for any occasion beyond the existing catalog.
- Improved Safari web search.
- Faster and more reliable searches in Spotlight.
- More natural interactions with Siri.
- More advanced version of Siri designed for the Apple Watch, optimized for "on-the-go tasks."
- Smart recaps of missed notifications and individual messages, web-pages, news articles, documents, notes, and more.
- Developer tools for Xcode.
I'm not sure that they should have been added, because way back when, the technology wasn't as mature as it is now.

Mark Gurman (via MacRumors) had a report a few days ago about the anticipated "AI" features planned for iOS 18, and it's kind of interesting. There are surprisingly few "generative" features, and a lot of it seems like stuff that shoulda woulda been added into iOS already if Apple hadn't gotten distracted from its earlier ML push.
The devil is always in the details, because 'retouching' has a plethora of meanings.

Photo retouching is a no-brainer. It's kind of amazing that they've taken this long.
This is going to be interesting if it 'talks like you'.

Suggested replies: Kind of already exists between keyboard suggestions and Apple Watch's smart replies, no? Presumably, it would be similar, but longer. Assuming Gurman's list is inclusive, this seems like it's the closest Apple will get to generative content. It'll be interesting to see if this draws from a user's previous messages. I almost never use the Apple Watch smart replies, because they're so completely different from my actual voice. I'm also not looking forward to the looming feature where everyone is just trading generated missives back and forth.
Yes, this is a problem. Try searching for 'HSA receipts for tax year 2021': first it has to identify receipts, summarize the contents, determine whether they are HSA-approved, and check whether the purchase date fell in 2021.

Spotlight: Is this a problem anyone has? If anything, I wouldn't want Spotlight to be less strict with search terms.
Currently, Apple Watch can only do things like send a message, set a timer, open apps, start an activity, and connect to the internet for more advanced queries. I wasn't able to create a Calendar event using my Watch.

I have no idea how to read the Siri on Apple Watch thing.
That's been Apple's MO for the past decade. I'm looking forward to further AI enhancements in the Camera app.

I can't say I'm hugely excited about most of these, but it would be kind of nice to see Apple try and steer things away from generative LLM and towards useful tools.
I’m not sure what you mean here, as most of what’s on that list likely has some element of LLM under the hood, except the visual elements, but then that’s still generative AI.

I can't say I'm hugely excited about most of these, but it would be kind of nice to see Apple try and steer things away from generative LLM and towards useful tools.
Gotcha. So moving away from ‘Chat’ as the modality for leveraging LLMs. Agreed, LLMs and generative AI writ large are so much more than just a chat engine.

I was actually saying that it’s closer to what you have been advocating for in this topic, as opposed to what most people assumed it would be (ChatGPT-esque, like literally everyone else is doing).
Not using ML. It requires licensing the work of tens of thousands of professional artists, with before/after pairs as training data, to develop the feature using ML.

I mean, retouching, blemish removal, etc. has been a bog-standard image editing feature for a long time now.
The issue isn't the inference, it's creating the datasets and models and then training them. The fact that research papers trying to solve this problem were still being published in 2023 means it's still an issue.

As for the maturity of the technology, the actual details will matter of course, but on the surface, not a lot here seems super cutting edge. Apple has been touting ML features for, what, a decade? They certainly could have implemented a lot of these earlier if they had the impetus to.
I would have just put those in “~/Documents/Financial/Taxes/2021/Receipts/HSA/“.
Did you make sure to turn off the radio on your iPhone and step away so that the iPhone mic can't hear you?

Really? I just added one with Siri on my ancient S5.
I would love to have a conversational Siri. I'm pretty sure I'm not the only one ¯\_(ツ)_/¯
I don't know if they will have that level of interaction ready by June, but I hope that they are working toward that over the next year or two.

To the extent that Apple's AI ambitions could give Siri far better awareness of my data in order to be conversational, that would be nice. There are hints of that already—the way Siri can spot references to appointments in emails and will queue them as potential Calendar entries. That, just with far more density and capability. "Siri, can you summarize in a few sentences the last 24 hours' worth of emails from my client Foo?" — I would use that constantly, and it being verbal/conversational would, in a way, be more useful to me as a momentary way to engage information than a text- or action-based UI.
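Today, the closest public hook for that kind of request is App Intents, where an app exposes actions the system (and Siri) can invoke. This is a hypothetical sketch; the intent name, parameters, and placeholder dialog are all invented for illustration:

```swift
import AppIntents

// Hypothetical intent an email client might expose; nothing here is a shipping Apple feature.
struct SummarizeClientEmailIntent: AppIntent {
    static var title: LocalizedStringResource = "Summarize Recent Client Email"

    @Parameter(title: "Client name")
    var client: String

    @Parameter(title: "Hours to look back", default: 24)
    var hours: Int

    func perform() async throws -> some IntentResult & ProvidesDialog {
        // A real implementation would fetch the messages and run an on-device summarizer here.
        return .result(dialog: "Here's a summary of the last \(hours) hours of email from \(client)…")
    }
}
```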
I doubt Apple would be that brazen.

I kind of wonder if Apple didn’t do this in part to generate some really massive training sets.
Why would transcribing podcasts be brazen?

I doubt Apple would be that brazen.
I never claimed it was. I did say that transcribing podcasts to train AI would be incredibly brazen.

Why would transcribing podcasts be brazen?
No. The LLM itself is far more important than the hardware it’s running on. And 2nm efficiency isn’t going to unlock some new LLM that isn’t possible without it. It’s incrementally more efficient, not categorically.

Just wondering here without much knowledge — can’t Apple use forthcoming TSMC 2nm chips and create some sort of enterprise LLM type of GPU hardware rack that could compete with Nvidia on a cost-per-watt metric, to greatly reduce compute power and win the day?
Honestly I think Apple’s general approach to ML/AI - the path they’ve been on for years - is the best strategy for winning at ML/AI. They just need to resource it a little better and push harder. Building core models as a shared service in the OS and then having every corner of the OS/app ecosystem use those models to implement features is the right approach. Just keep going.
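One small example of what that "shared models in the OS" approach already looks like from an app developer's seat: the NaturalLanguage framework exposes system-trained models that any app can call without bundling its own weights. The sample sentence below is made up:

```swift
import NaturalLanguage

let text = "The keynote was great, but the new Siri demo fell flat."

// System-provided sentiment model; the score comes back as a string in roughly -1.0...1.0.
let tagger = NLTagger(tagSchemes: [.sentimentScore])
tagger.string = text
let (sentiment, _) = tagger.tag(at: text.startIndex, unit: .paragraph, scheme: .sentimentScore)
print("Sentiment:", sentiment?.rawValue ?? "n/a")

// System-provided word embeddings, also shipped with the OS.
if let embedding = NLEmbedding.wordEmbedding(for: .english) {
    print("Nearest neighbors of 'phone':", embedding.neighbors(for: "phone", maximumCount: 3))
}
```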
I think an open question is whether there’s a limit to the ROI of the naive ‘increase the number of nodes in the model’ approach to AI. NVIDIA and others will benefit greatly if throwing more transistors at the problem gives you a benefit in perpetuity. But my suspicion is that, sooner rather than later, how the model is constructed (its design), rather than how many nodes it has, will be the most important factor.
There are far more organizations thinking that they will be able to monetize LLMs than actually will be successful. It will be interesting to see what happens to the chip market when all this comes crashing down in 3-5 years.

LLM growth and what fabs are producing are diverging like a rocket taking off
Using freely-available audio content, from a public catalog, with the side benefits of furthering accessibility of said content? I don't understand the harm.

I never claimed it was. I did say that transcribing podcasts to train AI would be incredibly brazen.
Oh, lots of people have rightly observed that LLM growth and what fabs are producing are diverging like a rocket taking off. It's implausible that the approach can be sustained. A back-of-the-envelope calculation will show how ridiculous it is.
There’s no evidence that Apple is hellbent on putting AI in their OS in the same way Microsoft is. There is seven years of evidence that Apple is thoughtfully integrating AI into the product where it makes a difference in capability and usability.

Microsoft and Apple seem hellbent on putting AI in their OSes, and I get it (new shiny, more buzzwords), but I'm also not at a point where I trust it currently. As an end user, I'm not clamoring for this, just vague feature adds, now featuring AI.
Second that 100%. Data Detectors and the (sometimes) closely related 'Siri Intelligence' both feel like partially-executed features whose full vision has not been realized. And we have waited for sooo long.

I 100% expect Apple to integrate AI into data detectors:
Data detection methods in other frameworks detect common types of data represented in text, and return DataDetection framework classes that provide semantic meaning for matches.

DataDetection | Apple Developer Documentation
Access and utilize common types of data that the data detection system matches.
developer.apple.com
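For reference, the long-standing Foundation entry point for this kind of matching looks like the sketch below (the sample email text is invented); the wish here is essentially for an ML-backed detector that returns much richer matches through the same kind of plumbing:

```swift
import Foundation

let email = "Your flight AA123 departs SFO on 14 June 2025 at 9:40 AM. Questions? Call 408-555-0123."

// Classic rule-based detector from Foundation: dates, phone numbers, addresses, links.
let types: NSTextCheckingResult.CheckingType = [.date, .phoneNumber, .address, .link]
let detector = try! NSDataDetector(types: types.rawValue)

let range = NSRange(email.startIndex..., in: email)
for match in detector.matches(in: email, options: [], range: range) {
    switch match.resultType {
    case .date:
        if let date = match.date { print("Date:", date) }
    case .phoneNumber:
        if let phone = match.phoneNumber { print("Phone:", phone) }
    default:
        break
    }
}
```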
Yeah. An ML DataDetector would analyze an email and extract calendar entries, map entries, reminders, alarms, and summaries.

Second that 100%. Data Detectors and the (sometimes) closely related 'Siri Intelligence' both feel like partially-executed features whose full vision has not been realized. And we have waited for sooo long.
To wit: please explain, in this day and age, why the act of transferring concerts, air travel, and other events onto my calendar is such a clumsy and inconsistent experience. WHY?? <shakes fist at sky>
If the AI push resolves this I will be forever grateful... it's low-hanging fruit at this point. IMO
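For what it's worth, the "put it on my calendar" half of that wish is already straightforward once a title and date have been extracted; the sketch below uses EventKit (iOS 17+ write-only calendar access), with the helper name and the two-hour default duration invented for illustration:

```swift
import EventKit

// Add a detected event (e.g. a concert pulled out of an email) to the user's calendar.
func addEvent(titled title: String, starting start: Date, durationHours: Double = 2) {
    let store = EKEventStore()
    store.requestWriteOnlyAccessToEvents { granted, _ in
        guard granted else { return }
        let event = EKEvent(eventStore: store)
        event.title = title
        event.startDate = start
        event.endDate = start.addingTimeInterval(durationHours * 3600)
        event.calendar = store.defaultCalendarForNewEvents
        try? store.save(event, span: .thisEvent)
    }
}
```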
Good question. iCalendar has been around for more than two decades, so what gives?

please explain, in this day and age, why the act of transferring concerts, air travel, and other events onto my calendar is such a clumsy and inconsistent experience. WHY?? <shakes fist at sky>
But it won't, as neural networks inherently have an error rate.

If the AI push resolves this I will be forever grateful... it's low-hanging fruit at this point. IMO