Apple’s new on-device generative AI for the iPhone should not come as a surprise, but its approach might, says The Register

Opinion Apple’s efforts to add generative AI to its iDevices should surprise no one, but Cupertino’s existing uses of the technology, and the constraints of mobile hardware, suggest it won’t be a major feature of iOS in the near future.

Apple has not joined the recent wave of generative AI boosterism and, unlike many businesses, has generally avoided the terms “AI” or “Artificial Intelligence” in its recent keynote presentations. Yet machine learning has been, and continues to be, a key capability for Apple, mostly in the background, in the service of subtle improvements to the user experience.

Apple’s use of AI to handle images is one example of the technology at work in the background. When iThings capture photos, machine learning algorithms set to work identifying and tagging subjects, running optical character recognition, and adding links.

In 2024, that kind of invisible AI doesn’t cut it. Apple’s rivals are touting generative AI as an essential capability for every device and application. According to a recent Financial Times report, Apple has been quietly buying AI companies and developing its own large language models to ensure it can deliver.

Apple’s hardware advantage

Neural processing units (NPUs) in Apple’s in-house silicon handle its existing AI implementations. Apple has used the accelerators, which it calls “Neural Engines,” since the debut of 2017’s A11 system-on-chip, and relies on them to handle smaller machine learning workloads, freeing a device’s CPU and GPU for other tasks.

Apple’s NPUs are particularly powerful. The A17 Pro found in the iPhone 15 Pro is capable of delivering 35 TOPS, double that of its predecessor and about twice that of some NPUs Intel and AMD offer for use in PCs.

Qualcomm’s latest Snapdragon chips are comparable to Apple’s in terms of NPU performance. Like Apple, Qualcomm also has years of NPU experience in mobile devices. AMD and Intel are relatively new to the field.

Apple hasn’t disclosed floating point or integer performance for the chip’s GPU, although it has touted its prowess at running games, such as the Resident Evil 4 Remake and Assassin’s Creed Mirage. This suggests that computational power isn’t the limiting factor for running larger AI models on the platform.

Further supporting this is the fact that Apple’s M-series silicon, used in its Mac and iPad lines, has proven particularly capable at running AI inference workloads. In our testing, given adequate memory (we ran into trouble with less than 16GB), a now three-year-old M1 MacBook Air was more than capable of running Llama 2 7B at 8-bit precision, and was even snappier with a 4-bit quantized version of the model. By the way, if you want to try this on your M1 Mac, makes running Llama 2 very easy.

Where Apple may be forced to make hardware concessions is memory.

As a rule of thumb, AI models need about a gigabyte of memory for every billion parameters when running at 8-bit precision. This can be reduced either by dropping to lower precision, something like Int-4, or by developing smaller, quantized models.

Llama 2 7B has become a common reference point for AI PCs and smartphones due to its relatively small footprint and compute requirements when running small batch sizes. Using 4-bit quantization, the model’s memory requirement can be cut to about 3.5GB.
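To make the arithmetic concrete, here is a minimal back-of-the-envelope sketch in Python. The function name `weight_memory_gb` is our own illustration, and the estimate covers model weights only; real deployments also need room for the key-value cache, activations, and runtime overhead, so treat the figures as a floor rather than a precise requirement.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Rough memory needed for model weights alone, in GB (10^9 bytes).

    Assumption: every parameter is stored at the same precision and
    nothing else (KV cache, activations, runtime) is counted.
    """
    bytes_per_weight = bits_per_weight / 8
    # billions of parameters x bytes per parameter = gigabytes
    return params_billions * bytes_per_weight


if __name__ == "__main__":
    # Llama 2 7B at three common precisions
    for bits in (16, 8, 4):
        print(f"Llama 2 7B at {bits}-bit: ~{weight_memory_gb(7, bits):.1f} GB")
```

At 8 bits this gives the one-gigabyte-per-billion-parameters rule of thumb (about 7GB for a 7B model), and at 4 bits it gives the 3.5GB figure quoted above.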

But even with 8GB of RAM on the iPhone 15 Pro, we suspect Apple’s next generation of phones may need more memory, or the models will need to be smaller and more targeted. This is likely one of the reasons Apple is opting to develop its own models rather than adopting models like Stable Diffusion or Llama 2 to run at Int-4, as we’ve seen from Qualcomm.

There’s also some evidence to suggest that Apple may have found a way around the memory problem. As the Financial Times spotted, back in December Apple researchers published a paper [PDF] demonstrating the ability to run LLMs on-device using flash memory.

Expect a more conservative approach to AI

When Apple does introduce AI functionality on its desktop and mobile platforms, we expect it to take a relatively conservative approach.

Turning Siri into something people don’t feel they have to speak to like a preschooler seems an obvious place to start. Doing that could mean giving an LLM the job of translating input into a form Siri can more easily understand, so the bot can deliver better answers.

Siri could become less easily confused if you phrase a query in a roundabout way, resulting in more effective responses.

In theory, this should have a couple of benefits. The first is that Apple should be able to get away with a much smaller model than something like Llama 2. The second is that it should largely sidestep the problem of the LLM producing erroneous responses.

We could be wrong, but Apple has a history of being late to adopt the latest technologies, then finding success where others have failed by taking the time to refine and polish features until they’re actually useful.

And for what it’s worth, generative AI has yet to prove it’s a hit: Microsoft’s big chatbot bet to revive no one’s favorite search engine, Bing, has not translated into a notable market share increase.

Apple, meanwhile, took the crown as the top smartphone vendor of 2024 while deploying only invisible AI. ®
