Apple wants AI to run directly on its hardware instead of in the cloud
iPhone maker wants to catch up to its rivals when it comes to AI.

  • Apollo2323@lemmy.dbzer0.com · 9 months ago

    As an Android user, I'm really jealous that Apple usually has these things running on its own devices with more privacy in mind. But I just can't accept the walled garden Apple pushes.

    • KubeRoot@discuss.tchncs.de · 9 months ago

      Funny you should mention it: a few months ago, while updating things, I got a new feature on my Android phone… offline subtitle generation, created in real time from the audio of anything outputting sound on my phone.

      A Google search suggests this might be an older feature; I'm not sure whether my phone didn't support it, whether I simply missed it, or whether they added a more obvious button.

      Google has a separate app for that stuff, called Private Compute Services. Right now it's nothing like an offline Google Assistant replacement, but I think it's really nice to have that functionality available without relying on internet access.

  • Diabolo96@lemmy.dbzer0.com · 9 months ago

    It's already possible. A 4-bit quant of Phi-1.5 (1.3B parameters, about as smart as a 7B model) takes around 1 GB of RAM. Phi-2 (2.7B, about as smart as a 13B model) was recently released, and it would likely take around 2 GB of RAM with a 4-bit quant (not tried yet). The research-only license on these makes people reluctant to port them to Android; they focus instead on weak 3B models or bigger models (7B+), which heavily limits any potential usability.
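
    As a rough sanity check on those figures (a back-of-the-envelope sketch, not a measurement; the fixed 0.3 GB overhead constant is my own assumption covering the KV cache and runtime buffers):

    ```python
    # Rough RAM estimate for a 4-bit quantized model:
    # ~0.5 bytes per parameter for weights, plus a fixed overhead guess
    # for the KV cache and runtime buffers. Real quantized files vary
    # by quant format and context length.
    def quant_ram_gb(params_billions: float, bits: int = 4,
                     overhead_gb: float = 0.3) -> float:
        bytes_per_param = bits / 8
        return params_billions * bytes_per_param + overhead_gb

    print(f"Phi-1.5 (1.3B) @ 4-bit: ~{quant_ram_gb(1.3):.1f} GB")  # ~1.0 GB
    print(f"Phi-2   (2.7B) @ 4-bit: ~{quant_ram_gb(2.7):.1f} GB")  # ~1.7 GB
    ```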

    1. Apple could mimic and improve on the Phi models' training recipe to make its own powerful but small model, then leverage its full knowledge and control of the hardware architecture to wring out every drop of performance. Kind of like how some people used their deep knowledge of a console's architecture to make it do things that seemed impossible.

    Or

    2. Apple's engineers could, whether due to time constraints or laziness, simply use llama.cpp (which will certainly implement this flash attention), take an already available model whose license allows commercial use, like Mistral, add some secret-sauce optimizations based on their hardware, and voilà (see the sketch after this list).

    I bet on 2.
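
    For a sense of how little glue code option 2 needs, here's a minimal sketch using the llama-cpp-python bindings over llama.cpp. The model filename is hypothetical, and Apple would presumably call llama.cpp's C API from Swift rather than Python:

    ```python
    # Minimal sketch of option 2: run a quantized, commercially licensed
    # model (e.g. Mistral 7B, Apache-2.0) on-device via llama.cpp's
    # Python bindings. Requires `pip install llama-cpp-python` and a
    # GGUF file on disk; the path below is hypothetical.
    from llama_cpp import Llama

    llm = Llama(
        model_path="mistral-7b-instruct.Q4_K_M.gguf",  # hypothetical local file
        n_ctx=2048,    # context window
        n_threads=4,   # CPU threads; llama.cpp can also offload to Metal on Apple silicon
    )

    out = llm("Explain why on-device inference helps privacy:", max_tokens=64)
    print(out["choices"][0]["text"])
    ```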

  • kromem@lemmy.world · 9 months ago

    AKA “we completely missed the boat on this thing and are going to pretend it was intentional by focusing on an inevitable inflection point a few years out from today instead.”