Apple releases eight OpenELM AI small language models for on-device use - Computing

Grouped under the name OpenELM – efficient language model – the models are trained on publicly available datasets and are released with a training and inference framework for researchers and developers to work with.

Small language models are intended to extend AI to simpler but more targeted tasks, and can run on less processing power than mainstream large language models (LLMs). That makes them ideal for businesses that may not require the full power of more complex AI models, or for discrete consumer applications. They may be practical for tasks like data analysis, document summarisation and basic chatbot interactions.

OpenELM currently comes in eight models – four pre-trained and four instruction tuned.

The team of researchers explained: “OpenELM uses a layer-wise scaling strategy to efficiently allocate parameters within each layer of the transformer model, leading to enhanced accuracy. We pre-trained OpenELM models using the CoreNet library. We release both pre-trained and instruction tuned models with 270M, 450M, 1.1B and 3B parameters.

“Our pre-training dataset contains RefinedWeb, deduplicated PILE, a subset of RedPajama, and a subset of Dolma v1.6, totaling approximately 1.8 trillion tokens.”
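The layer-wise scaling the researchers describe can be illustrated with a short Python sketch. It assumes that a width multiplier grows linearly from the first transformer layer to the last, so that each layer gets a different feed-forward size instead of all layers sharing one. The function names, the multiplier range and the rounding rule are illustrative assumptions, not Apple's published configuration.

```python
import math


def layerwise_multipliers(num_layers: int, min_mult: float, max_mult: float) -> list[float]:
    """Return one width multiplier per layer, interpolated linearly
    from min_mult (first layer) to max_mult (last layer)."""
    if num_layers < 1:
        raise ValueError("num_layers must be at least 1")
    if num_layers == 1:
        return [min_mult]
    step = (max_mult - min_mult) / (num_layers - 1)
    return [min_mult + step * i for i in range(num_layers)]


def ffn_hidden_sizes(model_dim: int, num_layers: int,
                     min_mult: float, max_mult: float,
                     multiple_of: int = 256) -> list[int]:
    """Turn the per-layer multipliers into feed-forward hidden sizes,
    rounded up to a hardware-friendly multiple (an assumed convention)."""
    sizes = []
    for mult in layerwise_multipliers(num_layers, min_mult, max_mult):
        raw = model_dim * mult
        sizes.append(math.ceil(raw / multiple_of) * multiple_of)
    return sizes


if __name__ == "__main__":
    # Hypothetical 5-layer model with a 1,024-wide hidden state.
    print(ffn_hidden_sizes(1024, 5, 0.5, 4.0))
    # prints [512, 1536, 2304, 3328, 4096]
```

In this sketch the early layers are narrow and the later layers wide, so the same overall parameter budget is spread unevenly across the depth of the model rather than uniformly.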

However, the release notes also come with the usual disclaimers: “Trained on publicly available datasets, these models are made available without any safety guarantees…

“Consequently, there exists the possibility of these models producing outputs that are inaccurate, harmful, biased, or objectionable in response to user prompts. Thus, it is imperative for users and developers to undertake thorough safety testing and implement appropriate filtering mechanisms tailored to their specific requirements.”

Apple’s OpenELM release came just days after Microsoft revealed its own open small language model, Phi-3. Microsoft claims that the latest iteration of its small language model technology is significantly faster than Phi-2 and capable of providing responses similar to those of language models up to ten times its size.

According to Eric Boyd, corporate vice president of Microsoft Azure AI Platform, Microsoft used an LLM to create children’s books from a list of just over 3,000 words, and then used those books to train Phi-3.

A number of other providers and development teams are also working on small language models. These include Google, with Gemma; Facebook owner Meta, with Llama 3; and the open source Mistral AI project.
