Google has launched Gemma 3n, a new open multimodal AI model. Its standout feature is the ability to run locally on smartphones with as little as 2GB of memory, putting on-device AI directly in developers' hands. Gemma 3n accepts not only text but also audio, image, and video input, bringing full multimodal capabilities to mobile devices.
As part of the Google Gemma family, Gemma 3n continues the line's emphasis on openness and customization. Google has released two models, the larger E4B and the smaller E2B, both available for download on Hugging Face and Kaggle. They are built on the "MatFormer" (Matryoshka Transformer) architecture, which nests models inside one another like Russian dolls: E2B is a streamlined sub-model contained within E4B that can be extracted and run independently.
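To make the Russian-doll analogy concrete, the sketch below shows the Matryoshka-style weight nesting that MatFormer is built on: the smaller sub-model's feed-forward weights are a prefix slice of the larger model's, so extracting an E2B-style model is a matter of slicing shared weights rather than training a second network. All names and dimensions here are illustrative, not Gemma 3n's actual shapes.

```python
import numpy as np

# Illustrative dimensions only -- not Gemma 3n's real shapes.
D_MODEL = 8      # hidden size shared by both sub-models
FFN_LARGE = 32   # "E4B-like" feed-forward width
FFN_SMALL = 16   # "E2B-like" nested width: a prefix of the large one

rng = np.random.default_rng(0)
w_in = rng.standard_normal((D_MODEL, FFN_LARGE))   # up-projection weights
w_out = rng.standard_normal((FFN_LARGE, D_MODEL))  # down-projection weights

def ffn(x, width):
    """Run the feed-forward block using only the first `width` hidden units.

    In MatFormer-style training every prefix width is optimized jointly,
    so slicing the weights yields a smaller but still coherent sub-model.
    """
    h = np.maximum(x @ w_in[:, :width], 0.0)  # ReLU stands in for the real activation
    return h @ w_out[:width, :]

x = rng.standard_normal((1, D_MODEL))
y_full = ffn(x, FFN_LARGE)    # full-width "E4B-like" path
y_small = ffn(x, FFN_SMALL)   # nested "E2B-like" path: same weights, sliced
print(y_full.shape, y_small.shape)  # both (1, 8)
```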
The E2B model, designed specifically for mobile platforms, has roughly 5 billion raw parameters yet runs in about 2GB of memory. This is possible thanks to its "Per-Layer Embeddings (PLE)" design, which lets a large share of the parameters sit outside the accelerator's fast memory and be loaded layer by layer, so a model of this size can still run smoothly on a smartphone.
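The sketch below gives a rough intuition for how an offloading scheme like PLE keeps the accelerator footprint small: the per-layer embedding tables live in ordinary storage (flash or host RAM), and only the slice needed for the current layer and tokens is pulled into the working set. This is a conceptual illustration with invented sizes and file names, not Gemma 3n's actual runtime.

```python
import numpy as np

# Invented sizes for illustration; real per-layer embedding tables are far larger.
N_LAYERS, VOCAB, PLE_DIM = 4, 1000, 64

# Write the tables to disk once, standing in for weights kept in flash
# or host RAM instead of the accelerator's fast memory.
tables = np.random.default_rng(0).standard_normal(
    (N_LAYERS, VOCAB, PLE_DIM)).astype(np.float32)
tables.tofile("ple_tables.bin")

# Memory-map the file: nothing is resident until a slice is actually read.
ple = np.memmap("ple_tables.bin", dtype=np.float32,
                mode="r", shape=(N_LAYERS, VOCAB, PLE_DIM))

token_ids = np.array([1, 42, 7])
for layer in range(N_LAYERS):
    # Fetch only this layer's rows for the current tokens; the rest of the
    # tables never enters the memory budget.
    layer_emb = np.asarray(ple[layer, token_ids])  # shape (3, PLE_DIM)
    # ...a transformer layer would consume layer_emb here...
```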
The larger Gemma 3n E4B model has roughly 8 billion raw parameters, yet its memory footprint in operation is comparable to a traditional 4-billion-parameter model, at only about 3GB. On performance, Google's benchmark comparisons show E4B surpassing GPT-4.1 nano and outperforming larger models such as Llama 4 Maverick 17B-128E and Phi-4.
Gemma 3n already supports multiple execution environments, including Hugging Face Transformers, llama.cpp, Google AI Edge, Ollama, and MLX. Users can also run the Google AI Edge Gallery app locally on mobile devices such as the Pixel 8 Pro, or try the model's chat functionality in Google AI Studio.
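As an example, a minimal text-only chat through Hugging Face Transformers might look like the sketch below. The model ID follows the naming used on Hugging Face (google/gemma-3n-E2B-it); running it assumes a recent transformers release with Gemma 3n support and an accepted Gemma license on the model page, and the multimodal checkpoint may require a different pipeline task than the plain text-generation one shown here.

```python
from transformers import pipeline

# Assumes a transformers version with Gemma 3n support and that the Gemma
# license has been accepted on the Hugging Face model page.
generator = pipeline(
    "text-generation",
    model="google/gemma-3n-E2B-it",  # instruction-tuned E2B checkpoint
)

messages = [
    {"role": "user", "content": "Explain in one sentence what on-device AI is."}
]
out = generator(messages, max_new_tokens=64)
print(out[0]["generated_text"])
```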
With Gemma 3n, Google is showing what AI models can do once freed from heavyweight hardware requirements. The combination of openly downloadable weights and low memory demands could drive rapid adoption of multimodal AI applications on smartphones and IoT devices in the near future.