
Large Language Models (LLMs) began as a technology few fully understood, yet they have quickly become the center of the AI boom.
At the heart of the technology are neural networks trained on massive amounts of text to understand, generate, and respond to human language with eerie finesse. These models, made popular by the release of OpenAI's ChatGPT, don't just spit out canned responses; they learn patterns, nuances, and context to mimic human-like conversation and reasoning.
But here's the secret: their brilliance comes with an insatiable hunger for resources.
Microsoft has its own take on LLMs, which it calls Copilot.
This time, researchers from the company have managed to create an LLM small enough to run on ordinary CPUs.
Called 'BitNet b1.58 2B4T,' it is a 1-bit LLM with two billion parameters, trained on four trillion tokens.
What sets this model apart is its ability to run efficiently on standard, off-the-shelf CPUs, with no specialized hardware required. Despite its capabilities, it is lightweight enough to operate smoothly on ordinary, commercially available systems, making advanced AI more accessible than ever.
To achieve this feat, BitNet b1.58 2B4T relies on ultra-efficient "1-bit" weights, limited to just three values: -1, 0, and +1. Three possible values correspond to roughly 1.58 bits of information per weight, which is where the model's name comes from.
This minimalist approach drastically reduces memory usage compared to mainstream AI models that typically rely on 16- or 32-bit floating-point precision. The result is a sleek, power-conscious model that requires far less computational overhead, perfect for environments where efficiency is everything.
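For the curious, here is a minimal sketch of the idea in plain Python with NumPy. It is an illustration only, loosely based on the "absmean" quantization recipe described in the BitNet research papers, not the model's actual code: every weight is scaled, rounded, and clipped to -1, 0, or +1, and matrix multiplication then reduces to additions and subtractions plus a single scaling factor.

```python
import numpy as np

def ternary_quantize(w: np.ndarray, eps: float = 1e-8):
    """Map a float weight matrix to {-1, 0, +1} plus one scale per tensor.

    Illustrative sketch of the "absmean" idea: scale by the mean absolute
    weight, then round and clip to the range [-1, 1].
    """
    scale = np.mean(np.abs(w)) + eps                     # one float scale kept per tensor
    w_ternary = np.clip(np.round(w / scale), -1, 1).astype(np.int8)
    return w_ternary, scale

def ternary_matmul(x: np.ndarray, w_ternary: np.ndarray, scale: float):
    """With ternary weights, the multiply is effectively adds/subtracts;
    the stored scale is re-applied at the end."""
    return (x @ w_ternary.astype(np.float32)) * scale

# Tiny demo: quantize a random 4x4 weight matrix and compare outputs.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
x = rng.normal(size=(1, 4)).astype(np.float32)

w_q, s = ternary_quantize(w)
print("ternary weights:\n", w_q)          # only -1, 0, +1 remain
print("full precision :", x @ w)
print("ternary approx :", ternary_matmul(x, w_q, s))
```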
In fact, BitNet b1.58 2B4T consumes only about 400 MB of non-embedding memory.
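A quick back-of-the-envelope check shows why that figure is plausible, assuming roughly two billion non-embedding weights stored at about 1.58 bits apiece (the information content of three possible values) and ignoring activations and runtime buffers:

```python
# Back-of-the-envelope check of the reported ~400 MB footprint.
# Assumption: ~2 billion non-embedding weights at ~1.58 bits each,
# ignoring activations and runtime buffers.
params = 2_000_000_000
bits_per_weight = 1.58

megabytes = params * bits_per_weight / 8 / 1_000_000
print(f"~{megabytes:.0f} MB")  # ~395 MB, in line with the reported figure
```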
According to the team's technical report, this lightweight model was benchmarked against leading mainstream models, including Meta's LLaMa 3.2 1B, Google's Gemma 3 1B, and Alibaba's Qwen 2.5 1.5B, and scored relatively well against them in most tests.
It even came out on top in some of the benchmarks.
The model, which has been open-sourced, is readily available on Hugging Face, allowing anyone to experiment with it.
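For those who just want to poke at the checkpoint, a hedged sketch of loading it through the Hugging Face transformers library is shown below. The repository id and the availability of a transformers build that supports the model are assumptions here (check the model card), and keep in mind the caveat that follows: the efficiency gains only materialize with the dedicated bitnet.cpp framework, not with standard transformers.

```python
# Minimal sketch of pulling the model from Hugging Face for quick experimentation.
# Assumptions: the repo id is "microsoft/bitnet-b1.58-2B-4T" and a transformers
# build that supports this architecture is installed. This path will NOT deliver
# the CPU efficiency the team reports; that requires bitnet.cpp.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/bitnet-b1.58-2B-4T"   # assumed repo id; verify on the model card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Ternary weights are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```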

This simplicity, however, comes at a cost.
First off, the LLM must use the bitnet.cpp inference framework to run this efficiently. The team specifically said the model cannot perform efficiently "when using it with the standard transformers library, even with the required fork."
Those who wish to experiment with it need to acquire the framework, available on GitHub, to take advantage of its benefits on lightweight hardware. The repository describes bitnet.cpp as offering "a suite of optimized kernels that support fast and lossless inference of 1.58-bit models on CPU."
Then, there is the fact that BitNets tend to trade off some accuracy in exchange for their lean architecture.
But BitNet b1.58 2B4T manages to close that gap in a rather clever way—by training on an enormous dataset, reportedly equivalent to over 33 million books. That sheer volume of knowledge helps compensate for the lightweight math under the hood.
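The "over 33 million books" comparison is easy to sanity-check against the four-trillion-token training figure, assuming an average book runs on the order of 120,000 tokens (a purely illustrative number):

```python
# Rough sanity check of the "over 33 million books" comparison.
# Assumption (illustrative only): an average book is on the order of
# 120,000 tokens, i.e. roughly 90,000-100,000 words.
training_tokens = 4_000_000_000_000   # four trillion tokens
tokens_per_book = 120_000

books = training_tokens / tokens_per_book
print(f"~{books / 1e6:.0f} million books")  # ~33 million
```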
AI models have often faced criticism for their heavy energy demands, both during training and day-to-day use. But lightweight language models like BitNet b1.58 2B4T offer a promising alternative. Designed to run efficiently on less powerful, everyday hardware, they open the door to local AI processing—without the need for sprawling, energy-hungry data centers.
This shift could be transformative.
By reducing reliance on specialized chips, NPUs, and high-end GPUs, lightweight models make artificial intelligence more accessible to those without the latest tech. It’s a subtle, yet powerful move toward a more inclusive and sustainable future for AI.