As large language models (LLMs) grow increasingly power-hungry, researchers are shifting toward smaller, more efficient models – Small Language Models (SLMs) – with just a few billion parameters, designed for specialized tasks and lower energy use.
Large language models: Power from hundreds of billions of parameters
The latest large language models (LLMs) from OpenAI, Meta, and DeepSeek are built with hundreds of billions of parameters, the internal values that encode relationships in the training data and are adjusted as the model learns. In general, the more parameters a model has, the more patterns and connections it can capture, which translates into greater capability and accuracy.
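To make the idea of a "parameter" concrete, the short PyTorch sketch below counts the trainable weights of a toy network. The model itself is an illustrative assumption, not one of the systems named above.

```python
# A minimal sketch of what "parameters" means in practice: the trainable
# weights of a network. The tiny model below is an illustrative stand-in,
# not one of the models named in the article.
import torch.nn as nn

# A toy language-model-style stack: embeddings plus a couple of linear layers.
model = nn.Sequential(
    nn.Embedding(32_000, 256),   # token embeddings
    nn.Linear(256, 1024),
    nn.ReLU(),
    nn.Linear(1024, 32_000),     # output projection over the vocabulary
)

# Every weight and bias tensor counts toward the total.
total = sum(p.numel() for p in model.parameters())
print(f"{total:,} parameters")  # frontier LLMs have hundreds of billions
```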

The massive energy and resource costs of LLMs
But that power comes at a cost. Training a model with such a vast number of parameters requires immense computational resources. According to Wired, Google reportedly spent $191 million training its Gemini 1.0 Ultra model. LLMs also consume huge amounts of energy every time they respond to a prompt, making them notoriously power-hungry. The Electric Power Research Institute (EPRI) estimates that a single ChatGPT request uses roughly 10 times the energy of a standard Google search.
SLMs: A more efficient, cost-effective approach
To address these concerns, researchers are turning to smaller-scale models. Tech giants like IBM, Google, Microsoft, and OpenAI have recently released small language models (SLMs) that use only a few billion parameters, a fraction of what LLMs require.
Unlike general-purpose LLMs, SLMs are designed for narrow, specific tasks such as summarizing conversations, answering health-related queries as medical chatbots, or collecting data from smart devices.
“For many tasks, an 8-billion-parameter model works surprisingly well,” says Zico Kolter, a computer scientist at Carnegie Mellon University. These smaller models can even run on laptops or smartphones, rather than relying on massive data centers. While there’s no universally accepted definition of an SLM, most are capped at around 10 billion parameters.
Knowledge distillation: Learning from the big models
To make training more efficient, researchers use clever methods like knowledge distillation. LLMs are trained on massive, often messy datasets scraped from the internet, but once trained they can be used to generate high-quality training data tailored for smaller models.
This teacher-student approach enables the smaller model to “learn” from the larger one in a more efficient way. “The reason SLMs perform so well with limited data and size is that they’re trained on cleaner, higher-quality datasets,” Kolter explains.
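As a rough illustration of this teacher-student idea, the PyTorch sketch below trains a small "student" model to match the output distribution of a larger, frozen "teacher." The model sizes, temperature, and dummy data are assumptions made for the example, not details reported in the article or taken from any specific SLM.

```python
# A minimal sketch of teacher-student knowledge distillation in PyTorch.
# Sizes, temperature, and the dummy token batch are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, hidden_teacher, hidden_student = 1000, 512, 64

# Stand-ins for a large "teacher" and a small "student" language model.
teacher = nn.Sequential(nn.Embedding(vocab_size, hidden_teacher),
                        nn.Linear(hidden_teacher, vocab_size))
student = nn.Sequential(nn.Embedding(vocab_size, hidden_student),
                        nn.Linear(hidden_student, vocab_size))
teacher.eval()  # the teacher is frozen; only the student is trained

optimizer = torch.optim.AdamW(student.parameters(), lr=1e-3)
temperature = 2.0  # softens the teacher's output distribution

tokens = torch.randint(0, vocab_size, (8, 16))  # dummy token batch

for step in range(100):
    with torch.no_grad():
        teacher_logits = teacher(tokens)
    student_logits = student(tokens)

    # The student learns to match the teacher's distribution
    # (KL divergence on temperature-softened logits).
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In practice, the "teacher signal" can also take the form of synthetic text generated by the large model, which the small model is then trained on directly; the idea in both cases is that the student learns from a cleaner source than raw web data.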

Researchers are also developing SLMs by starting with large models and then scaling them down. One common technique is "pruning," which involves removing unnecessary or inefficient parts of a neural network, the sprawling web of interconnected nodes that underpins a large model.
Inspired by the human brain and the theory of “Optimal Brain Damage”
The idea of pruning draws inspiration from the human brain—a biological neural network that becomes more efficient by trimming synaptic connections as we age. Modern pruning methods trace back to a 1989 study by computer scientist Yann LeCun, now at Meta. In his research, LeCun argued that up to 90% of the parameters in a trained neural network could be removed without sacrificing performance. He coined this approach “optimal brain damage.”
Pruning allows researchers to fine-tune smaller language models for specific tasks or environments, making them more efficient and adaptable without compromising their effectiveness.
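As an illustration, the sketch below applies magnitude pruning, one simple way to trim a trained layer, using PyTorch's built-in pruning utilities. The layer size and the 90% sparsity target are assumptions chosen to echo LeCun's figure, not a recipe from the article.

```python
# A minimal sketch of magnitude pruning, one common way to "trim" a trained
# network. The layer size and 90% target are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# A toy stand-in for one linear block of a trained language model.
layer = nn.Linear(4096, 4096)

# Zero out the 90% of weights with the smallest magnitude; they contribute
# least to the layer's output and can often be dropped with little loss.
prune.l1_unstructured(layer, name="weight", amount=0.9)

# Make the pruning permanent (bake the zeroed weights into the tensor).
prune.remove(layer, "weight")

sparsity = (layer.weight == 0).float().mean().item()
print(f"Fraction of weights removed: {sparsity:.2%}")
```

After pruning, the smaller network is typically fine-tuned briefly on task-specific data to recover any lost accuracy.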
SLMs offer lower cost, greater flexibility
Smaller models also make it easier and cheaper for researchers to experiment. With fewer parameters, their reasoning can be more transparent and easier to interpret. “If you want to build something new, you have to experiment. Small models let researchers take those risks with lower stakes,” says Leshem Choshen, a scientist at the MIT–IBM Watson AI Lab.
LLMs still matter, but SLMs are the practical path forward
While large, expensive LLMs remain vital for broad-use applications such as general-purpose chatbots, image generation, and drug discovery, smaller, focused models are proving more practical for many users. They are easier to train, more energy-efficient, and significantly cheaper to run.
“These efficient models can save money, time, and computing resources,” Choshen notes.
Source: Thu Thảo (VnExpress)
