An expensive supercomputer built in the Arizona desert by a multibillion-dollar corporation in order to create superintelligence.
Sounds like a movie (we might wish it were one), but it appears to be happening as we speak: Microsoft is reportedly investing in OpenAI to build the most advanced data center in the world.
The name of the mysterious project? Stargate.
Honestly, the reported $100 billion price tag seems completely outrageous to me, considering the several recent advances in the field toward a more efficient AI future.
Unless… I am being short-sighted, and the theory of long-inference AI models, models that barely exist on paper or that haven't made it beyond the world's most advanced AI labs, is becoming a reality that we, as a society, aren't aware of or prepared for yet.
The Most Significant AI Law to Date
Time and again, the scaling laws of AI have proven the skeptics wrong. Now, those two words are guiding Microsoft, the most valuable company in the world, toward pouring an amount equivalent to the GDP of the 66th-largest economy into a single project: Stargate.
But what’s the rationale behind such a substantial investment?
Scaling remains plan A, and plan B too. Even though Large Language Models (LLMs) are steadily approaching the trillion-parameter mark, with frontier models like GPT-4, Claude 3, or Gemini reportedly surpassing it, signs of saturation as models grow larger are nowhere in sight.
In simple terms, enlarging models consistently yields better results.
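That claim has an empirical form. The scaling laws measured by Kaplan et al. (2020) found that a Transformer's test loss falls as a smooth power law in parameter count, with fitted constants (the values below come from that paper):

$$ L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad \alpha_N \approx 0.076 $$

The exponent is tiny, so halving the loss requires roughly a $2^{1/0.076} \approx 9{,}000\times$ larger model. That arithmetic is precisely why the bills for frontier training runs keep ballooning.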
Technically speaking, perplexity, the metric used to evaluate models during training, continues to decrease as LLMs grow.
But what is perplexity?
When a model lacks confidence in predicting the next token, it's described as 'perplexed'. Essentially, the higher the probability the model assigns to the correct word, the less perplexed (or more confident) it is.
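Formally, perplexity is the exponentiated average negative log-likelihood the model assigns to the correct tokens of a sequence:

$$ \text{PPL} = \exp\!\left(-\frac{1}{N}\sum_{i=1}^{N}\log p_\theta(x_i \mid x_{<i})\right) $$

A perplexity of 1 means every token was predicted with complete certainty, while a model guessing uniformly over a vocabulary of $V$ tokens scores a perplexity of exactly $V$.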
Considering all factors, the major tech companies backing these initiatives (Google, Microsoft, Meta, etc.) have every incentive to construct increasingly larger data centers.
For context, Microsoft, Meta, Amazon, and Google, in that order, collectively contribute around 40% of NVIDIA’s total revenue.
However, Microsoft might have an additional motive: Google's computing power dwarfs that of its competitors, and that gap is expected to keep widening.
But is a $100 billion investment truly necessary?
Recent advancements suggest otherwise.
The Shift Towards Efficiency
Recent research trends highlight four key developments, suggesting that AI could become significantly more affordable in the long run.
- Everything is a MoE (Mixture-of-Experts): Nearly every new model incorporates a Mixture-of-Experts, an architecture that divides the model into specialized groups of neurons ('experts') and activates only a few of them per token, cutting the computational burden (a minimal routing sketch follows this list).
- The 1-bit Era: Microsoft is pioneering this movement. Reducing the precision of each parameter to roughly one bit leaves performance largely intact while significantly cutting costs; moreover, with binary (or ternary) weights, matrix multiplications reduce to simple additions and subtractions.
- Hybrid Architectures: Models that blend attention mechanisms with subquadratic operators, such as AI21's Jamba (which interleaves Transformer and Mamba layers), offer performance comparable to pure Transformers at a lower cost.
- Ring Attention: This distributed-attention technique, thought to underpin recent long-context models like Gemini 1.5 and Claude 3, partitions long sequences across multiple GPUs so that no single device has to hold the entire context in memory.
These advancements collectively paint a picture of increased efficiency and affordability in AI model deployment.
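To make the first item concrete, here is a minimal top-2 Mixture-of-Experts layer in NumPy. The layer sizes, the router, and the top-2 rule are illustrative assumptions, a sketch of the technique rather than any production model's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class MoELayer:
    """Minimal top-2 Mixture-of-Experts layer (illustrative sketch)."""
    def __init__(self, d_model=64, d_hidden=256, n_experts=8, top_k=2, seed=0):
        rng = np.random.default_rng(seed)
        self.top_k = top_k
        # Router: one score per expert for each token.
        self.w_router = rng.normal(0, 0.02, (d_model, n_experts))
        # Each expert is a small two-layer MLP.
        self.w1 = rng.normal(0, 0.02, (n_experts, d_model, d_hidden))
        self.w2 = rng.normal(0, 0.02, (n_experts, d_hidden, d_model))

    def __call__(self, x):
        # x: (n_tokens, d_model)
        scores = softmax(x @ self.w_router)             # (n_tokens, n_experts)
        top = np.argsort(-scores, axis=-1)[:, :self.top_k]
        out = np.zeros_like(x)
        for i, token in enumerate(x):
            for e in top[i]:
                h = np.maximum(token @ self.w1[e], 0)   # ReLU
                # Only the top-k experts run, weighted by router probability.
                out[i] += scores[i, e] * (h @ self.w2[e])
        return out

layer = MoELayer()
tokens = np.random.default_rng(1).normal(size=(4, 64))
print(layer(tokens).shape)  # (4, 64): same shape, but only 2 of 8 experts ran per token
```

The saving comes from each token passing through only `top_k` of `n_experts` MLPs, while the layer's total parameter count, its capacity, stays several times larger than what is actually computed.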
From Long Training to Long Inference
There’s mounting evidence, supported by research from institutions like OpenAI, MIT, and Google DeepMind, suggesting that extending inference time leads to significant performance improvements. However, such models incur substantial costs.
Consider AlphaCode 2 and AlphaGeometry by Google DeepMind, which demonstrate impressive problem-solving capabilities but are hindered by high deployment costs due to their computational demands.
These models reflect our efforts to replicate human “System 2 thinking,” characterized by deliberate, conscious problem-solving. Currently, AI models predominantly utilize “System 1 thinking,” rushing to provide answers.
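To picture what "spending more compute at inference time" means in practice, here is a minimal best-of-N search-plus-generation loop. Both `generate` and `score` are hypothetical placeholders standing in for an LLM and a verifier or reward model; neither is any lab's actual API.

```python
import random

def generate(prompt: str, temperature: float = 1.0) -> str:
    """Hypothetical stand-in for an LLM call; returns one candidate answer."""
    return f"candidate-{random.randint(0, 9999)} for: {prompt}"

def score(prompt: str, answer: str) -> float:
    """Hypothetical verifier/reward model; higher means more likely correct."""
    return random.random()

def best_of_n(prompt: str, n: int = 16) -> str:
    # Draw N independent samples: inference cost grows linearly with N,
    # which is exactly the "long inference" trade-off discussed above.
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda a: score(prompt, a))

print(best_of_n("Prove the triangle inequality.", n=8))
```

The cost grows linearly with `n`, which is why systems in this family, such as AlphaCode 2, are expensive to deploy: they trade inference compute for answer quality.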
In summary, Project Stargate signals where things are headed: Microsoft's investment points to training the next generation of AI models, long-inference "search + generation" systems, and with them a paradigm shift in AI development.