Microsoft's BitNet b1.58 restricts every weight to -1, 0, or 1, slashing memory by 94%. That sounds like GPUs should get cheaper, right? Wrong. Companies won't lower prices when models get efficient; they'll just run more models and make more money. The efficiency gains happen after
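To see where a number like 94% can come from, here is a rough back-of-the-envelope sketch. It covers weight storage only, and the baselines (FP16 and FP32) and the 2-bit packing scheme are assumptions chosen for illustration; the post does not say which baseline its figure uses. The helper name `reduction` is made up for this example.

```python
import math


def reduction(bits_per_weight: float, baseline_bits: float) -> float:
    """Fraction of weight memory saved relative to a baseline format."""
    return 1.0 - bits_per_weight / baseline_bits


# A weight with three possible values carries log2(3) bits of information,
# which is where the "1.58" in the model name comes from.
ternary_bits = math.log2(3)

# Assumed storage schemes: the information-theoretic ideal, and a simple
# packing that spends 2 whole bits on each ternary weight.
schemes = [("ideal ternary", ternary_bits), ("2-bit packed", 2.0)]

# Assumed baselines: 16-bit and 32-bit floating point weights.
baselines = [("FP16", 16), ("FP32", 32)]

for scheme_name, bits in schemes:
    for base_name, base_bits in baselines:
        saved = reduction(bits, base_bits)
        print(f"{scheme_name} vs {base_name}: {saved:.1%} less weight memory")
```

Under these assumptions the ideal ternary encoding saves roughly 90% against FP16 and roughly 95% against FP32, and 2-bit packing against FP32 saves 93.75%, which is in the neighborhood of the 94% quoted above. Activations, embeddings, and any layers kept at higher precision are ignored here, so real-world savings would differ.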