People waiting for better coding models don't realize that the quadratic time and space complexity of self-attention hasn't gone anywhere. If you want an effective 1M token context, you need 1,000,000,000,000 dot products to be computed for you for each of your requests for new code.
Right now, you get this unprecedented display of generosity because some have billions to kill Google while Google spends billions not to be killed.
Once the dust settles down, you will start receiving a bill for each of those 1,000,000,000,000 dot products. And you will not like it.