This is amateur hour thermal analysis. If this proposal landed on my desk I would throw it in the trash can.
It took me 5 minutes to find a smoking gun in this design. With 110 m^2 of radiator area, assuming only their average load (not the peak), and generously assuming perfect black-body radiation, the outer surface of the radiator facing space would be at about 100 C. Uh oh. Dead right there.
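The 100 C figure comes from the Stefan-Boltzmann law. Here is a minimal sketch of that check. The assumptions are mine, not the post's: one radiating side, emissivity of 1, a deep-space sink at 0 K, and no absorbed sunlight or Earth infrared. The post doesn't state the load, so the sketch runs the numbers backwards from 110 m^2 and 100 C.

```python
# Back-of-the-envelope radiator check via the Stefan-Boltzmann law.
# Assumptions (not from the post): one radiating side, emissivity 1 by default,
# deep-space sink at 0 K, no absorbed sunlight or Earth infrared.

SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W/(m^2 K^4)


def radiated_power_w(area_m2: float, surface_temp_c: float, emissivity: float = 1.0) -> float:
    """Heat rejected by a flat radiator held at a uniform surface temperature."""
    t_k = surface_temp_c + 273.15
    return emissivity * SIGMA * area_m2 * t_k**4


def surface_temp_c(load_w: float, area_m2: float, emissivity: float = 1.0) -> float:
    """Uniform surface temperature needed to reject load_w from area_m2."""
    t_k = (load_w / (emissivity * SIGMA * area_m2)) ** 0.25
    return t_k - 273.15


p = radiated_power_w(110.0, 100.0)
print(f"110 m^2 at 100 C rejects about {p / 1e3:.0f} kW")
print(f"Round trip back to temperature: {surface_temp_c(p, 110.0):.1f} C")
```

Running it backwards, 110 m^2 at 100 C rejects roughly 120 kW. Emissivity below 1 or any absorbed sunlight pushes the required surface temperature higher, not lower.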
Chip reliability drops significantly when the junction temperature (the hottest temperature in the chip) exceeds 80 C, and chip life falls off exponentially with every additional 10 degrees above that. This is mostly due to thermomechanical fatigue: cyclic stresses erode connections, which creates more hot spots, and failure accelerates catastrophically.
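To put a rough number on "exponentially", here is a sketch using the Arrhenius acceleration factor, a common model for temperature-driven wear-out. This model is my swap-in, not something from the post: the 0.7 eV activation energy is an assumed illustrative value, and thermal-cycling fatigue specifically is usually modeled with Coffin-Manson instead. Treat it only as an illustration of the exponential shape.

```python
import math

BOLTZMANN_EV = 8.617333262e-5  # Boltzmann constant, eV/K


def arrhenius_acceleration(t_ref_c: float, t_use_c: float, ea_ev: float = 0.7) -> float:
    """How many times faster a thermally activated wear-out mechanism runs at
    t_use_c than at t_ref_c. ea_ev is an ASSUMED activation energy."""
    t_ref = t_ref_c + 273.15
    t_use = t_use_c + 273.15
    return math.exp(ea_ev / BOLTZMANN_EV * (1.0 / t_ref - 1.0 / t_use))


for t in (90, 100, 107, 120, 150):
    print(f"{t} C vs 80 C: ~{arrhenius_acceleration(80, t):.1f}x faster aging")
```

Under these assumptions, going from 80 C to 107 C ages the part roughly 5x faster, and the gap widens quickly from there.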
Data centers on land are already having issues keeping $NVDA chips cooler than 80 C, and keeping the HBM next to the GPU under 90 C.
$META published a notorious report about failure rates while training Llama 3, where half of the chip failures were due to physical operational failure, and the failure rate was roughly 9% of chips PER YEAR.
This high failure rate is almost certainly due to thermal cycling.
Notice that the $SPCX engineer mentions that most of the solar tech is already used on Starlink satellites, implying this design isn't even that radical. But they are about to learn the hard way about the insidious scaling of the surface-area-to-volume ratio.
Having worked on a number of high-power cooling applications, I can tell you that scaling a system up almost inevitably makes it harder to cool, because volume grows so much faster than surface area. Motor designers I know joke that if you make a motor big enough, eventually you just start making really good heaters. This happens because heat is generated in larger volumes with no commensurate increase in the surface area available to pull that heat out, so everything tends to get hotter internally.
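The square-cube law behind this can be shown in a few lines. This is a minimal sketch that assumes heat is generated uniformly through the volume and all of it has to leave through the outer surface.

```python
# Square-cube law: scale every linear dimension of a body by a factor k.
# Assumption: uniform volumetric heat generation, all heat exits via the surface.


def scaling(k: float) -> tuple[float, float, float]:
    area = k**2           # surface area grows with the square of size
    volume = k**3         # heat-generating volume grows with the cube
    flux = volume / area  # heat each m^2 of surface must carry grows ~k
    return area, volume, flux


for k in (1, 2, 5, 10):
    a, v, q = scaling(k)
    print(f"k={k:>2}: area x{a:g}, volume x{v:g}, heat flux per area x{q:g}")
```

Make something 10x bigger in every direction and each square meter of surface has to move 10x more heat.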
As the power consumption in the volume of the satellite grows, the surface area required to support it grows comically large.
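A quick sketch of how the required radiator area scales with power at a fixed surface temperature. Assumptions, not from the post: emissivity 0.9, one radiating side, no absorbed sunlight, and a surface at 80 C. That temperature is optimistic, since the radiator has to run cooler than the chips feeding it.

```python
SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W/(m^2 K^4)


def required_area_m2(load_w: float, surface_temp_c: float, emissivity: float = 0.9) -> float:
    """Radiator area needed to reject load_w at a uniform surface temperature
    (single radiating side, deep-space sink, no absorbed flux; all assumed)."""
    t_k = surface_temp_c + 273.15
    return load_w / (emissivity * SIGMA * t_k**4)


for load_kw in (10, 120, 1_000, 10_000):
    area = required_area_m2(load_kw * 1e3, 80.0)
    print(f"{load_kw:>6} kW -> {area:,.0f} m^2 of radiator at 80 C")
```

Area grows in direct proportion to power. At around 120 kW it already lands near 150 m^2, and a 10 MW cluster would need on the order of ten thousand square meters.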
So let's come up with a reasonable estimate of the GPU junction temperature based on their design. Keep in mind that even if fluid convection had no resistance, even if there were no conduction resistance through the manifolding from the chip to the fluid, even if there were no contact resistances anywhere in the system, and even if the chips were perfect conductors, the junctions would still be too hot at 100 C.
In reality they will be higher. A good back-of-the-envelope estimate for the thermal resistance between a chip package and the fluid in a cold-plate heat exchanger is about 0.01 K/W, which is considered pretty good state of the art. Even if I give them the benefit of the doubt and make aggressive assumptions, I would be hard pressed to believe they get better than 0.005 K/W with a cold plate. To do better, they are going to have to rework the packaging at the chip level.
In this very rosy picture, they are looking at a temperature rise of around 7 C from the fluid to just inside the package. Then there is the temperature variation inside the package, which would make this worse (but we will ignore it).
So even in the rosiest picture I can paint for them, they are getting chip temperatures of 107 C. Again, dead on arrival.
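The stack-up here is just coolant temperature plus thermal resistance times package power. The post doesn't state per-package power. The 1400 W below is an assumption, back-computed from a 7 C rise at 0.005 K/W. Following the rosy case, the coolant is taken to sit at the radiator's 100 C.

```python
def junction_temp_c(coolant_temp_c: float, r_k_per_w: float, package_power_w: float) -> float:
    """Coolant temperature plus the rise across a lumped package-to-fluid
    thermal resistance (ignores temperature variation inside the package)."""
    return coolant_temp_c + r_k_per_w * package_power_w


PACKAGE_POWER_W = 1400.0  # ASSUMED: implied by a 7 C rise at 0.005 K/W

for r in (0.005, 0.01):
    t = junction_temp_c(100.0, r, PACKAGE_POWER_W)
    print(f"R = {r} K/W -> rise {r * PACKAGE_POWER_W:.0f} C, junction ~{t:.0f} C")
```

At the more typical 0.01 K/W, the same package would sit near 114 C even with everything else perfect.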
In reality, the chips will get much hotter. Without doing the full analysis, it's not unreasonable to think these chips can't be kept below about 150 C at average power.
To get the radiator to emit at an average temperature of 100 C, the fluid actually has to get much hotter. As the fluid moves through the radiator it cools down, which reduces the total heat the radiator can dissipate. You can get around this somewhat by massively oversizing your pumps so they push an enormous amount of fluid, but the weight required will not be kind to the payload. For example, if you want a 1 degree temperature drop across the radiator, you will need around a 10-20 GPM pump, which is generally around 10-20 HP (7.5-15 kW) and weighs 200-300 lbs apiece! If they want redundancy, the pumps alone will be 15% of the payload mass at 70 kW/metric ton!
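The pump mass fraction is simple arithmetic. This sketch assumes a ~120 kW load, which is my back-computation from 110 m^2 at 100 C and not a number given in the post. It also assumes two pumps for redundancy and uses the 200-300 lb and 70 kW/metric ton figures from the text.

```python
LB_TO_KG = 0.45359237  # exact pound-to-kilogram conversion


def pump_mass_fraction(load_kw: float, kw_per_tonne: float, pump_mass_lb: float, n_pumps: int) -> float:
    """Pump mass as a fraction of payload mass, where payload mass is
    inferred from the power density (kW per metric ton)."""
    payload_kg = load_kw / kw_per_tonne * 1000.0
    pumps_kg = n_pumps * pump_mass_lb * LB_TO_KG
    return pumps_kg / payload_kg


for lb in (200, 250, 300):
    f = pump_mass_fraction(120.0, 70.0, lb, n_pumps=2)  # 120 kW load is ASSUMED
    print(f"2 x {lb} lb pumps: {f:.0%} of payload mass")
```

With those inputs, two pumps come out at roughly 11-16% of the payload, which is in line with the 15% figure.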
If we cut the pump flow by 10x, expect a fluid temperature drop of around 10 C across the radiator. So now the chip is at nearly 120 C. Add in imperfect emissivity and contact resistances, and junction temperatures will easily exceed 150 C and the chips will fail.
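The 10x figure follows from Q = m_dot * c_p * dT. For a fixed heat load, the coolant temperature drop scales inversely with mass flow. This sketch takes the 1 C drop at full flow and the 7 C package rise from the text. As in the estimate above, it adds the full fluid drop on top of the 100 C radiator.

```python
def fluid_drop_c(base_drop_c: float, flow_fraction: float) -> float:
    """For a fixed heat load Q = m_dot * c_p * dT, the coolant temperature
    drop scales inversely with mass flow rate."""
    return base_drop_c / flow_fraction


drop = fluid_drop_c(1.0, 0.1)   # 1 C at full flow, pumps cut 10x
junction = 100.0 + drop + 7.0   # radiator surface + fluid drop + package rise
print(f"fluid drop ~{drop:.0f} C, junction ~{junction:.0f} C")
```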
But again, these are all details. Even in a completely perfect system, the chips run at 100 C and will fail.
Now to be clear, do I think that with enough time and money you could get a GB300 rack to run in space? Sure. But this looks like a decade-long quagmire of a project that will either drastically under-deliver or just get canned because it's extremely impractical and not competitive with land-based systems.
Literally the only upside of putting these in space is the lack of regulation on putting them there.