Hacker News

Pretty interesting that they only used about $15k worth of resources (retail price) to achieve this. The technique wouldn't have been out of reach for other organizations on compute cost alone.


That’s only the cost of the final model. To find it, they’d have needed to run something like 1,000 experiments: trying many high-level approaches, many architectures for each component, hyperparameter searches, and multiple seeds. Large machine learning projects need on the order of $10M in capital.
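For what it's worth, the two figures in this sub-thread are roughly consistent. A minimal back-of-envelope sketch, where the 2/3 average-cost fraction is my own assumption (the average experiment being cheaper than the final run), not something stated in the thread:

```python
# Reconciling the $15k final-run figure with a ~$10M project budget.
final_run_cost = 15_000  # retail compute for the final model (quoted above)
n_experiments = 1_000    # rough experiment count (from the comment above)

# Assumption: the average experiment costs ~2/3 of the final run
# (smaller models, early stopping, failed runs).
total = final_run_cost * n_experiments * 2 // 3
print(f"${total:,}")  # → $10,000,000
```

So 1,000 experiments at even a discounted per-run cost lands right around the $10M figure.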


I bet it's still a lot less than they spent training AlphaStar.


The tech might not be out of reach but the talent pool is.

Whether it's good PR or not is up for debate, but it seems the talent at DeepMind can simply accomplish things others can't.


Based on the going rate of a 32-core TPUv3 slice ($32/hr USD) running "for a few weeks", isn't this closer to $65k USD?


One could buy 200 GPUs for less; I think that's where the other comment's price estimate came from.


It says $1,752/mo for a v3-8, so I just multiplied it by 8.


Fair enough; that calculation is still a bit off if they used 128 cores (16x a v3-8 instead of 8x). Not that it really matters...
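The $65k figure can be sanity-checked against the quoted hourly rate. A quick sketch, assuming round-the-clock usage of a single v3-32 at the $32/hr rate quoted above:

```python
# How long does $65k buy at the quoted on-demand rate?
rate_per_hour = 32                      # $/hr for a v3-32, quoted above
cost_per_week = rate_per_hour * 24 * 7  # $5,376 per week of continuous use

weeks_for_65k = 65_000 / cost_per_week
print(f"{weeks_for_65k:.1f} weeks")  # → 12.1 weeks
```

So $65k corresponds to roughly twelve weeks of a single v3-32 running continuously, somewhat longer than "a few weeks"; the gap between the estimates likely comes down to how many slices ran in parallel and for how long.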


I'm pretty sure that this took more than 1 junior engineer-month.


How much would the labor cost, though?



