cma on Jan 28, 2025 | on: How has DeepSeek improved the Transformer architec...
FlashAttention, too, was built from techniques already common in other areas of optimized software, yet the big players weren't applying those optimizations to attention when it came out, and it significantly improved everything.
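
For context, the "common techniques" behind FlashAttention are essentially tiling plus an online (streaming) softmax, both long-standing tricks in optimized numerical software. A minimal NumPy sketch of that idea follows; this illustrates the algorithm only, not the actual fused CUDA kernels, and the function name and block size are made up for the example:

    # Sketch of FlashAttention's core idea: process K/V in blocks and keep
    # running max/denominator statistics (online softmax) so the full
    # N x N attention matrix is never materialized.
    import numpy as np

    def tiled_attention(Q, K, V, block=128):  # hypothetical helper, for illustration
        n, d = Q.shape
        scale = 1.0 / np.sqrt(d)
        out = np.zeros_like(Q)         # running (unnormalized) output
        m = np.full(n, -np.inf)        # running row-wise max of the scores
        l = np.zeros(n)                # running row-wise softmax denominator
        for start in range(0, n, block):
            Kb, Vb = K[start:start + block], V[start:start + block]
            s = (Q @ Kb.T) * scale                  # scores for this block only
            m_new = np.maximum(m, s.max(axis=1))
            corr = np.exp(m - m_new)                # rescale old stats to the new max
            p = np.exp(s - m_new[:, None])
            l = l * corr + p.sum(axis=1)
            out = out * corr[:, None] + p @ Vb
            m = m_new
        return out / l[:, None]

    # Sanity check against naive full-matrix attention.
    rng = np.random.default_rng(0)
    Q, K, V = (rng.standard_normal((512, 64)) for _ in range(3))
    s = (Q @ K.T) / np.sqrt(64)
    p = np.exp(s - s.max(axis=1, keepdims=True))
    naive = (p / p.sum(axis=1, keepdims=True)) @ V
    assert np.allclose(tiled_attention(Q, K, V), naive)

The GPU win comes from fusing these loops into a single kernel so the score tiles stay in on-chip SRAM instead of round-tripping through HBM.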
whimsicalism on Jan 28, 2025
Yes, I agree that low-level & infra work is where a lot of DeepSeek's improvement came from.