spps11's comments

spps11 · on Feb 21, 2025

pyspur is apache 2. it is free to self-host.

spps11 · on Feb 20, 2025

i feel this question correlates more with the generation of the SWE more than anything.

expecting current gen SWEs to talk about network layer protocols while answering this is kinda the same as expecting 1990s SWEs to include wire physics and dispersion statistics in their answer to this question.

Depth alone isn't always a good indicator. We have to move on from some of the low level stuff at some point and it is okay for engineers to know in detail about things that have been solved long back.

Swizec · on Feb 20, 2025

Surely that depends what your hiring them for?

The nice thing about the website load question is that it touches every part of the stack. You could talk for an hour about rendering on screen at the OS level, or network protocols, or server stuff, or web client stuff, or data center stuff, or …

Really to answer the question in its full entirety would be the equivalent of that “Everything that goes into a pencil” essay. You could build an entire college curriculum just out of that question.

LoganDark · on Feb 21, 2025

> that “Everything that goes into a pencil” essay

which one! I tried googling this but without quotes it gives me clickbait and with quotes it only points to this comment. :(

baggy_trough · on Feb 21, 2025

I, Pencil

LoganDark · on Feb 21, 2025

thank you!!

panza · on Feb 21, 2025

I think it's fine to ask a web developer how web pages are served.

If they can talk about network layer protocols, then that tells you something. If the next candidate's understanding stops at knitting libraries together, then that's also notable.

Even if it's not a discriminating factor in hiring, it still helps you flesh out the candidate.

notarealllama · on Feb 20, 2025

That's the beauty. I could care less if my backend JS candidate knows anything about Ethernet frames, but I damn well expect h3 and async discussion.

But my devops guy? He better be talking about CDN, cavhing, WAF.

rendaw · on Feb 21, 2025

Do you prompt them? Expecting them to talk about CDN and WAF seems unfair, those go beyond the definition of the web.

hadlock · on Feb 21, 2025

For a senior role? no. They should be asking you how deep you want to go. And you should be the one stopping them for time. If your SRE/DevOps can't wax poetic about the deep deep details, how are they going to get you out of the ditch when production goes down at 11am on a Tuesday, or 11pm on a Saturday

miyuru · on Feb 21, 2025

> CDN and WAF seems unfair, those go beyond the definition of the web

Current web is full of WAFs and CDNs, devops should know that they exist at the very least.

dartos · on Feb 21, 2025

I’d expect an SRE to know about all that.

fud101 · on Feb 21, 2025

what is h3?

theandrewbailey · on Feb 21, 2025

HTTP/3

spankalee · on Feb 21, 2025

Why should "current generation" SWEs know about network protocols? We live in a more networked world, not less.

spankalee · on Feb 21, 2025

Oops! Bad, meaning-reversing omission: I mean "not know".

tabony · on Feb 20, 2025

100% a generation thing.

Been around long enough so I've written my own CSS framework, server-side framework, web browser, web server, sent bits over Ethernet, written assembler, programmed a FPGA, built circuits, AND have a electrical engineering degree... and YET, _ABSOLUTELY NONE_ of this is useful _99%_ of the time.

So meh. If I want you to do frontend, I will ask you frontend questions. Hopefully you can go deep on a11y and that's what I care about.

(But the 1% of the time when I can precisely step through a whole stack is also fun.)

notarealllama · on Feb 20, 2025

Exactly, someone who can answer where media queries fall in terms of loading priorities for FE (when hiring for that.)

Electrical engineering is a whole nother thing!

dylan604 · on Feb 20, 2025

Hence the differences in the level of schooling. Graduating a coding boot camp for using React vs years of engineering school. Why are the interviewers are confused on this is the unsane thing.

layer8 · on Feb 21, 2025

Yes, I would drop the “in excruciating detail” and “everything”. The level of detail the interviewee starts at or goes down to by default is also informative.

spps11 · on Feb 20, 2025

very cool! always wondered what that tower was haha, now i know!

spps11 · on Feb 20, 2025

Thanks for sharing, enjoyed reading it!

I have a slightly tangential question: Do you have any insights into what exactly DeepSeek did by bypassing CUDA that made their run more efficient?

I always found it surprising that a core library like Cuda, developed over such a long time, still had room for improvement—especially to the extent that a seemingly new team of developers could bridge the gap on their own.

saagarjha · on Feb 21, 2025

They didn’t. They used PTX, which is what CUDA C++ compiles down to, but which is part of the CUDA toolchain. All major players have needed to do this because the intrinsics for the latest accelerators are not actually exposed in the C++ API, which means using them requires inline PTX at the very minimum.

t55 · on Feb 20, 2025

They basically ditched CUDA and went straight to writing in PTX, which is like GPU assembly, letting them repurposing some cores for communication to squeeze out extra performance. I believe that with better AI models and tools like Cursor, we will move to a world where you can mold code ever more specific to your use case to make it more performant.

suresk · on Feb 21, 2025

Are you sure they ditched CUDA? I keep hearing this, but it seems odd because that would be a ton of extra work to entirely ditch it vs selectively employing some ptx in CUDA kernels which is fairly straightforward.

Their paper [1] only mentions using PTX in a few areas to optimize data transfer operations so they don't blow up the L2 cache. This makes intuitive sense to me, since the main limitation of the H800 vs H100 is reduced nvlink bandwidth, which would necessitate doing stuff like this that may not be a common thing for others who have access to H100s.

1. https://arxiv.org/abs/2412.19437

t55 · on Feb 21, 2025

I should have been more precise, sorry. Didn't want to imply they entirely ditched CUDA but basically circumvented it in a few areas like you said.

pjmlp · on Feb 21, 2025

Targeting directly PTX is perfectly regular CUDA, and used by many toolchains that target the ecosystem.

CUDA is not only C++, as many mistake it for.

spps11 · on Feb 20, 2025

got it, thanks for explaining.

> with better AI models and tools like Cursor, we will move to a world where you can mold code ever more specific to your use case to make it more performant

what do you think the value of having the right abstraction will be in such a world?

t55 · on Feb 20, 2025

I think that for at least for us dumb humans with limited memory, having good abstractions makes things much easier to understand

spps11 · on Feb 21, 2025

Yes, but I wonder how much of this trait is carried over to the LLMs from us.

t55 · on Feb 21, 2025

what do you mean, the LLM abstracting things for us while we speak to it?

spps11 · on Feb 21, 2025

No I meant something else. As you said: us humans love clean abstractions. We love building on top of them. Now LLMs are trained on data produced by us. So I wonder if they would also inherit this trait from us and end up loving good abstractions, and would find it easier to build on top of them. Other possibility is that they end up move-37ing the whole abstraction shebang. And find that always building something up bespoke, from low-level is better than constraining oneself to some general purpose abstraction.

tomnipotent · on Feb 21, 2025

It's an interesting idea.

If code is ever updated by an LLM, does it benefit from using abstractions? After all they're really a tool for us lowly sapients to aid in breaking down complex problems. Maybe LLM's will create their own class of abstractions, diverse from our own but useful for their task.

t55 · on Feb 21, 2025

ah gotcha. I think that with the new trend of RLing models, the move 37 may come up sooner than we think -- just provide the pretrained models some outcome-goal and the way it gets there may use low-level code without clean abstractions

spps11 · on Jan 28, 2025

When you say sequence length, does it only count the output tokens or are input tokens also included in that?

Thanks for the post, it was an excellent read!

t55 · on Jan 28, 2025

Thanks for reading! In most contexts (including this one), seq length encompasses both the initial input (prompt) tokens and the output tokens the model generates. It’s the total length of all tokens processed by the model so far.

maxmolecule · on Jan 29, 2025

Please do! Seeing that you used multiple research papers to back up this writing inspired me to use this in my current research project for the literature review and eventual write up.

The template will be hugely helpful for a non-programmer like me.

spps11 · on Jan 3, 2025

Is it unethical to short your own company's fake-coin?

t55 · on Jan 4, 2025

Not if nobody knows!