- the web comprises numerous specifications: HTTP, HTML, CSS, JS, SVG. At worst, a regular compiler needs to worry about macros and the language syntax
- each of those specifications has numerous versions which, in some cases, can be significantly different from other versions of the same language or protocol. A language compiler generally only focuses on one version of that language
- A browser needs to support broken websites. A compiler only needs to fail gracefully
- A browser's output is graphical, which is much harder to unit test
In short, you’re dealing with a harder problem across a broader number of specifications. I would liken writing a browser more closely to writing a new graphical OS than writing a compiler.
(“Browser” here means “browser + engine et al” and not just a reskin of Chromium).
A browser rendering engine's output isn't purely graphical, and most things can be tested through other means, such as reading console.log output, inspecting the DOM, or checking computed CSS styles and bounding-box information.
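For instance, a layout test can assert on computed style and geometry instead of pixels. A minimal sketch (the element ID and expected values are made up; real harnesses like WPT's testharness.js add proper assertion plumbing):

```js
// Non-graphical layout check: resolve styles and geometry, no screenshots.
const box = document.querySelector('#item');       // hypothetical test element
const style = getComputedStyle(box);
console.assert(style.display === 'flex', 'display should compute to flex');

const rect = box.getBoundingClientRect();
console.assert(rect.width === 100, `expected width 100, got ${rect.width}`);
```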
In fact there's a good reason to keep graphical tests to a minimum: web specs do not dictate things down to the pixel level, so pixels can shift around from version to version, requiring the occasional golden data rebase.
Fun aside: Chrome's test suite contains a font named ahem.ttf where (almost) every character is an identical black rectangle. This allows tests to include text without relying too much on the details of a particular font.
The main reason it's difficult is that the output criteria seem properly defined, but they actually aren't at all.
Yeah it's "just" building some parsers and figuring out live updates, but you have to keep in mind that this is ~the internet~. People have been uploading broken, against spec, webpages since forever. Coding a web browser as a serious project (so not as a flight of fancy) borders on the impossible mostly because of that.
The main sites people test against/use aren't the "simple" CSS/JS/HTML sites from the past. Few people will care for a browser whose main job is to be able to render a neocities website. People want their popular sites working - Discord, Facebook, reddit, twitter. All of those are big JS apps.
The real bugbear here is JS though; HTML and CSS are complex but workable. JS is an ever-moving target as spec implementers (mostly Chrome) dump more and more of the jobs a browser was meant to do as the user agent into JS[0]. (And that's without delving into how Widevine became part of the spec, which means it's legally impossible to make a fully spec-compliant browser.)
Polyfills can offer a lot of fallback/leniency, but polyfills are a moving target too - older browsers get deprecated and polyfills get removed for performance/optimization reasons - so the baseline spec you need for functional JS keeps growing, unless you somehow get the people making popular JS libraries to accept that your browser project is important enough to keep the necessary polyfills around for (see the sketch after the footnote).
[0]: Presumably so that Google can take away the User part from the browser's job as the User Agent, but typically covered up as a poorly defined "privacy problem".
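Regarding the polyfill point above: the pattern itself is trivial; the fragility is that libraries delete it once they stop caring about engines that lack the feature. A sketch, using `Array.prototype.at` as an arbitrary example:

```js
// Classic polyfill shape: feature-detect, then patch only if missing.
// Once a library decides all its target engines ship .at(), this guard
// (and your engine's lifeline) gets deleted.
if (!Array.prototype.at) {
  Array.prototype.at = function (index) {
    const len = this.length;
    let i = Math.trunc(index) || 0;    // NaN and -0 become 0
    if (i < 0) i += len;               // negative indices count from the end
    return i >= 0 && i < len ? this[i] : undefined;
  };
}
```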
This comment gets some pretty important fundamentals wrong.
> The real bugbear here is JS though, HTML and CSS are complex but workable. JS is an ever-moving target
What you characterize as "JS" is, in reality, more HTML and CSS than JS. JS is a language. The fact that all the behavioral details of the HTML and CSS objects and related host objects have bindings available to JS programs does not make those things "JS"...
Doing a new JS engine from scratch is an order of magnitude easier than doing a browser engine. It is directly analogous to the eminently tractable "building a compiler" problem that the other commenter mentioned.
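To make the distinction concrete: strip away the browser, and a JS engine only owes you the language and its built-ins. (A sketch; which globals exist depends on the embedding, e.g. a bare engine shell versus a browser.)

```js
// In a standalone JS engine shell (no browser host):
typeof JSON.parse;           // "function"  (ECMAScript proper)
typeof Promise;              // "function"  (ECMAScript proper)
typeof globalThis.document;  // "undefined" (the DOM is a host binding from the HTML spec)
typeof globalThis.fetch;     // "undefined" (fetch is defined by the WHATWG Fetch spec)
```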
Fair. I meant JS here as in "fully DOM compatible, as-used-in-your-browser JS". JS engines themselves aren't that hard to make (I think there are about 9 or 10 actively maintained ones?), but to make one that's usable in situations that aren't things like Node, or as a sub-language in a different project... that's far more difficult.
> "fully DOM compatible, as-used-in-your-browser JS"
That's still wrong.
s/DOM//
s/JS/DOM/
Continuing to say JS when you're really talking about what is, again, still in the land of HTML, CSS, etc just confuses things. Viz:
> to make [a JS engine] that's usable in situations that aren't things like node or as a sub-language in a different project... that's far more difficult
It's really, really not about JS. You don't make a browser that's compatible with the Wild Wild Web by adding stuff to the JS engine. You do it by implementing moar browser.
I think the distinction comes from the fact that a compiler that's unfinished is just unfinished: as a developer you knew that, and either you contribute or you suck it up.
A browser that's unfinished really can't be used by users at all. Either it lacks security, so nobody should use it, or it lacks vital features (of the spec, not end-user features), so nobody can really use it, because every time a website relies on a missing API something doesn't work.
I don't think anyone has argued that building a browser from scratch is impossible, it's clearly not, just that building a competitive engine from scratch is impossible. SerenityOS is a sort of very cool art project, it's not attempting to justify itself in any specific way. If they make an engine that's 1% as good as Blink, works OK for the sites the authors personally care about and eventually they lose interest, OK, so what, no big deal.
It depends on how much of a browser you want to implement, I guess. Comparing it with a compiler is a skewed comparison, I think; compilers are built as part of many people's college/university education, but are only a small part of turning a programming language into working software. Likewise, I'm sure most developers on here could feasibly write a web browser that can fetch websites and render the HTML.
But that's just one aspect, next you need to add support for CSS [1] and Javascript [2], each of which has had lifetimes of work invested in the standards and implementations.
So yeah, while it's doable to build a new browser, if you want to build a big one that has feature parity or is on par with the existing browser landscape, you need a large team and many years of work. And that's just the practical aspect; the other one is, would a new browser actually be better? Could it compete with the existing market? So many players have just given up over time.
The big problem is the constant feature churn in the web space. Getting from zero to browser is probably doable. Staying at that mark with the ever-shifting CSS standards and the constant deluge of web extensions is hard and expensive.
I don't think that's true at all. We only get a handful of new CSS features every year, and features introduced today are much more carefully defined than the ad-hoc features of yesteryear. Implementing them is pretty straightforward. Certainly not more difficult than any of the million other things you have to do when building an OS from scratch.
The difficulty of building a state-of-the-art browser is almost entirely about performance. Everything else is straightforward by comparison.
> features introduced today are much more carefully defined than the ad-hoc features of yesteryear.
Many of them are just as ad-hoc, even if they are better defined, and meant to cover some holes in previous ad-hoc specifications. For example, the entire `subgrid` spec is patching one specific hole which actually has a proper general definition: "These <children> however are independent of the parent and of each other, meaning that they do not take their track sizing from the parent." [1]
So instead of solving that general problem, we have a hyper-specific patch for a single feature. Which will definitely clash with something else in the future.
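For the record, the hole `subgrid` patches looks roughly like this (a CSS sketch; class names are made up):

```css
.parent {
  display: grid;
  grid-template-columns: 150px 1fr;
}
.child {
  grid-column: span 2;         /* occupy both parent columns */
  display: grid;
  /* Without the next line, this nested grid sizes its own tracks
     independently of .parent; `subgrid` makes it adopt the parent's
     column tracks instead. */
  grid-template-columns: subgrid;
}
```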
I mean, the entire web components saga is browser developers patching one hole after another that exist only because the original implementation was just so appalling.
> The difficulty of building a state-of-the-art browser is almost entirely about performance.
But that performance is directly affected by the number of specs and features.
What is harder, starting with nothing and building the equivalent of 2020 Chromium from scratch, or taking Chromium from 2020 and extending it with the new features to get it up to date with today's Chromium?
The former is 100x harder than the latter, and the prior statement that new CSS/JS features are a burden to keep up with is patently absurd, because it's a tiny amount of work relative to the total work required to make a browser. (But still hard in the absolute sense, because browser engines are among the most complicated software projects.)
Posing the problem as making just one change hides the real problem, which is long-term. A constant deluge of externally driven changes inevitably creates a quagmire of technical debt, since you have no choice but to implement them regardless of whether your chosen architecture is suitable for the change.
> the prior statement that new CSS/JS features are a burden to keep up with is patently absurd.
Chrome ships up to 400 new APIs a year (that is, JS, CSS, etc.).
Safari and Firefox ship 150 to 200 new APIs a year. [1]
Even Microsoft gave up on trying to keep up with browser development and switched to Chromium.
> Because it's a tiny amount of work relative to the total work required to make a browser.
That is, like, the primary work required. And many of those things often don't even have a solution until someone finally figures them out in a performant manner (like CSS's :has).
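For context on why `:has` was hard: it's the first selector that matches an element based on its descendants, which inverts the usual matching direction and makes invalidation expensive. A CSS sketch:

```css
/* Matches any <li> that directly contains a nested <ul>. Evaluating this
   means looking down into each candidate's subtree, and any DOM mutation
   inside an element can now invalidate :has() matches on its ancestors,
   which is the performance problem engines had to solve before shipping. */
li:has(> ul) {
  font-weight: bold;
}
```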
> I'm finding it weird that unlike other non-trivial projects like OSes or compilers, people often discourage building web browser engine because it is "hard" or something like that like... how is it different from building a compiler?
A conformant C++ compiler is in the same ballpark as a browser, but a naive C compiler is orders of magnitude simpler.
Recreating the Windows OS is in the same ballpark as a browser, but a simple OS that boots and runs command-line apps is orders of magnitude simpler.
People don't casually start new C++ compilers or projects such as WINE, but they do start toy compilers and OSes all the time.
Compilers aren't routinely built anew either, except for new languages. Browsers are rarely written around a new language. (If you did invent a new language to substitute for HTML+CSS+JS, you'd likely want to implement it on the existing stack first, kind of like an equivalent of "compiles to C".)
> people often discourage building web browser engine because it is "hard" or something like that
> like... how is it different from building a compiler?
You gotta build an HTML parser, a CSS parser, and figure out a fancy structure to represent those concepts and modify it on the fly.
Also, there's a difference between making it work and making it state of the art.
That person says there's a shitton of RFCs - yeah sure, but you don't aim to support everything from the beginning.
Let's start with HTML + CSS, then build a basic JS interpreter.
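A minimal sketch of the kind of tree structure you'd start with (hypothetical code; the real HTML tree builder handles misnested tags, implied elements, and dozens of insertion modes on top of this):

```js
// Toy DOM node: just enough structure to hang parsing, styling,
// and layout off of later.
class Node {
  constructor(tag, attrs = {}, text = '') {
    this.tag = tag;        // e.g. "div", or "#text" for text nodes
    this.attrs = attrs;    // attribute name -> value
    this.text = text;      // only meaningful for "#text" nodes
    this.children = [];    // ordered child nodes
    this.parent = null;    // back-pointer, needed for selector matching
  }
  append(child) {
    child.parent = this;
    this.children.push(child);
    return child;
  }
}

// What a parser would emit for <p class="intro">hi</p>:
const p = new Node('p', { class: 'intro' });
p.append(new Node('#text', {}, 'hi'));
```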