That's in general a problem with dynamic languages with weak type systems. How "...

BiteCode_dev · on Feb 8, 2022

I don't think a type system can help you with decoding a file with the wrong charset.

morelisp · on Feb 9, 2022

A type system can refuse to turn a `bytes` into a `utf8str` until it's been appropriately parsed.

(It doesn't even need to be a very good or strongly-enforced type system - Go makes it dangerously easy to convert between `[]byte` and `string` by other-type-system standards, and yet everything works pretty well. It's enough to hitch your thinking and make you realize you need another step.)
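As a rough illustration of that "extra step", here is a minimal Python 3 sketch. Python is used only because the thread mentions it; `greet` is a hypothetical function, not something from the discussion. Python checks this at run time rather than compile time, but the separation between `bytes` and `str` is the same idea.

```python
def greet(name: str) -> str:
    # Concatenating str with bytes is rejected, so raw bytes cannot slip through.
    return "Hello, " + name


raw = "Zoë".encode("utf-8")  # bytes, as they would arrive from a file or socket

mixed_error = None
try:
    greet(raw)  # bytes is not str, so the concatenation raises TypeError
except TypeError as exc:
    mixed_error = exc

name = raw.decode("utf-8")  # the explicit decode step the types force on you
greeting = greet(name)
```

A static checker such as mypy would flag the `greet(raw)` call before the program runs, given the `name: str` annotation.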

BiteCode_dev · on Feb 9, 2022

How does the type system know the bytes are UTF-8 and not CP850?

morelisp · on Feb 9, 2022

The same way you can parse a string into an integer and it knows you got an either[integer, err].
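A sketch of what such an "either" could look like for decoding, written in Python for illustration. The `Ok`, `Err`, and `decode_utf8` names are hypothetical and not part of any standard library; the point is only that the caller receives a value that is visibly one of two cases.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Ok:
    text: str


@dataclass(frozen=True)
class Err:
    reason: str


DecodeResult = Union[Ok, Err]


def decode_utf8(raw: bytes) -> DecodeResult:
    """Return Ok(text) if raw is valid UTF-8, otherwise Err(reason)."""
    try:
        return Ok(raw.decode("utf-8"))
    except UnicodeDecodeError as exc:
        return Err(str(exc))


good = decode_utf8("é".encode("utf-8"))
bad = decode_utf8("é".encode("cp850"))  # a lone 0x82 byte is not valid UTF-8
```

The decoder still cannot know which charset the bytes were meant to be in; it can only report whether they are well-formed UTF-8.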

BiteCode_dev · on Feb 9, 2022

So it doesn't, and it's not about the type system. Any decode operation in Python 3 either raises an exception or returns a string.

It's about the error-handling model making it mandatory to deal with the error, so that you can't panic at run time on this specific error.
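For reference, a small Python 3 sketch of both behaviours under discussion, assuming the text "café" and the two codecs named in the thread. Decoding with the wrong charset raises in one direction and silently returns the wrong string in the other.

```python
utf8_bytes = "café".encode("utf-8")
cp850_bytes = "café".encode("cp850")

# CP850 bytes fed to the UTF-8 decoder: the sequence is invalid, so it raises.
try:
    cp850_bytes.decode("utf-8")
    raised = False
except UnicodeDecodeError:
    raised = True

# UTF-8 bytes fed to the CP850 decoder: no exception, just the wrong text.
mojibake = utf8_bytes.decode("cp850")
```

The second case is the one no decoder can detect by itself, since every byte is a legal CP850 character.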

morelisp · on Feb 9, 2022

I'm sorry you don't see exceptions as part of a type system yet.

pvorb · on Feb 8, 2022

The compiler can raise a warning about bad characters. It can't detect all problems, but it certainly can help with some.

It's not the type system, but many dynamic languages are interpreted instead of compiled.

dotancohen · on Feb 9, 2022

Those bad characters are fed into the program long long after the compiler has done its job.

pvorb · on Feb 9, 2022

Not necessarily? I'm talking about constant string literals in source code which can be validated at compile time.

cafard · on Feb 9, 2022

But this is not a matter of UTF-8 in the code, rather a matter of UTF-8 in the input or output. How does compiling a program ensure that it is robust on a range of inputs?

morelisp · on Feb 9, 2022

> How does compiling a program ensure that it is robust on a range of inputs?

This is quite literally the job of a type system: to impose a semantic interpretation on sequences of "raw bits" and let you specify legal (and only legal) operations in terms of the semantic interpretation rather than the bits.

digisign · on Feb 8, 2022

There are a number of mitigations, so those kinds of bugs are quite rare. In our large code base, about 98% of bugs we find are of the "we need to handle another case" variety. Pyflakes quickly finds typos, which eliminates most of the rest.

wk_end · on Feb 8, 2022

This is the difference between people who embrace static typing and everyone else. A static type lover hears that 98% of your bugs are of the "we need to handle another case" variety and says, "well, that means you could have gotten rid of 98% of your bugs with better typing".

digisign · on Feb 9, 2022

No, what I mean is that an additional key comes in (with the JSON or a similar hash) and we now need to do something with it, or something different from what we thought we were supposed to do with it. Typing is not going to fix it because the full set of cases was unknown at development time.
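A minimal Python sketch of the situation described, assuming a hypothetical payload with `id` and `name` as the keys known at development time. Neither the schema nor `split_known` comes from the thread. It shows one way to surface unexpected keys instead of ignoring them, though it cannot say what should be done with them.

```python
import json

# Hypothetical: the keys the code was written to handle.
KNOWN_KEYS = {"id", "name"}


def split_known(payload: str):
    """Return (known fields, sorted list of keys nobody planned for)."""
    data = json.loads(payload)
    known = {k: v for k, v in data.items() if k in KNOWN_KEYS}
    unknown = sorted(set(data) - KNOWN_KEYS)
    return known, unknown


known, unknown = split_known('{"id": 1, "name": "a", "colour": "red"}')
```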

mekster · on Feb 9, 2022

> "your code compiles" is much closer to "your code is correct".

That is so far from the truth.

ben-schaaf · on Feb 9, 2022

How is it anything but the truth? The express purpose of static analysis, like a type system, is to catch bugs before running your code. That pretty clearly means that code that successfully compiles is closer to being correct than code that doesn't.

mekster · on Feb 9, 2022

I mean correct as in logically correct, which is a different matter from being grammatically correct.

morelisp · on Feb 9, 2022

The parser assures your code is grammatically correct; the type system assures your code is semantically consistent, which is usually a much stronger guarantee, and by most practical measures will be closer - often much closer, and for total functions on total types, sometimes all the way - to "logically correct".