I've spent two decades writing C and C++, but the last 8-9 years in really high-level languages (Ruby, Javascript, Python). From either end of the spectrum, I've never felt the need for such emphasis on fixed-sized numeric types.
I've commonly needed access to fixed size numerics, like when sending texture formats to the GPU, defining struct layout in file formats and network protocols, but I have never once thought: "You know what, I'd like to make a decision as to the width of an integer every time I declare a function."
"Just has to work" low level code was the norm during the 16-bit to 32-bit transition, so it was a fucking pain. Notice how smooth the 32-bit to 64-bit transition went? (and yes, it was smooth.) I credit that to high-level languages that don't care about this stuff, and people using better practices like generic word-sized ints and size_t's in lower level code. Keep that stuff on the borders of the application.
I've noticed a decent-sized emphasis on type size in both Crystal and Swift, two not-entirely braindead newer languages. I don't get it, it's a big step backward.
C's normal integer types actually work closer to the way you want. When you say you're returning an int, you're not specifying an exact size at all.
For those who don't know, C's char, short, int and long data types aren't really "fixed size". The standard only guarantees an ordering (sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long)) and minimum widths; the actual size of these integer types depends on the platform and compiler. If you want fixed-size ints, you need the uint8_t, uint16_t, uint32_t types from stdint.h. Interestingly enough, C99 doesn't require the platform to provide these exact-width types; it only mandates the uint_leastN_t and uint_fastN_t types (N = 8, 16, 32, ...), which are guaranteed to be at least, but not necessarily exactly, the requested width.
Now you may be thinking "sure, the standard says they could be different, but they aren't really: a char is 8-bit, a short is 16-bit, an int is 32-bit, a long is 64-bit". So here are a couple of examples for you: many DSP platforms have a C compiler where char is 16-bit. Also, Microsoft Visual C++ on Windows on x86_64 compiles long as 32-bit, while GCC on Linux on x86_64 compiles long as 64-bit.
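A minimal sketch of the point (the numbers printed depend entirely on your platform and compiler; for instance, long prints 4 with MSVC on x86_64 Windows and 8 with GCC on x86_64 Linux):

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        printf("char:           %zu\n", sizeof(char));            /* >= 8 bits  */
        printf("short:          %zu\n", sizeof(short));           /* >= 16 bits */
        printf("int:            %zu\n", sizeof(int));             /* >= 16 bits */
        printf("long:           %zu\n", sizeof(long));            /* >= 32 bits */
        printf("uint32_t:       %zu\n", sizeof(uint32_t));        /* exactly 32 bits, if provided */
        printf("uint_least32_t: %zu\n", sizeof(uint_least32_t));  /* at least 32 bits, always there in C99 */
        return 0;
    }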
>Notice how smooth the 32-bit to 64-bit transition went? (and yes, it was smooth.)
I know you said it was smooth, and maybe it was for some applications. But for many others, the 32-bit to 64-bit transition actually caused a lot of problems! Andrey Karpov has already done a great job of categorizing many of them, so I won't waste my time repeating him, but you can read his list here: http://www.viva64.com/en/a/0065/
Yeah, I'm arguing that the situation you describe (accurately) is better than baked-in sizes all over the source code.
Use the platform-native types (whatever size they may be) unless you have a reason not to, then use <stdint.h> or an analogue. If CHAR_BIT is 13, let char be 13 bits: the platform probably chose that for a reason. When you have to pack it into a TCP header, do your strict fixed-width stuff there.
I completely agree with you when it comes to function return types.
For data structure fields, especially in structs used many, many times, like in very large arrays, I'd say it's sometimes worth using fixed-size types to get better control over memory use. Using a 64-bit int for a field a 16-bit integer can handle uses 4x as much memory, and if you've got ten or a hundred million structs of that type, it really adds up.
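A minimal sketch of that memory argument (struct and field names made up for illustration):

    #include <stdint.h>

    struct sample_wide {
        int64_t x, y, z;   /* 24 bytes of payload for values that fit in 16 bits */
    };

    struct sample_narrow {
        int16_t x, y, z;   /* 6 bytes of payload (plus whatever padding the ABI adds) */
    };

    /* 100 million sample_wide: ~2.4 GB; 100 million sample_narrow: ~0.6 GB. */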
For example, size_t is 64-bit on my x86_64 system. But modern x86_64 systems can only use 48-bit address spaces, so an object whose size needs all 64 bits can't even be addressed. Even worse, my CPU and motherboard have a 32 GB maximum of RAM (an effective 35-bit physical addressing limit). size_t is supposed to be able to store the size of any object in memory, but on this platform it can represent sizes that could never fit in memory, so most of those 64 bits are wasted on modern systems. For files you can use off_t if you're on POSIX, but the C standard doesn't require size_t to be able to store the size of any file in a filesystem.
Just using size_t is not enough to make your code work correctly on 64-bit sized quantities either. For example, you make your strlen implementation return size_t, and you use size_t everywhere you do anything with a string. But can your application really handle strings that are bigger than the system RAM, or the hardware address space? Are your algorithms even efficient enough to handle the 4,294,967,295-byte maximum string size on a 32-bit system?
The effort to get your program to work efficiently on >32-bit quantities is often much harder than just using size_t instead of int.
So to me, when I see 64-bit size_ts being used everywhere in code that won't actually be able to handle working with >32-bit quantities, it just feels a little useless. Of course this is really more a complaint about how big size_t is on the x86_64 platform than it is a complaint about the idea of size_t in general. If only we had 48-bit size_ts (24-bit would be handy too!)
If you wrote an application that's as efficient as possible without any wasted bits in the size_t type, it then only works on your machine.
If I wanted to run such an application on my supercomputer with 2TB of RAM (such machines exist), I would then have to recompile for a 41-bit size_t.
We use machine-neutral (but architecture-specific) size_t for these kinds of things explicitly to avoid recompiling on different machines that are instances of the same platform.
Said another way, binary distributions could not exist if everything was made efficient for the underlying hardware. It would stink to have to recompile the world after upgrading RAM.
I'd rather have a few bits of wasted space (which are typically lost anyway due to struct packing) than lose intra-platform compatibility.
A 48-bit size_t like I suggested could address up to 256TB of RAM. All modern x86_64 cpus are limited to 48-bits of address space, you're not losing any portability here.
Also consider my strlen example. Say you compute strlen by iterating through the whole string until you find a 0, then you return a size_t for the number of bytes you iterated through. That operation is O(n) in the length of the string. If you were to use my strlen function on a string whose length is greater than would fit into a 32-bit integer, say a 1TB string or something, then the function would take so long to compute that it would be useless. To be efficient, you'd probably have to redesign your program to do some special things to handle 1TB strings, maybe some special algorithms, or some kind of indexing. Returning a 64-bit integer type does not mean that the function can actually handle working with 64-bit sized quantities. So if you have a lot of data structures where you keep string length, why store them as a 64-bit size_t when your application would be completely unable to handle strings of that size without keeling over?
Of course you wouldn't really use a 48-bit size_t, because x86_64 cpus don't work well with 48-bit quantities.
Even a hundred million integers still adds up to only 300 MB extra for 64- vs. 16-bit. On any reasonably modern server, laptop, or desktop, that kind of memory usage will probably be the least of your worries, all the more if you really have an application that needs to hold hundreds of millions of ints in memory at the same time.
And if you are programming for a very specific embedded or otherwise constrained system, then you want full control over the exact sizes of your types anyway, as discussed elsewhere here.
Is this "wasting resources", as you say? Probably yes. Is it worth the extra development effort to fine-tune that on modern machines? Probably not - and it might even be premature optimization. (Yes - I agree there are corner cases where it will indeed make sense, but those are the exception, not the norm.)
I'm not sure where you are getting the 300MB figure from. ((64-16)/8)x10^8/2^20 gives me 572.2MB of wasted space. What's even more interesting is looking at the percentage of wasted space. (((64-16)/8)x10^8/2^20)/(((64)/8)x10^8/2^20) means a whopping 75% of the memory use of our program is completely useless wasted space.
Using more space than you need will also impact performance. First there are the cache issues. A 64 KB L1 cache can fit 32768 16-bit integers, but only 8192 64-bit integers. The other cache levels will also fit fewer 64-bit than 16-bit integers, causing 4x more hits to the slow RAM backing store. Hitting RAM is very slow compared with CPU operations, so this will make your program a lot slower.
There are also computational speed issues. Let's say your problem can be implemented using the AVX/AVX2 instructions, which compute multiple results at once, in parallel. The AVX registers are 256 bits wide, which means they can operate on 16 16-bit integers at once, but only 4 64-bit integers at once. So there's another potential 4x improvement, although the cache problems are probably going to be a bigger issue in practice.
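A hedged sketch of that SIMD point, assuming an AVX2-capable CPU and compiler flags to match (e.g. -mavx2); one 256-bit register holds sixteen 16-bit lanes but only four 64-bit lanes:

    #include <stdint.h>
    #include <immintrin.h>

    /* 16 additions per instruction. */
    void add16(const int16_t *a, const int16_t *b, int16_t *out) {
        __m256i va = _mm256_loadu_si256((const __m256i *)a);
        __m256i vb = _mm256_loadu_si256((const __m256i *)b);
        _mm256_storeu_si256((__m256i *)out, _mm256_add_epi16(va, vb));
    }

    /* Only 4 additions per instruction for the same register width. */
    void add64(const int64_t *a, const int64_t *b, int64_t *out) {
        __m256i va = _mm256_loadu_si256((const __m256i *)a);
        __m256i vb = _mm256_loadu_si256((const __m256i *)b);
        _mm256_storeu_si256((__m256i *)out, _mm256_add_epi64(va, vb));
    }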
Sorry for the 400-100=300MB mix-up, you are completely right, it should have been 800-200=600MB.
If you need to crunch all those numbers at once, then yes, your cache will become your bottleneck. But if you were, say, keeping counts on 100 million things, most of the time you won't be worried about that or about your RAM usage, since counts typically follow a power-law distribution. Contention to make sure you are tracking every count will probably become a far worse problem for your app than your ability to shuffle data to and from the CPUs. Only in the corner case where you crunch a matrix or vectors of numbers of that size at once will you start to worry. But as I said, I think the tensor-crunching use case is the exception, not the norm.
One example is that if you don't need a specific size, using the loosely defined types may be faster. For example, if you specify int32_t, but your program compiles for a 16-bit platform, then instructions operating on this variable may need to run twice, since the registers can only fit 16 bits of data.
And the other way around: if you specify int8_t, but your program runs on a 64-bit platform which can only address memory in 64-bit chunks, then you might waste 56 bits of memory for this variable, and addressing it can be slower since the program has to find the 8 interesting bits in a 64-bit register and discard the rest so they don't affect calculations.
The only reason to use specific sizes, IMO, is if you work with loads of data and need to pack it efficiently, or work with bit-specified protocols like TCP packets or file formats. If you need _at least_ 16 bits of data, for example, then you can use int_least16_t, which may be 64-bit or 16-bit depending on the architecture.
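A minimal sketch of that last suggestion, assuming a C99 <stdint.h> is available:

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        int_least16_t small = 12345;  /* at least 16 bits, always exists in C99 */
        int_fast16_t  quick = 12345;  /* at least 16 bits, whatever is fastest here */
        printf("least16 is %zu bytes, fast16 is %zu bytes\n",
               sizeof(small), sizeof(quick));
        return 0;
    }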
For a university project I programmed for an Arduino Uno (ATmega328P). Debugging performance problems, we found out that a switch (implemented with a bunch of if/else ifs) with about 12 branches used ca. 200 clock cycles just checking the 12 conditions.
Turns out the ATmega328P has only 8-bit registers and its ALU operates on 8-bit values. We were using 32-bit datatypes in the conditions, which the ATmega328P loaded and compared byte by byte; each comparison (incl. jump etc.) cost us something like 16 cycles.
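A hedged sketch of the effect (names and values are illustrative, not our actual code); on an 8-bit AVR each 32-bit comparison is done byte by byte:

    #include <stdint.h>

    /* Each '==' here costs up to four 8-bit compares plus branches on an AVR. */
    uint8_t dispatch32(uint32_t cmd) {
        if (cmd == 1) return 10;
        else if (cmd == 2) return 20;
        else if (cmd == 3) return 30;
        return 0;
    }

    /* Each '==' here is a single 8-bit compare. */
    uint8_t dispatch8(uint8_t cmd) {
        if (cmd == 1) return 10;
        else if (cmd == 2) return 20;
        else if (cmd == 3) return 30;
        return 0;
    }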
So yeah, the size of the datatypes definitely mattered for us :)
edit: as an addition, we needed quite a while to figure out (and were quite surprised when we did) that ints are 16-bit and longs 32-bit on this platform/compiler. I guess this comes down to the general question of explicit vs. implicit, and I usually do prefer the former.
Looks like you haven't worked on embedded systems. We need to be very careful with sizes of ints here. Not just because we run the risk of overflows, but also because when we create code that may have to be ported from one architecture to another, we want to minimize re-work.
> Notice how smooth the 32-bit to 64-bit transition went?
There are many reasons for that. For most PC work, a 32-bit int is more than large enough. Going to 64-bit should not have affected that at all. When you're working with embedded systems, you're often working at sizes that are the bare minimum that you can live with. You might also get to work on architectures where char, short, and int are all 32-bit wide. Assume something, and the communications protocol stops working. Moreover, the PC architecture itself supported coexistence of 32-bit and 64-bit executables.
> Looks like you haven't worked on embedded systems.
I've deleted a defensive technical response to point out that these dismissive assumptions (usually unjustified) are pretty prevalent on HN, and I don't think it promotes level-headed discussion. It reads like "Let me discredit a stranger's background that I don't know, and then argue my opposing view." It creates a defensive mindset off the bat.
Edit: I mean, your points are valid. We won't agree on them, but I don't like the assumption that we won't agree because I'm ignorant to them.
You shouldn't have worried: his points are not valid at all. Using fixed-width types across various embedded (as in "small and exotic") platforms is a nightmare for portability. There is no reason not to use the standard integer types, provided you understand their definition. Use fixed-width types only when they're really, absolutely needed; don't use the standard types assuming they will be an exact size when you actually need a fixed width, and everything will be fine.
I work with embedded systems daily, writing code that is expected to work on multiple different architectures (e.g. systems where char is 8-bit, 16-bit, 24-bit or 32-bit, systems without floating point units, systems with SIMD instructions, systems without, etc). In doing this, I have found that fixed width types make the job harder, and any code that uses fixed width types is generally more difficult to work with.
You have an int16_t? How do you port it to a machine without a 16-bit type? What would happen if you replaced it with a 32-bit type? If you are relying on wraparound, then your code is already undefined, so some compiler is likely going to mess with you anyway.
The stuff about communicating with the outside world is quite different. At this point you have protocols, and it depends on how a protocol is written, and how you are able to actually interact with the world to fulfil that protocol. If the protocol is an ABI, then generally you are fine, because the compiler too will match the ABI. If it is a file format, then you need to know how your file io functions work (do they just write the bottom 8-bits of a char, or do they blat the whole thing across?).
> How do you port it to a machine without a 16-bit type?
The problem is made more severe when you use "int" and let the compiler decide the size of the variable. I guess I don't know what you're advocating.
> The stuff about communicating with the outside world is quite different.
I meant communicating as a throwaway example. It's a valid example, but really, everything gets affected. For example, a CAN identifier is 29 bits. What generic "int" would you use when you want to fill in this identifier? And no, I work at a level that's lower than an ABI. So, no OS, no file IO, etc.
If you use "int", you know you have at least 16-bits to play with. Take some correct code that works with int16_t, and replace the "int16_t" with plain "int". What breaks?
1. If you are dumping it out to a file or similar (i.e. assuming the layout in memory), then your code is no longer portable between machines of different endianness (see the sketch after this list).
2. If you are relying on overflow behaviour, then your code is broken already (this doesn't apply to unsigned types, but using minimum width types and explicit mods/masks is likely going to be clearer code anyway).
3. The one place where you can't replace int16_t with int is in the case where you make use of the assumption that an int16_t can represent -0x8000 and you then port to a machine that doesn't use two's complement and int is 16-bits... I'm not aware of any such machine, but that is about the extent of it.
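A minimal sketch of point 1: serialize explicitly instead of dumping the in-memory bytes, and it no longer matters whether the variable is an int16_t or a wider int, or what the host endianness is (the function name is made up for illustration):

    #include <stdio.h>

    int write_u16_be(FILE *f, unsigned v) {
        unsigned char buf[2];
        buf[0] = (unsigned char)((v >> 8) & 0xFF);  /* high byte first on the wire */
        buf[1] = (unsigned char)(v & 0xFF);
        return fwrite(buf, 1, sizeof buf, f) == sizeof buf ? 0 : -1;
    }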
What type to use for a 29-bit CAN identifier? A "long" will do just fine. It is guaranteed to be at least 32-bits wide, and also to exist. Would you suggest a uint29_t?
> Take some correct code that works with int16_t, and replace the "int16_t" with plain "int". What breaks?
Memory. Defaulting to 16-bits when an 8-bit variable will do can be incredibly wasteful on an 8-bit µC. Keep in mind that we are not just talking about one variable in isolation. We are talking about all the integers we pass between functions. We are talking about code space, data space, and stack space. There are compilers that can optimize their arithmetic operations to 8-bit registers when they can be sure that that's all their operands need.
> Would you suggest a uint29_t?
If you are aware of a machine that provides that, yes! Otherwise, I'd suggest uint32_t. Yes, "long" is guaranteed to be at least 32 bits wide, but it can also be 64 bits wide. I would not recommend defaulting to "long", as that could be wasteful. Here's an interesting discussion about the meaning of "long" and "long long" for their compiler: https://www.dsprelated.com/showthread/comp.dsp/42108-1.php. I see this discussion as a failure of the C standard.
I would much rather the compiler provided int64_t or int40_t or whatever else they can, that is not inefficient.
> Memory. Defaulting to 16-bits when an 8-bit variable will do can be incredibly wasteful on an 8-bit µC.
I hope it doesn't seem like I'm just moving the goalposts, but the obvious answer here is to use a type which is at least 8-bits wide if you only need 8-bits. Such a type exists, and is called "char" (with the appropriate signedness modifiers).
This whole discussion has been about writing portable code. Using a "char" here is going to work perfectly on your 8-bit uC. It is also going to work perfectly on your SHARC chip with 32-bit chars.
> If you are aware of a machine that provides that, yes! Otherwise, I'd suggest uint32_t.
I think we are talking about different things. I am talking about writing portable code. You are talking about writing code that only works on a particular machine (or the class of machines that have a 29-bit int).
I know there is a place for that code, and once you are writing code where you actually need to make use of knowledge about the machine, then I'm all for using types that make this clear. However, typically this code only lives at the edges of the system, and the actual "computation" can be written in portable, efficient, readable code without a great deal of trouble.
I'm completely aware that long can be 64-bits on some machines. If you care so much about the wasted memory, then uint_least32_t should make you happy - if you are happy with the C99 dependency (which can limit portability, though things are getting better), then I don't see how you can see this as being worse than the fixed width uint32_t.
Personally, I have found that while it sounds nice in theory, the domains I've been working in have meant that the memory doesn't make much difference (if it is just sitting on the stack or being passed between functions, then on 64-bit machines, there is typically no differences, as calling conventions tend to pad things out). It is only when you have an array of these in memory that it might start to matter, and here it tends not to matter a great deal - you are typically now optimizing an algorithm for an amd64 machine (read - it has plenty of memory) and the algorithm typically doesn't actually use a lot of it (since it needs to run on tiny micros too).
Anyway, I think we probably agree for code which isn't supposed to be portable. Potentially just that I tend to work more on the code that is supposed to be portable, and you work on the code at the edges?
I did not expect you'd say "char", because it's a fixed size numeric type. Isn't that what you have been arguing against so far? Anyway, I'd get yelled at if I checked in code where I do math on char. It has to be on uint8 or int8 (or other widths). Not that I'd ever check in such code; after all, I agree with the company policy there. But "char" is just a part of the problem. We have the same issue with all kinds of bit-widths. When I read PC code with such names, I don't worry about their sizes, and just assume i386 conventions. When I read embedded C code with datatypes of undeclared widths, I get quite annoyed. Thankfully, I can't recall the last time I read such code.
> then uint_least32_t should make you happy
The thing is, I assume all uintX types to really mean at least X bits wide. I am aware that the compiler is free to use its discretion for allocating memory. I worry about finer details only when I need to.
char isn't fixed width. It is guaranteed to be at least 8-bits wide, but on a SHARC chip it is 32-bits wide.
> The thing is, I assume all uintX types to really mean at least X bits wide. I am aware that the compiler is free to use its discretion for allocating memory. I worry about finer details only when I need to.
That is explicitly not what they are. The standard is very clear that they must be exactly X bits wide, and if there is no type that is exactly X bits wide (and two's complement) then the types must not exist.
But, if this is how you are treating these types, then aren't you basically just saying that in your mind you aren't using fixed width types? If everywhere you write uintX_t you are thinking "a type that is at least X bits wide", then you have to do all the reasoning as if the type is of an unknown width. It sounds like you have the worst of both worlds - the lack of portability of fixed width types combined with the slipperiness of minimum width types.
This miscommunication was my fault. We don't use uint8_t or other such things, we use uint8, uint16, etc. These are typedefed to compiler-specific types that guarantee a required minimum size. This is also the reason we are not allowed to do arithmetic on chars. The types we use may eventually get typedefed to a char, but at least when I see someone else's function that requires a char as opposed to uint8, I know he's dealing with actual characters.
> then you have to do all the reasoning as if the type is of an unknown width
Well, no. If we are writing portable math routines for such types, it gives us nice ranges for which the code should behave with correct answers or should throw red flags during unit testing.
If that is the case, then it sounds like this whole thread has been miscommunication :-)
The very start of this was talking about how fixed width types are not particularly helpful. I was under the impression that you were making the case for fixed width types, but it sounds like we've all been on the same side the whole time.
> Well, no. If we are writing portable math routines for such types, it gives us nice ranges for which the code should behave with correct answers or should throw red flags during unit testing.
Exactly - if you are writing portable maths routines for your int16 type (i.e. a type which is at least 16-bits wide), then you do the analysis as if the type is exactly 16-bits wide. However, it might be that the type is actually 32-bits wide, and that doesn't change things.
> If that is the case, then it sounds like this whole thread has been miscommunication :-)
I am not sure about that. The original poster wrote 'I have never once thought: "You know what, I'd like to make a decision as to the width of an integer every time I declare a function."' I disagreed with that, and that's where the whole discussion started off. And I still advocate using the smallest (explicitly sized) type you could get away with. Weren't you advocating using bare C types (unsigned char, short, long, etc.) for your variables all along?
> I am not sure about that. The original poster wrote 'I have never once thought: "You know what, I'd like to make a decision as to the width of an integer every time I declare a function."' I disagreed with that, and that's where the whole discussion started off.
But your solution also doesn't involve deciding on the width of an integer - your solution is to use integers that have a minimum width. By doing that, you are saying that you don't care about the width of an integer, as long as it is at least N bits, which is exactly the right thing to do for portable code.
> And I still advocate using the smallest (explicitly sized) type you could get away with.
This is where I'm a little confused, if we go back:
>> Would you suggest a uint29_t?
> If you are aware of a machine that provides that, yes! Otherwise, I'd suggest uint32_t. Yes,
If I understand what you are saying, then you are saying that at your work you use "intN" to mean "an integer of at least N bits". Every machine that has a C compiler for it has such a type - one possible type that could satisfy this is long. If you have a C99 compiler, then another type is int_least32_t.
So from this, it sounded like you were talking about fixed width types.
> Weren't you advocating using bare C types (unsigned char, short, long, etc.) for your variables all along?
I've been advocating for non-fixed width types all along. How their names are spelt is a secondary thing and one where lots of people have different opinions which makes it hard to be black and white. However, the meanings of the words is what I care about, and so whenever I gave examples I would tend towards the C90 types because they have the right semantics (they are minimum width types of sensible sizes) and they are portable everywhere (how long has it taken for MS to have stdint.h in their C compiler?) and have definitions that can be expected to be known by any C programmer (since their definitions are part of the language definition of C).
Personally, I think that the int_leastN_t types are ugly, so I avoid them (and it sounds like your company does too, since it renames them to less ugly (but I think more confusing) names). Although they are ugly, I do think their semantics are good enough to use for portable code. I wouldn't say the same about the fixed width types (since they may not exist) or the "fast" types (they aren't good for portability because their size is ambiguous across toolchains targeting the same ABI - compare the GNU definitions to the MSVS definitions). This paragraph is just my opinions, and I expect others to have differing opinions, and I don't really care. What I will hold to though is that fixed width types are bad for portability, and I think you agree with that.
> But your solution also doesn't involve deciding on the width of an integer - your solution is to use integers that have a minimum width.
We declare the minimum width for all variables. We don't use "int" in for loops and hope for the best. I wouldn't go as far as to say "we don't care". We don't worry too much for code that's meant to be portable. We obviously care in non-portable code. But the important point here is that we use the same conventions whether we are writing portable code or not.
> So from this, it sounded like you were talking about fixed width types.
C99 conventions are too radically new to be used across our entire codebase. Ugliness has nothing to do with it. The only way to ensure fixed widths without the int_leastX conventions is to be intimately familiar with the platform while we are declaring our typedefs, or creating makefiles for our projects. And to that extent, we use only fixed-width types.
> The only way to ensure fixed widths without the int_leastX conventions is to be intimately familiar with the platform while we are declaring our typedefs, or creating makefiles for our projects.
int_leastX_t doesn't give you fixed width types - that's the entire point of them. They are not fixed width.
> And to that extent, we use only fixed-width types.
I don't get it. You said this:
> We don't use uint8_t or other such things, we use uint8, uint16, etc. These are typedefed to compiler-specific types that guarantee a required minimum size.
You are using uint_least8_t, uint_least16_t, just with different names. As I've said - I don't really care what things are called.
Semantically, there is very little that is different between "uint_least16_t" and "unsigned short".
> As I've said - I don't really care what things are called.
I think this is where our disagreement lies. To me, this counts as "make a decision as to the width of an integer every time I declare a function". We are deciding the lower bound, but still, it's a decision on that matter.
> Semantically, there is very little that is different between "uint_least16_t" and "unsigned short".
But there could be a difference between "uint_least16_t" and "unsigned int" or "unsigned short". And there could be a difference between "uint_least32_t" and "unsigned int" or "unsigned long".
The comment that started it all was very clear that "make a decision as to the width of an integer" was referring to deciding on the exact width. In the same sentence that quote is taken from, it makes it clear that it is talking about fixed sized numerics. Deciding the width of an integer is not the same as deciding the minimum width of an integer (otherwise, the original comment makes no sense).
In the next paragraph it contrasts this with the minimum width approach ("word-sized ints and size_t's").
Deciding on the minimum width of a data type is a decision that needs to be made, and is made regardless of how you spell your type name. If you use an "int", you have specified that it must be at least 16-bits wide. Just because the name of the type doesn't have the number 16 in it doesn't mean that the same decision isn't being made.
> If you use an "int", you have specified that it must be at least 16-bits wide.
Few people write code for PCs with the assumption that an int could be 16-bits wide. They all assume that it's going to be 32-bits at least. A generic "int" encourages bad assumptions. Then come beasts like "long" or "long long", and people must pull out compiler manuals, talk on support forums, and so on. A developer is obliged to do that, but system integrators and cross-team reviewers are in for a world of pain if they have to keep reading across code bases across platforms like that. It's not an unrealistic scenario. A modern airplane or electric train is bound to have tens of processor families littered throughout the vehicle.
The original commenter also wrote about Swift going backwards, with its UInt8, Int32, etc. I'm not a Swift programmer, but the documents gave me the impression that these are, again, syntactic widths, not physically allocated, i.e. semantic widths. All this shows that the original commenter had a problem with specifying any indication of the size of integers. His emphasis on "word-sized" clearly indicates that he wants to use "int" everywhere, and hopes the compiler will take care of the rest. This is what Swift does too with its "Int".
Hah, me too. I wonder if you will ever see this :-)
> Few people write code for PCs with the assumption that an int could be 16-bits wide. They all assume that it's going to be 32-bits at least. A generic "int" encourages bad assumptions.
If people are just writing code for PCs, then it is no longer portable code that is being talked about.
But when writing portable code, which you do and I do, being overly specific with type widths is bad.
The C data types all have very specific minimum widths, and it is just a fact that int is at least 16 bits wide.
I'm not a swift person either, but based on reading this:
It sounds like the types with numbers in their name are most certainly fixed width types. The original commenter didn't like the emphasis on such fixed widths, and I think the swift language designers agree - their section on the "Int" type (which is at least 32-bits wide, but may be 64-bits wide) says that this is the type that people should be using generally.
"Int" in swift is quite similar to "int" in C, just the C version is less constrained (may be any width >= 16 bits). But the principle is the same - for better portable code, enforcing exact widths of types is not a good thing.
These are good, but some of our codebases predate these conventions. We don't even use // for commenting, because the code could find its way to a compiler that can't handle C99. Secondly, the overriding principle here is that the massive company, despite dealing with a huge variety of architectures, has to speak one language.
If I were to start a company today in a similar domain, and not have a legacy to deal with, I would mandate uint_{least,fast}{8,16,32,64}_t and such.
How does it make portability worse if you write your code in a way that doesn't assume anything about the size of your ints? If you don't know that your ints are 16 or 32 or 27 bits long you have to be more careful with checking for overflows, but if the code does that I don't see what problems you'll have when you port it to a platform with a different int size.
You can only check for overflows if you err in favour of larger architectures and declare all integers as "long int", and then check for overflows. Even the 8-bit compilers I have seen will allocate 32-bits for long int. And doing that will let the code remain compatible on smaller architectures, but it will hammer your CPU and memory usage. It's not a pleasant situation when you're struggling to save bytes.
You can't ever check for (signed) overflows in C - you can only prevent them. I'm not sure if I'm drawing a distinction you weren't intending to make (if so, then my apologies).
When you are adding signed integers in C, you need to prove that doing so can't overflow - knowing the exact width doesn't really help you here - just prove it for the narrowest possible width and you have proved it for all of them. In practice this isn't hard because either it is trivial to prove (e.g. incrementing a variable which is known to be small, or just plain modifying the code to put in explicit saturation/other logic for the cases where things get hairy)
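A minimal sketch of what I mean by proving it / putting in explicit saturation, written against the guaranteed minimum width (int may be as narrow as 16 bits):

    #include <limits.h>

    int add_saturating(int a, int b) {
        if (b > 0 && a > INT_MAX - b) return INT_MAX;  /* would overflow upward   */
        if (b < 0 && a < INT_MIN - b) return INT_MIN;  /* would overflow downward */
        return a + b;                                  /* now provably safe       */
    }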
Of course an 8-bit compiler is going to allocate 32-bits for a long int - the language requires that a long is at least 32-bits wide. If you had a compiler that didn't do this, then it isn't a C compiler.
Ok, now let's suppose you have done the job of thinking about it, and you are now up to actually acting on your thinking. What do you write?
You want:
- A single code base
- The code be compiled on the "I don't have a 16-bit type" machine, as well as on your amd64 machine and x86 machine. Having it work on a 24-bit machine might be nice too if you want bonus points.
- You don't want to waste memory on the crappy embedded chips where memory is scarce
As far as I'm aware, you can either:
1) Use fixed 32-bit types everywhere (you lose both 24-bit compatibility and waste memory... possibly not the end of the world)
2) Use non-fixed width types and tick all the boxes.
You don't really gain anything by using the 32-bit types (who cares if you know exactly how big it is - you only needed 16-bits anyway), but you do lose things.
Honestly, there's no way I'd use an MCU that doesn't support 16-bit integer types. I've never even heard of such a thing. It's really a property of the compiler, not the hardware. I use 32-bit ints and floats all the time on 8-bit 8051s and AVRs, both of which are among the most primitive microcontrollers in use today. They are typedefed to S32 and F32 precisely because I want a unified code base that compiles and works the same under Windows.
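A sketch of the kind of project-wide typedef header I mean (the S8/S16/S32/F32 names from above; shown here on top of <stdint.h>, though older 8051/AVR toolchains would map them to compiler-specific types instead):

    #include <stdint.h>

    typedef int8_t   S8;
    typedef int16_t  S16;
    typedef int32_t  S32;
    typedef int64_t  S64;
    typedef float    F32;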
If I know I'll never need more than 8 or 16 bits, I'll use an S8 or S16, but I certainly won't hesitate to use an S32 or even an S64 on these chips if needed. Using "int" is asking for some serious grief in the embedded world, which is why it's almost never done anymore.
I've also never heard of a 24-bit processor -- unless you're thinking of some antediluvian bit-slice processor, or something like that? -- but I'm sure the same code will compile and run on it just fine, if and when I ever encounter one. If not, that's not my problem. It can be blamed jointly on the administration and faculty at the school that handed a diploma to the programmer who wrote the C compiler for it.
At the end of the day, if I'm that worried about RAM and resource usage, or if I'm building 100,000,000 of something that has to use the smallest, cheapest controller possible, I'm probably writing in assembly anyway.
> Honestly, there's no way I'd use an MCU that doesn't support 16-bit integer types. I've never even heard of such a thing.
SHARC chips don't have any integer types smaller than 32-bits. Why would they, when they can't address any smaller than 32-bits?
> I've also never heard of a 24-bit processor -- unless you're thinking of some antediluvian bit-slice processor, or something like that?
56k chips have 24-bit chars. Not so common any more, but they are still kind of kicking around.
> but I'm sure the same code will compile and run on it just fine, if and when I ever encounter one.
My point is that it won't. If you use types which must be exactly 16-bits wide, then you can't compile that code for machines that don't have a 16-bit wide register.
> SHARC chips don't have any integer types smaller than 32-bits. 56k chips have 24-bit chars. Not so common any more, but they are still kind of kicking around.
Well, those chips are DSPs, which is kind of a different horse.
> Why would they, when they can't address any smaller than 32-bits?
How, for instance, would you write a TCP/IP stack, if you can't do something like this?
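(The struct in question was along these lines; a sketch in the style of lwIP's tcp.h, since the reply below mentions PACK_STRUCT_FIELD, lwIP's compiler-specific packing macro; not the exact code from the original comment.)

    PACK_STRUCT_BEGIN
    struct tcp_hdr {
        PACK_STRUCT_FIELD(u16_t src);
        PACK_STRUCT_FIELD(u16_t dest);
        PACK_STRUCT_FIELD(u32_t seqno);
        PACK_STRUCT_FIELD(u32_t ackno);
        PACK_STRUCT_FIELD(u16_t _hdrlen_rsvd_flags);
        PACK_STRUCT_FIELD(u16_t wnd);
        PACK_STRUCT_FIELD(u16_t chksum);
        PACK_STRUCT_FIELD(u16_t urgp);
    } PACK_STRUCT_STRUCT;
    PACK_STRUCT_END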
There are a couple of right answers and a wrong answer. The wrong answer is, "You can't."
The right answer is, "Go ahead and declare the 8-bit or 16-bit integer. The compiler will do the necessary shifting and masking to simulate unaligned reads and writes, including those that straddle two word boundaries."
Another right answer might be, "You wouldn't," in the case of a dedicated DSP chip.
What I'm saying is that I've never seen a general-purpose platform where that wasn't supported. It might not be fast, of course, but embedded work is often more about interoperability than speed. (DSP being an obvious exception; I'm lucky enough not to have had to work with any old-school DSP coprocessors.)
> Well, those chips are DSPs, which is kind of a different horse.
This whole discussion has been about portable code. The point is that no matter what the horse is, you can still write code that works (and works efficiently).
If you are writing a portable TCP/IP stack, then having a struct like you described doesn't help you one bit. It looks like you are setting things up so you can directly blat the bits into it (or avoid the copy and interpret some bits using this struct, whatever). If you do this, then you don't have portable code because of endianness.
To make this code portable, you separate the in memory representation from the "on-wire" representation. The memory representation doesn't need compiler specific things like PACK_STRUCT_FIELD would expand to, it only needs types that are at least 16 or 32 bits wide. The code that fills in the struct needs to operate byte at a time and then fill in the struct members appropriately.
Note that because byte in C isn't necessarily 8-bits wide, you have to be a little bit careful with the "byte-at-a-time" code, but it is very possible to write it so it works on any whacky machine that has a C compiler - including DSPs... they really aren't that special.
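A minimal sketch of that split between on-wire and in-memory representation (names are illustrative): parse the received octets explicitly, and the struct fields only need to be at least wide enough.

    struct tcp_ports {
        unsigned src;   /* at least 16 bits by the standard */
        unsigned dest;
    };

    /* 'wire' holds one received octet per element; on a machine with
       CHAR_BIT > 8 the upper bits of each element are simply unused. */
    void parse_ports(const unsigned char *wire, struct tcp_ports *out) {
        out->src  = ((unsigned)(wire[0] & 0xFF) << 8) | (wire[1] & 0xFF);
        out->dest = ((unsigned)(wire[2] & 0xFF) << 8) | (wire[3] & 0xFF);
    }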
I think we have to decide whether we're talking about specialized and/or archaic technological artifacts or modern general-purpose processors. My interest is limited to the latter, all of which have C compilers that support 1-, 2-, 4-, and 8-byte-wide integers composed of 8-bit bytes. Confining the notion of "portability" to these platforms is fine with me. I'll definitely have to yield the point with respect to older DSP chips; I've never written code for a 56K or a SHARC, and never will.
Endianness is generally hidden with wrapper macros or functions, and I suppose you could do the same to pretend you're accessing integers of unsupported sizes. But you can rest assured that losing track of the sizes of the underlying types is going to bite you in the ass. If I can't use typedefs, I'll want to use some semantically-equivalent notational cues.
You seem to think that it is difficult to write programs with this level of portability. It really isn't, you just program to the language spec, and everything else falls out. You don't need to come up with arbitrary rules about which processors are too specialized or archaic. The alternative you provide isn't obviously better than just writing portable code, but from what I can tell, you have never actually just tried the alternative.
I can assure you that knowing that exact width of types is rarely useful when writing super portable code for platforms that are different enough that you are happy to decide that you wouldn't even bother coding for them. So I'm puzzled by why it suddenly becomes helpful when you decide you are only going to look at a subset of those platforms.
Give an example of how it is going to bite me in the ass and then the discussion can be less emotional and actually based on facts.
When browsing code posted or advertised on various forums, I noticed a dramatic increase in the use of uint8_t and such. In places where they really were not useful, and a basic standard type would suffice, and be more portable, and be as fast or faster. I have seen intN_t used as booleans! I have seen uintN_t types used and signed values put in them! (which meant the author didn't understand yet the basis of types but was somehow taught it was a good practice to use exact-width types).
I also suspect the influence of languages with fanboys, such as Rust. I had a 'fight' with some of them; they really didn't grasp the concept of having a type defined as the natural type of the platform, guaranteed to have at least (and possibly exactly) N bits. I gave up. As long as they have fun with their thing and do not try to spread "the good word" and influence other languages, that's fine.
Let's hope we won't see the influence of languages whose designers had a gripe against unsigned integers...
Overflow is often something kept in mind in careful C programming. It's something you don't have to worry about nearly as much in Python (promotes to bignum) or Javascript (double precision float).
That 32 to 64 bit transition? Windows couldn't change the size of "long" because it would break too much windows software that assumed long was 4 bytes. Thus "long long".
The Linux ecosystem fared a bit better because of the greater variety of Unix systems and cross-platform C that ran on them. It was a time of fragmentation and incompatibility, but also a time of posix and attempting to define minimal standards in a portable and flexible way. Windows of course didn't have to care about any of that.
I actually fixed some old-ish classic open source networking software for 64-bit mips, around 2010. You see, it worked OK on x86_64, which is little endian, because in that case the low-order 4 bytes (an ipv4 address) end up in the same place relative to the start address, whether it's a 4-byte or 8-byte type. But for big-endian 64-bit mips, that doesn't work out, the low-order 4 bytes are on the unlucky end of the 8-byte type.
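A hedged sketch of that class of bug (not the actual code I fixed): the 32-bit address was stored in a long and then read back through a 32-bit view, which happens to line up on little-endian LP64 and doesn't on big-endian.

    #include <stdint.h>
    #include <string.h>

    uint32_t addr_from_long(long stored) {   /* long is 8 bytes on LP64 */
        uint32_t out;
        /* Grabs the *first* 4 bytes of the long: the low-order ones on
           little-endian, the high-order (wrong) ones on big-endian. */
        memcpy(&out, &stored, sizeof out);
        return out;
    }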
There's also the awkwardly named "ntohl()" and "htonl()" standard library functions that were named in the distant past when it was assumed a long would always be 4 bytes. As soon as that wasn't true they changed to work with uint32_t, as they always should have.
Sure, I'll use a plain int whenever I'm returning a status or error code, and whenever iterating over some trivial range. I'll use a "char[]" or similar for a string of ascii. But for many things, like "count of bytes transferred", or "seconds until timeout", or any meaningful quantity value or anything stored in a struct (meaningful and non-transient), I'll pick the appropriate bit-size, to make the code clear. It's good style because you have to know the limits of the type whenever you do anything non-trivial with it.
These days, there's always a value here or there where 4 billion will sometimes not be enough: bytes, microseconds. I need 64 bits. I could write "long long", but why not write what I really mean: "uint64_t" aka give me 64 bits.
I've heard Microsoft had a great story there, but I was drawing from experiences with the open source Unix-likes. Seems like it was across the board until you get too far into the weeds.
I was working in console games during the 32->64 transition, and watched our game engine code suddenly go crazy with data size specificity.
It was a little weird. I don't know whether it's good or bad overall, but there was definitely a reason for it, the need was clear. Suddenly you didn't know how big an int was, and that matters a lot when you're sending ints over the network and stuffing ints into save games, and it matters a lot when you use an int expecting 64 bits, but you're still compiling code for a Wii and only get 32. It matters for multiply overflows and negative numbers and bit flags and for a bunch of things besides deciding how high your for loop will go or what your largest return value is.
As you say, very few people want to decide what size to use when declaring functions. But everyone wants to know exactly what to expect. If you do declare something and don't actually know what size it is, you have a real problem that can and will lead to crashes.
> Notice how smooth the 32-bit to 64-bit transition went?
I'm not certain, but is it possible it went smooth because everyone working in C/C++ started paying attention to their data sizes?
As a user of Ruby/JS/Python, it would be worth checking what happened to the source code for the interpreters of those languages. When you program in them, yes, you're buffered from a lot of data size issues, but the internals of the languages themselves may be just as fixed-size centric as anything.
> I'm not certain, but is it possible it went smooth because everyone working in C/C++ started paying attention to their data sizes?
We don't have to guess. A huge amount of the code that made this transition was strewn across the internet in CVS and SVN repos, and now in preserved git history. From compilers and libcs to kernels to linkers, loaders and interpreters. They make concessions where they had to (check out glibc), but stay with traditional K&R-style types when possible. The "i32 calculate_age()" thing is way newer than that.
So you really have two arguments in there. For pointers, I'm sure no one will disagree that size_t is a good thing. Which is why any modern low-level language has these built-in.
As for fixed size types, what are the alternatives? If you wanted a dynamically sized int, you'd either need to reserve all the possible space you'd need (in which case why not just use a int64_t), or you'd need some kind of heap-allocated integer that can resize itself. We can't hide the fact that the machine itself uses fixed-size ints, unless we are willing to live with a leaky abstraction.
I'm not very experienced with low-level programming, but doesn't the single-responsibility principle suggest that a purpose-built module should handle packing/unpacking of memory objects even if you are writing this module by hand? I'm sure there are cases where even the overhead of calling and running these procedures is too much, but in many cases no further optimization would be needed.
There's one more gotcha: security concerns. Writing code that expects long to be 64 bit might be dangerous as it could overflow and create a security bug on a 32 bit machine.
I find that I do want to know the sizes of things fairly often.
You can write C in any language. You can also write any language in C.
The fact that it's possible does not imply that it's a good idea. Some of the macros here are in common use (e.g., countof, although often with other names); some will make experienced C developers say "what's this? Oh, you mean <insert expansion here>; why did you use a weird macro?"; and some, like the redefinitions of case and default are actively hostile and are guaranteed to result in bugs when exposed to experienced C developers.
Double pointers are extremely common and triple pointers show up now and then.
Also, '__attribute__((cleanup))' as far as I know only works with automatic variables that live on the stack. Mucking about with code execution and automatic variables as the stack unwinds is not a style I would like to see in general-purpose C coding. It's the one type of automatic memory management that you get for free and can't really screw up.
The rest of the macro loops and so forth seem fairly standard. It's good to experiment and explore what you can do with the compiler and the preprocessor.
To get the address of an object you use the unary 'address of' operator '&'.
His macro was #define address *
The * is a pointer. You can declare a variable of type pointer-to-T with as many levels of indirection as you want: int * * * * * * foo is fine. But you also use it to dereference your object: bob = * foo;
You can't practically have a different name for each level of indirection up to whatever the ISO standard/compiler limit is for * .
Also you end up with nasty indirections like this.
**(*((struct foo*)(*bob.x))).y
It's nasty enough as it is without trying to name each pointer indirection.
Very cool! I wasn't aware of the cleanup attribute, can't wait to try it out:
> The cleanup attribute runs a function when the variable goes out of scope. This attribute can only be applied to auto function scope variables; it may not be applied to parameters or variables with static storage duration. The function must take one parameter, a pointer to a type compatible with the variable. The return value of the function (if any) is ignored.
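A minimal example of the attribute, assuming GCC or Clang (the helper name is made up):

    #include <stdio.h>
    #include <stdlib.h>

    /* Runs when the annotated variable goes out of scope; receives &buf. */
    static void free_ptr(char **p) {
        free(*p);
    }

    int main(void) {
        __attribute__((cleanup(free_ptr))) char *buf = malloc(64);
        if (!buf)
            return 1;
        snprintf(buf, 64, "hello");
        puts(buf);
        return 0;  /* free_ptr(&buf) runs here, no explicit free() needed */
    }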
Yeah, that is probably what bothers me the most about this. I write C code almost everyday. I can't remember the last time I forgot a "break", but I do use fallthrough when it simplifies the logic. It also breaks even the simple case of having multiple labels for the same code.
Also, the for-loop overrides are ugly and pointless, IMO.
I defined the same macros but with different names, back when I still coded in C. I still think it's a good idea which tends to get reactions like "Ew! Wash your hands."
I'm only guessing, but it looks like a C header file with a set of macros that produces neat-looking C code. Not sure what's the deal with the 22nd century thing.
It's a clever joke and rather cynical deconstruction of our industry. The author suggests that in 100 years the C language will not only still exist and be widely used, it will have advanced almost nothing at all due to the industry's inability to solve real problems and instead focus on virtue signalling languages such as Clojure and Ruby. The author hammers home his point by demonstrating how a simple header file can at once render the code unreadable, while adding absolutely no value at all. I think he or she is specifically talking about Silicon Valley here, though of course since the joke is expressed in code, one can only guess.
Seriously, I'd hope in 2100 we'd have pushed our systems languages beyond facile keyword redefinition. Maybe even towards concurrency and an actor model. Like I explored in my unfinished (sorry, shit happened) book "Scalable C".
The included header file uses the macro system, among other things, to bring C99 up-to-date with more modern programming techniques, using what we've learnt in the past decades since C was conceived.
In fairness however the short POD type names are purely cosmetic.
Maybe the author wanted this to be C99 compliant but a cool thing is that by using clang we even have lambdas. It's called blocks, the same as in Objective-C.
http://clang.llvm.org/docs/BlockLanguageSpec.html
I've been using them in all my new C code, since I'm stuck to clang anyway, and it's awesome.
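A minimal example, assuming clang with -fblocks (and -lBlocksRuntime on non-Apple platforms):

    #include <stdio.h>

    int main(void) {
        int base = 10;
        /* A block literal: roughly a lambda capturing 'base' by value. */
        int (^add)(int) = ^(int x) { return base + x; };
        printf("%d\n", add(5));  /* prints 15 */
        return 0;
    }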
...and no, they're not a good idea, it's just a syntactic change.
If you want a 'better C', language-wise, Rust is pretty much the best current hope, though that won't mature for another decade or so.
Really? I haven't had a chance to dig into it and a quick google-based look-see didn't pop up immediately obvious results, but I would have thought that two areas where one would want to use C would include specific bit manipulation and explicit memory management. Both of those violate the high-level principles that I thought Rust was supposed to be about. So, either I was wrong (which happens more and more these days), or Rust has a work-around, or it's not a good fit for that last bastion of C use-cases.
Sorry, you should do more research... :) Rust has great support for very low level operations. Maybe search for 'bitwise operators' or 'inline assembly'. Memory management is even more explicit and strict than C, since you also have to deal with lifetimes and borrowing.
The best thing is, though, since Rust is such a young language, you can change it! Imagine trying to add a new language feature to C, something like the ? operator currently being evaluated for addition in Rust. It would take years, if not decades, to get those things in C. Even now MSVC is just rolling out support for C99.
I can definitely see Rust taking over C's current place in the market over the next decade or two. Better memory safety, higher-level constructs to improve productivity, more cross-platform compatibility, and modern libraries to deal with things like Unicode are a huge draw.
Rust absolutely lets you work at the bit level, you can shift and mask just like in C.
> explicit memory management
It depends on what you mean by "explicit" here. Most Rust code has scope-based memory management, but you can also call malloc/free within an unsafe block.
No problem. To elaborate on _this_ slightly, today in stable Rust you can call libc's malloc/free directly. There's also the Rust alloc::heap module, but it's not yet stable, and so is only available on nightly Rust https://doc.rust-lang.org/alloc/heap/index.html
Cool. I didn't even find either of those on the roadmap a bit over year ago when I looked at it -- outside some speculative chat on forums/lists/tickets. It seems as if I need to give it a couple more months and give it a look-see again.
I use their defer for both gcc/clang, and countof(arr) -> ARRAYSIZE(array).
In any case, yeah, going through gcc/clang internals and through the C11 standard gives people some ways to speed up and clean up their implementations a bit.
There will probably be C code running in 2100. That's only 84 years away. FORTRAN is now 60 years old and still going strong in scientific computing. (Partly because, in most other languages, multidimensional array support is worse.)
I suspect, though, that new successful languages will have automatic memory management, either GC/reference counting or Rust-type borrow checking. There's no reason for a new language with the lack of safety of C.
Dlang is actually quite nice and mature, although nowhere nearly as hyped as Rust, Go or Swift. You get a GC with the option of deactivating it and doing the memory management yourself, pointers and casts if you need them (you usually don't), an auto type, fast compiling times, among many other cool things.
On UNIX yes, it will never change, C is married to UNIX and no alternative will ever change that.
On other OSes, it really depends on how Apple, Google and Microsoft do their work of "our way or the highway" regarding pushing their safe alternatives.
When you consider how truly distant into the future -- in technological time frame -- 2100 is, you can wonder if Unix will even be around at that time. 2100 is double the lifetime of Unix from now.
Nassim Nicholas Taleb has an interesting point that for human creations the expected lifespan is the same as their previous life[1]. If a book is read 10 years after it's published, you can expect that it will be read for another 10 years.
So for C this would be 44 more years from now, but it gets another extra year for every year it survives, so it isn't far fetched to think C will be around in 2100.