Unless you are adding TCP to your CDC6600 or PDP-8, bytes are 8 bits. Any suggestion to the contrary in standards is fantasy.
That said, since the next field is a uint16_t, you may as well stay in Rome and call them uint8_t. short really is variable on some live architectures.
(Ok, or some DSPs only address words larger than 8 bits, but there's no reason for your compiler not to pick up the slack.)
Also, TIL that __attribute__((packed)) assumes the struct can be malevolently aligned and for some architectures generates very large and slow code to handle that. If you must pack, also add the ",aligned(2)" or whatever you can get away with to mitigate this.
I tend to use int for loops (where I know there will always be less than 2^16 iterations!), return codes etc. purely because it allows the compiler to pick a fast word size everywhere (e.g. if somebody compiles your code for AVR, and your loops are all int32_t, your API returns int32_t, they're gonna have a worse time). Otherwise, I fully agree.
Maybe you could use int_fast16_t? (That means, pick the fastest signed integer type which is at least 16 bits.)
Of course, "int" is less typing than "int_fast16_t". But, int_fast16_t possibly makes it more obvious why you've chosen the type you did (i.e. you only need 16-bits, but aren't relying on any overflow so a bigger type can freely be substituted if that gives better performance.)
That's certainly a valid approach, but I've never actually seen those used in practice. I guess, like me, people are lazy and prefer typing just int, which is guaranteed to be at least 16 bits, and is generally the fastest int type (well, excepting some 8-bit archs.. :P).
That said, since the next field is a uint16_t, you may as well stay in Rome and call them uint8_t. short really is variable on some live architectures.
(Ok, or some DSPs only address words larger than 8 bits, but there's no reason for your compiler not to pick up the slack.)
Also, TIL that __attribute__((packed)) assumes the struct can be malevolently aligned and for some architectures generates very large and slow code to handle that. If you must pack, also add the ",aligned(2)" or whatever you can get away with to mitigate this.