Unit tests are not a goal; they are a tool. Striving for 100% test coverage is nonsense, but not testing your software at all levels is bad. Middle ground and moderation are where it's at, not a black-and-white choice. As with every other tool, you should understand its strengths and weaknesses and apply it properly, not dogmatically, or it will bite you.
I read this complementary one the other day, and one thing that is readily apparent to me is that a lot of people have a lot of different opinions about testing (move to system tests! more unit tests! regression tests!), but many are not asking the zillion dollar question:
What are you testing for?
This is critical because it tells you immediately what you should and should not test, and how. While mindless, dogmatic, metric-oriented testing is a waste, testing with higher intent and purpose is extremely useful.
An example: test that something working on current vX also works on vA to vW, and when vZ is out, have the answer readily. Or that a biz feature fulfills the requirements. Or that someone less versed in the intricate details of the code you own will be confident it still works after a simple fix while you’re on vacation. It can be one, some, but probably not all.
With that in mind, what to test, what doesn’t make sense to test, and what to test against become clearer: should I mock this? Or should I run it against some staging environment? Should I perform (yikes? not!) manual testing?
The answers are highly dependent on the piece of code being tested.
Tests are here to help you answer a question. If you aren’t sure what the question is, your tests will miss the point.
I feel like a lot of unit testing is just another form of bikeshedding. It's easily understood, everyone can talk about it, and you can spend a lot of time on it with no clear goal but feel like you're getting something done.
100% coverage of what exactly? Tests that go through all your lines of code without testing any of the logic are useless. If you want to be thorough, you need to do mutation testing, which is a system that tests the quality of your unit tests by mutating your logic (changing a > to >=, a + to -, etc.) and then expecting at least one test to fail. If no test fails, that piece of logic wasn't tested.
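To make the mutation idea concrete, here's a minimal hand-rolled sketch. The `free_shipping` rule, the mutant, and both suites are made up for illustration; a real mutation tool generates and runs the mutants for you instead of you writing them out.

```cpp
#include <cassert>

// Hypothetical business rule: orders strictly above 100 ship for free.
bool free_shipping(int total) { return total > 100; }

// The kind of mutant a mutation tool would generate: `>` became `>=`.
bool free_shipping_mutant(int total) { return total >= 100; }

using Rule = bool (*)(int);

// Executes every line of the rule, so line coverage is already 100%.
bool weak_suite(Rule rule) { return rule(150) && !rule(50); }

// Adds the boundary value, the only input on which `>` and `>=` differ.
bool strong_suite(Rule rule) { return weak_suite(rule) && !rule(100); }

void check_mutation_example() {
    assert(weak_suite(free_shipping));
    assert(weak_suite(free_shipping_mutant));     // mutant survives: logic untested
    assert(strong_suite(free_shipping));
    assert(!strong_suite(free_shipping_mutant));  // mutant killed by the boundary check
}
```

The weak suite passes against both the real rule and the mutant, which is exactly the "no test fails" signal described above.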
Without that, it's entirely possible your high code coverage doesn't actually test anything meaningful. Also, this sort of logic is exactly the kind of stuff you want to unit test. All the standard plumbing boilerplate code is not something that needs to be unit tested. The logic does.
And it's not just that. For example, branch coverage only says that you branched, not how; a complex branch might need more than one test to be fully, or even decently, tested.
Right, essentially you have to be aware that you are testing the state space the code might end up in, which is very different from just hitting every line of code or every branch.
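A small sketch of the gap between branch coverage and the state space, assuming a hypothetical `can_edit` rule with a compound condition. Two test cases take the `if` both ways, yet a variant that ignores the lock passes them too; only walking all eight input states shows the difference.

```cpp
#include <cassert>

using Rule = bool (*)(bool, bool, bool);

// Hypothetical access rule with a compound condition.
bool can_edit(bool is_owner, bool is_admin, bool is_locked) {
    if ((is_owner || is_admin) && !is_locked) return true;
    return false;
}

// Buggy variant that forgets the lock.
bool can_edit_buggy(bool is_owner, bool is_admin, bool /*is_locked*/) {
    if (is_owner || is_admin) return true;
    return false;
}

// Two cases that take the `if` both ways: full branch coverage.
bool branch_suite(Rule rule) {
    return rule(true, false, false) && !rule(false, false, false);
}

// Walks all 8 input states and counts where two rules disagree.
int disagreements(Rule a, Rule b) {
    int count = 0;
    for (int bits = 0; bits < 8; ++bits) {
        const bool owner = (bits & 1) != 0;
        const bool admin = (bits & 2) != 0;
        const bool locked = (bits & 4) != 0;
        if (a(owner, admin, locked) != b(owner, admin, locked)) ++count;
    }
    return count;
}

void check_branch_example() {
    assert(branch_suite(can_edit));
    assert(branch_suite(can_edit_buggy));  // the bug slips through
    assert(disagreements(can_edit, can_edit_buggy) == 3);
}
```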
On that note it is a great tool during development to get a piece working without connecting it to the broader application. Test driven development gives a nice debugging context that is easier to work with. The code coverage and regression part comes as a nice bonus feature.
I’d also like to add that if you contribute code to an open source project it is extremely beneficial to have iron-clad unit tests. Since there are so many devs, it would be easy for someone to accidentally break something you fixed already.
The benefit of 100% test coverage is that there are no more discussions about what to test and what not to. When in doubt, test it. In larger groups of developers there are otherwise ongoing discussions about whether A needs to be tested or not. I have seen culture wars around this, between people who don't want to test, and who in the eyes of others never test enough, and vice versa. This is especially true of a diverse development team with different ages, seniority and cultural backgrounds.
It's often easier to just aim for 100% test coverage instead (with excluding some categories of files).
EDIT: I would not and did not start with 100% unit testing. But if there are ongoing culture wars and discussions didn't lead to a workable compromise, 100% test coverage worked for me and after some days test coverage was a non issue.
> just aim for 100% test coverage instead (with excluding some categories of files).
That's where the 'gaming' comes in.
The tests start just going through lines without hitting a single expect statement.
The ignore files start becoming battlegrounds in the PRs because people just exclude half the damn project.
We just have a simple rule... if you wrote code, you have to write coverage for it. If it breaks and your test doesn't catch the breakage, the bug fix goes back to you. Some people will ask "but what about what I'm working on now", you'll have to communicate that you feel your previous work was far more important.
> the bug fix goes back to you. Some people will ask "but what about what I'm working on now", you'll have to communicate that you feel your previous work was far more important.
this feels punitive, especially in the eyes of management. unless you're in a safety critical area where fully testing every code path is a hard requirement, people will eventually write bugs.
i'd rather work somewhere that recognizes defects occur and has a fast iterative process to push out new changes rather than one based on shame for having written a bug.
Even with 100% coverage, that doesn't mean you've found every bug that can possibly be found with unit tests. Your tests could always cover more inputs, more situations, etc.
Rather, you won't find bugs that you choose not to think of because you've let the "100%" number lull you into complacency, even though you know it's 100% of lines/branches, not 100% of inputs.
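As a sketch of "100% of lines, not 100% of inputs", take a hypothetical `average` helper. One test with `{2, 4, 6}` runs every line and takes the loop both ways, so the coverage report is green, but the empty input was never considered. The `checked_average` variant is one possible fix, not the only one.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Hypothetical helper. A single test with {2, 4, 6} executes every line,
// yet an empty vector makes this divide by zero.
int average(const std::vector<int>& xs) {
    int sum = 0;
    for (int x : xs) sum += x;
    return sum / static_cast<int>(xs.size());
}

// Same helper after someone thought about the empty input.
int checked_average(const std::vector<int>& xs) {
    if (xs.empty()) throw std::invalid_argument("average of empty input");
    return average(xs);
}

// Returns true when checked_average rejects xs with invalid_argument.
bool rejects(const std::vector<int>& xs) {
    try {
        checked_average(xs);
    } catch (const std::invalid_argument&) {
        return true;
    }
    return false;
}

void check_average_example() {
    assert(average({2, 4, 6}) == 4);  // enough for 100% line coverage
    assert(rejects({}));              // the input nobody thought of
    assert(!rejects({1}));
}
```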
After 40 years of programming I still write bugs, and the ones I write come from not thinking about edge cases or from wrong assumptions, not from being lulled into writing tests to meet a 100% number.
But personalities differ and if being lulled into security by writing towards a 100% number is a problem for you I would be careful, I totally agree here.
I agree that fuzz testing and linting can help you. However, from my perspective, lower-level tests help you build trust in the software you're releasing.
If you are someone who games metrics for his benefit or hire people who are gaming metrics for their benefit, I assume yes, this metric is very easy to game as are most metrics. Metric systems are not cheater proof.
I like thinking about the trade-off between a simpler rule that is mostly right and decisions that require judgement and consensus. I think the simpler rule is usually the better side of the trade-off.
But in this case I think the cure might be worse than the disease. Tests for plumbing code often end up being brittle tests of methods getting called on mocks in the right order. People will notice that these require a lot of toil to keep them running as code changes while providing very little benefit in avoiding mistakes. People will rankle at being told that they must write these tests, which they can see are a waste of time.
I've done it both ways. I'm much happier with my work when I'm not trying to write tests that are tedious and don't seem to provide any value, in order to hit an arbitrary coverage metric. I suspect my teammates feel the same way, so on teams where I have input into the decision on this, I do not advocate for 100% coverage. It does make it harder to have the discussion of which tests should and shouldn't be written, but I think it's worth that cost.
Writing good test code is harder than writing business code. Junior developers especially struggle with this, most often because many companies don't write enough tests for them to learn how to write good ones.
And if you're in an environment where this is a non-issue, I think that's great. Don't fix something that doesn't need to be fixed.
Yup. The debate over unit testing, being political, is a far bigger impediment to progress than the actual tests, which are a technical hurdle. It's essentially the same reason for linters and auto-formatters.
It has a side benefit that it forces devs to write testable code, which inclines them toward reasonably factored code.
"larger group of developers" is the phrase that caught my attention. Humans don't scale well. This is where microservices do become attractive. This service is owned by a small team, and that team makes these types of judgements. It may be very different from how another service is owned and maintained, and that's ok.
From my limited experience you get cross team discussions about unit testing, especially if one microservice has too many bugs in the eyes of other teams giving development a bad reputation or making working with a microservice hard. Especially if it breaks with releases and other teams get paged.
Ongoing culture wars are a recipe for demotivated staff.
Perhaps I am wrong, and I would not start with a 'diktat' for obvious reasons.
As a manager, didn't you have discussions about the necessary level of code coverage? I would be interested in how you managed unit testing without a 'diktat', and how it fit in with integration testing and exploratory testing. What level did developers in your department usually find "adequate"? If you considered it too low, how did you raise test coverage as a manager without defining a coverage level?
Exactly, the 80-20 rule also applies to unit tests. I don't have 100% coverage on big projects, but everything I write that's meant to go into production is done with TDD anyway, so there are always enough tests to prevent a junior from breaking my stuff. I also save a lot of time because, thanks to TDD, I don't need to test much manually, and most of the time not at all. I just wait for user feedback: it's always much easier to get code working in real conditions if it already works in test conditions than the other way around.
There is real-world research that actually shows that the 20/80 rule works for unit tests: the last 20% hardly catches any issues and contributes very little to quality.
And like other tools, their importance depends on the rest of the tool set. In many shops, tight schedules and management by product managers or people who are too far removed from the code cause you to compromise every other principle of responsible, sane coding. When this happens, unit tests are your only shield from doom. If everybody knows how and is allowed to write sane, good code with reasonable time to build it, unit tests are nice to have but not a must.
Actually it is pretty common to have 100% coverage with some extra redundancy too (where some things get accidentally covered multiple times). Striving for 100% is indeed nonsense, but having 100% coverage is usually accidental in clean code that you want to work, and merely a by-product of TDD.
I strive for working code. Sometimes I miss something in the TDD cycle and don’t have 100% and it is that which usually comes back to bite you.
I have never found 100% test coverage has bitten me, dogmatic or otherwise.
That heavily depends on the size of your codebase and perhaps also the language you are writing in. Writing in C++, for example, I often have switch statements in the form of:
    switch (type) {
        case X:
            ...
        case Y:
            ...
        default:
            throw InternalException("Unsupported type!");
    }
Now if all goes well the default case will never be covered. At some point I thought "why have this code if it's not supposed to run; let's rewrite this so we can get 100% code coverage!", and I ended up with the following code:
Now we can get 100% code coverage... except the code is much worse. Instead of an easy-to-track down exception we now trigger either an assertion (debug) or weird undefined behaviour (release) when the "not supposed to happen" inevitably does happen because of e.g. new types being added that were not handled before.
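A sketch of the trade-off being described, under assumptions: `Type`, the `describe_*` functions, and the definition of `InternalException` are hypothetical stand-ins, and the assert-based shape is only one guess at what a coverage-driven rewrite could look like. In this particular shape a release build silently takes the Y path for an unhandled type.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

enum class Type { X, Y, Z };  // Z stands in for a type added later

// Assumed definition; only the name appears in the snippet above.
struct InternalException : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Throwing shape: the default branch stays uncovered when all goes well.
std::string describe_throwing(Type type) {
    switch (type) {
        case Type::X: return "x";
        case Type::Y: return "y";
        default: throw InternalException("Unsupported type!");
    }
}

// Coverage-driven shape: no uncovered line remains, but an unhandled type
// trips the assert in debug builds and is treated as Y in release builds.
std::string describe_asserting(Type type) {
    if (type == Type::X) return "x";
    assert(type == Type::Y);
    return "y";
}

// Returns true when the throwing version reports the type as unsupported.
bool reports_unsupported(Type type) {
    try {
        describe_throwing(type);
    } catch (const InternalException&) {
        return true;
    }
    return false;
}
```

With the throwing version, a caller passing `Type::Z` gets a catchable, descriptive error; with the asserting version the same call either aborts or quietly returns the wrong answer, depending on build flags.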
Is worse code worth getting 100% code coverage? In my eyes, absolutely not. I think good code + testing should be able to reach at least 90% typically, likely 95%, but 100% is often not possible without artificially forcing it and messing up your code and/or making it much harder to change your code later on.
This behavior occurs in internal functions and is not triggerable by the user. The only way to trigger this behavior would be to create unit tests that test small internal functions by feeding them specifically invalid input. This is possible, but I would argue this falls under "dogmatically trying to reach 100% code coverage". Testing small internal functions adds very little value and is detrimental to a changing codebase. After adding these tests, every single change you make to internals will result in you needing to hunt down all these tiny tests, which adds a big barrier to changes for basically no pay-off (besides a shiny "100%" badge on GitHub, of course).
As always, I think the answer here is more along the lines of "it depends." It's not that uncommon of a task to make an existing function more performant, and a well thought out test suite makes that leaps and bounds easier even for small, internal functions.
It's arguable that this is a programming bug and not really recoverable, so throwing doesn't make much sense.
You can be defensive to various degrees about assertions:
1. You can just use assert() to fail in Debug and do nothing in Release.
2. You can be more defensive and define your always_assert() to fail in Release as well.
3. You can double down on the UB with hints to the compiler and provide assume(), which explicitly compiles to UB when it's triggered in Release (using __builtin_unreachable() for example).
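The three levels above could be sketched roughly like this. `ALWAYS_ASSERT` and `ASSUME` are hypothetical macro names, the `*_div` functions exist only to show usage, and `__builtin_unreachable()` is a GCC/Clang builtin, so other compilers fall back to a plain `assert` here.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// 2. Hypothetical always_assert: checked in every build, aborts on failure.
#define ALWAYS_ASSERT(cond) \
    do { \
        if (!(cond)) { \
            std::fprintf(stderr, "%s:%d: check failed: %s\n", __FILE__, __LINE__, #cond); \
            std::abort(); \
        } \
    } while (0)

// 3. Hypothetical assume: a normal assert in debug builds, a promise to the
//    optimizer in release builds (undefined behaviour if the promise is broken).
#if defined(NDEBUG) && (defined(__GNUC__) || defined(__clang__))
#define ASSUME(cond) \
    do { \
        if (!(cond)) __builtin_unreachable(); \
    } while (0)
#else
#define ASSUME(cond) assert(cond)
#endif

// 1. Plain assert: fails in debug, compiles to nothing with NDEBUG.
int debug_checked_div(int a, int b) {
    assert(b != 0);
    return a / b;
}

int always_checked_div(int a, int b) {
    ALWAYS_ASSERT(b != 0);
    return a / b;
}

int assumed_div(int a, int b) {
    ASSUME(b != 0);
    return a / b;
}
```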
About the organization of the if statement: I agree that the former is better, I would use assert(false) though.
Indeed it is a programming bug - but programming bugs happen. In my experience writing programs as if bugs will not happen is typically a bad idea :)
Throwing an exception here is basically free (just another switch case) and gives the user a semi-descriptive error message. When they then report that error message I can immediately find out what went wrong. Contrasting with a report about a segfault (with maybe a stacktrace), the former is significantly easier to debug and reason about.
assert_always would provide a similar report, of course. However, as we are writing a library, crashing is much worse than throwing an internal error. At worst an internal error means our library is no longer usable, whereas a crash means the host program goes down with it.
Better yet, omit that default case, so that in the future when you do add a new value to the enum, the compiler will warn you and force you to add a new case.
But I agree with your general thesis that it's just not worth getting to 100% coverage.
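A sketch of the no-default variant, with a hypothetical `Type` enum and `describe` function. GCC and Clang both have a `-Wswitch` warning (part of `-Wall` on GCC) that fires when a switch over an enum misses an enumerator and has no default. The throw after the switch is my addition so that an out-of-range value still gets a descriptive error.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

enum class Type { X, Y };

std::string describe(Type type) {
    // No default: adding a new enumerator without a matching case makes
    // the compiler warn about this switch.
    switch (type) {
        case Type::X: return "x";
        case Type::Y: return "y";
    }
    // Still reachable if an out-of-range value is cast to Type.
    throw std::logic_error("Unsupported type!");
}

// Returns true when describe rejects the value with logic_error.
bool is_rejected(Type type) {
    try {
        describe(type);
    } catch (const std::logic_error&) {
        return true;
    }
    return false;
}
```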
If it didn't catch any bugs, either during initial development or through later changes, then it bit you via wasting your time. I don't think it's fair to say that having tests necessarily makes the code under test any cleaner.
And that's exactly why you need fewer tests in your project that uses a DB: there's no need to test the DB because it's covered by its own tests already.
Interactions with your DB are often the most fragile piece of your code, because there's an impedance mismatch between the language you're writing and SQL. Some languages/frameworks abstract this more safely than others, though.
Interaction, in general, is where many, if not most, errors lie. Unit testing verifies that things work in isolation. But if you code up your unit tests for two components that interact, but with different assumptions, then the unit tests alone won't do you any good: X generates a map like `%{name => string()}`, Y assumes X generates a map like `%{username => string()}`. Now, hopefully that won't happen if you're writing both parts at the same time, but things like this can happen. Now your unit tests pass because they're based on the assumptions of their respective unit of code, but put it together and boom!
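The same mismatch, sketched with hypothetical components `make_user` (X) and `greeting` (Y) that exchange a string map. Each unit test passes on its own assumptions; only the check that wires the two together shows the problem.

```cpp
#include <cassert>
#include <map>
#include <string>

using Record = std::map<std::string, std::string>;

// Component X (hypothetical): emits the user's name under the key "name".
Record make_user(const std::string& name) {
    Record user;
    user["name"] = name;
    return user;
}

// Component Y (hypothetical): expects the key "username".
std::string greeting(const Record& user) {
    const auto it = user.find("username");
    if (it == user.end()) return std::string("Hello, stranger");
    return "Hello, " + it->second;
}

// Unit test for X, written against X's own assumption.
void test_make_user() { assert(make_user("ada").at("name") == "ada"); }

// Unit test for Y, written against Y's own assumption.
void test_greeting() {
    Record user;
    user["username"] = "ada";
    assert(greeting(user) == "Hello, ada");
}

// Integration check: returns true only if X's output satisfies Y.
bool components_agree() { return greeting(make_user("ada")) == "Hello, ada"; }
```

Both unit tests pass, while `components_agree()` returns false.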
Exactly, though I believe there's still a thin line between testing the interaction, and testing the db itself. Just like, the difference between testing some code, and testing the language itself.
Mocking networked services is something very much worth doing because you can then check that you’ve set timeouts correctly, that you handle incomplete or junk responses gracefully, etc. Those are the kind of hidden problems that can bite you on production deployments.
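Here's a minimal sketch of that kind of mock. Everything in it is hypothetical: the `Transport` interface, the `TimeoutError` type, the `temp=<int>` reply format, and the fallback policy. The point is only that a scripted stand-in lets you drive the client through timeouts and junk replies on demand.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical transport interface; the real one would do network I/O.
struct Transport {
    virtual ~Transport() = default;
    virtual std::string get(const std::string& path) = 0;
};

struct TimeoutError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Client under test: expects "temp=<int>" and falls back when the
// service times out or sends something it cannot parse.
int fetch_temperature(Transport& transport, int fallback) {
    try {
        const std::string body = transport.get("/temperature");
        const std::string prefix = "temp=";
        if (body.compare(0, prefix.size(), prefix) != 0) return fallback;
        const std::string digits = body.substr(prefix.size());
        std::size_t used = 0;
        const int value = std::stoi(digits, &used);
        return used == digits.size() ? value : fallback;
    } catch (const TimeoutError&) {
        return fallback;
    } catch (const std::logic_error&) {  // stoi: invalid_argument or out_of_range
        return fallback;
    }
}

// Mock service: replies with a canned body or simulates a timeout.
struct ScriptedTransport : Transport {
    std::string reply;
    bool time_out = false;
    std::string get(const std::string&) override {
        if (time_out) throw TimeoutError("timed out");
        return reply;
    }
};

// Convenience wrapper: what does the client return for this canned reply?
int result_for(const std::string& reply) {
    ScriptedTransport mock;
    mock.reply = reply;
    return fetch_temperature(mock, -1);
}
```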
"mocking networked services" is exactly what you would do when testing clients.
And it doesn't have to be a static mock. It's not too hard to inject a fuzzer in your mock service response, although that's probably left to a separate testing routine, and not part of your unit test setup. But if you have no mock for your network service, you can't fuzz it either.
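One way the fuzzing variant might look, as a rough sketch. `FuzzingService`, `parse_temperature`, and the `temp=<int>` format are all invented for the example, and the generator is seeded so a failing run can be reproduced. The only property checked is that no reply makes the parser throw.

```cpp
#include <cassert>
#include <random>
#include <stdexcept>
#include <string>

// Hypothetical client-side parser under test: "temp=<int>", else fallback.
int parse_temperature(const std::string& body, int fallback) {
    const std::string prefix = "temp=";
    if (body.compare(0, prefix.size(), prefix) != 0) return fallback;
    try {
        const std::string digits = body.substr(prefix.size());
        std::size_t used = 0;
        const int value = std::stoi(digits, &used);
        return used == digits.size() ? value : fallback;
    } catch (const std::logic_error&) {  // stoi: invalid_argument or out_of_range
        return fallback;
    }
}

// Mock service whose replies are random bytes, about half of them
// starting with a valid prefix so the number parsing gets exercised too.
class FuzzingService {
public:
    explicit FuzzingService(unsigned seed) : rng_(seed) {}

    std::string reply() {
        std::uniform_int_distribution<int> length(0, 16);
        std::uniform_int_distribution<int> byte(0, 255);
        std::string body = (rng_() % 2 == 0) ? "temp=" : "";
        const int n = length(rng_);
        for (int i = 0; i < n; ++i) body.push_back(static_cast<char>(byte(rng_)));
        return body;
    }

private:
    std::mt19937 rng_;
};

// Returns true when no fuzzed reply made the parser throw.
bool survives_fuzzing(unsigned seed, int rounds) {
    FuzzingService service(seed);
    try {
        for (int i = 0; i < rounds; ++i) parse_temperature(service.reply(), -1);
    } catch (...) {
        return false;
    }
    return true;
}
```

As the comment says, a loop like this probably belongs in a separate testing routine, not in the regular unit test run.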