Unit tests are not a goal, they are a tool. Striving for 100% test coverage is n...

lloeki · on July 9, 2020

I read this complementary one the other day, and one thing that is readily apparent to me is that a lot of people have a lot of different opinions about testing (move to system tests! more unit tests! regression tests!), but many are not asking the zillion dollar question:

What are you testing for?

This is critical because it basically gives you immediately what you should and should not test, and how. While mindless, dogmatic, metric oriented testing is a waste, testing with higher intent and purpose is extremely useful.

An example: test that something working on current vX also works on vA to vW, and when vZ is out, have the answer readily. Or that a biz feature fulfills the requirements. Or that someone not as well versed on intricate details of your piece of ownership will be confident in that piece still working after a simple fix when you’re on vacation. It can be one, some, but probably not all.

With that in mind, what to test, what doesn’t make sense to test, and what to test against becomes more clear: should I mock this? or should I run it against some staging environment? Should I perform (yikes? not!) manual testing?

The answers are highly dependent on the piece of code being tested.

Tests are here to help you answer a question, if you aren’t sure what the question is then your tests will miss the point.

https://flak.tedunangst.com/post/against-testing

hnick · on July 10, 2020

I feel like a lot of unit testing is just another form of bikeshedding. It's easily understood, everyone can talk about it, and you can spend a lot of time on it with no clear goal but feel like you're getting something done.

mcv · on July 9, 2020

> "Striving for 100% test coverage is nonsense"

100% coverage of what exactly? Tests that go through all your lines of code without testing any of the logic, is useless. If you want to be thorough, you need to do mutation testing, which is a system that tests the quality your unit tests by mutating your logic (changing a > for >=, a + for -, etc) and then expects at least one test to fail. If no test fails, that piece of logic wasn't tested.

Without that, it's entirely possible your high code coverage doesn't actually test anything meaningful. Also, this sort of logic is exactly the kind of stuff you want to unit test. All the standard plumbing boilerplate code is not something that needs to be unit tested. The logic does.

2rsf · on July 9, 2020

and it's not just that, for example branch coverage only says that you branched but not how, a complex branch might need more than one test to be fully, or decently, tested.

zzzcpan · on July 9, 2020

Right, essentially you have to be aware that you are testing the state space the code might end up in, which is very different from just hitting every line of code or every branch.

yaktubi · on July 9, 2020

On that note it is a great tool during development to get a piece working without connecting it to the broader application. Test driven development gives a nice debugging context that is easier to work with. The code coverage and regression part comes as a nice bonus feature.

I’d also like to add that if you contribute code to an open source project it is extremely beneficial to have iron-clad unit tests. Since there is so many devs it would be easy for someone to accidentally break something you fixed already.

KingOfCoders · on July 9, 2020

The benefit of 100% test coverage is that there are no more discussions on what to test and what not. When in doubt, test it. In larger groups of developers there are otherwise ongoing discussions if A needs to be tested or not. I have seen culture wars around this, from people who don't want to test and are in the eyes of others always testing not enough and vice versa. Especially with a diverse development force with different ages, seniority and cultural background.

It's often easier to just aim for 100% test coverage instead (with excluding some categories of files).

EDIT: I would not and did not start with 100% unit testing. But if there are ongoing culture wars and discussions didn't lead to a workable compromise, 100% test coverage worked for me and after some days test coverage was a non issue.

WrtCdEvrydy · on July 9, 2020

> just aim for 100% test coverage instead (with excluding some categories of files).

That's where the 'gaming' comes in.

The tests start just going through lines without hitting a single expect statement.

The ignore files start becoming battlegrounds in the PRs because people just exclude half the damn project.

We just have a simple rule... if you wrote code, you have to write coverage for it. If it breaks and your test doesn't catch the breakage, the bug fix goes back to you. Some people will ask "but what about what I'm working on now", you'll have to communicate that you feel your previous work was far more important.

deleuze · on July 9, 2020

> the bug fix goes back to you. Some people will ask "but what about what I'm working on now", you'll have to communicate that you feel your previous work was far more important.

this feels punitive, especially in the eyes of management. unless you're in a safety critical area where fully testing every code path is a hard requirement, people will eventually write bugs.

i'd rather work somewhere that recognizes defects occur and has a fast iterative process to push out new changes rather than one based on shame for having written a bug.

WrtCdEvrydy · on July 9, 2020

> has a fast iterative process to push out new changes rather than one based on shame for having written a bug.

That is the fastest most iterative process we have found so far... as the expert on the original code, you are able to deliver the best outcome.

You're not being shamed for writing a bug, you're being shamed for not testing your code.

shawnz · on July 9, 2020

Even with 100% coverage, that doesn't mean you've found every bug that can possibly be found with unit tests. Your tests could always cover more inputs, more situations, etc.

KingOfCoders · on July 9, 2020

No, you can't find bugs that you can't think of.

Property based testing and random values in unit tests find lots of bugs you didn't think of though.

brlewis · on July 9, 2020

> you can't find bugs that you can't think of

Rather, you won't find bugs that you choose not to think of because you've let the "100%" number lull you into complacency, even though you know it's 100% of lines/branches, not 100% of inputs.

KingOfCoders · on July 9, 2020

Yes.

My problem in 40y of programming is still making bugs and those I make come from not thinking about edge cases or from wrong assumptions and not from being lulled into writing tests to meet a 100% number.

But personalities differ and if being lulled into security by writing towards a 100% number is a problem for you I would be careful, I totally agree here.

monksy · on July 9, 2020

I agree that fuzz testing+linting can help you. However from my persective lower level tests help you to build trust about the software you're releasing.

tams · on July 9, 2020

100% test coverage is pretty easy to game as long as the metric is known. Just exercise all paths and write as few assertions as you can.

I'd prefer <100% coverage plus discussions about what to test (and how) much more than working with a test suite built on the wrong incentives.

KingOfCoders · on July 9, 2020

If you are someone who games metrics for his benefit or hire people who are gaming metrics for their benefit, I assume yes, this metric is very easy to game as are most metrics. Metric systems are not cheater proof.

misja111 · on July 9, 2020

This argument, that it avoids discussions, does make sense but only if you're in a team where discussions tend to be a waste of time.

When my colleagues are knowledgable and open minded I would embrace every opportunity to have a good discussion.

sanderjd · on July 9, 2020

I like thinking about the trade off between a simpler rule that is mostly right vs. decisions require judgement and consensus. I think the simpler rule is usually the better side of the trade-off.

But in this case I think the cure might be worse than the disease. Tests for plumbing code often end up being brittle tests of methods getting called on mocks in the right order. People will notice that these require a lot of toil to keep them running as code changes while providing very little benefit in avoiding mistakes. People will rankle at being told that they must write these tests, which they can see are a waste of time.

I've done it both ways. I'm much happier with my work when I'm not trying to write tests that are tedious and don't seem to provide any value, in order to hit an arbitrary coverage metric. I suspect my teammates feel the same way, so on teams where I have input into the decision on this, I do not advocate for 100% coverage. It does make it harder to have the discussion of which tests should and shouldn't be written, but I think it's worth that cost.

KingOfCoders · on July 9, 2020

Yes I have seen bad unit tests very often.

Writing good testing code is harder than writing business code. Especially junior developers struggle with this, most often because many companies write not enough tests to learn writing good tests.

And if you're in an environment, where this is a non-issue I think thats great. Don't fix something that doesn't need to be fixed.

ben509 · on July 9, 2020

Yup. The debate over unit testing, being political, is a far bigger impediment to progress than the actual tests, which are a technical hurdle. It's essentially the same reason for linters and auto-formatters.

It has a side benefit that it forces devs to write testable code, which inclines them to reasonbly factored code.

chucky · on July 9, 2020

> It's often easier to just aim for 100% test coverage instead (with excluding some categories of files).

Congratulations, now you have a war over which categories of files are excluded from the "100% test coverage" rule. ;)

KingOfCoders · on July 9, 2020

Thank you!

redleader9345 · on July 9, 2020

"larger group of developers" is the phrase that caught my attention. Humans don't scale well. This is where microservices do become attractive. This service is owned by a small team, and that team makes these types of judgements. It may be very different from how other service is owned and maintained, and that's ok.

KingOfCoders · on July 9, 2020

From my limited experience you get cross team discussions about unit testing, especially if one microservice has too many bugs in the eyes of other teams giving development a bad reputation or making working with a microservice hard. Especially if it breaks with releases and other teams get paged.

DodgyEggplant · on July 9, 2020

Yes. you move the people don't scale well issues to multiple teams don't scale well

tonyedgecombe · on July 9, 2020

Rule by diktat is a recipe for demotivated staff.

KingOfCoders · on July 9, 2020

Ongoing culture wars are a recipe for demotivated staff.

Perhaps I am wrong, and I would not start with a 'diktat' for obvious reasons.

As a manager, you didn't have discussions about the level of necessary code coverage? Would be interested on how you managed unit testing without 'dictat'. How it would fit into integration testing and explorative testing. What level did developers in your deparment usually find "adequat" ? If you considered it too low, how did you raise test coverage as a manager without defining a coverage level?

rimliu · on July 9, 2020

This is dangerously close to not understaning what you are doing. _Thinking_ what to test has a benefit of making you think, not just cargo-culting.

KingOfCoders · on July 9, 2020

Obviously you need to think about how to test. Usually this leads to finding omissions in your code adding edge cases.

1337shadow · on July 9, 2020

Exactly, the 80-20 rule also applies to unit tests. I don't have 100% coverage on big projects, but everything I write that's meant to go in production is in TDD anyway, so there's always enough tests to prevent a junior from breaking my stuff, and I save a lot of time because thanks to TDD I don't need to manually test much, and most of the times not at all, I just wait for user feedback: it's always much easier to have code working in real conditions, if it already works in test conditions, not the other way around.

2rsf · on July 9, 2020

there is real world research that actually shows that the 20/80 works in unit tests, the last 20% hardly catches any issues or contribute very little to quality

DodgyEggplant · on July 9, 2020

And like other tools their importance is part of the entire tool-set. In many shops tight schedules, management by Product managers or people who are too removed from code cause you compromise every other principle of responsible sane coding. When this happens, unit tests are your only shield from doom. If everybody knows and allowed to write sane, good code with reasonable time to build it, the unit tests are nice to have but not a must

pjmlp · on July 9, 2020

Yeah, but those QA dashboards monitored by management....

majikandy · on July 9, 2020

Actually it is pretty common to have 100% coverage with some extra redundancy too (where some things get accidentally covered multiple times). Striving for 100% is indeed nonsense, but having 100% coverage is usually accidental in clean code that you want to work, and merely a by-product of TDD.

I strive for working code. Sometimes I miss something in the TDD cycle and don’t have 100% and it is that which usually comes back to bite you.

I have never found 100% test coverage has bitten me, dogmatic or otherwise.

mytherin · on July 9, 2020

That heavily depends on the size of your codebase and perhaps also the language you are writing in. Writing in C++, for example, I often have switch statements in the form of:

  switch(type) {
  case X:
    ...
  case Y:
    ...
  default:
    throw InternalException("Unsupported type!");
  }

Now if all goes well the default case will never be covered. At some point I thought "why have this code if it's not supposed to run; let's rewrite this so we can get 100% code coverage!", and I ended up with the following code:

  switch(type) {
  case X:
    ...
  default:
    assert(type == Y); 
    ...
  }

Now we can get 100% code coverage... except the code is much worse. Instead of an easy-to-track down exception we now trigger either an assertion (debug) or weird undefined behaviour (release) when the "not supposed to happen" inevitably does happen because of e.g. new types being added that were not handled before.

Is worse code worth getting 100% code coverage? In my eyes, absolutely not. I think good code + testing should be able to reach at least 90% typically, likely 95%, but 100% is often not possible without artificially forcing it and messing up your code and/or making it much harder to change your code later on.

tremon · on July 9, 2020

Why can't you have a test that checks if the correct exception type is thrown on invalid input? The exception is part of your API too.

mytherin · on July 9, 2020

This behavior occurs in internal functions and is not triggerable by the user. The only way to trigger this behavior would be to create unit tests that test small internal functions by feeding them specifically invalid input. This is possible, but I would argue this falls under "dogmatically trying to reach 100% code coverage". Testing small internal functions adds very little value and is detrimental to a changing codebase. After adding these tests every single change you make to internals will result in you needing to hunt down all these tiny tests, which adds a big barrier to changes for basically no pay-off (besides a shiny "100%" badge on Github, of course).

hansvm · on July 9, 2020

As always, I think the answer here is more along the lines of "it depends." It's not that uncommon of a task to make an existing function more performant, and a well thought out test suite makes that leaps and bounds easier even for small, internal functions.

steerablesafe · on July 9, 2020

It's arguable that this is a programming bug an not really recoverable, so throwing doesn't make much sense.

You can be defensive to various degrees about assertions:

1. You can just use assert() to fail in Debug and do nothing in Release. 2. You can be more defensive and define your always_assert() to fail in Release as well. 3. You can double down on the UB with hints to the compiler and provide assume(), which explicitly compiles to UB when it's triggered in Release (using __builtin_unreachable() for example).

About the organization of the if statement: I agree that the former is better, I would use assert(false) though.

mytherin · on July 9, 2020

Indeed it is a programming bug - but programming bugs happen. In my experience writing programs as if bugs will not happen is typically a bad idea :)

Throwing an exception here is basically free (just another switch case) and gives the user a semi-descriptive error message. When they then report that error message I can immediately find out what went wrong. Contrasting with a report about a segfault (with maybe a stacktrace), the former is significantly easier to debug and reason about.

assert_always would provide a similar report, of course. However, as we are writing a library, crashing is much worse than throwing an internal error. At worst an internal error means our library is no longer usable, whereas a crash means the host program goes down with it.

kccqzy · on July 9, 2020

__builtin_unreachable() is your friend.

Better yet, omit that default case, so that in the future when you do add a new value to the enum, the compiler will warn you and force you to add a new case.

But I agree with your general thesis that it's just not worth getting to 100% coverage.

shawnz · on July 9, 2020

If it didn't catch any bugs, either during initial development or through later changes, then it bit you via wasting your time. I don't think it's fair to say that having tests necessarily makes the code under test any cleaner.

alecco · on July 9, 2020

> Striving for 100% test coverage is nonsense

Tell that to SQLite guy.

ben-schaaf · on July 9, 2020

Striving for very high test coverage seems like exactly the kind of thing you want for your database infrastructure.

1337shadow · on July 9, 2020

And that's exactly why you need less tests in your project that uses a DB: there's no need to test the DB because it's covered by its own tests already.

bpicolo · on July 9, 2020

Interactions with your DB are often the most fragile piece of your code, because there's an impedance mismatch between the language you're writing and SQL. Some languages/frameworks abstract this more safely than others, though.

Jtsummers · on July 9, 2020

Interaction, in general, is where many, if not most, errors lie. Unit testing verifies that things work in isolation. But if you code up your unit tests for two components that interact, but with different assumptions, then the unit tests alone won't do you any good: X generates a map like `%{name => string()}`, Y assumes X generates a map like `%{username => string()}`. Now, hopefully that won't happen if you're writing both parts at the same time, but things like this can happen. Now your unit tests pass because they're based on the assumptions of their respective unit of code, but put it together and boom!

1337shadow · on July 9, 2020

Exactly, though I believe there's still a thin line between testing the interaction, and testing the db itself. Just like, the difference between testing some code, and testing the language itself.

kelvin0 · on July 9, 2020

You should definitely watch this presentation on SQLite exploits.

SELECT code execution from using SQlite - DEF CON 27 Conference

https://www.youtube.com/watch?v=HRbwkpnV1Rw

Don't get me wrong, it's a great product and I use it often, but 100% test coverage does NOT equal 100% safe.

nottorp · on July 9, 2020

Something like sqlite can actually be unit tested to 95% or so. If you stretch the definition of 'unit test' a bit (don't mock i/o).

Try that with a networked application that takes user input though...

laumars · on July 9, 2020

Mocking networked services is something very much worth doing because you can then check you’ve set timeouts correctly, handling incomplete or junk responses gracefully etc. Those are the kind of hidden problems that can bite you on production deployments.

nottorp · on July 9, 2020

You can mock networks all you want and you'll still cover only what you can think of.

It will always break on the user's internet, because it's too diverse to predict.

Doesn't mean you can't have some networking unit tests, just that you shouldn't believe in them too much.

Edit: you said services. You thinking of server? I'm thinking of clients.

tremon · on July 9, 2020

"mocking networked services" is exactly what you would do when testing clients.

And it doesn't have to be a static mock. It's not too hard to inject a fuzzer in your mock service response, although that's probably left to a separate testing routine, and not part of your unit test setup. But if you have no mock for your network service, you can't fuzz it either.

jfkebwjsbx · on July 9, 2020

And yet they had bugs and could lose data.

Which shows 100% unit test coverage is not better than spending that time in other kinds of tests.

WrtCdEvrydy · on July 9, 2020

unit tests don't account for timing-based effects and you generally can't test for ACID properties.

tanilama · on July 9, 2020

TBH

SQLite is suitable for 100% coverage.

A lot of application code or workflow style code is hard to reach 100% coverage as they are rarely triggered.

ben509 · on July 9, 2020

The rarely triggered code is exactly where you want unit tests.

That's an opportunity to describe, in code, what that path is supposed to do, and then make sure it does it.