I really wish that in elixir we drew a distinction between dependencies that do and don't spawn their own supervision trees. It would be nice if this were reflected in, say, hex.pm, possibly using different tags for them.
Agreed. I also wish fewer libraries started their own supervision tree, and instead gave you a child spec to drop into your supervision tree. There's definitely use-cases where shipping libraries as an application makes sense, but oftentimes that sort of design causes problems for me, because it means not being able to start multiple copies of the dependency with different configurations.
I think Phoenix PubSub is a perfect example of how libraries should be structured, in that you just need to drop the module + options into your supervision tree, and you have the freedom of starting multiple independent copies of the tree, in different contexts, and with their own configurations: https://hexdocs.pm/phoenix_pubsub/Phoenix.PubSub.html#module...
Alternately, the dependency can start its own supervision tree with any global processes/tables hanging off it from the beginning; and then export a Mod:start_link/1 function which clients will call, which will 1. start a child tree owned/managed by the dep's supervision tree; but which then 2. links that child subtree's root into the caller as well.
Such deps are integrated, by adding a stub GenServer that calls Mod:start_link/1 in its init/2 callback; and then adding a child-spec for that stub GenServer in your client app's supervision hierarchy.
The ssh daemon module in the stdlib works this way. Most connection-pooler abstractions (e.g. pg2, gproc) do as well.
Yes! This is a great approach, and I'd be happy to see more examples like this in the wild. This is similar to the same way Phoenix PubSub works, with the PubSub application starting a pg scope as part of its supervision tree, that client PubSub servers can join if configured to use the pg adapter.
I was a little bit flippant in my initial comment, but my main criticism was of libraries that don't support any sort of hooks like this into their supervision strategy, and instead rely entirely on a global and static supervision tree, usually configured using app config.
I'm 50-50 on that one (used to agree with you more but have since retraced a bit). This may be an overly nitpicky detail, but I you sort of want your own sup tree to not necessarily have a different-ly scoped "microservice" tied to it in terms of failure domains, and also just plain visual organization in your observer/livedashboard. For the 90% use case (e.g. http process pools) an indepentent sup tree is correct, but to your points,
1. it would be nice to have a choice. The library-writer should think about their users and choose which case is more correct. And make it opt-out and easy (let's say 2-3 loc) to implement the "other case", and spelled out explicitly in the readme/docs landing page.
2. PubSub indeed made (IMO) the correct choice when it migrated over from being its own sup tree to moving into the app's sup tree.
IIYC you're suggesting that what I am depending upon here is convenient but problematic?
My understand is not yet sophisticated enough to follow your point about "not being able to start mutiple copies of the dependency with different configurations".
Do you have any explanatory examples that could help me (and presumably others like me)? Thanks. m@t
Problematic is probably too strong of a term, and I think I'd use the word inflexible instead.
I want to be clear though: my issue isn't with applications -- the functionality you're talking about is powerful and useful -- it's purely with the tendency of starting a static and global supervision tree as part of a dependency: see some of the other comments in this thread for some neat examples of how applications like ssh and pg2 handle supervision.
When libraries are written like this, they usually start everything up automatically, and pull from their application environment in order to configure everything. This means that this configuration is global and shared amongst all consumers of the library.
Imagine an HTTP client, for example, that provides a config key for setting the default timeout. This key would be shared among all callers, and so if multiple libraries depended on this client, their configurations would override each other.
Fortunately, Elixir now recommends against libraries setting app config, so this problem is partially mitigated, but it's still a concern within your app: if I'm calling two different services, I want to use different timeouts for each, based on their SLA, so having a global timeout isn't helpful.
Instead, in this situation, I'd prefer something like what Finch provides, where I'm able to start different HTTP pools within my supervision tree, for different use-cases, and each can be configured independently: https://github.com/keathley/finch#usage
Another approach would be to do something like what ssh does, and have the Finch application start a pool supervisor automatically, but then provide functions for creating new pools against that supervisor, and linking or monitoring them from the caller.
There's a few other techniques you can use too, with different tradeoffs and benefits: like Ecto's approach of requiring that you define your own repo and add that to your tree. Chris Keathley describes some of those ideas here: https://keathley.io/blog/reusable-libraries.html
Global trees like this are also harder to test, especially if they rely on hardcoded unique names, and usually restrict you to synchronous tests, since you can't duplicate the tree for every test and run them independently of each other.
Again though, I want to stress that running processes in the library's application is not my problem: it's just not having any control over when or how those processes are started.
I'm just responding on my phone, and I need to run for a few hours, but feel free to ask for more info or reach out. I'm always happy to talk about this stuff! I enjoyed your article, and I apologize if my initial comment came across as an attack on your core points.
No indeed, I did not perceive it as an attack, rather as hinting at concerns that I am not aware of and I'm grateful for your comment and the links (and thank you for your compliment).
Reading what you've written I wonder if this is about configuration rather than the nature of a library starting a process per se.
In my case there is no configuration, the agent state is a pure-counter, I think firing it off is harmless as other users of the library would just bump the counter value. Your point about testing is a subtle one, I'm not 100% sure I have the right mental picture yet (something I struggle with most of the time anyway).
What I think you are getting at is a library starting a process that does have configuration around how it works, should be less automatic giving the user a chance to make choices about how it works.
A lot of what I'm talking about has to do with configuration, but reuse is another big element. Your example has no configuration, and so is good in that regard, however your example is not reusable, in the sense that it's only possible for a single counter to exist.
I realize this is a contrived example, because you were trying to keep things simple, but if I needed two distinct atomic counters in my app, then I wouldn't be able to use Ergo, as it's currently implemented, because the application only starts a single counter, and doesn't provide any capabilities for starting additional counters.
You could change Ergo to get around this, possibly by instead running a dynamic supervisor that can start named counters under it, using something like `Ergo.create_counter/1`, but this would only address this specific use case.
To go back to my last comment, if you instead exposed, for example, a `__using__` macro that modules could use to define new counters, then callers would be able to integrate as many counters as they needed, whenever or however into their supervision tree as they required.
This ties back to the testing point too: if the process is a singleton, managed by the application, then you can only run one test against that process at a time in order to isolate the state for this tests, and you need to ensure you properly clean up that state between tests. Instead though, if the library allows you to start the processes yourself, then each test can use `start_supervised!` to start it's own isolated copy of the process, which will be linked to the test's process, and automatically cleaned up once the test finishes.
If a library is not written like that, it's poorly designed. It could provide one global started version as a commodity, but not being able to start it multiple times would be a big no.
This has historically been fairly common among a lot of the early Elixir libraries, and I'd imagine that's a byproduct of many of the early adopters coming from the Ruby ecosystem, and not having prior experience with the patterns used in Erlang. I think some of the early confusion surrounding how application config should be used also led to some misguided decisions early on.
Fortunately it's something that I've seen improve over time, but it's a pain-point I've run into with a lot of dependencies, so I try to call it out when I see it.
It's not obvious from use? It's been a long time since I was in this world (well, Erlang), but it was application:ensure_started(App) that let me know "this dependency has its own supervision tree".
Httpoison, for example, starts its own supervision tree to manage client process pools (not obvious that an http client should do that, definitely usecases where it shouldn't) and there is no indication either in mix.exs, or `MyApplication.Application.start/2` that Httpoison needs the tree. For most deployments, it's probably not a big deal. Some process will try to read http content, fail if httpoison isn't quite ready, and be restarted by its supervisor.
However, if you try to use it early in compilation or test, say, in your test_helper.exs file, speaking from experience, you could wind up with a very difficult to understand race condition where the httpoison process tree hasn't fully booted and you're trying to fetch something off the internet, and you don't have the same level of supervision protection -- if test_helper fails the whole test suite gives up and doesn't restart -- for obvious reason.
For the http case thankfully the elixir ecosystem is getting Mint as a base http library, which doesn't require a process tree out the gate, and several interesting explicit process-pool-libaries (finch, mojito) which are tuned for their own use cases that derive from mint.