It’s big for caches. As your application grows in complexity, you get dependent lookups. I need the user id to get the company id. I need the company id to figure out what features the user has access to. And then I need to run a bunch of queries to pull data for those features.
Using the cache may cut an order of magnitude off the overall time, taking you from hundreds of milliseconds to tens. With a bloom filter in front of it, you can also figure out that you have a cache miss faster and start fetching the data sooner. The user may not notice the small improvement in response time, but by Little’s Law your cluster can be smaller for the same traffic.
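To make the Little’s Law point concrete, here is a rough worked example. The traffic and latency numbers are made up for illustration, not measurements: assume an arrival rate of 2000 requests per second, and a mean response time that drops from 40 ms to 35 ms once misses are detected earlier.

```latex
% Little's Law: mean requests in flight L = arrival rate \lambda times mean time in system W
L = \lambda W

% Hypothetical numbers, for illustration only
L_{\text{before}} = 2000\ \text{req/s} \times 0.040\ \text{s} = 80 \ \text{requests in flight}
L_{\text{after}}  = 2000\ \text{req/s} \times 0.035\ \text{s} = 70 \ \text{requests in flight}
```

If each server comfortably holds, say, 10 requests in flight (again an assumed figure), that is the difference between 8 servers and 7 for the same traffic, even though a 5 ms change is invisible to the user.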
Web browsers use bloom filters when matching CSS rules to elements, to quickly rule out selectors that cannot apply. IIRC Chrome removed a perf screen for CSS rules because most people were getting results below the noise floor for the timing function. The time to load the CSS was still relevant (maybe more so due to the higher setup cost of the filters).
A few real-world use cases that come to mind (non-exhaustive):
- Databases: To quickly check if a record might exist before doing a costly disk lookup.
- Spell Checkers: To check if a word might be in the dictionary.
- Spam Filters: To check if an email sender is on a list of known spammers.
- Browser Security: Chrome has used Bloom filters to check if a site might be malicious.
- Password Checker: To check if a password is known to be leaked.
- Web Caches: To check if a URL or resource is definitely not in the cache.
- Distributed Systems: To avoid sending data that another system definitely doesn’t need.
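
All of these rely on the same property: a Bloom filter can answer "definitely not present" or "might be present", never "definitely present". Below is a minimal Python sketch of that behavior, using the web-cache case as the example. The names `BloomFilter`, `might_contain`, and `cached_get` are made up for illustration and do not come from any particular library. The sizing uses the standard formulas for bit count and hash count, and the bit positions come from double hashing over a single SHA-256 digest, which is one common choice among several.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter sketch: no false negatives, tunable false-positive rate."""

    def __init__(self, expected_items: int, false_positive_rate: float = 0.01):
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        n, p = expected_items, false_positive_rate
        self.num_bits = max(1, math.ceil(-n * math.log(p) / (math.log(2) ** 2)))
        self.num_hashes = max(1, round(self.num_bits / n * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd, so the stride is never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely not added"; True means "possibly added".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def cached_get(key: str, bloom: BloomFilter, cache: dict, fetch):
    """Hypothetical helper: consult the filter before touching the cache."""
    if not bloom.might_contain(key):
        # Definitely not cached: skip the cache lookup and fetch right away.
        value = fetch(key)
        cache[key] = value
        bloom.add(key)
        return value
    if key in cache:
        return cache[key]
    # False positive: the filter said "maybe", but the cache did not have it.
    value = fetch(key)
    cache[key] = value
    return value
```

Note that this sketch never removes anything: a plain Bloom filter cannot delete entries, so a cache with eviction would need to rebuild the filter periodically or use a counting variant.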