DDG shares "signals for fraud detection" with the Azure Fraud Detection Service and receives back "View ID" + "Fraud Vector".
The "View ID" + "Fraud Vector" + "anonymized UA" + "anonymized IP" + "search query" are then shared with Microsoft Bing Private Search API.
"Signals" could include raw IP address, just not directly associated with the search query. Microsoft owns both the Azure service and Bing API so it can just match up the two separate requests on the backend to de-anonymize the user.
There is a lot of unnecessary FUD in the article, but some elements of truth too. Personally, I have always been sceptical of DuckDuckGo and have never trusted or used it because it is an American company. Its relationships with Apple, Microsoft and Yahoo also added to the mistrust. (DDG gets its search results from Microsoft Bing, and Apple harvests a lot of user data from it by offering it as one of the few default search engines on iOS and macOS. Thanks to them, DDG is the second most used search engine in the US.)
As Snowden's PRISM revelation highlighted, the US has outsourced a huge part of their intelligence gathering to BigTech. Thus US companies have a huge profit incentive to collect as much personal data as they can on their users. (It's not just the tech companies - I remember reading about how Ford downloads your contact list and recent calls from your car when you take it for servicing at their service center.) So it's a no-brainer for US companies to harvest as much user data as they can. Who wouldn't love to have the US government as their client, with its near unlimited wealth?
So companies like Apple and DDG have resorted to using "privacy" purely as marketing to grow their user base, while they abuse the misplaced trust of their users and steadily increase the data they collect from them.
> US has outsourced a huge part of their intelligence gathering to BigTech.
And tech people and venture capitalists have realised that there's money to be made out of data. And not just for data's sake, e.g. selling it by the gigabyte, but by holding on to that data and offering features built on it. Take Facebook, for example. They don't "sell" the data by the gigabyte, but offer advertisements which are shown to people according to the data (with some algos which are also valuable, one might even argue more valuable). Same with Google. However, when it comes to DDG, I've yet to see DDG ask for permission to access my contacts for advertising purposes. They've yet to create an account system to link all my devices' queries together for a more targeted advertising profile.
So to me this article seems fear mongering. Should I switch back to Google instead, now that some favicons are loaded through their services? I still think DDG comes out ahead.
I have tried using Searx, but its user experience is nowhere near even DDG's, which itself lags behind Google's.
A large part of the list is ‘they cooperate with X that I don’t like’ ‘X that I don’t like works there’ and similar conjecture.
And to top it off ‘A judge told them not to link to some sites and they don’t’. Hardly a ‘privacy abuse’. More like a typical attempt at cancel culture.
> A large part of the list is ‘they cooperate with X that I don’t like’ ‘X that I don’t like works there’ and similar conjecture.
Exactly. By this standard I should not buy my food from any grocery store on this planet, because they sell cereal made by Nestle, which wants private ownership of water, something I'm against. This type of logic does not scale in the least.
I basically agree with you on all points other than the judge one - the judge didn’t tell them to do anything because they are not an English company. It looks like the censorship is coming from their integration with Yahoo.
Would recommend using searx.be instead. Works basically flawlessly from my usage and I haven’t seen any privacy abuses. They don’t seem to want to be big which is ironically a reason to trust them more in my opinion.
It sucks to have experiences like the author of TFA because it’s very difficult to be taken seriously.
Agree, but please don't everybody on HN switch to this particular Searx instance. Spread the load and use other instances. I spun up my instance but hesitate to publish it here.
Searx is not a search engine with its own index; rather, it is a meta-search engine that forwards your query to a customizable list of other search engines (like Google, DDG, Wikipedia, etc.) and gives you aggregated results. It is more privacy-respecting because the search providers can't run any tracking JS, see your IP, give you tracking links, etc. People may run their own instance for any number of reasons, including: hosting in different places to get different geo-specific results, being faster by being closer to the end user, changing default themes/settings, or reducing the load on other instances. Nowadays, the fork SearXNG is more popular than the original.
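Roughly, the aggregation step could look like the sketch below. It is not how Searx or SearXNG actually score results; the engines here are stand-in functions rather than real HTTP calls, and the reciprocal-rank scoring is just one simple choice picked for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# An "engine" is any function that takes a query and returns ranked URLs.
# In a real instance these would be HTTP requests made by the server,
# so the upstream engines see the server's IP, not the user's.
Engine = Callable[[str], List[str]]


def meta_search(query: str, engines: Dict[str, Engine]) -> List[str]:
    """Forward the query to every engine and merge the ranked results."""
    scores: Dict[str, float] = defaultdict(float)
    for _name, engine in engines.items():
        for rank, url in enumerate(engine(query), start=1):
            # Reciprocal rank: results near the top count more, and
            # URLs returned by several engines accumulate score.
            scores[url] += 1.0 / rank
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda u: (-scores[u], u))
```

A URL that two engines both return tends to float to the top, which is the main practical benefit of aggregating.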
Doesn't that mean we have to trust the owners of the Searx hosts though, if we don't spin up our own? I think I'd rather trust a faceless corp sometimes than someone with a potential axe to grind.
Big data is valuable, but little data is usually worthless. My instance is running on a Raspberry Pi on my desk running Yunohost. Even if I figured out how to log IP addresses and queries, neither I nor anyone else would have any use for the information. Searx doesn't set any cookies or do any browser fingerprinting. So all I would know is that someone from an IP address (somebody in your house or on your VPN) queried something.
It's a meta-search engine (a search engine that searches through other engines). I self-host one myself (although I host SearXNG, a fork). You could argue you can be identified because the IP is always the same, so you shouldn't self-host for yourself only - but there are very real legal and practical limits to tracking without cookies, so I'm reasonably fine with this, especially since I'm not the only one using my instance (although the user count is under 10).
You can self-host and use a mobile 4G/5G modem for the outgoing connections from SearXNG: the IP address is going to be shared between different users (at least with IPv4).
Thanks. After reading this and also realizing it'd be better to use a meta-search engine rather than a normal one, because of the aggregation of results from different search engines, I am shifting to MetaGer. I find its results better than Searx's.
Also, I would like to say that if you have the TS;DR extension installed, it comes with a search engine, TS;DR Search, which is an instance of Searx and can be set as the default in Firefox (which I personally use), because Firefox doesn't offer Searx as an option for the default search engine.
Update: Sorry I've reverted back to DDG. MetaGer searches are too random and change (right in front of my eyes) even after the initial results are shown.
For example, if I search for "esd windows" on https://metager.org, the first result, "Should I delete ESD Windows...", changes to "ESD West Windows". I actually wanted the former, but the results changed after they were first displayed, and the latter, which was not shown at all initially, became the first result. For me, this is unacceptable and indicates manipulation (for whatever motive).
So, back to DDG. Search results are comparable to meta search engines (I've now realized). I'll put up with whatever problems it has for now. And man I miss the keybindings, made my life less painful.
I've moved to SearXNG since this submission, and it is just great (for my mostly IT queries), so if you say MetaGer is even better I am pretty curious: I see they have an English version https://metager.org (great), but I don't get what you mean with your second paragraph: the "TS;DR extension"
By the way: I've just discovered that you can set a custom search engine in Firefox without the extension: just right click, see https://metager.org/plugin and it works for SearXNG too :)
Checked the first one (kagi.com) and on the first page they make requests to both cloudflare.net and duckduckgo.com. I just don't get it - how difficult is it not to leak your users' data all over the place?
EDIT: you.com does better on the first page, but then they leak data to CloudFlare and Google via 3rd party JS on FAQ page. Come on.
EDIT2: qwant.com looks promising, thank you for the tip!
Hello! Checking in from you.com on this. For the FAQ page, is this the page you were looking at: https://about.you.com/faq/ ? I want to make sure I'm looking at the exact same page.
Looking at that page, it appears we load a font from Google for about.you.com. Thanks for flagging! We'll get rid of that.
We do use CloudFlare for our CDN, DNS name server, JavaScript caching, and DDoS protection. We're not planning to move off of CloudFlare any time soon, but we've worked with them to ensure that they do not store user queries in their logs (completely masked out), they redact the last part of user IP addresses so no individual IP address is stored while keeping the benefits of bot detection and DDoS protection, and their javascript analytics tracker is turned off so there shouldn't be anything CloudFlare client-side.
Thanks again for the time you took looking into all this! It's great to have people out there that care enough to look and report what they find. Much appreciated.
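For readers wondering what redacting "the last part" of an IP address can look like, here is a minimal sketch using Python's standard `ipaddress` module. The prefix lengths (/24 for IPv4, /48 for IPv6) and the function name `redact_ip` are assumptions for illustration; the comment above does not say exactly how much CloudFlare masks.

```python
import ipaddress


def redact_ip(addr: str, v4_prefix: int = 24, v6_prefix: int = 48) -> str:
    """Zero out the host part of an address, keeping only the network prefix.

    The default prefix lengths are illustrative assumptions, not a
    documented CloudFlare setting.
    """
    ip = ipaddress.ip_address(addr)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    # strict=False lets us pass a host address and get its enclosing network.
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network.network_address)
```

With the defaults, every address in the same /24 maps to the same redacted value, so a log entry can still support coarse bot detection without identifying a single host.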
I have advanced blocking in uBlock Origin and all 3rd party scripts are blocked by default. Not sure how DDG got there (maybe because I came to Kagi through it?), but I have checked again and for CF it says:
d33q65j1hc8iiu.cloudfront.net
assets.kagi.com
I assume you are using CloudFlare as CDN for your assets (can't check right now)? If so, it is a weird decision for a privacy focused search engine. Of course there are degrees to privacy, but CF is too big to be trusted imho.
CloudFront is an AWS service (we use AWS for our infrastructure) and it has nothing to do with CloudFlare except sharing 6 letters in the name.
Would be nice if you double-checked both claims before making allegations publicly (or at least contacted us for clarification), as we do not have the resources of a big company to right all the wrongs said about us.
Everything about Kagi and privacy is available at kagi.com/privacy
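For anyone repeating this kind of check, here is a tiny sketch of telling the two apart by hostname suffix. The suffix table is a small illustrative assumption, not a complete list of either company's domains, and `cdn_operator` is a made-up helper name.

```python
# Illustrative suffix table only; both companies use more domains than this.
CDN_SUFFIXES = {
    "cloudfront.net": "Amazon CloudFront (AWS)",
    "cloudflare.com": "Cloudflare",
    "cloudflare.net": "Cloudflare",
}


def cdn_operator(hostname: str) -> str:
    """Return the operator for a known CDN suffix, or "unknown"."""
    host = hostname.lower().rstrip(".")
    for suffix, operator in CDN_SUFFIXES.items():
        # Match the suffix itself or any subdomain of it.
        if host == suffix or host.endswith("." + suffix):
            return operator
    return "unknown"
```

Run against the two hostnames quoted above, the first maps to CloudFront and the second is not a CDN domain at all.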
Thanks. And yes, it does matter, which is why I invested disproportionately more time and energy than it took to write the original comment, but it is a game I cannot win at any scale. Unfortunately, some people who saw the original comment and never see my rebuttal will walk away misinformed and potentially spread the misinformation further, damaging our brand, all completely unnecessarily.
The NamesDB "surveillance capitalist service designed to coerce naive users to submit sensitive information about their friends" claim is kind of weird, since fundamentally it was less so than Facebook and most other social networks. I'm pretty sure what they're talking about is how you could get paid status by inviting enough people? I've read the code (admittedly from 2-3 years after Weinberg's time) and it was a really generic site; the only thing that made it stand out was that it had class lists for a lot of countries that were being ignored by the big players. The invite function was basically manual, or the standard "give us your addressbook" feature I've always hated; there was no off-site tracking or even integration with their parent company (the forever sleazy Classmates.com). At the time I thought it was kind of sketchy, but everything it did that I disliked has since been done worse and more aggressively by the large social networks.
One thing they did properly is turn off mail after a single bounce or junk mail marking - the mail loops with the big mail providers and the big mailers tell you exactly who marked your mail as junk and you're supposed to immediately stop mailing them, but that doesn't stop LinkedIn or Facebook.
The CIA does invest in companies - Google got funds from them when it was a startup. But the US intelligence services no longer need to "own" or "operate" any computer tech company in the US. US law allows them to ask for any information they want under "National Security", and US citizens and companies operating in the US cannot refuse. I also remember reading that they can force access to any data centre to place and connect their own devices / servers and do whatever they want with the data they collect.
The "View ID" + "Fraud Vector" + "anonymized UA" + "anonymized IP" + "search query" are then shared with Microsoft Bing Private Search API.
"Signals" could include raw IP address, just not directly associated with the search query. Microsoft owns both the Azure service and Bing API so it can just match up the two separate requests on the backend to de-anonymize the user.
https://www.searchenginejournal.com/microsoft-announces-priv...