I think TCP proxies are more common for stealth, since you can use your own TLS fingerprint and all of that. With something like an HTTP proxy you'd need to set up your requests to match the TLS fingerprint the proxy is using, although I guess the proxy could make the TLS look the same? There are other ways of detecting HTTP proxies, for example comparing against the RTT of websockets or something like that. The idea is that there will always be at least one thing whose RTT is measured from the proxy and at least one thing whose RTT is measured from the client and has to go through the proxy. You measure the difference between the two and there you have it.
The issue is that if HTTP is, for example, 50ms slower than TCP and you add 50ms to TCP, HTTP is now 100ms slower than the original TCP. Basically HTTP is always slower, no matter how much delay you add.
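To make the arithmetic above concrete, here is a minimal sketch. It assumes a simplified model where the server sees the TCP RTT only for the server-to-proxy leg, the HTTP-level RTT covers both legs, and any delay the proxy adds applies to every packet on the connection. The function names and numbers are made up for illustration.

```python
def observed_rtts(server_proxy_ms, proxy_client_ms, added_delay_ms=0.0):
    """Return (tcp_rtt, http_rtt) in ms as the server would observe them.

    Hypothetical model: TCP RTT covers only the server<->proxy leg,
    HTTP RTT covers both legs, and the delay added by the proxy
    affects every packet on the connection.
    """
    tcp_rtt = server_proxy_ms + added_delay_ms
    http_rtt = server_proxy_ms + proxy_client_ms + added_delay_ms
    return tcp_rtt, http_rtt


def rtt_gap(server_proxy_ms, proxy_client_ms, added_delay_ms=0.0):
    """Difference between HTTP RTT and TCP RTT under the model above."""
    tcp_rtt, http_rtt = observed_rtts(server_proxy_ms, proxy_client_ms, added_delay_ms)
    return http_rtt - tcp_rtt


if __name__ == "__main__":
    # 10 ms server<->proxy, 50 ms proxy<->client, increasing artificial delay.
    for delay in (0, 50, 200):
        tcp, http = observed_rtts(10, 50, delay)
        print(f"delay={delay}ms tcp={tcp}ms http={http}ms gap={http - tcp}ms")
```

Under these assumptions the gap equals the proxy-to-client RTT regardless of the added delay, which is the point being made: padding the TCP side does not hide the difference.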
If the proxy can "see" the requests, then this isn't an issue, because the headers can trivially be modified.
The problem is that the proxies which are targets of identification - think proxies for large scale web scraping which use CONNECT tunnels - don't get to "see" the request.
I imagine any big CDN implementing something like this could keep a database of all of this, combined with the old kind of IP intelligence, collecting not only RTT on other protocols like TLS, HTTP and IP (aka ping, and traceroutes too), but also the TCP fingerprint, TLS fingerprint, HTTP fingerprint...
And with algorithms that combine and compare all these data points, I think very accurate models of the proxy could be made. For things like detecting credit card fraud this could be quite useful.
It's set at a low threshold since I want to avoid blocking regular users at all costs. I think the detection can be improved a lot by using more data instead of a single division to calculate the score; this is a somewhat simple PoC.
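As a rough illustration of what "a single division" with a low threshold could look like, here is a sketch. This is not the tool's actual formula: the choice of dividing the smaller RTT by the larger one, the names `proxy_score` and `looks_proxied`, and the 0.1 threshold are all assumptions made up for the example.

```python
def proxy_score(rtt_a_ms: float, rtt_b_ms: float) -> float:
    """Ratio of the smaller RTT to the larger one, in (0, 1].

    Hypothetical score: 1.0 means the two measurements agree, values
    near 0 mean one path is much longer than the other.
    """
    if rtt_a_ms <= 0 or rtt_b_ms <= 0:
        raise ValueError("RTTs must be positive")
    return min(rtt_a_ms, rtt_b_ms) / max(rtt_a_ms, rtt_b_ms)


def looks_proxied(rtt_a_ms: float, rtt_b_ms: float, threshold: float = 0.1) -> bool:
    """Flag only large mismatches; the low threshold keeps false positives down."""
    return proxy_score(rtt_a_ms, rtt_b_ms) < threshold


if __name__ == "__main__":
    print(looks_proxied(20.0, 25.0))   # similar RTTs
    print(looks_proxied(5.0, 200.0))   # very different RTTs
```

With a threshold this low only very large mismatches get flagged, which trades detection rate for fewer regular users being blocked.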
Thanks for taking the time to test it, I really appreciate it!
It's a super cool tool, I've been wondering about an open source tool doing this since reading about the technique in one of Nikolai Tschacher's blog posts years ago (https://incolumitas.com/pages/about/).
There are a few ways to work around this, but I think it's one of the best signals available to detect low-effort/common proxy providers.
Would you be open to offering MASQUE proxying? I started adding support to GOST and have been testing with Bright Data (only for UDP sadly, not TCP), but I would love to see others add support so I could test with more than just one vendor.
Oh I haven't seen that before, it's really cool, thank you for showing me that!
I want to clarify that the approaches are a bit different: they use IP intelligence too, and this approach doesn't use any kind of websockets. Using websockets is a really good idea, and I have to admit I didn't think of it, but sadly it's not really possible to do with Fastly.
Another big difference is that this could work with any TCP application, not only HTTP, and if you do it with HTTP/S you can know if it's a proxy or not on a per-request basis and totally passively, without adding any delay or changing the code of the app.
About the straight-line path, I did think of that, but apparently I forgot to address it when writing the README :p
The point I was trying to make is that if the RTT is low enough, you know the connection is being made from close by. It's an upper bound, and with some assumptions you can get it lower, so it's not a way of knowing the exact distance but rather the maximum distance the connection can be made from. If someone is supposedly in Spain but can't be more than 400km from Australia, something went terribly wrong somewhere hehe
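The upper bound can be sketched like this: a signal can't travel faster than light, so half the RTT times the propagation speed gives the maximum possible distance. The 2/3 factor for fiber is a common rule of thumb and is an assumption here, as are the function and constant names.

```python
SPEED_OF_LIGHT_KM_PER_MS = 299.792458  # speed of light in vacuum, km per millisecond
FIBER_FACTOR = 2 / 3  # assumption: light in fiber travels at roughly 2/3 of c


def max_distance_km(rtt_ms: float, medium_factor: float = 1.0) -> float:
    """Upper bound on the one-way distance implied by a round-trip time.

    medium_factor=1.0 gives the hard physical limit (vacuum);
    medium_factor=FIBER_FACTOR gives a tighter bound under the fiber assumption.
    """
    if rtt_ms < 0:
        raise ValueError("RTT must be non-negative")
    one_way_ms = rtt_ms / 2  # the signal goes there and back
    return one_way_ms * SPEED_OF_LIGHT_KM_PER_MS * medium_factor


if __name__ == "__main__":
    rtt = 4.0  # ms
    print(f"hard limit: {max_distance_km(rtt):.1f} km")
    print(f"fiber assumption: {max_distance_km(rtt, FIBER_FACTOR):.1f} km")
```

So a 4ms RTT puts the peer within about 600km in the absolute worst case, and within about 400km if you assume fiber. Real paths are never straight lines and add queuing and processing delay, so the true distance is usually well below the bound.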
In hindsight I think the issue with my explanation is that I was trying to explain the differences when fingerprinting two different protocols, but ended up going for a TCP-only approach since Fastly wouldn't expose the data I needed for the TLS and HTTP RTT. But in theory, fingerprinting with protocol RTT difference, where one protocol is proxied and the other isn't, is impossible to bypass, but this is only the theory.
I think I will edit the README in the future since I don't much like how it turned out. Thanks for the feedback!
> But in theory fingerprinting with protocol RTT difference where one protocol is proxied and the other is impossible to bypass, but this is only the theory.
Alice wants you to think she's in New York when she's really in Taipei, so she gets a VM in New York and runs a browser in it via RDP. How are you detecting this?
I guess for this to work best you'd build your own CDN and have as many servers as possible. I have always dreamed of an Open Source CDN managed by a nonprofit and dedicated to offering CDN services for free or for a reasonable cost.
If you did the timings by comparing against other protocols, like TLS or HTTP, you could do this with a single server. That's a bit more complex than doing it on the same protocol, since you have to account for more stuff, but it could be done. At the end of the day, my idea with Aroma was mostly to prove that it's possible. Thanks for the feedback btw!
I think you could also compare with TLS handshake timings, the delay for the client hello, among other things. You could also compare it with HTTP RTT, not to mention that you can do TCP fingerprinting and compare it with the TLS and HTTP fingerprint of the browser, and you can also measure the IP TTL and ping, among many other things... What I mean is that there are a ton of things that can be done on both sides, but any company with enough people working on this and enough servers will surely make something miles ahead of my proof of concept, and they also have a lot of traffic to know what's baseline data and what isn't.