At a former job, we reverse engineered the trading APIs of most American retail stock brokerages (Fidelity, E-Trade, Robinhood, TD Ameritrade, etc). We did it by rooting an iPhone and using Charles Proxy to grab the unencrypted traffic.
I learned a lot from that experience, and it's also just plain fun to do. We did get some strongly worded letters from Robinhood though, lol. They tried blocking our servers but we just set up this automated system in Digital Ocean that would spin up a new droplet each time we detected a blockage, and they were never able to stop us after that.
I did almost this exact same thing back around 2015, over Christmas break when I was in high school. I reverse engineered the anime streaming site Crunchyroll's API via their Android and PS3 apps using some HTTP proxy application and trial and error. I ended up having a proper HLS-based streaming player and Android TV app back when their Android app was still Flash based. It was lots of fun!
Maybe! I considered pursuing a job there at the time, but I opted to get a degree instead. Being located in the Midwest would've made it rather challenging anyways.
This reminds me of an internship I did a long time ago at an "insight generation company" that used to reverse engineer all of the flight/booking websites for price forecasts. Getting paid for automating scraping/reverse engineering APIs was always fun.
I wonder why they didn't block all Digital Ocean IP ranges. They only need residential customers to access Robinhood, so they can block everything else.
On the defender side, it's much funnier to poison the data of identified scrapers than to immediately ban them. Let them work out that their data has been altered for a while, clean up their datasets, and work to understand what identifies them as scrapers.
The product was for end users so the traffic was coming from any account. They never discovered the accounts we used for testing. At least, the American ones didn't. When we finished with US companies, we went for Singapore.
Our CEO was friendly with an investor who had an account at some big Singaporean trading firm called Lim Tan. He gave us his account credentials and I began working. A few days later my boss comes over to my desk and says "stop whatever you're doing right now." Apparently my traffic had set off so many alarm bells that the CTO of Lim Tan was woken up at 2am. They permanently banned the investor, which I felt really bad about. What's crazy is that I wasn't even doing anything weird. I was just poking a bit at their authentication methods. That was when I learned that Singapore tech doesn't fuck around.
I wonder if using something like puppeteer or playwright to actually make the server think everything is being done client-side would still raise flags.
Scraping a well-built API at human speed often isn't terribly useful, and once you start ramping up the scraping, it's account creation/patterns/use frequency that will set an alarm.
Faking real user clients won't prevent these alarms.
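To make the frequency point concrete, here's a minimal sketch of the kind of per-account sliding-window check a defender might run. Everything in it is an assumption for illustration (the `RateAlarm` name, the thresholds, keying on an account ID); real systems would combine many more signals than raw request rate.

```python
from collections import defaultdict, deque


class RateAlarm:
    """Hypothetical per-account sliding-window request counter."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # account id -> timestamps of requests still inside the window
        self._seen = defaultdict(deque)

    def record(self, account: str, timestamp: float) -> bool:
        """Record one request; return True if the account exceeds the limit."""
        window = self._seen[account]
        window.append(timestamp)
        # Drop timestamps that have aged out of the window.
        while window and timestamp - window[0] > self.window_seconds:
            window.popleft()
        return len(window) > self.max_requests


alarm = RateAlarm(max_requests=3, window_seconds=10.0)
flags = [alarm.record("acct-1", t) for t in (0.0, 1.0, 2.0, 3.0)]
```

Note that the check only looks at who is calling and how often, so a perfectly faked browser client trips it just the same.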
The purpose of our work wasn’t scraping - we were building a unified UI where our users could trade with any vendor of their choosing. Kinda like Plaid but specifically for retail stock trading. So the goal was to implement their trading API.
Yep. Though it’s really hard. To capture the US market (E*Trade, Fidelity, TD Ameritrade, Scottrade, Schwab, Interactive Brokers, Robinhood) it took me and another engineer almost 2 years. It’s non-trivial.
Reverse engineering the dudes holding your funds isn't a good idea to begin with. Too much risk. Better to work with them directly or switch to a better service which does feature APIs.
It reminds me of the expression "locks are to keep honest people out," in that code which runs on a device you control is code that you control: https://github.com/shroudedcode/apk-mitm#readme
Can't one just list all of digital ocean's ip blocks?
Like sure, then you can add in Hetzner or w/e and keep adjusting, but idk, if somebody keeps ban dodging by using the same provider it seems like you'd just try banning that provider early on?
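For what it's worth, the matching side of a provider-wide ban is simple once you have the ranges. A rough sketch with Python's standard `ipaddress` module, where the CIDR blocks are placeholder documentation ranges (RFC 5737), not any real provider's list, and `is_blocked` is a made-up helper name:

```python
import ipaddress

# Placeholder ranges for illustration only (RFC 5737 documentation blocks).
# A real deployment would load the provider's published ranges instead.
BLOCKED_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("192.0.2.0/24", "198.51.100.0/24")
]


def is_blocked(ip: str) -> bool:
    """Return True if the address falls inside any blocked CIDR range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in network for network in BLOCKED_RANGES)
```

The hard part is keeping the range list current and accepting the collateral damage, not the lookup itself.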
The trick is to use the same provider. I did it with the Expedia API for a while. At the time they were using AWS, so running it on Lambda made it very tricky for them to do much about it. They were hardly about to block all AWS IPs and risk their own services or any of the "real" partners having issues.
Same sort of timeframe, a project I worked on used networking via mobile hotspots on a bunch of Android phones with SIMs from a provider that used CGNAT. If the target websites wanted to block that, they'd be blocking well over 10% of all mobile phones in Australia.
(Hmmm, all the devices we used then would have just stopped working with the shutdown of the 3G network here. I wonder if it's all broken, or if they've upgraded all those devices to 4G/5G ones?)
Quite a few people do that. My grocery store's app and the Tesla app stop working if I bring up my vpn through DO. (I first set that up years ago because the legoland hotel's wifi was blocking reddit.)
I remember a friend telling me years ago about some scraping that happened where they worked. They scraped results from a bunch of different websites to create SEO websites, and they had some setup using Tor to avoid getting blocked. One of the websites that the company actually depended on apparently rendered results using a whole assortment of visually identical but structurally (HTML-wise) different methods, which were returned randomly to hamper scrapers. They eventually gave that up because it turned out TV closed captions can be downloaded as XML and they had what the company needed.
I've done the same for use in foreign currency exchanges. The adventure of reverse engineering protocols and finding security checks, etc was more fun than the actual accomplishment lol.
Our product was built for end users, so the traffic coming from our servers could technically be from any account. But as to why we weren't blocked during testing, that I'm not sure about. It's been about 8 years since I did that work - I assume we had someone's account who wasn't obviously connected to the company.
Fun times.