Hacker News | preinheimer's comments

What about a “right to create” act giving people the right to create things and not have their creations ingested to train AI for billion-dollar companies?

Some sort of pre-emptive auto-opt-AI't.

It's ridiculous that AIco's arguments are dwindling down to "it's not copyright infringement to ingest others' work and make 'derivatives' [which often are identical to original authors' works]."

----

We desperately need younger politicians: ones who can keep up with information more sharply (i.e. aren't decades past retirement age), and who are of an age where their own children are affected by government funding flowing away from youth/education/the future.

At this point I'm willing to concede that our future probably has companies' individual LLM/genAI products competing against one another, as digital politicians ["the digital pimp, hard at work... we have needs"--Matrix' Mouse]. Nobody knows how either flesh or silicon congressmen work inside; but I think the latter could act more human[e]ly...


Do you believe that for younger people this question (about derivativeness) is clearly settled? If so, how?

I would hypothesize that younger people are less inclined to feel guilty about using pirate TV services. I think they're also more invested in the future, and aware of dangerous technology's pre-eminence.

OK, but what about "LLM-generated work is clearly derivative of the training material" ?

> AI moderation is currently a "black box" that prioritizes safety over accuracy to an extreme degree.

I think there's a wide spread in how that's implemented. I would certainly not describe Grok as a tool that's prioritized safety at all.


You say that - and yet it has successfully guarded Elon from any of those pesky truths that might harm his fervently held beliefs. You just forgot to consider that Grok is a tool that prioritizes Elon's emotional safety over all other safeties.


It's bizarre how casually some people hate on Musk. Are people still not over him buying Twitter and firing all the dead weight?

_Especially_ because emotional safety is what Twitter used to be about before they unfucked the moderation.


> Are people still not over him buying Twitter and firing all the dead weight?

You think that's really the issue? Or are you not making a good faith comment yourself?

I cannot remember the last time I saw someone hating on Elon for his Twitter personnel decisions. The vast majority of the time it is the Nazi salutes he did on live TV, and secondary to that his inflammatory behavior online (e.g. calling the submarine guy a pedo).


I still pick on it, but I was never a big Twitter user, I just enjoy calling it Xitter. Picking on Elon Musk is for the shitty things he's been doing to our government and the world, and for being a bad person in general.


doesn't he keep having to lobotomize it for lurching to the left every time it gets updated with new facts?


I think it’s important to test these systems. Let some % of candidates who get this wrong through to the next stage and see what happens. Does failing this test actually correlate with being a bad fit later?

If you want to ineffectively filter out most candidates, just auto-reject everything that doesn’t arrive on a timestamp ending in 1.
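
The joke filter described above — auto-reject anything whose arrival timestamp doesn't end in 1 — is a one-liner; a toy sketch (the function name is my own) that rejects roughly 90% of candidates at random:

```python
def arbitrary_reject(submission_epoch_seconds: int) -> bool:
    """Reject unless the Unix timestamp's last digit is 1 (~90% rejected)."""
    return submission_epoch_seconds % 10 != 1
```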


> Let some % of candidates who get this wrong through

Really, the better test would be to not discriminate on it before you know it's useful, but store their answer to compare later.


You're right. I agree.


_How_ can you be a good hire for a _software engineering_ position, if you can’t get that one correct though?


It depends why they didn't get it "correct" (asked ChatGPT: bad; used a Python REPL: not so bad; used a screen reader: very not bad) and what "correct" even means for this problem.

There's a bizarro version of this guy who rejects people who do it in their head because they weren't told to not use an interpreter and he values them using the tools available to solve a problem. In his mind, the = is definitely part of the code, you should have double checked.


Oh. I was reading this on a phone, and didn’t realise there’s a hidden equals sign (though it’s mentioned).

That does change it. In that I can see how false negatives may arise. Though, when hiring you generally care a lot more about false positives than negatives.


> Let some % of candidates who get this wrong through to the next stage and see what happens.

This isn't a good methodology. To do your validation correctly, you'd want to hire some percentage of candidates who get it wrong and see what happens.

Your way, you're validating whether the test is informative as to passing rate in the next stage of your hiring process, not whether it's informative as to performance on the job.

(Related: the 'stage' model of hiring is a bad idea.)


We've got detailed global ping data here: https://wondernetwork.com/pings

One of our competitors was claiming a server in a Middle Eastern country we could not find any hosting in. So I figured out that server's hostname to do a little digging. It was <1ms away from my server in Germany.
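
The inference works because light in fiber covers only about 100 km per millisecond of round-trip time, so a sub-millisecond ping pins a server close by. A rough sketch of the bound:

```python
# Rough upper bound on server distance from round-trip time.
# Signals in fiber travel at roughly 2/3 of c (speed of light in vacuum).
SPEED_OF_LIGHT_KM_PER_MS = 299.792  # km per millisecond in vacuum
FIBER_FACTOR = 2 / 3

def max_distance_km(rtt_ms: float) -> float:
    """One-way distance can't exceed half the RTT times the signal speed."""
    return (rtt_ms / 2) * SPEED_OF_LIGHT_KM_PER_MS * FIBER_FACTOR
```

A 1 ms RTT bounds the server to within roughly 100 km — nowhere near the Middle East from Germany.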


I'm a co-founder at WonderProxy, we didn't make their list (we target people doing application testing, not consumer VPNs).

We're in 100+ countries, and I'll stand by that claim. It's a huge pain in the neck. In our early years we had a lot of problems with suppliers claiming to be in Mexico or South America who were actually just in Texas. I almost flew to Peru with a rackmount server in my luggage after weeks of problems; that plan died when we realized I'd need to figure out how to pay Peruvian income tax on the money I made in-country before I could leave.

We've also had customers pointing out that a given competitor offered a country in the Middle East that we'd had trouble sourcing. A little digging on our part and it was less than a ms away from our server in Germany.


I work for IPinfo. I have raised a ticket internally, but I think we focused on consumer VPNs for this test.

For our ProbeNet, we are attempting to reach 150 countries (by ISO 3166's definition). We are at around 530 cities. Server management is not an easy task. We do not ship hardware, but operate using dedicated servers, so this reduces one layer of complexity.

To maintain the authenticity of our server locations, we utilize cross-pings and network traffic behavior detection. If any abnormality is detected, the server will be immediately disabled to prevent polluting our data. There will be a ticket to investigate what went wrong.

We pay for every server (excluding 3 or 4 where the owner and their team really like us and insist on sponsoring). Expansion is an active effort for us, as there are 70k ASNs and about 100 more countries where we do not have a server.

We hope to partner with more ASNs, particularly residential ISPs and IXPs. So, a lot of effort is put into active outreach through WhatsApp, emails, social media and phone calls. We use a number of different data-based techniques to identify "leads".


Google, Apple, and Meta (maybe others?) have the data to build a complete GeoIP dataset. None of them will share because there are only downsides to doing so.

When FB was rolling out ipv6 in 2012, well meaning engineers proposed releasing a v6 only GeoIP db (at the time, the public dbs were shit). Not surprisingly, it was shot down.


We are always happy to work with large technology enterprises and streaming platforms, not necessarily to sell, but to share insights, data, and practical advice. We observe the entire internet through active measurements, and we are open to co-publishing research when it benefits the broader ecosystem.

Google/GCP is top of mind for me due to a recent engineering ticket. Some of our own infrastructure is hosted on GCP, and Google’s device-based IP geolocation model causes issues for internet users, particularly for IPv6 services.

From what we understand, when a large number of users from a censored country use a specific VPN provider, Google's device-based signals can bias the geolocation of entire IP ranges toward that country. This has direct consequences for accessibility to GCP-hosted services. We have seen cases where providers with German-based data centers were suddenly geolocated to a random country with strict internet censorship policies, purely due to device-based inference rather than network reality. Our focus is firmly on the geolocation of exit-node IPs, backed by network evidence.

https://community.ipinfo.io/t/getting-403-forbidden-when-acc...

We are actively looking to connect with someone at Google/GCP, Azure/Microsoft and others who would be willing to speak with us, or directly with our founder.

Our community consistently asks us to partner more deeply with enterprises because we are in constant contact with end users and network operators. To be honest, we do not even get many questions or issues. We are partners with a large CDN company, and I get about one message a month, which usually involves sharing evidence data rather than fixing something.

From a large-scale organization's perspective, IP geolocation should not be treated as an internal project. It is a service. Delivering it properly requires the full range of engineering, sales, support, and personnel available around the clock to engage with users, evaluate evidence, and continuously incorporate feedback.


> From what we understand, when a large number of users from a censored country use a specific VPN provider, Google's device-based signals can bias the geolocation of entire IP ranges toward that country.

Yep, this is a known effect.

How it seems to work is: Google uses Android phones as data harvesting probes. And when it sees that a lot of devices in a given IP range pick up on GPS data, Wi-Fi APs or cell tower IDs that are known to be located in Iran, and possibly other cues like ping to client devices or client device languages, timezones, search request contents, then the system infers "there's a network wormhole there with Iran on the other end", and the entire IP range grows legs and drifts towards Iran.

The owner of those IP addresses can mitigate the issue, mostly by shaping traffic or doing things to Google's system, but I know of no way for anyone else to do it.


They have a correction form but I am not sure if it is super robust: https://support.google.com/websearch/workflow/9308722?hl=en

I talked to someone who bought a /24 from South America to be used in the United States for office use. I asked him to tell everyone to get on WiFi and keep Google Maps running. Apparently, that solved the issue.


Do Cloudflare's floating egress IPs probe in a way where you can easily geolocate them?

https://blog.cloudflare.com/cloudflare-servers-dont-own-ips-...


If it is an anycast IP address, we have hints of all its locations. However, because we have to produce a standard IP geolocation product, we can only select one location. So, we choose the location we find in a reliable geofeed and designate the IP address as "anycast" in the API response.

Internally, we have an anycast database. I believe we can also provide all the location hints we see for each anycast IP. It is generally niche data though.
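
A minimal sketch of that selection rule — many location hints, one flat record, geofeed preferred, flagged as anycast. Field names here are illustrative, not IPinfo's actual schema:

```python
def build_record(ip: str, hints: list[dict]) -> dict:
    """Collapse many location hints into one flat geolocation record,
    preferring a geofeed-backed hint and flagging multi-location IPs."""
    geofeed = [h for h in hints if h["source"] == "geofeed"]
    chosen = geofeed[0] if geofeed else hints[0]
    return {
        "ip": ip,
        "city": chosen["city"],
        "country": chosen["country"],
        "anycast": len(hints) > 1,
    }
```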


At my previous company we had a subscription to Spur Intelligence. It is like Palantir for IP address info, and probably the closest to what you are talking about.

They recently added GeoIP to their data and in the bit of testing I was able to do before I left it was scary good. I also had an amusing chat with one of their engineers at a conference about how you can spoof IPInfo's location probes...


> how you can spoof IPInfo's location probes...

Interesting. I would love to know how this is possible. Like with Geofeed or something else?


If you're doing latency-based probing, location spoofing is presumably possible to an extent by adding artificial delays and possibly spoofing ICMP "TTL expired" packets like https://github.com/blechschmidt/fakeroute


I am not sure whether this kind of IP spoofing will impact our accuracy because we will likely identify the noise and behavioral anomaly and discard the location hint derived from traceroute.

We have tons of historical traceroute data patterns, and generic traceroute behaviors are likely modeled out internally. So, if you can spoof the traceroute to your IP address, our traceroute-based location hint scoring weight for that IP address will decrease, and we will rely on the other location hints.

You would have to be extremely deliberate to mislead us. But I would love to see this in action.
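
A toy illustration of the down-weighting described above — anomalous sources lose influence and the remaining hints decide. The weights, penalty factor, and names are my assumptions, not IPinfo's actual scoring:

```python
def pick_location(hints: dict[str, tuple[str, float]],
                  anomalous: set[str]) -> str:
    """Each source maps to (location, trust weight). Sources flagged as
    anomalous are heavily down-weighted; the best-weighted hint wins."""
    scored = {
        source: (loc, weight * (0.1 if source in anomalous else 1.0))
        for source, (loc, weight) in hints.items()
    }
    return max(scored.values(), key=lambda lw: lw[1])[0]
```

So a spoofed traceroute placing an IP in Somalia would win only until the traceroute source is flagged, after which a weaker but clean geofeed hint takes over.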


Yeah, I doubt there are more than a couple of hosts on the entire internet serving fake traceroutes anyway. Even finding hosts that don't enforce BCP38 requires quite some effort these days.


I don't think it is fair to IPinfo to give the specifics publicly, because once you have the "ah ha" moment you realize it is an entire class of difficult-to-address problems with how they use their sensor network. That knowledge only helps the bad guys.


We are actively trying to improve our system and make it, figuratively speaking, 'antifragile'. We cannot afford to get comfortable, and we need to constantly find faults in it. If you know anything, you can contact our founder or me directly.

The problem is that everyone knows we are the most accurate data provider and our growth is exponential. To my knowledge, most cybersecurity teams use our data to some degree. We cannot risk having any secrets out there that could disrupt the accuracy of the system. We are aware of several cases where accuracy may be affected, with the most notable being adversarial geofeed submissions.

If the issue is an adversarial geofeed submission, it is a well-known problem. When active measurement fails, we have to fall back to some location hint. There are layers of location hints we fall through before ultimately landing on echoing the geofeed's location hint.

But aside from that... I'm not sure what could possibly impact us. A substantial, systemic, malicious change in data accuracy seems highly unlikely, verging on impossible.
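
For context, the geofeeds in question follow the simple self-published CSV format of RFC 8805 (prefix, country, region, city, optional postal code), which is part of what makes adversarial submissions cheap. A minimal parser sketch:

```python
import csv
import io

def parse_geofeed(text: str) -> list[dict]:
    """Parse a self-published geofeed (RFC 8805): CSV lines of
    prefix,country,region,city[,postal]; lines starting with '#'
    are comments. Only the commonly used fields are kept here."""
    rows = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = next(csv.reader(io.StringIO(line)))
        fields += [""] * (5 - len(fields))  # tolerate trailing omissions
        rows.append({
            "prefix": fields[0].strip(),
            "country": fields[1].strip(),
            "region": fields[2].strip(),
            "city": fields[3].strip(),
        })
    return rows
```

For example, `parse_geofeed("# my feed\n192.0.2.0/24,DE,DE-HE,Frankfurt,\n")` yields one row geolocating that prefix to Frankfurt — and nothing in the format itself stops an operator from publishing a location they don't really serve.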


Why do we assume that only "bad guys" would want to bypass internet censorship?


Google's GeoIP is creepy good. I noticed a while ago that for fixed or technically dynamic but rarely actually changing IPs, their IP geolocation eventually converges on the exact street address, presumably due to Google crowdsourcing geolocation from devices with GPS or Wi-Fi geolocation access, which is in turn crowdsourced from devices with both GPS and Wi-Fi.


It's pretty slow to converge though, as it needs enough data points so they cross some certainty threshold. Especially in the context of VPN exit points as the traffic comes from all over the world.


Google's GeoIP is rubbish for me. Often it's hundreds of kilometres off, and varies a lot even for a fixed IP.


As always with big corporations, if the experience is OK for 90% of people but absolutely sucks for 10% of people, then that's totally fine!


I can tell you how we approach enterprise partnerships: absolute accountability. If something is wrong with the data, it is not our customers' fault for trusting us, it is our fault. End users talk to us directly. And because the data is so good these days, we just have to present evidence, that's it.

We work with multi-billion-dollar corporations, and for every product integration we maintain an active, visible presence in their user communities.

For example: https://community.cloudflare.com/search?q=ipinfo%20order%3Al...

Customer support teams are encouraged to build support pipelines that either route data-related questions directly to us or send users to us directly. We remove friction rather than hiding behind layers of enterprise support.

We make a deliberate "account manager for everyone" effort when introducing ourselves to a partner's user community. We engage with influential community members and MVP users and encourage them to contact us directly when issues arise. We also connect with the engineers who work hands-on with our data and make it clear that they have a direct line to our engineering team.

We actively and aggressively monitor social media for reports of issues related to our data within partner platforms and engage with users directly when something comes up.

To be honest, this is not difficult. Once or twice a month, we may need to present evidence to a user to explain our data decision.

This is not a paid add-on or a special clause in an enterprise contract. Our customers do not pay extra for this level of engagement.

Developers hold us in high regard. Maintaining that trust requires ongoing investment of time and resources. We fundamentally believe developers trust us because of the quality of the product and the lengths we go to provide clear, honest explanations when questions arise.


90% of end users, not 90% of your customers. If your product blocks 10% of end users because it provides wrong geolocation data to your customer, sucks to be them!


That is a great point! For us, it is 100% of end users not limited to our customers. If you are impacted by our data in any way, it is on us. We are accountable for that.

https://community.ipinfo.io/t/wrong-geolocation-based-on-ip-...

Our free database is licensed under CC BY-SA (freely distributable, but requiring attribution and share-alike) because of accountability. Whether you use our data as an enterprise or in a free open-source project, if there is any issue, you can come and talk with us.

It is not even end-users. We maintain open communication policies in general. Even if a streaming service does not use our data, if they come to us, we try our best to help them based on our industry knowledge.


How can somebody who is blocked from (looking at your homepage) Docker Hub or Microsoft know that the reason they are blocked is that you have wrong data on them? How would they know to ask you? If they ask Docker Hub or Microsoft, they'll get funnelled into the "well it works for 90% of people" funnel.

Also the reason most IP information companies don't do this is the obvious risk of false information. I am currently in Somalia via a remote connection via Germany. Actually I'm not, but if I emailed you and said I was, how would you know?


We really don't want to operate our own hardware. The situation in Peru at the time was that no one offering the bandwidth we needed could actually back up their bandwidth claims. Forget 95th percentile; bandwidth there was straight "you pay for a pipe, we give you that size pipe (but somewhat oversold)". But no one could do more than about 5 Mbit, and that was actually more like 3.


Could you use RIPE Atlas and its network of probes, at least to fill in areas where it's difficult to get your own probes?

That way everyone benefits.


We are actually a sponsor of RIPE Atlas and have a bunch of credits.

But I am not sure if we use them extensively. I think that, since we own and operate ProbeNet, much of the data collection can be done through it in a scalable manner.
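
For readers unfamiliar with it, spending RIPE Atlas credits means POSTing a measurement definition to the v2 API (`/api/v2/measurements/`). This sketch only builds the JSON payload; the shape follows my reading of the Atlas docs and should be verified against them before actually spending credits:

```python
def atlas_ping_payload(target: str, probes: int = 10) -> dict:
    """Build a one-off IPv4 ping measurement request for RIPE Atlas,
    asking for `probes` probes selected worldwide. Sending it requires
    an API key and consumes credits; that part is omitted here."""
    return {
        "definitions": [{
            "target": target,
            "description": f"one-off ping to {target}",
            "type": "ping",
            "af": 4,  # IPv4
        }],
        "probes": [{
            "requested": probes,
            "type": "area",
            "value": "WW",  # worldwide probe selection
        }],
        "is_oneoff": True,
    }
```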


I've installed a browser extension to remove them on the desktop.

There should absolutely be a better answer here.


Maybe there will be another tier of YouTube Premium in a few years that removes Shorts, and people can try to guilt you for blocking them with browser extensions, like they do for ad blocking.


"Your data belongs to you" but we can take any of your data we can find and use it for free for ever, without crediting you, notifying you, or giving you any way of having it removed.


It's owned by you, but OpenAI has a "perpetual, irrevocable, royalty-free license" to use the data as they see fit.


We can even download it illegally to train our models on it!


Wow it's almost like privately-managed security is a joke that just turns into de-facto surveillance at-scale.


Love it, I'm still often surprised by how long a hop can be. e.g. I'm looking at one from France to Singapore.

If you're looking to trace to something far away when doing a demo we've got servers in ~280 cities around the world so <random large city>.wonderproxy.com works. e.g. taipei.wonderproxy.com or santiago.wonderproxy.com, berlin, newyork, etc.


Happy you like it!


Speaking as a shareholder: It would be kinda swell if they went public though.


Reminds me of how Yahoo! worked back in the day: all their display logic in PHP, with the hard business logic in C extensions.


If you know more about it, I would be glad to hear which extensions they developed for business logic. I used Yahoo! a lot back in the day.

