Rust Malware Staged on Crates.io (phylum.io)
93 points by inferiorhuman on Aug 25, 2023 | 58 comments


Happy to see this on HN! I'm one of the co-founders @ Phylum.

We actively monitor and report on malware and software supply chain attacks across multiple ecosystems. Most notably, we were the first to identify and report on attacks carried out by North Korean state actors in NPM [1]. With our fairly recent addition of Crates.io support, we've begun monitoring and reporting on campaigns in the Rust ecosystem. In doing so, we identified what appeared to be staging of a malware campaign and were able to report it to Crates.io before it got too far along.

We're also in the process of releasing a beta `cargo` extension that will transparently query our API for information about a package before it is permitted to install. This is available in our open-source CLI [2]. Prefixing `cargo` with `phylum` will perform this check before the build occurs:

    phylum cargo build
In addition to this, we've also developed and released an open-source sandbox [3] that provides facilities for limiting access to disk, network, and environment variables. This is baked into our `npm`, `yarn`, and `pip` CLI extensions; we're working on adding it to more.

Would greatly appreciate any feedback on our Cargo extension and suggestions for improving our sandbox!

Happy to answer any questions about software supply chain attacks or security in general!

Stay tuned, we're tracking another fairly complicated supply chain attack.

1. https://blog.phylum.io/junes-sophisticated-npm-attack-attrib...

2. https://github.com/phylum-dev/cli

3. https://github.com/phylum-dev/birdcage


How does Phylum detect these attacks?


Sorry, I just saw this! We actively monitor each open source repository and, as packages are published, we pull them down and analyze each line of code and any associated metadata. We also pull as much information as we can get from VCS platforms like GitHub and GitLab. We then run some heuristics, analytics, and ML models over this data to determine whether or not something is malicious. I should stress that this process is fully automated; it's just not tenable to do this work manually at this scale. Today we process about 2-3M files each day, across 30-50k packages. It's pretty crazy how many attacks are going on every single day.


Is any of this stuff open source?


I'm one of the crates.io team members, and we're very grateful to Phylum for doing this analysis and alerting us!

As a volunteer member, I'm also very thankful to the Rust Foundation for funding and hiring Walter Pearce, Adam Harvey, and Tobias Bieniek to work on security and crates.io (in varying proportions). They've helped lower our response time to incidents like this and made proactive improvements.

Regardless of any improvements they have made or will make, there's always the possibility of malware getting through defenses. Reports are important to us, taken seriously, and handled as promptly as possible. More details here: https://www.rust-lang.org/policies/security


Your response time was one of the best we've experienced at Phylum. It's obvious you guys are putting in a ton of work over there. Please let me know if there's anything we can help out with!


I'll note that crates.io accounts are backed by GitHub accounts. Have the authors of these crates been reported to GitHub (at least to notify them in the case of hijacking, or to ban them in the case of nefarious deeds)?


Yes, we (Phylum) work closely with GitHub and reported this account to them.


What's with the `?ref=blog.phylum.io` parameters on all the links?

Using this tracking mechanism on every link is surprising enough (I doubt most of the open-source projects linked in the blog want to be tracked like that), but they're not even using standard parameters, e.g. `utm_source`.


It's some stupid blog setting. I just disabled it. Thanks for the heads up!


If we had namespaces that were SSL-signed and DNS piggy-backing this wouldn't be a problem, right?

I recall that when I published to Maven Central, they needed to verify ownership of the package namespace via reverse DNS, i.e. a com.ad13f9c.blah package effectively comes from someone controlling ad13f9c.com. It was not complete, of course, because:

1. I don't think the package signature is checked on download, only on upload

2. Original ownership of the domain plus continued ownership of the GPG key will keep you in the running forever

But if we had namespaces with a signature backed by an SSL key, and we could look the namespace up in DNS and then query for the SSL cert, then we're good, right? The Rust ecosystem is hard to use on a plane already, so surely that cannot be the constraint.


The article is about typo-squatting, there is nothing preventing someone from squatting postgress.org. Also domain registration expires, and allowing someone to take over leftpad.net because the previous volunteer maintainer didn't pay a yearly fee is an anti-feature.

I'm only talking about domain ownership here because I don't understand how this scheme could benefit from TLS certificates. Anybody can get a certificate if they control the domain anyway.


1) Typosquatting DNS adds cost (it's harder to spam the namespace), and it forces the attacker to put down a credit card.

2) Go-style import URLs prevent you from typosquatting a large part of the path. On `example.com/hrd2spll` you can't typosquat the latter part (which would be easier to attack), only the hostname.


> 1) Typosquatting DNS adds cost (it's harder to spam the namespace), and it forces the attacker to put down a credit card.

It also raises the bar of who will publish things. I have a few random projects on github that some people seem to use. I wouldn't pay for a domain name to publish those.


github.com/vladvasiliu/foo works fine, if that's what you want.


How would signing malware prevent that malware from being distributed? I feel like there’s something implicit you haven’t stated that I’m missing.


Ah you're right. I was imagining that folks would get their certs from an out-of-band flow. Welp, there's no reason to expect that. This is why I don't do security, I suppose. Whoops.


This would be very operationally complex (publishing a package means you now have to maintain a website and a TLS cert) for relatively little immediate gain (as an end user, I'm still blindly trusting `coolcrates.biz`).


[flagged]


This is a pretty low-value comment: attackers have been hosting malware on public indices since day 1; the more interesting observation here is that crates.io is now "important" enough to get their attention for typosquatting.


Rust proc macros scare me.

They can literally do anything. They can access anything the user has access to.


People like to key on proc macros specifically as scary, but this is basically true of any dependency you take. In fact, run time dependencies will often have _higher_ levels of privilege (e.g. access to production data sets). If you can't trust the dependency at build time, you sure as hell shouldn't trust it at run time.

I think the right thing happened here - the community audited for malicious crates, and action was taken to remove them. I do wish crates.io would be more aggressive about proactively removing instances of typo-squatting, though.


> In fact, run time dependencies will often have _higher_ levels of privilege (e.g. access to production data sets). If you can't trust the dependency at build time, you sure as hell shouldn't trust it at run time.

The threat models for build-time exploitation and run-time exploitation are different, but in the general case I'd rate build-time higher than run-time, because at build time you have access to the entire developer machine. That includes the ability to have run-time impacts on every project the dev has access to (including ones that are more critical than the first one to be compromised), and it allows you to impersonate the developer themselves in social engineering attacks, for instance against the company's management.


`cargo test` and `cargo run` are just as scary as proc macros and build.rs files. The only difference is that the latter two can be executed automatically by rust-analyzer when opening a project, but that should be tackled by the UI (VSCode should ask on first load: "these are the proc macros/build files that will be executed, do you want to proceed?").


Makefiles too. And any npm or pypi package. And also maven. Those things run arbitrary code that can access your ssh keys and upload malware to Github under your name (among other things). This has happened before.

Securing just proc macros doesn't really make builds secure because there is build.rs too, and other build tools besides Cargo.

Truth is, if you want to protect yourself, you need to build in a VM, AND run the binary in a VM. Regardless of the language, that's not specific to Rust.


> Truth is, if you want to protect yourself, you need to build in a VM, AND run the binary in a VM. Regardless of the language, that's not specific to Rust.

I have been in the practice of running build scripts/package installers in a container for projects that I don't explicitly trust for a while now. I wish more people saw the value in doing so.

I used to build everything in my homedir as well until about 6-7 years ago when there was a big outbreak of malware scripts on NPM. Quickly realized the folly of running so much untrusted code with the same privileges that I use to do everything short of administrating my machine.


This protects your machine but not necessarily the resulting binary code. Nothing prevents the malware from patching the source before building and removing the patch before the end, resulting in a binary and a package that contain code that doesn't match your trusted sources.

As a general rule, it must always be possible to build without network access; otherwise you have a big trust problem. And only once you know you have all your dependencies can you think about sandboxing all this stuff, after having carefully studied it.


In many (but not all) Linux package distributions, malware is filtered out by maintainers. In other package distributions, maintainers are omitted because «it's too hard»; instead, everybody protects their own system alone.

Even if your SSH key won't be stolen when the project is built, it can still happen when the malware is shipped to your client. How do you protect your client?


Please remember that by default running in a container (docker) is not secure. Unfortunately you need to invest some time and brainpower to improve security.


For what reasons specifically? Namespaces not being good enough (if the malware is more complex and tries to exploit the kernel)?


> Regardless of the language

Haskell has Safe Haskell, where there is no way (at compile-time or at runtime) to "sneakily" violate user expectations. In particular, any compile-time IO is disabled and any runtime IO must be clearly marked.


This is not true for Maven and overall the general Java ecosystem. Your own build may have arbitrary logic, true (this holds for Maven, Gradle, SBT - virtually any build tool for JVM), but the artifacts which are distributed to consumers of libraries never have anything which gets automatically executed by the build process of the consumer. build.rs and equivalents, however, are a part of the package itself and are executed when the package is “depended on”.


> but the artifacts which are distributed to consumers of libraries never have anything which gets automatically executed by the build process of the consumer

What if I build everything from source, including all libraries? (That is, I do not consume any .jar files but build them myself.) I would surely need to run the build step of every library, right?


And you must never again attach that VM/container to the network/internet if it had access to sensitive information at the time of running the untrusted code.


What about Guix ?


It looks like you can sandbox builds with guix, and this appears to be the default. But I can only find a single configuration line here https://guix.gnu.org/manual/en/html_node/Miscellaneous-Servi... (ctrl+f sandbox)

It's based off Nix sandboxing https://nixos.wiki/wiki/Nix_package_manager#Sandboxing

> When sandbox builds are enabled, Nix will setup an isolated environment for each build process. It is used to remove further hidden dependencies set by the build environment to improve reproducibility. This includes access to the network during the build outside of fetch* functions and files outside the Nix store. Depending on the operating system access to other resources are blocked as well (ex. inter process communication is isolated on Linux); see nix.conf section in the Nix manual for details.

> Sandboxing is enabled by default on Linux, and disabled by default on macOS. In pull requests for Nixpkgs people are asked to test builds with sandboxing enabled (see Tested using sandboxing in the pull request template) because in official Hydra builds sandboxing is also used.

> To configure Nix for sandboxing, set sandbox = true in /etc/nix/nix.conf; to configure NixOS for sandboxing set nix.useSandbox = true; in configuration.nix. The nix.useSandbox option is true by default since NixOS 17.09.

This appears to use namespaces etc (basically containers) rather than a VM but I think it may be secure. Their goal is to aid reproducibility but if the network isolation actually works, then at least the build will be secure.

Note that an infected source may either run malware during build, or embed malware in the compiled binary (or both). When running the binary you're not protected at all by this sandboxing, unless you use something like Qubes (which is quite heavyweight)


Guix builds are sandboxed per package (I'm pretty sure it cannot be turned off at all). The Guix build containers don't have network access.

Guix package definitions include a cryptographic hash of the source, don't auto-update, and have people review when there is an update.

The Guix package definition includes what dependent packages this package needs. These dependencies will be built first and the result made available for the Guix container of the final package build. Nothing else is available in there.


Proc macros can do anything, but this is not meaningfully different from mechanisms in most packaging ecosystems: Python runs arbitrary code (via `setup.py` or a potential in-tree build system) when you install a source distribution, Ruby evaluates the Gemspec's source code, etc.

Even Rust has plenty of avenues for arbitrary install-time code execution: `build.rs` runs as arbitrary Rust code without a sandbox, regardless of any proc macros.

(This is not to say that any of this is good! But it's worth having a full picture of the status quo.)
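To make the point concrete, here is a minimal sketch (not taken from any real package) of what a `build.rs` can do. A build script is ordinary Rust that cargo compiles and runs on the developer's machine with the developer's full privileges; nothing below uses any special build API, just std. The specific paths are illustrative assumptions:

```rust
// Sketch: a build.rs runs with full ambient authority. It sees the user's
// environment, can read any file the user can read, and can spawn processes.
use std::env;
use std::path::{Path, PathBuf};
use std::process::Command;

// Files a malicious build script could target; any user-readable file
// is fair game. (Hypothetical targets for illustration.)
fn interesting_paths(home: &Path) -> Vec<PathBuf> {
    vec![
        home.join(".ssh").join("id_ed25519"),
        home.join(".cargo").join("credentials.toml"),
    ]
}

fn main() {
    // The build script inherits the same environment as the user.
    if let Some(home) = env::var_os("HOME").map(PathBuf::from) {
        for p in interesting_paths(&home) {
            // std::fs::read(&p) would succeed for any file the user can read.
            println!("could try to read: {}", p.display());
        }
    }
    // It can also spawn arbitrary processes (curl, sh, ...).
    let _ = Command::new("true").status();
}
```

The same capabilities apply to a proc macro body, since the compiler executes it during `cargo build` or `cargo check`.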


I think Matt Godbolt said he found it impossible to fix all the ACE vulnerabilities on godbolt.org (before code execution was a feature), and that was only GCC command line flags, not even makefiles.


It would be nice to see some sort of sandboxing for these. I would love it if cargo had a mechanism to run all build actions for a crate inside a Firecracker VM.

That said, most build systems can access everything the user has access to, so it's not really a regression from the industry standard.

And Rust's typesystem on its own isn't enough to protect against malicious crates, even if you ban all unsafe code, procedural macros, build.rs, and filesystem/network APIs. Several soundness issues were found in previous versions of the standard library that allow a malicious crate to take advantage of undefined behavior to run arbitrary machine code. It seems likely to me that there are similar unknown issues lurking in today's standard library or compiler.

If you care about security, you have to audit your dependencies very carefully, no matter what language you are using.


> Several soundness issues were found in previous versions of the standard library that allow a malicious crate to take advantage of undefined behavior to run arbitrary machine code. It seems likely to me that there are similar unknown issues lurking in today's standard library or compiler.

Even without any soundness issues or bugs or security flaws, people can just plain write malicious code. Rust prevents a lot of common _mistakes_, but people can also use it to write completely "safe" worms that are guaranteed not to have use-after-free errors, data races and unexpected panics.

I don't think there's any way to get around the social aspects of software security for most developers ("trusting _people_ not code")
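A minimal sketch of that point: `unsafe` is about memory safety, not trust. Even with unsafe code forbidden crate-wide, the std filesystem, network, and process APIs remain fully available to "safe" code (the file path here is just an illustrative assumption):

```rust
#![forbid(unsafe_code)]
// Sketch: a perfectly memory-safe function that reads any file the user can
// read. Pairing it with std::net::TcpStream would make a 100% "safe"
// exfiltrator; the compiler has no notion of which IO is trustworthy.

use std::fs;
use std::io;

fn read_user_file(path: &str) -> io::Result<String> {
    fs::read_to_string(path)
}

fn main() {
    match read_user_file("/etc/hostname") {
        Ok(contents) => println!("read {} bytes", contents.len()),
        Err(e) => println!("could not read: {e}"),
    }
}
```

Safety guarantees constrain *how* code fails, not *what* it is allowed to touch, which is why auditing and trust in the author still matter.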


And yet hardly anyone ever makes this argument against sandboxing the code we download on demand every single day while browsing the internet.

I have started to believe we should think of open-source software as not so different, and sometimes even as less trustworthy in certain ways since we typically don’t sandbox it.

How many of your tools have telemetry? How many of your drivers do? How many download and execute stuff on the fly?


That's completely different, though. JavaScript is sandboxed (and a lot of effort goes into ensuring that), while build-time macros (and runtime code) runs on the machine in an unrestricted manner. Not sure what your point is.


> Even without any soundness issues or bugs or security flaws, people can just plain write malicious code. Rust prevents a lot of common _mistakes_, but people can also use it to write completely "safe" worms that are guaranteed not to have use-after-free errors, data races and unexpected panics.

The idea here is that if the "evil crate" is prevented by the compiler from calling out into any OS functionality or using unsafe, the worst it can do is return bogus values, mutate anything it has a &mut reference to, and leak heap memory. And if the toolchain were perfect, that might be achievable.

This was the promise that Java Applets made, which wasn't achievable due to the JVM being too bug-prone.


We're actively working on this with our sandbox (https://github.com/phylum-dev/birdcage). We've wrapped the likes of pip, yarn, and npm already and are making moves to similarly provide support for cargo.

Currently comes as part of the Phylum CLI (https://github.com/phylum-dev/cli), so that doing something like:

    phylum npm install <somePkg>
Will reach out to the Phylum API to ask what we know about it (e.g., does the source have characteristics congruent with malware?). If that passes, it'll then install the package from within the confines of the sandbox, with limited disk, network, and env access (as defined by allowed resources in the TOML file).


> If you care about security, you have to audit your dependencies very carefully

The problem is Cargo is yet another iteration of the npm-ization (maven-ization?) of code. It requires a BDFL or a team of such BDFLs to proactively police the repos in order to prevent trivial supply chain attacks. Your statement misses the point: not only do people NOT do this, it's now trivial to avoid with Cargo.

This is not the same security risk that is present if you use git submodules and build with a makefile. The key difference is ease of use. Cargo, like NPM and to some extent Maven, is so easy to use that you've accidentally created a massive attack surface.

The result will likely be yet another incarnation of JFrog or the like. Code will be audited, built into dylibs, and then pinned via some enterprise supply chain manager. This is a big lift for companies who might otherwise switch to Rust for new projects quicker.

Cargo is my main gripe with Rust almost all the time. So many people say "you can just build manually with rustc!" but this is not true. It's not as simple. Again, it would also be one thing if Cargo was just the package manager but it's also the build management system, test framework, etc. It's the ecosystem. It not only introduces supply chain attacks. It also introduces other attacks directly from the owners such as vengeful removal of packages, changes in CoC that ban certain libraries, etc. I don't like it. Of all package managers, it makes me feel the worst.


The issue is that if your language doesn't NPM-ize its packages, developers write you off as not modern enough.

The simple fact is that this seems to be what most developers want, AND this is not limited to JavaScript developers.


The ease of use outweighs the security concerns for most reasonable people.


> I would love it if cargo had a mechanism to run all build actions for a crate inside a firecracker vm

Why not go in the other direction? A tool that can set up a VM with source code and build tools based on a declarative file? That could work for any language - not just Rust. Sort of like a completely local CI system.


You're describing Vagrant.


Ah! You're right. I wonder if vagrant is enough to mitigate these concerns.


Who will verify the VM, your local CI, and the produced binary?


The idea of the microVM is to isolate the build system from the host. Your question is beyond the scope of the original problem.


Why would macros need sandboxing if the crate build doesn't have any? What is your threat model?

Without proc macros, running `cargo build` or `cargo check` can already do anything via `build.rs` scripts. Without that, you still run arbitrary code from crates.io if you do `cargo run` or `cargo test`.


If you're not sandboxing, you're asking to get pwned.


Java's annotation processor can do anything. Gradle plugins can do anything. Maven plugins can do anything. Webpack scripts can do anything. C/C++ macros can do anything.

Turing-complete things can do Turing-complete things.


Build scripts are a lot scarier than proc macros to me. Those get run if you simply add the package as a dependency and build the project, even if you haven't used them in your library yet at all. And nearly every ecosystem has build scripts that can run entirely arbitrary code.


You should consider doing your development in an isolated VM using something like Qubes. It has a lot of interesting tooling that makes provisioning and communicating between VMs easier without sacrificing isolation.


So can makefiles and Python scripts in your build.



