Thanks. I'll link it in the first line of the README. I think the interlocking-free part can pack cups like you suggest. They propose a flood-fill algorithm that computes all the reachable placements for the voxelized shape, without making any assumptions about convexity. I think it would be a great example to try it out on, though.
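For anyone curious, here's a minimal sketch of the flood-fill idea (my own illustration, not the repo's actual code): the paper flood-fills the feasible placements of the voxelized shape itself, but the same mechanics show up when marking which empty voxels of a container are reachable from its open top face.

```python
from collections import deque

import numpy as np


def reachable_empty_voxels(occupied: np.ndarray) -> np.ndarray:
    """occupied: 3D bool array, True where a voxel is filled.

    Returns a bool array marking the empty voxels reachable from the
    open top face (z = last index) via 6-connected moves.
    """
    nx, ny, nz = occupied.shape
    reachable = np.zeros_like(occupied, dtype=bool)
    queue = deque()
    # Seed the search with every empty voxel on the open top face.
    for x in range(nx):
        for y in range(ny):
            if not occupied[x, y, nz - 1]:
                reachable[x, y, nz - 1] = True
                queue.append((x, y, nz - 1))
    moves = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    while queue:
        x, y, z = queue.popleft()
        for dx, dy, dz in moves:
            u, v, w = x + dx, y + dy, z + dz
            if (0 <= u < nx and 0 <= v < ny and 0 <= w < nz
                    and not occupied[u, v, w] and not reachable[u, v, w]):
                reachable[u, v, w] = True
                queue.append((u, v, w))
    return reachable
```

A cup's interior is only marked reachable through its opening, which is the sort of thing that lets the algorithm rule out interlocking placements.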
A while back, for a course project, I implemented a paper that had shown up on HN (Dense, Interlocking-Free and Scalable Spectral Packing of Generic 3D Objects).
Over the holidays, I cleaned up the implementation (with the help of Claude Code, although this is not an advertisement for it) and released it on GitHub.
If anyone needs fast 3D packing in Python, do give this a shot. Hopefully I have properly attributed all the code/ideas I have used from elsewhere (if not, please feel free to let me know).
The problem sounds very interesting and complex to solve. Could you give examples of use cases where dense 3D packing is needed? (Say, besides literally packing physical objects in a box?)
> Could you give examples of use cases where dense 3D packing is needed? (Say, besides literally packing physical objects in a box?)
Not an answer, but something interesting on this topic:
In a warehouse/distribution center, a dense packing result can be too time-consuming for most consumer products. As density increases, it takes a human longer to find a workable arrangement on their own. You can provide instructions, but that is even slower than the human just doing their best via trial and error.
We had to dial back our settings from about 95% volume utilization (the initial naive setting) down to about 80% before they could rapidly fill the cartons. Basically it's balancing cost of labor vs. capacity of the system during peak (the conveyor would start backing up) vs. shipping costs.
Bin packing can be seen as an optimization problem. In 2D, consider a scenario where you need to cut shapes from a sheet of plywood or sheet metal while minimizing waste; finding the optimal orientation of these shapes reduces material loss. In 3D, you might imagine packing objects into a container or cargo space, or sculpting a collection of 3D shapes out of a known volume of material, where you'd optimize the arrangement and orientation to minimize waste.
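To make the optimization flavor concrete, here's a minimal sketch in 1D (item sizes and a bin capacity instead of shapes and a sheet), using first-fit decreasing, a classic textbook heuristic rather than the spectral method from the linked repo:

```python
def first_fit_decreasing(sizes, capacity):
    """Pack 1D items into as few bins of the given capacity as we can."""
    free = []     # remaining capacity of each open bin
    packing = []  # contents of each bin
    for size in sorted(sizes, reverse=True):  # largest items first
        for i, remaining in enumerate(free):
            if size <= remaining:  # place in the first bin it fits
                free[i] -= size
                packing[i].append(size)
                break
        else:  # no open bin fits: start a new one
            free.append(capacity - size)
            packing.append([size])
    return packing


print(first_fit_decreasing([7, 5, 4, 3, 2, 2, 1], capacity=10))
# [[7, 3], [5, 4, 1], [2, 2]] -- 3 bins, matching the ceil(24/10) lower bound
```

The 2D/3D versions replace "does it fit in this bin" with collision and orientation checks, which is where the hard (and interesting) part lives.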
This is also true in the US. People pick up enough written English to get by most of the time, but it's often quite broken and clearly a second language. I know hearing impaired people native to the US with substantially worse English than the average European.
Just so you know, "hearing-impaired" implies that a person has a flaw, whether they were born with it (so it is natural to them) or acquired it later in life ("hearing-challenged").
The least offensive way to refer to a group of people without perfect hearing is "hard of hearing or deaf".
That is correct. We want to translate between English and ISL. English, because it is by and large the language of the Web, and I think we should try to connect ISL to it rather than to Indian languages.
From my understanding, they are quite dissimilar. A person who knows ISL will not understand ASL, for example.
Thanks for the feedback. You raise great points, and this is exactly why we wrote this post: so that we can hear from people where the actual problems lie.
On a related note, this sort of explains why our model is struggling to fit the 500 hours of our current dataset (even on the training set). Even so, the current state of automatic translation for Indian Sign Language is that, in the wild, even individual words cannot be detected very well. We hope that what we are building might at least improve the state of the art there.
> It's more of a bad and broken transliteration that if you struggle to think about you can parse out and understand.
Can you elaborate a bit more on this? Do you think if we make a system for bad/broken transliteration and funnel it through ChatGPT, it might give meaningful results? That is, ChatGPT might be able to correct for errors, as it is a strong language model.
> Do you think if we make a system for bad/broken transliteration and funnel it through ChatGPT, it might give meaningful results?
No, because ChatGPT's training data gives it practically no way of knowing what a real sign language looks like: there's no real written form of any sign language, and ChatGPT learned its languages from writing.
Sincerely: I think it's awesome that you're taking something like this on, and even better that you're open to learning about it and correcting flawed assumptions. Others have already noted some holes in your understanding of sign, so I'll also just note that I think a solid brush up on the fundamentals of what language models are and aren't is called for—they're not linguistic fairy dust you can sprinkle on a language problem to make it fly. They're statistical machines that can predict likely results based on their training corpus, which corpus is more or less all the text on the internet.
I'm afraid I'm not in a good position to recommend beginner resources (I learned this stuff in university back before it really took off), but I've heard good things about Andrej Karpathy's YouTube channel.
I think you think it's a magic box. There's actually no such thing as a "strong language model", not in the way you're using the concept.
> We hope that what we are building might at least improve the state of the art there.
Do you have any theoretical arguments for how and why it would improve it? If not, my concern is that you're just sucking the air out of the room. (Research into "throw a large language model at the problem" doesn't tend to produce any insight that could be used by other approaches, and doesn't tend to work, but it does funnel a lot of grant funding into cloud providers' pockets.)
You're mixing up cause and effect. The transformer architecture was invented for machine translation – and it's pretty good at it! (Very far from human-level, but still mostly comprehensible, and a significant improvement over the state of the art at the time of first publication.) But we shouldn't treat this as anything more than "special-purpose ML architecture achieves decent results".
The GPT architecture, using transformers to do iterated predictive text, is a modern version of the Markov bot. It's truly awful at translation, when "prompted" to do so. (Perhaps surprisingly so, until you step back, look at the training data, and look at the information flow: the conditional probability of the next token isn't mostly coming from the source text.)
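For anyone who hasn't met the older kind, here's a toy Markov bot to make the analogy concrete. This is of course not GPT's internals; it only shares the sample-append-repeat loop, with a bigram lookup table standing in for a learned conditional distribution over a long context:

```python
import random
from collections import defaultdict


def train_bigram(tokens):
    """Record, for each word, the words that followed it in the corpus."""
    successors = defaultdict(list)
    for a, b in zip(tokens, tokens[1:]):
        successors[a].append(b)
    return successors


def generate(successors, start, length=10):
    """Iterated predictive text: sample a successor, append, repeat."""
    out = [start]
    for _ in range(length):
        options = successors.get(out[-1])
        if not options:
            break
        out.append(random.choice(options))
    return " ".join(out)


model = train_bigram("the cat sat on the mat and the cat slept".split())
print(generate(model, "the"))
```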
I haven't read that paper yet, but it looks interesting. From the abstract, it looks like one of those perfectly-valid papers that laypeople think is making a stronger claim than it is. This paragraph supports that:
> Note that these models are not intended to accurately capture natural language. Rather, they illustrate how our theory can be used to study the effect of language similarity and complexity on data requirements for UMT.
It’s true that the Transformer architecture was developed for seq2seq MT, but you can get similar performance with Mamba or RWKV or other new non-Transformer architectures. It seems that what is important is having a strong general sequence-learning architecture plus tons of data.
> The GPT architecture, using transformers to do iterated predictive text, is a modern version of the Markov bot.
The Markov nature only matters if the text falls outside the context window.
> Perhaps surprisingly so, until you step back, look at the training data, and look at the information flow: the conditional probability of the next token isn't mostly coming from the source text.
I’m not sure what you’re getting at here. If it’s that you can predict the next token in many cases without looking at the source language, then that’s also true for traditional encoder-decoder architectures, so it’s not a problem unique to prompting. Or are you getting at problems arising from teacher-forcing?
Basically the question was how an LM could possibly help translation, and the answer is that it gives you a strong prior for the decoder. That’s also the basic idea in the theoretical UMT paper: you are trying to find a function from source to target language that produces a sensible distribution as defined by an LM.
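To sketch what "LM as a prior" can look like in practice, here's a hypothetical noisy-channel-style rescorer (not any particular library's API; the scorer callables and lm_weight are stand-ins):

```python
def rescore(candidates, translation_score, lm_score, lm_weight=1.0):
    """Rank candidate translations best-first.

    translation_score(y) ~ log p(y | source), the (possibly weak) channel;
    lm_score(y) ~ log p(y), fluency under the language model prior.
    """
    def combined(y):
        return translation_score(y) + lm_weight * lm_score(y)
    return sorted(candidates, key=combined, reverse=True)
```

Even with a weak channel model, a strong lm_score pushes the output toward fluent target-language text, which is exactly the "prior for the decoder" role.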
Hello everyone, we are trying to make a large dataset for Sign Language translation, inspired by BSL-1K [1]. As part of cleaning our collected videos, we use a nice technique for aggregating heuristic labels [2]. We thought it was interesting enough to share with people on here.
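For readers who don't want to chase [2]: the general shape of heuristic-label aggregation (the actual technique in [2] is more sophisticated) is to let several cheap labeling functions vote on each clip and keep only confident majorities. A rough sketch:

```python
from collections import Counter

ABSTAIN = None  # a heuristic with no opinion on a clip emits this


def aggregate(votes_per_clip, min_agreement=0.6):
    """votes_per_clip: one list of labels (or ABSTAIN) per video clip.

    Returns the majority label per clip, or ABSTAIN when fewer than
    min_agreement of the non-abstaining heuristics agree.
    """
    results = []
    for votes in votes_per_clip:
        votes = [v for v in votes if v is not ABSTAIN]
        if not votes:
            results.append(ABSTAIN)
            continue
        label, count = Counter(votes).most_common(1)[0]
        results.append(label if count / len(votes) >= min_agreement else ABSTAIN)
    return results
```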
I've been checking about twice a week for the last 6 months, and they are very rare, but it does happen. I caught one on video 2 weeks ago! https://youtu.be/NkNx6tx3nu0?t=744