That's awesome. If I may ask, the data you operate on are still image-like grids...

catillac · on Feb 15, 2022

This is image-based, not text-based. It’s very useful for a number of applications!

I think your use case is extremely promising, assuming it results in better-quality output than just running a modern object detector. Another use case I don’t have bandwidth for, but which would likely be very marketable, is similar to what you’re saying: allowing traditional algos like SIFT or SURF to be used across modalities.