Ah yes. This has to do with the thresholding 'bug' I discovered some time ago, and I will update the algorithm soon. It happens with color images in the preprocessing stage, during the conversion of the query image to a binary image, especially in images with flat/palette colours.
I eventually came up with a contrived set of heuristics to tackle this problem, as you can see in the example below, and with the right set of weights managed to get accurate thresholding more than 90% of the time for pathological cases like these. --- https://imgur.com/a/XMhdnjH
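The weighted heuristics themselves aren't described here. For reference, a common global baseline for the color-to-binary step is Otsu's method, which picks the grey level that maximises the between-class variance of the histogram. Below is a minimal pure-Python sketch, assuming the image arrives as rows of (r, g, b) tuples; the function names are hypothetical and this is not the algorithm the site uses.

```python
def to_gray(rgb_rows):
    """Convert rows of (r, g, b) tuples to 0-255 luma using ITU-R BT.601 weights."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row] for row in rgb_rows]


def otsu_threshold(gray_rows):
    """Return the threshold t maximising between-class variance (pixels > t are foreground)."""
    hist = [0] * 256
    for row in gray_rows:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_bg = 0    # number of pixels with value <= t
    sum_bg = 0  # sum of pixel values <= t
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        w_fg = total - w_bg
        if w_bg == 0:
            continue  # no background class yet
        if w_fg == 0:
            break  # every pixel is already in the background class
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(gray_rows, t):
    """Map pixels brighter than t to 1 and everything else to 0."""
    return [[1 if v > t else 0 for v in row] for row in gray_rows]
```

With flat/palette colours, two clearly different palette entries can land on nearly the same grey value after the luma conversion, which is one plausible way a single global threshold ends up merging shapes that are distinct in colour.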
I spent some time confused by the animated running horse and how it was supposed to be related to the other icons, until I found out that it was the loading GIF.
I believe it would be both adorable and easier to understand if it were a smaller image, giving a visual hint that it fulfills the role of an icon, not a featured image.
It was the other way around for me. I linked it to the DuckDuckGo icon SVG and the horse started. I thought it was a loading animation (and it took more than a few seconds), so I threw it onto another monitor and continued reading HN.
...30 minutes later the horse is still running and I'm like 'wtf? what does a horse have to do with the DDG logo?' close tab. read comments...
It turns out the app doesn't handle SVG (it's actually on the to-do list) and returned a 500, but the failure was never shown to the user.
Ouch. Sorry to have put you through that :/ I will fix it ASAP. And I agree with the parent too. Maybe some text below the horse saying "Fetching results..." would make things clearer.
I scraped it from their website and then asked for their permission by sharing this link with them. They appreciated that I linked all the icons to their website and gave their consent to make this public.
The MPEG-7 dataset[0] is what most researchers use to benchmark shape similarity algorithms. There are a couple of other datasets I used that I can't recall now. These datasets are relatively simple, with a single shape per image, as opposed to logos and icons, which comprise multiple elements in different configurations.
I would test on the MPEG-7 dataset to begin with and, once the precision and recall values were good enough, go ahead with testing on logos and icons. I must've manually tested the algorithm more than 100,000 times, because that was the only way to do it with untagged datasets. Quite tedious indeed. I'd say this version gives pretty decent results about 7 to 8 times out of 10.
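The comment doesn't say exactly how precision and recall were measured. A minimal sketch for a single query, assuming every database image carries a class label (MPEG-7 groups its shapes into classes); the function names are illustrative only:

```python
def precision_recall_at_k(ranked_labels, query_label, k, relevant_total):
    """Precision and recall over the top-k results of one query.

    ranked_labels:  class labels of the retrieved images, best match first
    relevant_total: how many database images share the query's class
    """
    hits = sum(1 for label in ranked_labels[:k] if label == query_label)
    return hits / k, hits / relevant_total


def bullseye(ranked_labels, query_label, class_size=20):
    """Hits among the top 2 * class_size results, divided by class_size."""
    _, recall = precision_recall_at_k(ranked_labels, query_label, 2 * class_size, class_size)
    return recall
```

Averaging these over every image used as a query gives dataset-level numbers. The "bullseye" score commonly reported for MPEG-7 (20 shapes per class, hits counted in the top 40) has the form of the second helper.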
Traditional algorithms and image vectors. I used a concoction of existing region- and contour-based techniques and threw in some original ideas as well.
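The actual descriptors aren't spelled out, so here is only a generic illustration of what "region-based image vector" can mean. It is a minimal sketch, assuming a binary image given as rows of 0/1: each cell of a coarse grid records the fraction of foreground pixels it contains, and two shapes are compared by cosine similarity. The grid size and helper names are hypothetical.

```python
import math


def grid_descriptor(binary_rows, grid=4):
    """Fraction of foreground pixels in each cell of a grid x grid partition, row by row."""
    h, w = len(binary_rows), len(binary_rows[0])
    vec = []
    for gy in range(grid):
        y0, y1 = gy * h // grid, (gy + 1) * h // grid
        for gx in range(grid):
            x0, x1 = gx * w // grid, (gx + 1) * w // grid
            cells = [binary_rows[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            vec.append(sum(cells) / len(cells) if cells else 0.0)
    return vec


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0
```

A descriptor like this only captures where mass sits in the image; contour-based features (curvature, outline sampling) are usually combined with it to tell apart shapes that fill the same cells differently.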
I suspect it has to do with the lower resolution. I'm using nearest-neighbor interpolation for resizing images and have noticed similar behaviour before. It would be great if you could try higher-resolution versions (preferably > 200px) of the same images and let me know the results.
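To see why low-resolution inputs can hurt, here is a minimal nearest-neighbor resize in pure Python. It assumes the image is rows of pixel values and maps each output pixel's centre back to the closest source pixel; it is not claimed to match the exact pixel convention of any particular library.

```python
def resize_nearest(rows, new_h, new_w):
    """Nearest-neighbor resize: each output pixel copies the closest source pixel."""
    h, w = len(rows), len(rows[0])
    out = []
    for y in range(new_h):
        # Map the centre of output row y back into source coordinates.
        src_y = min(h - 1, int((y + 0.5) * h / new_h))
        out.append([rows[src_y][min(w - 1, int((x + 0.5) * w / new_w))] for x in range(new_w)])
    return out
```

Upscaling a small icon just duplicates pixels into blocks, so a diagonal edge becomes a staircase and the extracted contour changes shape; downscaling drops whole rows and columns, so thin strokes can vanish. Starting from a larger source image reduces both effects.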
A closer inspection actually shows that some of the results aren't that bad a match. Results 1, 4, 5 and 7, and 7 in particular, vaguely have the same outline as the query image. Still, if I had to score this result, I wouldn't give it more than 3 out of 10.
I just realised the "download" icons aren't meant to be "similar icons"... they allow you to download the one above. Doh!
I've retried the "remove user" one but uploaded an SVG instead of a PNG (so technically the resolution is unlimited). I uploaded it both with and without the circle.
:) Please feel free to share the SVGs. I will convert them to PNGs and test them out. I will add SVG support real soon. Right now I've put in an exception handler that passes an empty array as the query if it's given an image format it can't decode :|
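An alternative to silently passing an empty query (which is how the 500 earlier in the thread went unnoticed) is to detect the format up front and return a message the UI can display. A minimal sketch, assuming uploads arrive as raw bytes; `decode` is a hypothetical stand-in for the real raster decoder, and the PNG/JPEG/GIF signatures are the standard file magic numbers.

```python
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"
JPEG_MAGIC = b"\xff\xd8\xff"
GIF_MAGICS = (b"GIF87a", b"GIF89a")


def sniff_format(data: bytes) -> str:
    """Guess the upload's format from its leading bytes."""
    if data.startswith(PNG_MAGIC):
        return "png"
    if data.startswith(JPEG_MAGIC):
        return "jpeg"
    if data.startswith(GIF_MAGICS):
        return "gif"
    # SVG is XML text, so look for an <svg tag near the start instead of a magic number.
    if b"<svg" in data[:1024].lower():
        return "svg"
    return "unknown"


def load_query(data: bytes, decode):
    """Return (image, None) on success or (None, message) to show the user.

    `decode` is a hypothetical callable standing in for the app's image decoder.
    """
    fmt = sniff_format(data)
    if fmt == "svg":
        return None, "SVG uploads aren't supported yet; please try a PNG or JPEG."
    try:
        return decode(data), None
    except Exception as exc:  # decoders raise different error types, so catch broadly
        return None, f"Couldn't decode the uploaded image (format: {fmt}): {exc}"
```

The caller can then render the message under the loading animation instead of leaving it running.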
Sure - is your email yantrams@linkdot.link? If you don't want to post your email publicly, can you email amy@amyboyd.co.uk and I'll reply with both SVGs + PNGs.
I actually uploaded SVGs so I think you might already (unintentionally) support SVGs?
Thanks a ton for testing it quite exhaustively! Would really appreciate it if you can share the second image.
I spent a lot of time working on a hack for 'normalizing' white-on-black and black-on-white backgrounds, and also for dynamically choosing between adaptive and Gaussian thresholds during preprocessing.
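The normalizing hack itself isn't described. One simple way to get the same idea is sketched below, under the assumption that the background is whatever touches the image border: after binarization, if most border pixels are 1, invert the mask so the background is always 0. The helper names are hypothetical.

```python
def border_pixels(rows):
    """All pixels on the outer edge of the image (corners counted once)."""
    h, w = len(rows), len(rows[0])
    edge = rows[0] + (rows[-1] if h > 1 else [])
    edge += [rows[y][0] for y in range(1, h - 1)]
    if w > 1:
        edge += [rows[y][w - 1] for y in range(1, h - 1)]
    return edge


def normalize_polarity(binary_rows):
    """Return a copy with background 0 and foreground 1, assuming the background touches the border."""
    border = border_pixels(binary_rows)
    if sum(border) * 2 > len(border):  # mostly 1s on the border, so the background is 1
        return [[1 - v for v in row] for row in binary_rows]
    return [row[:] for row in binary_rows]
```

With this, the white-on-black and black-on-white versions of the same logo produce the same mask, so they also produce the same descriptor. It breaks down for logos whose foreground runs off the edge, which is one reason a single rule like this rarely covers every case.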
Here is an example with white-on-black and black-on-white variations of the Nike logo that works as intended.
If you are referring to entering those words in the search box, yes, I should've put in some warnings/checks there to make sure a valid image URL is entered. Will fix it soon. And yes, I should make the site secure too. Thanks for letting me know.
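A minimal sketch of the kind of check that could sit in front of the search box, using only the standard library's `urllib.parse.urlparse`. The function name and messages are hypothetical; it only rejects input that cannot be an http(s) URL.

```python
from urllib.parse import urlparse


def validate_image_url(text: str):
    """Return a user-facing error string for input that can't be an image URL, else None."""
    parsed = urlparse(text.strip())
    if parsed.scheme not in ("http", "https"):
        return "Please enter an image URL starting with http:// or https://"
    if not parsed.netloc:
        return "That URL is missing a host name."
    return None  # passes the cheap checks; fetching it can still fail
```

Anything that passes still needs checking at fetch time (HTTP status, and a Content-Type starting with `image/`), since a well-formed URL can just as easily point at an HTML page.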
PS: You can explore company logos here http://compute.vision/brands/index.html . It's implemented using an older iteration of the algorithm and performance isn't that great compared to the one used with the icons database.
Glad you liked it, and wow, are you sure you're not confusing me with someone else? :) I suspect you may be mistaking me for Anil Battula from http://sovietbooksintelugu.blogspot.com/.
Speaking of Telugu, I recently got hold of a treasure trove (about 700GB) of scanned Telugu magazines and newspapers, some of them as old as 1880! Gonna upload them to archive.org very soon.