The author's approach is really good, and he hits on pretty much all the problems that arise from more naive approaches. In particular, working in a perceptual colorspace, and recognising that the most representative colour may not be the one that appears most often.
However, image processing makes my neck tingle because there are a lot of footguns. PNG bombs, anyone? I feel like any library needs to either be defensively programmed or explicit in its documentation.
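One cheap defensive step is to reject absurd declared dimensions before ever handing bytes to a decoder. A minimal sketch for PNG, using only the stdlib (these helpers are illustrative, not part of okmain; real code would also want per-format handling on top of whatever limits your image library offers, e.g. Pillow's MAX_IMAGE_PIXELS):

```python
import struct

PNG_SIG = b"\x89PNG\r\n\x1a\n"

def png_declared_size(head: bytes) -> tuple[int, int]:
    """Read width/height from a PNG's IHDR chunk without decoding it.

    IHDR is required to be the first chunk, so the dimensions sit at
    fixed offsets: 8-byte signature, 4-byte chunk length, 4-byte 'IHDR'
    tag, then two big-endian 32-bit integers.
    """
    if head[:8] != PNG_SIG or head[12:16] != b"IHDR":
        raise ValueError("not a PNG")
    return struct.unpack(">II", head[16:24])

def looks_like_bomb(head: bytes, max_pixels: int = 50_000_000) -> bool:
    """Flag files whose declared pixel count exceeds a sanity limit."""
    w, h = png_declared_size(head)
    return w * h > max_pixels
```

This only inspects the first 24 bytes, so it costs nothing even on huge uploads; a lying header still gets caught later by the decoder's own limits.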
The README says "Finding main colors of a reasonably sized image takes about 100ms" -- that's way too slow. I bet the operation takes a few hundred MB of RAM too.
For anyone who uses this: scale down your images substantially first, or only sample every N pixels. Avoid loading the whole thing into memory if possible, unless this is handled serially by a job queue of some sort.
You can run this kind of algorithm much faster, and with far less RAM, on a small thumbnail than on a large input image. That makes the performance concerns less of an issue, and prevents a whole class of OOM DoS vulnerabilities!
As a defensive step, I'd add something like this https://github.com/iamcalledrob/saferimg/blob/master/asset/p... to your test suite and see what happens.
> Okmain downsamples the image by a power of two until the total number of pixels is below 250,000.
That being said, this is sampling the fixed-size input buffer for the purposes of determining the right colour. You still have to load the bitmap into memory, with all the associated footguns that arise there. The library just isn't making it worse :) I suppose you could memmap it.
Makes me wonder if the sub-sampling is actually a bit of a red herring, as ideally you'd want to be operating on a small input buffer anyway. Or some sort of interface on top of the raw pixel data, so you can load what's needed on-demand.
I think if you were going to "downsample" for the purpose of creating a color set, you could just scan through the picture, randomly select 10% (or whatever) of the pixels, and apply k-means to that, skipping the averaging, which costs resources and makes your colors muddy.
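A toy sketch of that idea in plain Python (hypothetical helpers, not okmain's API; note that k-means itself still averages within a cluster, the saving is in skipping interpolated resizing of the whole image):

```python
import random

def sample_pixels(pixels, fraction=0.1, seed=0):
    """Randomly keep roughly `fraction` of the pixels, no averaging."""
    rng = random.Random(seed)
    return [p for p in pixels if rng.random() < fraction]

def kmeans(points, k=4, iters=20, seed=0):
    """Tiny Lloyd's algorithm over RGB tuples; returns k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        buckets = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            buckets[i].append(p)
        # Recompute centroids; keep the old one if a bucket went empty.
        centroids = [
            tuple(sum(ch) / len(b) for ch in zip(*b)) if b else centroids[i]
            for i, b in enumerate(buckets)
        ]
    return centroids
```

On a 10% sample of a thumbnail this is a few thousand points, so even a naive pure-Python loop like this finishes quickly; a real implementation would vectorise it.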
I should probably add this nuance to the post itself.
Edit: added a footnote
[0]: https://dgroshev.com/blog/okmain/img/distance_mask.png?hash=...
EDIT: then, when the URL was refreshed, it triggered a redirect loop culminating in a different error ("problem occurred repeatedly")...
ah, ofc, your intent was to demonstrate a problematic asset.
Got any to share? A self-contained command-line tool to get a good palette from an image is something I’d have a use for.
https://www.fmwconcepts.com/imagemagick/dominantcolor/index....
As for loading into memory at once: I suppose I could integrate with something like libvips and stream strips out of the decoded image without holding the entire bitmap, but that'd require substantially more glue and complexity. The current approach works fine for extracting dominant colours once to save in a database.
You're right that pre-resizing the images makes everything faster, but keep in mind that k-means still requires a pretty nontrivial amount of computation.
This is a render from Second Life, in which all the texture images were shrunk down to one pixel, the lowest possible level of detail, producing a monocolor image. For distant objects, or for objects where the texture is still coming in from the net, there needs to be some default color. The existing system used grey for everything. I tried using an average of all the pixels, and, as the original poster points out, the result looks murky.[1] This new approach has real promise for big-world rendering.
[1] https://media.invisioncic.com/Mseclife/monthly_2023_05/monoc...
The SpotifyPlus HA integration [2] was near at hand and does a reasonably good job clustering with a version of ColorThief [3] under the hood. It has the same two problems you started with though: muddying when there's lots of gradation, even within a cluster; and no semantic understanding when the cover has something resembling a frame. A bit swapped from okmain's goal, but I can invert with the best of them and will give it a shot next time I fiddle. Thanks for posting!
[1] https://gist.github.com/kristjan/b305b83b0eb4455ee8455be108a... [2] https://github.com/thlucas1/homeassistantcomponent_spotifypl... [3] https://github.com/thlucas1/SpotifyWebApiPython/blob/master/...
[1] https://engineering.fb.com/2015/08/06/android/the-technology...
So, making a library that provides an alternative is a great service to the world, haha.
An additional feature that might be nice: the most prominent colors seem like they could be a bad pick in some cases, if you want the important part of the image to stand out. Maybe a color that is close (in the color space) to the edges of the image, but far away (in the color space) from its center, could be interesting?
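One way to sketch that heuristic (toy code with made-up names; a real version would measure distance in a perceptual space like Oklab rather than raw RGB, as the article argues):

```python
def edge_center_pick(grid, candidates):
    """Pick the candidate colour nearest the border's mean colour and
    farthest from the centre crop's mean colour.

    grid is a 2-D list of (r, g, b) tuples; candidates is a list of
    palette colours to choose among.
    """
    h, w = len(grid), len(grid[0])
    border = [grid[y][x] for y in range(h) for x in range(w)
              if y in (0, h - 1) or x in (0, w - 1)]
    cy0, cy1 = h // 4, h - h // 4
    cx0, cx1 = w // 4, w - w // 4
    centre = [grid[y][x] for y in range(cy0, cy1) for x in range(cx0, cx1)]

    mean = lambda px: tuple(sum(c) / len(px) for c in zip(*px))
    dist2 = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    border_mean, centre_mean = mean(border), mean(centre)

    # Low score = close to the border, far from the centre.
    return min(candidates,
               key=lambda c: dist2(c, border_mean) - dist2(c, centre_mean))
```

The idea is that a background colour drawn from the frame of the image is less likely to clash with, or drown out, the subject in the middle.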
convert $IMG -colors 5 -depth 8 -format "%c" histogram:info: | sort -nr
If needed you can easily remove colored borders first (trim subcommand with fuzz option), or sample only xy% from the image's center, or wherever the main subject might be.

A Rust CLI would make a lot of sense here. Single binary.
> uvx --with pillow --with okmain python -c "from PIL import Image; import okmain; print(okmain.colors(Image.open('bluemarble.jpg')))"
[RGB(r=79, g=87, b=120), RGB(r=27, g=33, b=66), RGB(r=152, g=155, b=175), RGB(r=0, g=0, b=0)]
It would make sense to add an entrypoint in the pyproject.toml so you can use uvx okmain directly.

That's like having someone looking at a display of ice cream in a supermarket saying "I'd be interested in trying a few samples before committing" and then getting a reply like "here are the recipes for all the ice creams, you can try to make them at home and taste them for yourself".
I know I could theoretically spend my weekend working on a CLI tool for this or making ice cream. Every developer knows that, there’s no reason to point that out except snark. But you know who might do it even faster and better and perhaps even enjoy it? The author.
Look, the maintainer owes me nothing. I owe them nothing. This project was shared to HN by the author, and I'm making a simple, sensible suggestion for something I would like to see and believe would be an improvement overall, and I explained why. The author is free to agree or disagree, reply or ignore. Every one of those options is fine.
How is it "simple"? There are like a ton of different downscaling algorithms and each of them might produce a different result.
Cool article otherwise.
The blurred mirror is inoffensive to almost everyone, and yet it always strikes me as gauche. Easy to ignore and yet I feel that it adds a lot of useless visual noise.
https://github.com/si14/okmain/blob/main/test_images/IMG_134...
https://github.com/si14/okmain/blob/main/test_images/pendant...
https://github.com/si14/okmain/blob/main/test_images/pendant...
https://github.com/si14/okmain/blob/main/test_images/red_moo...
https://github.com/si14/okmain/blob/main/test_images/supremu...
For every heuristic, I can think of an image that breaks it. On the other hand, I just wanted to do better than the 1x1 trick, and I think the library clears that bar.