Awesome to see this! I actually wrote PySceneDetect, and it was great to see it getting some use here. Would you be willing to share what parameters you were using? I'm curious why the accuracy was so low.
PySceneDetect only uses basic heuristic methods right now so it does require some degree of tuning to get things working for certain data sets. Your post inspired me to look into maybe integrating TransNetV2 as a detector in the future!
Nice to see you on here! I used ContentDetector with a threshold of 27.0 and otherwise default parameters (roughly the invocation sketched below). I realize I could have done a grid sweep to really home in on a good parameter range, but since I only had one labeled input video, I wanted something that would work well enough out of the box. I imagine this dataset is rather... heterogeneous.
If you happen to know a better a priori threshold, I would be happy to re-run the analysis and update the chart.
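For reference, this is roughly what I ran, assuming a recent PySceneDetect release with the high-level detect() API (the video filename here is just a placeholder):

    from scenedetect import detect, ContentDetector

    # Content-aware detection with the threshold I used (27.0).
    scene_list = detect("episode.mp4", ContentDetector(threshold=27.0))
    for i, (start, end) in enumerate(scene_list):
        print(f"Scene {i + 1}: {start.get_timecode()} - {end.get_timecode()}")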
If you're willing, could you try AdaptiveDetector? Its defaults should handle fast camera movement better.
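It should be a drop-in change, something like this (again assuming a recent release where AdaptiveDetector is available, with a placeholder filename):

    from scenedetect import detect, AdaptiveDetector

    # AdaptiveDetector compares each frame's content score against a
    # rolling average of neighboring frames rather than a fixed cutoff,
    # which tolerates fast camera motion better.
    scene_list = detect("episode.mp4", AdaptiveDetector())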
The threshold values themselves can be tuned if you generate a statsfile and plot the result, but that can get tedious when you have a lot of files (hence the huge interest in methods like TransNetV2; glad to see real-world applications of those in action). You can also just increase/decrease the threshold by 5-10% depending on whether you find it too sensitive or not sensitive enough.
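If it helps, here's a sketch of generating a statsfile through the Python API so the per-frame metrics can be plotted (file names are placeholders):

    from scenedetect import open_video, SceneManager, StatsManager, ContentDetector

    video = open_video("episode.mp4")
    stats = StatsManager()
    scene_manager = SceneManager(stats_manager=stats)
    scene_manager.add_detector(ContentDetector())
    scene_manager.detect_scenes(video)

    # The saved CSV has per-frame metrics you can plot to pick a threshold.
    with open("episode.stats.csv", "w") as f:
        stats.save_to_csv(f)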
Are there keyboard controls? Someone I was just playing with seemed to lock onto each "note" or "half note" exactly; I wasn't sure how they were doing that.
Author of PySceneDetect here: Thank you all for the thought-provoking discussions, and the attention you've given my side project. There are some specific cases where PySceneDetect achieves great accuracy (like fast cuts or fades), and some it currently handles poorly (like sudden flashes or large obstructions). That being said, I do want to track these things and come up with solutions to improve the robustness of the content detection algorithm over time.
I'm open to any feedback, feature requests, ideas, and suggestions; feel free to check out the issue tracker on GitHub, or create a new entry:
So this is likely beyond the scope of your project, but I've always thought a really good project would be a website to host scene indexes for movies and TV.
E.g., let's say that you wanted to watch a prerecorded football game or baseball game without all the commercials, timeouts, commentators talking about the fans, etc.
Or... let's say that you wanted to re-cut a movie in a certain way by re-ordering the scenes; you could just generate a new scene data file and let the encoder/player use that.
This is still relevant, I think :) What you mentioned is very similar to an edit decision list (EDL [1]), which I only learned about recently. I had a feature request [2] to support EDL as an output format, and upon further investigation, the format seems very similar to what you're describing. The Wikipedia page also indicates that VLC supports XSPF files ("XML Shareable Playlist Format"), but I haven't done much research into that yet.
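If I'm remembering the format correctly, a CMX3600-style EDL is just plain text listing clip in/out timecodes, roughly like this (all entries invented for illustration):

    TITLE: EXAMPLE_SCENES
    FCM: NON-DROP FRAME

    001  AX       V     C        00:00:00:00 00:00:12:10 00:00:00:00 00:00:12:10
    002  AX       V     C        00:00:12:10 00:00:45:03 00:00:12:10 00:00:45:03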
Author of the program here. You are correct, the program detects shots rather than scenes, but I didn't want to give the impression that this project was related to the existing ShotDetect program. I felt that the documentation explained this well enough, but I'm open to considering an alternative project name if anyone has a suggestion.
Would you be able to share a small sub-set of the episode, in particular the area where you're unable to detect the starting segment? (If not, no worries!)
There are a few issues with PySceneDetect currently that may lead to false or missed detections, but these are things that I would like to solve in the long run:
- the threshold is heuristic/fixed right now, but I would like to change it to an adaptive/statistical method that can adjust dynamically (rough sketch of the idea after this list)
- single-frame events can trigger false scene changes
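To illustrate the adaptive direction, something like the following could replace the fixed threshold (nothing like this is implemented yet; the function name and numbers are made up):

    import numpy as np

    def adaptive_cuts(frame_deltas, window=30, k=3.0):
        # Flag frame i as a cut when its delta exceeds the mean of the
        # surrounding window by k standard deviations, instead of
        # comparing against one fixed threshold for the whole video.
        cuts = []
        for i, delta in enumerate(frame_deltas):
            lo, hi = max(0, i - window), min(len(frame_deltas), i + window + 1)
            neighbors = np.delete(np.asarray(frame_deltas[lo:hi], dtype=float), i - lo)
            if len(neighbors) > 1 and delta > neighbors.mean() + k * neighbors.std():
                cuts.append(i)
        return cuts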
Thanks for your feedback, and feel free to share any other suggestions you might have.
Author of PySceneDetect here. The current implementation does exactly what you hint at, except instead of YUV, it considers deltas in the HSV domain (specifically, frame-to-frame differences in hue, saturation, and value).
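The core metric is roughly the following (a simplified sketch of the idea, not the exact implementation):

    import cv2
    import numpy as np

    def content_delta(prev_bgr, curr_bgr):
        # Convert both frames to HSV, take the mean absolute per-pixel
        # difference in each channel, then average the three channel deltas.
        prev = cv2.cvtColor(prev_bgr, cv2.COLOR_BGR2HSV).astype(np.int32)
        curr = cv2.cvtColor(curr_bgr, cv2.COLOR_BGR2HSV).astype(np.int32)
        channel_deltas = [np.abs(curr[..., c] - prev[..., c]).mean() for c in range(3)]
        return sum(channel_deltas) / 3.0  # a cut if this exceeds the threshold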
Other techniques being considered for future work include optical flow, background subtraction, and histogram analysis.
From what I remember, the Y (luma) component of a YUV video carries more information than the other two components, and it can also be extracted without fully decompressing the video (for MPEG-compressed video). Of course, this info is more than 10 years old (I don't really do any video research any more), so I imagine there has been progress in that area since.
This is indeed correct; I'm just using HSV instead of YUV, but the primary source of information is still the luma/brightness component (although currently all three HSV components are averaged equally, so perhaps a better weighting may improve precision).
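Concretely, the flat average in my earlier sketch could be swapped for a weighted combination; the weights below are invented and would need tuning against labeled data:

    # Hypothetical weighting that emphasizes the value (brightness) channel.
    weights = (0.5, 1.0, 1.5)  # (hue, saturation, value) - made-up values
    delta = sum(w * d for w, d in zip(weights, channel_deltas)) / sum(weights)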
This is definitely a good idea, and something I'm open to considering for a future release of PySceneDetect. Admittedly the current version does not handle single-frame "upsets" like this, but your suggestion seems like a logical and reasonable first attempt at filtering them out.
I would do an exponential smoothing of pixel values over some timescale, say, 0.2 seconds, before further detection of scene changes. That should do the trick.
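Something along these lines (just a sketch; the alpha derivation assumes a first-order filter with the stated 0.2 s time constant):

    import numpy as np

    def smooth_frames(frames, fps, timescale=0.2):
        # First-order exponential smoothing of pixel values; a single-frame
        # flash gets heavily damped before any delta computation happens.
        alpha = 1.0 - np.exp(-1.0 / (fps * timescale))
        smoothed = None
        for frame in frames:
            f = frame.astype(np.float64)
            smoothed = f if smoothed is None else smoothed + alpha * (f - smoothed)
            yield smoothed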
I've seen a proof of concept which combined the output of PySceneDetect with subtitle information and computer vision to allow you to do something like "go to the scene with the big castle" or something similar. I can't remember what it's called off the top of my head, but it seemed like a pretty cool concept.