Bloom filters allow you to prune the number of files you even have to look at, which matters a lot when there is a cost associated with scanning the files.
Partitioning the data can be advantageous both for pruning (on its sort keys) and for parallel query fan-out (you independently scan and apply the predicate to the high cardinality column in each partition concurrently).
In the use case that underpins the article, they want to minimize unnecessary access to Parquet file data because it lives on a high-latency storage system and the compute infrastructure is not meant to be scaled up to match the number of partitions. So they just want an index to help find the data in the high cardinality column.
Partitioning also prunes the number of files to be looked at. Only the directory structure of the files (or prefixes of the objects in S3) need to be checked.
Partitioning only prunes the files you need to look at *if* the predicate includes the column you're partitioning on.
For example, let's imagine you're partitioning on "time" and "region" and the high cardinality column is "container_id". Now imagine you want to run a query that filters on a particular container_id but runs across all time and all regions. You'd have to scan through the "container_id" chunks of all your parquet files. Indices on your high-cardinality data allow you to know which column chunks have data that matches your predicate (and bloom filters will tell you that probabilistically). In the example, without such indices you'd have to scan through all data unless you also have predicates on "time" and "region".
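To make the container_id example concrete, here is a minimal sketch of per-file bloom filter pruning. Everything in it is an assumption for illustration: the `BloomFilter` class, the file paths, and the ids are made up, and real Parquet bloom filters are stored per column chunk with a different hashing scheme. The point is only that the filter can say "definitely not here" without reading the file.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may return false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, value):
        # Derive num_hashes bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


def build_index(files):
    """Map each file path to a Bloom filter over its container_id values."""
    index = {}
    for path, container_ids in files.items():
        bf = BloomFilter()
        for cid in container_ids:
            bf.add(cid)
        index[path] = bf
    return index


def files_to_scan(index, container_id):
    """Return only the files whose filter says the id might be present."""
    return [path for path, bf in index.items() if bf.might_contain(container_id)]


# Hypothetical layout: partitioned on time and region, not on container_id.
files = {
    "time=2024-01-01/region=us/part-0.parquet": ["c-001", "c-002"],
    "time=2024-01-01/region=eu/part-0.parquet": ["c-003"],
    "time=2024-01-02/region=us/part-0.parquet": ["c-004", "c-001"],
}
index = build_index(files)
candidates = files_to_scan(index, "c-001")
```

Files that really contain the id are always in `candidates`; a file that doesn't contain it can occasionally slip in as a false positive, which costs an extra scan but never a wrong answer.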
In general, if you can partition your datasets on your predicate column, sorting is likely the best option.
For example, when you have a predicate like `where id = 'fdhah-4311-ddsdd-222aa'`, sorting on the `id` column will help.
However, if you have predicates on multiple different sets of columns, such as another query on `state = 'MA'`, you can't pick an ideal sort order for all of them.
People often partition (sort) on the low cardinality columns first, as that tends to improve compression significantly.
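A rough sketch of why sorting on the predicate column helps, assuming the reader prunes on per-row-group min/max statistics (which Parquet stores). The ids, the group size, and the helper names are made up for illustration.

```python
def row_group_stats(values, group_size):
    """Split values into row groups and record each group's (min, max)."""
    groups = [values[i:i + group_size] for i in range(0, len(values), group_size)]
    return [(min(g), max(g)) for g in groups]


def groups_to_scan(stats, target):
    """Indices of row groups whose [min, max] range could contain target."""
    return [i for i, (lo, hi) in enumerate(stats) if lo <= target <= hi]


# Hypothetical ids, interleaved so that unsorted row groups have wide ranges.
ids = [f"id-{(i * 37) % 100:03d}" for i in range(100)]
target = "id-042"

unsorted_hits = groups_to_scan(row_group_stats(ids, 10), target)
sorted_hits = groups_to_scan(row_group_stats(sorted(ids), 10), target)
```

With the data sorted on `id`, only one row group's range can contain the target. Unsorted, the ranges overlap and most groups have to be read. The catch is the one already mentioned: you only get one sort order, so a second predicate like `state = 'MA'` doesn't get the same benefit.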
Yes. The bloom filters tend to be useful on queries that are doing lookups on specific values for high-cardinality columns. For example, "SELECT * FROM ... WHERE ids IN ('guid1', 'guid2', 'guid3', ...)". You could hash partition on the guid here, but it's likely that your filter criteria will cover a lot/all of the potential hashes, unlike with the bloom filter.
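A quick back-of-the-envelope sketch of the hash partitioning point. The hashing scheme, the bucket count of 16, and the guid values are assumptions for illustration only.

```python
import hashlib


def bucket_of(guid, num_buckets):
    """Stable hash-partition assignment for a guid (hypothetical scheme)."""
    digest = hashlib.sha256(guid.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def buckets_touched(guids, num_buckets):
    """Distinct hash partitions an IN-list lookup would have to read."""
    return {bucket_of(g, num_buckets) for g in guids}


def expected_buckets_touched(num_buckets, num_guids):
    """Expected number of distinct buckets hit by uniformly hashed guids."""
    return num_buckets * (1 - (1 - 1 / num_buckets) ** num_guids)


guids = [f"guid{i}" for i in range(50)]
touched = buckets_touched(guids, num_buckets=16)
coverage = expected_buckets_touched(16, 50) / 16  # fraction of partitions read
```

With 50 guids over 16 buckets, the expected coverage is already above 90% of the partitions, so the hash partitioning prunes almost nothing for that kind of IN list.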
I saw this post hours after I finished a task where this would have come in handy. I had a bunch of HQ video clips from a professional Sony video camera that I needed to string together, with short simple fade transitions. The last time I did this (a few years ago), I was using a Mac, and iMovie did the job well. This time, I was on a Windows NUC, and figured Windows MovieMaker should do the trick. Nope.
- Opening Movie Maker redirects to the Photos app, with a note that Microsoft Clipchamp has this functionality now and Movie Maker is deprecated.
- Install Clipchamp and see that it's hilariously bad at batch-adding clips to the timeline. Adding 300 clips, one at a time, is a dealbreaker.
- Look up reviews for free 3rd party apps to do this on Windows. Find everyone recommending DaVinci Resolve. Fine. Install Resolve. Looks great. Import my clips, and get only audio. A quick Google search tells me that Resolve free version doesn't support importing 10-bit video. Welp.
- Let's try FOSS then. Shotcut is supposedly better than Openshot. Install Shotcut. Import all clips, add to timeline and export. Takes a few hours to export, displays a Success message and gives me the first few seconds of video, followed by a couple of hours of just audio.
- F** it, let's try Openshot. Hesitant because I've heard a lot of crashing happens, but what do I have to lose. Install. Import clips. Add to timeline. Lets me add transitions. Export takes a few hours. Gives me flawless output file.
Moral of the story: For occasional amateur video editing, Openshot is great.
- From what I have heard the Blender video editor for many people is a go to tool as well. In this case it likely would have been overkill, but figured it is worth mentioning.
I used kdenlive recently and it worked very well! Clipchamp charges for 4K export, which seriously annoys me given we all know that’s just arbitrary, but kdenlive handled it just fine and was actually faster to use anyways.
I completely missed looking up Kdenlive because I (rather stupidly, in hindsight) assumed KDE implied for Linux only.
I'll keep it in mind for the next time.
And yes, Blender would have been overkill, but I might've gone that route if Openshot didn't work out.
KDE actually has a lot of software on Windows (and is about to even put them on the MS/Windows store, if it hasn't already), in particular when it comes to content creation and document stuff - Krita, Okular, Kate, to name a few. And Kdenlive, of course.
> - From what I have heard the Blender video editor for many people is a go to tool as well. In this case it likely would have been overkill, but figured it is worth mentioning.
Blender is great as well if you happen to be a programmer, as everything is also callable as Python functions. The "built-in docs" in form of hovering over buttons and seeing what the equivalent Python code would be, makes it super easy to script together one-off scripts for doing things like "Add 300 video clips with a 100ms fade-in-out between all of them".
I recently went through a large video editing project on Ubuntu. Really wanted to use DaVinci Resolve, but it had multiple hard crashes and just wouldn’t open for some reason. Doesn’t seem like they support Linux as well as Mac and Windows.
I ended up using Blender, and while it’s powerful and super useful to be able to link scenes from other files, one huge missing feature is support for videos of varying frame rates. If your videos don’t match, the audio will be either much longer or much shorter than the video clips.
Had a few hard crashes too and lots of bugs, but definitely usable.
Oh gods. The Blender VSE is.... Technically professional-grade, but also isn't even multi-threaded. Switching to it from Kdenlive certainly felt like graduating, especially back when Kdenlive crashed every half hour or so, but honestly just use Kdenlive.
Middle-school-me's still waiting for Lightworks's free Linux release. Are Avidemux, Kino, Cinelerra, PiTiVi still around?
As a daily Blender user...
Never recommend the Blender Video Sequence Editor. Ever. I do everything in-camera just so I don't have to deal with that thing. It is powerful, but it's powerful in the same way plate tectonics is.
I would say that for amateur video editing, the DaVinci Resolve free edition is an ideal candidate: you can go quite in-depth, or just drag & drop and add some transitions. Previously I used iMovie on Mac too (I even built a hackintosh first and bought an iMovie license for $12), but it does not add that small extra that was needed.
Once I started working with DaVinci, it was a game changer: from start to finish, with some advanced motion tracking and title overlays, in less than a few hours. Another upside is that there are plenty of tutorials available for DaVinci, from beginner to advanced.
I was looking for this comment. Hard agree. Every time I need some casual compositing tasks, or some transitions, Blender's never done me dirty. Every time I pull my head out of the ground and go hunting for another FOSS compositor, I always end up shrugging, "whelp, nothing really all that better than Blender, and I know Blender, soooooo . . "
EDIT also, since last I used it, the Blender Compositor has gotten a lot better. Dang.
Ehh. The VSE's always been the weakest part of Blender IMO. Not even multithreaded, and IIRC there's some ways it can't integrate with the compositor (requiring intermediate renders), plus time bugs during preview with some types of OGG and MP4 videos. …Still good, but a noticeably frustrating experience compared to a lot of other Blender features.
> EDIT also, since last I used it, the Blender Compositor has gotten a lot better. Dang.
The compositor's always been great. What's changed? It looks pretty much the same as it always did even around the 2.4X days?
Speaking respectfully, does anything look the same as the 2.4 days? That was before the giant UI flush, when they decided, "screw it, make it look like Maya" (I was pretty hostile at first, especially since half my Python scripts stopped working, but I eventually got used to it. I keep an old EXE for the old scripts I really need, but with geo nodes I'm hitting those less and less).
The aesthetic changes are mostly just theming, really. For a long time they even kept shipping a 2.4X-style theme IIRC. It looked old enough to prevent my old high school teacher from getting scared, in maybe the late 2.6X days. Looks like there's still a default "XSI" theme that gets pretty close to that. And up until pretty late you could still make the properties panel horizontal instead of vertical with a RMB menu option (which I can't seem to find in the latest versions), though the layout didn't really work too well in that mode.
It seems to handle whatever junk I throw at it in whatever format it comes in. I guess because it is used to remix all kinds of junk video files that people find around the Net.
Not great for vertical cellphone video (1080x1920). The video will import horizontally and any efforts to set it vertically will either fail, or the exported (vertical 1080x1920) version is horrid.
For simple horizontal edits, it can be quite convenient. But they really need to fix a bug or two, especially considering the amount of footage by phones out there.
Agreed. The first time I used Openshot it was a buggy mess that ignored my output settings when rendering... Within a year or two of that, I gave it another shot, and the glaring issues were fixed. It just did what I needed with no problems.
I think partway through that frustration I'd learn what arguments I need to put into ffmpeg to do that. I may be biased because I've used it for limited editing already though.
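For what it's worth, here is a rough sketch of what those ffmpeg arguments could look like, using the `xfade` and `acrossfade` filters. The helper names, file names, and the 0.5 s fade are my own assumptions. It also assumes you already know each clip's duration (e.g. from ffprobe), that every clip has one video and one audio stream, and that all clips share resolution, frame rate, and pixel format, since `xfade` needs matching inputs.

```python
def xfade_filter_graph(durations, fade=0.5):
    """Build an ffmpeg filter_complex string chaining clips with crossfades.

    durations: length in seconds of each input clip, in timeline order.
    fade: crossfade length in seconds (must be shorter than every clip).
    """
    if len(durations) < 2:
        raise ValueError("need at least two clips to crossfade")
    parts = []
    prev_v, prev_a = "[0:v]", "[0:a]"
    elapsed = durations[0]  # length of the merged video so far
    for i in range(1, len(durations)):
        last = i == len(durations) - 1
        out_v = "[vout]" if last else f"[v{i}]"
        out_a = "[aout]" if last else f"[a{i}]"
        offset = elapsed - fade  # where the fade starts in the merged stream
        parts.append(
            f"{prev_v}[{i}:v]xfade=transition=fade:"
            f"duration={fade}:offset={offset:.3f}{out_v}"
        )
        parts.append(f"{prev_a}[{i}:a]acrossfade=d={fade}{out_a}")
        prev_v, prev_a = out_v, out_a
        elapsed += durations[i] - fade  # each fade overlaps two clips
    return ";".join(parts)


def ffmpeg_command(paths, durations, output, fade=0.5):
    """Assemble the argument list; run it with subprocess.run(cmd, check=True)."""
    cmd = ["ffmpeg"]
    for path in paths:
        cmd += ["-i", path]
    cmd += ["-filter_complex", xfade_filter_graph(durations, fade)]
    cmd += ["-map", "[vout]", "-map", "[aout]", output]
    return cmd
```

The only fiddly part is the `offset`: each fade starts at the running length of everything merged so far minus the fade length. I haven't tried this with 300 inputs in a single command, so treat it as a starting point rather than a tested recipe.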
Even in tempered music theory, calling a note a sharp or a flat generally depends on the scale.
For example, Bb major uses:
Bb C D Eb F G A Bb
while A# major uses:
A# B# Cx D# E# Fx Gx A#
(x indicates double-sharp)
Despite all the notes being exactly the same on a tempered instrument.
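The spelling rule behind those two scales (each of the seven letter names appears exactly once, with the accidental chosen to hit the right pitch) can be sketched in a few lines. The function name is made up, and "x" stands for double-sharp as above.

```python
LETTERS = "CDEFGAB"
NATURAL_PITCH = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
MAJOR_STEPS = [2, 2, 1, 2, 2, 2, 1]  # semitones between successive degrees
ACCIDENTALS = {-2: "bb", -1: "b", 0: "", 1: "#", 2: "x"}


def major_scale(tonic):
    """Spell a major scale so that each letter name is used exactly once."""
    letter, accidental = tonic[0], tonic[1:]
    shift = {v: k for k, v in ACCIDENTALS.items()}[accidental]
    pitch = (NATURAL_PITCH[letter] + shift) % 12
    start = LETTERS.index(letter)
    notes = []
    for degree, step in enumerate([0] + MAJOR_STEPS):
        pitch = (pitch + step) % 12
        name = LETTERS[(start + degree) % 7]
        # Signed distance from the natural letter to the required pitch.
        diff = (pitch - NATURAL_PITCH[name] + 6) % 12 - 6
        notes.append(name + ACCIDENTALS[diff])
    return " ".join(notes)


print(major_scale("Bb"))  # Bb C D Eb F G A Bb
print(major_scale("A#"))  # A# B# Cx D# E# Fx Gx A#
```

Both calls walk through the same twelve-tone pitch classes; only the letter the scale starts on differs, and that alone decides whether you end up with flats or a pile of sharps and double-sharps.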
I have a suggestion: You could optimize the website to be easily readable and navigable on the Kindle's web browser, and recommend it as an option. I've often found it to be the easiest way to get non-store books on my Kindle. I've also noticed that cover images are handled correctly when the ebook is downloaded straight onto the device, with no need for a separate image file.
A hurdle for this though, is that building a good website for the Kindle browser is a pain, as the browser's support for various html/css/js features and standards is all over the place, with no debugging tools available.
I believe our website does have some basic Kindle browser support. The problem, as you noted, is that Kindle's browser is terrible.
I say the same thing in every ebook thread: On a purely technical level Kindle is a terrible ereader designed by people who seem to hate books. Buy almost anything else.
A jailbroken kindle is okay - they make an adequate PDF reader and they can be found easily for less than the alternatives, at least in Britain. I do agree they're somewhat poor when used as intended.
They're also quite a nice embedded ARM Linux machine for a lot less than I could make one or buy one from elsewhere, but I suspect that isn't the core market for a kindle...
Kobo, with either stock OS or KOReader (I use this, in part because the font size can be easily increased for my daughter who so far needs text larger than stock) or Plato.
I no longer have a Kindle to compare, but I'm very happy with build and lighting on my Kobo Libra 2. I've used Kindles since the Kindle 1, and there are some Kobo things that I don't like as well as Kindle, but it's a better-than-decent e-reader and I'm glad to be out from under Amazon's thumb.
I've been happy with the half dozen Kobos in my house.
Weirdly, about half of them have developed a problem after about 5-7 years, whereby they intermittently stop charging. Replacing the battery doesn't seem to fix it. Might be a problem with the soldering of the USB connector to the PCB?
As a bonus they are Linux based, and you can do fun things like replace internal SD cards with bigger ones, login using telnet and install new applications.
Oh that's not the reason. Animation in general is seen as childish by the vast majority here in India, and a lot of people wouldn't even imagine that animation could be NSFW. Most folks wouldn't care less if the child was watching, say, Bojack Horseman (except any blatantly explicit scenes/words of course). Tom and Jerry is an easy reach, since the humor is more slapstick than verbal, and it's been around long enough that parents today have grown up watching it on TV. Most of the NSFW stuff in Spongebob is too subtle for the average uninterested Indian adult.
Basically, children's entertainment = animation = Tom and Jerry, more often than not. Urban 1%er parents tend to default to Peppa Pig.