
From skimming the article, it seems to stretch the terms in directions that just aren't meaningful.

I've had the following view from the beginning:

- Batches are groups of data with a finite size, delivered at whatever interval you desire (this can be seconds, minutes, hours, days, or years between batches).

- Streaming is when you deliver the data "live", meaning immediately upon generation of that data. There is no defined start or end. There is no buffering or grouping at the transmitter of that data. It's constant. What you do with that data after you receive it (buffering, batching it up, ...) is irrelevant.

JMHO.
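
To make the two definitions above concrete, here is a minimal sketch of the sender side only. The names `send_batched`, `send_streamed`, and `Record` are made up for illustration, and grouping by record count is just one assumption; a batch could equally be cut by time interval.

```python
from typing import Callable, Iterable, List

Record = int  # hypothetical record type, purely for illustration


def send_batched(source: Iterable[Record], batch_size: int,
                 send: Callable[[List[Record]], None]) -> None:
    """Collect records into finite groups and send one group at a time."""
    batch: List[Record] = []
    for record in source:
        batch.append(record)
        if len(batch) == batch_size:
            send(batch)
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        send(batch)


def send_streamed(source: Iterable[Record],
                  send: Callable[[Record], None]) -> None:
    """Send each record as soon as it exists; no buffering at the sender."""
    for record in source:
        send(record)


batched_calls: List[List[Record]] = []
streamed_calls: List[Record] = []
send_batched(range(5), 2, batched_calls.append)
send_streamed(range(5), streamed_calls.append)
```

In this sketch the batched sender makes three calls with groups of records, while the streamed sender makes one call per record. Whatever the receiver does afterwards does not change which mode the sender used.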



The lines blur, though, when you start keeping state between batches, and a lot of batch processing ends up requiring that (joins, deduplication, etc.).
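
A rough sketch of what I mean, using deduplication as the example. `dedupe_batch` and the `seen` set are hypothetical names, and a real job would likely persist that state somewhere rather than hold it in memory.

```python
from typing import Hashable, Iterable, List, Set


def dedupe_batch(batch: Iterable[Hashable], seen: Set[Hashable]) -> List[Hashable]:
    """Return only the keys not seen in this batch or in any earlier one."""
    fresh: List[Hashable] = []
    for key in batch:
        if key not in seen:
            seen.add(key)  # state that outlives the current batch
            fresh.append(key)
    return fresh


seen: Set[Hashable] = set()
first = dedupe_batch(["a", "b", "a"], seen)
second = dedupe_batch(["b", "c"], seen)  # "b" is dropped because of the first batch
```

The second batch cannot be processed correctly on its own; it depends on state carried over from the first, which is where it starts to look like a long-running stream processor.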


No, it really doesn't. The definition of "streaming", to me, can be boiled down to "you send individual data as soon as it's available, without collecting into groups."

Batching is, by definition, the gathering of data records into a collection before you send it. Streaming does not do that, which is the entire point. What happens after transmission occurs, on reception, is entirely irrelevant to whether the data transfer mode is "streaming."


Most streaming does some batching. If you stream audio from a live source, you batch at least into "frames", and you batch into network packets. On top of that you might batch further depending on your requirements, yet I would still count most of it as "streaming".


Only if you ignore that streaming sends data in records. The creation of a record (or struct, or whatever term you want to use) is not "batching". Otherwise any 32-bit word is nothing more than a batch of four bytes, and the entire distinction instantly becomes meaningless.

An audio stream can easily be defined as a series of records, where each record is a sample spanning N seconds, probably as provided by the hardware. Similarly, a video frame can also be considered a record. As soon as a record becomes available, it is sent. Thus, streaming.

Optimizing to fully utilize network frames can generally be considered a low level transport optimization, and thus not relevant to the discussion.
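
As a toy illustration of the record view, assuming a made-up `audio_records` generator standing in for the hardware (a real driver would hand over each buffer as it fills, not slice an existing list):

```python
from typing import Iterator, List, Tuple


def audio_records(samples: List[int],
                  samples_per_record: int) -> Iterator[Tuple[int, ...]]:
    """Yield fixed-size records one at a time, as a stand-in for hardware buffers."""
    for start in range(0, len(samples), samples_per_record):
        yield tuple(samples[start:start + samples_per_record])


sent: List[Tuple[int, ...]] = []
for record in audio_records([1, 2, 3, 4, 5, 6], 2):
    sent.append(record)  # each record goes out as soon as it is yielded
```

Each record is the unit of data, and the sender never holds more than one of them before transmitting.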


Isn't that pretty much exactly what the OP is saying? He just calls it "push" and "pull" instead. Different words, same concepts.



