Scala Vector operations aren't “Effectively Constant” time (lihaoyi.com)
53 points by dmit on Aug 9, 2017 | 41 comments


I'm a little disappointed that I bothered to read what turned out to be a massive act of pedantry.

A few sections are devoted to showing that O(log32(n)) is the same as O(log(n)). This is entirely correct, as shown, because the difference between them is reducible to a constant multiplier, which big-O notation does not care about. It's also, judging from the problematic sources given, an entirely made-up argument, given that I didn't find O(log32(n)) mentioned in any of them that I was able to view. Plenty of notes in them about it being logarithmic, though...
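
Spelled out, the change-of-base identity is all that is going on here; the two differ by a constant factor of 5, which big-O discards:

    \log_{32} n = \frac{\log_2 n}{\log_2 32} = \frac{\log_2 n}{5} \quad\Longrightarrow\quad O(\log_{32} n) = O(\log n)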

I understand making sure people know that it's not constant time, but when all the examples of problematic descriptions you give make sure to say that it is logarithmic time and then go on to explain why it's "effectively constant", you're railing against a non-issue with an absurdly complicated argument.


> you're railing against a non-issue with an absurdly complicated argument.

It's not really the correct kind of argument. Landau notation is for analyzing the asymptotic behaviour of functions; this makes it a good fit, as sets of functions, for many purposes in theoretical computer science, but it's simply the wrong approach for practical, "in the field" comparison of implemented algorithms. It only has some merit there to state... well, asymptotic behaviour, like the complexity of a hash-table lookup or deleting an arbitrary element of a vector.


Words may be relative, but that's why we set up systematized math notation to appeal to when we need to start characterizing the precise nature of algorithms and systems.

To argue that this is an act of pedantry is to argue that any form of canonical notation to describe the essential nature of algorithms is itself pedantry. It's to suggest big-O notation itself is pedantry and by extension so is its defense.

In the face of this, I'd say, "If you don't like it, don't use it. But many people do find it useful and it's corrosive in the extreme to demand the right to redefine it for what amounts to marketing purposes." Scala's got a bunch of actors in its community who do this, and it's frustrating when you're trying to evaluate the ecosystem.

The asymptotic boundaries of algorithms ARE theoretical tools that can be misleading. For example, how many people incorrectly suggest that quicksort is asymptotically faster than merge sort? It's not.

But the response to this should not be to vaguely redefine the terms. The response to this should be to quantify the implementation's docs with real metrics.


> In the face of this, I'd say, "If you don't like it, don't use it. But many people do find it useful and it's corrosive in the extreme to demand the right to redefine it for what amounts to marketing purposes."

But as I pointed out, in all the examples provided in the article as evidence of this behavior, not only did they not mention O(log32(n)), they often (possibly always in the links presented) did specifically call out running time as logarithmic prior to mentioning what that meant in practice.

> But the response to this should not be to vaguely redefine the terms. The response to this should be to quantify the implementation's docs with real metrics.

This is not a case of vague wording being used, it's a case of exact wording being used and additional context being provided.

Edit: I noticed some of the examples weren't as upfront about the time complexity, I was mistakenly seeing what was presented for Lists as applying to Vectors. In lieu of that, I agree that they could, and should, be more up-front about the theoretical time and the practical time if they plan to continue using "effectively constant" in documentation.


I've seen numerous examples of "essentially constant time" on slide decks, presentations and talks. But I'd argue even if you did do this after a prior disclaimer, it's still somewhat misleading. It's adopting language that sounds very similar and creating greater confusion.

> This is not a case of vague wording being used, it's a case of exact wording being used and additional context being provided.

And yet the confusion persists. And I can point to enthusiastic jr. engineers who don't fully grasp the difference, and who were somewhat surprised when I corrected them (the equivalence of log32 and log2 should be enough but...)

Similar stories abound from multiple actors in the world of Scala. Another great example is the (now infamous) "non-blocking backpressure." Same principle, taking well-understood industry terms (some of which even have precise definitions lifted from more rigorous disciplines) and wrangling them into something almost unrecognizable.


Effectively constant seems like a reasonable thing to say if the real world number of operations is upper bounded by six. I don't interpret that claim as making any claim about the theoretical asymptotic complexity of an algorithm.

Of course, as the author notes near the end, this also means that any real world algorithm running on a real world data type which is bounded in size can also be considered "effectively constant", which does throw doubt onto the effectiveness of the term.

I guess in the real world we should use graphs in units of time instead of classes of functions to discuss performance based on input size, since that's what we care about.

Also interesting to note is that "effectively constant", although extremely hand-wavy and not rigorous, is used even by computer scientists to denote extremely slow growing functions. A professor once used similar words to describe the complexity of the inverse Ackermann function: https://en.m.wikipedia.org/wiki/Ackermann_function#Inverse


"Of course, as the author notes near the end, this also means that any real world algorithm running on a real world data type which is bounded in size can also be considered "effectively constant", which does throw doubt onto the effectiveness of the term."

They're not all effectively constant though. I can say that effectively all web pages render within 100 seconds and thus web page rendering is O(1), but it's not true because "between visually instantaneous and 100 seconds" is not effectively constant.

In another example, 1 nanosecond and 1 millisecond, separated by six orders of magnitude, are both effectively instant to a human... at least, as long as it's only happening once. Start looping on that and the differences rapidly become, ah, effective.
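
To put rough numbers on it:

    10^9 \times 1\,\mathrm{ns} = 1\,\mathrm{s} \qquad \text{vs.} \qquad 10^9 \times 1\,\mathrm{ms} = 10^6\,\mathrm{s} \approx 11.6\ \mathrm{days}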

The root problem here is that pragmatic measures don't fit in math and math doesn't fit in pragmatic measures, and you need to at least be careful not to equivocate back and forth. But there's nothing wrong with either one; you just need to know where you stand. You can't help it anyhow; we are not capable of dealing with the universe with mathematical precision everywhere, we lack the computational power to do so, assuming it is even theoretically possible.


> In another example, 1 nanosecond and 1 millisecond, separated by six orders of magnitude, are both effectively instant to a human... at least, as long as it's only happening once. Start looping on that and the differences rapidly become, ah, effective.

Well, yeah. Looping on some operation changes the complexity to n·O(inner operation). That changes the complexity of O(log(n)) to O(n·log(n)).

Also, anyone that's actually learned anything about time complexity analysis and order of magnitude notation will know that the constants which aren't shown can and often do swamp the other factors in practice, depending on the algorithm.

Finally, I think it's totally acceptable to take these analyses out of the theoretical realm of math and apply them to real world limits. If there's a hard limit for n such that it's not infinite, why not use that as an upper bound constant where applicable?


I don't think pulling out the number 6 separately from the constant factor results in an apples-to-apples comparison with other operations with logarithmic complexity.

The 32-way tree operations have asymptotic complexity 6 * C_1 * log(n) = C_2 * log(n).

Operations on a balanced binary tree have asymptotic complexity C_3 * log(n) = 6 * C_4 * log(n).

The only difference between the two data structures is the actual values of the constants.

I think the Scala people have a valid point that logarithmic complexity may be as good (or nearly as good) as constant-time in practice. The precise way the claim is formulated is wrong and abuses big-O analysis.

A more correct argument is to choose a practical upper bound on log(n). E.g. maybe 64. Then multiply that by the constant factor (not playing any tricks with splitting out 6 *). If that number is always good enough in practice, then you don't need to worry about your operation being logarithmic.
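
As a worked version of that bound (taking 2^64 purely as an illustrative cap on n, nothing Scala-specific):

    n \le 2^{64} \;\Longrightarrow\; \log_2 n \le 64 \;\Longrightarrow\; \text{cost} \le 64 \cdot C

and 64·C is a single concrete number you can benchmark against.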


I don't think they should use "effectively constant", because the difference between O(1) and O(log(n)) may not be very big, but the difference between O(n) and O(n log(n)) is very measurable, even for small n.


> Also interesting to note is that "effectively constant", although extremely hand-wavy and not rigorous, is used even by computer scientists to denote extremely slow growing functions.

Another example is Donald Knuth who claims in TAOCP, section 4.3.3 "How fast can we multiply", that multiplication algorithms are effectively linear. Quote: "For all practical purposes we can assume that [...] the Schönhage-Strassen multiplication technique will have a running time linearly proportional to n. [...] The practical problem of fast multiplication is therefore solved, except for improvements in the constant factor."

This really stuck out to me.


Even a statically-allocated C array doesn't "really" have constant-time lookup with respect to size; the more elements it has the greater the rate of cache misses and page faults on lookups becomes (I'm pretty sure this can be proven true of all data structures). Moreover the possibility of cache misses and page faults means that while lookups take a fixed number of CPU cycles on average, any single lookup might take between <1 and thousands of cycles. If you truly need your data accesses to take a fixed number of CPU cycles, you have to handwrite assembly and put everything in registers. And even that doesn't account for instruction reordering and the possibility of interrupts or context switches.

I would assume the reason the Scala website calls Vector log(n) element lookup "effectively constant" is because in real-world use (as opposed to hypothetical ideal Von Neumann machines), the slowdown of the lookup algorithm with respect to n is nil, in relation to the decrease in cache and allocation efficiency with respect to n.

If you're creating some sort of ultra-fine-grained real-time system where everything needs to take a fixed number of clock cycles, Scala would be a pretty terrible tool for the job. For that sort of thing you'd want MISRA C or assembly running on bare metal, or an FPGA or ASIC.


There's an interesting argument based on physics on why memory access is O(sqrt(N)):

http://www.ilikebigbits.com/blog/2014/4/21/the-myth-of-ram-p...

Relevant quote:

> The amount of information that can fit within a sphere with radius r can be calculated using the Bekenstein bound, which says that the amount of information that can be contained by a sphere is directly proportional to the radius and mass: N ∝ r·m. So how massive can a sphere get? Well, what is the most dense thing in existence? A black hole! It turns out that the mass of a black hole is directly proportional to its radius: m ∝ r. This means that the amount of information that can fit inside a sphere of radius r is N ∝ r². And so we come to the conclusion that the amount of information contained in a sphere is bounded by the area of that sphere - not the volume!

> In short: if you try to squeeze too much L1 cache onto your CPU it will eventually collapse into a black hole, and that would make it awkward to get the results of the computation back to the user.

But honestly I find all of these arguments equally pedantic.

People often forget Big Oh is a tool just like any other tool. It's useful for estimating the number of X, whether X is cpu instructions, cache misses, disk reads or whatever. When it stops being a good estimate (whether it's because your real world N is too small for the growth rate to dominate, or the things you're counting are not unit-cost comparable, or you're not even counting the right things in the first place), just stop using it and count cycles on a benchmark instead.


You're comparing radius (in Bekenstein bound) and event horizon (in black hole).

Event horizon is a pretty different beast than radius. If we define "density" of a black holes using event horizon instead of radius (black holes don't have real radius), then supermassive black holes are actually pretty "light". A black hole of the mass of our universe would have "density" of the universe, which is mostly empty.


Not my article and I don't know any theoretical physics.

In case that point happens to be false, I think his other argument still works:

- "data centers ... are spread out on the two-dimensional surface of the earth"

- if we tightly pack these data centers on the surface it will form a circle with area O(N) so distance to furthest data center is O(sqrt(N)). (sky scraper height is negligible)
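
Spelled out, that packing argument is just circle geometry:

    \text{area} = \pi r^2 \propto N \;\Longrightarrow\; r \propto \sqrt{N}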

Anyway, it's more of an interesting thought experiment than of any practical use. Nobody sane would estimate cost of generic "information retrieval" that mixes cost of ram/disk/network.


Thanks, enjoyed reading that. I had thought along similar lines, but never wrote it down.


The concept of "constant time" doesn't mean "always the same number of instructions, a value we can exactly predict": it means essentially "there exists some X such that for all accesses of items in a structure of size N at any index it will always take time less than X, independent of N". For "linear time" we would say "there exists some X such that for all accesses of items in a structure of size N at any index it will always take time less than X multiplied by the value N".

So "O(Q)" means "there exists some X such that for all accesses of items in a structure of size N at any index it will always take time less than X multiplied by the expression Q, which is expressed in terms of N". O(1) is "constant time", and does not mean things take one instruction: it means the expression does not depend on N. There is no O(1000): that is the same as O(1); and if an operation takes randomly either 10 or 1000 cycles, it still is O(1) as X is 1000: both 10 and 1000 are bounded by 1000 multiplied by 1.
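
Written out in the usual formal way, for a running time f(N):

    f(N) = O(g(N)) \iff \exists\, X > 0,\ N_0 \ \text{such that}\ f(N) \le X \cdot g(N) \ \text{for all}\ N \ge N_0

Constant time is the case g(N) = 1: the bound X does not grow with N, no matter how jittery individual accesses are.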

Time complexity is thereby a measure of how scalable something is, not how long it will take. An operation that is linear might beat one that is logarithmic at small values of N, but eventually you will have an input large enough that the logarithmic one is not just faster but will always be faster, as the constant multipliers can no longer compensate for one underlying expression being swamped by the other.

The constant time constraint is true for simple memory access on a computer running a single process (which I only add as a constraint because the entire theoretical framework breaks otherwise), even if some accesses take a thousand times longer than other ones: there is a slowest possible time past which it will not be slower. No matter how large the array gets, it isn't going to get slower, even though it is super fast for extremely small arrays that fit entirely in some small amount of cache.

Of course, it might not be true if memory is backed by a file and that file is on network storage, and the remote file is stored in a linked list ;P, but that isn't the situation with cache: the controller doesn't even keep a resizable data structure for cache lookups. If you plotted a graph of the performance of lookups vs. the size of the array, once you got noticeably past the size of the cache the line would stop growing, and the entire thing would fit under a flat line at some constant X.

There is a fiction here, and that is that we can think about these issues at all in the world of a computer, which has a limited amount of storage (which means there exists some I past which it is no longer possible to even access elements as we cannot allocate an array that large); but it isn't due to cache misses and has nothing to do with the kind of predictable real-time constraints you describe: those are constant multipliers on the complexity. That is when you enter the world of physics, which is arguably more interesting.


This mainly shows that the author mistakes the purpose of Big-O notation. It's a measure of asymptotic performance, so there needs to be an asymptote to reach. If the range of values you're measuring is small (and a non-constant factor that varies between 1 and 6 is small) then it's the fact that one uses Big-O to characterize the algorithm that is wrong.

It's a classic mistake. There are many pitfalls to performance measurements, including using Big-O when the range is small, or ignoring the constant factor when the constant is big. Both of these have led people down the wrong path, for example using a theoretically faster algorithm that is in reality slower because in the typical use case the constants outweigh the complexity factor.

The counter-example to the author's argument would be to say that a binary decision is linear over the input range.


I think the problem is confusion as to whether it's being applied to the algorithm, or the implementation. The algorithm is not effectively constant time, as n is infinite. The implementation, which has an upper bound for n, is effectively constant because of that limitation.


> and a non-constant factor that varies between 1 and 6 is small

Wouldn't the 6 in O(((n^1000)^n)^6) be a non-constant factor that makes a big difference in asymptotic performance though?


This is not a constant factor, this is an exponent. Constant factors are what stands in front of the whole term, like this: O(7*((n^1000)^n)^6), 7 being the constant factor.


I think the author missed an opportunity to point out what I feel is the most convincing argument against labeling these operations as "Effectively Constant". The only reason we drop constant factors out of Big-O expressions is because we measure as N approaches infinity. Once you start reasoning about the size of N being bounded, the rationale for dropping constant factors vanishes.

The cost of maintaining a 32-element tree node might not be measurable as N approaches infinity, but for N < 2 ^ 32 it is, especially in relation to the cost of maintaining 2-element tree nodes. Once you consider constants, the runtimes are likely comparable between a binary tree with depth 32 and a 32-ary tree with depth 6 (with a skew towards the 32-ary tree for locality of access).
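
A rough worked comparison for N = 2^30, keeping the constants that big-O would drop:

    \text{binary tree: } \log_2 2^{30} = 30 \ \text{levels} \times C_\text{2-slot node} \qquad \text{32-ary tree: } \log_{32} 2^{30} = 6 \ \text{levels} \times C_\text{32-slot node}

The 32-ary tree walks a fifth as many levels but touches a 32-slot node at each one, so with the constants kept the totals land in the same ballpark, with the skew toward the 32-ary layout coming from locality.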


TL;DR:

Scala doc claims that their algorithm is O(log32(N)), where N is not greater than 2^31, so it's O(6)=O(1) -- "effectively constant".

Author claims that then all real-world algorithms are constant, for example bubble sort is O(N^2), where N is not greater than 2^31, so it's O(2^62)=O(1) -- also "effectively constant".


Well put! Taking the author's basic proposition to its logical conclusion shows that faulting Scala for this is wrong.

Scala has illuminated the author to a pedantic dissonance in academia: in this case, that true asymptotic analyses should be using O(logn) instead of O(1) for random memory access.

Hence why algorithms like Hoare's quicksort are so fast: they actually are Theta(nlogn) in practice because linear memory access looks like O(1) on modern machines.

(but again pedantically it is probably still O(logn) from gate depth of multiplexing/demultiplexing the architecture's pointer size.)

Edit: Remembering that O((logn)^2) = O(logn), I suppose the author is probably correct.


Seems like a pedantic article, but I liked his remark that the 'effectively constant' argument can of course be extended to any algorithm where the input size is bounded and therefore shouldn't really be used to describe your algorithmic complexities, even in real life implementations (except when you maybe use the inverse Ackermann function or something).


This is a pet peeve of mine in interviews when asked what the big O of hash table lookup is and I say "you're probably looking for O(1), but it's really O(log n)." The assumption is that you can hash a word sized quantity and access the corresponding memory in constant time, but the whole point of Big O is asymptotic complexity, and just taking the hash of n bits is going to take O(n) time. Now I get called a pedant for this, but really log n time is for most purposes close enough to constant time to be negligible for practical purposes. Log n factors are unlikely to determine whether an algorithm is feasible, as the constant factor is usually way more significant.


Be careful here: you hash the element, and granted, that may take O(n) where n is the number of bits in the element, but n usually denotes the number of elements in a list, which can obviously be much larger and is thus far more important. EDIT: in part because the number of bits in each element doesn't "grow"; it's usually fixed.


By now certain elements of a modern programming environment have become pretty apparent (vectors and maps, for example). Could there be something gained by supporting these directly in the architecture?


Labelling an operation -- the number of steps of which is upper-bounded by 6 -- "effectively constant" is different from considering any real world algorithm with bounded input size.

Calling Scala's immutable vector ops "effectively constant" is not a stretch/hack by any means, considering we also say the same for integer addition and multiplication.


> Calling Scala's immutable vector ops "effectively constant" is not a stretch/hack by any means, considering we also say the same for integer addition and multiplication.

This rationale ad absurdum means we can call an in-memory linear search "effectively constant" because we have an upper-bound of 2^32 or 2^64, depending on processor architecture.

Yes, log32 is very nice. Even log2 is pretty nice though, and we don't call that "effectively constant", because we already have a much more precise and well defined term for it - "logarithmic" - which the docs also use: http://docs.scala-lang.org/overviews/collections/performance...

You might be able to tell me with a straight face that labeling log32 "effectively constant" and log2 "logarithmic" is "correct enough". But can you tell me with that same straight face, that labeling it "Log32" (or even just "Log") isn't just as correct, more informative, and less confusing?


This is why I've never liked stating that something has "run-time of O(log(n))" since that's rarely true — the assumption for that is that all machine instructions take pretty much the same time, which is not the case. CPU instructions involving cache misses are multiple orders of magnitude more expensive than others.

I think it makes much more sense to talk about concrete operations (or cache misses) instead. Sounds like their implementation has O(log(n)) cache misses.


Has anyone actually benchmarked the vector lookup for various vector sizes in scala?

That would straightforwardly determine whether the runtime is "effectively constant" or "effectively logarithmic", e.g. because it is really O(a + b*log(n)) with a >> b.
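
A minimal sketch of what such a measurement could look like (crude System.nanoTime timing rather than a proper JMH benchmark, so expect noise from JIT warmup and GC; the sizes and lookup counts here are arbitrary):

    object VectorLookupBench {
      // Average nanoseconds per random Vector lookup at a given size.
      def nsPerLookup(size: Int, lookups: Int = 1000000): Double = {
        val v   = Vector.tabulate(size)(i => i)
        val rng = new scala.util.Random(42)
        val idx = Array.fill(lookups)(rng.nextInt(size))
        var sink = 0
        val t0 = System.nanoTime()
        var i = 0
        while (i < lookups) { sink += v(idx(i)); i += 1 }
        val dt = System.nanoTime() - t0
        if (sink == Int.MinValue) println(sink) // keep `sink` live so the loop isn't optimized away
        dt.toDouble / lookups
      }

      def main(args: Array[String]): Unit =
        for (size <- Seq(1 << 10, 1 << 14, 1 << 18, 1 << 22))
          println(f"size = $size%8d   avg lookup = ${nsPerLookup(size)}%6.1f ns")
    }

If lookups are effectively constant the last column should stay flat as the size grows; if they are effectively logarithmic it should creep upward in small steps as the trie gains levels and the working set falls out of cache.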


Sounds like the author did — http://www.lihaoyi.com/post/BenchmarkingScalaCollections.htm...

Notably, Vector index access takes 4,260,000 ns for a vector with 1,048,576 elements, versus a measured 0 ns for native arrays. That is about +4 ns extra per element and hints at the whole data set still fitting into L2 cache.

If the process ends up accessing more working memory than fits into L2 (32 or 64 MB or so) and lookups aren't nicely bundled together, the overhead will approach about +80 ns per element access. Or +0.08 seconds per million element accesses.

It does seem unlikely that every element access would end up causing a cache miss since I'd expect lookups to happen close together, but this can be significant for intense workloads such as OLAP.


Based on the numbers for lookup, it looks worse than that -- it seems to be linear (r2 = .9988).

    # s = collection size, t = total ns to look up every element (from the linked benchmark)
    df <- data.frame(
      s=c(0,1,4,16,64,256,1024,4096, 16192, 65536,
          262144, 1048576),
      t=c(0, 1, 5, 17, 104, 440, 1780, 8940, 38000,
          198000, 930000, 4260000))
    > summary(lm(t ~ s + 0, data=df))
    ...
    Coefficients:
      Estimate Std. Error t value Pr(>|t|)
    s  4.02825    0.04161   96.82   <2e-16 ***

    Residual standard error: 45060 on 11 degrees of freedom
    Multiple R-squared:  0.9988,    Adjusted R-squared:  0.9987
    F-statistic:  9374 on 1 and 11 DF,  p-value: < 2.2e-16


That's because the program is looking up every item. It's _right there in the text_:

> Note that this time is measuring the time taken to look up every element in the collection, rather than just looking up a single element.

http://www.lihaoyi.com/post/BenchmarkingScalaCollections.htm...


Shame on me for not looking more closely!

So when you look at the per-operation cost (t/s in my model) then what emerges is, indeed, logarithmic (r2 ~ .96 for lm(t/s ~ s + log(s))). Bounded, for sure, but if you walk in with a vector-indexing-bound but array-size-independent algorithm expecting that the performance for 1e3 elements will be representative of the behavior for 1e6, you're in for a surprise.


log_32(x) is only a constant factor of 5 off of log_32(x). If we are going to call log_32(collection_size) "effectively constant", we may as well call any log-complexity operation "effectively constant". If you call the op "logarithmic-complexity", the "effective-constant" property should be just as clear to someone who knows what's up with time complexities and logarithms plus you gain the benefit of being technically accurate. Actually calling log_2 ops "logarithmic" and log_32 ops "effectively constant" crosses the boundary into being actively misleading.


I think the point is more that the implementation is effectively constant time, even if the algorithm isn't. In the Scala implementation, vectors have a maximum size of 2^32. So we know that a lookup won't need to walk more than 6 levels deep, no matter what.

Hash tables are usually considered to have constant time lookup, but of course in practice they have to deal with collisions. I don't think it's unreasonable to say that Scala's vector lookups are "effectively" constant time, because they are.
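
To make the "6 levels" point concrete, here is a simplified sketch of the radix-32 digit decomposition that bounds the walk; it is not Scala's actual Vector internals (which carry additional optimizations), just the indexing idea:

    // Split an index into 5-bit (radix-32) digits, one per trie level.
    // Any index below 2^30 yields at most 6 digits, so a lookup descends
    // through at most 6 nodes no matter how large the vector is.
    def radix32Path(index: Int): List[Int] = {
      val bits   = 32 - Integer.numberOfLeadingZeros(index)
      val levels = math.max(1, (bits + 4) / 5)          // ceil(bits / 5)
      (levels - 1 to 0 by -1).toList.map(l => (index >>> (5 * l)) & 31)
    }

    radix32Path(1048575)  // == List(31, 31, 31, 31): a 20-bit index needs only four digits

Each digit is just an array index at one level of the trie, which is where the "at most a handful of array accesses" figure comes from.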


That's definitely a fair way to describe things (even if it is strictly less precise than just saying "log_32") but if you're going to do that, you should call all of the logarithmic-in-size operations "effectively constant". Having a bounded max of e.g. 30 lookups isn't markedly different from having a bounded max of 6. Basically... my claim is that it's misleading for the Scala docs to draw the distinction where they've drawn it.

Edit: Ah! I messed up the first sentence of my original comment but it seems to be too late to edit it! I meant to say "log_2(x) is only a constant factor of 5 off of log_32(x)".


> Basically... my claim is that it's misleading for the Scala docs to draw the distinction where they've drawn it.

Yeah, I see where you're coming from.

I guess it would be constant time if they made sure lookups always took as long as the worst case :)


Wow, you really showed that straw man what's what.



