More

fulmicoton · 2025-07-17T02:04:51 1752717891

One trouble I could see with your approach is that you treat the information "Doc at pos i" beats "Doc at pos j" independently from i and j. Intuitively, it is not as critical when a bad doc is at rank 9 instead of rank 10; compared to bad doc landing at rank 1 instead of rank 10.

LambdaMART's approach seems better in that respect.

https://medium.com/@nikhilbd/pointwise-vs-pairwise-vs-listwi...

fulmicoton · on Feb 20, 2025

This bug hit us, and yes, I hadn't thought of just switching to opendal. That's indeed a great reminder.

fulmicoton · on Jan 12, 2025

No. Quickwit was founded well before Warpstream and it did not inspire us.

The Husky blog post was released after we released a few versions of quickwit if I recall correctly. It was not an inspiration either.

As far as I know, the similarities are fortuitous.

fulmicoton · on Jan 12, 2025

Our seed round was 100% made of SAFE, so VCs did not have the power to force us to do anything.

The sentence in the blog post is a tad misleading. I suspect François is not really talking about VCs that had already invested in quickwit, but about the usual flow of other VCs who contacted us, to know about the company and be part of our eventual series A.

It just generally felt like we were "at a crossing".

No one twisted our arm.

propter_hoc · on Jan 12, 2025

Thanks for the clarification, and sorry for jumping to an incorrect conclusion based on vague wording. (I would edit my comment accordingly but I can't anymore.)

fulmicoton · on Dec 3, 2024

Developer of tantivy chiming in! (I hope that's ok) Database performance is a space where there are a lot of lies and bullshit, so you are 100% right to be suspicious.

I don't know SeekStorm's team and I did not dig much into the details, but my impression so far is that their benchmark's results are fair. At least I see no reason not to trust them.

PSeitz · on Dec 3, 2024

Also we are working on some performance improvements based on the benchmark comparison, as they highlighted some areas we can improve in tantivy.

fulmicoton · on Oct 11, 2024

Yes. We should shut down this demo. We reduced the hardware to cut down our costs. Right now it runs a ludicrously small amount of hardware.

fulmicoton · on July 12, 2024

Quickwit is targetting logs:

    - it does not do vector search. It can rank docs using BM25, but usually people just want to sort by timestamp.
    - its does not use an SSD cache. Quickwit reads directly into the object storage.
    - it is append-only (you can't modify documents)
    - it scales really well and typically shines on the 1TB .. 100PB range
    - it has a Elastic search compatible API.

fulmicoton · on July 12, 2024

This is NOT about transaction log. This is application logs. The thing you generate via Log4j for instance.

Also 100PB is measured as the input format (JSON). Internally Quickwit will have more efficient representations.

jcgrillo · on July 12, 2024

yeah I think I showed that pretty clearly

fulmicoton · on July 12, 2024

Security and customer support are the two main reasons why people want a super long retention.

Medium retention (1 or 2 months) is still very appreciable if some issue in your bugtracker stay stale for this amount of time.

fulmicoton · on July 12, 2024

It is pretty much the same as Lucene. The compression ratio is very specific logs and depends on the logs themselves. (Often it is not that good)

jcgrillo · on July 12, 2024

Exactly! Which is again one of the reasons it's confusing that people apply full text search technology to logs. Machine logs are quite a lot less entropic than human prose, and therefore can be compressed a whole lot better. A corrollary is that because of the redundancy in the data "grepping" the compressed form can be very fast, so long as the compression scheme allows it.

If the query infrastructure operating on these compressed data is itself able to store intermediate results, then we've killed two birds with one stone because we've also gotten rid of the restrictive query language. That's how cascading mapreduce jobs (or Spark) does it, allowing users to perform complex analyses that are entirely off the table if they're restricted to the lucene query language. Imagine a world where your SQL database was one giant table and only allowed you to query it with SELECT. That's pretty limiting, right?

So as a technology demonstration of Quickwit this seems really cool--it can clearly scale!--but it's kind of also an indictment of Binance (and all the other companies doing ELKish things out there).