For scientific search experiments, you might consider PyTerrier, which makes it easy to compare multiple retrieval model types: (sparse) vector space, Boolean, binary probabilistic, support vector learning-to-rank, Divergence from Randomness, (dense) embedding-based ranked retrieval, and so on.
The pipeline so far has gone like this:
* Use the search engine's API to query a bunch of depravity
* Use qwen3.5 to label the search results and generate training data
* Try to use fasttext to create a fast model
* Get good results in theory but awful results in practice because it latches onto spurious features
* Yolo implement a small neural net using hand-selected input features instead
* Train it on the fasttext training data
* Do a pretty good job
* for (;;) Apply the model to a real-world link database and relabel positive findings with qwen to provide more training data
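For anyone curious what the labeling step produces: fasttext's supervised mode expects one example per line, with the label encoded as a `__label__` prefix. A minimal sketch of turning LLM judgments into that format (the `results` list and label names here are made up for illustration):

```python
# Hypothetical sketch: serialize (label, text) pairs into fastText's
# supervised training format, "__label__<name> <text>", one per line.

def to_fasttext_line(label: str, text: str) -> str:
    # fastText treats newlines as example separators, so flatten whitespace
    cleaned = " ".join(text.split())
    return f"__label__{label} {cleaned}"

# illustrative stand-ins for the LLM-labeled search results
results = [
    ("positive", "some search result snippet\nwith a newline"),
    ("negative", "an innocuous page about cooking"),
]

lines = [to_fasttext_line(lbl, txt) for lbl, txt in results]
print("\n".join(lines))
```

Writing this file once means both the fasttext baseline and any replacement model can be trained from the same labeled data.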
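The last two steps (hand-selected features, then the endless relabel loop) can be sketched roughly like this. I've used plain logistic regression as a stand-in for the small net, and the feature choices, threshold, and function names are all assumptions, not the author's actual setup:

```python
import math

def features(url: str) -> list[float]:
    # hand-selected input features instead of raw text (illustrative only)
    return [
        len(url) / 100.0,                      # normalized length
        float(url.count("-")),                 # hyphen count
        1.0 if url.endswith(".xyz") else 0.0,  # suspicious TLD flag
    ]

def predict(w: list[float], b: float, x: list[float]) -> float:
    # sigmoid over a weighted sum; a small net would just stack this
    z = b + sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

def train(data, epochs=200, lr=0.1):
    # plain SGD on logistic loss over (features, label) pairs
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            g = predict(w, b, x) - y
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def harvest_positives(w, b, urls, threshold=0.5):
    # the for (;;) step: scan the link DB, keep high-scoring hits,
    # then hand those back to the LLM for relabeling as new training data
    return [u for u in urls if predict(w, b, features(u)) >= threshold]

# toy separable data: one flagged URL, one innocuous one
data = [
    (features("http://spam-spam.xyz"), 1.0),
    (features("http://example.com"), 0.0),
]
w, b = train(data)
```

Because the features are hand-picked and interpretable, it's much easier to see why a false positive scored high than with a bag-of-words model.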
This is where I'm currently at. There's a lot of vague middle ground, and many of the false positives are arguably just mislabeled.