trickleup's comments

CodeGate is designed to make AI agent applications and coding assistants safer and easier to use and manage. It provides a centralized, abstracted environment for managing prompts, model provider configurations, keys, model muxing, and more. Additionally, CodeGate offers protections against the leakage of personally identifiable information and of tokens, keys, and secrets.


Hey Folks!

We folks over at Stacklok needed a way to generate large synthetic datasets using a local LLM rather than, say, OpenAI or another cloud service. So we built Promptwright, a Python library that lets you generate synthetic datasets using local models via Ollama.

Why we built it:

* We were using OpenAI's API for dataset generation, but the costs were getting too high for large-scale experiments.
* We looked at existing solutions like pluto, but they were only capable of running on OpenAI. This project started as a fork of [pluto](https://github.com/redotvideo/pluto), but we soon extended and changed it so much that it was practically new. Still, kudos to the redotvideo folks for the idea.
* We wanted something that could run entirely locally, which would mean no concerns about leaking private information.
* We wanted the flexibility to use any model we needed.

What it does:

* Runs entirely on your local machine using Ollama (works great with llama2, mistral, etc.)
* Super simple Python interface for dataset generation
* Configurable instructions and system prompts
* Outputs clean JSONL format that's ready for training
* Direct integration with Hugging Face Hub for sharing datasets
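
For anyone unfamiliar with JSONL, here is a minimal sketch of the format using only the Python standard library. It does not use Promptwright's API, and the `prompt`/`response` field names and the `to_jsonl`/`from_jsonl` helpers are assumptions made up for illustration, not Promptwright's actual schema.

```python
import json

# Hypothetical samples; the "prompt"/"response" field names are assumptions,
# not Promptwright's actual output schema.
samples = [
    {"prompt": "Explain list comprehensions.",
     "response": "A list comprehension builds a list from an iterable."},
    {"prompt": "What is a generator?",
     "response": "A generator yields values lazily."},
]


def to_jsonl(records):
    """Serialize records as JSONL: one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records) + "\n"


def from_jsonl(text):
    """Parse JSONL text back into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


jsonl_text = to_jsonl(samples)
print(jsonl_text, end="")
```

Because each line is an independent JSON object, a file like this can be streamed or appended to one sample at a time, which is handy for large runs.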

We've been using it internally for a few projects, and it's been working well. You can process thousands of samples without the worry of API costs or rate limits. Plus, since everything runs locally, you don't have to worry about sensitive data leaving your environment.

Check out the examples/ folder for examples of generating code, scientific writing, or creative writing.

We'd love to get feedback from the community. If you're doing any kind of synthetic data generation for ML, give it a try and let us know what you think!

GitHub: https://github.com/StacklokLabs/promptwright


Interesting little observation (take from it what you will)

If you search YouTube for "2024 financial crash" you will hit quite a few results.

If you search YouTube for "2023 financial crash" you won't find anything; they deleted them all on New Year's Day 2024.


Have you considered using the YouTube API to track these videos/channels while they are up (download the JSON, thumbnails, transcript, maybe even the whole video), then checking the YouTube IDs daily/weekly to track video/channel deletions? The data dump might make an interesting infographic, or be used to expose a grifter or a ring of shill accounts.
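
A rough sketch of the tracking part, assuming you already have a list of video IDs per day from whatever source you use. The fetching step is left out on purpose, and the IDs, dates, and the `record_snapshot`/`diff_snapshots` helpers below are made up for illustration.

```python
import json
from datetime import date


def record_snapshot(history, day, video_ids):
    """Store the de-duplicated, sorted video IDs seen on a given day."""
    history[day.isoformat()] = sorted(set(video_ids))
    return history


def diff_snapshots(previous_ids, current_ids):
    """Return which IDs disappeared and which appeared between two snapshots."""
    previous, current = set(previous_ids), set(current_ids)
    return {
        "removed": sorted(previous - current),
        "added": sorted(current - previous),
    }


# Hypothetical data: placeholder IDs standing in for real search results.
history = {}
record_snapshot(history, date(2023, 12, 31), ["vidA", "vidB", "vidC"])
record_snapshot(history, date(2024, 1, 1), ["vidC", "vidD"])

days = sorted(history)  # ISO dates sort chronologically as strings
changes = diff_snapshots(history[days[0]], history[days[1]])
print(json.dumps(changes))
```

An ID dropping out of a snapshot doesn't prove deletion (the video may have gone private or simply fallen out of the results), so anything in `removed` would still need a per-video check.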


Quora did a plenty good job of ruining itself, way before AI did.


Mihai is an ML engineer who works at Google.


The post could have been made clearer. Also, I'm surprised, but they are good at scale.


Why, exactly? Do you not have the ability to take information and draw conclusions on your own -- like a functionally mindless automaton lacking a frontal lobe?


It wasn't made clear.

> take information and draw conclusions on your own

That's exactly what I did. I don't approve of your kindest choice of words; nonetheless you, being a lucky sperm, also have the right to free speech.


I wonder if that will one day help sell the house to some Trekkie sci-fi enthusiast.


There's no fi in this, they can just enjoy science.


Lots of teams get thrashed trying to fix or bring down their CVE count so they can ship, so Chainguard provides images with a guarantee of 0 CVEs. It saves folks a lot of time spent patching to bring down the count. However, it's a novel situation, as most of the time these vulnerabilities are not even reachable in the first place; they are just noise. So it's a solution that appeals more to security theatre than to a real-world threat. Once in a while a nasty thing comes along, like Log4Shell or Heartbleed, but most of it is just noise. They do cut down image size significantly though, which is something I personally like and which has value for saving ingress costs. Think Alpine.

