More

vincelt · 2025-11-18T22:10:51 1763503851

It's 2025 and a code editor is now 600MB.

vincelt · 2025-09-08T13:23:13 1757337793

Agreed. For immersing at home, reading mangas/webtoons with an OCR translated layer and watching Japanese vlogs with dual subtitles has been effective for me.

vincelt · 2025-09-02T11:04:46 1756811086

Found this repo the other day in case that's useful: https://github.com/Xuanwo/acp-claude-code

vincelt · 2025-06-26T06:43:03 1750920183

Agreed. If there are provider-specific differences then we should be able to add CLAUDE/GEMINI.md to take precedence, but most things in there are basic or general information and should apply to all providers (same for .cursor/, .windsurfrules/, ..)

vincelt · on April 29, 2024

Still fails to solve this one:

Prompt:

> Complete the following Python program:

```

len, print = print, len

def print_len(x):

     "Print the length of x"

```

gpt2-chatbot:

```

len, print = print, len # Swap the functionalities of print and len

def print_len(x): "Print the length of x"

    len(x)  # This 'len' is actually 'print' because of the swap above

print_len("Hello, world!") # This will output: 13

```

rany_ · on April 29, 2024

That's the same output as gpt-4-turbo for me, so no real improvement for that particular prompt.

hgomersall · on April 29, 2024

That's not bad though!

vincelt · on Sept 9, 2023

Have you tried replicating via the API with a temp of 0?

belter · on Sept 9, 2023

No I did not.

moffkalast · on Sept 9, 2023

Using a temp of zero usually returns garbage results from most models, so it would likely do so in case of GPT 4 as well. Any other great ideas?

taberiand · on Sept 9, 2023

The point isn't that temp 0 should be used, the point is that anyone surprised that they get different results should realise that there is an element of randomness involved by default.

Even repeating the same question in a single chat can have GPT-4 vary on its output, though it will often settle on a particular output due to context informing the output (which is why adding context is so important for these models)

afro88 · on Sept 9, 2023

Temp of 0 gives the least random and most predictable results

moffkalast · on Sept 9, 2023

That's true, but those results are rarely the correct ones, at least for v1 llama models. In my experience each model has an optimal temperature at which it performs vastly better. I'm sure OpenAI have the best config they know set up for ChatGPT but let people generate trash through the API if they want to waste their credits on it.

dontreact · on Sept 9, 2023

Why would the accuracy decrease with lower temperature? Setting temperature to 0 just means at each step the model will emit the token with the highest likelihood.

moffkalast · on Sept 10, 2023

Yes that's what I'm saying, to reiterate: The likeliest token does not lead to the highest performing result. Otherwise temperature wouldn't even be an option. I would imagine things like language word frequency affect the token rating a lot while having nothing to do with the task at hand except providing a correctly formatted answer, but it's probably not the whole story.

OpenAI (and others that know what they're doing) always do their benchmarks in a multi-sampled way, by running 5 or 20 times at optimal temp. Using a wrapper that runs these samples and then another pass that judges self-consistency for a final answer can give you a correct answer 100% of the time for a question that would be wrong 100% of the time with temp at zero.

lostmsu · on Sept 11, 2023

I had a conversation with a friend regarding this exact question and my understanding is that model trains to optimize the distribution of all texts, therefore when you restrict it to deterministic sampling that is not representative of inputs you select the slice of the distribution that model learned that conveys much less information than the full distribution, and hence has poorer results.

Grimblewald · on Sept 10, 2023

Not in my experience, in fact I find that when I need precise, realistic, and reliable results temp 0 is needed. For example, here is a bunch of names, gather the names of specific plastics under headings matching their common acronym - if I don't use temp 0 I might get nonsense out. Temp 0? reliably correct.

moffkalast · on Sept 10, 2023

Interesting, that's the exact opposite of my experience.

circuit10 · on Sept 9, 2023

What do you mean? It works fine for me when I’ve tried it

vincelt · on Sept 9, 2023

Oh neat, thanks for sharing, wanted to add an interpreter to that test

vincelt · on Sept 9, 2023

I used the API from Together[0].

Thanks for sharing your results, they're indeed pretty different. I looked at the source again and did append a "# " before every prompt made by those 10 `code` models (during testing thought that formatting it as a Python comment might help them).

Will re-run the script without that to see if it matches your results.

[0] https://docs.together.ai/docs/models-inference#code-models

vincelt · on Sept 9, 2023

Thanks. I haven’t put it online yet, but will try to clean it (removing API keys & all) tonight/tomorrow and publish it

jmorgan · on Sept 9, 2023

:-) that's awesome. Thanks! Nice work on this.

vincelt · on Aug 25, 2023

Interesting, what about it makes you sleep better?