Hacker News | tmcdonald's comments

Ollama has a num_ctx parameter that controls the context window length - it defaults to 2048. At a guess you will need to set that.


This is a harsh foot-gun that seems to harm many ollama users.

That 2k default is extremely low, and ollama *silently* discards the leading context. So users have no idea that most of their data hasn’t been provided to the model.

I’ve had to add docs [0] to aider about this, and aider overrides the default to at least 8k tokens. I’d like to do more, but unilaterally raising the context window size has performance implications for users.

Edit: Ok, aider now gives ollama users a clear warning when their chat context exceeds their ollama context window [1].

[0] https://aider.chat/docs/llms/ollama.html#setting-the-context...

[1] https://github.com/Aider-AI/aider/blob/main/aider/coders/bas...
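The gist of that kind of warning is simple to sketch. Below is a minimal, hypothetical version (not aider's actual implementation) that uses a crude 4-characters-per-token heuristic instead of a real tokenizer:

```python
# Rough sketch of a client-side guard against Ollama's silent truncation.
# The 4-chars-per-token ratio is a crude heuristic, not a real tokenizer.

def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters per token."""
    return len(text) // 4

def check_context_fits(chat_text: str, num_ctx: int = 2048) -> bool:
    """Warn when the chat likely exceeds the model's context window."""
    needed = estimate_tokens(chat_text)
    if needed > num_ctx:
        print(f"warning: chat needs ~{needed} tokens but num_ctx is {num_ctx}; "
              "leading context will be silently dropped")
        return False
    return True

check_context_fits("x" * 40_000)          # ~10k tokens vs the 2k default: warns
check_context_fits("x" * 40_000, 16384)   # fits once num_ctx is raised
```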


There are several issues in the Ollama GitHub issue tracker related to this, like this[1] or this[2].

Fortunately it's easy to create a variant of the model with increased context size using the CLI[3] and then use that variant instead.

Just be mindful that longer context means more memory required[4].

[1]: https://github.com/ollama/ollama/issues/4967

[2]: https://github.com/ollama/ollama/issues/7043

[3]: https://github.com/ollama/ollama/issues/8099#issuecomment-25...

[4]: https://www.reddit.com/r/LocalLLaMA/comments/1848puo/comment...
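For reference, the same num_ctx override can also be baked into a variant non-interactively with a Modelfile (the model and variant names here are just examples):

```shell
# Create a Modelfile that inherits the base model and raises num_ctx
cat > Modelfile <<'EOF'
FROM llama3.2
PARAMETER num_ctx 32768
EOF

# Build the variant, then run it in place of the base model
ollama create llama3.2-32k -f Modelfile
ollama run llama3.2-32k
```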


Thank you! I was looking for how to do this. The example in the issue above shows how to increase the context size in ollama:

    $ ollama run llama3.2
    >>> /set parameter num_ctx 32768
    Set parameter 'num_ctx' to '32768'
    >>> /save llama3.2-32k
    Created new model 'llama3.2-32k'
    >>> /bye
    $ ollama run llama3.2-32k "Summarize this file: $(cat README.md)"
    ...
The table in the reddit post above also shows context size vs memory requirements for the model 01-ai/Yi-34B-200K (34.395B params, inference mode):

    Sequence Length vs Bit Precision Memory Requirements
       SL / BP |     4      |     6      |     8      |     16
    --------------------------------------------------------------
           256 |     16.0GB |     24.0GB |     32.1GB |     64.1GB
           512 |     16.0GB |     24.1GB |     32.1GB |     64.2GB
          1024 |     16.1GB |     24.1GB |     32.2GB |     64.3GB
          2048 |     16.1GB |     24.2GB |     32.3GB |     64.5GB
          4096 |     16.3GB |     24.4GB |     32.5GB |     65.0GB
          8192 |     16.5GB |     24.7GB |     33.0GB |     65.9GB
         16384 |     17.0GB |     25.4GB |     33.9GB |     67.8GB
         32768 |     17.9GB |     26.8GB |     35.8GB |     71.6GB
         65536 |     19.8GB |     29.6GB |     39.5GB |     79.1GB
        131072 |     23.5GB |     35.3GB |     47.0GB |     94.1GB
    *   200000 |     27.5GB |     41.2GB |     54.9GB |    109.8GB

    * Model Max Context Size
Code: https://gist.github.com/lapp0/d28931ebc9f59838800faa7c73e3a0...
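The KV cache portion of those figures can be estimated with the standard back-of-the-envelope formula. The architecture values below are assumptions for Yi-34B (60 layers, 8 KV heads under grouped-query attention, head dim 128), and the table additionally includes the model weights themselves, so the numbers won't match exactly:

```python
# Back-of-the-envelope KV cache sizing:
#   bytes = 2 (K and V) * layers * kv_heads * head_dim * seq_len * bytes_per_value
# Architecture values are assumed for Yi-34B; actual usage depends on the runtime.

def kv_cache_bytes(seq_len: int, layers: int = 60, kv_heads: int = 8,
                   head_dim: int = 128, bytes_per_value: int = 2) -> int:
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value

for seq in (2048, 32768, 200_000):
    gb = kv_cache_bytes(seq) / 1024**3
    print(f"{seq:>7} tokens -> {gb:6.1f} GB of fp16 KV cache")
```

At 200k tokens this lands around 46GB of fp16 cache, which lines up with the ~46GB gap between the 256 and 200000 rows in the 16-bit column above.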


Can context be split on multiple GPUs?


Not my field, but from this[1] blog post, which references this[2] paper, it would seem so. Note that the optimal approaches are a bit different between training and inference. Also note that several of the approaches rely on batching multiple requests (prompts) to exploit the parallelism, so you won't see the same gains if fed only a single prompt at a time.

[1]: https://medium.com/@plienhar/llm-inference-series-4-kv-cachi...

[2]: https://arxiv.org/abs/2104.04473


Huh! I had incorrectly assumed that was for output, not input. Thanks!

YES that was it:

  files-to-prompt \
    ~/Dropbox/Development/llm \
    -e py -c | \
  llm -m q1m 'describe this codebase in detail' \
   -o num_ctx 80000
I was watching my memory usage and it quickly maxed out my 64GB so I hit Ctrl+C before my Mac crashed.


Sorry this isn't more obvious. Ideally VRAM usage for the context window (the KV cache) becomes dynamic, starting small and growing with token usage, whereas right now Ollama defaults to a size of 2K which can be overridden at runtime. A great example of this is vLLM's PagedAttention implementation [1] or Microsoft's vAttention [2] which is CUDA-specific (and there are quite a few others).

1M tokens will definitely require a lot of KV cache memory. One way to reduce the memory footprint is to use KV cache quantization, which has recently been added behind a flag [3] and will cut the memory footprint to roughly a quarter if 4-bit KV cache quantization is used (OLLAMA_KV_CACHE_TYPE=q4_0 ollama serve)

[1] https://arxiv.org/pdf/2309.06180

[2] https://github.com/microsoft/vattention

[3] https://smcleod.net/2024/12/bringing-k/v-context-quantisatio...
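The "roughly a quarter" is worth spelling out: llama.cpp's q4_0 format stores blocks of 32 values as 4-bit quants plus a 2-byte fp16 scale, i.e. 18 bytes per 32 values, or 4.5 effective bits per value versus 16 for f16:

```python
# Effective bits per cached value: f16 vs llama.cpp's q4_0 block format
# (one block = 32 values: a 2-byte fp16 scale plus 32 four-bit quants = 18 bytes).
F16_BITS = 16
Q4_0_BITS = 18 * 8 / 32  # 4.5 bits per value

print(f"q4_0 uses {Q4_0_BITS} bits/value -> "
      f"{F16_BITS / Q4_0_BITS:.2f}x smaller than f16")
```

So the real reduction is about 3.6x rather than exactly 4x, due to the per-block scale overhead.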


I think Apple stumbled into a problem here, and I hope they solve it: reasonably priced Macs are -- by the new standards set by modern LLMs -- severely memory-constrained. MacBook Airs max out at 24GB. MacBook Pros go to 32GB for $2200, 48GB for something like $2800, and to get to 128GB requires shelling out over $4000. A Mini can get you to 64GB for $2000. A Mac Studio can get you to 96GB for $3000, or 192GB for $5600.

In this LLM era, those are rookie numbers. It should be possible to get a Mac with a lesser processor but at least 256GB of memory for $2000. I realize part of the issue is the lead time for chip design -- since Mac memory is an integral part of the chip, and the current crop were designed before the idea of running something like an LLM locally was a real possibility.

But I hope the next year or two show significant increases in the default (and possible) memory for Macs.


> It should be possible to get a Mac with a lesser processor but at least 256GB of memory for $2000.

Apple is not known for leaving money on the table like that.

Also, projects like Nvidia DIGITS ($2k for 128GB) might make Apple unwilling to enter the market. As you said, the Studio with 192GB is $5600. For purely AI purposes, two DIGITS units are a better choice, and non-AI usage doesn't need such a ludicrous amount of RAM (maybe for video, but those customers are willing to pay more).


> Apple is not known for leaving money on the table like that.

True -- although I will say the M series chips were a step change in performance and efficiency from the Intel processors they replaced, and Apple didn't charge a premium for them.

I'm not suggesting that they'll stop charging more for RAM than the industry at large -- I'm hoping they'll unbundle RAM from CPU-type. A base Mac Mini goes for $600, and adding RAM costs $200 per 8GB. That's a ridiculous premium, clearly, and at that rate my proposed Mac Mini with 256GB of RAM would go for $6600 -- which would roll my eyes until they fell out of my head.
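Spelling out that arithmetic (assuming the $600 base Mini ships with 16GB, which is my assumption here, and the $200-per-8GB upcharge from above):

```python
# Hypothetical Mac Mini pricing at Apple's current RAM upcharge:
# $600 base (assumed to include 16GB), plus $200 per additional 8GB.
BASE_PRICE, BASE_RAM_GB = 600, 16
PER_8GB = 200

def mini_price(ram_gb: int) -> int:
    return BASE_PRICE + (ram_gb - BASE_RAM_GB) // 8 * PER_8GB

print(mini_price(256))  # the eye-rolling 256GB configuration: $6600
```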

But Apple is also leaving money on the table if they're not offering a more expensive model people would buy. A 128GB Mini, let's say, for $2000, might be that machine.

All that said, it's also a heck of a future-proof machine, so maybe the designed-obsolescence crowd have an argument to make here.


This has been the problem with a lot of long context use cases. It's not just the model's support but also sufficient compute and inference time. This is exactly why I was excited for Mamba and now possibly Lightning attention.

That said, the new DCA mechanism these models use to provide long context could be an interesting area to watch.


Ollama is an "easymode" LLM runtime and as such has all the problems that every easymode thing has. It will assume things, and the moment you want to do anything interesting those assumptions will shoot you in the foot. I've found Ollama plays so fast and loose that even first-party things that "should just work" do not. For example, if you run R1 (at least as of 2 days ago when I tried this) using the default `ollama run deepseek-r1:7b`, you will get a different context size, top_p and temperature vs what DeepSeek recommends in their release post.


Ollama definitely is a strange beast. The sparseness of the documentation seems to imply that things will 'just work' and yet, they often don't.


Yup, and this parameter is supported by the plugin he's using:

https://github.com/taketwo/llm-ollama/blob/4ccd5181c099af963...


Hey Justin - this is probably one of my favourite talks!

https://testdouble.com/insights/the-selfish-programmer

Was lucky to be at the conference when you gave it and it's stuck in my head since. When I'm struggling for motivation or my side projects seem to be just too big I give it another watch.


v4 will compile with libsass by default. You can set the TWBS_SASS environment variable to use the Ruby compiler if desired.


Dota 2 has been the most popular game on Steam for a while now, and with the gradual rollout of beta keys the number playing has been steadily climbing - you can see a graph of the number of players on steamgraph[1].

[1]: http://steamgraph.net/index.php?action=graph&jstime=1&appid=...


I've had 4 emails in the past month providing information about the phishing emails from my department, JCR and IT services, and despite that a number of accounts still got compromised.

Couldn't agree more about education never actually fixing the problem.


Both of them have applied for .search.

> The .search gTLD provides Google with the opportunity to differentiate its Google Search products and services by linking them to a unique gTLD. Google will be able to quickly distinguish new products and services it develops and⁄or acquires by offering them in the proposed gTLD.

> The mission of the .SEARCH registry is to provide a unique and dedicated platform for Amazon while simultaneously protecting the integrity of its brand and reputation.

> A .SEARCH registry will:

> • Provide Amazon with additional controls over its technical architecture, offering a stable and secure foundation for online communication and interaction.

> • Provide Amazon a further platform for innovation.

> • Enable Amazon to protect its intellectual property rights.

Basically, only for their own commercial gains. The other two applicants want .search to be a place for consolidation of search related domains, which seems like an absolute pipe dream.


It was specifically mentioned a couple of times in the post-launch press conference.


It's using my bootstrap-sass gem. [1]

[1]: https://github.com/thomas-mcdonald/bootstrap-sass


Thanks Thomas! Used your gem in a project recently when we migrated an older app from a Theme Forest layout to Twitter Bootstrap. It wasn't immediately obvious which gem was the best, since it looks like a few people set out to do the same thing, but yours was the only one that was up-to-date and easy to install. We've also been getting great mileage from rails_admin, which uses your gem as well. Thanks for doing this and sticking with it!


I've been wanting to use Bootstrap with some other CSS syntaxes for a while, and I was wondering: how do you manage the conversion from Less to SCSS and stay up to date with the latest Bootstrap changes?


In general I wait until the new version is merged into master, and I use GitHub's compare view to see all the changes between the latest version and the previous one, which I keep open on one screen and go through each file updating the bits that have changed. For the Javascripts I just copy the whole folder over, since I don't fiddle with those.

The main exception to this was for 2.0, which had a separate branch for a while so that people could still use bootstrap-sass while updating their application to the new syntax. When 2.0 was merged into master, since it had been under such heavy development I reconverted the entire codebase.

There are a few quirks with the conversion from Less to SCSS, but these tend to be few and far between, and are usually down to me missing a variable somewhere. The main one is the use of namespaced mixins[1]. SCSS doesn't support this, so I have to prefix/suffix the namespace to each mixin within the namespace. Aside from that, and method names/variable notation, there appears to be (I can't speak authoritatively, since I haven't really used Less) little difference between the two.

[1]: http://lesscss.org/#-namespaces


I'm not sure all the things you list as being possible are true.

  - Every GitHub Repository could be access by anyone as if they had full administrator privileges.
  - This means that anyone could commit to master. 
  - This means that anyone could reopen and close issues in issue tracker. 
  - Even the *entire* history of a project could be wiped out. Gone forever.
As I understand it from his explanation[1] he added his public key to the Rails user, which has permissions to push/pull to the repository. This doesn't mean he had web administrative access, just Git access, since you cannot log in to the web service using your private key. I hope that's the case, at least.

[1]: http://homakov.blogspot.com/2012/03/how-to.html


The way he was able to add his key was via a web-based exploit, which effectively gave him administrative web access. So yes, the list is correct.


I thought that he added his public key to the Rails user through his own account settings, which wouldn't give him access to the Rails web admin.


This is correct. People who don't understand what a mass-assignment bug is are running with this story. It's like when we witness a DDoS and have to tolerate people who think it means that the targeted party was infiltrated.

This bug allowed one to add their public key to another user's account, and make changes to comments and issues.


What are the odds that there's a similar bug which allows changes to user accounts? If that's the case, then altering the password or email address is trivial.


FWIW, that last bullet, aside from being the most egregious example of hyperbole in TFA, shows a complete lack of understanding of how git works.


Nothing in the article would lead one to believe he didn't understand the distributed nature of git repositories, but a lot in the article would lead one to believe he was specifically referring to the data loss issues on GitHub if someone wiped out a project.


If the project was deleted from GitHub, you'd just have to create a new one and push a local clone to it. That's hardly what that point is saying.


I think this is related to an issue he opened on Rails[1] which would suggest that GitHub isn't protecting against malicious mass assignment.

By default, if you have a new, create or update_attributes (and more, I imagine) call which changes various attributes based on a hash from parameters (e.g. params[:post], where you have params[:post][:title], params[:post][:body] etc.), Rails allows mass assignment of every attribute on that model, since attr_accessible is not called.

There is a method you can call in the model called attr_accessible that restricts the columns that can be updated through mass assignment, while still allowing for manual assignment of other columns.

An example of this might be a post's user_id, which you would usually want to set to the current user while not allowing mass assignment. Without specifying attr_accessible, if a malicious user added params[:post][:user_id] to their POST/PUT, the Rails application would update the user_id as per the params value. If attr_accessible had been called, defining the columns the developer wanted to be mass assignable (say title and body), the user_id would not be mass assigned and Rails would log that this was the case.

attr_accessible therefore acts as a whitelist for columns that can be mass assigned. It just so happens that the Rails default is to have no whitelist and allow all columns to be mass assigned, despite the fact that the sensible option is to always have a call to attr_accessible in your models.

[1]: https://github.com/rails/rails/issues/5228
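Rails specifics aside, the whitelist idea is easy to sketch language-agnostically. Here's an illustrative Python analogue of attr_accessible (made-up names, not the Rails API):

```python
# Sketch of mass assignment with a whitelist, mimicking Rails' attr_accessible.
# Names here are illustrative, not real Rails/ActiveRecord API.

class Post:
    ACCESSIBLE = {"title", "body"}  # the attr_accessible-style whitelist

    def __init__(self):
        self.title = self.body = None
        self.user_id = None  # set explicitly by the app, never from params

    def update_attributes(self, params: dict):
        for key, value in params.items():
            if key in self.ACCESSIBLE:
                setattr(self, key, value)
            else:
                print(f"WARNING: can't mass-assign protected attribute: {key}")

post = Post()
post.update_attributes({"title": "Hi", "body": "...", "user_id": 999})
print(post.user_id)  # still None: the malicious user_id was ignored
```

Without the `ACCESSIBLE` check (the Rails default at the time), the `user_id: 999` key would land on the model just like the legitimate fields.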

