
>> GPT-4 took months to train on a supercomputer and it generated a neural network of hundreds of gigabytes. What exactly was that supercomputer doing for several months and what exactly would the neural network represent if not a world model?

I believe GPT-4 was trained on a server farm, not a single computer. In any case, what it was doing all that time was going over and over the text in its gigantic training corpus, which was petabytes in size, and optimising the objective P(tₖ | tₖ₋ₙ, ..., tₖ₋₁), i.e. the probability of token k given a "sliding window" of the n preceding (or surrounding) tokens.
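To make that objective concrete, here is a rough sketch of estimating those conditional probabilities by plain n-gram counting; the toy corpus and window size are placeholders, and a real language model would need smoothing and a vastly larger corpus:

    from collections import Counter, defaultdict

    def train_ngram(tokens, n=3):
        """Estimate P(t_k | t_{k-n+1}, ..., t_{k-1}) by counting n-grams."""
        context_counts = defaultdict(Counter)
        for i in range(len(tokens) - n + 1):
            context = tuple(tokens[i:i + n - 1])
            context_counts[context][tokens[i + n - 1]] += 1
        # Normalise each context's counts into a conditional distribution.
        return {ctx: {tok: c / sum(counter.values()) for tok, c in counter.items()}
                for ctx, counter in context_counts.items()}

    toy = "the cat sat on the mat the cat lay on the mat".split()
    print(train_ngram(toy)[("the", "cat")])  # {'sat': 0.5, 'lay': 0.5}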

There is nothing in this objective that needs a world model, and it is really not obvious why optimising this objective should lead to development of a world model, rather than, or in addition to, a model of the training corpus.

It is easy to see how this stuff works. You can train your own language model easily, although of course it would have to be a much smaller one. For example, you can train a Hidden Markov Model on the text of freely available literary works on Project Gutenberg, or on Wikipedia pages, without too much compute (an ordinary laptop will do).

In fact, I recommend that as an exercise, and as an experiment, to gain a better understanding of how language modelling works, for those who are curious about whether such models can capture anything beyond text.
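As a starting point, here is a minimal sketch of an even simpler relative of the HMM, a first-order Markov chain over words; "corpus.txt" is a placeholder for, say, a Project Gutenberg download:

    import random
    from collections import defaultdict

    # "corpus.txt" is a placeholder: any plain-text file, e.g. a novel
    # downloaded from Project Gutenberg.
    words = open("corpus.txt", encoding="utf-8").read().split()

    # Record, for every word, the words that follow it in the corpus.
    successors = defaultdict(list)
    for prev, nxt in zip(words, words[1:]):
        successors[prev].append(nxt)

    # Generate text by repeatedly sampling a successor of the current word.
    word = random.choice(words)
    output = [word]
    for _ in range(50):
        if word not in successors:
            break
        word = random.choice(successors[word])
        output.append(word)
    print(" ".join(output))

The output is locally plausible but globally incoherent text, which is a useful intuition pump for what a pure model of a corpus gives you.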

A good textbook to begin with statistical language modelling is "Foundations of Statistical Natural Language Processing" by Manning and Schütze:

https://nlp.stanford.edu/fsnlp/

Or, just as good, "Speech and Language Processing" by Jurafsky and Martin:

https://web.stanford.edu/~jurafsky/slp3/

Or, if you don't have the time for an entire textbook, "Statistical Language Learning" by Eugene Charniak is an excellent, concise introduction to the subject:

https://archive.org/details/statisticallangu0000char



> I believe GPT-4 was trained on a server farm, not a single computer.

Yes, a server farm of Nvidia A100s with supercomputing performance.

> It is easy to see how this stuff works. ... you can train a Hidden Markov Model ...

No. GPT is not a Markov model.


An HMM is one way to train a language model that optimises the conditional objective I note above. A transformer is another. Working with an HMM will give you insight into how language modelling works, and it's something you can do easily and cheaply, unlike training a giant transformer architecture.
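For the curious, here is a sketch of the HMM route using the hmmlearn library (assuming hmmlearn ≥ 0.2.8, where CategoricalHMM handles integer-coded symbol sequences; the toy data and hyperparameters are only illustrative):

    import numpy as np
    from hmmlearn import hmm  # pip install hmmlearn

    # Integer-encode a toy token stream; a real run would use a corpus.
    tokens = "the cat sat on the mat the dog sat on the rug".split()
    vocab = sorted(set(tokens))
    ids = np.array([[vocab.index(t)] for t in tokens])  # shape (n_obs, 1)

    # Fit a categorical-emission HMM: the hidden states act as the latent
    # context that conditions each token's distribution.
    model = hmm.CategoricalHMM(n_components=4, n_iter=100, random_state=0)
    model.fit(ids)

    # Sample new token ids from the fitted model and decode them back.
    sampled, _ = model.sample(10)
    print(" ".join(vocab[i] for i in sampled.ravel()))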


Those primitive, obsolete models are not the subject of the article. I already have a degree in applied mathematics and computer science and know the basics of machine learning.

My questions to you were rhetorical. I wasn't asking you for academic guidance.


I've already posted in another comment that I don't think there's a reason for me to insist, but I'd just like to point out that it's OK not to be an expert in everything, but one should not hold strong opinions on matters outside one's expertise. It's no shame not to know everything, but it's unseemly to act as if one does.


The exact same would apply to you: projection. You are extremely arrogant about opinions that no one else really agrees with, and you talk down to people who disagree. Essentially, you are insisting that your own personal definition of "world model" is the true one and the normal meaning of the term is not. Your argument is entirely subjective and philosophical.


> development of a world model, rather than, or in addition to, a model of the training corpus

But the training corpus describes the world, so a model of the training corpus is a world model.


A model of a training corpus is a model of a training corpus. Unless a system can read the corpus and understand what it says, it won't get any world model by modelling the corpus.


You have a subjective philosophical disagreement and are entitled to your opinion, but it's not a technical argument.

It provides "a" world model, and it's a useful model for its intended purpose. Not the "one true model" that is your own personal life experience since birth.


Then humans also have a model of our training corpus, which is an interpretation of the world via our senses. If we don't ever directly experience the true underlying reality, do we truly understand it? In a way, no, but it's close enough that we call it understanding.



