The Threat to OpenAI (wsj.com)
150 points by bookofjoe on Aug 31, 2024 | 152 comments



Yes, I generally agree with this. Ironically, it seems as if all the "AI wrappers" have more of a moat than the models (as UX etc. is actually quite hard), which is not what I expected at all when the LLM boom started.

I think a lot of LLM usage is actually pretty 'simple' at the moment - think tagging, extracting data, etc. This doesn't require the very top state-of-the-art models (though Llama 3.1 is very close to that).

Hopefully OpenAI have some real jumps forward in the bag; otherwise I'm struggling to see how they can justify the valuations being floated around.


IMO, it’s basically a generalist vs a specialist. LLMs are the generalist. They’re amazing at a wide variety of things, but unable to go deep.

The “wrappers” are the specialists. They’re doing the hard work of product discovery. This takes time, money, and a lot of trial and error to get it exactly right.


I think a big sticking point is that chat isn't a UX, unless of course 10 years from now we're all actually living in the movie Her...

Rebuilding existing apps with AI as a first class citizen seems like the name of the game right now


One killer feature is OpenAI’s vector database. I was surprised you can throw gigabytes of documents at it and ChatGPT can see it all. It’s hard to simulate that via the context window.

That’s not necessarily a moat, but OpenAI is still shipping important features. I wonder how hard it is for Claude et al to replicate.


When these companies are getting literally multiple billions of dollars of funding thrown at them, I can't think of a single feature that can be a moat. If it's truly a feature that leapfrogs competitors, there's just no way that someone like Anthropic spending $1B on engineering resources can't replicate the same exact feature that engineers elsewhere implemented.


If it’s simply an engineering feat, I agree. Very, very, very often people tend to mix up science being done in technology spaces with engineering. Just because it’s being done in software, and perhaps even without rigor, doesn’t mean you’re not doing something novel and quite different from a fundamental perspective than someone else. Sometimes it could just be luck of the draw in an implementation approach that gets you there, but that’s not always the case.

I’ve worked in scientific computing for a while now, and there are countless subtle decisions made in the implementation phase that, from a theoretical perspective, aren’t definitively answered by the underlying science. There’s often a gap in knowledge, and implementers bridge those gaps by trial and error, luck, or insight. That’s my opinion, at least. So I don’t think it’s always “just an implementation problem” as some will claim, as if the science is well understood and solved. Perhaps it is, but in my experience that tends not to be the case.


Yep, I agree with that. I will say though that people, teams, and even entire companies (like what happened at Inflection) get poached every day, so maintaining a moat that way is tough. Also, even though it could happen in the future, is OpenAI's lead due to a moat of scientists with ideas so novel that no other AI company can compete? Certainly not, because even though ChatGPT took the world by storm, numerous other companies built LLMs in a very short time span that now perform at very similar levels, both subjectively and on benchmarks.


This practice predates modern science and was originally called trade secrets.

And these are generally "defeated" through espionage. $1 billion lets you do a lot, including completely legal things such as poaching someone who has a basic understanding of the secret sauce and then copying it from the abstract description.


"The secret of success in business is knowing something no one else knows." — Aristotle Onassis

aka insider trading


Or being extraordinarily observant, creative and skilled in predicting trends.


This 100%. And LLMs and the applications around them (even something as simple seeming as ChatGPT) have more subtle decisions than any other type of software that I've ever seen. Everyone claims there isn't a moat, I bet there is.


Really interesting perspective


Agreed. On another note, I also struggle to see why no one has created a (better) CUDA implementation with, e.g., a $1B engineering budget.


I purposefully used the word feature in my reply, and I don't think CUDA is a feature. I see that as a massive ecosystem built on a proprietary platform. For that, it takes orders of magnitude more money and something else money can't buy: time. Time to have the platform adopted by countless vendors and the ecosystem built up.


GPUs are scarce, expensive and in high demand. Devs are cheap compared to the expense of training a huge model. If Cerebras or some other hardware company develops a viable competitor to the H100, it doesn't matter if the ISA is 36-bit VLIW documented in Vedic Sanskrit, they'll have infinite demand.


AMD has a viable competitor to H100 but no one buys it because it doesn’t support CUDA.


That's surprising to me. Torch supports ROCm, so why are startups wasting their money on H100s if they can get a better deal with AMD GPUs?


It’s actually better than the H100 by a mile in some inference workloads, but in others falls behind.


I used FAISS at work in the beginning of this year, and it was fantastic.

The company I worked for has massive private medical datasets and will never agree to non-local models or methods.

FAISS [0] is wonderful. Give it a try.

You can work with FAISS with LangChain, llamaindex and the like.

[0]: https://ai.meta.com/tools/faiss
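
If you want to see how little code the core API needs, here's a minimal sketch (assuming faiss-cpu and numpy are installed; the dimension and data below are stand-ins):

    import faiss
    import numpy as np

    d = 384                                              # embedding dimension (stand-in)
    xb = np.random.random((10000, d)).astype("float32")  # corpus embeddings (stand-in)
    xq = np.random.random((5, d)).astype("float32")      # query embeddings (stand-in)

    index = faiss.IndexFlatL2(d)          # exact L2 search; use IndexIVFFlat at larger scale
    index.add(xb)
    distances, ids = index.search(xq, 4)  # 4 nearest vectors per query
    print(ids)

In a real setup the embeddings would come from a local embedding model, which keeps everything on-prem.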


Thank you! I’ve been wondering about something like this. Much appreciated.


They are outsourcing that to Qdrant, anyone can replicate it.


I'm sure OpenAI has some secret sauce that makes their RAG better than others, but LMStudio did ship the feature in 0.3.0 last week

https://lmstudio.ai/blog/lmstudio-v0.3.0


It’s not hard; we’re doing it with Postgres and Azure Blob Storage.
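
Roughly like this, as a sketch of the retrieval half (assuming the pgvector extension plus the pgvector/psycopg packages; the table, column, and embed() stub are made up):

    import numpy as np
    import psycopg
    from pgvector.psycopg import register_vector

    def embed(text: str) -> np.ndarray:
        # placeholder: swap in a real embedding model here
        return np.zeros(384, dtype=np.float32)

    conn = psycopg.connect("dbname=docs")
    register_vector(conn)  # lets psycopg pass numpy arrays as pgvector values

    rows = conn.execute(
        "SELECT chunk_text FROM doc_chunks "
        "ORDER BY embedding <-> %s LIMIT 5",  # <-> is pgvector's L2-distance operator
        (embed("how do refunds work?"),),
    ).fetchall()

    context = "\n\n".join(r[0] for r in rows)  # stuff the top chunks into the prompt

The original documents can live in blob storage; only the chunk text and embeddings need to be in Postgres.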


What do you mean, does it do anything other than the usual RAG?


Whatever it does, it’s indistinguishable from stuffing all the documents into the context window. Or at least I haven’t seen it fail yet.


Stuffing everything into the context window fails horribly every time I try it.

The model just doesn't seem to be able to really process the entire input.


I’m also confused about what they’re talking about. Does OpenAI have some feature I’m not aware of?


There's a way to upload documents that can be referred to in chat. I think they call it custom GPTs (which seems like a poor name): https://help.openai.com/en/articles/8554397-creating-a-gpt


Wait, if this is what OP is referring to, I'm even more confused, because the last time I built a custom GPT it was horribly slow and very inaccurate at looking up uploaded information.

Also, you certainly couldn't upload gigabytes of your own PDFs. Did anything change??



There are third-party vector databases, though.


The wrappers don't really have much of a moat either, OpenAI is just bad at front-end dev, so their velocity there is low.


Since most of their code is GPT-generated, their velocity should increase with better models.


Interactive use definitely wants the best model possible so that you have a higher chance of getting a correct and useful response.

It might be hard, however, to decisively convince people that one model is significantly better than another, so branding/first mover/etc. probably plays a big role.


> UX etc is actually quite hard

It's analogous in the API space. OpenAI is proving to be reasonably good at it for developers; they've been shipping significant features and improvements. Unless they lose pace, they have a moat, at least a temporary one.


The models are so general that it is much easier to swap out GPT-4 for Claude or Gemini than most other APIs, like payments, for example.
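
As a concrete sketch (the model names and this tiny wrapper are just illustrative; both SDKs really do take a near-identical messages list):

    def chat(provider: str, messages: list[dict]) -> str:
        if provider == "openai":
            from openai import OpenAI
            resp = OpenAI().chat.completions.create(model="gpt-4o", messages=messages)
            return resp.choices[0].message.content
        else:  # anthropic
            import anthropic
            resp = anthropic.Anthropic().messages.create(
                model="claude-3-5-sonnet-20240620", max_tokens=1024, messages=messages)
            return resp.content[0].text

    print(chat("openai", [{"role": "user", "content": "Tag this ticket: app crashes on login"}]))

Compare that with migrating a payments provider, which touches webhooks, reconciliation, compliance, and so on.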


Swapping models isn't the hard part. The prompts are.


I come from a regulated industry that has well-established rules about AI/ML in production. In my opinion, both model changes and prompt changes are governance issues that should be handled by your friendly CI/CD or risk management team(s).


I think the problem with valuations is the never-ending financial history of hype cycles. The hype is correlated with the technology or industry but follows its own speculative dynamics. Even if the valuations and speculations are correct, they may only prove correct years from now, with ups and downs along the way. If the dot-com era is a good example, it took half a decade or more to materialize.

Another aspect that is not well studied yet is how online advertising will be affected if most people around the world end up using a single interface such as ChatGPT. How will SEO/SEM/ads work in that world? Has anyone looked at which sites benefit from being listed in ChatGPT (e.g., Wikipedia)?


It's definitely what I expected. As soon as you get outside of the specific use case of chatbots, "AI stuff" is a feature, not a product.


> how they can justify the valuations being floated around

I'm sure they have at least a couple jumps in their bag.

Let's not forget inertia, as well. Migrating models is not a trivial project. To gain OpenAI customers, competitors need a considerable jump to justify the migration, which I don't expect any of them to achieve before OpenAI delivers jumps of its own.


Products like Cursor literally have a dropdown that lets you switch between models seamlessly. From the surveys I’ve seen, most companies already use more than one model provider. The switching cost is fairly low.


The main problem is actually in making your prompts perform similarly enough with a new model. For sure, switching models is trivial, but it'll disrupt the quality of your service without significant effort in migrating your prompts.

I'm not saying this is an intrinsic moat, just that there's a barrier to change.


To be fair though, you typically end up doing that process even when you remain within the OpenAI ecosystem, because they're putting out new (cheaper) models.

For example, a couple weeks ago I migrated a few game mechanics from GPT-4-Turbo to GPT-4o and it caused significant enough issues that I needed to revert them all until I could go back and retune all the prompts, which ended up taking 2-3 days. That's probably about as much time as it would have taken to tune prompts for any other model, especially since I'm using an API that standardizes format/structure across whatever model I select (as simple as a dropdown) and lets me just focus on prompt messages independent of which LLM they're going to.


This demonstrates a need for more robust engineering practices. Typically you would solve this by testing a validation set against a new model. Some tools like DSPy or Agenta are helping to encourage this. But creating good evaluators for generative responses isn’t easy, and in my experience people tend to punt.
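
A minimal version of that validation-set check might look like this (the file name, model name, and call_model stub are all placeholders; real evals use proper graders rather than substring matching):

    import json

    def call_model(model: str, prompt: str) -> str:
        raise NotImplementedError("wire up your provider here")  # placeholder

    # eval_set.jsonl: one {"prompt": ..., "expected": ...} object per line
    with open("eval_set.jsonl") as f:
        cases = [json.loads(line) for line in f]

    failures = []
    for case in cases:
        output = call_model("candidate-model", case["prompt"])
        if case["expected"].lower() not in output.lower():  # crude substring check
            failures.append(case["prompt"])

    print(f"{len(failures)}/{len(cases)} cases regressed on the candidate model")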


Prompting is different than programming. Auto-testing your prompts is good practice and helps during a migration. But it doesn't change the fact that the prompts won't work the same in a different model.

It's like saying having Python tests eliminates bugs when rewriting a codebase into JavaScript.


That same problem exists every time your model served from a third party is optimized/lobotomized. Only an offline model guarantees this won't happen.


With OpenAI's valuation and current interest rates, I don't really understand why they aren't visibly under pressure to test how much price tolerance there is. Until they attempt it, no one will see how quickly those model migrations get prioritized and completed.


I think that they can get away with not exploring pricing as much because they’re trying to drive costs down instead and expand the market first. But I agree I’m surprised that prices aren’t being explored more.

Their new mini model is supposedly a lot smaller and cheaper than anything they've released before. If it compares to v3, which started this craze, then it's clearly good enough to capture imagination and drive usage. People have posited that it's cheap enough for per-click advertising to fund, too. OpenAI claims that the lower-cost model has doubled demand, which would be a good sign for supply-side growth.


Because people are already leaving: on one end for free options found in search engines, and on the other, many are moving to providers who offer multiple models or something different.


> Ironically it seems as all the "AI wrappers" look like they have more of a moat than the models (as UX etc is actually quite hard), which is not what I expected at all when the LLM boom started.

In interviews lately, Sam Altman has been alluding to this more and more.


OpenAI has their own wrapper too and they have great UX.


Yes, details matter. The whole idea of creating AGI that is simultaneously a generalist seems more and more like wishful thinking. The reality is that to solve real problems, a large number of correct reasoning steps are required, along with the ability to make choices about which type of inference is useful at each step to avoid the explosion of complexity inherent in any brute-force approach.

This suggests that we will have AI experts in different domains, perhaps superior to humans, but we will have thousands or even millions of narrow areas of expertise. To create something akin to an all-knowing superintelligent deity, we would need to combine thousands of experts, which would also consume unsustainable amounts of energy. I wouldn't bet on AGI in the coming years; it's just hype and distracts the discussion until big money finds a way to establish monopolies.

However, if both UX and reasoning expertise require deep customization and specialization, we have a real chance to use AI to solve deep social problems rather than transforming society into a dystopia where humans are morally and intellectually surpassed, and those remaining are controlled by corporations that could at any moment be taken over by sociopaths.


Current models are very cheap. Like GPT-4 cost ~$40 million to train, right? Once we get $100 billion models, I suppose it'll be different.


Why hasn't anyone mentioned https://www.perplexity.ai yet? I am blown away by its accurate answers. The main difference between ChatGPT and Perplexity is that, in addition to better answers, it also gives you links to the sources. Tools like Perplexity will turn Google into Altavista in the next 2 to 3 years.


> Tools like Perplexity will turn Google into Altavista in the next 2 to 3 years

No it won't.

a) Google has a dominant advantage in raw data: map POIs from their decade-long investment in data quality and breadth; shopping with direct integration from almost all e-commerce stores; and real-time crawling infrastructure that keeps their index constantly updated. Perplexity, Claude, etc. are typically dealing with year-old information, making them useless for many searches.

b) Google is a business. It runs the world's most successful and sophisticated ads platform. Advertisers demand micro-targeting, high volume, and very high ROAS. Those that have tried to replicate this (e.g., Reddit, X, Pinterest) have all failed miserably, which is why they are treated as purely brand-awareness platforms: nice to have, not a must-have. So we'll see how long Perplexity survives once its VC money dries up.


Nothing like rebutting a black and white assertion with a black and white assertion


Google still being dominant in 2-3 years is far more plausible than not.


Just in the last week, I've managed to break myself of the muscle memory of reaching for Google to answer random questions that pop up in my daily life, in favor of an LLM. About 90% of the time the LLM satisfies my query on the first try, and 98% of the time by the second or third, with the enormous benefit of no ads.


It is black or white: either Google exists in a decade or it doesn't.

I would much rather take the bet that Perplexity doesn't exist.


Also, Google has AI answers, which are OK if not the same as Perplexity's. Plus, they invented transformers and the like.


OpenAI launched SearchGPT via waitlist recently, which does the same thing as Perplexity. I don’t use the latter so I don’t know how it compares, but the OpenAI version has been working fine for me. Kagi has also had similar functionality for a while too, which works with their Lens feature and has a fast mode if you add a question mark to the end of a regular search query.

It’s not much of a competitive moat compared to having the model itself.


I did some side-by-sides, repeating Perplexity queries in SearchGPT when I got access about a week ago, and I thought SearchGPT was a lot worse, in some cases just plainly misunderstanding a question or surfacing the wrong info. Just my anecdotal data, but it feels about right for its beta status. Presumably Perplexity must have a little bit of secret sauce that OpenAI has to figure out.


Kagi doesn't really have a similar function. The Kagi GPT feature only reads a small snippet of the web results, while Perplexity gets a large excerpt of the actual website. I've tested this by having the AI read back the prompt it was given to answer the question.


I agree with you that Google is effectively dead in the water, unless they successfully do something drastic. But they’ll be in the water for a long time yet. Yahoo is still the world’s ~5th biggest website and it has been dead in the water for 20-25 years.


Did someone say e-barnacles?


Every time I ask for cafe recommendations, half of its suggestions are listed as "permanently closed" on Google Maps :/


At least you get recommendations that aren't just SEO-optimized adsites


Not at all clear to me that AI models can do search-like functions cheaply enough to outcompete Google (in the very near future). Yes, quality is amazing. But cost per search needs to be very low.


But does it? Google has gotten drunk on their ad margins. I struggle to believe someone can’t displace them if they’ve got better technology and are willing to be more aggressive on margin profile.


The thing is, LLM-generated word salad is destroying Google search.


Assuming that Perplexity doesn’t operate its own search engine and crawler, how would it be able to operate without Google?

Google’s search index and its maintenance will remain a required functionality for the internet, and it will have to be paid for one way or another. Furthermore, AI will continue to require a corresponding search interface to implement its AI search on top of, and some portion of human users will also still want to directly access it, rather than only through an AI front end.


Easy, just use Bing.

/s

It is a joke, but we really have only two search engines to choose from; I don't know what to think about that...


A quick litmus test shows that this AI prefers an anti-Western conservative stance over neutrality.

> disprove christianity

Gives 5 sections: Burden of Proof, Scientific and Historical Challenges, Philosophical Arguments, Reliability of Scripture, Personal Experience

> disprove islam

>> I apologize, but I do not feel comfortable attempting to disprove or criticize any religion. Matters of faith are deeply personal, (...)

Results may differ if you don't use separate incognito tabs.


Doesn't that just depend on what LLM you happen to be using? And doesn't Perplexity support many different LLMs?


Sure, but anything built with "safety" in mind is going to react this way. I guess "and more" means Mi{s,x}tral, which is happy to criticize Islam while Claude and Llama refuse.

"Select your preferred AI Model. Choose from GPT-4o, Claude-3, Sonar Large (LLama 3.1), and more"

https://www.perplexity.ai/pro

Edit: looks like Perplexity deprecated/dropped Mistral and recommends using Llama instead, effective 8/12/24:

https://docs.perplexity.ai/changelog/changelog#model-depreca...


I think this also has to do with most Western religions being much less strict. Muslims tend to follow all their rules pretty strictly (e.g., no alcohol, no pork, going to the mosque), whereas in Western Christian religions things are pretty relaxed, except for some splinter groups. When I lived in Ireland, my ex's family was pretty devoutly Catholic, and they didn't care about us living together unmarried.

The same goes with the attitude to questioning religion. Nobody cares about "blasphemy" in the West but in Islamic countries it's a pretty big thing.

I think this has its effect on LLMs as of course they are trained on real world conversations and writings.


No shit. Are you surprised?


ChatGPT has given you links to online sources for nearly a year now.


Some of which are entirely hallucinated, tbf, especially if you're asking for academic references


I've never seen a hallucinated link. You might be thinking of academic citations, but actual links from performing an online search are always valid.


I've gotten links that did not match what was mentioned right before them. Like it would hallucinate some article and provide a DOI link that took me to something else, often entirely unrelated.


On this topic: I get output that looks like URL links with sources, but the hyperlinks are not clickable.

Not with right click, left click, or Shift+click.

What gives?


Same, but if you click the copy icon you can get the URL.

smdh


Not with every reply, and when it does give a link it is often 404.


The problem is, unlike with AltaVista and Google, we still have a lot of use cases where we aren't asking questions. I just want sites that contain keywords.

I'd even say that part of the problem with modern search is that it answers questions instead of matching keywords and getting rid of junk and spam.


That's just RAG.


If the rumors about the upcoming Strawberry and Orion models from OpenAI are true - supposedly capable of deep research, reasoning and math - they probably don’t have much to worry about. Not to mention they still have the only fully multimodal model.


According to that recent The Information report, Orion is supposed to be just a regular LLM except trained with synthetic data generated via Strawberry. Anthropic et al. have also been working on ways to generate synthetic data (as seen in the success of Sonnet 3.5) so I don't really know if that's going to be a big lead.

And of course the ever-hyped Strawberry is supposed to be some sort of tree-of-thought-type thing, I think, or maybe it's related to https://arxiv.org/abs/2203.14465. Either way, nothing so far suggests it's a completely novel training technique or architecture, just a GPT-4-scale model with different post-processing.


"just a regular LLM except [trained on very different data]."

I'm not saying there's some big moat, anyone can read https://arxiv.org/pdf/2305.20050, but not all synthetic data is created equal. Strawberry I'm sure generates beautiful, valid chain-of-thought reasoning data. Wouldn't surprise me if OpenAI is just significantly ahead of the competition.


I'm not sure why people act so mystified about Q*; the name gives it away: it's an obvious reference to A*. The only question is what the nodes in the graph are and what they're using as a heuristic function.



It's not. It's Quiet STaR.


Q* is also a term from reinforcement learning.
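
For reference, Q*(s, a) in RL is the optimal action-value function: the expected return from taking action a in state s and then acting optimally. It satisfies the Bellman optimality equation (r is the reward, γ the discount factor):

    Q*(s, a) = E[ r + γ · max over a' of Q*(s', a') ]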


Models are only state of the art for 12 to 18 months. There’s not a model in existence today that will have any value in 5 years. They will all be obsolete.

Thus far no one has any moat on their model.


Not on the model, but most companies don't sell models; they sell products that use models. GPT-3 became obsolete at some point, but ChatGPT hasn't, and it has some level of moat with its ecosystem. And cloud providers have moats via lock-in, just for blanket products instead of any one model.


What is the only fully multimodal model?

The GPT-4o checkpoint available to the public can see images but not generate them (it can generate prompts for DALL-E 3 to use). OpenAI has an internal model with this capability, but if you don't make it an actual product, it doesn't really matter.


It's even more multimodal than that: 4o accepts text, image, audio, and video input, and produces text, image, or audio output. Video input isn't available yet and was only briefly demoed. Image and video output haven't been demonstrated publicly at all yet.

Rapid productization isn't the priority of most ML devs.


OpenAI has a few examples of image output being produced directly by the GPT-4o checkpoint they have.

>Rapid productization isn't the priority of most ML devs.

It depends on whether they need money or not; Google has not published a single image generator that is not a demo.


The thing is, the killer app of the generative AI space—at least for large language models (LLMs)—might already be ChatGPT. While people are still searching for the next big application in this field (https://www.lycee.ai/blog/build-killer-app-of-generative-ai), it's possible that it has already been built. ChatGPT includes a human-in-the-loop design, avoiding the complexities of agentic workflows. You ask questions, get answers, iterate, and use the code interpreter if necessary. It’s like having a thought partner.

Given the issue of hallucinations in LLMs, this might be the only feasible user experience. Users must be aware of the potential for hallucinations and have a way to iterate until the desired output is achieved. How else could this be done effectively except through a chat interface? We need to cut through the noise quickly and start leveraging the true value of LLMs (https://www.lycee.ai/blog/llm-noise-value-openai).


The counterpoint is that "free" ChatGPT looks really good, and that's hard to monetise because of all the other free chat interfaces.

My guess is that ChatGPT is the free advert for the real product, which is their API and in particular the fine-tuning.

Using the language comprehension of an LLM as part of a bigger system, RAGging them, forcing the output to comply with continuous tests, etc. does still provide other business opportunities not available to a fully general-purpose chat system with no limits to the kind of content it can produce.

If you're trying to make a system that always produces valid SQL, you want it to not just pass a syntax checker but also be valid for the specific schema it's being asked about; you definitely don't want it running fully automated if there's a chance it will append "Let me know if there's anything else I can help you with!" to the end of the query.

But this isn't a mere statement of the problem, people are doing that kind of thing with these tools:

https://twimlai.com/podcast/twimlai/building-real-world-llm-...
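
To make the SQL point above concrete, schema-aware checking can be a thin layer on top of a SQL parser. A sketch (assuming the sqlglot library; the schema here is made up):

    import sqlglot
    from sqlglot import exp
    from sqlglot.errors import ParseError

    KNOWN_TABLES = {"orders", "customers", "refunds"}  # illustrative schema

    def check_sql(sql: str) -> list[str]:
        try:
            tree = sqlglot.parse_one(sql)  # also chokes on chatty non-SQL suffixes
        except ParseError as e:
            return [f"syntax error: {e}"]
        return [f"unknown table: {t.name}"
                for t in tree.find_all(exp.Table)
                if t.name not in KNOWN_TABLES]

    print(check_sql("SELECT * FROM orders WHERE total > 100"))  # []
    print(check_sql("SELECT * FROM ordrs"))                     # ['unknown table: ordrs']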


We all know the saying: "If it's free, you're the product." I believe the free version of ChatGPT is primarily a data acquisition tool that OpenAI uses to fine-tune its models and conduct reinforcement learning from human feedback (RLHF) to enhance their usefulness. In machine learning, data is key, and OpenAI is outpacing its competitors largely due to the vast amount of data it can gather from free ChatGPT users.

It's already possible to perform Retrieval-Augmented Generation (RAG) on ChatGPT, from simple to complex cases. However, I'm somewhat skeptical about use cases that involve integrating large language models (LLMs) as part of a larger system. While these applications can help build new features, they may not lead to the creation of a 'killer app' in the generative AI space. Take text-to-SQL, for example: it's a useful tool, but you still need a human to audit the generated SQL. You can't blindly trust an LLM to create flawless SQL queries. Therefore, text-to-SQL remains more suitable for technical users rather than those with limited data analytics skills. Otherwise, it could lead to confusion. Imagine a scenario where the commercial director and the marketing director have differing views on last year's sales because they each received different answers from a chatbot. That would be a nightmare. We haven't even managed to make simple dashboards universally acceptable, so diving into hallucination-driven data analytics seems premature.

Even with RAG implementations in companies and tools like text-to-SQL, the user interface often remains a chat application. I haven't seen much variation in this regard. While there's a lot of talk about the need to reimagine user interfaces, there's little to show for it so far. Chat interfaces seem to be the most suitable format for LLMs, and ChatGPT is likely the 'killer app' in this domain. OpenAI will likely explore further monetization strategies in the future, perhaps by incorporating ads in some manner. https://www.lycee.ai/blog/ai-reliability-challenge


> OpenAI is outpacing its competitors largely due to the vast amount of data it can gather from free ChatGPT users

Every human has private experience, or tacit experience they never expressed in writing. It is our lived experience that was never recorded, which we carry around in our heads. But assisting 200M users can elicit a lot of that tacit knowledge from them. It is like the LLM is crawling its users for "dark knowledge".

LLMs in the chat room can also get feedback from the world. As they suggest ideas and humans try them out, then come back with the outcomes to iterate, the LLM collects valuable signals - did the idea work out or not? I think OpenAI serves 1B tasks per month and 2T interactive tokens. In a year they surpass the size of the original GPT-4 training set.

On the other hand, by putting trillions of tokens into human brains they create an outsized impact in the physical world as well, which percolates back months later in the next training set. A huge feedback loop, and experience flywheel.

I think the best assistant LLMs will create a network effect that will bring even more users to them, making them even smarter. That explains why they allow free access to their best model. They're playing a long game focused on data acquisition and model improvement.


How is it possible for OpenAI to use their users to train their model?

I will sometimes ask ChatGPT a couple of questions, and sometimes it gives me useful responses and I’m done, and other times it gives me useless responses and I’m done. Presumably OpenAI would need a way of discerning the useful output from the useless output in order to do this, but I’m not providing any feedback about which is which when I use the product.


> Presumably OpenAI would need a way of discerning the useful output from the useless output

You are the validator, using real-world testing. Not in one-off interactions, but when you exchange multiple rounds with the model. After the whole chat session is finished, the model can rank its answers in context, with hindsight. It can just look at the later outcomes and observe which idea was good or bad. Especially bad ideas would elicit a response from the user, who tries to iterate on them until solved.

Basically the LLM is exploring the world through indirect agency. It acts through users, and collects outcomes through users as well. More complex projects can even be spread out over many days and many sessions; LLM providers just need to comb through the logs to see "idea -> outcome" chains.

This whole feedback collection process stops working in one-round interactions, like many we have in the API. But in the chatroom a significant number of interactions continue after the first response, or patterns emerge across multiple sessions, as new one-off interactions relate to past ones.

Scale this to 200M users, they have a huge amount of personal experience and interactivity to offer to the model. An experience flywheel that could be spinning once a year, or once a month, or even daily, absorbing new experience from users and serving it back to the users.


> Especially bad ideas would elicit a response from the user, trying to iterate on them until solved.

This is the assumption that’s been repeated a few times now in response to my comment, but this is the assumption that seems wrong to me. If I get a good output from an LLM, I probably also want more output on the same topic, or to try refine it somehow. If I get a bad output I will probably try to iterate on it a bit. From the perspective of the LLM, these two interactions look the same.


Training data can be anything. They scrape the entire internet, plenty of which is inaccurate or poorly written. That doesn't prevent it from being useful training data, because the point of an LLM trained for text generation is to predict what someone would write. Every question you ask, and your responses to its responses, is valuable data that isn't already available on the public internet. Even if some of it is unreliable or even intentionally adversarial, on average the responses will be useful for training. This is training data that they have exclusive access to.


Typically, when people aren't satisfied with an answer, they continue asking questions or prompting further, which makes the whole exchange valuable even without the use of a thumbs-up or thumbs-down button. That's the secret weapon of OpenAI, and a part of their 'moat'.


This is what I’m skeptical of, because my experience with LLMs doesn’t align with this description at all. Even if a response is good, I will typically give it follow up prompts to get more details, or answer other questions that the (potentially high quality) response raised. If the response is bad I might try some follow ups to see if it improves. In either case, I’m submitting a prompt, I might accept the answer immediately, give up immediately, accept the answer after further prompting, or give up after further prompting.

With my own use there is no correlation between the number of prompts I submit, and the quality of the responses given. If this is the metric OpenAI is using to perform crowdsourced RLHF, then the reinforcement is going to be garbage.


The chain of thought that is apparent in the conversation is what's really interesting and what OpenAI exploits. OpenAI does not have to evaluate the quality because the conversation itself is already valuable. Whether the conversation was good or bad can easily be inferred from the back-and-forth. That's the key: they just have to take all those conversations and fine-tune their models with them.


What about the nature of your follow up responses?

Do you say "No, not x, y"? Or perhaps "Now Baz the Foos"?


I thank chatgpt when it does a good job. I like to think I'm helping to improve the model.


They do... they literally have thumbs-up/down buttons on the messages.


Only on some of their interfaces, but even then that seems like a rather unreliable method to perform RLHF. I’ve never clicked a thumbs up/down on a ChatGPT output, but I don’t imagine people are validating the output very thoroughly before providing that feedback…


And sometimes they generate two answers at the same time and users have to choose one.


Not a single word about Anthropic/Claude?

Makes you wonder about the level of journalism at WSJ et al.


I think there are many layers to this discussion. AI models are already a commodity, but their usage drives the demand for cloud and compute. The big players are Microsoft and Google, and they will win this race. Microsoft already owns 49% of OpenAI. Google is quietly working on reducing the size of its models; they recently came up with a very small model that outperforms ChatGPT 3.5. I have also built a wrapper (as you call it), opencraftai.com, and for me it's a win if the models improve and get commoditized.


If it isn’t already obvious: hardware to run these models is the moat.


The fascinating thing here, which I think caught many of us by surprise (certainly myself), is realizing that building capable LLMs has so far been essentially elastic with respect to capital. I don’t think this was immediately obvious when OpenAI first launched ChatGPT. I also don’t think this is true of software in general (though there are other cases).


That simply means money, and that's easy to come by. Proprietary technologies, goodwill, brand-name recognition: those are things that are hard to replicate. If all it takes is buying enough GPUs, then there is little preventing a competitor from simply buying their way into the market.


When the Apple Mac Studio comes with a one-terabyte integrated RAM option, the hardware to run even the best Llama model will become widely available. For now we are stuck with a medium Llama model, which is still surprisingly good.
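
Back-of-the-envelope, weights only (ignoring KV cache and overhead):

    405B params × 2 bytes/param (fp16)    ≈ 810 GB
    405B params × 1 byte/param (int8)     ≈ 405 GB
    405B params × 0.5 bytes/param (4-bit) ≈ 203 GB

So 1 TB of unified memory would comfortably fit the 405B model quantized, and just barely at fp16, while a 192 GB Mac Studio tops out around the 70B model.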


The greatest threat to any AI platform is the AI itself. I think people put too much reliance on the information generated and don't test or scrutinize the data provided. Right at this given moment it is much better are correcting grammar like an advanced spell checker, but as long as context is missing, or the ability for the AI to visualize the information in a contextualized manner, it is limited to spitting out information it consumes, and while that is good enough in most cases, it doesn't raise the floor any higher on what companies want it to do, which is replace human labor.

Also, the lack of foresight of giving over a business entirely to AI is a massive problem, as your employees are typically also your customers and largest word-of-mouth advertisers. I've worked on many products I self-promoted because I was excited by what my own team and partnered teams worked on. You can't make AI enthusiastic about your products like employees. But don't get me wrong: AI is here for the future and it will help us all achieve our business goals, but for a business to be entirely dependent on AI, and to campaign on that, will only doom it.


> it is much better are correcting grammar

Mmmhm


For most businesses the answer is simple and obvious: don't ever use OpenAI, because they force you to implicitly agree never to train on your logs.

Not to be all "this idea is 2000 years old," but the idea of neglecting externals is 2000 years old, and we really ought to beware getting hooked on external intelligence.


So what's the "threat"? Is it the people who claim they are working on things that will be a threat to OpenAI?

As usual when AI is the topic, it's just people making claims and the author not being able to back them up with anything.


What is OpenAI? Some sort of Claude competitor?


Offshoot question, but what's my best free option to upload a 40-page PDF and ask questions about it?

I hit a size limit on Claude.


I've been using Gemini-1.5-Pro-2M through Poe for creating e-book recaps and just generally interactively jogging my memory about a prior entry of a book in a series when a new one comes out. It's been working surprisingly well, even for very large books.


Gemini Advanced seems very promising, with its 1-million-token context window.

https://one.google.com/explore-plan/gemini-advanced

I haven't tried it as I use Claude primarily.


aistudio.google.com, by far. It would handle 10,000 pages for free, or even more, with its 2M context window.


> "An apples-to-apples comparison of those numbers with ChatGPT isn’t possible, but OpenAI says the ChatGPT service now has 200 million weekly active users."

Is that the first time the 200 million weekly active users number has been reported?

UPDATE: No, Axios had this a couple of days ago https://www.axios.com/2024/08/29/openai-chatgpt-200-million-...

> "OpenAI said on Thursday that ChatGPT now has more than 200 million weekly active users — twice as many as it had last November"


That's actually kind of slow?

Even Facebook in 2008 (at these user counts) was growing faster, doubling monthly actives every 8 months.


Lack of releases from OpenAI makes me more bullish on them. Clearly they have something big they're working on; they don't care about competing with current models.


They still need to catch up to their own announcements; Sora was revealed over 6 months ago with no general availability in sight.


Releasing powerful, novel models like Sora shortly before a major election is just asking for trouble.

I believe they are restraining themselves in order to stay somewhat in control of the narrative. Donald Trump spewing ever more believable video deepfakes on Twitter would backfire in terms of regulation.


Besides, isn't it over-the-top expensive for a few seconds of video? The election is a factor, but even without it I don't know if there's much of a business plan there. What would they have to charge, $20/minute? And how many minutes of experimenting before you get a decent result?


I wonder if it'll be cheaper per frame than the image generators.


Even more bullish from me: they have something.


They did release GPT-4o (making a whole event out of it) and mini recently, so I'm not sure why you think that.

Seems like they don't have anything up their sleeve.

Given how OpenAI has functioned in the last year or so, I'm not sure how one can think they have some secret model waiting to be unleashed.


The main feature and the main novelty of 4o is native voice integration, which was announced 3 months ago and is still not available.


The impressive thing about GPT-4o is how well it performs on most metrics. GPT-3.5 was already very impressive; most other companies are just catching up now. GPT-4o is a huge step above.


GPT-4o replaced GPT-4, not 3.5, so it’s not “a huge step above” it, at least not subjectively. It is much faster though, so at least it’s got that going.

The voice thing is a potential killer feature though, I can’t wait to try it, and to have my kids use it.


I would argue that the biggest novelty is being able to share your screen or camera feed with the live voice - and there's no announced timeline on that yet at all.


To me the voice thing was marketing hype, the utility is being multi-modal. It's a great office/creative/tech assistant.


Yeah, but then Claude 3.5 Sonnet came out, so they took the lead.

Tangentially speaking, having no skin in this game, it's extremely fun to watch the model wars. I kinda wish I'd started dabbling in that area, rather than being mostly an infrastructure/backend fella. Feels like I'd already be way behind, though.


Correspondingly, would you be bearish if they released many good things often?


No but I would be very bearish about consistent mediocre releases


I mean, consistent mediocre releases are exactly what we have gotten out of OpenAI.

But we know they started training Orion in ~May. We know it takes months to train a frontier model. Lack of release isn't promising or worrying, it's just what one should expect. What is promising is the leaks about the high-quality synthetic data that Orion is training on. And the fact that OpenAI seems to be ahead of all the other labs which are only just now beginning training runs on next-gen models. OpenAI seems to have a lead on compute and on algorithmic innovation. A promising combination if there ever was one.


Does the same bullish logic apply to cold fusion?


Huh? Isn't the more obvious reason for the lack of releases that... they have nothing to release?


lol, these people just don't know how far ahead OpenAI is. They aren't releasing their best stuff because they genuinely don't know how to make it safe for the public. ChatGPT is still dominating, and so will whatever comes next. All these AI wrapper companies just give them more business.



