Research model catches bugs in AI-generated code, improving human oversight of AI.
See full article...
> Sooo... the AI version of "Quis custodiet ipsos custodes?"?

Another appropriate cribbing is caveat emptor.
> Will they make this available to the public?

I hope so, b/c I'd love to use it to check my code before I send a PR. But the article should have noted more clearly that this is solely research - and there is no product offering called "CriticGPT."
It's always two dumb bitches telling each other, "exaaaactlyyyyy"
Context: https://knowyourmeme.com/memes/its-always-2-dumb-bitches-telling-each-other-exactlyyyyy
> OpenAI created CriticGPT to act as an AI assistant to human trainers who review programming code generated by the ChatGPT AI.

Act as assistant for now. Later on, maybe not so much. Then they'll just need a third system to replace the prompt-engineer/types-question-guy to eliminate the human aspect entirely below the C-suite level.
> And now they're taking away Ars commenters' jobs. Leave me with something, OpenAI!

Yes, but think about how much more fun Ars comments will be when you ask the LLM to render them in the style of the Marx Brothers!
> As a general rule, isn't it usually preferable for the backup/correcting mechanism to be different?

Yes, but as the old axiom goes -- if all you have is a hammer, everything starts looking like a nail. And all OpenAI has is an LLM.
> So it's turtles, er, AI all the way down? Sounds about right for a company that makes turtles, er, AI.

Yep, it's insane.
> It's chatbots watching chatbots all the way down

Yeah, but, does it work? If annotators are choosing the generated critique over human, then I guess it does.
> Self certification absolutely works. Just ask Boeing.

This is to aid the annotators, not replace them. Read the actual paper and/or article before posting. OpenAI's papers are among the most accessible out there.
> As a general rule, isn't it usually preferable for the backup/correcting mechanism to be different? For example, aircraft with GPS still have INS, compasses and maps, etc. The idea being that if GPS is down, you can still use INS, and if INS is malfunctioning or it's a general computer failure, if you've got a compass, a map, and a good watch (and all pilots should always have all three), you can still navigate.
>
> This is a basic safety engineering principle. Why the hell are they using the same technology as a backup/correction? Seriously, if they were designing a system deliberately, no one would do it this way. It reeks of desperation to prove that LLMs can do things that they are simply not capable of doing, by people who seem to believe that humans put together sentences in a similar fashion(!?!?!?!).

It largely is different. First off, they are feeding the supervisor model lots of data related to mistakes that GPT has made in the past; the primary LLM is not trained on this data. This results in several new layers to the model. Even more important, they are quite likely playing some very cool games with the system prompt that is fed to the fact checker. After a while this can get very expensive, because you wind up with lots of different versions of the LLM talking to each other. You eat lots of tokens and use up more context memory than you would like to think about.
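For illustration, a minimal sketch of that "different versions of the LLM talking to each other" pattern, assuming the OpenAI Python SDK's Chat Completions API; the model name, the reviewer system prompt, and the two-pass structure are assumptions for the sketch, not OpenAI's actual CriticGPT setup:

```python
from openai import OpenAI  # assumes the official openai Python package

client = OpenAI()
MODEL = "gpt-4o"  # illustrative; CriticGPT itself is not a publicly offered model

def generate_code(task: str) -> str:
    """First pass: the primary model writes the code."""
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": task}],
    )
    return resp.choices[0].message.content

def critique_code(task: str, code: str) -> str:
    """Second pass: a reviewer call with its own system prompt checks the output."""
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system",
             "content": "You are a strict code reviewer. List concrete bugs, "
                        "security problems, and missing functionality, quoting "
                        "the offending lines."},
            {"role": "user",
             "content": f"Task:\n{task}\n\nCandidate code:\n{code}"},
        ],
    )
    return resp.choices[0].message.content

task = "Write a Python function that parses ISO 8601 timestamps without external libraries."
code = generate_code(task)
print(critique_code(task, code))  # a human annotator still judges whether the critique is any good
```

Each extra reviewer pass re-sends the task, the candidate code, and its own instructions, which is where the token and context costs pile up.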
> So the AI that has 100% confidence that the bullshit code filled with placeholders instead of functionality is somehow exactly what you wanted is going to be babysat by another AI that knows that's wrong?
>
> I have doubts.

Then maybe read the paper. This is to aid human annotators. Besides which, do you find bugs when reading over your own work? Because I do. It is possible to use a model to improve itself; this is proven.
> But a big part of testing is actually running the code.

Language models can do that. OpenAI has an API to call arbitrary functions/tools, which can include a shell, interpreter, compiler -- anything.
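A minimal sketch of that tool/function-calling loop, assuming the Chat Completions API in the official Python SDK; the run_python tool, the model name, and the subprocess call are illustrative, not something OpenAI ships:

```python
import json
import subprocess
from openai import OpenAI  # assumes the official openai Python package

client = OpenAI()

# Hypothetical tool: the model only requests execution; this process runs it.
tools = [{
    "type": "function",
    "function": {
        "name": "run_python",
        "description": "Execute a short Python snippet and return its stdout/stderr.",
        "parameters": {
            "type": "object",
            "properties": {"source": {"type": "string"}},
            "required": ["source"],
        },
    },
}]

messages = [{"role": "user",
             "content": "Write a bubble sort, then run a quick test on [3, 1, 2]."}]
resp = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
msg = resp.choices[0].message
messages.append(msg)

for call in msg.tool_calls or []:
    if call.function.name == "run_python":
        source = json.loads(call.function.arguments)["source"]
        # WARNING: only run model-written code inside a real sandbox.
        result = subprocess.run(["python", "-c", source],
                                capture_output=True, text=True, timeout=10)
        messages.append({"role": "tool",
                         "tool_call_id": call.id,
                         "content": result.stdout + result.stderr})

# A follow-up call lets the model read the execution results and react to them.
final = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
print(final.choices[0].message.content)
```

So "running the code" happens on the caller's side; the model just decides when to ask for it and what to make of the results.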
> Yeah, but, does it work? If annotators are choosing the generated critique over human, then I guess it does.
>
> This is to aid the annotators, not replace them. Read the actual paper and/or article before posting. OpenAI's papers are among the most accessible out there.

Aid the annotators .... by giving them another AI process to monitor for bad outputs? Shades of FSD.
> Then maybe read the paper. This is to aid human annotators. Besides which, do you find bugs when reading over your own work? Because I do. It is possible to use a model to improve itself; this is proven.

I did read the paper.
> Language models can do that. OpenAI has an API to call arbitrary functions/tools, which can include a shell, interpreter, compiler -- anything. The limit is creativity.

ChatGPT can run Python scripts and such, but I have never seen it compile code, and I have numerous logs of it saying it can't run code*. It's most visible when you ask it complex math problems. I think they changed settings so you only see "analyzing" now unless you change the setting in your account panel.