After posting my recent articles about ChatGPT, I have continued to have multiple discussions surrounding the “intelligence” of these tools. One common thread that comes up is setting the proper environment, or tone of the conversation, before delving into using these GPT AI tools. According to both my AI enthusiast friends and several people who work in the field designing and building artificial intelligence technology, the initial “prompting” of the AI engines is important.
With that in mind I’ve been spending a good part of my days setting up several different chat threads in ChatGPT 4, each with a different goal. In one of these threads I am trying to get ChatGPT to serve as a technical advisor: a tool that can either replace or at least augment my use of general search engines or Stack Overflow queries, both for menial tasks and for digging into deeper problems.
I told it a good bit about my background in those areas of technology. I told it about the project I was working on, describing the tech stack in some detail. After a few dozen prompts about what I know, what I’m looking for, and even my current reference sources, I started to dig in.
ChatGPT As A Coding Assistant? Not Yet.
Before I get into the details, let’s cut to the overall summary: you simply cannot trust a service like ChatGPT to provide anything close to reasonably decent technical solutions. Sure, it can handle basic concepts for the most part, but in general when accuracy and precision count it is not very good at doing the job right. With coding in particular, being accurate matters.
There are many examples where you ask ChatGPT to do something like “create a function that does X” and it gets the job done, but holy hell does it go about it in a convoluted way. Or worse, it creates unintended side effects elsewhere in an app while the function itself returns technically accurate output. However, those examples are hard to follow and explain without a 30-page dissertation on what I, or many with expertise in the subject, would consider inefficient or inaccurate algorithms. Instead, I’ll provide a simple example that came up just moments ago.
Keep in mind this is after days of using this same thread to present various issues as well as discovered solutions so it could learn even more about the project and create a deeper sense of the context in which I am working.
ChatGPT Cites Prior Work Well
Just the day before I asked ChatGPT to provide a simple bash script, one I’ve done dozens of times before but after hours of coding my brain was tired and lazy. I asked a simple question that I could have figured out in a few minutes if I took the time to try, but I also wanted to see if ChatGPT could come up with a quality answer on something that has been online in thousands of examples for decades:
Me: How do I write a bash script that changes directories and returns to the original directory?
It came back with a super simple example that was accurate, and if you search with standard search engines you’ll find similar code there.
# Save the current directory
# Change to the desired directory
# Perform your operations here
# Return to the original directory
It even provided an entire dissertation on how this works and why, and went on to suggest using the built-in pushd and popd commands. IMO those options should have been listed FIRST, ahead of the shell-variable approach above, as they provide cleaner, more concise code. They are shell builtins that trace back to the Unix C shell rather than to Linux itself, so they cost essentially nothing extra in memory or execution speed.
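For comparison, here is a minimal sketch of both approaches side by side. It assumes bash (pushd and popd are not part of plain POSIX sh), and the names target_dir, run_with_saved_dir, and run_with_dir_stack are hypothetical placeholders of mine, not what ChatGPT produced.

```shell
#!/usr/bin/env bash
# Hypothetical target directory, used only for illustration.
target_dir="${TMPDIR:-/tmp}"

# Approach 1: remember the starting directory in a shell variable.
run_with_saved_dir() {
    local original_dir
    original_dir="$(pwd)"            # Save the current directory
    cd "$target_dir" || return 1     # Change to the desired directory
    # Perform your operations here
    cd "$original_dir" || return 1   # Return to the original directory
}

# Approach 2: let the shell's directory stack do the bookkeeping.
run_with_dir_stack() {
    pushd "$target_dir" > /dev/null || return 1   # Push and change directory
    # Perform your operations here
    popd > /dev/null || return 1                  # Pop back to where we started
}

start_dir="$(pwd)"
run_with_saved_dir
run_with_dir_stack
echo "Started in $start_dir, finished in $(pwd)"
```

Both functions leave you where you started. The pushd/popd version has no variable to name or quote, which is why I would reach for it first.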
ChatGPT Fails A Basic Code Assistance Request
One of my current projects is updating my SaaS application as I prepare it to serve as the underlying API and data store service for a new travel app I am working on. Along the way I have been refining code. This included adding a stack of new public variables to a class in order to follow best practices, especially for PHP 8 environments where dynamically defined properties are not a good idea.
I added the new variables to the end of a list of public declarations, then took the entire segment of code and asked ChatGPT to perform a simple task:
Me: The following input is a list of PHP variable declarations. Please sort these by variable name and provide an output format that can be directly pasted into a PHP file.
ChatGPT: Here are the provided PHP variable declarations sorted by variable name, formatted for direct insertion into a PHP file:
You can directly paste this sorted list into your PHP class definition.
Pretty cool, it understood the task and even realized this would be part of a PHP class definition despite my not providing that as a context. Maybe it inferred that from a prior related request earlier in the conversation.
But hold on… what in the hell…
Where in the HELL did it come up with the variable “label_find_button”? That’s new!
And where is the original “label_for_find_button”?
ChatGPT could not handle the simple task of alphabetizing a variable declaration list. The sad thing is that these chat engines are supposed to be good at language-related tasks, which I would think includes alphabetizing things. I’d also think it would sure as shit not substitute words along the way.
ChatGPT Gets It Wrong
While I have a hundred theories as to why this might have happened, based on my very limited knowledge of tokenization and language modeling, it is still wrong.
Horribly wrong for so many reasons, not the least of which is the apparent complete lack of understanding (knowledge? intelligence?) that this is CODE. Or maybe it “knows” this is code, but lacks the “intelligence” to know that variable names, including their EXACT spelling, are super fucking important in just about every single programming language there is and should not be changed randomly.
I’m not sure where the point of failure is in the knowledge/intelligence chain, but that is a pretty basic premise to just say “fuck all” and go running spell-check or grammar-check or whatever “proper writing style” logic it decided it needed to employ without my explicit instruction to do so.
After I pointed out the error, then wrote a more explicit instruction to “not fuck with anything”, it did the job correctly.
If You Cannot Trust The Output, Is The Tool Useful?
Here lies the crux of the problem with services like ChatGPT – it makes mistakes. Lots of them. However the current hype cycle around AI has so many people using the tools and being OK with the results that it is starting to feel like far too many people are trusting whatever ChatGPT-and-Friends tells them as “the God-forsaken truth” and are turning down their bullshit filters.
This is not just happening with coding issues like the simple example above. Based on conversations with the AI people in my life, some of them know full well that services like ChatGPT are full of shit most of the time. They will come straight out and tell you “yeah, I have to edit and fix and rewrite what it produces”, sometimes to make it readable or to fix some basic constructs, but often because some of what it does or says is just plain wrong.
If you’ve followed the AI hype you’ve probably heard things like “AI hallucinates”. I beg to differ. That is just the new way of saying “yeah, sometimes it fucks up pretty badly”. In other words it makes mistakes. And I’m not talking as in “sort of like humans do”, I’m talking as in just plain factually dead wrong. And that is important.
In my example I not only had to catch the error on my own, I could also no longer trust the output of ChatGPT. I had to review the entire list it produced the second time against the original list. I also had ChatGPT check the two lists, where it did catch the error in the first iteration, but damn, just do it right the first time. In this particular case, the tool was NOT useful. In fact, had I gone the usual search engine route to find a basic list sort tool, I could have pasted this code and gotten back a sorted list with the names unchanged in 1/4 the time it took to use the AI tool. Hell, I could have looked up the basic Linux sort command and done my own processing faster. But I wanted to see how quick AI would be. Turns out very quick, but WRONG.
And that matters.
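For reference, the non-AI route looks roughly like this. It is a minimal sketch that assumes one declaration per line, all with the same "public $name;" shape, so a plain line sort is also a sort by variable name. Every variable name except label_for_find_button is made up for illustration, and same_lines is a hypothetical helper of mine.

```shell
#!/usr/bin/env bash
# Hypothetical sample input: only label_for_find_button comes from the real list.
declarations='public $label_for_find_button;
public $address;
public $zoom_level;
public $map_height;'

# LC_ALL=C forces plain byte-order sorting so the result is the same on every machine.
sorted="$(printf '%s\n' "$declarations" | LC_ALL=C sort)"
printf '%s\n' "$sorted"

# Succeeds only if both arguments hold exactly the same set of lines,
# regardless of the order those lines appear in.
same_lines() {
    [ "$(printf '%s\n' "$1" | LC_ALL=C sort)" = "$(printf '%s\n' "$2" | LC_ALL=C sort)" ]
}

# Simulate the kind of silent rename described in this article.
tampered="$(printf '%s\n' "$sorted" | sed 's/label_for_find_button/label_find_button/')"

if same_lines "$declarations" "$tampered"; then
    echo "Safe to paste."
else
    echo "Names changed, do not paste it."
fi
```

The same_lines check is the part worth keeping around: it works on any reordered list, whether a tool or an AI produced it, and it flags a renamed variable immediately.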
My Thoughts On AI Today
Until AI is at a point where we can trust it well beyond what currently feels like a 90% accurate, 10% off-the-rails hit rate on “truth vs. fiction”, we have to ask – Is AI a useful tool? Is it really ready for prime time? I don’t think it is.
A fun toy, sure. But truly useful for the general public – maybe not.
Will AI do some really cool things we cannot do today? Sure. Someday. Even today it can do so in very specific cases, but often those “wow!” moments come only after exorbitant amounts of time, money, and energy spent coaxing AI systems into glossing over the bullshit and fixing their own mistakes. Then some corporate dudes looking for more capital funding can go out and say “look at this cool hugely complex problem AI solved in a week!”, after years of tweaking it to do that job “in just a week”.
Someday, ChatGPT, someday you’ll catch up. Just not today.
Feature Image by ChatGPT 4 via DALL-E