ChatGPT 4 / Dall-E Artistic Limitations

As I delve deeper into exploring the current capabilities of AI, I am sharing my journey. These articles will hopefully shine a light on why I feel the most popular AI tools available to the general public today, ChatGPT 4 and Dall-E, are not what I would consider truly “intelligent”. Interesting, entertaining, and possibly even useful tools — but not showing signs of what I would consider true intelligence by societal norms these days.

Here is my latest journey into AI-generated art as I further explore the topic of “Artificial Intelligence”.

This is a more nuanced discussion about art. Having ChatGPT 4 ingest (read? what does read mean and can AI actually “read”?) the git missing submodules article, I had it create an art piece for it. The first attempt was decent, but in a typical telltale sign that the art was generated by AI, it created a garbage mash-up of words and nonsense symbols both around the base image concept and in a “speech bubble” associated with it.

While there is a lot to like, in my opinion that original chat bubble ruins the image.

Now this is where I think a true debate about intelligence can be had. Anyone who has some art talent and is intelligent could understand a request as simple as this:

“Please remove the caption bubble from that image as the words within the bubble are nonsense.”

I could instruct just about anyone with basic art or Photoshop skills and what one would consider normal intelligence to remove that bubble, and I would get the same image, less the chat bubble. However, AI absolutely cannot make minor modifications to the current image. At least not the default Dall-E + ChatGPT combination. Since ChatGPT touts art generation and “art AI” as a feature, one would think any art request would be in its wheelhouse. Instead, this is what I got back. Again, still cool, but not what I would expect from something purporting to have “intelligence”.

Yes, the bubble is gone, but instead I get even MORE nonsense scribbles that are supposed to be words all over the image. The image is also only tangentially related to the original above.

So now, when I ask it to modify the original image and remove only the bubble, it tells of its own shortcomings.

ChatGPT: “I’m unable to modify existing images, such as removing elements like the caption bubble from the previously generated image. My capabilities are limited to creating new images based on descriptive prompts. Since I can’t edit or alter images post-creation, the best approach would be to generate a new image that closely matches the previous one, but without the caption bubble or any elements representing writing.”

So at least it is “aware” of its shortcomings. Personally I feel this is not some sort of “self awareness” like a human might possess, but instead the result of a human AI developer being instructed by management at the AI company to ensure that these tools cannot be used to make simple modifications to prior art. I’d go even further and venture a guess that this decision was made in an effort to avoid any direct legal copyright issues, given how clearly these AI engines are stomping all over intellectual property laws and just laying claim to anything that ever appeared online to regurgitate it as their own original content.

That said, an intelligent being would know “hey I created that first image, and it has never existed before, and is my own 100% original creation”, which introduces a whole other topic here. Is it? Is it truly original? After all, ChatGPT and Dall-E have both ingested tons of information, art, etc. from other people: stuff someone else created that they munge into a new form. Unlike humans, these pattern engines are not really creating something truly unique and never seen before. Sure, humans often do the same thing, iterating over a past concept, but not always. And most can create something unique enough that there would be no concern over it pre-existing. There would be no concern that honoring a client’s ask to “remove that chat bubble” might infringe on someone else’s copyright. The bubble would be deleted and we’d move on.

ChatGPT 4 / Dall-E “Hall of Mirrors”

In a “hall of mirrors” effect, I posted this article without any additional art other than the images above. I then asked ChatGPT / Dall-E to create an image to represent the article. It is posted here as the featured article image. The result was impressive. I even entered the following compliment into the ChatGPT engine, partly guided by the AI proponents in my life telling me I will get better results if I treat the AI like a “being” and give it praise.

Me: “Wow! That image is really fucking cool. Nicely done and good job getting it right the first time with limited input!”

ChatGPT: “Thank you, Lance! I’m thrilled to hear that you liked the image and that it resonated with the theme of your article. It’s always rewarding to see when a creative vision comes together successfully. If you have any more ideas or concepts for art creation, feel free to share them. I’m here to help bring your creative visions to life!”

Art Generators Aren’t Great At Following Directions

A day after posting this article, I was so thrilled with the art Dall-E/ChatGPT had created that I asked it to create another image, and to not use text or symbols that represent text. It blatantly disregarded that instruction, despite repeating back that it would not use them.

Clearly the ChatGPT AI does not understand plain human communication in English, despite being trained primarily on English-language material. Cool stuff, but we have a long way to go to match the standard definition of “intelligence” without caveats.
