Are you the artist or are you the client?

A perspective on AI artists and the rise of text-to-image models.

Sep 05, 2022

An image of an artist painting in the style of Vincent van Gogh. “an artist at work, concept art, expressionistic, by Vincent van Gogh, minimalist” by Stable Diffusion.

Before we can discuss whether AI-generated art can be considered art, I think it’s important to define what art is. That’s tricky, of course, because there are myriad definitions. However, for the sake of this discussion, I will define art as anything determined to be art by the beholder. If the hypothetical ‘you’ finds something to be art, it is therefore art.

With that said, I won’t focus here on whether text-to-image AI-created art is ‘actual art.’ Instead, operating under the premise that someone has deemed AI-created art to be ‘art,’ I want to ask: who, in this case, is the artist?

Is it the prompt writer? Is it the model itself? Or is the answer murkier than that?

In the world of professionally commissioned art, in the simplest, most watered-down scenario, there are really two people involved in any project: a client and an artist. There is someone who needs art for whatever reason, and there is someone who can actually turn that need into a reality. In this scenario, the client has two main roles: to pay, and to explain what they want to an artist who can realize it.

After that, the client is just a gatekeeper on approvals. They play no actual role in the creation of the artwork outside of being either an antagonist or cheerleader to the person actually doing the creative work. Whether the client is a cheerleader or antagonist is their right, of course, as the client is paying the artist to meet their needs. A client communicates to an artist, and the artist interprets that communication, before translating it into some sort of output, either visual or auditory.

Now with some definitions cleared up, let’s discuss the case for you being the client:

Prompt engineering, the key challenge most AI artists face, is not an artistic problem; it’s a language problem. You are not learning to create an artistic expression by manipulating prompts; you are learning how to communicate more effectively with a model to get some desired output. This is the exact role of the client in the above example of the professional artistic process, only put in terms of AI image generation.

The prompt itself is essentially meaningless in the context of the output, so long as the output meets your creative vision. Every time you change the prompt, you’re changing the scope of the creative ask that you gave your artist, which, in this case, is a text-to-image model. You might be tempted to consider the role you play in the creation of the final product as an art-director-style role, but your influence on the output is extremely limited, as is a client's role in the grand scope of generating the artwork.

An art director's role is more than just to make approvals. Art directors define compositions, make stylistic choices, and work to ensure consistency between anything that passes through the art department. With text-to-image generation, it's hard to argue that you’re really defining composition, as the models, at least currently, struggle to reason about spatial directions or complex chains of words. Additionally, it is hard to argue that you’re really making stylistic decisions, because these models are solely referencing artists or existing works of art.

The worst kinds of clients are those who’ll know what they want only when they see it. Or at least that’s a phrase clients love to say. And with these text-to-image models, we all get to be that client, which honestly feels great. Much like clients in the professional art world, it’s your job to use as many buzzwords as possible when communicating with the model, in order to ensure your output is as aesthetically pleasing as possible. Appending things like “Unreal engine” or “in the style of Greg Rutkowski” to the prompt is essentially the text-to-image equivalent of asking an artist to really make it “pop”.

However, there are outlier scenarios, such as when you work with a client who is a genuine artist in their own right. In those cases, working with them feels more like a collaboration than a job.

In the world of AI art, there is an equivalent to this type of relationship. Concept artists, 3D sculptors, or anyone who is incorporating Midjourney or Stable Diffusion into the artistic pipeline to enhance the output of the work. This is collaboration.

However, this type of client-artist relationship can also take the opposite form in both non-AI art and AI art alike. A hypothetical client brings in an awful sketch to a hypothetical pitch meeting. Now, the artist is sitting there like, “Well, shit. I have to turn this awful sketch into reality somehow.” The same negative relationship holds true in AI art as well, with image-to-image models. You, the client, feed in a rough sketch, and then the AI tries its darndest to turn it into something reasonable, for better or worse.

Because the model is doing so much of the heavy lifting here, it should be difficult for you to take full credit as the artist. In the world of professional commission-based art, the practice of taking credit for work created by someone else is not unusual. Oftentimes studios or clients will hire artists, and in the process of negotiating working terms, the client will ask the artist to work without credit, usually for an increase in pay. Tech companies like Apple or Meta are notorious for this, since they want to create the illusion that they created everything in-house. I don’t think it’s unethical in most cases, since it’s agreed upon beforehand, but not attributing your text-to-image outputs to the models you used feels much more duplicitous.

It’s also important to clarify that I do think learning the right words to say to the AI is certainly a skill. I just don’t think being good at something makes you an artist without some kind of extra context. I think most of us would agree that someone who’s skillful at googling is not an artist. However, because the text-to-image models are so talented at what they do, it’s unclear whether being skilled at writing prompts really has much impact on getting high-quality outputs.

To illustrate this point, I’ve gathered a random assortment of prompt subjects, locations/settings and buzzwords [1] to supply as prompts for Stable Diffusion, roughly 100 of each. And, as a comparison, I’ve slapped the outputs into a grid alongside outputs from Midjourney. In theory, the Midjourney ones are outputs from prompts created by real live humans!

8 Midjourney outputs and 8 outputs from randomly generated Stable Diffusion prompts, randomly shuffled in a grid.

Let me know in the comments if you can discern what was “human” generated and what was a product of randomly slapping words together. I personally think all the outputs on display here look aesthetically good, so how could I really argue that someone is a good prompt engineer? Is my Python script a good prompt engineer? The script is included as a GitHub Gist [2], linked below, if you’re curious about the inputs or want proof that I didn’t write any full prompts of my own manually.
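To make the idea concrete, here is a minimal sketch of how a random prompt generator like this could work. It is not the script from the Gist. The word lists are short placeholders I made up for illustration (the real lists had roughly 100 entries each), and the function name `random_prompt`, the choice of three buzzwords per prompt, and the comma-separated layout are all assumptions.

```python
import random

# Placeholder word lists: illustrative stand-ins only, not the lists used
# for the images in this post (those had roughly 100 entries per category).
SUBJECTS = ["a lighthouse", "an old bicycle", "a sleeping cat", "a teapot"]
SETTINGS = ["in a desert", "at a carnival", "inside a subway station", "on a mountain pass"]
BUZZWORDS = [
    "concept art",
    "highly detailed",
    "trending on artstation",
    "unreal engine",
    "8k",
    "cinematic lighting",
]


def random_prompt(rng, n_buzzwords=3):
    """Build one prompt from a random subject, a random setting, and a few distinct buzzwords."""
    subject = rng.choice(SUBJECTS)
    setting = rng.choice(SETTINGS)
    # sample() draws without replacement, so no buzzword repeats within a prompt.
    buzz = rng.sample(BUZZWORDS, k=n_buzzwords)
    return ", ".join([f"{subject} {setting}"] + buzz)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the same prompts come out on every run
    for _ in range(8):
        print(random_prompt(rng))
```

Each printed line can be pasted into a text-to-image model as-is. Nothing in the script knows or cares what the words mean, which is exactly the point.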

And now let’s look at the other side of things, the case for you being the artist:

While I have defined what “art” is, we haven’t yet defined what the term “artist” actually means. The definition I’ll be working under is as follows: an artist is a person who works to express ideas through creative exploration, with intent. By intent I mainly mean with some goal in mind. Exploration typically means some amount of iteration or practice is involved.

Do prompt engineers meet the above requirements? By default, no. Prompt engineers tend to lack intent, as there’s usually not a clear goal in mind or idea to convey. However, this is not unique to prompt engineers. Every artistic pursuit, whether it's painting or writing, has a dual, non-artistic use case. Whether it’s technical writing, architectural sketching or house painting, there are non-artistic applications of most artistic mediums.

Artistic mediums are tools, and, as such, models like Stable Diffusion are also tools. A tool does nothing in a vacuum. Without you, the artist, the tool cannot begin to make art. Assuming you set out with some clear idea you want to express, and you work towards expressing that through iteration, you are an artist, regardless of medium.

This discussion isn’t new either, especially when it comes to mediums that involve randomness. Process-based art is an umbrella term for art where the creation of the art and the technique used are part or all of the art itself. Process-based art is really the only form of art where the input is equally or more important than the output. To some extent, the output is not determined by the artist themselves, which only increases the importance of what the input actually was.

This is often to the detriment of the artists themselves, as the standards placed upon the final work are often higher than those placed on more traditional artists, because the bar for entry is perceived to be lower than in the traditional arts. As the barrier to entry lowers, the standards for what is considered good only grow higher. And with a medium as straightforward and simple as text-to-image models, the bar, in my opinion, is insanely high. Process-based art is also a double-edged sword in that regard. The more you as an artist try to force the output to conform to your initial idea through iteration, the harder it becomes to really consider it process-based art. Of course the output is still art, assuming someone deems it art, but it does begin to live in a middle ground, where it’s neither intentional nor completely unintentional.

Which is why, ultimately, I believe that the relationship you have with a text-to-image model can be, at its best, a collaboration. And, at its worst, a client/artist relationship.

I want to make it clear that I do believe text-to-image models are incredibly transformative. From the early days with Big Sleep and VQGAN + CLIP to DALL-E 2, Midjourney and Stable Diffusion, all of these are incredible tools for generating content at a frightening rate.

I’m personally enamored with them. I don’t want people to stop using them, nor do I think research in this area should stop. I just think it’s important to understand that being a “prompt engineer” in the average case makes you no more an artist than paying for a commission on DeviantArt does. All of the content these networks learned from was created with intention. Now that the bar is so low for actually generating content, I think it’s more important than ever to support artists who work with intention.

Thank you for reading, and let me know what you think in the comments. And a special thank you goes out to Zach Rice and Shannon for helping me edit this.

[1] All of the words for the prompts came from the internet:

http://lexica.art - for the buzzwords. I just clicked on random outputs and copied anything that was a buzzword and not actual subject matter.

https://wonderstrange.com/100-things-to-draw/ - for the subjects.

https://writershelpingwriters.net/2010/10/setting-thesaurus-entry-collection/ - for the locations.

[2] GitHub Gist for the random Stable Diffusion outputs.

CGI: Computer Graphics Investigations
