Robotic and Cognitive Automation Blog 56

2023
SEP

ChatGPT and The AI (‘Artificial Imperfections’) Test

Leslie Willcocks

Professor Emeritus

London School of Economics and Political Science

Daniel Schlagwein and I, together, edit a major academic journal—The Journal of Information Technology—and in September we wrote an editorial addressing the above issue. Since the issue is of considerable practical interest, I have provided this, adapted, shorter version of the editorial.

Artificial intelligence (AI) seeks to make computers do what human minds can do. By 'AI,' we refer to the use of machine learning, algorithms, large data sets, neural networks, and traditional statistical reasoning by computing. The term ‘AI’ is misleading: despite suggestions to the contrary, and some surely impressive achievements in specific areas, we are still far from reaching the benchmark of ‘general human intelligence’. AI has undergone several generations, from ‘good old-fashioned AI’, the defined algorithms of which failed at the common-sense problem, to the current and more successful generation of neural network and deep learning AI. One specific form of current AI is ‘generative AI’ (e.g., ChatGPT, DALL-E, Midjourney). Without a doubt, it is the technology hype of 2023 and the focus of this comment from the editors of the Journal of Information Technology.

Generative AI, specifically ChatGPT, became a ‘cultural sensation’ rather rapidly in early 2023. When Daniel brought up generative AI as a future ethical issue at a panel for journal editors on publishing ethics in December 2022, many audience members seemed unfamiliar with Midjourney or ChatGPT. However, within just a few weeks, the landscape shifted dramatically. Publicly launched on 30 November 2022, ChatGPT, a chatbot built on top of a text-generating AI, had an impressive debut, reaching 1 million users within five days and surpassing 100 million users in January 2023. Since then, ChatGPT has become widely used and is believed to impact many areas, including research and science.

While detailed explanations of the underlying technology can be found in many other sources, generative AI is a subset of deep learning AI that specialises in producing human-like outputs. OpenAI's ChatGPT operates on a neural network AI architecture, GPT (Generative Pretrained Transformer). Although ChatGPT might have seemed like a natural progression of the AI domain, especially since Midjourney and DALL-E had been introduced earlier, it astonished global audiences and led companies like Alphabet (Google) to hastily release comparable tools. Put simply, deep learning AI systems ‘hallucinate’ plausible-looking (though not necessarily accurate) responses to user prompts. They base these responses on patterns of likeness (associations between words and concepts), stored in a digital neural network (multiple layers of interconnected nodes) and learnt from massive training datasets. Such systems can quickly generate high-quality images and texts, outperforming traditional algorithms. However, this advanced capability is accompanied by the challenge of the ‘black box’ problem: we may understand the model's general principles, but the reasons behind specific decisions remain opaque. The neural network provides a flexible, changing structure, inspired by the human brain, that encodes patterns, but not in an intelligible, auditable manner: there is no clear formula to scrutinise. (This is akin to how the reader might instantly and reliably distinguish between their mother and their cat but would be unable to write down a precise formula for this recognition process.)
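To make the idea of ‘plausible-looking but not necessarily accurate’ output concrete, here is a deliberately tiny sketch in Python. It is not how ChatGPT is built: it swaps the neural network for a simple table of word-to-next-word counts (a ‘bigram’ model), and the three-sentence corpus and the function names are invented purely for illustration. The principle it shows is the same one described above: the output follows learnt associations between words, with no check against the truth.

```python
from collections import Counter, defaultdict

# Hypothetical three-sentence "training corpus", invented for illustration.
CORPUS = [
    "the cat sat on the mat",
    "the cat chased the mouse",
    "the dog sat on the sofa",
]

def train_bigrams(sentences):
    """Count, for each word, which words follow it and how often."""
    follows = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows

def generate(follows, start, max_words=5):
    """Greedily append the most frequent follower of the last word.

    Ties are broken alphabetically so the result is reproducible.
    """
    words = [start]
    while len(words) < max_words and follows[words[-1]]:
        candidates = follows[words[-1]]
        best = max(sorted(candidates), key=candidates.get)
        words.append(best)
    return " ".join(words)

model = train_bigrams(CORPUS)
print(generate(model, "dog", max_words=5))  # prints "dog sat on the cat"
```

The generated sentence reads fluently, yet it appears nowhere in the training text and nothing in the corpus supports it. The toy model has only followed the most common word-to-word associations; whether the statement is true never enters the calculation.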

As journal editors, the emergence of ChatGPT prompted us to ask foundational questions about using generative AI in research and science. Specifically: Is it ethical to use generative or other AIs in conducting research or for writing academic research papers? In this editorial, we go back to first principles to reflect on the fundamental ethics to apply to using ChatGPT and AI in research and science. Next, we caution that (generative) AI is also at the peak of inflated (hype) expectations and discuss nine in-principle issues that AI struggles with, both ethically and practically. We conclude with what this all means for the ethics of using generative AI in research and science.

Deontological and Teleological Ethics

For an ethical assessment, we start with ethics, a subfield of philosophy, that studies the nature of morality and the evaluative standards for human action. Ethics offers various frameworks through which one can assess the rightness or wrongness of actions. Ethical theories can be classified into those that focus on the actions (means, processes, methods) as such and those that focus on the results (ends, goals, outcomes) achieved via these actions as primary grounds for ethical assessments. These are called deontological and teleological ethics, respectively.

For example, is the ethical integrity or artistic value of a piece of art dependent on the process of creation, or is it purely a question of the aesthetic quality of the outcome, the artifact? Is the same Balinese (or Indigenous Australian, etc.) artwork considered lesser if found to have been made by an artist who is not actually of the relevant ethnicity? To move this example into the realm of AI, is the same (assume, for the sake of argument, pixel-by-pixel identical) artwork less deserving of winning an art prize if it is made with AI? This is not hypothetical; the first (Midjourney-AI-generated) art prize winner sparked this precise (and heated) discussion.

People disagree on such judgments and, in our view, the implicit root of the disagreement is often the question of whether deontological or teleological views on ethics are applied.

Deontological Ethics (‘A Fair Process’)

Deontological ethics, grounded in the philosophical principles of Immanuel Kant, prioritise the process over the outcome in moral considerations. Kantian ethics assert the primacy of means over ends and hold that moral actions should be guided by a priori principles or maxims of action. In essence, for Kant, it is the intention or the quality of the act itself that possesses intrinsic moral worth, regardless of the results it produces. Central to this doctrine is the emphasis on duty, rules, and the intrinsic moral nature of actions. Actions are deemed morally obligatory, permissible, or forbidden based on their inherent characteristics, regardless of any consequential outcomes. Per Kant, lying is intrinsically wrong, irrespective of any potential benefits that might arise from it.

There are many other forms of deontological ethics. The ethics of care, rooted in feminist philosophy, emphasises relationships, empathy, and the human process of caring. Process philosophy shifts the ontological and, hence, ethical focus from static entities to processes and becoming. Virtue ethics, originating from Aristotle’s Nicomachean Ethics, are also primarily deontological. Virtue ethics focus on the moral agent's character, habits, and dispositions and prioritise the development and exercise of virtuous character traits. Virtuous actions are intrinsically valuable in their own right (while also being a means to achieve the outcome of a good life, eudaimonia).

In a deontological ethical judgment, it is the process that matters, not, or not primarily, the outcome. So, the ethical integrity of ‘art’ connects to the artist(s), the human condition, and the artistic process; it does not make sense to award a prize to an AI-generated piece as it violates these ideals.

Teleological Ethics (‘A Good Outcome’)

Utilitarian philosophers, such as John Stuart Mill, emphasise the importance of outcomes: the rightness of an action is to be judged primarily based on its consequences. In what is called utilitarianism or consequentialism, the moral value of an action is determined by its outcomes, specifically in terms of the overall happiness or pleasure produced. An action is right if it maximises the total amount of happiness for the greatest number of people (Bentham, 1789). Bentham’s original proposition has been critiqued by later utilitarians; for example, not all pleasures are of the same value (Mill, 1863), pleasure is not the only consideration to evaluate morality (Moore, 1903), we need to extend the idea to other groups such as animals (Singer, 2011), etc. Despite such variations, the principle of teleological ethics is to prioritise results and outcomes over the nature of actions.

There are also many flavours of teleological views. A focus on ends and outcomes is also inherently at the core of Pragmatism. Teleological views also underpinned the ethical judgments of actual versus desired outcomes, fundamental critiques of society and ‘how the world could be otherwise’ from Nietzsche to Marx to the Frankfurt School. For example, in ‘A Theory of Justice,’ Rawls contends that a ‘just’ society is primarily egalitarian, with disparities justified only if they contribute to the overall betterment of society, an ideal we are far off from.

In a teleological ethical judgment, it is the outcome that matters. A fair process is nice to have, but ultimately, what matters ethically is improvement in the relevant outcomes. Measures (say, AI use) that improve the relevant outcomes (say, aesthetics of art) are good measures.

Is AI Use to Be Judged on Deontological or Teleological Grounds?

For the sake of our argument, we assume there is some measurable benefit to the use of AI, for example, in research and science, making researchers more productive by increasing their writing speed. (If AI offered no benefits, there would be nothing to debate.)

There are societal sectors where judgments are clearly and justifiably made on teleological grounds and others where they are made on deontological grounds. We have concluded that judgments on AI in these areas should adhere to the respective standards, not generalised principles (be they from computer science, philosophy, law, etc.) as is often attempted. For example, rejecting AI use on principle based on the above ‘black box’ (non-auditability) problem alone, in any scenario, seems invalid. Rather, the question is, does the respective domain inherently demand auditability? Let us illustrate this with two thought experiments as stylised examples.

An area in society where most would agree that the process of human performance matters primarily is sports. A game of football is not primarily about maximising the number of goals scored per se, but how the human players and teams perform in a ‘fair’ game (without ‘doping’, ‘cheating’, etc.).

Thought Experiment 1: ‘AI-controlled Robot Leg at the Olympic Games’: Your opponent at the Olympic Games appears for the 100m sprint finals with an AI-controlled robot leg prototype that encases their physical legs. Your opponent argues that this is an ethically acceptable enhancement because it allows them to run the 100m twice as fast, completing it in 5 seconds instead of your 10. Do you believe it is ethical and fair for your opponent to use this AI-robotic enhancement in the Olympic Games?

If your sentiments align with those of the authors, you will likely find the use of AI unethical; it is unfair and ‘against the Olympic spirit’ to use AI in this way.

In contrast, an area in society where most would concur that outcomes take priority is medicine. According to the medical-ethical principle of ‘beneficence’, medical practitioners are considered to have a moral duty to promote the course of action that is in the best health interests of the patient, improves survival rates, etc.

Thought Experiment 2: ‘AI Surgeon for Your Baby’s Heart Operation’: Your baby has been diagnosed with a heart condition necessitating urgent, high-risk surgery. The university hospital offers two options: one, your baby can be operated on by the best available human surgeons, whose operations have a survival rate of 40 percent. Alternatively, an AI surgeon could be used, with a survival rate of 80 percent. Do you believe it’s ethical for the hospital to offer the AI option and for you to choose the higher survival rate for your baby?

If your sentiments align with the authors' perspective, you will opt for the treatment with the better outcome in terms of successful heart surgery and the survival of your baby. This is the case, even if you, the parent, understand the black box issue and other potential ‘unethical’ perspectives on the same, from job losses of medical personnel to the loss of human expertise or the ethics involved in arriving at such a (hypothetical) AI surgeon in the first place. You are prioritising the survival of your baby above such concerns, and this is not unethical. Likewise, you would probably find it unethical for the hospital to withhold the AI option from your baby due to their fear of job loss.

If you have played along with us, you have just come to two opposing conclusions. In one, it is ethical to use AI in a previously human process, and in another, it is not. This is not a contradiction; rather, it is because we evaluate different societal areas—and therefore should assess the use of AI in those domains—according to different standards, as deemed appropriate within each context.

In sports, human achievement is valued for the dedication, resilience, skill, and effort reflected. The introduction of AI would compromise these values, shifting the focus from human skill to technological advantage. Conversely, in medical treatment, patient outcomes are paramount. AI should be used if it offers, as it appears to do, tangible benefits in diagnosis, treatment, drug discovery, etc.

Is our own concern, research and science, more akin to the (idealised) sports scenario or the medical scenario?

For non-academics, it may seem obvious that research and science—the endeavour of knowledge creation—ought to be outcome-driven. We want science, often taxpayer-funded, to produce knowledge efficiently. As such, it aligns more with the medical scenario of prioritising the best outcomes, and less with the fair process between competing humans as in games or sports.

Many within our ‘metrified’ and gamified academic system might need a reminder that research papers are not ‘points’ for ‘rankings’ in a competition of scholars against scholars or journals against journals. Johnston and Riemer (2014) call this misbelief that has crept in, ‘Putting the Score Ahead of the Game’. ‘[The] notion of ‘knowledge value’ is increasingly marginalised in the IS [Information Systems] discipline [and replaced by] the value of research as a product in certain academic markets’ (p. 849).

The role of science in society is the efficient advancement of knowledge. Science is not a sports-like competition among professors, institutions, or journals, even if the metrics and rankings currently in wide use unfortunately make it appear so at times. It is then not ‘unfair’ (a concern we have frequently heard) if author A writes 20 percent faster than author B by using generative AI. Instead, it would seem that B is forgoing 20 percent of potential productivity (time better spent) by not employing the best tools and technologies at hand. As such, we agree with commentaries in Nature (Van Dis et al., 2023) suggesting the same, and disagree with Science suggesting that writing faster with AI is, in essence, cheating or plagiarism (Thorp, 2023).

In our view, in our area of research and science, given its role in society, a teleological assessment of ethical use should take priority, and AI should be used where it is a useful tool for advancing knowledge.

However, that said, AI should not be used as if it were perfect, and there is a range of limitations inherent in AI technologies that raise ethical concerns and may actually, all considered, outweigh the benefits.

The AI Hype and an ‘Artificial Imperfections’ Test

On initial use, ChatGPT is impressive. The text produced is amazingly human-like at a speed and quality not available previously. Problematically, however, it does not provide consistently accurate answers. This is not a question of an ‘early stage’ of this technology; it is fundamentally how it works. As mentioned above, generative AI produces answers that look plausible and cannot be easily distinguished from those written by humans. Veracity is optional. Sometimes, ChatGPT confidently makes things up. To use a strictly technical term, it tries to get away with automated, scaled-up, high-tech ‘bullshit’. Problematically, most users will be naïve about what goes into this AI software, but it is important that we reflect on the content and process.

AI software is designed by humans with specific (often commercial) intentions. While faking it, AI does not have knowledge or Verstehen (understanding). AI software does not know or care if it is even accurate, as the AI has no understanding of what it is ‘saying’. AI neither appreciates nor comprehends biases but often exhibits such biases. AI is not human and never will be. The ‘brain-like-computer, computer-like-brain’ metaphor is thin and even misleading. Humans are living biological beings; AIs are not. Language fools many into believing that AI is ‘intelligent’ and that it feels, creates, empathises, thinks, and understands. AI does not have any of this; it is ‘faking it’. As Meredith Broussard, in her book ‘Artificial Unintelligence’, puts it: “If it's intelligent, it's not artificial; if it's artificial, it's not intelligent” (Broussard, 2018). The more widely AI such as ChatGPT is used, the more it presents substantial ethical and social responsibility challenges.

Leslie's new book, ‘Maximizing Value With Automation and Digital Transformation: A Realist’s Guide’ (Willcocks et al., 2023) establishes an ‘Artificial Imperfections’ test, clear boundaries of AI’s usefulness and general reflections on the inherent limitations of the current generation of AI of which ChatGPT and generative AI are a part.

(1) AI is brittle. It provides only narrow and non-human ‘intelligence’. AI tends to be very good at very limited things (e.g., generating text or images) but is far from the flexibility and dexterity of humans. To quote the Moravec Paradox, "it is comparatively easy to make computers exhibit adult level performance on intelligence tests or playing checkers, and difficult or impossible to give them the skills of a one-year-old when it comes to perception and mobility" (Moravec, 1988). ChatGPT is powerful but ultimately can perform only very minor aspects of, for example, a professor’s job profile.

(2) AI is opaque. Its inner workings are black-boxed. As mentioned above, AI decision-making is typically non-auditable and, often, non-replicable. How far decisions, judgments, and recommendations made in a black-boxed manner can be relied upon is debatable. AI is not a democracy: it is not for the citizen to probe and judge; it is so ‘because I say so’. Researchers are working on how to counter the lack of transparency that comes with AI. Yet if the best ‘non-explainable’ solution is better than the best ‘explainable’ solution, then, as stated above, people will, in many scenarios, choose the better solution regardless. The errors, biases, misdirections, and misunderstandings that come with AI are then simply accepted, and often not even known.

(3) AI is greedy. It requires large training data sets as well as processing power, memory, and energy. Setting aside problems of energy use and sustainability, one problem is that a great deal of the data AI ingests may not be fit for purpose. Bad data misleads algorithms to confidently produce wrong or undesired results. While once in a while, a clear violation of logic or social norms makes headlines and is ‘fixed’, the vast majority of such matters go unnoticed and unfixed. The idea that ever larger samples, i.e., ‘Big Data’, solve the problem represents a naïve view of the statistics involved.

(4) AI is shallow and tone-deaf. It produces decisions and knowledge claims without understanding, feeling, empathising, seeing, creating, reflecting, or even learning in any human sense of these terms. Michael Polanyi is credited with what has become known as the Polanyi Paradox, ‘people know more than they can tell’. The tacit dimension of knowing cannot be articulated. With AI, there is a ‘Reverse Polanyi Paradox’: ‘AI tells (far) more than it knows’. In fact, far more than it does NOT know, in any meaningful sense of the term ‘know’.

(5) AI is manipulative and hackable. The digitalisation of many decisions makes them amenable to manipulation by corporations and governments at scale, or open to hacking by malicious actors. The difference might not even matter, with well-funded state organisations often behind ‘hacking,’ and corporations and hackers driven by the same self-interest to change your views or to separate you from your money (just on different sides of the law). AIs are commercial products with interested players, amplifying the inherent problems of social media (like Facebook) and search engines (like Google), and the old Internet adage holds true: “if you're not paying for the product, you are the product”.

(7) AI is biased. It is inherently biased in that it encodes and enshrines any biases found in the training data. Unless one subscribes to an extremely conservative notion that there should be no more social and societal changes, basing future decisions on perpetuating past logic is inherently flawed. There are already multiple examples of how biased AI can be, including ChatGPT and similar systems. Biases are inherent in the approach, the data collected, how the data are processed, and in generated outputs in terms of decisions and recommendations. For example, ChatGPT has been repeatedly shown to be left-biased, yet also exhibits gender and racial biases. By making certain decisions over others, AI exhibits the problematic logic of Orwell’s ‘thought crime’ or ‘pre-crime’ as in the Spielberg film Minority Report.

(8) AI is invasive. It commoditises private data. Shoshana Zuboff leads the charge on AI invasiveness with her recent claim that ‘privacy has been extinguished. It is now a zombie’. As she argued in ‘Surveillance Capitalism’, ‘BigTech’ tends to find ways to commoditise more and more societal spaces (Zuboff, 2015). In the generative AI scenario, artists and writers are protesting that their life’s work is being used wholesale for AI training.

(9) AI is faking it. It is ‘bullshitting-as-a-service’. AI is not creative; it fakes creativity. It is not emotional; it fakes emotions, etc. Generative AI provides an illustrative example of successful faking, and likely—the future will show—so do many AI businesses.
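Point (3) above argues that ever larger samples do not, by themselves, fix data that is unfit for purpose. A minimal simulation can illustrate why. Everything in it is an assumption made up for the example: a population in which half of the people would answer ‘yes’ to some question, and a data collection process in which ‘yes’ people are three times as likely to end up in the data set as ‘no’ people. The function name and constants are hypothetical.

```python
import random

TRUE_SHARE = 0.5      # assumed true share of "yes" in the population
INCLUSION_YES = 0.9   # assumed chance that a "yes" person ends up in the data
INCLUSION_NO = 0.3    # assumed chance that a "no" person ends up in the data

def biased_sample_share(n_people, seed=0):
    """Share of "yes" observed in a data set collected with unequal inclusion."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    yes_in_data = 0
    total_in_data = 0
    for _ in range(n_people):
        is_yes = rng.random() < TRUE_SHARE
        chance = INCLUSION_YES if is_yes else INCLUSION_NO
        if rng.random() < chance:
            total_in_data += 1
            yes_in_data += is_yes
    return yes_in_data / total_in_data

for n in (1_000, 10_000, 100_000):
    print(n, round(biased_sample_share(n), 3))
```

Under these assumptions the observed share should settle near 0.45 / (0.45 + 0.15) = 0.75 as the sample grows, not near the true value of 0.5. More data makes the estimate more stable, but it stabilises around the wrong number, because the distortion lies in how the data were collected rather than in how much was collected.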

Conclusion: The Ethics of Using AI in Research and Science

So ... does this mean that the Journal of Information Technology allows the use of (generative) AI or not? Yes, for now, we do allow it. Using it for research and science is not inherently unethical, even though, as we have indicated, it comes with many ethical challenges. If AI is useful for legitimate and valid research, for which the authors, not the AI, must take full responsibility, then its use should be allowed and not be ruled out categorically. We do not believe in policing authors, nor do we claim to know best how they should go about their work. However, in line with the scientific principle of transparency, the exact manner in which generative AI has been used must be declared in the same way as any other tools or techniques used. Given full and transparent disclosure, it is then up to the reviewers and editors to assess and make decisions on the specific use of that generative AI in a specific piece of research. Ethical behaviour is part of scholarship, and authors are obliged to reflect on and, if asked, explain their methods and the ethics applied in their work. Scholarship inherently embraces moral agency.

More broadly, though, we caution that there are inherent limitations and risks to AI, not in terms of dystopian ‘AI takeover’ fears but in terms of the future(s) we are creating for ourselves. When surveying these challenges and the probable impacts of AI, it becomes evident that technologies like ChatGPT present a myriad of practical and ethical dilemmas, and there is a high risk of long-term, hidden, and non-obvious negative implications. Historically, new technologies tend to have dual impacts—both positive and negative, beneficial and perilous, often benefiting the owners of said technologies. Neil Postman commented years ago on the intricate dilemmas posed by new technologies (Postman, 1992): Every technological advancement brings both winners and losers, with winners often convincing the losers of their narrative. New, powerful technologies set us on new epistemological, political, and social trajectories. They are not merely additive but ecological—they have the potential to reshape everything. Once technology starts to be seen as the natural order of things, it might exert undue influence over our lives. These prescient warnings, given in 1992, resonate with us today. We are under no obligation to follow technology’s evolutionary logic ‘no matter what.’

Our conclusion is that we are not forcing our authors to forgo generative AI if they find legitimate, productive ways of using it. At the same time, in a broader and long-term perspective, our lack of a collective approach to AI demonstrates an ethical nonchalance and a deficit in social responsibility that potentially jeopardises us all. Professional, social, legal, and institutional controls lag far behind the pace of rapidly advancing technologies. AI may be a useful tool, a dangerous tool, or any combination thereof. The potential future(s) stemming from its widespread use are not adequately thought through (which is also the gist of several cautionary notes and open letters, including by inventors of key generative AI technologies). Are futures with AI better than without it? Will AI level the playing field for all, or further increase the digital (and wealth) divide? Will AI solve more human problems than it creates? What will the lived experience of future generations be like, those who will never know ‘a world before AI’? We do not know the answers with any reasonable level of certainty; the outcomes are unpredictable at this stage. Yet, we as a society pursue the generative AI path because technology vendors, business models, and yes, users see advantages (for themselves). It is important to observe the precautionary principle that 'can' does not necessarily mean 'should'. Ethical judgment rather than commercial and immediate interest should provide constructive limits on what can and what should be done. This will remain our standard practice at the Journal of Information Technology.

Note: A more detailed, referenced version of this article can be found as: Schlagwein, D. and Willcocks, L. (2023). ‘ChatGPT et al.’: The Ethics of Using Generative AI in Research and Science. Journal of Information Technology, September, 38, 3, 232-238.
