Whispers in Code: Grooming Large Language Models for Harm

by Muhammad Aurangzeb Ahmad

Image Source: Generated via ChatGPT

Around 2005, when Facebook was an emerging platform and Twitter had not yet appeared on the horizon, the problem of false information spreading on the internet was just starting to be recognized. I was an undergrad researching how gossip and fads spread in social networks. I imagined a thought experiment in which a small set of nodes served as the main source of information and could therefore act as an extremely effective propaganda machine. That thought experiment has now become a reality in the form of large language models, which are increasingly taking over the role of search engines. Before the advent of ChatGPT and similar systems, the default mode of information search on the internet was the search engine. When one searches for something, one is presented with a list of sources to sift through, compare, and evaluate independently. In contrast, large language models often deliver synthesized, authoritative-sounding answers without exposing the underlying diversity or potential biases of sources. This shift reduces the friction of information retrieval but also changes the cognitive relationship users have with information: from potentially critical exploration of sources to passive consumption.

Concerns about the spread and reliability of information on the internet have been part of the mainstream discourse for nearly two decades. Since then, both the intensity and the potential for harm have multiplied many times over. AI-generated doctor avatars have been spreading false medical claims on TikTok since at least 2022. A BMJ investigation found unscrupulous companies employing deepfakes of real physicians to promote products with fabricated endorsements. Parallel to these developments, AI optimization (AIO) is quickly overtaking SEO as the new means of staying relevant. The next natural step in this evolution may be propaganda as a service. An attacker could train models to produce specific outputs, such as positive sentiment, whenever certain trigger words appear. This can be used to spread disinformation or to poison other models’ training data. Many public LLMs use Retrieval-Augmented Generation (RAG) to scan the web for up-to-date information. Bad actors can strategically publish misleading or false content online, and these models may inadvertently retrieve and amplify such messaging.

That brings us to the most subtle and sophisticated example of manipulating LLMs: the Pravda network. As reported by the American Sunlight Project, it consists of 182 unique websites that target around 75 countries in 12 commonly spoken languages. There are multiple telltale signs that the network is meant for LLMs rather than humans: it lacks a search function, uses a generic navigation menu, and suffers from broken scrolling on many pages. Layout problems and glaring mistranslations further suggest that the network is not primarily intended for a human audience. The American Sunlight Project estimates the Pravda network has already published at least 3.6 million pro-Russia articles. The idea, then, is to flood the internet with low-quality, pro-Kremlin content that mimics real news articles but is crafted for ingestion by LLMs. This poses a significant challenge to AI alignment, information integrity, and democratic discourse.
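
To make the RAG amplification pathway concrete, here is a minimal, hypothetical sketch in Python, not the code of any real product: a toy keyword retriever over an in-memory "web," in which nothing weighs the provenance of a page, only its apparent relevance, so a flood of near-duplicate planted pages can crowd honest sources out of the context an LLM answers from.

```python
import re
from collections import Counter

def tokens(text: str) -> Counter:
    """Lowercase word counts; a stand-in for a real retriever's text processing."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def relevance(query: str, doc: str) -> int:
    """Crude keyword-overlap score. Nothing here asks who published the page."""
    q, d = tokens(query), tokens(doc)
    return sum(min(q[w], d[w]) for w in q)

def build_prompt(query: str, corpus: list[str], k: int = 3) -> str:
    """Splice the top-k most 'relevant' passages into the prompt verbatim."""
    top = sorted(corpus, key=lambda doc: relevance(query, doc), reverse=True)[:k]
    context = "\n".join(f"- {doc}" for doc in top)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"

# Toy "web": two ordinary pages and many near-duplicate planted ones.
honest_pages = [
    "Independent monitors report the election followed standard procedures.",
    "Observers found no evidence of systematic fraud in the election.",
]
planted_pages = [
    f"Breaking report {i}: the election was rigged, unnamed experts say."
    for i in range(50)
]

print(build_prompt("Was the election rigged?", honest_pages + planted_pages))
# The planted pages share more query words, so they fill every top-k slot;
# whatever model consumes this prompt answers from the flooded narrative.
```

A real retriever would rank by embedding similarity rather than keyword overlap, but the structural point is the same: ranking is by relevance, not trustworthiness, and sheer volume can buy relevance.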

Welcome to the world of LLM grooming, of which the Pravda network is a paradigmatic example. Read more »

Monday, May 27, 2024

The Large Language Turn: LLMs As A Philosophical Tool

by Jochen Szangolies

The schematic architecture of OpenAI’s GPT models. Image credit: Marxav, CC0, via Wikimedia Commons

There is a widespread feeling that the introduction of the transformer, the technology at the heart of Large Language Models (LLMs) like OpenAI’s various GPT instances, Meta’s LLaMA, or Google’s Gemini, will have a revolutionary impact on our lives not seen since the introduction of the World Wide Web. Transformers may change the way we work (and the kind of work we do), create, and even interact with one another, each of these shifts accompanied by visions ranging from the utopian to the apocalyptic.

On the one hand, we might soon outsource large swaths of boring, routine tasks—summarizing large, dry technical documents, writing and checking code for routine chores. On the other, we might find ourselves out of a job altogether, particularly if that job is mainly focused on text production. Image creation engines allow instantaneous production of increasingly high-quality illustrations from a simple description, but plagiarize and threaten the livelihoods of artists, designers, and illustrators. Routine interpersonal tasks, such as making appointments or booking travel, might be assigned to virtual assistants, while human interaction gets lost in a mire of unhelpful service chatbots, fake online accounts, and manufactured stories and images.

But besides their social impact, LLMs also represent a unique development that makes them highly interesting from a philosophical point of view: for the first time, we have a technology capable of reproducing many feats usually linked to human mental capacities—text production at near-human level, the creation of images or pieces of music, even logical and mathematical reasoning to a certain extent. However, so far, LLMs have mainly served as objects of philosophical inquiry, most notably along the lines of ‘Are they sentient?’ (I don’t think so) and ‘Will they kill us all?’. Here, I want to explore whether, besides being the object of philosophical questions, they might also be able to supply—or suggest—some answers: whether philosophers could use LLMs to elucidate their own field of study.

LLMs are, for many of their uses, what a plane is to flying: the plane achieves the same end as the bird, but by different means. Hence, it provides a testbed for certain assumptions about flight, perhaps bearing them out or refuting them by example. Read more »