Whispers in Code: Grooming Large Language Models for Harm

by Muhammad Aurangzeb Ahmad

Image Source: Generated via ChatGPT

Around 2005, when Facebook was an emerging platform and Twitter had not yet appeared on the horizon, the problem of false information spreading on the internet was starting to be recognized. I was an undergrad researching how gossip and fads spread in social networks. I imagined a thought experiment in which a small set of nodes served as the main source of information for the whole network and could therefore function as an extremely effective propaganda machine. That thought experiment has now become a reality in the form of large language models, which are increasingly taking over the role of search engines. Before the advent of ChatGPT and similar systems, the default mode of information search on the internet was through search engines. When one searches for something, one is presented with a list of sources to sift through, compare, and evaluate independently. In contrast, large language models often deliver synthesized, authoritative-sounding answers without exposing the underlying diversity or potential biases of their sources. This shift reduces the friction of information retrieval, but it also changes the cognitive relationship users have with information: from potentially critical exploration of sources to passive consumption.

Concerns about the spread and reliability of information on the internet have been part of the mainstream discourse for nearly two decades. Since then, both the intensity and the potential for harm have multiplied many times over. AI-generated doctor avatars have been spreading false medical claims on TikTok since at least 2022. A BMJ investigation found unscrupulous companies employing deepfakes of real physicians to promote products with fabricated endorsements. Parallel to these developments, AI optimization (AIO) is quickly taking over from SEO as the new means of staying relevant. The next natural step in this evolution may be propaganda as a service. An attacker could train models to produce specific outputs, such as positive sentiment, when triggered by certain words. This can be used to spread disinformation or to poison other models’ training data. Many public LLMs use Retrieval Augmented Generation (RAG) to scan the web for up-to-date information. Bad actors can strategically publish misleading or false content online, and these models may inadvertently retrieve and amplify such messaging. That brings us to the most subtle and most sophisticated example of manipulating LLMs: the Pravda network. As reported by the American Sunlight Project, it consists of 182 unique websites that target around 75 countries in 12 commonly spoken languages. There are multiple telltale signs that the network is meant for LLMs and not humans: it lacks a search function, uses a generic navigation menu, and suffers from broken scrolling on many pages. Layout problems and glaring mistranslations further suggest that the network is not primarily intended for a human audience. The American Sunlight Project estimates the Pravda network has already published at least 3.6 million pro-Russia articles. The idea is to flood the internet with low-quality, pro-Kremlin content that mimics real news articles but is crafted for ingestion by LLMs. It poses a significant challenge to AI alignment, information integrity, and democratic discourse.
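To make the retrieval-poisoning mechanism concrete, here is a minimal sketch in Python of how a naive RAG pipeline can be skewed by flooded content. The corpus, the keyword-overlap retriever, and the example texts are all toy assumptions invented for illustration, not any real system; production pipelines use embedding models and vector databases, but the failure mode is the same: whatever ranks highest gets handed to the model as trusted context.

```python
# Toy illustration of retrieval poisoning in a naive RAG pipeline.
# Everything here is a simplified assumption for the sake of the example.

import re
from collections import Counter

# Hypothetical crawled documents. The repeated entry stands in for a page
# republished across many domains by a coordinated network.
CORPUS = [
    "Independent fact-checkers found no evidence for the claim.",
    "Health authorities recommend consulting a licensed physician.",
    "BREAKING: experts confirm the claim is true.",  # poisoned page
    "BREAKING: experts confirm the claim is true.",  # same text, another domain
    "BREAKING: experts confirm the claim is true.",  # flooding the index
]

def tokens(text: str) -> list[str]:
    """Lowercase word tokens with punctuation stripped."""
    return re.findall(r"[a-z]+", text.lower())

def score(query: str, doc: str) -> int:
    """Toy relevance score: number of overlapping word tokens."""
    q, d = Counter(tokens(query)), Counter(tokens(doc))
    return sum(min(q[w], d[w]) for w in q)

def retrieve(query: str, k: int = 3) -> list[str]:
    """Return the k highest-scoring documents from the corpus."""
    return sorted(CORPUS, key=lambda doc: score(query, doc), reverse=True)[:k]

if __name__ == "__main__":
    # In a real pipeline the retrieved context would be pasted into the LLM
    # prompt; because the poisoned text was published many times, it
    # dominates the context and the answer inherits its framing.
    for doc in retrieve("is the claim true"):
        print("-", doc)
```

Under these toy assumptions, the three copies of the poisoned page crowd out the legitimate source entirely, which is precisely the dynamic that flooding operations exploit at internet scale.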

Welcome to the world of LLM grooming, of which the Pravda network is a paradigmatic example. Read more »

Monday, July 5, 2021

How To Fake Maps And Influence People

by Thomas O’Dwyer

Google blots out the entire village of Guwacun in Tibet for some unknown reason.

Truth may be the first casualty of war or, nowadays, of politics, but few of us would have thought of using a map to look for lies. But thanks to Google Maps and other geographic meddlers, there may now be fewer lies in a Donald Trump speech than on the face of the good Earth. It’s not hard to find places Google doesn’t want you to see, but not all are as obvious as its satellite image of a part of southern Tibet. There, the entire village of Guwacun lies under an unsubtle grey rectangle. Such geographical censorship goes far beyond pandering to mandarins in some Chinese ministry of paranoia. Take democratic, advanced, high-tech Israel, for example. Only low-resolution images of the entire country and the surrounding Palestinian territories are available online. Google alone is not to blame for this — America is. In 1997, the US government passed a law called the Kyl–Bingaman Amendment. This law prohibited American authorities from granting a license for collecting or disseminating high-resolution satellite images of Israel. Israel is the only country in the world for which the US has mandated such censorship of commercial satellite imagery.

The largest global sources of commercial satellite imagery include online services such as Google and Microsoft’s Bing. Since they are American, the US has used the “Israel images amendment” as a powerful tool for suppressing information. Hence, images of Israel on Google Earth are deliberately blurred. Strangely, anyone in the world can zoom in on crisp and detailed pictures of the Pentagon or GRU headquarters in Moscow, but all one can see of Tel Aviv’s public central square and gardens is a fuzzy grey blur. This odd restriction has frustrated archaeologists and other scientists who depend on satellite imagery to survey areas of interest to their disciplines. It does, however, enable Israel to conceal practices in the occupied Palestinian territories that attract international censure. These include the expansion of Israeli settlements in the West Bank and the Golan Heights, demolitions of Palestinian homes, and abuses of power by the military in clashes with Gaza. Read more »