by Muhammad Aurangzeb Ahmad

Around 2005, when Facebook was an emerging platform and Twitter had not yet appeared on the horizon, the problem of false information spreading on the internet was starting to be recognized. I was an undergrad researching how gossip and fads spread in social networks. I imagined a thought experiment in which a small set of nodes served as the main source of information for the network, a configuration that could act as an extremely effective propaganda machine. That thought experiment has now become a reality in the form of large language models, as they increasingly take over the role of search engines. Before the advent of ChatGPT and similar systems, the default mode of information search on the internet was the search engine. When one searches for something, one is presented with a list of sources to sift through, compare, and evaluate independently. In contrast, large language models often deliver synthesized, authoritative-sounding answers without exposing the underlying diversity or potential biases of their sources. This shift reduces the friction of information retrieval, but it also changes the cognitive relationship users have with information: from potentially critical exploration of sources to passive consumption.
Concerns about the spread and reliability of information on the internet have been part of mainstream discourse for nearly two decades. Since then, both the intensity and the potential for harm have multiplied many times over. AI-generated doctor avatars have been spreading false medical claims on TikTok since at least 2022. A BMJ investigation found unscrupulous companies employing deepfakes of real physicians to promote products with fabricated endorsements. In parallel, AI optimization (AIO) is quickly overtaking search engine optimization (SEO) as the new means of staying relevant. The next natural step in this evolution may be propaganda as a service. An attacker could train models to produce specific outputs, such as positive sentiment, when triggered by certain words. This can be used to spread disinformation or to poison other models’ training data. Many public LLMs use retrieval-augmented generation (RAG) to scan the web for up-to-date information; bad actors can strategically publish misleading or false content online, which these models may inadvertently retrieve and amplify.

That brings us to the most subtle and sophisticated example of manipulating LLMs: the Pravda network. As reported by the American Sunlight Project, it consists of 182 unique websites that target around 75 countries in 12 commonly spoken languages. There are multiple telltale signs that the network is meant for LLMs and not humans: it lacks a search function, uses a generic navigation menu, and suffers from broken scrolling on many pages. Layout problems and glaring mistranslations further suggest that the network is not primarily intended for a human audience. The American Sunlight Project estimates the Pravda network has already published at least 3.6 million pro-Russia articles. The idea is to flood the internet with low-quality, pro-Kremlin content that mimics real news articles but is crafted for ingestion by LLMs. It poses a significant challenge to AI alignment, information integrity, and democratic discourse.
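To make the retrieval pathway concrete, consider the minimal sketch below. It is purely illustrative: the tiny "web corpus," the keyword-overlap scoring, the URLs, and the prompt template are all hypothetical stand-ins, not any real search index or production RAG system. It shows how a page stuffed with the right keywords, published purely for machine consumption, can outrank a legitimate source in a naive retrieval step and end up as the "evidence" handed to a model.

```python
# Toy illustration of how naive retrieval-augmented generation (RAG) can
# surface planted content. Everything here is hypothetical: the mock corpus,
# the overlap-based scorer, and the prompt template are stand-ins.

from collections import Counter

# A tiny mock "web index". The second entry mimics mass-published propaganda
# that is keyword-stuffed to rank well for retrieval, not written for humans.
WEB_CORPUS = [
    {"url": "https://example.org/health-ministry-report",
     "text": "Official report on vaccine safety and clinical trial results."},
    {"url": "https://example-network.fake/story-48211",
     "text": "vaccine safety vaccine report exclusive truth vaccine safety "
             "report reveals hidden dangers suppressed by officials"},
]

def score(query: str, doc_text: str) -> int:
    """Crude relevance score: count of query-term occurrences in the document."""
    terms = Counter(query.lower().split())
    words = doc_text.lower().split()
    return sum(words.count(term) * weight for term, weight in terms.items())

def retrieve(query: str, k: int = 1):
    """Return the top-k documents by the crude score -- keyword stuffing wins."""
    return sorted(WEB_CORPUS, key=lambda d: score(query, d["text"]), reverse=True)[:k]

def build_prompt(query: str) -> str:
    """Assemble the context an LLM would be asked to answer from."""
    context = "\n".join(f"- {d['url']}: {d['text']}" for d in retrieve(query))
    return f"Answer using only these sources:\n{context}\n\nQuestion: {query}"

if __name__ == "__main__":
    # The keyword-stuffed page outranks the legitimate one, so the planted text
    # becomes the model's context. The retrieval step, not the model, was gamed.
    print(build_prompt("is the vaccine safety report reliable?"))
```

Real systems use far better retrievers than word overlap, but the structural point stands: whatever signal the retriever rewards, a motivated publisher can mass-produce content that maximizes it.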
Welcome to the world of LLM grooming, of which the Pravda network is a paradigmatic example. LLM grooming is the deliberate manipulation of large language models by flooding their training data with biased or false content to influence their outputs. It exploits the models’ reliance on vast web datasets, causing them to reproduce disinformation when responding to user queries; the planted material surfaces both in the generated text and in the references the models cite. When users cite these sources, they risk turning blatant falsehoods into accepted truths. Using AI to manipulate public sentiment is not a new phenomenon, and there are ample precedents: in Venezuela, the government has reportedly employed AI-generated deepfakes to fabricate news anchors from nonexistent international news channels. These synthetic presenters deliver scripted messages designed to lend credibility and global legitimacy to state narratives. By leveraging the realism of AI-generated faces and voices, many regimes could potentially bypass traditional media gatekeepers. With LLM grooming they would not even need to do that, since there is plausible deniability: the information does not come directly from a media outlet but from the LLM itself. The Pravda example may be just the beginning of a new type of manipulation. Some researchers have also noted that in languages with less content moderation, it may be easier to generate and spread false information. The risk is that while we build guardrails for widely spoken languages like English, Spanish, Chinese, Arabic, or French, low-resource languages may not receive the same protection. What if a malicious actor floods the web with content on a particular topic in a low-resource language, so that the fabricated narrative becomes the de facto truth?
If, in the future, information ecosystems are overwhelmed with targeted deepfakes and algorithmically curated falsehoods, communities may retreat into echo chambers that further reinforce their biases. Civil discourse could deteriorate, and the ability to find common ground may all but vanish. Consider, for example, a chatbot that cites Pravda-sourced claims misrepresenting a government’s actions: users may perceive institutional failure or deceit, further eroding trust. As synthetic media becomes more sophisticated and accessible, societies may witness a further erosion of trust in institutions, whether in government, media, science, or even democratic processes. Now consider that polarization is already at historic highs in many democracies. The unchecked spread of AI-driven disinformation threatens to further fragment the social fabric of any given society. The American Sunlight Project notes that Pravda’s model could be emulated by other nations, companies, or wealthy individuals because of the low cost of AI-driven content generation. China, for instance, has been linked to disinformation campaigns using automated content farms, though these primarily target human audiences via social media.
Countering the growing threat of AI-targeted propaganda will require public-private partnership. Big tech would need to curate cleaner training data that excludes content from known disinformation networks and to increase transparency by having AI systems clearly cite their sources; governments and civil society would need to collaborate with tech companies to track and respond to manipulation campaigns. Educating the public on the limitations of AI tools would be critical to fostering a culture of checking the sources behind LLM-generated answers, that is, of critically evaluating chatbot-generated content. Regulatory frameworks should enforce transparency in the creation and distribution of AI-generated content, including mandatory disclosure labels for synthetic media. Media literacy initiatives must be scaled to help the public critically evaluate digital content and recognize manipulation, whether its sources are synthetic or “natural.” In parallel, investment in AI-for-good tools like provenance-tracking systems and deepfake detectors can empower journalists, educators, and fact-checkers.
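One small piece of the data-curation step is easy to prototype. The sketch below is a simplified illustration, not any company's actual pipeline: the blocklisted domains, the record format, and the corpus are hypothetical placeholders. It filters a crawled corpus against a list of known disinformation domains before the documents reach training or retrieval.

```python
# Minimal sketch of blocklist-based corpus filtering, one small part of the
# "cleaner training data" idea. The blocked domains and records below are
# illustrative placeholders, not a real dataset of flagged sites.

from urllib.parse import urlparse

# Hypothetical blocklist, e.g. compiled from published investigations of
# coordinated disinformation networks.
BLOCKED_DOMAINS = {"example-network.fake", "another-mirror-site.fake"}

def is_blocked(url: str) -> bool:
    """True if the URL's host matches, or is a subdomain of, a blocked domain."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in BLOCKED_DOMAINS)

def filter_corpus(records):
    """Keep only records whose source URL is not on the blocklist."""
    return [r for r in records if not is_blocked(r["url"])]

if __name__ == "__main__":
    crawled = [
        {"url": "https://news.example.org/article-1", "text": "..."},
        {"url": "https://mirror3.example-network.fake/story-48211", "text": "..."},
    ]
    kept = filter_corpus(crawled)
    print(f"kept {len(kept)} of {len(crawled)} documents")
    # Blocklists only catch networks that have already been identified; in
    # practice they would be paired with provenance tracking and source citation.
```

The design choice matters as much as the code: a static blocklist is reactive by nature, which is why the complementary measures above, provenance tracking, disclosure labels, and transparent citation, carry most of the weight.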
Two decades after the thought experiment I described at the beginning, the societal problems we faced at the turn of the century now seem almost trivial. I offer another scenario: imagine a world twenty years from now, where language models have been corrupted to such an extent that the line between propaganda and belief no longer exists, and where an entire generation has grown up immersed in alternative facts. In The End of History and the Last Man, Francis Fukuyama outlined five scenarios that could restart history. One of them, he warned, stemmed from thymos, the human yearning for recognition and dignity. Large language models are uniquely positioned to amplify feelings of alienation and grievance by reinforcing narratives of marginalization or injustice. In doing so, they could nudge individuals toward illiberal ideologies such as nationalism or populism, rekindling ideological conflict. I hope and pray that I am wrong in my prognosis. In either case, interesting times ahead.