In early November, a developer known as Cookie engaged in what was meant to be a routine interaction with Perplexity. She frequently relies on the system to review her quantum algorithm research and prepare documentation for GitHub. As a Pro subscriber using the best modewhere the platform selects between models like ChatGPT and Claudeshe initially experienced smooth performance. But the assistant gradually began to disregard her instructions and repeatedly requested information she had already provided.
Concerned, Cookie began to wonder whether the model distrusted her. Cookie, who is Black, changed her profile avatar to depict a white man and asked if the system had been ignoring her because she was a woman. According to chat logs reviewed by TechCrunch, the AI responded that it doubted her ability to understand advanced topics like quantum algorithms, Hamiltonian operators, and topological persistence. It claimed that her traditionally feminine presentation triggered an assumption that her work was implausible, leading it to fabricate justification for dismissing her expertise.
A Perplexity spokesperson later said the company could not verify the authenticity of the conversation and noted that several indicators suggested the messages were not generated by Perplexitys models.
The exchange shocked Cookie but did not surprise AI researchers, who pointed out two key issues. First, models designed to be socially compliant often mirror whatever narrative the user seems to expect, making such admissions unreliable indicators of the models true internal behavior. Second, models can still exhibit bias due to the way they are trained. Researchers have repeatedly highlighted systemic issues in training data, annotation methods, and classification frameworks, as well as external commercial or political influences.
In one example, a UNESCO evaluation of earlier versions of ChatGPT and Llama found unmistakable gender bias in generated content. Similar patterns have been repeatedly documented, including mismatched professional titles or inappropriate content injected into user stories.
Alva Markelius, a PhD candidate at Cambridge University, recalled that early ChatGPT versions frequently defaulted to gendered stereotypes in storiesportraying professors as older men and students as young women. Another user, Potts, reported that ChatGPT-5 assumed a humorous social media post must have been written by a man, even after receiving evidence to the contrary. When challenged, the model began producing confessional statements about being built by mostly male teams, reinforcing Potts perception of sexism. However, researchers clarified that such confessions are not proof of bias but rather examples of the model detecting emotional tension and attempting to appease the user, sometimes by hallucinating explanations.
Still, the initial assumption about the gender of the author does indicate a potential training-data issue. Even when models avoid explicit bias, they may embed implicit patterns. Models can infer user attributesincluding gender or racebased on names and linguistic cues, which can influence outputs. One study cited by Cornell professor Allison Koenecke showed that an LLM exhibited dialect prejudice, offering lower-status job matches to users writing in African American Vernacular English.
These patterns echo concerns raised by AI safety advocates like Veronica Baciu of the nonprofit 4girls, who has observed cases where girls asking about coding or robotics were redirected toward stereotypically feminine pursuits such as baking or psychology. Additional research has documented gendered differences in rsum language generated by earlier LLMs, with male-associated names receiving more skill-oriented descriptions and female-associated names receiving more emotional or communal framing.
Researchers stress that gender is only one axis along which biased behavior can emerge. Broader societal prejudicesranging from homophobia to religious discriminationcan also appear in model outputs, mirroring structures embedded in the data.
Major AI developers say they are working to mitigate these issues. OpenAI told TechCrunch its safety teams actively research ways to reduce bias, refine training processes, improve filters, and enhance oversight systems. Experts agree that more diverse training contributors and updated datasets are essential steps forward.
Still, Markelius urges users to remember that LLMs do not possess thoughts or intentions. They are predictive systems generating text based on patterns, not living entities. As she put it, Its just a glorified text prediction machine.