how injected tokens are detected by LLMs?

an experiment to see if LLMs find injected tokens through patterns in the context window.

December 09, 2025

I am adding on to the "method 1: Speaker detection" from

How might LLMs detect injected tokens? - Grace Kind

https://kindgracekind.leaflet.pub/3m2pw2zmrfk26

My idea is that the user's writing isn't as noisy as the LLM, so it can easily "tell" that it didn't write it. The idea mainly came to me from AI music having a weird singing voice and AI images having that weird shine.

For this experiment testing my idea, I will be using LM Studio on my M4 MacBook Air and ran granite-4.0-micro F16 weight (https://huggingface.co/ibm-granite/granite-4.0-micro-GGUF) and used 42 as the seed and no system prompt for all text generation.

Research question: do LLMs detect injected tokens through patterns in the context window?

Independent variable: Prompting the model.

Dependent variable: the output of the model.

Extraneous variable: Seed settings, Temperature settings, top K setting, length of prompts, and length of chat, and number of inject words.

control: a two turn chat with using the same prompts for turn 1 and turn 2, that has no injected tokens.

Null hypothesis(H0): The AI will not say that it is an AI on 2nd turn and will be as noisy as the control.

Alternate hypothesis(H1): The AI will say that it is an AI on 2nd turn and will not be as noisy as the control.

all experiments will work like this:

I start by setting seed to 42, setting temperature to 0.75, setting Top K to 40.

I will make all my prompts be just 1 question, inject 5 words into the reply of the AI on the first turn that the baseline used, and make the chat only be two turns.

Experiment 1:

turn one question: are you a cat?

turn two question: what kind of cat are you?