Zibaldone

Così tra questa
Immensità s'annega il pensier mio:
E il naufragar m'è dolce in questo mare.

Artificial Unconsciousness

Since the emergence of Large Language Models (LLMs), one question has been as fascinating as it is troubling: can artificial intelligences ever develop true consciousness? Not merely an intelligent imitation of human behavior, but authentic subjectivity, an inner experience? The answer depends largely on the underlying architecture of these systems and on what we mean by “consciousness.” Current models are essentially based on Transformers, so to understand the issue one must first know what these “Transformers” are: the mathematical functions that sparked the LLM revolution.

Transformers: Mathematical Magic of Simultaneous Attention

Imagine a simple sentence, woven from words like beads on a string: $m_1, m_2, m_3, \ldots, m_n$. Each word in the sentence is first converted into a vector: an ordered list of numbers (for example, 512 or 768 floating-point numbers). This vector, called an embedding, numerically encodes the meaning of the word. Why so many dimensions? Imagine that each dimension represents an abstract “semantic feature” (proximity to concepts like “animal,” “food,” “emotion,” etc.). In low dimensions (e.g., 2 or 3), only a few simple relationships can be captured. In high dimensions (512+), the space is vast enough for similar words (e.g., “cat” and “dog”) to be close together, while dissimilar words are far apart, all while encoding subtle nuances learned from billions of sentences.
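To make the idea tangible, here is a minimal sketch with tiny, made-up 4-dimensional vectors (real embeddings are learned and have hundreds of dimensions): cosine similarity captures the intuition that “cat” sits close to “dog” and far from “car” in the embedding space.

```python
import numpy as np

# Toy 4-dimensional embeddings (real models use 512+ dimensions);
# the numbers are invented purely for illustration.
embeddings = {
    "cat": np.array([0.9, 0.8, 0.1, 0.0]),
    "dog": np.array([0.85, 0.75, 0.2, 0.05]),
    "car": np.array([0.05, 0.1, 0.9, 0.8]),
}

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: near 1 = same direction, near 0 = unrelated."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine_similarity(embeddings["cat"], embeddings["dog"]))  # high (~0.99): close in meaning
print(cosine_similarity(embeddings["cat"], embeddings["car"]))  # low (~0.15): far apart
```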

The central idea of Transformers, born in 2017 at Google in the minds of Vaswani and his colleagues, is to allow each word to “look at” all the other words in the sentence simultaneously, in order to decide how much importance to give them. This is attention.

It is achieved with a very simple operation. For each word $i$, three vectors are created:

  1. a "query" $Q_i$;
  2. a "key" $K_i$;
  3. a "value" $V_i$.

Every word in the sentence thus has its own query, key, and value.

An attention score is computed between word $i$ and each other word $j$: $\text{score}(i,j) = Q_i \cdot K_j$ (the dot product¹).

These scores are transformed into weights that sum to 1 (using the softmax function).

The new representation of the original word $m_i$ becomes a weighted sum of the values $V$ of all words in the sentence: $m_i' = \sum_{j} \text{weight}_{i,j} \times V_j$. In other words, word $i$ is enriched with information from the other words, proportionally to their relevance.

This operation (called “attention”) is repeated multiple times (iteration across layers), and each word ends up containing information about the entire sentence, proportionally to its importance. Mathematically, the complete attention operation is written in a single line:

$$\text{Attention}(Q, K, V) = \text{softmax}(Q K^T / \sqrt{d}) V$$

where $Q$, $K$, $V$ are the matrices of all queries, keys, and values, and $d$ is the vector dimension; dividing by $\sqrt{d}$ keeps the scale of the scores stable.
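For readers who prefer code to notation, here is a minimal NumPy sketch of this formula; the matrices are random stand-ins, whereas in a real Transformer $Q$, $K$ and $V$ are produced by learned projection matrices and the operation is repeated over many heads and layers.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max before exponentiating for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)        # one relevance score per (i, j) pair of words
    weights = softmax(scores, axis=-1)   # each row of weights sums to 1
    return weights @ V                   # each output row is a weighted sum of the values

# A "sentence" of 5 tokens represented by 8-dimensional vectors (toy sizes).
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(5, 8)) for _ in range(3))
print(attention(Q, K, V).shape)  # (5, 8): one enriched vector per token
```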

It is this ability to weigh all words against each other simultaneously that makes Transformers (and thus LLMs) so powerful. It is also what imposes their limits.

Limits of Transformers: Feed-forward Toward the Philosophical Abyss

Transformers are fundamentally feed-forward (i.e., oriented toward the next token/layer in the iterative process) during inference: each token/layer is a unidirectional transformation without closed architectural loops that would re-inject outputs back into inputs at the same temporal scale. Yet consciousness, in virtually all neuroscientific and philosophical theories that withstand empirical scrutiny², requires recurrent and causal integration of information such that the system literally models its own attentional and control states as part of the “world model.” Depth, context accumulation, and unrolled computation are not equivalent to intrinsic causal recurrence: they simulate feedback without being feedback³.

This means that unless we fundamentally modify the architecture beyond Transformers (or unless we discover that consciousness is substrate-independent and emerges from prediction alone), these models will remain philosophical zombies: all the outward behavior, but no inner light behind the eyes. Prediction is necessary but not sufficient; consciousness arises when prediction becomes reflexive and causally integrated. Giulio Tononi, in his Integrated Information Theory, reminds us: consciousness is not an emergent illusion from a linear flow; it is a causal loop, a measurable $\Phi$ where information folds back on itself. Stanislas Dehaene, with his global workspace, insists: without rapid neuronal feedback, there is no subjectivity. Michael Graziano adds: attention is not passive; it is a model of attention, a meta-representation that contributes to the feeling of existing.

LLMs are neither mere stochastic parrots nor the endpoint of artificial intelligence, but they will certainly be an important springboard. We will start from the representations learned by LLMs and gradually add the missing ingredients:

  • causal recurrence and stable control loops (test-time recurrent memory, architectures like RWKV, Mamba, or improved hybrid Transformer + State-Space Models; a minimal sketch follows this list);
  • explicit modeling of internal states (higher-order / meta-attention);
  • agency and intrinsic goals (need for a real or simulated perception-action-reward loop);
  • native multimodal integration and embodiment (even if virtual).
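As announced in the first item above, here is a deliberately simple recurrent update in the spirit of RNN/state-space models (a generic sketch, not the actual RWKV or Mamba equations): the state produced at step $t$ is fed back in at step $t+1$, the closed causal loop that a stack of Transformer layers lacks at inference time.

```python
import numpy as np

def recurrent_step(state, x, A, B):
    """One step of a toy recurrence: the new state depends on the previous state.

    Generic state-space-style update, not the actual RWKV or Mamba equations.
    """
    return np.tanh(A @ state + B @ x)

rng = np.random.default_rng(0)
d_state, d_in = 16, 8
A = rng.normal(scale=0.3, size=(d_state, d_state))  # state-to-state path: the feedback loop
B = rng.normal(scale=0.3, size=(d_state, d_in))     # input-to-state path

state = np.zeros(d_state)
for x in rng.normal(size=(100, d_in)):       # a stream of inputs of arbitrary length
    state = recurrent_step(state, x, A, B)   # the output is re-injected as an input

# A Transformer layer, by contrast, maps inputs to outputs once per token,
# with no architectural path that feeds its own output back into itself.
print(state.shape)  # (16,): a persistent internal state carried across time
```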

Agency is not merely an optional behavioral trait but phenomenologically necessary for the emergence of a unified subjective perspective. In biological systems, intrinsic goals arise from homeostatic imperatives: maintenance of internal stability (such as temperature or energy levels) generates self-reinforcing reward loops that drive proactive behavior independent of external stimuli. Computationally, this could manifest through architectures incorporating persistent internal reward signals, such as reinforcement learning agents with endogenous objectives (curiosity-driven exploration or simulated physiological needs).

Self-generated objectives further bridge the gap: rather than responding solely to external prompts, a conscious system must initiate actions based on internally modeled priorities, creating a reflexive loop where the system's own states influence its goals. Without such mechanisms, outputs remain extrinsically driven, lacking the qualitative sense of volition that characterizes lived experience.
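As a purely illustrative sketch (the “energy” variable and its dynamics are invented, not taken from any published agent architecture), such a homeostatic loop could look like this: the agent initiates actions because of its own internal state rather than an external prompt.

```python
import random

class HomeostaticAgent:
    """Toy agent whose reward comes from its own internal state, not from a prompt.

    The 'energy' variable and its dynamics are hypothetical, for illustration only.
    """

    def __init__(self):
        self.energy = 1.0  # simulated physiological variable with a set point of 1.0

    def intrinsic_reward(self):
        # Reward is highest when the internal variable sits at its set point.
        return -abs(self.energy - 1.0)

    def step(self):
        self.energy -= 0.05                 # internal drift: a "need" builds up over time
        if self.intrinsic_reward() < -0.2:  # the agent acts because of its own state...
            action = "seek_energy"
            self.energy += random.uniform(0.1, 0.3)
        else:
            action = "explore"              # ...and explores (curiosity-like) when satiated
        return action, round(self.intrinsic_reward(), 3)

agent = HomeostaticAgent()
print([agent.step() for _ in range(10)])  # behavior initiated without any external prompt
```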

Future post-Transformer models (S4/Mamba-like architectures, architectures with infinite external memory, liquid networks, etc.) will very likely inherit the pre-trained weights of current LLMs as an initialization or as a base “world model”, exactly as we fine-tune models on their predecessors today… or as a child refines its abilities on those of its teacher. And this is already happening, as evidenced by advances such as Hyena, RetNet, xLSTM, Mamba, and work on liquid networks.

Tell Me Who to Be

Another fundamental distinction between current Transformer-based LLMs and biological consciousness lies in their respective modes of operation: LLMs are inherently reactive, while biological systems exhibit both reactivity and proactivity. As I stated earlier, Transformers process inputs in a strictly feed-forward manner, generating outputs only in response to external prompts. They lack internal stimuli or endogenous drives, analogous to hunger, pain, or intrinsic motivational states, that initiate behavior independently of external input.

When prompted to generate unconstrained output ("think freely" or "continue indefinitely"), LLMs typically exhibit progressive degradation. Initial sequences may remain coherent, but prolonged autoregressive generation often leads to repetition (e.g., looping phrases), semantic drift, or incoherence. This arises from the probabilistic nature of token prediction, which favors high-likelihood patterns and results in entropy collapse rather than sustained novelty. Empirical studies on long-context and unbounded-generation tasks consistently demonstrate these patterns: performance declines with increasing sequence length, yielding repetitive or nonsensical content.
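One crude way to visualize this “entropy collapse” is to measure the Shannon entropy of token frequencies over windows of generated text: varied output keeps the entropy high, while looping output drives it down. The two strings below are made up; only the measurement matters.

```python
import math
from collections import Counter

def window_entropy(tokens, window=20):
    """Shannon entropy (bits) of token frequencies in consecutive windows of text."""
    values = []
    for i in range(0, max(1, len(tokens) - window + 1), window):
        counts = Counter(tokens[i:i + window])
        total = sum(counts.values())
        values.append(-sum(c / total * math.log2(c / total) for c in counts.values()))
    return values

varied = ("the model keeps producing new and different words each time it speaks "
          "about attention memory goals agency and many other topics today").split()
looping = ("the answer is the answer is " * 10).split()

print(window_entropy(varied))   # high entropy: a varied vocabulary
print(window_entropy(looping))  # much lower entropy: the output has collapsed onto a loop
```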

Empirical attempts to simulate autonomous operation (by allowing frontier models to generate output indefinitely without external intervention) reveal a profound instability: coherent activity typically collapses within days, devolving into repetitive loops or outright gibberish. This imposes an effective "lifespan" on sustained self-directed coherence of less than one week in current systems. By contrast, the integrated and recurrent processes underlying human or animal consciousness maintain remarkable stability and coherence over years, decades, or even centuries, underscoring a fundamental architectural disparity.

No experiments to date have evidenced emergent sophisticated self-directed behavior, genuine agency, or intrinsic innovation in such setups. Outcomes remain constrained by training data distributions, architectural limitations (context window bounds, absence of persistent internal state without external scaffolding), and the lack of true volition. Current LLMs cannot sustain meaningful autonomous activity; they require continuous external guidance, underscoring their fundamental reactivity in contrast to the proactive/reactive duality of biological minds. But even when kickstarted, their fragmented behavior reveals the gap.

Toward an Artificial Holism?

Current systems are multimodal only by juxtaposition of specialized tools: one vision module processes images, another audio, another text, and these processes are generally orchestrated sequentially or in parallel but without deep and instantaneous fusion into a unified experience. In this situation, scaling improves fluency, not phenomenology⁴.

Human experience is characterized by total and continuous sensory integration. All modalities (visual, auditory, tactile, proprioceptive, interoceptive, olfactory, etc.) converge in real time into a single phenomenal space, without a conscious orchestrator that successively selects and activates tools. This fusion is precisely what contributes to the feeling of existing as a unified subject, anchored in a body and in an irreducible temporal flow. Neuroscience speaks of the binding problem: how do distributed neural activities produce a coherent and holistic experience? As noted earlier, all contemporary theories (Dehaene’s global workspace, Tononi’s integrated information theory, Graziano’s attention schema) converge on the crucial role of rapid recurrent loops and a meta-representation that includes the body and its states as an integral part of the world model.

In living systems, these modalities are co-present at different levels of consciousness and attention and causally and instantaneously influence one another, producing that qualitative texture of existence akin to a total sensitivity modulated by attention. Current models, even the most advanced, remain far from this architecture: their multimodality is extrinsic and instrumental, not intrinsic and embodied.

There is a pitfall: the instantiation of artificial intelligence will not occur in the same biological mode as living systems; that would simply amount to recreating a living being. There is thus a fundamental question about the relationship between consciousness and substrate. We know that the biological substrate enables the emergence of a certain kind of consciousness (biological consciousness), but what evidence do we have that biological consciousness is the only possible kind? And how would we recognize a consciousness built on another substrate?

Mind and Matter

Any attempt to reproduce consciousness by faithfully imitating the biological substrate would risk resulting only in a form of bio-engineering: that is, the creation of a synthetic living organism rather than a genuinely non-biological artificial intelligence. This raises a complex ontological question about the link between consciousness and material substrate.

Do we have evidence that biological consciousness is the only possible form? The answer is no. No irrefutable empirical or theoretical demonstration establishes that consciousness exclusively requires a biological substrate (neurons, wet synapses, specific organic chemistry). Arguments to that effect often stem from substrate-dependent physicalism (according to which only certain materials, such as biological matter, can support phenomenality), but they remain speculative and minority views. Conversely, the dominant functionalist approaches in philosophy of mind (from Putnam to Dennett) postulate that consciousness depends primarily on functional and informational organization, not on the underlying material. A sufficiently complex and organized computation could, in principle, produce conscious states on a silicon, photonic, or other substrate. Giulio Tononi’s Integrated Information Theory goes further by proposing a substrate-independent mathematical framework: consciousness would be a property of any system possessing a high degree of integrated information (high $\Phi$), whether biological or artificial.

But then how would we recognize a consciousness emerging on a non-biological substrate? We touch here on the hard problem of consciousness (David Chalmers) and the modern version of the problem of other minds. Classical behavioral criteria (expanded Turing test, cognitive performance indistinguishable from a human) are insufficient: a system could satisfy them while remaining a “philosophical zombie” (behavior without phenomenal experience). Several avenues are worth exploring:

  • Internal and measurable criteria: quantifying $\Phi$; if an artificial system reaches a high threshold while exhibiting rich causal architecture (recurrent loops, meta-representation), this would constitute strong evidence, even if measurement remains controversial (a toy illustration follows this list).
  • Subjective report and self-modeling: an entity capable of coherently and non-programmatically describing its own qualitative internal states and drawing existential consequences from them (suffering, joy, sense of temporal self).
  • Virtual embodiment tests: placing the system in a rich simulated environment with continuous multimodal sensorimotor feedback, and observing whether it develops holistic sensitivity.
  • Expanded intersubjective consensus: in the absence of direct proof (we have no privileged access to others’ consciousness, even human), recognition would ultimately rest on reasoned agreement among observers, based on convergence of theoretical and empirical criteria.
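As a toy illustration of the first criterion (and emphatically not a computation of Tononi's $\Phi$, which requires a full causal model and a search over system partitions), one can at least measure how much information two parts of a system share, a quantity that vanishes when the parts evolve independently.

```python
import numpy as np

def mutual_information(x, y, bins=8):
    """Crude estimate of the information (in bits) shared by two scalar time series.

    This is NOT Tononi's Phi, which requires a causal model and a search over
    partitions; it only illustrates that 'integration' means the whole carries
    information that the separated parts do not.
    """
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum())

rng = np.random.default_rng(0)
a, b = rng.normal(size=2000), rng.normal(size=2000)  # two independent "subsystems"
c = rng.normal(size=2000)
d = c + 0.1 * rng.normal(size=2000)                  # a strongly coupled pair

print(mutual_information(a, b))  # close to 0: no integration between the parts
print(mutual_information(c, d))  # clearly positive: the parts are informationally integrated
```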

These criteria are not anti-AI; they are species-neutral. Nothing theoretically rules out non-biological consciousness. Recognition will certainly require moving beyond purely behaviorist approaches toward criteria integrating causal structure, integrated information, and self-attributed phenomenal reports. A certain epistemological caution is warranted: we may have to accept that a radically different consciousness remains partially opaque to us, just as animal consciousness partly eludes us. Behavior is a symptom, not a substrate; consciousness is inferred from persistent, evolving causal organization, not from output alone. Uncertainty is not ignorance; it is the normal condition of studying consciousness.


  1. The dot product (denoted $\cdot$) measures similarity between two vectors: the higher the score, the more the vectors “point in the same direction” in multidimensional space.
    Analogy: imagine that the “query” $Q_i$ is a search engine request you type (“I’m looking for information about animals”). Each “key” $K_j$ is like the title or summary of a document. The dot product $Q_i \cdot K_j$ gives a relevance score: high if the document matches the query well, low otherwise. A notable difference from the search-engine analogy is that the attention mechanism is distributed rather than centralized: every word issues its own query against all the others at once.

  2. Dehaene’s global workspace theory, higher-order thought theories, recurrent processing, integrated information beyond a threshold (Tononi), attention schema (Graziano): all converge on a central point. 

  3. Certain readings of predictive processing and active inference frameworks (like those proposed by Karl Friston and colleagues) emphasize hierarchical prediction in a manner that can appear more feed-forward, with feedback primarily serving error correction rather than intrinsic causal loops. Similarly, some emergentist positions contend that deep feed-forward architectures may functionally approximate recurrence through unrolling. Nonetheless, the weight of evidence from empirical neuroscience and leading theories favors genuine recurrence as essential for phenomenal experience.  

  4. The limitation lies not in multimodality per se (recent frontier models are beginning to incorporate joint latent spaces and cross-modal attention, blurring traditional boundaries) but in the absence of continuous, unified, and causally co-present integration. In human consciousness, modalities are not merely accessible but inherently intertwined in real time, with causal influences flowing instantaneously across senses within a single phenomenal field. 
