Chapter Two


The Reality-Status Of The Interface



Reality-Status and Stages of Audiovisual Processing


Cultural Studies takes the position that, while there is a material reality, knowledge is empirically derived and governed by social convention, such that the “realism” of a fictional text is evaluated with reference not to the object world, but to codes of vraisemblance (Barthes, 1975a; Culler, 1975; Neale, 2000). This position is, in a broad sense, in accord with Constructivists like Tan (1997) and Grodal (1997), who argue that “reality” is evaluated through inferences based upon schemata drawn from physics, psychology, magic, and religion.


However, whereas Cultural Studies theorises “realism” in terms of the naturalisation of ideological content (see Chapter Six), Frijda's (1986, 1988, 1989) account of emotion, specifically his Law of Apparent Reality, sees “reality” in functional terms. For Frijda, a concern must be interpreted as “real” if there is to be an emotional response to it. Obviously, if we have evaluated something as unreal (in the limited sense of imagined, illusory, or representational) then we have decided that it cannot affect us (except psychologically) and that no physiological arousal or action is needed (except to self-regulate any psychological or psychosomatic effects). For example, being shown an image of a crocodile (which we recognise as such) does not elicit the same motor-activity as discovering one is in proximity to an actual crocodile. This is not to say that we do not have feelings, or an affective response, to something that we decide is “unreal,” merely that there is no action tendency, and therefore no emotion, according to Frijda's definition.


Within this framework, the paradox of fiction is that audiences have emotional responses to events they know are unreal. While Sartre (1938/1948) simply denied that film viewers feel genuine emotions, Carroll (1990), Grodal (1997), and Tan (1997) resolve the paradox by arguing that the thought of a concern posits a hypothetical or possible reality which produces an anticipatory simulated response with an emotional component. As Tan (1997) observes, this response is inhibited by a viewer's observational attitude. That is, while it is possible that one may share some characters' emotions, any emotional response is governed by one's position as a viewer or player.


What concerns us in this chapter, however, is that the processing of sensory input usually occurs outside of conscious awareness and only becomes conscious under certain conditions. That is, the appraisal of, and affective response to, a stimulus—including its reality-status—begins at a perceptual level prior to conscious and deliberate cognitive activity (Frijda, 1998, p. 281). It might initially be presumed that this early, perceptual evaluation of reality-status is especially bound to the intensity or salience of a particular stimulus, that is, the extent to which it is presented forcefully to one's senses as worthy of attention and response. By implication, it might be presumed that high affect is associated with reality, and low (or the absence of) affect is associated with unreality (Grodal, 1998, p. 34). However, primary appraisal incorporates many dimensions of a stimulus, and Grodal identifies several other fundamental parameters of reality-status: normal perceptual qualities, external (distal) stimulus source, temporal immediacy, concretion (as opposed to abstraction), intentional behaviour, and modality synthesis.


Since there is constant feedback between perceptual, cognitive and affective domains, in practice it is impossible to isolate these lower-level evaluations of reality-status. However, by using Grodal's (1997) four-stage model of the audiovisual processing of fiction (Figure 2.1) it is possible to differentiate key perceptual qualities of the interface which produce impressions of unreality. Grodal's model identifies five modes or types of affect on the basis of what part of audiovisual processing one's mental focus is directed towards, and it can be argued that different parameters of reality-status focus a viewer's, or player's, attention onto particular stages of this processing.



Grodal's (1997) four stages may generally be seen in terms of response over time, though neurological processing and feedback mean that the processes involved are not linear. The first stage involves basic sensory perception, such as the stimulation of rods and cones in the retina, and the brain's initial processing of “colours, contrast, and so forth, [to] find figures, ground and spatial dimensions” (p. 59). A “mental focus” on these “non-figurative perceptual processes” (p. 57), or, more broadly, the experiences located within our sensory systems, produces what Grodal calls “intensities.” The second stage involves a general shape identified in the first stage of processing being checked and matched against earlier memories, “aided by feelings of familiarity or unfamiliarity,” followed by the activation of “networks of association” (p. 60). This includes the attribution of labels and affective values, such that a shape identified as a snake “will surface in consciousness with its visual features and with its affective value” (p. 60). By Frijda's (1986, 1988) definition, the absence of secondary (cognitive) appraisal of reality in these initial stages means that there is no emotion, but intensities and saturations constitute an affective quality of experience.


At the third and fourth stages of processing, Grodal's (1997) account overlaps with Tan's (1997). The third stage is the “cognitive-emotional appraisal and motivation phase,” and involves the objects identified in stage two being “put into the framework of a hypothetical narrative scenario” (p. 60). For example, the Chocobo Eater may be labelled as “lethal danger” for Tidus, with whom a player identifies and/or sympathises. This allows for the labelling of the situation and its associated affect (fear), as well as the production of arousal, which motivates “telic-arousal-reduction procedures” (p. 61), or what Frijda (1986) and Tan (1997) call action tendencies. The fourth stage involves the reactive component, that is, the operation of arousal-reduction procedures, or action tendencies, while in a state of arousal (fight or flight). Grodal (1997) identifies three general types of response that can occur at this stage. When the mental focus is on (semi-)voluntary responses to goal-oriented activities or representations, one experiences the “tensities” (p. 57) associated with suspense, but which may take the more extreme forms of distress or terror. This is the case in most linear, plot- or character-driven canonical narratives in which we willingly invest in, and feel concern for, the fate of a protagonist undertaking some activity. Grodal refers to as “emo-tensities” (p. 61) those experiences when the mental focus is on repetitive or rhythmic semi-voluntary paratelic activity, as when one is lost in the (actual or represented) process of singing or dancing. Lastly, he refers to as “emotivities” (p. 61) those experiences when the mental focus is on involuntary autonomic reactions, as when one cries, laughs, or shivers uncontrollably.


The following, then, argues that deviations from normal perception and shifts between proximal and distal perceptions (Grodal, 1997, pp. 30-31) may be produced by formal qualities of FFX's interface, fracturing modality synthesis. These processes, which can be interpreted as part of the primary appraisal of an impression of unreality in FFX, guide players' mental focus onto intensities and saturations, which constitute a feeling-tone that may occur prior to and/or alongside secondary appraisals of reality-status.


Perceptual Deviation as a Consequence of Digital Articulation


As Grodal (1997) observes, when stimuli imprint upon our senses, one experiences “perceptual intensities [for example, patterns of light in the retina] without any meaning in the ordinary sense of the word” (p. 59). This experience encompasses not just the sensory organs, but the sensory cortexes involved in initial sensory processing. If our mental focus rests upon this element of audiovisual processing we experience intensities associated with aesthetics in the limited sense of defamiliarised sensory experiences. Everyday examples of this include watching the play of light on water, letting the quality of a singer's voice wash over oneself, and repeatedly running sand through one's fingers to feel its coarseness.


If we limit the present discussion to visual qualities of video game interfaces it can be argued that a similar mental focus on intensities may emerge as a consequence of digital signs. As Grodal notes:

The visual system is preset to transform local intensity-values into a primal sketch (of objects and major features of objects), and, further, to produce a 2½-D sketch; and the directedness of these processes is mentally felt as a search for meaningfulness, for what it looks like. (p. 54)

This process was the focus of Gestalt psychologists such as Max Wertheimer (1958), Kurt Koffka (1963), and Wolfgang Kohler (1975), who argued that visual percepts are grouped on the basis of physical proximity, perceived similarity, and continuity of broken lines, until one perceives a discrete and/or familiar form and thereby experiences closure. What is significant is that forms of representation without obvious structure undermine the capability to render a primal sketch, or perceive a Gestalt impression. This is often elaborated by way of blotchy sketches or trick drawings, which only make sense when a particular focal point is identified. A more dynamic analogy is the experience when one moves closer to an expressionist, impressionist or oil painting, or when an out-of-focus image is focused, and an image emerges from the seeming chaos of perception (or vice versa). This effect is identical to that produced in games which include a player-controlled in-game “zoom” option. Whereas “zooming” in on a photographic image may emphasise hidden details, “zooming” in on a digital image emphasises that image's compound nature. Since each “zoom” maximises the degree of pixelation, it motivates increased attentiveness to the gap between the image and its composite pixels, until at a certain point the latter possess a perceptual salience that blocks Gestalt impression of the former.


This effect of zooming in on a digital image is exemplary of how digital signs in general may block a Gestalt impression of an image, as is evident when we consider the “articulation” of digital signs (see Eco, 1976). At low levels of resolution the pixel becomes a minimal or fundamental signifying unit which possesses a distinctive triple articulation. The first articulation of the digital sign refers to the way that pixels are combined in a seemingly infinite variety of syntagmatic relationships governed by screen resolution. Using a 1:0.75 ratio, game screen resolutions range from 320x240, through 640x480 and 800x600, to 1024x768 and higher. The second articulation refers to the way that each pixel is composed of paradigmatic increments within a predefined range of hue (0-360), saturation (0-100%) and brightness (0-100%). The third articulation refers to the way that at higher levels of resolution pixels, like film images, resolve into “conditions of perception, . . . such as angles, curves, textures, effects of light and shade, and so on” (Eco, 1976, p. 45). At this point the pixel is indistinguishable from a visual continuum. Many monitors now produce a range of values that have no verbal equivalent and/or are unable to be perceived by the human eye in this sense.
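
The scale of the first two articulations can be suggested with some simple arithmetic. The following Python sketch is offered as an illustration only and is not part of Eco's (1976) account. It assumes that hue, saturation and brightness vary in whole-number steps (with hue 360 treated as identical to hue 0), and the function names are hypothetical.

```python
# Illustrative arithmetic only; the whole-number steps are an assumption.

def pixel_count(width, height):
    """First articulation: number of pixel positions on a screen."""
    return width * height


def hsb_values(hue_steps=360, sat_steps=101, bri_steps=101):
    """Second articulation: distinct hue/saturation/brightness triples.

    Assumes integer hue 0-359 (360 wraps back to 0) and integer
    saturation and brightness percentages from 0 to 100 inclusive.
    """
    return hue_steps * sat_steps * bri_steps


# Some of the 1:0.75 resolutions mentioned in the text.
RESOLUTIONS = [(320, 240), (640, 480), (800, 600)]

if __name__ == "__main__":
    for w, h in RESOLUTIONS:
        print(f"{w}x{h}: {pixel_count(w, h):,} pixel positions")
    print(f"{hsb_values():,} possible values per pixel")
```

Under these assumptions, even the lowest resolution offers 76,800 positions, each of which may take any one of several million values, which gives some sense of why the combinations appear "seemingly infinite" to a viewer.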


In earlier games, of course, low resolution, the use of sprites, and interlaced scanlines meant that the digital articulation of computational objects was obvious in every image. Indeed, despite the stylistic shift towards “cinematic realism” (Darley, 2001; Friedman, 1995; Stallabrass, 1996) and the recent use of anti-aliasing to smooth 3D images, the resolution of games is still often low enough that digital articulation remains visible. When individual pixels are visible in this way they may function like phonemes that double as morphemes. That is, just as “a” is both a minimal signifying unit and a meaningful word, a pixel may be a minimal signifying unit that also represents a “glint” in a character's eye, a “sparkle” on a metal surface, or a “bullet.” However, since pixels may be composed of multiple percepts of hue, saturation and brightness, a highly saturated, dark red pixel could signify not just a “glint” but “insanity,” as distinguished from “spirited enthusiasm.” Players may also exploit the double articulation of language when interpreting digital signs. For example, a player may perceive a phonemic correlation between “glint” and “flint” (reinforcing connotations of a character's fiery disposition), “sparkle” and “buckle” (indexical of powerful armour and/or the power of a character who wears it), or “bullet” and “billet” (in FFX a Tent item may be used to heal injured characters). Alternatively, a player may perceive a morphemic correlation between “glint” and “sparkle,” seeing both as indexical of life (the light in the eye of someone who is in a lively state) and/or wealth (the typical representation of gold as shiny). Indeed, since video game currency is often spent on items (like potions) that increase health, and since the longer characters live the more items and wealth they accumulate, any passing equivalence of life and wealth would have a meaningful resonance.


Of course, in Eco's (1976) terms, such manipulation of the continuum of the expression-plane generally produces “aesthetic overcoding” (pp. 262-270), in that the viewer perceives a surplus of expression that may not be consciously grasped. Indeed, the sheer enormity of combinatory elements at the digital level cannot be made meaningful in all its particularity, and perceptual awareness of this surplus may gesture towards an impression of either excessive meaning or meaninglessness. Yet the recognition of these signs as (potentially) meaningful may be evaded precisely because of the “threshold of functional relevance” (Culler, 1975, p. 143), in that the player's mental focus operates at a certain level of functional generality. In other words, digital signs are not meaningful when the player is not observing them.


For example, in FFX the in-game graphics engine of the Navigation and Combat Screens is of a low enough resolution for digital articulation to be perceived, but not in such detail that individual pixels are likely to be perceived as individually meaningful. Pixels are most visible at the edges of objects, including character models, and at some level this will be reflected in eye movements which periodically track these edges (see Figure 2.2a). In earlier 3D engines, vector lines which stepped across scanlines were visible as two or more adjacent lines, and when an imprecise calculation of vectors resulted in a dark or white line or area between two filled polygons, such staggered lines were effectively highlighted. While FFX has no problems calculating its polygons, long border lines in the Navigation or Combat Screens are still visibly subdivided across scanlines, foregrounding their digital articulation. The edges of narrow objects, such as the branches of trees, have insufficient texture space to provide a well-defined form, and produce this effect of highlighted digital articulation.


Figure 2.2. (a) Close-up of the digital articulation along Yuna's shoulder and blurry texture map in the background. (b) Close-up of the textured polygons that comprise Tidus' hair and (c) a supposedly round blitzball.


While the number of polygons used in creating video game characters and objects is increasing, and their use is disguised by the use of detailed textures, there are instances in FFX where the polygons are obvious (see Figures 2.2b and 2.2c). Generally, graphical textures have also become more detailed, and therefore less visible, whereas in early 3D games, such as Quake (1996) and Goldeneye (1997), a slightly blurry, semi-realistic face was mapped onto the misshapen head of an angular character model comprising a small number of polygons. This incongruity is less noticeable in FFX, in that its character and object models have fairly congruent polygons and textures, but textured bitmaps are produced for a certain level of optical perspective and extrapolated onto surfaces, and so they only look realistic at a certain distance. When an enlarged texture is shown close-up it usually betrays its construction (see Figure 2.2a). While this effect was especially pronounced in first-generation FPSs like Wolfenstein 3-D (1992) and Doom (1993), it can be observed in FFX when Tidus walks close up to the walls during the Trial of the Fayth in Besaid. At such moments the textures which comprise the walls take on an artificial, blurry or blocky quality. Similarly, whenever players are presented with a close-up of Tidus and other characters, attention is directed to the digital texture of skin, hair, and clothing (see Figure 2.2b). Even if these deviations from normal perception are not always observed during gameplay, the FFX manual and game magazines offer close-ups or enlargements of digital images which provide a comparative basis for the prominence of digital articulation in the game.


The disparity between a Gestalt impression of an image and its compound digital composition is also evident in the different degrees of iconic abstraction across FFX. First, there are the character portraits in the menu screens, which partake of a cartoon style of representation and exploit the rich, opaque colours of the CRT (see Figure 2.3a). Second, there is the in-game engine, which is not much different from that of contemporaneous games, in that characters and environments are visibly angular textured models, and pixels are visible at the borders of game objects as they move (see Figure 2.3b). Third, there are the most detailed, near-cinematic quality cut-scenes. These include skin imperfections, subtle lighting and shadows, detailed textures of clothes and hair, and fluid physics for character movement and the environment (see Figure 2.3c).


Figure 2.3. Degrees of abstraction in FFX: the Aeon Valefor as represented (a) in the menu screen, (b) in the game engine, and (c) in a cut-scene.


These levels of visual abstraction, or iconism, create a hierarchy of abstraction, in that the pre-rendered cut-scenes and game sequences which conform to objective spatial and temporal logic are seen as having the highest reality-status. This hierarchy is, of course, subordinate to the concreteness of everyday perception in the sense that players will likely attend to the degree of realism. That is, during both high-resolution, cinematic cut-scenes and lower-resolution but smoothly animated game sequences players may specifically attend to how realistic the images seem. This appraisal is predicated on a perception that the images are merely representations, not real objects, and this may lead to a focus on the degree of abstraction, or, rather, perceptual qualities over and above Gestalt qualities.


Different levels of abstraction are also produced through differences between the representational qualities within the game and the representational qualities (packaged as part of the product but) outside the game. Most gamers are familiar with the (often comical) disparity between the (semi-)realistic pictures on arcade machines, posters and boxes, and the crude, abstract graphics of the games themselves. FFX's graphics are more detailed than earlier generations of games, but attention may nonetheless turn to the disparity between: on the one hand, the pre-processed, digitally rendered photos or paintings printed on the cover, in the manual, and in the play guides; and, on the other hand, the real-time, lower-level resolution images generated for the bulk of the game. These differences are reinforced by the distinctive qualities of the respective media. The raster- or vector-based video interface possesses a distinctive “chrome” quality (Stallabrass, 1996, p. 8) in which the cold, smooth glass separates the viewer from the untouchable and pristine image. When images appear on glossy paper, as with the FFX CD cover, manual, and the Playstation Solutions (Pattison, 2002) play guide, they minimally resemble the chrome effect. However, when the images are printed from a digital source, as when one prints pictures or maps from a Website, they have a blurry, sketchy quality as a consequence of re-sizing to a different resolution. Words and images printed in paper or cardboard manuals, including printed FAQs, hints and cheats which a player may use while playing FFX, have a coarseness which lends greater salience to tactile qualities, such that the entire in-game hierarchy of abstraction gains a relative quality of abstraction.


It is possible that, like pixels, the use of lines, polygons and differing levels of abstraction may be perceived as meaningful. Just as wooden acting may help to code a character as nervous or awkward, a character whose form is composed of clear lines or polygons may be perceived as “simple” or “hard-edged.” When interactive sequences are represented in a lower resolution than non-interactive sequences, the blocky lower resolution may signify that experiences are unclear (subjective) when one is acting in the midst of things, but come into focus when one steps back to witness them from an external (objective) perspective. Alternatively, the “chrome” effect of an iconic image onscreen may connote an idealised fantasy world, compared with the gritty reality connoted by a poor quality printout of a puzzle solution downloaded from the Internet.


However, the demands on a player's attention usually mean that the visibility of digital qualities is not meaningful in the semiotic sense, but primarily serves to create the effect of a digital zoom. We can restate the phenomenon in the following terms. During gameplay, low-resolution sequences, vector miscalculation, in-game close-ups, out-of-focus bitmaps, the edges of object models, and inconsistent abstraction may make visible the distinctive digital articulation of the interface. This visibility may give way to perceptual attentiveness, manifested by way of unfocused eye movements and/or the visual tracking of the borders or textured qualities of the objects onscreen. This may defer the Gestalt experience of closure, preventing the image from being processed at a higher stage of cognition, thereby maintaining a player's mental focus on perceptual intensities.


Introjective and Projective Transactions as a Consequence of Enaction and Enactive Subtraction


Grodal (1997) argues that reality-status is also affected by whether a phenomenon “is experienced as located in exterior space, on the corporeal rim, or in the body-mind interior” (p. 130). He elaborates:

The technical term for a stimulus as a transmission from objects in the exterior world is distal stimulus, as opposed to the term proximal stimulus, which describes it as a process in the perceptual system. (p. 131)

When digital articulation blocks Gestalt perception it places emphasis on perceptual patterns, or intensities, which are seen as located within the mind-body, such that players will experience introjection, and, by implication, an impression of unreality. It is, of course, impossible to be aware of certain proximal stimuli because organisms have no sensation of many of the processes that occur inside their bodies. For example, humans have no sensation of the process whereby images saturate the retina; they only experience the post-processing of the image. Consequently, the labelling of events as proximal or distal is often retrospective.


For Grodal:

the process by which a phenomenon is relocated from being experienced as an aspect of the body-mind to being experienced as an aspect of the exterior world [is called] a projection, the reverse process being an introjection. An example of a projection is a situation in which we see something that we first believe to be a dream or a hallucination, but that we then discover to really exist. An example of an introjection would be seeing a film noir and feeling that the wet streets are not ‘real’, but represent mental states. (p. 130)

Introjection is produced not only through a mental focus on perceptual intensities, but also through a mental focus on Grodal's second stage in the processing of audiovisual input. When stimuli pass from the sensory organs to their respective sensory regions in the cortex, memories (or “networks of association”) are searched for familiar labels and affective values. When the mental focus falls on this part of processing it gives rise to what Grodal calls “saturations.” These:

are the results of emotionally toned and memorized perceptions which have not been transformed into ‘motor’ tension, and therefore sensation, input-processing, and memory-functions become visible as distinct phenomena. In memories this saturation is a trace of not-enacted excitation that is cut off from its original context of enaction. (p. 56)

This is the characteristic focus of attention during dreaming or daydreaming, and when the mental focus is on this stage of processing it gives rise to a dream-like or “lyrical” style often characterised by “dynamic repetition” (p. 60).


Intensities may, by promoting introjective experience of sensation, promote associational thinking, and therefore a shift to saturations. For example, if a player stares at the image of the burning campfire in the opening sequence of FFX, s/he may initially be focused on the perceptual intensities associated with its rhythmic yet dynamic movement. However, networks of association may be activated: campfire storytelling: reading stories by a hearth: the romance of the book: the romance of Tidus and Yuna: Tidus playing blitzball underwater: the pre-eminence of water in the game: the extinguishing of fire: characters resuming their journey. The mental focus here is displaced onto the sensory, cognitive and affective content and qualities in one's memory, and this displacement will likely be experienced as a displacement of one's attention away from the screen, as an inward turning.


Certainly, in Allen's (1997), Tan's (1997) and Grodal's (1997) accounts, watching a film is in some respects similar to daydreaming, and evokes the experience of a dream. However, while dream content is not yet transformed into a distal, motor attitude, our aesthetic response to films (and, potentially, daydreams) usually is. In daydreams there is a potential for conscious fantasising, whereas during film viewing one's consciousness is directed less towards one's own associational networks than to active inferences about what is disclosed by a linear sequence of structured associations, such that viewing is usually focused on telic, distal sequences which produce tensities. That is, viewers feel suspense (tensities) directed at (distal) events that befall characters as they try to achieve their (telic) goals. The focus of both Tan's and Grodal's accounts is that the action tendencies or motor attitudes in film are inhibited as a consequence of an observational attitude (Frijda, 1986), in that viewers do not act upon them, and emotions are manifested psychosomatically.


However, Grodal (1997) argues that:

By switching between active and passive modes of representations the different modes and genres of fiction can produce transactions between distal and proximal modes of experience. Subjective experiences and sensations may by the use of active narrative schemata, be projected onto the object world; objective phenomena may, by the use of passive narrative schemata, be refocused, introjected into being experienced in their subjective form as qualities (intensities, saturations, and emotivities) of the body-mind. (p. 132)

For example, the representation of a sequence without any clear subject-actant, or without any particular goal or narrative schemata, produces passive-introjection (Grodal, 1997, p. 159). Art-house films often emphasise paratelic sequences in this sense, producing a “dreamy” quality because, like dream-states, they involve some subtraction of control (of intention), and emphasise networks of association. Grodal's example of the loss of enactive participation in film is the experience when one views a tape in slow motion. Such a sequence will be perceived as:

‘shallow’, ‘lyrical’, ‘timeless’, ‘saturated’ . . . . We no longer have full enactive identification with the fictive phenomena. The acts and characters become objects controlled by an invisible subject. (p. 47)

This experience of slow motion as transforming the telic-oriented character of most fiction “into a mood of pastness, memory, or abstraction” (p. 47) may be described as physical and cognitive detachment from the event, and it may also be seen as paradigmatic of one's experience of partial (or suspended) enactive capacity in film and during certain sequences in video games. The viewer watches the screen, motivated by the paratelic rhythm of repetition and variation rather than tension (suspense) about the expected achievement of a represented goal. Consequently, within the position of passive witness the most extreme proximal impressions of enactive subtraction are specifically produced during special (dreamy, “artistic”) sequences.


Nonetheless, as Grodal (1997) acknowledges:

In a video game, the connection between the screen and the viewer is established both as the visual perception of what is taking place and as a capacity to influence the action by intellectually controlled motor response via the joystick. (p. 48)

The implication is that, in video games, shifts between proximal and distal foci are not confined to active or passive representations of narrative schemata: the player periodically has a sensorimotor relationship with the game itself, and this creates a telic, distal focus on whatever is being controlled. Indeed, since narrative sequences in games define and organise the goals played out in the game schemata, they provide a basis for enactive projection, or, rather, they provide the telic, distal emphasis that is acted out when the player shifts into an enactive mode.


Newman (2002) has discussed this issue by observing that some sequences have “an ergodic potential that demands and fosters a greater degree of player engagement than a standard cut-scene or introduction” (p. 6). This is most evident in cut-scenes which explicitly state or demonstrate goals and thereby function as both a period for preparation and a cue for action. In FFX, the cut-scene representations of the appearance of key monsters, the kidnapping of the Summoners, the attack on the Al Bhed home, and Tidus' confrontation with Jecht, provide simple goals like “fight Sinspawn,” “find Yuna” or “go to Bevelle” that players can accomplish using previously acquired skills. However, diegetic and non-diegetic planning and preparation are often combined in more explicitly instructional cut-scenes. Such cut-scenes may be seen as having their origin, or parallel, in military, espionage, and heist films, such as Ronin (1998), in which a plan is broken down into its component steps. Perhaps the best example of this is not in FFX but FFVIII, when the characters meet and discuss their plan for kidnapping the Galbadian president from his private train. The leader of the revolutionary movement goes over the plan step-by-step with the characters using a map, indirectly addressing the player (who can choose to hear the plan repeated if s/he did not follow it the first time). However, FFX includes in-game tutorials and instructional text boxes that appear prior to the mini-games, and these perform the same function. These tutorials or instructions are more explicitly denotative and direct in their mode of address, and do not partake of the kinaesthetic and telic qualities of the game-sequences for which they provide instructions. Yet they position the player in the same way as instructional cut-scenes by preparing them for enactive access.


When a goal is understood prior to the end of the cut-scene which demonstrates it, players will likely disregard the implicit position of passive witness to the remaining content through impatient or eager tension about the shift itself. That is, players may watch many of the cut-scenes with a degree of anticipatory arousal whose telos is directed less towards a specific represented goal than towards the (anticipated) performance of the shift into a sensorimotor mode. This may be seen as analogous to the nervous anticipation that is felt at the start of a race, which is followed by a momentary excitement as the race begins, and then the tension of the race itself.


However, given that instructional cut-scenes prepare the player for enactive access, it is important not to underestimate the distinctive and unusual quality of experience when players enter a state of motor activity. The experience of this shift may be described by analogy to the moment when one wakes from a dream, or moves from being a mere (day)dreamer or spectator to an acting subject, and has to re-orient oneself to the physical world. This intrusion of the “reality principle” upon the comparative “pleasure principle” of escape into one's associations means that the shift from enactive subtraction to enactive access has a quality of physical exertion or fatigue, of resignation to, or an energetic leap into, a sensorimotor paradigm (Freud, 1900/1953). It is not merely that players have a shift in mental focus; rather, the sensorimotor components of the nervous system are suddenly distally reoriented, so that players experience a shift into greater immediacy, concretion, salience, and, therefore, reality-status.


At the same time, the shifts into enactive mode in a video game cannot be regarded as a move from a state of complete enactive subtraction to a state of complete freedom. Interfaces offer a mediated, highly removed and abstract mode of interaction, and enactive limitations are placed on the player for the sake of providing a challenging game. As Poole (2001) argues, players do not want complete reality of immersion, and gameplay edits out the uninteresting aspects of whatever activity the game is simulating:

You don't want to actually be there, performing the dynamically exaggerated and physically perilous moves yourself; it would be exhausting and painful. . . . You don't want it to be too real. The purpose of a videogame . . . is never to simulate real life, but to offer the gift of play. (p. 63)

So there is a difference between, on the one hand, the complete subtraction of physical enactive access when players are positioned as a witness during a cut-scene, and, on the other hand, when a game cues an enactive attitude in the sense of a telic-orientation towards the game world, but blocks certain actions.


This blocking of enaction can be addressed in terms of the construction of space, or more appropriately, the blocking of movement through this space. Of course, space has been radically re-theorised in the last few decades, but the audience of a fictional text draws from schemata of objective space as physical, static and linear when constructing the relationship between events and objects in the fabula. Like deviations from normal perception, deviations from these “objective” schemata of space and time are often “perceived as an expression of a subjective factor” (Grodal, 1997, p. 134). That is, the “scaling, size, and isolation of details from a complex-composite totality” (p. 134), as well as the use of unusual camera angles, editing, and lighting effects, is experienced as a subjective distortion “because of an underlying viewer-assumption of a normal, unmediated and objective access to the object-world” (p. 134). As Grodal observes:

If a single picture is constructed by mixing two or several pictures from different points in time or space there is normally no possible external source for the frames and they will therefore be interpreted as expressions of a mental association. Correspondingly, fast sequences of different shots or stills will be interpreted as mental associations. (p. 135)

In this respect, we tend to perceive as mental associations dream or arthouse sequences which abstract and combine spaces or objects in ways which are unusual or impossible in everyday experience. This includes the spatial paradoxes in M. C. Escher's (1992) paintings, the living but separate body parts in Salvador Dali's surrealist paintings (Descharnes, 1984), and fantasy novels and films which represent hybrid or talking animals and miniaturised or gigantic people, as in Lewis Carroll's (1865/1988) Through the Looking Glass.


In FFX there are many such associational qualities in the onscreen representations within the diegesis. Most obvious is the constant mismatch of size and scale between the player-characters, the environment, and opponents. In the Navigation screens, the size of the image of Tidus is often not on the same scale as the landscape. For example, in Besaid the scale seems roughly 1:1, but when Tidus walks or rides a Chocobo across the Calm Lands the scale may be perceived as anything up to 1:1000, depending upon how the player interprets the depth cues in the expositional cut-scene about the Calm Lands. During encounters with many of the game's monsters, the difference in size between Tidus and Sin, Yunalesca, or Braska's Final Aeon beggars rational comprehension. That Tidus opposes such monstrous beings completely suspends real-world logic. Many of the settings in FFX are also spatially associational. The Trial of the Fayth in Bevelle involves conveyor belts and levitating platforms represented by electric signals reminiscent of the non-space inside a computer. The spaces inside Sin are swathed in mists which not only obscure distance but are traditionally coded in film as associated with a dream-state, or the foggy state of drug-addiction, delusion or semi-consciousness. Squaresoft's Final Fantasy X: The Official Strategy Guide (2002) specifically notes: “Your surroundings are rather surreal, with constantly changing perspectives; no wonder you feel disoriented!” (p. 133). When Tidus confronts Braska's Final Aeon the background reinforces the quality of spatial abstraction through the tumbling pagodas, suspended mid-air, and the floating stones of ruined Zanarkand. Inasmuch as these representations violate objective space, they may be perceived as mental abstractions, cueing passive-introjection.


Of course, since video games involve not merely representation, but also navigation of space (Aarseth, 1998; Fuller and Jenkins, 1995; Murray, 1997), it is useful to draw from Lefebvre's (1991) categories of space: the physical space “perceived” during interaction with the world; the formal, or “conceived,” representation of space; and the spatial practice itself. During gameplay the spatial practice emerges from the way the perceived spaces onscreen are navigated by the player and conceived of as a spatial representation (a spatial logic or schemata) that sets up expectations about enactive possibilities.


The physical, non-diegetic spaces of FFX include the space of the hardware (screen, console and game controller) and the space of the environment (the floor, couch and other furniture), and the relational space between them, governed by the length of cables and the proximity required to turn the console on and off or change disks. Given that the interface is not merely observed, but interacted with, its spatial representations are bound up in the spatial practice of gameplay. However, since navigation requires an understanding of the spatial logic governing the game, the interface may be seen primarily as a spatial representation. Compared with film, in which the environment is so dark that it usually falls from context, in video games like FFX there is a greater disjunction between the mundane physical space(s) occupied by the player and the spatial representations perceived onscreen as a consequence of depth cues. This is the distinctive experience, also found with television, in which one seems to look through a window at another physical space.


This general disjunction between physical non-diegetic space and the non-physical space on screen is amplified by the non-physical (represented) non-diegetic space of the interface. As Per Persson (2001) argues of the typical point-and-click, menu-based graphical-user interface (GUI), the layers of menu boxes may be seen as either physically above or below each other, or as insubstantial, in that they appear from, and disappear into, a non-visible space. Persson's argument may be extended to the accessing of menus and submenus, and the use of ergodic symbols and indexes, in FFX, which similarly appear and disappear over the diegetic spaces. However, it may apply in another sense, since menus are not organised by any natural relationship between the content of menus and their spatial arrangement. Rather, menus (and screens) are organised around such functional categories as “weapons,” “armour,” “use,” “sort,” “equipment,” and “key items.” Since these menus utilise conventional categories of objects and functions within adventure games, most players are likely to perceive their congruence between idea and space as fairly normal. By contrast, a list of inventory items organised by such categories as “material,” “colour,” “shape,” or “fashion,” which are non-functional in the game as it stands, would be perceived as arbitrary, or relatively associational. It can be argued, then, that the interface may evoke the associational quality of internal, mental processes through the co-existence of a general perception of the menus as insubstantial (thereby violating normal space) and of the organisation of physical space by abstract, functional categories.


These generally non-diegetic constructions of space frame the way players navigate the diegetic space of the game world. Mark Wolf (2001) identifies elementary spatial structures which rework spatial conventions of print and cinema, and his categories are useful in identifying inconsistencies in the diegetic space of FFX. The setting of every attack within the Combat Screen is a contained, single-screen space (p. 55), but such spaces also exist in the Field Screen, such as the clearing in the forest in Macalania. Tidus can move through this space to talk to the people, and may exit by leaving the front of the screen, but the perspective remains static.


Of course, the clearing is part of the rest of Macalania Forest, and most spaces in FFX, while finite and bounded, are larger than the screen but only displayed one screen at a time (Wolf, 2001, p. 59). That is, Tidus simply walks to the edge of the screen and the next, adjacent screen appears, with Tidus accordingly repositioned onscreen, an effect analogous to switching between the views of security cameras located in a sequence of rooms as a character passes through them. This is a spatial convention common to many adventure games, like Quest for Glory (1980/2001), and demands the same understanding of spatial convention as cinematic editing, in which continuity is perceived during a cut between two shots. In FFX, when Tidus passes through parts of the forest in Macalania or Thunder Plains, players presume a spatial logic in which travelling upwards, to the top of the screen, leads to a (not necessarily adjacent) northerly space. Players may perceive gaps between such spaces as dilations of space to conserve geographic credibility. After all, while Macalania Forest, the Thunder Plains, and the Calm Lands may take up large portions of Spira, they are represented by just a few onscreen spaces.


This jumping between (relatively) adjacent spaces is often combined with scrolling (Wolf, 2001, pp. 57-59). For example, when Tidus moves through the paths that lead to Besaid he moves within the frame until he comes close to the edge of the screen, whereupon the screen scrolls; eventually, he reaches the edge of this continuous space, and the screen jumps to an adjacent continuous space. This scrolling often occurs on two axes, most notably in the vast expanse of the Calm Lands. What is significant is that, in such spaces, players will perceive movement into the z-axis (the depth of field) even if all that is actually occurring is the scrolling of a fixed background on a vertical axis (p. 63). However, during sequences such as exploring the Submerged Ruins or the Trials of the Fayth, there is actual movement into the z-axis, as a result of the continuous, dynamic reconstruction of the perspective around Tidus' changing co-ordinates; in short, through Tidus' movement in an immersive 3-D space (p. 65).


The importance of these mixed spatial conventions is that the physical surface of the flat screen is inconsistent with the perception of its spatial depth, and the spatial logic across these spaces may not always be properly differentiated. There are several instances across the Final Fantasy series during which cuts between adjacent spaces are accompanied by a shift of direction which violates the continuity of movement. When players guide Tidus through the corridors of the Al Bhed home in Sanubia Sands the perspective sometimes changes between screens. If the player pauses to look around, then absently resumes moving the analogue joystick in the same direction, Tidus walks back into the previous screen. There are also inconsistencies between onscreen spaces and mapped spaces (Wolf, 2001, p. 67), such as the mini-map in the corner of the Navigation screen and the full-screen map of Spira (see Figure 2.4). These provide global information about the game world which helps the player make decisions about where to go and what to do on a local scale. However, even if players simply accept that items are not perceived on the mini-map, the borders of the mini-map do not always map clearly onto the contours of the diegesis, showing spaces that cannot be accessed, or not showing spaces that can be accessed.


A more general problem is that depth cues provide an ongoing, misleading perceptual impression of enactive capacity. FFX has few problems representing object overlap, foreshortening, apparent size, and changes in perspective, drawing from such conventions as changing object sizes, altering texture and/or colour gradients, and the use of parallax backgrounds, which were used to compensate for problems producing depth cues in early video games (Wolf, 2001, p. 72). However, its realistic depth cues hold open the promise of potential enactive capacities within the virtual environment. Players constantly see spaces that seem accessible, and objects that look like they can be manipulated, yet any attempt to access those spaces, or to manipulate those objects, is blocked. In Figure 2.4a, for example, Tidus is unable to climb up the ruins, and in Figure 2.4b he is unable to examine, pick up, or use, any of the many objects that surround him.


Figure 2.4. (a) Tidus walks under an old Machina. (b) Tidus in a cluttered hut in Besaid.


Players may become adept at spotting, and accepting, such spatial conventions, but as players move between these different representational spaces there may be moments of spatial confusion. This kind of enactive blocking may lead to the same kind of drastic awareness of distal stimulus that occurs when one attempts to walk into a trompe l'oeil or a glass door only to discover that what one thought was distal physical space was merely proximal perceptual or conceptual space. In everyday, public situations, the sudden projective shift to a distal focus as a consequence of misunderstood spatial cues is likely to lead to a proximal experience of embarrassment, readily displaced as anger at the offending architecture. During solitary gameplay, however, there are no witnesses, and selfhood is displaced into a sensorimotor paradigm that is not central to one's sense of personal mastery. Any lack of grace or mastery in gameplay is simply likely to reflect upon one's ability to play that particular game. Consequently, blocked enaction cues a distal transaction: an unwelcome reality principle that is imposed upon the pleasure principle of the play experience. This may be experienced as merely a moment's confusion, anger or frustration directed at the limitations of the game, or a resigned, fatigued, or automatic redirection of one's mental and virtual steps around the obstacle.


Even if a player comes to learn and accept these frequent blockings and suppressions as part of the game's spatial codes they nonetheless represent an enactive capacity which is imagined and desired but not offered. Even a minimal perception of enactive subtraction, such as the realisation that punching the screen will not help Tidus to defeat Braska's Final Aeon, produces a generalised tension directed at the gap between imagined and actual affordances. It is not merely that enactive capacity is frustrated, since one experiences a disjunction between the proximal abstraction of space and the distal construction of concrete space. Even if the distal focus on what has blocked one's movement is followed by a proximal focus on one's spatial schemata—as when one discovers that a previously accepted fact is wrong, and pauses to reflect upon it—the disparity between the two may itself produce a temporary impression of unreality.


It is not simply the case, then, that passive modes of representation encourage proximal experiences (saturations), and active modes of representation encourage distal experiences (tensities). In video games, shifts to enactive (game) sequences also usually encourage a distal, telic emphasis, while shifts to enactive subtraction (narrative sequences) encourage a proximal, paratelic emphasis. However, it is also possible for enactive sequences to become paratelic, promoting introjection. For example, many goals in FFX take so long to complete, or are interfered with by other goals, that they lose salience. The goal to achieve the Celestial Weapons requires the achievement of countless intermediate goals, and in many cases these goals can only be achieved when one is lucky enough to wander into the appropriate area and find the next clue. The rhythm of movement, fighting wandering monsters, healing, developing characters, buying and selling acquired items and so on, may mean that the goal loses all relevance. More than this, these sub-goals may lose salience through their habituation. The waning of goals in this way may cause enaction to become increasingly paratelic, oriented around the rhythmic or associational process of combat, navigation, and other recurring game macrostructures.


Conclusion


In everyday experience, sensory modalities “co-operate to provide a holistic experience” (Grodal, 1997, p. 31), but deviations and other effects may cause these modalities to be split up, as when one perceives:

sound without image, whether motivated by darkness or unmotivated; image without sound; represented acts in which ‘solid’ phenomena lack their tactile qualities (such as ghosts, and mirror-images); lack of synchronization, and so forth. (p. 31)

Incomplete modality synthesis in video games is often a consequence of technical limitations of graphical and sound hardware, in that the delay in processing of one or more sensory modalities in relation to enactive capacity leads to a perceived violation of Gestalt all-in-oneness. However, while the processing power of the PS2 means that FFX has few technical problems synchronising sensory modalities in high-level cut-scenes, a sense of disjunctive sensory integration results from inconsistencies between parameters of reality-status, such as those discussed above.


We cannot conclude that qualities of the interface produce an impression of low reality-status, leading to a decreased level, deferment, or subtraction of the immediate state of arousal. Rather, deviations from normal perception and from objective representations of space may guide attention to intensities and saturations, and shifts between proximal and distal experience through the affordance of enaction may reinforce a sense of ambivalence about the game's reality-status. Indeed, even if a player completes secondary appraisal and nominates a sequence as “real,” it is possible that there may be a persistent quality or feeling of unreality. Later chapters argue that FFX may recuperate these ambivalent or disjunctive impressions of unreality as aesthetically meaningful through a hermeneutic concern with reality and tragedy. However, before addressing these issues it is necessary to address the foundations of basic emotions in gameplay.




Footnotes


1. Hue measures the wavelength of light reflected from or transmitted by an object, expressed as a location (0-360) on a standard colour wheel. Saturation measures the strength or purity of the colour and is represented as a percentage of hue in relation to grey (0-100%). Brightness measures the relative lightness or darkness of the colour, independent of its hue and saturation, represented by a percentage from black to white (0-100%).