Multi-Image Advantage in Face Identity Matching Depends on the Formation of Face Representations
Junye Feng, Wang Zhe, Sun Yuhao
Submitted 2025-06-22 | ChinaXiv: chinaxiv-202506.00195

Abstract

Presenting multiple face images of the same person significantly improves participants' performance in face identity recognition. However, the cognitive mechanisms underlying this improvement remain unclear. This study comprised two experiments. Experiment 1A used a face-matching paradigm in which one, two, or three study faces were presented either simultaneously with or sequentially before the target face, and measured participants' discriminability in each condition. The results revealed that (1) discriminability increased with the number of images only under sequential presentation (a multiple-image advantage), and (2) when three face images were presented, discriminability was higher under sequential than under simultaneous presentation. Experiment 1B controlled face presentation duration and replicated these results. Experiment 2 built on Experiment 1A but inverted the faces to disrupt the integration of facial representations. The results showed that (3) regardless of whether one or several study images were presented, discriminability was lower under sequential than under simultaneous presentation, and (4) no multiple-image advantage emerged in either task. Taken together, the results suggest that the multiple-image advantage in face identity discrimination originates from the formation of face representations, a process that requires the involvement of memory.


Department of Psychology, Zhejiang Sci-Tech University, Hangzhou 310018, China


Keywords: multiple-image advantage, facial representation, face matching, face recognition
Classification Number: B842

Recognizing unfamiliar faces is notoriously difficult (Young & Burton, 2018). However, researchers have found that presenting multiple face images of the same identity can improve face recognition performance (Andrews et al., 2015; Baker & Mondloch, 2019; Baker & Mondloch, 2023; Bindemann & Sandford, 2011; Dowsett et al., 2016; Longmore et al., 2017; Menon et al., 2015a; Murphy et al., 2015; Sandford & Ritchie, 2021), a phenomenon termed the multiple-image advantage.

Received: December 7, 2024
Funding: Zhejiang Provincial Natural Science Foundation (LY19C090006, LY20C090010)
Corresponding Author: SUN Yuhao, E-mail: sunyuhao@zstu.edu.cn

Researchers attribute the multiple-image advantage to the formation of facial representation, positing that exposure to multiple images of the same identity enables individuals to construct a representation of that identity; more images yield more robust representations, thereby improving recognition performance (Devue & de Sena, 2023; Ritchie et al., 2021). An alternative explanation suggests that the advantage stems from the quantity of image information—that is, more images provide more information, which enhances recognition performance (Kramer et al., 2020; Menon et al., 2018; White et al., 2014). Thus, whether the multiple-image advantage truly originates from facial identity representation formation remains uncertain.

The multiple-image advantage is not consistently observed, and the way study and target faces are presented may be a key factor (Sandford & Ritchie, 2021). Studies reporting a multiple-image advantage typically use sequential presentation, in which the multiple images are shown first, followed by the target image (Andrews et al., 2015; Baker & Mondloch, 2019; Baker & Mondloch, 2023; Longmore et al., 2017; Menon et al., 2015a; Murphy et al., 2015; Sandford & Ritchie, 2021). For example, Baker and Mondloch (2023) used an old/new recognition paradigm and found that discriminability for target faces improved as the number of images presented during the study phase increased. Studies that fail to find a multiple-image advantage often present the multiple images and the target face simultaneously (Kramer & Reynolds, 2018; Ritchie et al., 2020; Ritchie et al., 2021). For instance, Ritchie et al. (2020) used a live face-matching task and found that presenting multiple face images did not improve accuracy compared with presenting a single image.

Differing from these two lines of research, Sandford and Ritchie (2021) examined both tasks within the same study. In the simultaneous face-matching task, the study face image set (containing one, two, or three images of the same person) and a target face image were presented concurrently. In the sequential face-matching task, the study face image set was presented for five seconds, followed by the target face image alone. Participants judged whether the study set and target face depicted the same person. Results showed that in the simultaneous task, the number of images in the study set did not affect discriminability. In contrast, discriminability in the sequential task gradually improved as the number of study images increased, demonstrating a multiple-image advantage.

Some researchers argue that simultaneous versus sequential presentation of study and target faces affects the degree to which memory is involved in face processing (Menon et al., 2015a; Sandford & Ritchie, 2021; Ritchie et al., 2021; Baker et al., 2023). When study and target faces are presented simultaneously, participants are widely believed to perform a perceptual task, because they can scan back and forth between the faces (Megreya et al., 2011; Menon et al., 2015b; Davis et al., 2021; Sandford & Ritchie, 2021; Ritchie et al., 2021). Under sequential presentation, however, participants must rely on memory to retain the study faces before comparing them with the target (Honig et al., 2022; Dowsett et al., 2016; Matthews & Mondloch, 2022; Pitcher et al., 2023; Menon et al., 2015b; Mileva et al., 2021; Matthews et al., 2024). This account implies that forming a facial representation requires the involvement of memory.

Nevertheless, these results have not ruled out a role for the quantity of image information. For example, images with greater variability are more likely to produce a multiple-image advantage than less variable images (Baker et al., 2017; Menon et al., 2015a; Ritchie & Burton, 2017; Sandford & Ritchie, 2021), indicating that more information yields better matching performance. This raises the possibility that image information quantity and task difficulty interact to produce the performance differences between sequential and simultaneous matching tasks. Specifically, because sequential matching requires participants to remember multiple images, it is more difficult; the resulting lower overall performance leaves room for differences between single- and multiple-image conditions to appear. Simultaneous matching, in contrast, is easier, so performance is high in both single- and multiple-image conditions and the difference between them is small, approaching a ceiling effect (see Kramer & Reynolds, 2018). Alternatively, in simultaneous tasks the additional information in multiple-image conditions may cause cognitive overload, eliminating the performance difference between single- and multiple-image conditions (see Mileva & Burton, 2019). In short, until the role of image information quantity is excluded, previous results cannot adequately support the hypothesis that the multiple-image advantage stems from facial representation formation.

In brief, the two hypotheses, "facial representation formation" and "increased image information quantity," make different predictions about how performance in simultaneous and sequential matching tasks should differ under multiple-image conditions. The former predicts that, under sequential presentation, multiple faces will prompt individuals to form memory representations of facial identity that facilitate face matching, so that sequential performance will exceed simultaneous performance. The latter predicts the opposite: sequential matching performance will be inferior to simultaneous performance. To test the first hypothesis and exclude the second, we conducted two experiments. Experiment 1, similar to Sandford and Ritchie (2021), required participants to judge whether a study face image set (containing one, two, or three images of the same person) matched a target face. Two tasks were employed: a simultaneous matching task in which study and target faces appeared together, and a sequential matching task in which the study set preceded the target face. Experiment 2 repeated both tasks with all faces inverted.

The theoretical rationale for using sequential matching (a memory-demanding task) and simultaneous matching (a perceptual matching task) to distinguish representation formation is supported by two lines of evidence. First, in previous research investigating the mechanisms of multiple-image advantage, sequential matching tasks (e.g., study-test paradigms) have been widely used as classic paradigms for studying representation formation. Studies by Jones et al. (2017), Menon et al. (2015a, 2018), and Baker et al. (2023) all found that sequentially presenting multiple images significantly improved identity discrimination performance, an advantage generally attributed to stable representation formation. These studies support the use of sequential tasks as an experimental manipulation for representation formation. Second, in studies directly comparing these two task paradigms, this manipulation has been adopted to establish representations in memory (see Sandford & Ritchie, 2021; Ritchie et al., 2021; Baker et al., 2023).

Based on the aforementioned hypotheses, four predictions can be made regarding the experimental results. (1) If the multiple-image advantage arises from memory facilitating facial representation formation, Experiment 1 should find that discriminability improves with image number only in the sequential face-matching task (i.e., multiple-image advantage emerges). (2) More importantly, if the multiple-image advantage results from more robust facial identity representations formed through memory, Experiment 1 should also find that when multiple faces are presented, discriminability in the sequential task will be higher than in the simultaneous task. (3) Conversely, if task difficulty in sequential presentation is the determining factor, both experiments should find that when multiple images are presented, discriminability in the sequential task will be lower than in the simultaneous task due to higher difficulty. (4) Given that face inversion disrupts facial representation formation (Leder & Bruce, 2000; Xu & Tanaka, 2013) and substantially increases face recognition difficulty (Freire et al., 2000; Yin, 1969), if the multiple-image advantage in sequential tasks stems from memory-formed facial representations, then with representations disrupted by inversion, Experiment 2 should find no multiple-image advantage in either task. If the multiple-image advantage stems from the interaction between image information quantity and task difficulty, then participants should show multiple-image advantages in simultaneous tasks when faces are inverted.

2. Experiment 1A

2.1.1 Participants

Eighty-one university students (30 male) with a mean age of 19.7 years (range 17–24, SD = 1.7) participated. All had normal or corrected-to-normal vision and were right-handed. Sandford and Ritchie (2021) reported a significant interaction between presentation method and image number under high-variability conditions, with a partial eta squared (ηp²) of 0.093 (equivalent to an effect size f ≈ 0.32). Using G*Power, we calculated the minimum sample size for a repeated-measures ANOVA with power = 0.8 and effect size f = 0.32, which yielded N = 52. Because Sandford and Ritchie (2021) tested 79 participants, we set our sample size at 81. The experiment was conducted under the guidance of the Ethics Committee of the School of Science at Zhejiang Sci-Tech University (Ethics Number: 202309P002). Participants provided informed consent before the experiment and received a small cash payment afterward.
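The conversion from ηp² to Cohen's f mentioned above follows the standard identity f = √(ηp² / (1 − ηp²)). A minimal Python check of the reported value (the function name `eta_sq_to_f` is our own, not part of G*Power):

```python
import math


def eta_sq_to_f(partial_eta_sq: float) -> float:
    """Convert partial eta squared to Cohen's f: sqrt(eta^2 / (1 - eta^2))."""
    if not 0.0 <= partial_eta_sq < 1.0:
        raise ValueError("partial eta squared must lie in [0, 1)")
    return math.sqrt(partial_eta_sq / (1.0 - partial_eta_sq))


# Interaction reported by Sandford and Ritchie (2021), high-variability images
print(round(eta_sq_to_f(0.093), 2))  # 0.32
```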

2.1.2 Stimuli and Apparatus

We used 144 photographs of 36 Hong Kong celebrities (half male) who are little known in mainland China (four photos per person). For each celebrity we also selected one foil photograph of a different person who fit the same appearance description, giving 180 images in total. All images were taken from photos posted on social media platforms such as Weibo within the preceding three years. Photos varied in facial expression, head angle, and environmental conditions (e.g., lighting, camera characteristics). The 36 celebrities were divided into two sets, each randomly assigned to either the simultaneous or the sequential task; materials used in one task never appeared in the other, and the assignment of sets to tasks was counterbalanced across participants.

All face photos were 200 × 280 pixels and were displayed on an 800 × 710 pixel black background. The target image always appeared at the top center, and the study images appeared in the bottom row: in 1-to-1 matching, the study image appeared at the bottom left; in 1-to-2 matching, at the bottom left and bottom center; and in 1-to-3 matching, at the bottom left, center, and right. Adjacent study images were separated horizontally by 100 pixels, and the vertical gap between the target and the bottom-center study image was 150 pixels (see Figure 1 [FIGURE:1]).
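The reported dimensions are mutually consistent: three 200-pixel-wide study images separated by two 100-pixel gaps span exactly the 800-pixel background width, and two 280-pixel rows separated by the 150-pixel gap span exactly its 710-pixel height. The sketch below derives each image's top-left position under the assumption of no outer margin; the coordinate convention (origin at top left, y increasing downward) and the function name `layout` are illustrative, not taken from the original E-Prime script.

```python
from typing import Dict, List, Tuple

# Dimensions reported in the text (pixels)
IMG_W, IMG_H = 200, 280
CANVAS_W, CANVAS_H = 800, 710
H_GAP, V_GAP = 100, 150

# Sanity checks: with no outer margin, the images exactly fill the background
assert 3 * IMG_W + 2 * H_GAP == CANVAS_W
assert 2 * IMG_H + V_GAP == CANVAS_H


def layout(n_study: int) -> Dict[str, List[Tuple[int, int]]]:
    """Top-left (x, y) of the target and of n_study study images (hypothetical helper).

    Study images fill the bottom row from the left: left, then center, then right.
    """
    if n_study not in (1, 2, 3):
        raise ValueError("n_study must be 1, 2 or 3")
    target = ((CANVAS_W - IMG_W) // 2, 0)              # top center
    row_y = IMG_H + V_GAP                              # bottom row starts below the gap
    slots = [i * (IMG_W + H_GAP) for i in range(3)]    # left, center, right x-offsets
    study = [(x, row_y) for x in slots[:n_study]]
    return {"target": [target], "study": study}


for n in (1, 2, 3):
    print(n, layout(n))
```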

Stimuli were presented on a 15.4-inch LCD screen with a 60 Hz refresh rate and 1366×768 pixel resolution. We used E-Prime 2.0 to control stimulus presentation and collect response data.

Figure 1. Example of face material layout (from left to right: Block 1 to Block 3)

The experiment employed a two-factor mixed design. Presentation method (simultaneous vs. sequential) was the between-subjects variable, and study image number (1, 2, or 3 images) was the within-subjects variable.

The experimental procedure followed Sandford and Ritchie (2021). Each task comprised three blocks of 36 trials each (half matching, half non-matching), for a total of 108 trials. Participants first completed a block of one-to-one matching between a target image and one study image, followed by a block of one-to-two matching and finally a block of one-to-three matching. Between blocks, participants could rest for as long as they wished and pressed a key to begin the next block.

Before each block, participants were informed that all study images appearing at the bottom of the screen depicted the same person, while the image at the top center was the target image.

In the simultaneous face-matching task (see Figure 2 [FIGURE:2]), each trial began with a white fixation cross (+) at the center of the screen for 500 ms. Then 1, 2, or 3 study images (depending on the block) appeared at the bottom of the screen together with the target image at the top. Participants judged whether the target and study faces depicted the same person; their response initiated the next trial, until all trials in the block were completed.

In the sequential face-matching task (see Figure 3 [FIGURE:3]), each trial began with a white fixation cross (+) for 500 ms. Then 1, 2, or 3 study images (depending on the block) appeared in the bottom row, positioned as described above, for 5000 ms. After a 500 ms blank screen, the target image appeared at the top center. Participants judged whether the target and study faces depicted the same person; their response initiated the next trial, until all trials in the block were completed.
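The two trial procedures can be summarized as timed event sequences. The sketch below encodes each as (event, duration) pairs using the durations given above, with `None` marking an event that lasts until the response, and computes event onsets from trial start. The names are hypothetical; this illustrates the timing only and is not the study's E-Prime implementation.

```python
from itertools import accumulate
from typing import List, Optional, Tuple

Event = Tuple[str, Optional[int]]  # (label, duration in ms; None = until response)

SIMULTANEOUS: List[Event] = [
    ("fixation", 500),
    ("study + target", None),   # shown together until the keypress
]
SEQUENTIAL: List[Event] = [
    ("fixation", 500),
    ("study images", 5000),
    ("blank", 500),
    ("target", None),           # shown until the keypress
]


def onsets(trial: List[Event]) -> List[Tuple[str, int]]:
    """Onset (ms from trial start) of each event; only the last event may be open-ended."""
    fixed = [d for _, d in trial[:-1]]
    if any(d is None for d in fixed):
        raise ValueError("only the last event may last until response")
    starts = [0, *accumulate(fixed)]
    return [(label, t) for (label, _), t in zip(trial, starts)]


print(onsets(SIMULTANEOUS))
print(onsets(SEQUENTIAL))
```

Under these durations the target appears 6000 ms after trial onset in the sequential task, whereas in the simultaneous task the study images stay on screen only until the participant responds.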

Figure 2. Single-trial flowchart for simultaneous matching task (one-to-three matching example)
Figure 3. Single-trial flowchart for sequential matching task (three-to-one matching example)

After completing the identity-matching task, participants underwent a familiarity screening test to exclude any faces they might recognize. Specifically, we presented the names of the 36 Hong Kong celebrities used in the experimental materials. If participants could recall the corresponding celebrity's face from the name, the face was classified as familiar; otherwise, it was classified as unfamiliar. Only data for unfamiliar faces in the matching task were analyzed.

2.2 Results and Discussion

First, trials involving faces identified as familiar in the familiarity screening were excluded, removing 6 trials (0.06% of all trials). Second, data from three participants with negative discriminability indices were excluded, leaving 78 participants in the final analysis. We then conducted repeated-measures ANOVAs on discriminability (d'), criterion (c), and reaction time (RT). Descriptive statistics are shown in Table 1 [TABLE:1].
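Discriminability (d') and criterion (c) are the standard signal-detection indices. Treating "same person" responses on matching trials as hits and on non-matching trials as false alarms, d' = z(H) − z(F) and c = −[z(H) + z(F)] / 2, where z is the inverse of the standard normal CDF. The text does not state how hit or false-alarm rates of 0 or 1 were handled; the sketch below assumes the log-linear correction (adding 0.5 to each count and 1 to each total), which is one common choice. A negative d', the exclusion criterion above, means a participant responded "same" more often on non-matching than on matching trials. The function name and the example counts are hypothetical.

```python
from statistics import NormalDist
from typing import Tuple

_z = NormalDist().inv_cdf  # inverse standard normal CDF


def sdt_indices(hits: int, n_match: int, false_alarms: int, n_nonmatch: int) -> Tuple[float, float]:
    """Return (d_prime, c) for one participant in one condition.

    hits: "same" responses on matching trials; false_alarms: "same" responses on
    non-matching trials. The log-linear correction (an assumption here) keeps both
    rates strictly between 0 and 1, so z() stays finite.
    """
    h = (hits + 0.5) / (n_match + 1)
    f = (false_alarms + 0.5) / (n_nonmatch + 1)
    d_prime = _z(h) - _z(f)
    c = -(_z(h) + _z(f)) / 2
    return d_prime, c


# Hypothetical participant: 16/18 hits and 5/18 false alarms in one block
d, c = sdt_indices(16, 18, 5, 18)
print(f"d' = {d:.2f}, c = {c:.2f}")
```

In this example d' is positive and c is negative, the liberal direction reported for most conditions in this study.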

Table 1. Descriptive statistics for Experiment 1A (M±SE)

A 2 (presentation method: simultaneous vs. sequential) × 3 (study image number: 1, 2, or 3) repeated-measures ANOVA on discriminability revealed a significant main effect of image number, F(2, 127) = 13.41, p < 0.001, ηp² = 0.15. LSD post-hoc tests showed that discriminability for 3 study images (M₃ = 1.59, SE₃ = 0.07) was higher than for 1 study image (M₁ = 1.25, SE₁ = 0.05, p < 0.001), but did not differ significantly from 2 study images (M₂ = 1.49, SE₂ = 0.06, p = 0.100). Discriminability for 2 study images was higher than for 1 study image (p < 0.001). The main effect of task type was not significant, F(1, 76) = 1.51, p = 0.223, ηp² = 0.02. The interaction was significant, F(2, 127) = 3.33, p = 0.048, ηp² = 0.05.

Simple effects analysis revealed that in the sequential matching task, discriminability for 3 study images (M₃ = 1.74, SE₃ = 0.10) was higher than for 2 images (M₂ = 1.53, SE₂ = 0.09, p = 0.010) and 1 image (M₁ = 1.23, SE₁ = 0.08, p < 0.001). Discriminability for 2 images was higher than for 1 image (p = 0.001). In the simultaneous matching task, discriminability for 2 study images (M₂ = 1.45, SE₂ = 0.09) was higher than for 1 image (M₁ = 1.26, SE₁ = 0.08, p = 0.038). These results indicate that participants' discriminability in the sequential matching task improved as the number of presented images increased, demonstrating a multiple-image advantage.

More importantly, further analysis of discriminability across presentation methods at different image numbers revealed that when 3 study images were presented, discriminability in the sequential task (M_seq = 1.74, SE_seq = 0.10) was higher than in the simultaneous task (M_sim = 1.43, SE_sim = 0.10, p = 0.031). When 2 study images were presented, discriminability did not differ between sequential (M_seq = 1.53, SE_seq = 0.09) and simultaneous (M_sim = 1.45, SE_sim = 0.09) tasks (p = 0.518). Similarly, with 1 study image, discriminability did not differ between sequential (M_seq = 1.23, SE_seq = 0.08) and simultaneous (M_sim = 1.26, SE_sim = 0.08) tasks (p = 0.756) (see Figure 4 [FIGURE:4]).

Figure 4. Discriminability across presentation methods for each image number condition

A 2 (presentation method) × 3 (study image number) repeated-measures ANOVA on criterion (c) revealed a significant main effect of image number, F(2, 134) = 55.79, p < 0.001, ηp² = 0.42. LSD post-hoc tests showed that the criterion for 3 study images (M₃ = -0.64, SE₃ = 0.05) was lower (more liberal) than for 2 images (M₂ = -0.47, SE₂ = 0.04, p = 0.001) and 1 image (M₁ = -0.22, SE₁ = 0.04, p < 0.001), and the criterion for 2 images was lower than for 1 image (p < 0.001). Thus, the more study images were presented, the more liberal the criterion participants adopted. The main effect of task type was marginally significant, F(1, 76) = 3.04, p = 0.085, ηp² = 0.04, with a more liberal criterion in the sequential task (M_seq = -0.51, SE_seq = 0.05) than in the simultaneous task (M_sim = -0.38, SE_sim = 0.06). The interaction was not significant, F(2, 134) = 2.12, p = 0.130, ηp² = 0.03.

A 2 (presentation method) × 3 (study image number) repeated-measures ANOVA on reaction time revealed a significant main effect of image number, F(2, 118) = 6.03, p = 0.006, ηp² = 0.07. LSD post-hoc tests showed that RT for 3 study images (M₃ = 4419 ms, SE₃ = 89 ms) was shorter than for 2 images (M₂ = 4688 ms, SE₂ = 116 ms, p < 0.001) and 1 image (M₁ = 4654 ms, SE₁ = 131 ms, p = 0.022). RT did not differ between 2 and 1 image conditions (p = 0.700). The main effect of task type was significant, F(1, 76) = 324, p < 0.001, ηp² = 0.81, with longer RTs in the sequential task (M_seq = 6429 ms, SE_seq = 143 ms) than in the simultaneous task (M_sim = 2745 ms, SE_sim = 147 ms). The interaction was not significant, F(2, 118) = 0.99, p = 0.356, ηp² = 0.01.
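Partial eta squared can be recovered from an F ratio and its degrees of freedom as ηp² = (F · df_effect) / (F · df_effect + df_error), which offers a quick consistency check on the values reported above. A minimal sketch using the two single-degree-of-freedom task-type effects on discriminability and RT (the function name is our own):

```python
def partial_eta_sq(f_value: float, df_effect: float, df_error: float) -> float:
    """Partial eta squared from an F ratio: F*df1 / (F*df1 + df2)."""
    num = f_value * df_effect
    return num / (num + df_error)


# Main effect of task type on discriminability, F(1, 76) = 1.51
print(round(partial_eta_sq(1.51, 1, 76), 2))  # 0.02
# Main effect of task type on RT, F(1, 76) = 324
print(round(partial_eta_sq(324, 1, 76), 2))   # 0.81
```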

In summary, Experiment 1A found that discriminability improved with image number only in the sequential matching task (multiple-image advantage). Crucially, when three face images were presented, discriminability was higher in the sequential than in the simultaneous condition. These results support the hypothesis that the multiple-image advantage is based on facial representation formation rather than increased image information quantity.

A limitation of Experiment 1A was that the mean RT in the simultaneous task was below 3000 ms, meaning that participants viewed the study images for less than 3000 ms on average, whereas study images in the sequential task were presented for 5000 ms. The study therefore could not rule out the possibility that the performance advantage in the sequential task resulted from differences in study-image viewing time. To exclude this possibility, Experiment 1B fixed the study-image presentation duration at 5000 ms in both tasks, made presentation method a within-subjects variable, and used only the three-study-image condition.

3. Experiment 1B

3.1.1 Participants

Forty-four university students (20 male) with a mean age of 20.8 years (range 17–34, SD = 3.3) participated. All had normal or corrected-to-normal vision and were right-handed. Using G*Power, we calculated that a minimum sample size of N = 44 was required for a paired-samples t-test with power = 0.9 and effect size d = 0.5. All participants volunteered, provided informed consent before the experiment, and received compensation afterward.
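G*Power computes the required N for a paired-samples t-test from the noncentral t distribution. A closed-form normal approximation with a small-sample correction term, n ≈ ((z₁₋α/₂ + z₁₋β) / d)² + z₁₋α/₂² / 2, reproduces the figure reported here. The sketch assumes a two-tailed α = .05, which the text does not state explicitly, and the function name is hypothetical; it is an approximation, not G*Power's exact procedure.

```python
import math
from statistics import NormalDist


def paired_t_sample_size(d: float, power: float, alpha: float = 0.05) -> int:
    """Approximate N for a two-tailed paired-samples t-test.

    Normal approximation plus a small-sample correction (z_{1-alpha/2}^2 / 2).
    G*Power iterates on the noncentral t distribution instead, so the two
    may occasionally disagree.
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)
    z_beta = z(power)
    n = ((z_alpha + z_beta) / d) ** 2 + z_alpha ** 2 / 2
    return math.ceil(n)


print(paired_t_sample_size(d=0.5, power=0.9))  # 44
```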

3.1.2 Stimuli and Apparatus

This experiment used the same stimuli and apparatus as Experiment 1A. The 36 identities were randomly divided into two sets (n = 18 each), one used for the simultaneous and the other for the sequential matching task. The assignment of sets to tasks was counterbalanced across participants.

The experiment employed a block design comprising two blocks: simultaneous matching and sequential matching, with task order randomized. Each task included 36 trials (half matching, half non-matching), for 72 trials total. Each task featured 18 different individuals (half male, half female). Before each task, the experimenter informed participants that the image at the top of the screen was the target and the three images below were study images of the same person.

The simultaneous matching task procedure differed from that of Experiment 1A: participants could not respond during the first five seconds after the target and study images appeared (see Figure 5 [FIGURE:5]). After five seconds, participants responded with a keypress, which initiated the next trial, until the block was completed.

The sequential matching task procedure was identical to Experiment 1A (see Figure 3).

Figure 5. Single-trial flowchart for simultaneous matching task (Experiment 1B)

3.2 Results and Discussion

First, data were screened for familiarity; no data were excluded at this step. Second, data from one participant with a negative discriminability index were excluded. Third, data from one participant whose RT in the simultaneous task was more than three standard deviations from the mean were excluded, leaving 42 participants. We then conducted paired-samples t-tests on discriminability and criterion (c). Descriptive statistics are shown in Table 2 [TABLE:2].

Table 2. Descriptive statistics for Experiment 1B (M±SE)

A paired-samples t-test on discriminability revealed that discriminability was higher in the sequential face-matching task (M_seq = 1.92, SE_seq = 0.11) than in the simultaneous face-matching task (M_sim = 1.68, SE_sim = 0.08), t(41) = 2.13, p = 0.039, Cohen's d = 0.33. These results suggest that, under sequential presentation, multiple images prompted participants to form memory representations of facial identity, which facilitated face matching.
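For a paired design, one common effect size is d_z = t / √n, where n is the number of pairs. The sketch below, assuming that is the variant reported, reproduces the value for discriminability above (t(41) = 2.13 implies n = 42); the function name is our own.

```python
import math


def cohens_dz(t_value: float, n_pairs: int) -> float:
    """Cohen's d_z for a paired-samples t-test: t / sqrt(n)."""
    return t_value / math.sqrt(n_pairs)


print(round(cohens_dz(2.13, 42), 2))  # 0.33
```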

A paired-samples t-test on criterion revealed that the criterion in the sequential task (M_seq = -0.32, SE_seq = 0.06) was higher (less liberal) than in the simultaneous task (M_sim = -0.48, SE_sim = 0.06), t(41) = 3.21, p = 0.003, Cohen's d = 0.49.

In summary, after controlling for face presentation duration, we still found that when multiple face images were presented, discriminability was higher in the sequential than in the simultaneous condition. This finding rules out the possibility that the performance advantage in sequential matching resulted from differences in study image viewing time, serving as a complement to Experiment 1A. Combined, the results of both experiments further exclude the possibility that the multiple-image advantage arises from increased information quantity.

4. Experiment 2: Effects of Image Number on Simultaneous and Sequential Matching Performance for Inverted Faces

Previous research indicates that facial representation depends on the integration of configural and featural information (Itier et al., 2007), and face inversion disrupts normal processing and integration of these information types (Tanaka et al., 2014), thereby affecting facial representation formation. Based on this, if we invert face images to disrupt representation formation, the multiple-image advantage in sequential presentation should disappear, as should the discriminability advantage of sequential over simultaneous conditions.

4.1.1 Participants

Forty-six university students (18 female) with a mean age of 19.7 years (range 18–26, SD = 1.8) participated. All had normal or corrected-to-normal vision and were right-handed. Using G*Power, we calculated that a minimum sample size of N = 46 was required for a repeated-measures ANOVA with power = 0.9 and a medium effect size f = 0.25. All participants volunteered, provided informed consent before the experiment, and received compensation afterward.

The experiment employed a two-factor mixed design. Presentation method (simultaneous vs. sequential) was the between-subjects variable, and study image number (1 vs. 3 images) was the within-subjects variable.

Before the experiment, participants were randomly assigned to either the simultaneous or sequential face-matching task. Both tasks comprised two blocks, each containing 36 trials (half matching, half non-matching), for 72 trials total. Participants first completed a block of one-to-one matching, followed by one-to-three matching, with different individuals used across blocks. The trial procedure was identical to Experiment 1B. The simultaneous face-matching task is shown in Figure 6 [FIGURE:6]; the sequential task is shown in Figure 7 [FIGURE:7].

Figure 6. Single-trial flowchart for simultaneous matching task (inverted)
Figure 7. Single-trial flowchart for sequential matching task (inverted)

4.2 Results and Discussion

First, data were screened for familiarity; no data were excluded. Second, RTs more than three standard deviations from the mean were excluded, removing 62 data points (1.87% of all data). Third, data from four participants with negative discriminability indices were excluded. We then conducted repeated-measures ANOVAs on discriminability and criterion (c). Descriptive statistics are shown in Table 3 [TABLE:3].
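The text does not specify the reference distribution for the three-standard-deviation cutoff. The sketch below assumes trimming within each participant around that participant's own mean, in both directions; other conventions (per condition, upper tail only) would remove different trials. The function names and example data are hypothetical. As a consistency check on the reported figure, 62 of the 46 × 72 = 3,312 trials collected is 1.87%.

```python
from statistics import mean, stdev
from typing import Dict, List


def trim_rts(rts_by_participant: Dict[str, List[float]], k: float = 3.0) -> Dict[str, List[float]]:
    """Drop RTs more than k SDs from each participant's own mean (assumed convention)."""
    kept: Dict[str, List[float]] = {}
    for pid, rts in rts_by_participant.items():
        if len(rts) < 2:
            kept[pid] = list(rts)  # SD undefined with fewer than two RTs; keep as is
            continue
        m, s = mean(rts), stdev(rts)
        kept[pid] = [rt for rt in rts if abs(rt - m) <= k * s]
    return kept


def percent_removed(before: Dict[str, List[float]], after: Dict[str, List[float]]) -> float:
    """Percentage of RTs removed by trimming."""
    n_before = sum(len(v) for v in before.values())
    n_after = sum(len(v) for v in after.values())
    return 100 * (n_before - n_after) / n_before if n_before else 0.0


# Hypothetical participant: twenty typical RTs plus one very slow trial
example = {"p1": [900, 950, 1000, 1050, 1100] * 4 + [10000]}
trimmed = trim_rts(example)
print(len(trimmed["p1"]), round(percent_removed(example, trimmed), 2))  # 20 4.76
```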

Table 3. Descriptive statistics for Experiment 2 (M±SE)

A two-factor repeated-measures ANOVA on discriminability revealed a significant main effect of image number, F(1, 40) = 5.35, p = 0.026, ηp² = 0.12. LSD post-hoc tests showed that discriminability for 3 study images (M₃ = 0.95, SE₃ = 0.06) was higher than for 1 study image (M₁ = 0.79, SE₁ = 0.06). The main effect of presentation method was significant, F(1, 40) = 6.91, p = 0.012, ηp² = 0.15. LSD post-hoc tests showed that discriminability in the simultaneous task (M_sim = 1.00, SE_sim = 0.07) was higher than in the sequential task (M_seq = 0.74, SE_seq = 0.07). The interaction was not significant, F(1, 40) = 0.03, p = 0.875, ηp² = 0.001. These results indicate that when faces were inverted, discriminability in the simultaneous task was higher than in the sequential task regardless of whether study images were single or multiple, and no multiple-image advantage appeared in either task.

Figure 8. Discriminability indices across presentation methods for different image numbers

A repeated-measures ANOVA on criterion revealed a significant main effect of image number, F(1, 40) = 33.67, p < 0.001, ηp² = 0.43. LSD post-hoc tests showed that the criterion for 3 study images (M₃ = -0.70, SE₃ = 0.06) was lower (more liberal) than for 1 study image (M₁ = -0.40, SE₁ = 0.04, p < 0.001). The main effect of task type was significant, F(1, 40) = 5.95, p = 0.019, ηp² = 0.13, with a lower criterion in the simultaneous task (M_sim = -0.67, SE_sim = 0.07) than in the sequential task (M_seq = -0.44, SE_seq = 0.07). The interaction was not significant, F(1, 40) = 0.30, p = 0.585, ηp² = 0.007. These results indicate that when faces were inverted, presenting more study images led to a more liberal criterion, and participants responded more liberally in the simultaneous task than in the sequential task.

In summary, when faces were presented inverted, no multiple-image advantage appeared in either the simultaneous or the sequential face-matching task. Moreover, regardless of whether single or multiple face images were presented, discriminability was higher in the simultaneous than in the sequential task. These results indicate that face inversion disrupted facial representation formation, so that no multiple-image advantage emerged. The findings support the hypothesis that the multiple-image advantage is based on facial representation formation rather than simply on increased image information quantity.

5. General Discussion

This study examined participants' ability to match face identity from one to three images in simultaneous and sequential matching tasks, yielding four key findings. (1) Discriminability improved with image number only in the sequential face-matching task (a multiple-image advantage). (2) When multiple face images were presented, discriminability was higher in the sequential than in the simultaneous condition. (3) When faces were inverted, discriminability in the sequential task was lower than in the simultaneous task regardless of whether study images were single or multiple. (4) When faces were inverted, no multiple-image advantage appeared in either task.

The first finding, that discriminability improved with image number only in the sequential face-matching task, replicates Sandford and Ritchie (2021). Experiments 1A and 2 of the present study correspond to the task paradigms used by Sandford and Ritchie (2021), with essentially identical procedures and stimulus layouts. Although our stimuli were face images of Hong Kong celebrities taken from social media, our participants differed in ethnicity from theirs, and neither study strictly controlled image variability (lighting, viewing angle, expression), we consistently found a multiple-image advantage in the sequential face-matching task. Ritchie et al. (2021) argued that simultaneous and sequential face-matching tasks differ in nature, so multiple images are processed differently in the two tasks. The sequential task separates the multiple-image array from the target, so participants must remember the study faces to complete the task, which places high demands on memory. The simultaneous task presents the multiple images and the target together, allowing participants to scan back and forth for differences, and therefore relies more on perceptual processes. Menon et al. (2015b) similarly argued that when a task involves a memory component, participants abstract stable representations from multiple face images, thereby improving recognition performance. This view is supported by other face recognition studies involving memory (Bindemann & Sandford, 2011; Mileva & Burton, 2019; Mileva et al., 2021). For example, Mileva et al. (2021) used a face search task in which one or four face images of the same individual were presented together with CCTV footage that might contain the target, and participants judged whether the target appeared in the footage. Presenting four face images improved discriminability for unfamiliar faces. In that study, although the face images and the target video were presented simultaneously, the search task itself required memory, and a multiple-image advantage emerged. Together, these results suggest that the multiple-image advantage rests on the formation of robust representations, which requires memory for their construction and storage.

The second finding, that discriminability was higher in the sequential than in the simultaneous condition when multiple face images were presented, appears counterintuitive. In object recognition, simultaneous presentation typically provides more information for identification and judgment, whereas sequential presentation requires participants to extract and memorize information within a limited time, inevitably causing some loss of information. In this study, however, this loss did not impair performance; instead, discriminability between different faces improved. Burton et al. (2005) proposed that when learning new faces, people form an abstract average representation from multiple images of the same individual. This representation retains identity-relevant information while removing superficial effects of lighting and camera equipment, as well as variation due to emotion or health status. This is consistent with Devue and de Sena's (2023) cost-efficient account, which posits that people assign greater representational weight to stable facial features: they prioritize encoding features that are identity-relevant and stable across images, while discarding surface information that varies across images and is irrelevant to identity (image variability). Such within-person image variability is widely considered a major cause of poor unfamiliar face recognition (Bruce et al., 1999; Megreya & Burton, 2006, 2008; Kramer & Ritchie, 2016), so discarding it can explain the improved discriminability in the sequential task. However, this processing was observed only in the memory-dependent sequential face-matching task, indicating that forming a representation from multiple images depends on memory. Li and Li (2018) offered a further theoretical explanation: when people form representations from cues, they must retrieve relevant information from memory to construct and store those representations, and representational precision declines through decay or interference during maintenance. Accordingly, the abstract representations that people construct and store in memory from multiple face images retain identity information that is stable across images (Peng et al., 2019) while discarding identity-irrelevant image variation, a process that facilitates face recognition.
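The averaging idea attributed to Burton et al. (2005) can be made concrete with a toy simulation. The sketch below is a hypothetical illustration, not the authors' model: it assumes that an identity is a point in an abstract face space and that every photo adds independent Gaussian noise standing in for image variability (lighting, angle, expression). Averaging n photos then cancels part of that noise, so the average lies closer to the stable identity than a single photo does. The function name and all parameters are illustrative assumptions.

```python
import math
import random


def mean_error_of_average(n_images, trials=2000, dims=10, noise_sd=1.0, seed=0):
    """Mean distance between a true identity and the average of n photos of it.

    Toy assumptions: an identity is a point in a `dims`-dimensional face space,
    and each photo adds independent Gaussian "image variability" with standard
    deviation `noise_sd` on every dimension.
    """
    rng = random.Random(seed)
    identity = [0.0] * dims  # stable, identity-relevant feature values
    total = 0.0
    for _ in range(trials):
        photos = [[f + rng.gauss(0.0, noise_sd) for f in identity]
                  for _ in range(n_images)]
        # Average representation: mean of the photos, feature by feature
        average = [sum(p[i] for p in photos) / n_images for i in range(dims)]
        total += math.dist(average, identity)
    return total / trials


for n in (1, 2, 3, 5, 10):
    print(n, round(mean_error_of_average(n), 2))
```

Under these assumptions the expected error falls in proportion to 1/√n, so each additional image helps less than the one before. Real face photos are not independent Gaussian samples, so this is only a qualitative illustration of why an average can discard identity-irrelevant variation.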

The third finding, that discriminability was lower in the sequential than in the simultaneous task when faces were inverted, matches the pattern typical of general object recognition. First, in general object recognition, the simultaneous task allows repeated scanning of the stimuli and thus provides more information for identification, whereas the sequential task requires observation and memorization within a limited time, causing information loss. The simultaneous task therefore confers a processing advantage for general objects, and the results of Experiment 2 indicate that the same explanation applies to inverted faces. Second, face inversion disrupts the holistic processing and representation of faces, which puts the simultaneous and sequential tasks on a more "equal footing" and exposes the inherently greater difficulty of the sequential task. This finding also suggests that previous research and theoretical accounts may have overlooked differences in task difficulty when interpreting performance differences between tasks, which underscores why the present study needed to test, and rule out, the hypothesis that image information quantity interacts with task difficulty.

The fourth finding, that no multiple-image advantage appeared in either task when faces were inverted, is fully consistent with the a priori predictions of the "memory facilitates representation" hypothesis and shows that the amount of image information did not significantly affect performance in either task once the images were inverted. These results therefore suggest that both facial representation formation and the perceptual processing of face images depend on upright presentation. This is consistent with previous research and theoretical perspectives: because of extensive experience perceiving and remembering upright faces, people develop an expertise in face recognition that is strongly orientation-dependent. Taken together, these points suggest that participants' construction of facial representations from multiple images in the sequential task is a cognitive process grounded in face expertise.

Additionally, this study found that as the number of faces in the study array increased, participants' criteria became more liberal, consistent with previous research (Matthews & Mondloch, 2018; Menon et al., 2015b; Ritchie et al., 2021; Sandford & Ritchie, 2021). Tanaka et al. (1998) argued that a single facial representation can be activated by many different face inputs; this many-to-one mapping from stimulus inputs to a face memory is called an attractor field. When a face image falls within a given attractor field, it is categorized as the corresponding individual. As a facial representation integrates more images, the attractor field for that individual expands, leading to higher recognition accuracy and more "same" responses (Menon et al., 2015a). This partly explains why participants adopted more liberal criteria and achieved higher accuracy in the sequential task. Alternatively, this criterion shift may be related to cognitive load. In a study by Mileva and Burton (2019), participants viewed face videos at the top of the screen and searched for the matching face in CCTV footage at the bottom. Recognition performance was poor, which the researchers attributed to cognitive overload from the simultaneously presented information, preventing participants from making use of the additional information. Such overload may bias participants toward "same" judgments (Ritchie et al., 2021). This account can explain the results of Experiment 2: inverting the faces increased task difficulty, and presenting multiple face images simultaneously caused cognitive overload, biasing participants toward "same" judgments.
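The attractor-field account can be illustrated with a deliberately simple toy observer. The sketch below is a hypothetical operationalization, not Tanaka et al.'s (1998) model or the authors' analysis: it assumes that the stored representation is the centroid of the study images in a 2-D face space and that the field is a circle around it whose radius grows with the spread of those images. A probe that falls inside the field receives a "same" response. The function name and the parameters `base_radius`, `k`, and `distractor_offset` are arbitrary assumptions.

```python
import math
import random


def same_response_rates(n_images, trials=4000, base_radius=1.5, k=1.0,
                        distractor_offset=3.0, seed=1):
    """Return (hit rate, false-alarm rate) for a toy attractor-field observer.

    Hypothetical model: the representation is the centroid of the study
    images in a 2-D face space, and the attractor field is a circle around it
    whose radius is base_radius plus k times the mean distance of the study
    images from their centroid. A probe inside the circle gets "same".
    """
    rng = random.Random(seed)
    target, distractor = (0.0, 0.0), (distractor_offset, 0.0)

    def photo(identity):
        # One photo = identity location plus image variability (sd 1 on each axis)
        return (identity[0] + rng.gauss(0.0, 1.0), identity[1] + rng.gauss(0.0, 1.0))

    hits = false_alarms = 0
    for _ in range(trials):
        study = [photo(target) for _ in range(n_images)]
        centre = (sum(p[0] for p in study) / n_images,
                  sum(p[1] for p in study) / n_images)
        spread = sum(math.dist(p, centre) for p in study) / n_images
        radius = base_radius + k * spread  # field expands with study-image variability
        hits += int(math.dist(photo(target), centre) < radius)
        false_alarms += int(math.dist(photo(distractor), centre) < radius)
    return hits / trials, false_alarms / trials


for n in (1, 3):
    hit, fa = same_response_rates(n)
    print(f"{n} study image(s): hits = {hit:.2f}, false alarms = {fa:.2f}")
```

In this toy model a single study image gives zero spread, so the field keeps its base radius. Three varied images enlarge it, which raises both hits and false alarms, that is, more "same" responses overall. This is one way a shift toward a more liberal criterion can arise.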

6. Limitations and Future Directions

First, this study examined only a limited range of image numbers in relation to facial representation formation. In everyday life, people can benefit from many more face images. Existing research shows that when the number of study images is small (1 to 4), face recognition performance improves markedly as image number increases (Bindemann & Sandford, 2011; White et al., 2014; Menon et al., 2015a, 2015b; Mileva & Burton, 2019; Ritchie et al., 2021; Sandford & Ritchie, 2021). However, once the number of study images reaches a certain level (5 or more), performance changes little with further increases. We therefore speculate that there is a "minimum number of images" required for a robust facial representation to form. Future research could examine the effects of larger image sets or use computational modeling to investigate this question.

Second, this study used a fixed presentation time of 5 seconds, so we cannot tell whether facial representations would form with shorter durations. Previous research suggests that representation formation is very rapid (White et al., 2014; Dunn et al., 2018). For example, White et al. (2014) used a face-matching paradigm and found that varying the interval between image onset and the point at which participants could respond (3, 6, or 9 seconds) did not affect recognition performance, suggesting that a representation of four face images may form within 3 seconds. Future research should shorten image presentation time to map the time course of facial memory representation formation.

Finally, face familiarization in everyday life unfolds over much longer timescales (Wang et al., 2023), whereas this study focused on the formation of facial representations from scratch. How these initial representations develop into robust familiar-face representations remains unclear. Future research should examine the roles of image number, image variability, and other factors in this process over longer timescales. Additionally, research by Baker and Mondloch (2023) suggests that individuals' ability to benefit from multiple images is related to their own face recognition ability, so future studies should include participants' face recognition ability as a variable.

This study found that participants showed a multiple-image advantage only under sequential presentation, and that when multiple images were presented, performance was better with sequential than with simultaneous presentation. This indicates that the multiple-image advantage in face identity discrimination is based on facial representation formation rather than merely on an increase in the quantity of image information. Furthermore, when faces were inverted, participants showed no multiple-image advantage, and simultaneous presentation yielded better performance than sequential presentation, indicating that the multiple-image advantage disappears when facial representation formation is disrupted. These findings reveal that facial representation formation depends on memory processes, offering a new perspective on the cognitive mechanisms of face identity recognition and informing future research.

References

Andrews, S., Jenkins, R., Cursiter, H., & Burton, A. M. (2015). Telling faces together: Learning new faces through exposure to multiple instances. Quarterly Journal of Experimental Psychology, 68(10), 2041–2050.

Baker, K. A., & Mondloch, C. J. (2019). Two sides of face learning: Improving between-identity discrimination while tolerating more within-person variability in appearance. Perception, 48(11), 1124–1145.

Baker, K. A., & Mondloch, C. J. (2023). Unfamiliar face matching ability predicts the slope of face learning. Scientific Reports, 13(1), 5248.

Baker, K. A., Laurence, S., & Mondloch, C. J. (2017). How does a newly encountered face become familiar? The effect of within-person variability on adults’ and children’s perception of identity. Cognition, 161, 19–30.

Baker, K. A., Stabile, V. J., & Mondloch, C. J. (2023). Stable individual differences in unfamiliar face identification: Evidence from simultaneous and sequential matching tasks. Cognition, 232, 105333.

Bindemann, M., & Sandford, A. (2011). Me, myself, and I: Different recognition rates for three photo-IDs of the same person. Perception, 40(5), 625–627.

Bruce, V., Henderson, Z., Greenwood, K., Hancock, P. J., Burton, A. M., & Miller, P. (1999). Verification of face identities from images captured on video. Journal of Experimental Psychology: Applied, 5(4), 339–360.

Burton, A. M., Jenkins, R., Hancock, P. J., & White, D. (2005). Robust representations for face recognition: The power of averages. Cognitive Psychology, 51(3), 256–284.

Davis, E. E., Matthews, C. M., & Mondloch, C. J. (2021). Ensemble coding of facial identity is not refined by experience: Evidence from other-race and inverted faces. British Journal of Psychology, 112(1), 265–281.

Devue, C., & de Sena, S. (2023). The impact of stability in appearance on the development of facial representations. Cognition, 239, 105569.

Dowsett, A. J., Sandford, A., & Burton, A. M. (2016). Face learning with multiple images leads to fast acquisition of familiarity for specific individuals. Quarterly Journal of Experimental Psychology, 69(1), 1–10.

Dunn, J. D., Kemp, R. I., & White, D. (2018). Search templates that incorporate within-face variation improve visual search for faces. Cognitive Research: Principles and Implications, 3(1), 37.

Honig, T., Shoham, A., & Yovel, G. (2022). Perceptual similarity modulates effects of learning from variability on face recognition. Vision Research, 201, 108128.

Itier, R. J., Alain, C., Sedore, K., & McIntosh, A. R. (2007). Early face processing specificity: It's in the eyes! Journal of Cognitive Neuroscience, 19(11), 1815–1826.

Jones, S. P., Dwyer, D. M., & Lewis, M. B. (2017). The utility of multiple synthesized views in the recognition of unfamiliar faces. Quarterly Journal of Experimental Psychology, 70(5), 906–918.

Kramer, R. S., & Reynolds, M. G. (2018). Unfamiliar face matching with frontal and profile views. Perception, 47(4), 414–431.

Kramer, R. S., & Ritchie, K. L. (2016). Disguising Superman: How glasses affect unfamiliar face matching. Applied Cognitive Psychology, 30(6), 841–845.

Kramer, R. S., Hardy, S. C., & Ritchie, K. L. (2020). Searching for faces in crowd chokepoint videos. Applied Cognitive Psychology, 34(2), 343–356.

Leder, H., & Bruce, V. (2000). When inverted faces are recognized: The role of configural information in face recognition. The Quarterly Journal of Experimental Psychology: Section A, 53(2), 513–536.

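Li, X., & Li, H. (2018). The relationship of mental imagery to perception and memory from the perspective of representation and cognitive processes. Journal of Psychological Science, 41(3), 520–525. (李筱梅, 李海峰. (2018). 从表征和认知过程上看表象与知觉、记忆的关系. 心理科学, 41(03), 520–525.)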

Longmore, C. A., Santos, I. M., Silva, C. F., Hall, A., Faloyin, D., & Little, E. (2017). Image dependency in the recognition of newly learnt faces. Quarterly Journal of Experimental Psychology, 70(5), 863–873.

Matthews, C. M., & Mondloch, C. J. (2018). Finding an unfamiliar face in a line-up: Viewing multiple images of the target is beneficial on target-present trials but costly on target-absent trials. British Journal of Psychology, 109(4), 758–776.

Matthews, C. M., & Mondloch, C. J. (2022). Learning faces from variability: Four- and five-year-olds differ from older children and adults. Journal of Experimental Child Psychology, 213, 105259.

Matthews, C. M., Ritchie, K. L., Laurence, S., & Mondloch, C. J. (2024). Multiple images captured from a single encounter do not promote face learning. Perception, 53(5–6), 299–316.

Megreya, A. M., & Burton, A. M. (2006). Unfamiliar faces are not faces: Evidence from a matching task. Memory & Cognition, 34, 865–876.

Megreya, A. M., & Burton, A. M. (2008). Matching faces to photographs: Poor performance in eyewitness memory (without the memory). Journal of Experimental Psychology: Applied, 14(4), 364–372.

Megreya, A. M., White, D., & Burton, A. M. (2011). The other-race effect does not rely on memory: Evidence from a matching task. Quarterly Journal of Experimental Psychology, 64(8), 1473–1483.

Menon, N., Kemp, R. I., & White, D. (2018). More than a sum of parts: Robust face recognition by integrating variation. Royal Society Open Science, 5(5), 172381.

Menon, N., White, D., & Kemp, R. I. (2015a). Variation in photos of the same face drives improvements in identity verification. Perception, 44(11), 1332–1341.

Menon, N., White, D., & Kemp, R. I. (2015b). Identity-level representations affect unfamiliar face matching performance in sequential but not simultaneous tasks. Quarterly Journal of Experimental Psychology, 68(9), 1841–1852.

Mileva, M., & Burton, A. M. (2019). Face search in CCTV surveillance. Cognitive Research: Principles and Implications, 4, 37.

Mileva, V. R., Hancock, P. J. B., & Langton, S. R. H. (2021). Visual search performance in ‘CCTV’ and mobile phone-like video footage. Cognitive Research: Principles and Implications, 6, 63.

Murphy, J., Ipser, A., Gaigg, S. B., & Cook, R. (2015). Exemplar variance supports robust learning of facial identity. Journal of Experimental Psychology: Human Perception and Performance, 41(3), 577–581.

Peng, S., Kuang, B., & Hu, P. (2019). Memory of ensemble representation was independent of attention. Frontiers in Psychology, 10, 228.

Pitcher, D., Caulfield, R., & Burton, A. M. (2023). Provoked overt recognition in acquired prosopagnosia using multiple different images of famous faces. Cognitive Neuropsychology, 40(3–4), 158–166.

Ritchie, K. L., & Burton, A. M. (2017). Learning faces from variability. Quarterly Journal of Experimental Psychology, 70(5), 897–905.

Ritchie, K. L., Kramer, R. S., Mileva, M., Sandford, A., & Burton, A. M. (2021). Multiple-image arrays in face matching tasks with and without memory. Cognition, 211, 104632.

Ritchie, K. L., Mireku, M. O., & Kramer, R. S. (2020). Face averages and multiple images in a live matching task. British Journal of Psychology, 111(1), 92–102.

Sandford, A., & Ritchie, K. L. (2021). Unfamiliar face matching, within-person variability, and multiple-image arrays. Visual Cognition, 29(3), 143–157.

Tanaka, J., Giles, M., Kremen, S., & Simon, V. (1998). Mapping attractor fields in face space: The atypicality bias in face recognition. Cognition, 68(3), 199–220.

Tanaka, J. W., Kaiser, M. D., Hagen, S., & Pierce, L. J. (2014). Losing face: Impaired discrimination of featural and configural information in the mouth region of an inverted face. Attention, Perception, & Psychophysics, 76, 1000–1014.

Wang, Z., Ni, H., Feng, D., Yan, L., & Sun, Y.-H. P. (2023). Regional asynchrony and eye region-specificity in part-based processing and holistic processing during face familiarization. Acta Psychologica Sinica, 55(6), 861–876. (王哲, 倪昊, 封丹, 严璘璘, 孙宇浩. (2023). 面孔熟悉过程中部件加工与整体加工的区域异步性和眼睛区域特异性. 心理学报, 55(6), 861–876.)

White, D., Burton, A. M., Jenkins, R., & Kemp, R. I. (2014). Redesigning photo-ID to improve unfamiliar face matching performance. Journal of Experimental Psychology: Applied, 20(2), 166–173.

Xu, B., & Tanaka, J. W. (2013). Does face inversion qualitatively change face processing: An eye movement study using a face change detection task. Journal of Vision, 13(2), 22.

Yin, R. K. (1969). Looking at upside-down faces. Journal of Experimental Psychology, 81(1), 141–145.

Young, A. W., & Burton, A. M. (2018). Are we face experts? Trends in Cognitive Sciences, 22(2), 100–110.
