Abstract:
In three experiments, we investigated whether the emotional valence of a photograph influenced the amount of time required to initially identify the contents of the image. In Experiment 1, participants saw a slideshow consisting of positive, neutral, and negative photographs that were balanced for arousal. During the slideshow, presentation time was substantially limited (60 ms), and the images were followed by masks. Immediately following the slideshows, participants were given a recognition memory test. Memory performance was best for positive images and worst for negative images. In Experiment 2, two simultaneous photographs were briefly presented and masked. On a trial-by-trial basis, participants indicated whether the two images were identical or not, thus removing the need for memory storage and retrieval. Again, performance was worst for negative images. The results of Experiment 3 suggested that these valence-based differences were not related to attentional effects. We argue that the valence of an image is detected rapidly and, in the case of negative images, interferes with processing the identity of the scene.
Abstract:
When amplification is used, sound sources are often presented over multiple loudspeakers, which can alter their timbre. Increasing the diffuseness of a sound by presenting it over spatially separated loudspeakers might affect listeners' ability to form a coherent auditory image of it, alter its perceived spatial position, and may even affect the extent to which it competes for the listener's attention. In addition, it can lead to comb-filtering effects that alter the spectral profiles of sounds arriving at the ears. It is important to understand how these changes affect speech perception. In this study, young adults were asked to repeat nonsense sentences presented in noise, babble, or speech. Participants were divided into two groups: (1) a Compact-Target Timbre group, where the target sentences were presented over a single loudspeaker (compact target) while the masker was presented over either three loudspeakers (diffuse) or a single loudspeaker (compact); (2) a Diffuse-Target Timbre group, where the target sentences were diffuse while the masker was either compact or diffuse. Timbre had no significant effect in the absence of a timbre contrast between target and masker. However, when there was a timbre contrast, the signal-to-noise ratios needed for 50% correct recognition of the target speech were higher (worse) when the masker was compact, and lower (better) when the target was compact. These results were consistent with the expected effects of comb filtering, and could also reflect a tendency for attention to be drawn towards compact sound sources.
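To make the comb-filtering point concrete, here is a minimal, illustrative Python sketch (not from the study; it simplifies to two equal-amplitude arrival paths, and the 0.5 m path-length difference is an assumption):

```python
import numpy as np

# Illustrative sketch: magnitude response at one ear when the same signal
# arrives via two paths with a small path-length difference. The delayed
# copy sums with the direct one, producing comb-filter notches.

c = 343.0                 # speed of sound in air, m/s
path_difference = 0.50    # assumed extra path length of second loudspeaker, m
delay = path_difference / c

freqs = np.linspace(20, 8000, 1000)   # frequency range of interest, Hz
# Sum of a direct path and an equal-amplitude delayed path:
# H(f) = 1 + exp(-j * 2*pi * f * delay)
H = 1 + np.exp(-2j * np.pi * freqs * delay)
magnitude_db = 20 * np.log10(np.abs(H) + 1e-12)

# Cancellation notches fall at odd multiples of 1/(2*delay):
print(f"first notch ~ {0.5 / delay:.0f} Hz, spacing ~ {1.0 / delay:.0f} Hz")
```

With these assumed values, the first cancellation notch falls near 343 Hz and notches repeat roughly every 686 Hz, the kind of spectral alteration the abstract refers to.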
Abstract:
Drivers rarely focus exclusively on driving, even with the best of intentions. They are distracted by passengers, navigation systems, smartphones, and driver assistance systems. Driving itself requires performing simultaneous tasks, including lane keeping, looking for signs, and avoiding pedestrians. The dangers of multitasking while driving, and efforts to combat it, often focus on the distraction itself rather than on how a distracting task can change what the driver can perceive. Critically, some distracting tasks require the driver to look away from the road, which forces the driver to use peripheral vision to detect driving-relevant events. As a consequence, both looking away and being distracted may degrade driving performance. To assess the relative contributions of these factors, we conducted a laboratory experiment in which we separately varied cognitive load and point of gaze. Subjects performed a visual 0-back or 1-back task at one of four fixation locations superimposed on a real-world driving video, while simultaneously monitoring for brake lights in their lane of travel. Subjects were able to detect brake lights in all conditions, but as the eccentricity of the brake lights increased, they responded more slowly and missed more braking events. However, our cognitive load manipulation had minimal effects on detection performance, reaction times, or miss rates for brake lights. These results suggest that, for tasks that require the driver to look off-road, the decrements observed may be due to the need to use peripheral vision to monitor the road, rather than to the distraction itself.
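For readers unfamiliar with the load manipulation, here is a minimal sketch of the n-back rule (illustrative only; the stimuli and function name are assumptions, not the study's materials):

```python
# In a 0-back task the subject responds when the stimulus matches a fixed
# target; in a 1-back task, when it matches the immediately preceding one.
def nback_targets(stimuli, n, fixed_target=None):
    hits = []
    for i, s in enumerate(stimuli):
        if n == 0:
            hits.append(s == fixed_target)          # compare to fixed target
        else:
            hits.append(i >= n and s == stimuli[i - n])  # compare n back
    return hits

seq = ["A", "B", "B", "C", "C", "A"]
print(nback_targets(seq, 1))        # [False, False, True, False, True, False]
print(nback_targets(seq, 0, "A"))   # [True, False, False, False, False, True]
```

The 1-back version imposes higher cognitive load because the subject must continuously hold and update the previous item in working memory.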
Abstract:
Increasing numbers of studies have explored human observers' ability to rapidly extract statistical descriptions from collections of similar items (e.g., the average size and orientation of a group of tilted Gabor patches). Determining whether these descriptions are generated by mechanisms that are independent from object-based sampling procedures requires that we investigate how internal noise, external noise, and sampling affect subjects' performance. Here we systematically manipulated the external variability of ensembles and used variance summation modeling to estimate both the internal noise and the number of samples that affected the representation of ensemble average size. The results suggest that humans sample many more than one or two items from an array when forming an estimate of the average size, and that the internal noise that affects ensemble processing is lower than the noise that affects the processing of single objects. These results are discussed in light of other recent modeling efforts and suggest that ensemble processing of average size relies on a mechanism that is distinct from segmenting individual items. This ensemble process may be more similar to texture processing.
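As a sketch of what variance summation modeling typically involves (a common form that assumes n independently sampled items are averaged and corrupted by late internal noise; the paper's exact formulation may differ):

```latex
\[
  \sigma_{\text{obs}}^{2} \;=\; \sigma_{\text{int}}^{2} \;+\; \frac{\sigma_{\text{ext}}^{2}}{n}
\]
```

Here \(\sigma_{\text{obs}}^{2}\) is the variance of the observer's mean-size estimates, \(\sigma_{\text{int}}^{2}\) the internal noise, \(\sigma_{\text{ext}}^{2}\) the externally manipulated variability of item sizes, and \(n\) the effective number of samples. Fitting this function across external-variability levels yields the internal-noise and sample-number estimates the abstract describes: a shallow slope with respect to \(\sigma_{\text{ext}}^{2}\) implies that many items were sampled.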
Abstract:
Recent work studying the temporal dynamics of visual scene processing (Harel et al., 2016) has found that global scene properties (GSPs) modulate the amplitude of early event-related potentials (ERPs). It is still not clear, however, to what extent the processing of these GSPs is influenced by their behavioral relevance, as determined by the goals of the observer. To address this question, we investigated how behavioral relevance, operationalized by the task context, impacts the electrophysiological responses to GSPs. In a set of two experiments, we recorded ERPs while participants viewed images of real-world scenes varying along two GSPs: naturalness (manmade/natural) and spatial expanse (open/closed). In Experiment 1, very little attention to scene content was required, as participants viewed the scenes while performing an orthogonal fixation-cross task. In Experiment 2, participants saw the same scenes but now had to actively categorize them, based on either their naturalness or their spatial expanse. We found that task context had very little impact on the early ERP responses to the naturalness and spatial expanse of the scenes: P1, N1, and P2 could distinguish between open and closed scenes and between manmade and natural scenes across both experiments. Further, the specific effects of naturalness and spatial expanse on the ERP components were largely unaffected by their relevance for the task. A task effect was found at the N1 and P2 level, but this effect was manifest across all scene dimensions, indicating a general effect rather than an interaction between task context and GSPs. Together, these findings suggest that the extraction of global scene information reflected in the early ERP components is rapid and little influenced by top-down, observer-based goals.
Abstract:
The action-specific account of perception states that a perceiver's ability to act influences the perception of the environment. For example, participants tend to perceive distances as farther when presented up hills than on flat ground. This tendency is known as the distance-on-hill effect. However, there is debate as to whether these types of effects are truly perceptual. Critics of the action-specific account claim that the effects could be due to participants guessing the hypothesis and trying to comply with the experimental demands. The present study aims to explore the distance-on-hill effect and determine whether it is truly perceptual or whether past results were due to response bias. Participants judged the relative distance to targets on a hill and on flat ground. We found the distance-on-hill effect in virtual reality using a visual matching task. The effect persisted even when participants were given explicit feedback about their estimates. We also found that the effect went away, as predicted by a perceptual explanation, when participants had to match the distance between two cones that were both on hills. These results offer important steps toward the painstaking task of determining whether action's effect on perception is truly perceptual.
Abstract:
Scene perception technology helps robots identify the target areas that people refer to, thereby contributing to human-robot interaction, semantic navigation, and other related tasks. Currently, purely semantic-feature-based methods are insufficient to fully describe diverse indoor scenes, resulting in a high confusion rate and inconsistent performance across classes. To overcome these problems, style information is compounded with the common semantic feature to form a more elaborate description of a scene, and a corresponding network is proposed. First, a convolutional network is adopted to extract the base feature. Then, the high-level feature maps are divided into overlapped units to preserve a more appropriate neighbour correlation. Next, two branches are proposed to acquire the style and the semantic information, respectively. In the style branch, the Gram matrix is applied to each unit. In the semantic branch, the units of the high-level feature maps are used directly. In both branches, batch normalization and vector-embedding techniques are applied to the flattened elements of the unit sets to unify the feature strength and introduce a compressed representation. The two compressed representations are then combined into a compound expression describing the scene. A multi-head self-attention structure is introduced to correlate and reinforce the multi-divided information and further form a dominant, refined stylized semantic description. Finally, scene classification is implemented by a multi-layer fully connected network. The experiment adopts a paradigm of once learning and cross-environment inference, which is closer to practical applications. The proposed method performs best compared with several popular methods in the field of robotics and, in particular, has the smallest classification bias. In addition, the necessity of the main components of our method is evaluated, and a semantic explanation is also presented.
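As an illustration of the style branch described above, here is a minimal PyTorch sketch (function names, unit size, stride, and tensor shapes are assumptions for illustration, not the authors' code) of computing Gram matrices over overlapped units of a high-level feature map:

```python
import torch

def gram_matrix(unit: torch.Tensor) -> torch.Tensor:
    """unit: (C, H, W) slice of a high-level feature map."""
    c, h, w = unit.shape
    f = unit.reshape(c, h * w)        # channels x spatial positions
    return (f @ f.T) / (h * w)        # (C, C) normalized Gram matrix

def split_units(fmap: torch.Tensor, size: int = 4, stride: int = 2):
    """fmap: (C, H, W). Yields overlapped (C, size, size) units
    (stride < size produces the overlap the abstract mentions)."""
    _, h, w = fmap.shape
    for i in range(0, h - size + 1, stride):
        for j in range(0, w - size + 1, stride):
            yield fmap[:, i:i + size, j:j + size]

fmap = torch.randn(256, 8, 8)         # toy high-level feature map
styles = [gram_matrix(u).flatten() for u in split_units(fmap)]
print(len(styles), styles[0].shape)   # number of units, (C*C,) style vector
```

Because the Gram matrix discards the spatial arrangement within a unit and keeps only channel co-activation statistics, it captures texture-like "style" information, complementing the semantic branch, which uses the units directly.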
Abstract:
The precise role played by the hippocampus in supporting cognitive functions such as episodic memory and future thinking is debated, but there is general agreement that it involves constructing representations comprising numerous elements. Visual scenes have been deployed extensively in cognitive neuroscience because they are paradigmatic multi-element stimuli. However, questions remain about the specificity and nature of the hippocampal response to scenes. Here, we devised a paradigm in which participants searched pairs of images for either colour or layout differences, thought to be associated with perceptual or spatial constructive processes, respectively. Importantly, the images depicted either naturalistic scenes or phase-scrambled versions of the same scenes, and were either simple or complex. Using this paradigm during functional MRI scanning, we addressed three questions: 1. Is the hippocampus recruited specifically during scene processing? 2. If the hippocampus is more active in response to scenes, does searching for colour or layout differences influence its activation? 3. Does the complexity of the scenes affect its response? We found that, compared to phase-scrambled versions of the scenes, the hippocampus was more responsive to scene stimuli. Moreover, a clear anatomical distinction was evident, with colour detection in scenes engaging the posterior hippocampus, whereas layout detection in scenes recruited the anterior hippocampus. The complexity of the scenes did not influence hippocampal activity. These findings seem to align with perspectives proposing that the hippocampus is especially attuned to scenes, and that its involvement occurs irrespective of the cognitive process or the complexity of the scenes.
Abstract:
The human visual system is capable of processing an enormous amount of information in a short time. Although rapid target detection has been explored extensively, less is known about target localization. Here we used natural scenes and explored the relationship between being able to detect a target (present vs. absent) and being able to localize it. Across four presentation durations (~33-199 ms), participants viewed scenes taken from two superordinate categories (natural and manmade), each containing exemplars from four basic scene categories. In a two-interval forced-choice task, observers were asked to detect a Gabor target inserted into one of the two scenes. This was followed by one of two localization tasks: participants were asked either to discriminate whether the target was on the left or the right side of the display, or to click on the exact location where they had seen the target. Targets could be detected and localized at our shortest exposure duration (~33 ms), with a predictable improvement in performance with increasing exposure duration. At this shortest duration we saw some evidence of detection without localization, but further analyses demonstrated that these trials typically reflected coarse or imprecise localization information rather than its complete absence. Experiment 2 replicated our main findings while exploring the effect of the level of "openness" of the scene. Our results are consistent with the notion that when we are able to extract what objects are present in a scene, we also have information about where each object is, which provides crucial guidance for our goal-directed actions.