Abstract:
In the traditional evaluation of information retrieval systems, assessors are asked to determine the relevance of a document on a graded scale, independent of any other documents. Such judgments are absolute judgments. Learning to rank brings new challenges to this traditional evaluation methodology, especially regarding absolute relevance judgments. Recently, preference judgments have been investigated as an alternative: instead of assigning a relevance grade to a document, an assessor looks at a pair of pages and judges which one is better. In this paper, we generalize pairwise preference judgments to relative judgments. We formalize the problem of relative judgments and then propose a new strategy called Select-the-Best-Ones to solve it. Through user studies, we compare our proposed method with a pairwise preference judgment method and an absolute judgment method. The results indicate that users can distinguish about one more relevance grade when using relative methods than when using the absolute method. Consequently, the relative methods generate 15-30% more document pairs for learning to rank. Compared to the pairwise method, our proposed method increases the agreement among assessors from 95% to 99%, while halving both the labeling time and the number of pairs discordant with experts' judgments.
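The link between finer relevance distinctions and more training pairs can be illustrated with a small sketch. It assumes, as is common in learning to rank, that pairs are derived from graded labels and that documents with equal grades are ties contributing no pair; the function names and example grades are hypothetical and not taken from the paper. Splitting one tied grade into two adds a pair, which mirrors in miniature how an extra distinguishable grade can yield more pairs.

```python
from itertools import combinations


def preference_pairs(grades):
    """Return (better, worse) document pairs implied by graded labels.

    Documents that share a grade are treated as ties and yield no pair.
    """
    pairs = []
    for (doc_a, g_a), (doc_b, g_b) in combinations(grades.items(), 2):
        if g_a > g_b:
            pairs.append((doc_a, doc_b))
        elif g_b > g_a:
            pairs.append((doc_b, doc_a))
    return pairs


def discordant(assessor_pairs, expert_pairs):
    """Count assessor pairs whose order is reversed in the expert pairs."""
    expert = set(expert_pairs)
    return sum((worse, better) in expert for better, worse in assessor_pairs)


# Hypothetical grades for five documents on one query.
coarse = {"d1": 2, "d2": 2, "d3": 1, "d4": 1, "d5": 0}  # d1/d2 tied
finer = {"d1": 3, "d2": 2, "d3": 1, "d4": 1, "d5": 0}   # extra grade splits d1/d2

print(len(preference_pairs(coarse)))  # 8
print(len(preference_pairs(finer)))   # 9

# A hypothetical assessor who reverses d1 and d2 relative to the expert.
assessor = preference_pairs({"d1": 1, "d2": 2})
expert = preference_pairs({"d1": 2, "d2": 1})
print(discordant(assessor, expert))   # 1
```

The `discordant` helper shows one plausible way to count disagreement with expert judgments; the paper's exact counting procedure is not given in the abstract.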
Abstract:
Two studies were conducted to further examine the unskilled-and-unaware effect and to test whether low-performing students are indeed unaware of their (expected) lower metacognitive monitoring abilities. Postdicted judgments of performance and second-order judgments (SOJs) were solicited to test students' metacognitive awareness. Given that global and local judgments tend to differ (the confidence-frequency effect), we investigated whether students' (un)awareness pertains to both types of judgments. The first study, focusing on global judgments, was conducted in a regular exam setting with 196 undergraduate education students. A second study with 115 undergraduate education students examined both global and local judgments. Local judgments were analyzed at an average level and according to different signal detection theory categories (hits, correct rejections, misses, and false alarms). In both studies, students were grouped into performance quartiles. The results showed that low-performing students greatly overestimated their performance (they were functionally overconfident). However, their SOJs indicated that they were less confident in their judgments than the other students, and thus seemed to be aware of their low ability to estimate their own performance (they were not subjectively overconfident). This was observed for global as well as for averaged local SOJs. Moreover, an analysis of the local judgments revealed that students' SOJs varied depending not only on whether their judgments were accurate but also on whether or not they thought they knew the answer to an item. In sum, SOJs provide valuable information about students' metacognitive awareness.
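The local-judgment analysis can be made concrete with a sketch that sorts item-level judgments into the four signal detection categories and computes a simple global overestimation score. The mapping is an assumption: a "hit" is taken to mean the student thought the answer was correct and it was, a "false alarm" that they thought it was correct but it was not, and so on. The paper's exact coding, the helper names, and the data are hypothetical.

```python
from collections import Counter


def sdt_category(thought_correct: bool, was_correct: bool) -> str:
    """Classify one item judgment (assumed coding, see lead-in)."""
    if thought_correct and was_correct:
        return "hit"
    if thought_correct and not was_correct:
        return "false alarm"
    if not thought_correct and was_correct:
        return "miss"
    return "correct rejection"


def overestimation(postdicted_score: int, actual_score: int) -> int:
    """Positive values indicate functional overconfidence."""
    return postdicted_score - actual_score


# Hypothetical (thought_correct, was_correct) pairs for five items.
items = [(True, True), (True, False), (False, True), (False, False), (True, False)]
counts = Counter(sdt_category(t, c) for t, c in items)
print(counts["false alarm"])  # 2

actual = sum(c for _, c in items)  # 2 items answered correctly
print(overestimation(postdicted_score=4, actual_score=actual))  # 2
```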
Abstract:
Research has demonstrated that, in addition to minor changes in goalkeepers' position or height, goalkeeper reputation seems to influence penalty takers' shot placement. However, this evidence is based on correlational designs. Here, the authors experimentally manipulated both height and reputation to examine their causal impact on actual shot placement. Penalty takers performed kicks facing goalkeepers of different heights (tall vs. short) and reputations (high vs. low) projected on a life-size screen. Results showed that tall goalkeepers were judged as taller than short goalkeepers. Likewise, high-reputation goalkeepers were judged as taller than low-reputation goalkeepers. An important finding was that reputation also influenced shot placement. When facing high-reputation goalkeepers, penalty takers aimed farther away from the goalkeeper and missed the goal more often. It follows that reputation affects both height estimates of goalkeepers and, most importantly, shot placement. Consequently, manipulating the perceived reputation of goalkeepers provides an avenue for sport professionals to subtly influence the shot placement of penalty takers.
Abstract:
With the rise of machine learning and "big data," many large yet spurious relationships between variables are discovered, leveraged by marketing communications, and publicized in the media. Thus, consumers are increasingly exposed to many large-magnitude relationships between variables that do not signal causal effects. This exposure may carry a substantial cost. Seven studies demonstrate that the magnitudes of relationships between variables can distort consumers' judgments about whether those relationships reflect causal effects. Specifically, consumers often use a magnitude heuristic: they infer that relationships with larger perceived magnitudes are more likely to reflect causal effects, even when this is not true (and even when relationships' correlations are held constant). In many situations, relying on the magnitude heuristic will distort causality judgments, such as when large-magnitude relationships between variables are spurious, or when normatively extraneous factors (e.g., reference points) distort perceptions of magnitudes. Moreover, magnitude-distorted (mis)perceptions of causality, in turn, distort consumers' purchase and consumption decisions. Since consumers often encounter spurious relationships with large magnitudes in the health domain and in other consequential domains, the magnitude heuristic is likely to lead to biases in some of consumers' most important decisions.
Abstract:
Previous research has shown that people exhibit a sample size bias when judging the average of a set of stimuli on a single dimension: the more stimuli there are in the set, the greater people judge the average to be. This effect has been demonstrated reliably for judgments of the average likelihood that groups of people will experience negative, positive, and neutral events (Price, 2001; Price, Smith, & Lench, 2006) and also for estimates of the mean of sets of numbers (Smith & Price, 2010). The present research focuses on whether this effect is observed for judgments of the average on a perceptual dimension. In five experiments, we show that people's judgments of the average size of the squares in a set increase as the number of squares in the set increases. This effect occurs regardless of whether the squares in each set are presented simultaneously or sequentially; whether the squares in each set are different sizes or all the same size; and whether the response is a rating of size, an estimate of area, or a comparative judgment. These results are consistent with a priming account of the sample size bias, in which the sample size activates a representation of magnitude that directly biases the judgment of average.
Abstract:
Most exposure assessments are conducted without the aid of robust personal exposure data and are based instead on qualitative inputs such as education and experience, training, documentation on the process chemicals, tasks and equipment, and other information. Qualitative assessments determine whether any follow-up occurs and influence its type, such as quantitative sampling, worker training, and implementing exposure and risk management measures. Accurate qualitative exposure judgments ensure appropriate follow-up, which in turn ensures appropriate exposure management. Studies suggest that qualitative judgment accuracy is low. A qualitative exposure assessment Checklist tool was developed to guide the application of a set of heuristics to aid decision making. Practicing industrial hygienists (IHs; n = 39) and novice IHs (n = 8) were recruited for a study evaluating the influence of the Checklist on exposure judgment accuracy. Participants generated 85 pre-training judgments and 195 Checklist-guided judgments. Pre-training judgment accuracy was low (33%) and not statistically significantly different from random chance. A tendency for IHs to underestimate the true exposure was observed. Exposure judgment accuracy improved significantly (p < 0.001) to 63% when aided by the Checklist. Qualitative judgments guided by the Checklist tool were categorically accurate or overestimated the true exposure by one category 70% of the time. The overall magnitude of exposure judgment precision also improved following training. Fleiss' kappa, evaluating inter-rater agreement among novice assessors, was fair to moderate (kappa = 0.39). Cohen's weighted and unweighted kappa were good to excellent for novice (0.77 and 0.80) and practicing IHs (0.73 and 0.89), respectively. Checklist judgment accuracy was similar to quantitative exposure judgment accuracy observed in studies of similar design using personal exposure measurements, suggesting that the tool could be useful in developing informed priors and further demonstrating its usefulness in producing accurate qualitative exposure judgments.
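For readers unfamiliar with the agreement statistics reported in this abstract, the sketch below computes Cohen's kappa for two raters, both unweighted and weighted. It assumes ordinal exposure categories coded 0 to k-1 and linear disagreement weights; the abstract does not state which weighting scheme the study used, and the ratings are invented. Fleiss' kappa, which extends agreement to more than two raters, is not shown.

```python
def cohens_kappa(r1, r2, k, weighted=False):
    """Cohen's kappa for two raters over categories 0..k-1.

    Uses the disagreement form 1 - sum(w*O) / sum(w*E). Unweighted kappa
    gives every off-diagonal cell weight 1; weighted kappa uses linear
    weights |i - j| / (k - 1), so near-misses count as partial agreement.
    """
    n = len(r1)
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(r1, r2):
        observed[a][b] += 1
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        if weighted:
            return abs(i - j) / (k - 1)
        return 0.0 if i == j else 1.0

    cells = [(i, j) for i in range(k) for j in range(k)]
    obs_dis = sum(weight(i, j) * observed[i][j] for i, j in cells) / n
    exp_dis = sum(weight(i, j) * row[i] * col[j] for i, j in cells) / n**2
    return 1 - obs_dis / exp_dis


# Hypothetical exposure categories (0-3) from two assessors for six tasks.
rater_a = [0, 1, 2, 3, 1, 2]
rater_b = [0, 1, 3, 3, 2, 2]
print(round(cohens_kappa(rater_a, rater_b, k=4), 3))                 # 0.556
print(round(cohens_kappa(rater_a, rater_b, k=4, weighted=True), 3))  # 0.714
```

Because both disagreements here are off by one category, the weighted statistic credits them as partial agreement and comes out higher than the unweighted one.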
Abstract:
Groups and individuals often make judgments that involve evaluating informational cues. Our research uses a social judgment approach to address similarities and differences in the ways that groups and individuals use cues. We hypothesize that groups have greater information processing capacity than individuals and, therefore, that groups will use cues to a greater degree than individuals. Moreover, we predict that groups will use those informational cues more consistently than individuals. Consistent with these hypotheses, results indicate that groups used available cues more than individuals did and applied an information use and integration strategy more consistently than similarly treated individuals. A shared task representation is proposed as an explanation for the information usage pattern exhibited by groups. These systematic differences in the ways groups and individuals use cues may explain differences between group and individual judgments.
Abstract:
Research in psychology has found that subjects regularly exhibit a conjunction fallacy in probability judgments. Additional research has identified other fallacies in probability judgment, including disjunction and conditional fallacies. Such analyses are critical because of the substantial amount of probability judgment done in accounting, business, and organizational settings. However, most previous research has been conducted in the setting of a single decision maker. Since business and other organizational environments also employ groups, it is important to determine the impact of groups on such cognitive fallacies. This paper finds that groups substantially mitigate the impact of probability judgment fallacies among the sample of subjects investigated. The key contribution of this paper is its analysis of the apparent manner in which groups make such decisions. A statistical analysis based on a binomial distribution suggests that the groups investigated here did not use consensus. Instead, if any one member of the group has correct knowledge about the probability relationships, the group uses that knowledge and does not exhibit the fallacy in probability judgment. Having a computational model of the group decision-making process provides a basis for developing computational models that can be used to simulate "mirror worlds" of reality or to model decision making in real-world settings.
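The contrast between an "any member knows" rule and a consensus rule can be expressed with binomial probabilities. Assuming each of n group members independently holds the correct probability relationship with probability p, a group that adopts any correct member's knowledge avoids the fallacy with probability 1 - (1 - p)^n, while a simple majority rule succeeds only when more than half the members are correct. This is an illustrative sketch of that reasoning with made-up values, not the paper's actual statistical test; the function names are hypothetical.

```python
from math import comb


def p_any_member(p: float, n: int) -> float:
    """P(at least one of n members is correct): 1 - (1 - p)^n."""
    return 1 - (1 - p) ** n


def p_majority(p: float, n: int) -> float:
    """P(strictly more than half of n members are correct), a binomial tail."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n // 2 + 1, n + 1))


# Hypothetical: 40% of individuals avoid the fallacy; groups of three.
print(round(p_any_member(0.4, 3), 3))  # 0.784
print(round(p_majority(0.4, 3), 3))    # 0.352
```

If observed group accuracy tracks the first figure rather than the second, that pattern would be consistent with the conclusion that groups did not rely on consensus.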
Abstract:
Whilst the research literature points towards the benefits of a statistical approach, business practice continues in many cases to rely on judgmental approaches for demand forecasting. In today's dynamic environment, it is especially relevant to consider a combination of both approaches. However, the question remains as to how this combination should occur. This study compares two different ways of combining statistical and judgmental forecasting, employing real-life data from an international publishing company that produces weekly forecasts on regular and exceptional products. Two forecasting methodologies that are able to include human judgment are compared. In a 'restrictive judgment' model, expert predictions are incorporated as restrictions on the forecasting model. In an 'integrative judgment' model, this information is taken into account as a predictive variable in the demand forecasting process. The proposed models are compared on error metrics and analysed with regard to the properties of the adjustments (direction, size) and of the forecast itself (volatility, periodicity). The integrative approach has a positive effect on accuracy in all scenarios. However, in those cases where the restrictive approach proved to be beneficial, the integrative approach limited these beneficial effects. The study connects to demand planning by using the forecasts as input for an optimization model to determine the ideal number of SKUs per Point of Sale (PoS), distinguishing between SKU forecasts and SKU-per-PoS forecasts. Importantly, this enables performance to be expressed as a measure of profitability, which proves to be higher for the integrative approach than for the restrictive approach.
Abstract:
Three experiments tested the hypothesis that people's overconfidence in the quality of their intuitive judgment strategies contributes to their reluctance to use helpful actuarial judgment aids. Participants engaged in a judgment task that required them to use five cues to decide whether a prospective juror favored physician-assisted suicide. Participants had the opportunity to examine the judgments of a statistical equation that correctly classified 77% of the prospective jurors. In all experiments, participants infrequently examined the equation, performed worse than the equation, and were highly overconfident. In Experiments 1 and 2, outcome feedback and calibration feedback failed to reduce overconfidence. In Experiment 3, enhanced calibration feedback reduced overconfidence and increased reliance on the equation, thus leading to improved judgment performance.