Crowdsourced Sampling of a Composite Random Variable: Analysis, Simulation, and Experimental Test
Author affiliation: Department of Physics, Trinity College, Hartford, CT, USA
Publication: Open Journal of Statistics
Year, volume, issue: 2019, Vol. 9, No. 4
Pages: 494-529
Subject classification: 1002 [Medicine - Clinical Medicine]; 100214 [Medicine - Oncology]; 10 [Medicine]
Keywords: Crowdsourcing; Computer Modeling of Crowds; Monte Carlo Simulation; Large-Scale Sampling; Log-Normal Random Variable; Log-Normal Distribution
Abstract: A composite random variable is a product (or sum of products) of statistically distributed quantities. Such a variable can represent the solution to a multi-factor quantitative problem submitted to a large, diverse, independent, anonymous group of non-expert respondents (the “crowd”). The objective of this research is to examine the statistical distribution of solutions from a large crowd to a quantitative problem involving image analysis and object counting. Theoretical analysis by the author, covering a range of conditions and types of factor variables, predicts that composite random variables are distributed log-normally to an excellent approximation. If the factors in a problem are themselves distributed log-normally, then their product is rigorously log-normal. A crowdsourcing experiment, devised by the author and implemented with the assistance of a BBC (British Broadcasting Corporation) television show, yielded a sample of approximately 2000 responses consistent with a log-normal distribution. The sample mean was within ~12% of the true count. However, a Monte Carlo simulation (MCS) of the experiment, employing either normal or log-normal random variables as factors to model the processes by which a crowd of 1 million might arrive at their estimates, resulted in a visually perfect log-normal distribution with a mean response within ~5% of the true count. The results of this research suggest that a well-modeled MCS, by simulating a sample of responses from a large, rational, and incentivized crowd, can provide a more accurate solution to a quantitative problem than might be attainable by direct sampling of a smaller crowd or of an uninformed crowd, irrespective of size, that guesses randomly.
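The abstract's statement that a product of log-normal factors is itself log-normal can be checked numerically. The following is a minimal sketch, not the author's simulation: the three factors, their parameters, the sample size, and the function name `simulate_composite` are all hypothetical and chosen only for illustration. It relies on the standard result that, for independent log-normal factors, the logarithm of the product is normal with mean equal to the sum of the factor log-means and variance equal to the sum of the factor log-variances.

```python
import math
import random
import statistics


def simulate_composite(factors, n_samples, seed=0):
    """Draw n_samples of a product of independent log-normal factors.

    factors: list of (mu, sigma) pairs, the mean and standard deviation
    of ln(factor). These values are illustrative assumptions.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        product = 1.0
        for mu, sigma in factors:
            product *= rng.lognormvariate(mu, sigma)
        samples.append(product)
    return samples


# Hypothetical three-factor problem (not the factors used in the paper).
FACTORS = [(2.0, 0.3), (1.0, 0.2), (0.5, 0.4)]
samples = simulate_composite(FACTORS, n_samples=20_000, seed=1)

# Work in log space, where the product should look normal.
logs = [math.log(x) for x in samples]
mu_hat = statistics.fmean(logs)
sigma_hat = statistics.stdev(logs)

# Theoretical parameters of ln(product) for independent factors.
mu_theory = sum(mu for mu, _ in FACTORS)
sigma_theory = math.sqrt(sum(s * s for _, s in FACTORS))

print(f"ln-mean: simulated {mu_hat:.3f}, theory {mu_theory:.3f}")
print(f"ln-sd:   simulated {sigma_hat:.3f}, theory {sigma_theory:.3f}")
```

With a sample this large, the simulated log-mean and log-standard deviation should agree closely with the theoretical values. Replacing `lognormvariate` with a positive-valued normal draw would be one way to probe the abstract's broader claim that the product remains approximately log-normal for other factor types.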