Proportion Detectors in Discrimination Testing: Useful Metric or Misleading Idea?
- dsi657

In discrimination testing, we often want more than a yes/no on significance: we want to estimate the size of the sensory difference.
The 𝗽𝗿𝗼𝗽𝗼𝗿𝘁𝗶𝗼𝗻 𝗼𝗳 𝗱𝗲𝘁𝗲𝗰𝘁𝗼𝗿𝘀 (𝙋𝙙) has long been used for that, based on a simple idea: some people 𝘤𝘢𝘯 tell the difference, others 𝘤𝘢𝘯’𝘵, even if only momentarily. Pd is also used as a “difference unit” in power and sample-size planning, and it is taught in universities and built into software.
But is the idea of “detectors” too simple? A substantial body of research says yes.
Daniel Ennis (1993) and, later, Virginie Jesionka, Benoît Rousseau, and John Ennis (2014) showed that Pd is 𝗺𝗲𝘁𝗵𝗼𝗱-𝗱𝗲𝗽𝗲𝗻𝗱𝗲𝗻𝘁, making it 𝘶𝘯𝘳𝘦𝘭𝘪𝘢𝘣𝘭𝘦 for quantifying differences or planning studies.
Because different discrimination tests have different power (3-AFC > tetrad > triangle), the same consumers evaluating the same products can produce 𝘷𝘦𝘳𝘺 𝘥𝘪𝘧𝘧𝘦𝘳𝘦𝘯𝘵 𝘗𝘥 𝘷𝘢𝘭𝘶𝘦𝘴 (e.g., 64%, 45%, 25%).
𝙎𝙖𝙢𝙚 𝙥𝙖𝙣𝙚𝙡. 𝙎𝙖𝙢𝙚 𝙥𝙧𝙤𝙙𝙪𝙘𝙩𝙨. 𝙏𝙝𝙧𝙚𝙚 𝙚𝙨𝙩𝙞𝙢𝙖𝙩𝙚𝙨. Which one, if any, reflects reality?
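To make the method dependence concrete, here is a rough numerical sketch (mine, not from the post) that maps a single Thurstonian delta to a proportion correct under the standard psychometric functions for the triangle, tetrad, and 3-AFC protocols, and then converts each to a Pd. The delta value is chosen purely for illustration, and the formulas should be checked against a reference implementation (e.g., the sensR package) before serious use.

```python
# Sketch: one Thurstonian delta, three protocols, three different Pd values.
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

def pc_three_afc(delta):
    # P(correct) for 3-AFC: the shifted sample exceeds both unshifted ones.
    f = lambda z: norm.pdf(z - delta) * norm.cdf(z) ** 2
    return quad(f, -np.inf, np.inf)[0]

def pc_triangle(delta):
    # P(correct) for the triangle protocol (standard Thurstonian form).
    f = lambda z: norm.pdf(z) * (norm.cdf(-z * np.sqrt(3) + delta * np.sqrt(2 / 3))
                                 + norm.cdf(-z * np.sqrt(3) - delta * np.sqrt(2 / 3)))
    return 2 * quad(f, 0, np.inf)[0]

def pc_tetrad(delta):
    # P(correct) for the unspecified tetrad protocol.
    f = lambda z: norm.pdf(z) * (2 * norm.cdf(z) * norm.cdf(z - delta)
                                 - norm.cdf(z - delta) ** 2)
    return 1 - 2 * quad(f, -np.inf, np.inf)[0]

def pd_from_pc(pc, p_guess=1 / 3):
    # "Proportion of detectors" implied by a proportion correct.
    return (pc - p_guess) / (1 - p_guess)

delta = 1.5  # illustrative value; any single sensory difference will do
for name, fn in [("3-AFC", pc_three_afc), ("tetrad", pc_tetrad), ("triangle", pc_triangle)]:
    pc = fn(delta)
    print(f"{name:8s}  pc = {pc:.3f}   Pd = {pd_from_pc(pc):.2f}")
```

Running this for any fixed delta gives three noticeably different Pd values, simply because the three protocols turn the same underlying difference into different proportions correct.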
𝘌𝘷𝘦𝘯 𝘮𝘰𝘳𝘦 𝘱𝘳𝘰𝘣𝘭𝘦𝘮𝘢𝘵𝘪𝘤: these three methods share the same chance probability (1/3), and the Pd-based sample-size calculation sees nothing but Pd and that chance level. So a Pd of 20% 𝗶𝗺𝗽𝗹𝗶𝗲𝘀 𝘁𝗵𝗲 𝘀𝗮𝗺𝗲 𝘀𝗮𝗺𝗽𝗹𝗲-𝘀𝗶𝘇𝗲 𝗿𝗲𝗾𝘂𝗶𝗿𝗲𝗺𝗲𝗻𝘁 for all of them (N = 87 at 80% power), despite their well-documented differences in efficiency.
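For the sample-size point, here is a minimal sketch of a Pd-driven calculation, assuming a one-sided exact binomial test at alpha = 0.05 with 80% power (my assumptions, not stated in the post). Its only inputs are Pd and the chance probability, so triangle, tetrad, and 3-AFC all receive exactly the same N.

```python
# Sketch: sample size from Pd alone (one-sided exact binomial, assumed conventions).
from scipy.stats import binom

def smallest_critical_count(n, p_guess, alpha):
    # Smallest count c with P(X >= c | pure guessing) <= alpha.
    c = 0
    while binom.sf(c - 1, n, p_guess) > alpha:
        c += 1
    return c

def n_for_pd(pd, p_guess=1 / 3, alpha=0.05, power=0.80, n_max=1000):
    p_alt = p_guess + pd * (1 - p_guess)  # proportion correct implied by Pd
    for n in range(5, n_max + 1):
        c = smallest_critical_count(n, p_guess, alpha)
        if binom.sf(c - 1, n, p_alt) >= power:  # exact power at the alternative
            return n
    return None

# With Pd = 20% and chance = 1/3 this lands in the neighborhood of the post's
# N = 87 -- and it is identical for triangle, tetrad, and 3-AFC, because the
# calculation never learns which protocol is being run.
print(n_for_pd(0.20))
```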
𝗕𝗼𝘁𝘁𝗼𝗺 𝗹𝗶𝗻𝗲: Pd cannot reliably quantify sensory differences or reconcile method-to-method differences. Yet its continued use in software and teaching gives it 𝘢 𝘧𝘢𝘭𝘴𝘦 𝘴𝘦𝘯𝘴𝘦 𝘰𝘧 𝘭𝘦𝘨𝘪𝘵𝘪𝘮𝘢𝘤𝘺.
Many in our field now prefer method-independent metrics, such as the 𝗧𝗵𝘂𝗿𝘀𝘁𝗼𝗻𝗶𝗮𝗻 𝗱𝗲𝗹𝘁𝗮 (d′), for effect sizing and power planning.
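For comparison, a sketch of the Thurstonian route: each method's observed proportion correct is mapped back to a delta through that method's own psychometric function, so results land on a common scale. The proportions below are hypothetical, chosen only for illustration, and the formulas are the same ones used in the earlier sketch.

```python
# Sketch: invert each protocol's psychometric function to put results on the delta scale.
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad
from scipy.optimize import brentq

def pc_three_afc(delta):
    f = lambda z: norm.pdf(z - delta) * norm.cdf(z) ** 2
    return quad(f, -np.inf, np.inf)[0]

def pc_triangle(delta):
    f = lambda z: norm.pdf(z) * (norm.cdf(-z * np.sqrt(3) + delta * np.sqrt(2 / 3))
                                 + norm.cdf(-z * np.sqrt(3) - delta * np.sqrt(2 / 3)))
    return 2 * quad(f, 0, np.inf)[0]

def delta_from_pc(pc_observed, psychometric, lo=1e-6, hi=10.0):
    # Numerically invert pc = f(delta) for proportions above chance.
    return brentq(lambda d: psychometric(d) - pc_observed, lo, hi)

# Hypothetical observed proportions correct from the same panel and products:
print(delta_from_pc(0.75, pc_three_afc))  # delta estimate from a 3-AFC test
print(delta_from_pc(0.54, pc_triangle))   # delta estimate from a triangle test
# Both estimates now live on the same delta scale and can be compared directly,
# unlike the method-dependent Pd values implied by the raw proportions.
```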
Curious to hear from practitioners:
• Were you aware of the shortcomings of Pd?
• Do you rely on it when communicating results?
• Or is this old news, and have you already moved on to better metrics?