Reviews with all double blind testing?


In the July 2005 issue of Stereophile, John Atkinson discusses his debate with Arnold Krueger, who, Atkinson suggests, fundamentally wants double blind testing of all products in the name of science. Atkinson goes on to discuss his early advocacy of such methodology and his realization that its apparent conclusion, that all amps sound the same, proved incorrect in the long run. Atkinson’s double blind test involved listening to three amps, so it apparently was not the typical “same or different” comparison favored by blind-testing advocates.

I have been party to three blind tests and several “shootouts,” which were not blind and thus resulted in each component having advocates, since everyone knew which was playing. None of these ever produced a consensus. Two of the three blind tests were “same or different” comparisons; neither resulted in a conclusion that people could consistently hear a difference. The third was a comparison of about six preamps, where there was substantial consensus that the Bozak preamp surpassed more expensive preamps, with many designers of those preamps involved in the listening. In every case there were individuals at odds with the overall conclusion, and in no case were those involved a random sample. No test involved more than 25 people.

I have never heard of an instance where “same versus different” methodology concluded that there was a difference, but apparently comparisons of multiple amps, preamps, etc. can result in one being generally preferred. I suspect, however, that those advocating db testing mean only “same versus different” methodology. Do the advocates of db really expect that the outcome will always be that people can hear no difference? If so, is it that conclusion, rather than any supposedly scientific basis, that underlies their advocacy? Some advocates claim that if a db test found people capable of hearing a difference, they would no longer be critical, but is this sincere?

Atkinson puts it in these terms: double blind test advocates want to be right rather than happy, while their opponents would rather be happy than right.

Tests of statistical significance also get involved here: some people can hear a difference, but if they are insufficient in number to achieve statistical significance, proponents say we must accept the null hypothesis that there is no audible difference. This is invalid, as the samples are never random and seldom, if ever, of substantial size. Since significance tests assume random samples, and since statistical power grows with sample size, nothing in the typical db test works to yield the result that people can hear a difference. This suggests that the conclusion, and not the methodology or a commitment to “science,” is the real purpose.
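To make the sample-size point concrete, here is a small sketch with entirely hypothetical numbers of my own: the very same 60% hit rate in a same/different test reads as chance with a small panel, yet is decisive with a large one.

```python
# Illustrative only: how sample size alone changes the verdict of a
# same/different test. The scores below are invented for the example.
from math import comb

def binomial_p_value(hits: int, trials: int, p: float = 0.5) -> float:
    """One-sided p-value: chance of at least `hits` correct by guessing."""
    return sum(comb(trials, k) * p**k * (1 - p)**(trials - k)
               for k in range(hits, trials + 1))

# Identical 60% hit rate, two panel sizes:
print(f"12/20 correct:   p = {binomial_p_value(12, 20):.3f}")   # ~0.25: "accept the null"
print(f"120/200 correct: p = {binomial_p_value(120, 200):.4f}") # well below 0.05
```

With 20 trials, a 60% score arises from pure guessing about a quarter of the time; with 200 trials the same rate is far beyond chance. A small panel is, in effect, rigged to return "no audible difference."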

Without db testing, the advocates suggest, those who hear a difference are deluding themselves: the placebo effect. But if we used db testing with something other than the same/different technique, and people consistently chose the same component, would we not conclude that they are not delusional? This would test another hypothesis: that some people can hear better than others.

I am probably like most subjectivists, as I really do not care what the outcomes of db testing might be. I buy components that I can afford and that satisfy my ears as realistic. Certainly some products satisfy the ears of more people, and sometimes these are not the positively reviewed or heavily advertised products. Again it strikes me, at least, that this should not happen in the world that the objectivists see. They see the world as full of greedy charlatans who use advertising to sell expensive items which are no better than much cheaper ones.

Since my occupation is as a professor and scientist, some among the advocates of double blind testing might question my commitment to science. My experience with same/different double blind experiments suggests to me a flawed methodology. A double blind multiple-component design, especially with a hypothesis that some people are better able to hear a difference, would be more pleasing to me, but even then, I do not think anyone would buy on the basis of such experiments.

To use Atkinson’s phrase, I am generally happy and don’t care if the objectivists think I am right. I suspect they have to have all of us say they are right before they can be happy. Well tough luck, guys. I cannot imagine anything more boring than consistent findings of no difference among wires and components, when I know that to be untrue. Oh, and I have ordered additional Intelligent Chips. My, I am a delusional fool!
tbg
So, Rouvin, if you don't think all those DBTs with negative results are any good, why don't you do one "right"? Who knows, maybe you'd get a positive result, and prove all those objectivists wrong.

If the problem is with test implementation, then show us the way to do the tests right, and let's see if you get the results you hope for. I'm not holding my breath.
Pabelson, interesting challenge, but let’s look at what you’ve said in your various posts in this thread. I’ve pasted them without dates, but I’m sure that you know what you’ve said so far.
"What advances the field is producing your own evidence—evidence that meets the test of reliability and repeatability, something a sighted listening comparison can never do. That’s why objectivists are always asking, Where’s your evidence?"
"A good example of a mix of positive and negative tests is the ABX cable tests that Stereo Review did more than 20 years ago. Of the 6 comparisons they did, 5 had positive results; only 1 was negative."
"It's better to use one subject at a time, and to let the subject control the switching."
"Many objectivists used to be subjectivists till they started looking into things, and perhaps did some testing of their own."

You cite the ABX home page, a page that shows that differences can be heard. Yet I note that the differences, when heard, were between components that were quite different, usually meeting the standard you’ve indicated: that much better specs will sound better.

Once you decide something does sound different, is this what you buy? Is different better? You say:
"Find ANYBODY who can tell two amps apart 15 times out of 20 in a blind test (same-different, ABX, whatever), and I’ll agree that those two amps are sonically distinguishable."
Does that make you want to have this amp? Is that your standard?
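For what it’s worth, the 15-of-20 bar is stricter than it may look. A quick sketch, with purely illustrative arithmetic, shows that a listener who is only guessing clears it about 2% of the time:

```python
# How often does a pure guesser pass the 15-of-20 criterion?
import random
from math import comb

# Exact chance of >= 15 correct out of 20 coin flips
exact = sum(comb(20, k) for k in range(15, 21)) / 2**20
print(f"P(guesser passes 15/20) = {exact:.4f}")  # about 0.02

# Monte Carlo sanity check: 100,000 simulated guessers
rng = random.Random(0)
passes = sum(
    sum(rng.random() < 0.5 for _ in range(20)) >= 15
    for _ in range(100_000)
)
print(f"simulated pass rate     = {passes / 100_000:.4f}")
```

So the criterion itself is reasonable; my quarrel is with what passing it would and would not establish.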

One of the tests you cite was in 1998, with two systems that were quite different in more than price. Does that lend credence to the DBT argument? On the one hand you point to tests that hold everything constant but one component, with one listener and repeated trials, but then you cite something quite different to impugn subjectivists – not that it’s all that hard to do. You also cite a number of instances in which DBT has indicated that there is a difference. Which is it? Is there “proof” of hearing differences that has been established by DBT? It certainly appears from the material you have cited that there is. By your own argument, if this has been done even once, the subjectivists have demonstrated their point. I don’t agree, and you really don't appear to, either.

I made two points, and I do not feel that they have been addressed by your challenge. One, that most DBT tests as done in audio have readily questionable methods – methods that invalidate any statistical testing, as well as sample sizes that are way too small for valid statistics. Those tests you cite in which differences were found do look valid, but I haven’t taken the time to go into them more deeply. Two, and the far more important point to me, do the DBT tests done or any that might be done really address the stuff of subjective reviews? I just don’t see how this can be done, and I’m not going to try to accept your challenge, “If you know so much ...” Instead, if you know so much about science and psychoacoustics, and you do appear to have at least a passing knowledge, why would you issue such a meaningless, conversation-stopping challenge? Experiments with faulty designs are rejected for journal publication all the time by reviewers who do not have to respond to such challenges. The flaws they point out are sufficient.

Finally, I’ve been involved in this more than long enough to have heard many costly systems in homes and showrooms that either sounded awful to my ears or were unacceptable to me one way or another. The best I’ve heard have never been the most costly but have consistently been in houses with carefully set up sound rooms built especially for that purpose from designs provided by psychoacoustic objectivists. This makes me suspect that what we have is far better than we know, a point inherent in many "objectivist" arguments. My home does not even come close to that standard in my listening room (and a very substantial majority of pictures I see of various systems in rooms around the net also seem to fall pretty short). The DBT test setups I have seen have never been in that type of room, either. What effect this would have on a methodologically sound DBT would be interesting. Wouldn’t it?
Rouvin: Let me take your two points in order. First:

One, that most DBT tests as done in audio have readily questionable methods – methods that invalidate any statistical testing, as well as sample sizes that are way too small for valid statistics.

Then why is it that all published DBTs involving consumer audio equipment report results that match what we would predict based on measurable differences? For badly implemented tests, they've yielded remarkably consistent results, both positive and negative. If the reason some tests were negative was because they were done badly, why hasn't anyone ever repeated those tests properly and gotten a positive result instead? (I'll tell you why--because they can't.)

Two, and the far more important point to me, do the DBT tests done or any that might be done really address the stuff of subjective reviews?

DBTs address a prior question: Are two components audibly distinguishable at all? If they aren't, then a subjective review comparing those components is an exercise in creative writing. You seem to be making the a priori assumption that if a subjective reviewer says two components sound different, then that is correct and DBTs ought to be able to confirm that. That's faith, not science. If I ran an audio magazine, I wouldn't let anyone write a subjective review of a component unless he could demonstrate that he can tell it apart from something else without knowing which is which. Would you really trust a subjective reviewer who couldn't do that?
Pabelson,
I think we may be closer than you think on this issue but you seem to want it both ways, a difficulty I see repeatedly in "objectivist" arguments. You say:
" For badly implemented tests, they've yielded remarkably consistent results, both positive and negative." --
all the while insisting on scientific assessment.

Methodologically unsound experiments yield no meaningful results, and a pattern of meaningless results does not matter. Your argument in this regard is emotionally appealing, but it is incorrect.

Moreover, the notion that "DBTs address a prior question: Are two components audibly distinguishable at all?" is also suspect absent appropriate methodology. I notice in your posts that you address reliability and repeatability, important factors without any doubt. Yet you have never spoken to the issue I have raised, validity, and this is the crux of our difference. Flawed methodology can reliably yield repeatable results that are nonetheless not valid.

And, of course, as you have noted, many DBT's have shown that some components are distinguishable.

The issue beyond methodology, I suspect, is that there are some people who can often reliably distinguish between components. They are outliers, well outside the norm, several standard deviations beyond the mean, even among the self-designated "golden eared." When testing is done on a group basis, these folks vanish in the group statistics. You can assail this argument on many grounds; it is indefensible except for the virtual certainty that hearing acuity is normally distributed in the population.
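To put the dilution worry in concrete terms, with frankly made-up numbers: suppose one listener in a panel of 25 scores 18 of 20 while the other 24 perform exactly at chance. The individual result is wildly improbable under guessing, yet the pooled panel statistics look like noise:

```python
# Hypothetical illustration of an outlier vanishing in group statistics.
# All scores here are invented for the example.
from math import comb

def p_value(hits: int, trials: int) -> float:
    """One-sided binomial p-value against guessing (p = 0.5)."""
    return sum(comb(trials, k) for k in range(hits, trials + 1)) / 2**trials

golden = p_value(18, 20)                  # one listener, 18 of 20
pooled = p_value(18 + 24 * 10, 25 * 20)   # panel total: 258 of 500
print(f"golden ear alone: p = {golden:.5f}")  # individually significant
print(f"panel pooled:     p = {pooled:.3f}")  # washed out in the group
```

The remedy, of course, is to report individual scores alongside the pooled ones, which well-run tests do.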

So, my position remains that there is surely a place for DBT testing, but even after all the methodological and sampling issues were addressed, I'm still unsure how it fits into the types of reviews most audiophiles want.

In your hypothetical magazine, after DBT establishes that the Mega Whopper is distinguishable from El Thumper Grande, how would either be described? Would there be a DBT for each characteristic?

Freud had a book on religion entitled "The Future of an Illusion," and you may well feel that this is where all of this ultimately is. I'm not sure that I have an answer to that, but this may well be why Audio Asylum has declared itself a DBT-free zone.
Rouvin: You're the one who says these are badly implemented tests (though you seem to be familiar with only a few). I wouldn't claim they're perfect. But that doesn't make their results meaningless; it leaves their results open to challenge by other tests that are methodologically better. My point is that you can't produce any tests that are both 1) methodologically sound; and 2) in conformance with what you want to believe about audio. And until you do produce such tests, you haven't really got any ground to stand on.

You state that golden ears exist, but at the end of the paragraph you admit that this position is indefensible, so you saved me the trouble. ;-) As for your point that these golden ears get averaged out in a large test, you're simply wrong. I've never seen a DBT where an individual got a statistically significant score but the broader panel did not. When it happens, then we'll worry about it.

So, my position remains that there is surely a place for DBT testing, but even after all the methodological and sampling issues were addressed, I'm still unsure how it fits into the types of reviews most audiophiles want.

They may not fit with what audiophiles want, but that says more about audiophiles than it does about DBTs.

In your hypothetical magazine, after DBT establishes that the Mega Whopper is distinguishable from El Thumper Grande, how would either be described? Would there be a DBT for each characteristic?

Once you pass the test, you can describe the Thumper any way you want.

In your hypothetical magazine, after DBT establishes that the Mega Whopper is distinguishable from El Thumper Grande, how would either be described? Would there be a DBT for each characteristic?

Strawman argument; the only point of a DBT is to determine whether there is an audible difference. If there is, let the creative writing begin.

steve
Rouvin, bingo! Validity is the missing concern with DBTs. I also entirely subscribe to your question about where DBTing fits into the reviews that audiophiles want. As I have said, I cannot imagine a DBT audio magazine.

I am troubled by your comments that some DBTing has given positive results. Can you please cite these examples?
the golden-eared: an anecdote

i am a glenn gould fan. according to his biographers, gould could reliably distinguish between playback devices (blind) in the studio, which were indistinguishable to everyone else involved in the studio. gould was special in many ways. it wouldn't surprise me if the anecdote were true.

however, i'm not glenn gould. i'll spend my money on components that are distinguishable by ordinary folks like me.
Hmm, one of you speaks as if you have participated in and/or seen the results from many audio DBTs. Where are these tests held? Where are they reported?
I got no problem being blindfolded for two weeks solid as long as you point me in the general direction of the porcelain amp stand when need be.
Agaffer: A list of DBT test reports appears here:

http://www.provide.net/~djcarlst/abx_peri.htm

This list is a bit old, but I don't know of too many published reports specifically related to audio components since then. After a while, it became apparent which components were distinguishable and which were not. So nobody publishes them anymore because they're old news.

Researchers still use them. Here's a test of the audibility of signals over 20kHz (using DVD-A, I think):

http://www.nhk.or.jp/strl/publica/labnote/lab486.html

The most common audio use of DBTs today is for designing perceptual codecs (MP3, AAC, etc.). These tests typically use a variant of the ABX test, called ABC/hr (for "hidden reference"), in which subjects compare compressed and uncompressed signals and gauge how close the compressed version comes to the uncompressed.

Finally, Harman uses DBTs in designing speakers. Speakers really do sound different, of course, so they aren't using ABX tests and such. Instead, they're trying to determine which attributes of speakers affect listener preferences. The Harman listening lab (like the one at the National Research Council in Canada, whose designers now work for Harman) places speakers on large turntables, which allows them to switch speakers quickly and listen to two or more speakers in the same position in the room. Here's an article about their work:

http://www.reed-electronics.com/tmworld/article/CA475937.html

And just for fun, here's a DBT comparing vinyl and digital:

http://www.bostonaudiosociety.org/bas_speaker/abx_testing2.htm

I think Stan Lipshitz's conclusion is worth noting:

Further carefully-conducted blind tests will be necessary if these conclusions are felt to be in error.
Pabelson, I find your posts interesting, though not really responsive to the initial thread by TBG about the place of DBT in audio. Nor have I felt that your posts have been responsive to my similar concerns, and to my additional concerns about experimental validity (though I am sure that not all the tests have been invalid) – for instance, the very interesting and amusing 1984 BAS article where the Linn godfather not only failed to differentiate the analog from the digitally processed source, he identified more analog selections as digital. But this was an atypical setup that would not be found in any home. We can’t really generalize from it, and it has nothing to do with advocacy of the "subjectivist" viewpoint. If you would be true to your objectivist bona fides, wouldn't you have to agree?

Then, there’s the issue, supported by your citations, that there have been DBT’s going back years that have demonstrated noticeable differences between individual components.

So, I think there is a background issue, and this was also mentioned in TBG’s initial post. Many adherents of DBT seem to be seeking the very "conformance" that you want to point out in others: that until the qualities claimed to exist can be proven to exist, they must be assumed not to exist. Intoxicating argument, but ultimately revealing of a distinct bias: the invalidation of the experience of others as an a priori position until they can meet your standard. This "you ain't proved nothin'" approach is especially troublesome when one reads subjective reviews and realizes that the points they raise, creative writing though they may well be, could never be addressed by DBT, ABX, or any similar methodology. The majority of what we are able to perceive is not amenable to measurement that can be neatly, or even roughly, correlated with perception. To claim otherwise is an illusion. Enter the artists with some scientific and technical skill, and we have high end audio. Sadly, with them come the charlatans and the deluded, along with average and "golden eared" folks who hope to hear their music sound a bit more like they think they remember it sounding somewhere in the past. Add something like cables, and it seems the battle lines are drawn.

I’m a bit suspicious that you might not allow the person who can reliably detect a difference between two components to write whatever he wants in your forthcoming journal. You claim that once the DBT is passed, he can describe a component any way he wants. That doesn’t really make sense to me, because detecting a "just noticeable difference" is not the same as being able to notice all of the differences subjective reviewers claim, is it? If someone can tell the real Mona Lisa from a reproduction, even a well executed one, do you really care to hear everything else he thinks about it? I don’t. I might want to see it myself, though.

I don’t think there will ever be anything like being able to recreate the exact sonic experience of a live musical performance in a home or studio. What we can hope for are various ways to recreate some reasonable semblance of some aspects of some performances. DBT probably has a place there.

In the meantime, I’d like to suggest a name for your journal, The Absolutely Absolute Sound. I think Gunbei has a supply blindfolds.
Rouvin: There really isn't much point in arguing with someone who assumes his conclusions, and then does nothing but repeat his assumptions. Here's what I mean:

The majority of what we are able to perceive is not amenable to measurement that can be neatly, or even roughly, correlated with perception.

How do you know what you are *able* to perceive (as distinct from what you *think* you perceive)? In the field of perceptual psychology, which is the relevant field here, there are standard, valid ways of answering that question. But it's a question you are afraid to address. Hence your refusal of my challenge to actually conduct a DBT of any sort. And the idea that you, an amateur audio hobbyist without even an undergraduate degree in psychology, have any standing to declare what is and is not valid as a test of hearing perception is pretty risible.

Finally, just to clear up your most obvious point of confusion: There is a difference between "what we are able to perceive" and "how we perceive it." You are conflating these two things, again because you don't want to face up to the issue. "What we are able to perceive" is, in fact, quite amenable to measurement. It's been studied extensively. There are whole textbooks on the subject.

Your harping on subjective reviewing, by contrast, is about "how we perceive it." We can't measure sound and make predictions about how it will sound to you, because how it will sound to you depends on too many factors besides the actual sound. That's why we need DBTs--to minimize the non-sonic factors. And when we minimize those non-sonic factors, we discover that much of what passes for audio reviewing is a lot of twaddle.
"We can't measure sound and make predictions about how it will sound to you, because how it will sound to you depends on too many factors besides the actual sound." This is what I have been saying in addition to less than favorable comments about many subjective reviews. This problem is one of many that equally hampers "objective" reviews.

A closer reading of what I have written would reveal that I am, at best, ambivalent about the whole process of audio reviewing, subjective or objective. Moreover, DBT has yet to produce much of significance beyond the finding that some people can sometimes tell some components apart under some conditions.

"And the idea that you, an amateur audio hobbyist without even an undergraduate degree in psychology, has any standing to declare what is and is not valid..." You have constructed a total absence of "valid" credentials for me, an exercise in "creative writing." "Lack of standing" is the problematic judgment invoked that leads you to invalidate experience, a source of needless angry conflict. It might be somewhat accurate to characterize me as "an amateur audio hobbyist," but you have taken quite a leap to decide that I am "without even an undergraduate degree in psychology," a leap that could not be more inaccurate. At least, you didn’t mention my lack of teeth and eviction from the trailer park.

There are subjective reviews in just about every field. Anyone who takes them for hard fact misunderstands what they are. I’d also suggest some advanced readings in sensation and perception to better understand the distinction between "what we are able to perceive" and "how we perceive it."

So, I’ll leave you with two thoughts apropos of this discussion. Einstein said, "Not everything that can be counted counts; and not everything that counts can be counted." In Alice in Wonderland, the Dodo said, "Everybody has won, and all must have prizes." Where's Rodney King, anyhow?
This Rouvin-Pabelson exchange is fascinating. I agree with Pabelson on just about everything. Perhaps that is because I'm an academic (I'm a philosopher, but I'm also part of the cognitive science faculty b/c of my courses on color and epistemology). Anyway, I'm no psychologist, but I am aware of the powerful external forces shaping perceptual evaluation. So I am especially leery of those extra-acoustical mechanisms, which are, by their very nature, hidden from us.

SOME RELEVANT PSYCHOLOGICAL MECHANISMS TO BEAR IN MIND.

To start with, there's the endowment effect. The experiment takes place at a three-day conference. At the beginning of the conference, everyone is given a mug. At the end, the organizers offer to buy the mugs back. It turns out people want something like $8 (I can't remember the exact number) to give their mugs back. But other groups at different conferences are not given the mug; it is offered for sale to them instead. It turns out the price they are willing to *pay* for the mug is something like $1. Conclusion: people very quickly come to think the things they have are worth more than things they don't have but could acquire.

This may seem to run counter to our constant desire to swap out and upgrade in search of perfect sound, but it explains the superlatives that people use -- "best system I've ever heard," "sounds better than most systems costing triple"-- when describing mediocre systems they happen to own. (Other explanations for this are also possible, of course.)

Our audiophiliac tendencies are also in part explained by the "choice" phenomenon: when you are faced with a wide variety of options, you're not as happy with any of them as you otherwise would be. When subjects are offered three kinds of chocolate on a platter, they're pretty happy with their choice. But when they're offered twenty kinds, they're less happy even when they pick the identical chocolate. That's us!

Another endowment-like effect, though, and this is what got me to write this post, is one that happens after making a purchasing or hiring decision. After making the decision, say, to hire person A over person B, a committee will rate person A *much* higher than it did prior to the hiring decision, when person B was still an option. In other words, we affirm our choices after making them.

This phenomenon is more pronounced the more sacrifices you make in the course of the decision-making process. In other words, if you went all out to get candidate A, you'll think he's even better. Women know this intuitively. It's called playing hard to get.

In the audio realm, when you spend a couple grand on cables, your listening-evaluation mechanisms will *make* the sound better, because you have sacrificed for it.

So *this* made me wonder whether really expensive cables *do* sound better to those who know what they cost and who made the sacrifice of buying them. If so, then those cables are worth every penny to those who value that listening experience. DBT cannot measure this difference, because it's not a physical difference in the sound. But it is still a *real* difference in the perceptual experience of the listener. In the one case (expensive cables), your perceptual system is all primed and ready to hear clarity, depth, soundstage, air, presence, and so on. In the other case (cheap cables), your perceptual system is primed to hear grain, edge, sibilance, and so on. And hear them you do!

Best of all would be forgeries, *faked* expensive cables your wife could buy, knowing they were fakes, and stashing the unspent thousands in a bank account. You'd get to "hear" all of this wonderful detail, thinking you were broke, but years later, you'd have a couple hundred grand in your retirement fund!

Sorry for the rambling post, but I am interested to hear what Pabelson has to say. You are missing out, Pabelson. Knowing about the extra-acoustical mechanisms, you cannot "hear" the benefits of expensive cables. It's all ruined for you, as if you discovered your "wonderful" antidepressants were just pricey sugar pills.
Double blind testing is the ONLY way to test something fairly to remove human preconception, expectation, and visual prejudice. That is why it is used for drug trials, and that is why it should be used for hifi.

Any audiophile who questions whether DBT can produce the most accurate results within the other constraints (time/partnering equipment) of a shootout is not helping advance audio. But then I think most of us here would secretly agree that audio is a hobby with more than its share of snake-oil salesmen.
If you find this fascinating, Qualia8, then maybe you're the one who should be taking these sugar pills.

Obviously I agree with you, since you agree with me. There's a lot of expectation bias (aka, placebo effect) and confirmation bias (looking for--and finding--evidence to support your prior beliefs) in hearing perception. But I suspect some high-enders would rather sacrifice the retirement fund than admit that they might be subject to these mechanisms.

To your last point, it is NOT all ruined for me. I can spend my time auditioning speakers, trying to optimize the sound in my room, and seeking out recordings that really capture the ambience of the original venue.
One question: let's say we get double-blind testing; would the associated components also be tested blind? ...
So let's see: say we are testing speakers. Should we double-blind the amplifiers (tube versus solid state, different power levels)? Should we double-blind for the room as well? I think folks are naive about how many variables are at stake in trying to make audio reviewing and hearing more "precise" and "scientific" than it ever could be.
But let's suppose we did all this: I submit that people would still question the integrity of reviewers, because people would still disagree on the quality of the sound they hear. And some among us would swear that reviewer X was on the take.
my take on dbt is this: if you're talking about running db tests with amps, preamps, sources & speakers, what's the point? it's about what sounds "right" to each person & no amount of testing can show who likes what better. i too hate the word synergy but it's a real thing.

now if we're talking about db tests on things like gear that has been "upgraded internally" tested against a stock model, or exotic cables against regular wire, there is a lot of merit to a db test. i would also think db tests would be great for a lot of the things in our hobby that are deemed 'snake oil' like clocks & jars of rocks & especially interconnects & wires.

you can't just dismiss all db tests as inconclusive or worthless, nor can you say all db tests are worthy.

mike.
Wattsboss: I'd be careful about accusing others of naivete if you're going to make posts like this. In a DBT, everything except the units under test is kept constant. So, for example, if you were comparing CD players, you would feed both to the same amp, and on to the same speakers. You wouldn't have to "blind" the associated components, because the associated components would be the same.
Wattsboss - We should not test anything well because it would be impractical to test everything well?
Agree with your thought that there could always be doubts about a single test and tester. I think if we could get folks to care about doing meaningful tests, though, it would be a start toward improving the hobby (and devaluing the snake oil).
Pabelson and wattsboss, I agree with both of you as my first posting would suggest. I am getting on with my search for a better speaker than the twenty or so that I have tried thus far, and I cannot imagine how DBTesting would help me at all in this quest.

In science we are interested in testing hypotheses to move along human understanding. In engineering we are seeking to apply what is known, limited though it may be. Audio is an engineering problem, and there is no one right way to come up with the best speaker. When validly applied, experiments using blinds are useful for excluding alternative hypotheses. This is not a science, however.

Also, while I read reviews, it is usually those of reviewers whose opinions I have learned to value, because my replications of their work have reached the same conclusions. I fully realize that their testing is sharply restricted by the limited time and setups they have. If my testing yields results I like, whether or not I am delusional, I buy and am happy. I suspect that others would share my conclusions, but it is not a big deal if they do not.
You guys are missing an important point: double-blind testing is used to determine whether there is an audible difference between two components. Things like cables and amps usually will not show a difference (for cables, they never will).

If there is a difference, you wouldn't use dbt to decide which to choose.

steve
Tbg: The average consumer cannot really do a blind comparison of speakers, because speaker sound is dependent on room position, and you can't put two speakers in one place at the same time. But I recommend you take a look at the article on Harman's listening tests that I linked to above. If you can't do your own DBTs, you can at least benefit from others'.

I think there's a danger in relying on reviewers because "I agreed with them in the past." First, many audiophiles read reviews that say, "Speaker A sounds bright." Then they listen to Speaker A, and they agree that it sounds bright. But were they influenced in their judgment by that review? We can't say for sure, but there's a good probability.

Second, suppose we do this in reverse. You listen to Speaker A, and decide it sounds bright. Then you read a review that describes it as bright. So you're in agreement, right? Not necessarily. A 1000-word review probably contains a lot of adjectives, none of which have very precise meanings. So, sure, you can find points of agreement in almost anything, but that doesn't mean your overall impressions are at all in accord with the reviewer's.

Finally, if you're interested in speakers, I highly recommend picking up the latest issue of The Sensible Sound, which includes a brilliant article by David Rich about the state of speaker technology and design. It's a lot more of a science than you think. The article is not available online, but if your newsstand doesn't have it (it's Issue #106) you can order it online at www.sensiblesound.com. Believe me, it is worth it.
Several people here seem to mistake the purpose of DBT. The purpose is not necessarily finding the "best" component, although that may be the case, for instance, in Harman's speaker testing. The point is often simply to see if there is any audible difference whatsoever between components. As Pabelson noted way, way back in this thread, if two systems differ with respect to *any* fancy audiophile qualities (presentation, color, soundstage, etc.) then they will be distinguishable. And if they are distinguishable, that will show up in DBT. Ergo, if two systems are NOT distinguishable with DBT, they do not differ with respect to any fancy audiophilic qualities. (That's modus tollens.)

So, if two amps cannot be distinguished unless you're looking at the faceplates, why buy the more expensive one? Now who finds fault with that reasoning?

It's not a matter of "I like one kind of sound, that other guy likes another kind of sound, so to each his own." If no one can distinguish two components, then our particular tastes in sound are irrelevant. There's just no difference to be had.
Pabelson:

I think we haven't nearly exhausted all of the non-acoustic mechanisms in play, but the ones you mention are certainly among them, and probably more relevant than the ones I mentioned. My general point was that the little bit of psychology I have studied makes me awfully wary of the "objectivity," or context-independence, of my own perceptual judgments of quality.

It's good to hear you still take a lot of joy in the audio hobby. It remains unknown whether you can take *as much* joy as you would if you weren't such a skeptic!
Steve: I wouldn't be quite so dogmatic about the lack of differences, for one reason: Many audiophiles don't level-match when they do comparisons. So there really are differences to hear in that case. Of course, a difference you can erase with a simple tweak of the volume knob isn't one worth paying for, in my opinion.
One more question for Pabelson:

Since you've obviously read a lot more DBT stuff than I have, I'm interested to know: what's your system? (Or, what components do you think match up well against really really expensive ones?)
Leme, I am not at all interested in DBTesting, as I know from personal experience that there are substantial differences between both cables and amps. This is why I would have to say there is real concept invalidity to DBTesting. Furthermore, I really don't care what the results would be, but I suspect that a disproportionate percentage of the time DBTests accept the null hypothesis.

Pabelson, I did not mean to say that I put much stake in what a reviewer may say even were I to have agreed with him in the past.

Bigjoe, certainly you can dismiss DBT if you find it invalid. Science has to be persuasive, not merely orthodox. And as I keep saying, this is not a hypothesis-testing circumstance; it is a personal-preference situation. Science is supposed to be value-free, with personal biases not influencing findings, but taste is free of such limitations and of the need to defend them.
Qualia, you state, "So, if two amps cannot be distinguished unless you're looking at the faceplates, why buy the more expensive one? Now who finds fault with that reasoning?" My point is that a DBT finding of "no difference" is not the same as there being no difference. It is not a valid methodology, as it is at odds with what people hear even when they cannot see the faceplates. Furthermore, I can hear a difference, and my tastes are all that matter. This is not a scientific demonstration.
Tbg:

All of us here are interested in one thing: the truth. If DBT is a fundamentally flawed methodology, its results are no guide to the truth about what sounds good. So if the studies are all flawed, and there are audible differences between amplifiers with virtually the same specs, even if, somehow, no one can detect those differences without looking at the amps, then I'm with you. Likewise, if there isn't anything fundamentally wrong with the studies, and they strongly indicate that certain components are audibly indistinguishable, then you should be with me.

Your own perceptions -- "I can hear a difference and my tastes are all that matters" -- should not trump science any more than your own experiences in general should trump science. I remember seeing ads with athletes saying "Smoking helps me catch my wind." I also recall people saying how smoking made them healthy and live long. Their personal experiences with smoking did not trump the scientific evidence, though. This is just superstition. The Pennsylvania Dutch used to think that if you didn't eat doughnuts on Fastnacht's Day, you'd have a poor crop. Someone had that experience, no doubt. But it was just an accident. Science is supposed to sort accident from true lawful generalization. It's supposed to eliminate bias, as far as possible, in our individual judgments and take us beyond the realm of the anecdote.

Now, if your perception of one component bettering another is blind, then ok. But if you're looking at the amp, then, given what we know about perception, your judgments aren't worth a whole lot.

So... are the studies all flawed? Well, certainly some of the studies are flawed. But, as Pabelson said, the studies all point to the same conclusions. And there are lots of studies, all flawed in different ways. Accident? Probably not.

Compare climate science. There are lots of models of global temperatures over the next hundred years, and they differ from each other by a wide margin (10 degrees). They're all flawed models. But they all agree there's warming. To say that the models are flawed isn't enough to dismiss the science as a whole. Same in psychoacoustics.

Long story short: there's no substitute for wading through all of the studies. I haven't done this, but I've read several, and I didn't see how the minor flaws in methodology could account for no one's being able to distinguish cables, for instance.
[W]hat components do you think match up well against really really expensive ones?

That is a loaded question. I know a guy who wanted to find the cheapest CD player that sounded identical to the highly touted Rega Planet. He went to a bunch of discount stores, bought up a half dozen models, and conducted DBTs with a few buddies. Sure enough, most of the units he chose were indistinguishable from the then-$700 Planet. The cheapest? Nine dollars.

That is not a misprint.

Lest you think he and his friends were deaf and couldn't hear anything, they really did hear a difference between the Planet and a $10 model. At that level, quality is hit-or-miss. But I should think that any DVD player with an old-line Japanese nameplate could hold its own against whatever TAS is hyping this month. If they sound different, it's probably because the expensive one doesn't have flat frequency response (either because the designer intentionally tweaked it, or because he didn't know what he was doing).

Amps are a bit trickier, because you have to consider the load you want to drive. But the vast majority of speaker models out there today are fairly sensitive, and don't drop much below 4 ohms impedance. A bottom-of-the-line receiver from a Denon or an Onkyo could handle a stereo pair like that with ease. (Multichannel systems are a different story. But I once asked a well-known audio journalist what he would buy with $5000. He suggested a 5.1 Paradigm Reference system and a $300 Pioneer receiver. He was not joking.)

There are good reasons to spend more, of course. Myself, I use a Rotel integrated amp and CD player. I hate all the extra buttons on the A/V stuff, and my wife finds their complexity intimidating. Plus, I appreciate simple elegance. I also appreciate good engineering. If I could afford it, I'd get a Benchmark DAC and a couple of powerful monoblocks. But that money is set aside for a new pair of speakers.
One thing about being over 60 is that the style of thought in society has changed, but yours has not. When I was a low-paid assistant professor and wanted ARC equipment for my audio system, I just had to tell myself that I could not afford it, not that it was just hype and fancy faceplates or bells and whistles, and that everyone knows there is no difference among amps, preamps, etc. DBT plays a role here. Since it finds people can hear no differences, and carries the label of "science," it confirms the no-difference hopes of those unable to afford what they want. The current generation's attitudes now result in criticizing other people's buying decisions as "delusional."

I certainly have bought expensive equipment whose sound I hated (Krell) and sold immediately, and other equipment (Cello) that I really liked. I have also bought inexpensive equipment that, despite the "good buy" conclusion in reviews, proved nothing special in my opinion (a Radio Shack personal CD player). There is a very low correlation between cost and performance, but there are few inexpensive components that stand out as good buys (47 Labs). This is not to deny that there are marginal returns for the money you spend, but the logic of being conscious of getting your money's worth really leads only to the cheapest electronics, probably from Radio Shack, as each additional dollar spent above that cost gives you only limited improvement.

DBTesting, in my opinion, is not the meaning of science; it is a method that can be used in testing hypotheses. In drug testing, since the intervention entails giving a drug, the control group would notice that they are getting no intervention and thus could not be benefited. Thus we have the phony pill, the placebo. The science is the controlled, random-assignment pretest/posttest control design and the hypothesis, based on earlier research and observations of data, that the testing is designed to answer.

If we set aside the question of whether audio testing should be dealt with scientifically, probably most people would say that not knowing who made the equipment you hear would exclude your prior expectations about how quality manufacturers' equipment might sound. Simple A/B comparisons of two or even three amps, with someone responsible for setting levels, are not DBT. Listening sessions need to be long enough, and with a broad enough range of music, to allow a well-based judgment. In my experience, this does remove the inevitable bias of those who own one of the pieces and want to confirm the wisdom of their purchase, but more importantly it does result in one amp being fairly broadly confirmed as "best sounding." I would value participation in such comparisons, but I don't know whether I would value reading about them.

I cannot imagine a money making enterprise publishing such comparisons or a broad readership for them. I also cannot imagine manufacturers willingly participating in these. The model here is basically that of Consumers Reports, but with a much heavier taste component. Consumers Reports continues to survive and I subscribe, but it hardly is the basis of many buying decisions.

My bottom line is that DBT is not the definition of science; same/different comparisons are not the definition of DBT; any methodology that overwhelmingly results in the "no difference" finding, despite most people hearing a difference between amps, is clearly a flawed methodology that is not going to convince people; and finally, people do weigh information from tests and reviews in their buying decisions, but they also have their personal biases. No mumbo-jumbo about DBTesting is ever going to remove this bias.
To the doubters of DBT:

Women are fairly recent additions to professional orchestras. For years and years, professional musicians insisted they could hear the difference between male and female performers, and that males sounded better. Women were banished to the audience. The practice ended only after blind listening tests showed that no one could discern the sex of a performer.

Surely, these studies had as many flaws as blind cable comparisons. Probably more, since they involved live performances by individual people, which are inevitably idiosyncratic.

Would the DBT doubters here have been lobbying to keep women out of orchestras even after the tests? Or would they, unlike the professional musicians of the day, never have heard the difference in the first place?
Mankind, believing the Bible, ignored massive bones that kept being discovered. Jefferson charged Lewis and Clark to find out whether such large creatures lived on the Missouri River. Yes, we are all victims of our underlying theories. Darwin explained evolution, and we retheorized where such bones might have come from.

What does this have to do with DBTesting? Nothing.
Study proposal:

I don't know if any studies of the following kind have been done. But if not, then one should be done.

Materials: two sets of cheap cables -- cosmetically different, and a set of expensive cables that look just like the cheap ones.

First experiment(s): subjects are introduced to the two sets of cheap cables and told that one is a very expensive $15K cable, the other a $15 cable. Descriptions of each cable, in lavish audiophile prose, are printed on a glossy tri-fold with nice pictures and given to the subjects. The "expensive" cable is praised to the heavens and the "cheap" cable is described modestly.

Then the cables are used (not blind) alternately, to play back a variety of music. Subjects are then asked to rate their listening experiences, both quantitatively, and also qualitatively.

To eliminate the worry about cosmetic differences in the cheap cables making a difference, you could do the test twice, once with cable A being the "cheap" one, and once with cable B being the "cheap" one.

Second experiment(s): do the first experiment but with one expensive cable and one cheap cable that look the same. Do it first by telling the truth about the cables, but then, in the second case, by telling the subjects that the expensive cable is cheap and the cheap cable is expensive.

Here, nothing is blind. Subjects are all looking at the equipment, and can even observe, from a little distance, the cables being hooked up. But if the DBT guys are right, and it's all hype, we should expect in the first experiment that the introductions to the cables will lead subjects to favor whichever cable happens to be described as the more expensive one, both quantitatively and in their qualitative descriptions, even though the cables are basically identical cheap cables. In the second experiment, we should expect that when subjects are told the true values of the cables, their judgments favor the more expensive one, but also that when lied to, they prefer the cheaper cable *just as much* as they preferred the expensive one.

If DBT proponents are wrong, you should expect that subjects will rate the cheap (identical) cables about the same, and that in the second experiment, they will vastly prefer the expensive cable when truthfully described, and when lied to, either still prefer the expensive cable (contrary to what they're being told) or prefer the cheap one, but only by a little.

The point is, we don't need to have people "blind" to do the tests.

And if the cables were manufactured especially for this purpose, you could do the testing through the mail, with in-home trials over a long period of time. Wonder what the results would be?
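For what it's worth, the expected outcome of the first experiment can be sketched as a tiny simulation. Everything here is hypothetical: the `rate` function, the rating scale, and the size of the expectation bias are invented assumptions for illustration, not data from any real study.

```python
import random
import statistics

random.seed(0)

def rate(true_quality: float, told_expensive: bool, bias: float) -> float:
    """Hypothetical subject rating: perceived quality, plus an
    expectation-bias bump when the cable is described as expensive,
    plus listener-to-listener noise."""
    return true_quality + (bias if told_expensive else 0.0) + random.gauss(0, 1)

# First experiment, sketched: two identical cheap cables (same true
# quality), one labeled expensive. If expectation bias is real
# (bias > 0), the label alone should lift the mean rating.
expensive_label = [rate(5.0, True, bias=1.5) for _ in range(100)]
cheap_label = [rate(5.0, False, bias=1.5) for _ in range(100)]
print(statistics.mean(expensive_label) - statistics.mean(cheap_label))
```

If the bias term were set to zero, the two groups would rate the identical cables about the same, which is the outcome predicted if the DBT proponents are wrong about hype driving sighted judgments.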
Pabelson, I completely agree with you with respect to DBT, but then I completely disagree with you that all CD players and amps that are competently designed sound alike.

This is simply not what I hear, and there are good reasons that amps and CD players sound different. Power supplies for one. Good power supplies cost money. Potentiometers in amps ... good ones cost money.

Inexpensive CD players do sound remarkably good these days, and the turntable-era dictum of source-first is not quite so applicable, but to state that amplifiers are all alike makes me wonder which ones you have had the opportunity to listen to.

No I have not performed DBT on amplifiers, but I have had several occasions where an amplifier that I would have expected to sound excellent (usually on the basis of reviews) sounds markedly inferior to another amplifier that has received much worse reviews, and does so on a range of speakers.
Sean T.: If you believe in DBTs, then you have to believe in the results of DBTs. Some years ago (I can get details if you want), Tom Nousaine delivered a paper at an AES conference in which he summarized the results of about two dozen published DBTs of amplifiers. Of those, only five reported statistically significant positive results. One involved a comparison of 10-watt and 400-watt amps, so clipping distortion was a likely cause. Two others involved a misbiased or oscillating tube amp. One author simply tossed out 25% of his results. And the fifth involved amps with reportedly large frequency response differences.

In other words, amps can sound different, but 1) they usually don't; and 2) when they do, there is a very good and easily measurable explanation. If you can distinguish two amps with flat frequency response and low distortion in a blind test, you will be the first. And most amps today have flat frequency response and low distortion, at least when they are not driven beyond their capabilities.
Pabelson, you say, "If you can distinguish two amps with flat frequency response and low distortion in a blind test, you will be the first." This means one of two things: there is no differences among amps or DBTesting does not allow humans to judge the differences. To accept the formerr means that quality parts, innovative power supplies, careful construction, and generally good design contributes little or nothing and that humans are hopelessly delusional.

As I have posted, I very much suspect the methodology is invalid. In the research that I do, I cannot imagine peers accepting a methodology that so often accepts the null hypothesis that nothing matters. Since my research so often suggests that states enacting seatbelt laws, a .08 blood-alcohol standard for intoxication, higher per-capita education spending to compete with other states, or concealed-handgun permits all have no effect on the problem to which they are directed, I know the wrath directed at my methodology, which unfortunately cannot include an experiment where half of the states, randomly drawn, have a law or action and the other half do not. There, many want to accept that governmental actions matter. In audio, many want to accept that amps don't matter. I think other methodologies should be used to assess both prior convictions.
"If you can distinguish two amps with flat frequency response and low distortion in a blind test, you will be the first." This means one of two things: there is no differences among amps or DBTesting does not allow humans to judge the differences.

No, TBG, it only means that the differences are not sufficient to be audible by human ears. Read the data, or supply your own. As of now, your only argument seems to be, "I don't believe it, so it can't be true."

As for DBT methodology, it is accepted by everyone in the field of perceptual psychology, in part because it gets plenty of positive results. It just doesn't always get them in the narrow category of high-end audio, because high-end audio has more than its share of snake oil.

Finally, there's a difference between a "delusion" and an "illusion." Look it up.
Btw, Tbg, like your former self, I am a modestly paid Assistant Prof. who would dearly love to subsist on cheap electronics. Right on the mark!

What do you think of my suggested experiment? Not blind, but with deception alternated with truth-telling about the values of the cables played?

And Pabelson: do you know if an experiment like this has been performed? I have grad student friends in psych who could do it pretty easily. But there's no point if it's already been done.
Qualia: It reminds me of a trick John Dunlavy used to play on visitors to his speaker factory. He would show them an expensive cable (maybe even his own!) and zipcord, and let them audition both. They'd rave about the pricey one, of course. What he wouldn't tell them is that he never changed the cable. They were listening to zipcord the whole time.

One possible weakness of your experiment is that it assumes we know what it is that's tricking us--the price, the looks, etc. But it could be anything (the brand name, perhaps). Also, the value of a perception experiment is somewhat compromised when you intentionally mislead the subject.

There's a much easier way to get over the blindness objection, or at least most of it. In a standard ABX test, you can actually see both cables, and you know which one is A and which one is B. The only thing that's "blind" is the identity of X. Why someone with good ears can't ace this, if the differences are so obvious, is beyond me.

Let me rephrase that: People with good ears CAN ace it--when there's a difference large enough to be heard.
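As an aside, the arithmetic behind "acing" an ABX test is just the binomial distribution: if there is no audible difference, each trial is a coin flip. This sketch (the function name is mine, not from any actual ABX software) computes the chance of a given score arising from pure guessing:

```python
import math

def abx_p_value(correct: int, trials: int) -> float:
    """One-sided binomial p-value: the probability of getting at least
    `correct` answers right out of `trials` by guessing (p = 0.5)."""
    return sum(math.comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

# In a 16-trial ABX session, guessing produces 12 or more correct
# answers only rarely, so that score is significant at the usual
# 0.05 level; 11 correct is not.
print(abx_p_value(12, 16))  # ~0.038
print(abx_p_value(11, 16))  # ~0.105
```

So a listener who really hears a difference needs roughly 12 of 16 correct before the result counts as more than chance under the conventional threshold.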
Sorry, Pabelson, I don't think an appeal to the acceptance of a method used in perceptual psychology demonstrates no differences. When there is controversy over a finding, which demonstrably there is, something other than same/different DBTesting would be needed unless those of you persisting in advocating DBT wish to continue to be ignored. I am afraid your argument that DBT proves humans cannot hear the minor differences runs counter to most people's experiences. As I said before, buying decisions don't hinge on scientific proof, and it is an interesting question why some seem so committed to the belief that audio is all snake oil. Perhaps a psychologist should look into that phenomenon.
I am afraid your argument that DBT proves humans cannot hear the minor differences runs counter to most people's experiences.

Granted, but why should we assume that the scientists are all wrong and people's observations are right? Surely you know that our perceptions can fool us. Think of optical illusions. Well, there are also such things as aural illusions. One of the most basic is this: When you hear two sounds, you often think they are different, even when they are exactly the same.

So when you say, I hear a difference between this cable and that one, is there a real difference, or is it just an aural illusion? We don't know. That's why scientists developed the forced-choice DBT--because it usefully separates reality from illusion. The only controversy here comes from people who don't want to look at the evidence.
DBT assumes that we have to justify our purchases to others, as in science; we do not have to do so.
Actually that's an interesting take.
Yet, there's a lot of emotion and correspondingly little logic in the vehement assertions contained in many posts.
Amazing, isn't it. We WANT others to bless our choices after all -- AND, if it's an EE (i.e. scientist) so much the better: science is irrefutable:)
Gregm, I think you are absolutely right that too many of us want others to bless our choices, be it for wine, women, or audio. I stopped going to audio society meetings in New York because too many of the conversations were "mine is bigger than yours" conversations. Having discussion groups on the internet is no different.

My objection to those advocating DBTesting is that they want to use a questionable methodology to say in effect "mine is every bit as good as yours and I paid less." Science does not condone their saying this and I don't really care whether it does or not.

Pabelson, you say, "DBT--because it usefully separates reality from illusion." My only real question is whether the "reality" is a false reality. One that we don't hear when listening. This is why I suggest it is invalid and does not merit acceptance of the findings.
Where is your evidence? Perhaps it holds only by your definition, which is not widely shared.
My evidence for what? That illusion is a false reality? You need evidence for that? Look in a dictionary.
I sent the following letter to Stereophile:

Those of us who have been audiophiles for a long time (20+ years) have chronicled the progress of audio components. We have gone down the wrong road too many times to count, either led there by others or led by our own ignorance and prejudice. We need to remember that while for the consumer this is a hobby, for the producers it is a business. Producers must make a profit or die. The road to hi-fi perfection is littered with excellent products whose producers did not pay attention to normal business practices.
Audio producers have to fight for market share like anyone else. The best way to get market share is through aggressive advertising. Build a better mousetrap and they will come to you. Design a slick ad campaign and they will also come to you. There is a problem, however: the audio reviewer. Done correctly, reviewing seeks to pick the best mousetrap and debunk the advertising myths. Like the manufacturers, the magazine is also a business, and it must make a profit or die. Even worse, its profits come primarily from the very producers it seeks to evaluate. A canceled subscription hardly competes with a canceled ad. This puts an ethical strain on the most principled reviewer.
Audiophiles aren't stupid. This is a hobby. We are not just interested in good music. We like our components to come in beautiful packages, exclusivity, etc. Just because we purchased something unnecessary does not mean we were tricked. I am sure the piano-black finish on my turntable has nothing to do with the sound. Does that mean I was tricked?
From a consumer standpoint, if a manufacturer claims that his product sounds better or different, it is the reviewer's job to evaluate the manufacturer's claim.
Most reviewers want the manufacturer's claim to be true. The reasons are obvious: they recommend a better product to their readers, the state of the art is advanced, and the manufacturer can buy ads. Negative reviews save their readers money, nudge producers in the right direction, and establish their credibility.
Ironically, everyone can't be right. Being right or wrong has serious financial consequences for all involved. Reviewers have been wrong. Manufacturers have been wrong, and sadly some have tried to rig the process. More often than not, mistakes are based on ignorance and prejudice. Ignorance must be cured by the ignorant, and corruption should be prosecuted. Maybe we can do something about prejudice?

If we did not know what product is being tested, we could at least eliminate our personal prejudice. There is nothing wrong with double-blind testing (DBT) per se, but the proponents of DBT bring their own prejudices to the table. They want to engage in very short tests conducted by the uninitiated. Most proponents of DBT use it to try to prove what they have already concluded, e.g., that cables and amps all sound the same, and that expensive products are just a rip-off. How about a DBT between vinyl and digital? Or electrostatic and dynamic speakers, or tubes and solid state?
The opponents of DBT are also somewhat disingenuous: "I do not need DBT because I am not prejudiced." It is the nature of prejudice not to be aware of it. The person who is prejudiced just thinks he is right.
The design of components is a mixture of art and science. The reviewer's job is almost all art. Some things just don't lend themselves to scientific testing. Could you have a DBT of who is the most beautiful woman, or of which piece of music is the most soothing?
Alas, DBT does not even approach the real question. What difference does it make to me whether A and B sound different or the same? My point is which one more closely approximates the illusion of real music for me. Hasn't that always been the goal for audiophiles?
I have always stated that DBT is more a test of sonic memory: the better your memory, the better you will test.
If you read this carefully, you are going to be surprised by my conclusion. Everyone who is involved in audio design or review should from time to time engage in some sort of DBT! Your goal should be to determine which product sounds more like music. You may discover biases you did not know you had. Take your time. Make your DBT as much like your regular evaluation process as you can. I think you will benefit from it.
Reginald G. Addison
rgregadd@aol.com
Forestville, Maryland