I agree with Elizabeth and would take it one step further. Leave A in for a week or a month, then replace it with B. During that time, limit your listening to music you are intimately familiar with and that best represents the range of your musical preferences, with some emphasis on material that is demanding of the system. Otherwise A/B is useful only for knee-jerk evaluations of spot differences. How those differences actually play out in the long-run enjoyment of your system is not fully understood in a quick A/B comparison, IMO, though some really blatant differences might make it easy to eliminate some components from consideration entirely.
As far as matching levels, the original question, you can also get level meter apps for an iPhone, and probably for other similar devices, that would be useful for such a simple task. I think the iPhone app that I have is by Studio Six Digital. One other note: it would be useful to have a recording of white noise (or pink noise) to use with the meter in order to match the level. It will be easier to use that than a recording of music, whose level is probably changing rapidly. You can probably download something by doing a Google search.
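If you would rather make the test signal than download one, something along these lines should do it. This is only a rough sketch using Python's standard library: the function names, the 44.1 kHz sample rate, and the quarter-of-full-scale peak are my own assumptions, and it produces plain white noise, not pink.

```python
import math
import random
import struct
import wave


def white_noise_samples(seconds, rate=44100, peak=0.25, seed=0):
    """Return 16-bit white-noise samples, uniform within +/- peak of full scale.

    peak=0.25 is an arbitrary, conservative choice to keep the signal
    well below clipping; the fixed seed makes the file repeatable.
    """
    rng = random.Random(seed)
    count = int(seconds * rate)
    return [int(rng.uniform(-peak, peak) * 32767) for _ in range(count)]


def rms_dbfs(samples):
    """RMS level of the samples in dB relative to 16-bit full scale."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 20 * math.log10(math.sqrt(mean_square) / 32767)


def write_wav(path, samples, rate=44100):
    """Write the samples as a mono, 16-bit, little-endian PCM WAV file."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(rate)
        wav.writeframes(struct.pack("<%dh" % len(samples), *samples))
```

Calling write_wav("white_noise.wav", white_noise_samples(30)) would give you a 30-second file to play through A, note the meter reading, then play through B and adjust the volume until the reading matches. The rms_dbfs number only describes the level inside the file; the meter at your listening position is still what you match to.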