blk-mq: use sparser tag layout for lower queue depth
For best performance, spreading tags over multiple cachelines
makes the tagging more efficient on multicore systems. But since
we have 8 * sizeof(unsigned long) tags per cacheline, we don't
always get a nice spread.
Attempt to spread the tags over at least 4 cachelines, using fewer
number of bits per unsigned long if we have to. This improves
tagging performance in setups with 32-128 tags. For higher depths,
the spread is the same as before (BITS_PER_LONG tags per cacheline).
Signed-off-by: Jens Axboe <axboe@fb.com>