In bset_search_tree(), when p >= t->size, t->tree[0] is prefetched by
the following code:

	unsigned int p = n << 4;

	p &= ((int) (p - t->size)) >> 31;

	prefetch(&t->tree[p]);

The purpose of the above code is to avoid a branch instruction, but
when p >= t->size, prefetch(&t->tree[0]) contributes nothing to
performance. This patch avoids the unnecessary prefetch by calling
prefetch() only when p < t->size.
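
For illustration only, the two forms can be compared in a small
user-space sketch. The helper names masked_index(), should_prefetch()
and check_equivalence() are hypothetical and are not part of bcache.
The sketch assumes a 32-bit int and an arithmetic right shift of
negative values, which ISO C leaves implementation-defined.

```c
#include <assert.h>

/*
 * Old form: fold an out-of-range index to 0 with a sign mask.
 * Assumes 32-bit int and arithmetic right shift of negative
 * values (implementation-defined in ISO C).
 */
static unsigned int masked_index(unsigned int p, unsigned int size)
{
	p &= ((int)(p - size)) >> 31;	/* mask is ~0 if p < size, else 0 */
	return p;
}

/* New form: nonzero only when prefetching tree[p] is worthwhile. */
static int should_prefetch(unsigned int p, unsigned int size)
{
	return p < size;
}

/* Hypothetical self-check over small values; not part of the patch. */
static void check_equivalence(unsigned int size)
{
	unsigned int p;

	for (p = 0; p < 4 * size; p++) {
		if (should_prefetch(p, size))
			assert(masked_index(p, size) == p);
		else
			assert(masked_index(p, size) == 0);
	}
}
```

For small sizes, masked_index() returns p exactly when
should_prefetch() is true and returns 0 otherwise, so the only
behavioural difference of the new form is that the prefetch of
tree[0] is skipped.
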
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
unsigned int inorder, j, n = 1;
do {
- /*
- * A bit trick here.
- * If p < t->size, (int)(p - t->size) is a minus value and
- * the most significant bit is set, right shifting 31 bits
- * gets 1. If p >= t->size, the most significant bit is
- * not set, right shifting 31 bits gets 0.
- * So the following 2 lines equals to
- * if (p >= t->size)
- * p = 0;
- * but a branch instruction is avoided.
- */
unsigned int p = n << 4;
- p &= ((int) (p - t->size)) >> 31;
-
- prefetch(&t->tree[p]);
+ if (p < t->size)
+ prefetch(&t->tree[p]);
j = n;
f = &t->tree[j];