Depending on the gcc version we can use builtin_bswap instead of
architecture functions. Doing so is better than the inline assembly
version of load reverse for two reasons:
- the sequence of load reversed, apply constant mask, save reversed can
be optimized to load, apply reversed mask, save
- builtins are slightly better to optimize e.g. gcc instruction
scheduler cannot optimize grouping on inline assemblies.
To enable set we have to ARCH_USE_BUILTIN_BSWAP.
bloat-o-meter results:
add/remove: 1/1 grow/shrink: 75/533 up/down: 1711/-9394 (-7683)
Suggested-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
select ARCH_SAVE_PAGE_KEYS if HIBERNATION
select ARCH_SUPPORTS_ATOMIC_RMW
select ARCH_SUPPORTS_NUMA_BALANCING
+ select ARCH_USE_BUILTIN_BSWAP
select ARCH_USE_CMPXCHG_LOCKREF
select ARCH_WANTS_PROT_NUMA_PROT_NONE
select ARCH_WANT_IPC_PARSE_VERSION