Optimise data cache clean/invalidate operation
The data cache clean and invalidate operations dcsw_op_all()
and dcsw_op_loius() were implemented to invoke a DSB and ISB
barrier for every set/way operation. This adds a substantial
performance penalty to an already expensive operation.
These functions have been reworked to provide an optimised
implementation derived from the code in section D3.4 of the
ARMv8 ARM. The helper macro setup_dcsw_op_args has been moved
and reworked alongside the implementation.
Fixes ARM-software/tf-issues#146
Change-Id: Icd5df57816a83f0a842fce935320a369f7465c7f