* Remove the data synchronization barrier from the inner loop
* Instead, add a system-wide barrier at the end of the operation

Fix #4269