- use the generic 'crt0.s' for Linux
- move the read-only '__dso_handle' definition into the '.text' section
- move the '__initial_sp' definition into the '.bss' section
- remove the '_main_utcb' definition
Part of #766.
Use multiple load store instructions for 32 byte chunks in ARM-specific
blit-function, analog to x86 variant. Make the blit-function of x86 a
generic one, and provide needed utility functions for ARM and generic code.
Please refer issue #147 for discussion.