An efficient and experimentally tuned software-based hardening strategy for matrix multiplication on GPUs