De(con)struction of the Lazy-F Loop: Improving Performance of Smith Waterman Alignment
The striped variant of the Smith-Waterman (SW) algorithm is known to be extremely efficient and easily adaptable to SIMD architectures. However, its potential for improvement has not yet been exhausted. The popular Lazy-F loop heuristic requires additional memory access operations, and in the worst case the loop can perform as badly as the nonvectorized version. We present a progression of Lazy-F loop transformations that improve the loop's performance and ultimately eliminate the loop completely. Our algorithm achieves the best asymptotic performance of all scan-based SW algorithms, O(n/p + log(p)), and is very efficient in practice.
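To make the term "scan-based" concrete, here is a minimal sketch of how the vertical-gap score F within one column can be computed either by the usual sequential recurrence or by a logarithmic-step prefix scan. This is a generic illustration, not the algorithm developed in this paper. The function names, the affine gap parameters `gap_open` and `gap_extend`, and the input `h` (column scores assumed to be computed without the vertical-gap contribution) are all assumptions introduced for the example.

```python
NEG_INF = float("-inf")


def f_sequential(h, gap_open, gap_extend):
    """Sequential recurrence: F[i] = max(F[i-1] - gap_extend, h[i-1] - gap_open)."""
    f = [NEG_INF] * len(h)  # F[0] has no cell above it
    for i in range(1, len(h)):
        f[i] = max(f[i - 1] - gap_extend, h[i - 1] - gap_open)
    return f


def f_scan(h, gap_open, gap_extend):
    """Same values via a log-step (Hillis-Steele style) max-plus prefix scan."""
    n = len(h)
    # Seed each row with the gap opened from the cell directly above it.
    f = [NEG_INF] + [h[i - 1] - gap_open for i in range(1, n)]
    shift = 1
    while shift < n:
        # Combine every element with the one `shift` rows above, charging
        # `shift` gap extensions. Within a pass the updates are independent,
        # which is what makes this form amenable to SIMD execution.
        f = [
            max(f[i], f[i - shift] - shift * gap_extend) if i >= shift else f[i]
            for i in range(n)
        ]
        shift *= 2
    return f


if __name__ == "__main__":
    h = [0, 5, 2, 9, 1, 0, 7, 3]  # hypothetical column scores
    print(f_sequential(h, gap_open=3, gap_extend=1))
    print(f_scan(h, gap_open=3, gap_extend=1))
```

The sequential version needs one dependent step per row, whereas the scan needs about log2 of the column length passes, each of which touches all rows independently. A logarithmic term of this kind is what one would expect to appear in a bound such as O(n/p + log(p)), though how n and p map onto the actual implementation is defined by the paper itself, not by this sketch.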