Optimization of Data Access from Imperative Programs using Static Analysis

PhD Thesis: IIT Bombay |

Database applications are typically written using a mixture of imperative languages and embedded queries (expressed using SQL/other frameworks) for data access. Traditionally, the imperative and declarative parts of these applications have been optimized separately. Techniques for optimization that span across the declarative and imperative parts in database applications are called holistic optimization techniques.

In this thesis, we present novel holistic optimization techniques for optimizing data access in applications that access data using database abstractions. In such applications, data processing logic often gets distributed across the declarative and imperative parts of a program. Consequently,  data access from the application is often inefficient due to iterative queries, transfer of unused data, over-specification of queries, and other reasons. We propose techniques based on static program analysis and program transformations to automatically rewrite database applications for efficient data access. When more than one transformations are applicable on a particular program, our techniques can systematically generate all equivalent alternatives using the given set of program transformations, represent these alternatives efficiently, and choose the best rewrite based on a cost model. We also improve upon existing techniques for optimizing the evaluation of user defined functions in databases, which, similar to database applications, contain a mix of imperative code and declarative SQL queries. We demonstrate that the static analysis techniques we develop can be applied to other optimizations as well as test data generation for queries in database applications. Our experiments on real world and benchmark applications show that our techniques have wide applicability and provide significant performance benefits.