This project seeks to improve the compositional power of the Transformer model by leveraging key theoretical properties of Tensor Product Representations (TPRs). Several new diagnostic datasets for systematicity and compositionality are created, and a generalized Transformer architecture is developed that can implement popular Transformer variants along with various ways of leveraging TPRs.
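To make the underlying mechanism concrete, the following is a minimal sketch of the core TPR operation: each filler (symbol) vector is bound to a role (position) vector via an outer product, and the structure is the sum of those bindings. All dimensions and variable names here are illustrative assumptions, not the project's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

d_filler, d_role = 4, 3
fillers = rng.normal(size=(2, d_filler))  # two filler vectors, e.g. symbols "A", "B"
roles = np.eye(d_role)[:2]                # orthonormal role vectors for two positions

# Binding: T = sum_i f_i (outer) r_i, a d_filler x d_role matrix
T = sum(np.outer(f, r) for f, r in zip(fillers, roles))

# Unbinding: with orthonormal roles, T @ r_i recovers filler f_i exactly
recovered = T @ roles[0]
assert np.allclose(recovered, fillers[0])
```

With orthonormal roles, unbinding is exact; with merely linearly independent roles, unbinding uses the dual (pseudo-inverse) role vectors instead.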