Instruction Scheduling for a Tiled Dataflow Architecture

  • Martha Mercaldi
  • Steven Swanson
  • Andrew Petersen
  • Andrew Schwerin
  • Mark Oskin
  • Susan J. Eggers

Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems

Published by ACM

Publication

This paper explores hierarchical instruction scheduling for a tiled processor. Our results show that at the top level of the hierarchy, a simple profile-driven algorithm effectively minimizes operand latency. After this schedule has been partitioned into large sections, the bottom-level algorithm must more carefully analyze program structure when producing the final schedule.

Our analysis reveals that at this bottom level, good scheduling depends upon carefully balancing instruction contention for processing elements and operand latency between producer and consumer instructions. We develop a parameterizable instruction scheduler that more effectively optimizes this trade-off. We use this scheduler to determine the contention-latency sweet spot that generates the best instruction schedule for each application. To avoid this application-specific tuning, we also determine the parameters that produce the best performance across all applications. The result is a contention-latency setting that generates instruction schedules for all applications in our workload that come within 17% of the best schedule for each.
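The contention-latency trade-off described in the abstract can be illustrated with a toy placement routine. This is a minimal sketch, not the paper's scheduler: the function name `schedule`, the single `weight` parameter, the Manhattan-distance latency model, and the greedy placement order are all assumptions made for illustration.

```python
from itertools import product


def manhattan(a, b):
    """Hop count between two processing elements on a 2D grid (assumed latency model)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def schedule(instrs, deps, grid, weight):
    """Greedily place instructions on a grid of processing elements (PEs).

    instrs: instruction names, listed so producers precede consumers
    deps:   dict mapping an instruction to the list of its producers
    grid:   (rows, cols) of the hypothetical PE array
    weight: 0.0 considers operand latency only, 1.0 considers contention only
    """
    pes = list(product(range(grid[0]), range(grid[1])))
    load = {pe: 0 for pe in pes}  # instructions already assigned to each PE
    placement = {}
    for ins in instrs:
        def cost(pe):
            # Latency term: total distance to the PEs holding this instruction's producers.
            latency = sum(manhattan(pe, placement[p]) for p in deps.get(ins, []))
            # Contention term: how many instructions already share this PE.
            return weight * load[pe] + (1.0 - weight) * latency

        best = min(pes, key=cost)  # ties go to the first PE in row-major order
        placement[ins] = best
        load[best] += 1
    return placement


# A small diamond-shaped dataflow graph: a feeds b and c, which both feed d.
instrs = ["a", "b", "c", "d"]
deps = {"b": ["a"], "c": ["a"], "d": ["b", "c"]}

latency_only = schedule(instrs, deps, (2, 2), 0.0)
contention_only = schedule(instrs, deps, (2, 2), 1.0)
print(latency_only)
print(contention_only)
```

In this toy model, a weight of 0.0 stacks every instruction on a single PE (zero operand distance, maximum contention), while a weight of 1.0 spreads the four instructions over four PEs regardless of where their producers sit. Sweeping `weight` between those extremes is the kind of per-application tuning the abstract refers to, under the simplifying assumptions stated above.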