ABSTRACT: This paper presents a new modulo scheduling algorithm that carefully considers register pressure. Modulo scheduling is a commonly used technique for software pipelining, which overlaps the execution of multiple iterations of a loop. Our work targets improving instruction schedules on clustered architectures, in which processor resources are divided into groups (clusters). This paper describes an improved Data Dependence Graph (DDG) partitioning algorithm and presents a set of effective algorithms for scheduling the spill code caused by excess register pressure. We expand the traditional definition of register pressure to include the impact that scheduling can have on intercluster register bus traffic. We describe how we estimate pressure during graph refinement and present a greedy spill code scheduling algorithm. We evaluate the benefits of these techniques when running SPECfp applications on a clustered VLIW architecture. Finally, we assess the benefit of code duplication as a way to relieve intercluster register bus pressure.
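
To make the notion of register pressure in a modulo schedule concrete, the following is a minimal sketch of the traditional measure only (often called MaxLive): the peak number of values simultaneously live in any slot of the software-pipelined kernel. It is not the expanded definition or the estimation procedure described in this paper. The function name `max_live`, the half-open lifetime convention, and the example lifetimes are all assumptions chosen for illustration.

```python
from typing import List, Tuple


def max_live(lifetimes: List[Tuple[int, int]], ii: int) -> int:
    """Peak number of simultaneously live values in a modulo-scheduled kernel.

    lifetimes: hypothetical (def_cycle, last_use_cycle) pairs taken from the
               flat (unwrapped) schedule of one iteration; a value is assumed
               live on the half-open interval [def_cycle, last_use_cycle).
    ii:        initiation interval, i.e. the number of cycles between the
               starts of consecutive iterations.
    """
    if ii <= 0:
        raise ValueError("initiation interval must be positive")

    # live[s] counts the values live in kernel slot s once iterations overlap.
    live = [0] * ii
    for start, end in lifetimes:
        # A lifetime longer than ii wraps around the kernel and therefore
        # occupies more than one register in the slots it revisits.
        for cycle in range(start, end):
            live[cycle % ii] += 1
    return max(live)


# Illustrative lifetimes (made up, not taken from any benchmark).
example = [(0, 3), (1, 2), (2, 5)]
print(max_live(example, ii=2))  # 4
print(max_live(example, ii=4))  # 2
```

In this sketch the same three lifetimes need four registers at an initiation interval of 2 but only two at an initiation interval of 4, which illustrates why tighter schedules tend to raise register pressure and can trigger spill code.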