Architectural Study of Reconfigurable Photonic Networks-on-Chip for Multi-Core Processors

C. Debaes,* I. Artundo,* W. Heirman,† M. Loperena,* J. Van Campenhout,† H. Thienpont*

*Dept. of Applied Physics and Photonics
Vrije Universiteit Brussel, Belgium
christof.debaes@vub.ac.be
†Dept. of Electronics and Information Systems
Ghent University, Belgium
wim.heirman@ugent.be

Abstract—Photonic Networks-on-Chip (NoCs) have become a promising route to interconnect the processor cores of chip multiprocessors (CMPs) in a power-efficient way. Although several photonic NoC proposals exist, their use is limited to the communication of large data messages because of the relatively long set-up time of the photonic channels. In this work, we evaluate a reconfigurable photonic NoC in which the topology is adapted automatically to the evolving traffic situation. This way, long photonic channel set-up times can be tolerated, which makes our approach better suited to shared-memory CMPs.

Index Terms—Multiprocessor interconnection, Optical communication, Optical interconnections, Reconfigurable architectures, Photonic switching systems, Parallel architectures

I. INTRODUCTION

With the emergence of highly multicore Chip Multiprocessors (CMPs), high-speed, power-efficient on-chip interconnection networks have become vital. In this domain, the NoC paradigm plays an essential role. However, due to the unrelenting increase in required throughput and number of cores, the links of those networks are starting to stretch the capabilities of electrical wires.

Recent advances in silicon photonics technology seem to indicate that optical interconnects might bring the required performance boost [1]. However, merely replacing the electrical links with their photonic counterparts will not bring the promised power savings, due to the many optoelectronic conversions this would imply. Currently, nothing indicates that viable solutions will be developed for optical logic gates or delay lines, such that routing in the optical domain will remain impossible for many years to come.

Nevertheless, novel devices such as microring resonators [2] and other wavelength-dependent structures allow the optical signal path to be altered in a circuit-switched manner. In many NoC proposals [3], the optical channel is set up by sending a control message over a lower-speed electrical network to reconfigure all photonic switches. While the actual switching of the optical components can nowadays be done in a mere 30 ps [4], the latency of setting up this optical channel will be at least one round-trip time of a control message on the lower-speed electrical NoC. This means that such a photonic NoC [5] will only be beneficial when communicating large chunks of data (several KiB) between processor cores. As the majority of traffic in shared-memory processors consists of short memory and coherence messages (with a size of about one cache line, usually only 64 bytes), the promised benefits of low-power photonic NoCs will fail to materialize under the unchanged shared-memory model of current CMPs.
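The amortization argument above can be illustrated with a short calculation. The sketch below is only an illustration: it uses the 40 Gbps photonic link rate assumed in Section III, while the 50 ns set-up latency is a hypothetical placeholder for the control-message round trip, not a figure from this work.

```python
def serialization_time_ns(message_bytes: int, link_gbps: float) -> float:
    """Time needed to push a message onto a link, in nanoseconds."""
    # bits / (Gbit/s) yields nanoseconds directly
    return message_bytes * 8 / link_gbps


def channel_efficiency(message_bytes: int, link_gbps: float, setup_ns: float) -> float:
    """Fraction of the channel holding time spent transferring payload."""
    transfer_ns = serialization_time_ns(message_bytes, link_gbps)
    return transfer_ns / (transfer_ns + setup_ns)


if __name__ == "__main__":
    HYPOTHETICAL_SETUP_NS = 50.0  # assumed set-up latency, for illustration only
    for size in (64, 4096):       # one cache line versus a 4 KiB block
        eff = channel_efficiency(size, 40.0, HYPOTHETICAL_SETUP_NS)
        print(f"{size:5d} B: efficiency {eff:.2f}")
```

Under these assumptions, a 64-byte cache line occupies the link for only 12.8 ns, so most of the channel holding time is set-up overhead, whereas a transfer of several KiB amortizes the set-up almost completely.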

II. RECONFIGURABLE PHOTONIC NOC

It is known that memory references exhibit locality in space and time. As such, the numerous packets flowing through the NoC will seemingly organize in intensive traffic bursts between communicating pairs. Detailed simulations have shown that those burst patterns exist in a wide range of time scales, and can be up to several milliseconds in length [6].

This observation led to the idea of a photonic NoC in which the optical paths serve as ‘shortcuts’ to boost the performance of an underlying base network [7]. Those direct, reconfigurable connections improve the performance of the accompanying electrical NoC in two ways. First, they decrease congestion by providing temporary high-throughput data channels where needed, and second, they provide low-latency direct links between the most intensively communicating partners.

These proposed photonic links could rely on the same technology as the photonic NoC proposed by Petracca et al. [3], which is based on an array of non-blocking 4×4 microring switches. By changing the state of the switches, the topology of the interconnect can be altered. However, in contrast to [3], our approach does not set up a dedicated channel for each packet, but ‘slowly’ reconfigures the topology in accordance with emerging hot-spots.

For the allocation of the photonic shortcuts, a heuristic is used that tries to provide a direct link for most of the network traffic that is expected during the span of a ‘reconfiguration interval’ (\(T_{\text{reconf}}\)). After each interval, a new optimum topology is computed using the traffic pattern measured in the previous interval. \(T_{\text{reconf}}\) must be short enough to follow the dynamics of the evolving traffic patterns, but long enough to amortize the cost of calculating the optimized topologies and of link downtime during reconfiguration. In our case, a \(T_{\text{reconf}}\) of 1 µs turned out to be a good compromise.
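The allocation heuristic itself is not detailed here. As a rough illustration of the idea, the sketch below uses a simple greedy stand-in: it ranks node pairs by the traffic measured in the previous interval and assigns shortcuts to the busiest pairs. The function name `choose_shortcuts`, the bidirectional treatment of links and the per-node limit `max_per_node` are all assumptions made for this example.

```python
from collections import Counter


def choose_shortcuts(traffic, max_links, max_per_node=1):
    """Greedily pick node pairs for shortcuts (illustrative stand-in).

    traffic: dict mapping (src, dst) -> volume measured in the previous interval.
    Returns a list of (a, b) pairs with a < b, busiest pair first.
    """
    # Assumption: a shortcut serves both directions, so merge (a, b) and (b, a).
    demand = Counter()
    for (src, dst), volume in traffic.items():
        if src != dst:
            demand[frozenset((src, dst))] += volume

    chosen = []
    links_at_node = Counter()
    # Highest demand first; ties are broken by node ids for determinism.
    ranked = sorted(demand.items(), key=lambda kv: (-kv[1], sorted(kv[0])))
    for pair, _volume in ranked:
        if len(chosen) == max_links:
            break
        a, b = sorted(pair)
        # Assumed hardware limit on shortcut endpoints per node.
        if links_at_node[a] < max_per_node and links_at_node[b] < max_per_node:
            chosen.append((a, b))
            links_at_node[a] += 1
            links_at_node[b] += 1
    return chosen


if __name__ == "__main__":
    measured = {(0, 5): 900, (5, 0): 300, (0, 1): 1000, (2, 3): 500, (1, 2): 50}
    print(choose_shortcuts(measured, max_links=2))
```

In a real system this computation would run once per \(T_{\text{reconf}}\), which is why its cost has to be amortized over the interval.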

III. FULL SYSTEM SIMULATIONS

To validate our proposed architecture, we performed full-system simulations of a multicore processor running actual benchmark applications (i.e. SPLASH-2). The traffic sent over the simulated interconnection network is therefore highly realistic. Our simulation platform is based on the commercially available Simics simulator [8]. It was configured to simulate a chip multiprocessor somewhat similar to an UltraSPARC T2 (or Niagara2) processor. To emulate the Chip Multi-threading (CMT) capabilities, we modeled each core by a group of four UltraSPARC III processors (all assumed to be clocked at 2.5 GHz). The simulated system consisted of 16 such cores, interconnected by a 4×4 torus network augmented by a reconfigurable photonic NoC as described above. The assumed throughputs of the electrical and photonic links are 10 Gbps and 40 Gbps, respectively. Cache coherence in the system is maintained by directory-based coherence controllers at each core. Both the coherence controllers and the network are custom extensions to the Simics environment [6].
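As a point of reference for the base network, the following sketch computes hop distances on a 4×4 torus without any shortcuts. It assumes minimal routing and uniform traffic between all distinct node pairs, which is a simplification: the simulations described here use measured, non-uniform benchmark traffic.

```python
from itertools import product


def torus_hops(a, b, size=4):
    """Minimal hop count between nodes a and b on a size x size torus."""
    hops = 0
    for x, y in zip(a, b):
        d = abs(x - y)
        hops += min(d, size - d)  # wrap-around links shorten long paths
    return hops


def average_hop_distance(size=4):
    """Mean hop count over all ordered pairs of distinct nodes."""
    nodes = list(product(range(size), repeat=2))
    total = sum(torus_hops(a, b, size) for a in nodes for b in nodes if a != b)
    return total / (len(nodes) * (len(nodes) - 1))


if __name__ == "__main__":
    print(f"average hops on a 4x4 torus: {average_hop_distance(4):.3f}")
```

Under the uniform-traffic assumption the base torus averages roughly 2.13 hops per message; a photonic shortcut replaces such a multi-hop path by a single direct link for the pairs it serves.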

For evaluation, we compared the proposed solution with three standard NoCs (a 10 Gbps electrical NoC, a 40 Gbps electrical NoC and a 40 Gbps photonic NoC). Table I lists the simulated average memory access latency (\(T_{\text{mem}}\)) for all four architectures. This number, which is one of the best performance metrics for an interprocessor network [6], is reduced by about 35% for the proposed architecture in comparison with a standard 10 Gbps NoC. We can furthermore note that the average hop distance is significantly reduced and that about half of the total traffic is routed through the photonic shortcuts.

Fig. 1 shows the estimated power consumption of the interprocessor communication. We used the same parameters as cited in [9], i.e. 0.83 pJ/bit for routing through a hop and 0.57 pJ/bit for the electrical links, and we assumed 0.5 pJ/bit for the photonic channels. Additionally, static power was accounted for in each link: 500 µW for both 10 Gbps electrical and 40 Gbps photonic links, and 2 mW for the 40 Gbps electrical links. As only moderate switching speeds are necessary for the microring resonators, an ‘ON’ power of just 500 µW per ring was included in the model.
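The per-bit and static figures quoted above can be combined into a simple energy estimate. The sketch below is an illustrative model only: how router traversals are counted per hop, and the `endpoint_routers` parameter for photonic shortcuts, are assumptions made for this example rather than details of the model behind Fig. 1.

```python
ROUTER_PJ_PER_BIT = 0.83           # routing through one hop
ELECTRICAL_LINK_PJ_PER_BIT = 0.57  # one electrical link traversal
PHOTONIC_LINK_PJ_PER_BIT = 0.5     # one photonic channel traversal


def electrical_energy_pj(bits, hops):
    """Dynamic energy of a message on the electrical NoC.

    Assumption: every hop costs one router plus one link traversal.
    """
    return bits * hops * (ROUTER_PJ_PER_BIT + ELECTRICAL_LINK_PJ_PER_BIT)


def photonic_energy_pj(bits, endpoint_routers=0):
    """Dynamic energy of a message on a direct photonic shortcut.

    endpoint_routers is a hypothetical knob for routers crossed at the ends.
    """
    return bits * (PHOTONIC_LINK_PJ_PER_BIT + endpoint_routers * ROUTER_PJ_PER_BIT)


def static_energy_pj(power_uw, interval_ns):
    """Static energy of one link or ring over an interval (uW * ns = 1e-3 pJ)."""
    return power_uw * interval_ns * 1e-3


if __name__ == "__main__":
    bits = 64 * 8  # one cache line
    print(f"electrical, 2 hops : {electrical_energy_pj(bits, 2):.1f} pJ")
    print(f"photonic shortcut  : {photonic_energy_pj(bits):.1f} pJ")
    print(f"static, 500 uW/1 us: {static_energy_pj(500, 1000):.1f} pJ")
```

The static term shows why link utilization matters: a 500 µW link that stays idle for 1 µs dissipates about as much energy as transferring a few cache lines over it.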

IV. CONCLUSIONS

From the memory access latency (Table I) and the power estimate (Fig. 1), we can clearly see that the proposed reconfigurable NoC will have only a modest increase (by 20%) of

---

1 Each of the cores of the UltraSPARC T2 processor executes eight threads simultaneously, switching between threads on every clock cycle.

---

REFERENCES


