Year 2 Report – Public

2.1 Introduction

The goal of the FlexTiles FP7 project is to develop a dynamically reconfigurable heterogeneous many-core platform and the tools to program it.

The first year of the project was dedicated to defining what the whole platform should be:

  • Definition of the technology to use
  • Definition of the hardware and software interfaces, and of the interface between the two
  • Definition of the background of each partner
  • Definition of how to build a consistent platform that covers the application needs

To do so, the whole consortium worked together to share the same view, defining a solution from programming down to hardware, which turned out to be harder than expected. We had to spend longer than planned on managing dynamicity in the platform.

During the second year we solved the issue of managing dynamicity and updated the deliverables according to the technical solutions we then defined.

Managing dynamicity starts with defining the dynamicity the applications need, and then with defining how to capture and describe that dynamicity in the tools.

Hardware must also allow the executable code to be moved from one computing unit to another at run-time. Considerable work went into finding a solution that allows moving a task of a dataflow from one computing unit to another. The adopted solution requires stopping the data flow, loading the tasks to be executed onto the new computing units, rerouting the messages going through the embedded Network on Chip, and restarting the flow of data.
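
As an illustration of this sequence, the following is a minimal sketch in C; all type and function names are hypothetical placeholders declared only to make the steps explicit, not the actual FlexTiles API.

    /* Minimal sketch of the adopted migration sequence; all names are
     * illustrative placeholders, not the actual FlexTiles API. */
    typedef struct task task_t;   /* a task of the data flow       */
    typedef struct tile tile_t;   /* a computing unit on the NoC   */

    void dataflow_stop(task_t *t);                     /* hypothetical services of */
    int  tile_load_task(tile_t *dst, task_t *t);       /* the virtualization layer */
    void noc_reroute(task_t *t, tile_t *src, tile_t *dst);
    void dataflow_start(task_t *t);

    int migrate_task(task_t *task, tile_t *src, tile_t *dst)
    {
        dataflow_stop(task);                 /* 1. stop the data flow feeding the task */
        if (tile_load_task(dst, task) != 0)  /* 2. load the task onto the new unit     */
            return -1;
        noc_reroute(task, src, dst);         /* 3. reroute messages through the NoC    */
        dataflow_start(task);                /* 4. restart the flow of data            */
        return 0;
    }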

This second year was mostly dedicated to the refinement of specifications and to development. Starting from the work achieved during the first year and the preliminary specifications jointly defined by the consortium, we refined these specifications in smaller working groups dedicated to the hardware, the embedded software and the tool chain.

From the hardware point of view, we went deeper into the description of the eFPGA as well as the many-core architecture. We defined the accelerator interface that allows us to connect any kind of accelerator to the many-core platform. This accelerator interface allows us, for example, to plug streaming VHDL accelerators or processors such as DSPs into the many-core and to manage them from the General Purpose Processors (GPPs) implemented in the many-core.

From the embedded software point of view, we went deeper into the definition of the architecture of the different layers. These were validated on a virtual platform based on the Open Virtual Platform (OVP) that simulates the many-core functionally and temporally. We also defined how to drive the accelerators from the GPPs of the many-core.

From the tool chain point of view, we started to define how we would like to capture the application and what the output of the tool chain would be.

We also prepared the plan for the integration that will take place during the third year, taking care of the interfaces and the intermediate deliveries between partners so that the integration proceeds as smoothly as possible.

2.2 Key innovations

During this year, the major challenges lay in implementing the parts of the platform.

On the hardware side, we specified the Network on Chip (NoC), the Accelerator Interface and the Network Interfaces used to connect the components plugged onto the NoC, e.g. the General Purpose Processors (GPPs), the Accelerator Interface and the memory banks.

A virtual implementation of a homogeneous many-core was developed based on the Open Virtual Platform; the next stage is to develop a heterogeneous OVP model. The General Purpose Processors of this platform are MicroBlaze processors from Xilinx and the NoC is the dAElite from TUe. This virtual implementation allowed us to develop the embedded software, i.e. the kernel and the virtualization layer that enable applications to run on the platform, delivering run-time binding services to allocate resources to tasks, monitoring services to probe the architecture and actuators to act on what has been monitored. The MicroBlaze was chosen to ease migration between the OVP and a hardware emulator, the integrated FlexTiles Development Platform (FDP) demonstrator that was developed by Sundance during Year 2. The FDP embeds two Xilinx Virtex-6 SX475T FPGAs with multiple parallel and serial interfaces between them.
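
As a sketch of how these monitoring, binding and actuation services fit together, the loop below shows one possible monitor/decide/act cycle on a GPP; all names, thresholds and service signatures are hypothetical placeholders, not the FlexTiles API.

    /* Illustrative monitor/decide/act loop for the virtualization layer.
     * Names, thresholds and services are hypothetical placeholders. */
    #include <stdint.h>

    #define LOAD_THRESHOLD 90u          /* percent, arbitrary value for the example */

    typedef struct {
        uint32_t tile_id;               /* GPP the sample was taken from            */
        uint32_t load;                  /* observed load in percent                 */
    } sensor_sample_t;

    sensor_sample_t monitor_read(void);        /* monitoring service (hypothetical) */
    uint32_t binding_find_spare_tile(void);    /* run-time binding service          */
    void actuator_remap_task(uint32_t from, uint32_t to);   /* actuator             */

    void self_adaptation_loop(void)
    {
        for (;;) {
            sensor_sample_t s = monitor_read();          /* probe the architecture  */
            if (s.load > LOAD_THRESHOLD) {               /* decide                  */
                uint32_t spare = binding_find_spare_tile();
                actuator_remap_task(s.tile_id, spare);   /* act: move the task      */
            }
        }
    }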

 ”"

FlexTiles Development Platform

The FDP was delivered to the partners of the project and a first implementation of the hardware architecture platform was delivered by TUe. Since this implementation is the same as the one developed on OVP, the software could be ported very smoothly. We are now able to run the same application on both platforms, either on the hardware emulator or on the simulator. This is a major contribution and breakthrough.

A tool was also developed by TUe and KIT to give an inside view of what is happening in the hardware, both on the virtual platform and on the FDP. This tool can be used to visualize how tasks are moved from one GPP to another, dynamically, according to requirements.

To complete the hardware platform, two accelerators, i.e. a DSP from CSEM and the eFPGA from UR1, were developed at register-transfer level (RTL) so that they can be plugged into the Accelerator Interface.

On top of the hardware and embedded software, a programming model was proposed by TRT and ACE to capture the dynamicity of the applications.

2.3 Technical approach

In order to validate the solutions proposed and developed in FlexTiles, we propose to emulate the 3D stacked chip with the FlexTiles Development Platform (FDP), which embeds two Xilinx Virtex-6 SX475 FPGAs physically linked to each other by parallel and serial interfaces. On this physical emulator, we propose to implement a homogeneous many-core on one of the two FPGAs and accelerators on the other one. The Network on Chip is intended to be extended across the two FPGAs through Aurora serial links, to simulate a single 3D FlexTiles SoC chip implementation.

Since the Open Virtual Platform allows us to easily simulate a homogeneous many-core based on GPPs, we had a simulation model of the FDP’s first FPGA that allowed us to develop the kernel, RTOS and virtualization layer. When the RTL model of the first FPGA was ready, we could directly run the software developed on the simulation platform on the design running on the first FPGA. On the simulation model, we simulated sensors in order to test the monitoring and actuator strategies.

We decided not to implement the eFPGA on the FPGA as it would require more FPGA resources than are available, and it would not bring more information than what we will get from the simulation models.

The hardware accelerators that would have been implemented on the eFPGA will instead be implemented on the FDP’s second FPGA, to emulate those accelerators’ implementation without emulating the eFPGA’s underlying structure.

Independently from the work done on the FDP, and in order to be able to dynamically load bit-streams with small delays, a new bit-stream format has been designed by UR1. To minimise the memory space needed to store the bit-streams, compression techniques are currently being studied at RUB.
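
As a purely generic illustration of the kind of redundancy such compression can exploit (configuration bit-streams typically contain long runs of identical bytes), the following is a simple run-length encoder in C; it is not the scheme under study at RUB.

    /* Generic run-length encoder, shown only as an illustration of bit-stream
     * compression; not the scheme under study at RUB.
     * The output buffer must hold up to 2*len bytes in the worst case. */
    #include <stddef.h>
    #include <stdint.h>

    size_t rle_encode(const uint8_t *in, size_t len, uint8_t *out)
    {
        size_t o = 0;
        for (size_t i = 0; i < len; ) {
            uint8_t value = in[i];
            size_t run = 1;
            while (i + run < len && in[i + run] == value && run < 255)
                run++;                  /* count identical bytes, capped at 255   */
            out[o++] = (uint8_t)run;    /* emit (length, value) pair              */
            out[o++] = value;
            i += run;
        }
        return o;                       /* size of the compressed stream in bytes */
    }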

To add dynamicity to the platform, thus allowing us to deal with dynamic applications based on data flow, we proposed to be able to dynamically replace atomic parts of the data flow by other atomic parts, depending on a decision coming from the application itself; only one part runs at a time. One of the difficult issues we had to solve was to define, at reconfiguration time, how to stop the data flow to the current tasks and reroute that flow to the new tasks. Data flows between computing units (or tasks) through FIFOs. We decided to isolate the points where the flow can be rerouted by defining some of these FIFOs as isolation FIFOs and implementing them in shared memory.
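
A minimal sketch of what such an isolation FIFO and its rerouting could look like in shared memory is given below; field names, sizes and the rerouting function are illustrative assumptions, not the FlexTiles implementation.

    /* Sketch of an isolation FIFO placed in shared memory: the data flow can
     * only be rerouted at such a FIFO, once it has been drained. Names and
     * sizes are illustrative, not the FlexTiles implementation. */
    #include <stdint.h>

    typedef struct {
        volatile uint32_t head;       /* write index, owned by the producer task   */
        volatile uint32_t tail;       /* read index, owned by the consumer task    */
        volatile uint32_t consumer;   /* id of the tile currently consuming tokens */
        uint32_t          data[256];  /* token storage in shared memory            */
    } isolation_fifo_t;

    /* Reroute the flow to a replacement task: wait until the FIFO is empty,
     * then point it at the new consumer tile; the producer side is untouched. */
    void fifo_reroute(isolation_fifo_t *f, uint32_t new_consumer_tile)
    {
        while (f->head != f->tail)
            ;                         /* drain: let the old consumer empty the FIFO */
        f->consumer = new_consumer_tile;
    }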

2.4 Demonstration and Use

Several implementations were made this year:

  • A high level simulator based on the Open Virtual Platform (OVP), integrating an existing simulator of the MicroBlaze against which we linked a NoC simulator library.
  • The FlexTiles Development Platform, developed inside the consortium and embedding two Xilinx Virtex-6 SX475 FPGAs. This FPGA is the biggest device available in the Virtex-6 family, giving us enough space to implement a heterogeneous many-core large enough to experiment with re-mapping techniques.
  • A first increment of the RTL model of the FlexTiles hardware architecture that is emulated on the FlexTiles Development Platform. The part that has been developed here is a homogeneous many-core based on the MicroBlaze GPP. This first model is the base on which we are going to plug the accelerator interface to turn this homogeneous architecture into a heterogeneous one.
  • The software layers (kernel, RTOS, virtualisation layer) running on top of the FlexTiles hardware architecture. Simple use cases have been developed on top of this software layer to test and validate the principles and ideas proposed in FlexTiles for self-adaptation of the mapping of the application at run-time.
  • An RTL model of the eFPGA has been produced. This allows us to test the virtual bit-stream.
  • The RTL model of the DSP has been adapted so that it can be plugged into the Accelerator Interface.

2.5 Work already performed and main results

The global approach remains the same as presented last year. The architecture proposed by the consortium is a 3D stacked chip with two layers.

It has been decided, after considering what was possible to implement through TSVs (Through Silicon Vias) between the two silicon layers, to implement (1) a heterogeneous many-core layer – with General Purpose Processors (GPPs) and Digital Signal Processors (DSPs) – on top of which we implement (2) an embedded FPGA (eFPGA).

This construction allows us to benefit from a bigger eFPGA matrix than if we had to share the top die with the DSPs. With such an implementation, the eFPGA matrix can be more regular and allows more flexibility when it is necessary to move IPs from one region of the eFPGA to another. This two-layer chip is the result of implementation trade-offs we had to make. Functionally, the platform is still a heterogeneous many-core built from a homogeneous many-core made of GPPs that are linked to several kinds of accelerators.

We thus have two kinds of accelerators: DSPs and VHDL IPs. Both are slaves of the GPPs. The view we have is that a GPP controls the general application scheduling while the accelerators perform the heavy computing tasks. Both are connected to the architecture through an Accelerator Interface. This interface allows the GPP to control the accelerator as a slave, sending control information.

To run a task on the DSP, we have to provide the DSP with executable code linked against a library that contains some management parts of the virtualization layer, so that the DSP answers the messages and behaves as the GPP expects.
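
A hedged skeleton of such a DSP-side executable is sketched below; the message types and library calls are hypothetical placeholders, not the actual FlexTiles library API.

    /* Skeleton of a DSP task executable linked against the management library:
     * it waits for control messages from a GPP and reacts as the GPP expects.
     * All names are hypothetical placeholders. */
    typedef enum { MSG_START, MSG_STATUS, MSG_STOP } msg_type_t;

    typedef struct {
        msg_type_t type;
        void *buffer_in;              /* input data prepared by the GPP        */
        void *buffer_out;             /* results to be read back by the GPP    */
    } ctrl_msg_t;

    ctrl_msg_t mgmt_wait_message(void);      /* management-library services     */
    void mgmt_send_status(int ready);        /* (hypothetical signatures)       */
    void run_kernel(void *in, void *out);    /* the DSP computing task itself   */

    void dsp_task_main(void)
    {
        for (;;) {
            ctrl_msg_t msg = mgmt_wait_message();   /* blocking read from the GPP */
            switch (msg.type) {
            case MSG_START:  run_kernel(msg.buffer_in, msg.buffer_out); break;
            case MSG_STATUS: mgmt_send_status(1);                       break;
            case MSG_STOP:   return;                /* GPP released the DSP      */
            }
        }
    }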

When implementing an IP on the eFPGA, one has to connect it to the Accelerator Interface so that it can be controlled by any GPP of the architecture.

We decided to base the communication between the elements of the architecture on a Network on Chip (NoC). Since two partners of the project have a NoC available (TUe and CEA), we compared the two and challenged them against the requirements. Both NoCs can fulfil the project requirements. For the first integration we used dAElite, since it was already integrated with the homogeneous many-core architecture. We decided to keep the two NoCs and to evaluate them at the end of the project, as integrating the other one won't cost a lot of effort because they rely on the same interfaces.

The NoC being the spinal cord of the architecture, the Accelerator Interface uses it to exchange data with the other components and to receive control information from the GPPs.

A subset of the platform, i.e. the homogeneous GPP many-core, is simulated on a high-level simulator that allowed us to implement the embedded software of the platform: the drivers, the management libraries, the kernel and the virtualization layer. These could be tested and validated on a host workstation prior to being executed on a hardware emulator.

A first implementation of the homogeneous GPP many-core has been made in RTL. Since this implementation is the same as the simulated one, we could reuse the implemented software directly on the RTL design, which allows us to check the correctness of both the simulation model and the RTL model.

Using the implemented virtualization layer, we were able to test the re-mapping process by simulating specific values on the monitored sensors.

Some basic applications are currently running on the platform. A part of the delivered STAP radar application has already been implemented, but more work has to be done to make it run on the architecture, mostly because some buffers used in the software implementation are too large for the hardware implementation.

The tool chain is currently under development.  Individual tools have been tested, and integration into a complete tool chain is planned.

In addition to this technical work, we worked on the MIM Innovation plan, defining how to assess our innovations and trying it out on three examples:

  1. The tool chain for parallel application development.
  2. CoSy compiler development system.
  3. Kernel & self-adaptation.

Each partner identified innovations and we looked into the interest in patenting or disseminating these innovations.

We have issued a survey asking third parties about their interest in such a platform. So far we have received around 120 replies, and the next task is to analyse the feedback with respect to the project's visibility and its technical importance to the respondents.

Fewer papers were produced this year than last year, as we are currently in a heavy development phase and are awaiting results before producing papers, but lectures about FlexTiles were given at university (RUB).

We have proposed FlexTiles workshops in September 2014 at the 24th International Conference on Field Programmable Logic and Applications (FPL-2014) and in July 2014 at the Adaptive Hardware and Systems conference (AHS-2014). Our website (www.flextiles.eu) is continuously updated.

2.6 Scientific, economical and societal impacts

The proposed many-core with its self-adaptive capabilities is a technological breakthrough that fully matches new applications needing dynamic adaptation and mode swapping at run-time.

Smart systems and Cyber-Physical Systems need a high level of computing power but also embedded intelligence to react to the environment; we speak of autonomy and survivability.

These results are the first steps towards defining future systems that will be able to manage dynamic adaptation nested with static data flow (intensive computing). There is a need for a programming model that can describe a mix of different Models of Computation (MoC) while keeping the possibility to optimize and parallelize the application.

The impact of the project is mainly in many-core systems: defining an evolution from the current homogeneous many-cores and proposing programming solutions that help master these powerful heterogeneous many-cores, which are more complex to program than homogeneous ones.

The project brings new scientific ideas like:

  • The dynamic data flow.
  • The solutions to embed dedicated accelerators inside a homogeneous many-core through our accelerator interface and to program them, allowing industry to reduce its time-to-market.
  • A new FPGA technology used for the eFPGA, with a virtual bit-stream.
  • The virtualization layer together with the kernel and the resource monitoring, which give solutions for self-adaptive capabilities at run-time.
  • The streaming compiler tool, which will be used to produce patents, papers and/or lectures.

 

Consortium & Project Leader:

Dr. Philippe MILLET

THALES R&T,

Campus Polytechnique

1 avenue Augustin Fresnel

91767 Palaiseau Cedex,

France

Contact: contact@flextiles.eu

