Final Report – Public

EU-FP7 FlexTiles Project - Final Public Report

“Self-adaptive heterogeneous
many-core based on Flexible Tiles”

 

2.1 Introduction

The goal of the FlexTiles FP7 project is to develop a dynamically reconfigurable heterogeneous many-core concept, the tools to program such an innovative embedded system, as well as an OVP-based simulator and a hardware emulator (the FlexTiles Development Board, FDB) to demonstrate the platform. The FlexTiles Development Platform (FDP) comprises the FDB, the OVP simulator, the toolchain, the operating system and the software libraries.

 

 Figure 1 – Early Prototype of the FDB

 

The project started with the definition of what the whole platform has to be:

  • Definition of the technology to use
  • Hardware/Hardware interfaces
  • Software/Software interfaces
  • Hardware/Software interfaces
  • Definition of the background of each partner
  • Definition of how to build a consistent platform that covers the application needs.

 

To do so, the whole consortium worked together to share the same view: a dynamically reconfigurable heterogeneous many-core made up of GPPs and two types of accelerators, DSPs and VHDL IPs in an embedded FPGA (eFPGA). Defining a solution from programming down to hardware turned out to be harder than expected, and we had to spend more time than planned on managing dynamicity in the platform.

During the second year we solved the issue of managing dynamicity and updated the deliverables according to the technical solutions we then defined. The description of the dynamicity is captured in the toolchain as a collection of static dataflow graphs (clusters). The whole application links these clusters with specific isolation buffers, allowing the application to replace one cluster by another at runtime and thus dynamically change one of its parts. The toolchain generates a binary that embeds all the possible mappings for each cluster; the runtime selects the best mapping at a given time and configures the hardware accordingly. The hardware is designed so that it can be reconfigured at runtime.
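The runtime's mapping selection can be sketched as follows. This is a minimal illustration only: the data structures, cost model and function names are assumptions for the example, not the actual FlexTiles runtime API.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative only: each cluster ships with several precompiled
 * mappings; the runtime picks the cheapest one whose target resource
 * is currently available. Names and cost model are invented. */

enum resource { RES_GPP, RES_DSP, RES_EFPGA };

struct mapping {
    enum resource target;   /* where the cluster's tasks would run */
    int cost;               /* e.g. estimated energy or latency    */
};

/* Pick the lowest-cost mapping whose target resource is available.
 * 'avail' flags are indexed by the resource enum. Returns the index
 * of the chosen mapping, or -1 if no resource is free. */
int select_mapping(const struct mapping *m, size_t n, const int avail[3])
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (!avail[m[i].target])
            continue;
        if (best < 0 || m[i].cost < m[best].cost)
            best = (int)i;
    }
    return best;
}
```

The same selection would be re-run whenever the monitored load changes, letting the runtime migrate a cluster to the next-best mapping when its preferred resource becomes busy.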

While the second year was mostly dedicated to the refinement of specifications, the beginning of the development and the delivery of the FlexTiles development board (FDB), the third year focused on integration and validation of the platform.

For the hardware part, we emulate a FlexTiles architecture on the two FPGAs of the FDB. The first FPGA emulates a GPP-only many-core architecture – MicroBlaze processors interconnected with our NoC – and the second FPGA embeds an accelerator, either a DSP or any VHDL IP, thus emulating the eFPGA. The eFPGA itself was produced as a standalone silicon layout to validate the concept.

During hardware integration we overcame many technical issues, two of which were particularly long and difficult to solve: (1) establishing a fast and robust link between the two FPGAs of the FDB, and (2) the place and route of the DSP in one of the FPGAs. In the end both issues were solved, and it is now possible to access the DSP (located on the second FPGA) from the GPP-only many-core (located on the first FPGA) through the inter-FPGA link.

On the embedded software side, we developed a library called the “Virtualization Layer” that interacts with both the operating system and the FlexTiles hardware. It is in charge of resource allocation and of managing the accelerators from the GPPs. This software was validated on a virtual platform based on the Open Virtual Platform (OVP) that simulates the architecture both functionally and temporally.

The toolchain was integrated by merging our existing tools with tools developed for the project and adapting their interfaces so that they plug together. The application to run on FlexTiles is captured as C code and translated by the toolchain into a dataflow graph. The graph is used to produce annotated C code and a set of XML files in which each task is allocated to one resource type, or to several when task migration is foreseen. Those files are then used to produce several binary files that are bundled into a single file representing the image of the application on the FlexTiles platform.
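As an illustration, one of the per-task XML files could look like the fragment below. The actual schema is not given in this report; the element names, attribute names and file names are invented for the example.

```xml
<!-- Illustrative fragment only: the real FlexTiles schema is not
     documented here. One task, allocated to two resource types so
     that it can migrate between them at runtime. -->
<task name="contrast_enhancer">
  <allocation type="GPP"   binary="contrast_gpp.elf"/>
  <allocation type="eFPGA" bitstream="contrast_efpga.vbs"/>
</task>
```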

The flow starts with a tool, SimplifyDE, that focuses on rapid virtual prototyping of the application in a Software-in-the-Loop simulation approach. SimplifyDE provides a simple method to exploit the key features of the FlexTiles Development Platform (FDP), and therefore targets entry-level programmers getting into first contact with the FDP. Since there is no need for full access to the application code (e.g. the low-level libraries and the generated code), users can test their own applications in the virtual prototyping environment of the FDP with only minor effort.

 

Figure 2: FlexTiles Toolflow

 

SpearDE supports more advanced development for more complex applications. It provides features for automatically analyzing and restructuring existing complex application code so that it can be captured and executed effectively on the FDB, as described in deliverable D5.11. It therefore targets experts already familiar with the FDB and the associated toolflow. To enable rapid prototyping for users starting with SpearDE, SimplifyDE can import model data produced by SpearDE, for instance the mapping information onto the different kinds of processing nodes, as well as the C code of each task, whether hand-written or generated by SpearDE. The toolchain also supports developers who prefer to express their application as a straightforward C loop, using simple annotations to request partitioning and mapping for the FDB. The SDFC compiler can process this into a form to be imported into SimplifyDE, or continue directly into the software toolflow if a hardware architecture description is already available.
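A loop expressed in that straightforward style could look like the sketch below. The annotation is purely illustrative: the real SDFC annotation syntax is not documented in this report, so the pragma shown is an assumption. The loop body itself is plain C and runs unmodified on a host.

```c
#include <assert.h>

#define N 256

/* Hypothetical annotation style: the actual SDFC pragma syntax is
 * not given in the report. A compiler that does not know the pragma
 * simply ignores it, so the function stays ordinary C. */
void scale_add(const int in[N], int out[N], int gain, int bias)
{
    /* #pragma sdfc task map(GPP,DSP) -- illustrative annotation only */
    for (int i = 0; i < N; i++)
        out[i] = gain * in[i] + bias;
}
```

The point of such annotations is that the kernel can be developed and tested as sequential C first, with partitioning and mapping requested declaratively rather than by restructuring the code by hand.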

During this last year we also held three workshops where attendees could benefit from an in-depth explanation of the project as well as hands-on sessions. The last workshop was recorded and can be found on our YouTube channel (www.youtube.com/user/flextiles).


2.2 Key innovations

During this year, the major challenges were on the integration of the parts of the platform.

On the hardware part, we have integrated the IPs developed during the second year into the FlexTiles emulator. A video module was developed by Sundance and integrated on the FDB to add video input and output capabilities.

UR1 produced samples of a first silicon implementation of the eFPGA on a single chip.

 

Figure 3: eFPGA logic fabric chip (left) and its layout (right)

 

The virtual implementation of a GPP-only many-core based on Open Virtual Platform was extended with the simulation of the management of accelerators. The video module was simulated on the virtual platform by KIT.

The demonstration applications were first developed on OVP, which made porting to the FDB easier. The applications developed by TUe and Thales benefitted from a direct video output to help demonstrate the processing taking place on the architecture.

The monitoring tool developed by TUe and KIT, which gives an inside view of what is happening in the platform both on OVP and on the FDB, was extended to provide a more detailed view of the architecture.

On top of the hardware and embedded software, the toolchain was integrated by TRT, ACE and TUe.
 

2.3 Technical approach

In order to validate the solutions proposed and developed in FlexTiles, we emulate the 3D stacked chip with the FlexTiles Development Board (FDB), which embeds two Xilinx Virtex-6 SX475 FPGAs physically linked to one another by multiple links. On this physical emulator, we implemented a GPP-only many-core on one of the two FPGAs and accelerators on the other. The Network on Chip (NoC) is extended across the two FPGAs through 72 GPIO links (two channels, each with 32-bit data and 4 control lines), emulating a single path of the FlexTiles NoC.

 

 

Figure 4: The final FDB in a 19” Rack with DVI In/Out

 

The Open Virtual Platform allowed us to develop the software before using it on the FDB. We first developed the real-time operating system (RTOS), then the virtualization layer and eventually the applications.

The eFPGA has been tested and validated outside of the FlexTiles platform but its interfaces make it ready to be included on the chip.

Separating the design into two different FPGAs allows more flexibility in the implementation but requires a working link between the two FPGAs. Getting this link to work was particularly challenging due to the requirement for very low latency. We tried many different routes to implement this communication channel, and each approach had its own kind of problems; the details are given in the corresponding deliverables.

 

Number of Slice Registers:       30,157 out of 595,200     5%
Number of Slice LUTs:           113,442 out of 297,600    38%
Number of RAMB36E1/FIFO36E1s:       100 out of   1,064     9%
Number of DSP48E1s:                   8 out of   2,016     1%

Table 1: Resource usage of the FPGA emulating the GPP-only many-core

 

Even though the two FPGAs of the FDB are some of the biggest on the market (Xilinx Virtex 6 – SX475), we encountered many place and route problems during integration. The DSP from CSEM (optimized for ultra-low power, not for FPGA implementation) was one of the most difficult IPs to integrate. Due to the size of this IP, only 2 GPPs could be integrated alongside it on a single FPGA. The DSP also contains long combinational paths which limited the operating frequency to 30 MHz.


Figure 5: Layout of the FPGA emulating the GPP-only many-core

 

To demonstrate dynamicity and emulate how an IP uploaded to the eFPGA would behave when connected to the platform, we developed a “Contrast Enhancer” that runs both on a GPP and on an emulated eFPGA. As a software IP (in C code) it runs on one of the Xilinx 32-bit MicroBlaze CPUs of the first FPGA; as a hardware IP it runs on the second FPGA. We can thus show how an application can switch between different mappings, allowing task migration between GPP and eFPGA.
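As a rough idea of what the software version of such an IP could look like, here is a minimal linear contrast stretch in C. The report does not detail the actual Contrast Enhancer algorithm, so this kernel is a plausible stand-in, not the project's implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Minimal linear contrast stretch over 8-bit pixels: rescale the
 * observed [lo, hi] range to the full [0, 255] range. Illustrative
 * only; the real Contrast Enhancer IP is not detailed in the report. */
void contrast_enhance(uint8_t *pix, size_t n)
{
    uint8_t lo = 255, hi = 0;
    for (size_t i = 0; i < n; i++) {
        if (pix[i] < lo) lo = pix[i];
        if (pix[i] > hi) hi = pix[i];
    }
    if (hi == lo)
        return;                     /* flat image: nothing to stretch */
    for (size_t i = 0; i < n; i++)
        pix[i] = (uint8_t)(((pix[i] - lo) * 255) / (hi - lo));
}
```

Being a simple per-frame streaming kernel with no data-dependent control flow, a function like this is exactly the kind of task that can be mapped either to a MicroBlaze or to a hardware IP behind the Accelerator Interface.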

 

2.4 Demonstration and Use

Figure 6: FlexTiles Development Board and its video equipment used for demos

 

Several implementations were made during the project:

  • A video module was added to the FDB and used in the two applications. This video module was also emulated on the OVP simulator. Thanks to this module we can show the data processed in the applications running on the FDP.
  • Two use-case applications were developed. Both use the video capability of the platform which makes the demonstration of the technology more visual. The two applications were developed on both OVP and the FDB.
  • The platform was integrated, including (1) the hardware blocks to emulate a FlexTiles chip on two FPGAs, (2) the lower-level code running on FlexTiles and (3) the toolchain. Each layer required an integration process of its own, followed by a further integration across the three layers. The resulting platform was used to implement the two demo applications mentioned above.

Figure 7: Web-based GUI for the simulation framework, SimplifyDE

 

  • SimplifyDE provides a web-based GUI for the simulation framework. It was enhanced with features supporting the user in making an initial task mapping of an application. The web framework now supports defining a task graph from an application which can be broken down into a cyclo-static dataflow (CSDF) graph. The definition of a task graph and the targeted hardware design (i.e. the processing elements available on a FlexTiles platform) allows the user to map the nodes of the graph to the available processing elements and to specify an initial mapping. This information is then used by tools called by SimplifyDE and provided by TU/e to generate the templates of the application code, including all drivers and additional code for setting up the communication FIFOs that realize the communication corresponding to the task graph. Besides this, SimplifyDE now supports direct export of the hardware description in an XML file compatible with the TU/e toolflow, allowing direct generation of the FPGA bitstreams for the FDB from the web-based GUI.


2.5 Work performed and main results

The main result of the FlexTiles project is the demonstration that a heterogeneous 3D chip is technically feasible and brings a solution for high-performance, low-power applications. The hardware platform itself is complex, but the project also demonstrates how the software libraries, coupled with the toolchain, make it easy to program.

This 3D construction lets us benefit from a large eFPGA matrix covering the whole chip, compared to other eFPGAs implemented on only smaller parts of a circuit. The low-power DSPs allow the application to offload the GPPs and run number-crunching parts of the same code on a lower-power core of the chip.

The co-simulation model of the eFPGA has been developed. The heterogeneous logic and computing fabric is modelled in Verilog and VHDL, which run in a hardware simulator. The controller part of the eFPGA runs as a C application, linked to the logic fabric via a co-simulation framework. For the purposes of the simulation model, the reconfiguration controller exposes a shell through which the user can interact with the live simulated hardware fabric. A specific toolchain has been developed to synthesize C benchmarks down to Virtual Bit-Streams that can be loaded onto the simulation model.

There are two kinds of accelerators: DSPs and VHDL IPs. Both are slaves of the GPPs: the GPP controls the overall application scheduling while the accelerators perform heavy computing tasks, and do so with better energy efficiency. Both are connected to the architecture through an Accelerator Interface. This interface allows the GPP to control any kind of accelerator (other processors as well as streaming IPs) in a common fashion, and makes it easy to develop new accelerators that plug into a FlexTiles many-core.
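The "common fashion" idea can be sketched as a small operation table that every accelerator implements, so the GPP-side driver code stays identical whatever sits behind the Accelerator Interface. All names below are illustrative assumptions; the real interface is a hardware block, not this C API.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative sketch: every accelerator, DSP or streaming IP, is
 * driven through the same small operation table, so GPP-side code
 * does not care what kind of accelerator it is talking to. */
struct accel_ops {
    int (*configure)(void *dev, uint32_t task_id);
    int (*start)(void *dev);
    int (*wait_done)(void *dev);
};

/* Generic GPP-side driver: works for any accelerator exposing the
 * table above. Returns 0 on success, a negative step code on error. */
int run_on_accelerator(const struct accel_ops *ops, void *dev,
                       uint32_t task_id)
{
    if (ops->configure(dev, task_id) != 0) return -1;
    if (ops->start(dev) != 0)              return -2;
    return ops->wait_done(dev) == 0 ? 0 : -3;
}

/* Mock accelerator for host-side testing of the generic driver. */
static int mock_state;
static int mock_cfg(void *d, uint32_t t) { (void)d; (void)t; mock_state = 1; return 0; }
static int mock_start(void *d) { (void)d; mock_state = 2; return 0; }
static int mock_wait(void *d)  { (void)d; mock_state = 3; return 0; }
static const struct accel_ops mock_ops = { mock_cfg, mock_start, mock_wait };
```

The design choice this illustrates is that adding a new accelerator only means providing one more operation table, with no change to the scheduling code on the GPPs.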

To run a task on the DSP, an executable is provided to the DSP, linked against a library containing the management parts of the virtualization layer so that it answers the messages and behaves as expected by the GPP.
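A DSP-side command loop of that kind might look like the sketch below. The actual message set of the virtualization layer is not listed in this report, so the commands and replies are invented for illustration.

```c
#include <assert.h>

/* Hypothetical GPP-to-DSP message protocol: the real virtualization-
 * layer messages are not documented in the report. */
enum msg   { MSG_PING = 0, MSG_RUN = 1, MSG_STOP = 2 };
enum reply { REPLY_ACK = 0, REPLY_DONE = 1, REPLY_ERR = 2 };

/* Dispatch a single GPP message the way the DSP-side library might:
 * acknowledge pings and stop requests, run the task body on MSG_RUN. */
int handle_message(enum msg m, int (*task_body)(void))
{
    switch (m) {
    case MSG_PING: return REPLY_ACK;
    case MSG_RUN:  return task_body() == 0 ? REPLY_DONE : REPLY_ERR;
    case MSG_STOP: return REPLY_ACK;
    default:       return REPLY_ERR;
    }
}

/* Trivial task body used for host-side testing of the dispatcher. */
static int demo_task(void) { return 0; }
```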

When implementing an IP on the eFPGA, one has to connect it to the Accelerator Interface so that it can be controlled by any GPP of the architecture.

Of the two Network on Chip (NoC) solutions available in the consortium, we decided to base the communication between the elements of the architecture on the NoC from TUe (dAElite). This NoC was modified to extend through the inter-FPGA link on the FDP. The NoC being the spinal cord of the architecture, the Accelerator Interface connects to it to exchange data with other elements and to receive control information from the GPPs.

The heterogeneous FlexTiles many-core architecture – with GPPs and accelerators – was simulated on a high level OVP-based simulator that allowed us to implement the embedded software, application, drivers and management libraries of the platform, the kernel and the virtualization layer. These could be validated on a host workstation prior to being executed on a hardware emulator.

An implementation of the GPP-only many-core has been integrated on one of the two FPGAs of the FDB. This implementation being the same as the simulated one, we could reuse the same embedded software directly on the FDB. It allowed us to validate both the architecture model in the OVP-based simulator of the FlexTiles platform and the RTL model synthesized in the FDB.

From the implemented virtualization layer, we were able to verify the process of re-mapping parts of the application code from one processing element to another by simulating specific values on the monitored sensors.

We could not implement the STAP application because of memory constraints we had on the FDB platform. We decided to implement image processing applications instead to have more visual workshops and demos.

The toolchain was developed, integrated and validated. The two demonstrator applications that are running on the platform were developed through the toolchain.

In addition to this technical work, we used a methodology known as the MIM Innovation Plan, which identified three innovations:

  1. The tool chain for parallel application development.
  2. The CoSy compiler development system.
  3. The kernel & self-adaptation techniques.

The result indicates how each innovation would benefit from dedicated actions.

The survey we made received over 100 responses. Two key results are shown in Figure 8 and Figure 9.

Figure 8: Rating of the submitted application fields by the responders

 

Figure 9: Rating of the most important aspects of heterogeneous multicore development

 


Eight papers were produced this year, while most of our efforts were concentrated on the heavy technical development. We were also interviewed by an EETimes journalist, who wrote an article in the September 2014 issue that made the cover of that issue (see Figure 10). We organized three workshops and gave two conference presentations during this final period, and lectures and courses on the FlexTiles many-core were given at the universities RUB, TUe and KIT.

 

Figure 10: EETimes FlexTiles cover page

We organized three FlexTiles workshops in September 2014 at the 24th International Conference on Field Programmable Logic and Applications (FPL-2014), at the Adaptive Hardware and Systems (AHS-2014) in July 2014 and at the 11th International Symposium on Applied Reconfigurable Computing (ARC-2015) in April 2015. Our website (www.flextiles.eu) is continuously updated.

 

2.6 Scientific, economic and societal impacts

The proposed many-core with its self-adaptive capabilities is a technological breakthrough that matches new applications needing dynamic adaptation and mode swapping at runtime. This research track is crucial for Europe, since many European industries are world leaders in the low-power computing domain. This trend is reinforced by the coming H2020 and ECSEL calls, which show Europe's interest in low-power computing technologies.

Smart systems and Cyber-Physical Systems need a high level of computing power but also embedded intelligence to react to their environment; key concerns are autonomy and survivability.

The results from FlexTiles are the first steps to define the future systems which will be able to manage dynamic adaptation nested with static data flow (intensive computing). There is a need for a programming model to describe a mix of different types of Models of Computation (MoC) while keeping the possibility to optimize and parallelize the application.

The impact of the project is mainly in many-core systems: defining an evolution from the current homogeneous many-cores and proposing programming solutions that help master these powerful heterogeneous many-cores, which are also more complex to program than homogeneous ones.

The project brings new scientific ideas like:

  • The dynamic data flow.
  • The solutions to embed dedicated accelerators inside a many-core through our accelerator interface and to program them, allowing industrial users to reduce their time-to-market.
  • A new FPGA technology used for the eFPGA, with virtual bit-streams.
  • The virtualization layer together with the kernel and the resource monitoring, which give solutions for self-adaptive capabilities at runtime.
  • The streaming compiler tool.


2.7 What IP is made available to 3rd Parties?

All the IP developed in the FlexTiles project will be available to 3rd parties. This includes foreground and required background results; the conditions for this access are to be discussed with the individual partners. The results include:


TOOLS:

  • TRT: SpearDE: graphical development environment for streaming applications on a heterogeneous parallel architecture
  • ACE: CoSy: streaming compiler converting nested loop programs to synchronous data flow (SDF)
  • TUE: SDK for the GPP
  • CSEM: SDK for the DSP
  • UR1: SDK for the eFPGA
  • KIT: SimplifyDE: web front end for hardware/software design
  • KIT: OVP simulator
  • TUE: application bundle creation

 

SOFTWARE:

  • KIT/TUE: Virtualization manager
  • TUE: Resource manager
  • TUE: Composable operating system
  • TUE: sample application: SUSAN
  • TRT: sample application: license plate detection

HARDWARE:

  • Xilinx: GPP
  • TUE: Composable GPP node, NoC, NI
  • TRT: Accelerator interface
  • CSEM: DSP accelerator
  • UR1: eFPGA accelerator, dynamically reconfigurable FPGA architecture
  • RUB: assembly of the hardware blocks onto Xilinx FPGAs for the FlexTiles emulator
  • SUNDANCE: hardware of the FlexTiles emulation platform, the FlexTiles Development Board (FDB)
  • SUNDANCE: DVI interface
  • CEA: 3D assembly of stacked dies


 

Consortium & Project Leader:

Dr. Philippe MILLET

THALES R&T

Campus Polytechnique

1 avenue Augustin Fresnel

91767 Palaiseau Cedex

France

 

Contact: contact@flextiles.eu

 

 

FlexTiles Team          

 

 
