Communication Stack for Memory Sharing in a Heterogeneous Multi-Core System

Modern heterogeneous multi-core systems often face significant communication challenges because different cores run different operating systems and functions. If these challenges are not addressed properly, they can lead to inefficiencies and performance bottlenecks.

Heterogeneous multi-core systems usually use shared memory-based communication because the cores must interact and exchange data as they run different functions and operating systems.

There are several custom implementations of this shared memory stack, and these can limit interconnectivity. So we’ll look at the standard memory-sharing layered architecture for heterogeneous AMP systems based on existing components, such as RPMsg and VirtIO. Here’s how it works.

Communication Stack

This communication stack can be split into three OSI-style protocol layers: the physical, Media Access Control (MAC), and transport layers.


Each of these layers can be implemented separately to allow multiple implementations of one layer, such as RPMsg in the transport layer, to share one implementation of the MAC layer (VirtIO).

Physical Layer

This layer houses the actual hardware needed in this communication stack, which includes the shared memory to be accessed by both cores and the inter-core interrupts. These interrupts are set to a specific configuration, with the minimum one requiring a single interrupt line per core, so two in total at the least.
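The physical layer can be pictured with a small, purely illustrative C model: a shared memory region plus one interrupt line per core. On real silicon the "notify" step would be a write to a mailbox or doorbell register; the `struct phys_layer` and function names below are hypothetical, chosen only to make the flow concrete.

```c
#include <stdbool.h>
#include <stdint.h>
#include <assert.h>

#define SHMEM_SIZE 256  /* arbitrary size for the sketch */

/* Hypothetical model of the physical layer: memory visible to both
 * cores plus one interrupt line per core (two in total). */
struct phys_layer {
    uint8_t shmem[SHMEM_SIZE];    /* shared memory region            */
    volatile bool irq_pending[2]; /* one interrupt line per core     */
};

/* One core notifies core `to` that new data is in shared memory.
 * On real hardware this would be a mailbox/doorbell register write. */
static void phys_notify(struct phys_layer *p, int to)
{
    p->irq_pending[to] = true;
}

/* The receiving core checks and acknowledges its interrupt line. */
static bool phys_take_irq(struct phys_layer *p, int core)
{
    bool was_pending = p->irq_pending[core];
    p->irq_pending[core] = false;
    return was_pending;
}
```

The key point the model captures is the minimum configuration described above: each direction of communication needs its own interrupt line.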


The RockChip RK3588 SoC series relies on such a setup in its physical layer: it uses a Mailbox hardware module for inter-core communication. This Mailbox transmits interrupt signals between the different cores, which is similar to passing synchronization semaphores in other systems, though with some differences that we’ll look at later.

Some chips, such as those in the NXP family, rely on semaphores for inter-core synchronization, while others rely on hardware elements like inter-core queues and inter-core mutexes.

This synchronization is not necessary when using virtqueue in the Media Access layer.

Media Access Layer (VirtIO/Virtqueue)

This layer eliminates the need for inter-core synchronization because it introduces virtqueue, a single-writer, single-reader circular-buffer data structure. It enables multiple asynchronous cores to exchange data using Vrings.
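Why a single-writer, single-reader ring needs no lock can be shown with a minimal sketch. The structure and names below are illustrative, not taken from VirtIO: because only the producer core ever advances `head` and only the consumer core ever advances `tail`, neither index is contended (a real port would still add memory barriers to order the buffer write before the index update).

```c
#include <stdint.h>
#include <stdbool.h>
#include <assert.h>

#define RING_SLOTS 8  /* power of two, chosen arbitrarily for the sketch */

/* Illustrative single-producer/single-consumer circular buffer. */
struct spsc_ring {
    uint32_t buf[RING_SLOTS];
    volatile uint16_t head;  /* advanced only by the producer core */
    volatile uint16_t tail;  /* advanced only by the consumer core */
};

/* Producer side: write the slot first, then publish by bumping head. */
static bool spsc_push(struct spsc_ring *r, uint32_t v)
{
    if ((uint16_t)(r->head - r->tail) == RING_SLOTS)
        return false;                    /* ring full  */
    r->buf[r->head % RING_SLOTS] = v;
    r->head++;                           /* publish last */
    return true;
}

/* Consumer side: read the slot, then free it by bumping tail. */
static bool spsc_pop(struct spsc_ring *r, uint32_t *v)
{
    if (r->head == r->tail)
        return false;                    /* ring empty */
    *v = r->buf[r->tail % RING_SLOTS];
    r->tail++;
    return true;
}
```

Virtqueue applies this same one-writer-per-index discipline to its rings, which is what lets the MAC layer drop semaphores entirely.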

Virtqueue is part of the VirtIO architecture, and it helps drivers and devices perform Vring operations. Vring, in turn, is the in-memory layout of a virtqueue implementation.

In Linux distributions, virtqueues are implemented as Vrings, which are ring buffer structures. Each one consists of a descriptor table (buffer descriptor pool), available ring, and used ring.


The Vring structure in the shared memory

These three structures are physically stored in the shared memory.

Buffer Descriptor

The buffer descriptor has a 64-bit buffer address that stores the address to the buffer held in the shared memory. Each descriptor also has a 32-bit variable buffer length, a 16-bit flag field, and a 16-bit link that points to the next buffer descriptor.
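In C, the descriptor described above maps onto a 16-byte struct; this mirrors the layout of Linux’s `struct vring_desc` (from `virtio_ring.h`), though the flag macro shown is only one example of the bits the flag field can carry.

```c
#include <stdint.h>
#include <assert.h>

#define VRING_DESC_F_NEXT 1  /* flag bit: the `next` field is valid */

/* Vring buffer descriptor, matching Linux's struct vring_desc layout. */
struct vring_desc {
    uint64_t addr;  /* 64-bit address of the buffer in shared memory */
    uint32_t len;   /* 32-bit buffer length in bytes                 */
    uint16_t flags; /* 16-bit flag field                             */
    uint16_t next;  /* 16-bit link to the next buffer descriptor     */
};

/* 64 + 32 + 16 + 16 bits = 16 bytes per descriptor */
_Static_assert(sizeof(struct vring_desc) == 16, "descriptor must be 16 bytes");
```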


Available Ring Buffer

This input/available ring buffer has its own 16-bit flag field, of which only the 0th bit is used. By default, this bit is not set. If it is set, the writer side does not get a notification when the reading side consumes a buffer from the ring.

If the bit is left in its default state (not set), the notification arrives in the form of the inter-core interrupt that we looked at in the physical layer.


The other field in the available ring buffer is the head index. Once a buffer index containing a new message has been written into the ring[x] field, the writer updates this head index.
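The publish sequence above can be sketched in C. The struct mirrors Linux’s `struct vring_avail`; the ring size and the helper function `avail_publish` are illustrative assumptions, not part of the VirtIO API.

```c
#include <stdint.h>
#include <assert.h>

#define VRING_NUM 4                    /* ring size, arbitrary for the sketch */
#define VRING_AVAIL_F_NO_INTERRUPT 1   /* bit 0: suppress consume notification */

/* Available ring, matching the layout of Linux's struct vring_avail. */
struct vring_avail {
    uint16_t flags;             /* 16-bit flag field, only bit 0 used */
    uint16_t idx;               /* head index, updated by the writer  */
    uint16_t ring[VRING_NUM];   /* descriptor indices                 */
};

/* Writer side: place the descriptor index, then bump the head index.
 * Updating idx last is what makes the new entry visible to the reader. */
static void avail_publish(struct vring_avail *a, uint16_t desc_idx)
{
    a->ring[a->idx % VRING_NUM] = desc_idx;
    a->idx++;
}
```

After `avail_publish`, the writer would fire the inter-core interrupt from the physical layer, unless the reader has set the `VRING_AVAIL_F_NO_INTERRUPT` bit.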

Used Ring Buffer

The last Vring component is the used ring buffer, which has flag and head index fields, just like the available ring buffer. The flags mainly inform the writer whether it should be interrupted when the other core updates the head index, so again only the 0th bit is used. If that bit is set, the writing core won’t get a notification (interrupt) when the reader updates the head index. It operates similarly to the available ring buffer’s flag field.


However, the used ring buffer has some differences compared to the available ring buffer, and one of them is that it stores the buffer length for each entry.
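That per-entry length is why each used-ring slot is a two-field element rather than a bare index; the layout below mirrors Linux’s `struct vring_used_elem` and `struct vring_used`, with the ring size fixed only for illustration.

```c
#include <stdint.h>
#include <assert.h>

#define VRING_NUM 4                 /* ring size, arbitrary for the sketch */
#define VRING_USED_F_NO_NOTIFY 1    /* bit 0: suppress return notification */

/* Used ring entry: unlike the available ring, each entry also records
 * how many bytes were written into the returned buffer. */
struct vring_used_elem {
    uint32_t id;   /* index of the returned buffer descriptor  */
    uint32_t len;  /* bytes actually written into that buffer  */
};

/* Used ring, matching the layout of Linux's struct vring_used. */
struct vring_used {
    uint16_t flags;                      /* only bit 0 used          */
    uint16_t idx;                        /* head index, reader-owned */
    struct vring_used_elem ring[VRING_NUM];
};

_Static_assert(sizeof(struct vring_used_elem) == 8, "used elem is 8 bytes");
```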

But it is important to note that this Vring technique only works on heterogeneous multi-core systems with core-to-core configurations, not core-to-multicore. In core-to-multicore systems, the setup will have multiple writers to the available ring buffer, which would require a synchronization element, such as a semaphore.

Transport Layer (RPMsg)

The RPMsg protocol forms the transport layer in this communication stack. Each RPMsg message is contained in a buffer in the shared memory. This buffer is pointed to by the address field of the buffer descriptor in the Vring pool we looked at in the Media Access layer.

In this buffer, the first 16 bits are used internally by the transport layer, so you won’t find them in the layout.


The 32-bit source address contains the sender’s address (source endpoint), while the 32-bit destination address contains the receiver’s address (destination endpoint). Next in the RPMsg layout is the 32-bit reserved field, which provides alignment. Since it does not carry any data, the transport layer can use part of it internally to implement RPMsg (16 bits), leaving 16 bits for alignment.

The last two sections of the RPMsg header are the payload length field (16-bit) and flags field, also 16 bits.

It is uncommon to need alignment greater than 16 bits in a shared-memory heterogeneous multi-core system, but special care is required if it does exceed this and you have allocated part of the reserved field to the transport layer.
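The header fields described above fit into a 16-byte C struct; this matches the layout of Linux’s RPMsg header (`struct rpmsg_hdr`), with the payload following immediately after it in the shared-memory buffer.

```c
#include <stdint.h>
#include <assert.h>

/* RPMsg message header, matching the Linux rpmsg_hdr layout. */
struct rpmsg_hdr {
    uint32_t src;      /* 32-bit source endpoint address          */
    uint32_t dst;      /* 32-bit destination endpoint address     */
    uint32_t reserved; /* alignment / transport-internal use      */
    uint16_t len;      /* 16-bit payload length in bytes          */
    uint16_t flags;    /* 16-bit flag field                       */
    /* payload data follows immediately after the header */
};

/* 4 + 4 + 4 + 2 + 2 bytes = 16-byte header, no padding needed */
_Static_assert(sizeof(struct rpmsg_hdr) == 16, "RPMsg header is 16 bytes");
```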

Heterogeneous multi-core systems have become quite popular in embedded high-performance computing applications, such as AI, because they provide a potent mix of performance and power efficiency.

I recommend the DSOM-040R RK3588 system-on-module for such tasks because it is built on the RK3588 SoC, meaning it has four Cortex-A76 cores and four Cortex-A55 cores. As stated earlier, these cores use the Mailbox hardware module for inter-core communication.


Although this mechanism is somewhat similar to semaphore-based synchronization, it relies on interrupts instead. Interrupts can increase CPU efficiency, decrease waiting time, and allow the cores to multitask better.

It is also important to note that the RPMsg buffer is located in the DDR SDRAM by default in these asymmetric heterogeneous multi-core systems.

In the RK3588 SoM series, the internal SRAM is designated for the MPP module. If the system is not utilizing the MPP, you can allocate this SRAM to the RPMsg buffer to enhance performance.

Besides the 64-bit 8-core brains, this module features an ARM Mali-G610 MC4 quad-core GPU that delivers 450 GFLOPS and a high-performance NPU that delivers up to 6 TOPS. It also has a microcontroller unit for low-power control.

Conclusion

In a nutshell, the process of multiple cores trying to access shared memory in a heterogeneous system is akin to different users trying to access data from a shared database. Therefore, there must be a defined way to access the memory with optimized processes to maximize efficiency.

The communication stack explained above is a standardized, efficient architecture that relies on the RPMsg protocol, VirtIO, and core interrupts to use the shared memory in a heterogeneous AMP system. And the RK3588-based DSOM-040R system on module relies on this stack, making it an efficient AMP system architecture that is ideal for handling high-power computational needs while being quick and efficient.

That’s it for this article, and I hope it has been insightful. Please like and share it with your embedded and IoT developer colleagues, and comment below if you need further clarification.

For further custom development services and use-case applications, you are welcome to contact Dusun IoT. We are committed to providing professional services to ensure your product’s successful market launch and to foster a prosperous future together!
