Single & Multi Core Performance of an Erasure Coding Workload on AMD EPYC

This white paper examines the erasure coding throughput that can be achieved with an AMD EPYC 7601 processor teamed up with the MemoScale Erasure Coding Library. The tests were performed with a GIGABYTE MZ31-AR0 server motherboard.
With the EPYC processor line, AMD is expected to take a strong position in the server market, including the data storage segment. In data storage servers, CPUs need to perform several types of compute-intensive data processing workloads, such as erasure coding, deduplication and compression, to store data efficiently. This white paper examines what erasure coding throughput can be achieved with an AMD EPYC 7601 processor teamed up with the MemoScale Erasure Coding Library. The tests were performed with a GIGABYTE MZ31-AR0 server motherboard.
The amount of data being stored in the world roughly doubles every two years. Methods which reduce the hardware footprint and costs of data storage, such as erasure coding, deduplication and compression, are therefore becoming increasingly important for dealing with this rapid growth. At the same time, storage devices with blazingly fast performance are entering the market, presenting ever greater challenges for processors to keep up with the increased data processing throughputs. In this white paper we take a closer look at the performance achieved when pushing erasure coding workloads maximally on an AMD EPYC CPU.
As the size of data centers grows, the probability of storage equipment failure or outage increases. This makes it essential to have protective measures in place to handle failures, which may occur on a daily or even hourly basis. Replication, i.e. making exact copies of data, has been the de facto method for protecting data against loss at large scale, but it has the obvious disadvantage of storing the data multiple times and thus often requires the same multiplier of costly storage equipment.

Erasure coding is an alternative to replication which follows the principles of RAID5 and RAID6: the data is separated into chunks, and additional redundancy chunks are added which can be used to recover lost chunks. All the chunks are then distributed onto various storage media or failure domains. The main advantage of erasure coding is that it achieves the same or better protection against data loss as replication while reducing the total amount of data stored by 50–80%.
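The chunk-and-redundancy principle can be illustrated with a minimal sketch. The example below uses a single XOR parity chunk, the RAID5 special case that tolerates one lost chunk; real erasure codes such as Reed Solomon generalize this to multiple redundancy chunks, so this is an illustration of the principle rather than the library's algorithm.

```python
def split(data: bytes, k: int):
    # split data into k equal chunks, zero-padding the tail if needed
    size = -(-len(data) // k)
    data = data.ljust(k * size, b"\0")
    return [data[i * size:(i + 1) * size] for i in range(k)]

def xor_parity(chunks):
    # the redundancy chunk: byte-wise XOR of all data chunks
    parity = bytearray(len(chunks[0]))
    for c in chunks:
        for i, b in enumerate(c):
            parity[i] ^= b
    return bytes(parity)

def recover(surviving):
    # XOR of every surviving chunk (including parity) reproduces
    # the single missing chunk
    return xor_parity(surviving)
```

With 4 data chunks plus 1 parity chunk, the storage overhead is 1.25x, versus 2x or more for replication, while any single lost chunk remains recoverable.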

Erasure coding comes with some costs, one being that the redundancy blocks need to be calculated, which requires compute resources. Increasing the speed of encoding and decoding with erasure codes can increase storage system throughput and reduce retrieval latency.
With the launch of EPYC, AMD’s x86 server processor line based on the Zen microarchitecture, AMD delivers strongly on both performance and cost-performance ratio. Many IT organizations purchase dual socket servers and only populate a single socket. Others purchase dual socket servers not because they need the compute capability, but because they need more I/O and/or memory capacity than what is available on current single socket servers. AMD EPYC enables no-compromise single socket servers with up to 32 cores, 8 memory channels and 128 PCIe® 3.0 lanes, enabling capabilities and performance previously available only in dual socket architectures.
GIGABYTE brings three decades of motherboard design know-how to its cutting-edge server motherboards. GIGABYTE motherboards have been thoroughly optimized for AMD EPYC, achieving one of the top scores on the SPEC CPU 2017 Benchmark for single and dual socket AMD EPYC systems*. Additional features of the MZ31-AR0 include dual 10GbE networking ports, an onboard M.2 port for dense high speed storage, and support for AMD’s Radeon Instinct MI25 GPU. GIGABYTE’s MZ31-AR0 motherboard also forms the base of their S451-Z30 4U 36 x 3.5” HDD Storage Server. Please see APPENDIX A for further information on the MZ31-AR0.
The MemoScale Erasure Coding Library features optimized encoding and decoding with Reed Solomon erasure code for a wide range of processors. The library also supports various types of proprietary erasure coding algorithms which further improve performance as well as reduce network traffic and hardware costs. The MemoScale Erasure Coding Library can be integrated into proprietary storage systems. In addition, MemoScale provides erasure coding plugins for open source storage systems such as CEPH, SWIFT and HDFS.
The MemoScale Erasure Coding Benchmark Tool was used to assess the erasure coding performance of different erasure coding libraries. The benchmark tool provides a plugin system, where different erasure coding libraries can be loaded and their performance benchmarked. Each thread used in the benchmark gets its own randomized 1 GB of data in pre-allocated buffers, where each individual buffer holds either 4 KB or 4096 KB, matching the tested block sizes. A tight loop then runs the encoding function of the specific erasure coding library a predefined number of times. Each iteration uses a random subset of 14 buffers, forcing the buffers to be fetched from main memory rather than from the cache.
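The structure of such a benchmark loop can be sketched as follows. The MemoScale tool itself is proprietary, so this is an illustrative Python sketch; the `encode_stub` XOR workload is a stand-in for the library's Reed Solomon encode call, and buffer counts are scaled down from the 1 GB per thread used in the paper.

```python
import random
import time

BLOCK = 4 * 1024   # 4 KB blocks; the paper also tests 4096 KB
K, M = 10, 4       # the paper's 10 data + 4 redundancy configuration

def make_pool(n_buffers, block=BLOCK):
    # pre-allocated buffers filled with randomized data
    return [bytes(random.getrandbits(8) for _ in range(block))
            for _ in range(n_buffers)]

def encode_stub(data_blocks):
    # stand-in workload for the library's encode function
    parity = bytearray(BLOCK)
    for blk in data_blocks:
        for i, b in enumerate(blk):
            parity[i] ^= b
    return bytes(parity)

def benchmark(pool, iterations):
    start = time.perf_counter()
    for _ in range(iterations):
        # a fresh random subset of K + M buffers per iteration forces
        # fetches from main memory rather than cache hits
        subset = random.sample(pool, K + M)
        encode_stub(subset[:K])
    elapsed = time.perf_counter() - start
    return iterations * K * BLOCK / elapsed  # bytes of data encoded per second
```

The returned figure counts only the data bytes fed to the encoder, mirroring how throughput is reported later in this paper.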

To ensure that no thread is rescheduled to another CPU core, each thread is pinned to a specific core during the whole benchmarking process. This preserves the CPU-locality of the allocated memory for that specific thread. Turbo boost was turned off.
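On Linux, pinning to a core can be done with the `sched_setaffinity` system call; a minimal Python illustration of the idea (the actual tool presumably pins each worker thread natively) might look like this.

```python
import os

def pin_to_core(core):
    # restrict the calling process/thread to a single CPU core so the
    # scheduler cannot migrate it, preserving the CPU-locality of the
    # memory it has allocated
    if hasattr(os, "sched_setaffinity"):      # available on Linux
        os.sched_setaffinity(0, {core})
        return os.sched_getaffinity(0)
    return None                               # unsupported platform: no-op
```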

The configuration used for testing the performance is 10 data blocks and 4 redundancy blocks with a Vandermonde-based Reed Solomon erasure code. The tests were run with two different block sizes: 4 KB and 4096 KB. The decoding results included in this paper are for the decoding of only one lost data block, which is the most common loss scenario in storage systems.
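In a systematic Vandermonde-based code of this shape, the generator matrix stacks a 10x10 identity (data blocks pass through unchanged) on top of 4 Vandermonde parity rows. A sketch of that construction, using a small prime field as a stand-in for the GF(2^8) arithmetic such libraries typically use:

```python
k, m, p = 10, 4, 257   # 10 data + 4 redundancy blocks; p is a toy prime field

# systematic part: each data block maps to itself
G = [[int(i == j) for j in range(k)] for i in range(k)]

# Vandermonde parity rows: row for evaluation point a is [a^0, a^1, ..., a^(k-1)]
G += [[pow(a, j, p) for j in range(k)] for a in range(1, m + 1)]
```

Each redundancy block is then the dot product of one parity row with the 10 data blocks, computed in the finite field.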

Performance has been measured in terms of the amount of data being encoded or decoded, excluding the redundancy blocks being calculated. To convert this to a throughput measurement covering both data and redundancy blocks, the figures can be multiplied by a factor reflecting the overhead level used (1.4 for an erasure coding configuration with 10 data blocks and 4 redundancy blocks).
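The conversion is simple arithmetic; the snippet below works through it for the 10+4 configuration, using the 88 GB/s multi-core encoding result reported later in this paper, and also shows how the 1.4x raw-storage overhead compares with typical 3x replication.

```python
k, m = 10, 4                       # 10 data + 4 redundancy blocks
factor = (k + m) / k               # 1.4: total blocks written per data block

data_throughput = 88.0             # GB/s of data encoded (multi-core result)
total_throughput = data_throughput * factor   # GB/s including redundancy blocks

replication_copies = 3             # typical 3x replication for similar protection
saving = 1 - factor / replication_copies      # fraction of raw storage saved
```

The resulting saving of roughly 53% is consistent with the 50–80% reduction cited earlier for erasure coding versus replication.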
Encoding data with erasure codes is the process of generating the erasure coding redundancy blocks and it is done whenever data is written to storage systems which employ erasure coding. Higher encoding speeds can result in higher write throughput and reduced write latency.

Decoding data with erasure coding is the process of recovering original data blocks from other data and redundancy blocks and is done in storage systems when the systems recover from failures of storage equipment or when degraded reads need to be done. A degraded read is the process of reading data from a storage system where one or more of the original data blocks are lost or temporarily unavailable and thus needs to be decoded. Higher decoding speeds can result in higher throughput and reduced latency of degraded reads.
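A degraded read can be sketched end to end on a scaled-down 3+2 Vandermonde code. This toy example works over the prime field GF(17) instead of the GF(2^8) arithmetic real libraries use, but the structure is the same: encode by multiplying the data with the generator matrix, then recover a lost data block by solving a linear system built from any k surviving rows.

```python
P = 17       # toy prime field; real libraries typically work over GF(2^8)
k, m = 3, 2  # scaled-down 3 data + 2 redundancy code (the paper uses 10 + 4)

def solve_mod(A, b, p):
    # Gaussian elimination over GF(p): solve A x = b (mod p)
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] % p)
        M[col], M[piv] = M[piv], M[col]
        inv = pow(M[col][col], p - 2, p)          # Fermat inverse
        M[col] = [x * inv % p for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] % p:
                f = M[r][col]
                M[r] = [(M[r][c] - f * M[col][c]) % p for c in range(n + 1)]
    return [row[n] for row in M]

# systematic generator matrix: identity on top, Vandermonde parity rows below
G = [[int(i == j) for j in range(k)] for i in range(k)]
G += [[pow(a, j, P) for j in range(k)] for a in range(2, 2 + m)]

data = [5, 9, 13]
codeword = [sum(g * d for g, d in zip(row, data)) % P for row in G]

# degraded read: data block 1 is lost; decode from any k surviving blocks
lost = 1
rows = [i for i in range(k + m) if i != lost][:k]
recovered = solve_mod([G[i] for i in rows], [codeword[i] for i in rows], P)
```

Decoding a single lost block, as in the tests above, only requires one such solve over k surviving blocks, which is why it is the cheapest and most common recovery scenario.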
In this paper we evaluated the erasure coding performance of the AMD EPYC 7601 teamed up with the MemoScale Erasure Coding Library. The tests were performed on a GIGABYTE MZ31-AR0 server motherboard.

The single core encoding performance of the EPYC is above 5 GB/s for both 4KB and 4096KB blocks. The single core decoding performance is above 10 GB/s for 4KB blocks and surpasses 14 GB/s for 4096KB blocks. In multi core tests, the 8 memory channels and 32 cores of the EPYC make it possible to reach an impressive encoding performance of 88 GB/s and a decoding performance of 111 GB/s.

The results demonstrate EPYC’s impressive ability to move and process large amounts of data fast. This could have implications for how AMD’s processors can be used, as well as how storage systems can be constructed to make full use of this advantage. Going forward, we aim to benchmark other compute-intensive workloads for storage systems on the EPYC processor using MemoScale software.