COMPUTER ORGANIZATION (ELE408) LAB 6

 

GPU Experiments on ATI Radeon HD 2400 PRO

 

1.     Pre-lab Report

• Read chapters 1-2 of the Stream Computing User Guide.

• Read this lab handout carefully. Take notes and prepare the programs required in this handout.

• Review the class notes on GPUs.

• Study the Brook+ data types and standard library functions.

• Study Visual Studio 2005.

2.     Objectives of this Lab

The purpose of this lab is to gain first-hand experience with GPUs (stream processors) and AMD Stream Computing. You will apply the GPU and memory concepts from the lectures to the lab experiments. The experiments run on the ATI Radeon HD 2400 PRO graphics card, which has 40 stream processing units.

Specifically, you will learn and practice the basics of:

2.1.   the principles of AMD Stream Computing and stream processor hardware functionality.

2.2.   Brook+ programming.

2.3.   stream computing on ATI graphics cards.

2.4.   comparing the performance of GPU and CPU computation.

3.     Basics

Stream Computing harnesses the processing power of GPUs (stream processors) for high-performance, data-parallel computing in a wide range of applications.

3.1. Brook+

Brook+ provides an explicit data-parallel C compiler using extensions to the standard C language. It abstracts the hardware details. Developers write kernels to be executed on the stream processor.

Brook+ has two key elements:

•  Stream: A collection of data elements of the same type that can be operated on in parallel.

•  Kernel: A parallel function that operates on every element of a domain of execution.
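The relationship between a stream and a kernel can be sketched on the CPU in plain C++. This is not Brook+ code: FloatStream, add_kernel, and run_kernel are hypothetical names chosen for illustration, and the loop runs serially, whereas a stream processor would run the kernel instances in parallel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU stand-in for a stream: a collection of elements of one type.
typedef std::vector<float> FloatStream;

// Hypothetical kernel body: computes one output element from the matching inputs.
float add_kernel(float a, float b) {
    return a + b;
}

// Applies the kernel to every element of the domain of execution.
// The iterations are independent of each other, which is what allows
// a stream processor to execute them in parallel.
FloatStream run_kernel(const FloatStream& a, const FloatStream& b) {
    assert(a.size() == b.size());  // both input streams must cover the same domain
    FloatStream out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = add_kernel(a[i], b[i]);
    }
    return out;
}
```

Calling run_kernel on two streams of equal length returns a stream in which each element is the sum of the corresponding input elements.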

3.2. The Stream Processor


Figure 1  Generalized Stream Processor Structure

Stream processors comprise groups of SIMD (single instruction, multiple data) engines. Each SIMD engine contains numerous thread processors, which are responsible for executing kernels, each operating on an independent data stream. Thread processors, in turn, contain numerous stream cores, which are the fundamental programmable computational units. Stream cores perform integer, single-precision floating-point, double-precision floating-point, and transcendental operations. All thread processors within a SIMD engine execute the same instruction sequence; different SIMD engines can execute different instructions.
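The hierarchy described above (SIMD engines, thread processors, stream cores) implies a simple product for the total number of stream cores. The sketch below is only an illustration: StreamProcessorLayout and total_stream_cores are hypothetical names, and the handout gives the total of 40 units for the lab card but not how that total is split across the levels.

```cpp
#include <cassert>

// Hypothetical description of the three levels of the hierarchy.
struct StreamProcessorLayout {
    int simd_engines;                       // number of SIMD engines
    int thread_processors_per_simd;         // thread processors in each SIMD engine
    int stream_cores_per_thread_processor;  // stream cores in each thread processor
};

// Total stream cores = engines x thread processors per engine x cores per thread processor.
int total_stream_cores(const StreamProcessorLayout& layout) {
    assert(layout.simd_engines > 0);
    assert(layout.thread_processors_per_simd > 0);
    assert(layout.stream_cores_per_thread_processor > 0);
    return layout.simd_engines
         * layout.thread_processors_per_simd
         * layout.stream_cores_per_thread_processor;
}
```

For example, a layout of 2 x 4 x 5 gives 40. That split is an assumed factorization used here only to show the arithmetic; check the hardware documentation for the actual organization of the card.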

 

4. Experiment Procedures

In this lab, we will look at a sample project that performs simple matrix multiplication. We will learn how to design a kernel function, how to manage streams, and how to debug in Visual Studio 2005. We will also compare the performance of the GPU and the CPU.

Work through the following steps in order:

1)     Open the existing sample program with Visual Studio 2005; (C:\Program Files\AMD\AMD \Brook+ 1.3.0_beta\samples\CPP\apps\SimpleMatMult\ simple_matmult.vcpro)

This is a simple matrix multiplication example for matrices of any size. The file simple_matmult.br contains the kernel that runs on the GPU. The function SimpleMatmult::_matmult(), implemented in SimpleMatmult.h, contains the simple matrix multiplication algorithm that runs on the CPU.

Explain how the GPU and the CPU each perform the simple matrix multiplication. Read chapter 1.1.1 of the Stream Computing User Guide for reference.
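As a starting point for the explanation, the sketch below shows the general idea in plain C++. It is not the code from SimpleMatmult.h or simple_matmult.br: matmul_element and matmul_cpu are hypothetical names, and row-major storage is an assumption. The function matmul_element is the work for a single output element, which is the kind of work one kernel instance would do; matmul_cpu visits the output elements one after another, as a CPU loop nest does.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Matrices are stored row-major (an assumption): A is m x k, B is k x n, C is m x n.

// Computes one output element C[row][col] as the dot product of
// row `row` of A and column `col` of B.
float matmul_element(const std::vector<float>& A, const std::vector<float>& B,
                     std::size_t k, std::size_t n,
                     std::size_t row, std::size_t col) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < k; ++i) {
        sum += A[row * k + i] * B[i * n + col];
    }
    return sum;
}

// CPU-style multiplication: a loop nest computes the output elements in turn.
// Each call to matmul_element is independent of the others, so a stream
// processor could evaluate many of them at the same time.
std::vector<float> matmul_cpu(const std::vector<float>& A, const std::vector<float>& B,
                              std::size_t m, std::size_t k, std::size_t n) {
    assert(A.size() == m * k);
    assert(B.size() == k * n);
    std::vector<float> C(m * n);
    for (std::size_t row = 0; row < m; ++row) {
        for (std::size_t col = 0; col < n; ++col) {
            C[row * n + col] = matmul_element(A, B, k, n, row, col);
        }
    }
    return C;
}
```

Compare this structure with the sample code when you write your explanation; the sample may organize the data and loops differently.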

2)     Build this project; (Build->Build Solution or Build->Rebuild Solution)

3)     If the project builds successfully, run the program. (Debug->Start Without Debugging)

What is the result? Did an error occur?

Set breakpoints in the function int SimpleMatmult<T>::run() in SimpleMatmult.h and step through the program. Trace the status of streams A, B, and C. Find the problem and fix it.

Read chapter 2.7, Stream Management, of the Stream Computing User Guide for reference. The stream error codes are:

BR_NO_ERROR                  // No error. All’s well
BR_ERROR_DECLARATION         // Error in Stream Declaration
BR_ERROR_READ                // Error during Stream::read
BR_ERROR_WRITE               // Error during Stream::write
BR_ERROR_KERNEL              // Error during Kernel Invocation
BR_ERROR_DOMAIN              // Error in domain operator
BR_ERROR_INVALID_PARAMATER   // An invalid parameter was passed
BR_ERROR_NOT_SUPPORTED       // Feature not supported in brook+ or in the underlying hardware
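While tracing the streams, it can help to translate an error code into its meaning. The sketch below is a stand-in that compiles without the Brook+ headers: the enumerator names are copied from the list above, but the type name StreamErrorCode, the numeric values, and the function describe_stream_error are assumptions made for illustration. In the lab, use the definitions from the Brook+ headers and the methods described in chapter 2.7 of the User Guide.

```cpp
#include <cassert>
#include <string>

// Stand-in enum. Names are from the handout; the numeric values are assumed.
enum StreamErrorCode {
    BR_NO_ERROR = 0,
    BR_ERROR_DECLARATION,
    BR_ERROR_READ,
    BR_ERROR_WRITE,
    BR_ERROR_KERNEL,
    BR_ERROR_DOMAIN,
    BR_ERROR_INVALID_PARAMATER,  // spelled as in the handout
    BR_ERROR_NOT_SUPPORTED
};

// Hypothetical helper: returns the handout's description of an error code.
std::string describe_stream_error(StreamErrorCode code) {
    switch (code) {
        case BR_NO_ERROR:                return "No error";
        case BR_ERROR_DECLARATION:       return "Error in stream declaration";
        case BR_ERROR_READ:              return "Error during Stream::read";
        case BR_ERROR_WRITE:             return "Error during Stream::write";
        case BR_ERROR_KERNEL:            return "Error during kernel invocation";
        case BR_ERROR_DOMAIN:            return "Error in domain operator";
        case BR_ERROR_INVALID_PARAMATER: return "An invalid parameter was passed";
        case BR_ERROR_NOT_SUPPORTED:     return "Feature not supported";
    }
    assert(false && "unknown error code");
    return "Unknown error";
}

// Hypothetical helper: true when a stream reports no error.
bool stream_ok(StreamErrorCode code) {
    return code == BR_NO_ERROR;
}
```

Checking the status of each stream right after it is declared, read, or used in a kernel call narrows down which step introduced the error.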

4)     Change the default values in SampleBase.cpp so that the program displays timing information, verifies the result against the CPU, and compares performance with the CPU.
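Verifying a GPU result against a CPU result usually means comparing the two outputs element by element within a small tolerance, because floating-point results from different hardware need not be bit-identical. The sketch below shows one way to do this. It is not the verification code used by the sample; results_match and the tolerance parameter are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Returns true when both results have the same length and every pair of
// corresponding elements differs by at most `tolerance`.
bool results_match(const std::vector<float>& gpu,
                   const std::vector<float>& cpu,
                   float tolerance) {
    assert(tolerance >= 0.0f);
    if (gpu.size() != cpu.size()) {
        return false;
    }
    for (std::size_t i = 0; i < gpu.size(); ++i) {
        if (std::fabs(gpu[i] - cpu[i]) > tolerance) {
            return false;
        }
    }
    return true;
}
```

A suitable tolerance depends on the matrix size and the magnitude of the values, so treat any particular threshold as a choice to justify in your report.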

5)     Change the size of the input matrices (128x64, 512x128, 512x256, 1024x512, 2048x1024), record the results, and compare the performance across the different sizes, especially the speedup of GPU Total Time relative to CPU Total Time. Explain the results you observe. Is the GPU always faster than the CPU?

Read chapter 1.3, Performance, of the Stream Computing User Guide for reference.
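The speedup in step 5 is the ratio of CPU Total Time to GPU Total Time. The small sketch below shows the calculation; the function names speedup and gpu_is_faster are hypothetical, and both times are assumed to be in the same unit.

```cpp
#include <cassert>

// Speedup of the GPU over the CPU: CPU time divided by GPU time.
// A value above 1 means the GPU was faster; below 1 means the CPU was faster.
double speedup(double cpu_total_time, double gpu_total_time) {
    assert(cpu_total_time >= 0.0);
    assert(gpu_total_time > 0.0);  // avoid division by zero
    return cpu_total_time / gpu_total_time;
}

// True when the measured speedup is greater than 1.
bool gpu_is_faster(double cpu_total_time, double gpu_total_time) {
    return speedup(cpu_total_time, gpu_total_time) > 1.0;
}
```

For instance, with made-up times of 2.0 s on the CPU and 0.5 s on the GPU, the speedup is 4. Use the times you measure in the lab, and note for each matrix size whether the ratio is above or below 1.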

6)     Build and run the existing sample program with Visual Studio 2005; (C:\ Program Files \ AMD \ AMD \ Brook+1.3.0_beta \ samples \ CPP \ apps \   optimized_matmult.vcproj)

7)     Compare the performance of the simple matrix multiplication and the optimized matrix multiplication for the same input sizes (128x64, 512x128, 512x256, 1024x512, 2048x1024). Explain the results you observe.

5. Lab Report Requirements

In your lab report, you should discuss your designs, possible application protocols, and the significance of low power consumption. Explaining and interpreting your results is very important. The lab will be graded based on your report and discussions. The total mark for the report is 100 points.

➢        Prelab report: 20 points

➢        Successful experiment: 50 points

➢        Results analysis, interpretation, and discussion of your design and engineering constraints: 30 points

In the following items, the number inside each pair of brackets "[]" indicates the points you will earn for a satisfactory report and discussion.

 

In your discussion and explanations of your results, you should consider the following:

A.     What knowledge of mathematics, science and engineering have you applied in this lab? [5]

B.     How did you design the lab (including architecture, flowchart, programs, etc.)? How did you conduct the experiment? What is your interpretation and conclusion on the experiments? [5]

C.     Who is your team partner? How did you collaborate with each other and what roles did each of the team members play in the lab?  [5]

D.    What specific modern engineering tools have you used in your experiments? What specific techniques and skills have you learned from the lab? [5]

 

 Did your design consider the following constraints? If yes, how did you make your design decisions?

 

➢      Economic Constraint (functionality and power consumption) [3]

➢      Manufacturability, Modularity and Expandability Constraints [3]

➢      Environmental Constraint (battery life) [2]

➢      Sustainability: Are your design and implementation sustainable? [2]

 

For each of the above programs, hand in the debugged source code with comments; the machine code is not necessary. Both a hard copy and a file containing the programs are required for the report. Be very specific in your comments: explain what you are doing and why you are doing it.