COMPUTER ORGANIZATION (ELE408) LAB 6
GPU Experiments on ATI Radeon HD 2400 PRO
1. Pre-lab Report
- Read chapters 1 and 2 of the Stream Computing User Guide.
- Read this lab handout carefully. Take notes and prepare the programs required in this handout.
- Review the class notes regarding GPUs.
- Study the Brook+ data types and standard library functions.
- Study Visual Studio 2005.
2. Objectives of this Lab
The purpose of this lab is to gain first-hand experience with GPUs (stream processors) and AMD Stream Computing. You will apply the GPU and memory concepts from the lectures to the lab experiments. The experiments run on the ATI Radeon HD 2400 PRO graphics card, which has 40 stream processing units.
Specifically, you will learn and practice the basics of:
2.1. The principles of AMD Stream Computing and stream processor hardware functionality.
2.2. Brook+ programming.
2.3. Stream computing on ATI graphics cards.
2.4. Comparing the performance of GPU and CPU computation.
3. Basics
Stream Computing harnesses the tremendous processing power of GPUs (stream processors) for high-performance, data-parallel computing in a wide range of applications.
3.1. Brook+
Brook+ provides an explicit data-parallel C compiler that uses extensions to the standard C language. It abstracts the hardware details: developers write kernels to be executed on the stream processor.
The two key elements in Brook+ are:
- Stream: A collection of data elements of the same type that can be operated on in parallel.
- Kernel: A parallel function that operates on every element of a domain of execution.
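The stream and kernel ideas can be pictured with a minimal plain C++ sketch. This is not Brook+ code and does not run on the GPU; `FloatStream`, `sum_kernel`, and `run_sum_kernel` are hypothetical names chosen only for illustration. The point is that the kernel body sees one element position at a time, and the "invocation" applies it to every element of the domain.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a stream: a collection of elements of one type.
using FloatStream = std::vector<float>;

// Hypothetical stand-in for a kernel body: it computes ONE output element
// from the matching input elements and never looks at its neighbours.
inline float sum_kernel(float a, float b) {
    return a + b;
}

// Hypothetical stand-in for a kernel invocation: the kernel body is applied
// to every element of the domain of execution. The iterations are
// independent, so a stream processor may run them in parallel; this
// sketch simply runs them one after another on the CPU.
inline FloatStream run_sum_kernel(const FloatStream& a, const FloatStream& b) {
    assert(a.size() == b.size());  // the input streams must have the same shape
    FloatStream c(a.size());
    for (std::size_t i = 0; i < c.size(); ++i) {
        c[i] = sum_kernel(a[i], b[i]);
    }
    return c;
}
```

Because no iteration depends on another, the loop order does not matter, which is what makes this style of computation data-parallel.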
3.2. The Stream Processor

Figure 1 Generalized Stream Processor Structure
Stream processors comprise groups of SIMD (single instruction, multiple data) engines. Each SIMD engine contains numerous thread processors, which are responsible for executing kernels, each operating on an independent data stream. Thread processors, in turn, contain numerous stream cores, which are the fundamental, programmable computational units, responsible for performing integer, single-precision floating point, double-precision floating point, and transcendental operations. All thread processors within a SIMD engine execute the same instruction sequence; different SIMD engines can execute different instructions.
4. Experiment Procedures
In this lab, we will look at a sample project that performs simple matrix multiplication. We will learn how to design a kernel function, how to manage streams, and how to debug in Visual Studio 2005. We will also compare the performance of the GPU and the CPU.
Start your lab by following these steps in order:
1) Open the existing sample program with Visual
Studio 2005; (C:\Program Files\AMD\AMD \Brook+ 1.3.0_beta\samples\CPP\apps\SimpleMatMult\ simple_matmult.vcpro)
This is a simple matrix multiplication example for matrices of any size. The file simple_matmult.br provides the kernel that runs on the GPU. The function SimpleMatmult::_matmult(), implemented in SimpleMatmult.h, provides the simple matrix multiplication algorithm on the CPU.
Explain how the GPU and the CPU each perform the simple matrix multiplication. Read chapter 1.1.1 of the Stream Computing User Guide for reference.
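As a starting point for that explanation, here is a minimal C++ sketch of simple matrix multiplication. It is not the code from SimpleMatmult.h or simple_matmult.br; the function names, the row-major layout, and the dimension parameters `m`, `inner`, and `n` are assumptions made for illustration. `matmul_element` is the work needed for one output position, which is the part a per-element kernel would perform, while `matmul_cpu` visits every output position in turn.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Work for ONE output element: C[row][col] = sum over k of A[row][k] * B[k][col].
// A is (m x inner) and B is (inner x n), both stored row-major (an assumption).
inline float matmul_element(const std::vector<float>& a,
                            const std::vector<float>& b,
                            std::size_t row, std::size_t col,
                            std::size_t inner, std::size_t n) {
    float acc = 0.0f;
    for (std::size_t k = 0; k < inner; ++k) {
        acc += a[row * inner + k] * b[k * n + col];
    }
    return acc;
}

// CPU-style version: a sequential loop nest over all output positions.
// A GPU-style version would instead evaluate matmul_element for many
// (row, col) positions at the same time.
inline std::vector<float> matmul_cpu(const std::vector<float>& a,
                                     const std::vector<float>& b,
                                     std::size_t m, std::size_t inner,
                                     std::size_t n) {
    assert(a.size() == m * inner);
    assert(b.size() == inner * n);
    std::vector<float> c(m * n);
    for (std::size_t row = 0; row < m; ++row) {
        for (std::size_t col = 0; col < n; ++col) {
            c[row * n + col] = matmul_element(a, b, row, col, inner, n);
        }
    }
    return c;
}
```

Each output element depends only on one row of A and one column of B, so the output positions are independent of one another.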
2) Build this project; (Build->Build Solution or
Build->Rebuild Solution)
3) If the project builds successfully, run the program. (Debug->Start Without Debugging)
What is the result? Did an error occur?
Set a breakpoint in the function int SimpleMatmult<T>::run() in SimpleMatmult.h and step through the program. Trace the status of streams A, B and C. Try to find the problem and fix it.
Read chapter 2.7, Stream Management, of the Stream Computing User Guide for reference. Look up the stream error codes:
BR_NO_ERROR                  // No error. All's well
BR_ERROR_DECLARATION         // Error in Stream Declaration
BR_ERROR_READ                // Error during Stream::read
BR_ERROR_WRITE               // Error during Stream::write
BR_ERROR_KERNEL              // Error during Kernel Invocation
BR_ERROR_DOMAIN              // Error in domain operator
BR_ERROR_INVALID_PARAMATER   // An invalid parameter was passed
BR_ERROR_NOT_SUPPORTED       // Feature not supported in Brook+ or in
                             // the underlying hardware
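While tracing the streams, it can help to turn a status code into readable text. The sketch below is a self-contained stand-in, not the Brook+ header: the `sketch` namespace, the `StreamError` enum, and `describe` are hypothetical, and the enum's numeric values are not claimed to match the SDK. Only the names and descriptions come from the list above. In the real project, use the definitions from the Brook+ headers and the error queries described in chapter 2.7 of the user guide.

```cpp
#include <cassert>
#include <string>

namespace sketch {

// Hypothetical mirror of the error names listed above. The numeric values
// here are arbitrary and are NOT taken from the Brook+ SDK.
enum StreamError {
    BR_NO_ERROR,
    BR_ERROR_DECLARATION,
    BR_ERROR_READ,
    BR_ERROR_WRITE,
    BR_ERROR_KERNEL,
    BR_ERROR_DOMAIN,
    BR_ERROR_INVALID_PARAMATER,  // spelled as in the list above
    BR_ERROR_NOT_SUPPORTED
};

// Returns the description given in the handout for each error name.
inline std::string describe(StreamError e) {
    switch (e) {
        case BR_NO_ERROR:                return "No error";
        case BR_ERROR_DECLARATION:       return "Error in Stream Declaration";
        case BR_ERROR_READ:              return "Error during Stream::read";
        case BR_ERROR_WRITE:             return "Error during Stream::write";
        case BR_ERROR_KERNEL:            return "Error during Kernel Invocation";
        case BR_ERROR_DOMAIN:            return "Error in domain operator";
        case BR_ERROR_INVALID_PARAMATER: return "An invalid parameter was passed";
        case BR_ERROR_NOT_SUPPORTED:     return "Feature not supported";
    }
    return "Unknown error code";
}

// True when a stream status indicates that the last operation succeeded.
inline bool stream_ok(StreamError e) {
    return e == BR_NO_ERROR;
}

}  // namespace sketch
```

Checking the status of A, B and C after each declaration, read, kernel call, and write narrows down which operation first reports an error.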
4) Change the default values in SampleBase.cpp so that the program displays timing information, verifies the result against the CPU, and compares performance with the CPU.
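Verifying a GPU result against a CPU result usually means comparing floating-point values with a tolerance rather than for exact equality, because the two devices may round or order operations differently. The following is a minimal sketch of such a check; the function name `results_match` and the default tolerance are assumptions for illustration, not the verification routine the sample actually uses.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Returns true when every GPU value is within a relative tolerance of the
// corresponding CPU reference value. The tolerance is scaled by the
// magnitude of the reference, with a floor of 1.0 so that values near zero
// are compared with an absolute tolerance instead.
inline bool results_match(const std::vector<float>& gpu,
                          const std::vector<float>& cpu,
                          float rel_tol = 1e-4f) {
    if (gpu.size() != cpu.size()) {
        return false;  // different shapes can never match
    }
    for (std::size_t i = 0; i < cpu.size(); ++i) {
        const float diff = std::fabs(gpu[i] - cpu[i]);
        const float scale = std::fmax(1.0f, std::fabs(cpu[i]));
        if (diff > rel_tol * scale) {
            return false;
        }
    }
    return true;
}
```

A tolerance that is too tight reports false mismatches on large matrices, where many rounded products are accumulated; one that is too loose can hide a real bug.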
5) Change the size of the input matrices (128x64, 512x128, 512x256, 1024x512, 2048x1024), record the results, and compare performance across the different input sizes, paying particular attention to the speedup of GPU Total Time over CPU Total Time. Explain the performance you observe. Is the GPU always better than the CPU?
Read chapter 1.3, Performance, of the Stream Computing User Guide for reference.
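When tabulating the results, speedup is conventionally the CPU time divided by the GPU time, so a value above 1 means the GPU run was faster and a value below 1 means it was slower. The sketch below shows that calculation together with a generic wall-clock timer built on `std::chrono`; the names `speedup` and `time_seconds` are hypothetical helpers, and the sample program reports its own timings, which are the numbers to record.

```cpp
#include <cassert>
#include <chrono>
#include <utility>

// Speedup of the GPU run relative to the CPU run.
// > 1.0: GPU faster; < 1.0: GPU slower; == 1.0: equal.
inline double speedup(double cpu_seconds, double gpu_seconds) {
    assert(gpu_seconds > 0.0);  // a zero or negative time is a measurement error
    return cpu_seconds / gpu_seconds;
}

// Measures the wall-clock time, in seconds, taken by any callable.
// steady_clock is used because it never jumps backwards.
template <typename Fn>
double time_seconds(Fn&& work) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<Fn>(work)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

For example, a CPU time of 2.0 s against a GPU time of 0.5 s gives a speedup of 4.0, while 1.0 s against 2.0 s gives 0.5.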
6) Build and run the existing sample program with
Visual Studio 2005; (C:\ Program Files \ AMD \ AMD \ Brook+1.3.0_beta \ samples
\ CPP \ apps \ optimized_matmult.vcproj)
7) Compare the performance of simple matrix multiplication and optimized matrix multiplication for the same input sizes (128x64, 512x128, 512x256, 1024x512, 2048x1024). Explain the performance you observe.
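One way to put the two implementations on a common scale is to convert each measured time into an arithmetic rate. By the usual convention, multiplying an (m x inner) matrix by an (inner x n) matrix costs about 2 * m * inner * n floating-point operations (one multiply and one add per term). The sketch below applies that convention; the helper names are hypothetical, and how the sizes listed above map onto m, inner, and n is left to you to confirm from the sample's source.

```cpp
#include <cassert>
#include <cstddef>

// Approximate floating-point operation count for C = A * B, where A is
// (m x inner) and B is (inner x n): one multiply and one add per term.
inline double matmul_flops(std::size_t m, std::size_t inner, std::size_t n) {
    return 2.0 * static_cast<double>(m) * static_cast<double>(inner) *
           static_cast<double>(n);
}

// Converts an operation count and a measured time into GFLOPS
// (billions of floating-point operations per second).
inline double gflops(double flops, double seconds) {
    assert(seconds > 0.0);  // a zero or negative time is a measurement error
    return flops / seconds / 1e9;
}
```

Comparing GFLOPS across sizes shows whether an implementation scales with the problem or is held back by fixed overheads on small inputs.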
5. Lab Report Requirements
In your lab report, you should discuss your designs, possible application protocols, and the significance of low power consumption. Explanation and interpretation of your results are very important. The lab report will be graded on the report and its discussions. The total mark for the report is 100 points.
- Prelab report: 20 points
- Successful experiment: 50 points
- Results analysis, interpretation, and discussions on your design and engineering constraints: 30 points
In the following items, the number inside each pair of brackets "[]" indicates the points you will earn for a satisfactory report and discussion.
In your discussion and
explanations of your results, you should consider the following:
A. What knowledge of mathematics, science and engineering have you applied in
this lab? [5]
B. How did you design the lab (including
architecture, flowchart, programs, etc.)? How did you conduct the experiment?
What is your interpretation and conclusion on the experiments? [5]
C. Who is your team partner? How did you collaborate
with each other and what roles did each of the team members play in the
lab? [5]
D. What specific modern engineering tools have you used in your experiments? What specific techniques and skills have you learned from the lab? [5]
Did your design consider the following constraints? If yes, how did you make your design decisions?
- Economic Constraint (functionality and power consumption) [3]
- Manufacturability, Modularity and Expandability Constraints [3]
- Environmental Constraint (battery life) [2]
- Sustainability: Is your design and implementation sustainable? [2]
For each of the above programs, hand in the debugged source code with comments; the machine code is not necessary. Note that both a hard copy and a file containing the programs are required for the report. Be very specific in your comments: explain what you are doing and why you are doing it.