ELE408
- Computer Organization Lab #3
Cache Organization
OBJECTIVE
To develop an understanding of cache
organization such as:
-
Mapping
-
Replacement algorithms
-
Effectiveness of cache memory
-
Hit/Miss ratios
PRELAB
-
Read
Section 4 of MCF5407
User's Manual.
-
Write a LRU program in C with the following
functions:
-
INSERT - insert a new number into stack
which does not already exist
-
DELETE - remove number from stack
-
PULLPUSH - identify a number and move
it to the top of the stack
-
EXISTS - return node pointer to the
location of the number on the stack
-
Write a C program using the block matrix
multiplication algorithm.
See
below.
LRU
You were taught about LRU
(Least Recently Used)
cache replacement algorithm.
To familarize yourself with this important replacement algorithm,
you will implement
an 8-element LRU stack using linked list data structure.
LRU is used in the design of memory hierarchy of many computer systems.
The operations of the LRU are as follows:
-
Each data structure will contain two
components:
-
An integer id (between 1 & 15)
-
A pointer to the next element in the
stack.
-
The number of integers in the stack
should never exceed 8.
-
The program should prompt the user
to continuously enter in an integer until a certain terminator (you decide)
is entered.
-
At termination the elements in the
stack should be printed in order (MRU to LRU)
-
If an integer is entered, and it already
exists in the stack, PULLPUSH should move it to the top of the stack.
-
If the stack is full, then remove the
bottom item and place the new item at the top.
This program should be demonstrated
to the T.A. for credit.
CACHE HIT/MISS RATIO
Blocked
Matrix Multiplication Algorithm
1. j := 1;
2. for jj = 1 to N2/B2
do
begin /* jj-loop */
3.
for kk = 1 to N/B1 do /*
kk-loop */
4.
for i = 1 to N1 do
/* i-loop */
5.
for k = B1(kk-1) + 1 to kkB1 do /* k-loop */
6.
begin
7.
put X[i,k] in a register;
8.
for ii = j to j + B2 - 1 do
/* ii-loop */
9.
Z[i,ii] := Z[i,ii] + X[i,k] * Y[k,ii];
10.
end;
11.
j := j + B2;
12. end jj;
For simplicity, we assume that
B1 divides N and B2 divides N2.
The algorithm above breaks up the
matrix into smaller segments. The size of the segment will control the
number of hits and misses. The segments will be executed as many times
as required.
-
Compile the blocking matrix multiplication
algorithm in C and cross compile it into assembly (be sure the assembly
code executes correctly).
-
Write assembly code to setup the timer
registers for the MCF5407 processor so the the timer can measure the total
execution time of your program. Hint: Be careful of the timer
wrapping around. Maybe try dividing the clock or generating an interrupt.
-
Write short assembly programs to put
the cache in different modes to observe the performance enhancement due
to the cache memory. Hint: Look at CACR bits and set
appropriate bits accordingly.
-
Leave cache in default mode.
-
Enable the store buffer.
-
Try different combinations of the Default
Cache Mode.
-
Modify the values of B2 (2, 4, 8, 16)
and read the values of the counter. Next change the value of B1 and
repeat your tests for B2. Record your results.
-
Part 4 should be repeated for the different
cache modes.
-
Draw curves for Execution Time vs.
B2, for each mode the cache was tested.
-
Propose suggestions to programmers
as to how to write efficient programs.
-
Propose suggestions to computer hardware
designers as to how to design optimal cache memories.
Back To Top