ELE408 - Computer Organization Lab #3
Cache Organization


Objective
Prelab
LRU
Cache Hit/Miss Ratio

OBJECTIVE

To develop an understanding of cache organization such as:

PRELAB
 
  1. Read Section 4 of MCF5407 User's Manual.
  2. Write a LRU program in C with the following functions:
    1. INSERT - insert a new number into stack which does not already exist
    2. DELETE - remove number from stack
    3. PULLPUSH - identify a number and move it to the top of the stack
    4. EXISTS - return node pointer to the location of the number on the stack
  3. Write a C program using the block matrix multiplication algorithm. See below.


LRU

You were taught about LRU (Least Recently Used) cache replacement algorithm. To familarize yourself with this important replacement algorithm, you will implement an 8-element LRU stack using linked list data structure.   LRU   is used in the design of memory hierarchy of many computer systems.  The operations of the LRU are as follows:
 

  1. Each data structure will contain two components:
    1. An integer id (between 1 & 15)
    2. A pointer to the next element in the stack.
  2. The number of integers in the stack should never exceed  8.
  3. The program should prompt the user to continuously enter in an integer until a certain terminator (you decide) is entered.
  4. At termination the elements in the stack should be printed in order (MRU to LRU)
  5. If an integer is entered, and it already exists in the stack, PULLPUSH should move it to the top of the stack.
  6. If the stack is full, then remove the bottom item and place the new item at the top.
This program should be demonstrated to the T.A. for credit.
Linked List Diagram
CACHE HIT/MISS RATIO

Blocked Matrix Multiplication Algorithm

1. j := 1;

2.  for jj = 1 to N2/B2 do begin  /* jj-loop */

3.      for kk = 1 to N/B1 do      /*  kk-loop */

4.        for i = 1 to N1 do           /* i-loop */

5.          for k = B1(kk-1) + 1 to kkB1 do  /* k-loop */

6.               begin

7.               put X[i,k] in a register;

8.               for ii = j to j + B2 - 1 do       /* ii-loop */

9.                   Z[i,ii] := Z[i,ii] + X[i,k] * Y[k,ii];

10.             end;

11.        j := j + B2;

12. end jj;

For simplicity, we assume that B1 divides N and B2 divides N2.

The algorithm above breaks up the matrix into smaller segments. The size of the segment will control the number of hits and misses. The segments will be executed as many times as required.
 

  1. Compile the blocking matrix multiplication algorithm in C and cross compile it into assembly (be sure the assembly code executes correctly).
  2. Write assembly code to setup the timer registers for the MCF5407 processor so the the timer can measure the total execution time of your program.  Hint: Be careful of the timer wrapping around.  Maybe try dividing the clock or generating an interrupt.
  3. Write short assembly programs to put the cache in different modes to observe the performance enhancement due to the cache memory.  Hint: Look at CACR bits and set appropriate bits accordingly.
    1. Leave cache in default mode.
    2. Enable the store buffer.
    3. Try different combinations of the Default Cache Mode.
  4. Modify the values of B2 (2, 4, 8, 16) and read the values of the counter.  Next change the value of B1 and repeat your tests for B2.  Record your results.
  5. Part 4 should be repeated for the different cache modes.
  6. Draw curves for Execution Time vs.  B2, for each mode the cache was tested.
  7. Propose suggestions to programmers as to how to write efficient programs.
  8. Propose suggestions to computer hardware designers as to how to design optimal cache memories.
Back To Top