09. Cache

The goal of a cache is to speed up data retrieval. Just in case you didn’t know that.
The flow of data is:

Sizes of data¶

The block size in the memory is always the same as the block size in the cache.
Data moving from the cache to the CPU will always be one data word because the CPU can only handle that much at a time

What data goes in the cache?¶

Recently accessed memory entries get saved in the cache, as we may need them again soon.
Also, whenever we access memory, we save its nearby entries to the cache in case we will access those next.

Terminology¶

Hit — The requested data exists in the cache
Miss — The requested data is not in the cache. It must be retrieved from memory
Hit Rate — $\frac{\text{Cache hits}}{\text{Total accesses}}$
- $1-\text{Miss Rate}$
Miss Rate — $\frac{\text{Cache misses}}{\text{Total accesses}}$
- $1-\text{Hit Rate}$
Block — Unit size that the cache and memory are divided in to.
- If you want to bring a single byte into the cache, you must copy it’s entire block
Hit Time — The time it takes to read or write data to or from the cache
Miss Penalty — The time it takes to bring a block from memory to the cache, or to write data from the cache to memory
Average Memory Access Time (for one datum) $=$ Hit Time $+$ $($Miss Rate $\times$ Miss Penalty$)$

Cache Mapping¶

This section covers the different ways to link a cache entry to it’s address in memory. There are three ways to do this:

Direct Mapping
- Simple but not flexible
Full Association
- Very flexible, but not simple
N-Way Set Association
- Somewhat complex, somewhat flexible

There are four questions that need to be addressed for each method:

Cache Placement: How do you know where to put things in the cache?
Cache Identification: How do you know where to get things from the cache?
Cache Replacement: How do you determine what data gets overwritten when the cache is full?
Cache Synchronization: How often does the cache write its updated contents to memory?

Direct Mapping¶

Each block in the cache contains a tag, a valid bit, and the data (an array of words). More on that soon.
The valid bit indicates weather the data in the cache accurately represents the data in memory.
Cache Placement and Identification
Using this method, a single memory block can only be mapped to one cache location.
The cache location is determined as such:
$\(\text{Block}_\text{mem index}\bmod\text{Blocks}_\text{cache total}=\text{Block}_\text{cache index}$\)
For example, if a cache has 8 blocks, and we wanted to store the 12th block of memory:
$12\bmod8=4$, so the data goes in the 4th block in the cache.
$$$$
Placement of a block in the cache is determined from the memory address. The address can be divided into thee fields:
The Offset field tells you where your byte is in the specified block. The number of bytes $b$ in the offset field tells you how many bytes are in the block: $2^b$ bytes. For example, if the offset was 5 bits, the block size would be $2^5=32$ bytes.

The index field tells you which block to look in. The number of bytes $b$ in the index field tells you how many blocks there are in the cache: $2^b$ blocks.

The tag field contains the remaining bits of the address. This is kind of like the ID of the block in the set. This is required so that we can distinguish between different locations in memory that map to the same location in the cache.

Information Inconsistency

Earlier we wrote that the cache only sends and receives words from the cpu. If so why do we care about where the requested byte is in the block? Shouldn’t we only care about the word? I emailed Nachum Danzig about it, and I’ll update here when I get a response.

When locating data in the cache based on an address in memory, we determine if the data is inside the cache as such:

Locate the appropriate block using the Block field in the address
If the tag fields of the memory address and the cache block are equal, we have a cache hit.
1. We then locate the desired word using the offset specified in the Offset field
If the tags do not match, it’s a cache miss, and we must fetch the data from memory.

Cache Replacement
There is no algorithm for removing an old block when the cache is full. It will simply get overwritten when the cache saves a new block from memory that maps to the same location as the old one.

Cache Synchronization
There are two ways to keep the memory updated with the contents in the cache:

Memory is modified whenever the cache is modified
Memory is modified whenever a block in the cache is overwritten

There are two cases:

Important

Cache Hit. One of two write policies will occur:
1. Write Through — Data is simultaneously written to the cache and the memory. Only a single word is written, but the processor must wait for the slow memory access to complete. When data is removed from the cache, nothing happens in memory.
2. Write Back — Data will only be written to memory when a block is removed from the cache. If the data was modified at all, the entire block must be written to memory. We also add a DIRTY bit to the block in the cache to track this.
Cache Miss. One of two write policies will occur:
1. Write Allocate — We clear the appropriate block in the cache and load the block into the cache
2. No Write Allocate — the main memory is modified directly, bypassing the cache.

Full Association¶

Cache Placement
Any block in memory can be mapped to any block in the cache. This produces a higher hit rate for two reasons:

Any block that has a valid bit of 0 can be replaced with the new data
Replacement strategies for when the cache is full can be optimized to minimize the removal of ‘hot’ data

Cache Identification
The address has two fields: The tag field, which is the block number in memory, and the offset, which is how far into the block that data resides. The format of the address does not determine the size of the cache, it only tells you the size of each block. There is no “Block” field.

Each entry in the cache contains the tag, a valid bit, and the block of data. In order to check if our block is in the cache, we must check all the tags.

Cache Replacement
If the cache has invalid entries, replace those first. If the cache is full and all entries are valid, there are two possible algorithms:

Least Recently Used — Replace the block that was least recently accessed.
- Advantage: High hit rate
- Disadvantage: Need to track access time of each block
Random — Remove an existing block at random
- Advantage: Easy to implement
- Disadvantage: Does not lower the miss rate significantly
etc

Cache Synchronization
Same method as Direct Mapping

N-Way Set Association¶

Exam Prep

There will likely be an exam question on this. Make sure you know how this works.

N-Way Set Association Cache Formulae
$\text{Address Size}=\log_2{(\text{Memory Capacity})}\iff\text{Memory Capacity}=2^{\text{Address Size}}$
$\text{Offset Field Size}=\log_2{(\text{Block Capacity})}$
$\text{Memory Block Count}=\frac{\text{Memory Capacity}}{\text{Block Capacity}}$
$\text{Number of Cache Blocks}=\frac{\text{Cache Capacity}}{\text{Block Capacity}}$
$\text{Number of Sets}=\frac{\text{Number of Cache Blocks}}{n}$, for $n$-Way Set Association
$\text{Set Field Size}=\log_2(\text{Number of Sets})$
$\text{Tag Field Size}=\text{Address Size}-(\text{Offset Field Size}+\text{Set Field Size})$

Example:
Given the following cache specs, how many bytes are in the tag field?

Property	Value
Memory Capacity	128 Bytes
Cache Capacity	32 Bytes
Block Capacity	4 Bytes
Set Associativity	2-Way

The formula for calculating that is: $\text{Tag Field Size}=\text{Address Size}-(\text{Offset Field Size}+\text{Set Field Size})$

First, we calculate the address size: $\log_2{(128)}=7$, so our address has $7$ bits.

Then we need to find out the Offset field size. That’s $\log_2{(4)}=2$ bits.
Next, we find the Set field size. That will be $\log_2{\left(\frac{\frac{32}{4}}2\right)}=\log_2{\left(\frac82\right)}=\log_24=2$ bits.

Plugging those values into the formula for Tag field size: $7-(2+2)=7-4=3$ bits.