LSQ
This document describes how the lsq.py
script instantiates and connects sub-modules to generate the VHDL for the complete Load-Store Queue (LSQ).
Detailed documentation for the LSQ
generator, which emits a VHDL entity and architecture to assemble a complete Load-Store Queue. It instantiates and connects all dispatchers (Port-to-Queue and Queue-to-Port dispatchers), the group allocator, and optional pipeline logics into one cohesive RTL block.
1. Overview and Purpose
The LSQ is the system for managing all memory operations within the dataflow circuit. Its primary role is to accept out-of-order memory requests, track their dependencies, issue them to memory when safe, and return results in the correct order to the appropriate access ports.
The LSQ module acts as the master architect, instantiating the previously generated modules such as Port-to-Queue Dispatcher
, Queue-to-Port Dispatcher
, and Group Allocator
modules. It wires them together with the load queue, the store queue, the dependency checking logic, and the requesting issue logic.
2. LSQ Internal Blocks
Let’s assume the following generic parameters for dimensionality:
N_GROUPS
: The total number of groups.N_LDQ_ENTRIES
: The total number of entries in the Load Queue.N_STQ_ENTRIES
: The total number of entries in the Store Queue.LDQ_ADDR_WIDTH
: The bit-width required to index an entry in the Load Queue (i.e.,ceil(log2(N_LDQ_ENTRIES))
).STQ_ADDR_WIDTH
: The bit-width required to index an entry in the Store Queue (i.e.,ceil(log2(N_STQ_ENTRIES))
).LDP_ADDR_WIDTH
: The bit-width required to index the port for a load.STP_ADDR_WIDTH
: The bit-width required to index the port for a store.
Signal Naming and Dimensionality: This module is generated from a higher-level description (e.g., in Python), which results in a specific convention for signal naming in the final VHDL code. It’s important to understand this convention when interpreting the signal tables.
-
Generation Pattern: A signal that is conceptually an array in the source code is “unrolled” into multiple, distinct signals in the VHDL entity. The generated VHDL signals are indexed with a suffix, such as
ldp_addr_{p}_i
, where{p}
represents the port index. -
Placeholders: In the VHDL Signal Name column, the following placeholders are used:
{g}
: Group index{lp}
: Load port index{sp}
: Store port index{lm}
: Load memory channel index{sm}
: Store memory channel index
-
Interpreting Diagrams: If a diagram or conceptual description uses a base name without an index (e.g.,
group_init_valid_i
), it represents a collection of signals. The actual dimension is expanded based on the context:- Group-related signals (like
group_init_valid_i
) are expanded by the number of groups (N_GROUPS
). - Load queue-related signals (like
ldq_wen_o
) are expanded by the number of load queue entries (N_LDQ_ENTRIES
). - Store queue-related signals (like
stq_wen_o
) are expanded by the number of store queue entries (N_STQ_ENTRIES
).
- Group-related signals (like
2.1. Group Allocation Interface
These signals manage the handshake protocol for allocating groups of memory operations into the LSQ.
Python Variable Name | VHDL Signal Name | Direction | Dimensionality | Description |
---|---|---|---|---|
group_init_valid_i | group_init_valid_{g}_i | Input | std_logic | Valid signal from the kernel, indicating a request to allocate group {g} . |
group_init_ready_o | group_init_ready_{g}_o | Output | std_logic | Ready signal to the kernel, indicating the LSQ can accept a request for group {g} . |
2.2. Access Port Interface
This interface handles the flow of memory operation payloads (addresses and data) between the dataflow circuit’s access ports and the LSQ.
2.2.1. Load Address Dispatcher
Dispatches load addresses from the kernel to the load queue.
Python Variable Name | VHDL Signal Name | Direction | Dimensionality | Description |
---|---|---|---|---|
ldp_addr_i | ldp_addr_{lp}_i | Input | std_logic_vector(addrW-1:0) | The memory address for a load operation from load port {lp} . |
ldp_addr_valid_i | ldp_addr_valid_{lp}_i | Input | std_logic | Asserts that the payload on ldp_addr_{lp}_i is valid. |
ldp_addr_ready_o | ldp_addr_ready_{lp}_o | Output | std_logic | Asserts that the load queue is ready to accept an address from load port {lp} . |
2.2.1. Load Data Dispatcher
Returns data retrieved from memory back to the correct load port.
Python Variable Name | VHDL Signal Name | Direction | Dimensionality | Description |
---|---|---|---|---|
ldp_data_o | ldp_data_{lp}_o | Output | std_logic_vector(dataW-1:0) | The data payload being sent to load port {lp} . |
ldp_data_valid_o | ldp_data_valid_{lp}_o | Output | std_logic | Asserts that the payload on ldp_data_{lp}_o is valid. |
ldp_data_ready_i | ldp_data_ready_{lp}_i | Input | std_logic | Asserts that the kernel is ready to receive data on load port {lp} . |
2.2.3. Store Address Dispatcher
Dispatches store addresses from the kernel to the LSQ.
Python Variable Name | VHDL Signal Name | Direction | Dimensionality | Description |
---|---|---|---|---|
stp_addr_i | stp_addr_{sp}_i | Input | std_logic_vector(addrW-1:0) | The memory address for a store operation from store port {sp} . |
stp_addr_valid_i | stp_addr_valid_{sp}_i | Input | std_logic | Asserts that the payload on stp_addr_{sp}_i is valid. |
stp_addr_ready_o | stp_addr_ready_{sp}_o | Output | std_logic | Asserts that the store queue is ready to accept an address from store port {sp} . |
2.2.4. Store Data Dispatcher
Dispatches data to be stored from the kernel to the LSQ.
Python Variable Name | VHDL Signal Name | Direction | Dimensionality | Description |
---|---|---|---|---|
stp_data_i | stp_data_{sp}_i | Input | std_logic_vector(dataW-1:0) | The data payload to be stored from store port {sp} . |
stp_data_valid_i | stp_data_valid_{sp}_i | Input | std_logic | Asserts that the payload on stp_data_{sp}_i is valid. |
stp_data_ready_o | stp_data_ready_{sp}_o | Output | std_logic | Asserts that the store queue is ready to accept data from store port {sp} . |
2.3. Memory Interface
These signals form the connection between the LSQ and the main memory system.
Read Channel
2.3.1. Read Request (LSQ to Memory)
Used by the LSQ to issue load operations to memory.
Python Variable Name | VHDL Signal Name | Direction | Dimensionality | Description |
---|---|---|---|---|
rreq_valid_o | rreq_valid_{lm}_o | Output | std_logic | Valid signal indicating the LSQ is issuing a read request on channel {lm} . |
rreq_ready_i | rreq_ready_{lm}_i | Input | std_logic | Ready signal from memory, indicating it can accept a read request on channel {lm} . |
rreq_id_o | rreq_id_{lm}_o | Output | std_logic_vector(idW-1:0) | An ID for the read request, used to match the response. |
rreq_addr_o | rreq_addr_{lm}_o | Output | std_logic_vector(addrW-1:0) | The memory address to be read. |
2.3.2. Read Response (Memory to LSQ)
Used by memory to return data for a previously issued read request.
Python Variable Name | VHDL Signal Name | Direction | Dimensionality | Description |
---|---|---|---|---|
rresp_valid_i | rresp_valid_{lm}_i | Input | std_logic | Valid signal from memory, indicating a read response is available on channel {lm} . |
rresp_ready_o | rresp_ready_{lm}_o | Output | std_logic | Ready signal to memory, indicating the LSQ can accept the read response. |
rresp_id_i | rresp_id_{lm}_i | Input | std_logic_vector(idW-1:0) | The ID of the read response, matching a previous rreq_id_o . |
rresp_data_i | rresp_data_{lm}_i | Input | std_logic_vector(dataW-1:0) | The data returned from memory. |
Write Channel
2.3.3. Write Request (LSQ to Memory)
Used by the LSQ to issue store operations to memory.
Python Variable Name | VHDL Signal Name | Direction | Dimensionality | Description |
---|---|---|---|---|
wreq_valid_o | wreq_valid_{sm}_o | Output | std_logic | Valid signal indicating the LSQ is issuing a write request on channel {sm} . |
wreq_ready_i | wreq_ready_{sm}_i | Input | std_logic | Ready signal from memory, indicating it can accept a write request on channel {sm} . |
wreq_id_o | wreq_id_{sm}_o | Output | std_logic_vector(idW-1:0) | An ID for the write request. |
wreq_addr_o | wreq_addr_{sm}_o | Output | std_logic_vector(addrW-1:0) | The memory address to write to. |
wreq_data_o | wreq_data_{sm}_o | Output | std_logic_vector(dataW-1:0) | The data to be written to memory. |
2.3.4. Write Response (Memory to LSQ)
Used by memory to signal the completion of a write operation.
Python Variable Name | VHDL Signal Name | Direction | Dimensionality | Description |
---|---|---|---|---|
wresp_valid_i | wresp_valid_{sm}_i | Input | std_logic | Valid signal from memory, indicating a write has completed on channel {sm} . |
wresp_ready_o | wresp_ready_{sm}_o | Output | std_logic | Ready signal to memory, indicating the LSQ can accept the write response. |
wresp_id_i | wresp_id_{sm}_i | Input | std_logic_vector(idW-1:0) | The ID of the completed write. |
The LSQ has the following responsibilities:
- Sub-Module Instantiation
The primary responsibility of the top-level LSQ module is to function as an integrator. It instantitates several specialized sub-modules and connects them with the load queue, the store queue, dependency checking logic, and requesting issue logic to create the complete memory management system.
- Group Allocator: This module is responsible for managing entry allocation for the LSQ. It performs the initial handshake to reserve entries for an entire group of loads and stores, providing the necessary ordering information that would otherwise be missing in a dataflow circuit.
- Port-to-Queue (PTQ) Dispatcher: This module is responsible for routing incoming payloads, such as addresses and data, from the dataflow circuit’s external access ports to the correct entries within the load queue and the store queue. The LSQ instantiates three distinct PTQ dispatchers:
- Load Address Port Dispatcher: For routing load addresses.
- Store Address Port Dispatcher: For routing store addresses.
- Store Data Port Dispatcher: For routing store data.
- Queue-to-Port (QTP) Dispatcher: This module is the counterparts to the PTQ dispatchers. It takes payloads from the queue entries and route them back to the correct external access ports. The LSQ instantiates the following QTP dispatchers:
- Load Data Port Dispatcher: Sends loaded data back to the circuit.
- (Optionally) Store Backward Port Dispatcher: It is used to send store completion acknowledgements back to the circuit if the
stResp
configuration is enabled.
-
Load Queue Management Logic
This block can be divided into three sub-block: Load Queue Entry Allocation Logic, Load Queue Pointer Logic, and Load Queue Content State Logic.
2.1. Load Queue Entry Allocation Logic
This block checks whether each queue entry is allocated or deallocated.
- Input:
ldq_wen
: A signal from the Group Allocator that goes high to activate the queue entry when a new load group is being allocated.ldq_reset
: A signal from the Load Data Port Dispatcher that goes high to deactivate (reset) the entry after its load operation is complete and the data has been sent to the kernel.ldq_alloc
(current state): The current allocation status from the register’s output, which is fed back as an input to the logic.
- Processing:
- The
ldq_reset
signal is inverted by aNOT
gate. The result represents the “do not reset” condition. - This inverted signal is then combined with the current
ldq_alloc
state using anAND
gate. The result of this operation (labeledldq_alloc_next
in concept) is ‘1’ only if “the entry is currently allocated AND it is not being reset,” indicating that the allocation should be maintained. - The output of the
AND
gate is then combined with theldq_wen
signal using anOR
gate. This final logic determines that the entry will be in an allocated state (‘1’) during the next clock cycle if either of two conditions is met:- A new allocation is requested (
ldq_wen
= ‘1’). - It was already allocated and no reset was requested (
ldq_alloc
= ‘1’ ANDldq_reset
= ‘0’).
- A new allocation is requested (
- This logic is equivalent to the expression
next_state <= ldq_wen OR (ldq_alloc AND NOT ldq_reset)
.
- The
- Output:
ldq_alloc
(next state): The updated allocation status of the load queue entry for the subsequent clock cycle. This signal is used by other logic within the LSQ to determine if the entry is active.
2.2. Load Queue Pointer Logic
This block is dedicated to calculating the next positions of the head and tail pointers of the circular queue.
-
Input:
num_loads
: The number of new entries being allocated. From the Group Allocator.ldq_tail
(current state): The current tail pointer value.ldq_alloc
: The up-to-date allocation status vector for all entries, which is the output of the Entry State Logic.
-
Processing:
- Tail Pointer Update: When a new group is allocated, it advances the
ldq_tail
pointer by thenum_loads
amount, usingWrapAdd
to handle the circular nature of the queue. - Head Pointer Update: It determines the next
ldq_head
by usingCyclicPriorityMasking
on theldq_alloc
vector. This efficiently finds the oldest, active entry, which becomes the new head.
- Tail Pointer Update: When a new group is allocated, it advances the
-
Output:
ldq_head
(next state): The updated head pointer of the queue.ldq_tail
(next state): The updated tail pointer of the queue.
2.3. Load Queue Content State Logic
This logic manages the validity status of the various payloads and the issue status within an allocated entry. All signals in this block share a similar structure.
-
Input:
- Set Signals:
ldq_addr_wen
: From the Load Address Port Dispatcher, setsldq_addr_valid
to true.ldq_data_wen
: From the Bypass Logic or memory interface, setsldq_data_valid
to true.ldq_issue_set
: From the Dependency Checking & Issue Logic, setsldq_issue
to true.
- Common Reset Signal:
ldq_wen
: From the Group Allocator. It acts as a synchronous reset for all these status bits, clearing them to ‘0’ when a new operation is allocated to the entry.
- Current State Signals:
ldq_addr_valid
(curr state): Current status indicating if the entry holds a valid address.ldq_data_valid
(curr state): Current status indicating if the entry holds valid data.ldq_issue
(curr state): Current status indicating if the load request has been satisfied.
- Set Signals:
-
Processing:
- All three signals (
ldq_addr_valid
,ldq_data_valid
,ldq_issue
) follow the same Set-Reset flip-flop logic pattern, whereldq_wen
has reset priority. - A status bit is set if its corresponding “set” signal (e.g.,
ldq_addr_wen
) is high. If not being set, it holds its value. - However, if
ldq_wen
is high for an entry, all three status bits for that entry are unconditionally cleared to ‘0’ on the next clock cycle. - The logic is equivalent to these expressions:
ldq_addr_valid_next <= (ldq_addr_wen OR ldq_addr_valid) AND (NOT ldq_wen)
ldq_data_valid_next <= (ldq_data_wen OR ldq_data_valid) AND (NOT ldq_wen)
ldq_issue_next <= (ldq_issue_set OR ldq_issue) AND (NOT ldq_wen)
- All three signals (
-
Output:
ldq_addr_valid
(next state): Updated status indicating if the entry holds a valid address.ldq_data_valid
(next state): Updated status indicating if the entry holds valid data.ldq_issue
(next state): Updated status indicating if the load request has been satisfied.
- Input:
-
Store Queue Management Logic
This block can be divided into three sub-block: Store Queue Entry Allocation Logic, Store Queue Pointer Logic, and Store Queue Content State Logic.3.1. Store Queue Entry Allocation Logic (
stq_alloc
)
This logic manages the allocation status for each entry in the Store Queue (STQ), indicating whether it is active.-
Input:
stq_wen
: The write-enable signal from the Group Allocator. When high, it signifies that this entry is being allocated to a new store operation.stq_reset
: The reset signal, which can be triggered by a write response (wresp_valid_i
) or by a store backward dispatch (qtp_dispatcher_stb
). When high, it deallocates the entry.stq_alloc
(current state): The current allocation status from the register’s output, fed back as an input.
-
Processing:
- The logic follows the same Set-Reset principle as the load queue.
- The entry becomes allocated (
'1'
) if a new store is being written to it (stq_wen
is high). - It remains allocated if it was already allocated and is not being reset.
- The logic is equivalent to the expression:
stq_alloc_next <= stq_wen OR (stq_alloc AND NOT stq_reset)
.
-
Output:
stq_alloc
(next state): The updated allocation status vector. This signal is used to identify which store entries are currently active.
3.2. Store Queue Pointer Logic
This block manages the four distinct pointers associated with the Store Queue:head
,tail
,issue
, andresp
.-
Input:
num_stores
: The number of new entries being allocated. From the Group Allocator.stq_tail
,stq_head
,stq_issue
,stq_resp
(current states): The current values of the pointers.stq_alloc
: The up-to-date allocation status vector for all entries.stq_issue_en
: An enable signal from the Request Issue Logic that allows thestq_issue
pointer to advance.stq_resp_en
: An enable signal, typically tied towresp_valid_i
, that allows thestq_resp
pointer to advance.
-
Processing:
- Tail Pointer Update: The
stq_tail
pointer is advanced bynum_stores
usingWrapAdd
upon new group allocation. - Head Pointer Update: The
stq_head
pointer advances to the next oldest active entry. Its logic can depend on the configuration (e.g., advancing on a write response or usingCyclicPriorityMasking
). - Issue Pointer Update: The
stq_issue
pointer, which tracks the next store to be considered for memory issue, is incremented by one whenstq_issue_en
is high. - Response Pointer Update: The
stq_resp
pointer, which tracks completed write operations from memory, is incremented by one whenstq_resp_en
is high.
- Tail Pointer Update: The
-
Output:
stq_head
,stq_tail
,stq_issue
,stq_resp
(next states): The updated pointer values for the next cycle.
3.3. Store Queue Content State Logic
This logic manages the validity of the address and data payloads, as well as the execution status, within an allocated store entry.-
Input:
- Set Signals:
stq_addr_wen
: From the Store Address Port Dispatcher, setsstq_addr_valid
to true.stq_data_wen
: From the Store Data Port Dispatcher, setsstq_data_valid
to true.stq_exec_set
: (Optional, ifstResp=True
) From the memory interface, setsstq_exec
to true upon write completion.
- Common Reset Signal:
stq_wen
: From the Group Allocator. It acts as a synchronous reset, clearing these status bits when a new operation is allocated.
- Current State Signals:
stq_addr_valid
,stq_data_valid
,stq_exec
: The current state of each register, fed back as an input.
- Set Signals:
-
Processing:
- All three signals (
stq_addr_valid
,stq_data_valid
,stq_exec
) follow the same Set-Reset logic pattern, wherestq_wen
has reset priority. - A status bit is set if its corresponding “set” signal (e.g.,
stq_addr_wen
) is high. If not being set, it holds its value. - However, if
stq_wen
is high for an entry, all three status bits are unconditionally cleared to ‘0’. - The logic is equivalent to these expressions:
stq_addr_valid_next <= (stq_addr_wen OR stq_addr_valid) AND (NOT stq_wen)
stq_data_valid_next <= (stq_data_wen OR stq_data_valid) AND (NOT stq_wen)
stq_exec_next <= (stq_exec_set OR stq_exec) AND (NOT stq_wen)
- All three signals (
-
Output:
stq_addr_valid
(next state): Updated status indicating if the entry holds a valid store address.stq_data_valid
(next state): Updated status indicating if the entry holds valid store data.stq_exec
(next state): (Optional) Updated status indicating if the store has been executed by memory.
-
- Load-Store Order Matrix Logic (
store_is_older
)This logic maintains a 2D register matrix that captures the relative program order between every load and store in the queues. Its primary purpose is to provide a static record of dependencies for the conflict-checking logic.
-
Input:
ldq_wen
: The write-enable signal from the Group Allocator. Whenldq_wen[i]
is high, it triggers an update for rowi
of the matrix.stq_alloc
: The allocation status vector from the Store Queue Management Logic. This is used to identify any stores that were already in the queue before the new load was allocated.ga_ls_order
: A matrix from the Group Allocator that specifies the ordering within the newly allocated group.ga_ls_order[i][j]
is ‘1’ if storej
is older than loadi
in the same group.stq_reset
: The reset signal for store entries. Whenstq_reset[j]
is high, it clears columnj
of the matrix, removing the completed store as a dependency.store_is_older
(current state): The current state of the matrix, fed back as an input.
-
Processing:
- The logic for
store_is_older[i][j]
determines if storej
should be considered older than loadi
. This state is set once when loadi
is allocated and then only changes if storej
is deallocated. - On Load Allocation (
ldq_wen[i]
is high): The entire rowi
of the matrix is updated. For each storej
, the bitstore_is_older[i][j]
is set to ‘1’ if the store is not being reset (NOT stq_reset[j]
) AND one of the following is true:- The store
j
was already active in the queue when loadi
arrived (stq_alloc[j]
is ‘1’). - The store
j
is part of the same new group as loadi
and is explicitly defined as older (ga_ls_order[i][j]
is ‘1’).
- The store
- On Store Deallocation (
stq_reset[j]
is high): The logic clears the entire columnj
to ’0’s. This ensures that a completed store is no longer considered a dependency for any active loads. - Hold State: If no new load is being allocated to row
i
, the row maintains its existing values, except for any bits that are cleared due to a store deallocation.
- The logic for
-
Output:
store_is_older
(next state): The updated dependency matrix. This matrix is a critical input for both the Load-Store Conflict Logic and the Store-Load Conflict Logic. A ‘1’ atstore_is_older[i][j]
essentially means “loadi
must respect storej
.”
- Compare Address Logic (
addr_same
)This combinational logic block performs a direct comparison between every load address and every store address in the queues.
-
Input:
ldq_addr
: The array of all addresses stored in the Load Queue.stq_addr
: The array of all addresses stored in the Store Queue.
-
Processing:
- For every possible pair of a load
i
and a storej
, it performs a direct equality comparison:ldq_addr[i] == stq_addr[j]
.
- For every possible pair of a load
-
Output:
addr_same
: A 2D matrix where the bit at[i, j]
is ‘1’ if the addresses of loadi
and storej
are identical. This matrix is a fundamental input for both the conflict and bypass logic to detect potential address hazards.
- Address Validity Logic (
addr_valid
)This logic checks whether both operations in a given load-store pair have received their addresses, making them eligible for a meaningful address comparison.
-
Input:
ldq_addr_valid
: A vector indicating which LDQ entries have a valid address.stq_addr_valid
: A vector indicating which STQ entries have a valid address.
-
Processing:
- For every possible pair of a load
i
and a storej
, it performs a logical AND operation:ldq_addr_valid[i] AND stq_addr_valid[j]
.
- For every possible pair of a load
-
Output:
addr_valid
: A 2D matrix where the bit at[i, j]
is ‘1’ only if both loadi
and storej
have valid addresses. This is used to qualify bypass conditions, ensuring a bypass is only considered when both addresses are known.
- Load Request Validity Logic (
load_req_valid
)This logic block generates a list of loads that are ready to be evaluated by the dependency checker. It filters out loads that are not yet active or have already been completed.
-
Input:
ldq_alloc
: The vector indicating which load queue entries are currently allocated and active.ldq_issue
: The vector indicating which load requests have already been satisfied (either by being sent to memory or through a bypass).ldq_addr_valid
: The vector indicating which active loads have received their address payload.
-
Processing:
- For each load entry
i
, it performs a logical AND across three conditions to determine if it’s a valid candidate for issue:- The entry must be allocated (
ldq_alloc[i]
). - The address must be valid (
ldq_addr_valid[i]
). - The request must not have been previously issued (
NOT ldq_issue[i]
).
- The entry must be allocated (
- The complete expression is
ldq_alloc[i] AND ldq_addr_valid[i] AND (NOT ldq_issue[i])
.
- For each load entry
-
Output:
load_req_valid
: A vector where a ‘1’ at indexi
signifies that loadi
is an active request that is ready to be checked for dependencies. This vector serves as the primary input pool for the Load to Memory Logic.
- Load-Store Conflict Logic
This is the primary logic for ensuring load safety. It checks every active load against every active store to see if the load must wait for the store to complete.
-
Input:
stq_alloc
: A vector indicating which store queue entries are active.store_is_older
: The 2D matrix establishing the program order between loads and stores.addr_same
: The 2D matrix indicating which load-store pairs have identical addresses.stq_addr_valid
: A vector indicating which stores have a valid address.
-
Processing:
- It calculates the
ld_st_conflict
matrix. A conflict atld_st_conflict[i][j]
is asserted (‘1’) if all of the following are true:- The store
j
is allocated (stq_alloc[j]
). - The store
j
is older than the loadi
(store_is_older[i][j]
). - A potential address hazard exists, which means either:
- Their addresses are identical (
addr_same[i][j]
). - OR the store’s address is not yet known (
NOT stq_addr_valid[j]
).
- Their addresses are identical (
- The store
- It calculates the
-
Output:
ld_st_conflict
: A 2D matrix where a ‘1’ at[i, j]
signifies that loadi
has a dependency on storej
and must not be issued to memory ifj
is also pending.
- Load Queue Bypass Logic (Determining Bypass Potential)
This block determines for which load-store pairs a bypass (store-to-load forwarding) is potentially possible.
-
Input:
ldq_alloc
: A vector indicating which load queue entries are active.ldq_issue
: A vector indicating which loads have already been satisfied.stq_data_valid
: A vector indicating which store entries have valid data ready for forwarding.addr_same
: The address equality matrix.addr_valid
: The matrix indicating pairs with valid addresses.
-
Processing:
- It calculates the
can_bypass
matrix. A bitcan_bypass[i][j]
is asserted if all conditions for a potential bypass are met:- The load
i
is active (ldq_alloc[i]
). - The load
i
has not been issued yet (NOT ldq_issue[i]
). - The store
j
has valid data (stq_data_valid[j]
). - Both the load and store have valid and identical addresses (
addr_valid[i][j]
andaddr_same[i][j]
).
- The load
- It calculates the
-
Output:
can_bypass
: A 2D matrix indicating every load-store pair where a bypass is theoretically possible. This matrix is used as an input to the logic that makes the final bypass decision.
- Load to Memory Logic
This logic block makes the final decision on which loads are safe to issue to the memory interface.
-
Input:
ld_st_conflict
: The dependency matrix from the Load-Store Conflict Logic.load_req_valid
: A vector indicating which loads are active and ready to be checked.ldq_head_oh
: The one-hot head pointer of the load queue, used to prioritize the oldest requests.
-
Processing:
- First, it OR-reduces each row of the
ld_st_conflict
matrix to create a singleload_conflict
bit for each load. - It then creates a list of issue candidates,
can_load
, by selecting requests fromload_req_valid
that are not blocked (NOT load_conflict
). - Finally, it uses
CyclicPriorityMasking
on thecan_load
list to arbitrate and select the oldest, highest-priority load(s) for the available memory read channel(s).
- First, it OR-reduces each row of the
-
Output:
load_en
: A vector of enable signals, one for each memory read channel. This directly drivesrreq_valid_o
.load_idx_oh
: A one-hot vector for each memory channel, identifying which load queue entry won the arbitration. This is used to formrreq_id_o
and select therreq_addr_o
.
- Store-Load Conflict Logic
This logic ensures a store operation is not issued if it might conflict with an older, unresolved load operation.
-
Input:
ldq_alloc
: A vector indicating which load entries are active.store_is_older
: The program order matrix.addr_same
: The address equality matrix.ldq_addr_valid
: A vector indicating which loads have valid addresses.stq_issue
: The pointer to the specific store entry being considered for issue.
-
Processing:
- It checks the store candidate at
stq_issue
against every active loadi
. A conflictst_ld_conflict[i]
is asserted if:- The load
i
is active (ldq_alloc[i]
). - The load
i
is older than the store candidate (NOT store_is_older[i][stq_issue]
). - A potential address hazard exists, meaning their addresses are identical (
addr_same[i][stq_issue]
) OR the load’s address is not yet known (NOT ldq_addr_valid[i]
).
- The load
- It checks the store candidate at
-
Output:
st_ld_conflict
: A vector indicating which loads are in conflict with the current store candidate. This vector is then OR-reduced to create a singlestore_conflict
signal for the Request Issue Logic.
- Store Queue Bypass Logic (Finalizing the Bypass Decision)
This logic makes the final decision on whether to execute a bypass for a given load.
-
Input:
ld_st_conflict
: The matrix of all load-store dependencies.can_bypass
: The matrix of potential bypass opportunities calculated by the Load Queue Bypass Logic.stq_last_oh
: A one-hot vector indicating the last allocated store, used for priority.
-
Processing:
- For each load
i
that has conflicts, it usesCyclicPriorityMasking
on its conflict rowld_st_conflict[i]
to find the single, youngest store that it depends on. This identifies the store with the most up-to-date data version for that address. - It then checks if a bypass is possible with that specific store by checking the corresponding bit in the
can_bypass
matrix. - If both conditions are met, the bypass is confirmed.
- For each load
-
Output:
bypass_en
: A vector wherebypass_en[i]
is asserted if loadi
will be satisfied via a bypass in the current cycle. This signal triggers theldq_issue_set
and the data muxing from the store queue to the load queue.
-
Memory Request Issue Logic
This logic is the final stage of the dependency-checking pipeline. It is responsible for arbitrating among safe, ready-to-go memory operations and driving the signals to the external memory interface. It is composed of two distinct parts for handling load and store requests.13.1. Load Request Issue Logic
This part of the logic selects which non-conflicting load requests should be sent to the memory system’s read channels.-
Input:
ld_st_conflict
: The 2D matrix indicating all dependencies between loads and stores.load_req_valid
: A vector indicating which loads are active, have a valid address, and have not yet been satisfied.ldq_head_oh
: The one-hot head pointer of the load queue, used to grant priority to the oldest requests.ldq_addr
: The array of addresses stored in the Load Queue.
-
Processing:
- First, it creates a list of candidate loads (
can_load
) by filtering theload_req_valid
list, removing any loads that have a dependency indicated by theld_st_conflict
matrix. - It then uses
CyclicPriorityMasking
to arbitrate among thesecan_load
candidates. This process selects the oldest, highest-priority requests to issue to the available memory read channels.
- First, it creates a list of candidate loads (
-
Output:
rreq_valid_o
: The “valid” signal for the memory read request channel. It is asserted when a winning load candidate is selected by the arbitration logic.rreq_addr_o
: The address of the winning load, selected from theldq_addr
array via a multiplexer controlled by the arbitration result (load_idx_oh
).rreq_id_o
: The ID for the read request, which corresponds to the load’s index in the queue. This is also derived from the arbitration result and is used to match the memory response later.
13.2. Store Request Issue Logic
This part of the logic determines if the single, oldest pending store request (indicated by thestq_issue
pointer) is safe to send to the memory write channel.-
Input:
st_ld_conflict
: A vector indicating if the current store candidate conflicts with any older loads.stq_alloc
,stq_addr_valid
,stq_data_valid
: The status bits for the store entry at thestq_issue
pointer.stq_addr
,stq_data
: The payload data for the store entry at thestq_issue
pointer.
-
Processing:
- It performs a final check to generate the
store_en
signal. The signal is asserted only if the store candidate has no conflicts with older loads (NOT store_conflict
) AND its entry is fully prepared (i.e., it is allocated and both its address and data are valid). - If
store_en
is asserted, the logic gates the address and data from the store entry atstq_issue
to the write request output ports.
- It performs a final check to generate the
-
Output:
wreq_valid_o
: The “valid” signal for the memory write request channel, driven directly by thestore_en
signal.wreq_addr_o
,wreq_data_o
: The address and data of the store being issued.stq_issue_en
: An internal signal that enables thestq_issue
pointer to advance. It is asserted when a store is successfully issued and accepted by the memory interface (store_en AND wreq_ready_i
).
-
3. Pipelining
- Purpose
- The dependency-checking unit is the longest combinational path in the LSQ, so we split it into shorter timing-friendly segments.
- Implementation
- Stage 0
pipeComp
- Stage 1
pipe0
- Stage 2
pipe1
- Stage 0
Note: Each of these stages can be independently enabled or disabled via the
pipeComp
,pipe0
, andpipe1
config flags—so you only pay the pipeline overhead where you need the extra timing slack.