Queue-to-Port Dispatcher
How loaded data gets back to where it belongs.
1. Overview and Purpose

The Queue-to-Port Dispatcher is the counterpart to the Port-to-Queue Dispatcher. Its responsibility is to route payloads—primarily data loaded from memory—from the queue entries back to the correct access ports of the dataflow circuit.
While the LSQ can process memory requests out of order, the results for a specific access port must be returned in program order to maintain correctness. This module ensures that this order is respected for each port.
The primary instance of this module is the Load Data Port Dispatcher, which sends loaded data back to the circuit. An optional second instance, the Store Backward Port Dispatcher, can be used to send store completion acknowledgements back to the circuit.
2. Queue-to-Port Dispatcher Internal Blocks

Let’s assume the following generic parameters for dimensionality:
- `N_PORTS`: The total number of ports.
- `N_ENTRIES`: The total number of entries in the queue.
- `PAYLOAD_WIDTH`: The bit-width of the payload (e.g., 8 bits).
- `PORT_IDX_WIDTH`: The bit-width required to index a port (e.g., `ceil(log2(N_PORTS))`).
Signal Naming and Dimensionality:
This module is generated from a higher-level description (e.g., in Python), which results in a specific convention for signal naming in the final VHDL code. It’s important to understand this convention when interpreting diagrams and signal tables.
- Generation Pattern: A signal that is conceptually an array in the source code (e.g., `port_payload_o`) is “unrolled” into multiple, distinct signals in the VHDL entity. The generated VHDL signals carry an index in their name, such as `port_payload_{p}_o`, where `{p}` is the port index.
- Interpreting Diagrams: If a diagram or conceptual description uses a base name without an index (e.g., `port_payload_o`), it represents a collection of signals. The actual dimension is expanded based on the context:
  - Port-related signals (like `port_payload_o`) are expanded by the number of ports (`N_PORTS`).
  - Entry-related signals (like `entry_alloc_i`) are expanded by the number of queue entries (`N_ENTRIES`).
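The unrolling convention can be illustrated with a short sketch. The helper `unroll_signal_names` is hypothetical (it is not taken from the actual generator) and only shows how a conceptual array name might expand into indexed VHDL signal names.

```python
def unroll_signal_names(base, direction, count):
    """Expand a conceptual array signal into indexed VHDL-style names.

    Hypothetical helper for illustration only; the real generator may differ.
    """
    return [f"{base}_{i}_{direction}" for i in range(count)]


N_PORTS = 3    # example value, matching the 3-port example used later
N_ENTRIES = 4  # example value, matching the 4-entry example used later

# Port-related signals expand by N_PORTS, entry-related ones by N_ENTRIES.
print(unroll_signal_names("port_payload", "o", N_PORTS))
print(unroll_signal_names("entry_alloc", "i", N_ENTRIES))
```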
Port Interface Signals

These signals are used for communication between the external modules and the dispatcher’s ports.
| Python Variable Name | VHDL Signal Name | Direction | Dimensionality | Description |
|---|---|---|---|---|
| Inputs | | | | |
| `port_ready_i` | `port_ready_{p}_i` | Input | `std_logic` | Ready flag from port `p`. `port_ready_{p}_i` is high when the external circuit is ready to receive data. |
| Outputs | | | | |
| `port_payload_o` | `port_payload_{p}_o` | Output | `std_logic_vector(PAYLOAD_WIDTH-1:0)` | Data payload sent to port `p`. |
| `port_valid_o` | `port_valid_{p}_o` | Output | `std_logic` | Valid flag for port `p`. Asserted to indicate that `port_payload_{p}_o` contains valid data. |
Queue Interface Signals

These signals handle the interaction between the dispatcher logic and the internal queue entries.
| Python Variable Name | VHDL Signal Name | Direction | Dimensionality | Description |
|---|---|---|---|---|
| Inputs | | | | |
| `entry_alloc_i` | `entry_alloc_{e}_i` | Input | `std_logic` | Is queue entry `e` logically allocated? |
| `entry_payload_valid_i` | `entry_payload_valid_{e}_i` | Input | `std_logic` | Is the result data in entry `e` valid and ready to be sent? |
| `entry_port_idx_i` | `entry_port_idx_{e}_i` | Input | `std_logic_vector(PORT_IDX_WIDTH-1:0)` | Indicates to which port entry `e` is assigned. |
| `entry_payload_i` | `entry_payload_{e}_i` | Input | `std_logic_vector(PAYLOAD_WIDTH-1:0)` | The data stored in queue entry `e`. |
| `queue_head_oh_i` | `queue_head_oh_i` | Input | `std_logic_vector(N_ENTRIES-1:0)` | One-hot vector indicating the head entry in the queue. |
| Outputs | | | | |
| `entry_reset_o` | `entry_reset_{e}_o` | Output | `std_logic` | Reset signal for an entry. `entry_reset_{e}_o` is asserted to deallocate entry `e` after its data has been successfully sent. |
The Queue-to-Port Dispatcher has the following core responsibilities (with 3-port, 4-entry load data dispatcher example):
Port Index Decoder

When the group allocator allocates a queue entry, it also assigns the queue entry to a specific port and stores this port assignment as an integer. The Port Index Decoder converts the port assignment of each queue entry from this integer representation to a one-hot representation.

- Input:
  - `entry_port_idx_i`: The port assignment of each queue entry, as an integer.
- Processing:
  - It performs an integer-to-one-hot conversion on the port index associated with each entry. For example, if there are 3 ports, an integer index of `1` (`01` in binary) is converted to the one-hot vector `010`.
- Output:
  - `entry_port_idx_oh`: A one-hot vector for each entry that directly corresponds to the port it is assigned to.
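The conversion can be sketched behaviourally as follows. This is a minimal model, not the generated VHDL; the function names are illustrative assumptions. Vectors are stored as Python lists indexed by port number, and `as_bit_string` prints them with the highest port index on the left, as the text does.

```python
def port_index_to_one_hot(port_idx, n_ports):
    """Return a list where element p is 1 only if p == port_idx."""
    if not 0 <= port_idx < n_ports:
        raise ValueError(f"port index {port_idx} out of range for {n_ports} ports")
    return [1 if p == port_idx else 0 for p in range(n_ports)]


def as_bit_string(one_hot):
    """Format a one-hot list with the highest port index on the left."""
    return "".join(str(bit) for bit in reversed(one_hot))


def decode_port_indices(entry_port_idx, n_ports):
    """Decode the port index of every queue entry (one row per entry)."""
    return [port_index_to_one_hot(idx, n_ports) for idx in entry_port_idx]


# The example from the text: index 1 with 3 ports.
print(as_bit_string(port_index_to_one_hot(1, 3)))
```

With 3 ports, index `1` yields the bit string `010`, matching the example above.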
Find Allocated Entry

This block identifies which queue entries are currently allocated by the group allocator (`entry_alloc_{e}_i = 1`) and records the port each of them is assigned to.

- Input:
  - `entry_alloc_i`: Indicates whether each entry is allocated by the group allocator.
  - `entry_port_idx_oh`: A one-hot vector for each entry that directly corresponds to the port it is assigned to.
- Processing:
  - For each queue entry `e`, this block computes `entry_alloc_i AND entry_port_idx_oh`, i.e., the allocation bit of the entry is ANDed with every bit of its one-hot port vector.
  - If an entry is not allocated (`entry_alloc_{e}_i = 0`), its port assignment is masked, resulting in a zero vector.
  - If the entry is allocated (`entry_alloc_{e}_i = 1`), its one-hot port assignment is passed through unchanged.
- Output:
  - `entry_allocated_per_port`: The resulting matrix, where a `1` at position `(e, p)` indicates that entry `e` is allocated and assigned to port `p`. This matrix represents all potential candidates for sending data and is fed into the arbitration logic (`CyclicPriorityMasking`) to determine which entry gets to send its data first for each port.
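The masking step can be modelled in a few lines. This is a behavioural sketch under the assumption that matrices are lists of rows indexed as `[e][p]`; the function name is illustrative.

```python
def find_allocated_entries(entry_alloc, entry_port_idx_oh):
    """AND each entry's allocation bit with every bit of its one-hot port vector.

    entry_alloc:       list of 0/1, one per queue entry
    entry_port_idx_oh: list of one-hot rows, indexed [e][p]
    """
    return [
        [alloc & bit for bit in row]
        for alloc, row in zip(entry_alloc, entry_port_idx_oh)
    ]


# An unallocated entry is zeroed out; an allocated one passes through.
print(find_allocated_entries([0, 1], [[0, 1, 0], [0, 0, 1]]))
```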
Find Oldest Allocated Entry

This is the core arbitration logic of the dispatcher. It takes all candidate entries and selects a single oldest entry for each port based on priority.

- Input:
  - `entry_allocated_per_port`: A matrix where a `1` at position `(e, p)` indicates that queue entry `e` is allocated and assigned to port `p`. This represents the entire pool of candidates competing for access to the output ports.
  - `queue_head_oh_i`: The queue’s one-hot head vector, which marks the oldest entry and therefore the starting point of the priority order for the current cycle.
- Processing:
  - It uses a `CyclicPriorityMasking` algorithm, which operates on each port (column of `entry_allocated_per_port`).
  - This ensures that among all candidates for each port, the one corresponding to the oldest entry in the queue is granted for the current clock cycle.
- Output:
  - `oldest_entry_allocated_per_port`: The resulting matrix after arbitration. Each port (column) of this matrix now contains at most one `1` (it is a one-hot vector or all zeros). This `1` indicates the single, highest-priority entry that has won the arbitration for that port.
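The text does not describe how `CyclicPriorityMasking` is built in hardware, so the following is only a behavioural model of the selection it is described to perform: starting at the head entry and wrapping around, grant the first candidate found in each column. Function names and the `[e][p]` list layout are assumptions made for this sketch.

```python
def cyclic_priority_mask(requests, head_oh):
    """Grant the first set request at or after the head, wrapping around."""
    if sum(head_oh) != 1:
        raise ValueError("head vector must be one-hot")
    n = len(requests)
    head = head_oh.index(1)
    grant = [0] * n
    for offset in range(n):
        e = (head + offset) % n  # visit entries in age order, oldest first
        if requests[e]:
            grant[e] = 1
            break
    return grant


def find_oldest_allocated(entry_allocated_per_port, queue_head_oh):
    """Apply the cyclic priority selection to every port (column)."""
    n_entries = len(entry_allocated_per_port)
    n_ports = len(entry_allocated_per_port[0])
    oldest = [[0] * n_ports for _ in range(n_entries)]
    for p in range(n_ports):
        column = [entry_allocated_per_port[e][p] for e in range(n_entries)]
        grant = cyclic_priority_mask(column, queue_head_oh)
        for e in range(n_entries):
            oldest[e][p] = grant[e]
    return oldest
```

With the head at entry 1 and candidates at entries 1 and 3, the priority order is 1, 2, 3, 0, so entry 1 is granted.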
Payload Mux

For each access port, this block routes the payload from the oldest queue entry to the correct output port.

- Input:
  - `entry_payload_i`: The data payloads of all `N_ENTRIES` queue entries.
  - `oldest_entry_allocated_per_port`: The arbitrated selection matrix from the Find Oldest Allocated Entry block. Each port (column) of this matrix contains at most a single `1`, which identifies the oldest entry for that port.
- Processing:
  - For each output port `p`, a one-hot multiplexer (`Mux1H`) uses the `p`-th column of the `oldest_entry_allocated_per_port` matrix as its select signal.
  - This operation selects the data payload of the single oldest entry from all of `entry_payload_i` and routes it to the corresponding output port.
- Output:
  - `port_payload_o`: The data payloads for all `N_PORTS` ports. `port_payload_{p}_o` holds the data from the oldest queue entry for that port, ready to be sent to the external access port.
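A one-hot multiplexer can be modelled as an OR over the inputs whose select bit is set. The sketch below assumes payloads are plain integers and matrices are indexed `[e][p]`; `mux_1h` and `payload_mux` are illustrative names, not the generator's API.

```python
def mux_1h(select_oh, inputs):
    """One-hot mux: OR together the inputs whose select bit is 1.

    With at most one select bit set, this returns that input, or 0 if none is set.
    """
    result = 0
    for sel, value in zip(select_oh, inputs):
        if sel:
            result |= value
    return result


def payload_mux(oldest_entry_allocated_per_port, entry_payload):
    """Route the payload of the oldest entry of each port to that port."""
    n_entries = len(oldest_entry_allocated_per_port)
    n_ports = len(oldest_entry_allocated_per_port[0])
    return [
        mux_1h(
            [oldest_entry_allocated_per_port[e][p] for e in range(n_entries)],
            entry_payload,
        )
        for p in range(n_ports)
    ]
```

A port whose column is all zeros receives `0`, since no input is selected.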
Handshake Logic

This block manages the `valid`/`ready` handshake with the external access ports. It checks that the data of the oldest entry selected by the cyclic priority masking is valid and that the receiving port is ready, then generates a signal indicating that the transfer has taken place.

- Input:
  - `port_ready_i`: The `N_PORTS` ready signals from the external access ports. `port_ready_{p}_i` is high when port `p` can accept data.
  - `entry_payload_valid_i`: The `N_ENTRIES` valid flags. Each one indicates whether the data slot of queue entry `e` is valid and ready to be sent.
  - `oldest_entry_allocated_per_port`: The arbitrated selection matrix from the Find Oldest Allocated Entry block, indicating at most a single oldest entry for each port.
- Processing:
  - Check the Oldest Entry’s Data Validity: First, the block verifies whether the data in the oldest entry is actually ready. It masks the `oldest_entry_allocated_per_port` matrix with `entry_payload_valid_i`. If the oldest entry for a port does not have valid data, it is nullified for this cycle. The result is `entry_waiting_for_port_valid`.
  - Generate `port_valid_o`: The result of the masking from the previous step is then OR-reduced for each port. If any bit in a column is still set, an oldest entry with valid data exists for that port, and the corresponding `port_valid_o` signal is asserted high.
  - Perform Handshake: Next, it determines whether a successful handshake occurs. For each port `p`, a handshake is successful if the dispatcher has valid data to send (`port_valid_{p}_o` is high) AND the external port is ready to receive it (`port_ready_{p}_i` is high).
- Output:
  - `port_valid_o`: The final valid signal sent to each external access port, indicating that valid data is available on the `port_payload_o` bus.
  - `entry_port_transfer`: A matrix representing the completed handshakes for the current cycle. A `1` in this matrix indicates that the data from a specific entry has been transferred to its assigned port. This signal is used by the next block, Reset, to generate the `entry_reset_o` signal.
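The three processing steps can be sketched as below. This is a behavioural model under the same assumptions as before (matrices as lists indexed `[e][p]`); the function name `handshake` is illustrative, and the per-entry form of the handshake (masked bit AND port ready) is inferred from the description rather than taken from the VHDL.

```python
def handshake(oldest_entry_allocated_per_port, entry_payload_valid, port_ready):
    """Return (port_valid, entry_port_transfer) for one cycle."""
    n_ports = len(port_ready)

    # Step 1: keep an oldest entry only if its payload is valid.
    entry_waiting_for_port_valid = [
        [bit & valid for bit in row]
        for row, valid in zip(oldest_entry_allocated_per_port, entry_payload_valid)
    ]

    # Step 2: OR-reduce each column to obtain the per-port valid flag.
    port_valid = [
        int(any(row[p] for row in entry_waiting_for_port_valid))
        for p in range(n_ports)
    ]

    # Step 3: a transfer happens where a valid oldest entry meets a ready port.
    entry_port_transfer = [
        [bit & port_ready[p] for p, bit in enumerate(row)]
        for row in entry_waiting_for_port_valid
    ]
    return port_valid, entry_port_transfer
```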
Reset

This block is responsible for clearing a queue entry after its payload has been successfully dispatched.

- Input:
  - `entry_port_transfer`: A matrix representing the completed handshakes for the current cycle. A `1` in this matrix indicates that the data from a specific entry has been transferred to its assigned port.
- Processing:
  - For each entry (row) `e` of `entry_port_transfer`, it checks whether the row contains a `1`, which means that entry `e` sent its data to some port.
  - This check is an OR operation across each row.
- Output:
  - `entry_reset_o`: When the queue receives this signal, it deallocates the corresponding entry, making it available for a new operation. The deallocation logic itself is not part of the dispatcher module; it lives outside of it.
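The row-wise OR-reduction is small enough to show directly. As with the other sketches, this is a behavioural model with an illustrative function name, not the generated code.

```python
def entry_reset(entry_port_transfer):
    """OR-reduce each row: an entry is reset if it transferred to any port."""
    return [int(any(row)) for row in entry_port_transfer]


# Only the second entry completed a handshake, so only it is reset.
print(entry_reset([[0, 0, 0], [0, 0, 1], [0, 0, 0], [0, 0, 0]]))
```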
3. Dataflow Walkthrough

Initial state:

- Port Assignments:
  - Entry 0 -> Port 1
  - Entry 1 -> Port 2
  - Entry 2 -> Port 0
  - Entry 3 -> Port 2
- Queue Head: At Entry 1.
- `entry_alloc_i`: `[0, 1, 1, 1]` (Entries 1, 2, 3 are allocated).
- `entry_payload_valid_i`: `[0, 1, 1, 0]` (Entries 1, 2 have valid data).
- `port_ready_i`: `[0, 1, 1]` (Ports 1 and 2 are ready, Port 0 is not).
Port Index Decoder

This block translates the integer port index assigned to each queue entry into a one-hot vector.
Based on the example diagram:

- The Port Index Decoder converts these integer port indices into 3-bit one-hot vectors:
  - Entry 0 (Port 1): `010`
  - Entry 1 (Port 2): `100`
  - Entry 2 (Port 0): `001`
  - Entry 3 (Port 2): `100`
- This result is saved in `entry_port_idx_oh`:

| Entry | P2 | P1 | P0 |
|---|---|---|---|
| E0 | 0 | 1 | 0 |
| E1 | 1 | 0 | 0 |
| E2 | 0 | 0 | 1 |
| E3 | 1 | 0 | 0 |
Find Allocated Entry

This block identifies all queue entries that are candidates for dispatching. Based on the example diagram:

- The `entry_alloc_i` vector is `[0, 1, 1, 1]`. Therefore, Entries 1, 2, and 3 are the potential candidates to send their data out.
- The logic then combines this allocation information with the one-hot decoded port index for each entry (`entry_port_idx_oh` from the Port Index Decoder). An entry’s one-hot port information is passed through only if its corresponding `entry_alloc_i` bit is `1`.
- If an entry is not allocated (like Entry 0), its output for this stage is zeroed out (`000`).
- The result is the `entry_allocated_per_port` matrix, which represents the initial list of all allocated queue entries and their target ports. This matrix is then sent to the Find Oldest Allocated Entry block for arbitration.
Find Oldest Allocated Entry

This is the core arbitration logic. It selects a single oldest entry for each port from the list of allocated candidates, based on priority.
Based on the example diagram:

- The queue head is at Entry 1, establishing a priority order of `1 -> 2 -> 3 -> 0`.
- Port 0: The only allocated candidate is Entry 2. It is the oldest for Port 0.
- Port 1: There are no allocated candidates assigned to this port.
- Port 2: The allocated candidates are Entry 1 and Entry 3. According to the priority order, Entry 1 is the oldest for Port 2.
- The output indicates that Entry 2 is the oldest for Port 0, and Entry 1 is the oldest for Port 2.
- The result is stored in `oldest_entry_allocated_per_port`.
Payload Mux

This block routes the data from the oldest entries to the correct output ports.
Based on the example diagram:

- For `port_payload_o[0]`, it selects the data from the oldest entry of Port 0, Entry 2.
- For `port_payload_o[2]`, it selects the data from the oldest entry of Port 2, Entry 1.
- For Port 1, no entry is selected, so the output is all zeros.
- The result is `port_payload_o`:

| Port | Source | Value |
|---|---|---|
| P0 | `entry_payload_i[2]` | `00010001` |
| P1 | Zero | `00000000` |
| P2 | `entry_payload_i[1]` | `11111111` |
Handshake Logic

This block manages the final stage of the dispatch handshake. It first generates the `port_valid_o` signals by checking whether the oldest entries selected by arbitration have valid data to send. It then confirms which of these can complete a successful handshake.
Based on the example diagram:

- First, the logic checks the `entry_payload_valid_i` vector, which is `[0, 1, 1, 0]`. This indicates that among the oldest queue entries, data is valid and ready to be sent from Entry 1 and Entry 2.
- For the Port 0 oldest (Entry 2), its `entry_payload_valid_i` is `1`. The logic sets `port_valid_o[0]` to `1`.
- For the Port 2 oldest (Entry 1), its `entry_payload_valid_i` is `1`. The logic sets `port_valid_o[2]` to `1`.
- Port 1 has no oldest entry, so `port_valid_o[1]` stays `0`.
- Next, the logic checks the incoming `port_ready_i` signals from the access ports, which are `[0, 1, 1]`. This means that Port 1 and Port 2 are ready, but Port 0 is not. A final handshake is successful only if the dispatcher has valid data to send AND the port is ready to receive. The `entry_port_transfer` matrix shows this final result:

| Entry | P2 | P1 | P0 | Note |
|---|---|---|---|---|
| E0 | 0 | 0 | 0 | |
| E1 | 1 | 0 | 0 | Handshake succeeds (valid=1, ready=1) |
| E2 | 0 | 0 | 0 | Handshake fails (valid=1, ready=0) |
| E3 | 0 | 0 | 0 | |

- This means: “Even though the queue is sending valid data to Port 0 and Port 2, only the handshake with Port 2 is successful because only Port 2 is ready to receive data.”
Reset

This block is responsible for generating the `entry_reset_o` signal, which clears an entry in the queue after its data has been successfully dispatched. A successful dispatch requires a complete `valid`/`ready` handshake.
Based on the initial state:

- The Reset block asserts `entry_reset_o` only for the entry corresponding to the successful handshake, which is Entry 1. The message in the diagram confirms this: “From Entry 1 of the load queue, the data is sent to Port 2. Please reset Entry 1”.
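The walkthrough above can be reproduced with a compact behavioural model of one dispatch cycle. This is a sketch only: `dispatch_cycle` is a hypothetical name, the head is passed as an integer index rather than a one-hot vector for brevity, and the payloads of Entry 0 and Entry 3 are made-up placeholder values because the walkthrough does not give them.

```python
def dispatch_cycle(entry_port_idx, entry_alloc, entry_payload_valid,
                   entry_payload, head, port_ready):
    """Model one cycle; matrices are lists of rows indexed [e][p]."""
    n_entries, n_ports = len(entry_alloc), len(port_ready)
    entries, ports = range(n_entries), range(n_ports)

    # Port Index Decoder + Find Allocated Entry
    allocated = [[int(entry_alloc[e] and entry_port_idx[e] == p) for p in ports]
                 for e in entries]

    # Find Oldest Allocated Entry: first candidate from the head, wrapping
    oldest = [[0] * n_ports for _ in entries]
    for p in ports:
        for offset in entries:
            e = (head + offset) % n_entries
            if allocated[e][p]:
                oldest[e][p] = 1
                break

    # Payload Mux: OR of the selected payloads (at most one per port)
    port_payload = [0] * n_ports
    for p in ports:
        for e in entries:
            if oldest[e][p]:
                port_payload[p] |= entry_payload[e]

    # Handshake Logic
    waiting = [[oldest[e][p] & entry_payload_valid[e] for p in ports]
               for e in entries]
    port_valid = [int(any(waiting[e][p] for e in entries)) for p in ports]
    transfer = [[waiting[e][p] & port_ready[p] for p in ports] for e in entries]

    # Reset: OR-reduction across each row
    entry_reset = [int(any(row)) for row in transfer]
    return port_payload, port_valid, transfer, entry_reset


payload, valid, transfer, reset = dispatch_cycle(
    entry_port_idx=[1, 2, 0, 2],
    entry_alloc=[0, 1, 1, 1],
    entry_payload_valid=[0, 1, 1, 0],
    entry_payload=[0x00, 0xFF, 0x11, 0xAA],  # entries 0 and 3 are placeholders
    head=1,
    port_ready=[0, 1, 1],
)
print([format(v, "08b") for v in payload])  # payload per port, P0 first
print(valid, reset)
```

For the initial state of the walkthrough, the model asserts valid on Ports 0 and 2, completes the handshake only on Port 2, and resets only Entry 1.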