Question 21: Flops Needed for a Sequence Detector FSM
How many flip-flops are required to implement a Finite State Machine (FSM) that detects the input sequence “10101”?
The number of flip-flops depends on two factors: the number of states in the FSM and the state encoding scheme used.
- Number of States: To detect “10101”, we need states to track the progress. For a non-overlapping sequence detector:
S0
: Idle state (or no part of sequence seen)S1
: `1` seenS2
: `10` seenS3
: `101` seenS4
: `1010` seen
Upon receiving the final `1` in state `S4`, the machine outputs a ‘detect’ signal. This requires a total of 5 states for a Mealy FSM (where output depends on state and input). A Moore FSM would require an additional state for the output, for a total of 6.
- State Encoding:
- Binary Encoding: Requires `ceil(log2(N))` flops, where N is the number of states. For 5 states, this is `ceil(log2(5)) = 3` flip-flops.
- One-Hot Encoding: Requires `N` flops, one for each state. For 5 states, this is 5 flip-flops.
So, the minimum number of flip-flops required is 3 (using a Mealy FSM with binary encoding).
Question 22: Calculate Maximum Frequency of Operation
Given a sequential path, calculate the maximum possible operating frequency. Use the following values: Clock-to-Q delay (`t_c2q`) = 1ns, Combinational logic delay (`t_comb`) = 3ns, Setup time (`t_setup`) = 1ns, and Clock skew (`t_skew`) = 1ns, where the launch clock is delayed relative to the capture clock.
The minimum clock period (`T_min`) must be long enough for the data to travel from the launch flop to the capture flop and meet the setup time of the capture flop.
The standard setup timing equation is:
T_min ≥ t_c2q + t_comb + t_setup + t_skew
Where `t_skew` is defined as `T_capture_clk – T_launch_clk`. In this problem, the launch clock is delayed by 1ns, so `T_launch_clk` is later than `T_capture_clk`. This results in a negative skew, which helps meet setup time.
- `t_c2q` = 1 ns
- `t_comb` = 3 ns
- `t_setup` = 1 ns
- `t_skew` = -1 ns
We can calculate the minimum period:
T_min ≥ 1ns + 3ns + 1ns + (-1ns)
T_min ≥ 4 ns
The maximum frequency (`F_max`) is the reciprocal of the minimum period:
F_max = 1 / T_min = 1 / 4 ns = 250 MHz
Therefore, the maximum frequency of operation is 250 MHz.
Question 23: Clock Multiply-by-2 Circuit
Draw a logic diagram for a circuit that multiplies an input clock’s frequency by two.
A clock’s frequency can be doubled by XORing the original clock with a version of itself that has been phase-shifted by 90 degrees (or one-quarter of its period). This creates a pulse on both the rising and falling edges of the original clock.
In practice, this is usually done inside a PLL (Phase-Locked Loop) or DLL (Delay-Locked Loop) which can generate precise phase shifts and frequencies.
Question 24: Fixed Priority Arbiter RTL
Provide an optimized RTL implementation for a fixed priority arbiter.
A fixed priority arbiter takes a multi-hot request vector (`req_in`) and produces a one-hot grant vector (`gnt_out`), granting the request with the highest fixed priority (e.g., LSB is highest). The following is a scalable and optimized implementation using a `for` loop with a `genvar` which synthesizes into a chain of priority logic.
module fixed_priority_arbiter #(
parameter WIDTH = 8
) (
input logic [WIDTH-1:0] req_in,
output logic [WIDTH-1:0] gnt_out
);
// Grant the LSB if it is requested, as it has the highest priority.
assign gnt_out[0] = req_in[0];
// For other bits, grant only if requested AND no lower bit (higher priority) was granted.
genvar i;
for (i = 1; i < WIDTH; i = i + 1) begin : grant_logic
assign gnt_out[i] = req_in[i] & ~|req_in[i-1:0];
end
endmodule
Note: The logic `& ~|req_in[i-1:0]` is more efficient than `& ~|gnt_out[i-1:0]` because it avoids long dependency chains on the grant signals themselves, resulting in a faster circuit.
Question 25: Round Robin Arbiter RTL
Provide an RTL implementation for a Round Robin arbiter.
A Round Robin arbiter gives each requestor an equal chance by rotating priority. After a request is granted, it becomes the lowest priority in the next cycle. A common way to implement this is by using a fixed priority arbiter on a shifted version of the request vector.
module round_robin_arbiter #(
parameter REQ_WIDTH = 4
) (
input logic clk, reset,
input logic [REQ_WIDTH-1:0] req,
output logic [REQ_WIDTH-1:0] gnt
);
logic [REQ_WIDTH-1:0] priority_ff;
logic [REQ_WIDTH-1:0] req_masked;
logic [REQ_WIDTH-1:0] gnt_unmasked;
always_ff @(posedge clk or posedge reset) begin
if (reset) begin
priority_ff <= '1; // Initial priority to requestor 0
end else if (|gnt) begin
// Rotate priority to the next requestor
priority_ff <= {gnt[REQ_WIDTH-2:0], gnt[REQ_WIDTH-1]};
end
end
// Mask requests based on priority. Only higher priority requests are considered.
// This is a simplified approach. A more robust one would involve shifting.
// A common implementation uses a fixed priority arbiter on shifted requests.
// For simplicity, here is a pointer-based approach:
integer i;
always_comb begin
gnt = '0;
// Start checking from the highest priority requestor
for (i = 0; i < REQ_WIDTH; i = i + 1) begin
if (req[i] && priority_ff[i]) begin
gnt[i] = 1'b1;
break; // Grant first one found
end
end
// If no priority requestor is active, check all requests
if (|gnt == 1'b0) begin
for (i = 0; i < REQ_WIDTH; i = i + 1) begin
if (req[i]) begin
gnt[i] = 1'b1;
break;
end
end
end
end
endmodule
Question 26: Weighted Round Robin Arbiter
Provide the pseudo-code or RTL structure for a Weighted Round Robin (WRR) arbiter.
A WRR arbiter extends the Round Robin scheme by allowing some requestors to be granted more often than others, based on assigned "weights". A requestor is only skipped in the Round Robin scheme if its weight counter is zero.
Architecture:
- Each requestor has a `weight_counter`. These are initialized to their assigned weights.
- The arbiter uses a standard Round Robin scheme, but only considers requests where the corresponding `weight_counter` is greater than zero.
- When a request is granted, its `weight_counter` is decremented.
- When all active requestors have their `weight_counter` at zero, all counters are reloaded with their initial weight values.
// --- Conceptual RTL for a 4-requestor WRR Arbiter ---
// State Elements
reg [3:0] weight_counter_0, weight_counter_1, ...;
reg [3:0] priority_pointer; // As in standard RR
// Inputs: req[3:0], clk, reset
// Outputs: gnt[3:0]
// Combinational Logic to determine which requests are active
wire [3:0] active_req;
assign active_req[0] = req[0] & (weight_counter_0 > 0);
assign active_req[1] = req[1] & (weight_counter_1 > 0);
// ... and so on for all requestors
// Use a standard Round Robin arbiter on `active_req` to generate `gnt`
// Sequential Logic for counter updates
always @(posedge clk or posedge reset) begin
if (reset) begin
// Reload all counters to initial weights
weight_counter_0 <= INITIAL_WEIGHT_0;
// ...
end else begin
// Decrement counter for the granted requestor
if (gnt[0]) weight_counter_0 <= weight_counter_0 - 1;
// ...
// Check if all active counters are zero to reload
if (reload_condition) begin
weight_counter_0 <= INITIAL_WEIGHT_0;
// ...
end
end
end
Question 27: Frequency Divide-by-3 Circuit
Draw a logic diagram for a circuit that divides an input clock frequency by 3.
A divide-by-3 counter requires 2 flip-flops to achieve the 3 unique states. The output will be high for one cycle and low for two, resulting in a 33% duty cycle. We can derive the logic from a state table.
State Table: (Current State: Qa, Qb | Next State: Da, Db | Output)
- 00 -> 01 | 0
- 01 -> 10 | 0
- 10 -> 00 | 1
From this, we can derive the next-state logic equations:
Da = Qb'
Db = Qa
Output = Qa
(if we want the output high in state `10`)
A common implementation uses a slightly different state assignment to simplify logic:
Da = (Qa | Qb)'
Db = Qa
Output = Qb
Question 28: Find Position of First Set Bit
How do you find the index of the first '1' in a binary vector, searching from a) the right (LSB) and b) the left (MSB)?
This is functionally equivalent to designing a priority encoder.
a) From the Right (LSB)
This is a standard priority encoder. It takes a multi-bit vector and outputs the binary index of the highest priority active bit, where the LSB (`bit 0`) has the highest priority.
Example: Input `8'b01101000` -> Output `3'b011` (index 3).
In Verilog, this can be implemented with a `casex` statement or a `for` loop.
b) From the Left (MSB)
This is also a priority encoder, but with the priority reversed (MSB is highest). The easiest way to implement this in RTL is to simply reverse the input vector before feeding it into a standard LSB-first priority encoder.
Example: Input `8'b01101000` -> Output `3'b110` (index 6).
Question 29: Count Trailing Zeros
Design a circuit that takes a binary number and returns the count of trailing zeros (the number of zeros from the LSB up to the first '1'). If the input is all zeros, it should return the width of the word.
This is another application of a priority encoder. The number of trailing zeros is simply the index of the first '1' from the right (LSB).
The most scalable and elegant solution combines a priority encoder with the system function `$clog2`.
- Isolate the LSB '1': A common trick is `lsb_one = data_in & -data_in`. This produces a one-hot vector with only the least significant '1' of `data_in` set. For example, `8'b01101000` becomes `8'b00001000`.
- Convert to Index: The SystemVerilog function `$clog2(lsb_one)` takes this one-hot number and returns its binary index. For `8'b00001000`, `$clog2` returns `3`.
- Handle the All-Zeros Case: An explicit check is needed for an input of `0`.
module trailing_zero_detector #(
parameter WIDTH = 8
)(
input logic [WIDTH-1:0] data_in,
output logic [$clog2(WIDTH):0] zero_count
);
logic [WIDTH-1:0] lsb_one;
assign lsb_one = data_in & -data_in;
always_comb begin
if (data_in == '0) begin
zero_count = WIDTH;
end else begin
zero_count = $clog2(lsb_one);
end
end
endmodule
Question 30: Pulse Stretching Circuit
Draw a diagram for a pulse stretching circuit.
A pulse stretcher extends the duration of a short input pulse. A simple way to do this for a fixed number of cycles is with a shift register and an OR gate.
Architecture: The input pulse is fed into a shift register. The output is the OR-reduction of all the bits in the shift register. If the shift register is N-bits wide, an input pulse of 1 cycle will be stretched to an output pulse of N cycles.
For stretching a pulse by a large or variable number of cycles, a counter-based approach is more efficient than a long shift register.