VLSI Interview Questions and Answers – Part 3

Question 21: Flops Needed for a Sequence Detector FSM

How many flip-flops are required to implement a Finite State Machine (FSM) that detects the input sequence “10101”?

The number of flip-flops depends on two factors: the number of states in the FSM and the state encoding scheme used.

Number of States: To detect “10101”, we need states to track the progress. For a non-overlapping sequence detector:
- S0: Idle state (or no part of sequence seen)
- S1: `1` seen
- S2: `10` seen
- S3: `101` seen
- S4: `1010` seen
Upon receiving the final `1` in state `S4`, the machine outputs a ‘detect’ signal. This requires a total of 5 states for a Mealy FSM (where output depends on state and input). A Moore FSM would require an additional state for the output, for a total of 6.
State Encoding:
- Binary Encoding: Requires `ceil(log2(N))` flops, where N is the number of states. For 5 states, this is `ceil(log2(5)) = 3` flip-flops.
- One-Hot Encoding: Requires `N` flops, one for each state. For 5 states, this is 5 flip-flops.

So, the minimum number of flip-flops required is 3 (using a Mealy FSM with binary encoding).

Question 22: Calculate Maximum Frequency of Operation

Given a sequential path, calculate the maximum possible operating frequency. Use the following values: Clock-to-Q delay (`t_c2q`) = 1ns, Combinational logic delay (`t_comb`) = 3ns, Setup time (`t_setup`) = 1ns, and Clock skew (`t_skew`) = 1ns, where the launch clock is delayed relative to the capture clock.

The minimum clock period (`T_min`) must be long enough for the data to travel from the launch flop to the capture flop and meet the setup time of the capture flop.

The standard setup timing equation is:

T_min ≥ t_c2q + t_comb + t_setup + t_skew

Where `t_skew` is defined as `T_capture_clk – T_launch_clk`. In this problem, the launch clock is delayed by 1ns, so `T_launch_clk` is later than `T_capture_clk`. This results in a negative skew, which helps meet setup time.

`t_c2q` = 1 ns
`t_comb` = 3 ns
`t_setup` = 1 ns
`t_skew` = -1 ns

We can calculate the minimum period:

T_min ≥ 1ns + 3ns + 1ns + (-1ns)

T_min ≥ 4 ns

The maximum frequency (`F_max`) is the reciprocal of the minimum period:

F_max = 1 / T_min = 1 / 4 ns = 250 MHz

Therefore, the maximum frequency of operation is 250 MHz.

Question 23: Clock Multiply-by-2 Circuit

Draw a logic diagram for a circuit that multiplies an input clock’s frequency by two.

A clock’s frequency can be doubled by XORing the original clock with a version of itself that has been phase-shifted by 90 degrees (or one-quarter of its period). This creates a pulse on both the rising and falling edges of the original clock.

Clk

Clk_90

Clk_x2

In practice, this is usually done inside a PLL (Phase-Locked Loop) or DLL (Delay-Locked Loop) which can generate precise phase shifts and frequencies.

Question 24: Fixed Priority Arbiter RTL

Provide an optimized RTL implementation for a fixed priority arbiter.

A fixed priority arbiter takes a multi-hot request vector (`req_in`) and produces a one-hot grant vector (`gnt_out`), granting the request with the highest fixed priority (e.g., LSB is highest). The following is a scalable and optimized implementation using a `for` loop with a `genvar` which synthesizes into a chain of priority logic.


module fixed_priority_arbiter #(
    parameter WIDTH = 8
) (
    input  logic [WIDTH-1:0] req_in,
    output logic [WIDTH-1:0] gnt_out
);
    // Grant the LSB if it is requested, as it has the highest priority.
    assign gnt_out[0] = req_in[0];

    // For other bits, grant only if requested AND no lower bit (higher priority) was granted.
    genvar i;
    for (i = 1; i < WIDTH; i = i + 1) begin : grant_logic
        assign gnt_out[i] = req_in[i] & ~|req_in[i-1:0];
    end

endmodule

Note: The logic `& ~|req_in[i-1:0]` is more efficient than `& ~|gnt_out[i-1:0]` because it avoids long dependency chains on the grant signals themselves, resulting in a faster circuit.

Question 25: Round Robin Arbiter RTL

Provide an RTL implementation for a Round Robin arbiter.

A Round Robin arbiter gives each requestor an equal chance by rotating priority. After a request is granted, it becomes the lowest priority in the next cycle. A common way to implement this is by using a fixed priority arbiter on a shifted version of the request vector.


module round_robin_arbiter #(
    parameter REQ_WIDTH = 4
) (
    input  logic clk, reset,
    input  logic [REQ_WIDTH-1:0] req,
    output logic [REQ_WIDTH-1:0] gnt
);

    logic [REQ_WIDTH-1:0] priority_ff;
    logic [REQ_WIDTH-1:0] req_masked;
    logic [REQ_WIDTH-1:0] gnt_unmasked;

    always_ff @(posedge clk or posedge reset) begin
        if (reset) begin
            priority_ff <= '1; // Initial priority to requestor 0
        end else if (|gnt) begin
            // Rotate priority to the next requestor
            priority_ff <= {gnt[REQ_WIDTH-2:0], gnt[REQ_WIDTH-1]};
        end
    end

    // Mask requests based on priority. Only higher priority requests are considered.
    // This is a simplified approach. A more robust one would involve shifting.
    // A common implementation uses a fixed priority arbiter on shifted requests.
    // For simplicity, here is a pointer-based approach:
    
    integer i;
    always_comb begin
        gnt = '0;
        // Start checking from the highest priority requestor
        for (i = 0; i < REQ_WIDTH; i = i + 1) begin
            if (req[i] && priority_ff[i]) begin
                gnt[i] = 1'b1;
                break; // Grant first one found
            end
        end
        // If no priority requestor is active, check all requests
        if (|gnt == 1'b0) begin
             for (i = 0; i < REQ_WIDTH; i = i + 1) begin
                if (req[i]) begin
                    gnt[i] = 1'b1;
                    break;
                end
            end
        end
    end

endmodule

Question 26: Weighted Round Robin Arbiter

Provide the pseudo-code or RTL structure for a Weighted Round Robin (WRR) arbiter.

A WRR arbiter extends the Round Robin scheme by allowing some requestors to be granted more often than others, based on assigned "weights". A requestor is only skipped in the Round Robin scheme if its weight counter is zero.

Architecture:

Each requestor has a `weight_counter`. These are initialized to their assigned weights.
The arbiter uses a standard Round Robin scheme, but only considers requests where the corresponding `weight_counter` is greater than zero.
When a request is granted, its `weight_counter` is decremented.
When all active requestors have their `weight_counter` at zero, all counters are reloaded with their initial weight values.


// --- Conceptual RTL for a 4-requestor WRR Arbiter ---

// State Elements
reg [3:0] weight_counter_0, weight_counter_1, ...;
reg [3:0] priority_pointer; // As in standard RR

// Inputs: req[3:0], clk, reset
// Outputs: gnt[3:0]

// Combinational Logic to determine which requests are active
wire [3:0] active_req;
assign active_req[0] = req[0] & (weight_counter_0 > 0);
assign active_req[1] = req[1] & (weight_counter_1 > 0);
// ... and so on for all requestors

// Use a standard Round Robin arbiter on `active_req` to generate `gnt`

// Sequential Logic for counter updates
always @(posedge clk or posedge reset) begin
    if (reset) begin
        // Reload all counters to initial weights
        weight_counter_0 <= INITIAL_WEIGHT_0;
        // ...
    end else begin
        // Decrement counter for the granted requestor
        if (gnt[0]) weight_counter_0 <= weight_counter_0 - 1;
        // ...

        // Check if all active counters are zero to reload
        if (reload_condition) begin
             weight_counter_0 <= INITIAL_WEIGHT_0;
             // ...
        end
    end
end

Question 27: Frequency Divide-by-3 Circuit

Draw a logic diagram for a circuit that divides an input clock frequency by 3.

A divide-by-3 counter requires 2 flip-flops to achieve the 3 unique states. The output will be high for one cycle and low for two, resulting in a 33% duty cycle. We can derive the logic from a state table.

State Table: (Current State: Qa, Qb | Next State: Da, Db | Output)

00 -> 01 | 0
01 -> 10 | 0
10 -> 00 | 1

From this, we can derive the next-state logic equations:

Da = Qb'
Db = Qa
Output = Qa (if we want the output high in state `10`)

A common implementation uses a slightly different state assignment to simplify logic:

Da = (Qa | Qb)'
Db = Qa
Output = Qb

ADaQa BDbQb out

Question 28: Find Position of First Set Bit

How do you find the index of the first '1' in a binary vector, searching from a) the right (LSB) and b) the left (MSB)?

This is functionally equivalent to designing a priority encoder.

a) From the Right (LSB)

This is a standard priority encoder. It takes a multi-bit vector and outputs the binary index of the highest priority active bit, where the LSB (`bit 0`) has the highest priority.

Example: Input `8'b01101000` -> Output `3'b011` (index 3).

In Verilog, this can be implemented with a `casex` statement or a `for` loop.

b) From the Left (MSB)

This is also a priority encoder, but with the priority reversed (MSB is highest). The easiest way to implement this in RTL is to simply reverse the input vector before feeding it into a standard LSB-first priority encoder.

Example: Input `8'b01101000` -> Output `3'b110` (index 6).

Question 29: Count Trailing Zeros

Design a circuit that takes a binary number and returns the count of trailing zeros (the number of zeros from the LSB up to the first '1'). If the input is all zeros, it should return the width of the word.

This is another application of a priority encoder. The number of trailing zeros is simply the index of the first '1' from the right (LSB).

The most scalable and elegant solution combines a priority encoder with the system function `$clog2`.

Isolate the LSB '1': A common trick is `lsb_one = data_in & -data_in`. This produces a one-hot vector with only the least significant '1' of `data_in` set. For example, `8'b01101000` becomes `8'b00001000`.
Convert to Index: The SystemVerilog function `$clog2(lsb_one)` takes this one-hot number and returns its binary index. For `8'b00001000`, `$clog2` returns `3`.
Handle the All-Zeros Case: An explicit check is needed for an input of `0`.


module trailing_zero_detector #(
    parameter WIDTH = 8
)(
    input  logic [WIDTH-1:0] data_in,
    output logic [$clog2(WIDTH):0] zero_count
);
    logic [WIDTH-1:0] lsb_one;

    assign lsb_one = data_in & -data_in;

    always_comb begin
        if (data_in == '0) begin
            zero_count = WIDTH;
        end else begin
            zero_count = $clog2(lsb_one);
        end
    end
endmodule

Question 30: Pulse Stretching Circuit

Draw a diagram for a pulse stretching circuit.

A pulse stretcher extends the duration of a short input pulse. A simple way to do this for a fixed number of cycles is with a shift register and an OR gate.

Architecture: The input pulse is fed into a shift register. The output is the OR-reduction of all the bits in the shift register. If the shift register is N-bits wide, an input pulse of 1 cycle will be stretched to an output pulse of N cycles.

3-Cycle Pulse Stretcher FF1 FF2 FF3 in_pulse out_stretched

For stretching a pulse by a large or variable number of cycles, a counter-based approach is more efficient than a long shift register.