VLSI Interview Questions and Answers – Part 1

Question 1: Average of a Data Stream

An incoming data stream has an unknown length and is not in any specific order. Design a hardware architecture to find the average of all data values, excluding the two largest values seen so far. The primary constraint is to minimize data storage.

To solve this without storing the entire stream, we can process the data on the fly. We need a few key storage elements (registers):

max_1: To store the largest number seen so far.
max_2: To store the second-largest number seen so far.
sum: A running total of all numbers *except* for max_1 and max_2.
count: A counter for the numbers included in the sum.

The logic operates as follows for each new data value (new_data) that arrives:

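If new_data > max_1:
- The old max_2 drops out of the top two. Add it to the sum and increment count.
- Update max_2 with the old max_1 value.
- Update max_1 with new_data.
Else if new_data > max_2:
- The old max_2 value is demoted. Add it to the sum and increment count.
- Update max_2 with new_data.
Else (new_data is no larger than max_2):
- Add new_data directly to the sum.
- Increment count.

The first two samples are a special case: max_1 and max_2 hold no real data yet, so filling them for the first time must not add anything to the sum or count. A valid flag per register handles this.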
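The update rules can be sketched in SystemVerilog as follows. This is a minimal illustration rather than a reference design: the module name, the `in_valid` strobe, the `DATA_W` and `CNT_W` parameters, and the `have_1`/`have_2` valid flags are all assumptions introduced here. Note that when a new maximum arrives, the value leaving the top two is the old max_2, so that is what joins the sum.

```systemverilog
// Hypothetical sketch: running sum and count of all samples except the two largest.
module avg_excl_top2 #(
    parameter int DATA_W = 8,    // assumed sample width
    parameter int CNT_W  = 16    // assumed: stream shorter than 2**CNT_W samples
) (
    input  logic                    clk,
    input  logic                    reset,
    input  logic                    in_valid,   // assumed strobe: new_data is valid this cycle
    input  logic [DATA_W-1:0]       new_data,
    output logic [DATA_W+CNT_W-1:0] sum,
    output logic [CNT_W-1:0]        count
);
    logic [DATA_W-1:0] max_1, max_2;
    logic              have_1, have_2;  // set once max_1 / max_2 hold a real sample

    always_ff @(posedge clk or posedge reset) begin
        if (reset) begin
            max_1  <= '0;
            max_2  <= '0;
            have_1 <= 1'b0;
            have_2 <= 1'b0;
            sum    <= '0;
            count  <= '0;
        end else if (in_valid) begin
            if (!have_1 || new_data > max_1) begin
                // New largest value: the old max_2 (if any) leaves the top two
                if (have_2) begin
                    sum   <= sum + max_2;
                    count <= count + 1'b1;
                end
                max_2  <= max_1;
                have_2 <= have_1;
                max_1  <= new_data;
                have_1 <= 1'b1;
            end else if (!have_2 || new_data > max_2) begin
                // New second-largest value: the old max_2 (if any) joins the sum
                if (have_2) begin
                    sum   <= sum + max_2;
                    count <= count + 1'b1;
                end
                max_2  <= new_data;
                have_2 <= 1'b1;
            end else begin
                // Not in the top two: accumulate directly
                sum   <= sum + new_data;
                count <= count + 1'b1;
            end
        end
    end
endmodule
```

Only four data registers and two flag bits are needed regardless of stream length, which is the point of the exercise.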

The final average is sum / count. A general-purpose hardware divider is expensive in area and latency, so in an interview it is worth discussing the trade-offs. One simple approach is to report the average only when count is a power of 2, where the division reduces to a right shift by log2(count).
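As a rough sketch of the shift-based approach, the combinational block below flags when count is a power of two and, in that case, produces the truncated average by shifting. The module name, port names, and default widths are assumptions for illustration.

```systemverilog
// Hypothetical sketch: average by right shift, valid only when count is a power of two.
module pow2_average #(
    parameter int SUM_W = 24,   // assumed accumulator width
    parameter int CNT_W = 16    // assumed counter width
) (
    input  logic [SUM_W-1:0] sum,
    input  logic [CNT_W-1:0] count,
    output logic             avg_valid,
    output logic [SUM_W-1:0] avg
);
    always_comb begin
        // A non-zero value with exactly one bit set is a power of two
        avg_valid = (count != '0) && ((count & (count - 1'b1)) == '0);
        avg       = '0;
        for (int i = 0; i < CNT_W; i++) begin
            // If count == 2**i, dividing by count is a right shift by i
            if (avg_valid && count[i]) avg = sum >> i;
        end
    end
endmodule
```

The shift discards the remainder, so the result is the floor of the true average.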

Question 2: Sequential Pulse Generation

Write a synthesizable Verilog module for an N-bit signal, `A`. The behavior should be that `A[0]` goes high for 10 clock cycles, then goes low. Immediately after, `A[1]` goes high for 10 clock cycles, then low, and so on for all `N` bits in a cyclical manner.

This describes a “walking 1” pattern where each bit position stays high for a fixed duration. We can achieve this with two counters: one to count the 10 cycles and another to track which bit should be active.


module walking_pulse #(
    parameter N = 8,
    parameter PULSE_WIDTH = 10
) (
    input  logic clk,
    input  logic reset,
    output logic [N-1:0] A
);

    // Guard against a zero-width counter when PULSE_WIDTH is 1
    localparam COUNT_WIDTH = (PULSE_WIDTH > 1) ? $clog2(PULSE_WIDTH) : 1;
    localparam BIT_SELECT_WIDTH = (N > 1) ? $clog2(N) : 1;

    logic [COUNT_WIDTH-1:0]      cycle_count;
    logic [BIT_SELECT_WIDTH-1:0] active_bit;

    always_ff @(posedge clk or posedge reset) begin
        if (reset) begin
            cycle_count <= '0;
            active_bit  <= '0;
        end else begin
            if (cycle_count == PULSE_WIDTH - 1) begin
                cycle_count <= '0;
                if (active_bit == N - 1) begin
                    active_bit <= '0; // Wrap around
                end else begin
                    active_bit <= active_bit + 1;
                end
            end else begin
                cycle_count <= cycle_count + 1;
            end
        end
    end

    // Decode the active_bit counter into a one-hot output.
    // The literal is cast to N bits so the shift does not rely on implicit width extension.
    assign A = N'(1) << active_bit;

endmodule
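A small self-checking testbench can confirm the pulse pattern. The sketch below assumes the `walking_pulse` module above; the testbench name, the 10-time-unit clock period, and the choice of N = 4 are arbitrary. It samples on the falling clock edge to stay clear of the rising-edge updates.

```systemverilog
// Hypothetical testbench sketch for walking_pulse.
module walking_pulse_tb;
    localparam int N           = 4;
    localparam int PULSE_WIDTH = 10;

    logic         clk   = 1'b0;
    logic         reset = 1'b1;
    logic [N-1:0] A;
    logic [N-1:0] expected;

    walking_pulse #(.N(N), .PULSE_WIDTH(PULSE_WIDTH)) dut (.clk(clk), .reset(reset), .A(A));

    always #5 clk = ~clk;

    initial begin
        // Hold reset across two rising edges, then release it on a falling edge
        repeat (2) @(posedge clk);
        @(negedge clk);
        reset = 1'b0;

        // Two full passes over all N bits, to exercise the wrap-around
        for (int pass = 0; pass < 2; pass++) begin
            for (int b = 0; b < N; b++) begin
                expected    = '0;
                expected[b] = 1'b1;
                for (int c = 0; c < PULSE_WIDTH; c++) begin
                    assert (A == expected)
                        else $error("bit %0d, cycle %0d: A = %b", b, c, A);
                    @(negedge clk);
                end
            end
        end
        $display("walking_pulse_tb finished");
        $finish;
    end
endmodule
```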

Question 3: MUX-Based CDC Synchronizer

Explain the architecture of a MUX-based synchronizer for safely transferring multi-bit data across clock domains, and provide the corresponding Verilog code for the receiver-side logic.

A simple two-flop synchronizer is unsafe for multi-bit data because different bits can transition at slightly different times, leading to an incorrect value being captured in the destination domain. A MUX-based synchronizer (or feedback synchronizer) solves this by only allowing the destination domain to sample the data when it is stable.

Architecture:

A control signal from the source domain (in_ctrl) is first synchronized to the destination domain using a standard two-flop synchronizer.
In the destination domain, a MUX selects between the incoming new data (in_data) and the previously captured (and held) data.
The synchronized control signal (q2) acts as the select line for this MUX. When it indicates new data is available, the MUX passes in_data to a register. Otherwise, the register feeds its own output back through the MUX, holding its value stable.
The source domain must hold in_data unchanged from before in_ctrl is asserted until q2 has returned low. The scheme is safe only because the data bus is static whenever clkB samples it.

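[Figure: MUX-based synchronizer. Clock Domain A (clkA): data_in is registered as in_data and ctrl_in as in_ctrl. Clock Domain B (clkB): a MUX with inputs 1 and 0 and select sel feeds an FF that outputs sync_data.]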


module mux_synchronizer #(
    parameter WIDTH = 8
) (
    input  logic clkB, reset,
    input  logic [WIDTH-1:0] in_data,
    input  logic             in_ctrl,
    output logic [WIDTH-1:0] sync_data
);

    logic q1, q2;
    logic [WIDTH-1:0] mux_out;

    // 1. Synchronize the control signal into the clkB domain
    always_ff @(posedge clkB or posedge reset) begin
        if (reset) begin
            q1 <= 1'b0;
            q2 <= 1'b0;
        end else begin
            q1 <= in_ctrl;
            q2 <= q1;
        end
    end

    // 2. MUX selects new data or holds the old stable data
    assign mux_out = q2 ? in_data : sync_data;

    // 3. Register the MUX output to get the final synchronized data
    always_ff @(posedge clkB or posedge reset) begin
        if (reset) begin
            sync_data <= '0;
        end else begin
            sync_data <= mux_out;
        end
    end

endmodule

Question 4: Handling Glitches in a Circuit

What is a glitch, and what is the standard method to prevent one from propagating through a digital circuit?

A glitch is a short, unwanted pulse or transition on a signal line, typically at the output of combinational logic. It occurs when different signal paths through the logic have different propagation delays, causing the output to momentarily change to an incorrect value before settling.

Glitches can cause serious issues, such as incorrectly clocking a downstream register or triggering spurious events.

The most effective way to stop glitches from propagating is to register the output of the combinational logic. A flip-flop samples its data input only around the active clock edge, so any glitch that settles before the setup window opens is ignored. As long as the path meets setup and hold timing, the flip-flop captures the final, stable value and the glitch never reaches the next stage. This protection applies to data inputs only: a glitchy signal must never drive a clock or asynchronous set/reset pin directly.

[Figure: Problem: a glitch from the logic between FF1 and FF2. Solution: register the logic output with an added flip-flop (FF_Fix) placed between the logic and FF2.]
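A minimal sketch of the idea, using a 2:1 MUX written as AND-OR logic, a textbook structure with a static-1 hazard. The module and signal names are assumptions for illustration. The glitch itself depends on gate delays, so it would only appear in gate-level or delay-annotated simulation, not in zero-delay RTL simulation.

```systemverilog
// Hypothetical sketch: registering a hazard-prone combinational output.
module glitch_filter_example (
    input  logic clk,
    input  logic a,
    input  logic b,
    input  logic sel,
    output logic y_comb,   // may glitch in real gates
    output logic y_reg     // sampled copy, changes only on the clock edge
);
    // With a = b = 1, a change on sel can make y_comb dip briefly
    // if the sel and ~sel paths have unequal delays.
    assign y_comb = (a & sel) | (b & ~sel);

    // The flip-flop samples y_comb once per cycle, after it has settled
    always_ff @(posedge clk) begin
        y_reg <= y_comb;
    end
endmodule
```

Downstream logic should use `y_reg`; `y_comb` is exposed here only so the two can be compared in a waveform.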

Question 5: Blocking vs. Non-Blocking Assignments

Explain the synthesis result for the following Verilog code, first using blocking assignments (`=`) and then using non-blocking assignments (`<=`).


// Input is 'a', output is 'o'
always @(posedge clk) begin
  b = a;  // or b <= a;
  c = b;  // or c <= b;
  o = c;  // or o <= c;
end

This question highlights the fundamental difference between blocking and non-blocking assignments in sequential logic.

1. Using Blocking Assignments (`=`)

Blocking assignments execute sequentially within a block. The next statement only executes after the current one is complete.

b = a; // b gets the value of a.

c = b; // c gets the NEW value of b (which is a).

o = c; // o gets the NEW value of c (which is also a).

Synthesis Result: The synthesizer recognizes that `o` simply gets the value of `a`. If `b` and `c` are not read anywhere else, they are optimized away, and the final hardware is a single flip-flop with `a` on its D-input and `o` as its output.

2. Using Non-Blocking Assignments (`<=`)

Non-blocking assignments schedule updates to occur at the end of the time step. All right-hand side (RHS) expressions are evaluated first, using the “old” values of the signals from before the clock edge.

b <= a; // Schedule b to get a’s value.

c <= b; // Schedule c to get b’s OLD value.

o <= c; // Schedule o to get c’s OLD value.

Synthesis Result: This correctly models a chain of registers. The value of `a` propagates through one flip-flop per clock cycle. The hardware is a three-stage shift register.
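To compare the two styles side by side, the sketch below places both versions in one module with separate signal names (the `_blk` and `_nb` suffixes and the module name are assumptions). In simulation, `o_blk` follows `a` after one clock edge, while `o_nb` follows it after three. Blocking assignments in a clocked block are used here only to illustrate the question and are not recommended style.

```systemverilog
// Hypothetical sketch: blocking vs. non-blocking assignment in clocked blocks.
module assign_styles (
    input  logic clk,
    input  logic a,
    output logic o_blk,   // blocking version: one flip-flop of latency
    output logic o_nb     // non-blocking version: three flip-flops of latency
);
    logic b_blk, c_blk;
    logic b_nb,  c_nb;

    // Blocking: each statement sees the value just written by the previous one,
    // so all three variables take the value of a on the same edge.
    always @(posedge clk) begin
        b_blk = a;
        c_blk = b_blk;
        o_blk = c_blk;
    end

    // Non-blocking: every right-hand side uses the value from before the edge,
    // so a moves one stage per clock.
    always @(posedge clk) begin
        b_nb <= a;
        c_nb <= b_nb;
        o_nb <= c_nb;
    end
endmodule
```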

Question 6: CDC with Plesiochronous Clocks

Data is being sent from a 99 MHz clock domain to a 100 MHz clock domain. Is a standard two-flop synchronizer always sufficient for safe synchronization?

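Not necessarily. It depends on the source of the clocks, and a two-flop synchronizer is in any case only suitable for a single-bit signal (see Question 3 for multi-bit data).

Clocks with nominally very close frequencies, like 99 MHz and 100 MHz, are often described as plesiochronous. Their relative phase drifts slowly (here at a 1 MHz beat rate), so the destination clock edge periodically sweeps through the source signal's transition window. The critical factor is whether they are derived from the same original clock source or are completely independent.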

Derived from Same Source: If both clocks come from the same PLL or oscillator, their phase relationship, while drifting, is not random. The clock edges will align frequently and predictably. This creates “worst-case” scenarios for metastability more often, and a standard two-flop synchronizer’s Mean Time Between Failures (MTBF) might be unacceptably low. More robust solutions like an Asynchronous FIFO are often required.
From Different Sources: If the clocks are from two independent crystal oscillators, their phase relationship is truly random. This scenario is no different from any other asynchronous clock crossing, and a standard two-flop synchronizer, designed with appropriate timing constraints, is generally sufficient.

Question 7: Fixing a Combinational CDC Path

The circuit below shows combinational logic in the source clock domain (`clkA`) feeding directly into a flip-flop in the destination domain (`clkB`). How would you modify this to eliminate the Clock Domain Crossing (CDC) issue?

[Figure: in domain clkA, two flip-flops feed an OR gate; the OR output drives a flip-flop in domain clkB.]

Placing combinational logic directly before a synchronizer is a bad design practice. Any glitches or high-frequency toggling from the logic will be presented to the synchronizer, which drastically reduces the MTBF and increases the chance of failure.

The correct way to fix this is a two-step process:

Register the logic output in the source domain: Add a flip-flop, clocked by `clkA`, immediately after the OR gate. This contains any glitches or rapid toggling within the source domain and presents a clean, stable signal to the clock domain boundary.
Synchronize the stable signal: Use a standard two-flop synchronizer, clocked by `clkB`, to safely transfer the now-stable signal from the new register into the destination domain.

This ensures that only a stable, registered signal attempts to cross the clock domain, which is the foundation of robust CDC design.
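The two-step fix might look like the following sketch. The module name, the `x` and `y` inputs (standing in for the two clkA flip-flop outputs), and the single shared asynchronous reset are simplifying assumptions. A real design would typically use a separately synchronized reset per domain.

```systemverilog
// Hypothetical sketch: register the OR output in clkA, then two-flop synchronize into clkB.
module or_cdc_fix (
    input  logic clkA,
    input  logic clkB,
    input  logic reset,    // assumed shared async reset, for brevity
    input  logic x,        // clkA-domain flip-flop outputs feeding the OR gate
    input  logic y,
    output logic or_sync   // OR result, safe to use in the clkB domain
);
    logic or_reg_a;  // step 1: glitch-free, registered in the source domain
    logic sync_1;    // step 2: first synchronizer stage (may go metastable)

    always_ff @(posedge clkA or posedge reset) begin
        if (reset) or_reg_a <= 1'b0;
        else       or_reg_a <= x | y;
    end

    always_ff @(posedge clkB or posedge reset) begin
        if (reset) begin
            sync_1  <= 1'b0;
            or_sync <= 1'b0;
        end else begin
            sync_1  <= or_reg_a;
            or_sync <= sync_1;
        end
    end
endmodule
```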

Question 8: Observing Multiple Clocks on a Single Pin

Imagine a design has hundreds of internal clock signals. How would you design a circuit to observe any one of these clocks on a single physical output pin for debugging?

This is a common requirement for silicon debug. The most straightforward solution is to use a large multiplexer (MUX).

Architecture:

MUX Inputs: All the internal clock signals that need to be observed are connected to the data inputs of the MUX.
MUX Select Lines: The select lines of the MUX are driven by a control register. This register can be written to via a standard interface like JTAG, APB, or a simple test interface.
MUX Output: The single output of the MUX is routed directly to the physical debug pin.

To observe a specific clock, a test engineer would write the corresponding index value into the control register. This selects the desired clock and routes it to the output pin. It is crucial that this MUX is designed to be glitch-free to avoid creating invalid clock signals at the output.
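A behavioral sketch of this arrangement is shown below. All names (`clk_observe_mux`, `cfg_clk`, `cfg_wr_en`, `cfg_sel`, `cfg_obs_en`) are assumptions, and the simple write strobe stands in for whatever JTAG or APB interface the chip provides. The plain `clks[sel_q]` index is not glitch-free when the selection changes, so a production design would normally build this from dedicated clock-mux cells and change the selection only while the output is gated off.

```systemverilog
// Hypothetical sketch: select one of many internal clocks for a debug pin.
module clk_observe_mux #(
    parameter  int NUM_CLKS = 256,
    localparam int SEL_W    = (NUM_CLKS > 1) ? $clog2(NUM_CLKS) : 1
) (
    input  logic [NUM_CLKS-1:0] clks,        // internal clocks to observe
    input  logic                cfg_clk,     // assumed configuration clock
    input  logic                reset,
    input  logic                cfg_wr_en,   // assumed write strobe from the debug interface
    input  logic [SEL_W-1:0]    cfg_sel,     // index of the clock to observe
    input  logic                cfg_obs_en,  // enables the debug output
    output logic                dbg_clk_out  // routed to the physical debug pin
);
    logic [SEL_W-1:0] sel_q;
    logic             obs_en_q;

    // Control register holding the current selection
    always_ff @(posedge cfg_clk or posedge reset) begin
        if (reset) begin
            sel_q    <= '0;
            obs_en_q <= 1'b0;
        end else if (cfg_wr_en) begin
            sel_q    <= cfg_sel;
            obs_en_q <= cfg_obs_en;
        end
    end

    // Behavioral mux with an output gate; the pin stays low when observation is off
    assign dbg_clk_out = obs_en_q & clks[sel_q];
endmodule
```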

Question 9: Verilog Lint and Synthesis Issues

Find the linting and synthesis issues in the following Verilog code.


module basic(b, a, en);
  output reg b[4:0];
  input [1:0] a;
  input en;
  always @(a, en) begin
    assign b = 4'b0;
    case ({en, a}) begin
      3'b1_00: b = 3'b001;
      3'b1_01: b = 3'b010;
      3'b1_10: b = 3'b100;
      3'b1_11: b = 3'b101;
    endcase
  end
endmodule

This code has several significant issues:

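Array declaration: `output reg b[4:0];` declares an unpacked array of five 1-bit regs, not a 5-bit vector, so the whole-array assignments in the body do not compile. The range belongs before the name: `output reg [4:0] b;`.
Stray `begin`: `case ({en, a}) begin` is a syntax error. A `case` statement is closed by `endcase` and takes no `begin`.
Procedural `assign`: The `assign b = …` statement inside an `always` block is a procedural continuous assignment. It is deprecated, not supported by most synthesis tools, and in simulation it overrides the later procedural assignments to `b`. To assign a value inside a combinational `always` block, a blocking assignment (`b = …;`) must be used.
Incomplete `case` statement: The `case` statement only defines behavior when `en` is `1`. When `en` is `0`, `b` depends entirely on the assignment at the top of the block. Without a working default assignment or a `default` case, the synthesizer will infer a latch to hold the value of `b` when `en` is `0`. Latches are generally undesirable in synchronous designs.
Bit-width Mismatch: The output `b` is intended to be 5 bits wide, but the code assigns 3-bit and 4-bit values to it. This is a common linting warning.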

Corrected Code:

A better way to write this to avoid latches is to provide a default value at the beginning of the `always` block and use `always_comb` for clarity.


module basic (
    output reg [4:0] b,
    input      [1:0] a,
    input            en
);
    // Use always_comb for combinational logic.
    // It automatically includes all inputs in the sensitivity list.
    always_comb begin
        b = 5'b00000; // Default assignment to prevent latches
        if (en) begin
            case (a)
                2'b00: b = 5'b00001;
                2'b01: b = 5'b00010;
                2'b10: b = 5'b00100;
                2'b11: b = 5'b01000; // original assigned 3'b101; one-hot intent assumed
                default: b = 5'b00000; // Good practice
            endcase
        end
    end
endmodule

Question 10: Clock Domain of an Async FIFO

In which clock domain does an asynchronous FIFO’s synchronizer logic reside?

This is a trick question. An asynchronous FIFO, by its very nature, bridges two different clock domains. It does not exist in a single domain.

The write logic (write pointer, full flag generation) operates entirely in the write clock domain.
The read logic (read pointer, empty flag generation) operates entirely in the read clock domain.

The “synchronizer logic” is the crucial part that connects them. Specifically:

The binary write pointer is converted to Gray code in the write domain and then passed through synchronizer flip-flops clocked by the read clock, where it is used in the empty flag logic.
The binary read pointer is converted to Gray code in the read domain and then passed through synchronizer flip-flops clocked by the write clock, where it is used in the full flag logic.

Therefore, the synchronizer logic doesn’t reside in one domain. Each pointer synchronizer is clocked by its destination clock, so there is one synchronizer in each domain, and together they form the mechanism that crosses from one to the other.
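One direction of the pointer crossing can be sketched as below; the read-pointer path mirrors it with the clocks swapped. The module name, port names, and `PTR_W` default are assumptions. The Gray-coded pointer is registered in the write domain before it crosses, so that the synchronizer never samples combinational logic, and the two synchronizer flip-flops are clocked by the read clock.

```systemverilog
// Hypothetical sketch: write pointer, Gray-coded and synchronized into the read domain.
module wptr_sync_to_rclk #(
    parameter int PTR_W = 4   // assumed pointer width
) (
    input  logic             wclk,
    input  logic             wrst,
    input  logic             rclk,
    input  logic             rrst,
    input  logic [PTR_W-1:0] wptr_bin,      // binary write pointer (write domain)
    output logic [PTR_W-1:0] wptr_gray_rd   // Gray write pointer (read domain)
);
    logic [PTR_W-1:0] wptr_gray_wr;  // registered Gray pointer, write domain
    logic [PTR_W-1:0] sync_1;        // first synchronizer stage, read domain

    // Write domain: binary-to-Gray conversion, then register
    always_ff @(posedge wclk or posedge wrst) begin
        if (wrst) wptr_gray_wr <= '0;
        else      wptr_gray_wr <= wptr_bin ^ (wptr_bin >> 1);
    end

    // Read domain: two-flop synchronizer clocked by the destination clock
    always_ff @(posedge rclk or posedge rrst) begin
        if (rrst) begin
            sync_1       <= '0;
            wptr_gray_rd <= '0;
        end else begin
            sync_1       <= wptr_gray_wr;
            wptr_gray_rd <= sync_1;
        end
    end
endmodule
```

Because consecutive Gray codes differ in only one bit, a sample taken mid-transition resolves to either the old or the new pointer value, never to an unrelated one.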