VLSI Interview Questions and Answers – Part 2

Question 11: D-Flip-Flop with Enable (Power Optimized)

Add an enable signal to a D-Flip-Flop. How would you optimize the design for power?

There are three common ways to add an enable signal, but only one is truly power-efficient.

Gating the Input: Place a MUX before the D-input of the flop. If `enable` is high, the MUX passes the new data. If `enable` is low, it feeds the flop’s own output back to its input. The flop still sees a clock edge every cycle, so its clock pin and internal clock nodes keep switching and consuming dynamic power even when the stored value does not change.
Gating the Output: Place a tri-state buffer or MUX on the output. This is generally not done as it doesn’t stop the internal flop from switching and consuming power.
Gating the Clock (Power Optimized): This is the best method for power optimization. The clock signal itself is turned off when the enable signal is low. This prevents the flip-flop from toggling at all, saving significant dynamic power.

However, simple clock gating with an AND gate is dangerous as it can create glitches. The industry-standard solution is to use an Integrated Clock Gating (ICG) Cell. An ICG cell is essentially a latch-based, glitch-free AND gate that ensures the gated clock output is always clean.

Figure: Power-optimized enable using an ICG cell. `Clock` and `Enable` feed the ICG cell, and its `gated_clk` output drives the clock input of the DFF.
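
For comparison, here is a minimal sketch of the first option, the enable flop written in the recirculating-MUX style. The module name, port names, and the active-low asynchronous reset are assumptions made for illustration. Many synthesis flows can map this coding style onto a library ICG cell when clock-gating insertion is enabled, so it is usually the form written in RTL even when the final hardware is clock gated.

```systemverilog
// Enable flop in the recirculating-MUX style.
// Module and port names are illustrative, not from the original question.
module dff_en (
    input  logic clk,
    input  logic rst_n,   // asynchronous active-low reset (assumed)
    input  logic en,
    input  logic d,
    output logic q
);
    always_ff @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            q <= 1'b0;
        else if (en)
            q <= d;       // en = 1: load new data
        // en = 0: q keeps its value, equivalent to feeding q back through a MUX
    end
endmodule
```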

Question 12: Detecting a Previously Seen Number

A 4-bit number arrives every clock cycle. Design a circuit that raises an output signal `high` if the number seen in the current cycle has already appeared in a previous cycle (since the last reset).

This is a classic use case for a direct-mapped lookup table. Since the input is 4 bits, there are only 2⁴ = 16 possible values (0 to 15).

Architecture:

Create a 16-bit register or a small memory (16×1 bit), let’s call it `seen_table`. Initialize all bits to 0 at reset.
Each bit in this table corresponds to one of the possible input numbers. For example, `seen_table[0]` corresponds to the number `4'b0000`, `seen_table[5]` to `4'b0101`, and so on.
In any given cycle:
- Use the incoming 4-bit number as an index into the `seen_table`.
- Check: Read the value at that index. If the bit is already `1`, it means we have seen this number before. Set the output `seen_before` to high.
- Update: After the check, write a `1` to that index in the table to mark the current number as “seen” for future cycles.


module seen_detector (
    input  logic clk, reset,
    input  logic [3:0] data_in,
    output logic       seen_before
);
    logic [15:0] seen_table;  // one "seen" flag per possible 4-bit value

    // Check logic is combinational
    assign seen_before = seen_table[data_in];

    // Update logic is sequential
    always_ff @(posedge clk or posedge reset) begin
        if (reset) begin
            seen_table <= 16'b0;
        end else begin
            // Set the bit corresponding to the current data
            seen_table[data_in] <= 1'b1;
        end
    end
endmodule

Question 13: STA Path Adjustments

For the circuit below, how would you adjust the delay elements `dly1`, `dly2`, and `dly3` to fix a) a setup violation and b) a hold violation at input B of flip-flop FF2?

[Figure: two-flop timing path. Labels: FF1, FF2, dly1, dly2, dly3, A, B, O, clk.]

In this standard timing path, FF1 is the launch flop and FF2 is the capture flop.

dly1 affects the launch clock path.
dly2 affects the capture clock path.
dly3 affects the data path.

a) Fixing a Setup Violation

A setup violation means the data at B is arriving too late. To fix this, we need to make the data arrive earlier relative to the capture clock. We can:

Decrease dly3: This speeds up the data path directly and does not rely on clock skew.
Increase dly2: This delays the capture clock, giving the data more time to arrive and stabilize. It also reduces the hold margin on the same path, so the hold check must be re-verified.
Decrease dly1: This makes the launch clock arrive earlier, starting the data’s journey sooner.

b) Fixing a Hold Violation

A hold violation means the data at B is changing too soon after the clock edge, before FF2 can reliably capture it. To fix this, we need to make the data path slower relative to the capture clock. We can:

Increase dly3: This slows down the data path, keeping the old value stable at B for longer after the clock edge. This is the most common fix.
Decrease dly2: This makes the capture clock arrive earlier, effectively “outrunning” the fast data change. It also reduces the setup margin on the same path.
Increase dly1: This delays the launch clock, making the data start its journey later.
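
The adjustments in both parts follow from the standard setup and hold inequalities. As a sketch, assume `dly3` accounts for the entire data-path delay between FF1 and FF2, and introduce symbols that are not in the original question: $T_{clk}$ for the clock period, $t_{cq}$ for FF1's clock-to-Q delay, and $t_{su}$ and $t_{h}$ for FF2's setup and hold times.

```latex
\begin{align}
\text{Setup:}\quad & t_{dly1} + t_{cq} + t_{dly3} + t_{su} \le T_{clk} + t_{dly2} \\
\text{Hold:}\quad  & t_{dly1} + t_{cq} + t_{dly3} \ge t_{dly2} + t_{h}
\end{align}
```

Under these assumptions, every term that relaxes the setup inequality tightens the hold inequality, and the reverse. The clock period appears only in the setup check, which is why slowing the clock can fix a setup violation but never a hold violation.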

Question 14: RTL for a SIPO Register

Write a Verilog RTL module for an 8-bit SIPO (Serial-In, Parallel-Out) shift register.

A SIPO register shifts in one bit at a time and makes all the bits available simultaneously on a parallel output. The most common implementation is a simple shift register, shown below.


module sipo_8bit (
    input  logic       clk,
    input  logic       reset,
    input  logic       serial_in,
    output logic [7:0] parallel_out
);

    // Internal register to hold the shifted data
    logic [7:0] shift_reg;

    always_ff @(posedge clk or posedge reset) begin
        if (reset) begin
            shift_reg <= 8'b0;
        end else begin
            // Concatenate the lower 7 bits with the new input bit.
            // This shifts the register left (towards the MSB).
            shift_reg <= {shift_reg[6:0], serial_in};
        end
    end

    // Continuously assign the register value to the output
    assign parallel_out = shift_reg;

endmodule

Operation: On each rising clock edge, the `serial_in` bit is placed into the least significant bit (LSB) position of the register, and all other bits are shifted one position to the left (towards the MSB). The entire 8-bit value of the internal register is continuously available at `parallel_out`.

Question 15: Resolving a CDC Pulse Synchronization Issue

Given the timing diagram below where a narrow pulse on `data` crosses from `clkA` domain to `clkB` domain, how do you resolve the CDC issue? Assume nothing about the relationship between `clkA` and `clkB`.

[Timing diagram: `clkB` and a narrow pulse on `data`.]

A narrow pulse is extremely difficult to synchronize. A standard two-flop synchronizer will miss it entirely if no rising edge of `clkB` falls within the pulse. Even if an edge does sample it, the first flop may go metastable and resolve to either value, so capture is not guaranteed.

The standard solution is to convert the pulse into a level-sensitive signal that can be safely synchronized. A toggle-flop synchronizer is ideal for this.

Architecture:

Source Domain (`clkA`): Use the incoming pulse to toggle a flip-flop. This converts the one-cycle pulse into a level change that will persist until the next pulse arrives.
CDC Path: Synchronize this toggling signal using a standard two-flop synchronizer into the `clkB` domain.
Destination Domain (`clkB`): Detect the change in the synchronized level. An XOR gate comparing the synchronized signal with its one-cycle-delayed version will generate a single-cycle pulse in the `clkB` domain every time the level changes.

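This method reliably transfers the pulse event across the domain, though it introduces a constraint: consecutive source pulses must be spaced far enough apart for `clkB` to sample each level (a common rule of thumb is that the level stays stable for at least 1.5 `clkB` periods). If that spacing cannot be guaranteed, add a feedback path so that the source waits for an acknowledgment from the destination before sending the next pulse.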
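
The three steps of the architecture can be sketched as a single module. The module name, the port names, and the active-low asynchronous resets are assumptions for illustration. The sketch has no acknowledge path, so it relies on the source spacing its pulses widely enough for the destination clock to see each toggle.

```systemverilog
// Toggle-based pulse synchronizer (sketch). All names are illustrative.
module pulse_sync (
    input  logic clk_a, rst_a_n,   // source domain clock / async active-low reset
    input  logic pulse_a,          // one-cycle pulse in the clk_a domain
    input  logic clk_b, rst_b_n,   // destination domain clock / async active-low reset
    output logic pulse_b           // one-cycle pulse in the clk_b domain
);
    logic toggle_a;                // level that flips on every source pulse
    logic sync_ff1, sync_ff2;      // two-flop synchronizer
    logic sync_ff3;                // one-cycle-delayed copy for edge detection

    // Source domain: convert the pulse into a level change
    always_ff @(posedge clk_a or negedge rst_a_n) begin
        if (!rst_a_n)     toggle_a <= 1'b0;
        else if (pulse_a) toggle_a <= ~toggle_a;
    end

    // Destination domain: synchronize the level, then delay it once more
    always_ff @(posedge clk_b or negedge rst_b_n) begin
        if (!rst_b_n)
            {sync_ff3, sync_ff2, sync_ff1} <= 3'b000;
        else
            {sync_ff3, sync_ff2, sync_ff1} <= {sync_ff2, sync_ff1, toggle_a};
    end

    // Any change of the synchronized level yields a one-cycle pulse
    assign pulse_b = sync_ff2 ^ sync_ff3;
endmodule
```

Only `sync_ff1` samples a signal from the other domain. The XOR is taken after the second synchronizer stage so that a metastable first stage never reaches the output logic directly.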

Question 16: RTL for a D-Latch

Write the Verilog RTL for a D-type latch.

A D-latch is a level-sensitive memory element. It is transparent when its enable signal is high (output follows input) and opaque when the enable is low (output holds its last value). Latches are inferred in Verilog when combinational logic has incomplete specification (e.g., an `if` without an `else`, or a `case` without a `default`).

Here is the RTL for a simple active-high D-latch with an asynchronous reset.


module d_latch (
    input  logic d,       // Data input
    input  logic en,      // Enable (level-sensitive)
    input  logic rst_n,   // Asynchronous active-low reset
    output logic q        // Output
);

    always_latch begin
        if (!rst_n) begin
            q <= 1'b0;
        end else if (en) begin
            q <= d; // Transparent: output follows input
        end
        // When en is low, q holds its value, creating the latch.
    end

endmodule

While useful in specific applications, latches are often avoided in synchronous designs because their timing analysis is more complex than for edge-triggered flip-flops.

Question 17: Synthesis of Incomplete Sensitivity List

What will the following code synthesize to? `always @(!a) if (b) d=a;`

This code has multiple issues that highlight bad RTL coding practices.

Sensitivity List: The sensitivity list `always @(!a)` is nonsensical for synthesis. In simulation the block runs only when the value of `!a` changes, so a change on `b` alone is not seen. Synthesis tools create hardware based on logic structure, not arbitrary event triggers, so the tool will likely ignore `!a`, interpret the block as combinational, and issue a warning that the sensitivity list is incomplete. The result is a mismatch between simulation and synthesized hardware. The correct way to specify a combinational block is with `always @*` or `always_comb`.
Incomplete Specification: The `if (b)` statement has no corresponding `else` clause. This means the code does not specify what the value of `d` should be when `b` is false. To prevent this ambiguity, a synthesis tool will infer a latch to hold the previous value of `d`.

Synthesis Result: The code will synthesize to a latch. The input to the latch will be `a`, and the enable for the latch will be `b`. This is almost certainly not the designer’s intent and can lead to major timing problems. A linter would flag this as a critical warning.
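
If the intent was purely combinational logic, one way to avoid the inferred latch is to give `d` a value on every path. The sketch below assumes a default of `1'b0` when `b` is low, which the original snippet does not specify; the module name is hypothetical.

```systemverilog
// Latch-free combinational version (sketch; the default value is assumed).
module comb_fixed (
    input  logic a,
    input  logic b,
    output logic d
);
    always_comb begin
        d = 1'b0;        // assumed default, assigned on every path
        if (b)
            d = a;       // overrides the default when b is high
    end
endmodule
```

With `always_comb`, the simulator derives the sensitivity list from the signals read in the block, and tools can report an error if the block still infers a latch.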

Question 18: Synthesis of Incomplete Sequential Logic

What will the following code synthesize to? `always @(posedge clk) if (reset) q <= 0;`

This is a sequential block that only specifies behavior when `reset` is high. It does not define what `q` should do when `reset` is low. In a sequential (`always_ff` or `always @(posedge clk)`) block, if a signal is not assigned a value under all conditions, it is inferred to hold its previous value.

Synthesis Result: The synthesizer will create a D-flip-flop where the output `q` is fed back to its own D-input, so it holds its value whenever `reset` is low. The `reset` signal acts as a synchronous load enable for the value `0`. Because `0` is the only value ever assigned, `q` stays at `0` after the first reset, and some tools may simplify the logic further.

The hardware will be a D-flip-flop with a MUX at its input. The MUX selects `1'b0` when `reset` is high, and selects the flop’s own output `q` when `reset` is low.

Figure: DFF with a 2:1 MUX on its D input. `reset` is the MUX select; input 1 is `1'b0` and input 0 is the fed-back `q`.
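
Written out explicitly, the inferred behavior is equivalent to the sketch below, where the `else` branch makes the implicit hold visible. The module name is hypothetical.

```systemverilog
// Explicit form of the implied hold (sketch).
module hold_reg (
    input  logic clk,
    input  logic reset,   // synchronous: only sampled on the clock edge
    output logic q
);
    always_ff @(posedge clk) begin
        if (reset)
            q <= 1'b0;    // MUX input 1: load constant 0
        else
            q <= q;       // MUX input 0: recirculate the current value
    end
endmodule
```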

Question 19: Safely Gating a Clock

How do you safely gate a clock to save power, and what are the risks of doing it incorrectly?

Clock gating is a technique used to turn off the clock to parts of a design to reduce dynamic power consumption. However, doing it naively is dangerous.

The Risk of Incorrect Gating: Using a simple AND or OR gate to combine a clock and a control signal is unsafe. With an AND gate, if the control signal changes while the clock is high (with an OR gate, while the clock is low), it can create a glitch or a shortened pulse on the gated clock output. This glitch can cause downstream flops to either miss a clock edge or capture data incorrectly, leading to functional failure.

The Safe Solution: Integrated Clock Gating (ICG) Cell

The industry-standard method is to use a library-provided Integrated Clock Gating (ICG) cell. An ICG cell is essentially a latch-based, glitch-free circuit.

How it works:

It contains a level-sensitive latch that holds the enable signal.
The latch is transparent only when the clock is low. This ensures that the enable signal can only change its state when the clock is inactive.
When the clock goes high, the latch becomes opaque, holding the enable value stable for the entire duration of the clock pulse.
An AND gate then combines the stable latch output with the original clock.

This guarantees that the gated clock output is always a full, clean pulse without any glitches.
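
The latch-plus-AND structure described above can be sketched as a behavioral model. This is for simulation and illustration only: the module name `icg_model` is hypothetical, and a real design would instantiate the ICG cell supplied by the technology library rather than build one from discrete logic.

```systemverilog
// Behavioral model of a latch-based, AND-type ICG (sketch, simulation only).
module icg_model (
    input  logic clk,
    input  logic en,
    output logic gated_clk
);
    logic en_latched;

    // Latch is transparent while clk is low and opaque while clk is high
    always_latch begin
        if (!clk)
            en_latched <= en;
    end

    // The enable seen by the AND gate can only change while clk is low
    assign gated_clk = clk & en_latched;
endmodule
```

In this model, a change on `en` while `clk` is high has no effect until `clk` falls, so the current clock pulse is neither cut short nor started late.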

Question 20: Fixing Timing Violations from RTL

How can you fix a setup violation from RTL code only? Can you also fix a hold violation from RTL?

Fixing Setup Violations in RTL

A setup violation means the data signal is arriving too late at a flip-flop’s input relative to the clock edge. This is caused by a long combinational logic path between registers. From RTL, you can:

Pipelining: This is the most common and effective method. You break the long combinational path into two or more smaller, faster paths by adding registers in between. This increases latency but allows for a higher clock frequency. For example, changing `out = a + b + c;` to `temp <= a + b;` in one cycle and `out <= temp + c;` in the next, with `c` also delayed by one cycle so that it stays aligned with `temp`.
Logic Restructuring: Rewrite the logic to be more efficient. For example, changing a long `if-else-if` chain (which synthesizes to a slow priority encoder) to a `case` statement (which can synthesize to a faster parallel MUX when the conditions are mutually exclusive).
FSM State Encoding: For Finite State Machines, changing the state encoding can help. A binary encoding is dense but can have complex next-state logic. A one-hot encoding uses more flops but typically simplifies the next-state and decode logic, reducing the delay and helping fix setup violations. Gray encoding mainly reduces the number of bits that change per transition, so its benefit for setup timing is less direct.
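
The pipelining example can be sketched as a small module. The module name, the parameter `W`, and the signal `c_d` are assumptions for illustration. The point to notice is that `c` is registered alongside the first sum, so the second stage adds operands that belong to the same input sample.

```systemverilog
// Two-stage pipelined a + b + c (sketch). Widths and names are assumed.
module add3_pipelined #(
    parameter int W = 16
) (
    input  logic         clk,
    input  logic [W-1:0] a, b, c,
    output logic [W+1:0] out     // two extra bits so the sum cannot overflow
);
    logic [W:0]   temp;   // stage 1 result: a + b
    logic [W-1:0] c_d;    // c delayed one cycle to stay aligned with temp

    always_ff @(posedge clk) begin
        // Stage 1: first addition, and register c alongside it
        temp <= a + b;
        c_d  <= c;
        // Stage 2: second addition uses the values captured one cycle earlier
        out  <= temp + c_d;
    end
endmodule
```

Each register-to-register path now contains a single adder instead of two, at the cost of one extra cycle of latency.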

Fixing Hold Violations in RTL

Generally, no. A hold violation means the data signal is changing too soon after the clock edge, before the flip-flop has had time to reliably capture it. This is usually caused by a data path that is *too fast* relative to the clock path.

Hold violations are considered a physical design issue, not a logical one. They are fixed during the Place & Route stage by the backend team, who add delay to the data path by inserting buffers or using longer wire routes. It is not something an RTL designer can or should try to fix by changing the Verilog code.