Question 1: Average of a Data Stream
An incoming data stream has an unknown length and is not in any specific order. Design a hardware architecture to find the average of all data values, excluding the two largest values seen so far. The primary constraint is to minimize data storage.
To solve this without storing the entire stream, we can process the data on the fly. We need only a few storage elements (registers):
- `max_1`: The largest number seen so far.
- `max_2`: The second-largest number seen so far.
- `sum`: A running total of all numbers *except* `max_1` and `max_2`.
- `count`: A counter for the numbers included in `sum`.
The logic operates as follows for each new data value (`new_data`) that arrives:
- If `new_data` > `max_1`:
  - The old `max_2` is no longer one of the two largest values. Add it to `sum` and increment `count`.
  - Update `max_2` with the old `max_1` value.
  - Update `max_1` with `new_data`.
- Else if `new_data` > `max_2`:
  - The old `max_2` value is demoted. Add it to `sum` and increment `count`.
  - Update `max_2` with `new_data`.
- Else (`new_data` is no larger than `max_2`):
  - Add `new_data` directly to `sum`.
  - Increment `count`.
The first two values of the stream only fill `max_1` and `max_2`; nothing is added to `sum` until both registers hold real data.
The final average is `sum / count`. Hardware division is resource-intensive. In an interview, it’s good to discuss trade-offs. A simple approach is to wait until `count` is a power of 2 and then compute the average at that point with a right shift by log2(`count`) bits, which needs no divider.
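The update rules above can be sketched in SystemVerilog roughly as follows. This is a minimal sketch, not a definitive implementation: the module name, the `in_valid` strobe, the `filled` start-up counter, and the parameter names and widths are all assumptions made for illustration, and samples are assumed to be unsigned.

```systemverilog
// Sketch: running sum/count that excludes the two largest samples seen so far.
// All names and widths here are illustrative assumptions, not from the question.
module stream_avg_excl_top2 #(
  parameter int WIDTH = 8,    // width of each unsigned sample
  parameter int SUM_W = 32,   // accumulator width; size it for the longest expected stream
  parameter int CNT_W = 24    // counter width; same caveat
) (
  input  logic             clk,
  input  logic             reset,
  input  logic             in_valid,   // high for one cycle per sample
  input  logic [WIDTH-1:0] new_data,
  output logic [SUM_W-1:0] sum,
  output logic [CNT_W-1:0] count
);
  logic [WIDTH-1:0] max_1, max_2;
  logic [1:0]       filled;   // how many of max_1/max_2 hold real data (0, 1 or 2)

  always_ff @(posedge clk or posedge reset) begin
    if (reset) begin
      max_1 <= '0; max_2 <= '0; sum <= '0; count <= '0; filled <= 2'd0;
    end else if (in_valid) begin
      if (filled == 2'd0) begin
        // First sample: just fill max_1
        max_1 <= new_data; filled <= 2'd1;
      end else if (filled == 2'd1) begin
        // Second sample: order the pair, still nothing to accumulate
        if (new_data > max_1) begin
          max_2 <= max_1; max_1 <= new_data;
        end else begin
          max_2 <= new_data;
        end
        filled <= 2'd2;
      end else if (new_data > max_1) begin
        // New maximum: old max_2 drops out of the top two and joins the sum
        sum <= sum + max_2; count <= count + 1'b1;
        max_2 <= max_1; max_1 <= new_data;
      end else if (new_data > max_2) begin
        // New runner-up: old max_2 joins the sum
        sum <= sum + max_2; count <= count + 1'b1;
        max_2 <= new_data;
      end else begin
        // Ordinary sample: accumulate directly
        sum <= sum + new_data; count <= count + 1'b1;
      end
    end
  end
endmodule
```

Only four data registers plus a two-bit start-up counter are stored, regardless of stream length. The division (or the power-of-2 shift) is left outside the module so that the trade-off can be chosen separately.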
Question 2: Sequential Pulse Generation
Write a synthesizable Verilog module for an N-bit signal, `A`. The behavior should be that `A[0]` goes high for 10 clock cycles, then goes low. Immediately after, `A[1]` goes high for 10 clock cycles, then low, and so on for all `N` bits in a cyclical manner.
This describes a “walking 1” pattern where each bit position stays high for a fixed duration. We can achieve this with two counters: one to count the 10 cycles and another to track which bit should be active.
```systemverilog
module walking_pulse #(
  parameter N = 8,
  parameter PULSE_WIDTH = 10
) (
  input  logic         clk,
  input  logic         reset,
  output logic [N-1:0] A
);
  // Counter widths; the guards keep each width at least 1 when N or PULSE_WIDTH is 1
  localparam COUNT_WIDTH      = (PULSE_WIDTH > 1) ? $clog2(PULSE_WIDTH) : 1;
  localparam BIT_SELECT_WIDTH = (N > 1) ? $clog2(N) : 1;

  logic [COUNT_WIDTH-1:0]      cycle_count;  // cycles elapsed in the current pulse
  logic [BIT_SELECT_WIDTH-1:0] active_bit;   // index of the bit that is currently high

  always_ff @(posedge clk or posedge reset) begin
    if (reset) begin
      cycle_count <= '0;
      active_bit  <= '0;
    end else begin
      if (cycle_count == PULSE_WIDTH - 1) begin
        cycle_count <= '0;
        if (active_bit == N - 1) begin
          active_bit <= '0; // Wrap around
        end else begin
          active_bit <= active_bit + 1;
        end
      end else begin
        cycle_count <= cycle_count + 1;
      end
    end
  end

  // One-hot decode of active_bit; the N-bit literal makes the shift width explicit
  assign A = N'(1) << active_bit;
endmodule
```
Question 3: MUX-Based CDC Synchronizer
Explain the architecture of a MUX-based synchronizer for safely transferring multi-bit data across clock domains, and provide the corresponding Verilog code for the receiver-side logic.
A simple two-flop synchronizer is unsafe for multi-bit data because different bits can transition at slightly different times, leading to an incorrect value being captured in the destination domain. A MUX-based synchronizer (or feedback synchronizer) solves this by only allowing the destination domain to sample the data when it is stable.
Architecture:
- A control signal from the source domain (`in_ctrl`) is first synchronized to the destination domain using a standard two-flop synchronizer.
- In the destination domain, a MUX selects between the incoming new data (`in_data`) and the previously captured (and held) data.
- The synchronized control signal (`q2`) acts as the select line for this MUX. When the control signal indicates new data is available, the MUX passes it to a register. Otherwise, the register feeds its own output back through the MUX, holding its value stable.
- For this to be safe, the source domain must hold `in_data` stable from before `in_ctrl` is asserted until the destination has captured it, so that the data bits are never changing while `q2` selects them.
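[Figure: MUX-based synchronizer. In Clock Domain A (`clkA`), flip-flops register `data_in` and `ctrl_in` to produce `in_data` and `in_ctrl`. In Clock Domain B (`clkB`), `in_ctrl` passes through two flip-flops (`q1`, `q2`); `q2` drives the `sel` input of a MUX whose 1 input is `in_data` and whose 0 input is the held value, and a final flip-flop registers the MUX output as `sync_data`.]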
```systemverilog
module mux_synchronizer #(
  parameter WIDTH = 8
) (
  input  logic             clkB, reset,
  input  logic [WIDTH-1:0] in_data,
  input  logic             in_ctrl,
  output logic [WIDTH-1:0] sync_data
);
  logic q1, q2;
  logic [WIDTH-1:0] mux_out;

  // 1. Synchronize the control signal into the clkB domain
  always_ff @(posedge clkB or posedge reset) begin
    if (reset) begin
      q1 <= 1'b0;
      q2 <= 1'b0;
    end else begin
      q1 <= in_ctrl;
      q2 <= q1;
    end
  end

  // 2. MUX selects new data or holds the old stable data
  assign mux_out = q2 ? in_data : sync_data;

  // 3. Register the MUX output to get the final synchronized data
  always_ff @(posedge clkB or posedge reset) begin
    if (reset) begin
      sync_data <= '0;
    end else begin
      sync_data <= mux_out;
    end
  end
endmodule
```
Question 4: Handling Glitches in a Circuit
What is a glitch, and what is the standard method to prevent one from propagating through a digital circuit?
A glitch is a short, unwanted pulse or transition on a signal line, typically at the output of combinational logic. It occurs when different signal paths through the logic have different propagation delays, causing the output to momentarily change to an incorrect value before settling.
Glitches can cause serious issues, such as incorrectly clocking a downstream register or triggering spurious events.
The most effective way to eliminate glitches is to register the output of the combinational logic. A flip-flop ignores glitches on its data input because it samples that input only around the active clock edge. Provided the path meets setup and hold timing, the glitch has subsided by the time the clock edge arrives, and the flip-flop captures the final, stable value. This prevents the glitch from propagating to the next stage of the logic. Registering does not help when the glitching signal drives a clock or an asynchronous set/reset pin, so those pins should never be fed from unregistered combinational logic.
[Figure: Problem: glitch from logic (FF1 → Logic → FF2, with a glitch at the logic output). Solution: register the output (FF1 → Logic → FF_Fix → FF2).]
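As a minimal sketch of the fix, the module below registers the output of a 2:1 select function, a classic example of logic whose output can pulse briefly when the select input changes. The module and signal names are assumptions chosen for illustration, and the inputs are assumed to come from flip-flops in the same clock domain.

```systemverilog
// Sketch: register a combinational result so downstream logic never sees its glitches.
// Names are illustrative assumptions.
module glitch_free_out (
  input  logic clk,
  input  logic reset,
  input  logic s, a, b,   // assumed to be driven by flip-flops clocked by clk
  output logic y_reg      // changes only on a rising clock edge
);
  logic y_comb;

  // With a = b = 1, a change on s can make y_comb dip low for a moment,
  // because the s and ~s paths have different delays.
  assign y_comb = (s & a) | (~s & b);

  // The flip-flop samples y_comb only at the clock edge, after it has settled.
  always_ff @(posedge clk or posedge reset) begin
    if (reset) y_reg <= 1'b0;
    else       y_reg <= y_comb;
  end
endmodule
```

The cost of the fix is one cycle of latency on `y_reg`, which is usually an acceptable trade for a clean signal.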
Question 5: Blocking vs. Non-Blocking Assignments
Explain the synthesis result for the following Verilog code, first using blocking assignments (`=`) and then using non-blocking assignments (`<=`).
```verilog
// Input is 'a', output is 'o'
always @(posedge clk) begin
  b = a; // or b <= a;
  c = b; // or c <= b;
  o = c; // or o <= c;
end
```
This question highlights the fundamental difference between blocking and non-blocking assignments in sequential logic.
1. Using Blocking Assignments (`=`)
Blocking assignments execute sequentially within a block. The next statement only executes after the current one is complete.
- `b = a;` `b` gets the value of `a`.
- `c = b;` `c` gets the NEW value of `b` (which is `a`).
- `o = c;` `o` gets the NEW value of `c` (which is also `a`).
Synthesis Result: The synthesizer recognizes that `o` simply gets the value of `a`. The intermediate signals `b` and `c` are optimized away. The final hardware is a single flip-flop where input `a` is connected to the D-input and the output is `o`.
2. Using Non-Blocking Assignments (`<=`)
Non-blocking assignments schedule updates to occur at the end of the time step. All right-hand side (RHS) expressions are evaluated first, using the “old” values of the signals from before the clock edge.
- `b <= a;` Schedule `b` to get `a`’s value.
- `c <= b;` Schedule `c` to get `b`’s OLD value.
- `o <= c;` Schedule `o` to get `c`’s OLD value.
Synthesis Result: This correctly models a chain of registers. The value of `a` propagates through one flip-flop per clock cycle. The hardware is a three-stage shift register.
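To make the comparison concrete, here are the two variants written out as complete modules. This is only a sketch: the module names are assumptions, and the fragment in the question does not show its declarations.

```systemverilog
// Sketch: the same three assignments with blocking and non-blocking operators.
// Module names are illustrative assumptions.

// Blocking: o follows a after ONE clock edge (behaves like a single flip-flop).
module shift_blocking (input logic clk, input logic a, output logic o);
  logic b, c;
  always @(posedge clk) begin
    b = a;   // b updated immediately
    c = b;   // sees the new b
    o = c;   // sees the new c
  end
endmodule

// Non-blocking: o follows a after THREE clock edges (three-stage shift register).
module shift_nonblocking (input logic clk, input logic a, output logic o);
  logic b, c;
  always @(posedge clk) begin
    b <= a;  // all right-hand sides are sampled before any update
    c <= b;
    o <= c;
  end
endmodule
```

Simulating both with `a` held high shows `o` of the blocking version rising after the first clock edge, while the non-blocking version needs three edges.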
Question 6: CDC with Plesiochronous Clocks
Data is being sent from a 99 MHz clock domain to a 100 MHz clock domain. Is a standard two-flop synchronizer always sufficient for safe synchronization?
Not necessarily. It depends on the source of the clocks.
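Clocks with very close frequencies, like 99 MHz and 100 MHz, are called plesiochronous. With a 1 MHz frequency difference, the relative phase of the two clocks sweeps through a full cycle about once every microsecond. The critical factor is whether they are derived from the same original clock source or are completely independent.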
- Derived from Same Source: If both clocks come from the same PLL or oscillator, their phase relationship, while drifting, is not random. The clock edges will align frequently and predictably. This creates “worst-case” scenarios for metastability more often, and a standard two-flop synchronizer’s Mean Time Between Failures (MTBF) might be unacceptably low. More robust solutions like an Asynchronous FIFO are often required.
- From Different Sources: If the clocks are from two independent crystal oscillators, their phase relationship is truly random. This scenario is no different from any other asynchronous clock crossing, and a standard two-flop synchronizer, designed with appropriate timing constraints, is generally sufficient.
Question 7: Fixing a Combinational CDC Path
The circuit below shows combinational logic in the source clock domain (`clkA`) feeding directly into a flip-flop in the destination domain (`clkB`). How would you modify this to eliminate the Clock Domain Crossing (CDC) issue?
[Figure: in Domain `clkA`, two flip-flops drive an OR gate; the OR gate output feeds a flip-flop directly in Domain `clkB`.]
Placing combinational logic directly before a synchronizer is a bad design practice. Any glitches or high-frequency toggling from the logic will be presented to the synchronizer, which drastically reduces the MTBF and increases the chance of failure.
The correct way to fix this is a two-step process:
- Register the logic output in the source domain: Add a flip-flop, clocked by `clkA`, immediately after the OR gate. This contains any glitches or rapid toggling within the source domain and presents a clean, stable signal to the clock domain boundary.
- Synchronize the stable signal: Use a standard two-flop synchronizer, clocked by `clkB`, to safely transfer the now-stable signal from the new register into the destination domain.
This ensures that only a stable, registered signal attempts to cross the clock domain, which is the foundation of robust CDC design.
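The two steps can be sketched as follows. The module and signal names are assumptions for illustration. A single asynchronous `reset` is shared by both domains only to keep the sketch short; release of reset is assumed to be handled per domain elsewhere.

```systemverilog
// Sketch: register the OR result in clkA, then two-flop synchronize it into clkB.
// Names are illustrative assumptions.
module cdc_or_fixed (
  input  logic clkA,
  input  logic clkB,
  input  logic reset,
  input  logic x, y,      // assumed to be outputs of clkA-domain flip-flops
  output logic or_sync    // OR result, safe to use in the clkB domain
);
  logic or_regA;          // step 1: clean, registered version of (x | y)
  logic q1, q2;           // step 2: two-flop synchronizer in clkB

  always_ff @(posedge clkA or posedge reset) begin
    if (reset) or_regA <= 1'b0;
    else       or_regA <= x | y;
  end

  always_ff @(posedge clkB or posedge reset) begin
    if (reset) begin
      q1 <= 1'b0;
      q2 <= 1'b0;
    end else begin
      q1 <= or_regA;   // may go metastable; never used directly
      q2 <= q1;        // has had a full clkB cycle to resolve
    end
  end

  assign or_sync = q2;
endmodule
```

Only `or_regA`, a flip-flop output, crosses the boundary, so the synchronizer never sees glitches from the OR gate.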
Question 8: Observing Multiple Clocks on a Single Pin
Imagine a design has hundreds of internal clock signals. How would you design a circuit to observe any one of these clocks on a single physical output pin for debugging?
This is a common requirement for silicon debug. The most straightforward solution is to use a large multiplexer (MUX).
Architecture:
- MUX Inputs: All the internal clock signals that need to be observed are connected to the data inputs of the MUX.
- MUX Select Lines: The select lines of the MUX are driven by a control register. This register can be written to via a standard interface like JTAG, APB, or a simple test interface.
- MUX Output: The single output of the MUX is routed directly to the physical debug pin.
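To observe a specific clock, a test engineer would write the corresponding index value into the control register. This selects the desired clock and routes it to the output pin. A plain combinational MUX can emit runt pulses if the select value changes while the clocks are toggling, so it is crucial that the MUX is designed to be glitch-free, or that the select is changed only while the output is gated off, to avoid creating invalid clock signals at the output.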
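A minimal sketch of the selection structure is shown below. The names, the `cfg_*` write interface standing in for JTAG or APB, and the `cfg_en` output gate are all assumptions for illustration. The sketch is not a glitch-free clock MUX: it relies on the test procedure clearing the enable bit before changing the select value.

```systemverilog
// Sketch: select one of many internal clocks for a debug pin.
// Names and the cfg_* interface are illustrative assumptions.
module clk_observe_mux #(
  parameter int NUM_CLKS = 256   // assumed to be a power of 2 so every index is valid
) (
  input  logic                        cfg_clk,   // clock of the control register
  input  logic                        reset,
  input  logic                        cfg_wr,    // write strobe from the debug interface
  input  logic [$clog2(NUM_CLKS)-1:0] cfg_sel,   // index of the clock to observe
  input  logic                        cfg_en,    // 0 = hold the pin low
  input  logic [NUM_CLKS-1:0]         clks,      // all observable internal clocks
  output logic                        debug_pin
);
  logic [$clog2(NUM_CLKS)-1:0] sel_reg;
  logic                        en_reg;

  // Control register: written only when cfg_wr is high
  always_ff @(posedge cfg_clk or posedge reset) begin
    if (reset) begin
      sel_reg <= '0;
      en_reg  <= 1'b0;
    end else if (cfg_wr) begin
      sel_reg <= cfg_sel;
      en_reg  <= cfg_en;
    end
  end

  // Large MUX plus output gate
  assign debug_pin = en_reg & clks[sel_reg];
endmodule
```

In use, the engineer would clear the enable bit, write the new index, and then set the enable bit again, so the pin is held low while the MUX output is changing.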
Question 9: Verilog Lint and Synthesis Issues
Find the linting and synthesis issues in the following Verilog code.
```verilog
module basic(b, a, en);
  output reg b[4:0];
  input [1:0] a;
  input en;
  always @(a, en) begin
    assign b = 4'b0;
    case ({en, a}) begin
      3'b1_00: b = 3'b001;
      3'b1_01: b = 3'b010;
      3'b1_10: b = 3'b100;
      3'b1_11: b = 3'b101;
    endcase
  end
endmodule
```
This code has several significant issues:
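- Procedural `assign`: The `assign b = …` statement inside the `always` block is a procedural continuous assignment. It is a discouraged construct that most synthesis tools do not support, and in simulation it overrides the blocking assignments to `b` that follow it. To give `b` a default value inside a combinational `always` block, a plain blocking assignment (`b = …;`) must be used.
- Stray `begin`: `case ({en, a}) begin` is a syntax error. A `case` statement is closed by `endcase` and takes no `begin`.
- Wrong declaration: `output reg b[4:0]` declares an unpacked array of five 1-bit elements, not a 5-bit vector. The range belongs before the name: `output reg [4:0] b`.
- Incomplete `case` statement: The `case` statement only defines behavior when `en` is `1`. It does not specify what `b` should be when `en` is `0`. Without a `default` case or a default assignment that covers all conditions, the synthesizer will infer a latch to hold the value of `b` when `en` is `0`. Latches are generally undesirable in synchronous designs.
- Bit-width Mismatch: The output `b` is meant to be 5 bits wide, but the code assigns 3-bit and 4-bit values to it. This is a common linting warning.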
Corrected Code:
A better way to write this to avoid latches is to provide a default value at the beginning of the `always` block and, where SystemVerilog is available, use `always_comb` to make the combinational intent explicit.
```systemverilog
module basic (
  output reg [4:0] b,
  input      [1:0] a,
  input            en
);
  // Use always_comb for combinational logic.
  // It automatically includes all inputs in the sensitivity list.
  always_comb begin
    b = 5'b00000; // Default assignment to prevent latches
    if (en) begin
      case (a)
        2'b00:   b = 5'b00001;
        2'b01:   b = 5'b00010;
        2'b10:   b = 5'b00100;
        2'b11:   b = 5'b00101; // same value as the original 3'b101, widened to 5 bits
        default: b = 5'b00000; // Good practice
      endcase
    end
  end
endmodule
```
Question 10: Clock Domain of an Async FIFO
In which clock domain does an asynchronous FIFO’s synchronizer logic reside?
This is a trick question. An asynchronous FIFO, by its very nature, bridges two different clock domains. It does not exist in a single domain.
- The write logic (write pointer, full flag generation) operates entirely in the write clock domain.
- The read logic (read pointer, empty flag generation) operates entirely in the read clock domain.
The “synchronizer logic” is the crucial part that connects them. Specifically:
- The binary write pointer is converted to Gray code and then synchronized from the write domain to the read domain to be used in the empty flag logic. The synchronizing flip-flops are clocked by the read clock.
- The binary read pointer is converted to Gray code and then synchronized from the read domain to the write domain to be used in the full flag logic. The synchronizing flip-flops are clocked by the write clock.
Therefore, the synchronizer logic doesn’t reside in one domain. There are two synchronizers, each clocked by its destination domain, and together they are the mechanism that crosses from one domain to the other.
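One direction of this crossing, the write pointer travelling into the read domain, can be sketched as below. The module name, signal names, and `ADDR_W` parameter are assumptions for illustration, the pointer carries one extra bit as is common for full/empty detection, and `winc` is assumed to be already qualified by the full flag. The read-pointer path would mirror it with the clocks swapped.

```systemverilog
// Sketch: write pointer -> Gray code (registered in wclk) -> two flops in rclk.
// Names are illustrative assumptions.
module wptr_gray_sync #(
  parameter int ADDR_W = 4
) (
  input  logic            wclk,
  input  logic            rclk,
  input  logic            reset,
  input  logic            winc,             // advance the write pointer by one
  output logic [ADDR_W:0] wptr_gray_rsync   // Gray write pointer as seen in the read domain
);
  logic [ADDR_W:0] wbin, wbin_next;   // binary pointer (write domain)
  logic [ADDR_W:0] wgray;             // registered Gray pointer (write domain)
  logic [ADDR_W:0] rq1;               // first synchronizer stage (read domain)

  assign wbin_next = wbin + winc;

  // Write domain: the Gray value is registered so that only flip-flop outputs cross
  always_ff @(posedge wclk or posedge reset) begin
    if (reset) begin
      wbin  <= '0;
      wgray <= '0;
    end else begin
      wbin  <= wbin_next;
      wgray <= (wbin_next >> 1) ^ wbin_next;   // binary to Gray
    end
  end

  // Read domain: two-flop synchronizer, clocked by the destination clock
  always_ff @(posedge rclk or posedge reset) begin
    if (reset) begin
      rq1             <= '0;
      wptr_gray_rsync <= '0;
    end else begin
      rq1             <= wgray;
      wptr_gray_rsync <= rq1;
    end
  end
endmodule
```

Because consecutive Gray values differ in only one bit, a sample taken mid-transition resolves to either the old or the new pointer, never to an unrelated value.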