## CS224 : Assignment 3

Date of demonstration: 28th March 2023, lab timing 2 PM-5 PM. Weight: 16% (8% + 8%).

## Part I: Design of Vector Dot Product Unit (VDPU) using HDL and demonstration using simulation

Our aim is to compute the dot product of two vectors. Each vector has 16 elements, and each element holds 8 bits of data.

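    // Equivalent C code
    short S = Sinput;      // 16-bit accumulator, preloaded with the initial value
    char X[16], Y[16];     // two vectors of sixteen 8-bit elements
    for (int i = 0; i < 16; i++)
        S = S + X[i] * Y[i];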

Assume the VDPU works in three modes:

- a) **DataReadyMode**: Data for both the X and Y vectors are available, and the unit simply computes S in 16 cycles.
  - Cycle 0: read S\_low8bits; Cycle 1: read S\_high8bits; Cycle 2-18: compute S; Cycle 19-20: display S
- b) **PartialReadyMode**: Data for vector Y is available but not for vector X, so every cycle the unit accepts 8 bits of data from the input port and computes the partial dot product; after 16 cycles it has computed the final value of S.
  - Cycle 0: read S\_low8bits; Cycle 1: read S\_high8bits; Cycle 2-18: read X[i], compute S; Cycle 19-20: display S
- c) **DataTransferMode**: Data for one vector, either X or Y, is being loaded, so every cycle the unit accepts 8 bits of data from the input port and stores it in vector X or Y.
  - Cycle 0: read S\_low8bits; Cycle 1: read S\_high8bits; Cycle 2-18: read X[i], compute S; Cycle 19-20: display null.
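
Since all three modes share the same cycle layout, the schedule can be captured as a single cycle-to-phase mapping. The following C model is only a sketch of that idea: the enum and function names are hypothetical, and the cycle boundaries are copied from the listing above (cycles 0 through 20).

```c
#include <assert.h>

/* Phases of the shared schedule. Names are hypothetical, chosen for this sketch. */
typedef enum {
    PHASE_READ_S_LOW,   /* cycle 0: low 8 bits of S arrive            */
    PHASE_READ_S_HIGH,  /* cycle 1: high 8 bits of S arrive           */
    PHASE_WORK,         /* cycles 2-18: compute, read or store data   */
    PHASE_DISPLAY       /* cycles 19-20: present the result (or null) */
} vdpu_phase_t;

/* Map a cycle number (0..20, as listed in the mode descriptions) to its phase. */
vdpu_phase_t vdpu_phase(int cycle)
{
    assert(cycle >= 0 && cycle <= 20);
    if (cycle == 0) return PHASE_READ_S_LOW;
    if (cycle == 1) return PHASE_READ_S_HIGH;
    if (cycle <= 18) return PHASE_WORK;
    return PHASE_DISPLAY;
}
```

In an HDL design the same mapping would typically be driven by one cycle counter, with the mode only changing what happens inside the work and display phases.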

Based on the above description, the VDPU has one 8-bit **inout port DIO**. When the input on DIO has the form "1111XXXX", the unit treats it as the GO signal; the last 4 bits select the mode, and the unit operates accordingly. A further two-bit status output indicates whether the unit is (a) working, (b) ready, (c) driving the LSB of the output onto DIO, or (d) driving the MSB of the output onto DIO.
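
The GO detection described above amounts to two mask operations on the DIO byte. The C sketch below illustrates this; the function names are hypothetical, and since the assignment does not say which 4-bit value selects which mode, the code returns only the raw mode field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when the upper nibble of the byte on DIO is 1111, i.e. the GO pattern. */
bool dio_is_go(uint8_t dio)
{
    return (dio & 0xF0u) == 0xF0u;
}

/* Lower nibble of a GO byte: the raw 4-bit mode field.
 * The mapping from this value to a mode is left to the designer. */
uint8_t dio_mode_field(uint8_t dio)
{
    assert(dio_is_go(dio));
    return (uint8_t)(dio & 0x0Fu);
}
```

For example, the byte `0xF3` is a GO byte with mode field 3, while `0x7F` is not a GO byte.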



Simulate your design with test input values and report its resource usage. For this part, you do not need to download the design to the FPGA. Write a test bench to validate your design in simulation.
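
One way to obtain expected values for the test bench is a small software reference model. The C sketch below is such a model under stated assumptions: elements are treated as signed 8-bit values, and S wraps around modulo 2^16 like a 16-bit hardware register. Neither point is fixed by the assignment text, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VDPU_LEN 16

/* Returns the 16-bit pattern of S after accumulating all 16 products.
 * Assumptions: signed 8-bit elements, accumulator wraps modulo 2^16. */
uint16_t vdpu_reference(uint16_t s_in,
                        const int8_t x[VDPU_LEN],
                        const int8_t y[VDPU_LEN])
{
    assert(x != NULL && y != NULL);
    uint16_t acc = s_in;
    for (size_t i = 0; i < VDPU_LEN; i++) {
        int product = (int)x[i] * (int)y[i];        /* fits easily in an int  */
        acc = (uint16_t)(acc + (uint16_t)product);  /* keep the low 16 bits   */
    }
    return acc;
}
```

The returned value can be split into its low and high bytes and compared against what the simulated design drives onto DIO during the display cycles.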

Hint: (a) The FPGA has Multiply and Accumulate (MAC) units; you should use a MAC unit if possible. (b) For simplicity, we assume that a complete VDPU operation takes exactly 20 cycles in every mode, so a single FSM/ASM can handle all the modes. (c) Use registers/block RAM as much as possible in your design. (d) The FPGA has on the order of hundreds of MAC units/DSP slices, but you should minimize the use of DSP slices; one DSP/MAC unit/multiply-adder is sufficient for all the modes. Later in the course, we may want to instantiate 40 to 50 VDPUs in one FPGA to test the performance.

## Part II (VHDL and FPGA): Interfacing VDPU with PC serial port and demonstration on FPGA

- (a) Send 8-bit data from the PC over the USB port and display it on the LEDs of the FPGA; similarly, send the 8-bit value of SW0-SW7 from the FPGA to the PC and display it there.
- (b) Interface the VDPU with the UART and demonstrate the working of the VDPU using PC communication.
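
For item (b), the PC side has to send the bytes in the order the VDPU expects. The C sketch below builds such a byte sequence for PartialReadyMode, assuming the order GO byte, S low byte, S high byte, then X[0] to X[15], which follows the cycle listing in Part I. The function name, the frame layout, and the mode value passed in are all assumptions made for illustration; writing the buffer to the serial port is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VDPU_LEN       16
#define VDPU_FRAME_LEN (1 + 2 + VDPU_LEN)  /* GO + S low/high + 16 elements */

/* Fill `out` with the bytes to send and return how many were written.
 * `mode` is the raw 4-bit mode field placed in the low nibble of the GO byte. */
size_t vdpu_build_frame(uint8_t mode, uint16_t s_in,
                        const uint8_t x[VDPU_LEN],
                        uint8_t out[VDPU_FRAME_LEN])
{
    assert(mode <= 0x0F);
    size_t n = 0;
    out[n++] = (uint8_t)(0xF0u | mode);   /* GO pattern 1111 + mode      */
    out[n++] = (uint8_t)(s_in & 0xFFu);   /* cycle 0: low 8 bits of S    */
    out[n++] = (uint8_t)(s_in >> 8);      /* cycle 1: high 8 bits of S   */
    for (size_t i = 0; i < VDPU_LEN; i++)
        out[n++] = x[i];                  /* following cycles: X[i]      */
    return n;
}
```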

For Part II (a) and Part II (b), you need not simulate your design, but you must download the bit file to the FPGA and show a working demo.

Hint: <https://digilent.com/reference/programmable-logic/basys-3/demos/gpio>

Search Google for "UART\_TX\_CTRL.vhd" and "UART\_RX\_CTRL.vhd" to get HDL source code for the UART transmitter and receiver. You need to add to or modify the user constraints (xdc) file accordingly.

You can use a clock that is 8 times faster to receive/send data over USB and a slower clock for the VDPU computation. In other words, one VDPU cycle may be 8 times longer than one USB/UART cycle.
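
A common way to obtain the slower VDPU rate from the faster clock is a divide-by-8 counter that raises an enable pulse once every 8 fast cycles. The C model below sketches that behaviour in software; the type and function names are hypothetical, and in the actual design this would be a 3-bit counter in HDL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* State of the divider: a counter that runs 0..7 on the fast clock. */
typedef struct {
    uint8_t count;
} clk_div8_t;

/* Advance by one fast-clock cycle.
 * Returns true on every 8th call, marking one VDPU cycle. */
bool clk_div8_tick(clk_div8_t *d)
{
    assert(d != NULL);
    bool enable = (d->count == 7);
    d->count = (uint8_t)((d->count + 1) & 7);  /* wrap after 7 */
    return enable;
}
```

Starting from a zeroed counter, 64 fast-clock ticks produce 8 enable pulses, one per VDPU cycle.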

## **Evaluation Procedure**

- All members of the group need to be present at the time of the demonstration. Absent members will be awarded 0 marks for the assignment. Please show your ID card at the time of the demonstration (as it is difficult to remember the faces of all 128 students of your batch).
- Grading will be based on (a) correctness, (b) quality of design, (c) wire optimization, (d) optimum number of chips used, (e) cleanliness of design (wires and chips should be organized to look good), (f) proper commenting/naming/labeling of the wires, and (g) questionnaire and explanation. For HDL code, quality will be judged on minimum FPGA resource utilization (synthesis report: optimized number of LUTs and registers, minimum clock period), coding style (proper commenting/naming/labeling of the wires), performance, comments, and questionnaire and explanation.

