CS523 : Advanced Computer Architecture

Instructor: Dr. A. Sahu

The focus of this course is on the concepts behind designing industry-standard (Intel/AMD/NVIDIA/Google/IBM/Cisco) high-performance computer systems.
Pre-Requisites: CS222 (Computer Architecture and Organisation) http://jatinga.iitg.ernet.in/~asahu/cs222/
The allocated list of topics for lecture-note scribing is available at AllocatedListHere. The motivation behind lecture-note scribing is to create course material (a book) for the next ACA batch, as no good book is available that covers all the topics of the current ACA course. This book/course material will be openly accessible to all, and the name of the scribe will be mentioned in each chapter and in the book.

Lecture Slides:

  1. 24-29 Jul 2012 MON: Course Structure, Introduction, Motivation, Reference, Timing and Venue PDF Slides
  2. 31 Jul 2012 TUE: Advanced Architecture: Top down Approach (Classifications) PDF Slides
    [[Sima Book, Preface, Page 4]]
  3. 01 Aug 2012 WED: Advanced Architecture: Top down Approach (Pipeline and ILP) PDF Slides
    [[Sima Book, Chapter 4]]
  4. 06 Aug 2012 MON: ACA (Data Parallel and Function Parallel) and Understanding a Given Processor Architecture (8085) PDF Slides
    [[Sima Book, Introduction and Preface, 8085 Ramesh S Gaonkar Book]]
  5. Pthread Thread Affinity (Mapping User Thread to Hardware Thread)
    [[Understanding a Given Processor Architecture (8085), 8085 Ramesh S Gaonkar Book]]
  6. 07 Aug 2012 TUE: Designing a processor using components (Single cycle and with only 9 MIPS instructions) PDF Slides
    [[Hennessy Patterson, Basic Architecture Book, Chapter 5]]
  7. 08 Aug 2012 WED: Extending to Multi Cycle Design, Pipeline Design PDF Slides
    [[Hennessy Patterson, Basic Architecture Book, Chapter 5]]

  8. 13 Aug 2012 MON: Pipeline Design, Clock Skew and Stage Division PDF Slides
    [[Hennessy Patterson, Basic Architecture Book, Chapter 5, Flynn Book, Chapter 2]]
  9. 14 Aug 2012 TUE: Wave/Self Timed Pipeline [[Non Uniform Line]] PDF Slides
    [[Flynn Book, Chapter 2, Wave Pipeline Tutorial and Survey, IEEE Trans VLSI, 1998]]

  10. 21 Aug 2012 TUE: Pipeline Hazards: Data Forwarding and Pipeline Scheduling PDF Slides
    [[Sections 6.4/6.5 of 3rd Ed Hennessy Book (Ebook given), Chapter 6 Hwang Book]]
  11. 22 Aug 2012 WED: Branch Performance (Predication and Speedup) PDF Slides
    [[Flynn Book, Chapter 4.5, Sima Book Chapter 8]]
  12. 23 Aug 2012 THU/MON: Branch Performance (Prediction and Target Capture) PDF Slides
    [[Flynn Book, Chapter 4.5, Sima Book Chapter 8]]

  13. 24 Aug 2012 FRI (extra class in 4th slot): Design Space of Super-scalar Processor PDF Slides
    [[Sima Book,Chapter 7]]

  14. Sima Paper I on Superscalar, Sima Paper II on Superscalar and Sima Paper II on Superscalar
  15. 27 Aug 2012 MON: Super-scalar Design Space: Shelving, Renaming, Operand Fetch PDF Slides
    [[Sima Book,Chapter 7, Hennessy CA-QA Book, Chapter 3.4 and 3.5, Flynn Book Section 7.6.5]]
  16. 28 Aug 2012 TUE: Super-scalar: Instruction Scheduling (Scoreboard, Tomasulo's Approach) PDF Slides
    [[Sima Book,Chapter 7, Hennessy CA-QA Book, Chapter 3.4 and 3.5, Appendix A7 (scoreboard scheduling)]]
  17. Tomasulo's Approach Demo1, Demo2 and Demo3 (Java applet; requires a JRE to be installed)
  18. 29 Aug 2012 WED: Super-scalar: Speculation, Reordering, ILP Limitation PDF Slides
    [[Hennessy CA-QA Book, Chapter 3.10]]

  19. 03 SEP 2012 (MON) : ILP Limitation and Simultaneous Multithreading PDF Slides
    [[Hennessy CA-QA Book, Chapter 3.10-3.12]]
  20. 04 SEP 2012 (TUE) : SMT, Processor Case Study Intel P4-HT, Intel Atom, Comparison of Processors(AMD Athlon, PowerPC 5, Intel Itanium and Intel P4HTExtEd), Performance/Energy Efficiency of Intel Core-i7 and Intel Atom PDF Slides
    [[Hennessy CA-QA Book, Chapter 3.10-3.12]]

  21. 05 Sep 2012 WED: Memory Hierarchy, Memory Wall, and Cache: Set/Index, Associativity, Line size/offset PDF Slides
    [[Flynn Book, Chapter 5 and 6, Hennessy Patterson Chapter 2, 5th Ed]]
  22. 10 Sep 2012 Monday: Program Cache Behavior, Miss Classification, AMAT and Local/Global Miss PDF Slides
    [[Flynn Book, Chapter 5 and 6, Hennessy Patterson Chapter 2, 5th Ed]]

  23. 11 Sep 2012 Tuesday: Cache Policies (access:seq/cun/fwd, load:blk/warp/forward,replacement:lru/lfu/mfu/fifo,fetch:demand/pre/swpre, write:wt/wb), Performance Optimization PDF Slides
    [[ Hennessy CA-QA Book 4th Ed Chapter 5, Section 2.2 of Cragon Book]]
  24. 12 Sep 2012, Wednesday: Cache Performance Optimization PDF Slides
    [[Hennessy CA-QA Book 4th Ed Chapter 5 ]]

  25. MID SEMESTER EXAM [[MidSemQuestion]], [[Solution will be uploaded soon]]

    Part II of ACA: Multicore Computing
  26. 24-Sep-2012 (MON): Why Multicore: Power/Cost Efficiency, Speedup, Issues in Multiprocessing (Sharing, Mapping/Scheduling, Parallelising) PDF Slides
    [[Flynn Book, Hennessy Book Introductions]]
  27. 25-Sep-2012 : Multiprocessing, Amdahl's Law, Gustafson's Law, Equal Work Hypothesis, Efficiency of Parallel Processing, Parallelizing Programs, Shared Memory vs Distributed Memory, Shared Memory Architecture PDF Slides
    [[Flynn Book 8.1, 8.5 and 8.6, Intel TBB Book (Reinders) Chapter 2 and Parhami Book]]

  28. 26-Sep-2012: BUS Protocols: Comparison with ALOHA and CSMA, Queueing/Prob Analysis of Multiprocessor BUS PDF Slides
    [[Flynn Book, Chapter 6.8]]
  29. 01-Oct-2012: Static Interconnection Network (array, ring, tree, mesh, hypercube) embedding, 2.5D/3D Mesh, Denser/Sparser Mesh/Torus PDF Slides
    [[Kai Hwang Book Chapter:1 and 2, Parhami Book Chapter 12]]
  30. 03-Oct-2012 : Dynamic Network/Switching (Bayes, Banes), Routing: Static/Dynamic, Store-and-Forward/Wormhole/Cut-Through, Mesh NOC: Routing, Router Architecture, Routing Algorithms PDF Slides
    [[Flynn Book, Interconnection Network Book, Parhami Book Chapter 12]]
  31. 04-Oct-2012 THURSDAY: Mesh NOC: Routing Algorithms (XY, West First, North Last, Negative First and Odd-Even) PDF Slides
    [[Flynn Book, Interconnection Network Book ]]

  32. 08-OCT-2012 MONDAY : Cache Coherence, Lock, Barrier and Memory Consistency Part I PDF Slides
    [[Hennessy CA-QA Book, 4th Ed, Chapter 4.2 (Sec 5.5 of 5th Ed), Culler Book, Sec 5.5, Memory Consistency Coherence Book]]
  33. 09-OCT-2012 TUESDAY : Cache Coherence, Lock, Barrier and Memory Consistency Part II PDF Slides
    [[Ref Prev Lects]]
  34. 10-OCT-2012 WEDNESDAY : Cache Coherence, Lock, Barrier and Memory Consistency Part III PDF Slides
    [[Ref Prev Lects]]

  35. 16/17 Oct 2012 : Data Parallel Architecture (Vector Architecture and SIMD ) part I PDF Slides
    [[ Hennessy CA-QA Book, 5th Ed, Chapter 4 ]]
  36. 26 Oct 2012 (FRI with Tuesday Time Table): Data Parallel Architecture (Vector Architecture and SIMD ) part II PDF Slides
    [[ Hennessy CA-QA Book, 5th Ed, Chapter 4 ]]
  37. 29 Oct 2012: GPU Architecture, CUDA Programming PDF Slides
    [[Hennessy CA-QA Book, 5th Ed, Chapter 4]]
  38. 30 Oct 2012: CUDA Programming, Compiler Transformation for Parallelism PDF Slides
    [[Hennessy CA-QA Book, 5th Ed, Chapter 4, David Kirk Book, Compiler Book (Sethi, Aho, Ullman and Lam) Page 801, 810, 819 and 848]]

  39. 31 Oct 2012: Data Placement Model in Multicore: bipartite matching formulation, ILP formulation, MST formulation for Dual Port Memory, Cache Data Placement in Multicore : Multicore Caching PDF Slide
    [[Extra work:Read algorithms for stable matching, weighted bipartite matching by augmented path iteration, max Span Tree and Integer Linear Program]],[[ 2008 R-NUCA paper ]]

  40. 02 Nov 2012 (Makeup Class): Cache Partitioning and BW Partitioning PDF Slides
    [[Paper: Understanding How Off-Chip Memory Bandwidth Partitioning in Chip Multiprocessors Affects System Performance, Solihin HPCA 2010]], [[Paper: Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches, Qureshi and Patt MICRO 2006]]
  41. 05 Nov 2012 : Cache Prefetching: Helper based, Hardware, Pinning and Locking, Off-chip Bandwidth Scheduling of Multicore in Presence of Prefetching PDF Slides
    (a) [[Paper: Helper Thread Prefetching for Loosely-Coupled Multiprocessor Systems, IPDPS 06]], (b) [[Paper: Adaptive Prefetching for Shared Cache Based Chip Multiprocessors, Kandemir DATE 09]], and (c) [[Paper: Prefetch-Aware Shared-Resource Management for Multi-Core Systems, Ebrahimi ISCA 11]]

  42. 06 Nov 2012: Multiprocessor Scheduling (Theory), Multiprocessor (Approximation), List Scheduling PDF Slides
    [[Peter Brucker Book]], [[Paper: A Survey of Hard Real-Time Scheduling for Multiprocessor Systems]]
  43. 07-Nov-2012: Real Time Scheduling (Schedulability test, RMS and EDF), Distributed Scheduling, Cilk, Work Stealing, 2D/3D Mesh multicore scheduling PDF Slides
    [[Web Material: A Literature Study on Scheduling in Distributed Systems]], [[Paper: A Taxonomy of Scheduling in General-Purpose Distributed Computing System]], [[Book: Advanced OS by Singhal]], [[Chapter 17, Algorithm CLR Book 3rd Ed]], [[Loh Paper: 3D-Stacked Memory Architectures for Multi-Core Processors]], [[Kandemir Paper: Design and Management of 3D Chip Multiprocessors Using Network-in-Memory]] and [[Chou Paper: Run-Time Task Allocation Considering User Behavior in Embedded Multiprocessor Networks-on-Chip]]

  44. 12 Nov 2012 : Tiled manycore architecture, Re-configurable Mesh, Architecture Warehouse Scale Computer PDF Slides
    [[Anant Agarwal Paper: On-Chip Interconnection Architecture of the Tile Processor]], [[Vaidyanathan Book and Link IEEE.TC.MillerPaper93]], and [[Hennessy CA-QA Book, 5th Ed, Chapter 6]]

  45. Self Reading: Pthread, Cilk, OpenMP and CUDA. Research tools: Simulators (Multi2Sim, SESC, Simics/GEMS) and Benchmarks (SPEC OMP, PARSEC, SPLASH, etc.)

  46. End Semester Timing: Nov 20, 1PM-4PM, Room:3101
    Question uploaded EndSemQuestion

Class timing, Venue and Rules:

Books:


  1. Hennessy, J.L., and Patterson, D.A., "Computer Architecture: A Quantitative Approach", Morgan Kaufmann Publishers Inc., 5th Edition, 2011
  2. Dezso Sima, Peter Kacsuk, Terence Fountain, "Advanced Computer Architectures: A Design Space Approach", Pearson Education India, 1997
  3. Michael J Flynn, "Computer Architecture: Pipelined and Parallel Processor Design", Narosa Publishing India, 2003
  1. David Culler, J.P. Singh and Anoop Gupta, "Parallel Computer Architecture: A Hardware/Software Approach", Morgan Kaufmann, first edition, 1998.
  2. Harvey G Cragon, "Memory Systems and Pipelined Processors", Narosa Book Distributors, India, 1998
  3. Patterson, D.A., and Hennessy, J.L., "Computer Organization and Design: The Hardware/Software Interface", Morgan Kaufmann Publishers Inc., 4th Edition, 2005
  4. Kai Hwang, "Advanced Computer Architecture: Parallelism, Scalability, Programmability", McGraw-Hill, first edition, 1992.
  5. Ramachandran Vaidyanathan and J L Trahan, "Dynamic Reconfiguration: Architectures and Algorithms", Kluwer Academic Publisher, New York, 2003
  6. David Kirk and Wen-mei Hwu, "Programming Massively Parallel Processors: A Hands-on Approach", Morgan Kaufmann Publishers, 2010 EBookCopy From NVidia Website
  7. P Pacheco, "An Introduction to Parallel Programming", Morgan Kaufmann Publishers, 2011
  8. J Reinders, "Intel Threading Building Blocks", O'Reilly, SPD Books India, 2007
  9. Behrooz Parhami, "Introduction to Parallel Processing: Algorithms and Architectures", Plenum Press, New York, 1999 ECopyPDF From External Weblink
  10. Duato J, Yalamanchili S, Ni L, "Interconnection Networks", Morgan Kaufmann, 2002
  11. Peter Brucker, "Scheduling Algorithms", Springer, 2007 OnlineCopy