Welcome to CS523 Advanced Computer Architecture Course Homepage

CS523 : Advanced Computer Architecture

Instructor: Dr. A. Sahu

Course Structure | Lecture Slides | Books | Class Timing, Venue and Rules

The focus of this course is on the concepts behind designing industry-standard (Intel/AMD/NVIDIA/Google/IBM/CISCO) high-performance computer systems.
Prerequisites: CS222 (Computer Architecture and Organisation) http://jatinga.iitg.ernet.in/~asahu/cs222/

The list of topics allocated for lecture-note scribing is available here. The motivation behind lecture-note scribing is to create course material (a book) for the next ACA batch, since no single good book covers all the topics of the current ACA course. This book/course material will be openly accessible to all, and the name of each scriber will be credited in the corresponding chapter and in the book.

Lecture Slides:

24-29 Jul 2012 MON: Course Structure, Introduction, Motivation, References, Timing and Venue PDF Slides
31 Jul 2012 TUE: Advanced Architecture: Top down Approach (Classifications) PDF Slides
[[Sima Book, Preface, Page 4]]
01 Aug 2012 WED: Advanced Architecture: Top down Approach (Pipeline and ILP) PDF Slides
[[Sima Book, Chapter 4]]
06 Aug 2012 MON: ACA (Data Parallel and Function Parallel) and Understanding a Given Processor Architecture (8085) PDF Slides
[[Sima Book, Introduction and Preface, 8085 Ramesh S Gaonkar Book]]

Pthread Thread Affinity (Mapping User Threads to Hardware Threads)

07 Aug 2012 TUE: Designing a processor using components (Single cycle and with only 9 MIPS instructions) PDF Slides
[[Hennessy Patterson, Basic Architecture Book, Chapter 5]]
08 Aug 2012 WED: Extending to Multi Cycle Design, Pipeline Design PDF Slides
[[Hennessy Patterson, Basic Architecture Book, Chapter 5]]

13 Aug 2012 MON: Pipeline Design, Clock Skew and Stage Division PDF Slides
[[Hennessy Patterson, Basic Architecture Book, Chapter 5, Flynn Book, Chapter 2]]
14 Aug 2012 TUE: Wave/Self-Timed Pipeline (Non-Uniform Line) PDF Slides
[[Flynn Book, Chapter 2, Wave Pipeline Tutorial and Survey, IEEE Trans VLSI, 1998]]

21 Aug 2012 TUE: Pipeline Hazards: Data Forwarding and Pipeline Scheduling PDF Slides
[[6.4/6.5 of 3rd Ed Hennessy Book (Ebook given), Chapter 6 Hwang Book]]
22 Aug 2012 WED: Branch Performance (Predication and Speedup) PDF Slides
[[Flynn Book, Chapter 4.5, Sima Book, Chapter 8]]
23 Aug 2012 THU/MON: Branch Performance (Prediction and Target Capture) PDF Slides
[[Flynn Book, Chapter 4.5, Sima Book, Chapter 8]]
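As a study aid for the branch prediction lectures, here is a minimal Python sketch of a single 2-bit saturating-counter predictor. The function name `simulate_2bit`, the 0-3 state encoding and the sample loop trace are illustrative assumptions, not taken from the slides.

```python
def simulate_2bit(outcomes, initial_state=0):
    """Simulate one 2-bit saturating counter over a branch outcome trace.

    States 0-1 predict not-taken, states 2-3 predict taken.
    `outcomes` is a sequence of booleans (True = taken).
    Returns the number of correct predictions.
    """
    state = initial_state
    correct = 0
    for taken in outcomes:
        prediction = state >= 2
        if prediction == taken:
            correct += 1
        # Saturating update: step toward the actual outcome, clamped to [0, 3].
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return correct

# Hypothetical trace: a loop branch taken 9 times then not taken, run twice.
loop = ([True] * 9 + [False]) * 2
print(simulate_2bit(loop), "of", len(loop))  # 16 of 20
```

Once warmed up, the counter mispredicts only once per loop exit; a 1-bit scheme would also mispredict on the first iteration after re-entering the loop.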

24 Aug 2012 FRI: Extra class in 4th slot: Design Space of Super-scalar Processors PDF Slides
[[Sima Book, Chapter 7]]

Sima Paper I on Superscalar

Sima Paper II on Superscalar

27 Aug 2012 MON: Super-scalar Design Space: Shelving, Renaming, Operand Fetch PDF Slides
[[Sima Book, Chapter 7, Hennessy CA-QA Book, Chapter 3.4 and 3.5, Flynn Book Section 7.6.5]]
28 Aug 2012 TUE: Super-scalar: Instruction Scheduling (Scoreboard, Tomasulo's Approach) PDF Slides
[[Sima Book, Chapter 7, Hennessy CA-QA Book, Chapter 3.4 and 3.5, Appendix A7 (scoreboard scheduling)]]

Tomasulo's Approach Demo1, Demo2 and Demo3 (Java applets; a JRE must be installed)

29 Aug 2012 WED: Super-scalar: Speculation, Reordering, ILP Limitation PDF Slides
[[Hennessy CA-QA Book, Chapter 3.10]]

03 SEP 2012 (MON) : ILP Limitation and Simultaneous Multithreading PDF Slides
[[Hennessy CA-QA Book, Chapter 3.10-3.12]]
04 SEP 2012 (TUE) : SMT, Processor Case Studies: Intel P4-HT, Intel Atom; Comparison of Processors (AMD Athlon, IBM Power5, Intel Itanium and Intel P4 HT Extreme Edition); Performance/Energy Efficiency of Intel Core i7 and Intel Atom PDF Slides
[[Hennessy CA-QA Book, Chapter 3.10-3.12]]

05 Sep 2012 WED: Memory Hierarchy, Memory Wall, and Cache: Set/Index, Associativity, Line size/offset PDF Slides
[[Flynn Book, Chapters 5 and 6, Hennessy Patterson Chapter 2, 5th Ed]]
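To make the set/index, associativity and line-size/offset split concrete, the following minimal Python sketch decomposes a byte address into tag, set index and line offset. It assumes power-of-two cache size, line size and associativity; the function name and the 32 KiB / 4-way / 64-byte example geometry are hypothetical choices for illustration.

```python
def split_address(addr, cache_bytes, line_bytes, ways):
    """Split a byte address into (tag, set index, line offset).

    Assumes cache_bytes, line_bytes and ways are powers of two, and the
    cache is ways-way set associative with line_bytes bytes per line.
    """
    num_sets = cache_bytes // (line_bytes * ways)
    offset_bits = line_bytes.bit_length() - 1   # log2(line size)
    index_bits = num_sets.bit_length() - 1      # log2(number of sets)
    offset = addr & (line_bytes - 1)
    index = (addr >> offset_bits) & (num_sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset

# 32 KiB, 4-way, 64-byte lines -> 128 sets: 6 offset bits, 7 index bits.
print(split_address(0x12345678, 32 * 1024, 64, 4))  # (37282, 89, 56)
```

Raising the associativity with the same capacity halves the number of sets per doubling, so one index bit moves into the tag.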
10 Sep 2012 Monday: Program Cache Behavior, Miss Classification, AMAT and Local/Global Miss PDF Slides
[[Flynn Book, Chapters 5 and 6, Hennessy Patterson Chapter 2, 5th Ed]]
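The AMAT and local/global miss-rate definitions can be checked numerically with this minimal Python sketch of the two-level formulation used in Hennessy and Patterson: AMAT = HitTime(L1) + MissRate(L1) x (HitTime(L2) + LocalMissRate(L2) x MissPenalty(L2)). The function names and the cycle counts and miss rates in the example are made up for illustration only.

```python
def amat_two_level(l1_hit, l1_miss_rate, l2_hit, l2_local_miss_rate, mem_penalty):
    """Average memory access time (cycles) for an L1/L2 hierarchy.

    l2_local_miss_rate = L2 misses / L2 accesses (i.e. per L1 miss).
    mem_penalty is the cost of going to main memory on an L2 miss.
    """
    l1_miss_penalty = l2_hit + l2_local_miss_rate * mem_penalty
    return l1_hit + l1_miss_rate * l1_miss_penalty

def l2_global_miss_rate(l1_miss_rate, l2_local_miss_rate):
    """L2 misses per CPU memory access (misses in both L1 and L2)."""
    return l1_miss_rate * l2_local_miss_rate

# Illustrative numbers: 1-cycle L1, 4% L1 misses, 10-cycle L2,
# 50% local L2 misses, 200-cycle memory.
print(round(amat_two_level(1, 0.04, 10, 0.5, 200), 3))  # 5.4
print(l2_global_miss_rate(0.04, 0.5))                   # 0.02
```

A 50% local L2 miss rate looks poor in isolation, but only 2% of all accesses (the global rate) actually reach memory.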

11 Sep 2012 Tuesday: Cache Policies (access: sequential/concurrent/forward; load: block/wrap-around/load-forward; replacement: LRU/LFU/MFU/FIFO; fetch: demand/prefetch/software prefetch; write: write-through/write-back), Performance Optimization PDF Slides
[[Hennessy CA-QA Book 4th Ed Chapter 5, Section 2.2 of Cragon Book]]
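Of the replacement policies listed, LRU is the easiest to trace by hand. Below is a minimal Python sketch of a single cache set that tracks tags only; the class name `LRUSet` is hypothetical, and a real cache would of course keep data and per-line state as well.

```python
from collections import OrderedDict

class LRUSet:
    """One set of a set-associative cache with LRU replacement (tags only)."""

    def __init__(self, ways):
        self.ways = ways
        # Insertion order tracks recency: first = least recently used.
        self.lines = OrderedDict()

    def access(self, tag):
        """Return True on a hit, False on a miss (filling/evicting as needed)."""
        if tag in self.lines:
            self.lines.move_to_end(tag)      # mark as most recently used
            return True
        if len(self.lines) == self.ways:
            self.lines.popitem(last=False)   # evict the least recently used tag
        self.lines[tag] = None
        return False

s = LRUSet(ways=2)
print([s.access(t) for t in ["A", "B", "A", "C", "B", "A"]])
# [False, False, True, False, False, False]
```

Swapping `move_to_end` for a no-op on hits turns the same structure into FIFO replacement, which is a quick way to compare the two policies on a trace.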
12 Sep 2012, Wednesday: Cache Performance Optimization PDF Slides
[[Hennessy CA-QA Book 4th Ed Chapter 5 ]]

[MidSemQuestion]

[Solution will be Uploaded Soon]

Part II of ACA: Multicore Computing

24-Sep-2012 (MON): Why Multicore: Power/Cost Efficiency, Speedup, Issues in Multiprocessing (Sharing, Mapping/Scheduling, Parallelising) PDF Slides
[[Flynn Book, Hennessy Book Introductions]]
25-Sep-2012 : Multiprocessing, Amdahl's Law, Gustafson's Law, Equal Work Hypothesis, Efficiency of Parallel Processing, Parallelizing Programs, Shared Memory vs Distributed Memory, Shared Memory Architecture PDF Slides
[[Flynn Book 8.1, 8.5 and 8.6, Intel TBB Book (Reinders) Chapter 2 and Parhami Book]]
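Amdahl's and Gustafson's laws differ in what is held fixed: Amdahl fixes the problem size, while Gustafson lets the parallel part of the work grow with the processor count p. A minimal Python sketch, where `serial_fraction` is our own parameter name (for Gustafson it is measured on the parallel run):

```python
def amdahl_speedup(serial_fraction, p):
    """Fixed-size speedup on p processors (Amdahl's law)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)

def gustafson_speedup(serial_fraction, p):
    """Scaled speedup on p processors (Gustafson's law).

    serial_fraction is the serial share of the *parallel* execution time;
    the parallel part of the workload grows in proportion to p.
    """
    return serial_fraction + (1.0 - serial_fraction) * p

def efficiency(speedup, p):
    """Parallel efficiency: achieved speedup per processor."""
    return speedup / p

for p in (4, 16, 64):
    s_a = amdahl_speedup(0.1, p)
    s_g = gustafson_speedup(0.1, p)
    print(f"p={p:3d}  Amdahl={s_a:6.2f} (E={efficiency(s_a, p):.2f})  "
          f"Gustafson={s_g:6.2f} (E={efficiency(s_g, p):.2f})")
```

With a 10% serial fraction, Amdahl's speedup can never exceed 10 however large p grows, while the scaled speedup keeps growing almost linearly.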

26-Sep-2012: Bus Protocols: Comparison with ALOHA and CSMA, Queueing/Probabilistic Analysis of Multiprocessor Bus PDF Slides
[[Flynn Book, Chapter 6.8]]
01-Oct-2012: Static Interconnection Networks (array, ring, tree, mesh, hypercube), Embedding, 2.5D/3D Mesh, Denser/Sparser Mesh/Torus PDF Slides
[[Kai Hwang Book, Chapters 1 and 2, Parhami Book, Chapter 12]]
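Hypercube properties follow directly from the binary node labels: neighbours differ in exactly one bit, and the minimum hop count between two nodes is the Hamming distance of their labels. A minimal Python sketch (function names are our own):

```python
def hypercube_neighbors(node, dim):
    """Nodes one hop away in a dim-dimensional binary hypercube (flip one label bit)."""
    return [node ^ (1 << k) for k in range(dim)]

def hypercube_distance(a, b):
    """Minimum hop count between two nodes: the Hamming distance of their labels."""
    return bin(a ^ b).count("1")

print(hypercube_neighbors(0b101, 3))     # [4, 7, 1]
print(hypercube_distance(0b000, 0b111))  # 3, the diameter of a 3-cube
```

Correcting the differing bits one at a time, in a fixed order, is also the basis of simple deterministic (e-cube) routing on a hypercube.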
03-Oct-2012 : Dynamic Networks/Switching (Banyan, Benes), Routing: Static/Dynamic, Store-and-Forward/Wormhole/Cut-Through, Mesh NoC: Routing, Router Architecture, Routing Algorithms PDF Slides
[[Flynn Book, Interconnection Network Book, Parhami Book, Chapter 12]]
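The main difference between store-and-forward and cut-through switching shows up in zero-load latency. This minimal Python sketch uses the common simplified model with H hops, per-router delay t_r and serialization time L/b; it ignores wire delay and contention, and the parameter names and example values are our own assumptions.

```python
def store_and_forward_latency(hops, packet_bits, link_bits_per_cycle, router_delay):
    """Zero-load latency (cycles) when every router buffers the whole packet."""
    serialization = packet_bits / link_bits_per_cycle
    return hops * (router_delay + serialization)

def cut_through_latency(hops, packet_bits, link_bits_per_cycle, router_delay):
    """Zero-load latency (cycles) when the head is forwarded as soon as it is routed.

    Wormhole switching has the same zero-load latency in this model; it
    differs in buffering flits rather than whole packets when blocked.
    """
    serialization = packet_bits / link_bits_per_cycle
    return hops * router_delay + serialization

# 4 hops, 512-bit packet, 32-bit links (16 cycles to serialize), 2-cycle routers.
print(store_and_forward_latency(4, 512, 32, 2), cut_through_latency(4, 512, 32, 2))
# 72.0 24.0
```

Store-and-forward pays the serialization time on every hop, while cut-through pays it only once, so the gap grows with both packet length and hop count.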
04-Oct-2012 THURSDAY: Mesh NoC: Routing Algorithms (XY, West-First, North-Last, Negative-First and Odd-Even) PDF Slides
[[Flynn Book, Interconnection Network Book ]]
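XY (dimension-order) routing is the simplest of the mesh algorithms listed: a packet first travels fully along X, then along Y. A minimal Python sketch that returns the routers visited, with (x, y) coordinates as our own convention:

```python
def xy_route(src, dst):
    """Dimension-order (XY) routing on a 2D mesh.

    Moves along X until the column matches, then along Y.
    Returns the list of (x, y) routers visited, including both endpoints.
    """
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path

print(xy_route((0, 0), (2, 1)))  # [(0, 0), (1, 0), (2, 0), (2, 1)]
```

Because a packet never turns from the Y dimension back to X, no cyclic channel dependency can form on a mesh, which makes XY routing deadlock-free. Turn-model algorithms such as West-First relax this restriction to gain some adaptivity.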

08-OCT-2012 MONDAY : Cache Coherence, Lock, Barrier and Memory Consistency Part I PDF Slides
[[Hennessy CA-QA Book, 4th Ed, Chapter 4.2 (Sec 5.5 of 5th Ed), Culler Book, Sec 5.5, Memory Consistency and Cache Coherence Book]]
09-OCT-2012 TUESDAY : Cache Coherence, Lock, Barrier and Memory Consistency Part II PDF Slides
[[Refer to Previous Lectures]]
10-OCT-2012 WEDNESDAY : Cache Coherence, Lock, Barrier and Memory Consistency Part III PDF Slides
[[Refer to Previous Lectures]]

16/17 Oct 2012 : Data Parallel Architecture (Vector Architecture and SIMD) Part I PDF Slides
[[ Hennessy CA-QA Book, 5th Ed, Chapter 4 ]]
26 Oct 2012 (FRI with Tuesday Time Table): Data Parallel Architecture (Vector Architecture and SIMD) Part II PDF Slides
[[ Hennessy CA-QA Book, 5th Ed, Chapter 4 ]]
29 Oct 2012: GPU Architecture, CUDA Programming PDF Slides
[[ Hennessy CA-QA Book, 5th Ed, Chapter 4 ]]
30 Oct 2012: CUDA Programming, Compiler Transformations for Parallelism PDF Slides
[[Hennessy CA-QA Book, 5th Ed, Chapter 4, David Kirk Book, Compiler Book (Sethi, Aho, Ullman and Lam), Pages 801, 810, 819 and 848]]

31 Oct 2012: Data Placement Model in Multicore: Bipartite Matching Formulation, ILP Formulation, MST Formulation for Dual Port Memory, Cache Data Placement in Multicore: Multicore Caching PDF Slides
[[Extra work: Read algorithms for stable matching, weighted bipartite matching by augmenting-path iteration, maximum spanning tree and integer linear programming]], [[2008 R-NUCA paper]]

02 Nov 2012 (Makeup Class): Cache Partitioning and Bandwidth Partitioning PDF Slides
[[Paper: Understanding How Off-Chip Memory Bandwidth Partitioning in Chip Multiprocessors Affects System Performance, Solihin, HPCA 2010]], [[Paper: Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches, Qureshi and Patt, MICRO 2006]]
05 Nov 2012 : Cache Prefetching: Helper-Based, Hardware, Pinning and Locking, Off-Chip Bandwidth Scheduling of Multicores in the Presence of Prefetching PDF Slides
(a) [[Paper: Helper Thread Prefetching for Loosely-Coupled Multiprocessor Systems, IPDPS 2006]], (b) [[Paper: Adaptive Prefetching for Shared Cache Based Chip Multiprocessors, Kandemir, DATE 2009]], and (c) [[Paper: Prefetch-Aware Shared-Resource Management for Multi-Core Systems, Ebrahimi, ISCA 2011]]

06 Nov 2012: Multiprocessor Scheduling (Theory), Multiprocessor Scheduling (Approximation), List Scheduling PDF Slides
[[Peter Brucker Book]], [[Paper: A Survey of Hard Real-Time Scheduling for Multiprocessor Systems]]
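List scheduling can be sketched in a few lines: take jobs in list order and start each one on the machine that becomes free earliest. This minimal Python version handles independent jobs on identical machines only (no precedence constraints); the function name and the job durations are hypothetical.

```python
import heapq

def list_schedule(durations, machines):
    """Graham's list scheduling of independent jobs on identical machines.

    Jobs are taken in the given order; each starts on the machine that
    becomes free earliest. Returns (makespan, assignment), where
    assignment holds (job index, machine, start time) tuples.
    """
    free_at = [(0, m) for m in range(machines)]  # (time machine is free, machine id)
    heapq.heapify(free_at)
    assignment = []
    for job, d in enumerate(durations):
        start, m = heapq.heappop(free_at)
        assignment.append((job, m, start))
        heapq.heappush(free_at, (start + d, m))
    makespan = max(t for t, _ in free_at)
    return makespan, assignment

jobs = [2, 3, 4, 6, 2, 2]
print(list_schedule(jobs, 3)[0])                        # 8 (list order)
print(list_schedule(sorted(jobs, reverse=True), 3)[0])  # 7 (LPT order)
```

Graham showed that any list order gives a makespan within a factor of 2 - 1/m of the optimum on m machines; ordering by longest processing time first (LPT) tightens the bound to 4/3 - 1/(3m).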
07-Nov-2012: Real-Time Scheduling (Schedulability Test, RMS and EDF), Distributed Scheduling, Cilk, Work Stealing, 2D/3D Mesh Multicore Scheduling PDF Slides
[[Web Material: A Literature Study on Scheduling in Distributed Systems]], [[Paper: A Taxonomy of Scheduling in General-Purpose Distributed Computing Systems]], [[Book: Advanced OS by Singhal]], [[CLRS Algorithms Book, 3rd Ed, Chapter 17]], [[Loh Paper: 3D-Stacked Memory Architectures for Multi-Core Processors]], [[Kandemir Paper: Design and Management of 3D Chip Multiprocessors Using Network-in-Memory]] and [[Chou Paper: Run-Time Task Allocation Considering User Behavior in Embedded Multiprocessor Networks-on-Chip]]
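The utilization-based schedulability tests for RMS and EDF are short enough to check by hand or in code. This minimal Python sketch assumes independent periodic tasks on one processor with deadlines equal to periods, each given as a hypothetical (C, T) = (worst-case execution time, period) pair.

```python
def utilization(tasks):
    """Total utilization of periodic tasks given as (C, T) pairs."""
    return sum(c / t for c, t in tasks)

def rms_ll_bound_ok(tasks):
    """Liu and Layland test for rate-monotonic scheduling.

    Sufficient but not necessary: failing it is inconclusive.
    """
    n = len(tasks)
    return utilization(tasks) <= n * (2 ** (1 / n) - 1)

def edf_ok(tasks):
    """Exact test for preemptive EDF on one processor with deadline = period."""
    return utilization(tasks) <= 1.0

tasks = [(1, 4), (2, 6), (3, 12)]   # U = 0.25 + 0.333 + 0.25 = 0.833
print(round(utilization(tasks), 3), rms_ll_bound_ok(tasks), edf_ok(tasks))
```

Here the utilization exceeds the three-task Liu and Layland bound (about 0.780), so the RMS test cannot decide and an exact response-time analysis would be needed, whereas EDF is guaranteed to meet all deadlines.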

12 Nov 2012 : Tiled Manycore Architecture, Reconfigurable Mesh, Warehouse-Scale Computer Architecture PDF Slides
[[Anant Agarwal Paper: On-Chip Interconnection Architecture of the Tile Processor]], [[Vaidyanathan and Trahan Book and IEEE TC Miller Paper (1993)]] and [[Hennessy CA-QA Book, 5th Ed, Chapter 6]]

Self-Reading: Pthreads, Cilk, OpenMP and CUDA
- Cilk: to be included in earlier lecture slides (Cilk PLDI 98 paper)
- Cilk PPTs can be downloaded from http://supertech.csail.mit.edu/cilk/
- POSIX Multithreaded Programming
- OpenMPBook.pdf
- CUDA: included in the 30 Oct lecture slides (CudaBook)
Research tools: Simulators (Multi2Sim, SESC, Simics/GEMS) and Benchmarks (SPEC OMP, PARSEC, SPLASH, etc.)

EndSemQuestion

Class timing, Venue and Rules:

Venue: 2001
Timing : Monday (4PM-5PM), Tuesday (3PM-4PM) and Wednesday (2PM-3PM)
Rules:
- 75% attendance is mandatory
- Evaluation: 30% lecture-note scribing (in LaTeX + Xfig), 30% mid-semester exam, 40% end-semester exam

Books:

Text:

Hennessy, J.L., and Patterson, D.A., "Computer Architecture: A Quantitative Approach", Morgan Kaufmann Publishers, Inc., 5th Edition, 2011
Dezso Sima, Peter Kacsuk, Terence Fountain, "Advanced Computer Architectures: A Design Space Approach", Pearson Education India, 1997
Michael J Flynn, "Computer Architecture: Pipelined and Parallel Processor Design", Narosa Publishing India, 2003

References:

David Culler, J.P. Singh and Anoop Gupta, "Parallel Computer Architecture: A Hardware/Software Approach", Morgan Kaufmann, first edition, 1998.
Harvey G Cragon, "Memory Systems and Pipelined Processors", Narosa Book Distributors, India, 1998
Patterson, D.A., and Hennessy, J.L., "Computer Organization and Design: The Hardware/Software Interface", Morgan Kaufmann Publishers, Inc., 4th Edition, 2005
Kai Hwang, "Advanced Computer Architecture: Parallelism, Scalability, Programmability", McGraw-Hill, first edition, 1992.
Ramachandran Vaidyanathan and J L Trahan, "Dynamic Reconfiguration: Architectures and Algorithms", Kluwer Academic Publishers, New York, 2003
David Kirk and Wen-mei Hwu, "Programming Massively Parallel Processors: A Hands-on Approach", Morgan Kaufmann Publishers, 2010 (e-book copy from the NVIDIA website)
P Pacheco, "An Introduction to Parallel Programming", Morgan Kaufmann Publishers, 2011
J Reinders, "Intel Threading Building Blocks", O'Reilly, SPD Books India, 2007
Behrooz Parhami, "Introduction to Parallel Processing: Algorithms and Architectures", Plenum Press, New York, 1999 (e-copy PDF from external web link)
J Duato, S Yalamanchili, L Ni, "Interconnection Networks", Morgan Kaufmann, 2002
Peter Brucker, "Scheduling Algorithms", Springer, 2007 (online copy)