Reinforcement Learning and Dynamic Programming

Markov chains. Markov decision processes. Dynamic programming: policy iteration, value iteration. Model-free reinforcement learning: Monte Carlo, Temporal Difference, Q-learning. Policy gradient methods. Model-based reinforcement learning: classical multi-armed bandits, stochastic multi-armed bandits, adversarial multi-armed bandits.

Credit: 3

ECTS: 5

Department: Electrical and Electronics Engineering

Faculty: Faculty of Engineering

Instructors: 1 this term · 1 past

This term (2025-2026 Spring) · 1 section

Muhammed Ömer Sayın

Previously taught by (1 person)

Milad Malekipirbazari


Materials: 0 files

No materials have been uploaded for this course yet.

You can add the first file: notes, past finals, solutions, cheat sheets, whatever you have. A Drive link, PDF, ZIP, or photo all work.

For now, send it by email with EEE 548 in the subject line, and it will be edited and published.

Syllabus details (STARS syllabus)

📚 Recommended resources

Required: Mathematical Foundations of Reinforcement Learning, Shiyu Zhao · 2025 · Springer
Recommended: Reinforcement Learning: An Introduction, Sutton and Barto

⚖️ Assessment

20% · Quiz: Quizzes (×2)
30% · Midterm (essay/written): Midterm (×1)
35% · Final (essay/written): Final (×1)
15% · Project: Individual Project (×1)

⚠️ Requirements to avoid an FZ grade

None

🤖 GenAI policy

You should try to solve the assignments given here by yourself. Discussion of the assignments with other students and online tools (e.g., ChatGPT) are allowed and encouraged. However, the final submitted work must be your own. You should not submit anything that you do not understand. We may invite you to explain your solutions at a face-to-face (or Zoom) meeting with the instructor and the grader; at the end of this interview, you may get no credit for the assignment if it is deemed that you ha

📅 Weekly schedule

Introduction and Probability Review
Markov Chains
Markov Decision Processes
Bellman Equation
Bellman Optimality Equation
Value Iteration & Policy Iteration
Monte Carlo Methods
Multi-armed Bandits
Stochastic Approximation
Temporal Difference Methods
Value Function Methods & Function Approximation
Policy Gradient Methods
Actor-critic Methods
Project Demos

ECTS - Workload Table

Activities                      Number  Hours  Workload
Individual or group work        14      4      56
Preparation for Quiz            2       8      16
Preparation for Final exam      1       15     15
Final exam                      1       3      3
Quiz                            2       3      6
Preparation for Midterm exam    1       12     12
Midterm exam                    1       3      3
Course hours                    14      3      42

Total Workload: 153
Total Workload / 30: 153 / 30 = 5.1
ECTS Credits of the Course: 5

Type of Course: Lecture, Project
Course Material: LMS (Moodle, etc), Lecture Notes
Teaching Methods: Lecturing, Assignment, Presentations