Markov chains. Markov decision processes. Dynamic programming: policy iteration, value iteration. Model-free reinforcement learning: Monte Carlo, Temporal Difference, Q-learning. Policy gradient methods. Model-based reinforcement learning: classical multi-armed bandits, stochastic multi-armed bandits, adversarial multi-armed bandits.
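As an illustration of the dynamic-programming item, here is a minimal value-iteration sketch in Python. It is not taken from the course materials. The two-state MDP, the encoding `P[s][a] = [(probability, next_state, reward), ...]`, the function name `value_iteration`, and the parameters `gamma` and `tol` are all illustrative assumptions. Values are updated in place (Gauss-Seidel style), which is one common variant of the algorithm.

```python
from typing import Dict, List, Tuple

# Hypothetical MDP encoding: P[s][a] is a list of (probability, next_state, reward) triples.
Transition = Tuple[float, int, float]
MDP = Dict[int, Dict[int, List[Transition]]]


def q_value(P: MDP, V: Dict[int, float], s: int, a: int, gamma: float) -> float:
    """Expected one-step reward plus discounted value of the next state."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def value_iteration(P: MDP, gamma: float = 0.9, tol: float = 1e-8) -> Tuple[Dict[int, float], Dict[int, int]]:
    """Repeat Bellman optimality backups until values stabilize, then act greedily."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            # Bellman optimality backup: best action value under the current estimate
            best = max(q_value(P, V, s, a, gamma) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place update
        if delta < tol:
            break
    # Greedy policy with respect to the converged value function
    policy = {s: max(P[s], key=lambda a: q_value(P, V, s, a, gamma)) for s in P}
    return V, policy


if __name__ == "__main__":
    # State 0: action 0 stays (reward 0), action 1 moves to state 1 (reward 1).
    # State 1: action 0 stays (reward 2).
    toy = {0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]}, 1: {0: [(1.0, 1, 2.0)]}}
    V, pi = value_iteration(toy, gamma=0.9)
    print(V, pi)
```

In this toy MDP, staying in state 1 earns 2 per step, so V(1) = 2 / (1 − 0.9) = 20. From state 0, moving is worth 1 + 0.9 · 20 = 19, which beats staying (0.9 · 19 = 17.1), so the greedy policy chooses action 1 in state 0.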
You should try to solve the assignments on your own. Discussing the assignments with other students and using online tools (e.g., ChatGPT) is allowed and encouraged. However, the final submitted work must be your own. You should not submit anything that you do not understand. We may invite you to explain your solutions in a face-to-face (or Zoom) meeting with the instructor and the grader; at the end of this interview, you may receive no credit for the assignment if it is deemed that you ha…
Introduction and Probability Review
Markov Chains
Markov Decision Processes
Bellman Equation
Bellman Optimality Equation
Value Iteration & Policy Iteration
Monte Carlo Methods
Multi-armed Bandits
Stochastic Approximation
Temporal Difference Methods
Value Function Methods & Function Approximation
Policy Gradient Methods
Actor-critic Methods
Project Demos

ECTS - Workload Table:
Activities                    Number  Hours  Workload
Individual or group work          14      4        56
Preparation for Quiz               2      8        16
Preparation for Final exam         1     15        15
Final exam                         1      3         3
Quiz                               2      3         6
Preparation for Midterm exam       1     12        12
Midterm exam                       1      3         3
Course hours                      14      3        42
Total Workload: 153
Total Workload / 30: 153 / 30 = 5.1
ECTS Credits of the Course: 5
Type of Course: Lecture - Project
Course Material: LMS (Moodle, etc.) - Lecture Notes
Teaching Methods: Lecturing - Assignment - Presentations
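The workload arithmetic in the ECTS table can be checked with a few lines of Python. The row encoding and the name `total_workload` are assumptions for illustration; the divisor of 30 hours per ECTS credit is the one the table itself uses.

```python
from typing import List, Tuple

# Rows of the ECTS workload table: (activity, number of occurrences, hours each)
ROWS: List[Tuple[str, int, int]] = [
    ("Individual or group work", 14, 4),
    ("Preparation for Quiz", 2, 8),
    ("Preparation for Final exam", 1, 15),
    ("Final exam", 1, 3),
    ("Quiz", 2, 3),
    ("Preparation for Midterm exam", 1, 12),
    ("Midterm exam", 1, 3),
    ("Course hours", 14, 3),
]

HOURS_PER_ECTS = 30  # divisor used in the table ("Total Workload / 30")


def total_workload(rows: List[Tuple[str, int, int]]) -> int:
    """Sum number x hours over all activities."""
    return sum(number * hours for _, number, hours in rows)


if __name__ == "__main__":
    total = total_workload(ROWS)
    print(total, total / HOURS_PER_ECTS)  # prints: 153 5.1
```

The computed ratio of 5.1 matches the table, and the course is listed at 5 ECTS credits.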