I’m very interested in black-box control of LLMs: changing model behavior without access to parameters, gradients, or intermediate computations (e.g., hidden states). In parallel, I study mechanistic interpretability—how LLMs internally respond to text across domains—and how these signals can support safety and alignment.

I also have a background in reinforcement learning, spanning both theory and applied work, and I often draw on those ideas when thinking about control and reliability.

Google Scholar

Preprints / Working Papers

  • Baturay Saglam and Dionysis Kalogerias. “Self-Improving In-Context Learning: A Test-Time Calibration with Only a Few More Forward Passes.” 2026. (work in progress)

  • Baturay Saglam and Dionysis Kalogerias. “Compatible Gradient Approximations for Actor-Critic Algorithms.” 2025. (major revision in progress) [arXiv] [code]

  • Baturay Saglam and Dionysis Kalogerias. “Test-Time Detoxification Without Training or Learning Anything.” 2026. (under review) [code]

  • Supriti Vijay, Aman Priyanshu, Anu Vellore, Baturay Saglam, and Amin Karbasi. “Think Before You Retrieve: Learning Test-Time Adaptive Search with Small Language Models.” 2026. (under review) [arXiv]

Journal Articles

  • Baturay Saglam, Dogan C. Cicek, Furkan B. Mutlu, and Suleyman S. Kozat. “Mitigating Off-Policy Bias in Actor-Critic Methods with One-Step Q-learning: A Novel Correction Approach.” Transactions on Machine Learning Research, 2024. [paper] [code]

  • Baturay Saglam, Furkan B. Mutlu, Dogan C. Cicek, and Suleyman S. Kozat. “Parameter-free Reduction of the Estimation Bias in Deep Reinforcement Learning for Deterministic Policy Gradients.” Neural Processing Letters, 2024. [paper]

  • Baturay Saglam, Furkan B. Mutlu, Dogan C. Cicek, and Suleyman S. Kozat. “Actor-Prioritized Experience Replay.” Journal of Artificial Intelligence Research, 2023. [paper] [code]

  • Baturay Saglam and Suleyman S. Kozat. “Deep Intrinsically Motivated Exploration in Continuous Control.” Machine Learning, 2023. [paper] [code]

Peer-Reviewed Conference Proceedings

  • Baturay Saglam, Paul Kassianik, Blaine Nelson, Sajana Weerawardhena, Yaron Singer, and Amin Karbasi. “Large Language Models Encode Semantics and Alignment in Linearly Separable Representations.” International Joint Conference on Natural Language Processing and the Asian Chapter of ACL, 2025. [arXiv] [code]

  • Jane H. Lee, Baturay Saglam, Spyridon Pougkakiotis, Amin Karbasi, and Dionysis Kalogerias. “Risk-Averse Constrained Reinforcement Learning with Optimized Certainty Equivalents.” Conference on Neural Information Processing Systems (NeurIPS), 2025. [paper] [code]

  • Baturay Saglam, Xinyang Hu, Zhuoran Yang, Dionysis Kalogerias, and Amin Karbasi. “Learning Task Representations from In-Context Learning.” Findings of ACL, 2024. [paper] [code]

  • Baturay Saglam, Doga Gurgunoglu, and Suleyman S. Kozat. “Deep Reinforcement Learning Based Joint Downlink Beamforming and RIS Configuration in RIS-Aided MU-MISO Systems Under Hardware Impairments and Imperfect CSI.” IEEE International Conference on Communications Workshops, 2023. [paper] [code]

  • Dogan C. Cicek, Enes Duran, Baturay Saglam, Kagan Kaya, Furkan B. Mutlu, and Suleyman S. Kozat. “AWD3: Dynamic Reduction of the Estimation Bias.” IEEE International Conference on Tools with Artificial Intelligence, 2021. [paper]

  • Dogan C. Cicek, Enes Duran, Baturay Saglam, Furkan B. Mutlu, and Suleyman S. Kozat. “Off-Policy Correction for Deep Deterministic Policy Gradient Algorithms via Batch Prioritized Experience Replay.” IEEE International Conference on Tools with Artificial Intelligence, 2021. [paper]

  • Baturay Saglam, Enes Duran, Dogan C. Cicek, Furkan B. Mutlu, and Suleyman S. Kozat. “Estimation Error Correction in Deep Reinforcement Learning for Deterministic Actor-Critic Methods.” IEEE International Conference on Tools with Artificial Intelligence, 2021. [paper]