News
2026
March 2026: The WillowNLtoFOL and WillowNLtoFOL_extended datasets are now publicly available on Hugging Face.
- Description: Originally developed in the 2024 master’s thesis and introduced in the ESSLLI 2025 article “Transformer Models for Translating Natural Language Sentences into Formal Logical Expressions,” these datasets provide a structurally diverse, rigorously filtered benchmark for evaluating how well neural semantic parsing models generalize compositionally when translating natural language (NL) into first-order logic (FOL).
- Dataset: Hugging Face
March 2026: “Parametric Sound Field Interpolation for Scene-based Navigable Immersive Audio” accepted for publication in the IEEE Transactions on Audio, Speech and Language Processing.
- Description: This article presents a parametric sound field interpolation method using multi-point sparse plane wave decomposition to enable perceptually veridical audio rendering for users navigating within immersive volumetric environments.
- Link: IEEE Xplore
2025
December 2025: Book chapter titled “Yapay zekada sembolik ve sembolik olmayan ayrımı üzerine” (On the distinction between symbolic and non-symbolic in AI) published in Yapay Zeka Felsefesi, edited by Zekiye Göz (Doruk Yayınları).
- Description: The chapter analyzes the operational differences between symbolic expert systems and neural networks within the context of cognitive science and artificial general intelligence.
- Link: Yapay Zeka Felsefesi
November 2025: “The Ouroboros of Benchmarking: Reasoning Evaluation in an Era of Saturation” accepted for poster presentation at the NeurIPS 2025 Workshop on Evaluating the Evolving LLM Lifecycle: Benchmarks, Emergent Abilities, and Scaling.
- Description: This study investigates performance trends across 52 benchmarks for the OpenAI, Anthropic, and Google model families to examine how rapid benchmark saturation affects the measurement of reasoning.
- Preprint: arXiv:2511.01365
October 2025: “Global PIQA: Evaluating Physical Commonsense Reasoning Across 100+ Languages and Cultures” available on arXiv.
- Description: This work introduces a non-parallel benchmark for evaluating physical commonsense reasoning across more than 100 languages and cultural contexts.
- Preprint: arXiv:2510.24081
- Benchmark: Hugging Face Dataset