About

I study how AI systems change the production of scientific knowledge, and how to make AI-assisted research reliable.

My work develops benchmarks, metrics, and human–AI evaluation methods for research workflows where fluent outputs can conceal hidden failures. I focus on tasks such as scientific ideation, manuscript revision, evidence interpretation, and research design. A recurring theme is that research artifacts are dependency-rich: goals, constraints, methods, evidence, claims, and revisions must remain coherent across time and across documents.

My current agenda is Reliable AI-Assisted Research. I ask when large language models preserve scientific constraints, propagate downstream implications, distinguish evidence from plausibility, and support rather than distort human research judgment.

Two questions guide my research:

How do AI systems change the way scientific knowledge is produced, evaluated, and trusted?

How can we design benchmarks and human–AI workflows that preserve constraints, propagate implications, and improve research reliability?

Current Work

My current work studies the reliability of large language models in research workflows.

In DRIFTBENCH, I evaluate whether models preserve research objectives and hard constraints during multi-turn scientific ideation. The benchmark shows that models can accurately recall constraints they nevertheless violate.

In EditPropBench, I evaluate whether LLM editors propagate local factual edits through dependent claims in scientific manuscripts. The benchmark measures whether models update not only direct mentions, but also implicit qualitative claims licensed by the edited fact.

I am currently extending this agenda into medical research, where AI-assisted protocol design and evidence synthesis require stronger safeguards. In collaboration with clinical researchers, I am developing methods to evaluate whether LLM-generated research designs satisfy the downstream methodological, ethical, statistical, and feasibility obligations implied by their own choices.

Background

My background spans data science, finance, consulting, and academic teaching. Before and alongside my doctoral work, I worked in data science and management consulting and taught as a lecturer at a university of applied sciences in Switzerland.

My academic background includes:

MSc in Data Analytics, University College Dublin
MA in Banking and Finance, University of St. Gallen