As generative AI models become increasingly integrated into high-stakes
domains, robust methods for evaluating their ethical reasoning grow ever more
important. This paper introduces a five-dimensional audit model (Analytic
Quality, Breadth of Ethical Considerations, Depth of Explanation, Consistency,
and Decisiveness) to evaluate the ethical logic of leading large language
models (LLMs). Drawing on traditions from applied ethics and higher-order
thinking, we present a multi-battery prompt approach, including novel ethical
dilemmas, to probe the models' reasoning across diverse contexts. We benchmark
seven major LLMs, finding that while the models generally converge on ethical
decisions, they vary in explanatory rigor and moral prioritization.
Chain-of-Thought prompting and reasoning-optimized models both yield
significantly higher scores on our audit metrics. This study offers a scalable
methodology for the ethical benchmarking of AI systems and highlights the
potential for AI to complement human moral reasoning in complex decision-making
contexts.