A pioneer’s perspective on ethics, implementation, and the winding road to good governance
We’re at a crossroads in how we develop and use artificial intelligence. Throughout 2025, enthusiasm for autonomous, agentic AI seemed to grow boundlessly. Some major firms deployed wide-scale agentic AI at speed while laying off human counterparts, believing it offered a way to streamline operations, cut costs and boost performance at scale.
And yet the risks of enabling autonomous AI action remain significant: ‘actionable hallucinations’ could overwhelm safeguards; cross-silo access to data without oversight could amplify privacy and security vulnerabilities; and risk models are not keeping pace.
One voice of experience highlighting risks while offering solutions is Dr. Eva-Marie Muller-Stuler. Long before “responsible AI” was embraced as standard corporate terminology, a few practitioners like Dr. Eva were wrestling with the practical problems and complex questions. How do you build AI systems that can be audited? What does explainability mean when you’re working with 10,000 real-world signals? Who’s held accountable when systems fail?
Dr. Eva’s career offers a useful lens on these questions – not because she has all the answers yet, but because she worked through the infrastructure problems while others were still debating whether they mattered.
It’s now becoming clear that we urgently need a global framework for ethical artificial intelligence. The quiet, thorough, deliberate and insightful work Dr. Eva has led for more than a decade offers tried-and-tested direction and lays the foundations for such a framework without limiting progress.
Early systems work and the complexity problem
Dr. Eva’s entry into AI governance came through direct experience with scale and detail. At KPMG in 2015, she and her team built AI ecosystems integrating real estate, travel, pricing and location data – over 10,000 signals in total. This was before GDPR in Europe, before most frameworks for managing this complexity existed.
One landmark project – the Data Universe Ecosystem – used more than a terabyte of blended data to improve retail forecasting by 200%. Dr. Eva also developed AI algorithms for fraud detection, retail demand prediction, banking inclusion, and oil and gas optimization.
Institutionalizing practices at scale
At IBM in 2017, Dr. Eva established the company’s first Center of Excellence for Data Science and AI – a model she would later help implement globally. She published some of IBM’s early frameworks on MLOps (Machine Learning Operations) in 2019 and contributed to work on semantic vector bases that would later inform generative AI systems.
By 2022, she had built EY’s MENA AI practice into EMEA’s top-ranked team. But beyond delivering results, she stayed focused on operational standards – the less visible infrastructure that determines whether AI systems can actually be maintained, audited and improved over time. She was designing systems that could be trusted. Trust, she would argue, is engineered, not promised.
UN intervention and early warnings
As early as 2019, Dr. Eva addressed the United Nations in New York, arguing that AI governance was not keeping pace with AI deployment. While others were celebrating early wins for automation, she was exploring accountability gaps, urging world leaders to build ethical guardrails before mistakes outpaced our ability to correct them.
Her concerns have since materialized in measurable ways. Deepfake-enabled fraud now drains an estimated $250 billion annually from global economies. Biased hiring algorithms have been documented downgrading candidates from women’s colleges. In places, though, the frameworks Dr. Eva quietly built are now serving as blueprints for responsible AI.
Why diversity reduces risk
Multiple large-scale surveys suggest that while AI adoption is widespread, only a minority of organisations are seeing value from their investments, with many initiatives marooned in ‘pilot purgatory’ rather than reaching full production.
Research on AI adoption indicates that these shortfalls are driven less by model performance and more by organisational learning failures: fragmented data, weak feedback loops, and rigid governance structures that prevent systems from adapting to real-world contexts over time. As Harvard Business Review authors have put it, many firms are caught in a pattern of launching impressive generative AI pilots that never translate into durable, learning-driven capabilities embedded in day-to-day work.
Dr. Eva’s argument for inclusion in AI development is pragmatic rather than ideological: homogeneous teams miss failure modes. “It’s cheaper to build for WEIRD men,” she says, referring to the Western, Educated, Industrialized, Rich, and Democratic demographic that dominates datasets. “But that’s where risk begins.”
She cites cancer diagnosis models that missed dense tissue patterns in Afro-Caribbean women, and voice assistants that struggled with female speech patterns. Beyond the data problems were organisational ones – warnings ignored because no one in the room had lived experience of the failure mode. “Diverse teams don’t just raise flags,” she says. “They reduce harm.”
Through initiatives like Women in Data Science, Dr. Eva has mentored practitioners entering the field. “AI was never only one smart boy in a hoodie,” she says. “It’s a team sport requiring different skills and backgrounds.”
Beyond platitudes: what responsible AI actually requires
For Dr. Eva, responsible AI isn’t primarily about intentions – it’s about competency, technical capabilities and organisational structure.
She has little patience for ethics as performance. “Most companies treat AI bias testing like a one-time audit,” she says. “But AI models decay. They rot. And no one’s watching.”
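What continuous monitoring might look like in practice can be sketched in a few lines of Python. The example below is illustrative only – it is not drawn from Dr. Eva’s published work, and the metric, thresholds and variable names are assumptions – but it shows the shift she argues for: comparing a model’s recent production scores against its validation-time baseline on a schedule, rather than auditing once and walking away.

```python
import numpy as np

def population_stability_index(baseline, recent, bins=10):
    """Compare two model-score distributions; higher PSI means more drift.
    Common rule of thumb: <0.1 stable, 0.1-0.25 watch, >0.25 investigate."""
    edges = np.linspace(0.0, 1.0, bins + 1)  # scores assumed to lie in [0, 1]
    base_pct = np.histogram(baseline, edges)[0] / len(baseline)
    rec_pct = np.histogram(recent, edges)[0] / len(recent)
    base_pct = np.clip(base_pct, 1e-6, None)  # avoid log(0) on empty buckets
    rec_pct = np.clip(rec_pct, 1e-6, None)
    return float(np.sum((rec_pct - base_pct) * np.log(rec_pct / base_pct)))

# Stand-ins for logged scores: validation-time baseline vs. last week's production
rng = np.random.default_rng(0)
baseline_scores = rng.beta(2.0, 5.0, 10_000)
recent_scores = rng.beta(2.6, 5.0, 2_000)  # the "rot": input patterns have shifted

psi = population_stability_index(baseline_scores, recent_scores)
status = "drift detected - rerun bias tests" if psi > 0.25 else "stable"
print(f"PSI = {psi:.3f} ({status})")
```

The particular metric matters less than the habit: decay is checked automatically, on live data, and a breach triggers re-testing and human review rather than waiting for next year’s audit.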
In 2020, she led a global study across IBM’s Academy of Technology, convening 60 experts to define technical pathways for auditable, transparent, legally defensible AI systems.
Dr. Eva points to a more systemic failure: ethics boards with no teeth, no veto power, and no documented Red-Team escalation. As AI becomes ubiquitous, she warns, the real risk is not evil systems, but competence decline. “If your AI is faster than your compliance team, slow it down. And make sure your compliance team has the technical expertise to understand where things can go wrong.”
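What “documented Red-Team escalation” with real veto power could mean in engineering terms is easy to sketch. The release gate below is hypothetical – the reviewer names and checks are assumptions, not any company’s actual process – but it captures the two properties Dr. Eva calls for: a veto from any required reviewer blocks deployment outright, and every decision is written down.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class SignOff:
    reviewer: str      # e.g. "red-team", "compliance", "model-owner"
    approved: bool
    reason: str        # documented rationale, kept for the audit trail
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

def release_gate(model_id, signoffs, required=frozenset({"red-team", "compliance"})):
    """A veto or a missing review from any required party blocks the release."""
    present = {s.reviewer for s in signoffs}
    missing = required - present
    vetoes = [s for s in signoffs if s.reviewer in required and not s.approved]
    approved = not missing and not vetoes
    # Every gate decision is recorded, pass or fail
    record = {
        "model": model_id,
        "approved": approved,
        "missing_reviews": sorted(missing),
        "vetoes": [(s.reviewer, s.reason) for s in vetoes],
        "signoffs": [(s.reviewer, s.approved, s.timestamp) for s in signoffs],
    }
    return approved, record

ok, audit = release_gate("credit-scoring-v7", [
    SignOff("compliance", True, "policy checks passed"),
    SignOff("red-team", False, "prompt injection exfiltrates account data"),
])
print(ok)               # False - the Red Team veto halts deployment
print(audit["vetoes"])  # the documented reason travels with the decision
```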
Standard of proof
Dr. Eva now advocates for “documented” and “explainable” AI systems whose decisions can be reconstructed and bias-tested, and whose full data lineage can be traced. She acknowledges this level of accountability isn’t yet possible with today’s generative AI and large language models, but argues it remains essential for high-stakes applications.
For Dr. Eva, good AI isn’t just accurate. It’s transparent, documented, secure, explainable, auditable, and bias-tested in real time. It keeps humans in the loop based on the risk profile of its decisions. And yes, it has kill switches.
“Ship only systems any engineer can debug, repair, or retire,” she says. “If one person is irreplaceable, the system is already broken.”
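Translated into code, that standard might look something like the sketch below – again illustrative, with hypothetical field names, risk tiers and a simplistic kill switch rather than any published specification. Each automated decision is recorded with its inputs, exact model version and data lineage so it can be reconstructed later; uncertain or high-risk calls are escalated to a human; and a single flag halts the system entirely.

```python
import json
import uuid
from datetime import datetime, timezone

KILL_SWITCH_ENGAGED = False  # in production, an externally controlled flag

def decide(features, model_version, lineage, risk, score):
    """Return a decision record that can be replayed and audited later."""
    if KILL_SWITCH_ENGAGED:
        raise RuntimeError("System halted by kill switch; no automated decisions")

    # High-risk profiles and borderline scores go to a human, per policy
    needs_human = risk == "high" or 0.4 < score < 0.6
    record = {
        "decision_id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,   # the exact artifact that decided
        "data_lineage": lineage,          # which datasets fed the features
        "inputs": features,               # enough to reconstruct the decision
        "score": score,
        "outcome": "escalate_to_human" if needs_human
                   else ("approve" if score >= 0.5 else "decline"),
    }
    print(json.dumps(record))  # stand-in for an append-only audit log
    return record

decide(
    features={"income": 52_000, "tenure_months": 18},
    model_version="credit-scoring-v7",
    lineage=["warehouse.loans.applications_2025", "bureau.scores_q3"],
    risk="high",
    score=0.71,
)
```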
What comes next
Ask Dr. Eva what she wants AI to look like in 2030, and the answer is precise:
- Global AI standards and regulations with real audits and executive power.
- Red-Team veto rights embedded in delivery teams, with the power to halt any AI model that fails safety or security checks.
- Certified AI engineers held liable for harm, like surgeons.
- AI used humanely to solve water access, tailor education, and achieve other sustainable development goals.
- Environmental footprint of AI actively reduced through more efficient technology, better algorithms and sustainable energy sources.
“We’ve had our decade of experimentation,” she says. “Now we need a decade of responsibility.”
She argues for two questions before any AI system is built: Is it right to build this? And if yes: How do we build it right?