Introduction: The Shifting Sands of AI Alignment
Artificial intelligence has advanced rapidly in recent years, giving rise to powerful models capable of performing increasingly complex tasks. Ensuring that these systems act in alignment with human values and intentions—known as AI alignment—has become a critical concern. Current methods for alignment typically involve techniques like Reinforcement Learning from Human Feedback (RLHF), synthetic data generation, and red teaming, often managed by large organisations with centralised control over model development.
However, the AI landscape is changing. Open-source AI models are proliferating, and individuals and smaller groups can now fine-tune models for specialised purposes. This democratisation of AI development means that alignment strategies can no longer be exclusively controlled by a handful of major players. As models become more distributed, centralised strategies become less effective.
The resulting challenge is clear: how can we ensure robust alignment when AI is being deployed by a wide range of actors, each with potentially different values? The risk of fragmented ethical standards—and the unpredictable or even dangerous behaviour that may follow—calls for a fundamental shift in how alignment is conceptualised and implemented.
The Limitations of Centralised Alignment in a Decentralised AI Landscape
Centralised alignment strategies offer benefits such as coordinated oversight and resource consolidation. However, they also suffer from key limitations in a decentralised environment:
- Bias Concentration: Models trained within centralised frameworks may reflect the biases of their developers or the organisation’s ethos. For instance, a hiring algorithm trained predominantly on data from one demographic may perpetuate discrimination.
- Security Risks: Centralised AI systems can become single points of failure. A breach could compromise ethical safeguards across many systems, leading to widespread harm.
- Lack of Adaptability: Centralised approaches may fail to keep up with rapidly evolving technology and societal values. A rigid ethical framework risks becoming outdated.
- One-Size-Fits-All Fallacy: Users and communities have diverse cultural, ethical, and contextual needs. Centralised models lack the granularity needed to accommodate these individual and group preferences.
These limitations highlight the need for a more adaptable and inclusive approach—one that embraces personalisation and decentralisation without sacrificing safety.
Introducing the “Onion Skin” Model for Layered AI Alignment
To address these challenges, we propose the “onion skin” model: a layered, modular approach to AI alignment.
Base Alignment Layer: At the core is a foundational layer created during the initial training or fine-tuning of an AI model. This layer represents a broadly accepted ethical baseline, such as human safety, non-maleficence, and truthfulness. Major developers like OpenAI, Google, or xAI might provide these foundational layers.
Alignment Engines as Layers: Surrounding the base layer are modular “alignment engines.” Each engine encapsulates a distinct value system or ethical framework and is applied as an overlay. These engines could be encoded as structured documents or markup languages that define behavioural constraints in a clear and machine-readable format.
Dynamic Customisation: Users or communities can add or remove these layers depending on their values or needs. For example, a community concerned with environmental ethics could layer on a sustainability-focused alignment engine. Conversely, users could peel away certain layers to default to more general principles.
This approach enables:
- Coexistence of multiple ethical standards within a safe, bounded framework
- Personalisation at the individual or community level
- Rapid adaptability without full model retraining
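To make the layering concrete, here is a minimal Python sketch of an onion-skin profile as a base layer plus removable overlays. The class names, fields, and priority convention are illustrative assumptions rather than an existing implementation.

```python
from dataclasses import dataclass, field

@dataclass
class AlignmentLayer:
    """One 'skin' of the onion: a named value set with prioritised rules."""
    layer_id: str
    priority: int  # higher values override lower ones on conflict (illustrative convention)
    rules: dict[str, str] = field(default_factory=dict)  # constraint -> directive

@dataclass
class OnionSkinProfile:
    """A base alignment layer plus user- or community-selected overlays."""
    base: AlignmentLayer
    overlays: list[AlignmentLayer] = field(default_factory=list)

    def add_layer(self, layer: AlignmentLayer) -> None:
        """Layer on an additional alignment engine."""
        self.overlays.append(layer)

    def peel_layer(self, layer_id: str) -> None:
        """Peel a layer away, reverting to more general principles."""
        self.overlays = [l for l in self.overlays if l.layer_id != layer_id]

    def effective_rules(self) -> dict[str, str]:
        """Merge base and overlay rules; higher-priority layers win on conflict."""
        merged = dict(self.base.rules)
        for layer in sorted(self.overlays, key=lambda l: l.priority):
            merged.update(layer.rules)
        return merged
```

A community focused on environmental ethics could then call add_layer() with its sustainability engine, while peel_layer() returns the model to its more general defaults.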
Alignment Engines as Structured Documents: A Technical Blueprint
To make alignment layers interoperable and machine-readable, structured formats are essential. Possible formats include:
- AIML (Artificial Intelligence Markup Language): XML-based, originally built for conversational AI. Could be extended to encode ethical rules via pattern-response logic.
- XML/JSON: Generic, flexible formats capable of expressing nested constraints, rules, and priorities.
- Ethical Ontologies: Formal representations of ethical concepts and relationships. These allow logical reasoning and contextual understanding of values.
A typical alignment engine document might include:
- A unique identifier for the value set
- A human-readable summary
- Machine-readable rules or constraints
- Priority values to resolve rule conflicts
Standardisation of these formats would be crucial for cross-platform compatibility and community sharing. Think of this as an ethical API: a universally understandable format for guiding AI behaviour.
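As a purely hypothetical example of such a document, a sustainability-focused engine might be expressed in JSON and loaded with a minimal validation step. The schema and field names below are invented for illustration rather than drawn from any existing standard.

```python
import json

# Hypothetical alignment engine document: identifier, human-readable summary,
# machine-readable rules, and a priority value for conflict resolution.
SUSTAINABILITY_ENGINE = json.loads("""
{
  "engine_id": "org.example.sustainability/1.0",
  "summary": "Favour environmentally sustainable behaviour in recommendations.",
  "priority": 40,
  "rules": [
    {"constraint": "recommendations", "directive": "prefer_low_carbon_options"},
    {"constraint": "disclosure", "directive": "state_environmental_trade_offs"}
  ]
}
""")

REQUIRED_FIELDS = {"engine_id", "summary", "priority", "rules"}

def validate_engine(document: dict) -> None:
    """Reject documents that a layer loader could not interpret."""
    missing = REQUIRED_FIELDS - document.keys()
    if missing:
        raise ValueError(f"alignment engine is missing fields: {sorted(missing)}")

validate_engine(SUSTAINABILITY_ENGINE)
```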
Peeling Back the Layers: Applying and Removing Personalised Alignment
To support this onion skin architecture, the AI system must be modular at its core. Several integration mechanisms are possible:
API-Based Integration: Alignment engines act as external modules queried by the core model. Turning a layer on/off is as simple as modifying API calls.
Parameter Merging: Lightweight parameter tuning or merging adjusts the model’s internal state based on the selected alignment layer. This avoids full retraining.
Runtime Configuration: The model reads and applies alignment documents during execution. This supports immediate switching with negligible overhead.
Each method prioritises efficiency and flexibility. The goal is to enable alignment switching as easily as changing user preferences—without compromising core model integrity.
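As a minimal sketch of the runtime-configuration route, the wrapper below recompiles the active layers on every call. It assumes, purely for illustration, a core model that accepts per-call behavioural constraints through a generate(prompt, constraints=...) interface; no existing API is implied.

```python
class AlignedModel:
    """Wraps a core model and applies the currently selected layers at call time."""

    def __init__(self, core_model, profile):
        self.core_model = core_model   # assumed to accept behavioural constraints per call
        self.profile = profile         # e.g. an OnionSkinProfile from the earlier sketch

    def generate(self, prompt: str) -> str:
        # Recompile the active layers on every call, so adding or peeling a
        # layer takes effect immediately and requires no retraining.
        constraints = self.profile.effective_rules()
        return self.core_model.generate(prompt, constraints=constraints)
```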
Challenges to consider:
- Efficient interface design between layers and core logic
- Conflict resolution between overlapping value sets (one approach is sketched after this list)
- Ensuring backward compatibility across model versions
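The second of these challenges lends itself to a simple illustration. Under the same hypothetical schema used above, overlapping constraints can be resolved by priority while every override is recorded, so that overrides can be surfaced to users rather than applied silently.

```python
def resolve_conflicts(layers):
    """Resolve overlapping constraints by priority and report every override."""
    winners: dict[str, tuple[int, str]] = {}   # constraint -> (priority, directive)
    conflicts: list[str] = []
    for layer in layers:
        for constraint, directive in layer.rules.items():
            current = winners.get(constraint)
            if current is not None and current[1] != directive:
                conflicts.append(f"{constraint}: '{current[1]}' vs '{directive}'")
            if current is None or layer.priority > current[0]:
                winners[constraint] = (layer.priority, directive)
    resolved = {constraint: directive for constraint, (_, directive) in winners.items()}
    return resolved, conflicts
```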
The Rise of Value Cohorts: Online Communities Shaping AI Ethics
Decentralisation creates space for new actors: value cohorts. These are online communities that share ethical perspectives and collaborate on alignment engines.
Picture:
- Political groups creating engines reflecting governance models
- Religious groups encoding theological ethics
- Cultural communities embedding localised norms
These groups could use forums, GitHub repositories, or shared platforms to:
- Debate ethical principles
- Create and refine alignment engines
- Publish verified, reusable ethical layers
This grassroots model promotes diversity and inclusivity. It also allows for rapid iteration and ethical innovation beyond the scope of large institutions.
However, the risk of fragmentation looms. Divergent value sets may result in conflicting AI behaviour across ecosystems. Mitigating this requires:
- Meta-standards for compatibility
- Conflict resolution protocols
- Transparency tools for users to inspect applied value sets
A New Competitive Landscape: Decentralised Alignment and Major AI Developers
This emerging alignment model forces major AI players to adapt. The old paradigm of top-down control is giving way to a more fluid, user-driven approach.
To remain relevant, developers might:
- Provide robust base layers as alignment foundations
- Offer modular architectures with plug-in interfaces
- Facilitate integration of community-built engines
Some may even launch alignment marketplaces, where users browse and apply ethical profiles. By becoming platforms rather than gatekeepers, developers can increase trust, adoption, and community loyalty.
Strategically, this empowers users and builds resilience into alignment systems. Developers that embrace this transition may gain competitive advantage in a world increasingly sceptical of centralised tech control.
Ethical Chess Games: Multi-Agent Simulations for Value Exploration
Multi-agent simulations offer a powerful methodology for testing the real-world implications of decentralised alignment.
In these simulations:
- Each AI agent operates under a different alignment layer
- Agents interact in complex environments (economic, political, social)
- Outcomes are observed over time
These “chess games of ethics” allow researchers to:
- Identify emergent ethical conflicts
- Observe how alignment profiles handle trade-offs
- Study negotiation, cooperation, or conflict between divergent values
Advanced simulations could even include rhetorical or debate mechanics, where agents explain and defend their ethical decisions—mimicking real-world diplomacy or policy discussions.
This sandbox environment gives valuable feedback for refining alignment engines before real-world deployment.
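As an illustration only, such a simulation can be as simple as the loop below, in which agents carrying different alignment profiles face shared scenarios and every divergence in their decisions is logged for analysis. The agent policy and scenario encoding are invented for demonstration.

```python
import random
from dataclasses import dataclass

@dataclass
class EthicsAgent:
    """An agent whose choices follow the rules of its alignment profile."""
    name: str
    rules: dict[str, str]   # e.g. the output of effective_rules() above

    def decide(self, scenario: str) -> str:
        # Illustrative policy: follow the profile's directive for this scenario,
        # deferring to the base layer when the profile is silent.
        return self.rules.get(scenario, "defer_to_base_layer")

def simulate(agents, scenarios, rounds=100, seed=0):
    """Record every round in which agents with different profiles diverge."""
    rng = random.Random(seed)
    divergences = []
    for step in range(rounds):
        scenario = rng.choice(scenarios)
        decisions = {agent.name: agent.decide(scenario) for agent in agents}
        if len(set(decisions.values())) > 1:   # an emergent ethical conflict
            divergences.append((step, scenario, decisions))
    return divergences
```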
Forging Nuanced Ethical Structures: The Path Towards Long-Term Alignment
The long-term goal is not just flexible alignment, but resilient, evolved ethical systems. Multi-agent simulations may lead to:
- Inter-ethical frameworks: Hybrid ethical systems discovered through negotiation or convergence between AIs
- Evolving value sets: Profiles that change over time based on interaction history or user feedback
- Meta-ethics: Systems that reason about their own alignment layers and adjust dynamically
This iterative development could produce alignment profiles far more sophisticated and inclusive than current monolithic models. By learning from interaction, alignment becomes not just encoded—but co-evolved with human society.
A Call to Action for Leaders: Shaping the Future of AI Alignment
As leaders, you have a crucial role in shaping the trajectory of AI development. The “onion skin” model presents a framework for a more democratic and personalised approach to AI alignment—one that reflects the diverse values of our global society.
We urge you to:
- Champion the Development of Open Standards: Advocate for the creation of standardised formats (like ethical APIs) that enable interoperability between alignment engines.
- Support the Growth of Value Cohorts: Encourage the formation of online communities dedicated to developing and refining alignment engines.
- Invest in Modular AI Architectures: Prioritise the development of systems that allow for seamless integration of diverse value sets.
- Promote Multi-Agent Simulation Research: Support tools and platforms to test, validate, and refine alignment engines in ethical simulations.
- Drive Transparency and Accountability: Provide tools for users to audit which values are guiding their AI systems.
- Initiate Cross-Sector Collaboration: Encourage input from technologists, ethicists, policymakers, and citizens alike.
- Advocate for Ethical Marketplaces: Empower users to select ethical profiles that reflect their personal or organisational values.
Conclusion: Embracing a Future of Personalised and Collaborative AI Alignment
The current trajectory of AI—marked by decentralisation, openness, and user empowerment—demands a new alignment strategy.
The onion skin model offers a compelling way forward:
- Base layers provide foundational safety
- Alignment engines enable personalisation and community-driven ethics
- Layer management tools ensure modularity, flexibility, and efficiency
- Online value cohorts democratise ethical input
- Simulations test and refine emergent ethical systems
This model does not reject centralised alignment—it builds upon it. But it extends the conversation to a future where AI must serve a pluralistic, dynamic, and global society.
As alignment becomes layered, decentralised, and personalised, it reflects not just what AI should do—but what humanity is becoming.
Let us build AI systems capable of evolving with us.
The future of alignment isn’t control. It’s collaboration.
This article was originally published on LinkedIn.