
Building trust: Foundations of security, safety and transparency in AI


As publicly available artificial intelligence (AI) models rapidly evolve, so do their potential security and safety implications, and that calls for a greater understanding of their risks and vulnerabilities. To develop a foundation for standardised security, safety and transparency in the development and operation of AI models, as well as in their open ecosystems and communities, we must change how we approach current challenges: inconsistent information about models, the lack of a clear distinction between security and safety issues, and the deficient, non-standardised safety evaluations available to and used by model makers.

Risks and vulnerabilities

While similar, AI security and AI safety are distinct aspects of managing risks in AI systems. AI security protects the systems from external and internal threats, while AI safety provides confidence that the system and data don’t threaten or harm users, society or the environment due to the model’s operation, training or use. However, the relationship between AI security and safety is often blurry.

An attack that would typically be considered a security concern can lead to safety issues (or vice versa), such as the model producing toxic or harmful content or exposing personal information. The intersection of AI security and safety highlights the critical need for a comprehensive approach to AI risk management that addresses both security and safety concerns in tandem.

Current challenges and trends

While the AI industry has taken steps to address security and safety issues, several key challenges remain, such as the prioritisation of speed over safety, inadequate governance and deficient reporting practices. Emerging trends suggest that targeting these areas is crucial for developing effective safety, security and transparency practices in AI.

Speed over safety

In the rush to develop and deploy AI technologies and secure market share, many organisations are prioritising speed to market over safety testing and ethical considerations. As past security incidents show, security often lags years behind a nascent technology, and it typically takes a major incident before an industry begins to self-correct. It's reasonable to predict that, in the absence of people pushing for risk management in AI, we may experience a significant and critical safety and security incident. The growing number of models introduced with security and safety in mind is a positive step for the AI industry, but the lack of consensus on how to convey the necessary safety and transparency information makes these models difficult to evaluate.

Governance and self-regulation

With very little government legislation in effect, the AI industry has relied on voluntary self-regulation and non-binding ethical guidelines, which have proven insufficient for addressing security and safety concerns. Proposed legislation often fails to align with the realities of the technology industry or with the concerns raised by industry leaders and communities, while corporate AI initiatives, developed primarily for a company's own use, can fail to address structural issues or provide meaningful accountability.

Self-governance has had limited success and tends to involve a defined set of best practices implemented independently of primary feature development. As seen historically across industries, prioritising security at the expense of capability is a trade-off stakeholders are often unwilling to make. AI complicates this further by extending the challenge to include direct impacts on safety.

Deficient reporting practices

As the industry currently stands, there is no common method or practice for handling user-reported model flaws. This is partly because the industry's flawed-yet-functional disclosure and reporting system for software vulnerabilities doesn't translate directly to AI. AI is a technical evolution of data science and machine learning (ML), distinct from traditional software engineering because it focuses on data and mathematics rather than on building user-facing systems, which have established methodologies for threat modelling, user interaction and system security. Without a well-understood disclosure and reporting system for safety hazards, reporting an issue by reaching out directly to the model maker is cumbersome and often unrealistic, and the impact of an AI safety incident could be far more severe than it should be due to delayed coordination and resolution.

Solutions and strategies

Drawing heavily upon prior work by Cattel, Ghosh & Kaffee (2024), we believe that extending model/system cards and introducing hazard tracking are vital to improving security and safety in the AI industry.

Extending model/system cards

Model cards document a model's possible uses, its architecture and, occasionally, the data used to train it. Today they provide an initial set of human-generated material about a model that adopters use to assess its viability, but model cards have potential and applicability well beyond this current usage, wherever the models they describe travel or are deployed.

To effectively compare models, adopters and engineers need a consistent set of fields and content on the card, which can be accomplished through specification. In addition to the fields recommended by Barnes, Gebru, Hutchinson, Mitchell, Raji, Spitzer, Vasserman, Wu & Zaldivar (2019), we propose the following changes and additions:

  • Expand intent and use to describe the users (who) and the use cases (what) of the model, as well as how it is meant to be used.
  • Add scope to exclude known issues that the model producer doesn't intend to resolve or cannot resolve. This ensures that hazard reporters understand the purpose of the model before reporting a concern that is already noted as unaddressable within its defined use.
  • Adjust evaluation data to provide a nested structure that conveys whether a framework was used and the outputs of the evaluations run on the model. Standardised safety evaluations would enable a skilled user to build a substantially equivalent model.
  • Add governance information so that an adopter or consumer understands how the model was produced and how to engage with its makers.
  • Provide optional references, such as artifacts and other content, to help potential consumers understand the model's operation and to demonstrate the maturity and professionalism of a given model.

Requiring these fields for model cards allows the industry to begin establishing content that is essential for reasoning, decision making and reproducing models. By developing an industry standard for model cards, we will be able to promote interoperability of models and their metadata across ecosystems.
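
To make the proposal concrete, here is a minimal sketch of how an extended model card carrying the fields above might be expressed as a machine-readable schema. The class and field names are illustrative assumptions rather than an established specification; a real standard would need community agreement on naming and required content.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class IntentAndUse:
    """Who the model is for (users), what it is for (use cases) and how it should be used."""
    intended_users: list[str]
    intended_use_cases: list[str]
    usage_guidance: str


@dataclass
class EvaluationRun:
    """A single evaluation, nested so the framework used to run it is explicit."""
    evaluation_name: str            # e.g. a named safety or bias benchmark
    framework: Optional[str]        # evaluation harness, if one was used
    outputs: dict[str, float]       # metric name -> score produced by the run


@dataclass
class ExtendedModelCard:
    """Hypothetical extended model card carrying the proposed fields."""
    model_name: str
    architecture: str
    intent_and_use: IntentAndUse                  # expanded intent and use
    out_of_scope: list[str]                       # known issues the producer will not address
    evaluations: list[EvaluationRun]              # standardised, nested evaluation results
    governance: dict[str, str]                    # e.g. contact channel, production process notes
    references: list[str] = field(default_factory=list)  # optional supporting artifacts
```

A machine-readable structure along these lines is what would allow tooling to check that the required fields are present and to compare cards consistently across ecosystems, as described above.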

Hazard tracking

While the common vulnerability disclosure process used to track security flaws is effective in traditional software security, its application to AI systems faces several challenges. For one, issues in ML models must satisfy statistical validity thresholds: any problem identified in a model, such as bias, must be measured and evaluated against established statistical standards to confirm that it is meaningful and significant. Secondly, concerns related to trustworthiness and bias often extend beyond the scope of security vulnerabilities and may not fit the accepted definition of a vulnerability. Recognising these limitations, we believe that expanding the ecosystem with a centralised, neutral coordinated hazard disclosure and exposure committee and a common flaws and exposure (CFE) number could address these concerns, much as CVE was launched by MITRE in 1999 to identify and categorise vulnerabilities in software and firmware.

Users who discover safety issues are expected to coordinate with the model provider to triage and further analyse the issue. Once the issue is established as a safety hazard, the committee assigns a CFE number. Model makers and distributors can also request CFE numbers to track safety hazards they find in their own models. The coordinated hazard disclosure and exposure committee is the custodian of CFE numbers, responsible for assigning them to safety hazards, tracking them and publishing them. Additionally, an adjunct panel would be formed to facilitate the resolution of contested safety hazards.
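
As a rough illustration of this workflow, the sketch below models how a coordinating committee might record hazard reports and assign CFE numbers. The identifier format ("CFE-2025-0001"), status names and class structure are assumptions made for illustration; no such scheme exists today.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class HazardStatus(Enum):
    REPORTED = "reported"            # reporter has contacted the model provider
    TRIAGED = "triaged"              # provider and reporter have analysed the issue
    CFE_ASSIGNED = "cfe_assigned"    # committee has confirmed a safety hazard
    PUBLISHED = "published"          # committee has published the record
    CONTESTED = "contested"          # escalated to the adjunct panel


@dataclass
class HazardReport:
    model_name: str
    description: str
    reporter: str
    status: HazardStatus = HazardStatus.REPORTED
    cfe_id: Optional[str] = None     # assigned by the committee, not the reporter


class DisclosureCommittee:
    """Custodian of CFE numbers: assigns, tracks and publishes them."""

    def __init__(self) -> None:
        self._counter = 0
        self._records: list[HazardReport] = []

    def assign_cfe(self, report: HazardReport, year: int) -> str:
        """Assign a CFE number once a reported issue is established as a safety hazard."""
        self._counter += 1
        report.cfe_id = f"CFE-{year}-{self._counter:04d}"   # illustrative identifier format
        report.status = HazardStatus.CFE_ASSIGNED
        self._records.append(report)
        return report.cfe_id

    def publish(self, report: HazardReport) -> None:
        """Publish a tracked hazard so adopters can reference it by its CFE number."""
        report.status = HazardStatus.PUBLISHED
```

As with CVE, the value lies in a single neutral custodian issuing stable identifiers, so that model makers, distributors and reporters can all reference the same hazard record during coordination.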

What next?

Models developed according to open source principles have the potential to play a significant role in the future of AI. The frameworks and tools needed to develop and manage models against industry and consumer expectations require openness and consistency so that organisations can reasonably assess risk. The more transparency and access to critical functionality we have, the greater our ability to discover, track and resolve safety and security hazards before they have widespread impact. Our proposals aim to provide flexibility and consistency through existing governance, workflows and structures; when implemented, they could offer more efficient avenues for addressing the pressing need to manage AI safety effectively.
