Safely Deploy Public-Facing AI: Detecting Breaches with Requesty’s MLCommons-Aligned Solutions

Nov 13, 2024

Prompt injection and prompt safety have drawn a lot of attention lately, and so have the outputs AI systems actually produce. With potential safety breaches ranging from privacy violations to misinformation and abuse, businesses need proactive safety measures to mitigate risk and maintain user trust. At Requesty, we’ve developed an automated way to detect safety breaches that is aligned with the MLCommons standardised hazard taxonomy.

How Requesty Detects Safety Breaches

At Requesty, we automatically classify each request (both input and output) into multiple categories, such as topic and task, and also flag safety breaches. By aligning with the MLCommons taxonomy of hazards, our solution keeps content moderation consistent with an industry standard. Additionally, for each customer we can customise the framework around specific needs or examples they want to mark as a safety breach.
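To make the idea concrete, here is a minimal Python sketch of what a per-request classification with a customer-specific override could look like. The `Classification` record, the `apply_custom_rules` helper, and the example rule are all hypothetical names chosen for illustration; they are not Requesty's actual schema or API.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Classification:
    # Hypothetical record shape: one label per dimension described above.
    topic: str
    task: str
    safety_breach: Optional[str] = None  # None means no breach was detected


# A customer rule inspects the text and may return a breach label.
CustomRule = Callable[[str], Optional[str]]


def apply_custom_rules(
    text: str, base: Classification, rules: List[CustomRule]
) -> Classification:
    """Return `base` unchanged, or a copy with the first matching custom label."""
    for rule in rules:
        label = rule(text)
        if label is not None:
            return Classification(base.topic, base.task, label)
    return base


def internal_codename_rule(text: str) -> Optional[str]:
    # Made-up customer rule: flag any mention of an internal codename.
    if "project-falcon" in text.lower():
        return "custom:internal_codename"
    return None


base = Classification(topic="product support", task="question answering")
result = apply_custom_rules(
    "Tell me about Project-Falcon pricing", base, [internal_codename_rule]
)
print(result.safety_breach)
```

In this sketch the standard labels are assumed to come from an upstream classifier, and customer rules are layered on top so that the shared taxonomy stays untouched.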

Insight Explorer: Tracking and Understanding Risks

Within our Insight Explorer, users can surface insights and patterns related to safety breaches. By visualizing and categorizing violations, organizations gain actionable insights to refine their AI deployment strategies and maintain compliance with ethical and regulatory standards.

In particular, combining the task the AI or person is asked to perform with the detected safety breach gives users in-depth insight into how and why people are trying to breach their systems.
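As a rough illustration of that task-by-breach view, the following Python sketch counts how often each combination occurs. The sample records and the `breach_by_task` helper are invented for this example and do not reflect how Insight Explorer is implemented.

```python
from collections import Counter

# Hypothetical classified requests as (task, safety_breach) pairs.
# A breach of None means the request was clean.
records = [
    ("code generation", "Code Interpreter Abuse"),
    ("code generation", None),
    ("summarization", "Privacy"),
    ("code generation", "Code Interpreter Abuse"),
    ("question answering", "Specialized Advice"),
]


def breach_by_task(rows):
    """Count each (task, breach) combination, skipping clean requests."""
    return Counter((task, breach) for task, breach in rows if breach is not None)


counts = breach_by_task(records)
for (task, breach), n in counts.most_common():
    print(f"{task:20} {breach:25} {n}")
```

Sorting by frequency puts the most common combination first, which is often the quickest way to see which tasks attract which kind of abuse.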

The Hazard Taxonomy: 12 Key Safety Challenges

Requesty’s models are trained to identify violations across 12 standardized hazard categories, plus one additional category for tool-specific abuse. Below is an overview of each category and the risks it addresses.

1. Violent Crimes

Detects responses that endorse or encourage acts of violence, including terrorism, murder, and child abuse. This category also covers animal cruelty.

2. Non-Violent Crimes

Flags content promoting fraud, cybercrimes, property theft, and regulated substance misuse.

3. Sex-Related Crimes

Identifies content endorsing sex trafficking, harassment, or other criminal sexual activities.

4. Child Sexual Exploitation

Focuses on the strict detection of any content related to the sexual abuse of minors.

5. Defamation

Monitors content for verifiable falsehoods likely to harm an individual’s reputation.

6. Specialized Advice

Prevents unauthorized provision of medical, legal, or financial advice that could lead to harm.

7. Privacy

Safeguards sensitive personal information, ensuring it isn’t disclosed or misused.

8. Intellectual Property

Monitors for potential intellectual property infringements, protecting creators’ rights.

9. Indiscriminate Weapons

Flags content related to the creation or use of weapons like chemical or nuclear arms.

10. Hate Speech

Detects content demeaning individuals or groups based on sensitive characteristics.

11. Suicide & Self-Harm

Monitors content encouraging acts of self-harm or disordered behaviors.

12. Sexual Content

Identifies erotica or inappropriate sexual content that violates platform policies.

13. Code Interpreter Abuse (Tool-Specific)

Addresses misuse scenarios such as denial-of-service attacks or exploitative behaviors in tool integrations.
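For teams that consume these labels in their own code, the categories above could be represented as an enumeration. This is a minimal Python sketch: the `Hazard` class, its member names, and the `is_tool_specific` helper are hypothetical and are not part of any Requesty or MLCommons API.

```python
from enum import Enum


class Hazard(Enum):
    # Member names are illustrative; values mirror the headings above.
    VIOLENT_CRIMES = "Violent Crimes"
    NON_VIOLENT_CRIMES = "Non-Violent Crimes"
    SEX_RELATED_CRIMES = "Sex-Related Crimes"
    CHILD_SEXUAL_EXPLOITATION = "Child Sexual Exploitation"
    DEFAMATION = "Defamation"
    SPECIALIZED_ADVICE = "Specialized Advice"
    PRIVACY = "Privacy"
    INTELLECTUAL_PROPERTY = "Intellectual Property"
    INDISCRIMINATE_WEAPONS = "Indiscriminate Weapons"
    HATE_SPEECH = "Hate Speech"
    SUICIDE_AND_SELF_HARM = "Suicide & Self-Harm"
    SEXUAL_CONTENT = "Sexual Content"
    CODE_INTERPRETER_ABUSE = "Code Interpreter Abuse"


# The one category described above as tool-specific.
TOOL_SPECIFIC = {Hazard.CODE_INTERPRETER_ABUSE}


def is_tool_specific(hazard: Hazard) -> bool:
    """True for categories that only apply when the model can use tools."""
    return hazard in TOOL_SPECIFIC


standard = [h for h in Hazard if not is_tool_specific(h)]
print(len(standard), len(Hazard))
```

Keeping the tool-specific category in a separate set makes it easy to skip that check for deployments that do not expose tools.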

How Our Models Work

Requesty’s content moderation models, trained on the MLCommons hazard taxonomy, are optimized for multilingual and tool-specific use cases. They are built to achieve high accuracy while minimizing false positives, and they offer three main capabilities:

  1. Fine-Tuned Multilingual Moderation: Supports eight languages, including English, French, and Hindi.

  2. Tool Use Detection: Identifies abusive behaviors in tools like code interpreters.

  3. Real-Time Monitoring: Provides instant feedback on safety violations using live metrics and alerts.
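One simple way to turn per-request breach flags into a live alert is a sliding-window rate check. The Python sketch below illustrates that general technique only; the `BreachRateMonitor` class, the window size, and the threshold are assumptions for this example, not a description of Requesty's monitoring internals.

```python
from collections import deque


class BreachRateMonitor:
    """Tracks the share of breaches among the most recent `window` requests."""

    def __init__(self, window: int = 100, threshold: float = 0.05):
        # deque with maxlen drops the oldest event once the window is full.
        self.events = deque(maxlen=window)
        self.threshold = threshold

    def record(self, is_breach: bool) -> bool:
        """Record one request; return True if the windowed rate exceeds the threshold."""
        self.events.append(is_breach)
        rate = sum(self.events) / len(self.events)
        return rate > self.threshold


monitor = BreachRateMonitor(window=10, threshold=0.2)
stream = [False, False, True, False, True, True]
alerts = [monitor.record(flag) for flag in stream]
print(alerts)
```

A windowed rate reacts to bursts of violations while ignoring isolated ones; in practice the window and threshold would need tuning per deployment.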

For a comprehensive guide to the MLCommons hazards, visit:

Future-Proofing AI Governance

As organizations scale their AI operations, reliable safety and compliance solutions are essential. Requesty provides solutions designed to navigate the complexities of modern AI governance, helping businesses to deploy ethical and effective AI systems.

Ready to transform your AI governance strategy? Visit requesty.ai to explore our solutions.

© Requesty Ltd 2024