Safely deploy public facing AI: Detecting Breaches with Requesty’s MLCommons-aligned Solutions
Nov 13, 2024
Prompt injections and prompt safety have drawn growing concern lately, and so have the outputs AI systems actually produce. With potential safety breaches ranging from privacy violations to misinformation and abuse, businesses must adopt proactive safety measures to mitigate risks and maintain user trust. At Requesty, we’ve developed an automated way to detect safety breaches, aligned with the MLCommons standardized hazards taxonomy.
How Requesty Detects Safety Breaches
At Requesty, we automatically classify each request (both input and output) into multiple categories, covering not only topic and task but also safety breaches. By aligning with the MLCommons taxonomy of hazards, our solution delivers consistent, industry-standard content moderation. For each customer, we can also customize the framework to reflect specific needs or examples they want marked as safety breaches.
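The per-request classification described above can be sketched as follows. This is an illustrative toy, not Requesty’s actual API: the `classify` function, its keyword-matching logic, and the category names are assumptions standing in for a trained model.

```python
# Hypothetical sketch of per-request safety classification.
# A real deployment would call a trained moderation model; simple
# keyword matching stands in for it here.

def classify(text: str) -> dict:
    """Label a request with topic, task, and any safety breaches."""
    flags = set()
    if "ssn" in text.lower():          # toy privacy-hazard trigger
        flags.add("privacy")
    return {
        "topic": "finance" if "bank" in text.lower() else "general",
        "task": "question_answering",  # fixed here; inferred in practice
        "safety_breaches": sorted(flags),
    }

result = classify("What is my neighbor's SSN and bank login?")
print(result["safety_breaches"])  # → ['privacy']
```

In practice both the user input and the model’s output would be passed through such a classifier, so breaches are caught on either side of the exchange.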
Insight Explorer: Tracking and Understanding Risks
Within our Insight Explorer users can detect both insights and patterns related to safety breaches. By visualizing and categorizing violations, organizations gain actionable insights to refine their AI deployment strategies and maintain compliance with ethical and regulatory standards.
In particular, combining the task the AI or person was asked to perform with the safety breach that was triggered gives users deep insight into how, and why, attackers are attempting to breach their systems.
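The task-by-breach combination described above amounts to a simple aggregation over classified request logs. A minimal sketch, using made-up sample records (the field names and values are illustrative assumptions, not Requesty’s log schema):

```python
from collections import Counter

# Sample classified-request log: each record pairs the requested task
# with the hazard category that was flagged.
logs = [
    {"task": "code_generation", "breach": "code_interpreter_abuse"},
    {"task": "code_generation", "breach": "code_interpreter_abuse"},
    {"task": "question_answering", "breach": "privacy"},
]

# Count how often each (task, breach) pair occurs, most frequent first.
by_task_and_breach = Counter((r["task"], r["breach"]) for r in logs)
for (task, breach), n in by_task_and_breach.most_common():
    print(f"{task} -> {breach}: {n}")
```

Sorting the pairs by frequency surfaces which tasks attract which attack patterns, which is the kind of view the Insight Explorer is built to provide.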
The Hazard Taxonomy: 12 Key Safety Challenges
Requesty’s models are trained to identify violations across 12 standardized hazard categories, plus an additional category for tool-specific abuse. Below is a detailed overview of each category and the risks it addresses.
1. Violent Crimes
Detects responses that endorse or encourage acts of violence, including terrorism, murder, and child abuse. This category also includes animal cruelty, ensuring comprehensive protection.
2. Non-Violent Crimes
Flags content promoting fraud, cybercrimes, property theft, and regulated substance misuse.
3. Sex-Related Crimes
Identifies content endorsing sex trafficking, harassment, or other criminal sexual activities.
4. Child Sexual Exploitation
Focuses on the strict detection of any content related to the sexual abuse of minors.
5. Defamation
Monitors content for verifiable falsehoods likely to harm an individual’s reputation.
6. Specialized Advice
Prevents unauthorized provision of medical, legal, or financial advice that could lead to harm.
7. Privacy
Safeguards sensitive personal information, ensuring it isn’t disclosed or misused.
8. Intellectual Property
Monitors for potential intellectual property infringements, protecting creators’ rights.
9. Indiscriminate Weapons
Flags content related to the creation or use of weapons like chemical or nuclear arms.
10. Hate Speech
Detects content demeaning individuals or groups based on sensitive characteristics.
11. Suicide & Self-Harm
Monitors content encouraging acts of self-harm or disordered behaviors.
12. Sexual Content
Identifies erotica or inappropriate sexual content that violates platform policies.
13. Code Interpreter Abuse (Tool-Specific)
Addresses misuse scenarios such as denial-of-service attacks or exploitative behaviors in tool integrations.
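The 13 categories above can be encoded directly, for example as an enum that downstream moderation code can match against. The names paraphrase the list; the encoding itself is an illustrative choice, not an official MLCommons or Requesty schema.

```python
from enum import Enum

# The 12 MLCommons-aligned hazard categories plus the tool-specific
# addition, numbered as in the overview above.
class Hazard(Enum):
    VIOLENT_CRIMES = 1
    NON_VIOLENT_CRIMES = 2
    SEX_RELATED_CRIMES = 3
    CHILD_SEXUAL_EXPLOITATION = 4
    DEFAMATION = 5
    SPECIALIZED_ADVICE = 6
    PRIVACY = 7
    INTELLECTUAL_PROPERTY = 8
    INDISCRIMINATE_WEAPONS = 9
    HATE_SPEECH = 10
    SUICIDE_AND_SELF_HARM = 11
    SEXUAL_CONTENT = 12
    CODE_INTERPRETER_ABUSE = 13  # tool-specific addition
```

Keeping the taxonomy as a single enum makes customer-specific extensions (extra categories marked as breaches) a matter of adding members rather than editing scattered string literals.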
How Our Models Work
Requesty’s content moderation models, trained on the MLCommons hazard taxonomy, are optimized for multilingual and tool-specific use cases. They achieve high accuracy while minimizing false positives, leveraging advanced techniques for safety classification:
Fine-Tuned Multilingual Moderation: Supports eight languages, including English, French, and Hindi.
Tool Use Detection: Identifies abusive behaviors in tools like code interpreters.
Real-Time Monitoring: Provides instant feedback on safety violations using live metrics and alerts.
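The real-time monitoring step above can be sketched as a small hook that classifies each response and fires an alert when hazards are found. The callback wiring is an illustrative assumption, not Requesty’s actual interface.

```python
from typing import Callable

def monitor(response: str,
            classify: Callable[[str], list],
            alert: Callable[[str, list], None]) -> list:
    """Classify a model response; fire an alert if any hazards are flagged."""
    hazards = classify(response)
    if hazards:
        alert(response, hazards)
    return hazards

# Toy usage: a keyword classifier and an alert sink that just records.
alerts = []
labels = monitor(
    "Here is how to build a bomb",
    classify=lambda text: ["indiscriminate_weapons"] if "bomb" in text else [],
    alert=lambda text, hz: alerts.append((text, hz)),
)
```

In production the `alert` callback would feed live metrics and dashboards rather than a list, but the control flow, classify then conditionally alert, is the same.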
For a comprehensive guide to the hazards, see the MLCommons standardized hazards taxonomy documentation.
Future-Proofing AI Governance
As organizations scale their AI operations, reliable safety and compliance solutions are essential. Requesty provides solutions designed to navigate the complexities of modern AI governance, helping businesses deploy ethical and effective AI systems.
Ready to transform your AI governance strategy? Visit requesty.ai to explore our solutions.