# AI Content Filter

Machine-learning-powered moderation that automatically detects and removes toxic and vulgar content in real time.
## Overview

The AI Content Filter uses an external ML API (powered by toxic-bert or similar models) to analyze every message for harmful content. The API returns toxicity and vulgarity scores between 0.0 and 1.0.

When a score exceeds your configured sensitivity threshold, the bot automatically takes action based on your settings.
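The decision step can be sketched as a small function. The score field names (`toxic`, `vulgar`) follow the response format described below; the dictionary shape and threshold value here are illustrative, not the bot's actual internals:

```python
# "Medium" sensitivity, the recommended default (see the Sensitivity section).
SENSITIVITY = 0.7

def exceeds_threshold(scores: dict, threshold: float = SENSITIVITY) -> bool:
    """Return True when either score meets or crosses the configured sensitivity."""
    # Missing fields default to 0.0, i.e. "not harmful".
    return max(scores.get("toxic", 0.0), scores.get("vulgar", 0.0)) >= threshold

# A message scoring {"toxic": 0.91, "vulgar": 0.05} is actioned at Medium:
print(exceeds_threshold({"toxic": 0.91, "vulgar": 0.05}))  # True
```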
## What It Detects

### Toxic Content
- Aggressive or insulting language
- Harassment and bullying
- Hate speech
- Discriminatory messages
- Threats and intimidation
### Vulgar Content
- Profanity and swear words
- Obscene language
- Explicit content
- Sexual references
## Mini App Configuration

Open /app → Settings → Content Filter (AI)

### Enable Content Filter

Toggle to enable/disable the AI moderation system.

### Action on Violation

Choose what happens when harmful content is detected:
| Action | Behavior |
|---|---|
| delete | Delete message only |
| warn | Delete + warn user |
| mute | Delete + mute user (configurable duration) |
| ban | Delete + permanently ban user |
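The escalation in the table above follows one pattern: every action deletes the message, and warn/mute/ban add a follow-up step. A minimal sketch of that dispatch (step names are hypothetical, not the bot's internal identifiers):

```python
from enum import Enum

class Action(Enum):
    DELETE = "delete"
    WARN = "warn"
    MUTE = "mute"
    BAN = "ban"

def handle_violation(action: Action) -> list:
    """Return the ordered steps taken for a detected violation."""
    steps = ["delete_message"]  # deletion happens for every action
    if action is Action.WARN:
        steps.append("warn_user")
    elif action is Action.MUTE:
        steps.append("mute_user")   # duration comes from the Mute Duration setting
    elif action is Action.BAN:
        steps.append("ban_user")    # permanent ban
    return steps
```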
### Sensitivity

Controls how strict the filter is. A higher threshold means fewer false positives, but some harmful content may slip through:
| Level | Threshold | Description |
|---|---|---|
| Low | 0.5 | Catches more content, more false positives |
| Medium | 0.7 | Balanced (recommended) |
| High | 0.85 | Only obvious violations |
| Very High | 0.95 | Extreme content only |
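The presets above map directly to thresholds, so the same message score can be actioned at one level and allowed at another. A quick sketch (preset keys here are illustrative):

```python
# Preset thresholds from the table above.
SENSITIVITY_PRESETS = {
    "low": 0.5,
    "medium": 0.7,
    "high": 0.85,
    "very_high": 0.95,
}

def is_actioned(score: float, level: str) -> bool:
    """A message is actioned when its score meets or exceeds the active threshold."""
    return score >= SENSITIVITY_PRESETS[level]

# A score of 0.8 is actioned at Medium but allowed at High:
print(is_actioned(0.8, "medium"), is_actioned(0.8, "high"))  # True False
```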
### Mute Duration

When the action is set to "mute", this setting controls how long the user stays muted.
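A mute of a given duration amounts to computing an expiry timestamp from the configured length. A minimal sketch (the minutes-based unit is an assumption; adapt to whatever units the setting exposes):

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

def mute_until(minutes: int, now: Optional[datetime] = None) -> datetime:
    """Compute the UTC timestamp at which a mute of `minutes` length expires."""
    now = now or datetime.now(timezone.utc)
    return now + timedelta(minutes=minutes)

# A 60-minute mute starting at 2024-01-01 00:00 UTC expires at 01:00 UTC.
print(mute_until(60, datetime(2024, 1, 1, tzinfo=timezone.utc)))
```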
### Log Violations

When enabled, all detected violations are logged to your mod-logs channel for review.
## Bot Commands

| Command | Description |
|---|---|
| /contentfilter on | Enable the AI content filter |
| /contentfilter off | Disable the AI content filter |
| /contentfilter status | Check current settings |

## Technical Details
- API endpoint: external ML server (FastAPI + toxic-bert)
- Timeout: 2 seconds per request (fails silently on timeout)
- Minimum text length: 2 characters
- Language support: optimized for English, basic support for other languages
- Response format: returns `toxic` (0.0–1.0) and `vulgar` (0.0–1.0) scores
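The timeout, minimum-length, and fail-open rules can be sketched as a minimal client. The endpoint URL and JSON request shape are assumptions for illustration, not the bot's actual API:

```python
import json
import urllib.request

API_URL = "http://localhost:8000/analyze"  # hypothetical ML-server address

def get_scores(text: str) -> dict:
    """Query the ML server for toxicity scores; never block a message on failure."""
    if len(text.strip()) < 2:
        # Below the minimum text length: skip analysis entirely.
        return {"toxic": 0.0, "vulgar": 0.0}
    req = urllib.request.Request(
        API_URL,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=2) as resp:  # 2-second timeout
            return json.load(resp)
    except Exception:
        # Fail open: on timeout or error, return zero scores so the
        # message is allowed through (see the FAQ below).
        return {"toxic": 0.0, "vulgar": 0.0}
```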
## FAQ

### Will it produce false positives?

The ML model is trained to minimize false positives. If you notice issues, raise the sensitivity threshold to reduce false detections. We recommend starting with "Medium" (0.7).
### What happens if the ML server is down?

If the API times out or returns an error, messages are allowed through (fail-open). This prevents legitimate messages from being blocked during outages.
### Can I whitelist certain words?

Currently, the AI filter uses ML-based detection and doesn't support per-word whitelisting. Use the sensitivity setting to adjust strictness.