
AI Content Filter

Machine-learning-powered moderation that automatically detects and removes toxic and vulgar content in real time.

Overview

The AI Content Filter uses an external ML API (powered by toxic-bert or similar models) to analyze every message for harmful content. It returns toxicity and vulgarity scores between 0.0 and 1.0.

When the score exceeds your configured sensitivity threshold, the bot automatically takes action based on your settings.
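The decision above can be sketched in a few lines. This is an illustrative sketch, not the bot's actual code: the function and constant names are assumptions; only the score range (0.0-1.0) and the threshold comparison come from the docs.

```python
# Hypothetical moderation check: act when either score crosses the
# configured sensitivity threshold.
SENSITIVITY = 0.7  # the "Medium" (recommended) threshold

def exceeds_threshold(scores: dict, threshold: float = SENSITIVITY) -> bool:
    """Return True when the toxic or vulgar score reaches the threshold."""
    return max(scores.get("toxic", 0.0), scores.get("vulgar", 0.0)) >= threshold
```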

What It Detects

Toxic Content

  • Aggressive/insulting language
  • Harassment and bullying
  • Hate speech
  • Discriminatory messages
  • Threats and intimidation

Vulgar Content

  • Profanity and swear words
  • Obscene language
  • Explicit content
  • Sexual references

Mini App Configuration

Open /app → Settings → Content Filter (AI)

Enable Content Filter

Toggle to enable/disable the AI moderation system.

Action on Violation

Choose what happens when harmful content is detected:

  Action   Behavior
  delete   Delete message only
  warn     Delete + warn user
  mute     Delete + mute user (configurable duration)
  ban      Delete + permanently ban user
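The four actions can be read as an escalation ladder where every action starts with deleting the message. A minimal sketch, assuming hypothetical handler names (only the four action keys come from the table):

```python
# Illustrative mapping from the configured action to moderation steps.
def apply_action(action: str, mute_minutes: int = 30) -> list[str]:
    """Return the sequence of moderation steps implied by each action."""
    steps = ["delete"]  # every action deletes the offending message first
    if action == "warn":
        steps.append("warn")
    elif action == "mute":
        steps.append(f"mute:{mute_minutes}m")  # duration is configurable
    elif action == "ban":
        steps.append("ban")
    return steps
```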

Sensitivity

Controls how strict the filter is. Higher values = fewer false positives but may miss some content:

  Level      Threshold   Description
  Low        0.5         Catches more content, more false positives
  Medium     0.7         Balanced (recommended)
  High       0.85        Only obvious violations
  Very High  0.95        Extreme content only
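In code, the levels reduce to a lookup of threshold values. The threshold numbers below come from the table; the dictionary and function names are illustrative:

```python
# Threshold for each sensitivity level (values taken from the table above).
SENSITIVITY_THRESHOLDS = {
    "low": 0.5,
    "medium": 0.7,
    "high": 0.85,
    "very_high": 0.95,
}

def is_violation(score: float, level: str = "medium") -> bool:
    """A score at or above the level's threshold counts as a violation."""
    return score >= SENSITIVITY_THRESHOLDS[level]
```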

Mute Duration

When action is set to "mute", how long the user is muted:

  • 5 minutes
  • 30 minutes
  • 1 hour
  • 24 hours

Log Violations

When enabled, all detected violations are logged to your mod logs channel for review.

Bot Commands

  /contentfilter on       Enable the AI content filter
  /contentfilter off      Disable the AI content filter
  /contentfilter status   Check current settings

Technical Details

  • API Endpoint: External ML server (FastAPI + toxic-bert)
  • Timeout: 2 seconds per request (fails silently on timeout)
  • Minimum text length: 2 characters
  • Language support: Optimized for English, basic support for other languages
  • Response format: Returns toxic (0.0-1.0) and vulgar (0.0-1.0) scores
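Putting the details above together, a client call might look like the following sketch. The endpoint URL and JSON payload shape are assumptions; the 2-second timeout, 2-character minimum, fail-open behavior, and response fields come from the list above.

```python
import json
import urllib.request
import urllib.error

API_URL = "http://localhost:8000/score"  # hypothetical ML server endpoint

def score_text(text: str, timeout: float = 2.0) -> dict:
    """Ask the ML server for toxicity scores, failing open on any error.

    Zero scores mean the message is allowed through, matching the
    documented fail-open behavior during outages or timeouts.
    """
    if len(text) < 2:  # below the minimum text length, skip the API call
        return {"toxic": 0.0, "vulgar": 0.0}
    payload = json.dumps({"text": text}).encode("utf-8")
    req = urllib.request.Request(
        API_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return json.load(resp)
    except (OSError, ValueError):  # timeouts, connection errors, bad JSON
        return {"toxic": 0.0, "vulgar": 0.0}  # fail-open
```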

FAQ

How does it handle false positives?

The ML model is trained to minimize false positives. If legitimate messages are being flagged, increase the sensitivity threshold (for example, from Medium to High) to reduce false detections. We recommend starting with "Medium" (0.7).

What happens if the ML server is down?

If the API times out or returns an error, messages are allowed through (fail-open). This prevents blocking legitimate messages during outages.

Can I whitelist certain words?

Currently, the AI filter uses ML-based detection and doesn't support per-word whitelisting. Use the sensitivity setting to adjust strictness.