This solution is far from perfect and could be improved, automated, or extended in many ways. It took me only ten minutes to implement, though, and cut my web traffic by roughly 80%.
I simply use the ngx_http_map_module, which lets me define a variable whose value depends on the value of another variable.
    map $http_user_agent $blocked_user_agent {
        default      0;
        ~*amazonbot  1;
        ~*openai     1;
        ~*chatgpt    1;
        ~*gptbot     1;
        ~*claudebot  1;
    }
An nginx pattern of the form ~*term is a case-insensitive regular expression match; since the pattern is unanchored, it matches an occurrence of the string at any position in the user agent.
This map goes in the http section of my /etc/nginx/nginx.conf.
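For orientation, here is a minimal sketch of where the map sits in that file; the surrounding directives are placeholders, not my actual configuration:

    http {
        # the map must live at the http level, outside any server block,
        # so that every server can read $blocked_user_agent
        map $http_user_agent $blocked_user_agent {
            default 0;
            # ... bot patterns as above ...
        }

        server {
            # ... server blocks can now test $blocked_user_agent ...
        }
    }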
I simply checked my nginx access logs to find the crawlers that showed up most often in the user agent field.
To apply the filter, I run a simple if statement before serving any files or proxying in the server block. The error code I chose is 450, but really any code can be used.
For example:
    server {
        ....

        location / {
            # reject blocked crawlers before serving or proxying anything
            if ($blocked_user_agent) {
                return 450; # "Blocked by Windows Parental Controls"
            }
            try_files ...;
        }
    }
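A note on the design: nginx's if directive is famously error-prone inside location blocks, but a bare return is one of the few operations the nginx documentation lists as safe there, so this pattern sidesteps the usual pitfalls. If you would rather not send a response at all, nginx's non-standard code 444 closes the connection without replying, which saves a few more bytes per blocked request. To verify the filter, send a request with a spoofed user agent, e.g. curl -A "GPTBot" https://example.com/ (the domain is a placeholder for your own site), and check that the 450 comes back.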