
Block malicious user agents via nginx config

Bots are everywhere, and so are malicious agents. They crawl, they scrape, they read... and sometimes (or most of the time) they abuse.

A few times now, I've noticed increased CPU / RAM consumption on servers where I wouldn't expect any usage during off-peak hours, and after browsing through the access logs, I've found the cause to be certain bots crawling my sites over and over again.
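
If you want to check your own servers, here's a quick way to rank user agents by request count, assuming nginx's default combined log format and log path (both may differ on your setup):

# Rank user agents by request count in the access log
awk -F'"' '{print $6}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head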

This has been made worse, for example, by certain WordPress plug-ins that generate links with unique GET parameters on each page load, making these not-so-smart crawler bots get stuck in an endless loop. What a waste of bandwidth and compute resources.

To put an end to this, I've now blocked certain user agents by default from accessing any of the websites I host, directly in the nginx config. This ensures that malicious actors are stopped even before they can trigger any server-side execution or load static content. You can follow similar steps, or even swap the user agent for any other identifier you define, to block agents from accessing your servers.
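
As a side note on that last point, nginx's map directive works on any variable, not just the user agent. Here's a minimal sketch of the same technique applied to the referrer instead (spammy-site.example is a hypothetical domain):

# Blocking by referrer instead of user agent
map $http_referer $blocked_referer {
    default 0;
    ~*spammy-site\.example 1;
}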

Create a list of blocked user agents

Create a new file in /etc/nginx with any name you want and the following content.

I'll use blocked_user_agent.rules here:

map $http_user_agent $blocked_user_agent {
    # Requests are allowed by default
    default 0;

    # `~example` will match any user agent strings
    # that have `example` anywhere inside them.
    # The match is case-sensitive; use `~*example`
    # to also match `Example`, `EXAMPLE`, and so on.
    # Some examples:
    ~Amazonbot 1;
    ~*openai 1;
    ~*chatgpt 1;
    ~*gptbot 1;
}

This list can be as short or as long as you need, and you can change it (to add or remove blocked user agents) anytime.

Block requests based on $blocked_user_agent

Now that you have an easily accessible map of user agents, it's time to make this variable available to nginx and block unwanted requests.

On /etc/nginx/nginx.conf, add the following at the end of the http block:

http {
    # Skipping over content...
    # (...)

    # Include file with map of blocked user agents
    include /etc/nginx/blocked_user_agent.rules;
}
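
If you want to double-check that the file is actually being picked up, you can dump the full parsed config and search for the map (this assumes you have the privileges to run nginx commands):

# Dump the parsed config and look for the blocked user agents map
nginx -T | grep -A 3 blocked_user_agent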

Finally, in the config files for each of your sites (inside /etc/nginx/sites-enabled/), add the following inside the server block, before any location blocks (using if with only a return inside is one of the few safe uses of if in nginx):

server {
    server_name aitorres.com;

    # Blocking undesired user agents
    if ($blocked_user_agent) {
        # `444 No Response` is an nginx-specific status code
        # that closes the connection without sending a response.
        # You can choose to return other standard HTTP
        # status codes, like `404 Not Found` or `403 Forbidden`,
        # based on your needs
        return 444;
    }

    # Rest of your usual file, unchanged
}

One more thing: reload or restart your nginx server from your shell.

# Ensuring config is valid
nginx -t

# Reloading the server without downtime (you can choose to restart as well)
nginx -s reload
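
You can verify the block from your shell by spoofing a blocked user agent with curl (example.com stands in for one of your sites):

# With `return 444`, nginx closes the connection without replying,
# so curl reports an empty reply for blocked user agents
curl -I -A "GPTBot/1.0" https://example.com

# A regular request still goes through
curl -I https://example.com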

All done! Your server will now block these requests, and you should see reduced resource consumption. If you have access logs enabled for your server, you'll see the requests from blocked user agents logged with the HTTP status code you chose to return.

One final note: if you ever modify the list of blocked user agents, remember to reload or restart nginx for the changes to take effect.

This method is not infallible, as it relies on the bot (or the malicious agent behind it) consistently using the same user agent, but it's a start and only takes a couple of minutes to set up. Hopefully one day bots will stop misbehaving completely, but until then... ;-)