SourceHut, an open-source git-hosting service, is experiencing severe disruptions caused by excessive traffic from AI web crawlers. These large language model (LLM) crawlers put significant strain on its services, prompting SourceHut to deploy mitigations such as Nepenthes to limit their impact. In addition to these measures, some of which might unintentionally hinder end users, SourceHut has blocked several cloud providers, including Google Cloud and Azure, because of the high volume of bot traffic coming from their networks. The open-source community, which often bears the burden of such crawling, has previously raised similar concerns about its impact on service availability.
"SourceHut continues to face disruptions due to aggressive LLM crawlers. We are continuously working to deploy mitigations. We have deployed a number of mitigations which are keeping the problem contained for now."
"We have unilaterally blocked several cloud providers, including GCP and Azure, for the high volumes of bot traffic originating from their networks, advising service administrators to arrange exceptions."