Thread 'The Seti is Slumbering Cafe'

Grant (SSSF) · Joined: 7 Dec 24 · Posts: 282
Message 117510 - Posted: 21 Nov 2025, 22:50:30 UTC - in response to Message 117509.
In reply to Dr Who Fan's message of 21 Nov 2025: "SETI website is loading nice and fast. Hurry up and read or post your item(s) there before it disappears again."
Not for me, I'm still forbidden.
Grant
Darwin NT

Dr Who Fan · Joined: 10 May 07 · Posts: 1830
Message 117514 - Posted: 22 Nov 2025, 1:47:34 UTC - in response to Message 117510.
"Not for me, I'm still forbidden."
Sounds like Cloudflare servers are possibly blocking you. Out of curiosity, have you tried clearing the browser cache and deleting cookies for all the *(.)berkeley(.)edu sites, including this site, to see if it will let you back in?

Grant (SSSF) · Joined: 7 Dec 24 · Posts: 282
Message 117515 - Posted: 22 Nov 2025, 3:19:57 UTC - in response to Message 117514. Last modified: 22 Nov 2025, 3:27:05 UTC
In reply to Dr Who Fan's message of 22 Nov 2025: "Not for me, I'm still forbidden." "Sounds like Cloudflare servers are possibly blocking you. Out of curiosity, have you tried clearing the browser cache and deleting cookies for all the *(.)berkeley(.)edu sites, including this site, to see if it will let you back in?"
Two different browsers on two different systems. I can actually access Seti now on a 3rd system, but no joy on the other two. Time to go cookie hunting.
Edit: not cookies, but related to the saved URL. The saved addresses that were forbidden had no mention of HTTP or HTTPS, so they were resolving to HTTP. I changed the favourite link address to explicitly use HTTPS, and now there are no problems.
Grant
Darwin NT
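Grant's fix amounts to making the scheme explicit in a saved address instead of letting a bare address fall back to HTTP. A minimal sketch of that idea, assuming the saved address is either bare (host and path only) or already starts with http:// or https://. The helper name force_https is hypothetical; it is not part of any browser or BOINC tool.

```python
from urllib.parse import urlsplit, urlunsplit


def force_https(address: str) -> str:
    """Return a saved address rewritten to use an explicit https:// scheme.

    Hypothetical helper for illustration only.
    """
    # A bare address like "setiathome.berkeley.edu/..." has no scheme, and
    # urlsplit() would treat the host as part of the path. Prefixing "//"
    # makes urlsplit() put the host in the netloc field instead.
    if not address.lower().startswith(("http://", "https://")):
        address = "//" + address
    parts = urlsplit(address)
    # Swap the scheme (http, https, or none) for https and keep the host,
    # path, query string and fragment unchanged.
    return urlunsplit(("https", parts.netloc, parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    print(force_https("setiathome.berkeley.edu/forum_thread.php?id=86149"))
    print(force_https("http://setiathome.berkeley.edu/"))
```

This only changes what the browser asks for; it does nothing on the server side, which matches the fix described above (editing the favourite link rather than clearing cookies).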

Dr Who Fan · Joined: 10 May 07 · Posts: 1830
Message 117517 - Posted: 22 Nov 2025, 16:53:54 UTC
It's still up nearly 14 hours later! This is the longest time in a while, so blame me for bragging and posting here if the SETI website goes dark within the next hour.

Grumpy Swede · Joined: 30 Mar 20 · Posts: 737
Message 117518 - Posted: 22 Nov 2025, 23:15:02 UTC
Eric posted the following on https://setiathome.berkeley.edu/forum_thread.php?id=86149&postid=2152880#2152880: "We've been DDOSed for the last month or so by assorted web crawlers that have been ignoring the robots.txt directive. I've made blocking such things a bit more proactive. Hopefully we can stay up more than 10 minutes at a time now."
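For anyone wondering what "ignoring the robots.txt directive" means in practice: a well-behaved crawler reads a site's robots.txt and checks each URL against it before requesting it. A minimal sketch using Python's standard urllib.robotparser module; the robots.txt contents here are made up for illustration and are not the rules setiathome.berkeley.edu actually serves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only; NOT the file actually
# served by setiathome.berkeley.edu.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /results.php
Disallow: /show_host_detail.php
"""


def polite_can_fetch(user_agent: str, url: str) -> bool:
    """Return True if the example robots.txt lets user_agent request url.

    A real crawler would load the live file with set_url() and read();
    here the example text is parsed directly so the sketch runs offline.
    """
    parser = RobotFileParser()
    parser.parse(EXAMPLE_ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for page in ("/forum_thread.php", "/results.php"):
        url = "https://setiathome.berkeley.edu" + page
        print(page, polite_can_fetch("ExampleCrawler/1.0", url))
```

The check is entirely voluntary: nothing stops a crawler from skipping it, which is why sites end up adding server-side blocks as well.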

Grant (SSSF) · Joined: 7 Dec 24 · Posts: 282
Message 117520 - Posted: 22 Nov 2025, 23:25:26 UTC - in response to Message 117518.
In reply to Grumpy Swede's message of 22 Nov 2025: "Eric posted the following on https://setiathome.berkeley.edu/forum_thread.php?id=86149&postid=2152880#2152880: We've been DDOSed for the last month or so by assorted web crawlers that have been ignoring the robots.txt directive. I've made blocking such things a bit more proactive. Hopefully we can stay up more than 10 minutes at a time now."
That seems to be becoming a bigger and bigger problem, most likely crawling for AI/ML data. Another website I often use had several issues about 6 months back: it was getting slammed with AI/ML bots, and they had to upgrade their Cloudflare plan and put in other blocks to kill off that load.
Grant
Darwin NT

Jord · Volunteer moderator · Help desk expert · Joined: 29 Aug 05 · Posts: 15919
Message 117523 - Posted: 23 Nov 2025, 15:43:12 UTC - in response to Message 117518.
I asked Grok how it deals with that. The answer is disconcerting:
"As an AI developed by xAI, I rely on tools for accessing and processing web information rather than directly crawling sites myself. Robots.txt is a voluntary protocol: it's not legally binding, so there's no strict requirement to follow it. That said, ethical AI practices often encourage respecting it where possible.
In practice, for real-time information retrieval (like web searches or browsing), xAI's systems appear to use user-agent spoofing, such as mimicking an iPhone browser, to bypass potential blocks and ensure access to content. This approach can effectively circumvent robots.txt directives, as sites may disallow bots but allow user-like agents. Reports also note that xAI's bot sometimes doesn't self-identify properly, making it challenging for website operators to apply robots.txt rules to it.
For other purposes, like data collection for training, xAI has a documented user-agent (e.g., GrokBot/1.0) that is more likely to respect robots.txt. Overall, adherence isn't universal and depends on the specific use case."
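Why spoofing the user-agent defeats robots.txt: the rules are grouped by the User-agent name the crawler reports, so a bot that presents a browser-like string is only matched by the catch-all "*" group. A minimal sketch using Python's standard urllib.robotparser, assuming a made-up robots.txt that bans a hypothetical bot called ExampleAIBot everywhere. The bot name and the rules are illustrative assumptions, not anything xAI or this site publishes.

```python
from urllib.robotparser import RobotFileParser

# Made-up rules: the hypothetical ExampleAIBot is banned everywhere,
# everyone else is only kept out of /results.php.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /results.php
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://setiathome.berkeley.edu/forum_thread.php"

# Honest self-identification: the ExampleAIBot group applies, so access is refused.
honest = parser.can_fetch("ExampleAIBot/1.0", url)

# Spoofed browser string: the bot-specific group no longer matches, so only
# the "*" group is consulted and the same page is allowed.
spoofed = parser.can_fetch(
    "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X)", url
)

print("honest:", honest)    # honest: False
print("spoofed:", spoofed)  # spoofed: True
```

The site operator never sees which of the two the crawler "really" is; robots.txt can only act on the name it is given, which is why blocking ends up being done on the server side instead.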

Jord · Volunteer moderator · Help desk expert · Joined: 29 Aug 05 · Posts: 15919
Message 117524 - Posted: 23 Nov 2025, 15:45:36 UTC - in response to Message 117523.
ChatGPT:
"I don't crawl the internet myself. I don't browse websites, run a crawler, or access robots.txt when generating answers.
Two important points:
1. Training data collection (done in the past)
OpenAI and its data partners follow rules, licensing, permissions, and technical restrictions, which includes respecting things like robots.txt or equivalent measures when collecting publicly available data for training. So yes, robots.txt is one of the mechanisms that can guide what is or isn't crawled during dataset creation.
2. What I do when answering you now
When answering your questions:
I don't crawl websites.
I don't access or ignore robots.txt.
I can perform live information lookups only through the built-in "web" tool, and that tool uses search engines, not a general-purpose crawler.
I don't fetch arbitrary URLs, and I don't run automated scraping behavior.
So in practice: I never violate robots.txt because I never crawl sites directly at all.
If you want, I can explain how robots.txt works or how responsible crawling is usually implemented."

Jord · Volunteer moderator · Help desk expert · Joined: 29 Aug 05 · Posts: 15919
Message 117525 - Posted: 23 Nov 2025, 15:46:59 UTC - in response to Message 117524.
Meta AI:
"When I crawl the internet for information, I respect the rules set by websites, including those specified in robots.txt files. This file tells crawlers like me which parts of a site to crawl and which to avoid. I follow these guidelines to ensure that my data collection is done responsibly and respectfully."

Dave · Help desk expert · Joined: 28 Jun 10 · Posts: 3323
Message 117526 - Posted: 23 Nov 2025, 18:44:52 UTC
"That seems to be becoming a bigger and bigger problem, most likely crawling for AI/ML data."
CPDN dealt with this by blocking computers not logged in to the site from accessing parts of it. For example, I now need to be logged in to look at anyone's computers. What I don't know is whether only the parts that robots.txt disallows for crawlers have been blocked, but it seems to have resolved the problem we had a few months back of access attempts regularly timing out.

Grumpy Swede · Joined: 30 Mar 20 · Posts: 737
Message 117538 - Posted: 24 Nov 2025, 15:38:55 UTC
Well, that didn't last long. S@H is getting very slow again from time to time.

Grumpy Swede · Joined: 30 Mar 20 · Posts: 737
Message 117559 - Posted: 26 Nov 2025, 20:10:37 UTC
And it's very slow to respond again.

Grumpy Swede · Joined: 30 Mar 20 · Posts: 737
Message 117570 - Posted: 27 Nov 2025, 21:09:09 UTC
It took 20 seconds to respond this time.

Grant (SSSF) · Joined: 7 Dec 24 · Posts: 282
Message 117574 - Posted: 28 Nov 2025, 7:32:57 UTC
Nice and quick to respond at the moment.
Grant
Darwin NT

Grumpy Swede · Joined: 30 Mar 20 · Posts: 737
Message 117575 - Posted: 28 Nov 2025, 15:21:18 UTC - in response to Message 117574.
In reply to Grant (SSSF)'s message of 28 Nov 2025: "Nice and quick to respond at the moment."
Well, that's over now. Back to ERR_CONNECTION_TIMED_OUT.

Dr Who Fan · Joined: 10 May 07 · Posts: 1830
Message 117576 - Posted: 28 Nov 2025, 16:02:14 UTC - in response to Message 117575.
And at least for now, it's loading with a fast response time. It seems Cloudflare proxy servers are causing problems for people and websites, as it's 'Black Friday' sales on the web in the USA.

Grant (SSSF) · Joined: 7 Dec 24 · Posts: 282
Message 117578 - Posted: 28 Nov 2025, 22:51:52 UTC - in response to Message 117575.
In reply to Grumpy Swede's message of 28 Nov 2025: "In reply to Grant (SSSF)'s message of 28 Nov 2025: Nice and quick to respond at the moment. Well, that's over now. Back to ERR_CONNECTION_TIMED_OUT."
It's loading for me if I give it a minute or so. Another minute or so to get the forums, then another to get a thread to load... I'm not going to risk an attempt to post something.
Grant
Darwin NT

Suzie-Q · Joined: 27 May 14 · Posts: 28
Message 117604 - Posted: 3 Dec 2025, 3:35:03 UTC
I give up. Every day I try to connect. Every day I immediately get a 403 Forbidden error.
8^)~~ ~~~~~

robsmith · Volunteer tester · Help desk expert · Joined: 25 May 09 · Posts: 1456
Message 117606 - Posted: 3 Dec 2025, 7:48:25 UTC - in response to Message 117604.
Make sure you are using "https" and not "http"; the net appears to have been tightened after the last round of problems.

Grant (SSSF) · Joined: 7 Dec 24 · Posts: 282
Message 117621 - Posted: 4 Dec 2025, 5:11:07 UTC - in response to Message 117604.
In reply to Suzie-Q's message of 3 Dec 2025: "I give up. Every day I try to connect. Every day I immediately get a 403 Forbidden error."
It just took me about 6 goes to be able to "Mark all threads as read". I kept getting "Forbidden. You don't have permission to access this resource. Apache/2.4.52 (Ubuntu) Server at setiathome.berkeley.edu Port 443" messages. That was after waiting 30-40 seconds for the site to load.
Grant
Darwin NT

Copyright © 2026 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.