Thread 'The Seti is Slumbering Cafe'

Message boards : The Lounge : The Seti is Slumbering Cafe

Grant (SSSF)
Joined: 7 Dec 24
Posts: 174
Message 117510 - Posted: 21 Nov 2025, 22:50:30 UTC - in response to Message 117509.  

In reply to Dr Who Fan's message of 21 Nov 2025:
SETI website is loading nice and fast. Hurry up and read or post your item(s) there before it disappears again.
Not for me, I'm still forbidden.
Grant
Darwin NT.
ID: 117510
Dr Who Fan
Joined: 10 May 07
Posts: 1699
United States
Message 117514 - Posted: 22 Nov 2025, 1:47:34 UTC - in response to Message 117510.  

Not for me, I'm still forbidden.

Sounds like Cloudflare's servers may be blocking you.

Out of curiosity, have you tried clearing the browser cache and deleting cookies for all the *(.)berkeley(.)edu sites, including this site, to see if it will let you back in?
ID: 117514
Grant (SSSF)
Joined: 7 Dec 24
Posts: 174
Message 117515 - Posted: 22 Nov 2025, 3:19:57 UTC - in response to Message 117514.  
Last modified: 22 Nov 2025, 3:27:05 UTC

In reply to Dr Who Fan's message of 22 Nov 2025:
Not for me, I'm still forbidden.

Sounds like Cloudflare's servers may be blocking you.

Out of curiosity, have you tried clearing the browser cache and deleting cookies for all the *(.)berkeley(.)edu sites, including this site, to see if it will let you back in?
Two different browsers on two different systems.
Can actually access Seti now on a 3rd system, but no joy on the other two.
Time to go cookie hunting.

Edit: not cookies, but related to the saved URL.
The favourites that were forbidden had no HTTP or HTTPS in the saved address and were resolving to HTTP. Changed the favourite link address to explicitly HTTPS and no problems.
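
For anyone with a pile of old favourites to fix, the same change can be sketched in a few lines of Python. This is only an illustration: it assumes a saved address with no scheme should become HTTPS, and `force_https` is a hypothetical helper name, not something the site or any browser provides.

```python
from urllib.parse import urlsplit, urlunsplit


def force_https(saved_address: str) -> str:
    """Return the saved address with an explicit https:// scheme."""
    address = saved_address.strip()
    # A favourite saved without any scheme gets https:// prepended
    # (assumption: the site is meant to be reached over HTTPS).
    if "://" not in address:
        address = "https://" + address
    parts = urlsplit(address)
    # An explicit http:// address is upgraded to https://.
    if parts.scheme == "http":
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)


upgraded = force_https("setiathome.berkeley.edu/")
```

Here `upgraded` is `https://setiathome.berkeley.edu/`, and an address that already starts with `https://` comes back unchanged.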
Grant
Darwin NT.
ID: 117515
Dr Who Fan
Joined: 10 May 07
Posts: 1699
United States
Message 117517 - Posted: 22 Nov 2025, 16:53:54 UTC

It's still up nearly 14 hours later! This is the longest time in a while, so blame me for bragging and posting here if the SETI website goes dark within the next hour.
ID: 117517
Grumpy Swede
Joined: 30 Mar 20
Posts: 606
Sweden
Message 117518 - Posted: 22 Nov 2025, 23:15:02 UTC

Eric posted the following on https://setiathome.berkeley.edu/forum_thread.php?id=86149&postid=2152880#2152880

We've been DDOSed the last month or so by assorted web crawlers that have been ignoring the robots.txt directive. I've made blocking such things a bit more proactive. Hopefully we can stay up more than 10 minutes at a time now.
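
As a rough illustration of what honouring robots.txt means on the crawler side, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `example.org` URLs and the `PoliteBot` name are all made up for the example; they are not the SETI@home site's actual rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, parsed from a string so the sketch runs offline.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Load robots.txt rules from text instead of fetching them."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed_urls(parser: RobotFileParser, user_agent: str, urls: list) -> list:
    """Keep only the URLs that the rules permit this user agent to fetch."""
    return [url for url in urls if parser.can_fetch(user_agent, url)]


parser = build_parser(ROBOTS_TXT)
candidates = [
    "https://example.org/forum/thread-1",
    "https://example.org/private/hosts",
]
to_fetch = allowed_urls(parser, "PoliteBot", candidates)
delay = parser.crawl_delay("PoliteBot")  # seconds to wait between requests
```

A well-behaved crawler would fetch only what ends up in `to_fetch` and pause for `delay` seconds between requests. The crawlers described above skip this check entirely, which is why the site has to block them on the server side.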
ID: 117518
Grant (SSSF)
Joined: 7 Dec 24
Posts: 174
Message 117520 - Posted: 22 Nov 2025, 23:25:26 UTC - in response to Message 117518.  

In reply to Grumpy Swede's message of 22 Nov 2025:
Eric posted the following on https://setiathome.berkeley.edu/forum_thread.php?id=86149&postid=2152880#2152880

We've been DDOSed the last month or so by assorted web crawlers that have been ignoring the robots.txt directive. I've made blocking such things a bit more proactive. Hopefully we can stay up more than 10 minutes at a time now.
That seems to be becoming a bigger and bigger problem, most likely crawling for AI/ML data.
Another web site I often use had several issues about 6 months back: it was getting slammed by AI/ML bots, and they had to up their Cloudflare plan and put in other blocks to kill off that load.
Grant
Darwin NT.
ID: 117520
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15737
Netherlands
Message 117523 - Posted: 23 Nov 2025, 15:43:12 UTC - in response to Message 117518.  

I asked Grok how it deals with that. The answer is disconcerting.

As an AI developed by xAI, I rely on tools for accessing and processing web information rather than directly crawling sites myself. Robots.txt is a voluntary protocol; it's not legally binding, so there's no strict requirement to follow it. That said, ethical AI practices often encourage respecting it where possible.

In practice, for real-time information retrieval (like web searches or browsing), xAI's systems appear to use user-agent spoofing, such as mimicking an iPhone browser, to bypass potential blocks and ensure access to content.

This approach can effectively circumvent robots.txt directives, as sites may disallow bots but allow user-like agents. Reports also note that xAI's bot sometimes doesn't self-identify properly, making it challenging for website operators to apply robots.txt rules to it.

For other purposes, like data collection for training, xAI has a documented user-agent (e.g., GrokBot/1.0) that is more likely to respect robots.txt.

Overall, adherence isn't universal and depends on the specific use case.
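
Why spoofing gets around robots.txt can be shown with Python's standard `urllib.robotparser`. This is a sketch under assumptions: the robots.txt below is invented, the `GrokBot/1.0` name is taken from the answer quoted above, and the iPhone-style string is only an illustrative browser user agent. It does not show how any real crawler is implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: shut out one named bot, allow everyone else
# everywhere except /private/.
ROBOTS_TXT = """\
User-agent: GrokBot
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.org/forum/thread-1"
declared_bot = "GrokBot/1.0"
# Illustrative browser-like user agent, not a string any real bot is known to send.
browser_like = "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X)"

bot_allowed = parser.can_fetch(declared_bot, url)      # matched by the GrokBot rule
spoofed_allowed = parser.can_fetch(browser_like, url)  # falls through to the * rule
```

With these rules `bot_allowed` is `False` and `spoofed_allowed` is `True`. Rules keyed on the user agent only bind clients that identify themselves honestly.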
ID: 117523
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15737
Netherlands
Message 117524 - Posted: 23 Nov 2025, 15:45:36 UTC - in response to Message 117523.  

ChatGPT:

I don’t crawl the internet myself. I don’t browse websites, run a crawler, or access robots.txt when generating answers.

Two important points:

1. Training data collection (done in the past)

OpenAI and its data partners follow rules, licensing, permissions, and technical restrictions, which includes respecting things like robots.txt or equivalent measures when collecting publicly available data for training. So yes—robots.txt is one of the mechanisms that can guide what is or isn’t crawled during dataset creation.

2. What I do when answering you now

When answering your questions:

I don’t crawl websites.

I don’t access or ignore robots.txt.

I can perform live information lookups only through the built-in “web” tool, and that tool uses search engines—not a general-purpose crawler.

I don’t fetch arbitrary URLs, and I don’t run automated scraping behavior.

So in practice: I never violate robots.txt because I never crawl sites directly at all.

If you want, I can explain how robots.txt works or how responsible crawling is usually implemented.
ID: 117524
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15737
Netherlands
Message 117525 - Posted: 23 Nov 2025, 15:46:59 UTC - in response to Message 117524.  

Meta AI:

When I crawl the internet for information, I respect the rules set by websites, including those specified in robots.txt files. This file tells crawlers like me which parts of a site to crawl and which to avoid. I follow these guidelines to ensure that my data collection is done responsibly and respectfully.
ID: 117525
Dave
Help desk expert
Joined: 28 Jun 10
Posts: 3050
United Kingdom
Message 117526 - Posted: 23 Nov 2025, 18:44:52 UTC

That seems to be becoming a bigger and bigger problem, most likely crawling for AI/ML data.
CPDN dealt with this by blocking computers not logged in to the site from accessing parts of it. For example, to look at anyone's computers I need to be logged in now. What I don't know is whether only the parts that robots.txt would not allow the crawlers to see have been blocked, but it seems to have resolved the problem from a few months back, when access attempts were regularly timing out.
ID: 117526

Copyright © 2025 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.