NoahBot: Website Access and Crawling Policy

Consistent with industry standards, our organization uses NoahBot to interact with public web content in support of AI development, user-directed search, and web content retrieval. We prioritize transparency and respect for website owners' preferences.

Below is information on how NoahBot works and how you can configure your site to manage access.


Bot Name

NoahBot

Purpose

NoahBot is used to:

  • Collect publicly available web content that may help improve the safety, relevance, and performance of our AI models.
  • Retrieve web content in response to user-directed prompts or searches.
  • Enhance search result quality by analyzing online content for indexing and ranking.

What Happens if You Disable NoahBot

When your website blocks NoahBot:

  • Your content will be excluded from consideration for future AI model training.
  • NoahBot will not be able to retrieve your pages in response to user queries.
  • Your content may not be indexed or factored into search result improvements, which could impact visibility.

Our Crawling Principles

  • Transparency: NoahBot identifies itself clearly and adheres to standard crawling protocols.
  • Non-Intrusiveness: We follow respectful crawling practices, including obeying Crawl-delay directives to reduce server load.
  • Respect for robots.txt: NoahBot fully respects robots.txt directives and will not access disallowed content.
  • No Circumvention: We do not bypass CAPTCHAs or other access-control mechanisms.
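
For example, a Crawl-delay directive can be added to your robots.txt to ask NoahBot to pause between requests. The value of 10 seconds below is illustrative; choose a delay that suits your server:

User-agent: NoahBot
Crawl-delay: 10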

Example: Blocking or Limiting NoahBot

To limit or block NoahBot’s access, modify your robots.txt file at the root of your domain and each relevant subdomain.

To Completely Block NoahBot:

User-agent: NoahBot
Disallow: /
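
To Limit NoahBot to Part of Your Site:

Rather than blocking NoahBot entirely, you can disallow specific paths. The /private/ directory below is a hypothetical example; substitute the paths you want to restrict:

User-agent: NoahBot
Disallow: /private/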

Please note: Blocking NoahBot by IP address is not recommended, as we do not publish IP ranges. We rely on standard robots.txt directives for compliance.
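
If you want to confirm how your directives will be interpreted before deploying them, you can check them locally with Python's standard-library robots.txt parser. This is a minimal sketch; the rules and URLs below are illustrative examples, not NoahBot's actual configuration:

```python
# Sketch: check robots.txt directives for NoahBot with Python's
# built-in parser. The rules and example.com URLs are hypothetical.
from urllib.robotparser import RobotFileParser

rules = [
    "User-agent: NoahBot",
    "Disallow: /private/",
    "Crawl-delay: 10",
]

parser = RobotFileParser()
parser.parse(rules)

# A path under the disallowed directory is blocked for NoahBot.
print(parser.can_fetch("NoahBot", "https://example.com/private/page"))

# Paths outside the disallowed directory remain accessible.
print(parser.can_fetch("NoahBot", "https://example.com/public/page"))

# The parsed Crawl-delay value is also available.
print(parser.crawl_delay("NoahBot"))
```

Running the sketch prints the allow/deny decision for each URL, letting you verify the rules match your intent before publishing the file.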