How to Get Your Site Indexed (or Not)
How To Get a Site Indexed: Have your site pages indexed by the Google Enterprise Search Appliance
The Cal Poly Google search service automatically indexes all pages linked from the list of Cal Poly domains. There is no need to submit your pages to be indexed as long as your site is within one of these domains and is linked from a page within one of these domains. That's it! It's that easy!
Your pages are easier for Web visitors to find in a Cal Poly search if you follow the tips and guidelines for making pages more search friendly.
If you have made your Web site search engine friendly and your site still isn't showing up in the search results, review these quick checks to see if you can identify a fix:
- Make sure the site is hosted in a Web domain that is included in the list of crawled domains.
- Make sure you haven't added a robots.txt file or meta tag that prevents the Google robot from indexing your page.
- If these quick checks do not get your site indexed, review the additional checks listed in "Can't find a document that I know is on a Web server."
For additional information about making your site Google-friendly, please consult the tips and guidelines for making pages more search friendly.
How To Prevent Indexing: Keeping an entire Web site, a page, or specific content on a page out of the Google index
In addition to understanding the basics of how the Google Search Appliance crawls and indexes Web sites, it is useful to know how to structure your site so that you can easily control what gets indexed and, if you so choose, what doesn't.
Site/folder Control - Crawler name for Robots.txt
Use the name csu-gsa-crawler in a robots.txt file if you want to prevent the CSU Google Search from crawling a site. For a complete discussion of how to keep a search engine from indexing an entire Web site or a folder, see the robots.txt section in the Web Authoring Resource Center.
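As a rough illustration of the robots.txt method, the sketch below uses Python's standard urllib.robotparser module to check what a given set of rules would block for the csu-gsa-crawler name. The two rule sets and the www.example.edu URLs are placeholders made up for this example, and the check only approximates how a well-behaved crawler reads robots.txt; it is not the Search Appliance's own logic.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents: keep the CSU crawler out of one folder.
FOLDER_RULES = """\
User-agent: csu-gsa-crawler
Disallow: /private/
"""

# Hypothetical robots.txt contents: keep the CSU crawler out of the whole site.
SITE_RULES = """\
User-agent: csu-gsa-crawler
Disallow: /
"""


def is_blocked(robots_txt, url, agent="csu-gsa-crawler"):
    """Return True if the given robots.txt text disallows `url` for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)
```

With these placeholder rules, is_blocked(FOLDER_RULES, "http://www.example.edu/private/notes.html") should report the page as blocked, while pages outside /private/ remain crawlable. Crawlers that use a different name are unaffected by either rule set, since the rules name only csu-gsa-crawler.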
Page Control - Keep a page out of the index
For a complete discussion of how to keep the Google Search Appliance from indexing an individual page, see the META Tag Robot Control section in the Web Authoring Resource Center.
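For a quick self-check, the sketch below scans a page for the standard robots META tag and reports whether it asks search engines not to index the page. It assumes the common form of the tag, a meta element with name="robots" and a content value such as "noindex"; the sample pages and the helper names are made up for illustration, and the Web Authoring Resource Center remains the reference for the exact tag to use.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots" ...> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        # The content value is a comma-separated list, e.g. "noindex, nofollow".
        for directive in attr.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def page_is_excluded(html):
    """Return True if the page carries a robots META tag with noindex (or none)."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives or "none" in parser.directives


# Hypothetical sample pages for illustration only.
EXCLUDED_PAGE = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
NORMAL_PAGE = '<html><head><title>Welcome</title></head></html>'
```

Here page_is_excluded(EXCLUDED_PAGE) is true and page_is_excluded(NORMAL_PAGE) is false, which is a quick way to confirm that a page you expect to see in search results does not carry a leftover exclusion tag.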
Content Control - Googleon/Googleoff Tags
The Googleon/Googleoff tags can be used within the HTML code of a document to tell the Search Appliance which portions of a Web page to index and which to skip. More information about these tags can be found in the Googleon/Googleoff section of the page Make a Search Engine Friendly Web Site.
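To give a feel for the effect of these tags, the sketch below removes everything between a googleoff comment and the matching googleon comment, leaving the markup that would still be eligible for indexing. It assumes the HTML comment form of the tags with the index option (and treats the all option the same way); the sample page and the function name are hypothetical, and this is only a simulation of the idea, not the Search Appliance's actual processing.

```python
import re

# Matches a googleoff comment, the content after it, and the next googleon comment.
# Only the "index" and "all" options are handled in this sketch.
EXCLUDED_REGION = re.compile(
    r"<!--\s*googleoff:\s*(?:index|all)\s*-->"
    r".*?"
    r"<!--\s*googleon:\s*(?:index|all)\s*-->",
    re.DOTALL | re.IGNORECASE,
)


def indexable_markup(html):
    """Return the page markup with googleoff/googleon regions removed."""
    return EXCLUDED_REGION.sub("", html)


# Hypothetical sample page for illustration only.
SAMPLE_PAGE = (
    "<p>Department overview</p>"
    "<!--googleoff: index-->"
    "<p>Navigation menu repeated on every page</p>"
    "<!--googleon: index-->"
    "<p>Contact information</p>"
)
```

Calling indexable_markup(SAMPLE_PAGE) drops the navigation paragraph and keeps the two paragraphs around it. Wrapping repeated navigation or footer text this way is a typical use, since that text otherwise matches searches on every page of a site.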