Enough of myth-busting! Today I will answer a few search engine optimization questions, some received from my NuttieZine subscribers and others picked up from public forums. Now, if you think my answers are not up to par, feel free to kick my butt by posting a comment below!
On to the Q’s and A’s:
Q1: Should I make sitemaps “noindex,nofollow”?
A1: Once upon a time, that is exactly what I used to do, based on the suggestion of a forum member who claimed to do just that to get “Google Juice”. The guy believed that if you don’t forbid search engines from indexing and following the sitemap links, you will suffer a duplicate content penalty.
Later I thought it over and realized that while forbidding search engines from indexing the sitemap seems alright (it is not a page worth indexing; there are more important pages that deserve higher indexing priority), “nofollowing” the whole sitemap is self-defeating. I mean, if you don’t want search engines to find and follow the links in your sitemap, then why create it at all?
So I changed the robots meta tag in the “head” section of my sitemap page to the following:
<meta name="robots" content="noindex, follow">
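If you want to double-check which directives a page really declares, a few lines of Python can read them back. This is only a minimal sketch: the class name `RobotsMetaParser`, the helper `robots_directives`, and the sample page are made up for illustration, and it looks only at the robots meta tag (not at robots.txt or HTTP headers).

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" ...> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # The name attribute is case-insensitive, so "ROBOTS" also counts.
        if (attr.get("name") or "").lower() == "robots":
            for part in (attr.get("content") or "").split(","):
                part = part.strip().lower()
                if part:
                    self.directives.add(part)


def robots_directives(html):
    """Return the set of robots directives declared in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical sitemap page using the tag discussed above.
page = '<html><head><meta name="ROBOTS" content="noindex, follow "></head></html>'
print(sorted(robots_directives(page)))  # ['follow', 'noindex']
```

If “nofollow” shows up in the result for your sitemap page, that is the setting worth changing.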
I use this free sitemap generator tool to create sitemaps, and my choice seems to be a good one because Google™ includes it in its list of recommended sitemap generators. From the generated sitemaps, I download only the .xml and .html files; the .xml file comes in handy for submitting my sitemap to Google, and the .html file is what I link to from my website’s homepage!
Far from suffering from a “duplicate content penalty” (which is, by the way, a myth), I have gotten more of my useful pages indexed in Google! It seems that while my sitemap page was set to “nofollow”, Google did not follow the links on that page and so was unable to index some of my pages (because there was no other way to find them except through the sitemap).
It is important to understand that search engines find your webpages in this manner:
Home page=>Internal and External Links in Homepage=>Search Engine Spider Follows these Links, Finds Out and Indexes the Other Pages!
Now, there might be some pages that you don’t link to from your homepage, and search engine robots would have no way to find them except through a sitemap. Therefore, the sitemap MUST be linked to from your homepage!
Sure, if you have a “duplicate sitemap” anywhere else on your website, or say, a page full of reciprocal links, you can safely forbid search engines from following and indexing them!
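The crawl path described above can be pictured with a tiny simulation. This is just a rough sketch of link-following, not how any real search engine works; the page names and the `crawl` function are hypothetical.

```python
from collections import deque


def crawl(links, start):
    """Return every page reachable from `start` by following links."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical site: /old-article is not linked from any regular page.
site = {
    "/": ["/about", "/blog"],
    "/blog": ["/blog/post-1"],
}
without_sitemap = crawl(site, "/")

# Same site, but the homepage now links to a sitemap that lists every page.
site_with_sitemap = dict(site)
site_with_sitemap["/"] = site["/"] + ["/sitemap.html"]
site_with_sitemap["/sitemap.html"] = ["/about", "/blog", "/blog/post-1", "/old-article"]
with_sitemap = crawl(site_with_sitemap, "/")

print("/old-article" in without_sitemap)  # False
print("/old-article" in with_sitemap)     # True
```

The orphaned page only becomes reachable once the homepage links to a sitemap whose links can be followed.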
Q2: When is the right time to add content to or build backlinks for my website? Should I wait for Google Pagerank Updates?
A2: Here is the real story about Google Pagerank Updates
So, it does not make much sense to bother yourself with it, does it? A Google PR update hardly means anything to real entrepreneurs; in my humble opinion, its scope and power are limited to the green bar you see on your Google toolbar!
Let your site’s pagerank go up or down, but you should keep adding content and backlinks consistently!
Yeah, yeah, there used to be a time when I regarded a pagerank boost as the greatest reward for my efforts, but no more. You live and learn, so to speak!
Q3: How do search engines see my website?
A3: There is a lot of speculation going around about this. Some people say that search engines can read JavaScript, Flash, multimedia, etc., while others recommend building only “text-based” websites! Since I am an ordinary guy and not an SEO guru, I prefer to go by what Google says in its webmaster guidelines. Google says that “Most spiders see your site much as Lynx would. If features such as JavaScript, cookies, session IDs, frames, DHTML, or Macromedia Flash keep you from seeing your entire site in a text browser, then spiders may have trouble crawling it.”
However, in this article Google includes “Flash” among the list of file-types it can index! Carefully read the following two paragraphs:
“In general, however, search engines are text based. This means that in order to be crawled and indexed, your content needs to be in text format. (Google can now index text content contained in Flash files, but other search engines may not.)
This doesn’t mean that you can’t include rich media content such as Flash, Silverlight, or videos on your site; it just means that any content you embed in these files should also be available in text format or it won’t be accessible to search engines. The examples below focus on the most common types of non-text content, but the guidelines are similar for any other types: Provide text equivalents for all non-text files.”
This article further stresses Google’s ability to index Flash files.
Now the question remains: should you add multimedia to your website or limit yourself to just text content? This question is probably best answered by this line: “Provide text equivalents for all non-text files”!
If you are still unsure, I think going the “trial and error” way is your best choice, especially if you cannot afford to have a “text-based” site.
BTW, if you don’t want to install the Lynx browser, there are two options:
a) Here is a free online tool that emulates the Lynx browser, and it is very easy to use: http://www.delorie.com/web/lynxview.html
The only catch is that you need to upload a special file called delorie.htm or delorie.gif to your web server to prove your ownership of the respective website. The file can be empty.
While it is perfectly understandable that the webmaster has done this to prevent abuse of his resources, I wish he had made it clear on the very first page.
b) Web Developer Toolbar for Firefox: With the help of this Firefox addon, you can see your site much as you would through Lynx. Here is how: you can disable non-textual content such as images, CSS, cookies, JavaScript, forms, popups, etc., from showing up in your browser (I don’t see a way to disable Flash content, but since I don’t use Flash on my websites, I have never felt the lack of it)!
What remains at the end is text content! Once you are happy with your website’s layout, you can re-enable all the disabled options!
I have used this addon with good results. I had originally downloaded it to use as an alternative to Lynx (I am terrified of installing any more software on my machine since there is already a boatload of it), but now I love it for more than one reason. Download it and you will know why!
This (along with Firebug) is probably the best friend of a web designer or programmer! It should be in everyone’s toolbox, irrespective of whether you are an expert web developer or a plain Nuttie Guru like me!
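For a very rough idea of what is left once scripts and styling are stripped away, here is a small sketch that keeps only the visible text of a page. It is not a Lynx emulator: the `TextOnlyParser` class, the `text_view` helper, and the sample page are my own illustration, and the sketch ignores things Lynx would show, such as image alt text and link targets.

```python
from html.parser import HTMLParser


class TextOnlyParser(HTMLParser):
    """Keeps visible text and drops the contents of script/style tags."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def text_view(html):
    """Return the text content of an HTML string, one chunk per line."""
    parser = TextOnlyParser()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


# Hypothetical page mixing text with script, style, and an image.
sample = """
<html><head><title>My Site</title>
<style>body { color: red; }</style></head>
<body><h1>Welcome</h1>
<script>document.write("Hidden from text browsers");</script>
<img src="logo.png" alt="Logo">
<p>Plain text content.</p></body></html>
"""
print(text_view(sample))
```

Anything written to the page only by the script never appears in the output, which is the point of “Provide text equivalents for all non-text files”.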
Q4: What are canonical URLs and what is their importance in SEO?
A4: A canonical URL is the single preferred URL for a page when several different URLs lead to the same content. Let me give you some examples:
The following URLs may not look different to you:
- www.abc.com
- http://abc.com
- http://abc.com/index.php
- www.abc.com/index.php
But search engines DO differentiate between them! Search engines, unlike humans, see them as four different URLs! When building your site links, it is important to decide on your site’s canonical URL (this in turn would influence your website’s internal linking structure as well as the links in your Google sitemap). For example, if you use links starting with http://yourdomain.com on your site, don’t expect the search engines to index your site as http://www.yourdomain.com; the reverse is also true!
It is also important that you remain consistent with your choice of canonical URL. For example, if you have decided to go without the “www” part, stay with that choice. Don’t have 50% of your site’s links start with “http://domain.com” and the other 50% with “http://www.domain.com”!
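To see how the four variants listed earlier collapse into one, here is a minimal sketch of a normalizer. The function name `canonicalize` and its parameters are hypothetical, and treating “index.php” as equivalent to the folder root is an assumption that depends on how your server is configured.

```python
from urllib.parse import urlsplit, urlunsplit


def canonicalize(url, use_www=True, index_files=("index.php", "index.html")):
    """Map the common variants of a URL onto one preferred form."""
    if "://" not in url:
        url = "http://" + url  # assume http when the scheme is missing
    parts = urlsplit(url)

    # Pick one host form: always with "www." or always without it.
    host = parts.netloc.lower()
    bare = host[4:] if host.startswith("www.") else host
    host = "www." + bare if use_www else bare

    # Assumption: the index file serves the same content as its folder.
    path = parts.path or "/"
    for name in index_files:
        if path.endswith("/" + name):
            path = path[: -len(name)]  # "/index.php" -> "/"
            break

    return urlunsplit((parts.scheme.lower(), host, path, parts.query, ""))


variants = [
    "www.abc.com",
    "http://abc.com",
    "http://abc.com/index.php",
    "www.abc.com/index.php",
]
print({canonicalize(u) for u in variants})  # {'http://www.abc.com/'}
```

Running your internal links and sitemap entries through something like this before publishing is one way to keep them consistent with the form you chose.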
But what if you have made the “mistake” of not using the “www” part in your URLs and now want to have it? There is an easy way out. Create a 301 redirect using your website’s .htaccess file, so that all traffic (human and search engine spiders) coming to http://domain.com would be automatically redirected to http://www.domain.com.
Just open your website’s .htaccess file (it is usually located in the root folder of your site, but if you cannot see it there, just set your FTP program to show all the hidden files of your server) and add the following lines to it:
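# Permanently (301) redirect domain.com to www.domain.com
Options +FollowSymLinks
RewriteEngine On
# Apply the rule only when the requested host is the bare domain
RewriteCond %{HTTP_HOST} ^domain\.com$ [NC]
RewriteRule ^(.*)$ http://www.domain.com/$1 [R=301,L]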
Obviously, you should replace domain.com with YOUR domain name!
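If the rewrite syntax looks cryptic, the intended behaviour of the redirect can be sketched in plain Python. This only mimics the logic for illustration; it is not how Apache evaluates rules internally, and the function name `redirect_target` is made up.

```python
import re


def redirect_target(host, path, domain="domain.com"):
    """Return the 301 target for a request, or None if no redirect applies."""
    # Condition: the requested host is the bare domain (case-insensitive).
    if not re.fullmatch(re.escape(domain), host, flags=re.IGNORECASE):
        return None
    # Rule: keep the requested path and send it to the "www" host.
    return "http://www." + domain + "/" + path.lstrip("/")


print(redirect_target("domain.com", "/about.html"))  # http://www.domain.com/about.html
print(redirect_target("www.domain.com", "/"))        # None
```

Requests that already use the “www” host are left alone, so the redirect cannot loop.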
In my opinion, PSPAD is one of the best editors for editing files such as .htaccess, .ini, etc.; it is what I mainly use for this purpose!
Once you are done editing, upload the .htaccess file back to your server!
Note that this type of redirection only works on Apache servers (typically Linux hosting) that have the mod_rewrite module enabled! If you are not sure, check with your web host first! Most standard web hosts have mod_rewrite enabled by default!