A Guide to Clean URLs for SEO and Usability

What is a Clean URL?

A clean URL is one that is easily read and does not contain any query strings or URL parameters.

Take a look at the URL below:

http://example.com/services/index.jsp?category=legal&id=patents

This URL does not describe the title or contents of the page at a glance. The fragments index.jsp?category= and &id= come from the script name and its URL parameters, and they give your URL an unclean look. Here is another version of the same URL:

http://example.com/services/legal/patents

Obviously the second example has a more straightforward, professional look and is more likely to be clicked on when shared on your Twitter or Facebook profile, or simply in a blog.

The Importance of Clean URLs:

Clean URLs are preferred by people, and it just so happens that search engines prefer them as well. Keywords in URLs are, more often than not, used to identify the relevance of a page when a search for a particular keyword is performed. However, it is not generally recommended to stuff keywords in your URLs for SEO purposes. The idea is to deliver enhanced usability to help users remember and share your URLs more easily. At the same time, clean URLs also facilitate a search engine’s ability to identify the relevance of page content to a particular search query, in order to choose what to display in search results. Indexing a clean URL is far easier than crawling and indexing a messy one. The screenshot below is of page-one search results for the keyword Legal Trademark. Notice that the URLs are generally clean and contain at least one if not both of these words.

First-Page Search Results Demonstrating Clean URLs

A clean URL also encourages higher click-through rates in search results. Take the examples shown above. Users are likely to click these URLs because they are easy to understand and seem highly relevant, especially the ones in which the keywords are highlighted prominently in the URL.

What are URL Parameters?

A URL parameter is the variable in a web address that appears after the question mark (?). These are also called query strings or query paths, and sometimes they are added in order to track a particular URL, in which case they are called tracking parameters. For instance, take a look at the URL below:

http://www.yoursite.com/?utm_source=book&utm_medium=text&utm_campaign=test

In this URL, utm_source, utm_medium and utm_campaign are tracking parameters (also called UTM parameters) that are usually associated with Google Analytics.
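To make the idea of a URL parameter concrete, here is a minimal Python sketch that pulls the parameters out of a URL and separates tracking parameters from the rest. The function name split_params is hypothetical, and the rule that any name starting with "utm_" counts as a tracking parameter is an assumption made for illustration.

```python
from urllib.parse import urlsplit, parse_qsl


def split_params(url):
    """Return (tracking, other) dicts of the query-string parameters in url."""
    # Everything after the "?" (and before any "#") is the query string.
    query = urlsplit(url).query
    tracking, other = {}, {}
    for name, value in parse_qsl(query, keep_blank_values=True):
        # Assumption: a name starting with "utm_" marks a tracking parameter.
        target = tracking if name.startswith("utm_") else other
        target[name] = value
    return tracking, other


tracking, other = split_params(
    "http://www.yoursite.com/?utm_source=book&utm_medium=text&utm_campaign=test"
)
print(tracking)
print(other)
```

For the sample URL, all three parameters land in the tracking dictionary and none in the other one.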

URL parameters are problematic in three particular places:

  • Home page URLs.
  • Duplicate inner pages.
  • Pagination.

Home Page URLs:

The home page is the most important page of your website. It is the most linked-to page, both internally and externally. With so many inbound and outbound links pointing to and from the home page, it is necessary to keep the URL clean. Sometimes alternate file names and URL parameters (as in the samples below) result in multiple URLs pointing to the home page. This not only creates duplicate content, it also distributes the page’s link juice, thus reducing its page rank.

These are different forms of the same home page URL:

yoursite.com/index.html
yoursite.com/home
yoursite.com/home.html
yoursite.com/page/Home/0,,1234,00.html

The situation is worse when the www and non-www versions of the home page URL create duplicate content. For instance, if someone links to your site as yoursite.com, they might be taken to the URL www.yoursite.com/index.html. These variations caused by URL parameters do not help search engines decide which URL needs to be presented in search results. In such cases, search engines like Google group the duplicate URLs into a cluster and select the best URL to display in search results. The URL chosen by the search engines may not be the URL that your fans are linking to. Thus, your link equity is shared by different versions of your home page URL, which dilutes the SEO of your website’s home page.

In order to avoid this:

  • Choose the URL you want to be the original. This will be your canonical URL. To ensure your website traffic lands on the canonical URL, add a 301 redirect to the duplicate home page URLs that have session ids and query strings.
  • If 301 redirects are not possible, add rel=canonical tags to the duplicate pages.
  • Never link to your home page using any URL other than your canonical one. This includes external sites, your blog page, your social media profiles, email signatures and anything else.
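The redirect step above can be sketched in Python as a small decision function. This is only an illustration of the logic, not a server configuration: the choice of http://www.yoursite.com/ as the canonical URL, the set of duplicate paths and the function name home_redirect are all assumptions.

```python
from urllib.parse import urlsplit

# Assumption: the www version with a bare "/" is chosen as the canonical home URL.
CANONICAL_HOME = "http://www.yoursite.com/"
HOME_HOSTS = {"yoursite.com", "www.yoursite.com"}
# Assumption: these paths all serve the home page.
HOME_PATHS = {"", "/", "/index.html", "/home", "/home.html"}


def home_redirect(url):
    """Return (301, canonical URL) for a duplicate home URL, else None."""
    # Accept scheme-less input such as "yoursite.com/home".
    parts = urlsplit(url if "://" in url else "http://" + url)
    if parts.netloc.lower() not in HOME_HOSTS:
        return None
    if parts.path not in HOME_PATHS:
        return None  # an inner page, not the home page
    if url == CANONICAL_HOME:
        return None  # already canonical, nothing to do
    # Query strings and session ids on the home page are dropped by the redirect.
    return (301, CANONICAL_HOME)


print(home_redirect("yoursite.com/index.html"))
print(home_redirect("http://www.yoursite.com/"))
```

Every duplicate form, including one that differs only by a session id in the query string, resolves to the same canonical URL, so links pointing at any of them end up on one page.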

Duplicate Inner Pages:

When URL parameters produce the same content more than once, it means a duplicate URL has been created. This can occur for home pages and inner pages. For instance, say you have a product page on your e-commerce site that is about green dresses. Due to the query strings and session ids created, different versions of the same URL are available to search engines. For instance, the sample URLs below all point to the same page content:

http://www.yoursite.com/products/women/dresses/green.htm
http://www.yoursite.com/products/women?category=dresses&color=green
http://yoursite.com/shop/index.php?product_id=32&highlight=green+dress&cat_id=1&sessionid=123&affid=431

Search engines decide which of these URLs to display in search results. They may choose the third URL, which is not clean, and if such a URL is displayed in search results it may not obtain as many clicks as a clean URL would. To avoid this you can do one of the following steps (the same as discussed above):

  • Choose the canonical URL. Add 301 redirects and send your website traffic to the canonical URL.
  • If 301 redirects are not possible, add rel=canonical tags to the duplicate pages.
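Where a redirect is not possible, each duplicate page carries a tag that names the canonical URL. A minimal sketch of generating that tag in Python follows; the helper name canonical_link_tag is hypothetical, and treating the first sample URL as the canonical one is an assumption.

```python
from html import escape


def canonical_link_tag(canonical_url):
    """Build the rel=canonical tag to place in the <head> of a duplicate page."""
    # Escape the URL so characters such as "&" are valid inside the attribute.
    href = escape(canonical_url, quote=True)
    return '<link rel="canonical" href="{}">'.format(href)


# Assumption: the clean URL is the one chosen as canonical.
print(canonical_link_tag("http://www.yoursite.com/products/women/dresses/green.htm"))
```

The same tag would go on every duplicate version of the page, so that they all point at one URL.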

Pagination:

When a listing cannot show all of its items on a single page, additional pages are created with the same URL path but different query strings, as shown in the sample below. This is called pagination.

http://www.yoursite.com/green-dresses?page=1
http://www.yoursite.com/green-dresses?page=2
http://www.yoursite.com/green-dresses?page=3

Search engines need to understand the relationship between each page in order to index them correctly and avoid duplicate content issues.

Add a rel="canonical" tag to the paginated pages in the following manner:

<link rel="canonical" href="http://www.yoursite.com/green-dresses">

It is best to also use rel="next" and rel="prev" tags so that search engines index these pages in the proper sequence. Read more about these tags on the Google Webmaster Central Blog. If your pagination URLs have a lot of session IDs and query strings, it is best to get them cleaned.
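As a rough illustration of how the rel="prev" and rel="next" tags relate to each page in a series, the Python sketch below builds them for a given page number. The function name pagination_link_tags and the ?page=N URL scheme (taken from the sample URLs above) are assumptions.

```python
from html import escape


def pagination_link_tags(base_url, page, last_page):
    """Return the rel=prev / rel=next tags for one page of a paginated series."""

    def page_url(n):
        # Assumption: pages are addressed as <base_url>?page=N.
        return "{}?page={}".format(base_url, n)

    tags = []
    if page > 1:  # every page except the first has a predecessor
        href = escape(page_url(page - 1), quote=True)
        tags.append('<link rel="prev" href="{}">'.format(href))
    if page < last_page:  # every page except the last has a successor
        href = escape(page_url(page + 1), quote=True)
        tags.append('<link rel="next" href="{}">'.format(href))
    return tags


for tag in pagination_link_tags("http://www.yoursite.com/green-dresses", 2, 3):
    print(tag)
```

The first page gets only a "next" tag and the last page only a "prev" tag, which is what lets a crawler walk the series in order.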

Depending on the server and the platform with which your website was built, there are different ways to clean your web page URLs. On an Apache server, this requires a module known as “mod_rewrite” to be installed and enabled for your account. Here is how to clean URLs in the Apache server:

First you need an .htaccess file to make your URLs clean. Open a blank Notepad document and save it as .htaccess. Paste the following into your file:

RewriteEngine On

The script name and query string that actually serve the page must be put into the .htaccess file. For instance, if your URL is

http://www.yoursite.com/index.php?page=articles,

then the script and query string are

index.php?page=articles.

Add them to the .htaccess file as the target of a rewrite rule in the following manner:

RewriteRule ^([a-zA-Z0-9]+)/$ index.php?page=$1

The ‘^’ anchors the pattern to the start of the requested path, which is taken relative to the folder where this .htaccess file is located; for example, if you put the file in the www.yoursite.com/green-dresses folder, matching starts from that folder onward. The ‘([a-zA-Z0-9]+)’ matches a run of lower-case letters, upper-case letters and digits, and the parentheses capture what was matched. The ‘+’ sign means one or more of those characters. The ‘/’ requires a trailing slash, and the ‘$’ sign denotes the end of the clean URL; the text after the space is the URL that is actually served. The ‘$1’ part is replaced by whatever the first pair of parentheses captured. Upload this .htaccess file into your root public directory that contains your main index page.
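To see what the pattern does and does not match, the following Python sketch applies the same regular expression to a few paths. It only imitates the matching step and is not how Apache itself is implemented; the function name rewrite is hypothetical, and the paths are assumed to be relative to the folder holding the .htaccess file, without a leading slash.

```python
import re

# Same pattern as in the RewriteRule.
PATTERN = re.compile(r"^([a-zA-Z0-9]+)/$")


def rewrite(path):
    """Return the internal URL for a clean path, or the path unchanged if no match."""
    match = PATTERN.match(path)
    if match is None:
        return path  # the rule does not apply
    # match.group(1) plays the role of $1 in the RewriteRule.
    return "index.php?page=" + match.group(1)


print(rewrite("articles/"))       # matches the rule
print(rewrite("articles"))        # no trailing slash, so no match
print(rewrite("green-dresses/"))  # the hyphen is outside [a-zA-Z0-9], so no match
```

The last two cases show the limits of the rule as written: a path without the trailing slash, or one containing a hyphen, is left untouched.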

Learn more about the various RewriteRules in the Apache mod_rewrite documentation.

You can change your settings within search engine tools so that they ignore certain parameters in your URLs. Bing allows for this in Bing Webmaster Tools and Google allows for it in Google Webmaster Tools. However, it is not advisable to configure site-wide parameters if you are not entirely sure of what you are doing.