Relative vs. Absolute URLs and SEO
07/11/2014
A relative URL is any URL that doesn't explicitly specify the protocol (e.g., "http://" or "https://") and/or domain (www.example.com), which forces the visitor's web browser (or the search engine bots) to assume it refers to the same site on which the URL appears.
Used wisely, this can be incredibly useful for developers, but in the wrong hands it can lead to huge problems for your site, both for internal navigation and for the search bots.
Relative URLs come in three flavors: path-relative, root-relative and protocol-relative.
A path-relative URL begins with the name of a page, or the name of a path (folder, directory, whatever) containing a page. Browsers and bots assume this link refers to a page that is either in the same directory as the page on which the link appears, or in a subdirectory below it.
For example, let's say your site has a Services section, and under that you have a subsection called Video Production, and that the overview page for this service has the URL:
Let's also say the site has even more-specific pages on different types of video production services, such as Corporate Video, which has this URL:
Now let's say, back on the "Video Production" overview page, you have a link to this Corporate Video sub-page with the URL specified as:
Because it has no domain or leading slash, browsers and bots assume this link is relative to the path of the page on which it appears, and calculate the absolute URL for the link as:
This is a perfectly valid use for path-relative URLs, but every new opportunity for error means more maintenance work, for example when you decide to rearrange your content structure. Now you've got a lot of extra QA work tracking down any broken relative links you might not have noticed while moving the content around.
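The resolution described above can be sketched with `urljoin` from Python's standard library, which follows the same general rules browsers apply. The URLs here are hypothetical stand-ins (the example.com paths and the trailing slash on the overview page are assumptions, not taken from a real site).

```python
from urllib.parse import urljoin

# Hypothetical overview page URL; the trailing slash is an assumption.
overview_page = "http://www.example.com/services/video-production/"

# Path-relative link as it might appear on the overview page.
link = "corporate-video"

# Resolved relative to the directory of the page carrying the link.
resolved = urljoin(overview_page, link)
print(resolved)

# The same link on a page URL without the trailing slash resolves one
# level higher, which is one way path-relative links break quietly.
no_slash = urljoin("http://www.example.com/services/video-production", link)
print(no_slash)
```

The second call hints at why these links are fragile: a small change to the URL of the page carrying the link changes where the link points.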
Relative Directory Traversal
Another kind of path-relative URL is the UNIX-style directory traversal method of using a dot-dot-slash ("../") to refer to the parent of the directory containing the current page. Expanding on the previous examples, let's say your Services section has an Advertising subsection, and under that a page for YouTube Pre-Roll Ads, i.e.:
If you want to cross-link to that page from your Corporate Videos page, you can use dot-dot-slash notation in the link on the Corporate Videos page (i.e., "../advertising/youtube-preroll"), the absolute URL for which is calculated as:
Any kind of path-relative URL is likely to break when moving content vertically through the site hierarchy, but relative directory traversals can be even more risky because the breakage is less likely to be noticed before a particular resource is moved. You can't locate the problem by find-replace like you can with other path-relative URLs, which can become a real headache over time if you're relying on dot-dot-slash notation throughout the site.
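As a rough illustration of that fragility, the sketch below resolves the same dot-dot-slash link from two hypothetical locations of the Corporate Videos page: where it lives now, and one level deeper after an imagined restructuring. Both page URLs are assumptions made up for the example.

```python
from urllib.parse import urljoin

link = "../advertising/youtube-preroll"

# Hypothetical URL of the Corporate Videos page before any restructuring.
before_move = "http://www.example.com/services/video-production/corporate-video"

# Hypothetical URL after the page is moved one level deeper.
after_move = "http://www.example.com/services/video-production/corporate/overview"

# The link text is identical in both cases; only the page location differs.
resolved_before = urljoin(before_move, link)
resolved_after = urljoin(after_move, link)

print(resolved_before)
print(resolved_after)
```

The link itself never changed, yet after the move it points into the Video Production section instead of the Advertising section.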
Another issue with this method occurs when content managers, designers, developers and site administrators (with their broad range of technical skill and attention to detail) simply omit a dot, starting a URL with dot-slash ("./") instead of dot-dot-slash. Returning to the previous example, linking from Corporate Videos under Video Production to YouTube Pre-Roll Ads under Advertising with just a dot-slash (i.e., "./advertising/youtube-preroll") would result in the absolute URL being calculated as:
That's a bad URL in its own right. But since dot-slash is basically a circular reference to "here", there are circumstances where it can also turn into a bot trap: if the server answers the bad URL with a page that carries the same link, each step of the crawl through the false hierarchy implied by the dot-slash produces a URL that is longer than the last, without end, and that can really wreck your site's crawl budget.
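The bot-trap scenario can be simulated in a few lines. This is only a sketch: it assumes the server answers every one of these bad URLs with a page containing the same dot-slash link (for instance via a catch-all template), and the starting URL and the `crawl_steps` helper are hypothetical.

```python
from urllib.parse import urljoin


def crawl_steps(start_url, link, steps):
    """Follow the same relative link repeatedly, as a naive crawler might."""
    urls = []
    current = start_url
    for _ in range(steps):
        # Each new page is assumed to contain the very same relative link.
        current = urljoin(current, link)
        urls.append(current)
    return urls


# Hypothetical Corporate Videos page carrying the mistyped link.
start = "http://www.example.com/services/video-production/corporate-video"
trap = crawl_steps(start, "./advertising/youtube-preroll", 3)

for url in trap:
    print(url)
```

Every step adds another "advertising/" segment to the path, so the crawler never runs out of "new" URLs to request.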
In a root-relative URL, the leading slash (here, before "services") indicates that the URL is relative to the root of the site's URL structure, rather than the path of the page on which it appears. In this case, the absolute URL is calculated to be:
Root-relative URLs are probably the safest kind of relative URL overall, both for minimizing the potential for human or bot error, and for simplifying site maintenance. When absolute URLs aren't an option, root-relative URLs are probably your best bet.
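To see why root-relative URLs hold up better, the sketch below resolves one root-relative link from several hypothetical pages at different depths of the site. The page URLs and the link path are assumptions chosen for illustration.

```python
from urllib.parse import urljoin

# Hypothetical root-relative link.
link = "/services/video-production/corporate-video"

# Hypothetical pages at different depths of the same site.
pages = [
    "http://www.example.com/",
    "http://www.example.com/services/advertising/youtube-preroll",
    "http://www.example.com/about/team/",
]

# Collect the distinct results of resolving the link from every page.
resolved = {urljoin(page, link) for page in pages}
print(resolved)
```

However deep the page carrying the link sits, the set contains a single URL, so moving pages around the hierarchy does not change where the link points.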
In a protocol-relative URL, the double leading slash ("//") tells the browser or bot to use the same connection scheme or protocol (i.e., either HTTP or HTTPS) as was used to request the page on which the URL appears, so if this URL is on a page whose URL begins "https://", then protocol-relative URLs on that page are resolved over HTTPS as well.
You might use protocol-relative URLs to reference scripts or images in a page template that's used in both the secure and non-secure areas of the site, which is typical of e-commerce sites where the UI elements remain consistent from shopping to checkout. They're a bad idea for links between pages or for canonical tags, though, since the bots could end up crawling the same page in both HTTP and HTTPS modes, which are considered unique URLs even though the rest of the URL after the "http://" or "https://" may be identical. Such a scenario could lead not only to wasted crawl budget, but potentially to severe duplicate content issues.
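A short sketch of how a protocol-relative reference inherits its scheme from the page it appears on. The asset path and the page URLs are hypothetical.

```python
from urllib.parse import urljoin

# Hypothetical protocol-relative reference to a shared template asset.
asset = "//www.example.com/images/logo.png"

# The same reference on a secure page and on a non-secure page.
secure = urljoin("https://www.example.com/checkout", asset)
plain = urljoin("http://www.example.com/shop", asset)

print(secure)
print(plain)
```

For a template image this is harmless, but the same mechanics applied to a page link or a canonical tag hand the bots two distinct URLs for one page.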
The Case for Absolute URLs
Any time you leave something up to the bots to decide, you're asking for trouble. Googlebot is extremely sophisticated, but it can still make mistakes when encountering any unusual circumstances involving relative URLs. Less-sophisticated bots are even more likely to get confused by relative links, even under typical circumstances. The only way to ensure that you've eliminated these uncertainties is by utilizing absolute URLs, and doing so consistently across the site.
Relative links specified in canonical tags are a terrible idea, as they violate the very purpose of the canonical tag: disambiguation. Always, always specify canonical URLs in absolute form, or don't use the canonical tag.
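One way to enforce this during QA is a small check that a canonical URL is fully absolute before it ships. The sketch below is a minimal example under assumptions: `is_absolute_url` is a hypothetical helper name, and it only accepts HTTP and HTTPS URLs with an explicit host.

```python
from urllib.parse import urlparse


def is_absolute_url(url):
    """Return True only if the URL states both a web scheme and a host."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


# Hypothetical canonical values a template might emit.
candidates = [
    "https://www.example.com/services/",  # absolute
    "/services/",                         # root-relative
    "//www.example.com/services/",        # protocol-relative
    "services/video-production",          # path-relative
]

for url in candidates:
    print(url, is_absolute_url(url))
```

Only the first candidate passes; the three relative forms are all rejected.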
If you employ a staging environment (e.g., "dev.example.com") separate from your production environment that serves your live site, you may find the idea of absolute URLs to be completely insane at first glance. However, most CMSes can be made to generate absolute URLs dynamically based on the current server environment (i.e., all URLs specify "www.example.com" unless you're on the staging server, then it's "dev.example.com"), which gives you a fully functional link structure in the development environment that you don't have to think twice about when deploying to production. WordPress, for instance, always generates absolute URLs everywhere link URLs aren't hardcoded into the content.
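The general idea of environment-aware absolute URLs can be sketched as follows. This is not how WordPress or any particular CMS implements it; the `BASE_URLS` mapping, the `absolute_url` helper, the environment names and the "http" scheme are all assumptions for illustration.

```python
# Hypothetical mapping from environment name to that environment's base URL.
BASE_URLS = {
    "production": "http://www.example.com",
    "staging": "http://dev.example.com",
}


def absolute_url(path, environment="production"):
    """Build an absolute URL for a site path in the given environment."""
    base = BASE_URLS[environment]
    # Normalize so the path joins cleanly with or without a leading slash.
    return base + "/" + path.lstrip("/")


print(absolute_url("/services/video-production/"))
print(absolute_url("services/video-production/", environment="staging"))
```

Templates call the helper with a site path, and the host is decided in one place, so no link has to be rewritten when content moves from staging to production.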
If your staging site is publicly accessible on its own, you definitely don't want to be using relative URLs in the site navigation links (menus, footers, etc.) because even a single incorrect link deployed to production can open up the whole staging site to being crawled, indexed and returned in search results. The potential implications of having multiple copies of your site in search results are difficult to overstate, and range from wasted crawl budget, through duplicate content issues and UX confusion, to serious security concerns depending on the setup.
If your site publishes RSS feeds, relative links appearing in the feed content probably won't work, so inline links in content should always be specified as absolute URLs. Absolute URLs also help to deter all but the most sophisticated web-spammers from republishing content their spambots have scraped from your site.
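If existing content already contains relative links, they can be rewritten to absolute form before the content goes into a feed. The sketch below is deliberately simple: `absolutize` is a hypothetical helper, it only handles double-quoted `href` and `src` attributes via a regular expression, and a real HTML parser would be sturdier. The page URL and markup are made up for the example.

```python
import re
from urllib.parse import urljoin

# Matches double-quoted href/src attributes only (a simplifying assumption).
ATTR_RE = re.compile(r'\b(href|src)="([^"]*)"')


def absolutize(html, page_url):
    """Rewrite relative href/src values as absolute URLs based on page_url."""
    def replace(match):
        attr, value = match.group(1), match.group(2)
        return f'{attr}="{urljoin(page_url, value)}"'
    return ATTR_RE.sub(replace, html)


# Hypothetical page URL and a fragment of its content.
page = "http://www.example.com/services/video-production/corporate-video"
content = '<a href="../advertising/youtube-preroll">Ads</a> <img src="/images/logo.png">'

feed_content = absolutize(content, page)
print(feed_content)
```

Links that are already absolute pass through unchanged, since resolving an absolute URL against a base returns the absolute URL itself.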
Specifying absolute image URLs can also help in image search by asserting ownership of your images in the results, thereby reducing duplicates from competing sites, particularly in cases where they're hotlinking to original image content on your site.
Depending on the circumstances, fixing relative URLs on an existing site may not be as worthwhile as remembering to use absolute URLs when building out new sites.
The most important rule is for your site to stay internally consistent. Any internal linking scheme that's at least self-consistent presents fewer opportunities for error on the part of writers, developers and ultimately search bots, for whom you should always be rolling out the proverbial red carpet [of absolute URLs].