Posted by Dave Sottimano
IIS Server through the eyes of an SEO
Disclaimer: This post is long and technical, but has been lovingly paraphrased for the benefit of non-technical SEOs to get involved and step out of their comfort zone. Recently, I’ve had to deal with sites running on IIS and rather than just prescribing universal SEO fixes, I decided to get my hands really, really dirty. This is what I’ve learned…
In this post: (Use the navigation links, trust me!)
- Brief explanation of what IIS is
- Why SEOs should care about IIS
- How to get a free, super powerful crawler – You need Windows 7
- IIS "out of the box SEO faults" you need to know about
- The Chaining 301 disease and a way to fix it
- Interview / SEO Resource guide for IIS 7+ - with Mark Ridley
- Possible sexual attraction to Microsoft products
Internet Information Services (IIS) is the world’s second most popular web server, after Apache. IIS’s first big gig was in 1995, and it continues to power a massive portion of the internet today. Read more about Microsoft's history on the web
- You’ll likely have a client sooner or later that uses IIS
- There are some common SEO faults with IIS that you need to know
- Fixing those default "faults" can significantly improve the site's SEO
- You need to understand how this beast works before you say “301” or “clean URL”
- I will show you how to get a free kickass crawler – You need Windows 7 on your machine
- If you can speak IIS, those super smart developers will instantly warm up to you
- IIS popularity is growing (see image below)
IIS isn’t going anywhere; in fact, it’s growing. Whether you appreciate Microsoft products or not, plenty of people do, and sooner or later you’ll likely have to consult for a client running IIS.
Requirements: Windows 7 (IIS 7.5 is included, you just have to switch it on). Sorry Mac! – muhahaha
How: IIS server & SEO toolkit on your Windows 7 machine
Why: It’s frickin incredible – It will make your SEO life much much easier.
Difficulty: Easy
Estimated Completion time: 5-7 minutes
Why it's so cool!
Watch this video, and tell me you DON'T need this beast :)
- Reports on content, SEO and other violations, and tells you how to fix them in plain English
- You can add your own custom violations
- Fully functional database of what's going on with your site
- Tons and tons of custom reporting, including mapping navigation paths
- Find orphaned pages from Sitemap
- And much much more. You expected that right :)
We can get you up and running in under 10 minutes with the minimum requirements. Follow these instructions – this is EASY.
First, install IIS Server
- Start > Settings > Control Panel
- Programs & Features
- Click on "Turn Windows Features on/off"
- Find the Internet Information Services folder, expand it
- Expand the Web Management tools folder
- Make sure IIS Management Console is checked, click OK
Next, the IIS toolkit - Post here with screenshots
- Download the IIS Web Platform Installer, and install it
- Click on Start, search for "Web platform installer", open it up
- Once in the web platform, type in "SEO" in the search bar - top right
- It should be the last item in the list "Search Engine Optimisation Toolkit", click add, then install
- Click on start, type in: inetmgr
- Look for the Icon that says "Search Engine Optimisation Toolkit", and double click
- You're done! You now have an insane, free crawler on your machine :)
Now you need to:
- Crawl a site!
- Understand the analysis!
- Enjoy, you'll love it :)
Read Microsoft's official instructions for the IIS SEO toolkit
- Default documents: Read more about this here. Example: www.example.com/index.html and www.example.com will return the same content at different URLs.
- Case insensitivity: By default, IIS will serve the same content regardless of casing. Example: www.example.com/BLOG and www.example.com/blog are the same content at different URLs.
- Works with www and non-www (site canonicalization): By default, like most servers, you'll get the same content from www.example.com and example.com.
Why it’s a problem: You get a hard earned inbound link to the wrong URL and it doesn’t give you full credit – “but it’s the same page!” No, it’s not, and no, Google doesn’t magically fix this even though they try.
But hey! No one is going to link to your site like this! Right? cc Matt Cutts 2006
- www.example.com/
- example.com/
- www.example.com/index.html
- example.com/INDEX.html
Yes, yes they do. Not on purpose, but they do.
This is a very real problem for older, larger sites that hadn’t canonicalized their domain, or bothered with consistent URL structure in the past. Fast forward to 2011 and we SEOs start nitpicking for every bit of link juice and we notice that we’re missing out on all of these external links because we’re not redirecting these inbound links to the right pages. We need to write some permanent redirect rules to catch these.
Simple, right? When someone links to our real page www.example.com/blog/ like this: example.com/Blog
We just do a 301 redirect back to www.example.com/blog/ ! Happy days right? Please continue…
It’s best to explain this by example. Open up this page from Cheapflights.co.uk.
Notice the capital L in London? What happens when we request http://www.cheapflights.co.uk/flights/london/?
Brilliant! Even if someone incorrectly links to the lowercase version, we get a 301 redirect to the correct page http://www.cheapflights.co.uk/flights/London/
Here’s where I get crazy. Let’s say I decide to link to http://cheapflights.co.uk/flights/london
I want you to notice a couple of things here:
- No www
- London is london
- Monitor the HTTP requests (try HttpFox for Firefox)
Whoa! Two 301 redirects! We know one 301 loses link juice, but 2? The point here is that Google is less likely to pass on PageRank with each chained redirect, so you should always aim for no more than one 301 redirect.
This is referred to as chaining 301 redirects. (My example is specific to IIS.)
Why is this happening?
When you create redirect rules, the server usually applies them one at a time: the first rule that matches fires its redirect, and the browser's follow-up request then trips the next rule. As a good SEO, you should always canonicalise the domain – as a smart SEO you’ll probably want to stop any uppercasing goofs too, maybe even trailing slashes while you’re at it! Each of those fixes is another chance to add a hop.
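To make the cause concrete, here is a minimal sketch of the kind of rule set that produces a chain. It is an assumption about a typical setup, not the actual configuration of any site mentioned here: the rule names and www.example.com are hypothetical. Each rule issues its own Redirect and stops processing, so a URL with two problems gets fixed one problem per round trip.

```xml
<!-- Hypothetical web.config fragment; belongs inside <system.webServer><rewrite> -->
<rules>
  <!-- Request 1: example.com/Blog matches here and gets a 301 to www.example.com/Blog -->
  <rule name="Naive - add www" stopProcessing="true">
    <match url="(.*)" />
    <conditions>
      <add input="{HTTP_HOST}" pattern="^www\.example\.com$" negate="true" />
    </conditions>
    <action type="Redirect" url="http://www.example.com/{R:1}" redirectType="Permanent" />
  </rule>
  <!-- Request 2: the host is now correct, so this rule fires and sends a second 301 -->
  <rule name="Naive - lowercase" stopProcessing="true">
    <match url="[A-Z]" ignoreCase="false" />
    <action type="Redirect" url="{ToLower:{URL}}" redirectType="Permanent" />
  </rule>
</rules>
```

Neither rule is wrong on its own. The chain comes from each one redirecting immediately instead of handing a cleaned-up URL to the next rule.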
My obsession – I noticed this all over the web and I was obsessed with finding the “catch all” redirect rule that would only redirect once, to the right URL.
I failed, until I got some very clever friends involved. Meet Daniel Furze, developer at Reed.co.uk.
A way to fix the chain for common SEO issues
I'm nicknaming this the "Furze Method" :) This method has been tried and tested, so we know it works. Add this rule set to web.config for a .NET app. Learn more here.
We're using IIS 7 and the URL Rewrite 2.0 module here, which allows us to run regex rules against different parts of the URL and then either Redirect or Rewrite them. The ability to rewrite is what I'm using to make sure we keep 301 chaining to a minimum. It allows us to run several rules against the URL, cleaning it up in stages before doing one final Redirect. The trick is that when a rewrite rule matches and does its job, it adds a _ to the start of the path of the URL. The final rules look for any path beginning with _ and will then strip this out and redirect. Let's have a look at the rules and then break them down into sections.
Section 1
<rule name="WhiteList - resources" stopProcessing="true">
  <match url="^resources/" />
  <conditions logicalGrouping="MatchAll" trackAllCaptures="false" />
  <action type="None" />
</rule>
The first rule is very important here, as there is no need to clean up resource requests (remember that with the integrated pipeline all requests pass through here, including image, CSS and JS files).
Section 2
<rule name="SEO - remove default.aspx" stopProcessing="false">
  <match url="(.*?)/?default\.aspx$" />
  <conditions logicalGrouping="MatchAll" trackAllCaptures="false">
    <add input="{HTTP_METHOD}" pattern="GET" />
  </conditions>
  <action type="Rewrite" url="_{R:1}" />
</rule>
<rule name="SEO - Remove trailing slash" stopProcessing="false">
  <match url="(.+)/$" />
  <conditions logicalGrouping="MatchAll" trackAllCaptures="false">
    <add input="{HTTP_METHOD}" pattern="GET" />
    <add input="{REQUEST_FILENAME}" matchType="IsDirectory" negate="true" />
    <add input="{REQUEST_FILENAME}" matchType="IsFile" negate="true" />
  </conditions>
  <action type="Rewrite" url="_{R:1}" />
</rule>
<rule name="SEO - ToLower" stopProcessing="false">
  <match url="(.*)" ignoreCase="false" />
  <conditions logicalGrouping="MatchAll" trackAllCaptures="false">
    <add input="{HTTP_METHOD}" pattern="GET" />
    <add input="{R:1}" pattern="[A-Z]" ignoreCase="false" />
  </conditions>
  <action type="Rewrite" url="_{ToLower:{R:1}}" />
</rule>
The 2nd section of 3 rules will tidy up different parts of the URL:
- Remove default.aspx. Since we are using .NET we want to clean up any stray default.aspx from the path (less important with a greenfield MVC app). You can change this rule for any other default document you like, including index.html.
- Clean up trailing slashes from the path of the URL, i.e. www.yoursite.com/news/ becomes www.yoursite.com/news
- Lowercase the whole path part of the URL, so www.yoursite.com/News becomes www.yoursite.com/news. The query string isn't touched, as these are often case sensitive.
Section 3
<rule name="SEO - http canonical redirect" stopProcessing="true">
  <match url="^(_*)(.*)" />
  <conditions logicalGrouping="MatchAll" trackAllCaptures="false">
    <add input="{HTTP_HOST}" pattern="^www\.yoursite\.org$" negate="true" />
    <add input="{HTTP_METHOD}" pattern="GET" />
    <!-- Anchored so that ports such as 8080 do not match -->
    <add input="{SERVER_PORT}" pattern="^80$" />
  </conditions>
  <action type="Redirect" url="http://www.yoursite.org/{R:2}" />
</rule>
<rule name="SEO - https canonical redirect" stopProcessing="true">
  <match url="^(_*)(.*)" />
  <conditions logicalGrouping="MatchAll" trackAllCaptures="false">
    <add input="{HTTP_HOST}" pattern="^www\.yoursite\.org$" negate="true" />
    <add input="{HTTP_METHOD}" pattern="GET" />
    <add input="{SERVER_PORT}" pattern="^443$" />
  </conditions>
  <action type="Redirect" url="https://www.yoursite.org/{R:2}" />
</rule>
<!-- Host is already canonical: only redirect if an earlier rule added a _ prefix -->
<rule name="SEO - non-canonical redirect" stopProcessing="true">
  <match url="^(_+)(.*)" />
  <conditions logicalGrouping="MatchAll" trackAllCaptures="false">
    <add input="{HTTP_METHOD}" pattern="GET" />
  </conditions>
  <action type="Redirect" url="{R:2}" />
</rule>
The 3rd part is a little more tricky. This is the part that does the actual Redirect, but it is also responsible for one last tidy-up. The first 2 rules fix a missing www and then redirect, while respecting the port the requested URL was using, i.e. http://yoursite.org becomes http://www.yoursite.org, and the same for https. The reason these 2 rules are separate is that I haven't figured out how to add www without also specifying either http or https. The 3rd rule does a redirect if any of the SEO rewrite rules matched but the URL already had www in it. You can see that these last 3 rules pay attention to the _ which might be in the path from the previous rewrite rules.
One final gotcha is to make sure you don't actually have any genuine URLs that begin with _. If you do, you can substitute another special character, but you will have to make sure it all still works. Also pay attention to which rules have stopProcessing="true" on them: typically a Redirect will stop and an SEO rewrite will carry on. If you want to put in any other rules, say for redirecting email links or migrating old pages to new pages, then I would recommend putting them in between any Whitelist rules you have and the SEO rules at the bottom.
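The ordering described above can be sketched as a web.config skeleton. This is only an outline under the assumption that the rules live in the site's own web.config; the comments stand in for the rules from Sections 1 to 3 and for any rules of your own.

```xml
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <system.webServer>
    <rewrite>
      <rules>
        <!-- 1. Whitelist rules (Section 1): action type="None", stopProcessing="true" -->
        <!-- 2. Your own rules, e.g. old-page-to-new-page redirects -->
        <!-- 3. SEO rewrite rules (Section 2): each one prefixes the path with _
                and keeps stopProcessing="false" so the later rules still run -->
        <!-- 4. Final redirect rules (Section 3): strip the _ prefix and issue
                the single Redirect with stopProcessing="true" -->
      </rules>
    </rewrite>
  </system.webServer>
</configuration>
```

Tracing the path NeWs/default.aspx on a non-www host through this order, and assuming each rule sees the output of the previous rewrite (which is how I understand the module to behave): the default.aspx rule produces _NeWs, the trailing slash rule does not match, ToLower produces __news, and the canonical redirect strips the underscores and sends a single 301 to /news on the www host.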
The final result is that a very mangled URL like http://yoursite/NeWs/default.aspx?id=123 will redirect to http://www.yoursite/news?id=123 with only one 301.
You can see how it works live here. Please disregard the domain name! http://a4uexpobavarianbeerandsausageonstand50.org/ArTIcle/
You'll notice that our non-www, uppercase, trailing-slash version 301s to http://www.a4uexpobavarianbeerandsausageonstand50.org/article with one 301! Cool!
Mark Ridley is the Head of Development and an IIS expert at Reed, and a personal friend of mine whom I called for help. Oh, feel free to bug him on Twitter, just say I told you to :)
I need to 301 redirect page A to page B, how can we do this?
Dave: This is fairly easy, we won't reinvent the wheel - please see the detailed instructions here. Updated: Another great resource (full tutorial) has been discovered thanks to Alan in the comments
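For orientation, a one-off 301 with the URL Rewrite module looks roughly like the sketch below. The file names page-a.html and page-b.html are hypothetical placeholders, and the rule is assumed to sit inside the rules section of web.config.

```xml
<!-- Hypothetical paths; place inside <system.webServer><rewrite><rules> -->
<rule name="Redirect page A to page B" stopProcessing="true">
  <!-- The match pattern is tested against the path without its leading slash -->
  <match url="^page-a\.html$" />
  <!-- redirectType="Permanent" sends a 301 -->
  <action type="Redirect" url="/page-b.html" redirectType="Permanent" />
</rule>
```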
We are migrating a site and we know which pages to redirect to, can you send us the rewrite map template so we can fill it out? (Basically explain that the Devs could send a rewrite map template and then just apply rules)
Dave: Yes, it is possible. Get a developer to send you a template, use this resource here.
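As a rough idea of what such a template might look like, here is a sketch of a rewrite map plus the single rule that applies it. The map name StaticRedirects and the example keys and values are assumptions for illustration. The SEO fills in the add lines (old URL as key, new URL as value) and the developer owns the rule.

```xml
<!-- Hypothetical template; place inside <system.webServer> -->
<rewrite>
  <rewriteMaps>
    <rewriteMap name="StaticRedirects">
      <!-- One line per migrated page: key = old URL, value = new URL -->
      <add key="/old-section/old-page.aspx" value="/new-section/new-page" />
      <add key="/about-us.aspx" value="/about" />
    </rewriteMap>
  </rewriteMaps>
  <rules>
    <rule name="Redirect from rewrite map" stopProcessing="true">
      <match url=".*" />
      <conditions>
        <!-- Looks the requested URI up in the map; matches only if a value exists -->
        <add input="{StaticRedirects:{REQUEST_URI}}" pattern="(.+)" />
      </conditions>
      <action type="Redirect" url="{C:1}" appendQueryString="false" redirectType="Permanent" />
    </rule>
  </rules>
</rewrite>
```

Because the lookup uses the full request URI, keys need to match the old URLs exactly, including any query string.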
How will ASP, .net, PHP react to our redirect rules? Are there any common pitfalls?
IIS URL Rewrite rules will execute before any rewrites or redirects that your application insists upon. There are some very technical reasons for this. Essentially, the web server thinks it's more important than your code, which is probably correct. The only pitfall to watch out for, exactly as with Apache, is making sure that the rules you put on the web server are sensible. It is a little bit too easy to make infinite redirect loops, or at least very long chains of redirects that won't make you popular with search engines.
Is there anything Apache can do that IIS can't?
There's WAY better community support for Apache, and much wider adoption. That doesn't mean that IIS has poor support, just that the number of articles telling you about Apache is going to outnumber those about IIS by orders of magnitude, which is partly why this article is important. Other than that - yes, some Apache modules make Apache do things that IIS can't, but there aren't many things that will trouble the 80% of 'normal' users.
Do you prefer IIS or Apache? Why?
I refuse to get involved in a religious argument! Both are very capable web servers (number 1 and 2 in the market according to Netcraft), and often the decision to use one or the other is not based on the capability of the web server itself. Microsoft houses will use IIS and Free/Open Source or *nix houses will use Apache (or nginx). It doesn't hurt anyone to understand both, especially as your client will have made the decision to use one or the other, and probably won't want to change. It's enough to know that both are sensible choices depending on your reasons.
How can I speed up page load time with IIS?
IIS, like Apache's mod_gzip, allows you to turn on gzip compression of both static and dynamic content. It's a very simple change that will make a massive difference to your page download speed, and your bandwidth costs! This was added in IIS7 and was a long overdue feature. All modern browsers support content compression, and there is almost no downside to enabling it.
A more technical, and complex solution is also available in addition to content compression, which is output caching (http://learn.iis.net/page.aspx/710/configure-iis-7-output-caching/). This is probably something best left to the developers and server administrators to argue over, as whilst it can have benefits they are less obvious and harder to achieve than compression.
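For the compression part of this answer, the switch can be sketched in web.config as below. This assumes the compression features are installed on the server and that the section is not locked at server level; dynamic compression in particular is an optional IIS feature that may need adding first.

```xml
<configuration>
  <system.webServer>
    <!-- Static covers files such as CSS, JS and HTML;
         dynamic covers generated responses such as ASP.NET pages -->
    <urlCompression doStaticCompression="true" doDynamicCompression="true" />
  </system.webServer>
</configuration>
```

A quick check afterwards is to look for a Content-Encoding: gzip response header in an HTTP header tool.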
Can you easily give me log files? What information can I get? What software do you recommend to parse the logs?
Of course! IIS logs just the same way Apache does - writing log files into a folder. You can use exactly the same format as Apache, or several other popular web log formats, which are simple configuration options in IIS. Log parsing is often now less common with the massive improvements in online services, such as Google Analytics, Webtrends or Omniture. With Google Analytics being free, you're in a position to do away with pesky logs and let your marketeers do the analysis without worrying about the underlying technology. The free 'Live' view which is currently in Beta in GA is also amazing, and will sell itself to execs and marketing folk, as they can see who is on your site right at the moment. If you want a little more detail from your own logs for a reasonable price, I've often used Sawmill to great effect. AWStats is also popular with the more technical FOS community.
Can you explain (basic) how you set up a dev server and then deploy to live? (what are common issues)?
This is enormously variable based on your server architecture, your connections and your technical decisions. On a Microsoft stack, Microsoft provide tools such as Web Farm Framework and Web Deploy which can be a great help and offer a lot of heavy lifting. Some people with simpler sites are happy with FTP or WebDAV deployments. We use our own cocktail of Powershell scripts and Robocopy to manage our servers and keep them up to date, but we won't test you on that afterwards.
How can you define a custom 404 page easily?
IIS has standard support for all the common (and any custom) error pages. The easiest route is to create an HTML page with your 404 content and save it to a folder on your live servers. In the IIS configuration, you then simply point to this page and IIS will do the rest. With more complex ASP.NET MVC sites, you may want to handle your own 404s, but again that's best left to the developers.
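In web.config terms, pointing IIS at that page could look like the sketch below. The path /errors/404.html is a hypothetical location, and the exact responseMode that suits a site depends on how it is hosted, so treat this as a starting point to hand to a developer.

```xml
<configuration>
  <system.webServer>
    <httpErrors errorMode="Custom">
      <!-- Drop the built-in 404 entry, then register the custom page -->
      <remove statusCode="404" subStatusCode="-1" />
      <error statusCode="404" path="/errors/404.html" responseMode="ExecuteURL" />
    </httpErrors>
  </system.webServer>
</configuration>
```

After deploying, request a URL that does not exist and confirm the response status is still 404. A custom error page that answers with 200 is its own SEO problem.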
Can you explain how to make friendly (SEO) urls in IIS? What is the process?
Exactly the same way as with any other web server! Name your content well. Avoid special characters, use lower case filenames, be consistent with naming, use slugs where possible to make URLs that search engines will appreciate. If you're writing an application which pulls content from a database, make the URLs as meaningful as possible and avoid varied query strings. All of this is second nature for good SEOs. ASP.NET (or any other flavour) MVC helps even more as the URL routing allows the developers more control of exactly what the URL will look like, rather than having to rely only on filenames. If you use a content management system, make sure that you know how to configure it correctly - most popular CMS products have this baked in these days.
What do you mean by rewriting as opposed to redirecting?
In simple terms, imagine you have two files - a.htm and b.htm.
If you are using rewriting, someone will request a.htm in a browser (and the address bar will still show http://site.com/a.htm), but they will actually see the content of b.htm.
If you are using the more common redirect method, when someone asks for a.htm with their browser, the server will send them on to b.htm. The address bar will now show http://site.com/b.htm. For SEO purposes, this means that a spider won't be able to tell the difference between a.htm and b.htm if you're rewriting. If you're redirecting, you can give powerful instructions to the spider, such as a 301 response, asserting that the value of a.htm should now be carried to b.htm. In most cases, you probably only need to worry about using redirects, as these are widely used and understood. Rewriting is often an edge case, or something used by developers for their own nefarious purposes.
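Side by side, the two approaches for a.htm and b.htm differ only in the action type. The sketch below uses hypothetical rule names and is meant as an either/or: a site would use one of these rules, not both.

```xml
<!-- Place ONE of these inside <system.webServer><rewrite><rules> -->

<!-- Rewrite: the address bar keeps /a.htm, the server answers with b.htm's content -->
<rule name="Rewrite a to b" stopProcessing="true">
  <match url="^a\.htm$" />
  <action type="Rewrite" url="/b.htm" />
</rule>

<!-- Redirect: the browser receives a 301 and requests /b.htm itself -->
<rule name="Redirect a to b" stopProcessing="true">
  <match url="^a\.htm$" />
  <action type="Redirect" url="/b.htm" redirectType="Permanent" />
</rule>
```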
Give me a good reason why I should use IIS 7 over previous versions?
IIS7 - now actually IIS7.5 in Windows 2008 R2 - was a huge improvement over IIS6. Native support for gzip compression, the newly integrated ASP.NET pipeline and performance and security improvements should be enough. If not, Windows 2008 is a far nicer Operating System to use than 2003 (now it's been patched a few times), and migrating will make your life a much more pleasant place to be. Do watch out for slow file copying though, a relic of its Windows Vista brethren.
A few more helpful links
- IIS URL rewriting & ASP routing
- Fixing common SEO problems with the URL rewrite extension
- ASP.NET MVC and the new IIS 7 rewrite module
- Explore The Web Server For Windows Vista And Beyond
Feel free to follow me on Twitter @dsottimano, don't forget to randomly hug a developer - even if they say they don't like it :)
Source: http://feedproxy.google.com/~r/seomoz/~3/NKCIxD7-T7Y/what-every-seo-should-know-about-iis