Towards Next Generation URLs
For many years we have heard about the impending death of URLs that are difficult to type, remember and preserve.
The use of URLs has actually improved little thus far, but changes are afoot in both development practices and Web server technology that should help advance URLs to the next generation.
Complex, hard-to-read URLs are often dubbed dirty URLs because they tend to be littered with punctuation and identifiers that are at best irrelevant to the ordinary user. URLs such as http://www.example.com/cgi-bin/gen.pl?id=4&view=basic are commonplace in today's dynamic Web. Unfortunately, dirty URLs have a variety of troubling aspects, including:
Dirty URLs are difficult to type.
The length, punctuation, and complexity of these URLs make typos commonplace.
Dirty URLs do not promote usability.
Because dirty URLs are long and complex, they are difficult to repeat or remember and provide few clues for average users as to what a particular resource actually contains or the function it performs.
Dirty URLs are a security risk.
The query string which follows the question mark (?) in a dirty URL is often modified by hackers in an attempt to perform a front door attack into a Web application. The very file extensions used in complex URLs such as .asp, .jsp, .pl, and so on also give away valuable information about the implementation of a dynamic Web site that a potential hacker may utilize.
Dirty URLs impede abstraction and maintainability.
Because dirty URLs generally expose the technology used (via the file extension) and the parameters used (via the query string), they do not promote abstraction. Instead of hiding such implementation details, dirty URLs expose the underlying "wiring" of a site. As a result, changing from one technology to another is a difficult and painful process filled with the potential for broken links and numerous required redirects.
Given the numerous problems with dirty URLs, one might wonder why they are used at all. The most obvious reason is simply convention -- using them has been, and so far still is, an accepted practice in Web development. This fact aside, dirty URLs do have a few real benefits, including:
They are portable.
A dirty URL generally contains all the information necessary to reconstruct a particular dynamic query. For example, consider how a query for "web server software" appears in Google -- http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=Web+server+software . Given this URL, you can rerun the query at any time in the future. Though difficult to type, it is easily bookmarked.
They can discourage unwanted reuse.
The negative aspects of a dirty URL can be regarded as positive when the intent is to discourage the user from typing a URL, remembering it, or saving it as a bookmark. The intimidating look and length of a dirty URL can be a signal to both user and search engine to stay away from a page that is bound to change. This is often simply a welcome side effect, rather than a conscious access control policy -- frequently nothing is done to prevent actual use of the URL by means of session variables or referring URL checks.
Cleaning URLs
The disadvantages of dirty URLs far outweigh their advantages in most situations. If the last 30 or 40 years of software development history are any indication of where development for the Web is headed, abstraction and data hiding will inevitably increase as Web sites and applications continue to grow in complexity. Thus, Web developers should work toward cleaner URLs by using the following techniques:
Keep them short and sweet.
The first path to better URLs is to design them properly from the start. Try to make the site directories and file names short but meaningful. Obviously, /products is better than /p, but resist the urge to get too descriptive. Having www.xyz.com/productcatalog doesn't add much meaning (if a user looks for a product catalog, they might well expect to find it at or near the top-level products page), but it does needlessly restrict what the page can reasonably contain in the future. It's also harder to remember or guess at. Shoot for the shortest identifiers consistent with a general description of the page's (or directory's) contents or function.
Avoid punctuation in file names.
Often designers use names like product_spec_sheet.html or product-spec-sheet.html. The underscore is often difficult to notice and type, and these connectors are usually a sign of a carelessly designed site structure. They are only required because the last rule wasn't followed.
Use lower case and try to address case sensitivity issues.
Given the last tip, you might instead name a file ProductSpecSheet.html. However, casing in URLs is troublesome because, depending on the Web server's operating system, file and directory names may or may not be case sensitive. For example, http://www.xyz.com/Products.html and http://www.xyz.com/products.html are two different files on a UNIX system but the same file on a Windows system. Add to this the fact that www.xyz.com and WWW.XYZ.COM are always the same domain, and the potential for confusion becomes apparent. The best solution is to make all file and directory names lowercase by default and, in a case-sensitive server operating environment, to ensure that URLs will be correctly processed no matter what casing is used. This is not easy to do under Apache on Unix/Linux systems, although URL rewriting and spell checking can help (discussed below).
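One way to approach the casing problem is to redirect any mixed-case request to its lowercase form before looking up the file. The following is a minimal Python sketch of that idea, not a server module; the function name `lowercase_redirect` is hypothetical, and it assumes every file and directory on the site is already stored in lowercase.

```python
from urllib.parse import urlsplit, urlunsplit


def lowercase_redirect(url):
    """Return the lowercase form of url if a redirect is needed, else None.

    Only the host and path are lowercased. The query string is left alone
    because parameter values may legitimately be case sensitive.
    """
    parts = urlsplit(url)
    canonical = urlunsplit((
        parts.scheme,
        parts.netloc.lower(),
        parts.path.lower(),
        parts.query,
        parts.fragment,
    ))
    return canonical if canonical != url else None


print(lowercase_redirect("http://www.xyz.com/Products.html"))
print(lowercase_redirect("http://www.xyz.com/products.html"))
```

A request handler would issue a permanent redirect whenever the function returns a URL and serve the page normally when it returns None.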
Do not expose technology via directory names.
Directory names commonly or easily associated with a given server-side technology unnecessarily disclose implementation details and discourage permanent URLs. More generic paths should be used. For example, use /scripts instead of /cgi-bin, /styles instead of /css, /scripts instead of /javascript, and so on.
Plan for host name typos.
The reality of end user navigation is that around half of all site traffic is from direct type or bookmarked access. If users want to go to Amazon's web site, they know to type in www.amazon.com. However, accidentally typing ww.amazon.com or wwww.amazon.com is fairly easy if a user is in a hurry. Adding a few entries to a site's domain name service to map w, ww, and wwww to the main site, as well as the common www.site.com and site.com, is well worth the few minutes required to set them up.
Plan for domain name typos.
If possible, secure common "fat finger" typos of domain names. Given the proximity of the "z" and "x" keys on a standard computer QWERTY keyboard, it is no wonder Amazon also has contingency domains like amaxon.com. Google allows for such variations as gooogle.com and gogle.com. Unfortunately, many Web traffic aggregators will purchase the typo domains for common sites, but most organizations should find some of their typo domains readily available. Organizations with names that are difficult to spell, like "Ximed," might want to have related domains like "Zimed" or "Zymed" for users who know the name of the organization but not the correct spelling. The particular domains needed for a company should reveal themselves during the course of regular offline correspondence with customers.
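To get a first list of typo domains worth checking, the three slips mentioned above (a dropped letter, a doubled letter, and a neighboring key) can be enumerated mechanically. This is only a sketch: the function name `typo_candidates` is hypothetical, and the adjacency map is an assumption the caller supplies, not a complete keyboard layout.

```python
def typo_candidates(name, adjacent=None):
    """List simple "fat finger" variants of a domain label.

    adjacent maps a character to the neighboring keys that might be hit
    by mistake, for example {"z": "x"}. It is supplied by the caller.
    """
    adjacent = adjacent or {}
    found = set()
    for i, ch in enumerate(name):
        found.add(name[:i] + name[i + 1:])        # dropped letter
        found.add(name[:i] + ch + name[i:])       # doubled letter
        for other in adjacent.get(ch, ""):
            found.add(name[:i] + other + name[i + 1:])  # neighboring key
    found.discard(name)
    return sorted(found)


print(typo_candidates("google"))
print(typo_candidates("amazon", {"z": "x"}))
```

The output is a list of candidates, not a recommendation to register all of them; customer correspondence remains the better guide to which misspellings actually occur.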
Support multiple domain forms.
If an organization has many forms to its name, such as International Business Machines and IBM, it is wise to register both forms. Some companies will register their legal form as well, so XYZ, LLC or ABC, Inc. might register xyzllc.com and abcinc.com in addition to their primary domains. While it seems like a significant investment, if you use one of the new breed of low-cost registrars (like itsyourdomain.com), the price per year for numerous domains for a site is quite reasonable. Given alternate domain extensions like .net, .org, .biz and so on, the question becomes where to stop. Anecdotally, the benefits are significantly reduced with new alternate domain forms (like .biz, .cc, and so on), so it is better to stick with the common domain form (.com) and any regional domains that are appropriate (e.g. co.uk).
Add guessable entry point URLs.
Since users guess domain names, it is not a stretch for users -- particularly power users -- to guess directory paths in URLs. For example, a user trying to find information about Microsoft Word might type http://www.microsoft.com/word. Mapping multiple URLs to common guessable site entry points is fairly easy to do. Many sites have already begun to create a variety of synonym URLs for sections. For example, to access the careers section of the site, the canonical URL might be http://www.xyz.com/careers. However, adding in URLs like http://www.xyz.com/career, http://www.xyz.com/jobs, or http://www.xyz.com/hr is easy and vastly improves the chances that the user will hit the target. You could even go so far as to add hostname remapping so that http://investor.xyz.com, http://ir.xyz.com, http://investors.xyz.com, and so on all go to http://www.xyz.com/investor. The effort made to think about URLs in this fashion not only improves their usability, but should also promote long term maintainability by encouraging the modularization of site information.
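The synonym mapping described above amounts to two small lookup tables, one for paths and one for host names. The sketch below uses the example URLs from the text; the table names and the function `canonical_url` are hypothetical, and a real site would more likely express the same rules in its server's redirect configuration.

```python
SITE = "http://www.xyz.com"

# Guessable paths that should land on the canonical section URL.
PATH_SYNONYMS = {
    "/career": "/careers",
    "/jobs": "/careers",
    "/hr": "/careers",
}

# Alternate host names that should land on a section of the main site.
HOST_SYNONYMS = {
    "investor.xyz.com": "/investor",
    "investors.xyz.com": "/investor",
    "ir.xyz.com": "/investor",
}


def canonical_url(host, path):
    """Return the URL to redirect to, or None if no synonym applies."""
    host = host.lower()
    if host in HOST_SYNONYMS:
        return SITE + HOST_SYNONYMS[host]
    clean = path.rstrip("/").lower() or "/"
    target = PATH_SYNONYMS.get(clean)
    return SITE + target if target else None


print(canonical_url("www.xyz.com", "/jobs"))
print(canonical_url("ir.xyz.com", "/"))
```

Keeping the synonyms in one table also documents which entry points the site has promised to support.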
Where possible, remove query strings by pre-generating dynamic pages.
Often, complex URLs like http://www.xyz.com/press/releasedetail.asp?pressid=5 result from an inappropriate use of dynamic pages. Many developers use server-side scripting technologies like ASP/ASP.NET, ColdFusion, PHP, and so on to generate "dynamic" pages which are actually static. For example, in the previous URL, the ASP script pulls press release content out of a database using a primary key of 5 and generates a page. However, in nearly all cases, this type of page is static both in content and presentation. Generating the page dynamically at user view time wastes precious server resources, slows the page down, and adds unnecessary complexity to the URL. Some dynamic caches and content distribution networks will alleviate the performance penalty here, but the unnecessarily complex URLs remain. It is easy to pre-generate a page directly to its static form and clean its URL. Thus, http://www.xyz.com/press/releasedetail.asp?pressid=5 might become http://www.xyz.com/press/pressrelease5 or something much more descriptive like http://www.xyz.com/press/03-02-2003 -- or, even better, http://www.xyz.com/press/newproduct. The issue of when to generate a page, either at request time or beforehand, is not much different from the question of whether a program should be interpreted or compiled.
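Pre-generation can be as simple as a build step that renders each database record to a file once. The following sketch assumes the press releases have already been read into dictionaries with `slug`, `title`, and `body` keys; those field names, the `PAGE` template, and the function `pregenerate` are all hypothetical.

```python
from html import escape
from pathlib import Path

# Deliberately tiny page template, for illustration only.
PAGE = (
    "<html><head><title>{title}</title></head>"
    "<body><h1>{title}</h1><p>{body}</p></body></html>"
)


def pregenerate(releases, out_dir, url_prefix="/press/"):
    """Write one static HTML file per release and return the clean URLs."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    urls = []
    for release in releases:
        html = PAGE.format(
            title=escape(release["title"]),
            body=escape(release["body"]),
        )
        # The file keeps its extension on disk; the public URL omits it.
        (out / (release["slug"] + ".html")).write_text(html, encoding="utf-8")
        urls.append(url_prefix + release["slug"])
    return urls
```

Running the function again whenever a release is added or edited keeps the static files current, much as recompiling keeps a binary current with its source.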
Rewrite query strings.
When a page must remain dynamic, its query string can still be cleaned up. Simple cleaning usually remaps the ?, &, and + symbols in a URL to more readily typeable characters. Thus, a URL like http://www.xyz.com/presssearch.asp?key=New+Robot&year=2003&view=print might become something like http://www.xyz.com/presssearch.asp/key/New-Robot/year/2003/view/print. While this makes the page "look" static, it is indeed still dynamic. The look of the URL is a little less intimidating to users and may be more search engine friendly as well (search engines have been known to halt at the ? character). In conjunction with the next tip, this might even discourage URL parameter manipulation by potential site hackers who can't tell the difference between a dynamic page and a static one. The challenge with URL rewriting is that it takes some significant planning to do well, and the primary tools used for these purposes -- rule-based URL rewriters like mod_rewrite for Apache and ISAPI Rewrite for IIS -- have daunting rule syntax for developers unseasoned in the use of regular expressions. However, the effort to learn how to use these tools properly is well worth it.
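The remapping itself is mechanical, as the sketch below tries to show for the link-generating side. It is not a replacement for a server-side rewriter such as mod_rewrite, which still has to translate the clean form back into parameters; the function name `clean_query_url` is hypothetical, and turning spaces into hyphens is an assumption that loses information if a value already contains a hyphen.

```python
from urllib.parse import parse_qsl, quote, urlsplit


def clean_query_url(url):
    """Fold ?key=value&... pairs into /key/value/... path segments."""
    parts = urlsplit(url)
    segments = []
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        segments.append(quote(key, safe=""))
        # parse_qsl has already decoded "+" to a space.
        segments.append(quote(value.replace(" ", "-"), safe=""))
    path = parts.path.rstrip("/")
    if segments:
        path += "/" + "/".join(segments)
    return f"{parts.scheme}://{parts.netloc}{path}"


print(clean_query_url(
    "http://www.xyz.com/presssearch.asp?key=New+Robot&year=2003&view=print"
))
```

Because parameter order becomes part of the path, links should always be generated with the parameters in the same order.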
Remove extensions from files in URL and source.
Probably the most interesting URL improvement that can be made involves the concept of content negotiation. Despite being a long-supported part of the HTTP specification, content negotiation is rarely used on the Web today. The basic idea of content negotiation is that the browser transmits information about the resources it wants or can accept (MIME types preferred, language used, character encodings supported, etc.) to the server, and this information is then used, along with server configuration choices, to dynamically determine the actual content and format that should be transmitted back to the browser. Metaphorically, the browser and the server hold a negotiation over which of the available representations of a given resource is the best one to deliver, given the preferences of each side. What this means is that a user can request a URL like http://www.xyz.com/products, and the language of the content returned can be determined automatically -- resulting in the content being delivered from either a file like products-en.html for English speaking users or one like products-es.html for Spanish speakers. Technology choices such as file format (PNG or GIF, XHTML or HTML) can also be determined via content negotiation, allowing a site to support a range of browser capabilities in a manner transparent to the end user.
Content negotiation not only allows developers to present alternate representations of content but has a significant side effect of allowing URLs to be completely abstract. For example, a URL like http://www.xyz.com/products/robot, where robot is not a directory but an actual file, is completely legal when content negotiation is employed. The actual file used, be it robot.html, robot.cfm, robot.asp, etc., is determined using the negotiation rules. Abstracting away from the file extension details has two significant benefits. First, security is significantly improved as potential hackers can't immediately identify the Web site's underlying technology. Second, by abstracting the extension from the URL, the technology can be changed by the developer at will. If you consider URLs to be effectively function calls to a Web application, cleaned URLs introduce the very basics of data hiding.
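The language half of this negotiation can be sketched in a few lines. The example below is a simplified reading of the Accept-Language request header, not the full algorithm a module such as mod_negotiation applies; the function name `pick_variant` and the choice of English as the fallback are assumptions.

```python
def pick_variant(accept_language, available, default="en"):
    """Choose a file for the best language in an Accept-Language value.

    available maps a primary language code to a file name.
    """
    ranked = []
    for item in accept_language.split(","):
        piece = item.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        quality = 1.0                      # no q value means full preference
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0              # treat malformed q values as refused
        ranked.append((quality, tag.strip().lower()))
    # Highest quality first; the sort is stable, so header order breaks ties.
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    for quality, tag in ranked:
        primary = tag.split("-")[0]        # "es-mx" -> "es"
        if quality > 0 and primary in available:
            return available[primary]
    return available[default]


files = {"en": "products-en.html", "es": "products-es.html"}
print(pick_variant("es-MX,es;q=0.9,en;q=0.5", files))
print(pick_variant("fr,en;q=0.8", files))
```

The user requests /products in both cases; only the file chosen behind that URL differs.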
URLs can be cleaned server-side using a Web server extension that implements content negotiation, such as mod_negotiation for Apache or PageXchanger for IIS. However, getting a filter that can do the content negotiation is only half of the job. The underlying URLs present in HTML or other files must have their file extensions removed in order to realize the abstraction and security benefits of content negotiation. Removing the file extensions in source code is easy enough using search and replace in a Web editor like Dreamweaver MX or HomeSite. Some tools like w3Compiler also are being developed to improve page preparation for negotiation and transmission. One word of assurance: don't jump to the conclusion that your files won't be named page.html anymore. Remember that, on your server, the precious extensions are safe and sound. Content negotiation only means that the extensions disappear from source code, markup, and typed URLs.
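The search-and-replace step on the markup can also be scripted. The sketch below assumes links are written as double-quoted href attributes and that only the listed page extensions should be stripped; the pattern `PAGE_LINK` and the function `strip_extensions` are hypothetical, and links to other sites are deliberately left untouched.

```python
import re

# Matches a site-local href ending in a page extension, stopping before
# the closing quote, a query string, or a fragment.
PAGE_LINK = re.compile(
    r'href="(?!https?:|//)([^"#?]+?)\.(?:html?|asp|jsp|cfm|pl)(?=["#?])',
    re.IGNORECASE,
)


def strip_extensions(markup):
    """Remove page extensions from site-local links in HTML source."""
    return PAGE_LINK.sub(r'href="\1', markup)


print(strip_extensions('<a href="/products/robot.html">Robot</a>'))
```

A regular expression is a blunt tool for HTML, so the result is worth reviewing before it is published.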
Automatically spell check directory and file names entered by users.
The last tip is probably the least useful, but it is the easiest to do: spell check your file and directory names. On the off chance that a user spells a file name wrong, makes a typo in an extension or path, or encounters a broken link, recovery is easy enough with a spelling check. Given that the typo will start to generate a 404 in the server, a spelling module can jump in and try to match the file or directory name most likely intended. If file and directory names are relatively unique in a site, this last-ditch effort can match correctly for numerous typos. If not, you get the 404 as expected. Creating simple "Did you mean X?"-style URLs requires only the installation of a server filter like mod_speling for Apache or URLSpellCheck for IIS. The performance hit is not an issue, given that the correction filter is only called upon a 404 error, and it is better to deliver the right page than to serve a 404 just to save a minor amount of processing on error page delivery. In short, there is no reason this shouldn't be done, and it is surprising that this feature is not built into all modern Web servers.
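The matching step behind a "Did you mean X?" response can be approximated with fuzzy string matching. This sketch uses Python's standard `difflib` module rather than the algorithm of any particular server filter; the function name `suggest_path`, the list of known paths, and the 0.8 similarity cutoff are assumptions.

```python
import difflib


def suggest_path(requested, known_paths, cutoff=0.8):
    """Return the known path closest to a mistyped one, or None."""
    matches = difflib.get_close_matches(
        requested.lower(), known_paths, n=1, cutoff=cutoff
    )
    return matches[0] if matches else None


known = ["/products", "/careers", "/press"]
print(suggest_path("/prodcts", known))
print(suggest_path("/xyzzy", known))
```

The function would be called only from the 404 handler, so ordinary requests pay no cost. A None result means the 404 is served as usual.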
Most of the tips presented here are fairly straightforward, with the partial exception of URL cleaning and rewriting. All of them can be accomplished with a reasonable amount of effort. The result of this effort should be cleaned URLs that are short, understandable, permanent, and devoid of implementation details. This should significantly improve the usability, maintainability and security of a Web site. The potential objections that developers and administrators might have against next generation URLs will probably have to do with any performance problems they might encounter using server filters to implement them or issues involving search engine compatibility. As to the former, many of the required technologies are quite mature in the Apache world, and their newer IIS equivalents are usually explicitly modeled on the Apache exemplars, so that bodes well. As to the search engine concerns, fortunately, Google so far has not shown any issue at all with cleaned URLs. At this point, the main thing standing in the way of the adoption of next generation URLs is the simple fact that so few developers know they are possible, while some who do are too comfortable with the status quo to explore them in earnest. This is a pity, because while these improved URLs may not be the mythical URN-style keyword always promised to be just around the corner, they can substantially improve the Web experience for both users and developers alike in the long run.