
Uniform Resource Locators

Uniform resource locators, more commonly known as URLs, are the primary naming and addressing method of the Web. URLs belong to the larger class of uniform resource identifiers (URIs); both identify resources, but a URL also includes the host details that allow a client to connect to the server that holds the resource.

A URL can be broken into three basic parts: the protocol identifier; the host and service identifier; and the resource identifier, which consists of a path with optional parameters and an optional query. The following example shows a URL that identifies an HTTP resource:

http://host_domain_name:8080/absolute_path?query
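As a rough illustration of this breakdown, the following sketch splits the example URL into its parts using Python's standard urllib.parse module. The function name split_url and the dictionary keys are our own choices for this example and are not part of any standard.

```python
from urllib.parse import urlsplit


def split_url(url):
    """Return the basic parts of a URL as a dictionary (illustrative helper)."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,   # e.g. "http"
        "host": parts.hostname,     # host name or IP address, lowercased
        "port": parts.port,         # None when the URL omits the port
        "path": parts.path,         # path to the resource
        "query": parts.query,       # text after "?", without the "?"
    }


example = split_url("http://host_domain_name:8080/absolute_path?query")
print(example)
```

Note that urlsplit only separates the components; it does not check that the host exists or that the resource can be retrieved.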

The HTTP standard doesn't place any limit on the length of a URL; however, some older browsers and proxy servers do. The structure of a URL is formally described by RFC 2396, Uniform Resource Identifiers (URI): Generic Syntax.

Protocol

The first part of the URL identifies the application protocol. HTTP URLs start with the familiar http://. Other applications that use URLs to locate resources identify different protocols; for example, URLs used with the File Transfer Protocol (FTP) begin with ftp://. URLs that identify HTTP resources served over connections that are encrypted using the Secure Sockets Layer start with https://. We discussed the use of the Secure Sockets Layer to protect data transmitted over the Internet in Tutorial 9.

Host and service identification

The next part of the HTTP URL identifies the host on which the web server is running and the port on which the server listens for HTTP requests. The host component can be given as either a domain name or an IP address. Using the domain name allows user-friendly web addresses such as:

http://www.w3.org/Protocols/

The equivalent URL using the IP address is more difficult to remember:

http://18.29.1.35/Protocols/
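A client can tell the two forms of the host component apart without contacting the network. The sketch below is one possible approach, assuming Python's standard ipaddress and urllib.parse modules; the helper name host_kind and its return strings are hypothetical and chosen only for this example.

```python
import ipaddress
from urllib.parse import urlsplit


def host_kind(url):
    """Classify the host component of a URL (illustrative helper)."""
    host = urlsplit(url).hostname
    if host is None:
        return "no host"
    try:
        # ip_address() raises ValueError for anything that is not a
        # literal IPv4 or IPv6 address.
        ipaddress.ip_address(host)
    except ValueError:
        return "domain name"
    return "ip address"


print(host_kind("http://www.w3.org/Protocols/"))
print(host_kind("http://18.29.1.35/Protocols/"))
```

The check is purely syntactic: it does not look up the domain name or confirm that the two URLs refer to the same server.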

Nonstandard TCP ports

By convention, servers running well-known Internet applications use standard, well-known TCP port numbers. By default, an HTTP server listens for requests on port 80, an FTP server listens on port 21, and so on. The port number can be omitted from a URL if the well-known port is used. Clients, such as web browsers, determine which well-known port to connect to from the protocol indicated in the URL. For example, requests for the URL http://www.ora.com are made to the host machine www.ora.com on port 80. When a nonstandard port is used, the URL must include the port number so the browser can successfully connect to the service. For example, the URL http://www.example.com:8080 connects to the web server running on port 8080 on the host www.example.com.
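The way a client chooses a port can be sketched as follows. This is a simplified illustration rather than how any particular browser is implemented: the table WELL_KNOWN_PORTS and the function effective_port are hypothetical names, and the table lists only three protocols. The entries for HTTP and FTP come from the text above; port 443 for https is the standard default for HTTP over an encrypted connection.

```python
from urllib.parse import urlsplit

# Small illustrative table of default ports; a real client knows many more.
WELL_KNOWN_PORTS = {"http": 80, "https": 443, "ftp": 21}


def effective_port(url):
    """Return the port a client would connect to for the given URL."""
    parts = urlsplit(url)
    if parts.port is not None:
        # An explicit port in the URL always wins.
        return parts.port
    # Otherwise fall back to the well-known port for the protocol,
    # or None if the protocol is not in the table.
    return WELL_KNOWN_PORTS.get(parts.scheme)


print(effective_port("http://www.ora.com"))
print(effective_port("http://www.example.com:8080"))
```

The first call falls back to the default for HTTP, while the second uses the port given explicitly in the URL.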
