Chapter 3 How Web Browsers Work

Web Browser Applications
Uniform Resource Locators
How Web Browsers Access HTML Documents
Summary
Review Questions
Review Exercises

HTML codes are written specifically for display in browser applications designed for the World Wide Web. Unlike some other document formats or specifications, this is the only application for HTML coding. So it's important to get to know these browsers.

In this chapter, you'll be learning about some popular Web browser applications, how Web browsers interact with Web servers, and how browsers interact with the other Internet services that are available to them.

Web Browser Applications

All Web browsers are capable of certain basic tasks, like finding and loading new Web pages, and displaying them following HTML standards and conventions. There's enough freedom in HTML and the Web standards in general, though, that each Web browser ends up behaving a little differently.

As you look at these browsers, I'd like to make one point clear: although most of them display HTML documents in much the same way, each browser application has its own quirks or features that you should keep in mind while you're creating your documents.

Note

This book cannot provide an exhaustive survey of the Web browsers available. It is fair to say that I'm covering about 90 percent of the current market, but you should recognize that there are other browsers being used to access HTML pages.

NCSA Mosaic

Originally released by the National Center for Supercomputing Applications (NCSA) in 1993, Mosaic was the first widely available graphical browser for Web users (see fig. 3.1). It is currently available for Windows, Windows 95, Macintosh, and various UNIX platforms. It is also the basis of a number of other browsers on the market, most notably those created and licensed by SpyGlass Corp.

Figure 3.1 : NCSA Mosaic for Windows 95.

Although definitely in widespread use, the Mosaic family of browsers is nowhere near the most popular; it trails Netscape Navigator by a significant share of the market. Mosaic has its merits, though, especially as a straight HTML standards-based Web browser known for being relatively well-programmed and effective.

One of the most compelling reasons to use NCSA Mosaic might just be that some versions are free to academic and nonprofit organizations and individuals. It can be downloaded from http://www.ncsa.uiuc.edu/SDG/Software/SDGSoftDir.html or by FTP at ftp://ftp.ncsa.uiuc.edu/.

Netscape Navigator

Easily the most popular Web browser currently available, Netscape Navigator (often simply referred to as Netscape) made a splash on the Internet in 1994 with its totally free first version of the application. Created in part by programmers who had worked on the original NCSA project, Netscape quickly became known as the finest second-generation Web browser, noted for both its flexibility and its speed gains over Mosaic, especially over modem connections.

Another reason for Netscape's popularity is its ability to accept plug-ins, add-on programs that actually extend the abilities of the Netscape Navigator browser window. Netscape users who have the Macromedia Shockwave plug-in, for instance, can view Macromedia presentation files that are embedded within HTML documents in Navigator's window (instead of loading a separate helper application).

Netscape is also available for Windows, Mac, and UNIX users and is available free to certain qualifying (nonprofit and academic) users (see fig. 3.2). It can be downloaded on the Web from http://home.netscape.com/comprod/mirror/client_download.html or by FTP at ftp.netscape.com.

Figure 3.2 : Netscape Navigator for Macintosh.

When introduced, Netscape's main advantages were speed and the ability to display more graphics formats than Mosaic. Since that time, however, Netscape has introduced security features and other technologies (like a built-in e-mail program and built-in UseNet newsreader) that continue to set it apart from other browsers.

Another advantage is the support of Java applets and JavaScript authoring within Netscape itself. Again, Java applets can be embedded in the Netscape browser window, allowing the user access to truly dynamic pages that can be an interface for anything from simple games to stock quotes to bank-by-computer information. JavaScript gives Web designers programmatic control over their pages, allowing them to check HTML form entries, load different pages based on user input, and much more.

Perhaps most significant to HTML writers, however, is yet another addition that Netscape offers beyond Mosaic: Netscape HTML extensions. These are extra HTML-like elements that Netscape can recognize in Web pages. Although a good deal of debate has raged about whether or not this is ultimately a good thing for the Web (see sidebar), it remains a fact that a Web site can be designed in such a way that although most browsers can display the page's basic text and graphics, it is best viewed in Netscape Navigator.

Why is this? Netscape adds many HTML elements that offer more control over the layout of a page than the HTML standard allows. This includes such features as centering text and graphics, wrapping text around figures, and adding tables to Web pages. These elements are not found in HTML 2.0, although their popularity on the Web has caused many of them to be incorporated into HTML 3.0 level standards.

Are Netscape HTML Commands Good for the Web?

When Netscape first introduced its extensions to HTML, two strong reactions came from opposite sides of the playing field. Experienced HTML designers, especially those interested in more control over their pages, said, "Cool." Defenders of the original HTML, however, were not as pleased.

Why would you be against HTML extensions? Because using them leaves a large percentage of Web users out in the cold. If people begin to write their Web pages using Netscape HTML extensions, suddenly at least 40 percent of the Web's users will see a less-than-ideal version of the site.

Clearly, adding the extensions was shrewd marketing on Netscape's part. After all, if you want to see the best layouts on the Web, all you have to do is get a copy of Netscape.

But some users, like those using NCSA Mosaic, the America Online Web browser, or some other popular Web application, are just out of luck. The extensions won't display correctly in their browsers and, in some cases, will cause errors.

Purists will point to the Netscape HTML extensions as going against the spirit of HTML. HTML is supposed to offer less control over a page, so that it can be platform- and application-independent. Netscape HTML, by definition, flies in the face of this spirit.

Fortunately for everyone, new HTML 3.0 level standards are emerging that support many of the Netscape HTML commands in a more "official" way. That means the best of both worlds (layout features and total compatibility) as more browsers come to support HTML 3.0 level additions.

In the meantime, will Netscape strike again with some other innovation? Don't be too surprised if it does.

Microsoft Internet Explorer

Microsoft Corp. recently released Internet Explorer, its own Web browser, free to the general public (see fig. 3.3). Loosely based on the Mosaic technology, Internet Explorer is a reasonably well-featured browser with decent speed for modem users. Microsoft's browser is available for Windows 95, Windows 3.1, and Macintosh platforms. It can be found on the Web at http://www.microsoft.com/IE/ or by FTP at ftp.microsoft.com.

Figure 3.3 : Microsoft Internet Explorer for Windows 95.

Like Netscape, Internet Explorer also incorporates elements that are not compliant with the generally accepted HTML standard. Again, these codes are geared more toward page layout than is the HTML standard. More and more often, sites on the Web recommend that you view them with Internet Explorer because their pages use the nonstandard HTML elements that Internet Explorer recognizes.

Lynx

Lynx and similar browsers are a little different from the others discussed so far, because they lack the ability to display graphics. It may be surprising that people still rely on text-based browsers to access the Web, but it remains true that not everyone has a high-speed connection to the Internet. In fact, many users don't even have a graphical operating system (such as Windows, Mac OS, or OS/2) for their computer.

Lynx was originally written for the UNIX platform. In fact, it is the browser used by most service providers for text-based accounts. There is also an MS-DOS version that offers users browsing capabilities in a text-only format (see fig. 3.4).

Figure 3.4 : The Lynx browser through a text-only UNIX account.

Special considerations must go into your HTML documents if they're going to support text-based browsers like Lynx. Fortunately, as you'll see in the HTML formatting chapters, the HTML 2.0 and 3.0 standards are heavily in favor of text-based browsers, in the spirit of not leaving anyone out.

The individual HTML designer must be wary, though, especially when designing highly graphical Web sites and interfaces. Something that you should constantly ask yourself while creating a Web site is: Am I leaving out my text-based viewers? Is there anyone out there who can't get the full effect of what I'm communicating because they can't see the graphics?

Inevitably, that will indeed be the case, but a good HTML designer works to minimize that possibility.

Tip

Many considerate Web designers go so far as to create two or more versions of their Web site: one for graphical browsers, and one that offers only text.

Uniform Resource Locators

Now that you've looked at the different Web browsers that might be accessing your Web site, let's talk about something they all have in common: the use of Uniform Resource Locators (URLs). What's an URL? If you remember our discussion from the last chapter, you may recall that most Internet services have "addresses" for accessing information within that service.

Tip

Not everyone follows this convention, but this book is written in such a way that it will be easier to read if you pronounce "URL" as you would the name "Earl."

Each of these addresses is a bit different. For instance, you would send an e-mail message to my America Online account using tstauffer@aol.com in an e-mail application.

To access the AOL public FTP site, on the other hand, you would enter ftp.aol.com in the FTP application you are using.

The World Wide Web also has its own addressing scheme, but it's slightly more advanced than the schemes of its predecessors. Not only is the Web newer, but its addresses have to be more sophisticated because of the Web's unique ability to access all of the different Internet services.

URLs are these special addresses. They follow a format like this:

protocol://host.domain.first-level domain/path/filename.ext

or

protocol:host.domain.first-level domain

An example of an URL to access a Web document would be http://www.microsoft.com/windows/index.html.

Let's look at that address carefully. According to the format for an URL, then, http:// would be the protocol, www is the host you're accessing, microsoft is the domain, and com is the first-level domain type for this system. That's followed by / to suggest that a path statement is coming next.

The path statement tells you that you're looking at the document index.html, located in the directory windows.
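
To make that breakdown concrete, here is a minimal sketch in Python that pulls the same pieces out of the example URL. It relies on the standard library's urlsplit function. The describe_url helper is my own illustrative name, and it assumes a simple three-part server name like the one in the example.

```python
from urllib.parse import urlsplit

def describe_url(url):
    """Split an URL into the parts named in the text (illustrative helper)."""
    parts = urlsplit(url)
    # Assumption: the server name has exactly three labels, e.g. www.microsoft.com.
    host, domain, first_level = parts.hostname.split(".")
    return {
        "protocol": parts.scheme,
        "host": host,
        "domain": domain,
        "first-level domain": first_level,
        "path": parts.path,
    }

info = describe_url("http://www.microsoft.com/windows/index.html")
for name, value in info.items():
    print(f"{name}: {value}")
```

Server names with more or fewer than three parts would need extra handling, which this sketch leaves out on purpose.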

Note

Those of you familiar with DOS, Windows, or UNIX will probably recognize path statements right away. Mac OS users and others simply need to realize that a path statement offers a "path" to a specific file on the server computer's hard drive. A Web browser needs to know in exactly which directories and subdirectories (folders and subfolders) a file can be found, so a path statement is a standard part of any URL.
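
If the idea of a path statement is new to you, this short Python sketch may help. It separates a path into its directories and its filename using the standard pathlib module. The split_path function is a name I made up for illustration, and it assumes an absolute path written with forward slashes.

```python
from pathlib import PurePosixPath

def split_path(path):
    """Return (list of directories, filename) for a slash-separated path."""
    p = PurePosixPath(path)
    # Drop the leading "/" entry so only real directory names remain.
    directories = [part for part in p.parent.parts if part != "/"]
    return directories, p.name

directories, filename = split_path("/windows/index.html")
print(directories)
print(filename)
```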

There are two basic advantages of the URL. First, it allows you to explicitly indicate the type of Internet service involved. HTTP, for instance, indicates the HyperText Transfer Protocol, the basic protocol for transferring Web documents. You'll look at this part of the URL in a moment.

Secondly, the URL system of addressing makes every single document, program, and file on the Internet a separately addressable entity. Why is this useful?

Example: The URL Advantage

For this example, all you need to do is load your Web browser (whichever you happen to use) and find the text box or similar interface element that allows you to enter an URL manually to access Web pages (see fig. 3.5). The point of this example is to show the benefits of using URLs for the Web. With Gopher and FTP, you really only need to know a host address. But, on the Web, knowing just the host address often isn't enough.

Figure 3.5 : The Go To/Location text box in Netscape for Windows allows you to enter an URL manually.

Once you've located the appropriate entry box, enter www.mcp.com. Depending on the browser you're using, you'll more than likely need to hit the Enter or Return key after typing this address.

What happens then depends on your Web browser. Some browsers will give an error, which isn't exactly perfect for this example, but it does prove the point that you need more than just a server address to get around on the Web. Others will take you directly to the Macmillan Computer Publishing Web site.

Tip

If your browser gives you an error, enter http://www.mcp.com. Some browsers require at least a partial URL. Others guess the protocol from the type of server address entered.
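
How might a browser guess the protocol? The following Python sketch shows one plausible approach: look at the first part of the server address. This is only my guess at the kind of rule involved, not the logic of any particular browser, and the complete_url name is hypothetical.

```python
# Protocols from Table 3.1 that are written with a colon but no slashes.
COLON_ONLY = ("mailto:", "news:", "telnet:")

def complete_url(typed):
    """Turn what the user typed into a full URL (illustrative heuristic only)."""
    if "://" in typed or typed.lower().startswith(COLON_ONLY):
        return typed  # a protocol is already present
    first_label = typed.split(".")[0].lower()
    # Assumption: servers named ftp.* or gopher.* speak that protocol;
    # everything else is treated as a Web server.
    protocol = {"ftp": "ftp", "gopher": "gopher"}.get(first_label, "http")
    return f"{protocol}://{typed}"

print(complete_url("www.mcp.com"))
print(complete_url("ftp.netscape.com"))
```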

Notice that www.mcp.com follows the addressing conventions established for Internet services like FTP and Gopher. The problem is that, if the Web used this method for addresses, you'd have to begin at the first page of the Web site every time you wanted to access one of the hundreds of pages available from Macmillan.

To get around that, an URL provides your Web browser with more information. Now enter an URL that names a specific document, such as http://www.mcp.com/que/index.html.

All Web browsers should easily handle this address. With an URL, you're able to be much more specific about the document you want to see, since every document on the Internet has an individual address. In this case, you've instructed your Web browser to go directly to the que directory on Macmillan's Web site and load the HTML document called index.html.

The Different Protocols for URLs

You've already looked at Internet addresses such as www.mcp.com in depth, and you should be familiar with the concept of a path statement. That just leaves one part of an URL that's new to you: the protocol.

I've already mentioned that HTTP is the protocol most often used by Web browsers to access HTML pages. Table 3.1 shows some of the other protocols that can be part of an URL.

Table 3.1 Possible Protocols for an URL

Protocol	Accesses…
http://	HTML documents
https://	Some "secure" HTML documents
file://	HTML documents on your hard drive
ftp://	FTP sites and files
gopher://	Gopher menus and documents
news://	UseNet newsgroups on a particular news server
news:	UseNet newsgroups
mailto:	E-mail messages
telnet:	Remote Telnet (login) session

By entering one of these protocols, followed by an Internet server address and a path statement, you can access nearly any document, directory, file, or program available on the Internet or on your own hard drive.

Note

The mailto:, news:, and telnet: protocols have slightly different requirements to create an URL. mailto: is followed by a simple e-mail address, news: is followed by just the newsgroup name, and telnet: is followed by just a server address. Also notice that file:// is often slightly different for different browsers.
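
You can see this difference by taking a few URLs apart in Python. In the sketch below, the standard urlsplit function reports a server address only for the URL written with two slashes; for the colon-only forms, everything after the colon ends up in the path. The newsgroup name and the Telnet server are placeholders I chose for illustration.

```python
from urllib.parse import urlsplit

examples = [
    "ftp://ftp.cdrom.com/pub/win95/demos/",
    "mailto:tstauffer@aol.com",
    "news:comp.infosystems.www.authoring.html",  # newsgroup picked for illustration
    "telnet:example.com",                        # placeholder server address
]

def summarize(url):
    """Return (protocol, server address, remainder) for an URL."""
    parts = urlsplit(url)
    return parts.scheme, parts.netloc, parts.path

for url in examples:
    print(summarize(url))
```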

Example: Accessing Other Internet Services with URLs

Over time, applications designed to access non-Web Internet services (like FTP or Gopher programs) will begin to use the URL system more and more. For now, though, Web browsers are essentially the only applications that use URLs.

Fortunately, by simply changing the protocol of a particular URL, you can access most Internet services directly from your browser. For this example, you'll need to load your Web browser once more and enter ftp://ftp.cdrom.com/pub/win95/demos/.

This should result in a listing of the subdirectory demos located on the FTP server ftp.cdrom.com. Notice that you didn't enter a document name. With the FTP protocol, an URL that names a specific document or file causes it to be downloaded automatically, while an URL that ends at a directory produces a listing.

Tip

If your browser tells you that there are too many users presently connected for you to connect to this FTP site, wait a moment or two, then click your Reload button or otherwise reload this URL with your browser.

Not all browsers support the mailto: command, so let's see if yours does. In your browser's URL window, type mailto:tstauffer@aol.com and hit Enter or Return if necessary.

If your browser supports the mailto: protocol command, you should be presented with a new window, complete with my e-mail address in the Mail To field (see fig. 3.6).

Figure 3.6 : A mailto: protocol URL in action.

How Web Browsers Access HTML Documents

When you enter an URL in the URL field on your browser, the browser goes through the following three basic steps:

The browser determines what protocol to use.
It looks up and contacts the server at the address specified.
The browser requests the specific document (including its path statement) from the server computer.
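
The three steps above can be sketched in a few lines of Python. This is a simplified illustration, not how any real browser is written. The plan_request function is a hypothetical name, the request line shown applies only to HTTP, and the example swaps in a fake lookup function so that it runs without an Internet connection.

```python
import socket
from urllib.parse import urlsplit

def plan_request(url, resolve=socket.gethostbyname):
    """Walk through the three steps without actually contacting the server."""
    parts = urlsplit(url)
    protocol = parts.scheme                # step 1: which protocol to use
    address = resolve(parts.hostname)      # step 2: look up the server
    path = parts.path or "/"
    request_line = f"GET {path} HTTP/1.0"  # step 3: ask for the document (HTTP only)
    return protocol, address, request_line

def fake_resolve(hostname):
    # Stand-in for a real name lookup; 192.0.2.1 is a documentation-only address.
    return "192.0.2.1"

print(plan_request("http://www.microsoft.com/windows/index.html", resolve=fake_resolve))
```

Leaving out the resolve argument makes the function perform a real lookup through socket.gethostbyname, which requires a working connection.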

Using all of this information, your browser was able to access the variety of Internet services discussed previously in Table 3.1 and in the subsequent example. But what does this have to do with HTML design? Just about everything.

In HTML, a hypertext link is simply a clickable URL. Every time you create a link in a Web document, you assign an URL to that link. When that link is clicked by a user, the URL is fed to the browser, which then goes through the procedure outlined above to try to retrieve it.
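
To see that links really are just URLs attached to text, here is a small Python sketch that collects the URL from every link in a scrap of HTML. It uses the standard html.parser module. The LinkCollector class and the sample markup are made up for illustration; the URLs themselves are ones used elsewhere in this chapter.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Gather the URL assigned to each hypertext link (illustrative class)."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # The parser reports tag and attribute names in lowercase.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.urls.append(value)

sample = (
    '<P>Visit <A HREF="http://www.mcp.com">Macmillan</A> or '
    '<A HREF="mailto:tstauffer@aol.com">send e-mail</A>.</P>'
)
collector = LinkCollector()
collector.feed(sample)
print(collector.urls)
```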

Example: Watching the Link

If you've used your Web browser much, then you've watched this happen countless times, even if you didn't realize it. If you're using Netscape, Mosaic, or a similar browser, start by pointing your mouse pointer at just about any link you can find. You may notice that when your mouse pointer is touching the link, an URL appears in the status bar, probably at the bottom of the window (see fig. 3.7).

Figure 3.7 : An URL in the status bar of Netscape Navigator.

That's the URL associated with the link to which you're pointing. Clicking that link will cause the browser to accept that URL as its next command, in much the same way that you manually entered URLs in the earlier example. To see it happen, click the link once. Now check the URL field that you used before to enter URLs (see fig. 3.8). You should see the same URL that was associated with the link to which your mouse was pointing. Then, after a few seconds, you should be at the new page.

Figure 3.8 : The link's URL now appears in the URL field (which is Location in Netscape).

What Can Be Sent on the Web?

Part of the magic of the HTTP protocol is that it is fairly unlimited (by Internet standards) in the sort of files that it can send and receive. For instance, like Internet e-mail, much of what is sent on the Web (via the HTTP protocol) is ASCII text. But, unlike Internet e-mail, HTTP isn't limited to ASCII text.

Note

There are two different types of files that can be sent over various Internet services. These are ASCII text files (plain text) and binary files. Binary files are any documents created by applications (such as word processing or graphics applications) or even the applications themselves. It's easiest to think of binary files as anything that isn't an ASCII file.

In fact, HTTP can send both of the major types of files, ASCII and binary, using the same protocol. This means that both plain text files (such as UseNet messages and HTML documents) and binaries (such as downloadable programs or graphics files) can be sent via the Web without any major effort on the part of the user. In certain cases, the HTML author will have to make a distinction (for instance, as to whether or not a graphics file should be displayed or downloaded to the user's machine), but, for the most part, HTTP figures this stuff out by itself.

How exactly does it figure these things out? Usually by a combination of the protocol selected and the extension to the filename in question. (Strictly speaking, with HTTP the Web server typically uses the extension to label the file with a content type, and the browser acts on that label.) For instance, a file called INDEX.HTML that's accessed using an URL that starts with the http:// protocol will be displayed in a browser as an HTML file, complete with formatting and hypertext links.

The same file, however, if it is renamed to be INDEX.TXT, even if it's loaded with an http:// protocol URL, will be displayed in the browser as a simple ASCII file, just as if it were being displayed in WordPad, SimpleText, or Emacs. Why is this? Because the extension tells the Web browser how to display the file (see figs. 3.9 and 3.10).

Figure 3.9 : INDEX_TEST.HTM is loaded as an HTML document by the browser.

Figure 3.10 : INDEX_TEST.TXT is displayed simply as an ASCII text file.

You may recall from Chapter 1 that much of an HTML document is "text" (the rest being HTML codes). In fact, all of an HTML document is ASCII text, as is demonstrated in figure 3.9. It is only the extension .HTML (or .HTM on DOS-based Web servers) that tells a Web browser that it needs to interpret some of the text as HTML commands within a particular ASCII text document.
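
Python's standard mimetypes module keeps a table of exactly this sort, mapping filename extensions to content types, so it makes a handy way to watch the extension do its work. This is a sketch of the general idea rather than the table any specific browser or server uses; the content_type helper is my own name.

```python
import mimetypes

# A separate MimeTypes object keeps this example to Python's own default table.
_types = mimetypes.MimeTypes()

def content_type(filename):
    """Return the content type suggested by a filename's extension."""
    kind, _encoding = _types.guess_type(filename)
    return kind

for name in ("INDEX.HTML", "INDEX.HTM", "INDEX.TXT"):
    print(name, "->", content_type(name))
```

The same text comes back labeled as an HTML document or as plain text depending on nothing but the extension.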

Tip

Because HTML documents are ASCII text, it's possible to create them in simple text editor programs. A Microsoft Word document, on the other hand, is not ASCII text; it's saved in a binary format. So, if you use a word processor to create HTML documents, remember to use the Save As command to save the HTML page in an ASCII format.

Binaries on the Web

When a binary document such as a graphics file is sent over the Web, it's important that it have the appropriate extension. That's how Web browsers know whether a document should be viewed in the browser window (like a JPEG- or GIF-format graphic) or whether it should be saved to the hard drive (like a ZIP or StuffIt archive file).

To the HTML designer, this means two things. First of all, you should recognize that your HTML pages can offer just about any other type of file for transport across the Web. If you want to send graphics, games, WordPerfect documents, or just about anything else, just put a hypertext link to that file on your Web page.

Second, you need to remember that the most important part of a filename is its extension. If you fail to put the correct extension on a filename, your user's browser won't know what to do with it. If you're trying to display a graphic on your Web page, for instance, but put a .TXT extension on it, it won't display.
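
Here is a rough Python sketch of the decision a browser makes based on the extension. The list of extensions shown in the window is an assumption of mine for the sake of the example, and real browsers let users configure this, so treat it as an illustration only.

```python
import os.path

# Assumption: extensions this imaginary browser can show in its own window.
INLINE = {".html", ".htm", ".txt", ".gif", ".jpg", ".jpeg"}

def browser_action(filename):
    """Decide whether a file is displayed or saved, judging by extension alone."""
    extension = os.path.splitext(filename)[1].lower()
    if extension in INLINE:
        return "display in window"
    return "save to hard drive"

for name in ("index.html", "logo.gif", "four.zip"):
    print(name, "->", browser_action(name))
```

Because the decision rests on the extension alone, a mislabeled file gets the wrong treatment, which is exactly the problem described above.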

Everything is Downloaded

There's one other thing you should realize about the Web and Web browsers before you begin to develop Web pages. Very simply, everything you view in a Web browser has to be downloaded from the Web site first. What do I mean by this?

Whenever you enter an URL or click a hypertext link, the HTML document (or binary file) that you're accessing is sent, in its entirety, from the Web server computer to your computer's hard drive. That's why, for instance, Web pages with a lot of graphics files take longer to display than Web pages with just text.

For the Web user, this is both good and bad. It's good because once a page is downloaded, it can be placed in the cache, so that the next time you access the page, it will take much less time to display. It's also good because anything that's currently displayed in your browser window, including the HTML document and any graphics files, can be instantly renamed and filed on your hard drive for your personal use.
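
The idea behind a cache fits in a few lines of Python. In this sketch, a page is downloaded only the first time it's requested; after that, the stored copy is returned. The PageCache class and the fake_download function are hypothetical stand-ins, and real browser caches also have to decide when a stored copy is out of date, which this example ignores.

```python
class PageCache:
    """Keep downloaded pages so repeat visits skip the download (sketch)."""

    def __init__(self, download):
        self._download = download  # function that fetches an URL's contents
        self._pages = {}
        self.downloads = 0         # how many real downloads took place

    def get(self, url):
        if url not in self._pages:
            self._pages[url] = self._download(url)
            self.downloads += 1
        return self._pages[url]

def fake_download(url):
    # Stand-in for a real transfer over the Internet.
    return f"<contents of {url}>"

cache = PageCache(fake_download)
cache.get("http://www.mcp.com")
cache.get("http://www.mcp.com")  # second visit is served from the cache
print(cache.downloads)
```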

Tip

If you use Netscape Navigator, click and hold the mouse button (on a Mac) or click the right mouse button (in Win95) while pointing to a Web page graphic. Notice that, after a few seconds, you can rename that graphic and save it to your hard drive.

The bad side of downloading, though, is that every graphic and all of the text you include in an HTML page has to be transmitted over the Internet to your user's computer. If your user is accessing the Web over a modem, then downloading and displaying your page can take a long time, especially if your Web page includes a lot of graphics. This means that HTML designers have to be constantly aware of the size of their HTML documents and their Web page graphics in order to avoid causing their users unnecessary irritation and wasted time.

Note

It takes 15 to 30 seconds (on average) for a 25 kilobyte graphic to be transmitted over a 28.8 kbps modem connection. So a 100 kilobyte Web page could take one to two minutes to transfer, as long as four television commercials.
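
The arithmetic behind estimates like these is easy to try for yourself. The Python sketch below assumes that a kilobyte is 1,024 bytes and that a 28.8 kbps modem moves 28,800 bits per second at best. The efficiency parameter is my own addition to account for connections that run below the rated speed. At full speed a 25 kilobyte graphic works out to about 7 seconds, so the 15 to 30 second figure corresponds to a connection delivering roughly a quarter to a half of its rated speed.

```python
def transfer_seconds(kilobytes, modem_kbps, efficiency=1.0):
    """Estimate transfer time; efficiency is the fraction of rated speed achieved."""
    bits = kilobytes * 1024 * 8                       # assumes 1 KB = 1,024 bytes
    bits_per_second = modem_kbps * 1000 * efficiency  # assumes 1 kbps = 1,000 bits/s
    return bits / bits_per_second

print(round(transfer_seconds(25, 28.8), 1))                    # best case for one graphic
print(round(transfer_seconds(25, 28.8, efficiency=0.25), 1))   # slow real-world case
print(round(transfer_seconds(100, 28.8, efficiency=0.25), 1))  # a 100 KB page, slow case
```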

Summary

There are a number of popular Web browser applications that Web designers should take into consideration when designing their Web pages. Each browser displays HTML codes in slightly different ways and some, like Netscape and MS Internet Explorer, even add their own HTML-style commands.

The Web uses a particular style of Internet address, called an URL, which allows it to address individually any document on the Internet. This offers an advantage over other Internet address schemes because it specifies the Internet service protocols desired and points directly at documents.

It's important for the Web designer to remember that everything on a Web page is downloaded, including text and graphics. The larger the graphics on a Web page, the longer it will take to display. Downloading is also an advantage, though, since pages can be cached for future use.

Review Questions

Which browser was the first graphical browser on the market? Which is currently most popular?
Most Netscape HTML extensions are designed to help with what aspect of Web pages?
What makes the Lynx browser different from the others discussed?
Is the following an URL, a server address, or a path statement?
www.mcp.com
What makes the mailto: command different from a standard URL?
What ASCII character comes between each folder or directory in a path statement?
If I entered the following in my browser's URL field (and hit Return, if necessary), would it download a file?
http://ftp.cdrom.com/pub/win95/games/four.zip
True or false. Graphics displayed on a Web page are downloaded to the user's computer, which is why they often take extra time to display.
Are the following files ASCII files or binary files? A CorelDRAW! picture, an HTML page, a Microsoft Word document, and a WordPad document.

Review Exercises

Use your current Web browser to access one of the FTP sites mentioned in the "Web Browser Applications" section of this chapter. Notice how browsers handle FTP connections.
Use an ftp:// URL to download one of those other Web browsers (or another file) directly. Hint: you'll need to figure out the path to the file first.
If your ISP allows it, use a modem communications program to dial up your account, and then use Lynx or a similar text browser through your ISP's connection. Notice how different the Web is without graphics and a mouse!
