
Fig. 6.6 Example of an HTML page We can create the above file by using any simple text editor, such as Notepad. We can save it in a directory of our choice and then open it in a browser. The browser shows the output as shown in Fig. 6.7.
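A minimal page of the kind shown in Fig. 6.6 looks like this (a sketch based on the discussion that follows; the figure's exact text may differ slightly):

<html>
<head>
<title>Title of page</title>
</head>
<body>
This is my first homepage. <b>This text is bold</b>
</body>
</html>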

Fig. 6.7

Output of a simple HTML page

As we can see, we can format the output the way we want. Let us now examine what we have done in terms of coding.


Every HTML page must begin with the <html> tag. This line indicates that the current page should be interpreted by the Web browser (when we ask the browser to open it) as an HTML page. Because we enclose the word html inside the characters < and >, it is called a tag. A tag in HTML conveys some information to the browser. For example, here, the <html> tag tells the browser that the HTML page starts here. We shall see more such examples in due course.

The <head>, <title>, and </head> lines define the head part of an HTML page. An HTML page consists of two sections, the head of the page and the body of the page. The title of the page is defined in the head section. We can see that we have defined the title of the page as Title of page. If we look at the browser output, we notice that this value is displayed at the top of the browser window. This is where the output of the title is shown. Incidentally, like title, there can be many other tags inside the head section, as we shall see subsequently.

As mentioned earlier, the HTML page has a head section and a body section. The body section contains the tags that display the output on the browser screen, other than the title. Here, the body section contains some text, which is displayed as it is. Thereafter, we have some text inside the tags <b> and </b>. This indicates that whatever is enclosed inside these tags should be displayed in bold (b stands for bold). Hence, we see that the text enclosed inside the <b> and </b> tags is displayed in bold font in the browser output.

The closing </html> tag indicates the end of the HTML document. We need to note some points regarding what we have discussed so far.
1. HTML tags are used to mark up HTML elements.
2. HTML tags are surrounded by the two characters < and >.
3. HTML tags normally come in pairs, like <b> and </b>.
4. The first tag in a pair is the start tag; the second tag is the end tag.
5. The text between the start and end tags is the element content.
6. An ending tag is named in the same way as the corresponding starting tag, except that it has a / character before the tag name.
7. HTML tags are not case sensitive: <b> means the same as <B>.
8. We specify all tags in lower case. Although this was not a requirement until HTML version 4.0, it is likely to become mandatory in future versions. Hence, we should stick to lower case tags.

6.3.2 Headings, Paragraphs, Line Breaks, etc.

Headings in HTML are defined with the <h1> to <h6> tags. For example, <h1> defines the largest heading, whereas <h6> defines the smallest heading. HTML automatically adds an extra blank line before and after a heading. Figure 6.8 shows an example (a page titled Headings Example, whose body contains the following lines).

<h1>This is heading H1</h1>
<h2>This is heading H2</h2>
<h3>This is heading H3</h3>
<h4>This is heading H4</h4>
<h5>This is heading H5</h5>
<h6>This is heading H6</h6>


Fig. 6.8

Headings, etc.

Figure 6.9 shows the corresponding output.

Fig. 6.9

Heading output

Paragraphs are defined with the <p> tag. HTML automatically adds an extra blank line before and after a paragraph. Figure 6.10 shows an example (a page titled Paragraphs Example, whose body contains the following lines).

<h1>This is heading H1</h1>
<p>This is a paragraph</p>
<p>This is another paragraph</p>



Fig. 6.10

Paragraphs example

Figure 6.11 shows the corresponding output.

Fig. 6.11 Paragraphs output

The <br> tag is used when we want to end a line without starting a new paragraph. The <br> tag forces a line break wherever you place it. Figure 6.12 shows an example (a page titled Line Breaks Example, whose body contains the following lines).

This<br>
is a para<br>
graph with line breaks



Fig. 6.12

Line breaks example

The resulting output is shown in Fig. 6.13.

Fig. 6.13 Line breaks output


6.3.3 Creating Links to Other Pages

The anchor tag can be used to create a link to another document. Such a link is called a hyperlink, and the address it points to is a Uniform Resource Locator (URL). The tag causes some text to be displayed as underlined. If we click on that text in the Web browser, the browser opens the site/page that the hyperlink refers to. The tag used is <a>. The general syntax for doing this is as follows.

<a href="url">Text to be displayed</a>

Here,
a = Create an anchor
href = Target URL
Text = Text to be displayed as a substitute for the URL

For example, if we code the following in our HTML page:

<a href="http://www.yahoo.com">Visit Yahoo!</a>

the result is an underlined, clickable link reading Visit Yahoo!

A full example is shown in Fig. 6.14 (a page titled Hyper link Example, whose body contains the following lines).

This tag will create a hyper link
<a href="http://www.yahoo.com">Visit Yahoo!</a>


Fig. 6.14

Hyper link example

The resulting output is shown in Fig. 6.15.

Fig. 6.15 Hyper link output


6.3.4 Frames

The technology of frames allows us to split the HTML page window (i.e., the screen that the user sees) into two or more sections. Each section of the page contains its own HTML document. The original page, which splits itself into one or more frames, is itself an HTML page. If this sounds complex, refer to Fig. 6.16.

Fig. 6.16

Frames concept

How do we achieve this? The main HTML page contains references to the two frames. An example will help clarify this point. A sample main HTML page (titled Frames Example!) is shown in Fig. 6.17.


Fig. 6.17 Frames example

Let us understand what this page does. It has the following tag.

<frameset cols="50%,50%">

This tag indicates to the browser loading this HTML page that the page is not like a traditional HTML page. Instead, it is a set of frames. There are two frames, each of which occupies 50% of the screen space.

<frame src="page1.html">

This tag tells the browser that in the first 50% reserved area, the contents of the HTML page named page1.html should be loaded.

<frame src="page2.html">

Needless to say, this tag tells the browser that in the second 50% reserved area, the contents of the HTML page named page2.html should be loaded.

The output would be similar to what is shown in Fig. 6.18, provided the two HTML pages (page1.html and page2.html) contain the specified text line.

Fig. 6.18

Frames output

We should note that the browser reads our frame src tags for the columns from left to right. Therefore, we should keep everything in the order we want it to appear. Now, suppose we wanted three frames across the page, and not two. To achieve this, we need to modify our frameset tag and add another frame src tag for the third frame, as follows:
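A sketch of such a three-frame frameset (the page names here are illustrative):

<frameset cols="33%,33%,33%">
<frame src="page1.html">
<frame src="page2.html">
<frame src="page3.html">
</frameset>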

Interestingly, this covers only 99% of the space on the page. What about the remaining 1%? The browser would fill it up on its own. This may lead to slightly unpredictable results, so it is better to increase one of the 33% by 1 to make it 34%. We can also create frames in other formats. An example is shown in Fig. 6.19.

Fig. 6.19 Another frames output

How do we code the main HTML page for doing this? It is shown in Fig. 6.20 (a page titled Another Frames Example!).
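The markup in Fig. 6.20 is along the following lines (a sketch; the 50%/50% row split is an assumption, since the figure's exact proportions are not reproduced here):

<frameset cols="65%,35%">
<frame src="page1.html">
<frameset rows="50%,50%">
<frame src="page2.html">
<frame src="page3.html">
</frameset>
</frameset>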


Fig. 6.20

Frames code

Let us understand this now.
• The first frameset tag tells the browser to divide the page into two columns of 65% and 35% width, respectively.
• The frame src tag after it tells the browser that the first column should be filled with the contents of page1.html.
• The next frameset tag is nested inside the first frameset tag. It tells the browser to divide the second column into two rows, instead of filling the column with a single HTML page.
• The next two frame src tags tell the browser to fill the two rows with page2.html in the top row and page3.html in the bottom row, in that order.
• We must close all of our frameset tags after they have been used.

Based on all the concepts discussed so far, let us now take a look at a real-life example. Figure 6.21 shows code for three HTML pages: one test page (test.html), which contains a frameset that specifies two frames (left.html and right.html).
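A sketch of what test.html could contain (the equal column split is an assumption):

<html>
<head>
<title>Frames inside a frameset</title>
</head>
<frameset cols="50%,50%">
<frame src="left.html">
<frame src="right.html">
</frameset>
</html>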

Fig. 6.21 Frames inside a frameset The resulting output is shown in Fig. 6.22. We will not discuss more features of frames, since they are not relevant to the current context.


Fig. 6.22

Frameset concept

6.3.5 Working with Tables

Table 6.2 summarizes the tags that can be used to create an HTML table.

Table 6.2

Table tags

Tag        Use
<table>    Marks a table within an HTML document.
<tr>       Marks a row within a table. Closing tag is optional.
<td>       Marks a cell (table data) within a row. Closing tag is optional.
<th>       Marks a heading cell within a row. Closing tag is optional.







For example, suppose we want to create the following table in HTML, as shown in Table 6.3.

Table 6.3

Sample table output

Book Name                              Author
Operating Systems                      Godbole
Data Communications and Networks       Godbole
Cryptography and Network Security      Kahate

Let us understand this step by step.

Step 1 Start with the basic <table> and </table> tags.

Step 2 Add <tr> and </tr> tags for the number of rows needed. We have one header row and three data rows. Hence, we would have four instances of <tr> and </tr>.

Step 3 Add <th> and </th> tags for the table headings.

Step 4 Add <td> and </td> tags for adding the actual data.

Step 5 Add the actual heading and data values: Book Name and Author in the header row, followed by the three data rows (Operating Systems / Godbole, Data Communications and Networks / Godbole, and Cryptography and Network Security / Kahate).


The full HTML page is shown in Fig. 6.23. The page has the title Table Example, and its body contains a heading (Here is a Table in HTML) followed by the table markup.

<table>
<tr> <th>Book Name</th> <th>Author</th> </tr>
<tr> <td>Operating Systems</td> <td>Godbole</td> </tr>
<tr> <td>Data Communications and Networks</td> <td>Godbole</td> </tr>
<tr> <td>Cryptography and Network Security</td> <td>Kahate</td> </tr>
</table>


Fig. 6.23 HTML code for a table

The resulting output is shown in Fig. 6.24. We can see that the table does not have any borders. We can easily add them using the border attribute. The modified table tag is as follows (everything else remaining exactly the same as before).

<table border="1">
The resulting output is as shown in Fig. 6.25.

Fig. 6.24 Output of HTML table
Fig. 6.25 Adding a border to a table

6.3.6 Lists

In HTML, there are two types of lists, unordered and ordered. An unordered list is a list of items marked with bullets; it starts with the <ul> tag.

The HTML specification defines tags such as h1, h2, and so on. However, implementing them, i.e., choosing the appropriate layout, was supposed to be taken care of by the browser. That is, when we say h2, what should the font size and font family be? This was left to the individual Web browsers to decide. The two major Web browsers of those times, namely Netscape Navigator and Internet Explorer, did not always follow the HTML specifications as defined by the standards body. Instead, they went on adding new HTML tags and attributes (e.g., the <font> tag and the color attribute) to the original HTML specifications. As a result, the following two problems arose.

1. Applications were no longer browser-independent. Something that worked on Netscape Navigator was not guaranteed to work on Internet Explorer, and vice versa. This was because these browsers were adding proprietary tags to their implementations of the HTML specifications.
2. It became increasingly difficult to create Web sites in which the content of an HTML document and its presentation layout were cleanly separated.

In order to resolve this problem and come up with a general solution, the World Wide Web Consortium (W3C), the non-profit, standards-setting consortium responsible for standardizing HTML, created styles in addition to HTML 4.0. Styles, as the name suggests, define how HTML elements should be displayed, very similar to the way the font tag and the color attribute in HTML 3.2 work. Styles control the output (i.e., display) of the HTML tags and remove ambiguity. They also help reduce the clutter in HTML pages (we shall see an example of this to understand its meaning clearly). Technically, style sheets are implemented by using what are called Cascading Style Sheets (CSS). The idea is simple. We keep all styling details separate (e.g., in an external file with a .css extension), and we refer to this file from our HTML document. Better yet, multiple HTML documents can make use of the same CSS file, so that all of them have the same look-and-feel, as defined in the CSS file. This concept is shown in Fig. 6.47.

Fig. 6.47 Style sheet concept How does this work? Let us understand with an example, as shown in Fig. 6.48.

Fig. 6.48 Style sheet example

Let us understand how this works. The HTML page has the following line inside its <head> tag.

<link rel="stylesheet" type="text/css" href="one.css">

It means that the HTML page wants to link with a separate file named one.css. In other words, the HTML page is handed over to the CSS file for applying styles. Let us now see how this actually happens. For this purpose, let us go through our CSS file (one.css).

body {color: black}
h2 {text-align: center; color: blue; font-family: "verdana"}
p {font-family: "sans serif"; color: red}

Figure 6.49 explains each line of the CSS code.

Fig. 6.49

Understanding CSS code

As we can see, the CSS file instructs the browser as to how to display the output of an HTML page with very precise formatting details. The resulting page is shown in Fig. 6.50.

Fig. 6.50 CSS output The same HTML page, without using the CSS, would look as shown in Fig. 6.51. We would not notice the differences in color in the black-and-white print of this book, but at least the differences in alignment and font of the h2 header should be clear.


Fig. 6.51 Output without CSS CSS can be of three types: external, internal, and inline. Let us discuss these now.

External style sheets As the name says, in this case the style sheet is external to the HTML document. In other words, the HTML document and the CSS file are separate. This type of CSS is ideal when the same style is applied to many HTML pages. With the help of an external style sheet, we can change the look of an entire Web site by changing just one file! This is because all HTML files can potentially link to the same CSS file. Of course, in practice this is not quite how it is done. Instead, several CSS files are created, and HTML pages link to them as needed. In general, an HTML page links to the style sheet using the <link> tag, which goes inside the head section. We have seen an example of this earlier, and hence we will not repeat the discussion here.

Internal style sheets These style sheets should be used when a single document has a unique style. We define internal styles in the head section of the HTML document with the <style> tag. Figure 6.52 shows an example. The page (titled CSS Example) contains the following style definition in its head section.

<style type="text/css">
body {background-color: lightblue}
h1 {font-family: tahoma}
p {margin-left: 50px; font-family: comic sans ms}
</style>

The body of the page contains a heading (This is an Internal Style Sheet) and a paragraph (Hello World).



Fig. 6.52 Internal style sheet

Let us understand what we are doing here. Inside the head section, we do not specify a link to an external CSS file now. Instead, we have a style tag, which defines all the styles that we want for the current HTML page. As a result, both the HTML document and the CSS rules are in the same physical document (hence the name internal). For example, the background color of the HTML body has been defined as light blue. Similarly, other styles define how the header h1 should be displayed, how paragraphs are to be displayed, and so on. The resulting output is shown in Fig. 6.53.

Fig. 6.53 Internal style sheet output

Inline style sheets Inline style sheets should be used when a unique style is to be applied to a single occurrence of an element. To use inline styles, we use the style attribute in the relevant tag. The style attribute can contain any CSS property. Here, we do not define styles in the head section, but at the place where they are actually used in the HTML body. An example of inline style sheets is shown in Fig. 6.54. Its body contains a heading (This is an Internal Style sheet) and a paragraph (This is a paragraph), each carrying an inline style, followed by another paragraph (This is another paragraph) without any style.
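The body of Fig. 6.54 is along these lines (a sketch; the style properties shown here are illustrative, since the figure's exact values are not reproduced):

<h1 style="color: red">This is an Internal Style sheet</h1>
<p style="font-family: verdana">This is a paragraph</p>
<p>This is another paragraph</p>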



Fig. 6.54 Inline style sheet As we can see, we have defined styles inline, i.e., at the same place where the HTML tags are defined. Also, we have defined styles for one h1 and one p tags. On the other hand, we have not defined any styles for the remaining p tag. This is perfectly alright. We can define styles only wherever we want to use them.

We can also combine the style sheet types. In other words, the same HTML document can have both inline and internal style sheets, or just one of them, and an external style sheet as well. The same tag can be referenced in multiple types of style sheets. That is, let us say that the external style sheet specifies that a particular heading tag should be displayed in a font of size 12 and type Times New Roman, while an internal style declaration for the same tag specifies different display characteristics (say, font size 10 and font type Tahoma). In any such case, the order of preference is always Inline -> Internal -> External. In other words, if the same HTML tag has references from multiple types of style sheets, inline takes the highest preference, followed by internal, followed by external. This is depicted in Fig. 6.55.

Fig. 6.55 Style sheet priorities

Here is an example where we have used all three types of style sheets (inline, internal, and external). See how the inline style sheet overrides the internal style sheet, which in turn overrides the external one, as shown in Fig. 6.56. The page (titled CSS Example) contains the following internal style definition.

<style type="text/css">
p {font-family: "Tahoma"; color: blue; font-size: 50}
</style>

This is an Internal Style Sheet

This is a paragraph

This is another paragraph



Fig. 6.56 Multiple types of style sheet used in a single HTML page

As we can see, there is an internal style sheet defined in the <head> section of the HTML page using the style tag. We also have a reference to an external style sheet with the help of the link tag. On top of this, we also have an inline style defined inside the p tag in the body of the HTML page.
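A sketch of the two lines being referred to (the external file name and the inline property value are illustrative, since the figure does not reproduce them):

<link rel="stylesheet" type="text/css" href="external.css">
<p style="color: red">This is a paragraph</p>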

Let us now take a look at the external style sheet, as shown in Fig. 6.57.

body {color: green}
h2 {text-align: center; color: blue; font-family: "verdana"}
p {font-family: "sans serif"; color: brown; font-size: 100}

Fig. 6.57 External style sheet The resulting output is shown in Fig. 6.58. See how the principle of inline -> internal -> external style sheet holds good.

Fig. 6.58

Style sheet output

6.4 WEB BROWSER ARCHITECTURE

6.4.1 Introduction

Web browsers have a more complex structure than Web servers. This is because a Web server's task is relatively simple. It has to endlessly wait for a browser to open a new TCP connection and request a specific Web page. When a Web server receives such a request, it locates the requested Web page, sends it back to the requesting browser, closes the TCP connection with that browser, and waits for another request. That is why we say that a Web server waits for TCP connections passively. It does not initiate HTTP requests, but instead waits for HTTP requests from one or more clients, and serves them. Therefore, a Web server is said to execute a passive open call upon start, as we have discussed before. It is the responsibility of the browser to display the document on the user's screen when it receives it from the server. As a result, a browser consists of several large software components that work together to provide an abstracted view of a seamless service. Let us take a look at the architecture of a typical Web browser. This will give us more insight into its working. First, take a look at Fig. 6.59.


Fig. 6.59 Internal architecture of a Web browser

A browser contains some pieces of software that are mandatory and some that are optional, depending upon the usage. The HTTP client program, shown in the figure as (2), and the HTML interpreter program (3) are mandatory. Other interpreter programs, as in (4), the Java interpreter program (5), and other optional interpreter programs (6) are optional. The browser also has a controller, shown as (1), which manages all of them. The controller is like the control unit in a computer's CPU. It interprets both mouse clicks/selections and keyboard inputs. Based on these inputs, it calls the rest of the browser's components to perform specific tasks. For instance, when a user types a URL, the controller calls the HTTP client program to fetch the requested Web page from the remote Web server whose address is given by the URL. When the Web page is received, the controller calls the HTML interpreter to interpret the tags and display the Web page on the screen. The HTML interpreter takes an HTML document as input and produces a formatted version of it for display on the screen. For this, it interprets the various HTML tags and translates them into display commands based on the display hardware in the user's computer. For instance, when the interpreter sees a tag to make the text bold, it instructs the display hardware to display the text in bold. Similarly, when the interpreter encounters a tag to change paragraphs, it performs the necessary display functions in conjunction with the display hardware.

6.4.2 Optional Clients

Apart from the HTTP client and an HTML interpreter, a browser can contain additional clients. We have seen applications such as FTP and email. For supporting these applications, a browser contains FTP and email client programs. These enable the browser to perform FTP and email services. The interesting point is that a user need not explicitly invoke these special services. Instead, the browser invokes them automatically on behalf of the user. It hides these details from the user. For example, for sending an email, there would be a link on an

HTML page. Usually, there is such a link on every Web site so that the user can send an email to the owner or technical staff of the Web site to resolve queries, obtain more information, report problems, etc. If the user clicks that link with a mouse, the controller of the browser interprets this and then invokes the email client program automatically. Similarly, the user could just select an option on the screen to invoke the FTP service. That mouse click would be interpreted by the controller of the browser, which would then invoke the FTP service through the FTP client program. The user need not be aware of this. He gets a feeling that transferring a file or sending an email can be achieved through the browser. From a user's point of view, he is just using the Web browser as usual.

6.5 COMMON GATEWAY INTERFACE (CGI)

6.5.1 CGI Introduction

Common Gateway Interface (CGI) is the oldest dynamic Web technology. It is still in use, but is being replaced by other technologies, such as Microsoft's ASP.NET and Sun's Servlets and JSP. Many people think that CGI is a language, but that is not actually the case. Instead, we should remember that CGI is a specification for communication between a Web browser and a Web server using the HTTP protocol. A CGI program (also called a CGI script) can be written in any language that can read values from a standard input device (usually the keyboard), write to a standard output device (usually the screen), and read environment variables. Most well-known programming languages, such as C, PERL, and even simple UNIX shell scripting, provide these features, and therefore they can be used to write CGI scripts. CGI scripts execute on a Web server, similar to ASP.NET and JSP/Servlets. Hence, CGI is also a server-side dynamic Web page technology. The typical manner in which a CGI script executes is shown in Fig. 6.60.

Fig. 6.60

Typical steps in CGI script execution

Let us take a look at these steps now.

6.5.2 Read Input from the HTML Form

We know that the HTML form is an area where the user can enter the requested information. The form has various controls, such as text boxes, text areas, checkboxes, radio buttons, drop-down lists, and so on, which capture the user's inputs. When the user submits the form, these inputs are sent to the Web server as a part of the browser's HTTP request.

As we have seen earlier, we can read the inputs in ASP.NET or JSP/Servlets with the help of the request object. In CGI, the syntax for doing the same thing is a bit more complex. Figure 6.61 shows a sample PERL script for reading input.

if (($ENV{'REQUEST_METHOD'} eq 'GET') || ($ENV{'REQUEST_METHOD'} eq 'HEAD')) {
    $in = $ENV{'QUERY_STRING'};
}
elsif ($ENV{'REQUEST_METHOD'} eq 'POST') {
    if ($ENV{'CONTENT_TYPE'} =~ m#^application/x-www-form-urlencoded$#i) {
        length($ENV{'CONTENT_LENGTH'}) || &HTMLdie("No Content-Length sent with the POST request.");
        read(STDIN, $in, $ENV{'CONTENT_LENGTH'});
    }
    else {
        &HTMLdie("Unsupported Content-Type: $ENV{'CONTENT_TYPE'}");
    }
}
else {
    &HTMLdie("Script was called with unsupported REQUEST_METHOD.");
}

Fig. 6.61 CGI script to read form variables in PERL

The script first attempts to see whether the user's request was received in the form of a GET or HEAD method. Accordingly, it reads the contents of the query string (i.e., the area of memory where the user's form data is kept for the server to access and process). If the user's request was a POST, it performs some necessary conversions and then reads the content. Otherwise, it displays an error message.

6.5.3 Send HTTP Response Containing HTML Back to the User

This process is even simpler. Here, we first need to write the following statement.

Content-type: text/html

Then we need to send one blank line to the standard output. After this, we can write our HTML content page to the standard output. Once the end of the content is reached, the HTML content is automatically sent to the browser as a part of the server's HTTP response. The example in Fig. 6.62 is reproduced from http://www.jmarshall.com/easy/cgi/.

#!/usr/local/bin/perl
#
# hello.pl -- standard "hello, world" program to demonstrate basic
# CGI programming, and the use of the &getcgivars() routine.
#

# First, get the CGI variables into a list of strings
%cgivars = &getcgivars;

# Print the CGI response header, required for all HTML output
# Note the extra \n, to send the blank line
print "Content-type: text/html\n\n";

# Finally, print out the complete HTML response page

print <<EOF ;
<html>
<head><title>CGI Results</title></head>
<body>
<h1>Hello, world.</h1>
Your CGI input variables were:
<ul>
EOF

# Print the CGI variables sent by the user.
# Note that the order of variables is unpredictable.
# Also note this simple example assumes all input fields had unique names,
# though the &getcgivars() routine correctly handles similarly named
# fields -- it delimits the multiple values with the \0 character, within
# $cgivars{$_}.
foreach (keys %cgivars) {
    print "<li>[$_] = [$cgivars{$_}]\n";
}

# Print close of HTML file
print <<EOF ;
</ul>
</body>
</html>
EOF

exit;

Fig. 6.62 CGI sample program to send output back to the user

6.5.4 CGI Problems

However, there is one problem with CGI, which is that for each client requesting a CGI Web page, a new process has to be created by the operating system running on the server. That is, the Web server must request the operating system to start a new process in memory, allocate all resources such as a stack for it, schedule it, etc. This takes a lot of server resources and processing time, especially when multiple clients request the same CGI Web page (i.e., the page containing the CGI program). The operating system has to queue all these processes, allocate memory to them, and schedule them. This is a large overhead. This is shown in Fig. 6.63. Here, three different clients are shown requesting the same CGI Web page (named CGI-1). However, the Web server sends a request to the operating system to create a different process for each of them.


Fig. 6.63 Each CGI request results into a new process creation

6.6 REMOTE LOGIN (TELNET)

6.6.1 Introduction

The TELNET protocol allows remote login services, so that a user on a client computer can connect to a server on a remote system. TELNET has two parts, a client and a server. The client portion of the TELNET software resides on an end user's machine, and the server portion resides on a remote server machine. That is, the remote server is the TELNET server, which provides an interactive terminal session to execute commands on the remote host. Once a user using the services of a TELNET client connects to the remote TELNET server computer, the keystrokes typed by the user on the client are sent to the remote server to be interpreted/acted upon, giving the impression that the user is using the server computer directly. The TELNET protocol emerged in the days of timesharing operating systems such as Unix. In a timesharing environment, a common server computer serves the requests of multiple users in turns. Although many users use the server at the same time, the speed is normally so fast that every user gets an illusion that he is the only user of that server computer. The interaction between a user and the server computer happens through a dumb terminal. Such a dumb terminal also has to have a microprocessor inside. Thus it can be considered to be a very primitive computer that simply has a keyboard, mouse, and a screen, and almost no processing power. In such an environment, all the processing is essentially done by the central server computer. When a user enters a command using the keyboard, for example, the command travels all the way to the server computer, which executes it and sends the results back to the user's terminal. At the same time, another user might have entered another command. This command also travels to the server, which processes it and sends the results back to that user's terminal. Neither user is concerned with the fact that the server is processing the requests of another user as well. Both users feel that they have exclusive access to the server resources. Thus, timesharing creates an environment in which every user has an illusion of using a dedicated computer. The user can execute a program, access the system's resources, switch between two or more programs, and so on. How is this possible?


6.6.2 Local Login

In timesharing systems, all users log in to the central server computer and use its resources. This is called local login. A user's terminal sends the commands entered by the user to a program called the terminal driver, which runs on the central server computer. It is a part of the server computer's operating system. The terminal driver program passes the commands entered by the user to the appropriate module of the server computer's operating system. The operating system then processes these commands and invokes the appropriate application program, which executes on the server computer, and its results are sent back to the user's terminal. This is shown in Fig. 6.64.

Fig. 6.64 Local login

This forms the basis for further discussion of the TELNET protocol, as we shall study in the next section.

6.6.3 Remote Login and TELNET
In contrast to local login, sometimes a user wants to access an application program located on a remote computer. For this, the user logs on to the remote computer in a process called remote login. A user specifies the domain name or IP address to select the remote server with which he wants to establish a TELNET session. This is where TELNET comes into the picture. TELNET stands for TERminal NETwork. This is shown in Fig. 6.65. The step numbers shown in the figure, followed by their descriptions, depict how TELNET works in detail.
1. As usual, the commands and characters typed by the user are sent to the operating system on the common server computer. However, unlike a local login set up, the operating system now does not interpret the commands and characters entered by the user.
2. Instead, the local operating system sends these commands and characters to a TELNET client program, which is located on the same server computer.
3. The TELNET client transforms the characters entered by the user into a universally agreed format called Network Virtual Terminal (NVT) characters and sends them to the TCP/IP protocol stack of the local server computer. TELNET was designed to work between any host (i.e., any operating system) and any terminal. NVT is an imaginary device, which provides the commonality between the client and the server. Thus, the client operating system maps whatever terminal type the user is using on to NVT. At the other end, the server operating system maps NVT on to whatever actual terminal type the server is using. This concept is illustrated in Fig. 6.66.

Fig. 6.65

Remote login using TELNET

Fig. 6.66 Concept of NVT

4. The commands or text in the NVT format then travel from the local server computer to the TCP/IP stack of the remote computer via the Internet infrastructure. That is, the commands or text are first broken into TCP and then IP packets, and are sent across the physical medium from the local server computer to the remote computer. This works exactly like the way IP packets (and then physical hardware frames) travel over the Internet, as described earlier many times.
5. At the remote computer's end, the TCP/IP software collects all the IP packets, verifies their correctness and completeness, and reconstructs the original command so that it can hand over these commands or text to that computer's operating system.
6. The operating system of the remote computer hands over these commands or text to the TELNET server program, which is executing on that remote computer, passively waiting for requests from TELNET clients.
7. The TELNET server program on the remote computer then transforms the commands or text from the NVT format to the format understood by the remote computer. However, the TELNET server cannot directly hand over the commands or text to the operating system, because the operating system is designed to accept characters only from a terminal driver, not from a TELNET server. To solve this problem, a software program called a pseudo-terminal driver is added, which pretends that the characters are coming from a terminal and not from a TELNET server. The TELNET server hands over the commands or text to this pseudo-terminal driver.
8. The pseudo-terminal driver program then hands over the commands or text to the operating system of the remote computer, which then invokes the appropriate application program on the remote server. The client using the terminal on the other side can thus access the remote computer as if it were a local server computer!

6.6.4 TELNET: A Technical Perspective

Technically, the TELNET server is actually quite complicated. It has to handle requests from many clients at the same time. These concurrent requests must be responded to in real time, as users perceive TELNET as a real-time application. To handle this issue effectively, the TELNET server uses the principle of delegation. Whenever there is a new client request for a TELNET connection, the TELNET server creates a new child process and lets that child process handle that client's TELNET connection. When the client wants to close the TELNET connection, the child process terminates itself. Thus, if there are 10 clients utilizing TELNET services at the same time, there are 10 child processes running, each servicing one client. There is, of course, the main TELNET server process executing to coordinate the creation and handling of the child processes. TELNET uses only one TCP connection (unlike FTP, which uses two). The server waits for TELNET client connection requests (made using TCP) at the well-known port 23. The client opens a TELNET connection (made using TCP) from its side whenever the user requests one. The same TCP connection is used to transfer data and control characters. The control characters are embedded inside data characters. How does TELNET then distinguish a control character from a data character? For this, it mandates that each sequence of control characters must be preceded by a special control character called Interpret As Control (IAC).

6.6.5 TELNET as an Alternative to a Web Browser

Interestingly, TELNET software can be used as a poor alternative to a Web browser. As we have seen, a Web browser is essentially a software program that runs on the computer of an Internet user. It can be used to request an HTML page from a Web server and then interpret the HTML page and display its contents on the

user's screen. Suppose that a user, for some reason, does not have a Web browser, but knows how to enter TELNET commands, and has some software that can interpret HTML pages. In such a case, the user can actually type TELNET commands that mimic the function of a Web browser, by requesting Web pages from a Web server. This happens as if the request were sent from a Web browser. Of course, in such a case, the user must be knowledgeable and should know how the Web works. However, the point to note is that TELNET can actually be used to send HTTP commands to a Web server.
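For example, the user could open a TELNET connection to port 80 of a Web server and type a raw HTTP request by hand (a sketch; the host name is illustrative):

telnet www.example.com 80
GET /index.html HTTP/1.1
Host: www.example.com

A blank line after the request headers makes the server send back the HTML page, which the user then sees as raw text.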

SUMMARY

• The World Wide Web (WWW) is the second most popular application on the Internet, after email. It works on the basis of client-server architecture, and uses a request-response paradigm.
• An organization hosts a Web site, consisting of Web pages. Anybody armed with a Web browser and wanting to access these Web pages can do so.
• Each Web site has a unique identifier, called a Uniform Resource Locator (URL), which is essentially the address of the home page of the Web site.
• The WWW application uses the Hyper Text Transfer Protocol (HTTP) to request and serve Web pages.
• A Web server is a program running on a server computer. Additionally, it consists of the Web site containing a number of Web pages.
• The contents of a Web page are written using a special tag language, called the Hyper Text Markup Language (HTML).
• There are various HTTP commands that a browser can use to request, upload, and delete Web pages.
• HTTP is a stateless protocol. This means that the TCP connection between a client and a server is established and broken for every Web page request.
• The Hyper Text Markup Language (HTML) is used for creating Web pages. HTML is a presentation language that uses tags to mark different text formats, such as boldface, italics, underline, paragraphs, headings, colors, etc.
• The TELNET protocol allows remote login services, so that a user on a client computer can connect to a server on a remote system.
• In TELNET, the user's commands are not processed by the local operating system. Instead, they are directed to a remote server to which the user is connected.

REVIEW QUESTIONS

Multiple-choice Questions
1. The main page of a Web site is generally called the ______.
   (a) chief page (b) main page (c) home page (d) house page
2. The world's first real Web browser was ______.
   (a) Mosaic (b) Internet Explorer (c) Netscape Navigator (d) None of the above
3. Web pages are created in the ______ language.
   (a) HTTP (b) WWW (c) Java (d) HTML
4. The portion after the words WWW identifies a ______.
   (a) client (b) Web server (c) database server (d) application server
5. GET and PUT commands are used to ______ an HTML document.
   (a) download (b) upload (c) delete (d) modify
6. HTTP is called a ______ protocol.
   (a) stateful (b) stateless (c) state-aware (d) connection-oriented
7. The ______ command allows a client to remove a file from a Web server using HTTP.
   (a) GET (b) POST (c) UNLINK (d) DELETE
8. A proxy server is used to transform the ______ protocol to the ______ format.
   (a) TCP/IP, OSI (b) OSI, TCP/IP (c) Non-HTTP, HTTP (d) TCP/IP, HTTP
9. Generally, the closing HTML tag is indicated by the ______ character.
   (a) * (b) / (c) \ (d) @
10. The ______ tag can be used to create hyperlinks.
   (a) anchor (b) arrow (c) link (d) pointer

Detailed Questions
1. Define the terms Web site, Web page, Web server, URL, and home page.
2. What is the purpose of HTTP?
3. How does a Web browser work?
4. Describe the steps involved when a Web browser requests for and obtains a Web page from a Web server.
5. Why is HTTP called a stateless protocol? Why is it so?
6. Discuss any three HTTP commands.
7. What is the purpose of a proxy server?
8. Why is the job of search engines not easy?
9. Discuss the idea of HTML tags with an example.
10. Describe any three HTML tags.

Exercises
1. Discuss the differences between the GET and POST commands. (Hint: use technical books on Web technologies such as ASP.NET, servlets, and JSP, which we shall study later.)
2. Investigate how you can set up your own Web site. What are the requirements for the same?
3. Create a Web page that displays your own details in about 100 words, and also includes your photograph. Use different fonts, colors, and character-spacing tricks to see the change in the output.
4. Try to find out the major differences between the two major Web browsers, Internet Explorer and Netscape Navigator.
5. Try connecting to a remote server using TELNET. What are your observations in terms of look and feel, communication speed, and features available?

Chapter 7

JavaScript and AJAX

INTRODUCTION

HTML Web pages are static. They do not react to events. Also, they do not produce different outputs when different users ask for them, or even when the same user asks for them but under different conditions. Therefore, there is a lot of predictability about HTML pages. Moreover, the output is always the same. This means that there is no programming involved at all. Therefore, attempts were made to add interactivity to HTML pages. This was done both at the client (Web browser) side, as well as at the server (Web server) side. Thus, we have both client-side as well as server-side programming on the Internet. The server-side programming techniques will be discussed at length later. This chapter looks at the client-side programming techniques. Several techniques have come and gone, but the one that has stayed on is the JavaScript language. JavaScript is a quick and dirty programming language, which can be used on the client (Web browser) for performing a number of tasks, such as validating input, doing local calculations, etc. In addition to JavaScript, the technology of AJAX has gained prominence in the last few years. We shall also discuss AJAX in detail.

7.1 JAVASCRIPT

7.1.1 Basic Concepts

We know that HTML pages are static. In other words, there is no interactivity in the case of plain HTML pages. To add interactivity to HTML inside the browser itself, the technology of JavaScript was developed. JavaScript involves programming. We can write small programs that execute inside the HTML page, based on certain events or simply as the page loads. These programs are written in JavaScript. Earlier, there were a few other scripting languages, such as VBScript and JScript. However, these technologies are obsolete now, and JavaScript is the only one that has survived. JavaScript is an interpreted language. It can be directly embedded inside HTML pages, or it can be kept in a separate file (with extension .js) and referred to in the HTML page. It is supported by all the major browsers, such as Internet Explorer, Firefox, and Netscape Navigator. We need to remember that Java and JavaScript do not have anything in common, except for the naming. It was cool to call everything Java-something when these technologies were coming up for the first time. Hence, we have the name JavaScript.

JavaScript has several features:
• Programming tool: JavaScript is a scripting language with a very simple syntax.
• Can produce dynamic text in an HTML page: For example, the JavaScript statement document.write ("<p>" + name + "</p>"); results in the HTML output <p>Atul</p>, if the variable name contains the text Atul.
• Reacting to events: JavaScript code executes when something happens, like when a page has finished loading or when a user clicks on an HTML element.
• Read and write HTML elements: JavaScript can read and change the content of an HTML element.
• Validate data: JavaScript can be used to validate form data before it is submitted to a server. This saves the server from extra processing.

The first JavaScript example is shown in Fig. 7.1.

<script type="text/javascript">
document.write ("Hello World!");
</script>

Fig. 7.1 JavaScript example

7.1.2 Controlling JavaScript Execution

As we can see, JavaScript is a part of the basic HTML page. It is contained inside the <script> and </script> tags. Here, document is the name of an object, and write is a method of that object. We can control when JavaScript code should execute. By default, scripts in a page are executed immediately while the page loads into the browser. This is not always what we want. Sometimes we want to execute a script when a page loads, and at other times when a user triggers an event. Scripts that we want to execute only when they are called, or when an event is triggered, go in the head section. When we place a script in the head section, we ensure that the script is loaded before anyone uses it. That is, it does not execute on its own. However, if we put scripts in the body section, then they automatically get executed when the page loads in the browser. This difference is shown in Fig. 7.2.

Fig. 7.2 Where to place JavaScript

Of course, we can put as many scripts as we like in an HTML page. Also, there is no limitation on how many of them should be in the head section, and how many of them should be in the body section. In the example we had shown earlier, the script was written inside the body section, and therefore it executed without needing to make any explicit call. Instead, if we had written it inside the head section, then we would have needed to call it explicitly from some part of the body section. Let us understand the difference between the two clearly. Figure 7.3 shows the code for writing a script inside the head section, versus in the body section.

<script type="text/javascript">
function message () {
    alert ("Called from the head section")
}
</script>

(a) Script in the head section

<script type="text/javascript">
window.document.write ("Directly executed")
</script>

(b) Script in the body section

Fig. 7.3 Writing scripts in head and body sections

As we can see, the difference is where we have put the script. In case (a), the script is inside the head section, and therefore must explicitly be called to get executed. We call the script from the onload event of the body section. In other words, we tell the browser that as soon as it starts loading the HTML page (i.e., the contents of the body section), it should call the message () function written in the head section. In case (b), the script is a part of the body section itself, and therefore gets executed as soon as the HTML page is loaded in the browser. There is no need to call this script from anywhere. Figure 7.4 shows how to put the JavaScript in an external file and include it in our HTML page. We have not shown the script code itself, as the example is only to illustrate the concept.

<script src="MyScript.js"></script>

Fig. 7.4 How to declare external JavaScript

As we can see, the JavaScript code is supposed to be contained in a separate file called MyScript.js.

7.1.3 Miscellaneous Features

Variables JavaScript allows us to define and use variables just like other programming languages.
Variables are declared using the keyword var. However, this keyword is optional. In other words, the following two declarations are equivalent.

var name = "test";
name = "test";

Variables can be local or global.
• Local variables: When we declare a variable within a function, the variable can only be accessed within that function. When we exit the function, the variable is destroyed. This type of variable is a local variable.
• Global variables: If we declare a variable outside a function, all the functions on our HTML page can access it. The lifetime of these variables starts when they are declared, and ends when the page is closed.

Figure 7.5 shows an example of using variables. The page is titled Seconds in a day, and its head contains the following script.

<script type="text/javascript">
var seconds_per_minute = 60;
var minutes_per_hour = 60;
var hours_per_day = 24;
var seconds_per_day = seconds_per_minute * minutes_per_hour * hours_per_day;
</script>

We can see that ...

<script type="text/javascript">
window.document.write ("there are ");
window.document.write (seconds_per_day);
window.document.write (" seconds in a day.");
</script>

Fig. 7.5 Variables example The resulting output is shown in Fig. 7.6.


Fig. 7.6 Output of variables example

Operators JavaScript supports a variety of operators. Table 7.1 summarizes them.

Table 7.1 JavaScript operators

Operator classification    List of operators
Arithmetic                 + - * / % ++ --
Assignment                 = += -= *= /= %=
Comparison                 == != > < >= <=
Logical                    && || !

Functions A function contains a block of code that needs to be executed repeatedly, or based on certain events. Another part of the HTML page calls a JavaScript function as needed. Usually, all functions should be defined in the head section, right at the beginning of the HTML page, and should be called as and when necessary. A function can receive arguments, or it can be a function that does not expect any arguments. A function is declared by using the keyword function, followed by the name of the function, followed by parentheses. If there are any arguments that the function expects, they are listed inside the parentheses, separated by commas. A function can return a single value by using the return statement. However, unlike standard programming languages, a function does not have to mention its return data type in the function declaration. Enough of theory! Let us now look at a function example, as shown in Fig. 7.7.

function total (a, b) {
    result = a + b;
    return result;
}

Fig. 7.7 Function example

As we can see, the name of the function is total. It expects two arguments. What should their data types be? This need not be mentioned. The function adds the values of these two arguments and stores the result in a third variable called result. It then returns this value to the caller. How would the caller call this function? It would say something like sum = total (5, 7).
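For instance, a complete page along these lines would display the sum (a sketch; the surrounding markup is illustrative):

<html>
<head>
<script type="text/javascript">
function total (a, b) {
    result = a + b;
    return result;
}
</script>
</head>
<body>
<script type="text/javascript">
var sum = total (5, 7);
document.write ("The total is " + sum);
</script>
</body>
</html>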

Conditional statements JavaScript supports three types of conditional statements: if, if-else, and switch. They work in a manner that is quite similar to what happens in Java or C#.

Figure 7.8 shows an example of the if statement.

<script type="text/javascript">
var d = new Date ();
var time = d.getHours ();
if (time > 12) {
    document.write ("Good afternoon");
}
</script>

Fig. 7.8 Example of if statement The resulting output is shown in Fig. 7.9, assuming that currently it is the afternoon.

Fig. 7.9 Output of if example On the other hand, an if-else statement allows us to write alternative code whenever the if statement is not true. Figure 7.10 shows an example of the if-else statement.

<script type="text/javascript">
var d = new Date ();
var time = d.getHours ();
if (time < 12) {
    document.write ("Good morning!");
}
else {
    document.write ("Good day!");
}
</script>

Fig. 7.10 Example of if-else statement The resulting output is shown in Fig. 7.11.

Fig. 7.11

Output of if-else example

Figure 7.12 shows an example of the switch statement.

<script type="text/javascript">
var d = new Date ();
theDay = d.getDay ();
switch (theDay) {
    case 5:
        document.write ("Finally Friday");
        break;
    case 6:
    case 0:
        document.write ("Super Weekend");
        break;
    default:
        document.write ("I'm looking forward to this weekend!");
}
</script>

Fig. 7.12

Example of switch statement

Figure 7.13 shows the resulting output.

Fig. 7.13 Output of the switch example

We can also use the ?: conditional operator in JavaScript. For example, we can have the following code block.

greeting = (visitor == "Senior") ? "Dear sir " : "Dear ";

Loops JavaScript provides three kinds of loops: while, do-while, and for. The while loop first checks the condition being tested, and only if it is satisfied does it execute the code. The do-while loop first executes the code and then checks the condition being tested. In other words, it executes at least once, regardless of whether the condition being tested is satisfied or not. The for loop executes in iterations, usually incrementing or decrementing the loop index. Figure 7.14 shows an example of the while loop.

<script type="text/javascript">
var i = 0;
while (i <= 5) {
    document.write ("The number is " + i);
    document.write ("<br>");
    i++;
}
</script>
We have seen an example of the while loop

Fig. 7.14

Example of while loop

Figure 7.15 shows the output of the while example.

Fig. 7.15 Output of while example

Figure 7.16 shows an example of the do-while loop.

<script type="text/javascript">
i = 0;
do {
    document.write ("The number is " + i);
    document.write ("<br>");
    i++;
} while (i <= 5);
</script>
We have seen an example of the do-while loop

Fig. 7.16 Example of do-while loop Figure 7.17 shows the output of the do-while example.

Fig. 7.17 Output of the do-while example

Figure 7.18 shows an example of the for loop.

<script type="text/javascript">
for (i = 0; i <= 5; i++) {
    document.write ("The number is " + i);
    document.write ("<br>");
}
</script>
We have seen an example of the for loop

Fig. 7.18

Example of the for loop

Figure 7.19 shows the output of the for example.

Fig. 7.19 Output of the for example

Standard objects JavaScript provides several standard objects, such as Array, Boolean, Date, Math, String, etc. We shall quickly review some of them. Figure 7.20 shows an example of the Date object.

<script type="text/javascript">
var d = new Date ();
document.write (d.getDate ());
document.write (".");
document.write (d.getMonth () + 1);
document.write (".");
document.write (d.getFullYear ());
</script>

Fig. 7.20 Date object example In the code, we create a new instance of the Date object. From this object, we get the day number, the month number (and increment by one, since it starts with 0), and the four-digit year; all concatenated with each other by using a dot symbol. The output is shown in Fig. 7.21. We can manipulate values of the Date object as well. For example, we can display the current date and time in the full form, change the year value to a value of our choice, and then display the full date and time again. This is shown in Fig. 7.22.


Fig. 7.21

Output of the Date object

<script type="text/javascript">
var d = new Date ();
document.write (d);
document.write ("<br>");
d.setFullYear ("2100");
document.write (d);
</script>

Fig. 7.22 Manipulating dates The resulting output is shown in Fig. 7.23.

Fig. 7.23

Output of the date manipulation example

Here is another example related to dates, as shown in Fig. 7.24. Here, we use the Array default object as well.

<script type="text/javascript">
var d = new Date ();
var weekday = new Array ("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday");
document.write ("Today is " + weekday [d.getDay ()]);
</script>

Fig. 7.24

Use of Date and Array objects

The resulting output is shown in Fig. 7.25.

Fig. 7.25 Output of the Date and Array objects

The same example is modified further, as shown in Fig. 7.26.

<script type="text/javascript">
var d = new Date ();
var weekday = new Array ("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday");
var monthname = new Array ("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec");
document.write (weekday [d.getDay ()] + " ");
document.write (monthname [d.getMonth ()] + " ");
document.write (d.getFullYear ());
</script>

Fig. 7.26 Another date example The resulting output is shown in Fig. 7.27.


Fig. 7.27 Output of the modified example Table 7.2 shows the most useful date functions.

Table 7.2 Date functions

Method          Description
Date()          Returns a Date object
getDate()       Returns the date of a Date object (from 1-31)
getDay()        Returns the day of a Date object (from 0-6, where 0 = Sunday, 1 = Monday, etc.)
getMonth()      Returns the month of a Date object (from 0-11, where 0 = January, 1 = February, etc.)
getFullYear()   Returns the year of a Date object (four digits)
getYear()       Returns the year of a Date object (from 0-99)
getHours()      Returns the hour of a Date object (from 0-23)
getMinutes()    Returns the minute of a Date object (from 0-59)
getSeconds()    Returns the second of a Date object (from 0-59)

Figure 7.28 shows an example of using the Math object.

<script type="text/javascript">
document.write (Math.round (7.80))
</script>

Fig. 7.28 Math object example The resulting output is shown in Fig. 7.29.

JavaScript and AJAX

205

Fig. 7.29 Output of using the Math object Table 7.3 lists the important methods of the Math object.

Table 7.3 Math functions Method abs (x) cos (x) exp (x) log (x) max (x, y) min (x, y) pow (x, y) random () round (x) sin (x) sqrt (x) tan (x)

Description Returns the absolute value of x Returns the cosine of x Returns the value of E raised to the power of x Returns the natural log of x Returns the number with the highest value of x and y Returns the number with the lowest value of x and y Returns the value of the number x raised to the power of y Returns a random number between 0 and 1 Rounds x to the nearest integer Returns the sine of x Returns the square root of x Returns the tangent of x
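As another small sketch of our own (the die-throw scenario is purely an illustrative assumption), the following script combines random () with Math.floor (), which is also available although not listed in the table, to produce a whole number between 1 and 6, and exercises pow () and max () as well:

<script type = "text/javascript">
// Math.random () gives a value between 0 and 1; scaling it by 6 and
// applying Math.floor () gives an integer from 0 to 5, so we add 1.
var throw_value = Math.floor (Math.random () * 6) + 1;
document.write ("You have thrown a " + throw_value);
document.write ("<br />");
document.write ("2 raised to 10 is " + Math.pow (2, 10));
document.write ("<br />");
document.write ("The larger of 17 and 29 is " + Math.max (17, 29));
</script>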

JavaScript provides a few functions for handling strings. These are summarized below.

indexOf (): Finds the location of a specified set of characters (i.e., of a sub-string). Starts counting at 0; returns the starting position if found, else returns -1.
lastIndexOf (): Similar to the above, but looks for the last occurrence of the sub-string.
charAt (): Returns a single character inside a string at a specific position.
substring (): Returns a sub-string inside a string at a specific position.
split (): Divides a string into sub-strings, based on a delimiter.
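Before moving on, here is a small sketch of our own (the file name used is just an assumption) showing lastIndexOf (), split (), and charAt () in action:

<script type = "text/javascript">
var file_name = "report.2009.pdf";
// lastIndexOf () locates the final dot, so we can pick out the extension.
var dot_position = file_name.lastIndexOf (".");
document.write ("Extension starts after position " + dot_position);
document.write ("<br />");
// split () breaks the string into an array of parts around each dot.
var parts = file_name.split (".");
document.write ("Number of parts: " + parts.length);
document.write ("<br />");
document.write ("First character of the name: " + file_name.charAt (0));
</script>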

We shall discuss a few string processing examples when we study form validations. Figure 7.30 shows a sample of the indexOf () function.

Validate Email Address

<script type = "text/javascript">
function validateEmailAddress (the_email_address)
{
    var the_at_symbol = the_email_address.indexOf ("@");
    var the_dot_symbol = the_email_address.lastIndexOf (".");
    var the_space_symbol = the_email_address.indexOf (" ");

    /////////////////////////////////////////////////////
    // Now see if the email address is valid
    /////////////////////////////////////////////////////
    if (
        (the_at_symbol != -1) &&                            // There must be an @ symbol
        (the_at_symbol != 0) &&                             // The @ symbol must not be at the first position
        (the_dot_symbol != -1) &&                           // There must be a . symbol
        (the_dot_symbol != 0) &&                            // The . symbol must not be at the first position
        (the_dot_symbol > the_at_symbol + 1) &&             // Must have something after @ and before .
        (the_email_address.length > the_dot_symbol + 1) &&  // Must have something after .
        (the_space_symbol == -1)                            // Must not have a space anywhere
       )
    {
        alert ("Email address seems to be correct.");
        return true;
    }
    else
    {
        alert ("Error!!! Email address seems to be incorrect.");
        return false;
    }
}
</script>

Please enter your email address below

Email address:

Fig. 7.30 Example of indexOf ()

7.1.4 JavaScript and Form Processing

JavaScript has a big role to play in the area of form processing. We know that HTML forms are used for accepting inputs. JavaScript helps in validating these inputs and also in performing some processing on the basis of certain events.

Figure 7.31 shows a simple example of capturing the event of a button getting clicked.

<script type="text/javascript">
function show_alert()
{
    alert("Hello World!");
}
</script>


Fig. 7.31

Button click example

As we can see, we have a simple button on the screen. On clicking of this button, we are calling a JavaScript function to display an alert box. The resulting output is shown in Fig. 7.32.

Fig. 7.32

(a) Original screen, (b) Result when the button is clicked

Now let us take a look at a more useful example. Here, we accept two numbers from the user and display a hyperlink, which the user can click to compute their multiplication. When the user does so, we display the resulting multiplication value inside an alert box. The code for this functionality is shown in Fig. 7.33.

Simple Multiplication

<script type="text/javascript">
function multiply ()
{
    var number_one = document.the_form.field_one.value;
    var number_two = document.the_form.field_two.value;
    var result = number_one * number_two;
    alert (number_one + " times " + number_two + " is: " + result);
}
</script>

<form name = "the_form">
    <input type = "text" name = "field_one">
    <input type = "text" name = "field_two">
    <a href = "#" onClick = "multiply (); return false;">Multiply them!</a>
</form>

Fig. 7.33

Using JavaScript to multiply two numbers

The resulting output is shown in Fig. 7.34.

Fig. 7.34

Multiplying two numbers

We will now modify the same example to display the resulting multiplication value inside a third text box, instead of displaying it inside an alert box. The code for this purpose is shown in Fig. 7.35.

A Simple Calculator

<script type="text/javascript">
function multiply ()
{
    var number_one = document.the_form.field_one.value;
    var number_two = document.the_form.field_two.value;
    var result = number_one * number_two;
    document.the_form.the_answer.value = result;
}
</script>

<form name = "the_form">
    Number 1: <input type = "text" name = "field_one">
    Number 2: <input type = "text" name = "field_two">
    The Product: <input type = "text" name = "the_answer">
    <a href = "#" onClick = "multiply (); return false;">Multiply them!</a>
</form>

Fig. 7.35

Displaying result of multiplication in a separate text box

Fig. 7.36

Displaying result of multiplication in a separate text box

Let us now take an example of using checkboxes. Figure 7.37(a) shows the code, where we display three checkboxes. Depending on the number of selections, the JavaScript just displays the score, assigning one mark per selection. The result is shown in Fig. 7.37(b). Note that JavaScript offers shorthands for certain syntaxes. For example, writing the complete window.document.the_form syntax every time is quite tedious. However, a solution is available, whereby the two syntaxes shown below are equivalent.

Fig. 7.37 (a) Checkbox example, (b) Output
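The original figure listing the two syntaxes has not survived in this text. As a hedged illustration of the idea (our own sketch; the form and field names are borrowed from the quiz example that appears shortly), both of the following references point to the same checkbox, the second one simply dropping the implicit window object:

<script type = "text/javascript">
// Syntax 1: fully qualified, starting from the window object.
var checked_long = window.document.the_form.question1.checked;

// Syntax 2: the window object is implicit, so it can be dropped.
var checked_short = document.the_form.question1.checked;
</script>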

As we can see, the second syntax is quite handy. Here is another example.
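That example is also not reproduced here. A minimal sketch of what it might look like (the field name age comes from the explanation that follows, while the alert message is our own assumption) is an input field whose onChange handler uses this:

<form name = "the_form">
    Age: <input type = "text" name = "age"
         onChange = "alert ('You have entered: ' + this.value);">
</form>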


Inside an onChange event handler, the current element is implicit; hence, we can directly say this.value, without even mentioning age!

In the following example (Fig. 7.38), we illustrate the usage of arrays and loops. The functionality achieved is actually the same as what we had achieved in the checkbox example shown earlier, but the code is quite compact here, as we can see.

Using Arrays

<script type = "text/javascript">
function computeScore ()
{
    var index = 0, correct_answers = 0;
    while (index < 3)
    {
        if (window.document.the_form.elements[index].checked == true)
        {
            correct_answers++;
        }
        index++;
    }
    alert ("You have scored " + correct_answers + " mark(s)!");
}
</script>

An Interesting Quiz

Select the statements that are true:

<form name = "the_form">
    <input type = "checkbox" name = "question1">I stay in Pune
    <input type = "checkbox" name = "question2">I am a Student
    <input type = "checkbox" name = "question3">I enjoy Programming in JavaScript
</form>
Fig. 7.38

Usage of arrays and loops

The resulting output is not shown, as we have already had one look at it earlier. Of course, we can also use either the do-while or the for loop instead. Figure 7.39 shows an example of the for loop.

Rainbow!

<script type = "text/javascript">
function rainbow ()
{
    var rainbow_colours = new Array ("red", "orange", "yellow", "green", "blue", "violet");
    var index = 0;
    for (index = 0; index < rainbow_colours.length; index++)
    {
        window.document.bgColor = rainbow_colours [index];
        //window.document.writeln (index);
    }
}
</script>


Fig. 7.39 Example of the for loop

Figure 7.40 shows an example where we want to validate the contents of an HTML form.

Form Validation

<script type = "text/javascript">
function checkMandatoryFields ()
{
    var error_Message = "";

    // Check text box
    if (window.document.the_form.the_text.value == "")
    {
        error_Message += "Please enter your name.\n";
    }

    // Check scrolling list
    if (window.document.the_form.state.selectedIndex < 0)
    {
        error_Message += "Please select a state.\n";
    }

    // Check radio buttons
    var radio_Selected = "false";
    for (var index = 0; index < window.document.the_form.gender.length; index++)
    {
        if (window.document.the_form.gender[index].checked == true)
        {
            radio_Selected = "true";
        }
    }
    if (radio_Selected == "false")
    {
        error_Message += "Please select a gender.\n";
    }

    if (error_Message == "")
    {
        return true;
    }
    else
    {
        error_Message = "Please correct the following errors:\n\n" + error_Message;
        alert (error_Message);
        return false;
    }
}
</script>

Fig. 7.40 Form validations—Part 1/2

Please provide your details below

<form name = "the_form">
    Name: <input type = "text" name = "the_text">
    State: <select name = "state" size = "5">
    Gender: <input type = "radio" name = "gender"> Female <input type = "radio" name = "gender"> Male
</form>


Fig. 7.40 Form validations—Part 2/2

Figure 7.41 shows a sample of the indexOf () string function. This example attempts to accept an email address from the user in an HTML form and then validates it. The particulars of the validation logic are mentioned inside the code comments, so we will not repeat them here.

Validate Email Address

<script type = "text/javascript">
function validateEmailAddress (the_email_address)
{
    var the_at_symbol = the_email_address.indexOf ("@");
    var the_dot_symbol = the_email_address.lastIndexOf (".");
    var the_space_symbol = the_email_address.indexOf (" ");

    /////////////////////////////////////////////////////
    // Now see if the email address is valid
    /////////////////////////////////////////////////////
    if (
        (the_at_symbol != -1) &&                            // There must be an @ symbol
        (the_at_symbol != 0) &&                             // The @ symbol must not be at the first position
        (the_dot_symbol != -1) &&                           // There must be a . symbol
        (the_dot_symbol != 0) &&                            // The . symbol must not be at the first position
        (the_dot_symbol > the_at_symbol + 1) &&             // Must have something after @ and before .
        (the_email_address.length > the_dot_symbol + 1) &&  // Must have something after .
        (the_space_symbol == -1)                            // Must not have a space anywhere
       )
    {
        alert ("Email address seems to be correct.");
        return true;
    }
    else
    {
        alert ("Error!!! Email address seems to be incorrect.");
        return false;
    }
}
</script>

Please enter your email address below

Email address:


Fig. 7.41 Using the indexOf () string function

We can write the same logic using another string function, namely charAt (). The resulting code is shown in Fig. 7.42.

Validate Email Address: charAt Version

<script type = "text/javascript">
function validateEmailAddress (the_email_address)
{
    var the_at_symbol = the_email_address.indexOf ("@");
    var the_dot_symbol = the_email_address.lastIndexOf (".");
    var the_space_symbol = the_email_address.indexOf (" ");
    var is_invalid = false;

    /////////////////////////////////////////////////////
    // Now see if the email address is valid
    /////////////////////////////////////////////////////
    if (
        (the_at_symbol != -1) &&                            // There must be an @ symbol
        (the_at_symbol != 0) &&                             // The @ symbol must not be at the first position
        (the_dot_symbol != -1) &&                           // There must be a . symbol
        (the_dot_symbol != 0) &&                            // The . symbol must not be at the first position
        (the_dot_symbol > the_at_symbol + 1) &&             // Must have something after @ and before .
        (the_email_address.length > the_dot_symbol + 1) &&  // Must have something after .
        (the_space_symbol == -1)                            // Must not have a space anywhere
       )
    {
        is_invalid = false;     // do nothing
    }
    else
    {
        is_invalid = true;
    }

    if (is_invalid == true)
    {
        alert ("Error!!! Email address is invalid.");
        return false;
    }

    /////////////////////////////////////////////////////
    // Now check for the presence of illegal characters
    /////////////////////////////////////////////////////
    var the_invalid_characters = "!#$%^&*()+=:;?/<>";
    var the_char = "";

Fig. 7.42 Using the charAt () function—Part 1

    for (var index = 0; index < the_invalid_characters.length; index++)
    {
        the_char = the_invalid_characters.charAt (index);
        if (the_email_address.indexOf (the_char) != -1)
        {
            is_invalid = true;
        }
    }

    if (is_invalid == true)
    {
        alert ("Error!!! Email address is invalid.");
        return false;
    }
    else
    {
        alert ("Email address seems to be valid.");
        return true;
    }
}
</script>

Please enter your email address below

Email address:


Fig. 7.42 Using the charAt () function—Part 2

Another string function, substring (), is a bit tricky. The general syntax for this function is substring (from, until). This means: return the part of the string starting at position from and ending one character before position until. That is, until is greater by one than the last position included in the sub-string. As a result, some of the tricky examples shown in Table 7.4 need to be observed carefully.

Table 7.4 Examples of the substring () function

Example                      Result    Explanation
the_word.substring (0, 4)    "Java"    from = 0, until = 4 - 1 = 3. So, returns the characters at positions 0, 1, 2, and 3.
the_word.substring (1, 4)    "ava"     from = 1, until = 4 - 1 = 3. So, returns the characters at positions 1, 2, and 3.
the_word.substring (1, 2)    "a"       from = 1, until = 2 - 1 = 1. So, returns the character at position 1 only.
the_word.substring (2, 2)    ""        from = 2, until = 2 - 1 = 1. So, returns an empty string.
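The table assumes a variable named the_word. A small runnable sketch, assuming that the_word holds the string "JavaScript" (which is consistent with the results shown above), is given below:

<script type = "text/javascript">
var the_word = "JavaScript";
document.write (the_word.substring (0, 4));   // "Java"
document.write ("<br />");
document.write (the_word.substring (1, 4));   // "ava"
document.write ("<br />");
document.write (the_word.substring (1, 2));   // "a"
document.write ("<br />");
document.write (the_word.substring (2, 2));   // "" (an empty string)
</script>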

7.2 AJAX

7.2.1 Introduction

The term AJAX is used quite extensively in Information Technology these days. Everyone seems to want to make use of AJAX, but a few may not know where exactly it fits in, and what it can do. In a nutshell: AJAX can be used for making the user experience better by using clever techniques for communication between a Web browser (the client) and the Web server. How can AJAX do this? Let us understand this at a conceptual level.

In traditional Web programming, we have programs that execute either on the client (e.g., written using JavaScript) or on the server (e.g., written using Java's Servlets/JSP, Microsoft's ASP.NET, or other technologies, such as PHP, Struts, etc.). This is shown in Fig. 7.43.

Fig. 7.43 Technologies and their location

What do these programs do? They perform a variety of tasks. For example:

- Validate that the amount that the user has entered on the screen is not over 10,000
- Ensure that the user's age is numeric and is over 18
- If the city is entered as Pune, then the country must be India

Mind you, these are simple examples of validating inputs. They are best done on the client-side itself, using JavaScript. However, all tasks are not validations of these kinds alone. For example:

- From the source account number specified by the user, transfer the amount mentioned by the user into the target account number specified by the user
- Produce a report of all transactions that have failed in the last four hours, with an appropriate reason code
- Due to a 1% increase in the interest rates, increase the EMI amount for all the customers who have availed of floating-rate loans

These are examples of business processes. These are best run on the server side, using the technologies listed earlier. We can summarize as follows. Client-side technologies, such as JavaScript, are used for validating inputs. Server-side technologies, such as Java Servlets, JSP, ASP.NET, PHP, etc., are used for ensuring that business processes happen as expected.

Sometimes, we run into situations where we need a mixture of the two. For instance, suppose that there is a text box on the screen, where the user needs to type the city name. As the user starts typing the city name, we want to automatically populate a list of all city names that match what the user has started typing. (For example, when the user types M, we want to show Madrid, Manila, Mumbai, and so on.) The user may select one of these, or may type the next character, say a. If the user has typed the second character as a, the user's input would now have become Ma. Now, we want to show only Madrid and Manila, but not Mumbai (which has the first two characters as Mu). We may perhaps even show a warning to the user, in case the user is typing the name of a city which does not exist at all!

The best example of this is Google Suggest (http://www.google.com/webhp?complete=1&hl=en). You can visit this URL and try what we have shown below. Suppose that we are trying to search for the word iflex. In the search window, type i. We would get a list of all the matching entries starting with i from Google's database, as shown in Fig. 7.44. Now add a hyphen to get the following screen. As we can see, the list is now filtered for entries starting with i-. The result is shown in Fig. 7.45. Now add an f to make it i-f. This is shown in Fig. 7.46. We get what we want!

This process has used AJAX. We can use AJAX in similar situations, where we want to capture the matter the user is typing or has typed, and process it while the user continues to do whatever she is doing. Of course, this is just one of the uses of AJAX. It can be used in any situation where we want the client to send a request to the server for taking some action, without the user having to abandon her current task. Thus, AJAX helps us to do something behind the scenes, without impacting the user's work.

AJAX stands for Asynchronous JavaScript And XML, as explained below.

- Asynchronous, because it does not disturb the user's work, and does not refresh the full screen (unlike what happens when the user submits a form to the server, for example).
- JavaScript, because it uses JavaScript for the actual work.
- And XML, because XML is supposed to be everywhere today (using AJAX, the server can return XML to the browser).

Fig. 7.44 Google Suggest—1

Fig. 7.45 Google Suggest—2


Fig. 7.46 Google Suggest—3

7.2.2 How does AJAX Work?

AJAX uses the following technique, described here in a generic fashion. Whenever AJAX needs to come into the picture, based on the user's action (e.g., when something is typed), it sends a request from the Web browser to the Web server. On the Web server, a program written in a server-side technology (any one from those listed earlier) receives this request from the Web browser, sent by AJAX. The program on the Web server processes this request, and sends a response back to the Web browser. Note that while this happens, the user does not have to wait; in fact, the user does not even notice that the Web browser has sent a request to the Web server! The Web browser processes the response received from the Web server, and takes an appropriate action (e.g., in Google Suggest, the browser would show us a list of all the matching entries for the text typed so far, which was sent by the Google server to the browser in the previous step). This concept is shown in Fig. 7.47.

Let us understand how this works.

1. While the user (client) is filling up an HTML form, based on a certain event, JavaScript in the client's browser prepares and sends an AJAX request (usually called an XMLHttpRequest) to the Web server.
2. While the user continues working as if nothing has happened (shown with two processing arrows at the bottom part of the diagram), the Web server invokes the appropriate server-side code (e.g., a JSP/Servlet, an ASP.NET page, or a PHP script, as we shall learn later).

Fig. 7.47 The AJAX process

3. The server-side code prepares an AJAX response and hands it over to the Web server.
4. While the user continues working with the remainder of the HTML form, the server sends the AJAX response back to the browser. The browser automatically reflects the result of the AJAX response (e.g., it populates a field on the HTML form). Note that the user would not even notice that steps 1 to 3 have happened behind the scenes!

Therefore, we can differentiate between non-AJAX based processing and AJAX based processing, as shown in Fig. 7.48 and Fig. 7.49.

Fig. 7.48 Traditional HTTP processing (without AJAX)

Fig. 7.49

AJAX based processing

7.2.3 AJAX FAQ

In the beginning, people have a lot of questions regarding AJAX. We summarize them along with their answers below.

1. Do we not use the request/response model in AJAX?

We do, but the approach is different now. We do not submit a form now, but instead send requests using JavaScript.

2. Why not submit the form? Why do we prefer to use AJAX?

AJAX processing is asynchronous. Client does not wait for server to respond. When server responds, JavaScript does not refresh the whole page.

3. How does a page get back a response, then?

When the server sends a response, JavaScript can update a page with new values, change an image, or transfer control to a new page. The does not have to wait while this happens.

4. Should we use AJAX for all our requests?

No. Traditional form filling is still required in many situations. But for immediate and intermediate responses, we should use AJAX.

5. Where is the XML in AJAX?

Sometimes the JavaScript can use XML to speak with the server back and forth.

7.2.4 Life without AJAX

Suppose that we have a book shop, where we want to constantly view the amount of profit we have made. For this purpose, an application sends us the latest number of copies sold, as on that date. We multiply that with the profit per copy, and compute the total profit made. We shall get into coding details subsequently. The conceptual view of this is shown in Fig. 7.50.

Fig. 7.50

AJAX case study—1

The way this executes is shown step by step below.

Step 1 The user clicks on the button shown in the HTML form. As a result, the request goes to the Web server. This is shown in Fig. 7.51.

Fig. 7.51 AJAX case study—2

Step 2 The server-side program (it may be a JSP) processes the user's request, and sends back an HTTP response to the user. This response refreshes or reloads the screen completely. This is shown in Fig. 7.52.

Fig. 7.52 AJAX case study—3

At this stage, let us reinforce our AJAX ideas. AJAX gives us the ability to fetch data from the server without having to refresh a page.

Applications without AJAX
- Normal Web applications communicate with the server by referring to a new URL
- Example: When a form is submitted, it is processed by a server-side program, which gets invoked

AJAX applications
- Use an object called the XMLHttpRequest object, built into the browser, using JavaScript to communicate with the server
- An HTML form is not needed to communicate with the server

What is this XMLHttpRequest object all about? It is an alternative for HTML forms. It is used to communicate with the server side code, from inside a browser. The server side code now returns text or XML data, not the complete HTML Web page. The programmer has to extract data received from the server via the XMLHttpRequest object, according to the need.

7.2.5 AJAX Coding

Figure 7.53 outlines the way we can write code for AJAX-based applications.

Fig. 7.53 AJAX processing steps

Let us now discuss these steps in detail.

(1) Create the XMLHttpRequest object

Two main browsers are required to be handled: Internet Explorer and others.

Code for non-Internet Explorer browsers

var XMLHttpRequestObject = false;
if (window.XMLHttpRequest) {            // Non-IE browser
    XMLHttpRequestObject = new XMLHttpRequest ();
}

Code for Internet Explorer

else if (window.ActiveXObject) {        // IE browser
    XMLHttpRequestObject = new ActiveXObject ("Microsoft.XMLHTTP");
}

We can write a complete HTML page to ensure that our browser is able to successfully create the XMLHttpRequest object, as shown in Fig. 7.54.

AJAX Example

<script language = "javascript">
var XMLHttpRequestObject = false;

if (window.XMLHttpRequest) {
    XMLHttpRequestObject = new XMLHttpRequest ();
}
else if (window.ActiveXObject) {
    XMLHttpRequestObject = new ActiveXObject ("Microsoft.XMLHTTP");
}

if (XMLHttpRequestObject) {
    document.write ("Welcome to AJAX");
}
</script>

Fig. 7.54 Checking for the presence of the XMLHttpRequest object

This code does not do anything meaningful, except for checking that the browser is AJAX enabled. Of course, by this we simply mean that the browser is able to create and deal with the XMLHttpRequest object, as needed by the AJAX technology. If it is able to do so (which is what should happen for all modern browsers), we will see the output as shown in Fig. 7.55.

(2) Tell the XMLHttpRequest object where to send the request

We need to open the XMLHttpRequest object now by calling its open method. It expects two parameters: the type of the method (GET/POST), and the URL where the asynchronous AJAX request is to be sent. An example is shown below.

XMLHttpRequestObject.open ("GET", "test.dat");

Here, we are saying that we want to send a GET request to fetch a file named test.dat.

Fig. 7.55 Output of the earlier HTML page

(3) Tell the XMLHttpRequest object what to do when the request is answered

We can download data from the server using the XMLHttpRequest object. This process happens behind the scenes, i.e., in an asynchronous manner. When data comes from the server, the following two things happen.

(i) The readyState property of the XMLHttpRequestObject changes to one of the following possible values: 0 = Uninitialized, 1 = Loading, 2 = Loaded, 3 = Interactive, 4 = Complete.
(ii) The status property holds the result of the HTTP request: 200 = OK, 404 = Not found, etc.

Thus, we can check this status as follows.

if ((XMLHttpRequestObject.readyState == 4) && (XMLHttpRequestObject.status == 200)) {
    ...
}

(4) Tell the XMLHttpRequest object to make the request

In this step, we send the request, and then use the data downloaded from the server in our application, as desired.
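Putting the four steps together, a minimal sketch of a complete AJAX call might look like the following. This is our own illustrative outline, not code from the book's case study; the URL test.dat is reused from the earlier example, and the element id target is an assumption:

<script language = "javascript">
var XMLHttpRequestObject = false;

// Step 1: create the object (non-IE and IE browsers).
if (window.XMLHttpRequest) {
    XMLHttpRequestObject = new XMLHttpRequest ();
}
else if (window.ActiveXObject) {
    XMLHttpRequestObject = new ActiveXObject ("Microsoft.XMLHTTP");
}

function fetchData ()
{
    // Step 2: say where the request should go.
    XMLHttpRequestObject.open ("GET", "test.dat");

    // Step 3: say what to do when the request is answered.
    XMLHttpRequestObject.onreadystatechange = function ()
    {
        if ((XMLHttpRequestObject.readyState == 4) && (XMLHttpRequestObject.status == 200))
        {
            // Step 4: use the data received from the server.
            document.getElementById ("target").innerHTML = XMLHttpRequestObject.responseText;
        }
    };

    // Actually send the request; null means there is no request body (GET).
    XMLHttpRequestObject.send (null);
}
</script>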

7.2.6 Life with AJAX

Let us now continue our earlier example to understand how AJAX enabling makes it so much more effective. Our code would have the following JavaScript functions:

- getBooksSold (): This function would ask the server for the latest book sales figures.
- createRequest (): This function would create a new object to talk to the server.
- updatePage (): This function would set the number of books sold (and hence the profit made) on the page.

Let us write the HTML part first. The code is shown in Fig. 7.56.

Sales Report

Sales Report for our Books

Books Sold <span id="books-sold">555</span>
Sell Price Rs. <span id="price">300</span>
Buying Cost Rs. <span id="cost">250</span>

Profit Made: Rs. <span id="cash">27750</span>

Fig. 7.56 HTML code for AJAX-enabled page—Initial version

The resulting screen is shown in Fig. 7.57.

Fig. 7.57

Result of our HTML code

We now want to add JavaScript so that at the click of the button, the function getBooksSold () will get called. This is shown in Fig. 7.58.

Fig. 7.58

Adding JavaScript

The getBooksSold () function

What should the getBooksSold () function do? We can summarize:

- Create a new request by calling the createRequest () function
- Specify the URL to receive the updates from
- Set up the request object to make a connection
- Request an updated number of books sold

Here is the outline of the JavaScript code so far.

<script language="javascript" type="text/javascript">
function createRequest ()
{
    // JavaScript code
}

function getBooksSold ()
{
    createRequest ();
}
</script>

Now, let us think about the contents of the createRequest () function.

The createRequest () function

This function would simply create an instance of the XMLHttpRequest object, as per the browser type:

function createRequest ()
{
    if (window.XMLHttpRequest) {
        XMLHttpRequestObject = new XMLHttpRequest ();
    }
    else if (window.ActiveXObject) {
        XMLHttpRequestObject = new ActiveXObject ("Microsoft.XMLHTTP");
    }
}

Now let us modify the getBooksSold () function suitably, as follows:

function getBooksSold ()
{
    createRequest ();
    var url = "getUpdatedBookSales.jsp";
    XMLHttpRequestObject.open ("GET", url);
    ...
}

This would call getUpdatedBookSales.jsp. We want to process the response sent by this JSP now.

function getBooksSold ()
{
    createRequest ();
    var url = "getUpdatedBookSales.jsp";
    XMLHttpRequestObject.open ("GET", url);
    XMLHttpRequestObject.onreadystatechange = updatePage;
    XMLHttpRequestObject.send (null);
}

Here, updatePage () is a function that will get called when the JSP on the server side has responded to our XMLHttpRequest. What should this function have? Let us see. First, it should receive the value sent by the JSP.

function updatePage ()
{
    var newTotal = XMLHttpRequestObject.responseText;
    ...
}

Note that normally, the server-side JSP would have returned a full HTML page. But now the JSP is dealing with an AJAX request (i.e., an XMLHttpRequest object). Hence, the JSP does not send a full HTML page. Instead, it simply returns a number, which is what the updatePage () function is interested in. This number is stored inside a JavaScript variable called newTotal. Now, we also want to read the current values of the HTML page elements books-sold and cash. Hence, we amend the above function further.

function updatePage ()
{
    var newTotal = XMLHttpRequestObject.responseText;
    var booksSoldElement = document.getElementById ("books-sold");
    var cashElement = document.getElementById ("cash");
    ...
}

Now, we want to replace the current value of the books sold element with the one received from the server. Hence, we add one more line to the code.

function updatePage ()
{
    var newTotal = XMLHttpRequestObject.responseText;
    var booksSoldElement = document.getElementById ("books-sold");
    var cashElement = document.getElementById ("cash");
    replaceText (booksSoldElement, newTotal);
}

This would refresh only the tag of interest, which is the booksSoldElement, which, in turn, means the books-sold HTML form variable. What should the JSP do? It is expected to simply return the latest number of books sold at this juncture. Hence, it has a single line: out.print (300);
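The helper replaceText () is used above, but its body is not shown in this listing. A minimal sketch of what such a helper might look like (our own assumption, not necessarily the original version) is:

<script language="javascript" type="text/javascript">
// Replace whatever text is currently inside the given element
// (for example, the books-sold span) with the new value.
function replaceText (element, newValue)
{
    element.innerHTML = newValue;
}
</script>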

SUMMARY

- JavaScript adds dynamic content to Web pages on the client side.
- A JavaScript program is a small program that is sent by the Web server to the browser along with the standard HTML content.
- The JavaScript program executes within the boundaries of the Web browser, and performs functions such as client-side validations, responding to user inputs, performing basic checks, and so on.
- JavaScript does not perform any operations on the server, but is clearly a client-side technology.
- JavaScript is a full-fledged programming language in its own right. It allows us to use operators, functions, loops, conditions, and so on.
- Ajax allows us to invoke server-side code from the client, but without submitting an HTML form, contrary to what happens in normal processing. Instead, when we use Ajax, the requests are sent from the browser to the server in an asynchronous fashion. This means that the user on the client side can continue doing what she is doing while the request is sent to the server and the response is returned by the server.
- This allows for writing very creative code for a number of situations. For example, while the user is entering data, we can perform on-the-fly server-side validations, provide online help, and so on, which was not possible with the earlier client-only or server-only programming models.

REVIEW QUESTIONS

Multiple-choice Questions

1. JavaScript is ________ language.
   (a) interpreted    (b) compiled    (c) interpreted and compiled    (d) none of the above
2. JavaScript is contained inside the ________ tags.
   (a) ...    (b) <script> ... </script>    (c) ...    (d) ...
3. All functions should be defined in the ________ section.
   (a)    (b) <script>    (c)    (d)
4. The ________ function returns the month of a Date object.
   (a) getHours()    (b) getMonth()    (c) getDay()    (d) getMinutes()
5. The ________ function returns a random number between 0 and 1.
   (a) pow (x, y)    (b) random ()    (c) round (x)    (d) sin (x)
6. The ________ function returns the second of a Date object.
   (a) getSeconds()    (b) getMonth()    (c) getDay()    (d) getMinutes()
7. Client-side technology, such as ________, is used for validating inputs.
   (a) JavaScript    (b) ASP.NET    (c) PHP    (d) JSP
8. The function used to create an instance of the XMLHttpRequest object, as per the browser type, is:
   (a) createRequest ()    (b) getRequest ()    (c) putRequest ()    (d) Request()
9. An AJAX application uses the ________ object built into the browser, using JavaScript to communicate with the server.
   (a) XMLFtpRequest    (b) XMLHttpRequest    (c) HttpRequest    (d) HttpResponse
10. The function that will get called when the JSP on the server side responds to the XMLHttpRequest is:
   (a) updateRequest()    (b) updatePage()    (c) HttpRequest    (d) HttpResponse

Detailed Questions

1. Explain how to call JavaScript from an HTML page.
2. What are the various kinds of functions in JavaScript?
3. Can JavaScript be in a separate file? Give details.
4. What are the key usages of JavaScript?
5. How are HTML forms and JavaScript related?
6. What is the purpose of Ajax?
7. Differentiate between synchronous and asynchronous processing.
8. Why is Ajax different from the traditional request-response model?
9. How do we refer to some server-side script/program from an Ajax-enabled page?
10. Explain the XMLHttpRequest object.

Exercises

1. Write an HTML page and also provide JavaScript for accepting a user ID and password from the user, to ensure that the input is not empty.
2. In the above page, stop the user if the user attempts to tab out of the user ID or password fields without entering anything.
3. Make the same HTML page Ajax-enabled, so that the server-side code can check if the user id and password are correct (by comparing them with the corresponding database fields). [Hint: We would need to make use of ASP.NET or JSP/Servlets for this].
4. Find out how XML and Ajax are related.
5. Explore any possible sites that make use of Ajax.

Chapter 8

ASP.NET—An Overview

INTRODUCTION

Web technologies have evolved at a breathtaking pace since the development of the Internet. So many technologies have come and gone, and yet, so many of them have successfully stayed on as well! In this chapter, we attempt to understand how all of them work, and how they fit in the overall scheme of things.

At the outset, Web technologies classify Web pages into three categories, as shown in Fig. 8.1.

Fig. 8.1

Types of Web pages

Let us understand all the three types of Web pages in brief.

Static Web Pages

A Web page is static, if it does not change its behaviour in response to external actions. The name actually says it all. A static Web page remains the same, i.e., static, for all its life, unless and until someone manually changes its contents. Any time any user in the world sends an HTTP request to a Web server, the Web server returns the same contents to the user via an HTTP response. Such a Web page is static. Examples of static Web pages are some home pages, pages specifying details, etc., which do not change that often.

The process of retrieving a static Web page is illustrated in Fig. 8.2. As we can see, when the client (Web browser) sends an HTTP request for retrieving a Web page, the client sends this request to the Web server. The Web server locates the Web page (i.e., a file on the disk with a .html or .htm extension) and sends it back to the user inside the HTTP response. In other words, the server's job in this case is simply to locate a file on the disk and send its contents back to the browser. The server does not perform any extra processing. This makes the Web page processing static.

Fig. 8.2

Static Web page

Static Web pages can mainly contain HTML, JavaScript, and CSS. We have discussed all these technologies earlier. Therefore, we would not repeat this discussion here. All we would say is that using these technologies, or even with plain HTML, we can create static Web pages.

Dynamic Web Pages

A Web page is dynamic, if it changes its behaviour (i.e., the output) in response to external actions. In other words, in response to a user's HTTP request, if the Web page possibly produces different output every time, it is a dynamic Web page. Of course, the output need not always be different, but usually it is. For example, if we ask for the current foreign exchange rate between the US dollar and the Indian rupee, a dynamic Web page would show the latest rate (and hence, it is dynamic). However, if we immediately refresh the page, the rate may not have changed in a second's time, and hence, the output may not change. In that sense, a dynamic Web page may not always produce different output. In general, we should remember the following.

- A static Web page is a page that contains HTML and possibly JavaScript and CSS, and is pre-created and stored on the Web server. When a user sends an HTTP request to fetch this page, the server simply sends it back.
- A dynamic Web page, on the other hand, is not pre-created. Instead, it is prepared on the fly. Whenever a user sends an HTTP request for a dynamic Web page, the server looks at the name of the dynamic Web page, which is actually a program. The server executes the program locally. The program produces output at run time, on the fly, which is again in HTML format. It may also contain JavaScript and CSS like a static Web page. This HTML (and possibly JavaScript and CSS) output is sent back to the browser as a part of the HTTP response.

The concept of dynamic Web pages is illustrated in Fig. 8.3.

Fig. 8.3 Dynamic Web page

Thus, we can summarize: A static Web page is pre-created in HTML and associated languages/technologies and stored on the Web server. Whenever a user sends a request for this page, the server simply returns it. On the other hand, a dynamic Web page is actually a program, which produces HTML and associated output and sends it back to the user.

Active Web Pages

There is yet another category of Web pages as well, called active Web pages. A Web page is active, if it executes a program (and here we are not talking about client-side JavaScript) on the client, i.e., the Web browser. In other words, if a static or dynamic Web page not only sends HTML, JavaScript, and CSS to the browser, but in addition a program, the Web page is active. Remember that we are talking about a program getting executed on the client, and not on the server, here. Now, what can that program be? It can be a Java applet, or an ActiveX control. We shall discuss some of these details later. The concept is shown in Fig. 8.4.

Fig. 8.4 Active Web page

8.1 POPULAR WEB TECHNOLOGIES

Web technologies involve the concept of a tier. A tier is nothing but a layer in an application. In the simplest form, the Internet is a two-tier application. Here, the two tiers are the Web browser and the Web server. The technologies that exist in these tiers are as follows.

Client tier: HTML, JavaScript, CSS
Server tier: Common Gateway Interface (CGI), Java Servlets, Java Server Pages (JSP), Apache Struts, Microsoft's ASP.NET, PHP, etc.

Clearly, if the Web pages are static, we do not need any specific technologies on the server tier. We simply need a server computer that can host and send back files to the client computer as and when required. However, for dynamic Web pages, we do need these technologies. In other words, we write our programs on the server tier in one of these technologies. We would now review a few major server-side technologies. At the outset, let us classify the available set of technologies into various categories, as shown in Fig. 8.5.

Fig. 8.5 Classification of server-side Web technologies

Let us now discuss these technologies in reasonable detail in the following sections.

8.2 WHAT IS ASP.NET?

Microsoft's ASP.NET is a wonderful technology to rapidly develop dynamic Web pages. It eases development with features that were earlier unheard of. The basic idea of this technology is quite straightforward.

1. The user fills up an HTML form, which causes an HTTP request to be sent to the ASP.NET Web server. Of course, this request need not necessarily go via an HTML form. It can also be sent without a form. The ASP.NET server is called Internet Information Server (IIS).
2. The IIS Web server runs a program in response to the user's HTTP request. This program is written to adhere to a specification, which we call ASP.NET. The actual programming language is usually C# (pronounced C sharp) or VB.NET. We shall discuss this shortly. This program performs the necessary operations, based on the user's inputs, selection of options, etc.

3. This program prepares and sends the desired output back to the user inside an HTTP response.

At this stage, let us understand the fact that ASP.NET is a specification. When we say that ASP.NET is a specification, we simply mean that Microsoft has said that if an HTTP request is sent by the user to the Web server, a certain number of things should happen (for example, the Web server should be able to read the values entered by the user in the HTML form in a certain manner; or that database access should be possible in a certain way, and so on). A language such as C# or VB.NET would implement these specifications in the specific language syntax. Figure 8.6 shows an example.

Fig. 8.6 ASP.NET concept

ASP.NET provides a number of features for dealing with requests, working with server-side features, and sending responses back to the user. It also reduces a lot of coding effort by providing drag-and-drop features. To work with ASP.NET, ideally we need software called Microsoft Visual Web Developer 2008. However, as a wonderful gesture, Microsoft has provided a completely free, downloadable version of this development environment (called the Express Edition). This free edition can be downloaded from the Microsoft site and used for developing ASP.NET pages.

8.3 AN OVERVIEW OF THE .NET FRAMEWORK

.NET is a development platform. Some people allege that Microsoft is trying to sell old wine (read technology) in a new bottle (read development platform). However, it is difficult to agree with this theory. .NET is very powerful and rich in features. Figure 8.7 shows the make-up of the .NET framework.

Fig. 8.7 Overview of the .NET framework

Let us understand the key aspects of the .NET framework.

Programming languages layer

The highest level in the .NET framework is the programming languages layer. The .NET framework supports many languages, including such languages as PERL, which were not heard of in the Microsoft world earlier. However, the prime languages are C# and VB.NET. This is the layer where the application programmer has the most interaction. In other words, a programmer can write programs in the C# or VB.NET (or other supported) programming languages, which execute on the .NET platform.

Common Language Specifications (CLS)

At this layer, the differences between all the .NET programming languages are addressed. The Common Language Specifications (CLS) is the common thread between all the varying .NET programming languages. In other words, regardless of which programming language the developer is using, the CLS makes the whole thing uniform. We can think of the CLS as a neutral run-time format/specification, into which all the source code is transformed. This neutral format is a language called Microsoft Intermediate Language (MSIL). Thus, all .NET programs get compiled into MSIL, and MSIL operates under the umbrella of the CLS. Remember the idea about ASP.NET being a specification, and C# being a language that implements those specifications? Here is a similar idea. The CLS is a specification, and the programming languages adhere to and implement those specifications. This raises some really innovative possibilities:

- Run-time differences between programming languages go away. A class written in C# can extend another class that is written in VB.NET! As long as both speak in CLS at run time, the source languages do not matter.
- All languages have similar run-time performance. The notion of C++ being faster than Visual Basic in the earlier days no longer holds true.

Web Services and GUI applications

The concept of Web Services would be discussed later in this book. However, for now we shall simply say that a Web Service is a program-to-program communication using XML-based standards. GUI applications are any traditional client-server or desktop applications that we want to build using the .NET framework.

XML and ADO.NET

At this layer, the data representation and storage technologies come into the picture. XML, as we shall discuss separately, is the preferred choice for data representation and exchange in today's world. ADO.NET, on the other hand, is the database management part of the .NET framework. ADO.NET provides various features using which we can persist our application's data into database tables.

Base class library

The base class library is the set of pre-created classes, interfaces, and other infrastructure that are reusable. For example, there are classes and methods to receive inputs from the screen, send output to the screen, perform disk I/O, perform database operations, create various types of data structures, perform arithmetic and logical operations, etc. All our application programs can make use of the functionality of the base class library.

Common Language Runtime (CLR)

The Common Language Runtime (CLR) is the heart of the .NET framework. We can roughly equate the CLR in .NET with the Java Virtual Machine (JVM) in the Java technology. To understand the concept better, let us first take a look at Fig. 8.8.

Fig. 8.8

CLR concept

As we can see, programs written in the source language are translated by the appropriate language compilers into a universal Microsoft Intermediate Language (MSIL). The MSIL is like the Java byte code, or an intermediate language like the Assembly language. That is, it is neither a High Level Language, nor a Low Level Language. Instead, the MSIL is the language that the CLR understands. Thus, the CLR receives a program in MSIL as the input, and executes it step-by-step. In that sense, the CLR is basically a language interpreter, the language that is interpreted in this case being MSIL. This also tells us that the .NET framework embeds the various language compilers (e.g., a C# compiler, a VB.NET compiler, and so on). Also, the CLS specifies what should happen, and the CLR enforces it at run time. The CLR performs many tasks, such as creating variables at run time, performing garbage collection (i.e., automatically removing variables no longer in use from the computer’s memory), and ensuring that no unwanted behaviour (e.g., security breaches) is exhibited by the executing program.

8.4 ASP.NET DETAILS

Before we discuss more on ASP.NET, we would like to do a quick comparison between ASP.NET and its predecessor, ASP. ASP.NET provides several advantages over ASP, the major ones of which can be summarized as shown in Table 8.1.

Table 8.1 ASP versus ASP.NET

Point of discussion: Coding style
ASP: ASP relied on scripting languages such as JavaScript and VBScript. These languages are quick to learn and use. However, they are also languages that are harder to debug, do not provide extensive programming support for good error handling, and in general, are not elegant, unlike traditional programming languages such as Java and C#.
ASP.NET: ASP.NET uses full-fledged programming languages such as C# and VB.NET.

Point of discussion: Deployment and configuration
ASP: Deploying and configuring ASP applications was a big headache, since it needed multiple settings in IIS, working with the complex technology of the Component Object Model (COM), etc.
ASP.NET: Deploying ASP.NET applications is very easy, with no complicated installations needed.

Point of discussion: Application structuring
ASP: ASP applications have intermixed HTML and JavaScript code. This is often difficult to read, maintain, and debug.
ASP.NET: In ASP.NET, we can keep the HTML code and the programming code (written in C# or VB.NET) separate. This makes the whole application easy to maintain, understand, and debug.

How does an ASP.NET program look? Figure 8.9 shows an example. The first page (a.aspx) shows an HTML form, which has a text box. The user is expected to type her name in that text box. Once the user enters her name and clicks on the button in the HTML form, the HTTP request goes to the server. This request is expected to be sent to another ASP.NET program, called a1.aspx. This ASP.NET program (a1.aspx) reads the contents of the text box sent by the HTTP request, and displays the value of the text box back to the user. If we run this application, a.aspx displays a screen as shown in Fig. 8.10. If I type my name and click on the button, the browser sends an HTTP request to a1.aspx, passing my name. As a result, the screen shown in Fig. 8.10 appears.

a.aspx

<%@ Page Language="C#" %>

a1.aspx

<%@ Page Language="C#" %>
Hi in a1.aspx
<%
    String a;
    a = Request.QueryString ["aa"];
    Response.Write(a);
%>

Fig. 8.9 Simple ASP.NET example

Fig. 8.10 Output of the ASP.NET page—Part 1

Fig. 8.10 Output of the ASP.NET page—Part 2

How does the magic happen? We can see in the URL bar the following string. http://localhost:2483/WebSite1/a1.aspx?aa=Atul

It means that the browser is asking the server to execute a1.aspx when the browser's request is to be processed. In addition, the browser is telling the server that a variable named aa, whose value is Atul, is also being passed from the browser to the a1.aspx program. If we look at the code of a1.aspx again, we shall notice the following lines.

<%
    String a;
    a = Request.QueryString["aa"];
    Response.Write(a);
%>

Let us understand this line-by-line.

<%

The <% symbol indicates that some C# code is starting now. This is how we can distinguish between HTML code and C# code inside an ASP.NET page.

String a;

This line declares a string variable in our C# program with the name a.

a = Request.QueryString ["aa"];

This line reads the value of the text box named aa from the HTML screen, and populates that value into the C# variable a, which was declared earlier.

Response.Write(a);

This statement now simply writes back the same value that the user had initially entered in the HTML form. Response is an object, which is used to send the HTTP response back to the user, corresponding to the user's original HTTP request.

%>
This statement now simply writes back the same value that the had initially entered in the HTML form. Response is an object, which is used to send the HTTP response back to the , corresponding to the ’s original HTTP request. %>

This line concludes our C# code part.

In general, there are two ways in which we can develop ASP.NET pages:

Single-page model: In this approach, we write the HTML code and the corresponding programming language code (in, say, C# or VB.NET) in a single file with an extension of .aspx. This approach is similar to the traditional manner of the older ASP days. This is useful in the case of smaller projects, or for study/experimentation purposes.

Code-behind page model: In this approach, all the HTML part is inside one .aspx file, and the actual functionality resides in various individual files. For example, if the application code is written in the C# programming language, then we will have as many .cs files as needed, one per class written in C#. This is more practical in real-life situations.

8.5 SERVER CONTROLS AND WEB CONTROLS

ASP.NET provides rich features for creating HTML forms and for performing data validations. For this purpose, it provides modified versions of the basic HTML form controls, such as text boxes, radio buttons, drop-down lists, submit buttons, and so on. In a nutshell, when using ASP.NET, we have three basic choices for creating an HTML form, as illustrated in Fig. 8.11.

Fig. 8.11 Types of HTML controls

Table 8.2 distinguishes between the three types of controls.

Table 8.2 Classification of ASP.NET HTML controls

Type of control: HTML controls
Description: These are the traditional HTML controls. We can use them in ASP.NET in exactly the same way as we can use them in HTML, or any other server-side technology. There is nothing new here.

Type of control: HTML server controls
Description: We can add an attribute titled runat = "server" to the above HTML controls to make them HTML server controls. This allows us to create HTML controls/tags that are understood by the Web server. This has certain implications, as we shall study shortly. This feature is not in the plain HTML syntax, but has been added by ASP.NET.

Type of control: Web server controls
Description: This is a completely new way of adding HTML controls/tags to an HTML form. By using these types of controls, we can make our HTML page very interactive, and can provide a very rich interface to the user of the application. We shall discuss this shortly.



Let us now understand these types of controls in more detail.

HTML controls HTML controls are traditional, standard HTML-based controls, as shown in Fig. 8.12. As mentioned earlier, there is nothing unique here. We can use these types of controls in plain HTML or in other Web technologies as well. As much as possible, these controls are discouraged in ASP.NET, since the usage of these controls deprives the programmer from the real power of ASP.NET form processing and validations. Server Control and HTML Control Example 6d3x44
Visit Google!


Fig. 8.12

Simple HTML page

As we can see, this is a straightforward HTML form, which specifies an anchor tag that leads us to the URL of Google. There is nothing new or unique about this code. We will not discuss these controls any further.

HTML server controls These controls are very similar in syntax to the standard, traditional HTML controls, with one difference. As mentioned earlier, we add the runat = “server” attribute to traditional HTML controls to make them HTML server controls. Figure 8.13 shows the difference.

Web Technologies

244

Fig. 8.13

HTML controls and HTML server controls

As we can see, HTML server controls are special HTML tags. These are processed by the Web server in a manner somewhat to the way HTML tags are interpreted by the Web browser. We can know that an HTML tag is an HTML server tag because it contains a runat=”server” attribute. This attributes helps the server in differentiating between standard HTML controls and HTML server controls. Once the Web server sees an HTML server control, it creates an in-memory object, corresponding to the HTML server control. This server-side object can have properties, methods, and can expose or raise server– side events while processing the ASP.NET page. Once the processing is completed, the control produces its resulting output in the of HTML form. It is then sent to the browser as part of the resulting HTML page for actual display. The server controls help us simplify the process of dealing with the properties and attributes of the various HTML tags. They also allow us to hide the logic affecting the tags from the tags themselves, thus helping us to write a cleaner code. Figure 8.14 shows an example of HTML server control. We have modified our earlier example of the simple HTML control to convert it into an HTML server control. Server Control and HTML Control Example 6d3x44 <script language=”c#” runat=”server”> void page_load() { link1.HRef = “http://www.google.com”; }
Visit Google!


Fig. 8.14 HTML server control

ASP.NET—An Overview

245 As we can see, an HTML control is specified for the anchor tag, to create a hyper link. However, the actual hyper link is not specified in the anchor tag. Instead, it is added by the page_load () method. The page_load () method is actually an event, that gets called whenever the Web page loads in the Web browser. However, and this is the point, this is not client-side JavaScript code. Instead, it is C# code that executes on the Web server, not on the client. This can really confuse us at the beginning. However, we should that when we use HTML server controls, we ask ASP.NET to automatically execute the server-side code as if it is running on the client-side. That is, we write code using a syntax that makes it look like server-side code, but it actually executes on the client. Therefore, in this case, the page_load () event causes the Web page to be loaded on the HTML client (i.e., the browser), and yet executes code that is written in a server-side manner. To take this point further, Fig. 8.15 shows what the gets to see, if she does a View-Source. Server Control and HTML Control Example 6d3x44
Visit Google!


Fig. 8.15 Result of doing view source As we can see, there is no trace of any server-side code here. The does not even know that a method called as page_load () has got executed. Thus, HTML server controls hide the complexity from the , and yet perform the necessary functions as if the code is on the client. However, we shall notice that the effect of adding the hyperlink via the page_load () method can be seen in the end result. The href tag is indeed added to the resulting Web page. We will also notice that the source code has a hidden variable with strange contents for name, id, and value. This hidden variable is what ASP.NET internally uses to make our traditional HTML control an HTML server control. How it works and what are its contents is none of our business. It is managed internally by ASP.NET, and we must not make any attempts to directly access/manipulate it. If, on the other hand, what if we had not coded the anchor tag as an ordinary HTML control (and not as an HTML server control)? Let us see the modified code, as shown in Fig. 8.16. Notice that we do not have a runat =”server” attribute in the anchor tag anymore. Now, link1 is an ordinary anchor. What if we try to compile this application? We get an error, as shown in Fig. 8.17. As we can see, the compiler does not recognize link1 in the page_load () method now. Why is it so? It is because it is no longer an HTML server control. Instead, it is an ordinary HTML control. The moment we make it an ordinary HTML control, we lose the benefit of the ability of manipulating the contents of this control programmatically in server-side code. This is exactly what has happened here. Now, link1 has become


a client-only HTML control. This means that it can be manipulated by client-side JavaScript in the browser, but not by the server!

Server Control and HTML Control Example
<script language="c#" runat="server">
void page_load()
{
    link1.HRef = "http://www.google.com";
}
Visit Google!


Fig. 8.16

HTML control example

Compiler Error Message: CS0103: The name 'link1' does not exist in the current context

Source Error:

Line 7:   void page_load()
Line 8:   {
Line 9:       link1.HRef = "http://www.google.com";
Line 10:  }
Line 11:

Source File: c:\Documents and Settings\atulk\My Documents\Visual Studio 2005\WebSites\WebSite1\a.aspx Line: 9

Fig. 8.17

Error in the example

This should clearly outline the practical differences between an ordinary HTML control and an HTML server control. We now summarize the advantages and disadvantages of the HTML server controls below.

Advantages
1. The HTML server controls are based on the traditional HTML-like object model.
2. The controls can interact with client-side scripting. Processing can be done at the client side as well as at the server side, depending on our logic.

Disadvantages
1. We would need to code for browser compatibility.
2. They have no way of identifying the capabilities of the client browser accessing the current page.
3. They have abstraction similar to the corresponding HTML tags, and they do not offer any added abstraction levels.

Web server controls

Web server controls are an ASP.NET speciality. They are rich, powerful, and very easy to use. They go even beyond the HTML server controls. They exhibit behaviour that makes ASP.NET applications extremely easy and user/programmer friendly. All Web server controls have a special identifier, which is the asp: tag prefix (used together with the runat="server" attribute). These controls do not have the traditional HTML-like tags. Figure 8.18 distinguishes between the creation of a text box by using an HTML control, an HTML server control, and a Web server control.

Fig. 8.18

Difference between various control types

As we can see, the way to define a text box by using the Web server control is given below.
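The tag itself has not survived in this copy. Assuming the usual ASP.NET syntax, it would be along these lines (the id is illustrative):

<asp:TextBox id="TextBox1" runat="server" />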

How does this work? We code the above tag. ASP.NET, in turn, transforms this code into an HTML text box control, so that the text box can be displayed on the user's browser screen. However, in addition: (i) it also ensures that the same features that were provided to the HTML server control are retained, and (ii) it adds a few new features of its own. Suppose that our code is as shown in Fig. 8.19.

<%@ Page Language="C#"%>



Fig. 8.19

Web server control—Part 1


This will cause a text box to be displayed on the screen. If we again do a View-Source, the result is shown in Fig. 8.19 (Part 2).



Fig. 8.19

Web server control—Part 2
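The View-Source output of Fig. 8.19 is not legible here. It would typically contain an ordinary HTML text box plus ASP.NET's hidden state fields, roughly as follows (the names and form details are illustrative; the hidden values are omitted):

<form name="form1" method="post" action="a.aspx" id="form1">
   <!-- Hidden fields added automatically by ASP.NET -->
   <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="..." />
   <input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="..." />
   <!-- The Web server control rendered as a plain HTML text box -->
   <input name="TextBox1" type="text" id="TextBox1" />
</form>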

As we can see, our code for the text box has again been converted into a traditional text box. Plus, two hidden variables have been added. Before we proceed, let us see what would have happened if we had used an HTML server control instead of a Web server control. In other words, suppose that our source code is as shown in Fig. 8.20. <%@ Page Language="C#"%>



Fig. 8.20 HTML control—Part 1

Note that we have replaced our Web server control for the text box with a corresponding HTML server control. Now if we do a View-Source, what do we get to see? Take a look at Fig. 8.20 (Part 2). We can see that this code is almost exactly the same as what was generated in the Web server control case. How does a Web server control then differ from an HTML server control? There are some key differences between the two, as follows.

- Web controls provide richer Graphical User Interface (GUI) features as compared to HTML server controls. For example, we have calendars, data grids, etc., in the Web controls.
- The object model (i.e., the programming aspects) in the case of Web controls is more consistent than that of HTML server controls.
- Web controls detect and adjust for browsers automatically, unlike HTML server controls. In other words, they are browser-independent.

A detailed discussion of these features is beyond the scope of the current text. However, we would summarize the advantages and disadvantages of Web server controls.



Fig. 8.20 HTML control—Part 2

Advantages
1. They can detect the target browser's capabilities and render themselves accordingly.
2. Additional controls that can be used in the same manner as any HTML control, such as Calendar controls, are possible without any dependency on external code.
3. Processing is done at the server side.
4. They have an object model, which is different from the traditional HTML model, and they even provide a set of properties and methods that can change the appearance and behaviour of the controls.
5. They have the highest level of abstraction.

Disadvantages
1. The programmer does not have very deep control over the generated code.
2. Migration of an ASP application to ASP.NET is difficult if we want to use these controls. It is actually the same as rewriting our application.

8.6 VALIDATION CONTROLS

ASP.NET is a dynamic Web page, server-side technology. Therefore, it does not directly interact with the Web browser. For example, there are no ASP.NET properties/methods to get keyboard input from the user, respond to mouse events, or perform other tasks that involve interaction with the browser. ASP.NET can get the results of such actions after the page has been posted, but cannot directly respond to browser actions. Therefore, in order to validate information (say, whether the user has entered a numeric value between 0 and 99 for age),


we must write JavaScript as per the traditional approach. This client-side JavaScript would travel to the user's browser along with the HTML page, and validate its contents before they are posted to the server. The other approach of validating all this information on the server is also available, but is quite wasteful. ASP.NET has introduced something quite amazing to deal with validations. Titled validation controls, these additional tags validate information with very little coding. They are very powerful, browser-independent, and can easily handle all kinds of validation errors. The way validation controls work is illustrated in Fig. 8.21.

1. ASP.NET checks the browser when generating a page.
2. If the browser can support JavaScript, ASP.NET sends client-side JavaScript to the browser for validations, along with the HTML contents.
3. Otherwise, validations happen on the server.
4. Even if client-side validation happens, server-side validation still happens, thus ensuring double checking.

Fig. 8.21 Validation controls operation Table 8.3 summarizes the various validation controls provided by ASP.NET.

Table 8.3 Validation controls

RequiredFieldValidator: Ensures that a mandatory field must have some value.
CompareValidator: Compares the values of two different controls, based on the specified conditions.
RangeValidator: Ensures that the value of a control is in the specified range.
RegularExpressionValidator: Compares the value of a control to ensure that it adheres to a regular expression.
CustomValidator: Allows the user to provide her own validation logic.

Let us understand how validation controls work, with an example. Figure 8.22 shows an ASP.NET page that displays a text box to the user. It also makes this text box mandatory by using the RequiredFieldValidator validation control. Let us understand how this works. We have an HTML form, which has a text box named aaa. Associated with this text box is a special control, called a RequiredFieldValidator. If we look at the syntax of this validation control, we shall notice that it specifies ControlToValidate as our text box (i.e., aaa). In other words, the validation control is intended to act upon the text box. Also, because this is a validation control that controls whether or not the user has entered something in the text box, it is called a RequiredFieldValidator. Let us now see how this works in real life. As we can see, if the user does not type anything in the text box and clicks on the button, we see an error message as shown in Fig. 8.23. How does this work? When we declare an HTML control to be of type RequiredFieldValidator, and associate it with some other HTML control (e.g., with a text box, in this case), ASP.NET generates the client-side JavaScript code behind the scenes to build the right association between them. In other words, it writes the code to ensure that whenever the user tabs out of the text box, the validation control should kick in. This is so convenient as compared to writing tedious JavaScript code ourselves! Better still, we can ensure multiple validations on the same control (e.g., the fact that it is mandatory, and that it should contain a numeric value between this and


this range, and that it should be less than some value in some other control). The best part, though, is that we hardly need to write a single line of code to do all this! We can just use the drag-and-drop features of ASP.NET to do almost everything that we need here.

<%@ Page Language="C#" %>
Validation Control Example




Fig. 8.22 RequiredFieldValidator example
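Because the tags of Fig. 8.22 are garbled in this copy, here is a minimal sketch of the idea, using the text box name aaa mentioned in the text; the page title comes from the listing, while the validator id, error message, and button are illustrative.

<%@ Page Language="C#" %>
<html>
<head>
   <title>Validation Control Example</title>
</head>
<body>
   <form runat="server">
      <asp:TextBox id="aaa" runat="server" />
      <!-- The validator is tied to the text box via ControlToValidate -->
      <asp:RequiredFieldValidator id="aaaRequired" runat="server"
           ControlToValidate="aaa"
           ErrorMessage="Please enter a value" />
      <asp:Button id="btnSubmit" runat="server" Text="Submit" />
   </form>
</body>
</html>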

Fig. 8.23

RequiredFieldValidator usage example

Truly, this is something remarkable. Programmers, who have used JavaScript to do similar things in the past can vouch for the complexities they had to undergo to achieve similar objectives. JavaScript works, but it is quite tedious. And to an extent, it is browser-dependent, as well! We need not go through all that pain any more, if we are using ASP.NET. We can use the underlying features of this technology to implicitly implement all these features declaratively, rather than programmatically. Just to illustrate the point further, we shall illustrate one more example. This time, we make use of the RangeValidator. As the name suggests, this validation control allows us to specify the range in which the value of a particular control must be. Have a look at Fig. 8.24.


252 <%@ Page Language=”C#” %> <script runat=”server”> void Button_Click(Object sender, EventArgs e) { if (Page.IsValid) { MessageLabel.Text = “Page submitted successfully.”; } else { MessageLabel.Text = “There is an error on the page.”; } } Validator Example 6n19t

Validator Example

Enter a number from 1 to 10.



Fig. 8.24 RangeValidator and ValidationSummary validation controls Let us understand how this code works. We have defined a text box, which is actually a Web server control.


We then have a RangeValidator:
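The tag itself is missing from this copy. Based on the description that follows (minimum value 1, maximum value 10), it would be something like the following; the control ids and the error text are illustrative.

<asp:RangeValidator id="NumberRangeValidator" runat="server"
     ControlToValidate="NumberTextBox"
     MinimumValue="1" MaximumValue="10" Type="Integer"
     ErrorMessage="The value must be from 1 to 10." />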

This code tells us that we want to validate the text box control created earlier. We then say that the minimum value that the text box can accept is 1, and the maximum value is 10. We also specify the error message, in case the user has not entered a number in the text box adhering to this range. We then also have a RequiredFieldValidator:
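Again, the tag has not survived; it would be along these lines (ids and message are illustrative):

<asp:RequiredFieldValidator id="NumberRequiredValidator" runat="server"
     ControlToValidate="NumberTextBox"
     ErrorMessage="Please enter a value." />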

This validation control ensures that the user does not leave our text box empty. Finally, we have an interesting piece of code:
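The tag being referred to would be roughly the following (the attributes shown are illustrative):

<asp:ValidationSummary id="ErrorSummary" runat="server"
     HeaderText="The following errors were found:" />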

This is the ValidationSummary validation control. It ensures that instead of displaying different validation errors differently, and at different places, all of them can be summarized and displayed at one place. In other words, we want to summarize all the validation errors at one place for a better look and feel. Following are its key features:

- Consolidates error reporting for all controls on a page
- Usually used for forms containing large amounts of data
- Shows the list of errors in a bulleted list

Above this, we had the following code:
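The code referred to here is not legible in this copy. Judging from the script in Fig. 8.24 (the Button_Click handler and the MessageLabel), it was most probably a submit button wired to that handler, plus a label for the result message, for example:

<asp:Button id="SubmitButton" runat="server"
     Text="Submit" OnClick="Button_Click" />
<asp:Label id="MessageLabel" runat="server" />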


runat=”server” DataTextField=”Product_Name”

Fig. 8.29 SqlDataSource example

We can also add an UpdateCommand attribute to the SqlDataSource to allow the user to edit data.

8.7.2 GridView

This control allows us to access data without writing a single line of code! We can drag this control onto the screen and link it to a data source. Simply by setting a couple of properties, we can have the data sorted, pagination enabled, and so on. The data that gets displayed is in a grid form.
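As a sketch of this "no code" style (the control names, the table, and the connection string key below are assumptions, not taken from the book's listings), a GridView can simply be pointed at a data source control:

<asp:SqlDataSource id="ProductsDataSource" runat="server"
     ConnectionString="<%$ ConnectionStrings:MyDatabase %>"
     SelectCommand="SELECT Product_Id, Product_Name FROM Products" />

<asp:GridView id="ProductsGridView" runat="server"
     DataSourceID="ProductsDataSource"
     AllowSorting="true" AllowPaging="true" />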

8.7.3 FormView

This control allows us to display a single data item from a bound data source control, and allows insertions, updates, and deletions to data. We can also provide a custom template for the data display. Figure 8.30 illustrates the usage of this control with an example.

<%@ Page Language="C#" %>
Untitled Page
   lname:
fname:
hiredate:


(Contd)


258 Fig. 8.30 contd... phone:
<EditItemTemplate> lname: ’>
fname: ’>
hiredate: ’>
phone: ’>


Fig. 8.30

FormView example—Part 1

lname: ’>
fname: ’>
hiredate: ’>
phone: ’>


(Contd)


259 Fig. 8.30 contd...
  


Fig. 8.30 FormView example—Part 2

8.7.4 Database Programming

So far, we have discussed the options of performing database processing by using minimum code. However, ASP.NET also provides features whereby the programmer has full control over the database processing. The general steps for this approach are shown in Fig. 8.31.

Fig. 8.31

ASP.NET database programming steps


The Command object provides a number of useful methods, as summarized below.

ExecuteNonQuery This method executes the specified SQL command and returns the number of affected rows.

ExecuteReader This method provides a forward-only and read-only cursor, and executes the specified SQL command to return an object of type SqlDataReader (discussed subsequently).

ExecuteRow This method executes a command and returns a single row of type SqlRecord.

ExecuteXMLReader This method allows processing of XML documents.

For the purpose of programming, there are two options, as specified in Fig. 8.32.

Fig. 8.32

ASP.NET programming approaches

As we can see, there are two primary approaches for database programming using ASP.NET.

1. Stream-based data access, using the DataReader object
2. Set-based data access, using the DataSet and DataAdapter objects

We shall discuss both now.

Using the DataReader

As mentioned earlier, the DataReader is read-only and forward-only. It expects a live connection with the database. It cannot be instantiated directly. Instead, it must be instantiated by calling the ExecuteReader method of the Command object. Figure 8.33 shows an example. As we can see, a GridView control is specified in the HTML page. It is bound to a DataReader object. The DataReader object fetches data from a table by using the appropriate SQL statement. However, we should note that the DataReader object can only be used for reading data. It cannot be used for insert, update, and delete operations. If we want to perform these kinds of operations, we can directly call the ExecuteNonQuery method on the Command object.


261 <%@ <%@ <%@ <%@ <%@

Page Language=”C#” Import Namespace = Import Namespace = Import Namespace = Import Namespace =

Debug = “true”%> “System.Data” %> “System.Data.SqlClient” %> “System.Configuration” %> “System.Data.OleDb” %>

<script runat=”server”> protected void Page_Load() { if (!Page.IsPostBack) { OleDbConnection MyConnection; OleDbCommand MyCommand; OleDbDataReader MyReader; MyConnection = new OleDbConnection (“Provider=Microsoft.Jet.OLEDB.4.0;Data Source=\”C:\\Lectures\\SICSR\\Web Technologies\\WT-2\\Examples\\employees.mdb\””); MyCommand = new OleDbCommand(); MyCommand.CommandText = “SELECT lname FROM employees”; MyCommand.CommandType = CommandType.Text; MyCommand.Connection = MyConnection; MyCommand.Connection.Open(); MyReader = MyCommand.ExecuteReader(CommandBehavior.CloseConnection); gvEmployees.DataSource = MyReader; gvEmployees.DataBind(); MyCommand.Dispose(); MyConnection.Dispose(); } } SQL Example 2i3x4m


Fig. 8.33

Using the DataReader object


262 Figure 8.34 shows an example for inserting data using the ExecuteNonQuery method of the Command object. <%@ <%@ <%@ <%@ <%@

Page Language=”C#” Import Namespace = Import Namespace = Import Namespace = Import Namespace =

Debug = “true”%> “System.Data” %> “System.Data.SqlClient” %> “System.Configuration” %> “System.Data.OleDb” %>

<script runat=”server”> protected void Page_Load() { if (!Page.IsPostBack) { OleDbConnection MyConnection; OleDbCommand MyCommand; MyConnection = new OleDbConnection (“Provider=Microsoft.Jet.OLEDB.4.0;Data Source=\”C:\\Lectures\\SICSR\\Web Technologies\\WT-2\\Examples\\employees.mdb\””); MyConnection.Open(); MyCommand = new OleDbCommand (“INSERT INTO employees VALUES (‘8000’, ‘Kahate’, ‘Atul’, ’13-08-2001', 0, ‘D1’, ‘Mr’, ‘[email protected]’, ‘2101011’)”, MyConnection); MyCommand.ExecuteNonQuery (); MyConnection.Close(); MyCommand.Dispose(); MyConnection.Dispose(); } } SQL Example 2i3x4m
Hello


Fig. 8.34 Inserting data using ExecuteNonQuery method of the Command object We can similarly update data, as shown in Fig. 8.35.


263 <%@ <%@ <%@ <%@ <%@

Page Language=”C#” Import Namespace = Import Namespace = Import Namespace = Import Namespace =

Debug = “true”%> “System.Data” %> “System.Data.SqlClient” %> “System.Configuration” %> “System.Data.OleDb” %>

<script runat=”server”> protected void Page_Load() { if (!Page.IsPostBack) { OleDbConnection MyConnection; OleDbCommand MyCommand; MyConnection = new OleDbConnection (“Provider=Microsoft.Jet.OLEDB.4.0;Data Source=\”C:\\Lectures\\SICSR\\Web Technologies\\WT-2\\Examples\\employees.mdb\””); MyConnection.Open(); MyCommand = new OleDbCommand (“UPDATE employees SET lname = ‘test’ WHERE empno = ‘8000’”, MyConnection); MyCommand.ExecuteNonQuery (); MyConnection.Close(); MyCommand.Dispose(); MyConnection.Dispose(); } } SQL Example 2i3x4m
Hello


Fig. 8.35

Updating data using ExecuteNonQuery method of the Command object

We can also delete data, as shown in Fig. 8.36. One of the nice features of SQL programming these days is to perform what are called parameterized operations. In other words, we can decide at run time what values should be provided to an SQL query for comparisons, insertions, updates, etc. For example, suppose that we want to accept some value from the user and allow the user to search for matching rows based on that value. Now, in this case, we cannot hard code that value in our SQL query, since that would stop the user from providing a different value each time. However, if


we parameterize it, the user can provide the value at run time, and the query would take that value as the input for the lookup. Figure 8.37 shows the example of a parameterized SELECT statement. <%@ <%@ <%@ <%@ <%@

Page Language=”C#” Import Namespace = Import Namespace = Import Namespace = Import Namespace =

Debug = “true”%> “System.Data” %> “System.Data.SqlClient” %> “System.Configuration” %> “System.Data.OleDb” %>

<script runat=”server”> protected void Page_Load() { if (!Page.IsPostBack) { OleDbConnection MyConnection; OleDbCommand MyCommand; MyConnection = new OleDbConnection (“Provider=Microsoft.Jet.OLEDB.4.0;Data Source=\”C:\\Lectures\\SICSR\\Web Technologies\\WT-2\\Examples\\employees.mdb\””); MyConnection.Open(); MyCommand = new OleDbCommand (“DELETE FROM employees WHERE empno = ‘3000’”, MyConnection); MyCommand.ExecuteNonQuery (); MyConnection.Close(); MyCommand.Dispose(); MyConnection.Dispose(); } } SQL Example 2i3x4m
Hello


Fig. 8.36

Deleting data using ExecuteNonQuery method of the Command object


265 <%@ <%@ <%@ <%@ <%@

Page Language=”C#” %> Import Namespace =”System.Data” %> Import Namespace =”System.Data.SqlClient” %> Import Namespace =”System.Configuration” %> Import Namespace = “System.Data.OleDb” %>

<script runat=”server”> protected void Page_Load(object sender, EventArgs e) { if (!Page.IsPostBack) { OleDbConnection MyConnection; OleDbCommand MyCommand; OleDbDataReader MyDataReader; MyConnection = new OleDbConnection (“Provider=Microsoft.Jet.OLEDB.4.0;Data Source=\”C:\\Lectures\\SICSR\\Web Technologies\\WT-2\\Examples\\employees.mdb\””); MyConnection.Open(); MyCommand = new OleDbCommand(); MyCommand.CommandText = “ SELECT lname, fname FROM employees WHERE deptno = @deptno “; MyCommand.CommandType = CommandType.Text; MyCommand.Connection = MyConnection; MyDataReader = null; MyCommand.Parameters.Add(“@deptno”, OleDbType.Char); MyCommand.Parameters[“@deptno”].Value = “D1”; try { MyDataReader = MyCommand.ExecuteReader(); if (MyDataReader.HasRows) Response.Write (“--- Found data ---
”); else Response.Write(“--- Did not find any data ---
”); } catch (OleDbException ex) { Response.Write(“*** ERROR *** ==> “ + ex.Message.ToString()); } while (MyDataReader.Read()) { Response.Write(“Last name = “ + MyDataReader[“lname”] + “ Response.Write(“First name = “ + MyDataReader[“fname”]); Response.Write(“
”); } MyDataReader.Dispose(); MyCommand.Dispose(); MyConnection.Dispose(); } }

Fig. 8.37

Parameterized SELECT—Part 1

“);


266 Untitled Page 4k391h
 


Fig. 8.37 Parameterized SELECT—Part 2

As we can see, the department number is not hardcoded into the SQL query. Instead, it is passed as a parameter value at run time. Of course, in this case, it is provided without any user intervention. But in real life, this value can come from the user or from another table/application, etc. Figure 8.38 shows a parameterized UPDATE statement. <%@ <%@ <%@ <%@ <%@

Page Language=”C#” %> Import Namespace =”System.Data” %> Import Namespace =”System.Data.SqlClient” %> Import Namespace =”System.Configuration” %> Import Namespace = “System.Data.OleDb” %>

<script runat=”server”> protected void Page_Load(object sender, EventArgs e) { if (!Page.IsPostBack) { OleDbConnection MyConnection; OleDbCommand MyCommand; OleDbParameter DeptnoParam; OleDbParameter DeptnameParam; MyConnection = new OleDbConnection (“Provider=Microsoft.Jet.OLEDB.4.0;Data Source=\”C:\\Lectures\\SICSR\\Web Technologies\\WT-2\\Examples\\employees.mdb\””); MyConnection.Open(); MyCommand = new OleDbCommand(); MyCommand.CommandText = “ UPDATE departments SET deptname = @deptname WHERE deptno = @deptno “; MyCommand.CommandType = CommandType.Text; MyCommand.Connection = MyConnection;

Fig. 8.38 Parameterized UPDATE—Part 1


267 MyCommand.Parameters.Add(“@deptname”, OleDbType.Char); MyCommand.Parameters.Add(“@deptno”, OleDbType.Char); MyCommand.Parameters[“@deptname”].Value = “Test name”; MyCommand.Parameters[“@deptno”].Value = “D2”; try { MyCommand.ExecuteNonQuery(); } catch (OleDbException ex) { Response.Write(“*** ERROR *** ==> “ + ex.Message.ToString()); } MyCommand.Dispose(); MyConnection.Dispose(); } } Untitled Page 4k391h
 


Fig. 8.38

Parameterized UPDATE—Part 2

Just as we can select or update data based on the parameters provided by the user or another application, we can even create a new row in the table, depending on what data the user has provided. In other words, a parameterized insert is also allowed. Figure 8.39 shows an example. <%@ <%@ <%@ <%@ <%@

Page Language=”C#” %> Import Namespace =”System.Data” %> Import Namespace =”System.Data.SqlClient” %> Import Namespace =”System.Configuration” %> Import Namespace = “System.Data.OleDb” %>

<script runat=”server”> protected void Page_Load(object sender, EventArgs e) { if (Page.IsPostBack) { Label1.Text = “Result: “; OleDbConnection MyConnection;

(Contd)


268 Fig. 8.39 contd... OleDbCommand MyCommand; String String String String

Dept_No = TextBox1.Text.ToString(); Dept_Name = TextBox2.Text.ToString(); Dept_Mgr = TextBox3.Text.ToString(); Dept_Location = TextBox4.Text.ToString();

MyConnection = new OleDbConnection (“Provider=Microsoft.Jet.OLEDB.4.0;Data Source=\”C:\\Lectures\\SICSR\\Web Technologies\\WT-2\\Examples\\employees.mdb\””); MyConnection.Open(); MyCommand = new OleDbCommand(); MyCommand.CommandText = “INSERT INTO departments VALUES (@deptno, @deptname, @deptmgr, @location) “; MyCommand.CommandType = CommandType.Text; MyCommand.Connection = MyConnection; MyCommand.Parameters.Add(“@deptno”, OleDbType.Char); MyCommand.Parameters.Add(“@deptname”, OleDbType.Char); MyCommand.Parameters.Add(“@deptmgr”, OleDbType.Char); MyCommand.Parameters.Add(“@location”, OleDbType.Char); MyCommand.Parameters[“@deptno”].Value = Dept_No; MyCommand.Parameters[“@deptname”].Value = Dept_Name; MyCommand.Parameters[“@deptmgr”].Value = Dept_Mgr; MyCommand.Parameters[“@location”].Value = Dept_Location; try { int i = MyCommand.ExecuteNonQuery(); if (i == 1) Label1.Text += “ One row added to the table”; } catch (OleDbException ex) { Label1.Text += “*** ERROR *** ==> “ + ex.Message.ToString(); } MyCommand.Dispose(); MyConnection.Dispose(); } }

Fig. 8.39

Parameterized INSERT—Part 1


269 Untitled Page 4k391h

Please provide following values 481n4c


z4d4n
Department Number (Unique)
Department Name
Department Manager
Location

                                  


Fig. 8.39

Parameterized INSERT—Part 2

Using the DataSet, DataTable, and DataAdapter We have mentioned earlier that the DataSet offers disconnected data access. This is the most common form of database access. In other words, this technique


270 allows the ASP.NET program to be disconnected from the database while performing the database operations. The final result of the operation, however, gets applied to the database by connecting once. The DataSet object is a collection of many DataTable objects. A DataTable represents one database table in the memory of the application. We can choose to directly work with a DataTable object. Figure 8.40 shows an example. <%@ <%@ <%@ <%@ <%@

Page Language=”C#” %> Import Namespace =”System.Data” %> Import Namespace =”System.Data.SqlClient” %> Import Namespace =”System.Configuration” %> Import Namespace =“System.Data.OleDb” %>

<script runat=”server”> protected void Page_Load(object sender, EventArgs e) { if (!Page.IsPostBack) { OleDbConnection MyConnection; OleDbCommand MyCommand; DataTable MyDataTable; OleDbDataReader MyReader; OleDbParameter EmpnoParam; MyConnection = new OleDbConnection (“Provider=Microsoft.Jet.OLEDB.4.0;Data Source=\”C:\\Lectures\\SICSR\\Web Technologies \\WT-2\\Examples\\employees.mdb\””); MyCommand = new OleDbCommand(); MyCommand.CommandText = “ SELECT * FROM employees WHERE empno = @empno “; MyCommand.CommandType = CommandType.Text; MyCommand.Connection = MyConnection; EmpnoParam = new OleDbParameter(); EmpnoParam.ParameterName = “@empno”; EmpnoParam.OleDbType = OleDbType.Char; EmpnoParam.Size = 50; EmpnoParam.Direction = ParameterDirection.Input; EmpnoParam.Value = “4000”; MyCommand.Parameters.Add(EmpnoParam); MyCommand.Connection.Open(); MyReader = MyCommand.ExecuteReader(CommandBehavior.CloseConnection); MyDataTable = new DataTable(); MyDataTable.Load(MyReader); gvEmployees.DataSource = MyDataTable; gvEmployees.DataBind(); MyDataTable.Dispose(); MyCommand.Dispose(); MyConnection.Dispose(); } }

Fig. 8.40

Using the DataTable for SELECT—Part 1


271 Untitled Page 4k391h


Fig. 8.40

Using the DataTable for SELECT—Part 2

In a similar fashion, we can insert data as shown in Fig. 8.41. <%@ <%@ <%@ <%@ <%@

Page Language=”C#” %> Import Namespace =”System.Data” %> Import Namespace =”System.Data.SqlClient” %> Import Namespace =”System.Configuration” %> Import Namespace = “System.Data.OleDb” %>

<script runat=”server”> protected void Page_Load(object sender, EventArgs e) { if (!Page.IsPostBack) { OleDbConnection MyConnection; OleDbCommand MyCommand; DataTable MyDataTable; OleDbDataReader MyReader; OleDbParameter DeptNoParam; OleDbParameter DeptNameParam; OleDbParameter DeptMgrParam; OleDbParameter LocationParam;

Fig. 8.41 Using the DataTable for INSERT—Part 1 MyConnection = new OleDbConnection (“Provider=Microsoft.Jet.OLEDB.4.0;Data Source=\”C:\\Lectures\\SICSR\\Web Technologies\\WT-2\\Examples\\employees.mdb\””); MyCommand = new OleDbCommand();

(Contd)


272 Fig. 8.41 contd... MyCommand.CommandText = “ INSERT INTO departments VALUES (@deptno, @deptname, @deptmgr, @location)”; MyCommand.CommandType = CommandType.Text; MyCommand.Connection = MyConnection; DeptNoParam = new OleDbParameter(); DeptNoParam.ParameterName = “@deptno”; DeptNoParam.OleDbType = OleDbType.Char; DeptNoParam.Size = 50; DeptNoParam.Direction = ParameterDirection.Input; DeptNoParam.Value = “D100”; MyCommand.Parameters.Add(DeptNoParam); DeptNameParam = new OleDbParameter(); DeptNameParam.ParameterName = “@deptname”; DeptNameParam.OleDbType = OleDbType.Char; DeptNameParam.Size = 50; DeptNameParam.Direction = ParameterDirection.Input; DeptNameParam.Value = “New Department”; MyCommand.Parameters.Add(DeptNameParam); DeptMgrParam = new OleDbParameter(); DeptMgrParam.ParameterName = “@deptmgr”; DeptMgrParam.OleDbType = OleDbType.Char; DeptMgrParam.Size = 50; DeptMgrParam.Direction = ParameterDirection.Input; DeptMgrParam.Value = “2000”; MyCommand.Parameters.Add(DeptMgrParam); LocationParam = new OleDbParameter(); LocationParam.ParameterName = “@location”; LocationParam.OleDbType = OleDbType.Char; LocationParam.Size = 50; LocationParam.Direction = ParameterDirection.Input; LocationParam.Value = “Pune”; MyCommand.Parameters.Add(LocationParam); MyCommand.Connection.Open(); MyReader = MyCommand.ExecuteReader(CommandBehavior.CloseConnection); MyDataTable = new DataTable(); MyDataTable.Load(MyReader); gvEmployees.DataSource = MyDataTable; gvEmployees.DataBind(); MyDataTable.Dispose(); MyCommand.Dispose(); MyConnection.Dispose(); } }

Fig. 8.41 Using the DataTable for INSERT—Part 2


273 Untitled Page 4k391h


Fig. 8.41 Using the DataTable for INSERT—Part 3

Let us now turn to the DataSet and DataAdapter. A DataSet does not interact with the database directly. It takes the help of the DataAdapter object. The job of the DataAdapter is to perform database operations and create DataTable objects. The DataTable objects contain the query results. The DataAdapter also ensures that the changes done to the DataTable objects are reflected back on to the database. Conceptually, this can be depicted as shown in Fig. 8.42.

Fig. 8.42 DataSet and DataAdapter The DataAdapter object has a method called as Fill (), which queries a database and initializes a DataSet (actually a DataTable) with the results. Similarly, there is a method called as Update (), which is used to propagate changes back to the database. Figure 8.43 shows an example of selecting data from a table using this idea. <%@ <%@ <%@ <%@ <%@

Page Language=”C#” %> Import Namespace =”System.Data” %> Import Namespace =”System.Data.SqlClient” %> Import Namespace =”System.Configuration” %> Import Namespace = “System.Data.OleDb” %>

<script runat=”server”> protected void Page_Load(object sender, EventArgs e) { if (!Page.IsPostBack)

(Contd)


274 Table 8.43 contd... { OleDbConnection MyConnection; OleDbCommand MyCommand; OleDbDataAdapter MyAdapter; DataTable MyTable = new DataTable(); MyConnection = new OleDbConnection (“Provider=Microsoft.Jet.OLEDB.4.0;Data Source=\”C:\\Lectures\\SICSR\\Web Technologies\\WT-2\\Examples\\employees.mdb\””); MyConnection.Open(); MyCommand = new OleDbCommand(); MyCommand.CommandText = “ SELECT lname, fname FROM employees WHERE deptno = ‘D1’ “; MyCommand.CommandType = CommandType.Text; MyCommand.Connection = MyConnection; MyAdapter = new OleDbDataAdapter(); MyAdapter.SelectCommand = MyCommand; MyAdapter.Fill(MyTable); GridView1.DataSource = MyTable.DefaultView; GridView1.DataBind(); MyAdapter.Dispose(); MyCommand.Dispose(); MyConnection.Dispose(); } } Untitled Page 4k391h
 


Fig. 8.43 Selecting data using the DataSet and DataAdapter classes Figure 8.44 shows a parameterized SELECT using the DataSet and DataAdapter objects.


275 <%@ <%@ <%@ <%@ <%@

Page Language=”C#” %> Import Namespace =”System.Data” %> Import Namespace =”System.Data.SqlClient” %> Import Namespace =”System.Configuration” %> Import Namespace = “System.Data.OleDb” %>

<script runat=”server”> protected void Page_Load(object sender, EventArgs e) { if (!Page.IsPostBack) { OleDbConnection MyConnection; OleDbCommand MyCommand; OleDbDataAdapter MyAdapter; DataTable MyTable = new DataTable(); MyConnection = new OleDbConnection (“Provider=Microsoft.Jet.OLEDB.4.0;Data Source=\”C:\\Lectures\\SICSR\\Web Technologies\\WT-2\\Examples\\employees.mdb\””); MyConnection.Open(); MyCommand = new OleDbCommand(); MyCommand.CommandText = “ SELECT lname, fname FROM employees WHERE deptno = @deptno “; MyCommand.CommandType = CommandType.Text; MyCommand.Connection = MyConnection; MyCommand.Parameters.Add(“@deptno”, OleDbType.Char); MyCommand.Parameters[“@deptno”].Value = “D1”; MyAdapter = new OleDbDataAdapter(); MyAdapter.SelectCommand = MyCommand; MyAdapter.Fill(MyTable); GridView1.DataSource = MyTable.DefaultView; GridView1.DataBind(); MyAdapter.Dispose(); MyCommand.Dispose(); MyConnection.Dispose(); } } Untitled Page 4k391h
 


Fig. 8.44 Parameterized select using the DataSet and DataAdapter classes



8.8 ACTIVEX CONTROLS

ActiveX controls (also called ActiveX objects) are similar to Java applets in the sense that they are also tiny programs that can be downloaded from the Web server to the Web browser, and executed locally at the browser. However, there are two major differences between an applet and an ActiveX control, as discussed below.

1. While an applet has to go through many security checks (for example, an applet cannot write to the hard disk of the browser computer), an ActiveX object can actually write to the local hard disk. This makes its behaviour suspect, although it can offer richer functionality as a compensation for this.
2. An applet gets downloaded every time it is accessed. This means that if a user accesses a Web page containing an applet, it gets downloaded to the user's browser and executes there. When the user closes the browser session, the applet is removed from the user's computer, because the applet is stored in the main memory of the client computer during its execution. In contrast, once downloaded, ActiveX controls are stored on the hard disk of the client machine. They remain there even when the browser is closed. Therefore, when the user accesses the same Web page containing the same ActiveX control, the ActiveX control from the client's computer is used, and is not downloaded once again from the server.

ActiveX, as mentioned, is a Microsoft technology. Therefore, ActiveX objects can run only in the Internet Explorer browser. The reason for it being Microsoft-specific is again the Windows registry. All ActiveX objects must be recorded in the registry of the operating system that the Web server is running. This means that the Web server must run on an operating system that supports the concept of a registry, that is, Windows. We shall not discuss ActiveX further, since the conceptual framework is similar to applets.

One more point needs to be noted. These days, the concept of code signing has gained prominence. In simple terms, the organization which develops the program code declares (digitally) that it has developed a particular piece of code, and the person who has downloaded it (in the form of applets or ActiveX controls) can trust it not to perform any malicious actions. For example, an applet coming from Sun Microsystems could declare that the applet is developed by Sun Microsystems, and that the user can trust it not to do any wrongdoing. Moreover, such signed applets or signed ActiveX controls can actually have more privileges than unsigned applets or ActiveX controls (e.g., they can perform disk operations). Of course, a signed applet can still contain malicious code; the only advantage here is that we know where this malicious code came from, and can take an appropriate action.

The war between client-side technologies is going to become hotter, as Microsoft has decided to remove Java from Version 6 of its popular browser, Internet Explorer. This means that applets cannot execute inside Internet Explorer from Version 6 onwards, unless the user installs the Java Virtual Machine (JVM) by downloading it from the Internet; it would not be done automatically anymore. Furthermore, as Internet Explorer is gaining in popularity, and as people realize that applets can make downloading and processing time slower, it appears that ActiveX controls, or some other client-side technology, if and when it appears on the scene, will become the key to active Web pages.

SUMMARY

- Microsoft's .NET platform is one of the best ways of creating Web applications.
- The ASP.NET specifications allow us to specify how Web applications can be constructed so as to have effective design, validations, and clarity.
- ASP.NET applications can be developed in a number of languages, but of most practical relevance are C# and VB.NET.
- Several features are available to make validations very easy in ASP.NET. An example is server controls.
- ASP.NET validations allow the developer to perform very complex validations without writing too much code.
- ASP.NET comes with an Integrated Development Environment (IDE), which allows development of applications very easily.
- ASP.NET provides database processing in the form of ADO.NET.
- ADO.NET technology allows us to perform database processing in a number of ways, depending on the requirements.

REVIEW QUESTIONS

Multiple-choice Questions
1. A Web page is ______ if it does not change its behaviour in response to external actions.
   (a) static   (b) dynamic   (c) active   (d) frozen
2. A Web page is ______ if it changes its behaviour (i.e., the output) in response to external actions.
   (a) static   (b) dynamic   (c) active   (d) frozen
3. The highest level in the .NET framework is the ______.
   (a) programming languages layer   (b) Common Language Specifications   (c) Microsoft Intermediate Language   (d) Web Services and GUI applications
4. ______ is the common thread between all the varying .NET programming languages.
   (a) Common Language Specifications (CLS)   (b) XML and ADO.NET   (c) Base class library   (d) Common Language Runtime (CLR)
5. ASP.NET uses full-fledged programming languages such as ______.
   (a) C# and VB.NET   (b) VB and C++   (c) Base class library   (d) BL and ASP
6. In the ______ approach, all the HTML part is inside one .aspx file, and the actual functionality resides in various individual files.
   (a) Single-page model   (b) Code-behind page model   (c) Double-page model   (d) Multiple-page model
7. All ______ have a special identifier, which is ______.
   (a) Web server controls   (b) single server controls   (c) document server controls   (d) server controls
8. ______ can run only on the Internet Explorer browser and can actually write to the local hard disk.
   (a) Applet   (b) ActiveX object   (c) Daemon   (d) Bean
9. The ______ method executes a command and returns a single row of type SqlRecord.
   (a) ExecuteNonQuery   (b) ExecuteReader   (c) ExecuteRow   (d) Execute Column

Detailed Questions
1. Discuss in detail different types of Web pages.
2. Explain different kinds of Web technologies.
3. Give an overview of the .NET framework.
4. Discuss in detail ASP.NET form controls.
5. Discuss the advantages and disadvantages of Web server controls.
6. Explain how validation controls work.
7. Write a program in ASP.NET which will take the name and password and perform effective validations.
8. Discuss in detail how database processing happens in ASP.NET, with an example.
9. What do you think will happen if we use JavaScript instead of ready-made controls for validations in ASP.NET?
10. Which is the best approach for database processing in ASP.NET?
11. How does the .NET framework support multiple languages?

Exercises
1. Examine the differences between ASP and ASP.NET.
2. Examine the equivalents of ASP.NET server controls in other technologies.
3. Why did Microsoft come up with the .NET framework? Investigate.
4. Why should we use ADO.NET, and not simple ODBC? Find out.
5. Is C# a better language than C++? Why?

Chapter 9

Java Web Technologies

INTRODUCTION For unknown reasons, Sun had decided to name the second release of the Enterprise Java-related technologies as Java 2. Hence, programs developed on top of it, were called Java 2 xxxxx (refer to the table shown later for more details). This nomenclature left people wondering when Java 3, Java 4, etc., would emerge. On the contrary, Java had already moved from the second release to the fifth release by then! Hence, Java 2 Enterprise Edition 5.0 (or J2EE 5.0 for short) actually meant Java Enterprise fifth edition (and not the second edition)! But the “2” after “Java” had somehow just stayed on! It served no real purpose or made any sense. This should have been Java Enterprise Edition 5.0 (i.e., JEE 5.0 in short). This was, clearly, incredibly confusing and unnecessary. Thankfully, Sun has now simply dropped the “2” from the Java name, and the “dot zero” from the version number. Hence, the nomenclature has become quite simple now, compared to the time when everyone was confused about which version of which product one was referring to. To understand and appreciate this better, let us have a quick recap of what Sun had done earlier to create all this confusion, as shown in Table 9.1.

Table 9.1 Confusion about Java terminology

JDK (old long name: Java Development Kit; new long name, with the "2" gone: no change)
This is needed if we wanted to just compile standard (core) Java programs, which do not make use of enterprise technologies such as JSP, Servlets, EJB, etc. In other words, these programs can make use of all the standard language features such as classes, strings, methods, operators, loops, and even AWT or Swing for graphics. This would translate a Java program (i.e., a file with a .java extension) into its compiled byte code version (i.e., a file with a .class extension). Many .class files could be compiled into a Java archive file (i.e., a file with a .jar extension).

JRE (old long name: Java Runtime Environment; new long name: no change)
This is the run time environment under which a Java program compiled above would execute.

J2SE (old long name: Java 2 Standard Edition; new long name: Java SE)
It is basically JDK + JRE.

J2EE (old long name: Java 2 Enterprise Edition; new long name: Java EE)
This is the 'enterprise version' of Java, which is meant for server-side technologies in the Web tier (e.g., Servlets and JSP) as well as in the application tier (e.g., EJB). People specialize in some or all of these tiers.

Note that not only is the "2" dropped, so also is the short-form of Java in the form of the letter "J". Now, we must not refer to the older J2SE as JSE. We must call it Java SE, for example. Enough about the naming fiasco! Let us have a quick overview of what Java EE 5 offers now. For this purpose, we have borrowed some really good diagrams from the official Java EE 5 tutorial developed by Sun Microsystems. Figure 9.1 depicts the communication between the various Java EE application layers. The client tier is usually made up of a Web browser, which means it can primarily deal with HTML pages and JavaScript (among others). These technologies communicate with the Web tier, made up of JSP pages, Servlets, and JavaBeans (not EJB!). For example, a Servlet may display a page to the user, and after the user provides the credentials, authenticate the user by checking the user id and password against a table maintained in the database, as discussed next. The JSP pages and Servlets then communicate with the Business tier, i.e., with one or more Enterprise JavaBeans (EJB). Note that the Business tier is optional, and is implemented only if the application needs to be able to handle very large volumes of data, provide high security, throughput, availability, robustness, etc. In any case, the Web tier usually talks to the EIS tier for database processing (either directly, or via the Business tier, as explained earlier). Based on this, the Java EE APIs can be depicted as shown in Fig. 9.2. Of course, it is not possible to explain any of these in detail here, but a small word on what is new may perhaps help. In the Web tier (Web container in the above diagram), we now have:

- Better support for Web Services. This is provided by the APIs called JAX-WS, JAXB, StAX, SAAJ, and JAXR. Some of these existed earlier, but were very clumsily stacked together.
- A more modern way of developing dynamic Web pages. The JSP technology has become highly tag-oriented, rather than being code-intensive. In other words, the developer does not have to write a lot of code for doing things, but has to instead make certain declarations.
- Java Server Faces (JSF), which is an input validation technology, built in response to Microsoft's Web Controls in ASP.NET.



Fig. 9.1 Sun’s Java server architecture (Copyright Sun MicroSystems)

Fig. 9.2 Sun’s Java technologies (Copyright Sun MicroSystems)


In the Business tier (EJB container in the above diagram), we now have:

- A much easier way of writing Enterprise JavaBeans (EJB). EJB version 3.0 is more declarative rather than code-oriented, thus making the job of the developer far easier. There are several other changes in EJB, in line with these basic changes.

In the EIS tier (Database in the above diagram), we now have:

- The Java Persistence API, for easier integration of applications with the database.

We shall review some of the key technologies in this context in the following sections.

9.1 JAVA SERVLETS AND JSP

9.1.1 Introduction to Servlets and JSP

Just like an ASP.NET server-side program written in C#, a Servlet is a server-side program written in Java. The programmer needs to code the Servlet in the Java programming language. The programmer then needs to compile the Servlet into a class file, like any other Java program. Whenever an HTTP request comes in, requesting the execution of this Servlet, the class file is interpreted by the Java Virtual Machine (JVM), as usual. This produces HTML output, which is sent back to the browser in the form of an HTTP response. Some of these steps are shown in Fig. 9.3.

Fig. 9.3 Servlet compilation process

A Servlet runs inside a Servlet container. A Servlet container is the hosting and execution environment for Java Servlets. We can consider it to be a compiler plus a run-time hosting environment for Servlets. An example of a Servlet container is Tomcat. Such a Servlet container runs inside a Web server, such as Apache. The flow of execution in the Servlet environment is as shown in Fig. 9.4. The step-by-step flow is explained below.

1. The browser sends an HTTP request to the Web server, as usual. This time, the request is for executing a Servlet.
2. The Web server notices that the browser has sent a request for the execution of a Servlet. Therefore, the Web server hands it over to the Servlet container, after it provides the appropriate execution environment to the Servlet container.
3. The Servlet container loads and executes the Servlet (i.e., the .class file of the Servlet) by interpreting its contents via the JVM. The result of the Servlet processing is usually some HTML output.
4. This HTML output is sent back to the Web browser via the Web server, as a part of the HTTP response.

Fig. 9.4 Servlet processing concept

Sometimes, the distinction between the Web server and the Servlet container is a bit blurred. People often mean the same thing when they say either Web server or Servlet container. As such, we shall also use these terms interchangeably now, since the distinction and context have been clarified at this stage. The next question then is, what is a JSP? JSP stands for Java Server Pages. JSP offers a layer of abstraction on top of Servlets. A JSP is easier to code than a Servlet. Think about this in the same manner as the differences between high-level programming languages, such as Java/C#, and Assembly language. A programmer can write a program either in a high-level programming language, or in the Assembly language. Writing code in a high-level language is easier and friendlier, but does not give us the deep control that Assembly language gives. In a similar manner, writing code in JSP is easier, but provides less fine-grained control than what Servlets provide. In most situations, this does not matter. Interestingly, when we write a JSP, the Servlet container (which now doubles up as a Servlet-JSP container) first translates the JSP into a temporary Servlet whenever a request arrives for the execution of this JSP. This happens automatically. The temporary Servlet is quickly compiled into a Java class file, and the class file is interpreted to perform the desired processing. This is depicted in Fig. 9.5. As we can see, there is some additional processing in the case of JSP; this is the price paid for the ease of coding that the programmer enjoys.
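As a quick illustration of why JSP is considered easier to code (this snippet is not one of the book's listings), a JSP page embeds small pieces of Java inside ordinary markup, instead of printing every HTML tag from Java code:

<%-- hello.jsp : a hypothetical page --%>
<html>
   <body>
      <h1>Hello from JSP</h1>
      <!-- The expression below is evaluated on the server -->
      <p>The time on the server is: <%= new java.util.Date() %></p>
   </body>
</html>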

9.1.2 Servlet Advantages The advantages of Servlets can be summarized as follows. 1. Servlets are multi-threaded. In other words, whenever the Servlet container receives a request for the execution of a Servlet, the container loads the Servlet in its memory, and assigns a thread of this Servlet for processing this client’s requests. If more clients send requests for the same Servlet, the Servlet container does not create new Servlet instances (or processes). Instead, it creates new threads of the same Servlet instance, and allocates these thread instances to the different client requests. This


makes the overall processing faster, and also reduces the memory demands on the Servlet container/Web server. The idea is shown in Fig. 9.6.

Fig. 9.5 JSP compilation and execution process

Fig. 9.6 Servlet process and threads concept As we can see, several clients are sending requests to the same Servlet in a concurrent fashion. The Servlet has created an instance (an operating system process) to handle them via multiple threads. 2. Since Servlets execute inside a controlled environment (container), they are usually quite stable and simple to deploy. 3. Since Servlets are nothing but more specific Java programs, they inherit all the good features of the Java programming language, such as object orientation, inherent security, networking capabilities, integration with other Java Enterprise technologies, etc.



9.1.3 Servlet Lifecycle Java Servlets follow a certain path of execution during their life time. There are three phases that happen from the time the Servlet is deployed in the Servlet container. These three phases are illustrated in Fig. 9.7.

Fig. 9.7 Phases in the Servlet lifecycle

Let us understand what this means.

- Servlet initialization happens only once. This is done by the Servlet container. Whenever a Servlet is deployed in a Servlet container, the container decides when to load the Servlet. The programmer cannot decide when to load or explicitly initialize a Servlet. As a result of initializing the Servlet, an instance of the Servlet is created in the memory. From this instance, as many Servlet threads as needed would get created to service the actual client requests.
- Once initialized, the Servlet can service client requests. This process is repeated for every client request. In other words, whenever an HTTP request arrives for a Servlet, the Servlet services it, as appropriate, with the help of the particular thread of the Servlet instance.
- Like initialization, the Servlet destruction also happens only once. Just as when to initialize a Servlet is decided and implemented by the Servlet container, so is the case of the Servlet destruction. The container chooses an appropriate moment to destroy the Servlet. Usually, when the container resources are getting exhausted because of memory shortage, etc., the container decides to destroy one of the Servlets. On what basis it decides this, and how it actually puts it into action, is unpredictable. The programmer should not expect that the container would do the Servlet destruction at a particular point, or based on some condition.

How are these concepts implemented in reality? For this purpose, Sun has provided a Java class called HttpServlet. Whenever we want to write our own Servlet (e.g., an OrderServlet or a MakePaymentServlet), we need to write a Java class that extends this HttpServlet. Now, this base class titled HttpServlet has methods for initialization, servicing, and destruction of Servlets. This is shown in Fig. 9.8. As we can see, our Servlet class extends the HttpServlet class provided by Sun. From this HttpServlet, our Servlet is able to inherit the service ( ) Java method. Similarly, the HttpServlet itself, in turn, has been derived from GenericServlet (see the diagram). The GenericServlet defines the other two methods, namely init ( ) and destroy ( ). HttpServlet inherits these from GenericServlet, and passes them on to our OrderServlet. Also, we can see that OrderServlet has some code written in all these three methods, namely init ( ), service ( ), and destroy ( ). Who calls these methods, and how would they execute? The simple answer is that we would not call these methods ourselves explicitly. Instead, the Servlet container would call them as


and when it deems necessary. However, whenever it calls these methods, our code in the respective method would execute, producing three outputs in the server's log.

Sun’s standard definition of a Java Servlet public abstract class HttpServlet extends GenericServlet { public void init (); public void service (HttpServletRequest request, HttpServletResponse response); void destroy (); }

Our own Servlet (e.g. OrderServlet) public class OrderServlet extends HttpServlet { public void init () { System.out.println (“In init …”); } public void service (HttpServletRequest request, HttpServletResponse response) { System.out.println (“In service …”); } void destroy () { System.out.println (“In destroy …”); } }

Fig. 9.8

Servlet life cycle

Just to make the picture complete, Fig. 9.9 shows the complete code for our OrderServlet.

import java.io.*;
import javax.servlet.*;
import javax.servlet.http.*;

public class OrderServlet extends HttpServlet
{
    public void init ()
    {
        System.out.println ("In init () method");
    }

    public void service (HttpServletRequest request, HttpServletResponse response)
    {
        System.out.println ("In doGet () method");
    }

    public void destroy ()
    {
        System.out.println ("In destroy () method");
    }
}

Fig. 9.9 Sample servlet


287 The resulting output after deploying the Servlet is shown in Fig. 9.10. In init () method In doGet () method

Fig. 9.10

Output of servlet

Now, when we make any changes to the Servlet source code and recompile the Servlet, the Servlet container would destroy and reload the Servlet so as to be able to load the fresh instance of the Servlet in memory. The actual change to the Servlet could be quite artificial (e.g., just add one space somewhere). However, this causes the Servlet to be reloaded (destroyed + re-initialized). As such, the output would now look as shown in Fig. 9.11. In destroy () method In init () method In doGet () method

Fig. 9.11 Output of servlet This completes our overview of Servlet life cycle. At this stage, we would like to specify one technical detail. In general, although we can code of the service ( ) method, the practice is discouraged. Instead, the recommended practice is to call one of “submethods” of the service ( ) method, called as doGet ( ) and doPost ( ). A detailed discussion of these is beyond the scope of the current text, but it should suffice to say that if we see doGet ( ) or doPost ( ) instead of service ( ), it should not surprise us.

9.1.4 Servlet Examples We discuss some simple Servlet examples now, to get a better idea behind their working. In the first example, we ask the to enter her email ID on the screen. When the provides this information and clicks on the Submit button on the screen, it causes an HTTP request to be sent to the server. There, we have a Servlet running, which captures this email ID and displays it back to the . There is no other processing involved. We start with the HTML page that requests the to enter the email ID. This is shown in Fig. 9.12. Servlet Example Using a Form z1v6v

Forms Example Using Servlets 5r3i30

Enter your email ID:


Fig. 9.12

HTML page to accept ’s email ID

As we can see, this HTML page would request the user to enter her email ID. When the user does so and clicks on the Submit button, this will cause an HTTP request to be sent to the EmailServlet Servlet on the server. The result of viewing this HTML page in the browser is shown in Fig. 9.13. Now let us look at the Servlet code that would execute in response to the HTTP request. It is shown in Fig. 9.13.

// import statements here ...
public class EmailServlet extends HttpServlet {
    public void doGet (HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        String email;
        email = request.getParameter ("email");

        response.setContentType ("text/html");
        PrintWriter out = response.getWriter ();

        out.println ("<html>");
        out.println ("<head>");
        out.println ("<title>Servlet Example</title>");
        out.println ("</head>");
        out.println ("<body>");
        out.println ("<h2>The email ID you have entered is: " + email + "</h2>");
        out.println ("</body>");
        out.println ("</html>");

        out.close ();
    }
}

Fig. 9.13 EmailServlet

As we can see, our Servlet has the doGet () method, which is the rough equivalent of the service () method. This is the method that gets called when the HTTP request is submitted to the Servlet. We can also see that this method has two parameters, one is the HttpServletRequest object and the other is the HttpServletResponse object. As the names suggest, the former sends HTTP data from the browser to the server when a request is made, whereas the latter sends HTTP data from the server back to the browser when the server sends a response. This is illustrated in Fig. 9.14. As we can see, the Servlet then uses the HttpServletRequest object (received as request here) to execute the following code:

String email;
email = request.getParameter ("email");

This code declares a Java string named email, and then reads the value of the on-screen field named email (received along with the HTTP request), which is assigned to the Java string. This is how communication between browser and server happens in Servlets.


Fig. 9.14 HTTP requests and responses with respect to Servlets

When the server is ready to send a response to the browser, the server uses the HttpServletResponse object as shown:

response.setContentType ("text/html");
PrintWriter out = response.getWriter ();
out.println ("<html>");
out.println ("<head>");
...

In this code, we obtain an instance of the PrintWriter object, which is a special object used to help send HTTP responses to the browser. For this purpose, it calls the println ( ) method with the HTML content that we want to send to the browser. As we can see, this is actually quite a clumsy way of writing code. We are writing HTML statements inside a Java method. This is not only a bit strange, but is also quite difficult to write at first. As such, people some times find it a bit unnerving to write Servlets to start with. However, one gets used to this style of coding easily. Now let us take another example. Here, we write code for converting US Dollars into Indian Rupees, considering a rate of USD 1 = INR 40. The Servlet code is shown in Fig. 9.15. Let us understand what the code is doing. As before, the Servlet has a doGet ( ) method, which will get invoked when the Servlet executes. Inside this method, we have a series of println ( ) method calls to send various HTML tags to the browser for display. Then it has a simple for loop, which displays the values of dollars from 1 to 50, and the equivalent values in rupees. Note that the Servlet displays the dollar-rupee conversion in an HTML table. For this purpose the appropriate HTML table related tags are included in the Servlet.

import java.io.*;
import java.net.*;
import javax.servlet.*;
import javax.servlet.http.*;

public class CurrencyConvertor extends HttpServlet {
    protected void doGet (HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        response.setContentType ("text/html");
        PrintWriter out = response.getWriter ();

        out.println ("<html>");
        out.println ("<head>");
        out.println ("<title>Dollars to Rupees Conversion Chart</title>");
        out.println ("</head>");
        out.println ("<body>");
        out.println ("<h1>Currency Conversion Chart</h1>");
        out.println ("<table border=\"1\">");
        out.println ("<tr>");
        out.println ("<th>Dollars</th>");
        out.println ("<th>Rupees</th>");
        out.println ("</tr>");

        for (int dollars = 1; dollars <= 50; dollars++) {
            int rupees = dollars * 40;
            out.println ("<tr>" + "<td>" + dollars + "</td>" + "<td>" + rupees + "</td>" + "</tr>");
        }

        out.println ("</table>");
        out.println ("</body>");
        out.println ("</html>");

        out.close ();
    }
}

Fig. 9.15 Servlet for doing currency conversions

The Servlet produces output, as shown in Fig. 9.16.

9.1.5 Introduction to JSP

JSP is the next step in the evolution of Servlets. Servlets are pretty complex to write in some situations, especially if the aim is to send HTML content to the user (instead of doing some business processing on the server). In such cases, we can look at JavaServer Pages (JSP). JSPs are much easier to write than Servlets. However, we should quickly examine how the JSP technology has evolved. When the Java Servlets technology was developed by Sun, around the same time, Microsoft came up with Active Server Pages (ASP). ASP was a simpler technology to use than Servlets. This was because ASP pages could be created in simple scripting languages, such as JavaScript and VBScript. However, to code Servlets, one needed to know Java, and moreover the syntax of Servlets was cumbersome (as we have already experienced here).


Fig. 9.16 Output of the servlet

To overcome the drawbacks of Servlets, instead of revamping the Servlets technology, Sun decided to come up with JSP as a layer on top of Servlets. We have already discussed how this works. From a programmer's point of view, the advantages of using JSPs instead of Servlets in certain cases are immense. Simply to send a single HTML tag to the browser, the two technologies take completely different paths, as illustrated in Fig. 9.17.

Fig. 9.17 Servlet versus JSP

The life cycle of a JSP does not greatly differ from that of a Servlet, since internally a JSP is anyway a Servlet, once compiled! Hence, we would not talk about it separately here. Figure 9.18 shows a Hello World JSP example.

<html>
    <head>
        <title>Hello World</title>
    </head>
    <body>
        <h1>Hello World</h1>
        <% out.print ("Hello World!"); %>
    </body>
</html>

Fig. 9.18 Hello World JSP

Before we proceed any further, we would like to have a look at the corresponding Servlet code, to re-emphasize the point about the ease of coding JSPs versus Servlets. This is shown in Fig. 9.19.

import java.io.*;
import java.net.*;
import javax.servlet.*;
import javax.servlet.http.*;

public class HelloWorld extends HttpServlet {
    protected void doGet (HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        response.setContentType ("text/html");
        PrintWriter out = response.getWriter ();

        out.println ("<html>");
        out.println ("<head>");
        out.println ("<title>Hello World</title>");
        out.println ("</head>");
        out.println ("<body>");
        out.println ("<h1>Hello World</h1>");
        out.println ("Hello World");
        out.println ("</body>");
        out.println ("</html>");

        out.close ();
    }
}

Fig. 9.19 Hello World Servlet

As we can see, coding a JSP seems to be simpler than coding the corresponding Servlet. In the JSP, we do not have to write complex Java code and, worse still, HTML inside that Java code. We can straightaway write HTML tags, and wherever needed, write Java code in between HTML tags. Hence, we can roughly say that Servlets are HTML inside Java, whereas JSPs are Java inside HTML. This is depicted in Fig. 9.20.

Fig. 9.20

Servlets and JSP: Conceptual difference

Having discussed this, let us now look at the JSP way of coding. We can see that our JSP page is nothing more than a simple HTML page, except for one part of the code:

<% out.print ("Hello World!"); %>

As we can see, there is an out.print ( ) statement here, which is clearly not HTML syntax. It is a Java statement. This means that we can write Java in JSP. However, the Java part of a JSP page needs to be embedded inside the tag pair <% and %>. As we know, whenever the Servlet container receives a request for the execution of the JSP page, it first translates it into a Servlet. Hence, we can imagine that our JSP code would actually look like the Servlet code shown in the earlier diagram. Of course, there would not be an exact match between the Servlet code written by hand and a JSP translated into Servlet code by the Servlet container. However, from a conceptual point of view, the two code blocks would indeed look similar. In summary, we can say that if producing HTML is the aim, JSP is a better choice. On the other hand, if performing business processing is more important, we should go for Servlets.

9.1.6 Elements of a JSP Page A JSP page is composed of directives, comments, scripting elements, actions, and templates. This is shown in Fig. 9.21.

Fig. 9.21 JSP elements Let us discuss these JSP elements now.


Directives Directives are instructions to the JSP container. These instructions tell the container that some action needs to be taken. For example, if we want to import some standard/non-standard Java classes or packages into our JSP, we can use a directive for that purpose. Directives are contained inside special delimiters, namely <%@ and %>. The general syntax for directives is: <%@ Directive-name Attribute-Value pairs %>

There are three main directives in JSP, namely page, include, and taglib. For example, we can have the following directive in our JSP page to explicitly say that our JSP is relying on some Java code: <%@ page language = “Java” %>

Here, page is the name of the directive, language is the name of the attribute, and Java is its value. Here is another directive example, this time to import a package: <%@ page import = “java.text.*” %>

As we can see, this time the page directive has an import attribute to import a Java package.

Comments Comments in JSP are of two types, as follows.

HTML comments These comments follow the standard HTML syntax, and the contents of these comments are visible to the end user in the Web browser, if the user attempts to view the source code of the HTML page. The syntax of HTML comments is as follows:

<!-- This is an HTML style comment -->

JSP comments These comments are removed by the JSP container before the HTML content is sent to the browser. The syntax of JSP comments is as follows:

<%-- This is a JSP style comment --%>

As we can see, if it is okay to have the end user view the contents of our comments, we should code them as HTML comments. Else, we should code them as JSP comments.

Scripting elements The scripting elements are the areas of the JSP page where the main Java code resides. We know that Java code is what makes JSPs dynamic. The Java code adds dynamism to an otherwise static HTML page. The JSP container interprets and executes this Java code, and mixes the results with the HTML parts of the JSP page. The final resulting output is sent to the browser. Scripting elements can be further subdivided into three categories, as shown in Fig. 9.22.

Fig. 9.22

JSP scripting elements

Let us understand these areas of JSP scripting elements now.

Expressions Expressions are simple means of accessing the values of Java variables or other expressions that directly yield a value. The results of an expression can be merged with the HTML page that gets generated. The syntax of expressions is as follows: <%= Expression %>

Some examples of using expressions are:

The current time is: <%= new java.util.Date () %>
Square root of 2 is <%= Math.sqrt (2) %>
The item you are looking for is <%= items [i] %>
Sum of a, b, and c is <%= a + b + c %>

Scriptlets Scriptlets are one or more Java statements in a JSP page. The syntax of scriptlets is as follows: <% Scriptlet code; %>

We have already seen some examples of scriptlets. Let us have another one:

<table border="1">
    <% for (int i = 0; i < n; i++) { %>
        <tr>
            <td>Number <%= i+1 %></td>
        </tr>
    <% } %>
</table>

As we can see, there is some HTML code for creating a table. Then we have a scriptlet (starting with <% and ending with %>). This is followed by some more HTML code, which is followed by one more scriptlet. Thus, we can see that in JSP, we can combine HTML code and JSP code the way we want. Interestingly, our code snippet also has an expression: <%= i+1 %>

This proves that we can inter-mix HTML, scriptlets, and expressions. Also, we need to observe that scriptlets can be alternatives to expressions, if we want. Thus, the above statement could be written as a scriptlet, instead of as an expression as shown: <% out.print (i + 1); %>

It would work in exactly the same way. But as we can see, an expression is a better short-hand version, provided we are comfortable with its syntax.

Figure 9.23 shows an example that combines HTML code, directives, comments, scriptlets, and expressions.

<%@ page import="java.text.*" session="false" %>
<html>
    <head>
        <title>Temperature conversion</title>
    </head>
    <body>
        <table border="1">
            <tr><th>Fahrenheit</th><th>Celsius</th></tr>
            <%
                NumberFormat fmt = new DecimalFormat ("###.000");
                for (int f = 32; f <= 212; f += 20) {
                    double c = ((f - 32) * 5) / 9.0;
                    String cs = fmt.format (c);
            %>
            <tr><td><%= f %></td><td><%= cs %></td></tr>
            <% } %>
        </table>
    </body>
</html>

Fig. 9.23 Temperature conversion JSP

Declarations Declarations should be used, as the name suggests, when we need to make any declarations in the JSP page. The syntax of making declarations is as follows. <%! Declarations; %>

Here are some declaration examples:

<%! int i = 0; %>
<%! int a, b; double c; %>
<%! Circle a = new Circle (2.0); %>

Figure 9.24 shows an example of using declarations in JSP.

<%! int counter = 0; %>
The page count is now: <%= ++counter %>


Fig. 9.24

JSP declarations

As we can see, we have declared a variable here by using the declaration syntax. Of course, we could have also declared this variable inside a scriptlet (as shown in Fig. 9.25), instead of specifying a declaration block. There are slight differences if we do that, and their discussion is out of the scope of the current text. However, we are just explaining all the possibilities that exist.

<% int counter = 0; %>
The page count is now: <%= ++counter %>


Fig. 9.25 Variable declaration inside a scriptlet

Actions Actions are used in the context of some areas of JSP that we shall discuss later.

Templates Templates are also used in the context of some areas of JSP that we shall discuss later.

9.1.7 JavaBeans

Many times, it is useful to use a JavaBean in JSP. People often confuse a JavaBean with an Enterprise JavaBean (EJB). However, there is no resemblance between the two, and they must not be equated at all. A JavaBean is a self-contained Java class, which provides set and get methods for accessing and updating its attributes from other classes. The set and get methods in a JavaBean are respectively called as setters and getters. Figure 9.26 shows an example of a JavaBean.

public class User {
    private String firstName;
    private String lastName;
    private String emailAddress;

    public User () {
    }

    public User (String first, String last, String email) {
        firstName = first;
        lastName = last;
        emailAddress = email;
    }

    public void setFirstName (String f) {
        firstName = f;
    }

    public String getFirstName () {
        return firstName;
    }

    public void setLastName (String l) {
        lastName = l;
    }

    public String getLastName () {
        return lastName;
    }

    public void setEmailAddress (String e) {
        emailAddress = e;
    }

    public String getEmailAddress () {
        return emailAddress;
    }
}

Fig. 9.26 JavaBean

As we can see, we have a simple Java class, which has three private attributes, namely, firstName, lastName, and emailAddress. There are three methods to accept values from other methods to set the values of these three attributes of the class (called as the setters). Similarly, there are three methods to retrieve or get the values of these three attributes of the class (called as the getters). Thus, whenever any outside object needs to access/update values of attributes in the class, that object can use these get/set methods. This allows the class to keep these attributes private, and yet allow other objects to access/update their values. Whenever a class is written to support this functionality, it is called as a JavaBean. How are JavaBeans useful in a JSP? We can consider the fields/controls on an HTML form as attributes of a JavaBean. Whenever the HTML form is submitted to the server, the JSP on the server side can use the JavaBean's get/set methods to retrieve/update the form values, as appropriate. This is better than writing form processing code in the JSP itself.
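As an illustration, a minimal sketch of how a JSP could hand submitted form fields over to such a bean, using the standard <jsp:useBean> and <jsp:setProperty> actions (this assumes the User bean sketched above is placed in a hypothetical com.example package, and that the form fields are named firstName, lastName, and emailAddress):

<jsp:useBean id="user" class="com.example.User" scope="request" />
<%-- Copy all matching request parameters into the bean's properties --%>
<jsp:setProperty name="user" property="*" />
<%-- Read a value back from the bean --%>
Welcome, <jsp:getProperty name="user" property="firstName" />!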

9.1.8 Implicit JSP Objects JSP technology provides a number of useful implicit (ready-made) objects. We can make use of these objects to make our programming easier, rather than having to code for small details ourselves. These implicit objects are shown in Fig. 9.27. We have used some of these objects in our earlier examples. Let us have a formal explanation for them, amongst others. For that, we need to take a look at Table 9.2.


Fig. 9.27 Implicit objects of JSP

Table 9.2 JSP implicit objects

request: Used to read the values of the HTML form in a JSP, received as a part of the HTTP request sent by the client to the server.
    Example: <% String uname; uname = request.getParameter ("name"); %>

response: Used to send the necessary information from the server to the client. For example, we can send cookies (discussed separately), as shown in the example.
    Example: <% Cookie mycookie = new Cookie ("name", "atul"); response.addCookie (mycookie); %>

pageContext: We can use a pageContext reference to get any attribute from any scope.
    Example (setting a page-scoped attribute): <% Float one = new Float (42.5); %> <% pageContext.setAttribute ("test", one); %>
    Example (getting a page-scoped attribute): <%= pageContext.getAttribute ("test") %>

session: We will discuss this separately.
    Example: HttpSession session = request.getSession (); session.setAttribute ("name", "ram");

application: It is the master object, and should not be used, since it puts a load on the JSP container.
    Example: NA

out: This object is used to send HTML content to the user's browser.
    Example: <% String [] colors = {"red", "green", "blue"};
                for (int i = 0; i < colors.length; i++)
                    out.println ("<b>" + colors [i] + "</b>"); %>

9.1.9 Session Management in JSP/Servlets HTTP is a stateless protocol. It means that it is a forgetful protocol. It forgets what it had done in the previous step. The straightforward way to describe this situation is as follows.

1. Client (Web browser) sends an HTTP request to the Web server.
2. The Web server sends an HTTP response to the Web browser.
3. The server forgets about the client.

As we can see, this can be quite unnerving. For example, suppose our browser displays a page, where we need to enter the user id and password and submit it to the server. Once we enter these details and send the HTTP request to the server, the server will check whether the user id and password are correct. Accordingly, it would generate the next HTML page and send it to our browser. At this stage, it has already forgotten us! This means that whenever our browser sends the next HTTP request to the same server (e.g., perhaps as a result of clicking on some hyperlink on the page), the server will not even know us! Perhaps the best way to understand this is to take the example of telephone conversations. Suppose that we dial the telephone number of our friend. Once we identify each other (with a Hello I am so and so, How are you, etc.), we start speaking. But what if our memory is too short and we forget each other after every turn in the conversation? It would lead to a very comical situation, such as given below.

Person Atul (picks up the ringing phone): Hi, Atul here.
Person Achyut (had dialed Atul's number): Hi Atul, this is Achyut here. I wanted to know if you have completed the 7th chapter.
Person Atul: Yes, I have.
Person Achyut: Ok, what about the 8th?
Person Atul (has forgotten about the previous conversation): Who are you?

As we can see, after the initial handshake, Atul (equivalent of the Web server) has forgotten Achyut (equivalent of the Web browser)! This is very strange indeed. This means that Achyut (equivalent of the Web browser) needs to identify himself to Atul (equivalent of the Web server) every single time he needs to communicate something to him during the same conversation, and provide information as to what was discussed in the past. Well, unfortunately, HTTP works in the same way. Let us say that again.

1. Client (Web browser) sends an HTTP request to the Web server.
2. The Web server sends an HTTP response to the Web browser.
3. The server forgets about the client.

This means that it is the client's responsibility to make the server aware, every time, of who the client is, and of what had happened in the conversation up to that point. For this purpose, we need the concept of session state management (also called simply session management). The idea for doing so is depicted in Fig. 9.28. The unique ID that keeps floating between the client and the server is called as the session ID. How is this sent by the server to the browser? There are two techniques in JSP to work with session IDs. This is outlined in Fig. 9.29. Let us understand them in brief.

Cookies In the first technique, the server creates a small text file, called as a cookie, and associates this particular user with that cookie. The cookie is created by the server, and sent to the browser along with the first HTTP response. The browser accepts it and stores it inside the browser's memory. Whenever the browser sends the next HTTP request to the server, it reads this cookie from its memory and adds it to the request. Thus, the cookie keeps travelling between the browser and the server for every request-response pair.
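Normally the Servlet container creates and manages the session cookie itself when we call request.getSession (); still, to illustrate the mechanics, a minimal hand-written sketch (with a hypothetical cookie name) could look like this:

// Server side: create a cookie and attach it to the HTTP response
Cookie idCookie = new Cookie ("MYSESSIONID", "0AAB6C8DE415");   // hypothetical name and value
idCookie.setMaxAge (30 * 60);                                   // keep it in the browser for 30 minutes
response.addCookie (idCookie);

// On a later request: read the cookie that the browser sent back
Cookie [] cookies = request.getCookies ();
if (cookies != null) {
    for (int i = 0; i < cookies.length; i++) {
        if (cookies [i].getName ().equals ("MYSESSIONID")) {
            String sessionId = cookies [i].getValue ();          // identifies the user's session
        }
    }
}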


URL rewriting However, there is an option to disable cookies in the browser. If the user does so, cookie-based session management will not work. Hence, another technique exists, whereby the session ID is not embedded inside a cookie. Instead, the session ID is appended to the URL of the next request that the browser is supposed to send to the server. For example, suppose that the server has sent an HTML form to the user, which the user is supposed to fill and send back to the server. This form will go to a JSP called CheckForm.jsp. Also, the server has created a session ID with value 0AAB6C8DE415. Then, whenever the user submits the form, the URL that will be seen in the browser window would not just be CheckForm.jsp, but instead, it would be CheckForm.jsp;jsessionid=0AAB6C8DE415. This would mean that the session ID is travelling from the browser to the server as a part of the URL itself. This technique is called as URL rewriting.
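In Servlet/JSP code we normally do not append the session ID by hand; instead we ask the container to rewrite the URL, and it adds the session ID only when cookies are unavailable. A minimal sketch, using the CheckForm.jsp page from the example above:

// The container appends the session ID (e.g., ;jsessionid=...) only if cookies are disabled
String actionUrl = response.encodeURL ("CheckForm.jsp");
out.println ("<form action=\"" + actionUrl + "\" method=\"post\">");
out.println ("...form fields go here...");
out.println ("</form>");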

Fig. 9.28

Session management concept


Fig. 9.29 Session management techniques

9.1.10 JSP Standard Template Library (JSTL)

The JSP Standard Template Library (JSTL) is used to reduce the amount of coding needed to achieve the same functionality as would normally be achieved by writing scriptlet code. In other words, JSTL is a more efficient way of writing JSP code, instead of using scriptlet code. Using JSTL, we do code development using tags, rather than writing a lot of code. Figure 9.30 shows an example of JSTL.

<%@ taglib uri="http://java.sun.com/jstl/core" prefix="c" %>
<html>
    <head>
        <title>JSP is as easy as ...</title>
    </head>
    <body>
        <%-- Calculate the sum of 1, 2, and 3 dynamically --%>
        1 + 2 + 3 = <c:out value="${1 + 2 + 3}" />
    </body>
</html>

Fig. 9.30 JSTL example

Let us understand how this works. At the beginning of the code, we have a directive that includes a taglib file. A taglib is the tag library, i.e., a collection of ready-made, precompiled tags used to accomplish a specific task. Although it would not be clear here, this directive is mapped to a Java Archive (JAR) file in the deployment descriptor file. That is how our JSP code understands the meaning of this taglib file. The only other new statement in our code is this:

1 + 2 + 3 = <c:out value="${1 + 2 + 3}" />

This will print the following output: 1 + 2 + 3 = 6

How is this done? To compute the sum of 1, 2, and 3, the following code is used: “${1 + 2 + 3}”

The above statement follows the syntax of what is called as Expression Language (EL). EL is a shorthand, tag-oriented language. An EL expression always starts with ${ and ends with }. We have put this EL expression inside a <c:out> statement. This is equivalent to an out.println () statement in the standard Java scriptlet code. Thus, the following two statements are equivalent:

JSTL version:      1 + 2 + 3 = <c:out value="${1 + 2 + 3}" />
Scriptlet version: <% out.println ("1 + 2 + 3 = " + (1 + 2 + 3)); %>

JSTL can be quite powerful. Figure 9.31 shows the scriptlet code to display numbers from 1 to 10, along with the version of the JSTL code.

Count Example 59332k <% for (int i=1; i<=10; i++) { %> (a) Scriptlet version <%= i %>
<% } %>


<%@ taglib uri=”http://java.sun.com/jstl/core” prefix=”c” %> Count Example 59332k


Fig. 9.31

(b) JSTL version

Scriptlet and JSTL versions for program to display numbers from 1 to 10

The JSTL version of the code has a <c:forEach> tag. As we can guess, this is a shorthand notation for the standard Java for statement. Similarly, we again use the <c:out> shorthand notation instead of

the standard JSP out.println () notation. We then use the EL syntax to display the current value of the variable i, as before. Other than using these predefined tags, such as forEach and out, we can also develop our own custom tags. These can be used in situations where we want to develop generic functionality, and use it in several JSP applications.

9.1.11 JSP and JDBC

The Java programming language has in-built support for database processing. For this purpose, it uses the technology of Java Database Connectivity (JDBC). JDBC is a set of classes and interfaces that allows any Java application to work with an RDBMS in a uniform manner. In other words, the programmer need not worry about the differences in various RDBMS technologies, and can consider all RDBMS products as some DBMS, which all work in a similar fashion. Of course, it does not mean that the programmer can use any DBMS-specific (and not generic) functionalities and yet expect JDBC to support them across all other DBMS products. The basic database accessing and processing mechanism is made uniform by JDBC, as long as the programmer sticks to the standard SQL/RDBMS features. The conceptual view of JDBC is shown in Fig. 9.32.

Fig. 9.32

JDBC concept

As we can see, the main idea of JDBC is to provide a layer of abstraction to our programs while dealing with the various RDBMS products. Instead of our programs having to understand and code in the RDBMS-specific language, they can be written in Java. This means that our Java code needs to speak in JDBC. JDBC, in turn, transforms our code into the appropriate RDBMS language. The JDBC interface is contained in the following packages:

java.sql - Core API, part of J2SE
javax.sql - Optional extensions API, part of J2EE

JDBC uses more interfaces than classes, so that different vendors are free to provide an appropriate implementation for the specifications. Overall, about 30 interfaces and 9 classes are provided, such as Connection, Statement, PreparedStatement, ResultSet, and SQLException. We explain some of them briefly below.


Connection object It is the pipe between a Java program and the RDBMS. It is the object through which commands and data flow between our program and the RDBMS.

Statement object Using the pipe (i.e., the Connection object), the Statement object is used to send SQL commands that can be executed on the RDBMS. There are three types of commands that can be executed by using this object:

Statement - This object is used to define and execute static SQL statements.
PreparedStatement - This object is used to define and execute dynamic SQL statements.
CallableStatement - This object is used to define and execute stored procedures.

ResultSet object The result of executing a Statement is usually some data. This data is returned inside an object of type ResultSet.

SQLException object This object is used to deal with errors in JDBC.
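Putting these objects together, a minimal sketch of the typical JDBC flow looks as follows (the driver class and DSN name follow the book's JDBC-ODBC examples and are assumptions; the table is the departments table used below):

import java.sql.*;

public class JdbcFlowDemo {
    public static void main (String [] args) {
        try {
            Class.forName ("sun.jdbc.odbc.JdbcOdbcDriver");                        // load the driver
            Connection con = DriverManager.getConnection ("jdbc:odbc:Employee");   // the "pipe"
            Statement stmt = con.createStatement ();                               // to send SQL commands
            ResultSet rs = stmt.executeQuery ("SELECT deptno, deptname FROM departments");
            while (rs.next ()) {                                                   // data comes back in a ResultSet
                System.out.println (rs.getString (1) + " " + rs.getString (2));
            }
            rs.close ();
            stmt.close ();
            con.close ();
        } catch (ClassNotFoundException e) {
            System.out.println ("Driver not found: " + e);
        } catch (SQLException se) {                                                // errors surface as SQLException
            System.out.println ("SQL error: " + se.getMessage ());
        }
    }
}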

9.1.12 JDBC Examples

Basic concepts Suppose we have two tables in our database, containing columns as shown in Fig. 9.33.

CREATE TABLE departments (
    deptno   CHAR (2),
    deptname CHAR (40),
    deptmgr  CHAR (4)
);

CREATE TABLE employees (
    empno    CHAR (4),
    lname    CHAR (20),
    fname    CHAR (20),
    hiredate DATE,
    ismgr    BOOLEAN,
    deptno   CHAR (2),
    title    CHAR (50),
    email    CHAR (32),
    phone    CHAR (4)
);

Fig. 9.33 Sample tables

Based on these, we want to display a list of departments, along with their manager name, title, telephone number, and email address. This can be done by using a JSP as shown in Fig. 9.34.

<%@page contentType="text/html"%>
<%@page pageEncoding="UTF-8"%>
<%@page session="false" %>
<%@page import="java.sql.*" %>
<%@page import="java.util.*" %>

Department Managers 2x5h1y <% // Open Database Connection Class.forName (“sun.jdbc.odbc.JdbcOdbcDriver”); // Open a connection to the database Connection con = DriverManager.getConnection(“jdbc:odbc:Employee”); String sql = “SELECT D.deptname, E.fname, E.lname, E.title, E.email, E.phone “ + “FROM departments D, employees E “ + “WHERE D.deptmgr = E.empno “ + “ORDER BY D.deptname”; // Create a statement object and use it to fetch rows in a resultset object Statement stmt = con.createStatement (); ResultSet rs = stmt.executeQuery (sql); while (rs.next ()) { String dept = rs.getString (1); String fname = rs.getString (2); String lname = rs.getString (3); String title = rs.getString (4); String email = rs.getString (5); String phone = rs.getString (6); %>
Department: <%= dept %> 216g3x
<%= fname %> <%= lname %>, <%= title %>
(91 20) 2290 <%= phone %>, <%= email %> <% } rs.close (); rs = null; stmt.close(); stmt=null; con.close (); %>

-- END OF DATA --


Fig. 9.34

JSP containing JDBC code

The Statement object provides a number of useful methods, as listed in Table 9.3.


Table 9.3 Useful methods of the Statement object

executeQuery - Executes a SELECT and returns a result set
executeUpdate - Executes an INSERT/UPDATE/DELETE or DDL statement, and returns the count of rows affected
execute - Similar to the two above, but does not return a result set (returns a Boolean value)
executeBatch - Batch update

Let us discuss an example of the executeUpdate () method. Figure 9.35 shows code that updates the value of the location column to some fixed text for all the rows in the departments table.

<%@page contentType="text/html"%>
<%@page pageEncoding="UTF-8"%>
<%@page session="false" %>
<%@page import="java.sql.*" %>
<%@page import="java.util.*" %>

Update Employees 2e66g

List of Locations BEFORE the Update 63c1t

<% // Open Database Connection Class.forName (“sun.jdbc.odbc.JdbcOdbcDriver”); // Open a connection to the database Connection con = DriverManager.getConnection(“jdbc:odbc:Employee”); String sql = “SELECT location FROM departments”; // Create a statement object and use it to fetch rows in a resultset object Statement stmt = con.createStatement (); ResultSet rs = stmt.executeQuery (sql); while (rs.next ()) { String location = rs.getString (1);
<%= location %> 6i2g5j
<% } rs.close (); rs = null; %>

Now updating ... 242a5x



<% try { String location = “Near SICSR”; int nRows = stmt.executeUpdate (“UPDATE departments SET location = ‘“ + location + “‘“); out.println (“Number of rows updated: “ + nRows); stmt.close (); stmt=null; con.close ();

(Contd)

Fig. 9.35 contd...

} catch (SQLException se) {
    out.println (se.getMessage ());
}
%>


Fig. 9.35 Using the executeUpdate () method Figure 9.36 shows an example of deleting data with the help of a result set. <%@page import=”java.util.*” %> Delete Department Name using ResultSet 6t6v4u

Fetching data from the table ... 5s284e

<% Class.forName (“sun.jdbc.odbc.JdbcOdbcDriver”); Connection con = DriverManager.getConnection(“jdbc:odbc:Employee”); String sql = “SELECT deptname FROM departments WHERE deptno = ‘Del’”; Statement stmt = null; ResultSet rs = null; boolean foundInTable = false; try { stmt = con.createStatement (ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_UPDATABLE); rs = stmt.executeQuery (sql); foundInTable = rs.next (); } catch (SQLException ex) { System.out.println (“Exception occurred: “ + ex); } if (foundInTable) { String str = rs.getString (1); out.println (“Data found”); out.println (“Old value = “ + str); } else { out.println (“Data not found”); } if (foundInTable) { try { rs.deleteRow (); rs.close (); rs = null; } catch (SQLException ex) { System.out.println (“Exception occurred: “ + ex); }

(Contd)

Fig. 9.36 contd...
out.println ("Delete successful"); } try { stmt.close (); }

stmt=null;

con.close ();

catch (SQLException ex) { System.out.println (“Exception occurred: “ + ex); } %>


Fig. 9.36 Deleting data through a result set

There is something interesting in this JSP page:

stmt = con.createStatement (ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_UPDATABLE);

What does this line indicate? It tells us that the result set that is going to be produced should be insensitive as well as updatable. Let us understand these two parameters. They are first conceptually shown in Fig. 9.37.

Fig. 9.37

Understanding the createStatement parameters

The first parameter can have values, as described with their meanings in Table 9.4.

Table 9.4 Possible values for the first parameter

TYPE_FORWARD_ONLY - Allow the cursor to move only in the forward direction on the result set
TYPE_SCROLL_SENSITIVE - Allow movement of the cursor on the result set in either direction; if some other program makes any changes to the data under consideration, reflect those changes in our result set
TYPE_SCROLL_INSENSITIVE - Allow movement of the cursor on the result set in either direction; if some other program makes any changes to the data under consideration, ignore those changes in our result set

Similarly, we describe the second parameter in Table 9.5.

Table 9.5 Possible values for the second parameter

CONCUR_READ_ONLY - Do not allow this result set to be updatable
CONCUR_UPDATABLE - Allow the result set to make changes to the database

Transactions in JDBC JDBC transaction management is quite simple. By default, all database changes in JDBC are automatically committed. However, if we want to control when commits or rollbacks should happen, we need to do the following at the beginning of the JDBC code:

con.setAutoCommit (false);

Whenever we need to commit or roll back the changes, we need to execute one of the following:

con.commit ();

or

con.rollback ();
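A minimal sketch of this pattern follows (the table and column names are hypothetical, and the code is assumed to run inside a method that declares throws SQLException):

con.setAutoCommit (false);                      // take manual control of the transaction
Statement stmt = con.createStatement ();
try {
    stmt.executeUpdate ("UPDATE accounts SET balance = balance - 100 WHERE accno = '1001'");
    stmt.executeUpdate ("UPDATE accounts SET balance = balance + 100 WHERE accno = '1002'");
    con.commit ();                              // both updates succeeded; make them permanent
} catch (SQLException se) {
    con.rollback ();                            // something failed; undo everything since autocommit was turned off
} finally {
    stmt.close ();
}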

Prepared statements In order to allow the programmer to build SQL statements dynamically at run time, JDBC supports the concept of prepared statements. A prepared statement specifies what operations will take place on the database, but does not indicate with what values. For example, consider the code block shown in Fig. 9.38.

// Prepare a string containing an SQL statement without a specific value
String preparedSQL = "SELECT location FROM departments WHERE deptno = ?";

// Now supply the SQL statement to the PreparedStatement instance
PreparedStatement ps = connection.prepareStatement (preparedSQL);

// Fill up the deptno value with whatever value we want to supply
ps.setString (1, _deptno);

// Execute our prepared statement with the supplied value and store results into a result set
ResultSet rs = ps.executeQuery ();

Fig. 9.38 Prepared Statement concept

As we can see, prepared statements allow us to prepare them at compile time, but with empty values. At run time, we supply the actual value of interest. Now, in this case, we could have either hard coded the value of the deptno, or as shown in the particular example, we can get it from another Java variable inside our JSP page. This allows us to execute the same prepared statement with different values for the deptno as many times as we wish. This means that we have the following advantages.

- Reduced effort of checking and rechecking many statements
- Write one statement and execute it as many times as desired, with different parameters
- Better performance

We should note that prepared statements need not always be used only for selecting data. We can even insert data using the same concept, as illustrated in Fig. 9.39.

// Prepare a statement with no values for the parameters
String preparedQuery = "INSERT INTO departments (deptno, deptname, deptmgr, location) VALUES (?, ?, ?, ?)";

// Make it available to the PreparedStatement object
PreparedStatement ps = con.prepareStatement (preparedQuery);

// Now supply the actual parameter values
ps.setString (1, _deptno);
ps.setString (2, _deptname);
ps.setString (3, _deptmgr);
ps.setString (4, _location);

// Execute the INSERT statement
ps.executeUpdate ();

Fig. 9.39

Prepared Statement for INSERT

We can similarly have prepared statements for deleting and updating data. We shall not discuss them here to avoid repetition. Figure 9.40 shows a JSP page as an example of a prepared statement. <%@page pageEncoding=”UTF-8"%> <%@page session=”false” %> <%@page import=”java.sql.*” %> <%@page import=”java.util.*” %> <% boolean ucommit = true; %> JDBC Transactions Application 5a2n2t

Balances BEFORE the transaction 3j35s







<% Class.forName (“sun.jdbc.odbc.JdbcOdbcDriver”); Connection con = DriverManager.getConnection(“jdbc:odbc:s”); (Contd) Web Technologies 312 Fig. 9.40 contd... // ************************************************************************************ // THIS PART OF THE CODE DISPLAYS THE DETAILS BEFORE THE TRANSACTION // ************************************************************************************ String sql = “SELECT _Number, _Name, Balance “ + “FROM s “ + “ORDER BY _Name”; Statement stmt = con.createStatement (); ResultSet rs = stmt.executeQuery (sql); while (rs.next ()) { String _Number = rs.getString (1); String _Name = rs.getString (2); String balance = rs.getString (3); %> <% } rs.close (); rs = null; stmt.close(); stmt=null; %>
Number Name Balance
<%= _Number %> <%= _Name %> <%= balance %>


-- END OF DATA —



Fig. 9.40

Prepared statement Example—Part 1

<% // ***************************************************************************** // ATTEMPT TO EXECUTE THE TRANSACTION IF COMMIT WAS SELECTED // **************************************************************************** if (request.getParameter (“Commit”) == null) { // Rollback was selected out.println (“ You have chosen to ROLL BACK the funds transfer. No changes would be made to the database. ”); } else { // Now try and execute the database operations int from = Integer.parseInt (request.getParameter (“fromAcc”)); int to = Integer.parseInt (request.getParameter (“toAcc”));

(Contd)

313 Fig. 9.40 contd... int amount = Integer.parseInt (request.getParameter (“amount”)); int nRows = 0; // Debit FROM PreparedStatement stmt_upd = con.prepareStatement (“UPDATE s “ + “SET Balance = Balance - ?” + “ WHERE _Number = ?”); stmt_upd.setInt (1, amount); stmt_upd.setInt (2, from); out.print (“
Amount = “ + amount); out.print (“
From Acc = “ + from); try { nRows = stmt_upd.executeUpdate (); out.print (“
” + nRows); // out.print (“
” + stmt_upd); stmt_upd.clearParameters (); } catch (SQLException se) { ucommit = false; out.println (se.getMessage ()); } // Credit TO stmt_upd = con.prepareStatement (“UPDATE s “ + “SET Balance = Balance + ?” + “ WHERE _Number = ?”); stmt_upd.setInt (1, amount); stmt_upd.setInt (2, to); out.print (“
Amount = “ + amount); out.print (“
To Acc = “ + to);

Fig. 9.40 Prepared statement Example—Part 2 try { nRows = stmt_upd.executeUpdate (); out.print (“
” + nRows); stmt_upd.clearParameters (); } catch (SQLException se) { ucommit = false; out.println (se.getMessage ()); } if (ucommit) con.commit ();

{ // No problems, go ahead and commit transaction

(Contd)

314 Fig. 9.40 contd... out.println (“ Transaction committed successfully! ”); } else { con.rollback (); out.println (“ Transaction had to be rolled back! ”); } } %> <% // ************************************************************************************ // DISPLAY THE DETAILS AFTER THE TRANSACTION OPERATION // ************************************************************************************ %>





<% sql = “SELECT _Number, _Name, Balance FROM s “; stmt = con.createStatement (); rs = stmt.executeQuery (sql); while (rs.next ()) { String _Number = rs.getString (1); String _Name = rs.getString (2); String balance = rs.getString (3); %> <% } rs.close (); rs = null; stmt.close(); stmt=null; con.close (); %> Fig. 9.40 Prepared statement Example—Part 3 Java Web Technologies 315
Number Name Balance
<%= _Number %> <%= _Name %> <%= balance %>


-- END OF DATA --





Fig. 9.40 Prepared statement Example—Part 4

9.2 APACHE STRUTS

9.2.1 Model-View-Controller (MVC) Architecture

Over a period of time, it was realized that the best way to design and architect Web-based applications was to follow a technique known as the Model-View-Controller (MVC) architecture. The idea of MVC is quite simple. We will understand it with the help of the JSP/Servlets technology, although it can be applied to other dynamic Web page technologies as well. Instead of a single Servlet or JSP dealing with the user's HTTP request, performing the necessary processing, and also sending back the HTTP response to the user, the MVC approach recommends that we consider the whole Web application to be made up of three parts.

Model This is where the business logic resides. For example, it could be a simple Java class fetching data from a database using JDBC, or a JavaBean, or an Enterprise JavaBean (EJB), or even a non-Java application.

View The view is used to prepare and send the resulting output back to the . Usually, this is done with the help of a JSP. In other words, the JSP constructs the HTML page that is sent to the browser as a part of the HTTP response.

Controller The controller is usually a Servlet. As the name suggests, the controller Servlet is responsible for controlling the overall flow of the application. In other words, it coordinates all the functions of the application and ensures that the user's request is processed appropriately. The overall application architecture using the MVC approach looks as shown in Fig. 9.41. Let us understand how this works, step by step. The step numbers refer to the corresponding sequence numbers depicted in the diagram.

1. The browser sends an HTTP request to the server, as usual.
2. The server passes the user's request on to the controller Servlet.
3. After performing the appropriate validations, etc., the controller calls the right model (depending on the business logic that needs to be executed). The model performs the business logic, and sends results back to the controller.
4. The controller invokes the view to prepare an HTML page that would eventually be sent out to the browser.
5. The HTML page is embedded inside an HTTP response message by the Web server.
6. The HTTP response is sent back to the browser.


Fig. 9.41 MVC concept

As we can see, there is a very clear distinction between the responsibilities of the various components now. The controller, model, and view do not interfere with each other at all. Without MVC, the whole thing would have to be performed by a Servlet, or worse yet, perhaps by the same single Servlet!
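As an illustration, a minimal sketch of a controller Servlet in this style could look like the following (the model class OrderModel and the page orderResult.jsp are hypothetical names):

import java.io.*;
import javax.servlet.*;
import javax.servlet.http.*;

public class ControllerServlet extends HttpServlet {
    protected void doGet (HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        // 1. Ask the model (a plain Java class here, hypothetical) to perform the business logic
        OrderModel model = new OrderModel ();
        String status = model.placeOrder (request.getParameter ("item"));

        // 2. Make the result available to the view
        request.setAttribute ("status", status);

        // 3. Forward to the view (a JSP), which prepares the HTML response
        RequestDispatcher view = request.getRequestDispatcher ("/orderResult.jsp");
        view.forward (request, response);
    }
}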

9.2.2 Apache Struts and MVC Struts from Apache is an open-source software that can be used to create applications in the MVC architecture by making use of declarative programming more than descriptive programming. In other words, Struts allows the programmer to specify a lot of functionality via configuration files and declarations, instead of having to write the code for those ourselves. When an HTTP request is sent to a Struts application, it is handled by a special type of Servlet, called as ActionServlet. When the ActionServlet receives a request, it checks the URL and consults the Struts configuration files. Accordingly, it delegates the handling of the request to an Action class. The Action class is part of the controller and is responsible for communicating with the model layer. The Struts framework provides an abstract Action class that we need to extend based on our application-specific requirements. Let us understand all this terminology in a better manner now. Figure 9.42 shows the typical parts of a Struts application. Let us understand this process step-by-step. 1. The client sends an HTTP request to the server, as usual. In Struts applications, an action Servlet receives HTTP requests. 2. The action Servlet consults a configuration file named struts-config.xml to figure out what to do next. It realizes that it has to forward this request to a view component (usually a JSP), as per the configuration done beforehand by the programmer.

3. The action Servlet forwards the request to the form bean (a Java representation of every field in the form data). The form bean is also called as an ActionForm class. This is a JavaBean, which has getter and setter methods for the fields on the form.
4. It then consults the action class (for validating the input provided in the HTML form). The action class decides how to invoke the business logic now.
5. The business logic processing happens at this stage. The business logic can also be a part of the earlier class, i.e., the action class itself, or it can be separate code (e.g., a Java class or an EJB).
6. Optionally, the database is also accessed.
7. The result of the above steps causes some output to be produced (which is not in the displayable HTML format yet). This is now passed to the View component (usually a JSP).
8. The JSP transforms this output into the final output format (usually HTML).

Fig. 9.42

Struts application flow

In essence, Struts takes the concept of MVC even further by providing built-in features for writing models, views, and controllers, and for appropriately passing control back and forth between these components. If the programmer wants to do this herself, she would need to write the whole orchestrating logic herself.
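For illustration, a minimal struts-config.xml fragment in this spirit might look roughly as follows (the form bean, action classes, paths, and JSP names are hypothetical):

<struts-config>
    <form-beans>
        <!-- JavaBean that holds the submitted form fields -->
        <form-bean name="loginForm" type="com.example.LoginForm" />
    </form-beans>

    <action-mappings>
        <!-- Requests to /login are delegated to LoginAction, using loginForm as the form bean -->
        <action path="/login" type="com.example.LoginAction" name="loginForm">
            <forward name="success" path="/welcome.jsp" />
            <forward name="failure" path="/login.jsp" />
        </action>
    </action-mappings>
</struts-config>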

9.3 JAVASERVER FACES (JSF) 9.3.1 Background Since the inception of the Internet programming, there has always been some debate as to where should one write the data input validation logic. Clearly, there are two ways to handle this. Either we write the validation code on the client-side (i.e., inside the Web browser in the form of JavaScript), or we can write it to execute on the server-side (i.e., inside the Web server). This is illustrated in Fig. 9.43. Both approaches have their advantages and disadvantages. Over a period of time, it was almost standardized to keep the data-entry related validations in JavaScript on the client-side. And then came ASP.NET. Until its arrival, validating anything on the server-side was possible, but quite cumbersome. ASP.NET changed all of that with the help of a novel concept titled server controls. These controls, basically drag-and-drop, can allow

powerful validation logic to be built into ASP.NET pages with minimal effort, and the code also does not look ugly. In some sense, this actually transformed the whole Web programming model.

Fig. 9.43 HTTP request/response

Quite clearly, Sun had to do something about it. The Java programming model for validations was still based on client-side JavaScript coding.

9.3.2 What is JSF?

With this in mind, Sun came up with what we now know as JavaServer Faces (JSF). JSF is a technology that allows programmers to develop an HTML page using tags that are similar to the basic HTML tags, but have many more features. For example, these tags know how to maintain their state (otherwise a huge problem in Web applications), how and what sort of validations should take place, how to react to events, how to take care of internationalization, etc. To understand this better, Fig. 9.44 shows a very simple example of defining the same tag, first in plain HTML and then in JSF.

Fig. 9.44

JSF concept

Naturally, our first reaction would be that JSF seems to be quite complex. Well, we may feel that initially because the syntax looks a bit odd. However, when we play around a bit, and also see the benefits in lieu of using somewhat more syntax, it is easy to get convinced that JSF is a very powerful technology in many situations. In the above case of JSF, for example, there is a textbox called as celsiusEdit that needs to be shown on the screen. Its value needs to be retrieved from the Web server. How? On the Web server, there would be a Java class (actually a simple JavaBean) named PageBean, which has an attribute called as celsius. The textbox is expected to show the value of this attribute on the Web page.

This should clearly tell us that unlike the traditional Web programming model, where the client-side variables are quite de-linked from the corresponding server-side variables, here we are literally plumbing the server-side variable values straight into HTML pages! Moreover, we are doing this without writing any code on the client side. Now, it is also easy to imagine that we can manipulate the value of this variable in the server-side Java class the way we like. Essentially, we are allowing the server-side code to prepare the contents of this variable, and are sending them to the HTML screen in a very simple manner. Thus, the business logic need not be brought to the client, as in the traditional JavaScript model. In other words, we are achieving the functionality of server-side business logic, as before. However, what is more important to understand is that we not only achieve the above, but we would also be able to perform simple validations and formatting almost without coding, by using some simple declarations. For example, suppose that we want the user to be able to enter only a two-digit value in our input text box. Then, we can modify the declaration to look as shown in Fig. 9.45.

Fig. 9.45

JSF tags example

Similarly, we can also restrict the minimum and maximum values to say 10 and 50, as shown in Fig. 9.46.

Fig. 9.46 More JSF syntax

Of course, these are only a few of the possibilities that exist. Also, note that to achieve the same thing in traditional Web pages, we need to write a lot of complex JavaScript code, which is also difficult to debug. In the case of JSF (like ASP.NET), we delegate this responsibility to the technology and get away with making only a few declarations, as shown earlier. This can be a tremendous boost to productivity in the case of user interface-intensive applications. This also brings us to another point. JSF is an overkill for applications where we do not have too much user interaction. In other words, if the user is providing very few inputs, and instead the server is sending a lot of data back to the user in the form of HTML pages, then using JSF would not make sense at all! Working with JSF is no longer a pain, either. All standard IDEs offer JSF support these days. For example, whenever we are creating a new Web application, NetBeans prompts us to say whether we want to use JSF in this application, and provides all the basic framework for enabling JSF. Apache has provided an open-source implementation of JSF, titled MyFaces. It integrates nicely with Tomcat as the Web server.
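As a rough sketch of the kind of declaration Figs. 9.45 and 9.46 illustrate (the attribute values and the PageBean.celsius binding are assumptions based on the discussion above), restricting the input to two digits and to the range 10 to 50 might look like this:

<h:inputText id="celsiusEdit" value="#{PageBean.celsius}" size="2" maxlength="2">
    <f:validateLongRange minimum="10" maximum="50" />
</h:inputText>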


9.3.3 JSF versus Struts

How does JSF compare with other frameworks, such as Apache Struts? In general, the consensus is that JSF fares better than Struts. The simple reasons for this view are that, firstly, JSF has been designed to mimic what ASP.NET does, something that the developers of Struts did not have in mind. Secondly, well, Struts came before JSF, and did not have the luxury of any hindsight! Many developers feel that working with JSF is like working with Java Swing, in other words, having the ability to develop rich client applications, except that the client happens to be a thin client (Web browser), rather than the traditional Java client. However, everything in Struts is designed to work like a traditional Web application. Struts has proved to be quite successful in production environments, and it may be some time before it can be replaced. But new applications can straightaway be developed in JSF, which promises to be an exciting technology for developing Web applications with rich client user interfaces. Here are some guidelines for making the decision of using either Struts or JSF, or, well, both! Table 9.6 lists them.

Table 9.6 Struts versus JSF—Part 1

Advantages

Struts:
- Mature and proven framework
- Easy to resolve problems, since the developer community is quite large and documentation is quite good
- Good tool and IDE support
- Open-source framework

JSF:
- User interface is very powerful
- Event handling is very effective
- Supported by a Java Community Process (JCP) standard
- Also open source now (Apache MyFaces)

Table 9.6 Struts versus JSF—Part 2

Use only this framework if ...

Struts:
- We have an existing Struts application that needs minor enhancements
- The project deadlines are very tight, and unpredictability or lack of knowledgeable resources can become an issue
- Good tool and IDE support

JSF:
- An application needs to be built from scratch and deadlines are not neck tight
- Rich user interface is a very high priority item
- There is a small existing Web application (even done using Struts) that needs major changes


Table 9.6 Struts versus JSF—Part 3 Use both Struts and JSF if ... There is a large Web application that needs significant changes. Here, we can write new code in JSF, retain existing Struts code with changes, as appropriate.

9.3.4 JSF Case Study Figure 9.47 shows a sample JSP page that contains some JSF tags. <%@page contentType=”text/html”%> <%@page pageEncoding=”UTF-8"%> <%@ taglib uri=”http://java.sun.com/jsf/core” prefix=”f” %> <%@ taglib uri=”http://java.sun.com/jsf/html” prefix=”h” %> JSF Page Example 33681r

The Page 146h3s







(Contd) Web Technologies 322 Fig. 9.47 contd...


Fig. 9.47 JSF sample This page would produce the following Web page in the browser, as shown in Fig. 9.48.

Fig. 9.48 Sample output Let us understand how this works. Firstly, we see the following code in the JSP page: <%@ taglib uri=”http://java.sun.com/jsf/core” prefix=”f” %> <%@ taglib uri=”http://java.sun.com/jsf/html” prefix=”h” %>

These taglib directives refer to the two JSF tag libraries:

jsf/core Core library, contains custom action elements that represent JSF objects (which are

n

jsf/html HTML library, contains custom action elements that represent JSF objects (which are to

independent of the page markup language) be rendered as HTML elements) Next, we have the following statement:

This is an action element. A view in JSF is the grouping of components that make a specific UI screen. The view contains an instance of the javax.faces.component.UIViewRoot class. It does not display anything, but it is a container for other view components (e.g., input fields, buttons, etc.). Then, we have the form elements:



This represents a form component. It acts as a container for all input components that hold values that need to be processed together, such as text boxes and command buttons. For example:

This identifies a component that generates an HTML label. Similarly, we can guess what happens with the following:

This generates a text box with id txtName. The value that enters here would populate an attribute called as name of a server-side JavaBean titled Bean. On similar lines:

This generates a command button with type as submit or reset. The action attribute here has relevance, as explained later. How is this linked to the next JSP page (welcome.jsp)? This is depicted in Fig. 9.49.

Fig. 9.49 Understanding JSF flow

There is a sequence of events that takes place when a request is sent to a JSF page; this sequence is called as the JSF request processing lifecycle, or simply the JSF lifecycle. For example:


This specifies that when the form is submitted, the value entered by the user in the input text box should be passed on to the corresponding property in the server-side JavaBean named modelBean. Let us understand this.

1. As we can see, the index.jsp page is supposed to produce a particular navigation outcome. Now, what is this outcome? It is like a placeholder. It says that if index.jsp produces this outcome, then we want to do something. Now, what is that something? It is welcome.jsp. In other words, we are saying that if index.jsp produces this outcome, please transfer control to welcome.jsp.
2. Therefore, control would now come to welcome.jsp. However, this would not happen directly. Remember, we had stated earlier that some of the controls on the HTML page refer to some properties in the Bean Java class? Hence, the control passes to the Bean class first, and after that, it goes to the welcome.jsp page.
3. The welcome.jsp simply picks up the name from the bean and displays a welcome message back to the user.

The Bean class is shown in Fig. 9.50 and the welcome.jsp code is shown in Fig. 9.51.

Fig. 9.50

Bean.java

<%@ taglib uri=”http://java.sun.com/jsf/core” prefix=”f” %> <%@ taglib uri=”http://java.sun.com/jsf/html” prefix=”h” %> Welcome to JSF! 1c642u




Fig. 9.51

welcome.jsp

Figure 9.52 shows the output.

Fig. 9.52 Output of welcome.jsp
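In a fuller version of this example, the navigation outcome discussed above could also be produced programmatically by an action method on the backing bean, rather than being hard-coded in the button's action attribute. The following is a minimal sketch of that idea; the class name, method name, and validation logic are assumptions for illustration and are not part of the original listing.

package com.jsf;

// Hypothetical backing bean whose action method returns a navigation outcome string.
public class LoginBean {
    private String name;

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    // Would be referred to from the command button, e.g., action="#{LoginBean.doLogin}" (illustrative).
    public String doLogin() {
        // The returned outcome would be mapped to welcome.jsp by a navigation rule.
        return (name != null && name.trim().length() > 0) ? "login" : null;
    }
}

Returning null from an action method simply redisplays the same page, which is a common way of handling a failed validation.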

9.4 ENTERPRISE JAVABEANS (EJB)

9.4.1 Introduction

Let us not confuse the Enterprise JavaBeans (EJB) technology with any of the earlier technologies we have discussed. EJB is not an alternative for JSP/Servlets, Struts, etc. Instead, these technologies, such as JSP/Servlets and Struts, use EJB for performing business processing. EJB is also called as transaction-oriented middleware. In other words, it takes care of the heavy-duty work, such as transaction management, security, load balancing, etc., for providing better throughput. The earlier versions of EJB were quite complex to set up and code. Hence, a lot of effort has gone into making EJB simpler. The resulting output is EJB version 3.0. EJB encourages component-based development. For example, suppose that we need to create a shopping cart-based application. Then, we can think of three main aspects: customer data, order data, and payment data. EJB looks at these three as components, and aims at building an integration layer between them. This concept is shown in Fig. 9.53.


Fig. 9.53

Components concept

What features does EJB provide? We can summarize them as follows.

Transaction management A developer can specify that an enterprise bean needs a transactional environment simply by setting a specific property of the bean being developed. It means that the code inside the enterprise bean would automatically run inside a transaction, which is managed by the EJB infrastructure. That is, the developer can rest assured that either the entire code in the enterprise bean would be executed completely or none of it at all. For this, the enterprise bean, in turn, calls an API of the EJB container implicitly. A software developer does not have to worry about it. Note that the transaction management applies to the whole bean, and not to any specific error checking within that bean. That is, suppose an end-of-day stock update bean performs the following two steps: (a) Read each record (sales/purchase) from a daily transaction file. (b) Update the corresponding master file record (based on a common product id) with the results of the transaction (either decrement or increment the quantity). Now, when the code for the above operation is ready, the developer could set the transaction-enabled property of the bean to true, which means that whenever the bean is executed, the responsibility of making sure that the whole transaction file is processed, and the master file updated correctly, is left to the bean. In case of a failure, the bean would automatically roll back the changes done to the master file since the bean was invoked (or until the last automatic commit, depending on several other factors that we need not discuss here), thus ensuring database consistency.
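In EJB 3.0 this "transaction-enabled property" takes the form of an annotation. The sketch below shows the general idea using the standard @TransactionAttribute annotation; the bean name, method name, and update logic are illustrative assumptions, not code from this case study.

import javax.ejb.Stateless;
import javax.ejb.TransactionAttribute;
import javax.ejb.TransactionAttributeType;

@Stateless
public class StockUpdateBean {

    // REQUIRED asks the container to run this method inside a transaction.
    // If a system exception is thrown part-way through, the container rolls
    // back all database updates made so far in this transaction.
    @TransactionAttribute(TransactionAttributeType.REQUIRED)
    public void processEndOfDay() {
        // (illustrative) read each sales/purchase record from the daily transaction data
        // and update the corresponding master record; no explicit commit/rollback code is written here.
    }
}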

Persistence Persistence means storing the state of an object in some form of permanent storage, such as a disk. When a developer signifies to the EJB container that he wants to make an enterprise bean persistence-enabled, the EJB container automatically ensures that the last state of the enterprise bean object is preserved on the disk, and later on retrieved. This can be important in situations where an enterprise bean has to store certain values on the server side. For instance, suppose a user visits a shopping site and puts three items in the shopping cart. Then the user disconnects. Now, the state of the user's transaction can be recorded in a database, or the enterprise bean managing the conversation can store it. When the user connects back, say after three days, the enterprise bean simply brings back the values for that user from the disk, so that the user can continue purchasing or complete his purchasing process.

Remote awareness Since EJB is all about remote objects, i.e., since objects and clients can be in different parts of the world, it is important that all these objects are allowed to communicate over networks. A developer does not have to write any kind of network code to make the enterprise beans that he develops, network-aware/ distributed. The EJB container automatically does this. For this, the EJB container wraps the enterprise bean in a network-enabled object (i.e., it does not matter where the calling/called objects reside—they can be on the


same or different machines). This network-enabled object intercepts calls from remote clients, and delegates them to the appropriate enterprise bean.

Multi-user support The EJB container implicitly adds the code required for allowing an enterprise bean to work with several clients at the same time. It provides built-in support for multithreading, instantiating multiple instances of an enterprise bean whenever required, etc., automatically.

Location transparency The client of an enterprise application bean need not worry about the actual physical location of the bean. That is handled by the EJB container. In addition to these, the EJB server takes on the responsibilities of creating instances of new components, managing database connections, threads and sockets, etc.

9.4.2 EJB Classification

At a broad level, Enterprise JavaBeans are classified into three major types: session beans, entity beans, and Message-Driven Beans (MDB). This is shown in Fig. 9.54.

Fig. 9.54 Types of EJB Let us discuss these types of beans in detail. At this stage, we need to highlight that entity beans are not recommended by EJB 3.0. Instead, Java Persistence APIs (also called as entities) are recommended to be used. However, we shall cover entity beans for the sake of completeness.

9.4.3 Session Beans A session bean contains some specific business-processing related logic. It is a Java object that encapsulates the necessary code for performing a set of operations corresponding to one or more business processes. The business processes themselves can be business logic, business rules or workflow. A session bean is a reusable piece of software. For instance, a session bean called as Update salary could be used to update the salary of one or all employees by a certain percentage. The name session stems from the fact that the life of a session bean is limited to the time for which a client uses the services of a session bean. Thus, when the client code invokes the services of a session bean, the application server creates an instance of the session bean. This session bean then services the client as long as necessary. When the client completes the job and disconnects, the application server destroys the instance of the session bean. An instance of a session bean is unique for a client—that is, two or more clients can never share a single session bean. This is essential for ensuring transaction management. If two or more clients use the services of


the same instance of a session bean, there would be utter confusion, because they might accidentally access the same data. Of course, to avoid this, session beans can be made thread-safe, so that two or more session beans can share code, but maintain separate copies of data. However, this is an implementation issue that needs to be decided by the application server vendor. From a common developer's perspective, a session bean is never shared among users. A client never explicitly creates an instance of, or destroys, a session bean. It is always done by the EJB container running inside the application server. This ensures optimum utilization of bean resources. Also, this frees the client from issues such as memory allocation, stack management, etc., and provides him with a simple and clean interface. The bean management is left to the application server. A session bean is further classified into two types, stateful session beans and stateless session beans. This is shown in Fig. 9.55.

Fig. 9.55 Classification of session beans Let us discuss these now.

Stateful session beans A session bean corresponds to a business process. However, a business process may complete in just one stroke, or it might need more than one interaction between the client and the server. The concept of transactions is more relevant for the latter. In this case, while the clients and servers interact more than once, during the entire lifecycle of these interactions, the state of the application (any data values) must be preserved. Only if all the steps in this set of interactions complete successfully can the operation as a whole be considered successful. For handling such situations, or more correctly, transactions, stateful session beans are extremely important. A typical situation requiring multiple interactions between the client and the server is a shopping cart in an e-commerce application. Initially, the application might present a shopping cart to the user. Then, the user might add items to it, remove items from it, or change some of the items. This interaction can go on for quite some time. Throughout this time, the application must remember the latest state of the shopping cart, as decided by the user. In such a scenario, a stateful session bean is very useful. We can create a session bean that represents the business processes involved in the shopping cart, and ensure that the state of the application is always maintained. Remember that in the absence of a server-side transactional environment such as EJB, this would have to be handled by means of techniques such as cookies.
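What such a shopping cart might look like as an EJB 3.0 stateful session bean is sketched below; the class name, method names, and the simple string representation of an item are illustrative assumptions.

import java.util.ArrayList;
import java.util.List;
import javax.ejb.Remove;
import javax.ejb.Stateful;

@Stateful
public class ShoppingCartBean {
    // Conversational state: the container preserves this across calls from the same client.
    private final List<String> items = new ArrayList<String>();

    public void addItem(String item)    { items.add(item); }
    public void removeItem(String item) { items.remove(item); }
    public List<String> getItems()      { return items; }

    // Marks the end of the conversation; the container can then discard this instance.
    @Remove
    public void checkout() {
        // (illustrative) place the order using the accumulated items
    }
}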

Stateless session beans A typical interaction between a Web browser and a Web server consists of a request-response pair. That is, the browser sends a request for an HTML document, the server finds the document and sends it back as a response to the browser. After this, the browser might send a new request for another Web page. However, this request is in no way related to the previous request as far as the server is concerned. In such situations, where the client and the server interact in a request-response mode, and then forget about it,


there is no necessity for maintaining the state of the application. Stateless session beans are candidates for such business processes. For instance, in an e-commerce application, the client might enter credit card details, such as its issuing company, number, expiration date and the customer's name. It might request a stateless session bean to validate this credit card. The stateless session bean might perform the verification, and send a success/failure message back to the client, depending on whether the credit card is valid. No more interactions between the client and the server are required for this. Such business scenarios are pretty useful as candidates for stateless session beans.

9.4.4 Entity Beans

At the end of a business process, usually there is a need for storing the results of the operations. Also, throughout the business process, some data from persistent storage might be referenced. For instance, when a customer wants to see his account details online, using the Web site set up by his bank, the bank must be able to retrieve the user's information from its database and present it to the user. The user might then initiate a funds transfer request, because of which, the bank might need to update its database for this account. Similar examples can be given for many business processes. In case of EJB, the entity beans represent database objects, which either bring data from databases to running applications, when required, or update data into the databases, when requested by the application. An entity bean is an in-memory object representation of persistent data stored in a database. Thus, an entity bean can be used for modelling data items such as a bank account, an item in a purchase order, an employee record, and so on. They can represent real-world objects such as products, employees and customers. Thus, entity beans do not associate themselves directly with business processes. They are useful for modelling data elements only. As against this, session beans handle the business processes, as we have already discussed. Thus, transfer amount could be a business process, which can be modelled by a session bean. This process would need to credit one account and debit another. The information regarding which accounts to credit and debit, and the end result, must be represented by one or more entity beans. Thus, session beans normally make use of entity beans whenever they want to access or update persistent data from databases. Entity beans were devised for a simple reason that whereas most of today's databases are in the relational form, the applications that make use of these databases use the object technology. Thus, a mapping is required between the relational view and the object view. Entity beans provide just that. They allow session beans to treat the persistent data, actually stored as rows and columns in relational tables, as objects. For example, in the transfer amount example, an entity bean could be used by a session bean for reading the account details in an object. Suppose the account holder's name is xyz. The session bean might then issue a transfer instruction on that account, i.e., the xyz object. For example, the instruction could be in the form xyz.transfer (1000, 2000, 100). As far as the session bean is concerned, it is sticking to the object-oriented paradigm of creating objects, and manipulating them with the help of their methods such as transfer. However, internally, the object might represent one particular row of an accounts relational table, which gets updated as a result of this operation. The entity bean hides these implementation details from the session bean, and allows it to treat every piece of persistent data as real-world objects. This is shown in Fig. 9.56. Clearly, since entity beans are used for representing data that is preserved across sessions, they have a much longer life than session beans. Even if an application crashes, or a client disconnects from a server for some reason, an entity bean can always be reconstructed from its underlying database.


Fig. 9.56 Session and entity beans

Entity beans are different from session beans in one more respect. Whereas only one user can use a single session bean instance at a time, an entity bean can service more than one client at the same time. Another point to note is that since entity beans basically model data stored in databases, they are very useful when there is huge data existing in legacy applications that needs to be Web-enabled by using EJB. The data would be already there—all that would be required would be to model entity beans on the existing database structure. Like session beans, entity beans are also classified into two types. These two types of entity beans are (a) Bean-managed persistent entity beans and (b) Container-managed persistent entity beans. Let us discuss these two types now.

Bean-managed persistent entity beans As we have noted, an entity bean is an in-memory object representation of persistent data stored in a database (the database itself could be relational/hierarchical/network/ object). In case of bean-managed entity beans, the responsibility of locating, changing and writing back the data between an entity bean and the persistent storage of the database is left to the developer. That is, the developer has to explicitly write program code for performing all these tasks.

Container-managed persistent entity beans In this type of entity beans, the EJB container performs automatic persistence on behalf of the developer. The developer does not hard code any statements required for locating, changing or writing back the data between an entity bean and the underlying database. Instead, the developer describes what he wants, and the EJB container performs the translation from the developer’s description to the actual program code. This makes the application database independent, since the developer now does not write code specific for a database, and instead, leaves it to the EJB container.
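In EJB 3.0, this declarative style of persistence is expressed through Java Persistence API entities, mentioned earlier as the recommended replacement for entity beans. A minimal sketch is given below; the entity name, fields, and mapping are illustrative assumptions.

import javax.persistence.Entity;
import javax.persistence.Id;

@Entity
public class Account {
    @Id
    private long accountNumber;   // maps to the primary key column of the accounts table

    private String holderName;
    private double balance;

    // The persistence provider generates the SQL needed to load and store Account
    // objects; the developer does not write any JDBC code for this mapping.
    public long getAccountNumber()         { return accountNumber; }
    public void setAccountNumber(long n)   { this.accountNumber = n; }
    public String getHolderName()          { return holderName; }
    public void setHolderName(String name) { this.holderName = name; }
    public double getBalance()             { return balance; }
    public void setBalance(double balance) { this.balance = balance; }
}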

9.4.5 EJB 3.0 Example

We shall consider a very simple example of EJB 3.0 so as to understand the concepts of the technology better. We had mentioned earlier that in version 3.0, the EJB technology has simplified the programming model remarkably to make it a lot easier. EJB 3.0 makes heavy use of annotations. Annotations are small declarations, which reduce the coding effort.


The various aspects of an EJB in version 3.0 are as follows.

Business interface The business interface is a standard Java interface, which specifies the list of methods that can be called by a client while making use of this EJB. For example, if the EJB is written to validate a credit card number, we may have a business interface by the name CardValidate, which has a method called as validateCreditCard (), among other methods.

Bean class The methods specified in the business interface are coded inside the bean class. In other words, the business interface states what can be done by that interface and leaves the bean class to implement those features. Based on our earlier example, if we have a CardValidateBean containing a method called as validateCreditCard (), the actual code for this method is written in the bean class.

Client code The client code uses the Java Naming and Directory Interface (JNDI) API. JNDI is where the EJB repository resides. The client code locates the concerned bean using JNDI. Once located, it uses this reference to actually call the business method of interest, by making use of the business interface specifications. For example, in our case, the client would call the validateCreditCard () method. Figure 9.57 shows the business interface for our EJB.

package demo;

import javax.ejb.Remote;

@Remote
public interface CardValidate {
    public String validateCreditCard ();
}

Fig. 9.57

Business interface sample

Figure 9.58 shows the corresponding bean class.

package demo;

import javax.ejb.Remote;
import javax.ejb.Stateless;

@Stateless
public class CardBean implements CardValidate {
    public String validateCreditCard () {
        System.out.println ("Validating card ...");
        return "Card is valid";
    }
}

Fig. 9.58 Session bean example Here, we are simply returning a string message saying that the card is valid. In real life, of course, we will have far more complex logic. Finally, the client code looks as shown in Fig. 9.59.


package democlient;

import javax.naming.Context;
import javax.naming.InitialContext;
import stateless.*;

public class CardClient {
    public static void main (String [] args) throws Exception {
        InitialContext ctx = new InitialContext ();
        CardValidate cv = (CardValidate) ctx.lookup ("demo.CardValidate");
        System.out.println (cv.validateCreditCard ());
    }
}

Fig. 9.59 EJB client example Let us understand what is happening here. The client code is written in a class titled CardClient. This client code has a main method. Inside this method, we use JNDI to obtain what is called as an entry point into the JNDI data structure. From there, we lookup for an interface called as CardValidate, which is our bean’s business interface. Once we obtain a handle to this interface, we call the method validateCreditCard () on this business interface.

9.5 JAVA APPLETS

Sun Microsystems have developed a very popular active Web page technology. This involves the use of Java applets. An applet is a small program written in the Java programming language, and is embedded in an HTML page to form a Web page. An applet makes a Web page active. The applet gets downloaded to the Web browser (client) along with the requested Web page and is executed there under the control of the Java Virtual Machine (JVM) installed in the browser. An applet then creates animations on the client computer. Other similar technologies are based on this primary concept. Also, we must remember that though the applets were originally thought of primarily for animation, they can be used for many other applications, as discussed later. This is shown in Fig. 9.60.

9.6 WHY ARE ACTIVE WEB PAGES POWERFUL?

Active Web page technology is powerful because of the following reasons.
1. Active Web pages get downloaded onto the client computer. There, they locally perform computations and tasks, such as drawing images and creating animations. Therefore, there is no delay between the creation of an image and its display. Obviously, once an active Web page is downloaded onto the client computer, there is no reason for contacting the Web server again, unlike client pull. As a result, the client computer has full control in terms of displaying animations. Slow Internet connection speed does not matter. Even if this connection is slow, it will take a little longer to download the entire Web page, but once downloaded, the animation will look continuous and not jerky.


Fig. 9.60

Active Web pages and Java applets

Other uses of applets are still contemplated. For example, the income tax Web site could embed applets in Web pages that can be downloaded with the Web page to the client. The applet could then open up a spreadsheet where the user can enter all his tax data for the year, such as earnings, deductions, etc. The applet would then calculate the taxes based on the figures, and when final, the user could upload the spreadsheet back to the income tax site. This can significantly reduce the manual processing and paperwork involved for such purposes. Applets can be used in such e-commerce applications in future. Apart from this, applets can be used as explained in the chapter on web architectures.
2. Since the client computer takes responsibility of executing the program, the Web server is relieved of this job. This reduces the burden on the part of the Web server. Recall that this is in contrast to the dynamic Web pages. In case of dynamic Web pages, the program is executed at the Web server. So, if many users are accessing it, the Web server might not be able to serve all the requests very fast. In case of active Web pages, this is clearly the client's responsibility.

9.7 WHEN NOT TO USE ACTIVE WEB PAGES?

Active Web pages are mainly useful for client-side animation, as explained earlier. However, they are not useful when server-side programming is important. Server-side programming is useful for business rules checking, validations against some databases (e.g., referential integrity) unlike only local validations for which the client-side scripting can be used, database operations, etc. For instance, when a user enters his user id and password, active Web pages cannot be used because these details have to be validated against a database of valid user ids and passwords stored on the server. It is very important to understand the difference between dynamic and active Web pages. Dynamic Web pages are mainly used for server-side processing (although they allow client-side scripting for basic validations, etc.), whereas active Web pages have to be executed on the client browser entirely. However, the main processing is always done at the server side. For example, access/changes to the databases, validation routines (against databases), etc., must be executed on the server for reasons of security and bandwidth. However, in addition, the dynamic Web pages might add small client-side code for screen


validations (e.g., only 3 items can be selected), for which a trip to the server is not desirable. When the dynamic Web page is requested for by the browser, it is executed, and the resulting HTML code, along with this client-side script (in JavaScript/VBScript, etc.) as it is, is sent back to the client. At the client side, the browser interprets the HTML code and interprets the script. Active Web pages are mainly used for client-side execution of code, e.g., applets.

9.8 LIFE CYCLE OF JAVA APPLETS

An applet is a windows-based program (note that the term windows here should not be confused with Microsoft's Windows operating system; it actually means the windows that we see on the screen). Applets are event-driven—similar to the way an operating system has Interrupt Service Routines (ISR). An applet waits until a specific event happens. When such an event occurs, the applet receives intimation from the Java Virtual Machine (JVM) inside the browser. The applet then has to take an appropriate action and upon completion, give control back to the JVM. For example, when the user moves the mouse inside an applet window, the applet is informed that there is a mouse-move event. The applet may or may not take an action, depending on the purpose it was written for and also its code. The typical stages in the life cycle of an applet are given below.
1. When an applet needs to be executed for the first time, the init (), start () and paint () methods of the applet are called in the said sequence.
(a) The init () method is used to initialize variables or for any other start up processing. It is called only once during the lifetime of an applet.
(b) The start () method is called after init (). It is also called to restart an applet after it has been stopped. Whereas init () is called only once, start () is called every time the Web page containing the applet is displayed on the screen. Therefore, if a user leaves a Web page and comes back to it, the applet resumes execution at start ().
(c) The paint () method is called each time the applet's output must be redrawn. This can happen for a variety of reasons. For instance, windows of other applications can overwrite the window in which the applet is running, or the user might minimize and then restore the applet window. It also gets called when the applet begins execution for the very first time.
2. The stop () method is called when the user leaves the Web page containing the applet. This can happen when the user selects or types the URL of another Web page, for instance. The stop () method is used to suspend all the threads that are running for the applet. As we have seen, they can be restarted using the start () method if the user visits the Web page again.
3. The destroy () method is called when the environment determines that the applet needs to be removed completely from the client's memory. This method should then free all the resources used by the applet.
As in the case of a servlet, your applet need not use all these methods. The applet can use only the methods that are useful to it—others get inherited from the various Java classes anyway, and need not be overridden. So, one applet may use just the paint () method and leave everything else to the other default methods that are inherited from the applet's super classes. Let us take a simple applet example. Suppose our Web page contains the following code for executing an applet, as shown in Fig. 9.61.


...
<applet code="TestApplet.class">
</applet>
...


Fig. 9.61 HTML page containing an applet

As we know, when the Web server sends this HTML page to the client, it also sends the bytecode of an applet named TestApplet along with it. Let us take a look at the applet's code, as shown in Fig. 9.62. The applet sets the background colour to cyan, the foreground colour to red and displays a message that shows the order in which the init (), start () and paint () methods of an applet get called.

// As before, import statements are used to add the standard ready-made Java classes to your code.
import java.awt.*;
import java.applet.*;

public class TestApplet extends Applet {
    String msg;

    // public and void are Java keywords and can be ignored for the current discussion.
    public void init () {
        msg = "** In the init () method **";   // Initialize the string to be displayed.
    }

    public void start () {
        msg += "** In the start () method **";
    }

    // Display string in the applet window. Note that we have a parameter of the type
    // Graphics to this method. This is a ready-made object that contains various
    // graphics-related methods. For example, you can draw a window using a method of the
    // Graphics class.
    public void paint (Graphics g) {
        msg += "** In the paint () method **";
        g.drawString (msg, 10, 30);
    }
}

Fig. 9.62 Applet code When a client browser requests for this HTML page, the bytecode of the above applet would also be sent to the browser. There, the above code would execute. The init () method would be called first. It sets the variable msg to a message ** In the init () method **.


Next, the start () method gets called. It adds a string ** In the start () method ** to the msg variable. Thus, msg variable now contains ** In the init () method ** ** In the start () method **. Finally, the paint () method gets called. It adds its own message to the msg variable, making it ** In the init () method **** In the start () method **** In the paint () method **. It then prints it using the standard drawString () method provided by Java. Here, 10 and 30 indicate the pixel coordinates of the screen on which the value of msg is to be displayed. The stop () and destroy () methods are not used in this simple applet as they are not required here. The output of the applet would be: ** In the init () method **** In the start () method **** In the paint () method ** This is shown in Fig. 9.63.

Fig. 9.63 The output of the applet

The drawback of applets is that they make the overall execution slow. First, they need to be downloaded from the Web server, and then interpreted by the JVM installed in the Web browser. Therefore, the key is to keep them as small as possible.
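For completeness, an applet that overrides all five life cycle methods described in this section could look like the following sketch; the console messages are illustrative and simply trace when each method is invoked.

import java.applet.Applet;
import java.awt.Graphics;

public class LifeCycleApplet extends Applet {
    public void init()    { System.out.println("init: one-time initialization"); }
    public void start()   { System.out.println("start: page displayed (or revisited)"); }
    public void paint(Graphics g) { g.drawString("Applet life cycle demo", 10, 30); }
    public void stop()    { System.out.println("stop: user left the page, suspend work"); }
    public void destroy() { System.out.println("destroy: release all resources"); }
}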


SUMMARY
• The Java Web technologies have evolved over a number of years, starting with Java Servlets.
• Servlets are programs written in Java that execute on the Web server, in response to a request from the Web browser, to produce HTML content dynamically.
• Servlets are a bit tedious to write. Hence, Sun Microsystems came up with Java Server Pages (JSP).
• A JSP is a program written using HTML and Java languages, to produce and send back a Web page to the browser, in response to an HTTP request.
• JSPs are easier to write than Servlets.
• Over a period of time, Sun Microsystems came up with a model, which could use both Servlets as well as JSPs in a single application.
• JSP and Servlets provide support for all kinds of Java technologies, e.g., JDBC for database access, JavaBeans for easier programming, etc.
• The Enterprise JavaBeans (EJB) technology allows us to write server-side business components, which can be used by programs, such as Servlets and JSPs, as and when needed.
• EJB should be used only if the application is quite demanding or intensive in nature, needing performance, security, load balancing, etc.
• The technology of Java Server Faces (JSF) has been developed to allow Web programmers to write extensive validations and to provide a richer interface to the end users.
• JSF allows the developers to write validations and perform other interface-related activities without writing too much complex code, but instead, to do this in a declarative fashion.
• The technology of Struts is another technology for developing Web applications rapidly. It allows the developers to quickly develop a Web-based application containing business rules and navigation.

REVIEW QUESTIONS

Multiple-choice Questions
1. The ________ is usually made up of a Web browser, which means it can primarily deal with HTML pages and JavaScript.
   (a) client tier (b) server tier (c) Internet layer (d) proxy server
2. Support for Web Services is provided by the ________ APIs.
   (a) TAPI (b) JAX-WS (c) SOAP (d) JAXB
3. A Servlet runs inside ________.
   (a) Servlet container (b) browser (c) applet (d) Web-enabled browser
4. A Servlet container is the ________ environment for a Java Servlet.
   (a) working (b) browsing (c) hosting and execution (d) broadcasting


5. A JSP page is composed of ________.
   (a) directives (b) scripting elements (c) actions (d) templates
6. The following snippet of code <%= new java.util.Date ( ) %> is ________.
   (a) comment (b) expression (c) variable (d) condition
7. The ________ object is used to read the values of the HTML form in a JSP, received as a part of the HTTP request sent by the client to the server.
   (a) Page (b) Request (c) Response (d) PageContext
8. ________ is the pipe between a Java program and the RDBMS.
   (a) Response (b) Object (c) Page (d) Connection object
9. ________ is used to define and execute dynamic SQL statements.
   (a) PreparedStatement (b) CallableStatement (c) ResultSet object (d) Page
10. The ________ is used to prepare and send the resulting output back to the user.
   (a) model (b) view (c) controller (d) JavaScript

Detailed Questions
1. Discuss in detail Sun's Java server architecture.
2. What is a Servlet? Explain how a Servlet is processed.
3. What are the elements of a JSP page?
4. Write a Servlet which will accept a name and password in a form, compare both against the database, and display success or failure.
5. Why is session management required in JSP/Servlets?
6. Write a JSP scriptlet for displaying even numbers between 1 to 50, and also its JSTL version.
7. How do transactions in JDBC happen?
8. Discuss MVC architecture in detail.
9. Explain JSF in detail and discuss how it affects Web development.
10. Write a Java program which will display a Java applet showing a Welcome to Applet message.

Exercises
1. Find out the different Servlet containers available. Study their features and also their differences.
2. Examine how real-life Web sites perform data entry validations.
3. Evaluate the various Java Integrated Development Environments (IDEs), such as NetBeans, Eclipse, and JDeveloper.
4. Mention all the plug-ins available with Apache MyFaces.
5. Explain where MVC can be used in real-life Web sites (e.g., Amazon or ICICI Bank).


Web Security

Chapter

10

INTRODUCTION

Most initial computer applications had no, or at best, very little security. This continued for a number of years until the importance of data was truly realized. Until then, computer data was considered to be useful, but not something to be protected. When computer applications were developed to handle financial and personal data, an urgent need for security was felt. People realized that data on computers is an extremely important aspect of modern life. Therefore, various mechanisms to maintain security began to gain prominence. Two typical examples of such security mechanisms were as follows.

• Provide a user id and password to every user, and use that information to authenticate a user.
• Encode information stored in the databases in some fashion, so that it is not visible to users who do not have the right permissions.

Organizations employed their own mechanisms in order to provide for these kinds of basic security mechanisms. As technology improved, the communication infrastructure became extremely mature, and newer applications began to be developed for various demands and needs. Soon, people realized that the basic security measures were not quite enough. Furthermore, the Internet took the world by storm, and there were many examples of what could happen if there was insufficient security built in applications developed for the Internet. Figure 10.1 shows such an example of what can happen when you use your credit card for making purchases over the Internet. From the user's computer, the details such as user id, order details such as order id and item id, and payment details such as credit card information travel across the Internet to the server (i.e., to the merchant's computer). The merchant's server stores these details in its database. There are various security holes here. First of all, an intruder can capture the credit card details as they travel from the client to the server. If we somehow protect this transit from an intruder's attack, it still does not solve our problem. Once the merchant receives the credit card details and validates them so as to process the order and later obtain payments, the merchant stores the credit card details into its database. Now, an attacker can simply succeed in accessing this database, and therefore, gain access to all the credit card numbers stored therein! One Russian attacker (called Maxim) actually managed to intrude into a merchant Internet site and obtained 300,000 credit card numbers from its database. He then attempted extortion by demanding protection money ($100,000) from the merchant. The merchant refused to oblige. Following this, the attacker published


about 25,000 of the credit card numbers on the Internet! Some banks reissued all the credit cards at a cost of $20 per card, and others forewarned their customers about unusual entries in their statements.

Fig. 10.1

Example of information travelling from a client to a server over the Internet

Such attacks could obviously lead to great losses—both in terms of finance and goodwill. Generally, it takes $20 to replace a credit card. Therefore, if a bank has to replace 300,000 such cards, the total cost of such an attack is about $6 million! How nice it would have been, if the merchant in the example just discussed had employed proper security measures! Of course, this is just one example. Several such cases have been reported in the last few months, and the need for proper security is being felt increasingly with every such attack. For example, in 1999, a Swedish hacker broke into Microsoft's Hotmail Web site, and created a mirror site. This site allowed anyone to enter any Hotmail user's email id, and read her emails! In 1999, two independent surveys were conducted to invite people's opinions about the losses that occur due to successful attacks on security. One survey pegged an average loss of $256,296 per incident, and the other survey reported an average loss of $759,380 per incident. Next year, this figure rose to $972,857!

10.1 PRINCIPLES OF SECURITY

Having discussed some of the attacks that have occurred in real life, let us now classify the principles related to security. This will help us understand the attacks better, and also help us in thinking about the possible solutions to tackle them. We shall take an example to understand these concepts. Let us assume that a person A wants to send a check worth $100 to another person B. Normally, what are the factors that A and B will think of, in such a case? A will write the check for $100, put it inside an envelope, and send it to B.
• A will like to ensure that no one except B gets the envelope, and even if someone else gets it, that person does not come to know about the details of the check. This is the principle of confidentiality.
• A and B will further like to make sure that no one can tamper with the contents of the check (such as its amount, date, signature, name of the payee, etc.). This is the principle of integrity.
• B would like to be assured that the check has indeed come from A, and not from someone else posing as A (as it could be a fake check in that case). This is the principle of authentication.


• What will happen tomorrow if B deposits the check in her account, the money is transferred from A's account to B's account, and then A refuses having written/sent the check? The court of law will use A's signature to disallow A to refute this claim, and settle the dispute. This is the principle of non-repudiation.

These are the four chief principles of security. There are two more, access control and availability, which are not related to a particular message, but are linked to the overall system as a whole. We shall discuss all these security principles in the next few sections.

10.1.1 Confidentiality

The principle of confidentiality specifies that only the sender and the intended recipient(s) should be able to access the contents of a message. Confidentiality gets compromised if an unauthorized person is able to access a message. An example of compromising the confidentiality of a message is shown in Fig. 10.2. Here, the user of computer A sends a message to the user of computer B. (Actually, from here onwards, we shall use the term A to mean the user A, B to mean the user B, etc., although we shall just show the computers of A, B, etc.) Another user C gets access to this message, which is not desired, and therefore, defeats the purpose of confidentiality. An example of this could be a confidential email message sent by A to B, which is accessed by C without the permission or knowledge of A and B. This type of attack is called as interception.

Fig. 10.2

Loss of confidentiality

Interception causes loss of message confidentiality.

10.1.2 Authentication

Authentication mechanisms help establish proof of identities. The authentication process ensures that the origin of an electronic message or document is correctly identified. For instance, suppose that user C sends an electronic document over the Internet to user B. However, the trouble is that C had posed as A when she sent this document to B. How would B know that the message has come from C, who is posing as A? A real-life example of this could be the case of a user C, posing as A, sending a funds transfer request (from A's account to C's account) to bank B. The bank might happily transfer the funds from A's account to C's account—after all, it would think that A has requested for the funds transfer! This concept is shown in Fig. 10.3. This type of attack is called as fabrication. Fabrication is possible in absence of proper authentication mechanisms.


Fig. 10.3 Absence of authentication

10.1.3 Integrity

When the contents of a message are changed after the sender sends it, but before it reaches the intended recipient, we say that the integrity of the message is lost. For example, suppose you write a check for $100 to pay for the goods bought from the US. However, when you see your next account statement, you are startled to see that the check resulted in a payment of $1000! This is a case of loss of message integrity. Conceptually, this is shown in Fig. 10.4. Here, C tampers with a message originally sent by A, which is actually destined for B. C somehow manages to access it, change its contents, and send the changed message to B. B has no way of knowing that the contents of the message were changed after A had sent it. A also does not know about this change. This type of attack is called as modification.

Fig. 10.4

Loss of integrity

Modification causes loss of message integrity.

10.1.4 Non-repudiation

There are situations where a user sends a message, and later on refuses that she had sent that message. For instance, user A could send a funds transfer request to bank B over the Internet. After the bank performs the funds transfer as per A's instructions, A could claim that she never sent the funds transfer instruction to the bank! Thus, A repudiates, or denies, her funds transfer instruction. The principle of non-repudiation defeats such possibilities of denying instructions, once sent. This is shown in Fig. 10.5.


Fig. 10.5

Establishing non-repudiation

Non-repudiation does not allow the sender of a message to refute the claim of not sending that message.

10.1.5 Access Control

The principle of access control determines who should be able to access what. For instance, we should be able to specify that user A can view the records in a database, but cannot update them. However, user B might be allowed to make updates as well. An access control mechanism can be set up to ensure this. Access control is broadly related to two areas, role management and rule management. Role management concentrates on the user side (which user can do what), whereas rule management focuses on the resources side (which resource is accessible, and under what circumstances). Based on the decisions taken here, an access control matrix is prepared, which lists the users against a list of items they can access (e.g., it can say that user A can write to file X, but can only update files Y and Z). An Access Control List (ACL) is a subset of an access control matrix. Access control specifies and controls who can access what.
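To make the idea of an access control matrix concrete, the following sketch models one row of such a matrix in plain Java; the user names, file names, and permissions are illustrative assumptions based on the example above.

import java.util.*;

public class AccessControlDemo {
    // One row of the access control matrix per user: resource -> allowed operations.
    private static final Map<String, Map<String, Set<String>>> matrix =
            new HashMap<String, Map<String, Set<String>>>();

    static {
        Map<String, Set<String>> rowA = new HashMap<String, Set<String>>();
        rowA.put("fileX", new HashSet<String>(Arrays.asList("write")));
        rowA.put("fileY", new HashSet<String>(Arrays.asList("update")));
        matrix.put("A", rowA);
    }

    static boolean isAllowed(String user, String resource, String operation) {
        Map<String, Set<String>> row = matrix.get(user);
        if (row == null || row.get(resource) == null) {
            return false;
        }
        return row.get(resource).contains(operation);
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("A", "fileX", "write"));   // true
        System.out.println(isAllowed("A", "fileY", "write"));   // false: A may only update fileY
    }
}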

10.1.6 Availability

The principle of availability states that resources (i.e., information) should be available to authorized parties at all times. For example, due to the intentional actions of another unauthorized user C, an authorized user A may not be able to contact a server computer B, as shown in Fig. 10.6. This would defeat the principle of availability. Such an attack is called as interruption.

Fig. 10.6

Attack on availability

Interruption puts the availability of resources in danger. We may be aware of the traditional OSI standard for Network Model (titled OSI Network Model 7498-1), which describes the seven layers of the networking technology (application, presentation, session, transport,


network, data link, and physical). A lesser known standard on similar lines is the OSI standard for Security Model (titled OSI Security Model 7498-2). This also defines seven layers of security in the form of:

• Authentication
• Access control
• Non-repudiation
• Data integrity
• Confidentiality
• Assurance or Availability
• Notarization or Signature

10.1.7 Specific Attacks

Sniffing and spoofing On the Internet, computers exchange messages with each other in the form of small groups of data, called as packets. A packet, like a postal envelope, contains the actual data to be sent, and the addressing information. Attackers target these packets, as they travel from the source computer to the destination computer over the Internet. These attacks take two main forms: (a) Packet sniffing (also called as snooping) and (b) Packet spoofing. Since the protocol used in this communication is called as Internet Protocol (IP), other names for these two attacks are (a) IP sniffing and (b) IP spoofing. The meaning remains the same. Let us discuss these two attacks.

Packet sniffing Packet sniffing is a passive attack on an ongoing conversation. An attacker need not hijack a conversation, but instead, can simply observe (i.e., sniff) packets as they pass by. Clearly, to prevent an attacker from sniffing packets, the information that is passing needs to be protected in some ways. This can be done at two levels: (i) the data that is travelling can be encoded in some ways, or (ii) the transmission link itself can be encoded. To read a packet, the attacker somehow needs to access it in the first place. The simplest way to do this is to control a computer via which the traffic goes through. Usually, this is a router. However, routers are highly protected resources. Therefore, an attacker might not be able to attack it, and instead, attack a less-protected computer on the same path.

Packet spoofing In this technique, an attacker sends packets with an incorrect source address. When this happens, the receiver (i.e., the party who receives these packets containing false address) would inadvertently send replies back to this forged address (called as spoofed address), and not to the attacker. This can lead to three possible cases. (i) The attacker can intercept the reply If the attacker is between the destination and the forged source, the attacker can see the reply and use that information for hijacking attacks. (ii) The attacker need not see the reply If the attacker’s intention was a Denial Of Service (DOS) attack, the attacker need not bother about the reply. (iii) The attacker does not want the reply The attacker could simply be angry with the host, so it may put that host’s address as the forged source address and send the packet to the destination. The attacker does not want a reply from the destination, as it wants the host with the forged address to receive it and get confused.

Phishing Phishing has become a big problem in recent times. In 2004, the estimated losses due to phishing were to the tune of USD 137 million, according to Tower Group. Attackers set up fake Web sites, which look like real Web sites. It is quite simple to do so, since creating Web pages involves relatively simple technologies


such as HTML, JavaScript, CSS (Cascading Style Sheets), etc. Learning and using these technologies is quite simple. The attacker's modus operandi works as follows.
1. The attacker decides to create her own Web site, which looks very identical to a real Web site. For example, the attacker can clone Citibank's Web site. The cloning is so clever that the human eye will not be able to distinguish between the real (Citibank's) and fake (attacker's) sites now.
2. The attacker can use many techniques to attack the bank's customers. We illustrate the most common one, below. The attacker sends an email to the legitimate customers of the bank. The email itself appears to have come from the bank. For ensuring this, the attacker exploits the email system to suggest that the sender of the email is some bank official (e.g., [email protected]). This fake email warns the user that there has been some sort of attack on Citibank's computer systems and that the bank wants to issue new passwords to all its customers, or verify their existing PINs, etc. For this purpose, the customer is asked to visit a URL mentioned in the same email. This is conceptually shown in Fig. 10.7.

Fig. 10.7 Attacker sends a forged email to the innocent victim (customer)

3. When the customer (i.e., the victim) innocently clicks on the URL specified in the email, she is taken to the attacker's site, and not the bank's original site. There, the customer is prompted to enter confidential information, such as her password or PIN. Since the attacker's fake site looks exactly like the original bank site, the customer provides this information. The attacker gladly accepts this information and displays a Thank you to the unsuspecting victim. In the meanwhile, the attacker now uses the victim's password or PIN to access the bank's real site and can perform any transaction by posing as the customer! A real-life example of this kind of attack is reproduced below from the site http://www.fraudwatchinternational.com.


Figure 10.8 shows a fake email sent by an attacker to an authorized PayPal user.

Fig. 10.8 Fake email from the attacker to a PayPal user

As we can see, the attacker is trying to fool the PayPal customer into revealing her credit card details. Quite clearly, the aim of the attacker is to access the credit card information of the customer and then misuse it. Figure 10.9 shows the screen that appears when the user clicks on the URL specified in the fake email. Once the user provides these details, the attacker's job is easy! She simply uses these credit card details to make purchases on behalf of the cheated card holder!

Pharming (DNS spoofing) Another attack, known earlier as DNS spoofing or DNS poisoning, is now called as the pharming attack. As we know, using the Domain Name System (DNS), people can identify Web sites with human-readable names (such as www.yahoo.com), and computers can continue to treat them as IP addresses (such as 120.10.81.67). For this, a special server computer called as a DNS server maintains the mappings between domain names and the corresponding IP addresses. The DNS server could be located anywhere. Usually, it is with the Internet Service Provider (ISP) of the users. With this background, the DNS spoofing attack works as follows.
1. Suppose that there is a merchant (Bob), whose site's domain name is www.bob.com, and the IP address is 100.10.10.20. Therefore, the DNS entry for Bob in all the DNS servers is maintained as follows:
www.bob.com 100.10.10.20


Fig. 10.9 Fake PayPal site asking for the user's credit card details


2. The attacker (say, Trudy) manages to hack and replace the IP address of Bob with her own (say 100.20.20.20) in the DNS server maintained by the ISP of a user, say, Alice. Therefore, the DNS server maintained by the ISP of Alice now has the following entry:
www.bob.com 100.20.20.20
Thus, the contents of the hypothetical DNS table maintained by the ISP would be changed. A hypothetical portion of this table (before and after the attack) is shown in Fig. 10.10.

Fig. 10.10

Effect of the DNS attack

3. When Alice wants to communicate with Bob’s site, her Web browser queries the DNS server maintained by her ISP for Bob’s IP address, providing it the domain name (i.e., www.bob.com). Alice gets the replaced (i.e., Trudy’s) IP address, which is 100.20.20.20. 4. Now, Alice starts communicating with Trudy, believing that she is communicating with Bob! Such attacks of DNS spoofing are quite common, and cause a lot of havoc. Even worse, the attacker (Trudy) does not have to listen to the conversation on the wire! She has to simply be able to hack the DNS server of the ISP and replace a single IP address with her own! A protocol called as DNSSec (Secure DNS) is being used to thwart such attacks. However, unfortunately it is not widely used.
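Part of the reason pharming is so effective is that applications simply trust whatever IP address the resolver hands back for a domain name; they have no independent way of checking it. The short sketch below shows such a lookup using the standard java.net API, with the chapter's hypothetical merchant name used as the host.

import java.net.InetAddress;

public class DnsLookupDemo {
    public static void main(String[] args) throws Exception {
        // The JVM delegates this lookup to the configured resolver/DNS server.
        // If that server's entry has been poisoned, the returned address is the
        // attacker's, and all subsequent traffic from this program goes there.
        InetAddress address = InetAddress.getByName("www.bob.com");
        System.out.println(address.getHostAddress());
    }
}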

10.2 CRYPTOGRAPHY

This chapter introduces the basic concepts of cryptography. Although this word sounds discouraging, we shall realize that it is very simple to understand. In fact, most terms in computer security have a very straightforward meaning. Many terms, for no reason, sound complicated. Our aim will be to demystify all such terms in relation to cryptography in this chapter. After we are through with this chapter, we shall be ready to understand computer-based security solutions and issues that follow in later chapters.

Cryptography is the art of achieving security by encoding messages to make them non-readable. Figure 10.11 shows the conceptual view of cryptography. Some more terms need to be introduced in this context.

Cryptanalysis is the technique of decoding messages from a non-readable format back to a readable format without knowing how they were initially converted from readable format to non-readable format. In other words, it is like breaking a code. This concept is shown in Fig. 10.12.

Cryptology is a combination of cryptography and cryptanalysis. This concept is shown in Fig. 10.13.


In the early days, cryptography used to be performed by using manual techniques. The basic framework of performing cryptography has remained more or less the same, of course, with a lot of improvements in the actual implementation. More importantly, computers now perform these cryptographic functions/algorithms, thus making the process a lot faster and more secure. This chapter, however, discusses the basic methods of achieving cryptography without referring to computers.

Fig. 10.11

Cryptographic system

Fig. 10.12

Fig. 10.13

Cryptanalysis

Cryptography + Cryptanalysis = Cryptology

The basic concepts in cryptography are introduced first. We then proceed to discuss how we can make messages illegible, and thus, secure. This can be done in many ways. We discuss all these approaches in this chapter. Modern computer-based cryptography solutions have actually evolved based on these premises. This chapter touches upon all these cryptography algorithms. We also discuss the relative advantages and disadvantages of the various algorithms, as and when applicable. Some cryptography algorithms are very trivial to understand, replicate, and therefore, crack. Some other cryptography algorithms are highly complicated, and therefore, difficult to crack. The rest are somewhere in the middle.


10.3 PLAIN TEXT AND CIPHER TEXT

Any communication in the language that you and I speak—that is, in the human language, takes the form of plain text or clear text. That is, a message in plain text can be understood by anybody knowing the language, as long as the message is not codified in any manner. For instance, when we speak with our family members, friends or colleagues, we use plain text because we do not want to hide anything from them. Suppose I say "Hi Anita", it is plain text because both Anita and I know its meaning and intention. More significantly, anybody in the same room would also get to hear these words, and would know that I am greeting Anita. Notably, we also use plain text during electronic conversations. For instance, when we send an email to someone, we compose the email message using English (or, these days, some other) language. For instance, I can compose the email message as shown in Fig. 10.14.

Hi Amit,
Hope you are doing fine. How about meeting at the train station this Friday at 5 pm? Please let me know if it is ok with you.
Regards.
Atul

Fig. 10.14 Example of a plain text message

Now, not only Amit, but also any other person who reads this email would know what I have written. As before, this is simply because I am not using any codified language here. I have composed my email message using plain English. This is another example of plain text, albeit in written form. Clear text or plain text signifies a message that can be understood by the sender, the recipient, and also by anyone else who gets access to that message. In normal life, we do not bother much about the fact that someone could be overhearing us. In most cases, that makes little difference to us because the person overhearing us can do little damage by using the overheard information. After all, we do not reveal many secrets in our day-to-day lives. However, there are situations where we are concerned about the secrecy of our conversations. For instance, suppose that I am interested in knowing my bank account's balance and hence I call up my phone banker from my office. The phone banker would generally ask a secret question (e.g., What is your mother's maiden name?) whose answer only I know. This is to ascertain that someone else is not posing as me. Now, when I give the answer to the secret question (e.g., Leela), I generally speak in a low voice, or better yet, initially call up from a phone that is isolated. This ensures that only the intended recipient (the phone banker) gets to know the correct answer. On the same lines, suppose that my email to my friend Amit shown earlier is confidential for some reason. Therefore, I do not want anyone else to understand what I have written even if she is able to access the email by using some means, before it reaches Amit. How do I ensure this? This is exactly the problem that small children face. Many times, they want to communicate in such a manner that their little secrets are hidden from the elderly. What do they do in order to achieve this? Usually the simplest trick that they use is a code language. For instance, they replace each alphabet in their conversation with another character. As an example, they replace each alphabet with the alphabet that is actually three alphabets down the order. So, each A will be replaced by D, B will be replaced by E, C will be replaced by F, and so on. To complete the cycle, each W will

Web Security

351 be replaced by Z, each X will be replaced by A, each Y will be replaced by B and each Z will be replaced by C. We can summarize this scheme as shown in Fig. 10.15. The first row shows the original alphabets, and the second row shows what each original alphabet will be replaced with.

Fig. 10.15

A scheme for codifying messages by replacing each alphabet with an alphabet three places down the line

Thus, using the scheme of replacing each alphabet with the one that is three places down the line, a message I love you shall become L ORYH BRX, as shown in Fig. 10.16.

Fig. 10.16 Codification using the alphabet replacement scheme

Of course, there can be many variants of such a scheme. It is not necessary to replace each alphabet with the one that is three places down the order. It can be the one that is four, five or more places down the order. The point is, however, that each alphabet in the original message can be replaced by another to hide the original contents of the message. The codified message is called as cipher text. Cipher means a code or a secret message. When a plain text message is codified using any suitable scheme, the resulting message is called as cipher text. Based on these concepts, let us put these into a diagrammatic representation, as shown in Fig. 10.17.

Fig. 10.17 Elements of a cryptographic operation

Let us now write our original email message and the resulting cipher text by using the alphabet-replacing scheme, as shown in Fig. 10.18. This will clarify the idea further.

Plain text message:

Hi Amit,

Hope you are doing fine. How about meeting at the train station this Friday at 5 pm? Please let me know if it is ok with you.

Regards.
Atul

Corresponding cipher text message:

Kl Dplw,

Krsh brx duh grlqj ilqh. Krz derxw phhwlqj dw wkh wudlq vwdwlrq wklv Iulgdb dw 5 sp? Sohdvh ohw ph nqrz li lw lv rn zlwk brx.

Uhjdugv.
Dwxo

Fig. 10.18 Example of a plain text message being transformed into cipher text
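The substitution scheme just illustrated is easy to try out in code. The following is a minimal Java sketch (the class and method names are our own, not from the text) that shifts every letter three places down the alphabet, wrapping around from X, Y, Z to A, B, C as in Fig. 10.15; digits, spaces and punctuation are left unchanged.

    public class ShiftCipher {
        // Replace each letter with the letter three places down the order (A -> D, ..., X -> A).
        public static String encode(String plainText) {
            StringBuilder cipherText = new StringBuilder();
            for (char ch : plainText.toCharArray()) {
                if (ch >= 'A' && ch <= 'Z') {
                    cipherText.append((char) ('A' + (ch - 'A' + 3) % 26));
                } else if (ch >= 'a' && ch <= 'z') {
                    cipherText.append((char) ('a' + (ch - 'a' + 3) % 26));
                } else {
                    cipherText.append(ch);   // leave digits, spaces and punctuation as they are
                }
            }
            return cipherText.toString();
        }

        public static void main(String[] args) {
            System.out.println(encode("Hi Amit"));   // prints: Kl Dplw
        }
    }

Here the case of each letter is preserved, exactly as in Fig. 10.18; the version in Fig. 10.16 simply shows the cipher text in upper case.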

10.3.1 Types of Cryptography

Based on the number of keys used for encryption and decryption, cryptography can be classified into two categories.

Symmetric key encryption Also called as secret key encryption, in this scheme, only one key is used and the same key is used for encryption and decryption of messages. Obviously, both the parties must agree upon the key before any transmission begins, and nobody else should know about it. The example in Fig. 10.19 shows how symmetric cryptography works. Basically at the sender’s end, the key changes the original message to an encoded form. At the receiver’s end, the same key is used to decrypt the encoded message, thus deriving the original message out of it. IBM’s Data Encryption Standard (DES) uses this approach. It uses 56-bit keys for encryption.

Fig. 10.19

Symmetric key encryption

In practical situations, symmetric key encryption has a number of problems. One problem is that of key agreement and distribution. In the first place, how do two parties agree on a key? One way is for somebody from the sender's side (say A) to physically visit the receiver (say B) and hand over the key. Another way is to courier a paper on which the key is written. Neither way is very convenient. A third way is to send the key over the network to B and ask for a confirmation. But then, if an intruder intercepts that message, he can interpret all the subsequent ones!

The second problem is more complex. Since the same key is used for encryption and decryption, one key is required per pair of communicating parties. Suppose A wants to securely communicate with B and also with C. Clearly, there must be one key for all communications between A and B, and there must be another, distinct key for all communications between A and C. The same key as used by A and B cannot be used for communications between A and C. Otherwise, there is a chance that C can interpret messages going between A and B, or B can do the same for messages going between A and C! Since the Internet has thousands of merchants selling products to hundreds of thousands of buyers, using this scheme would be impractical, because every buyer-seller combination would need a separate key!

DES has been found to be vulnerable. Therefore, better symmetric key algorithms have been proposed and are in active use. One way is simply to use DES twice with two different keys (called as DES-2). A still stronger mechanism is DES-3, wherein key-1 is used to encrypt first, key-2 (a different key) is used to re-encrypt the encrypted block, and key-1 is used once again to re-encrypt the doubly encrypted block. DES-3 is quite popular and is in wide use. Other popular algorithms are IDEA, RC5, RC2, etc.
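As an illustration, the short Java sketch below uses the JDK's standard javax.crypto classes with AES, a modern symmetric algorithm (substitute "DESede" for DES-3 if you want to mirror the discussion above); this is only a sketch of the idea, and the message text is our own example. The single generated key must somehow be shared by both parties, which is exactly the key-distribution problem just described.

    import javax.crypto.Cipher;
    import javax.crypto.KeyGenerator;
    import javax.crypto.SecretKey;

    public class SymmetricDemo {
        public static void main(String[] args) throws Exception {
            // Both the sender and the receiver must somehow share this one key.
            KeyGenerator keyGen = KeyGenerator.getInstance("AES");
            keyGen.init(128);
            SecretKey sharedKey = keyGen.generateKey();

            // Default mode here is ECB with padding, which is fine for illustration;
            // real systems use an explicit mode and initialization vector.
            Cipher cipher = Cipher.getInstance("AES");

            // Sender's end: encrypt with the shared key.
            cipher.init(Cipher.ENCRYPT_MODE, sharedKey);
            byte[] cipherText = cipher.doFinal("Hi Amit, see you at 5 pm.".getBytes("UTF-8"));

            // Receiver's end: decrypt with the same key.
            cipher.init(Cipher.DECRYPT_MODE, sharedKey);
            System.out.println(new String(cipher.doFinal(cipherText), "UTF-8"));
        }
    }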

Asymmetric key encryption This is a better scheme and is also called as public key encryption. In this type of cryptography, two different keys (called as a key pair) are used. One key is used for encryption, and only the other key can be used for decryption. No other key can decrypt the message, not even the original (i.e., the first) key used for encryption! The beauty of this scheme is that every communicating party needs just a key pair for communicating with any number of other communicating parties. Once someone obtains a key pair, he can communicate with anyone else on the Internet in a secure manner, as we shall see.

There is a simple mathematical basis for this scheme. If you have an extremely huge number that has only two prime factors, you can generate a pair of keys. For example, consider the number 10. The number 10 has only two prime factors, 5 and 2. If you apply 5 as an encryption factor, only 2 can be used as the decryption factor. Nothing else, not even 5 itself, can do the decryption. Of course, 10 is a very small number. Therefore, with a small effort, this scheme can be broken. However, if the number is huge, even years of computation cannot break the scheme.

One of the two keys is called the public key and the other is the private key. Suppose you want to communicate over a computer network such as the Internet in a secure manner. You would need to obtain a public key and a private key. You can generate these keys using standard algorithms. The private key remains with you as a secret. You must not disclose your private key to anybody. However, the public key is for the general public. It is disclosed to all parties that you want to communicate with. In this scheme, in fact, each party or node publishes his public key. Using this, a directory can be constructed, where the various parties or nodes (i.e., their user ids) and their corresponding public keys are maintained. One can consult this directory and get the public key of any party that one wishes to communicate with, by a simple table search.

Suppose A wants to send a message to B without having to worry about its security. Then, A and B should each have a private key and a public key.

A's private key should be known only to A. However, A's public key should be known to B.
Only B should know B's private key. However, A should know B's public key.

How this works is simple.
1. When A wants to send a message to B, A encrypts the message using B's public key. This is possible because A knows B's public key.
2. A sends this message (encrypted using B's public key) to B.
3. B decrypts A's message using his private key. Note that only B knows his private key. Thus, no one else can make any sense out of the message even if one manages to intercept it. This is because the intruder (hopefully) does not know B's private key. It is only B's private key that can decrypt the message.
4. When B wants to send a message to A, exactly the reverse steps take place. B encrypts the message using A's public key. Therefore, only A can decrypt the message back to its original form, using his private key.

This is shown in Fig. 10.20.

Fig. 10.20

Public key encryption

This can be shown in another way. For instance, suppose a bank needs to accept many requests for transactions from its customers. Then, the bank can have a private key—public key pair. The bank can then publish its public key to all its customers. The customers can use this public key of the bank for encrypting messages before they send them to the bank. The bank can decrypt all these encrypted messages with its private key, which remains with itself. This is shown in Fig. 10.21.

Fig. 10.21 The use of a public key-private key pair by a bank
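A hypothetical Java sketch of the same idea, using the JDK's built-in RSA support: the bank would publish the public key, keep the private key to itself, and any customer could then encrypt a short message that only the bank can decrypt. (With plain RSA, the padding limits how much data one operation can encrypt; real systems encrypt only a symmetric session key this way, as the digital envelope discussion later in this chapter explains. The message text below is just an example.)

    import java.security.KeyPair;
    import java.security.KeyPairGenerator;
    import javax.crypto.Cipher;

    public class AsymmetricDemo {
        public static void main(String[] args) throws Exception {
            // The bank generates its key pair once; the public key is published.
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            KeyPair bankKeys = generator.generateKeyPair();

            // Customer's end: encrypt with the bank's public key.
            Cipher cipher = Cipher.getInstance("RSA");
            cipher.init(Cipher.ENCRYPT_MODE, bankKeys.getPublic());
            byte[] cipherText = cipher.doFinal("Transfer Rs 5000 to Amit".getBytes("UTF-8"));

            // Bank's end: only the matching private key can decrypt.
            cipher.init(Cipher.DECRYPT_MODE, bankKeys.getPrivate());
            System.out.println(new String(cipher.doFinal(cipherText), "UTF-8"));
        }
    }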


10.4 DIGITAL CERTIFICATES

10.4.1 Introduction
We have discussed the problem of key agreement or key exchange in great detail. We have also seen how even an algorithm such as Diffie-Hellman Key Exchange, designed specifically to tackle this problem, has its own pitfalls. Asymmetric key cryptography can be a very good solution. But it also has one unresolved issue: how do the parties (i.e., the sender and the receiver of a message) exchange their public keys with each other? Obviously, they cannot exchange them openly; this can very easily lead to a man-in-the-middle attack on the public key itself! This problem of key exchange or key agreement is, therefore, quite severe, and in fact, is one of the most difficult challenges to tackle in designing any computer-based cryptographic solution. After a lot of thought, this problem was resolved with the revolutionary idea of digital certificates. We shall study this in great detail.

Conceptually, we can compare digital certificates to documents such as our passports or driving licences. A passport or a driving licence helps in establishing our identity. For instance, my passport proves beyond doubt a variety of aspects, the most important ones being:

My full name
My nationality
My date and place of birth
My photograph and signature

Likewise, my digital certificate would also prove something very critical, as we shall study.

10.4.2 The Concept of Digital Certificates

A digital certificate is simply a small computer file. For example, my digital certificate would actually be a computer file with a file name such as atul.cer (where .cer signifies the first three characters of the word certificate; of course, this is just an example, and in actual practice the file extensions can be different). Just as my passport signifies the association between me and my other characteristics, such as full name, nationality, date and place of birth, photograph and signature, my digital certificate simply signifies the association between my public key and me. This concept of digital certificates is shown in Fig. 10.22. Note that this is merely a conceptual view, and does not depict the actual contents of a digital certificate.

Fig. 10.22 Conceptual view of a digital certificate

We have not specified who officially approves the association between a user and the user's digital certificate. Obviously, it has to be some authority in which all the concerned parties have a great amount of trust and belief. Imagine a situation where our passports are not issued by a government office, but by an ordinary shopkeeper. Would we trust those passports? Similarly, digital certificates must be issued by some trusted entity; otherwise, we will not trust anybody's digital certificate. As we have noted, a digital certificate establishes the relation between a user and her public key. Therefore, a digital certificate must contain the user name and the user's public key. This will prove that a particular public key belongs to a particular user. Apart from this, what does a digital certificate contain? A simplified view of a sample digital certificate is shown in Fig. 10.23.

Fig. 10.23 Example of a digital certificate

We will notice a few interesting things here. First of all, my name is shown as the subject name. In fact, any user's name in a digital certificate is always referred to as the subject name (this is because a digital certificate can be issued to an individual, a group or an organization). Also, there is another interesting piece of information, called as the serial number. We shall see what it means in due course of time. The certificate also contains other pieces of information, such as the validity date range for the certificate, and who has issued it (the issuer name). Let us try to understand the meanings of these pieces of information by comparing them with the corresponding entries in my passport. This is shown in Table 10.1.

Table 10.1 Similarities between a passport and a digital certificate

Passport entry              Corresponding digital certificate entry
Full name                   Subject name
Passport number             Serial number
Valid from                  Same
Valid to                    Same
Issued by                   Issuer name
Photograph and signature    Public key

As the table shows, a digital certificate is actually quite similar to a passport. Just as every passport has a unique passport number, every digital certificate has a unique serial number. As we know, no two passports issued by the same issuer (i.e., the government) can have the same passport number. Similarly, no two digital certificates issued by the same issuer can have the same serial number. Who can issue these digital certificates? We shall soon answer this question.
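Incidentally, the fields compared in Table 10.1 can be seen directly by loading a certificate file in Java. The sketch below assumes a certificate file named atul.cer (the example name used above) is present in the current directory; any .cer file found as suggested in the exercises at the end of this chapter would do.

    import java.io.FileInputStream;
    import java.security.cert.CertificateFactory;
    import java.security.cert.X509Certificate;

    public class ShowCertificate {
        public static void main(String[] args) throws Exception {
            try (FileInputStream in = new FileInputStream("atul.cer")) {
                CertificateFactory factory = CertificateFactory.getInstance("X.509");
                X509Certificate cert = (X509Certificate) factory.generateCertificate(in);

                System.out.println("Subject name : " + cert.getSubjectDN());
                System.out.println("Serial number: " + cert.getSerialNumber());
                System.out.println("Valid from   : " + cert.getNotBefore());
                System.out.println("Valid to     : " + cert.getNotAfter());
                System.out.println("Issuer name  : " + cert.getIssuerDN());
                System.out.println("Public key   : " + cert.getPublicKey().getAlgorithm());
            }
        }
    }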

10.4.3 Certification Authority (CA)

A Certification Authority (CA) is a trusted agency that can issue digital certificates. Who can be a CA? Obviously, not any Tom, Dick and Harry can be a CA. The authority of acting as a CA has to rest with someone whom everybody trusts. Consequently, the governments in the various countries decide who can and who cannot be a CA. Usually, a CA is a reputed organization, such as a post office, financial institution, software company, etc. Two of the world's most famous CAs are VeriSign and Entrust. Safescrypt Limited, a subsidiary of Satyam Infoway Limited, became the first Indian CA in February 2002. Thus, a CA has the authority to issue digital certificates to individuals and organizations that want to use those certificates in asymmetric key cryptographic applications.

10.5 DIGITAL SIGNATURES

10.5.1 Introduction
All along, we have been talking of the following general scheme in the context of asymmetric key cryptography: if A is the sender of a message and B is the receiver, A encrypts the message with B's public key and sends the encrypted message to B. We have deliberately hidden the internals of this scheme. As we know, this is actually based on digital envelopes as discussed earlier, wherein not the entire message but only the one-time session key used to encrypt the message is encrypted with the receiver's public key. But for simplicity, we shall ignore this technical detail, and instead assume that the whole message is encrypted with the receiver's public key.

Let us now consider another scheme, as follows: if A is the sender of a message and B is the receiver, A encrypts the message with A's private key and sends the encrypted message to B. This is shown in Fig. 10.24.

Fig. 10.24 Encrypting a message with the sender’s private key

Our first reaction to this would be: what purpose would this serve? After all, A's public key would be, well, public, i.e., accessible to anybody. This means that anybody who is interested in knowing the contents of the message sent by A to B can simply use A's public key to decrypt the message, thus causing the failure of this encryption scheme! Well, this is quite true. But here, when A encrypts the message with her private key, her intention is not to hide the contents of the message (i.e., not to achieve confidentiality), but something else. What can that intention be?

If the receiver (B) receives such a message encrypted with A's private key, B can use A's public key to decrypt it, and therefore access the plain text. Does this ring a bell? If the decryption is successful, it assures B that this message was indeed sent by A. This is because if B can decrypt a message with A's public key, it means that the message must have been initially encrypted with A's private key (remember that a message encrypted with a public key can be decrypted only with the corresponding private key, and vice versa). This is also because only A knows her private key. Therefore, someone posing as A (say C) could not have sent a message encrypted with A's private key to B. A must have sent it. Therefore, although this scheme does not achieve confidentiality, it achieves authentication (identifying and proving A as the sender). Moreover, in the case of a dispute tomorrow, B can take the encrypted message and decrypt it with A's public key to prove that the message indeed came from A. This achieves the purpose of non-repudiation (i.e., A cannot deny that she had sent this message, as the message was encrypted with her private key, which is supposed to be known only to her).

Even if someone (say C) manages to intercept and access the encrypted message while it is in transit, uses A's public key to decrypt it, and changes the message, that would not achieve any purpose. Because C does not have A's private key, C cannot encrypt the changed message with A's private key again. Therefore, even if C now forwards this changed message to B, B will not be fooled into believing that it came from A, as it was not encrypted with A's private key. Such a scheme, wherein the sender encrypts the message with her private key, forms the basis of digital signatures, as shown in Fig. 10.25.

Fig. 10.25

Basis for digital signatures

Digital signatures have assumed great significance in the modern world of Web commerce. Most countries have already made provisions for recognizing a digital signature as a valid authorization mechanism, just like paper-based signatures. Digital signatures now have legal status. For example, if you send a message to your bank over the Internet to transfer some amount from your account to your friend's account, and digitally sign the message, this transaction has the same status as one wherein you fill in and sign the bank's paper-based money transfer slip.

We have seen the theory behind digital signatures. However, there are some undesirable elements in this scheme, as we shall study next.

10.5.2 Message Digests

Introduction If we examine the conceptual process of digital signatures, we will realize that it does not deal with the problems associated with asymmetric key encryption, namely, slow operation and large cipher text size. This is because we are encrypting the whole of the original plain text message with the sender's private key. As the size of the original plain text can be quite large, this encryption process can be really very slow.

We can tackle this problem using the digital envelope approach, as before. That is, A encrypts the original plain text message (PT) with a one-time symmetric key (K1) to form the cipher text (CT). She then encrypts the one-time symmetric key (K1) with her private key (K2). She creates a digital envelope containing CT and K1 encrypted with K2, and sends the digital envelope to B. B opens the digital envelope, uses A's public key (K3) to decrypt the encrypted one-time symmetric key, and obtains the symmetric key K1. He then uses K1 to decrypt the cipher text (CT) and obtains the original plain text (PT). Since B uses A's public key to decrypt the encrypted one-time symmetric key (K1), B can be assured that only A's private key could have encrypted K1. Thus, B can be assured that the digital envelope came from A.

Such a scheme could work perfectly. However, in real practice, a more efficient scheme is used. It involves the usage of a message digest (also called as hash). A message digest is a fingerprint or summary of a message. It is similar to the concepts of Longitudinal Redundancy Check (LRC) or Cyclic Redundancy Check (CRC). That is, it is used to verify the integrity of the data (i.e., to ensure that a message has not been tampered with after it leaves the sender but before it reaches the receiver). Let us understand this with the help of an LRC example (CRC would work similarly, but has a different mathematical basis).

An example of LRC calculation at the sender's end is shown in Fig. 10.26. As shown, a block of bits is organized in the form of a list (as rows) in the Longitudinal Redundancy Check (LRC). Here, for instance, if we want to send 32 bits, we arrange them into a list of four (horizontal) rows. Then we count how many 1 bits occur in each of the 8 (vertical) columns. If the number of 1s in a column is odd, we say that the column has odd parity (indicated by a 1 bit in the shaded LRC row); otherwise, if the number of 1s in the column is even, we call it even parity (indicated by a 0 bit in the shaded LRC row). For instance, in the first column, we have two 1s, indicating an even parity, and therefore, we have a 0 in the shaded LRC row for the first column. Similarly, for the last column, we have three 1s, indicating an odd parity, and therefore, we have a 1 in the shaded LRC row for the last column. Thus, the parity bit for each column is calculated and a new row of eight parity bits is created. These become the parity bits for the whole block. Thus, the LRC is actually a fingerprint of the original message.

The data along with the LRC is then sent to the receiver. The receiver separates the data block from the LRC block (shown shaded). It performs its own LRC on the data block alone. It then compares its LRC value with the one received from the sender. If the two LRC values match, then the receiver has a reasonable confidence that the message sent by the sender has not been changed while in transit.
We perform a hashing operation (or a message digest algorithm) over a block of data to produce its hash or message digest, which is smaller in size than the original message. This concept is shown in Fig. 10.27.


Fig. 10.26 Longitudinal Redundancy Check (LRC)
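The LRC calculation of Fig. 10.26 can be written down in a few lines. The sketch below (our own illustration, with example data made up for the purpose) treats the data as a sequence of bytes and computes one parity bit per vertical column, which is simply the XOR of all the bytes; a 1 bit in the result means that column had an odd number of 1s.

    public class LrcDemo {
        // XOR of all data bytes gives the eight column parity bits of the LRC.
        public static byte lrc(byte[] data) {
            byte parity = 0;
            for (byte b : data) {
                parity ^= b;
            }
            return parity;
        }

        public static void main(String[] args) {
            byte[] block = { (byte) 0b11100111, (byte) 0b11011101,
                             (byte) 0b00111001, (byte) 0b10101001 };
            byte senderLrc = lrc(block);

            // Receiver recomputes the LRC over the received data block
            // and compares it with the LRC sent along with the data.
            byte receiverLrc = lrc(block);
            System.out.println("Message intact? " + (senderLrc == receiverLrc));
        }
    }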

Fig. 10.27 Message digest concept

So far, we have considered very simple cases of message digests. Actually, message digests are not so small and straightforward to compute. Message digests usually consist of 128 or more bits. This means that the chance of any two message digests being the same is only about 1 in 2^128. The message digest length is deliberately chosen to be this long, so that the scope for two message digests turning out to be the same is minimized.

Requirements of a message digest We can summarize the requirements of the message digest concept, as follows. 1. Given a message, it should be very easy to find its corresponding message digest. This is shown in Fig. 10.28. Also, for a given message, the message digest must always be the same.


Fig. 10.28

Message digest for the same original data should always be the same

2. Given a message digest, it should be very difficult to find the original message for which the digest was created. This is shown in Fig. 10.29.

Fig. 10.29 Message digest should not work in the opposite direction

3. Given any two messages, if we calculate their message digests, the two message digests must be different. This is shown in Fig. 10.30.


Fig. 10.30

Message digests of two different messages must be different

If any two messages produce the same message digest, thus violating our principle, it is called as a collision. That is, the two different messages collide at the same digest value! As we shall study soon, the message digest algorithms usually produce a message digest of length 128 bits or 160 bits. This means that the chances of any two message digests being the same are one in 2^128 or one in 2^160, respectively. Clearly, this is possible in theory, but extremely rare in practice.

A specific type of security attack called the birthday attack is used to detect collisions in message digest algorithms. It is based on the principle of the Birthday Paradox, which states that if there are 23 people in a room, the chances are more than 50% that two of them will share the same birthday. At first, this may seem illogical. However, we can understand it in another manner. We need to keep in mind that we are just talking about any two people (out of the 23) sharing the same birthday; we are not talking about a match with one specific person. For instance, suppose that we have Alice, Bob, and Carol as three of the 23 people in the room. Alice has 22 chances to share a birthday with anyone else (since she can be paired with each of the other 22 people). If there is no matching birthday for Alice, she leaves. Bob now has 21 chances to share a birthday with anyone else in the room. If he fails to have a match too, the next person is Carol. She has 20 chances, and so on. 22 pairs + 21 pairs + 20 pairs + ... + 1 pair means that there is a total of 253 pairs, and every pair has a 1/365th chance of having a matching birthday. The chances of a match cross 50% with these 253 pairs.

The birthday attack is most often used to attempt to discover collisions in hash functions, such as MD5 or SHA-1. This can be explained as follows. If a message digest is 64 bits long, then after trying about 2^32 messages, an attacker can expect two different messages to produce the same message digest. In general, if a message digest can take up to N different values, then we can expect the first collision once the number of message digests computed exceeds the square root of N. In other words, a collision is expected when the probability of a collision exceeds 50%. This is what leads to birthday attacks.
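The 50% figure quoted for the Birthday Paradox is easy to check numerically. The following short Java sketch (our own illustration) computes the probability that at least two people out of n share a birthday, by first computing the probability that all n birthdays are distinct.

    public class BirthdayParadox {
        public static void main(String[] args) {
            double allDistinct = 1.0;
            for (int n = 1; n <= 60; n++) {
                // Probability that the n-th person avoids the (n - 1) birthdays already taken.
                allDistinct *= (365.0 - (n - 1)) / 365.0;
                double collision = 1.0 - allDistinct;
                if (n == 23 || n == 30 || n == 60) {
                    System.out.printf("%d people -> %.1f%% chance of a shared birthday%n",
                                      n, collision * 100);
                }
            }
        }
    }

Running this shows roughly a 50.7% chance at 23 people, which is exactly the counter-intuitive result the paradox describes.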


It might surprise you to know that even a small difference between two original messages can cause the message digests to differ vastly. The message digests of two extremely similar messages are so different that they provide no clue at all that the original messages were very similar to each other. This is shown in Fig. 10.31. Here, we have two messages (Please pay the newspaper bill today and Please pay the newspaper bill tomorrow), and their corresponding message digests. Note how similar the messages are, and yet how different their message digests are.

Message: Please pay the newspaper bill today
Message digest: 306706092A864886F70D010705A05A3058020100300906052B0E03021A0500303206092A864886F70D010701A0250423506C656173652070617920746865206E65777370617065722062696C6C20746F646

Message: Please pay the newspaper bill tomorrow
Message digest: 306A06092A864886F70D010705A05D305B020100300906052B0E03021A0500303506092A864886F70D010701A0280426506C656173652070617920746865206E65777370617065722062696C6C20746

Fig. 10.31 Message digest example

Looked at another way, we are saying that, given one message (M1) and its message digest (MD), it is simply not feasible to find another message (M2) which will also produce exactly the same MD, bit by bit. The message digest scheme should try and prevent this to the maximum extent possible. This is shown in Fig. 10.32.

Fig. 10.32

Message digests should not reveal anything about the original message
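The behaviour illustrated in Fig. 10.31 can be observed with the JDK's MessageDigest class. The sketch below hashes the two nearly identical sentences with SHA-1 (any supported algorithm, such as MD5 or SHA-256, could be substituted) and prints the digests in hexadecimal; the two outputs bear no visible resemblance to each other.

    import java.security.MessageDigest;

    public class DigestDemo {
        static String hex(byte[] bytes) {
            StringBuilder sb = new StringBuilder();
            for (byte b : bytes) {
                sb.append(String.format("%02X", b & 0xFF));
            }
            return sb.toString();
        }

        public static void main(String[] args) throws Exception {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            String m1 = "Please pay the newspaper bill today";
            String m2 = "Please pay the newspaper bill tomorrow";
            System.out.println(hex(sha1.digest(m1.getBytes("UTF-8"))));
            System.out.println(hex(sha1.digest(m2.getBytes("UTF-8"))));
        }
    }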


Digital signature process We have mentioned that RSA can be used for performing digital signatures. Let us understand how this works in a step-by-step fashion. For this, let us assume that the sender (A) wants to send a message M to the receiver (B) along with the digital signature (S) calculated over the message (M).

Step 1 The sender (A) uses the SHA-1 message digest algorithm to calculate the message digest (MD1) over the original message (M). This is shown in Fig. 10.33.

Fig. 10.33 Message digest calculation

Step 2 The sender (A) now encrypts the message digest with her private key. The output of this process is called as the digital signature (DS) of A. This is shown in Fig. 10.34.

Fig. 10.34 Digital signature creation

Step 3 Now the sender (A) sends the original message (M) along with the digital signature (DS) to the receiver (B). This is shown in Fig. 10.35.

Step 4 After the receiver (B) receives the original message (M) and the sender's (A's) digital signature, B uses the same message digest algorithm as was used by A, and calculates its own message digest (MD2), as shown in Fig. 10.36.


Fig. 10.35

Transmission of original message and digital signature together

Fig. 10.36

Receiver calculates its own message digest

Step 5 The receiver (B) now uses the sender’s (A’s) public key to decrypt (sometimes also called as de-sign) the digital signature. Note that A had used her private key to encrypt her message digest (MD1) to form the digital signature. Therefore, only A’s public key can be used to decrypt it. The output of this process is the original message digest as was calculated by A (MD1) in step 1. This is shown in Fig. 10.37.

Step 6 B now compares the following two message digests:
MD2, which it had calculated in step 4
MD1, which it retrieved from A's digital signature in step 5

If MD1 = MD2, the following facts are established:
B accepts the original message (M) as the correct, unaltered message from A.
B is also assured that the message came from A, and not from someone posing as A.

This is shown in Fig. 10.38.


Fig. 10.37

Receiver retrieves sender’s message digest

Fig. 10.38 Digital signature verification

The basis for the acceptance or rejection of the original message on the outcome of the message digest comparison (i.e., step 6) is simple. We know that the sender (A) had used her private key to encrypt the message digest to produce the digital signature. If decrypting the digital signature produces the correct message digest, the receiver (B) can be quite sure that the original message and the digital signature indeed came from the sender (A). This also proves that the message was not altered by an attacker while in transit, because if the message had been altered in transit, the message digest calculated by B in step 4 (i.e., MD2) over the received message would differ from the one sent (of course, in encrypted form) by A (i.e., MD1).

Why can the attacker not alter the message, recalculate the message digest, and sign it again? Well, as we know, the attacker can very well perform the first two steps (i.e., alter the message, and recalculate the message digest over the altered message); but it cannot sign it again, because for that to be possible, the attacker needs A's private key. Since only A knows A's private key, the attacker cannot use A's private key to encrypt the message digest (i.e., sign the message) again. Thus, the principle of digital signatures is quite strong, secure and reliable.
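Steps 1 to 6 above map almost one-to-one onto the JDK's Signature class, which computes the message digest and then signs it with the private key in a single sign() call. The sketch below uses SHA-256 with RSA rather than SHA-1, purely as an assumption of this sketch; the message text is an example of our own.

    import java.security.KeyPair;
    import java.security.KeyPairGenerator;
    import java.security.Signature;

    public class SignatureDemo {
        public static void main(String[] args) throws Exception {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            KeyPair senderKeys = generator.generateKeyPair();   // A's key pair

            byte[] message = "Transfer Rs 5000 to Amit".getBytes("UTF-8");

            // Sender's end (steps 1-3): digest the message and sign the digest with A's private key.
            Signature signer = Signature.getInstance("SHA256withRSA");
            signer.initSign(senderKeys.getPrivate());
            signer.update(message);
            byte[] digitalSignature = signer.sign();

            // Receiver's end (steps 4-6): recompute the digest and check it against
            // the digest recovered from the signature using A's public key.
            Signature verifier = Signature.getInstance("SHA256withRSA");
            verifier.initVerify(senderKeys.getPublic());
            verifier.update(message);
            System.out.println("Signature valid? " + verifier.verify(digitalSignature));
        }
    }

If even one bit of the message is changed before verification, verify() returns false, which is exactly the MD1 = MD2 check described in step 6.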

10.6 SECURE SOCKET LAYER (SSL)

10.6.1 Introduction
The Secure Socket Layer (SSL) protocol is an Internet protocol for the secure exchange of information between a Web browser and a Web server. It provides two basic security services: authentication and confidentiality.

Logically, it provides a secure pipe between the Web browser and the Web server. Netscape Corporation developed SSL in 1994. Since then, SSL has become the world's most popular Web security mechanism. All the major Web browsers support SSL. Currently, SSL comes in three versions: 2, 3 and 3.1. The most popular of them is Version 3, which was released in 1995.

10.6.2 The Position of SSL in the TCP/IP Protocol Suite

SSL can conceptually be considered as an additional layer in the TCP/IP protocol suite. The SSL layer is located between the application layer and the transport layer, as shown in Fig. 10.39.

Fig. 10.39 Position of SSL in TCP/IP

As such, the communication between the various TCP/IP protocol layers is now as shown in Fig. 10.40. As we can see, the application layer of the sending computer (X) prepares the data to be sent to the receiving computer (Y), as usual. However, unlike what happens in the normal case, the application layer data is not passed directly to the transport layer now. Instead, the application layer data is passed to the SSL layer. Here, the SSL layer performs encryption on the data received from the application layer (which is indicated by a different colour), and also adds its own encryption information header, called as the SSL Header (SH), to the encrypted data. We shall later study what exactly happens in this process. After this, the SSL layer data (L5) becomes the input for the transport layer. It adds its own header (H4), and passes it on to the Internet layer, and so on. This process happens exactly the way it happens in the case of a normal TCP/IP data transfer. Finally, when the data reaches the physical layer, it is sent in the form of voltage pulses across the transmission medium.

At the receiver's end, the process takes place similar to the case of a normal TCP/IP connection, until it reaches the new SSL layer. The SSL layer at the receiver's end removes the SSL Header (SH), decrypts the encrypted data, and gives the plain text data back to the application layer of the receiving computer. Thus, only the application layer data is encrypted by SSL. The lower layer headers are not encrypted. This is quite obvious. If SSL had to encrypt all the headers, it would have to be positioned below the data link layer. That would serve no purpose at all. In fact, it would lead to problems. If SSL encrypted all the lower layer headers, even the IP and physical addresses of the computers (sender, receiver, and intermediate nodes) would be encrypted, and become unreadable. Thus, a big question would be where to deliver the packets. To understand the problem, imagine what would happen if we put the address of the sender and the receiver of a letter inside the envelope! Clearly, the postal service would not know where to send the letter! This is why there is no point in encrypting the lower layer headers. Therefore, SSL is required between the application and the transport layers.

Fig. 10.40 SSL is located between application and transport layers
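From a Java program's point of view, this layering means that an application simply opens an SSL socket instead of a plain TCP socket; the JDK's SSLSocket then performs the handshake, record and alert processing described in the next section underneath the ordinary stream read/write calls. A minimal sketch follows (the host name and request are just examples).

    import java.io.OutputStream;
    import javax.net.ssl.SSLSocket;
    import javax.net.ssl.SSLSocketFactory;

    public class SslClientDemo {
        public static void main(String[] args) throws Exception {
            SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
            try (SSLSocket socket = (SSLSocket) factory.createSocket("www.example.com", 443)) {
                socket.startHandshake();   // runs the handshake protocol described below
                System.out.println("Negotiated cipher suite: "
                                   + socket.getSession().getCipherSuite());

                // Application data written here is encrypted by the record protocol
                // before being handed down to TCP.
                OutputStream out = socket.getOutputStream();
                out.write("GET / HTTP/1.0\r\nHost: www.example.com\r\n\r\n".getBytes("US-ASCII"));
                out.flush();
            }
        }
    }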

10.6.3 How Does SSL Work?

SSL has three sub-protocols, namely, the Handshake Protocol, the Record Protocol and the Alert Protocol. These three sub-protocols constitute the overall working of SSL. We shall take a look at all the three protocols now.

The handshake protocol The handshake protocol of SSL is the first sub-protocol used by the client and the server to communicate over an SSL-enabled connection. This is similar to how Alice and Bob would first shake hands with each other with a hello before they start conversing. The handshake protocol consists of a series of messages between the client and the server. Each of these messages has the format shown in Fig. 10.41. As shown in the figure, each handshake message has three fields, as follows.

Type (1 byte) This field indicates one of the ten possible message types. These ten message types are listed in Table 10.2.

Length (3 bytes) This field indicates the length of the message in bytes.


Content (1 or more bytes) This field contains the parameters associated with this message, depending on the message type, as listed in Table 10.2.

Fig. 10.41 Format of the handshake protocol messages

Let us now take a look at the possible messages exchanged by the client and the server in the handshake protocol, along with their corresponding parameters, as shown in Table 10.2.

Table 10.2 SSL handshake protocol message types

Message type           Parameters
Hello request          None
Client hello           Version, Random number, Session id, Cipher suite, Compression method
Server hello           Version, Random number, Session id, Cipher suite, Compression method
Certificate            Chain of X.509V3 certificates
Server key exchange    Parameters, signature
Certificate request    Type, authorities
Server hello done      None
Certificate verify     Signature
Client key exchange    Parameters, signature
Finished               Hash value

The handshake protocol is actually made up of four phases, as shown in Fig. 10.42. These phases are given below.
1. Establish security capabilities
2. Server authentication and key exchange
3. Client authentication and key exchange
4. Finish

Fig. 10.42 SSL handshake phases

Let us now study these four phases one by one.


Phase 1. Establish security capabilities This first phase of the SSL handshake is used to initiate a logical connection and establish the security capabilities associated with that connection. This consists of two messages, the client hello and the server hello, as shown in Fig. 10.43.

Fig. 10.43

SSL Handshake protocol Phase 1: Establish security capabilities

As shown in the figure, the process starts with a client hello message from the client to the server. It consists of the following parameters.

Version This field identifies the highest version of SSL that the client can support. As we have seen, at the time of this writing, this can be 2, 3 or 3.1.

Random This field is useful for the later, actual communication between the client and the server. It contains two sub-fields:
A 32-bit date-time field that identifies the current system date and time on the client computer.
A 28-byte random number generated by the random number generator software built inside the client computer.

Session id This is a variable length session identifier. If this field contains a non-zero value, it means that there is already a connection between the client and the server, and the client wishes to update the parameters of that connection. A zero value in this field indicates that the client wants to create a new connection with the server.

Cipher suite This field contains a list of the cryptographic algorithms supported by the client (e.g., RSA, Diffie-Hellman, etc.), in decreasing order of preference.

Compression method This field contains a list of the compression algorithms supported by the client.

The client sends the client hello message to the server and waits for the server's response. Accordingly, the server sends back a server hello message to the client. This message also contains the same fields as the client hello message. However, their purpose is now different. The server hello message consists of the following fields.

Version This field identifies the lower of the version suggested by the client and the highest version supported by the server. For instance, if the client had suggested version 3, but the server also supports version 3.1, the server will select 3.

Random This field has the same structure as the Random field of the client. However, the Random value generated by the server is completely independent of the client’s Random value.


Session id If the session id value sent by the client was non-zero, the server uses the same value. Otherwise, the server creates a new session id and puts it in this field.

Cipher suite Contains a single cipher suite, which the server selects from the list sent earlier by the client.

Compression method Contains a compression algorithm, which the server selects from the list sent earlier by the client.

Phase 2. Server authentication and key exchange The server initiates this second phase of the SSL handshake, and is the sole sender of all the messages in this phase. The client is the sole recipient of all these messages. This phase contains four steps, as shown in Fig. 10.44. These steps are Certificate, Server key exchange, Certificate request and Server hello done.

Fig. 10.44 SSL Handshake protocol Phase 2: Server authentication and key exchange

Let us discuss the four steps of this phase. In the first step (Certificate), the server sends its digital certificate, and the entire chain leading up to the root CA, to the client. This helps the client to authenticate the server using the server's public key from the server's certificate. The server's certificate is mandatory in all situations, except if the key is being agreed upon by using Diffie-Hellman. The second step (Server key exchange) is optional. It is used only if the server does not send its digital certificate to the client in step 1 above. In this step, the server sends its public key to the client (as the certificate is not available). In the third step (Certificate request), the server can request the client's digital certificate. Client authentication in SSL is optional, and the server may not always expect the client to be authenticated; therefore, this step is optional. The last step (Server hello done) indicates to the client that the server's portion of the hello exchange is complete. It tells the client that the client can now (optionally) verify the certificates sent by the server, and ensure that all the parameters sent by the server are acceptable. This message does not have any parameters. After sending this message, the server waits for the client's response.

Phase 3. Client authentication and key exchange The client initiates this third phase of the SSL handshake, and is the sole sender of all the messages in this phase. The server is the sole recipient of all these messages. This phase contains three steps, as shown in Fig. 10.45. These steps are Certificate, Client key exchange, and Certificate verify.


Fig. 10.45

SSL Handshake protocol Phase 3: Client authentication and key exchange

The first step (Certificate) is optional. This step is performed only if the server had requested the client's digital certificate. If the server has requested the client's certificate, and the client does not have one, the client sends a No certificate message instead of a Certificate message. It is then up to the server to decide whether it wants to continue or not. Like the server key exchange message, the second step (Client key exchange) allows the client to send information to the server, but in the opposite direction. This information is related to the symmetric key that both the parties will use in this session. Here, the client creates a 48-byte pre-master secret, encrypts it with the server's public key, and sends this encrypted pre-master secret to the server. The third step (Certificate verify) is necessary only if the server had demanded client authentication. As we know, if this is the case, the client has already sent its certificate to the server. However, the client additionally needs to prove to the server that it is the correct and authorized holder of the private key corresponding to the certificate. For this purpose, in this optional step, the client combines the pre-master secret with the random numbers exchanged by the client and the server earlier (in Phase 1: Establish security capabilities), hashes them together using MD5 and SHA-1, and signs the result with its private key.

Phase 4. Finish The client initiates this fourth phase of the SSL handshake, which the server ends. This phase contains four steps, as shown in Fig. 10.46. The first two messages are from the client: Change cipher specs, Finished. The server responds back with two identical messages: Change cipher specs, Finished.

Fig. 10.46 SSL Handshake protocol Phase 4: Finished

Based on the pre-master secret that was created and sent by the client in the Client key exchange message, both the client and the server create a master secret. Before secure encryption or integrity verification can be performed on records, the client and server need to generate shared secret information known only to them. This value is a 48-byte quantity called the master secret. The master secret is used to generate keys and secrets for encryption and MAC computations. The master secret is calculated after computing message digests of the pre-master secret, client random and server random, as shown in Fig. 10.47.

Fig. 10.47 Master secret generation concept

The technical specification for calculating the master secret is as follows:

master_secret =
  MD5(pre_master_secret + SHA('A'   + pre_master_secret + ClientHello.random + ServerHello.random)) +
  MD5(pre_master_secret + SHA('BB'  + pre_master_secret + ClientHello.random + ServerHello.random)) +
  MD5(pre_master_secret + SHA('CCC' + pre_master_secret + ClientHello.random + ServerHello.random))

Finally, the symmetric keys to be used by the client and the server are generated. For this, the conceptual process as shown in Fig. 10.48 is used.

Fig. 10.48 Symmetric key generation concept

The actual key generation formula is as follows:

key_block =
  MD5(master_secret + SHA('A'   + master_secret + ServerHello.random + ClientHello.random)) +
  MD5(master_secret + SHA('BB'  + master_secret + ServerHello.random + ClientHello.random)) +
  MD5(master_secret + SHA('CCC' + master_secret + ServerHello.random + ClientHello.random))
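The master secret formula above can be evaluated with the JDK's MessageDigest class, as the following sketch shows. The pre-master secret and the two random values here are dummy placeholders (a real implementation takes them from the handshake messages); each of the three terms contributes 16 bytes of MD5 output, giving the 48-byte master secret.

    import java.io.ByteArrayOutputStream;
    import java.security.MessageDigest;

    public class MasterSecretSketch {
        static byte[] concat(byte[]... parts) throws Exception {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            for (byte[] p : parts) {
                out.write(p);
            }
            return out.toByteArray();
        }

        // master_secret = MD5(pms + SHA(label + pms + clientRandom + serverRandom))
        // for the three labels "A", "BB" and "CCC".
        public static byte[] masterSecret(byte[] pms, byte[] clientRandom, byte[] serverRandom)
                throws Exception {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            MessageDigest sha = MessageDigest.getInstance("SHA-1");
            ByteArrayOutputStream master = new ByteArrayOutputStream();
            for (String label : new String[] { "A", "BB", "CCC" }) {
                byte[] inner = sha.digest(concat(label.getBytes("US-ASCII"), pms,
                                                 clientRandom, serverRandom));
                master.write(md5.digest(concat(pms, inner)));   // each term adds 16 bytes
            }
            return master.toByteArray();                        // 48 bytes in all
        }

        public static void main(String[] args) throws Exception {
            byte[] ms = masterSecret(new byte[48], new byte[32], new byte[32]);
            System.out.println("Master secret length: " + ms.length);
        }
    }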

After this, the first step (Change cipher specs) is a confirmation from the client that all is well from its end, which it strengthens with the Finished message. The server sends identical messages to the client.

The record protocol The Record Protocol in SSL comes into picture after a successful handshake is completed between the client and the server. That is, after the client and the server have optionally authenticated each other and have decided what algorithms to use for secure information exchange, we enter into the SSL record protocol. This protocol provides two services to an SSL connection, as follows.

Confidentiality This is achieved by using the secret key that is defined by the handshake protocol.

Integrity The handshake protocol also defines a shared secret key that is used for computing a Message Authentication Code (MAC), which assures message integrity.

The operation of the record protocol is shown in Fig. 10.49.

Fig. 10.49

SSL record protocol

As the figure shows, the SSL record protocol takes an application message as input. First, it fragments it into smaller blocks, optionally compresses each block, adds a MAC, encrypts it, adds a header, and gives the result to the transport layer, where the TCP protocol processes it like any other TCP block. At the receiver's end, the header of each block is removed; the block is then decrypted, verified, decompressed, and reassembled into application messages. Let us discuss these steps in more detail.

Fragmentation The original application message is broken into blocks, so that the size of each block is less than or equal to 2^14 bytes (16,384 bytes).


Compression The fragmented blocks are optionally compressed. The compression process must not result in the loss of the original data, which means that this must be a lossless compression mechanism.

Addition of MAC Using the shared secret key established previously in the handshake protocol, the Message Authentication Code (MAC) for each block is calculated. This operation is similar to the HMAC algorithm.

Encryption Using the symmetric key established previously in the handshake protocol, the output of the previous step is now encrypted. This encryption may not increase the overall size of the block by more than 1024 bytes. Table 10.3 lists the permitted encryption algorithms.

Table 10.3 Permitted SSL encryption algorithms

Stream cipher                  Block cipher
Algorithm      Key size        Algorithm      Key size
RC4            40              AES            128, 256
RC4            128             IDEA           128
                               RC2            40
                               DES            40
                               DES            56
                               DES-3          168
                               Fortezza       80

Append header Finally, a header is added to the encrypted block. The header contains the following fields.

Content type (8 bits) Specifies the protocol used for processing the record at the next higher level (e.g., handshake, alert, change cipher).
Major version (8 bits) Specifies the major version of the SSL protocol in use. For instance, if SSL version 3.1 is in use, this field contains 3.
Minor version (8 bits) Specifies the minor version of the SSL protocol in use. For instance, if SSL version 3.0 is in use, this field contains 0.
Compressed length (16 bits) Specifies the length in bytes of the original plain text block (or the compressed block, if compression is used).

The final SSL message now looks as shown in Fig. 10.50.

Fig. 10.50

Final output after SSL record protocol operations
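The MAC-then-encrypt processing of a record can be sketched with the JDK's Mac and Cipher classes. This is only an approximation of the real record protocol (SSL 3.0 uses its own MAC construction and adds the header fields described above); the point it illustrates is the order of operations: compute a MAC over the fragment, append it, then encrypt the whole thing with the negotiated symmetric key.

    import javax.crypto.Cipher;
    import javax.crypto.KeyGenerator;
    import javax.crypto.Mac;
    import javax.crypto.SecretKey;

    public class RecordSketch {
        public static void main(String[] args) throws Exception {
            // Keys that would normally be derived from the key_block shown earlier.
            SecretKey macKey = KeyGenerator.getInstance("HmacSHA1").generateKey();
            KeyGenerator aesGen = KeyGenerator.getInstance("AES");
            aesGen.init(128);
            SecretKey encKey = aesGen.generateKey();

            byte[] fragment = "application data for one record".getBytes("UTF-8");

            // Add MAC: compute a MAC over the fragment and append it.
            Mac mac = Mac.getInstance("HmacSHA1");
            mac.init(macKey);
            byte[] macValue = mac.doFinal(fragment);
            byte[] plusMac = new byte[fragment.length + macValue.length];
            System.arraycopy(fragment, 0, plusMac, 0, fragment.length);
            System.arraycopy(macValue, 0, plusMac, fragment.length, macValue.length);

            // Encrypt: the fragment together with its MAC is encrypted with the symmetric key.
            Cipher cipher = Cipher.getInstance("AES");
            cipher.init(Cipher.ENCRYPT_MODE, encKey);
            byte[] encryptedRecord = cipher.doFinal(plusMac);
            System.out.println("Encrypted record of " + encryptedRecord.length + " bytes");
        }
    }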


The alert protocol When either the client or the server detects an error, the detecting party sends an alert message to the other party. If the error is fatal, both the parties immediately close the SSL connection (which means that the transmission from both the ends is terminated immediately). Both the parties also destroy the session identifiers, secrets and keys associated with this connection before it is terminated. Other errors, which are not so severe, do not result in the termination of the connection. Instead, the parties handle the error and continue. Each alert message consists of two bytes. The first byte signifies the type of error. If it is a warning, this byte contains 1. If the error is fatal, this byte contains 2. The second byte specifies the actual error. This is shown in Fig. 10.51.

Fig. 10.51 Alert protocol message format

We list the fatal alerts (errors) in Table 10.4.

Table 10.4 Fatal alerts

Alert                    Description
Unexpected message       An inappropriate message was received.
Bad record MAC           A message was received without a correct MAC.
Decompression failure    The decompression function received an improper input.
Handshake failure        The sender was unable to negotiate an acceptable set of security parameters from the available options.
Illegal parameters       A field in the handshake message was out of range or inconsistent with the other fields.

The remaining (non-fatal) alerts are shown in Table 10.5.

Table 10.5 Non-fatal alerts

Alert                      Description
No certificate             Sent in response to a certificate request if an appropriate certificate is not available.
Bad certificate            A certificate was corrupt (its digital signature verification failed).
Unsupported certificate    The type of the received certificate is not supported.
Certificate revoked        The signer of a certificate has revoked it.
Certificate expired        A received certificate has expired.
Certificate unknown        An unspecified error occurred while processing the certificate.
Close notify               Notifies that the sender will not send any more messages in this connection. Each party must send this message before closing its side of the connection.

SUMMARY

- Cryptography is a technique of encoding and decoding messages, so that they are not understood by anybody except the sender and the intended recipient.
- The sender encodes the message (a process called as encryption) and the receiver decodes the encrypted message to get back the original message (a process called as decryption).
- Encryption can be classified into symmetric key encryption and asymmetric key encryption.
- In symmetric key encryption, the same key is used for encryption and decryption.
- In asymmetric key encryption, each participant has a pair of keys (one private, the other public). If encryption is done using the public key, decryption must be done using the private key alone, and vice versa.
- The private key remains private with the participant; the public key is freely distributed to the general public.
- Digital signature has become a very critical technology for modern secure data communications. It involves a very intelligent combination of public key encryption techniques to achieve secure communication.
- To further strengthen the security mechanisms, the concept of digital certificates has gained popularity. Just as we have paper certificates to prove that we have passed a particular examination, or that we are eligible for driving a car (the certificate being a driver's licence), a digital certificate is used for authenticating either a Web client or a Web server.
- The authority issuing a digital certificate is called as a Certification Authority (CA).
- CAs also have to maintain a Certificate Revocation List (CRL), which lets users know which digital certificates are no longer valid.
- The Secure Socket Layer (SSL) protocol is used to encrypt all communications between a Web browser and a Web server. It also provides message integrity.
- SSL consists of three sub-protocols, namely, the handshake protocol, the record protocol, and the alert protocol.

REVIEW QUESTIONS

Multiple-choice Questions
1. When only the sender and the receiver want to be able to access the contents of a message, the principle of ______ comes into the picture.
   (a) confidentiality (b) authentication (c) authorization (d) integrity
2. When the receiver wants to be sure of the sender's identity, ______ is important.
   (a) confidentiality (b) authentication (c) authorization (d) integrity
3. When the receiver wants to be sure that the contents of a message have not been tampered with, ______ is the key factor.
   (a) confidentiality (b) authentication (c) authorization (d) integrity
4. When the sender and the receiver use the same key for encryption and decryption, it is called as ______.
   (a) symmetric key encryption (b) asymmetric key encryption (c) public key encryption (d) any of the above
5. When the sender and the receiver use different keys for encryption and decryption, it is called as ______.
   (a) symmetric key encryption (b) asymmetric key encryption (c) public key encryption (d) any of the above
6. ______ is a public key encryption algorithm.
   (a) DES (b) RSA (c) RAS (d) DSE
7. ______ can be cracked if groups of characters repeat in the plain text.
   (a) Stream cipher (b) Character cipher (c) Block cipher (d) Group cipher
8. Digital signature uses ______.
   (a) array (b) table (c) chain (d) hash
9. In digital signature, at the sender's end, the sender's ______ is important.
   (a) public key (b) private key (c) none of the public or the private keys (d) either public or private key
10. A digital certificate establishes the relation between a user and her ______.
   (a) private key (b) name (c) public key (d) credit card number

Detailed Questions
1. Describe the risks involved in data communication over a network.
2. What is cryptography?
3. Explain how symmetric encryption works.
4. Explain the technique used in asymmetric cryptography.
5. Discuss the term digital signature.
6. Illustrate how digital signatures work by giving an example.
7. What are digital certificates? How are they useful?
8. Explain phishing.
9. Discuss pharming attacks.
10. How does the SSL protocol work?

Exercises
1. Find out which algorithms are popular in message digests, digital signatures, and symmetric as well as asymmetric key encryption, and try to understand at least one of them in complete detail.
2. Read more about the IT laws in your home country. What is the significance of digital signatures?
3. Search for files with the extension .cer on your computer. Are there any such files? If there are, they contain digital certificates.
4. Create a digital certificate using Java. Also try to investigate the process involved and the fees payable in obtaining real-life digital certificates.
5. What does it take to implement the SSL protocol? Study the OpenSSL library.

Chapter 11

Network Security

INTRODUCTION

In the previous chapter, we looked at the application layer security issues. While they are very critical and are worth examining in detail, equal importance needs to be given to network security-related issues also. Network security goes hand in hand with application security. While application security looks more at the transactional issues, network security deals with raw packets, and attempts to fix holes that appear at that layer. Various schemes can be used to provide network security, such as firewalls, VPNs, etc. This chapter deals with all these issues at the network layer, and completes our overview of the Internet security issues and their solutions.

11.1 FIREWALLS

11.1.1 Introduction

The dramatic rise and progress of the Internet has opened possibilities that no one could have thought of earlier. We can connect any computer in the world to any other computer, no matter how far apart the two are located. This is undoubtedly a great advantage for individuals and corporates as well. However, this can be a nightmare for network staff, who are left with the very difficult job of trying to protect corporate networks from a variety of attacks. At a broad level, there are two kinds of attacks.

Most corporations have large amounts of valuable and confidential data in their networks. Leaking of this critical information to competitors can be a great setback.
Apart from the danger of insider information leaking out, there is a great danger of outside elements (such as viruses and worms) entering a corporate network to create havoc.

We can depict this situation as in Fig. 11.1. As a result of these dangers, we must have mechanisms which can ensure that the inside information remains inside, and which also prevent outside attackers from entering a corporate network. As we know, encryption of information (if implemented properly) makes its transmission to the outside world harmless. That is, even if confidential information flows out of a corporate network, if it is in an encrypted form, outsiders cannot make any sense of it. However, encryption does not work in the other direction. Outside attackers can still try to break inside a corporate network. Consequently, better schemes are needed to achieve protection from outside attacks. This is where a firewall comes into the picture.


Fig. 11.1

Threats from inside and outside a corporate network

Conceptually, a firewall can be compared with a sentry standing outside an important person’s house (such as the nation’s president). This sentry usually keeps an eye on and physically checks every person that enters into or comes out of the house. If the sentry senses that a person wishing to enter the president’s house is carrying a knife, the sentry would not allow the person to enter. Similarly, even if the person does not possess any banned objects, but somehow looks suspicious, the sentry can still prevent that person’s entry. A firewall acts like a sentry. If implemented, it guards a corporate network by standing between the network and the outside world. All traffic between the network and the Internet in either direction must pass through the firewall. The firewall decides if the traffic can be allowed to flow, or whether it must be stopped from proceeding further. This is shown in Fig. 11.2.

Fig. 11.2

Firewall


Of course, technically, a firewall is a specialized version of a router. Apart from the basic routing functions and rules, a router can be configured to perform the firewall functionality, with the help of additional software resources. The characteristics of a good firewall implementation can be described as follows.
- All traffic from inside to outside, and vice versa, must pass through the firewall. To achieve this, all direct access to the local network must first be physically blocked, and access should be permitted only via the firewall.
- Only the traffic authorized as per the local security policy should be allowed to pass through.
- The firewall itself must be strong enough to render attacks on it useless.

11.1.2 Types of Firewalls Based on the criteria that they use for filtering traffic, firewalls are generally classified into two types, as shown in Fig. 11.3.

Fig. 11.3

Types of firewalls

Let us discuss these two types of firewalls one by one.

Packet filters As the name suggests, a packet filter applies a set of rules to each packet, and based on the outcome, decides to either forward or discard the packet. It is also called as a screening router or screening filter. Such a firewall implementation involves a router, which is configured to filter packets going in either direction (from the local network to the outside world, and vice versa). The filtering rules are based on a number of fields in the IP and TCP/UDP headers, such as the source and destination IP addresses, the IP protocol field (which identifies if the protocol in the upper transport layer is TCP or UDP), and the TCP/UDP port numbers (which identify the application which is using this packet, such as email, file transfer or the World Wide Web). The idea of a packet filter is shown in Fig. 11.4.

Fig. 11.4 Packet filter


Conceptually, a packet filter can be considered as a router that performs three main actions, as shown in Fig. 11.5.

Fig. 11.5 Packet filter operation

A packet filter performs the following functions.
1. It receives each packet as it arrives.
2. It passes the packet through a set of rules, based on the contents of the IP and transport header fields of the packet. If there is a match with one of the rules, it decides whether to accept or discard the packet based on that rule. For example, a rule could specify either to disallow all incoming traffic from the IP address 157.29.19.10 (this IP address is taken just as an example), or to disallow all traffic that uses UDP as the higher (transport) layer protocol.
3. If there is no match with any rule, the packet filter takes the default action. The default can be to discard all packets, or to accept all packets. The former policy is more conservative, whereas the latter is more open. Usually, the implementation of a firewall begins with the default discard all packets option, and then rules are added one by one to enforce packet filtering.

The chief advantage of the packet filter is its simplicity. The users need not be aware of a packet filter at all. Packet filters are very fast in their operating speed. However, the two disadvantages of a packet filter are the difficulties in setting up the packet filter rules correctly, and the lack of support for user authentication.

Figure 11.6 shows an example where a router can be converted into a packet filter by adding the filtering rules in the form of a table. This table decides which of the packets should be allowed (forwarded) or discarded. The rules specified in the packet filter work as follows; a small sketch of how such a rule table can be evaluated appears after this list.
1. Incoming packets from network 130.33.0.0 are not allowed. They are blocked as a security precaution.
2. Incoming packets from any external network on the TELNET server port (number 23) are blocked.
3. Incoming packets intended for a specific internal host 193.77.21.9 are blocked.
4. Outgoing packets intended for port 80 (HTTP) are banned. That is, this organization does not want to allow its employees to send requests to the external world (i.e., the Internet) for browsing the Internet.
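The following Python sketch illustrates how a packet filter of this kind can evaluate its rule table. The rule representation, the field names and the sample packet are illustrative assumptions made for this sketch, not the format of any particular firewall product.

# Illustrative packet-filter evaluation: the first matching rule decides,
# otherwise the default policy applies.

def matches(rule, packet):
    # A rule matches if every field it mentions equals the packet's value.
    return all(packet.get(field) == value for field, value in rule["match"].items())

def filter_packet(rules, packet, default_action="discard"):
    for rule in rules:
        if matches(rule, packet):
            return rule["action"]
    return default_action  # no rule matched: apply the default policy

# Rules corresponding (roughly) to the example table above.
rules = [
    {"match": {"src_net": "130.33.0.0", "direction": "in"}, "action": "discard"},
    {"match": {"dst_port": 23, "direction": "in"},          "action": "discard"},
    {"match": {"dst_ip": "193.77.21.9", "direction": "in"}, "action": "discard"},
    {"match": {"dst_port": 80, "direction": "out"},         "action": "discard"},
]

packet = {"direction": "in", "src_net": "10.0.0.0", "dst_ip": "193.77.21.9", "dst_port": 25}
# The example table lists only "block" rules, so we use a "forward" default here.
print(filter_packet(rules, packet, default_action="forward"))  # "discard": third rule matches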

Attackers can try and break the security of a packet filter by using the following techniques.

1. IP address spoofing An intruder outside the corporate network can attempt to send a packet towards the internal corporate network, with the source IP address set equal to one of the IP addresses of the internal users. This is shown in Fig. 11.7. This attack can be defeated by discarding all the packets that arrive at the incoming side of the firewall with a source address equal to one of the internal addresses.

Fig. 11.6

Example of packet filter table

Fig. 11.7 Packet filter defeating the IP address spoofing attack

2. Source routing attacks An attacker can specify the route that a packet should take as it moves along the Internet. The attacker hopes that by specifying this option, the packet filter can be fooled into bypassing its normal checks. Discarding all packets that use this option can thwart such an attack.
3. Tiny fragment attacks IP packets pass through a variety of physical networks, such as Ethernet, Token Ring, X.25, Frame Relay, ATM, etc. All these networks have a pre-defined maximum frame size (called as the Maximum Transmission Unit or MTU). Many times, the size of the IP packet is greater than this maximum size allowed by the underlying network. In such cases, the IP packet needs to be fragmented, so that it can be accommodated inside the physical frame, and carried further. An attacker might attempt to exploit this characteristic of the TCP/IP protocol suite by intentionally creating fragments of the original IP packet and sending them. The attacker hopes that the packet filter can be fooled, so that after fragmentation, it checks only the first fragment, and does not check the remaining fragments. This attack can be foiled by discarding all the packets where the (upper layer) protocol type is TCP and the packet is fragmented (refer to the identification and protocol fields of an IP packet discussed earlier to understand how we can implement this).


An advanced type of packet filter is called as a dynamic packet filter or stateful packet filter. A dynamic packet filter allows the examination of packets based on the current state of the network. That is, it adapts itself to the current exchange of information, unlike the normal packet filters, which have routing rules hard coded. For instance, we can specify a rule with the help of a dynamic packet filter as follows.

Allow incoming TCP segments only if they are responses to outgoing TCP segments that have gone out through our network.

Note that the dynamic packet filter has to maintain a list of the currently open connections and outgoing packets in order to deal with this rule. Hence, it is called as dynamic or stateful. When such a rule is in effect, the logical view of the packet filtering can be illustrated, as shown in Fig. 11.8.

Fig. 11.8

Dynamic packet filter technology

As shown in the figure, firstly, an internal client sends a TCP segment to an external server, which the dynamic packet filter allows. In response, the server sends back a TCP segment, which the packet filter examines, and realizes that it is a response to the internal client’s request. Therefore, it allows that packet in. Next, however, the external server sends a new UDP datagram, which the filter does not allow, because the earlier exchange between the client and the server happened using the TCP protocol, whereas this packet is based on the UDP protocol. Since this is against the rule that was set up earlier, the filter drops the packet. A minimal sketch of this stateful behaviour follows.
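The sketch below captures this behaviour: inbound traffic is allowed only when it matches a connection that an internal host opened first. The connection-table format and the helper names are assumptions for illustration; real stateful firewalls track far more state (TCP flags, timeouts, sequence numbers, and so on).

# Illustrative stateful (dynamic) packet filter.

open_connections = set()  # entries: (protocol, internal_ip, internal_port, external_ip, external_port)

def outbound(packet):
    # Record the connection that the internal client initiates, then allow it out.
    open_connections.add((packet["proto"], packet["src"], packet["sport"],
                          packet["dst"], packet["dport"]))
    return "allow"

def inbound(packet):
    # Allow the packet in only if it is the reverse direction of a known connection.
    key = (packet["proto"], packet["dst"], packet["dport"], packet["src"], packet["sport"])
    return "allow" if key in open_connections else "drop"

# The exchange described in the text:
outbound({"proto": "TCP", "src": "10.0.0.5", "sport": 3456, "dst": "203.0.113.7", "dport": 80})
print(inbound({"proto": "TCP", "src": "203.0.113.7", "sport": 80, "dst": "10.0.0.5", "dport": 3456}))  # allow
print(inbound({"proto": "UDP", "src": "203.0.113.7", "sport": 53, "dst": "10.0.0.5", "dport": 3456}))  # drop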


Application Gateways An application gateway is also called as a proxy server. This is because it acts like a proxy (i.e., deputy or substitute), and decides about the flow of application level traffic. The idea is shown in Fig. 11.9.

Fig. 11.9

Application gateway

Application gateways typically work as follows.
1. An internal user contacts the application gateway using a TCP/IP application, such as HTTP or TELNET.
2. The application gateway asks the user about the remote host with which the user wants to set up a connection for actual communication (i.e., its domain name or IP address, etc.). The application gateway also asks for the user id and the password required to access the services of the application gateway.
3. The user provides this information to the application gateway.
4. The application gateway now accesses the remote host on behalf of the user, and passes the packets of the user to the remote host. Note that there is a variation of the application gateway, called as a circuit gateway, which performs some additional functions as compared to those performed by an application gateway. A circuit gateway, in fact, creates a new connection between itself and the remote host. The user is not aware of this, and thinks that there is a direct connection between himself and the remote host. Also, the circuit gateway changes the source IP address in the packets from the end user's IP address to its own. This way, the IP addresses of the computers of the internal users are hidden from the outside world. This is shown in Fig. 11.10. Of course, both the connections are shown with a single arrow to stress the concept, though in reality, both are two-way connections. The SOCKS server is an example of a real-life implementation of a circuit gateway. It is a client-server application. The SOCKS client runs on the internal hosts, and the SOCKS server runs on the firewall.
5. From here onwards, the application gateway acts like a proxy of the actual end user, and delivers packets from the user to the remote host and vice versa.

Application gateways are generally more secure than packet filters, because rather than examining every packet against a number of rules, here we simply detect whether a user is allowed to work with a TCP/IP application or not. The disadvantage of application gateways is the overhead in terms of connections. As we noticed, there are actually two sets of connections now, one between the end user and the application gateway, and another between the application gateway and the remote host. The application gateway has to manage these two sets of connections, and the traffic going between them. This means that the actual communicating internal host is under an illusion, as illustrated in Fig. 11.11. A minimal sketch of how such a gateway relays data over its two connections is shown below.
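As an illustration of the two-connection idea, the sketch below relays bytes between a client-side connection and a separate server-side connection, the way a very simple proxy would. The addresses, ports and the plain byte-for-byte relaying are assumptions made for brevity; a real application gateway would also authenticate the user and understand the application protocol (HTTP, TELNET, etc.).

import socket
import threading

def pipe(src, dst):
    # Copy bytes from one connection to the other until the source closes.
    while data := src.recv(4096):
        dst.sendall(data)
    try:
        dst.shutdown(socket.SHUT_WR)  # signal end-of-stream to the other side
    except OSError:
        pass

def handle_client(client_sock, remote_host, remote_port):
    # Open a *second* connection, from the gateway to the remote server, so the
    # remote server only ever sees the gateway's own IP address.
    remote_sock = socket.create_connection((remote_host, remote_port))
    threading.Thread(target=pipe, args=(client_sock, remote_sock), daemon=True).start()
    pipe(remote_sock, client_sock)

def run_gateway(listen_port, remote_host, remote_port):
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind(("0.0.0.0", listen_port))
        srv.listen()
        while True:
            client, _ = srv.accept()
            threading.Thread(target=handle_client,
                             args=(client, remote_host, remote_port), daemon=True).start()

# Example (hypothetical addresses): relay local port 8080 to an external web server.
# run_gateway(8080, "www.example.com", 80)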


Fig. 11.10

Circuit gateway operation

Fig. 11.11 Application gateway creates an illusion

An application gateway is also called as a bastion host. Usually, a bastion host is a key point in the security of a network.

11.2 IP SECURITY

11.2.1 Introduction

The IP packets contain data in plain text form. That is, anyone watching the IP packets pass by can actually access them, read their contents, and even change them. We have studied higher-level security mechanisms (such as SSL, SHTTP, PGP, PEM, S/MIME and SET) to prevent such kinds of attacks. Although these higher-level protocols enhance the protection mechanisms, there was a general feeling for a long time that the IP packets themselves should be made secure. If we can achieve this, then we need not rely only on the higher-level security mechanisms. The higher-level security mechanisms can then serve as additional security measures. Thus, we will have two levels of security in this scheme.


- First, offer security at the IP packet level itself.
- Continue implementing higher-level security mechanisms, depending on the requirements.

This is shown in Fig. 11.12.

Fig. 11.12 Security at the Internet layer as well as the above layers

We have already discussed the higher-level security protocols. Our focus of discussion in this chapter is the first level of security (at the Internet layer). In 1994, the Internet Architecture Board (IAB) prepared a report called as Security in the Internet Architecture (RFC 1636). This report stated that the Internet was a very open network, which was unprotected from hostile attacks. Therefore, said the report, the Internet needs better security measures, in terms of authentication, integrity and confidentiality. In 1997 alone, about 150,000 Web sites were attacked in various ways, proving that the Internet was quite unsafe. Consequently, the IAB decided that authentication, integrity and encryption must be a part of the next version of the IP protocol, called as IP version 6 (IPv6) or IP new generation (IPng). However, since the new version of IP was to take some years to be released and implemented, the designers devised ways to incorporate these security measures in the current version of IP, called as IP version 4 (IPv4), as well. The outcome of the study and IAB's report is the protocol for providing security at the IP level, called as IP Security (IPSec). In 1995, the Internet Engineering Task Force (IETF) published five security-based standards related to IPSec, as shown in Table 11.1.

Table 11.1 RFC documents related to IPSec

RFC 1825: An overview of the security architecture
RFC 1826: Description of a packet authentication extension to IP
RFC 1827: Description of a packet encryption extension to IP
RFC 1828: A specific authentication mechanism
RFC 1829: A specific encryption mechanism

IPv4 may support these features, but IPv6 must support them. The overall idea of IPSec is to encrypt and seal the transport and application layer data during transmission. It also offers integrity protection for the Internet layer. However, the Internet header itself is not encrypted, because of which the intermediate routers can deliver encrypted IPSec messages to the intended recipient. The logical format of a message after IPSec processing is shown in Fig. 11.13.

Fig. 11.13 Result of IPSec processing

Thus, the sender and the receiver look at IPSec, as shown in Fig. 11.14, as another layer in the TCP/IP protocol stack. This layer sits in between the transport and the Internet layers of the conventional TCP/IP protocol stack.

Fig. 11.14

Conceptual IPSec positioning in the TCP/IP protocol stack

11.2.2 IPSec Overview Applications and advantages Let us first list the applications of IPSec. Secure remote Internet access Using IPSec, we can make a local call to our Internet Service Provider (ISP) so as to connect to our organization’s network in a secure fashion from our home or hotel. From there, we can access the corporate network facilities or access remote desktops/servers.


Secure branch office connectivity Rather than subscribing to an expensive leased line for connecting its branches across cities/countries, an organization can set up an IPSec-enabled network to securely connect all its branches over the Internet.

Set up communication with other organizations Just as IPSec allows connectivity between various branches of an organization, it can also be used to connect the networks of different organizations together in a secure and inexpensive fashion.

Following are the main advantages of IPSec.
- IPSec is transparent to the end users. There is no need for user training, key issuance or revocation.
- When IPSec is configured to work with a firewall, it becomes the only entry-exit point for all traffic, thus making it extra secure.
- IPSec works at the network layer. Hence, no changes are needed to the upper layers (application and transport).
- When IPSec is implemented in a firewall or a router, all the outgoing and incoming traffic gets protected. However, the internal traffic does not have to use IPSec. Thus, it does not add any overheads for the internal traffic.
- IPSec can allow travelling staff to have secure access to the corporate network.
- IPSec allows interconnectivity between branches/offices in a very inexpensive manner.

Basic concepts We must learn a few terms and concepts in order to understand the IPSec protocol. All these concepts are interrelated. However, rather than looking at these individual concepts straightaway, we shall start with the big picture. We will first take a look at the basic concepts in IPSec, and then elaborate each of the concepts. In this section, we shall restrict ourselves to a broad overview of the basic concepts in IPSec.

IPSec protocols As we know, an IP packet consists of two portions, the IP header and the actual data. IPSec features are implemented in the form of additional IP headers (called as extension headers) to the standard, default IP headers. These extension IP headers follow the standard IP headers. IPSec offers two main services, authentication and confidentiality. Each of these requires its own extension header. Therefore, to support these two main services, IPSec defines two IP extension headers, one for authentication and another for confidentiality. IPSec actually consists of two main protocols, as shown in Fig. 11.15.

Fig. 11.15 IPSec protocols

These two protocols are required for the following purposes.
- The Authentication Header (AH) protocol provides authentication, integrity and an optional anti-replay service. The IPSec AH is a header in an IP packet, which contains a cryptographic checksum (similar to a message digest or hash) for the contents of the packet. The AH is simply inserted between the IP header and any subsequent packet contents. No changes are required to the data contents of the packet. Thus, security resides completely in the contents of the AH.
- The Encapsulating Security Payload (ESP) protocol provides data confidentiality. The ESP protocol also defines a new header to be inserted into the IP packet. ESP processing also includes the transformation of the protected data into an unreadable, encrypted form. Under normal circumstances, the ESP will be inside the AH. That is, encryption happens first, and then authentication.

On receipt of an IP packet that was processed by IPSec, the receiver processes the AH first, if present. The outcome of this tells the receiver if the contents of the packet are all right, or whether they have been tampered with, while in transit. If the receiver finds the contents acceptable, it extracts the key and algorithms associated with the ESP, and decrypts the contents. There are some more details that we should know. Both AH and ESP can be used in one of the two modes, as shown in Fig. 11.16.

Fig. 11.16

AH and ESP modes of operation

We shall later study more about these modes. However, a quick overview would help. In the tunnel mode, an encrypted tunnel is established between two hosts. Suppose X and Y are two hosts, wanting to communicate with each other using the IPSec tunnel mode. What happens here is that they identify their respective proxies, say P1 and P2, and a logical encrypted tunnel is established between P1 and P2. X sends its transmission to P1. The tunnel carries the transmission to P2. P2 forwards it to Y. This is shown in Fig. 11.17.

Fig. 11.17

Concept of tunnel mode

How do we implement this technically? As we shall see, we will have two sets of IP headers, internal and external. The internal IP header (which is encrypted) contains the source and destination addresses as X and Y, whereas the external IP header contains the source and destination addresses as P1 and P2. That way, X and Y are protected from potential attackers. This is shown in Fig. 11.18.

Fig. 11.18 Implementation of tunnel mode

In the tunnel mode, IPSec protects the entire IP datagram. It takes an IP datagram (including the IP header), adds the IPSec header and trailer, and encrypts the whole thing. It then adds a new IP header to this encrypted datagram.

This is shown in Fig. 11.19.

Fig. 11.19 IPSec tunnel mode

In contrast, the transport mode does not hide the actual source and destination addresses. They are visible in plain text, while in transit. In the transport mode, IPSec takes the transport layer payload, adds the IPSec header and trailer, encrypts the whole thing, and then adds the IP header. Thus, the IP header is not encrypted.

This is shown in Fig. 11.20.

Fig. 11.20

IPSec transport mode

How does the user decide which mode should be used?
- We will notice that in the tunnel mode, the new IP header has information different from the information in the original IP header. The tunnel mode is normally used between two routers, between a host and a router, or between a router and a host. In other words, it is generally not used between two hosts, since the idea is to protect the original packet, including its IP header. It is as if the whole packet goes through an imaginary tunnel.
- The transport mode is useful when we are interested in host-to-host (i.e., end-to-end) encryption. The sending host uses IPSec to authenticate and/or encrypt the transport layer payload, and only the receiver verifies it.

A sketch comparing how the two modes assemble an outgoing packet is shown below.
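The sketch below contrasts the order of operations in the two modes at a purely conceptual level. The encrypt() placeholder and the byte-string "packets" are assumptions for illustration; real IPSec operates on actual datagram layouts with algorithms negotiated for the Security Association.

# Conceptual comparison of IPSec transport mode and tunnel mode.

def encrypt(data: bytes) -> bytes:
    return bytes(b ^ 0x5A for b in data)  # placeholder only, NOT a real cipher

def transport_mode(ip_header: bytes, payload: bytes) -> bytes:
    # Only the transport layer payload is protected; the original IP header
    # stays in plain text, so the real endpoints remain visible in transit.
    esp_header, esp_trailer = b"ESP", b"TRL"
    return ip_header + esp_header + encrypt(payload + esp_trailer)

def tunnel_mode(ip_header: bytes, payload: bytes, new_ip_header: bytes) -> bytes:
    # The whole original datagram (header included) is protected, and a new
    # outer IP header (e.g., firewall-to-firewall addresses) is prefixed.
    esp_header, esp_trailer = b"ESP", b"TRL"
    return new_ip_header + esp_header + encrypt(ip_header + payload + esp_trailer)

original_header, data = b"IP[X->Y]", b"transport segment"
print(transport_mode(original_header, data))
print(tunnel_mode(original_header, data, b"IP[P1->P2]"))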

The Internet Key Exchange (IKE) Protocol Another supporting protocol used in IPSec for the key management procedures is called as the Internet Key Exchange (IKE) protocol. IKE is used to negotiate the cryptographic algorithms to be later used by AH and ESP in the actual cryptographic operations. The IPSec protocols are designed to be independent of the actual lower-level cryptographic algorithms. Thus, IKE is the initial phase of IPSec, where the algorithms and keys are decided. After the IKE phase, the AH and ESP protocols take over. This process is shown in Fig. 11.21.

Security Association (SA) The output of the IKE phase is a Security Association (SA). An SA is an agreement between the communicating parties about factors such as the IPSec protocol version in use, the mode of operation (transport mode or tunnel mode), cryptographic algorithms, cryptographic keys, lifetime of keys, etc. By now, we would have guessed that the principal objective of the IKE protocol is to establish an SA between the communicating parties. Once this is done, both major protocols of IPSec (i.e., AH and ESP) make use of the SA for their actual operation.


Fig. 11.21

Steps in IPSec operation

Note that if both AH and ESP are used, each communicating party requires two sets of SA, one for AH and one for ESP. Moreover, an SA is simplex, i.e., unidirectional. Therefore, at a second level, we need two sets of SA per communicating party, one for incoming transmission and another for outgoing transmission. Thus, if the two communicating parties use both AH and ESP, each of them would require four sets of SA, as shown in Fig. 11.22.

Fig. 11.22 Security association types and classifications Obviously, both the communicating parties must allocate some storage area for storing the SA information at their end. For this purpose, a standard storage area called as Security Association Database (SAD) is predefined and used by IPSec. Thus, each communicating party requires maintaining its own SAD. The SAD contains active SA entries. The contents of a SAD are shown in Table 11.2.


Table 11.2 SAD fields

Sequence number counter: This 32-bit field is used to generate the sequence number field, which is used in the AH or ESP headers.
Sequence counter overflow: This flag indicates whether the overflow of the sequence number counter should generate an auditable event and prevent further transmission of packets on this SA.
Anti-replay window: A 32-bit counter field and a bit map, which are used to detect if an incoming AH or ESP packet is a replay.
AH authentication: The AH authentication cryptographic algorithm and the required key.
ESP authentication: The ESP authentication cryptographic algorithm and the required key.
ESP encryption: The ESP encryption algorithm, key, Initial Vector (IV) and IV mode.
IPSec protocol mode: Indicates which IPSec protocol mode (e.g., transport or tunnel) should be applied to the AH and ESP traffic.
Path Maximum Transfer Unit (PMTU): The maximum size of an IP datagram that will be allowed to pass through a given network path without fragmentation.
Lifetime: Specifies the life of the SA. After this time interval, the SA must be replaced with a new one.
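A very small sketch of how an implementation might represent and look up SAD entries is given below. The field selection and the lookup key (SPI, destination address, protocol) follow the description above; the class layout itself is only an illustrative assumption.

from dataclasses import dataclass

@dataclass
class SecurityAssociation:
    spi: int                 # Security Parameter Index carried in the AH/ESP header
    dst_address: str         # destination address of the protected traffic
    protocol: str            # "AH" or "ESP"
    mode: str                # "transport" or "tunnel"
    auth_algorithm: str      # e.g., an HMAC variant, with its key held elsewhere
    enc_algorithm: str       # ESP encryption algorithm (unused for AH)
    lifetime_seconds: int    # after this interval the SA must be replaced

# The SAD is essentially a dictionary of active SAs, keyed as described above.
sad = {}

def add_sa(sa: SecurityAssociation):
    sad[(sa.spi, sa.dst_address, sa.protocol)] = sa

def lookup_sa(spi: int, dst_address: str, protocol: str):
    return sad.get((spi, dst_address, protocol))  # None means: no SA, drop the packet

add_sa(SecurityAssociation(0x1001, "192.0.2.10", "ESP", "tunnel", "HMAC-SHA1", "AES-CBC", 3600))
print(lookup_sa(0x1001, "192.0.2.10", "ESP"))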

Having discussed the background of IPSec, let us now discuss the two main protocols in IPSec, which are AH and ESP.

11.2.3 Authentication Header (AH)

AH format The Authentication Header (AH) provides for data integrity and authentication of IP packets. The data integrity service ensures that the data inside IP packets is not altered during transit. The authentication service enables an end user or a computer system to authenticate the user or the application at the other end, and to accept or reject packets accordingly. This also prevents IP spoofing attacks. Internally, AH is based on a Message Authentication Code (MAC), which means that the two communicating parties must share a secret key in order to use AH. The AH structure is shown in Fig. 11.23.

Fig. 11.23

Authentication Header (AH) format

Let us discuss the fields in the AH now, as shown in Table 11.3.

Table 11.3 Authentication header field descriptions

Next header: This 8-bit field identifies the type of header that immediately follows the AH. For example, if an ESP header follows the AH, this field contains a value 50, whereas if another AH follows this AH, this field contains a value 51.
Payload length: This 8-bit field contains the length of the AH in 32-bit words minus 2. Suppose that the length of the authentication data field is 96 bits (or three 32-bit words). With a three-word fixed header, we have a total of 6 words in the header. Therefore, this field will contain a value of 4.
Reserved: This 16-bit field is reserved for future use.
Security Parameter Index (SPI): This 32-bit field is used in combination with the source and destination addresses as well as the IPSec protocol used (AH or ESP) to uniquely identify the Security Association (SA) for the traffic to which a datagram belongs.
Sequence number: This 32-bit field is used to prevent replay attacks, as discussed later.
Authentication data: This variable-length field contains the authentication data, called as the Integrity Check Value (ICV), for the datagram. This value is the MAC, used for authentication and integrity purposes. For IPv4 datagrams, the length of this field must be an integral multiple of 32 bits. For IPv6 datagrams, the length of this field must be an integral multiple of 64 bits. For this, additional padding bits may be required. The ICV is calculated by generating a MAC using the HMAC digest algorithm.
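To make "calculating the ICV with an HMAC" concrete, the snippet below computes an HMAC over some packet bytes using Python's standard library. The key, the sample bytes, the choice of SHA-1 and the 96-bit truncation are assumptions for illustration; the actual algorithm, key and covered fields are dictated by the SA.

import hmac
import hashlib

shared_key = b"secret negotiated via IKE"                                # illustrative key
packet_bytes = b"IP header (mutable fields zeroed) + AH + payload"       # illustrative content

# HMAC-SHA1 truncated to 96 bits is one commonly used ICV form.
icv = hmac.new(shared_key, packet_bytes, hashlib.sha1).digest()[:12]
print(icv.hex())

# The receiver recomputes the MAC over the received bytes and compares it
# (in constant time) with the value found in the Authentication Data field.
received_icv = icv
valid = hmac.compare_digest(received_icv,
                            hmac.new(shared_key, packet_bytes, hashlib.sha1).digest()[:12])
print(valid)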

Dealing with replay attacks Let us now study how AH deals with and prevents replay attacks. To reiterate, in a replay attack, the attacker obtains a copy of an authenticated packet and later sends it to the intended destination. Since the same packet is received twice, the destination could face some problems because of this. To prevent this, as we know, the AH contains a field called as sequence number. Initially, the value of this field is set to 0. Every time the sender sends a packet to the same recipient over the same SA, it increments the value of this field by 1. The sender must not allow this value to circle back from 2^32 – 1 to 0. If the number of packets over the same SA exceeds this number, the sender must establish a new SA with the recipient.

On the receiver's side, there is some more processing involved. The receiver maintains a sliding window of size W, with the default value of W = 64. The right edge of the window represents the highest sequence number N received so far, for a valid packet. For simplicity, let us depict a sliding window with W = 8, as shown in Fig. 11.24. Let us understand the significance of the receiver's sliding window, and also see how the receiver operates on it. As we can see, the following values are used.
- W: Specifies the size of the window. In our example, it is 8.
- N: Specifies the highest sequence number received so far for a valid packet. N is always at the right edge of the window.


Fig. 11.24

Receiver’s sliding window

For any packet with a sequence number in the range from (N – W + 1) to N that has been correctly received (i.e., successfully authenticated), the corresponding slot in the window is marked (see figure). On the other hand, for any packet in this range that is not correctly received (i.e., not successfully authenticated), the slot remains unmarked (see figure). Now, when a receiver receives a packet, it performs the following action depending on the sequence number of the packet, as shown in Fig. 11.25.

1. If the sequence number of the received packet falls within the window, and if the packet is new, its MAC is checked. If the MAC is successfully validated, the corresponding slot in the window is marked. The window itself does not move to the right-hand side.
2. If the received packet is to the right of the window [i.e., the sequence number of the packet is > N], and if the packet is new, the MAC is checked. If the packet is authenticated successfully, the window is advanced to the right in such a way that the right edge of the window now matches the sequence number of this packet. That is, this sequence number now becomes the new N.
3. If the received packet is to the left of the window [i.e., the sequence number of the packet is < (N – W)], or if the MAC check fails, the packet is rejected, and an auditable event is triggered.

Fig. 11.25

Sliding window logic used by the receiver for each incoming packet

Note that the third action thwarts replay attacks. This is because if the receiver receives a packet whose sequence number is less than (N – W), it concludes that someone posing as the sender is attempting to resend a packet sent by the sender earlier. We must also realize that in extreme conditions, this kind of technique can make the receiver believe that a transmission is in error, even though it is not the case. For example, suppose that the value of W is 64 and that of N is 100. Now suppose that the sender sends a burst of packets, numbered 101 to 500. Because of network congestion and other issues, suppose that the receiver somehow receives a packet with sequence number 300 first. It would immediately move the right edge of the window to 300 (i.e., N = 300 now). Now suppose that the receiver next receives packet number 102. From our calculations, N – W = 300 – 64 = 236. Therefore, the sequence number of the packet just received (102) is less than (N – W = 236). Thus, our third condition in the earlier list would get triggered, and the receiver would reject this valid packet, and raise an alarm. However, such situations are rare, and with an optimized value of W, such situations can be avoided. A sketch of this sliding-window logic follows.
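The following sketch implements the three rules above for a single SA. The MAC check is reduced to a boolean parameter, and the window is kept as a set of marked sequence numbers; both are simplifications made purely for illustration.

# Receiver-side anti-replay sliding window, following the three rules above.

W = 64          # window size
N = 0           # highest sequence number seen so far for a valid packet
marked = set()  # sequence numbers within the window that were accepted

def receive(seq: int, mac_ok: bool) -> str:
    global N
    if not mac_ok:
        return "reject (audit)"                  # rule 3: failed MAC check
    if seq > N:                                  # rule 2: to the right of the window
        N = seq                                  # advance the right edge
        marked.add(seq)
        marked.difference_update({s for s in marked if s <= N - W})  # drop slots that fell off
        return "accept"
    if seq <= N - W or seq in marked:            # rule 3: too old, or a duplicate (replay)
        return "reject (audit)"
    marked.add(seq)                              # rule 1: inside the window and new
    return "accept"

print(receive(1, True))    # accept
print(receive(3, True))    # accept (right edge moves to 3)
print(receive(2, True))    # accept (inside window, new)
print(receive(2, True))    # reject (replay of an already-marked number)
print(receive(2, False))   # reject (MAC failure)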


Modes of operation As we know, both AH and ESP can work in two modes, that is, the transport mode and the tunnel mode. Let us now discuss AH in the context of these two modes.

AH transport mode In the transport mode, the position of the Authentication Header (AH) is between the original IP header and the original TCP header of the IP packet. This is shown in Fig. 11.26.

Fig. 11.26

AH transport mode

AH tunnel mode In the tunnel mode, the entire original IP packet is authenticated and the AH is inserted between the original IP header and a new outer IP header. The inner IP header contains the ultimate source and destination IP addresses, whereas the outer IP header possibly contains different IP addresses (e.g., IP addresses of the firewalls or other security gateways). This is shown in Fig. 11.27.

Fig. 11.27 AH tunnel mode

11.2.4 Encapsulating Security Payload (ESP) ESP format The Encapsulating Security Payload (ESP) protocol provides confidentiality and integrity of messages. ESP is based on symmetric key cryptography techniques. ESP can be used in isolation, or it can be combined with AH.


The ESP packet contains four fixed-length fields, and three variable-length fields. Figure 11.28 shows the ESP format.

Fig. 11.28 Encapsulating Security Payload (ESP) format Let us discuss the fields in the ESP now, as shown in Table 11.4.

Table 11.4 ESP field descriptions

Security Parameter Index (SPI): This 32-bit field is used in combination with the source and destination addresses as well as the IPSec protocol used (AH or ESP) to uniquely identify the Security Association (SA) for the traffic to which a datagram belongs.
Sequence number: This 32-bit field is used to prevent replay attacks, as discussed earlier.
Payload data: This variable-length field contains the transport layer segment (transport mode) or IP packet (tunnel mode), which is protected by encryption.
Padding: This field contains the padding bits, if any. These are used by the encryption algorithm, or for aligning the padding length field, so that it begins at the third byte within the 4-byte word.
Padding length: This 8-bit field specifies the number of padding bytes in the immediately preceding field.
Next header: This 8-bit field identifies the type of encapsulated data in the payload. For example, a value 6 in this field indicates that the payload contains TCP data.
Authentication data: This variable-length field contains the authentication data, called as the Integrity Check Value (ICV), for the datagram. This is calculated over the length of the ESP packet minus the Authentication Data field.
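To make the role of the Padding and Padding length fields concrete, the snippet below computes the padding needed so that the data to be encrypted (payload plus the 2-byte Padding length/Next header trailer) fills a whole number of cipher blocks. The 16-byte block size and the 1, 2, 3, ... filler bytes are assumptions for illustration; the exact padding requirements depend on the encryption algorithm chosen for the SA.

def esp_pad(payload: bytes, block_size: int = 16) -> bytes:
    # Encrypted portion = payload | padding | pad length (1 byte) | next header (1 byte).
    # Choose the padding so that this total is a multiple of the cipher block size.
    pad_len = (-(len(payload) + 2)) % block_size
    padding = bytes(range(1, pad_len + 1))       # monotonically increasing filler bytes
    next_header = 6                              # 6 = TCP, as in the table above
    return payload + padding + bytes([pad_len, next_header])

plaintext_block = esp_pad(b"some transport layer segment")
print(len(plaintext_block), len(plaintext_block) % 16)  # the length is now block aligned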

Modes of operation ESP, like AH, can operate in the transport mode or the tunnel mode. Let us discuss these two possibilities now.


ESP transport mode Transport mode ESP is used to encrypt, and optionally authenticate, the data carried by IP (for example, a TCP segment). Here, the ESP header is inserted into the IP packet immediately before the transport layer header (i.e., TCP or UDP), and an ESP trailer (containing the fields Padding, Padding length and Next header) is added after the IP packet. If authentication is also used, the ESP Authentication Data field is added after the ESP trailer. The entire transport layer segment and the ESP trailer are encrypted. The entire cipher text, along with the ESP header, is authenticated. This is shown in Fig. 11.29.

Fig. 11.29

ESP transport mode

We can summarize the operation of the ESP transport mode as follows. 1. At the sender’s end, the block of data containing the ESP trailer and the entire transport layer segment is encrypted and the plain text of this block is replaced with its corresponding cipher text to form the IP packet. Authentication is appended, if selected. This packet is now ready for transmission. 2. The packet is routed to the destination. The intermediate routers need to take a look at the IP header as well as any IP extension headers, but not at the cipher text. 3. At the receiver’s end, the IP header plus any plain text IP extension headers are examined. The remaining portion of the packet is then decrypted to retrieve the original plain text transport layer segment.

ESP tunnel mode The tunnel mode ESP encrypts an entire IP packet. Here, the ESP header is prefixed to the packet, and then the packet along with the ESP trailer is encrypted. As we know, the IP header contains the destination address as well as intermediate routing information, and it is now encrypted. Therefore, this packet cannot be transmitted as it is; otherwise, the delivery of the packet would be impossible. Hence, a new IP header is added, which contains sufficient information for routing. This is shown in Fig. 11.30.

We can summarize the operation of the ESP tunnel mode as follows.
1. At the sender's end, the sender prepares an inner IP packet with the destination address as the internal destination. This packet is prefixed with an ESP header, and then the packet and ESP trailer are encrypted and Authentication Data is (optionally) added. A new IP header is added to the start of this block. This forms the outer IP packet.
2. The outer packet is routed to the destination firewall. Each intermediate router needs to check and process the outer IP header, along with any other outer IP extension headers. It need not know about the cipher text.
3. At the receiver's end, the destination firewall processes the outer IP header plus any extension headers, and recovers the plain text from the cipher text. The packet is then sent to the actual destination host.

Fig. 11.30

ESP tunnel mode

11.2.5 IPSec Key Management

Introduction Apart from the two core protocols (AH and ESP), the third most significant aspect of IPSec is key management. Without a proper key management setup, IPSec cannot exist. Key management in IPSec consists of two aspects, which are key agreement and key distribution. As we know, we require four keys if we want to make use of both AH and ESP: two keys for AH (one for message transmission, one for message receiving), and two keys for ESP (one for message transmission, one for message receiving). The protocol used in IPSec for key management is called as ISAKMP/Oakley. The Internet Security Association and Key Management Protocol (ISAKMP) provides a platform for key management. It defines the procedures and packet formats for negotiating, establishing, modifying and deleting SAs. ISAKMP messages can be transmitted via the TCP or UDP transport protocol. TCP and UDP port number 500 is reserved for ISAKMP. The initial version of ISAKMP mandated the use of the Oakley protocol. Oakley is based on the Diffie-Hellman key exchange protocol, with a few variations. We will first take a look at Oakley, and then examine ISAKMP.

Oakley key determination protocol The Oakley protocol is a refined version of the Diffie-Hellman key exchange protocol. We will not discuss the concepts of Diffie-Hellman, as they are not relevant here. However, we will note here that Diffie-Hellman offers two desirable features.


(a) Creation of secret keys is possible as and when required.
(b) There is no requirement for any pre-existing infrastructure.

However, Diffie-Hellman also suffers from a few problems, as follows.
- It does not contain any mechanism for authentication of the parties.
- It is vulnerable to the man-in-the-middle attack.
- It involves a lot of mathematical processing. An attacker can take undue advantage of this by sending a number of hoax Diffie-Hellman requests to a host. The host can unnecessarily spend a large amount of time in trying to compute the keys, rather than doing any actual work. This is called as a congestion attack or clogging attack.

The Oakley protocol is designed to retain the advantages of Diffie-Hellman, and to remove its drawbacks. The features of Oakley are as follows.
1. It has features to defeat replay attacks.
2. It implements a mechanism called as cookies to defeat congestion attacks.
3. It enables the exchange of Diffie-Hellman public key values.
4. It provides authentication mechanisms to thwart man-in-the-middle attacks.

We have already discussed the Diffie-Hellman key exchange protocol in great detail. Here, we shall simply discuss the approaches taken by Oakley to tackle the issues with Diffie-Hellman.

Authentication Oakley supports three authentication mechanisms: digital signatures (generation of a message digest, and its encryption with the sender's private key), public key encryption (encrypting some information, such as the sender's user id, with the recipient's public key), and secret key encryption (a key derived by using some out-of-band mechanism).

Dealing with congestion attacks Oakley uses the concept of cookies to thwart congestion attacks. As we know, in this kind of attack, an attacker forges the source address of a legitimate user and sends a public Diffie-Hellman key to another legitimate user. The receiver performs modular exponentiation to calculate the secret key. A number of such calculations performed rapidly one after the other can cause congestion or clogging of the victim's computer. To tackle this, each side in Oakley must send a pseudo-random number, called as a cookie, in the initial message, which the other side must acknowledge. This acknowledgement must be repeated in the first message of the Diffie-Hellman key exchange. If an attacker forges the source address, she does not get the acknowledgement cookie from the victim, and her attack fails. Note that at the most the attacker can force the victim to generate and send a cookie, but not to perform the actual Diffie-Hellman calculations. A small sketch of how such a cookie can be generated appears below. The Oakley protocol provides for a number of message types. For simplicity, we shall consider only one of them, called as aggressive key exchange. It consists of three message exchanges between the two parties, say X and Y. Let us examine these three messages.
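The sketch below shows one plausible way of generating such a cookie: a hash over the two parties' addresses and ports plus a secret known only to the generating side, so that the cookie is cheap to create and verify but cannot be guessed by a third party. The exact inputs and the use of SHA-256 here are illustrative assumptions, not the precise formula mandated by Oakley or ISAKMP.

import hashlib
import os
import time

local_secret = os.urandom(16)  # known only to this host; rotated periodically

def make_cookie(src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> bytes:
    # The cookie binds the peer's claimed address to a local secret, so a spoofed
    # source address never receives (and hence can never echo back) a valid cookie.
    material = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}:{int(time.time() // 60)}".encode()
    return hashlib.sha256(local_secret + material).digest()[:8]

cookie = make_cookie("198.51.100.4", 500, "203.0.113.9", 500)
print(cookie.hex())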

Message 1 To begin with, X sends a cookie and the public Diffie-Hellman key of X for this exchange, along with some other information. X signs this block with its private key.

Message 2 When Y receives message 1, it verifies the signature of X using the public key of X. When Y is satisfied that the message indeed came from X, it prepares an acknowledgement message for X, containing the cookie sent by X. Y also prepares its own cookie and Diffie-Hellman public key, and along with some other information, it signs the whole package with its private key.


Message 3 Upon receipt of message 2, X verifies it using the public key of Y. When X is satisfied about it, it sends a message back to Y to inform Y that it has received Y's public key.

ISAKMP The ISAKMP protocol defines procedures and formats for establishing, maintaining and deleting SA information. An ISAKMP message contains an ISAKMP header followed by one or more payloads. The entire block is encapsulated inside a transport segment (such as a TCP or UDP segment). The header format for ISAKMP messages is shown in Fig. 11.31.

Fig. 11.31 ISAKMP header format Let us discuss the fields in the ISAKMP header now, as shown in Table 11.5.

Table 11.5 ISAKMP header field descriptions

Initiator cookie: This 64-bit field contains the cookie of the entity that initiates the SA establishment or deletion.
Responder cookie: This 64-bit field contains the cookie of the responding entity. Initially, this field contains null when the initiator sends the very first ISAKMP message to the responder.
Next payload: This 8-bit field indicates the type of the first payload of the message (discussed later).
Major version: This 4-bit field identifies the major ISAKMP protocol version as used in the current exchange.
Minor version: This 4-bit field identifies the minor ISAKMP protocol version as used in the current exchange.
Exchange type: This 8-bit field indicates the type of exchange (discussed later).
Flags: This 8-bit field indicates the specific set of options for this ISAKMP exchange.
Message ID: This 32-bit field identifies the unique id for this message.
Length: This 32-bit field specifies the total length of the message, including the header and all the payloads, in octets.
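For illustration, the snippet below packs these header fields into a 28-byte wire layout using Python's struct module, following the sizes listed in the table (two 8-byte cookies, four 1-byte fields, a 4-byte message ID and a 4-byte length, with the two 4-bit version numbers sharing one byte). The sample field values are arbitrary.

import struct

def pack_isakmp_header(init_cookie: bytes, resp_cookie: bytes, next_payload: int,
                       major: int, minor: int, exchange_type: int,
                       flags: int, message_id: int, length: int) -> bytes:
    version = (major << 4) | (minor & 0x0F)   # two 4-bit fields packed into one byte
    return struct.pack("!8s8sBBBBII", init_cookie, resp_cookie,
                       next_payload, version, exchange_type, flags,
                       message_id, length)

header = pack_isakmp_header(b"\x11" * 8, b"\x00" * 8,  # responder cookie still null
                            next_payload=1, major=1, minor=0,
                            exchange_type=4, flags=0, message_id=0, length=28)
print(len(header), header.hex())  # 28 bytes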

Let us quickly discuss the fields not explained yet.

Payload types ISAKMP specifies different payload types. For example, an SA payload is used to start the establishment of an SA. The proposal payload contains information used during the SA establishment. The key exchange payload carries the data for exchanging keys using mechanisms such as Oakley, Diffie-Hellman, RSA, etc. There are many other payload types.

Exchange types There are five exchange types defined in ISAKMP. The base exchange allows the transmission of the key and authentication material. The identity protection exchange expands the base exchange to protect the identities of the users. The authentication only exchange is used to perform mutual authentication. The aggressive exchange attempts to minimize the number of exchanges, at the cost of not hiding the users' identities. The information exchange is used for one-way transmission of information for SA management.

11.3 VIRTUAL PRIVATE NETWORKS (VPN)

11.3.1 Introduction

Until very recently, there has been a very clear demarcation between public and private networks. A public network, such as the public telephone system and the Internet, is a large collection of communicators who are generally unrelated to each other. In contrast, a private network is made up of computers owned by a single organization, which share information with each other. Local Area Networks (LAN), Metropolitan Area Networks (MAN) and Wide Area Networks (WAN) are examples of private networks. A firewall usually separates a private network from a public network. Let us assume that an organization wants to connect two of its branch networks to each other. The trouble is that these branches are located quite a distance apart. One branch may be in Delhi, and the other branch may be in Mumbai. The following two solutions, out of all the available ones, seem logical:
- Connect the two branches using a personal network, i.e., lay cables between the two offices yourself, or obtain a leased line between the two branches.
- Connect the two branches with the help of a public network, such as the Internet.

The first solution gives far more control and offers a sense of security, as compared to the second solution. However, it is also quite complicated. Laying cables between two cities is not easy, and is usually not permitted either. The second solution seems easier to implement, as there is no special infrastructure setup required. However, it also seems to be vulnerable to possible attacks. It would be a perfect situation if we could combine the two solutions! Virtual Private Networks (VPN) offer such a solution. A VPN is a mechanism of employing encryption, authentication and integrity protection so that we can use a public network (such as the Internet) like a private network (i.e., a physical network created and controlled by you). A VPN offers a high amount of security, and yet does not require any special cabling to be laid by the organization that wants to use it. Thus, a VPN combines the advantages of a public network (cheap and easily available) with those of a private network (secure and reliable). A VPN can connect distant networks of an organization, or it can be used to allow travelling users to remotely access a private network (e.g., the organization's intranet) securely over the Internet. A VPN is thus a mechanism to simulate a private network over a public network, such as the Internet. The term virtual signifies that it depends on the use of virtual connections. These connections are temporary, and do not have any physical presence. They are made up of packets.


11.3.2 VPN Architecture The idea of a VPN is actually quite simple to understand. Suppose an organization has two networks, Network 1 and Network 2, which are physically apart from each other, and we want to connect them using the VPN approach. In such a case, we set up two firewalls, Firewall 1 and Firewall 2. The encryption and decryption are performed by the firewalls. The architectural overview is shown in Fig. 11.32.

Fig. 11.32 VPN between two private networks

We have shown two networks, Network 1 and Network 2. Network 1 connects to the Internet via a firewall named Firewall 1. Similarly, Network 2 connects to the Internet with its own firewall, Firewall 2. We shall not worry about the configuration of the firewalls here, and shall assume that the best possible configuration is selected by the organization. However, the key point to note here is that the two firewalls are virtually connected to each other via the Internet. We have shown this with the help of a VPN tunnel between the two firewalls.

With this configuration in mind, let us understand how the VPN protects the traffic passing between any two hosts on the two different networks. For this, let us assume that host X on Network 1 wants to send a data packet to host Y on Network 2. This transmission would work as follows; a small sketch of this encapsulation and decapsulation is shown after Fig. 11.35.
1. Host X creates the packet, inserts its own IP address as the source address, and the IP address of host Y as the destination address. This is shown in Fig. 11.33. It sends the packet using the appropriate mechanism.
2. The packet reaches Firewall 1. As we know, Firewall 1 now adds new headers to the packet. In these new headers, it changes the source IP address of the packet from that of host X to its own address (i.e., the IP address of Firewall 1, say F1). It also changes the destination IP address of the packet from that of host Y to the IP address of Firewall 2 (say F2). This is shown in Fig. 11.34. It also performs the packet encryption and authentication, depending on the settings, and sends the modified packet over the Internet.
3. The packet reaches Firewall 2 over the Internet, via one or more routers, as usual. Firewall 2 discards the outer header and performs the appropriate decryption and other cryptographic functions as necessary. This yields the original packet, as was created by host X in step 1. This is shown in Fig. 11.35. It then takes a look at the plain text contents of the packet, and realizes that the packet is meant for host Y (because the destination address inside the packet specifies host Y). Therefore, it delivers the packet to host Y.

Fig. 11.33 Original packet

Fig. 11.34 Firewall 1 changes the packet contents

Fig. 11.35 Firewall 2 retrieves the original packet contents
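The following sketch mimics steps 1 to 3 above: the sending firewall wraps the original packet inside a new outer packet addressed firewall-to-firewall and encrypts the inner part, and the receiving firewall reverses the process. The Fernet cipher from the third-party cryptography package and the dictionary-based packet format are assumptions chosen for brevity; a real VPN would use IPSec (or an equivalent protocol) on raw datagrams.

import json
from cryptography.fernet import Fernet  # pip install cryptography

key = Fernet.generate_key()   # in practice, negotiated/shared between the two firewalls
cipher = Fernet(key)

def firewall1_encapsulate(inner_packet: dict) -> dict:
    # Step 2: hide the original packet (X -> Y) inside an outer packet (F1 -> F2).
    protected = cipher.encrypt(json.dumps(inner_packet).encode())
    return {"src": "F1", "dst": "F2", "payload": protected}

def firewall2_decapsulate(outer_packet: dict) -> dict:
    # Step 3: strip the outer header, decrypt, and recover the original packet.
    return json.loads(cipher.decrypt(outer_packet["payload"]))

original = {"src": "X", "dst": "Y", "data": "hello from network 1"}
outer = firewall1_encapsulate(original)
recovered = firewall2_decapsulate(outer)
print(outer["src"], "->", outer["dst"])   # F1 -> F2 is all the Internet sees
print(recovered)                          # the original X -> Y packet, delivered to Y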

There are three main VPN protocols. A detailed study of these protocols is beyond the scope of the current text. However, we shall briefly discuss them for the sake of completeness.
- The Point-to-Point Tunnelling Protocol (PPTP) is used on Windows NT systems. It mainly supports the VPN connectivity between a single user and a LAN, rather than between two LANs.
- Developed by the IETF, the Layer 2 Tunnelling Protocol (L2TP) is an improvement over PPTP. L2TP is considered as the secure open standard for VPN connections. It works for both combinations: user-to-LAN and LAN-to-LAN. It can include the IPSec functionality as well.
- Finally, IPSec can be used in isolation. We have discussed IPSec in detail earlier.


SUMMARY
- Firewalls are specialized routers, which filter unwanted content.
- A firewall can be configured to only allow specific traffic, while getting rid of unwanted traffic.
- A firewall can be of two types: packet filter and application gateway.
- A packet filter examines every packet for suspicious/banned content (e.g., a packet containing some specific words) and allows or stops it.
- An application gateway does not worry about the contents of the packet too much. Instead, it focusses on the application layer protocol in use. For example, it can allow all HTTP traffic and SMTP traffic, but ban FTP traffic.
- Usually, a combination of a packet filter and an application gateway is used to ensure both protocol layer security, as well as packet layer security.
- A special type of firewall called as a proxy server or circuit gateway can be used to improve the security even further. Here, a special server called as a proxy server is set up as the firewall, which acts as the middle layer between the internal network and the rest of the Internet.
- The proxy server receives outgoing packets from an internal host, and instead of forwarding them to the external server, opens a separate connection with the external server and then sends the packets.
- Thus, with a proxy server, there are two separate connections: one between the internal host and the proxy, and the second between the proxy and the external server.
- The proxy server helps hide the internal network details from the external networks.
- The IPSec protocol is used to connect two firewalls at two ends to create a Virtual Private Network (VPN).
- A VPN allows organizations to use the public, free Internet as if it is a private network.
- A VPN allows the two firewalls at the two ends of its connection to handle encryption, message integrity, etc.
- A VPN can work in two modes: transport mode and tunnel mode.
- Transport mode protects an IP datagram, excluding its IP header.
- Tunnel mode protects an IP datagram, including its IP header. Thus, the original sender and the final recipient details also get hidden from the intermediate routers/networks.
- The IPSec protocol has two sub-protocols: Authentication Header (AH) and Encapsulating Security Payload (ESP).
- AH takes care of the message integrity and authentication details.
- ESP ensures message confidentiality.
- AH and ESP can work either independently, or together.

REVIEW QUESTIONS

Multiple-choice Questions
1. A firewall works at the ______ layer.
(a) application (b) transport (c) network (d) data link
2. An application gateway looks at the ______ layer protocols.
(a) application (b) transport (c) network (d) data link
3. A packet filter looks at the ______ layer protocols.
(a) application (b) transport (c) network (d) data link
4. A ______ is used to establish two separate connections from the user to the end server.
(a) packet filter (b) application gateway (c) DMZ (d) proxy server
5. A VPN makes use of ______.
(a) the Internet (b) leased lines (c) wireless networks only (d) LAN
6. In the ______, the entire original IP datagram is encapsulated into another.
(a) transport mode (b) tunnel mode (c) none of these (d) both of these
7. In the ______, only the TCP segment is encapsulated into another IP datagram.
(a) transport mode (b) tunnel mode (c) none of these (d) both of these
8. The ______ ensures message authentication/integrity.
(a) ESP (b) AH (c) ISAKMP (d) SA
9. The ______ ensures message confidentiality.
(a) ESP (b) AH (c) ISAKMP (d) SA
10. The ______ is used to identify a unique VPN connection.
(a) ESP (b) AH (c) ISAKMP (d) SA

Detailed Questions
1. Explain the concept of a firewall.
2. What are the various types of firewalls? Explain in brief.
3. Discuss packet filters in detail.
4. Explain application gateways.
5. How are proxy servers useful?
6. What is a VPN? How does it work?
7. What is the AH protocol?
8. Explain the ESP protocol.
9. Discuss the idea of SA.
10. Describe the usage of VPN in practical life.

Exercises
1. Study at least one firewall product. Document its features.
2. Study the proxy server implementation in at least one college/organization.
3. What is SSL VPN? Study it in detail.
4. What does it take to implement a VPN? Examine both the client and the server sides.
5. Which VPN products are most popular? Why?


Online Payments

Chapter 12

INTRODUCTION Making online payments error free and secure is one of the biggest challenges of the Internet world. Several challenges exist. For one, the payer and the payee do not meet or see each other, in contrast to what happens in many paper-based payments. There is no paper evidence for online transactions (e.g., there is no cheque or demand draft). Nobody signs anything by hand. And even if we can find some ways to overcome all these challenges so as to make the payment process possible, there are a number of risks to deal with. Since the payer and the payee do not see each other, and cannot even know anything about each other, where is the question of trusting each other? Also, the payer can make a payment, and later on claim that someone else has used her credentials to make the payment, thereby repudiating the transaction. The payee can claim that she received a payment instruction, and therefore, went ahead with the payment transaction. The payee can herself be an attacker, accepting payments and then running away with them, without actually supplying goods or services in return! Even in the case of a genuine payer making a successful payment to a genuine payee, an attacker can silently observe the payer's payment details (e.g., the credit card details) and later misuse them. As we can see, this is quite an interesting headache to solve! In the late 1990s and the early part of the new century, several online payment protocols emerged. Every one of them was supposed to be the best available in the market, most secure, and quite authentic. Soon, the market was flooded with so many online payment protocols that the situation became confusing and chaotic. When the electronic commerce boom of the years 2000–2002 turned out to be a bust, most of the online payment protocols just withered away. Also, wiser decisions led to consolidation and standardization of a number of payment protocols into only a few. We review some of these key online payment protocols in this chapter. This is a very dynamic area. So, chances are that the evolution in the space of online payment protocols will continue for quite some time to come.

12.1 PAYMENTS USING CREDIT CARDS

12.1.1 Brief History of Credit Cards

The first modern credit card was issued by the Franklin National Bank in New York in 1951. They sent unsolicited credit cards to prospective customers without any prior credit screening. Various merchants signed agreements with the bank, guaranteeing the acceptance of the cards.


When a customer made purchases using the card, she would present the card to the merchant. The merchant would copy the information on the strip of the card onto the sales slip. The merchant would then present a collection of these sales slips to the bank, which would credit the merchant with the sales amount. In the late 1950s, hundreds of other banks also started providing credit cards to their customers. However, this approach had one major drawback. The customers could use their credit cards only in their own geographic area, and could make payments using the card only at the merchants who had also signed up with their own bank (i.e., the customer's bank). To resolve this problem, Bank of America started licensing a few banks outside California to issue their card, the BankAmericard. That is, all the banks participating in this licensing scheme could issue a card to any of their customers. The customers could use the card at any of the merchants who also had an account with one of the participating banks. For example, suppose that a customer had a credit card issued by bank A. The customer could use that card to make a payment to a merchant who had tied up with bank B to accept card payments. This would work fine as long as both banks (A and B) had entered the licensing agreement with Bank of America. This arrangement worked fine for the banks that obtained the BankAmericard licence. (This network was later renamed Visa in 1976.) However, this arrangement did not cover all the banks. Therefore, the left-out banks got together in New York in 1966 to form their own card network, called as the Interbank Card Association, which later became MasterCard International. As Visa and MasterCard gained popularity and acceptance, most banks started joining one of these groups, rather than entering the credit card business on their own. All these participant banks agreed to display the bank name as well as the group name (Visa or MasterCard) on the card, to signify which group the bank belonged to. Now, both Visa and MasterCard have become immensely popular worldwide, and every year about 100 million customers get added to one of these groups. What is the prime job of Visa and MasterCard? These associations perform the authorizations, clearing and settlement that allow a bank's credit card to be used at any merchant site that is a member of either of these associations. These associations also ensure security and fraud control. They are responsible for setting standards worldwide for card issuance, acceptance and compatibility among member banks.

12.1.2 Credit Card Transaction Participants

Having discussed the history of credit cards in brief, let us now turn our attention to the main parties in a credit card transaction. The four main parties are (a) Cardholder, (b) Merchant, (c) Bank and (d) Association.

Cardholder A cardholder is the customer who uses a credit card to make payments for purchasing goods or services. A cardholder does not need to carry cash when making purchases. Nor does she need to take a loan every month; she can buy first and pay later. A credit card addresses both these issues. First, using a credit card allows the customer to make purchases without needing to pay in cash. Secondly, the customer can make purchases first and then pay for them later (as per the credit card agreement with the bank). Moreover, in the case of lost cash, there is a very high scope for misuse. However, in the case of a lost credit card, the customer's liability is limited.

Merchant From a merchant's perspective, credit cards provide several attractions. Generally, the convenience of credit cards induces customers to make high-value and impulsive purchases more often. Validating credit cards is also quite easy. To authorize a sale, the merchant can swipe the customer's credit card through a Point Of Sale (POS) terminal, via which the credit card information travels to the authorization network. This process results in the validation of the card.


Bank The usage of credit cards gives banks more customers, that is, both the cardholders and the merchants. When a bank issues a credit card to a cardholder, it is called as the issuing bank. When a merchant ties up with a bank to accept credit card payments, that bank becomes the acquiring bank.

Association By association, we mean Visa or MasterCard. These bodies are owned by their member banks, and are governed by separate boards of directors. Apart from licensing, setting up regulations, conducting research and analysis, etc., their main task is to process credit card payments. Processing millions of card transactions every day necessitates standardization and automation in clearing, interchange, and payment settlement.

12.1.3 Sample Credit Card Payment Flow

Let us now understand how a typical transaction using a credit card takes place. There are two distinct phases in any credit card transaction: clearing and settlement.
1. Clearing is the process by which the transaction information is passed from the acquirer to the cardholder via the issuer, to effect posting to the cardholder's account. There is no transfer of funds in the clearing process.
2. Settlement is the process in which actual funds are transferred from the cardholder to the acquirer.
Let us understand this with an example, where the customer has made a purchase worth $100 using her card. Note that in the settlement process, the cardholder pays $100 to the issuer bank. The issuer bank pays only $98.50 to the association (Visa or MasterCard), retaining $1.50 as its income. The association passes that amount on to the acquirer bank. The acquirer bank pays only $97 to the merchant, retaining $1.50 as its profit. A short sketch after Fig. 12.1 works through these numbers. Figure 12.1 shows the clearing process, and Fig. 12.2 shows the settlement process. Note that the last step in the settlement process (the delivery of goods or services) is not usually the last in the sequence. It is often done before the settlement phase is entered. However, it is shown here in the settlement phase simply to complete the logical flow.

Fig. 12.1

Clearing process
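To make the fee arithmetic in the example above concrete, here is a minimal Python sketch. It only reproduces the worked example (a $100 purchase, a 1.5 per cent issuer cut and a flat $1.50 acquirer margin); the function and variable names are invented for this sketch and do not come from any real card network.

# Illustrative settlement arithmetic for the $100 example above.
# The fee figures come from the worked example, not from any real card network.
def settle(purchase_amount, issuer_fee_rate=0.015, acquirer_fee=1.50):
    """Return how much each party receives for one settled transaction."""
    issuer_keeps = round(purchase_amount * issuer_fee_rate, 2)   # e.g., $1.50 on $100
    paid_to_acquirer = purchase_amount - issuer_keeps            # association passes this on
    paid_to_merchant = paid_to_acquirer - acquirer_fee           # acquirer retains its margin
    return {
        "cardholder_pays": purchase_amount,
        "issuer_keeps": issuer_keeps,
        "acquirer_receives": paid_to_acquirer,
        "acquirer_keeps": acquirer_fee,
        "merchant_receives": paid_to_merchant,
    }

print(settle(100.00))
# {'cardholder_pays': 100.0, 'issuer_keeps': 1.5, 'acquirer_receives': 98.5,
#  'acquirer_keeps': 1.5, 'merchant_receives': 97.0}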

At the broadest level, the credit card processing models in e-commerce transactions can be classified into two categories, based on who takes on the job of processing credit cards and making payments. These two models are as follows.

Without involving a payment gateway This follows the traditional (manual) approach of credit card processing. Here, a third party (called as a payment gateway) is not involved in the credit card processing. Therefore, it is left to the merchant to process credit cards online.


Involving a payment gateway In this type of credit card processing mechanism, a third party specializing in credit card processing, i.e., the payment gateway, is involved. A payment gateway is a third party essentially taking care of the routing of messages between the merchant and the banks.

Fig. 12.2

Settlement process

The payments related to e-commerce transactions pose the following difficulties.
- Settlement of payment by physical means slows down the process and is inconvenient.
- The buyer and seller are not physically present at the same place during the transaction, and often may be completely unknown to each other. Therefore, although they may be genuine, their identities need to be authenticated.
- The Internet being a public network, raw transmission of payment data (for example, credit card and amount details) to the merchant or any other party is highly unsafe.

A payment gateway facilitates e-commerce payments by authenticating the parties involved, routing payment-related data between these parties and the concerned banks/financial institutions in a highly secure environment, and providing general support to them. The merchant ties up with a payment gateway, which takes on the responsibility of processing credit cards on the merchant's behalf. The payment gateway, in turn, ties up with all the banks and financial institutions whose participation is required for effecting electronic payments, relieving the merchant of these requirements. Payment gateways are independent companies offering payment solutions to merchants for effecting online payments.

As we mentioned, this model of processing credit cards is very similar to the way shops and restaurants process credit cards in the manual scenario. The same process is mimicked using the Internet technologies. This happens as explained below.

Stage 1: Verification In this stage, the credit card details of the customer are verified with the help of a number of financial institutions. Let us first take a look at Fig. 12.3, which is explained later. Let us understand the process.
1. The customer provides the credit card details, such as the credit card number, expiry date and the customer's name as it appears on the credit card, to the merchant. In the early days of e-commerce transactions, the customer would send these details by email, or by filling up an online form. However, due to security issues realized later, the email approach is discouraged these days, and if the customer enters these details in an online form, this involves an SSL session between the merchant and the customer.


Fig. 12.3

Payment verification process

2. The merchant would forward this information (via another SSL-enabled session) to its own bank, called as the acquiring bank.
3. The acquiring bank would then forward these credit card details, in turn, all the way to the customer's bank, called as the issuing bank, via the card association.
4. The card-issuing bank would verify information such as the credit card details, the customer's credit limit, whether the credit card is in the list of stolen credit cards, etc., and send the appropriate status back to the merchant's acquiring bank.
5. The merchant's acquiring bank would then forward the status message back to the merchant.
6. Depending on whether the credit card was validated successfully or not, the merchant would either process the order, or reject it, and inform the customer accordingly. A small conceptual sketch of this round trip is given below.
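The following Python sketch models this verification round trip as a simple chain of function calls. It is purely conceptual: every name here (merchant_check_card, issuing_bank_verify, the toy card data, and so on) is invented for illustration, and in reality each hop is a secure network message.

# Conceptual model of the Stage 1 verification round trip described above.
# Every function name and data item is hypothetical; each hop is really a network message.
STOLEN_CARDS = {"4111222233334444"}          # issuer's hot-card list (toy data)
CREDIT_LIMITS = {"4000111122223333": 500.0}  # issuer's records (toy data)

def issuing_bank_verify(card_number, amount):
    """Issuer checks the hot-card list and the credit limit."""
    if card_number in STOLEN_CARDS:
        return "DECLINED: stolen card"
    if CREDIT_LIMITS.get(card_number, 0.0) < amount:
        return "DECLINED: over limit"
    return "APPROVED"

def acquiring_bank_forward(card_number, amount):
    # In practice this hop goes via the card association (Visa/MasterCard).
    return issuing_bank_verify(card_number, amount)

def merchant_check_card(card_number, amount):
    # The merchant forwards the details it received from the customer over SSL.
    return acquiring_bank_forward(card_number, amount)

print(merchant_check_card("4000111122223333", 120.0))  # APPROVED
print(merchant_check_card("4111222233334444", 50.0))   # DECLINED: stolen card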

Stage 2: Payment Having verified the credit card details of the customer, the actual payment processing now has to happen. This is shown in Fig. 12.4. The merchant would collect all such credit card transactions that took place in a particular day, and send this list to its acquiring bank for obtaining payment for them. The acquiring bank would then interact with the various card-issuing banks through the card-association clearing house (a financial institution that settles credit card payments between banks, i.e., Visa or MasterCard, just as a clearing house settles check payments between banks), debit the appropriate card-issuing bank accounts of the customers, and credit the merchant's account with the acquiring bank appropriately. Notice that the merchant is dealing directly with its acquiring bank here. Over a period of time, people realized that the merchant had to take on too many responsibilities in such a model, and that gave birth to the concept of a payment gateway. A payment gateway is a third party that acts as a middleman between merchants, acquiring banks and card-issuing banks to authorize credit cards and ensure that the money is transferred from the customer's account to the merchant's account.


This relieves the merchant of all these tasks, which it would otherwise have to take upon itself.

Fig. 12.4 Payment process

12.2 SECURE ELECTRONIC TRANSACTION (SET)

12.2.1 Introduction

The Secure Electronic Transaction (SET) is an open encryption and security specification that is designed for protecting credit card transactions on the Internet. The pioneering work in this area was done in 1996 by MasterCard and Visa jointly. They were joined by IBM, Microsoft, Netscape, RSA, Terisa and VeriSign. Starting from that time, there have been many tests of the concept, and by 1998, the first generation of SET-compliant products appeared in the market. The need for SET came from the fact that MasterCard and Visa realized that, for e-commerce payment processing, software vendors were coming up with new and conflicting standards. These were driven mainly by Microsoft on one hand, and IBM on the other. To avoid all sorts of future incompatibilities, MasterCard and Visa decided to come up with a standard, ignoring all their competition issues, and in the process, involving all the major software manufacturers. SET is not a payment system. Instead, it is a set of security protocols and formats that enable the users to employ the existing credit card payment infrastructure on the Internet in a secure manner. SET services can be summarized as follows.
1. It provides a secure communication channel among all the parties involved in an e-commerce transaction.
2. It provides authentication by the use of digital certificates.
3. It ensures confidentiality, because the information is only available to the parties involved in a transaction, and that too only when and where necessary.


SET is a very complex specification. In fact, when released, it took 971 pages to describe SET across three books! (Just for the record, SSL Version 3 needs 63 pages to describe.) Thus, it is not possible to discuss it in great detail. However, we shall summarize the key points.

12.2.2 SET Participants

Before we discuss SET, let us summarize the participants in the SET system.

Cardholder Using the Internet, consumers and corporate purchasers interact with merchants for buying goods and services. A cardholder is an authorized holder of a payment card such as MasterCard or Visa that has been issued by an Issuer (discussed subsequently).

Merchant A merchant is a person or an organization that wants to sell goods or services to cardholders. A merchant must have a relationship with an Acquirer (discussed subsequently) for accepting payments on the Internet.

Issuer The issuer is a financial institution (such as a bank) that provides a payment card to a cardholder. The most critical point is that the issuer is ultimately responsible for the payment of the cardholder’s debt.

Acquirer This is a financial institution that has a relationship with merchants for processing payment card authorizations and payments. The reason for having acquirers is that merchants accept credit cards of more than one brand, but are not interested in dealing with so many bankcard organizations or issuers. Instead, an acquirer provides the merchant an assurance (with the help of the issuer) that a particular cardholder account is active and that the purchase amount does not exceed the credit limits, etc. The acquirer also provides electronic funds transfer to the merchant account. Later, the issuer reimburses the acquirer using some payment network.

Payment Gateway This is a function that can be performed by the acquirer itself, or taken up by an organization dedicated to this task. The payment gateway processes the payment messages on behalf of the merchant. Specifically, in SET, the payment gateway acts as an interface between SET and the existing card payment networks for payment authorizations. The merchant exchanges SET messages with the payment gateway over the Internet. The payment gateway, in turn, connects to the acquirer's systems using a dedicated network line in most cases.

Certification Authority (CA) As we know, this is an authority that is entrusted to provide public key certificates to cardholders, merchants and payment gateways. In fact, CAs are very crucial to the success of SET.

12.2.3 The SET Process

Let us now take a simplistic look at the SET process before we describe its technical details.

The customer opens an account The customer opens a credit card account (such as MasterCard or Visa) with a bank (issuer) that supports electronic payment mechanisms and the SET protocol.

The customer receives a certificate After the customer's identity is verified (with the help of details such as a passport, business documents, etc.), the customer receives a digital certificate from a CA. The certificate also contains details such as the customer's public key and its expiration date.

The merchant receives a certificate A merchant that wants to accept a certain brand of credit cards must possess a digital certificate.


The customer places an order This is a typical shopping cart process wherein the customer browses the list of items available, searches for specific items, selects one or more of them, and places the order. The merchant, in turn, sends back details such as the list of items selected, their quantities, prices, total bill, etc., back to the customer for his record, with the help of an order form.

The merchant is verified The merchant also sends its digital certificate to the customer. This assures the customer that he is dealing with a valid merchant.

The order and payment details are sent The customer sends both the order and payment details to the merchant along with the customer’s digital certificate. The order confirms the purchase transaction with reference to the items mentioned in the order form. The payment contains credit card details. However, the payment information is so encrypted that the merchant cannot read it. The customer’s certificate assures the merchant of the customer’s identity.

The merchant requests payment authorization The merchant forwards the payment details sent by the customer to the payment gateway via the acquirer (or to the acquirer if the acquirer also acts as the payment gateway) and requests the payment gateway to authorize the payment (i.e., ensure that the credit card is valid and that the credit limits are not breached).

The payment gateway authorizes the payment Using the credit card information received from the merchant, the payment gateway verifies the details of the customer’s credit card with the help of the issuer, and either authorizes or rejects the payment.

The merchant confirms the order Assuming that the payment gateway authorizes the payment, the merchant sends a confirmation of the order to the customer.

The merchant provides goods or services The merchant now ships the goods or provides the services as per the customer’s order.

The merchant requests payment The payment gateway receives a request from the merchant for making the payment. The payment gateway interacts with the various financial institutions such as the issuer, acquirer and the clearing house to effect the payment from the customer's account to the merchant's account.

12.2.4 How SET Achieves its Objectives

The main concern with online payment mechanisms is that they demand that the customer send his credit card details to the merchant. There are two aspects to this. One is that the credit card number travels in clear text, which provides an intruder with an opportunity to learn that number and use it with malicious intentions (for instance, to make his own payments using that credit card number). The second issue is that the credit card number is available to the merchant, who can misuse it. The first concern is generally dealt with by SSL. Since all information exchange in SSL happens in an encrypted form, an intruder cannot make any sense out of it. Therefore, even if an intruder is able to listen to an active conversation between a client and a server over the Internet, as long as the session is SSL-enabled, the intruder's intentions will be defeated. However, SSL does not achieve the second objective, which is protecting the credit card number from the merchant. In this context, SET is very important, as it hides the credit card details from the merchant. The way SET hides the cardholder's credit card details from the merchant is quite interesting. For this, SET relies on the concept of a digital envelope. The following steps illustrate the idea.


1. The SET software prepares the Payment Information (PI) on the cardholder's computer (which primarily contains the cardholder's credit card details) exactly the same way as it happens in any Web-based payment system.
2. However, what is specific to SET is that the cardholder's computer now creates a one-time session key.
3. Using this one-time session key, the cardholder's computer now encrypts the payment information.
4. The cardholder's computer now wraps the one-time session key with the public key of the payment gateway to form a digital envelope.
5. It then sends the encrypted payment information (step 3) and the digital envelope (step 4) together to the merchant (who has to pass it on to the payment gateway).
Now, the following points are important. The merchant has access only to the encrypted payment information, so it cannot read it. If it were to read it, it would need to know the one-time session key that was used to encrypt the payment information. However, the one-time session key itself is further encrypted by the payment gateway's public key (to form a digital envelope). The only way to open the digital envelope, that is, to obtain the original one-time session key, is to use the payment gateway's private key. And as we know very well, the whole idea behind a private key is that it must be kept private. So, it is expected that only the payment gateway knows its private key—the merchant does not know it. Therefore, it cannot open the envelope and know the one-time session key, and thus it cannot decrypt the original payment information either. Thus, SET achieves its objective of hiding the payment details from the merchant using the concept of a digital envelope.
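The digital envelope idea can be sketched in a few lines of Python. This is only a conceptual illustration, not the actual SET message formats; it assumes the third-party cryptography package is installed, and uses Fernet for the one-time symmetric key and RSA-OAEP to wrap that key for the payment gateway.

# Conceptual sketch of a SET-style digital envelope (not the real SET formats).
# Requires the third-party 'cryptography' package.
from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

# The payment gateway's RSA key pair (its public key is known to the cardholder software).
gateway_private = rsa.generate_private_key(public_exponent=65537, key_size=2048)
gateway_public = gateway_private.public_key()

oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

# Cardholder side: encrypt the Payment Information with a one-time session key,
# then wrap that key with the gateway's public key to form the digital envelope.
payment_information = b"card=4000-1111-2222-3333; amount=$100"   # toy data
session_key = Fernet.generate_key()                              # one-time symmetric key K
encrypted_pi = Fernet(session_key).encrypt(payment_information)
digital_envelope = gateway_public.encrypt(session_key, oaep)

# The merchant receives (encrypted_pi, digital_envelope) but cannot open either,
# because it does not hold the gateway's private key. It simply forwards them.

# Payment gateway side: open the envelope, recover K, then decrypt the PI.
recovered_key = gateway_private.decrypt(digital_envelope, oaep)
print(Fernet(recovered_key).decrypt(encrypted_pi).decode())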

12.2.5 SET Internals

Let us discuss the major transactions supported by SET. They are Purchase Request, Payment Authorization and Payment Capture.

Purchase Request Before the Purchase Request transaction begins, the cardholder is assumed to have completed browsing, selecting and ordering of items. This preliminary phase ends when the merchant sends a completed order form to the customer over the Web. SET is not used in any of these steps. SET begins when the Purchase Request starts. The Purchase Request exchange is made up of four messages: Initiate Request, Initiate Response, Purchase Request and Purchase Response.

Step 1: Initiate Request In order to send SET messages to the merchant, the cardholder must have the digital certificates of the merchant as well as that of the payment gateway. There are three agencies involved: (a) The agency that issues credit cards (the issuer, which is a Financial Institution or FI), (b) Certification Authority (CA) and (c) Payment Gateway (PG), which can be the same as the acquirer. There can be only one, two or three organizations carrying out these functions, as one organization can perform more than one function. However, for the sake of clarity, we shall assume that there are three separate entities, which are described in brief as follows. (a) A Financial Institution (FI), such as MasterCard or Visa, issues credit cards for people to make purchases without making cash payments. (b) We have discussed Certification Authorities (CA) earlier in detail. They authenticate individuals/ organizations and issue digital certificates to them for conducting electronic commerce transactions. CAs help in ensuring non-fraudulent transactions on the Web.


(c) Payment gateways are third party payment processors, who process payments on behalf of merchants by tying up with FIs and banks. We have discussed them earlier. In some cases, the financial institutions outsource the functions of the payment gateway to third parties. Thus, there can be various models for this. The cardholder requests the merchant's certificates in the Initiate Request message. The cardholder also sends the name of its credit card company and an id created by the cardholder for this interaction process, to the merchant in this message, as shown in Fig. 12.5.

Fig. 12.5

Initiate Request

Step 2: Initiate Response The merchant generates a response and signs it with its private key. The response includes a transaction id for this transaction (created by the merchant), the merchant’s digital certificate and the payment gateway’s digital certificate. This message is called as Initiate Response, as shown in Fig. 12.6.

Fig. 12.6 Initiate Response

Step 3: Purchase Request The cardholder verifies the digital certificates of the merchant and of the payment gateway by means of their respective CA signatures, and then creates an Order Information (OI) and a Payment Information (PI). The transaction id created by the merchant is added to both OI and PI. The OI does not contain explicit order details such as item numbers and prices. Instead, it has references to the shopping phase between the customer and the selected merchant (such as order number, transaction date and card type) that precedes the Purchase Request phase (i.e., using the shopping cart saved in the merchant's database). PI contains details such as credit card information, purchase amount and order description. The cardholder now prepares the Purchase Request. For this, he generates a one-time symmetric key (say K). The Purchase Request message contains the following.

Purchase-related information This information is mainly for the payment gateway.
(a) It contains (i) the PI, (ii) the digital signature calculated over PI and OI, and (iii) the OI Message Digest (OIMD), which is the message digest (hash) calculated over the OI.
(b) All of these are encrypted with K.
(c) Finally, the digital envelope is created by encrypting K with the payment gateway's public key. The name envelope signifies that it must be decrypted first, before any of the other payment information can be accessed.
The value of K is not made available to the merchant, and therefore, it cannot read any of the payment-related information. Instead, it forwards this to the payment gateway.

Order-related information The merchant needs this information. It consists of the OI, the signature calculated over PI and OI, and the PI Message Digest (PIMD), which is the message digest (hash) calculated over the PI. The PIMD is needed by the merchant in order to verify the signature calculated over PI and OI.

Cardholder certificate This contains the cardholder’s public key, required by the merchant as well as by the payment gateway. This is shown in Fig. 12.7.

Fig. 12.7

Purchase Request

An interesting aspect of this process is the dual signature. This ensures that the merchant and the payment gateway receive the information that they require, and yet the cardholder protects the credit card details from the merchant. The concept is shown in Fig. 12.8.

Fig. 12.8 Dual signature

Let us describe this process in brief, at the cost of some repetition. A small worked sketch of these digests follows the list.
- The cardholder performs a message digest or hash (H) on the PI to generate PIMD. The cardholder also hashes OI to generate OIMD. The cardholder then combines PIMD and OIMD, and hashes them together to form POMD. It then encrypts the POMD with its own private key to generate the Dual Signature (DS). The POMD is available to both the merchant and the payment gateway.
- The cardholder sends the merchant the OI, DS and PIMD. Note that the merchant must not get PI (we shall see how the cardholder achieves this soon). Using these pieces of information, the merchant verifies that the order came from the cardholder, and not from someone posing as the cardholder. For this, the merchant performs the actions as shown in Fig. 12.9.
- The payment gateway gets PI, DS and OIMD. Note that the payment gateway need not get OI. Using these, the payment gateway can verify the POMD. This verification satisfies the payment gateway that the payment information came from the cardholder, and not from someone posing as the cardholder. For this purpose, the payment gateway performs the actions as shown in Fig. 12.10.
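The digest arithmetic behind the dual signature can be reproduced with nothing more than a hash function. The sketch below uses Python's hashlib; the actual signing of POMD with the cardholder's private key is only indicated in a comment, and the PI/OI contents are invented sample data.

# Dual signature digests: PIMD = H(PI), OIMD = H(OI), POMD = H(PIMD || OIMD).
# The real DS is POMD signed with the cardholder's private key; that step is
# only noted here, since this sketch focuses on the hashing arithmetic.
import hashlib

def H(data):
    return hashlib.sha256(data).digest()

PI = b"card=4000-1111-2222-3333; amount=$100"   # payment information (toy data)
OI = b"order=789; date=2008-01-15; card=Visa"   # order information (toy data)

PIMD = H(PI)
OIMD = H(OI)
POMD = H(PIMD + OIMD)          # the cardholder signs this value to produce the DS

# Merchant side: it holds OI, PIMD (and the DS). It recomputes POMD without ever
# seeing PI, and checks it against the POMD recovered from the dual signature.
merchant_pomd = H(PIMD + H(OI))
assert merchant_pomd == POMD

# Payment gateway side: it holds PI, OIMD (and the DS). It recomputes POMD
# without ever seeing OI.
gateway_pomd = H(H(PI) + OIMD)
assert gateway_pomd == POMD
print("Both parties arrive at the same POMD without sharing PI and OI.")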


Fig. 12.9 Verification of cardholder's authenticity by the merchant

An important question now arises. How does the cardholder protect the payment information from the merchant? For this, the cardholder performs the following processes.
- The cardholder creates PI, DS and OIMD, and encrypts them together with a one-time session key K.
- The cardholder then encrypts the session key K with the payment gateway's public key. These two together form a digital envelope.
- The cardholder sends the digital envelope to the merchant, instructing it to forward it to the payment gateway.
- Since the merchant does not have the private key of the payment gateway, it cannot decrypt the envelope and obtain the payment details.

Step 4: Purchase Response When the merchant receives the Purchase Request, it does the following.
(a) It verifies the cardholder's certificates by means of their CA signatures.
(b) It verifies the signature created over PI and OI using the cardholder's public key (which is a part of the cardholder's digital certificate). This ensures that the order has not been tampered with while in transit, and that it was signed using the cardholder's private key.
(c) It processes the order and forwards the Payment Information (PI) to the payment gateway for authorization (discussed later).
(d) It sends a Purchase Response back to the cardholder, as shown in Fig. 12.11.


Fig. 12.10

Verification of cardholder’s authenticity by the payment gateway

Fig. 12.11

Purchase Response

The Purchase Response message includes a message acknowledging the order and references the corresponding transaction number. The merchant signs the message using its private key. The message and its signature are sent along with the merchant's digital certificate to the cardholder. When the cardholder software receives the Purchase Response message, it verifies the merchant's certificate and then takes some action, such as displaying a message to the user.

Payment Authorization This process ensures that the issuer of the credit card approved the transaction. The Payment Authorization step happens when the merchant sends the payment details to the payment gateway. The payment gateway verifies these details and authorizes the payment, which ensures that the merchant will receive payment.


Therefore, the merchant can provide the services or goods to the cardholder, as ordered. The Payment Authorization exchange consists of two messages: Authorization Request and Authorization Response.

Step 1: Authorization Request The merchant sends an Authorization Request to the payment gateway, which consists of the following.
1. Purchase-related information This information is obtained by the merchant from the cardholder, and includes the PI, the signature calculated over PI and OI (signed with the cardholder's private key), the OI Message Digest (OIMD) and the digital envelope, as discussed earlier.
2. Authorization-related information This information is generated by the merchant and consists of the transaction id, signed with the merchant's private key and encrypted with a one-time symmetric key generated by the merchant, along with the corresponding digital envelope.
3. Certificates The merchant also sends the cardholder's digital certificate, needed for verifying the cardholder's digital signature, and the merchant's digital certificate, needed for verifying the merchant's digital signature. This is shown in Fig. 12.12.

Fig. 12.12

Authorization Request

As a result, the payment gateway performs the following tasks.
1. It verifies all certificates.
2. It decrypts the digital envelope to obtain the symmetric key, and then decrypts the authorization block.
3. It verifies the merchant's signature on the authorization information received from the merchant.
4. It performs steps 2 and 3 for the payment information (PI) received from the cardholder.
5. It matches the transaction id received from the merchant with the transaction id received (indirectly, via the PI) from the cardholder.
6. It requests and receives an authorization from the credit card issuer (i.e., the cardholder's bank) for the payment from the cardholder to the merchant.

Step 2: Authorization Response Having obtained authorization from the issuer, the payment gateway returns an Authorization Response message to the merchant. This message contains the following.
1. Authorization-related information This includes an authorization block that is signed with the gateway's private key and encrypted with a one-time symmetric key generated by the gateway. It also includes a digital envelope that contains the one-time key encrypted with the merchant's public key.
2. Capture token information This information would be used for effecting the payment transaction later. The basic structure of this piece of information is the same as that of the authorization-related information. This token is not processed by the merchant; it is instead passed back to the payment gateway, as it is, at the time of payment capture.


3. Certificate The gateway's digital certificate is also included in the message. This is shown in Fig. 12.13.

Fig. 12.13

Authorization Response

With this authorization from the payment gateway, the merchant can provide the goods or services to the cardholder.

Payment Capture For obtaining payment, the merchant engages the payment gateway in a Payment Capture transaction. It also contains two messages: Capture Request and Capture Response.

Step 1: Capture Request Here, the merchant generates, signs and encrypts a Capture Request block that includes the payment amount and the transaction id. This message also includes the encrypted capture token received earlier (in the Authorization Response transaction), the merchant's digital signature and digital certificate. When the payment gateway receives the Capture Request message, it decrypts and verifies the Capture Request block, and decrypts and verifies the capture token as well. It then checks for consistency between the Capture Request and the capture token. It then creates a clearing request that is sent to the issuer over the private payment network. This request results in a funds transfer to the merchant's account. This is shown in Fig. 12.14.

Fig. 12.14

Capture Request

Step 2: Capture Response In this message, the payment gateway notifies the merchant of the payment. The message includes a Capture Response block, which is signed and encrypted by the payment gateway. The message also includes the payment gateway’s digital certificate. The merchant software processes this message and stores the information therein for reconciliation with the payment received from the bank. This is shown in Fig. 12.15.


Fig. 12.15 Capture Response

12.2.6 SET Conclusions

From the discussion, it should be clear that although SSL and SET are both used for facilitating secure exchange of information, their purposes are quite different. Whereas SSL is primarily used for secure exchange of information of any kind between only two parties (a client and a server), SET is specifically designed for conducting e-commerce transactions. SET involves a third party called as a payment gateway, which is responsible for issues such as credit card authorization, payment to the merchant, etc. This is not the case with SSL. SSL primarily deals with encryption and decryption of information between two parties. It does not specify how payment would be made; the architecture of SET covers this as well.

12.2.7 SET Model

Having looked at the detailed processing involved in SET, let us summarize the concepts learnt by studying the overall processing model of SET. As we have studied, the authentication provided by SET is quite strong. To ensure the identification and verification of the customer (cardholder), the merchant and the payment gateway, the SET protocol requires that all the parties involved in the transaction have a valid digital certificate and that they use digital signatures. This means that all three concerned parties must have a valid digital certificate from an approved certification authority. Let us discuss a simple model for implementing SET. Note that this implementation can be done with some other approaches as well. However, here, we are interested only in understanding what a typical setup for SET might look like. First, let us take a look at Fig. 12.16. The figure shows the (simplified) SET model for a typical purchase transaction. The three main parties involved in the actual transaction are, of course, the customer, the merchant and the payment gateway. The merchant and the customer make requests for their respective certificates. Interestingly, we have shown two different certification authorities. Of course, it is very much possible that both the merchant and the customer receive certificates from the same certification authority. In general, the certificate to a customer is issued by the bank or the credit card company that has issued the card to the customer, or sometimes by a third party agency representing the credit card company. On the other hand, a financial institution, also called as an acquirer, issues the merchant's certificate. An acquirer is usually a financial institution such as MasterCard or Visa (or their appointed agencies), who can authorize payments made by their brands of credit cards. Therefore, a merchant needs to have as many certificates as the number of different brands of credit cards that it accepts (e.g., one for MasterCard, one for Visa, one for Amex, and so on). Thus, when a customer receives a merchant certificate, it is also assured that the merchant is authorized to accept payments for that brand of credit card. This is similar to the boards displayed by real-life stores and restaurants indicating that they accept certain credit cards. As discussed, the transactions between a customer and merchant are for purchases, and those between the merchant and the payment authority are for authorization of payment. This is described in detail earlier.


Fig. 12.16 The SET model

12.2.8 SSL versus SET

Having discussed SSL and SET in detail, let us take a quick look at the differences between them, as shown in Table 12.1.

Table 12.1 SSL versus SET

Issue | SSL | SET
Main aim | Exchange of data in an encrypted form | E-commerce related payment mechanism
Certification | Two parties exchange certificates | All the involved parties must be certified by a trusted third party
Authentication | Mechanisms in place, but not very strong | Strong mechanisms for authenticating all the parties involved
Risk of merchant fraud | Possible, since customer gives financial data to merchant | Unlikely, since customer gives financial data to payment gateway
Risk of customer fraud | Possible, no mechanisms exist if a customer refuses to pay later | Customer has to digitally sign payment instructions
Action in case of customer fraud | Merchant is liable | Payment gateway is liable
Practical usage | High | Has turned out to be a failure


This table should give us an idea that SET is a standard that describes a very complex authentication mechanism, which makes it almost impossible for either party to commit any sort of fraud. However, there is no such mechanism in SSL. In SSL, data is exchanged securely. However, the customer provides critical data such as credit card details to a merchant, and hopes that the merchant does not misuse them. This is not possible in SET. Also, in the case of SSL, a merchant believes that the credit card really belongs to the customer, and that he is not using a stolen card. In the case of SET, this is very unlikely, and even if it happens, the merchant is safe, since the payment gateway has to ensure that the customer is not committing fraud. The whole point is, whereas SSL was created for exchanging secure messages over the Internet, SET was specifically designed for secure e-commerce transactions involving online payment. So, these differences should not surprise anybody.

12.3 3-D SECURE PROTOCOL

In spite of its advantages, SET has one limitation: it does not prevent a user from providing someone else's credit card number. The credit card number is protected from the merchant. However, how can one prevent a customer from using another person's credit card number? That is not achieved in SET. Consequently, a new protocol developed by Visa has emerged, called as 3-D Secure. The main difference between SET and 3-D Secure is that any cardholder who wishes to participate in a payment transaction involving the 3-D Secure protocol has to enrol on the issuer bank's Enrolment Server. That is, before a cardholder makes a card payment, she must enrol with the issuer bank's Enrolment Server. This process is shown in Fig. 12.17.

Fig. 12.17 User enrolment

At the time of an actual 3-D Secure transaction, when the merchant receives a payment instruction from the cardholder, the merchant forwards this request to the issuer bank through the Visa network. The issuer bank requires the cardholder to provide the user id and password that were created at the time of the enrolment process.


The cardholder provides these details, which the issuer bank verifies against its database of 3-D Secure enrolled users (against the stored card number). Only after the user is authenticated successfully does the issuer bank inform the merchant that it can accept the card payment instruction.

12.3.1 Protocol Overview

Let us understand how the 3-D Secure protocol works, step by step.

Step 1 The user shops using the shopping cart on the merchant site, and decides to pay the amount. The user enters the credit card details for this purpose, and clicks on the OK button, as shown in Fig. 12.18.

Fig. 12.18

Step 1 in 3-D Secure

Step 2 When the user clicks on the OK button, the user will be redirected to the issuer bank's site. The bank site will pop up a screen, prompting the user to enter the password provided by the issuer bank. This is shown in Fig. 12.19. The bank (issuer) authenticates the user by the mechanism selected by the user earlier. In this case, we consider a simple static user id and password based mechanism. Newer trends involve sending a number to the user's mobile phone and asking the user to enter that number on the screen. However, that falls outside the purview of the 3-D Secure protocol.

Fig. 12.19

Step 2 in 3-D Secure

At this stage, the bank verifies the user's password by comparing it with its database entry. The bank sends an appropriate success/failure message to the merchant, based on which the merchant takes an appropriate decision, and shows the corresponding screen to the user.


12.3.2 What Happens Behind the Scenes?

Figure 12.20 depicts the internal operations of 3-D Secure. The process uses SSL for confidentiality and server authentication.

Fig. 12.20 3-D Secure internal flow

The flow can be described as follows.
1. The customer finalizes the payment on the merchant site (the merchant has all the data of this customer).
2. A program called as the merchant plug-in, which resides at the merchant Web server, sends the information to the Visa/MasterCard directory (which is LDAP-based).
3. The Visa/MasterCard directory queries the access control server running at the issuer bank (i.e., the customer's bank), to check the authentication status of the customer.
4. The access control server forms the response for the directory and sends it back to the Visa/MasterCard directory.
5. The Visa/MasterCard directory sends the payer's authentication status to the merchant plug-in.
6. After getting the response, if the user is currently not authenticated, the plug-in redirects the user to the bank site, requesting the bank (issuer) site to perform the authentication process.
7. The access control server (running on the bank's site) receives the request for authentication of the user.
8. The authentication server performs authentication of the user based on the mechanism of authentication chosen by the user (e.g., password, dynamic password, mobile, etc.).


9. The access control server returns the authentication information to the merchant plug-in running in the acquirer domain by redirecting the user to the merchant site. It also sends the information to the repository where the history of the authentication is kept for legal purposes.
10. The plug-in receives the response of the access control server through the user's browser. This contains the digital signature of the access control server.
11. The plug-in validates the digital signature of the response from the access control server.
12. If the authentication was successful and the digital signature of the access control server is validated, the merchant sends the authorization information to its bank (i.e., the acquirer bank). A toy model of this flow is sketched below.
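The Python sketch below is purely conceptual: the function names (merchant_plugin_checkout, query_directory, acs_authenticate) and the data they exchange are invented for illustration and do not correspond to the real 3-D Secure message formats.

# Toy model of the 3-D Secure routing described in steps 1-12 above.
# All names and data structures are invented for illustration only.
ENROLLED_USERS = {"4000111122223333": "s3cret"}   # issuer's enrolment database (toy data)

def acs_authenticate(card_number, password):
    """Access control server at the issuer: authenticate the enrolled user."""
    ok = ENROLLED_USERS.get(card_number) == password
    # A real ACS would digitally sign this response; here we only flag it.
    return {"authenticated": ok, "signed_by_acs": True}

def query_directory(card_number):
    """Visa/MasterCard directory: is this card enrolled for 3-D Secure?"""
    return card_number in ENROLLED_USERS

def merchant_plugin_checkout(card_number, password_entered, amount):
    """Merchant plug-in: directory lookup, then redirect to the issuer's ACS."""
    if not query_directory(card_number):
        return "Card not enrolled: fall back to ordinary processing"
    response = acs_authenticate(card_number, password_entered)   # via the user's browser
    if response["signed_by_acs"] and response["authenticated"]:
        return "Authenticated: send authorization for $%.2f to the acquirer" % amount
    return "Authentication failed: reject the payment"

print(merchant_plugin_checkout("4000111122223333", "s3cret", 75.0))
print(merchant_plugin_checkout("4000111122223333", "wrong", 75.0))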

12.4 ELECTRONIC MONEY

12.4.1 Introduction

Electronic money, which is also called as electronic cash or digital cash, is one more way of making payments on the Internet. Electronic money is nothing but money represented by computer files. In other words, the physical form of money is converted into the binary form of computer data. Let us first understand how one can obtain and use electronic money. For this, first take a look at Fig. 12.21. The figure shows the conceptual steps involved in electronic money processing.

Fig. 12.21 Model of electronic money


As the figure shows, the customer obtains electronic money (which is nothing but one or more computer files) from a bank in exchange for physical money (from his account with the bank). When the customer wants to make any purchases in an electronic commerce transaction and make payments using electronic money, the customer sends these files representing electronic money to the merchant. The merchant forwards these files to the same bank, which verifies the electronic money and credits the merchant's account with the actual money equivalent to the value of the electronic money.

12.4.2 Security Mechanisms in Electronic Money

The security mechanisms in these procedures are similar to the mechanisms described earlier. Let us study the process of the customer obtaining the money in the form of files from the bank. The same principles would apply in other transactions (e.g., a customer buying something from a merchant and then sending these files to him).

Step 1 Bank sends the electronic money to the customer, as shown in Fig. 12.22.

Fig. 12.22

Bank sends electronic money to the customer after encrypting it twice

As the figure shows, the bank first encrypts the original message with its own private key. It then encrypts this encrypted message further, this time with the customer’s public key. Thus, the original message is encrypted twice. The bank sends this twice-encrypted message to the customer.

Step 2 The customer receives the money and decrypts it, as shown in Fig. 12.23.

Fig. 12.23 Customer decrypts the bank’s message twice to get the electronic money


Here, the customer first decrypts the received message with its own private key. Further, it decrypts this once-decrypted message using the bank's public key. Thus, the customer gets the original message back (which is $100). To ensure authentication, techniques of digital signatures and certificates may also be used in addition to these steps. We shall not describe those, as we have studied them earlier.
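In modern terms, the two passes shown in Figs 12.22 and 12.23 amount to the bank signing the money file and then encrypting it for the customer. A minimal sketch using the third-party cryptography package is given below; the key sizes, padding choices and message contents are illustrative assumptions, not part of any real electronic money product.

# Sketch: the bank "encrypts with its private key" (i.e., signs) the money file,
# then encrypts it with the customer's public key. The customer reverses both steps.
# Requires the third-party 'cryptography' package.
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

bank_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
cust_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

money = b"e-cash: value=$100, serial=SR100"   # toy money file

pss = padding.PSS(mgf=padding.MGF1(hashes.SHA256()), salt_length=padding.PSS.MAX_LENGTH)
oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

# Bank side: sign with the bank's private key, then encrypt for the customer.
signature = bank_key.sign(money, pss, hashes.SHA256())
ciphertext = cust_key.public_key().encrypt(money, oaep)

# Customer side: decrypt with her private key, then verify with the bank's public key
# (verify() raises InvalidSignature if the money file was tampered with).
plaintext = cust_key.decrypt(ciphertext, oaep)
bank_key.public_key().verify(signature, plaintext, pss, hashes.SHA256())
print(plaintext.decode())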

12.4.3 Types of Electronic Money

Electronic money can be classified in two ways. In the first classification, the types of electronic money are decided based on whether or not the electronic money is tracked. Here, electronic money can be of two types: identified electronic money and anonymous electronic money. The other classification is based on whether or not the transaction happens in real time. Here, electronic money can be either online electronic money or offline electronic money. We shall study these types now.

Classification based on the tracking of money This classification is based on whether the electronic money is tracked throughout its lifetime. Accordingly, it can be classified as follows.

Identified electronic money Identified electronic money more or less works like a credit card. The progress of identified electronic money, from the very first time it is issued by a bank to one of its customers up to its final return to the bank, can be easily tracked by the bank. As a result, the bank can know precisely how and when the money was spent by the customer. Consequently, the bank knows who the original customer that requested this money is, and how he spent it. How is this possible? To make electronic money identifiable like this, the file containing the information about the electronic money contains a unique serial number that is generated by the bank itself. Therefore, the bank has a list of these serial numbers vis-à-vis the customers who requested that money. Now, suppose the serial number generated by the bank for electronic money worth $100 is, say, SR100. Suppose the customer who requested this electronic money now spends these $100 by sending the corresponding files to a merchant. The merchant would go back to the bank to redeem the electronic money and get real money instead. At this point, the bank again has the electronic money with the serial number SR100. Therefore, it knows that the customer has bought something worth $100 from a specific merchant on a specific date. This is shown in Fig. 12.24.

Fig. 12.24 Steps involved in identified electronic money

Since the entire journey of identified electronic money is traceable, this can create privacy issues.


Anonymous electronic money Anonymous electronic money (also called as blinded money) works like real hard cash. There is no trace of how the money was spent, and no trail of the transactions involved in this type of electronic money. Products like DigiCash provide this kind of electronic money to Internet users to spend, by tying up with banks. The key difference between identified electronic money and anonymous electronic money (which creates the anonymity) is that whereas in the case of identified electronic money the bank creates the serial number, in the case of anonymous electronic money it is the customer who creates the serial number. The process of the customer generating the random number is as follows.
1. The customer generates a random number by some mathematical algorithm. The customer then multiplies it by another huge number (called as the blinding factor).
2. The customer sends the resulting number, called as the blinded number, to the bank.
3. The bank does not know about the original number of step 1.
4. The bank signs (i.e., encrypts) the blinded number and sends it back to the customer.
5. The customer converts the blinded number back to the original number using some algorithm.
6. The customer then uses the original number (and not the blinded number) when making any transaction with a merchant.
7. The merchant's encashment request to the bank also carries the original number.
8. The bank cannot trace this electronic money, as it does not know the relationship between the original number and the blinded number.
This process is shown in Fig. 12.25. A toy numeric sketch of this blinding trick follows the figure.

Fig. 12.25

Steps involved in anonymous electronic money
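The blinding arithmetic can be demonstrated with textbook RSA and deliberately tiny numbers. This is a toy illustration only; real systems use large keys and more elaborate, padded protocols. The key values below are a classic small RSA example chosen so that the numbers stay readable.

# Toy RSA blind signature: the bank signs a number without ever seeing it.
# Textbook-sized RSA key (insecure, for illustration only): n = 61 * 53.
n, e, d = 3233, 17, 2753          # bank's public key (n, e) and private key d

m = 1234                          # customer's secret serial number (m < n)
r = 5                             # blinding factor, coprime to n

# Steps 1-2: the customer blinds the serial number and sends only the blinded value.
blinded = (m * pow(r, e, n)) % n

# Steps 3-4: the bank signs the blinded value with its private key; it never sees m.
signed_blinded = pow(blinded, d, n)

# Step 5: the customer unblinds; dividing by r yields the bank's signature on m itself.
r_inv = pow(r, -1, n)             # modular inverse of r (Python 3.8+)
signature = (signed_blinded * r_inv) % n
assert signature == pow(m, d, n)  # identical to a direct signature on m

# Steps 6-8: anyone can verify the signature on m with the bank's public key, but the
# bank cannot link (m, signature) back to the blinded value it actually signed.
assert pow(signature, e, n) == m
print("serial:", m, "bank-signed token:", signature)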

In the case of identified electronic money, attempts by a customer to spend the same money more than once can be easily caught or prevented. This is possible because the bank maintains a list of the issued and spent serial numbers. Therefore, it can catch attempts to spend the same piece of electronic money more than once. A small sketch of such a serial-number check is shown below.
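Here is a minimal sketch of that bookkeeping, assuming the bank simply keeps a record of issued and already-redeemed serial numbers. The Bank class and its method names are invented for this illustration.

# Minimal double-spending check for identified electronic money.
# The bank records which serial numbers it issued and which were already redeemed.
class Bank:
    def __init__(self):
        self.issued = {}        # serial number -> customer it was issued to
        self.redeemed = set()   # serial numbers already cashed in

    def issue(self, serial, customer):
        self.issued[serial] = customer

    def redeem(self, serial, merchant):
        if serial not in self.issued:
            return "REJECT: unknown serial number"
        if serial in self.redeemed:
            spender = self.issued[serial]
            return "REJECT: double spending detected, money was issued to " + spender
        self.redeemed.add(serial)
        return "OK: credit the account of " + merchant

bank = Bank()
bank.issue("SR100", "Alice")
print(bank.redeem("SR100", "Merchant-1"))   # OK
print(bank.redeem("SR100", "Merchant-2"))   # double spending caught, traced to Alice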


Classification based on the involvement of the bank in the transaction Based on involvement (or otherwise) of the bank in the actual transaction, electronic money can be further classified into two categories: online electronic money and offline electronic money.

Online electronic money In this type, the bank must actively participate in the transaction between the customer and the merchant. That is, before the purchase transaction of the customer can complete, the merchant confirms with the bank in real time whether the electronic money offered by the customer is acceptable (e.g., ensuring that it is not already spent, or that its serial number is valid).

Offline electronic money In this type, the bank does not participate in the transaction between the customer and the merchant. That is, the customer purchases something from the merchant and offers to pay by electronic money. The merchant accepts the electronic money, but does not validate it online. The merchant might collect a group of such electronic money transactions and process them together at a fixed time every day.

12.4.4 The Double Spending Problem

Now, if we combine the two ways of classifying electronic money, we have four possibilities:
1. Identified Online electronic money
2. Identified Offline electronic money
3. Anonymous Online electronic money
4. Anonymous Offline electronic money

Of the four, the last type can create the double spending problem. More specifically, a customer could arrange for anonymous electronic money by using the blinded money concept. Later on, he could spend it offline more than once in quick succession (say, within the same hour) with two different merchants. Since the bank is not involved in either of the two offline transactions, spending the same piece of money twice cannot be prevented. Moreover, when it is realized that the same piece of money has been spent more than once (when both merchants send their daily transaction lists to the bank), the bank cannot determine which customer spent it more than once, because of the blinding factor (recall our discussion of anonymous electronic money). Consequently, anonymous offline electronic money is of little practical use. The double spending problem can happen in the case of identified offline electronic money as well. However, upon detection, the customer in question can be easily tracked from the serial numbers of the electronic money. This is shown in Fig. 12.26.

Fig. 12.26 Detection of double spending problem


However, this detection is not possible in the case of anonymous offline electronic money. Of course, with either of the online electronic money types, the double spending problem is simply not possible, since the bank is a part of the transaction between the customer and the merchant.

12.5 PAYPAL

PayPal is the world's most popular middle-layer service for online payments. Close to 100 million Internet users prefer to use PayPal to send money to each other via plain email! PayPal has become a very convenient and guaranteed way to transfer money online. It became so popular that the hugely successful online auction site eBay bought over PayPal! Peter Thiel and Max Levchin founded PayPal in 1999. In those days, it was called as Confinity. The aim of the company was to allow the flow of money from one country to another, free from government controls. There was tremendous interest in, and also several criticisms of, this scheme. It was alleged that this scheme would lead to fraud and possible security attacks. Legal suits were filed against this scheme. To summarize, PayPal acts as an online financial transaction broker (middleman) that allows people to send money to each other's email address. The credit card or bank information is never transmitted over email. Just as an escrow service acts as a safe, trustworthy middleman of information, PayPal acts as the middleman holder of money. By way of its policies, business practices, and overall integrity, PayPal has been able to establish the trust of all concerned parties. Because so many guarantees are in place, both buyers and sellers (or payers and payees) entrust PayPal with their credit card and bank details. PayPal vows to keep the private customer information secret. Money sent via PayPal resides in a PayPal account till the time the receiver of the money decides to retrieve it or spend it. However, if the receiver's bank information is already with PayPal in a verified state, then the money can be transferred directly into her account. To sign up for PayPal services, a person simply needs an email ID, and ideally a bank account or credit card. The person registering for PayPal needs to provide her basic information, like address, phone numbers, and so on, and also needs to choose two security questions to foil possible attacks. Interestingly, PayPal does not basically change the way merchants interact with banks and credit card processing companies. Instead, as mentioned earlier, it just acts as a middleman. We should know that credit and debit card transactions travel on physically separate networks. Whenever a merchant accepts a payment from a card, the merchant needs to pay some fees (called as interchange), equivalent to about ten cents plus approximately 2 per cent of the transaction amount. What role does PayPal play in this cycle? With PayPal in the picture, both the buyer and the seller deal with PayPal. This is because they would already have provided their bank or credit card information to PayPal. PayPal carries out all the transactions on their behalf with the concerned banks and credit card companies. It also pays the interchange fees. How does PayPal make money, then? It charges fees to users for using its service, and also earns interest on the money left in users' PayPal accounts. Another interesting feature of PayPal is that, unlike in traditional online transactions, sensitive information such as the user's credit card details does not travel every time. It is registered with PayPal and remains there. Hence, PayPal claims that their payment mechanism is more secure.


SUMMARY
- The Secure Electronic Transactions (SET) protocol is meant for online payment processing. SET is an open encryption and security specification that is designed for protecting credit card transactions on the Internet.
- The pioneering work in this area was done in 1996 by MasterCard and Visa jointly. They were joined by IBM, Microsoft, Netscape, RSA, Terisa and VeriSign.
- SET uses a novel concept of dual signature. This ensures that the payment-related data goes only to the payment processor (i.e., the bank or payment gateway), and that the order information goes only to the merchant.
- SET makes use of standard cryptographic operations/tools, such as message digests, digital certificates, digital signatures, etc.
- 3-D Secure is another protocol for safe payment transactions. It is implemented with different names by Visa and MasterCard.
- Electronic money or electronic cash is one more way of making payments on the Internet. Electronic money is money represented by computer files. In other words, the physical form of money is converted into the binary form of computer data.
- Electronic money can be online or offline. In online electronic money, the bank is actively involved in the payment transaction. In offline electronic money, the bank is not involved while the payment transaction is in progress.
- Electronic money can also be classified into identified and anonymous (or blinded). Identified electronic money can be easily tracked. It is almost impossible to track anonymous electronic money.
- Anonymous offline electronic money is the most difficult to keep secret or keep a track of.

REVIEW QUESTIONS

Multiple-choice Questions
1. The bank dealing with the merchant for credit card processing is ______.
   (a) merchant bank  (b) consumer bank  (c) acquirer bank  (d) issuer bank
2. The bank dealing with the customer for credit card processing is ______.
   (a) merchant bank  (b) consumer bank  (c) acquirer bank  (d) issuer bank
3. SET is closer to ______.
   (a) host-to-host secure communications  (b) network layer security  (c) secure payments over the Web  (d) transport layer security
4. The first step in SET is ______.
   (a) Purchase Request  (b) Payment Capture  (c) Payment Authorization  (d) Purchase Response
5. Risk of merchant fraud in SSL is ______ SET.
   (a) more than  (b) less than  (c) same as  (d) less than or same as
6. Electronic money is made up of ______ in physical form.
   (a) floppy disks  (b) computer files  (c) hard disks  (d) credit card
7. ______ is also called as blinded money.
   (a) Identified  (b) Anonymous  (c) Online  (d) Offline
8. Bank cannot track ______ money.
   (a) identified  (b) anonymous  (c) online  (d) offline
9. ______ money can create the double spending problem.
   (a) Identified online  (b) Identified offline  (c) Anonymous online  (d) Anonymous offline
10. ______ allows sending money over email.
   (a) IBM  (b) PayPal  (c) Visa  (d) MasterCard

Detailed Questions
1. What is the idea behind SET?
2. What are the major steps involved in SET?
3. Describe the various sub-steps in each of the SET steps.
4. What are the differences between SSL and SET?
5. Discuss 3-D Secure. How is it different from SET?
6. Why are component-based solutions most preferred?
7. Discuss the model of electronic money.
8. What are the security mechanisms in electronic money?
9. What are the types of electronic money?
10. Why is anonymous offline electronic money dangerous? Discuss the double spending problem.

Exercises
1. Examine the PayPal model in detail.
2. Who are the competitors to PayPal? How do they differ from PayPal?
3. What are the new trends in credit card payments?
4. Why are credit cards inherently insecure for electronic commerce?
5. What are one-time use credit cards? Find out using the Internet.


Chapter 13

Introduction to XML

13.1 WHAT IS XML?

13.1.1 Communication Incompatibilities
Extensible Markup Language (XML) is perhaps one of the most misunderstood concepts in the area of computers today. In spite of its tremendous all-around success and widespread use, not many people seem to really understand what XML is and where it needs to be used. It is observed that more often than not, XML is used because someone has decided, or because someone has been told, to use it. It may seem strange to read this. However, it is not only true, but quite common. Perhaps the reason behind this apparent confusion and lack of understanding is that, unlike a programming language (say Java, C++, or ASP.NET) or a DBMS (say Oracle, DB2, or MySQL), it is not very easy to imagine the end use and applications of XML. Unfortunately, many books and other literature on the subject do not aim at clarifying this confusion. They make an attempt to teach the syntax and semantics of XML. However, they do not answer the all-important question of what XML is all about, and why we need to learn it in the first place! XML syntax and semantics are well known, but where to use them is usually not clear! Therefore, let us try to solve this mystery surrounding XML.

For this purpose, let us take a simple example from normal life. Imagine that there are two persons wishing to communicate with each other. However, the problem is that both of them speak different languages. One of them can only understand and speak English, while the other understands and can speak only Hindi. How will they be able to communicate with each other, then? Clearly, we need some sort of intermediary who can translate between these two languages and thus convey their messages to each other. This is quite similar to how interpreters assist political leaders when the leaders do not understand each other's language (let alone the intent of the conversation!). The problem is depicted in Fig. 13.1.

We have not answered the all-important question of who this translator is going to be, and how this translator would function. There are two primary approaches to resolve this problem, as follows.
1. When communicating the thoughts of the person who speaks only in English, translate them into Hindi and then pass on the message to the other person who understands only Hindi. The translator would perform the opposite task in the other direction of communication. This approach is shown in Fig. 13.2.
2. Think about a Common Language (let us call this CL for the sake of brevity) that both the persons should learn. This CL should be universally acceptable, and work for different communicating pairs. That is, even if person A is communicating with person B, or person X with Y, or A with Y, or T with U, the CL will not change. In this approach, the translation needs to happen at the thought process level. That is, the person who is speaking has to speak in the CL itself, and no further translation is necessary, unlike in the previous approach. This is shown in Fig. 13.3. Because the other person understands CL, there is no incompatibility or ambiguity.

Fig. 13.1 The problem of incompatibility in human conversations

Fig. 13.2 Approach 1: Use of a translator to solve the problem of incompatibility

Fig. 13.3 Approach 2: Making the communicating parties use a Common Language (CL)

Let us now quickly analyze these two approaches. Quite clearly, the first approach provides a quick-and-dirty solution. In this case, the communicating parties need not really bother about each other's language. They are free to use their native languages, and the responsibility lies on the translator to correctly communicate thoughts and ideas in the appropriate languages. Therefore, it is the translator who needs to know multiple (at least two) languages. The second approach is slightly more painful, since every communicating party needs to learn a new language (CL). However, in the medium-to-long term, this approach is superior, since the dependence on the translator is no longer there. Also, everyone speaks in and can understand CL.

Therefore, the question really is, are we ready to invest in a solution that is quick-and-dirty, but which is not guaranteed to work for all possible situations/persons, or in another one that is a bit annoying to start with, but is bound to pay rich dividends later? If we have the time, money, and concurrence of all the communicating parties, we would clearly opt for the second approach. Getting all of them to agree, of course, is not a straightforward job. However, if we somehow succeed in doing that, then the second approach is far better. Having discussed this background sufficiently well, let us now think about how this relates to XML and what decisions we are likely to make there.

13.1.2 XML and Application Communication Incompatibilities
Let us relate our discussion so far to XML and see how these concepts are interlinked. Imagine that we have two applications A and B, possibly on different networks, wanting to communicate with each other. The basic question that arises in this situation, like human conversation, is about the language that they need to use for communication. Of course, we are not just referring to computer languages here, but are instead talking about the overall platform and architecture of these two applications. The situation is depicted in Fig. 13.4.

Fig. 13.4 Problem of incompatibility between applications

This discussion is quite similar to our earlier discussion about humans wanting to communicate with each other, without worrying about the possible incompatibilities. Let us discuss this in detail. We know that one of the most popular data representation and exchange formats is the American Standard Code for Information Interchange (ASCII). It is commonly said that XML is the ASCII of the present and of the future. Strictly speaking, XML must not be compared with ASCII, because ASCII is merely a representation of alphanumeric and other symbolic data in binary form, whereas XML serves other purposes. XML can be used to exchange data across the Internet. XML can be used to create data structures that can be shared between incompatible systems. XML is a common meta-language that will enable data to be transformed from one format to another. It is worth noting that even ASCII was not ambitious to this extent. This would allow organizations and individuals to exchange data over the Internet in a uniform manner. Going one step further, XML need not always be used across the Internet. That is, it can be used for Web as well as non-Web applications equally effectively. This basic concept is illustrated in Fig. 13.5.

XML can be used to exchange data between compatible/incompatible applications in Web/non-Web applications.

Fig. 13.5 XML as the data exchange mechanism between applications

Does this sound quite similar to the second approach that we had discussed with reference to human conversations? We had suggested that everyone should learn a Common Language (CL) and converse in that language. Thus, XML for applications seems to be similar to CL for humans. Let us not jump to this conclusion, however, and instead reach there step by step.

When we had raised this problem of incompatibility of data formats between applications, the obvious question that arose was as follows. Was data not being exchanged by applications before XML came into the picture? Quite clearly, data has been exchanged by applications for several decades now. Since the days of the IBM mainframe applications of the 1960s, varying applications and platforms have needed to speak with each other, and they have been able to do so. Then, what is so great about XML? The answer is that XML simplifies this talking between two applications, regardless of their purpose, domain, technology, or platform.

XML simplifies the process of data exchange between two or more applications.

Now, the question is, why not use the existing Database Management System (DBMS) products such as Oracle, SQL Server, IMS, IDMS, Informix, etc., for exchanging data over the Internet (and also outside of the Internet)? The reason is incompatibility of various kinds. These DBMS products are extremely popular and provide great data storage and access mechanisms. However, they are not always compatible with each other in terms of sharing or transferring data. Their formats, internal representations, data types, encoding, etc., are different. This creates problems in data exchange. This is similar to a situation when one person understands only English and the other understands only Hindi. English and Hindi by themselves are great languages. However, they are not compatible with each other! Similarly, for instance, suppose organization X uses Oracle as its DBMS (relational) and organization Y uses IMS as its DBMS (hierarchical). Each of these DBMS systems internally represents the data in its own formats as well as by using data structures such as chains, indexes, lists, etc. Now, whenever X and Y want to exchange any kind of data (say a list of products available, last month's sales report, etc.), they would not be able to do this directly, as shown in Fig. 13.6.

Fig. 13.6 Incompatible data formats

Database Management Systems (DBMS) are incompatible with each other, when it comes to data exchange.

If X and Y want to exchange data, the simple solution would be that they agree on a common data format, and use that format for data exchange. For example, when X wants to send an inventory status to Y, it would first convert that data from the Oracle format into this common format and then send it to Y. When Y receives this data, it would convert the data from this common format into the IMS format, and then its applications can use it. In the simplest case, this common format can be a text file. This is shown in Fig. 13.7.
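As an illustration (the field layout here is assumed, not taken from any real system), an inventory record travelling in such a common text format might be nothing more than a delimited line:

P1001|Pen drive|Storage|450|200

Both X and Y would have to agree, out of band, that the five fields mean product code, name, category, price, and quantity, in that order. It is precisely this kind of informal agreement that starts causing trouble as more partners join, as the points below show.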

Fig. 13.7 Data exchange in a text format

This approach of exchanging data in the text format seems to be fine. After all, all that is needed is a set of data transformation programs at both ends, which either read from or write to the text format from the native (Oracle/IMS) format. This approach would be very similar to the translator approach used for human conversations. But there are some issues with this approach as well, in addition to what we had discussed earlier in the context of human conversations.


- For instance, suppose another organization Z now wants to do business with X and Y. Therefore, X and Y now need to exchange data with Z also. Suppose that Z is already interacting with other business partners such as A and B. Now, if Z is using a different text format for data exchange with A and B, its data exchange text formats with X/Y and A/B would be different! That is, for exchanging the same data with different business partners, different application programs might be required!
- Also, suppose that these business partners specify some business rules. For instance, Z mandates that a sales order arriving from any of its business partners (i.e., A, B, X or Y) must carry at least three items. For this, appropriate logic can be incorporated in the application program at its end to validate this rule, whenever it receives any sales order from one of its business partners. However, can we not apply this business rule before the data is sent by any of the business partners, rather than first accepting the data and then validating it? If different data exchanges among different business partners demand different business rules like this, it might be difficult to apply them in the text format.

Issues such as these resulted in the emergence of a common standard for exchanging business documents—Electronic Data Interchange (EDI). We shall study EDI in detail soon. EDI is a standard that specifies the formats for different business documents. EDI allows the integration of incompatible data formats by bringing these formats on a common platform—the EDI standard. Therefore, EDI would solve the problems associated with data exchange in the text format, as shown in Fig. 13.8. Now, there was no incompatibility issue. Also, data could be exchanged in a seamless manner, as business rules could be built into the EDI standard itself (as we shall study soon). Thus, EDI became the de-facto standard for exchanging business documents. However, this was also not free of issues. The biggest issue with EDI is cost. EDI solutions are very expensive to implement and maintain. Smaller and medium-sized organizations cannot usually afford this. Moreover, in the last few years, the idea of using the Internet protocols such as TCP/IP for exchanging business documents started to gain acceptance worldwide. This is because the Internet is a virtually free network,

unlike the proprietary EDI networks (called as Value Added Networks or VAN). Sophisticated hardware and software are not required to a great extent for using the Internet. Since this meant that the expensive VAN networks employed by EDI systems had an alternative transport medium, all that was needed was a standard such as EDI. Web-enabling EDI is one such solution. However, that is still in the experimental stage.

Fig. 13.8 Using EDI for data exchange

In the meanwhile, XML emerged as the data exchange standard over the Internet. That is, the exchange standard was XML and the underlying transport medium was the Internet (i.e., TCP/IP). In the case of EDI, the data exchange standard was EDI and the underlying transport medium was a VAN. With some fine-tuning and technology improvements, the underlying transport mechanism for a VAN can now be any protocol, such as SNA or even TCP/IP. This means that we can use XML in the place of EDI wherever possible. This is how XML has become the modern standard for exchanging business documents over the Internet, as shown in Fig. 13.9. Of course, it would be wrong to suggest that XML has replaced all data exchange formats completely, although in this example we have shown such a situation. EDI is still extremely popular. Also, other incompatible formats are still in use. However, it is expected that in a few years, all of this will be replaced by XML.

Fig. 13.9 XML as the data exchange standard

This brings us to an obvious question. What is so great about XML? Why should everyone agree upon and start using XML (similar to our CL in human conversations)? Let us discuss this now. Think about the book you are holding right now. It was developed almost entirely using Microsoft Word. Whenever we add things such as chapter numbers, section numbers, sub-section numbers, paragraphs, and so on, Word keeps a track of all such things by formatting them appropriately and retaining the formatting details for ever. Instead of using Word, if we had used XML for creating this book, we would have used a different syntax for creating them. We could have done that quite easily.

Now, if a word processor can do what XML is offering us, why do we need XML at all? We have seen the business side of it, but what about cases such as document processing? Well, there is again a problem of data exchange. Different word processors use different styling information. The styling information used by Microsoft Word is completely different from Corel's WordPerfect, which is again different from Sun's StarOffice word processor. We need to convert documents created by using one word processor into another format before they can be used in that other format. In contrast to this, the same XML document can be read by any application without the need for any changes/conversions.

13.2 XML VERSUS HTML
Having understood the basic need for XML, let us now go one step further. Here, we will try to examine what is so unique about XML that it should start becoming the world's leading data exchange mechanism. Also, most of us would know that Hyper Text Markup Language (HTML) is used for creating Web pages on the Internet. Can it not be reused instead of creating a new language? Let us examine this question. As we know, HTML is the de facto language of the Internet. HTML defines a set of tags describing how the Web browser should display the contents of a particular document to the end user. For example, it uses tags that indicate that a particular portion of the text is to be made boldface, underlined, small, big, and so on. In addition, we can display lists of values using bullets, or create tables on the screen by using HTML. As an example, Fig. 13.10 shows how a piece of text can be made bold in HTML, and the actual result of this code.

Fig. 13.10 HTML tags example

As we can see, there is the word Atul in the HTML code, surrounded by two strange pieces of text, namely <B> and </B>. These are called as tags. The tags are surrounded by the less than (<) and greater than (>) signs. In this case, the tag is B. This in HTML means bold. Thus, <B> means make the text that appears after this tag bold. On the other hand, the </B> tag indicates the end of the bold tag. Therefore, the boundaries of the text to be displayed in bold (i.e., Atul) are defined by the tags <B> and </B>. The result shows this by displaying the word Atul in bold font.

The similarity between XML and HTML is that both languages use tags to structure documents. This, incidentally, is perhaps the only real similarity between the two!
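For reference, the kind of HTML fragment being discussed would look roughly as follows; the sentence surrounding the name is, of course, just an illustration:

<HTML>
  <BODY>
    This chapter is written by <B>Atul</B>.
  </BODY>
</HTML>

When a browser renders this, only the word Atul appears in bold; the rest of the sentence is displayed normally.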

Although XML also uses tags to organize documents and the contents therein just as HTML does, it is not concerned with these presentation features of a document. XML is more concerned with the meaning and rules of the data contained in a document. XML describes what the various data items in a document mean, rather than describing how to display them. Therefore, whereas HTML is an information presentation language, XML is an information description language. Thus, conceptually, XML is pretty similar to a data definition language. We shall see how XML achieves this later.

HTML concentrates on the display/presentation of data to the end user, whereas XML deals with the representation of data in documents.

This point is emphasized in Fig. 13.11.

Fig. 13.11 Basic difference between HTML and XML

It is necessary to understand why HTML is not sufficient for conducting electronic business on the Internet, and how XML can solve the problems associated with HTML in this regard. As we know, the basic purpose of HTML is to allow presentation of documents that can be interpreted and displayed by a Web browser. However, electronic business applications have other demands such as processing, rearranging, storing, forwarding, exchanging, encrypting, and signing these documents. The data values on an HTML page usually originate from databases or files. The databases or files store not only data items, but also the inter-relationships between them. However, when using HTML, it is difficult to express or represent these relationships of data items. Therefore, during the transfer of information from the databases to HTML, this information about data is lost. This is because HTML is purely designed for displaying the data values in the desired format. Therefore, if organizations want to exchange business documents in the HTML format, it would serve little purpose, because the HTML format would convey nothing about the meaning of the data. It would convey more details about its formatting.

This is where XML steps in. Rather than describing how to display data, XML describes the meaning of that data. For example, suppose we want to create a Web page describing the products that we sell. The responsibility of making the Web pages attractive by using catchy colours, fonts, and images would be left to HTML. However, the basic data about the products themselves (such as product names, categories, prices, etc.) would be stored in some databases and converted into the form of XML files (also called as XML documents). HTML would present this data to the user's browser. This concept is illustrated in Fig. 13.12.

One question needs to be answered here. Why should we transform the data from the database first into XML and then into HTML? Why do we not directly read the data from the database using our application program and create an HTML file out of it? What is the advantage that we are getting by converting the data from the database into XML form as an intermediate step before transforming it into HTML? The reasons for this are many. Once we study technologies such as XML Stylesheet Language (XSL), Cascading Style Sheets (CSS), and XML parsing, these things would become clear. For now, it should suffice to remember that an intermediate step of XML helps in areas such as making the final output media independent (i.e., it can finally be displayed as an HTML page, or as a PDF document, etc.), and it can also be sent to another application

for further processing. This would not be possible if we transform the data read from the database straightaway into HTML.
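As a rough sketch of this division of labour (the element names and values below are purely illustrative), the data layer might describe a product in XML, while HTML only decides its on-screen appearance:

<PRODUCT>
  <NAME>Pen drive</NAME>
  <CATEGORY>Storage</CATEGORY>
  <PRICE>450</PRICE>
</PRODUCT>

The corresponding HTML could be as simple as <B>Pen drive</B> - Rs. 450, which says nothing about what a price or a category is; that meaning lives only in the XML.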

Fig. 13.12 The role of HTML and XML

The surprising point about all this is that XML implements an idea that is not revolutionary at all. The fact that data should be exchanged in the form of documents (e.g., product catalogs, invoices, purchase orders, contracts, etc.) is not new by any means. Organizations are already familiar with and have been using document exchange procedures. As mentioned previously, EDI was one of them, which has existed for more than a

couple of decades. Then what is wrong with EDI, and how is XML slowly replacing it? Let us examine this question now with an overview of EDI.

13.3 ELECTRONIC DATA INTERCHANGE (EDI)

13.3.1 Understanding EDI
When businesses sell or buy, they need to exchange a variety of documents, such as purchase orders, sales orders, letters of credit, etc. Each company has its own formats for all these documents. The format specifies how various items such as product code, description, quantity, rate, amounts, discounts, etc., will look, and what their sizes are. Interestingly, when company A sends a Purchase Order (PO) to company B, company B creates a Sales Order (SO) from it. Because the format of B's SO differs from that of A's PO, not only in terms of product codes, etc., but also in units of measure, the sizes of various data items, and so on, company B has to re-enter the sales order in its computer system to carry out the further follow-up. This problem is illustrated in Fig. 13.13.

Fig. 13.13 Problem of incompatible data formats and too many documents

How nice it would be if A's PO were sent electronically, if it automatically got converted into B's SO, and if it got entered into B's SO processing system with very little human intervention! EDI was born precisely with this aim. Electronic Data Interchange (EDI) is the exchange of business documents such as purchase orders, invoices, etc., in an electronic format. This exchange happens, like email messages, in a few seconds and does not involve any human intervention or any paper.

EDI has been around since the 1960s and is used mostly by large corporations to conduct business with their suppliers and their customers over secure networks. Until very recently, EDI was the primary means of conducting electronic business. However, very high costs have prevented EDI from being used by smaller organizations. These days, Business-to-Business (B2B) electronic commerce transactions conducted over the Internet are also getting equally popular, and these again can use EDI when it comes to exchanging any business documents. The other category of e-commerce, called as Business-to-Consumer (B2C), is not that much related to EDI. Anyway, EDI is a form of communication system that allows business applications in different organizations to exchange information automatically to process a business transaction. The relationships between the parties involved in EDI transactions are pre-defined (e.g., trading partners, customers and suppliers of an organization). Most importantly, EDI transactions have traditionally been

conducted over privately set up networks called as Value Added Networks (VAN), unlike the e-commerce mode, which runs over the public Internet. This explains the higher costs of EDI. A VAN is a communications network that provides additional applications/functionality on top of the basic network infrastructure. A network with an e-mail application installed for all its subscribers, providing them the email facility, is one such example. Another example is EDI, in which the VAN exchanges EDI messages among the trading partners. It also provides other services such as interfacing with other VANs, and supporting a number of transmission protocols and communication mechanisms. This allows organizations to exchange business documents such as purchase orders, invoices and payment instructions in a secure and automated manner. The basic idea behind EDI is shown in Fig. 13.14.

Fig. 13.14 The basic concept behind EDI

As we can see, the diagram defines various organizations in the form of business partners and their EDI systems, interconnected by the EDI network and the Internet. The point is that EDI is much more than a data format/representation, unlike XML. There is no concept of an XML network. XML is only the common format for data exchange. EDI, on the other hand, not only attempts at unifying the data exchange formats, it also provides the backbone network that is essential for this data exchange.

13.3.2 An Overview of EDI
Let us have a broad-level overview of an EDI system, before we discuss the details. Typically, an EDI service provider maintains a VAN and establishes mailboxes for each business partner involved in EDI. The provider stores and forwards EDI messages between these partners. The main aspect here is standardization. All parties involved in EDI transactions must use an agreed set of document layout standards, so that the same document looks exactly similar no matter who has created it, in terms of the overall layout and format. All such business forms are then transmitted over the VAN as messages similar to emails. Figure 13.15 shows the overall flow.


Fig. 13.15 Overview of EDI software

As the figure conveys, EDI defines standard formats for all types of documents. Firstly, sender A's documents are converted to the standard EDI formats, and are transmitted over a VAN to the receiver B. At this point, another conversion takes place from the standard EDI format to B's internal format, as defined by the application software running on B's computer. Recall that this is quite similar to the conversion of data from the internal database format to XML, which we had discussed earlier. The document standards for EDI were first developed by large business houses during the 1970s, and are now under the control of the American National Standards Institute (ANSI). As we have noted, EDI demands two things.


- One is a set of software programs at each user's/partner's site to convert the documents from their own formats to the standard ones and also from the standard formats to their own formats, which they understand. These are required because any partner could send or receive documents at different times.
- Secondly, EDI also demands a network connection between the two companies that want to exchange business documents with each other. This need translates into the trading partners having a dedicated leased line between them, or a connection to a VAN. Since this is very expensive, it is not feasible for many small and medium-sized organizations, which are the trading partners of the bigger corporations. However, because many large organizations, which can easily deploy EDI, demand that their vendors also have an EDI setup, small and medium-sized organizations sometimes have no choice but to use EDI rather than lose a big customer.

13.3.3 Advantages of EDI
Having understood where and how EDI systems can be beneficial, let us summarize the advantages offered by EDI systems.
1. Reduced lead-time from placing an order to actually receiving goods.
2. Substantial decrease in the number of errors, otherwise caused by manual data entry and paperwork.
3. Reduction in overall processing costs.
4. Availability of information all the time.
5. Provision for planning future activities in a better and more organized manner.
6. Building long-term relationships between trading partners.


13.3.4 EDI and the Internet
So far, we have focussed our attention on the EDI systems that require a dedicated network connection between the trading partners, called as a VAN. Although this works fine for the large business houses, its high costs make it difficult to implement for a relatively smaller organization. At times, the costs of setting up and maintaining a VAN can be simply beyond the reach of smaller organizations. The arrival of the Internet has given everybody in the world a very cheap and simple way to potentially connect to every other computer in the world. Naturally, the idea of web-enabling EDI has emerged in the last few years. Simply put, this means that the EDI systems could be connected to the Internet, so that the trading partners who cannot afford the high costs of VAN services can simply use the Internet to connect to their bigger partners for conducting EDI transactions.

Of course, this concept has the biggest practical problem of a potential lack of security. As we know, the basic aim of setting up a dedicated VAN, or using the services of a VAN provider, is to ensure that the business transactions between two trading partners are totally secure and reliable. This is possible because the VANs are private networks between two partners. However, the fundamental feature of the Internet is that it is open to every potential computer in the world that possesses a Web browser and basic connectivity features such as a dial-up account. In other words, the Internet was not created with an aim of securely exchanging business information. That has come only as an afterthought, and not as a carefully built-in feature. Stories of online credit card information being tapped and misused still go around. Therefore, the basic purpose of EDI contradicts that of the Internet. In this situation, if the two have to co-exist, there must be a guarantee that we can exchange information securely using the Internet. Thankfully, with the emergence of technologies such as encryption mechanisms and digital signatures, this is more or less assured these days. Of course, this is still not as safe as having a VAN connection between the trading partners. But surely, this is the closest that the Internet can go to, with the current technology.

Therefore, connecting EDI systems to the Internet is certainly a possibility, and some organizations are doing that. Combining EDI with the Internet can be done by adding a browser-based interface to the VAN networks. Existing users continue to have the usual EDI interface. Neither set of users is aware that, depending on whether they are on a VAN or the Internet, a different set of forms (either XML or HTML) is sent to them by the VAN provider. The VAN provider, as shown in Fig. 13.16, does this behind the scenes. As the figure shows, the VAN provider is responsible for translating EDI documents into HTML forms, when presenting data to the Internet users. Similarly, the VAN provider translates HTML forms and data entered by the Internet users into EDI standard forms such as ANSI ASC X12. Neither the Internet users nor the EDI users are aware of this translation process. Thus, the VAN provider performs a dual role here—that of a VAN provider as usual, and the additional role of a Web server. As we have noted, the actual document interchange can be done using the XML standard.
Since the EDI approach of standardizing and exchanging business documents using a hierarchical structure such as ASC X12 is extremely close to the way XML documents are organized, the future direction for bringing EDI and the Internet closer would be to convert all standard EDI documents to their equivalent XML formats. This is the current trend in the business industry at the moment. The basic technology would be a VAN on one side, and the use of a standard browser-based Internet interface on the other. The former would continue to work with EDI standards such as ASC X12, whereas the latter would employ XML standards.


Fig. 13.16 EDI and the Internet

13.4 XML TERMINOLOGY
We have discussed the origins, need, and relevance of XML. Now let us dig a bit deeper into the XML terminology that we need to be familiar with. The simplest way to do this is to actually take a look at an XML document and then study its various parts. We will use the XML file shown in Fig. 13.17 for our discussion.

<?xml version="1.0"?>
<?xml-stylesheet ... ?>
<BOOKS>
  <BOOK pubyear="1929">
    <BOOK_TITLE>Look Homeward, Angel</BOOK_TITLE>
    <AUTHOR>Wolfe, Thomas</AUTHOR>
  </BOOK>
  <BOOK>
    <BOOK_TITLE>Gravity's Rainbow</BOOK_TITLE>
    <AUTHOR>Pynchon, Thomas</AUTHOR>
  </BOOK>
  <BOOK>
    <BOOK_TITLE>Cards as Weapons</BOOK_TITLE>
    <AUTHOR>Jay, Ricky</AUTHOR>
  </BOOK>
  <BOOK>
    <BOOK_TITLE>Computer Networks</BOOK_TITLE>
    <AUTHOR>Tanenbaum, Andrew</AUTHOR>
  </BOOK>
</BOOKS>

Fig. 13.17 Sample XML document

Every XML file has an extension of .XML. Let us call the above file books.xml. As we can see, the file seems to contain information organized in a hierarchical manner, with some unfamiliar symbols. Let us understand this example step by step. In the process, we will start getting familiar with the XML syntax and terminology. Figure 13.18 shows a short pictorial explanation of this XML document. A detailed explanation is provided in Table 13.1.

Fig. 13.18 Terminology in XML—High level overview

As we can notice, some of the key terms that have been introduced here are XML tag, element (composed of element name and element value), attribute (composed of attribute name and attribute value), and root element. Some of the other terms are start element indicator and end element indicator. Let us now understand their meanings.

Table 13.1 XML example described

Contents: <?xml version="1.0"?>
Description: This line identifies that this is an XML document. Every XML document must begin with this line. Note that the text is delimited inside the opening symbol <? and the closing symbol ?>. We shall soon see that, in general, XML contents are delimited inside the symbol pair < and >. However, some special keywords, including the xml declaration shown here, have a slightly different symbol pair (i.e., <? and ?>). Regardless, there is always an opening symbol and a closing symbol for every line in an XML file. (Contd)


Table 13.1 contd...









Contents: <?xml-stylesheet ... ?>
Description: Note that this line also comes with the symbol pair <? and ?>. This is a style sheet declaration, which we shall ignore for the moment. It has no direct relevance to the content of the XML document. We will discuss this concept later in the book. However, for now, the point to note is that apart from the xml declaration, an XML file can also contain other declarations, such as the one shown here.

Contents: <BOOKS>
Description: This line implicitly indicates the start of the actual contents in the XML file. Note that the word BOOKS is delimited by the symbols < and >. In XML, this whole text (i.e., <BOOKS>) is called as an element or a tag. Thus, an element or a tag in XML consists of the following parts: < is the start indicator for an element, BOOKS is the name of the element (BOOKS is just an example), and > is the end indicator for an element. Some of the other element names are BOOK, BOOK_TITLE, and AUTHOR. Also, the first element in an XML file is called as the root element or the root tag. Thus, <BOOKS> is the root element of this XML file. Quite clearly, every XML file must have exactly one root element.

Contents: <BOOK pubyear="1929">
Description: We should now be able to realize that this is also an element, by the name BOOK. Like the previous element, there is a start indicator (<), followed by an element name (BOOK), followed by some other text (pubyear="1929"), ending with the end indicator (>). The other text, i.e., pubyear="1929", is called as an attribute in XML. An attribute serves the purpose of providing more information about an element. For example, here, the attribute informs us that the book being described was published in 1929. Attribute declarations consist of two portions, the attribute name and the attribute value. In this case, we have pubyear as the attribute name and 1929 as the attribute value.

Contents: <BOOK_TITLE>Look Homeward, Angel</BOOK_TITLE>
Description: This is another element declaration. The name of the element is BOOK_TITLE, enclosed, as before, inside the start indicator (<) and the end indicator (>). However, this declaration of <BOOK_TITLE> is followed by some other text, namely Look Homeward, Angel, and then by </BOOK_TITLE>. What is this text about? Look Homeward, Angel is the element value, and </BOOK_TITLE> indicates the end of the element declaration. Now, this may sound confusing and raises the following issues.
1. Why did we not have the end of the element declaration for the previous elements (i.e., for <BOOKS> and <BOOK>)? Well, every element in an XML file must have an end element declaration. That is, the <BOOKS> and <BOOK> elements also have their corresponding end element declarations. (Contd)

Table 13.1 contd...

Look for the </BOOK_TITLE> and </AUTHOR> elements in the XML document. The only question that then remains is, why do they not immediately follow the element declarations, i.e., why are there a number of other things between <BOOKS> and </BOOKS>, and between <BOOK> and </BOOK>? This is exactly where the point of arranging information in a hierarchical manner comes into the picture. That is, we wish to include all our book details inside the <BOOKS> and </BOOKS> tags. Within this, we want each individual book to be described under its own <BOOK> and </BOOK> tags. This is a hierarchy of information, and it can be described by using this technique of including all contents under the <BOOKS> and </BOOKS> tags, and an individual book inside the <BOOK> and </BOOK> tags.
2. Why did the previous element (i.e., <BOOK>) not have an element value, whereas this one has? Well, elements may or may not have any element value. The previous two elements did not have any value, but this one has.
3. What about attributes? The previous element (i.e., <BOOK>) had an attribute called as pubyear with an attribute value of 1929. Well, like element values, attributes (and therefore, even attribute values) are also optional. The previous element had an attribute, but the current element does not. This is perfectly acceptable.

Contents: <AUTHOR>Wolfe, Thomas</AUTHOR>
Description: This element should be clearly understood by us without any explanation. It is simply the second sub-element under the first <BOOK> element. It does have an element value, but does not have any attribute. There is nothing special about this declaration.

Contents: </BOOK>
Description: This declaration indicates the end of the first <BOOK> element. Thus, whatever follows would not be a part of this <BOOK> element now. Instead, it would be a part of the <BOOKS> element. Incidentally, what would be a part of the first <BOOK> element? Quite clearly, whatever falls within the range of the <BOOK> and </BOOK> tags would be part of the <BOOK> element above. That is, in this case, it would consist of the two tags shown below:
<BOOK_TITLE>Look Homeward, Angel</BOOK_TITLE>
<AUTHOR>Wolfe, Thomas</AUTHOR>

Contents: Remaining tags
Description: We will not describe the remaining tags/elements, since they are quite

similar to what we have discussed here.

At this stage, we should be quite familiar with the basic XML terminology. In case we are not, it is suggested that we re-read the example and its description until it is clear. This is because the rest of the discussion assumes that we have a good understanding of these terms. The following exercises will refresh what we have learnt so far.


Exercise 1
Create an XML document template to describe the result of students in an examination. The description should include the student's roll number, name, three subject names and marks, total marks, percentage, and result.

Solution 1(a)
This can be done in more than one way. The following is one such possible way.

<exam_result>
  <student_name> … </student_name>
  <subject_1>
    <subject_1_name> … </subject_1_name>
    <subject_1_marks> … </subject_1_marks>
  </subject_1>
  <subject_2>
    <subject_2_name> … </subject_2_name>
    <subject_2_marks> … </subject_2_marks>
  </subject_2>
  <subject_3>
    <subject_3_name> … </subject_3_name>
    <subject_3_marks> … </subject_3_marks>
  </subject_3>
</exam_result>

Note that Solution 1(a) provides an elegant way of providing a template (i.e., structure) for constructing an XML message to store examination results. This could have been done in another manner, as shown in Solution 1(b).

Solution 1(b)
This solution offers another way to describe the XML message for examination results. It does not break down the hierarchy to the lowest possible level. That is, the information about the subjects and the marks therein is at the same level, which is not a great approach.

<exam_result>
  <student_name> … </student_name>
  <subject_1_name> … </subject_1_name>
  <subject_1_marks> … </subject_1_marks>
  <subject_2_name> … </subject_2_name>
  <subject_2_marks> … </subject_2_marks>
  <subject_3_name> … </subject_3_name>
  <subject_3_marks> … </subject_3_marks>
</exam_result>

Notice that we have got rid of the elements that start and end the description of a particular subject, i.e., tags such as <subject_1> and </subject_1>, etc. This is generally not advisable.

Let us now have an exercise to recap the XML terminologies that we had studied earlier.

Exercise 2
With reference to Solution 1(a), describe the various XML terms found there.

Solution 2
The XML terminology with reference to Solution 1(a) is as follows.

Sr No   XML term                   Example
1       XML document indicator     <?xml version="1.0"?>
2       Root element               <exam_result>
3       Element                    <student_name> … </student_name>
4       Element name               student_name
5       Element end indicator      </exam_result>

Note that our example does not have any attributes. To understand the concepts learned so far better, let us consider a few more XML examples as shown in the exercises below.

Exercise 3
Suppose we want to store information regarding employees in the following format in XML. Show such a file with one example:

Employee ID            Numeric        5 positions
Employee Name          Alphanumeric   30 positions
Employee Department    Alphanumeric   2 positions
Role                   Alphanumeric   20 positions
Manager                Alphanumeric   30 positions

Solution 3
<EMPLOYEES>
  <EMPLOYEE>
    <EMP_ID>9662</EMP_ID>
    <EMP_NAME>Atul Kahate</EMP_NAME>
    <EMP_DEPT>PS</EMP_DEPT>
    <ROLE>Project Manager</ROLE>
    <MANAGER>S Ketharaman</MANAGER>
  </EMPLOYEE>
</EMPLOYEES>

Exercise 4
Suppose our banking application allows the user to perform an online funds transfer. This application generates an XML message, which needs to be sent to the database for actual updates. Create such a sample message, containing the following details:

Transaction reference number   Numeric   10 positions
From account                   Numeric   12 positions
To account                     Numeric   12 positions
Amount                         Numeric   5 positions (No fractions are allowed)
Date and time                  Numeric   Timestamp field

Solution 4
9101216130 003901000649 003901000716 10000 <TIMESTAMP>11.09.2005:04.05.00


As we can see, XML can be used in a variety of situations to represent any kind of data. It need not be restricted to a particular domain, technology, or application. It can be used universally. We will study a lot more about the various aspects of XML and its terminologies now.

13.5 INTRODUCTION TO DTD
Consider an XML document that we intend to write for capturing bank account information. We would like to see data such as the account number, the account holder's name, the opening balance, the type of account, etc., as the fields for which we want to capture information. However, at the same time, we also wish to ensure that this XML document does not contain any other irrelevant information. For instance, we would like to make sure that our XML document does not contain information about students, books, projects, or other data that is not needed. In short, we need easy mechanisms for validating an XML document. For example, we should be able to specify and validate which elements, attributes, etc., are allowed in an XML document. The idea is shown in Fig. 13.19.

Fig. 13.19 The need for validating contents of an XML document

This is where a Document Type Definition (DTD) comes to the rescue! A DTD allows us to validate the contents of an XML document. For example, a DTD will allow us to specify that a book XML document can contain exactly one book name and at the most two author names. A DTD is usually a file with an extension of DTD, although this extension is optional. Technically, a DTD file need not have any extension. We can specify the relationship between an XML document and a DTD. That is, we can mention that for a given XML file, we want to use a given DTD file. Also, we specify the rules that we want to apply in that DTD file. Once this linkage is established, the DTD file checks the contents of the XML document with reference to these rules automatically whenever we attempt to make use of the XML document. This concept is shown in Fig. 13.20. Imagine a situation where we do not have anything such as a DTD. Yet, let us imagine that we want to apply certain rules. How can we accomplish this? Well, there is no simple solution here. The programs that use the XML document will need to perform all these validations before they can make use of the contents of the XML document. Of course, it is not impossible. However, it would need to be performed by every program,

which wants to use this XML document for any purpose. Otherwise, there is no guarantee that the XML document does not contain bad data! This situation is depicted in Fig. 13.21.

Fig. 13.20 Relationship between an XML document and a DTD file

Fig. 13.21 Situation in the absence/presence of a DTD

As we can see, a DTD will free application programs from the worry of validating the contents of an XML document. It will take this responsibility on itself. Therefore, the validation is concentrated in just one place—inside the DTD. All other parties interested in the contents of an XML document are free to concentrate on what they want to do, i.e., to make use of the XML document the way they want and process it,

as appropriate. On the other hand, the DTD would be busy validating the contents of the XML document on behalf of any program or application.

A DTD helps us in specifying the rules for validating the contents of an XML document in one place, thereby allowing the application programs to concentrate on the processing of the XML document.

We have mentioned earlier that a DTD is a file with a DTD extension. The contents of this file are purely textual in nature. Let us now examine the various aspects of a DTD and how they help in validating the contents of an XML document.
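To make this concrete, a rule such as "exactly one book name and at the most two author names" (mentioned above) could be captured in a DTD along the following lines; the element names here are chosen only for illustration:

<!ELEMENT book (book_name, author, author?)>
<!ELEMENT book_name (#PCDATA)>
<!ELEMENT author (#PCDATA)>

The content model says that a book element must contain one book_name, followed by one author, optionally followed by a second author. A document with a third author, or with two book names, would fail validation against this DTD.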

13.6 DOCUMENT TYPE DECLARATION
An XML document contains a reference to a DTD file. This is similar to, for example, how a C program would include references to various header files, or how a Java program would include packages. A DOCTYPE declaration in an XML document specifies that we want to include a reference to a DTD file. Whenever any program (usually called as an XML parser) reads our XML document containing a DOCTYPE tag, it understands that we have defined a DTD for our XML document. Therefore, it attempts to also load and interpret the contents of the DTD file. In other words, it applies the rules specified in the DTD to the contents of our XML document for validating them. The DOCTYPE declaration stands for a document type declaration. Conceptually, this is illustrated in Fig. 13.22. Note that we are ignoring syntactical correctness for the moment, just for the sake of understanding.

Fig. 13.22 Using the DOCTYPE tag

Note that the DOCTYPE tag is written as <!DOCTYPE …>. There are two types of DTDs, internal DTD and external DTD, also respectively called as internal subset and external subset. Figure 13.23 shows this.

Fig. 13.23 Classification of DTD

The two types differ from each other purely on the basis of where they are defined.

An internal subset means that the contents of the DTD are inside the XML document itself. On the other hand, an external subset means that an XML document has a reference to another file, which we call as the external subset. Let us take a simple example. Suppose we want to define an XML document containing a book name as the only element. We also wish to write a corresponding DTD, which will define the template or rule book for our XML document. Then we have two situations: the DTD can be internal or external. Let us call our XML document book.xml, and our external DTD book.dtd. Note that when the DTD is internal, there is no need to provide a separate name for the DTD (since the contents of the DTD are inside the contents of the XML document anyway). But when the DTD is external, we must provide a name to this DTD file. We take a look at the internal and the external DTD, as shown in Fig. 13.24.

Fig. 13.24 Internal and external DTD examples

As we can see, when a DTD is internal, we embed the contents of the DTD inside the XML document, as shown in case (a). However, when a DTD is external, we simply provide a reference to the DTD inside our XML document, as shown in case (b). The actual DTD file has a separate existence of its own. Of course, we have not yet described the syntax completely, which we shall do very soon. When should we use an internal DTD, and when should we use an external DTD? For simple situations, internal DTDs work well. However, external DTDs help us in two ways. 1. External DTDs allow us to define a DTD once, and then refer to it from any number of XML documents. Thus, they are reusable. Also, if we need to make any changes to the contents of the DTD, the change needs to be made just once (to the DTD file).

2. External DTDs reduce the size of the XML documents, since the XML documents now contain just a reference to the DTD, rather than the actual contents of the DTD.

There is another keyword we need to remember in the context of internal DTDs. An XML document can be declared as standalone if it does not depend on an external DTD. The standalone keyword is used along with the XML opening tag, as shown in Fig. 13.25.

<?xml version="1.0" standalone="yes"?>
<!DOCTYPE employee [
  <!ELEMENT employee (emp_name, salary)>
  <!ELEMENT emp_name (#PCDATA)>
  <!ELEMENT salary (#PCDATA)>
]>
<employee>
  <emp_name>Sachin Tendulkar</emp_name>
  <salary>infinite</salary>
</employee>

Fig. 13.25 Use of the standalone keyword

Let us now understand the syntax of the DTD declaration or reference, regardless of whether the DTD is internal or external. We know that the internal DTD declaration looks like this in our example:

<!DOCTYPE myBook [
  <!ELEMENT myBook (book_name)>
  <!ELEMENT book_name (#PCDATA)>
]>

This DTD declaration indicates that our XML document will contain a root element called as myBook, which, in turn, contains an element called as book_name. We will talk more about it soon. Also, the contents of the DTD need to be wrapped inside square brackets. This informs the XML parser about the start and the end of the DTD syntax, and also helps it differentiate between the DTD contents and the XML contents. On the other hand, the external DTD reference looks like this:

<!DOCTYPE myBook SYSTEM "myBook.dtd">

This does not give us any idea about the actual contents of the DTD file, since the DTD is external. Let us now worry about the DOCTYPE syntax. In general, the basic syntax for the DOCTYPE line is as shown in Fig. 13.26.

Fig. 13.26 DOCTYPE basic syntax

Let us understand what it means.
1. The DOCTYPE keyword indicates that this is either an internal declaration of a DTD, or a reference to an external DTD.
2. Regardless of whether it is internal or external, this is followed by the name of the root element in the XML document.

3. This is followed by the actual contents of the DTD (if the DTD is internal), or by the name of the DTD file (if it is an external DTD). This is currently shown with dots (…). Therefore, we can now enhance our DOCTYPE declaration, as shown in Fig. 13.27.

Fig. 13.27 Internal versus external DTD: The actual difference
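In essence, the contrast boils down to the two forms of the DOCTYPE line that we saw earlier for the myBook example. The internal form embeds the rules within the document, whereas the external form merely points to them:

Internal DTD:
<!DOCTYPE myBook [
  <!ELEMENT myBook (book_name)>
  <!ELEMENT book_name (#PCDATA)>
]>

External DTD reference:
<!DOCTYPE myBook SYSTEM "myBook.dtd">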

13.7 ELEMENT TYPE DECLARATION
We know that elements are the backbone of any XML document. If we want to associate a DTD with an XML document, we need to declare all the elements that we would like to see in the XML document in the DTD as well. This should be quite obvious to understand. After all, a DTD is a template or rule book for an XML document. An element is declared in a DTD by using an element type declaration (the ELEMENT tag). For example, to declare an element called as book_name, we can use the following declaration:

<!ELEMENT book_name (#PCDATA)>

As we can see, book_name is the name of the element, and its data type is PCDATA. We will discuss these aspects soon. The XML jargon calls an element name a generic identifier. The data type is called as the content specification. The element name must be unique within a DTD. Let us consider an example. Suppose that we want to store just the name of a book in our XML document. Figure 13.28 shows a sample XML document and the corresponding DTD that specifies the rules for this XML document. Note that we are using an external DTD. We have added line numbers simply for the sake of understanding the example easily by providing references during our discussion. The actual XML document and DTD will never have line numbers.

XML document (book.xml)
1. <?xml version="1.0"?>
2. <!-- ... -->
3. <!DOCTYPE myBook SYSTEM "book.dtd">
4. <myBook>
5.   <book_name>Computer Networks</book_name>
6. </myBook>

DTD file (book.dtd)
1. <!ELEMENT myBook (book_name)>
2. <!ELEMENT book_name (#PCDATA)>

Fig. 13.28 Book XML document and external DTD declaration


Let us understand this example line by line.

Understanding the XML document (book.xml)


- Line 1 indicates that this is an XML document.
- Line 2 is a comment.
- Line 3 declares a document type reference. It indicates that our XML document makes use of an external DTD. The name of this external DTD is book.dtd. Also, the root element of our XML document is an element called as myBook.
- Lines 4–6 define the actual contents of our XML document. These consist of an element called as book_name.

Understanding the DTD (book.dtd)

• Line 1 is an element type declaration. It indicates that the root element of the XML document that this DTD will be used to validate will have the name myBook. This root element (myBook) contains one sub-element called as book_name.
• Line 2 states that the element book_name can contain parsed character data.

13.7.1 Specifying Sequences, Occurrences and Choices

So far, we have discussed examples where the DTD contained just one element inside the root element. Real life examples are often far more complex than this.

Sequence

The first question is how we add more element type declarations to a DTD. For example, suppose that our book DTD needs to contain the book name and the author name. For this, we simply need to add a comma between these two element type declarations. For example:
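A declaration along the following lines would capture this rule; the element names book, book_name, and author_name are assumptions used for illustration:

<!ELEMENT book (book_name, author_name)>
<!ELEMENT book_name (#PCDATA)>
<!ELEMENT author_name (#PCDATA)>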

This declaration specifies that our XML document should contain exactly one book name, followed by exactly one author name. Any number of book name-author pairs can exist. Figure 13.29 shows an example of specifying the address book.

<!ELEMENT address (street, region, postal-code, locality, country)>
<!ELEMENT street (#PCDATA)>
<!ELEMENT region (#PCDATA)>
<!ELEMENT postal-code (#PCDATA)>
<!ELEMENT locality (#PCDATA)>
<!ELEMENT country (#PCDATA)>

Fig. 13.29

Defining sequence of elements

As we can see, our address book contains sub-elements, such as street, region, postal code, locality, and country. Each of these sub-elements is defined as a parsed character data field. Of course, we can extend the concept of sub-elements further. That is, we can, for example, break down the street sub-element into street number and street name. This is shown in Fig. 13.30.

<!ELEMENT address (street, region, postal-code, locality, country)>
<!ELEMENT street (street_number, street_name)>
<!ELEMENT street_number (#PCDATA)>
<!ELEMENT street_name (#PCDATA)>
<!ELEMENT region (#PCDATA)>
<!ELEMENT postal-code (#PCDATA)>
<!ELEMENT locality (#PCDATA)>
<!ELEMENT country (#PCDATA)>

Fig. 13.30  Defining sub-sub-elements within sub-elements

Choices

Choices can be specified by using the pipe (|) character. This allows us to specify options of the type A or B. For example, we can specify that the result of an examination is that the student has passed or failed (but not both), as follows.
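A hedged sketch of such a declaration, with result, pass, and fail as assumed element names:

<!ELEMENT result (pass | fail)>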

Figure 13.31 shows a complete example. To a guest, we want to offer tea or coffee, but not both!
<!ELEMENT guest (name, purpose, beverage)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT purpose (#PCDATA)>
<!ELEMENT beverage (tea | coffee)>

Fig. 13.31  Specifying choices

Occurrences

The number of occurrences, or the frequency, of an element can be specified by using the plus (+), asterisk (*), or question mark (?) characters.

If we do not use any of the occurrence symbols (i.e., +, *, or ?), then the element can occur only once. That is, the default frequency of an element is 1. The significance of these characters is tabulated in Table 13.2.

Table 13.2  Specifying frequency of elements

Character   Meaning
+           The element can occur one or more times
*           The element can occur zero or more times
?           The element can occur zero or one time
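Before looking at each symbol in turn, here is a small hedged sketch that combines all three of them (the element names are purely illustrative):

<!ELEMENT book (title, author+, chapter+, appendix*, index?)>

This would say that a book must have exactly one title, at least one author and at least one chapter, any number of appendices (possibly none), and at most one index.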

The plus sign (+) indicates that the element must occur at least once. The maximum frequency is infinite. For example, we can specify that a book must contain one or more chapters as follows.

<!ELEMENT book (chapter+)>

We can use the same concept to apply to a group of sub-elements. For example, suppose that we want to specify that a book must contain a title, followed by at least one chapter and at least one author. We can use this declaration.

<!ELEMENT book (title, (chapter, author)+)>

A sample XML document conforming to this DTD declaration is shown in Fig. 13.32.

<book>
   <title>New to XML?</title>
   <chapter>Basics of XML</chapter>
   <author>Jui Kahate</author>
   <chapter>Advanced XML</chapter>
   <author>Harsh Kahate</author>
</book>

Fig. 13.32

Specifying frequency of a group of elements

Of course, the grouping of sub-elements for the purpose of specifying frequency is not restricted to the plus sign (+). It can be done equally well for the asterisk (*) or question mark (?) symbols. The asterisk symbol (*) specifies that the element may or may not occur. If it is used, it can repeat any number of times. Figure 13.33 shows two examples of the possibilities that are allowed.

Fig. 13.33

Using an asterisk to define frequency
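A hedged sketch of the idea behind this figure, with organization and employee as assumed element names:

<!ELEMENT organization (employee*)>
<!ELEMENT employee (#PCDATA)>

<organization>
   <employee> ... </employee>
   <employee> ... </employee>
   <employee> ... </employee>
</organization>

<organization>
</organization>

Both instance documents above would be valid against this declaration.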

As we can see, our DTD specifies that the XML document can depict zero or more employees in an organization. One sample XML document has three employees, the other has none. Both are allowed. On the other hand, if we replace the asterisk (*) with a plus sign (+), the situation changes. We must now have at least one employee. Therefore, the empty organization case (i.e., an organization containing no employees) is now ruled out. Figure 13.34 shows this.


Fig. 13.34

Using a plus sign to define frequency
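Again as a hedged sketch with the same assumed element names, the only change is the occurrence indicator:

<!ELEMENT organization (employee+)>

With this declaration, an organization element containing no employee sub-element at all would be rejected.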

Finally, a question mark (?) indicates that the element can either be absent or occur exactly once. A nation can have only one president. This is indicated by the following declaration.
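A hedged sketch of such a declaration, with nation and president as assumed element names:

<!ELEMENT nation (president?)>
<!ELEMENT president (#PCDATA)>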

At times, of course, the nation may be without a president temporarily. However, at no point can a nation have more than one president. Figure 13.35 shows these possibilities.

Fig. 13.35 Using a question mark to define frequency

13.8 ATTRIBUTE DECLARATION

Elements describe the markup of an XML document. Attributes provide more details about the elements. An element can have 0 or more attributes. For example, an employee XML document can contain elements to depict the employee number, name, designation, and salary. The designation element, in turn, can have a manager attribute that indicates the manager for that employee. The keyword ATTLIST describes the attribute(s) for an element.

Figure 13.36 shows an XML document containing an inline DTD. We can see that the message element contains attributes.

<?xml version = "1.0"?>
<!DOCTYPE email [
   <!ELEMENT email (message)>
   <!ELEMENT message (#PCDATA)>
   <!ATTLIST message
      from     CDATA #REQUIRED
      to       CDATA #REQUIRED
      subject  CDATA #REQUIRED>
]>
<email>
   <message from = "jui" to = "harshu" subject = "where are you?">
      It is time to have food!
   </message>
</email>

Fig. 13.36

Declaring attributes in a DTD

We can see that the message element has three attributes: from, to, and subject. All the three attributes have a data type of CDATA (which stands for character data), and a #REQUIRED keyword. The #REQUIRED keyword indicates that this attribute must be a part of the element.

13.9 LIMITATIONS OF DTDs

In spite of their several advantages, DTDs suffer from a number of limitations. Table 13.3 summarizes them.

Table 13.3  Limitations of DTDs

Non-XML syntax: Although DTDs do have an angled-bracket syntax (e.g., <!ELEMENT ...>), it is quite different from the basic XML syntax. For example, a DTD file does not follow the standard tag structure of an XML document. More specifically, a DTD file is not a valid XML document. This means duplication of validating efforts: one logic for XML, another for the DTD.

One DTD per XML document: We cannot use multiple DTDs to validate one XML document; we can include only one DTD reference inside an XML document. Although parameter entities make things slightly more flexible, their syntax is quite cryptic.

Weak data typing: DTDs define only very basic data types. For real-life applications that demand more fine-grained and specific data types, this is not sufficient in many situations.

No inheritance: DTDs are not object-oriented, in the sense that they do not allow the designer to create some data types and then extend them as desired.

Overriding a DTD: An internal DTD can override an external DTD. (This is perhaps the DTD's idea of inheritance!) This allows certain flexibility, but often also creates a lot of confusion and leads to clumsy designs.

No DOM support: We shall study later that the Document Object Model (DOM) technology is used to parse, read, and edit XML documents. It cannot be used for DTDs, though.

13.10 INTRODUCTION TO SCHEMA

We have studied the concept of a Document Type Definition (DTD) in detail. We know that a DTD is used for validating the contents of an XML document. DTD is undoubtedly a very important feature of the XML technology. However, there are a number of areas in which DTDs are weak. The main argument against DTDs is that their syntax is not like that of XML documents. Therefore, the people working with DTDs have to learn a new syntax. Furthermore, this leads to problems such as not being able to search for information inside DTDs, not being able to display their contents in the form of HTML, etc.

A schema is an alternative to a DTD. It is expected that schemas will eventually replace most (but not all) features of DTDs. DTDs are easier to write and support some features (e.g., entities) better. However, schemas are much richer in terms of their capabilities and extensibility. A schema document is a separate document, just like a DTD. However, the syntax of a schema is like the syntax of an XML document. Therefore, we can state:

The main difference between a DTD and a schema is that the syntax of a DTD is different from that of XML, whereas the syntax of a schema is the same as that of XML. In other words, a schema document is an XML document.

For example, we declare an element in a DTD by using the syntax <!ELEMENT ...>. This is clearly not legal in XML. We cannot begin an element declaration with an exclamation mark, as happens in the case of a DTD.

We can use a very simple, yet powerful example to illustrate the difference between using a DTD and using a schema. Suppose that we want to represent the marks of a student in an XML document. For this purpose, we want to add an element called as Marks to our root element Student. We will declare this element as of type PCDATA in our DTD file. This will ensure that the parser checks for the existence of the Marks element in the XML document. However, can it ensure that the marks are numeric? Clearly, no! We cannot control what contents the element Marks can have. These contents can very well be alphabetic or alphanumeric! This is shown in Fig. 13.37. As we can see, the usage of PCDATA in the declaration of an element does not stop us from entering alphabetic data in a Marks element. In other words, we cannot specify exactly what our elements should contain. This is quite clearly not desirable at all.

In the case of a schema, we can very well specify that our element should only contain numeric data. Moreover, we can control many other aspects of the contents of elements, which is not possible in the case of DTDs. We use similar terminology for checking the correctness of XML documents in the case of a schema (as in the case of DTDs). An XML document that conforms to the rules of a schema is called as a valid XML document. Otherwise, it is called as invalid. It is interesting to note that we can associate a DTD as well as a schema with an XML document.


Fig. 13.37

Use of PCDATA does not control data type
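As a hedged illustration of this point (the exact content of the figure is an assumption), a DTD can only say that Marks contains parsed character data:

<!ELEMENT Marks (#PCDATA)>

<!-- Both of these then pass DTD validation, even though only the first one makes sense: -->
<Marks>80</Marks>
<Marks>Eighty</Marks>

A schema, on the other hand, can declare Marks with the type xsd:integer and reject the second case.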

Let us now take a look at a simple schema. Consider an XML document which contains a greeting message. Let us write a corresponding schema for it. Figure 13.38 shows the details.

XML document (message.xml):

<?xml version = "1.0"?>
<MESSAGE xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance"
         xsi:noNamespaceSchemaLocation = "message.xsd">
   Hello World!
</MESSAGE>

Schema (message.xsd):

<?xml version = "1.0"?>
<xsd:schema xmlns:xsd = "http://www.w3.org/2001/XMLSchema">
   <xsd:element name = "MESSAGE" type = "xsd:string"/>
</xsd:schema>

Fig. 13.38 Example of XML document and corresponding schema We will notice several new syntactical details in the XML document and the schema file. Let us, therefore, understand this step by step. First and foremost, an XML schema is defined in a separate file. This file has the extension xsd. In our example, the schema file is named message.xsd. The following declaration in our XML document indicates that we want to associate this schema with our XML document: <MESSAGE xmlns:xsi=”http://www.w3.org/2001/XMLSchema-instance” xsi:noNamespaceSchemaLocation=”message.xsd”>

Let us dissect this statement.

1. The word MESSAGE indicates the root element of our XML document. There is nothing unusual about it.
2. The declaration xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance" is an attribute. It defines a namespace prefix and a namespace URI. The namespace prefix is xsi. The namespace URI is http://www.w3.org/2001/XMLSchema-instance. The namespace prefix can change. The namespace URI must be written exactly as shown. The namespace URI specifies a particular instance of the schema specifications to which our XML document is adhering.
3. The declaration xsi:noNamespaceSchemaLocation = "message.xsd" specifies a particular schema file which we want to associate with our XML document. In this case, we are stating that our XML document wants to refer to a schema file whose name is message.xsd.

This is followed by the actual contents of our XML document. In this case, the contents are nothing but the contents of our root element. These explanations are depicted in Fig. 13.39.

<?xml version = "1.0"?>
   This is a normal XML declaration. There is nothing unusual or unique about it.

<MESSAGE xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance"
         xsi:noNamespaceSchemaLocation = "message.xsd">
   MESSAGE is the root element. The xmlns:xsi attribute is the XML Schema-instance namespace reference for our schema, and xsi:noNamespaceSchemaLocation provides a pointer to our schema, in this case message.xsd.

Hello World!
</MESSAGE>
   This is also nothing unusual. We simply specify the contents of our root element, and then signify the end of the root element (and hence that of the XML document).

Fig. 13.39

Understanding our XML document

It is now time to understand our schema (i.e., message.xsd). Note that the schema file is an XML file with an extension of xsd. That is, like any XML document, it begins with an XML declaration. The following lines specify that this is a schema file, and not an ordinary XML document. They also contain the actual contents of the schema. Let us first reproduce them:

<xsd:schema xmlns:xsd = "http://www.w3.org/2001/XMLSchema">
   <xsd:element name = "MESSAGE" type = "xsd:string"/>
</xsd:schema>

Let us understand this step by step.

1. The declaration <xsd:schema xmlns:xsd = "http://www.w3.org/2001/XMLSchema"> indicates that this is a schema, because its root element is named schema. It has a namespace prefix of xsd. The namespace URI is http://www.w3.org/2001/XMLSchema. This means that our schema declarations conform to the schema standards specified at http://www.w3.org/2001/XMLSchema, and that we can use a namespace prefix of xsd to refer to them in our schema file.
2. The declaration <xsd:element name = "MESSAGE" type = "xsd:string"/> specifies that we want to use an element called as MESSAGE in our XML document. The type of this element is string. Also, we are using the namespace prefix xsd. Recall that this namespace prefix was associated with the namespace URI http://www.w3.org/2001/XMLSchema in our earlier statement.
3. The line </xsd:schema> specifies the end of the schema.

These explanations are depicted in Fig. 13.40.

<?xml version = "1.0"?>
   This is a normal XML declaration. There is nothing unusual or unique about it.

<xsd:schema xmlns:xsd = "http://www.w3.org/2001/XMLSchema">
   xsd:schema indicates that this is a schema definition. xsd is the namespace prefix. It is associated with the actual namespace URI http://www.w3.org/2001/XMLSchema.

<xsd:element name = "MESSAGE" type = "xsd:string"/>
   This declares that our XML document will have the root element named MESSAGE, of type string.

</xsd:schema>
   This signifies the end of our schema file.

Fig. 13.40 Understanding our XML schema Based on this discussion, let us have a small exercise.

13.11 COMPLEX TYPES

13.11.1 Basics of Simple and Complex Types

Elements in a schema can be divided into two categories: simple and complex. This is shown in Fig. 13.41.

Fig. 13.41

Classification of elements in XML schemas

Let us understand the difference between the two types of elements in schema.


Simple elements

Simple elements can contain only text. They cannot have sub-elements or attributes. The text that they can contain, however, can be of various data types, such as strings, numbers, dates, etc.

Complex elements

Complex elements, on the other hand, can contain sub-elements, attributes, etc. Many times, they are made up of one or more simple elements. This is shown in Fig. 13.42.

Fig. 13.42  Complex element is made up of simple elements

Let us now consider an example. Suppose we want to capture student information in the form of the student's roll number, name, marks, and result. We can then have all these individual blocks of information as simple elements, and a complex element in the form of the root element. This complex element will encapsulate these individual simple elements. Figure 13.43 shows the resulting XML document, first.

<?xml version = "1.0"?>
<STUDENT xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance"
         xsi:noNamespaceSchemaLocation = "student.xsd">
   <ROLL_NUMBER>100</ROLL_NUMBER>
   <NAME>Pallavi Joshi</NAME>
   <MARKS>80</MARKS>
   <RESULT>Distinction</RESULT>
</STUDENT>

Fig. 13.43  XML document for Student example

Let us now immediately take a look at the corresponding schema file. Figure 13.44 shows this file.

<?xml version = "1.0"?>
<xsd:schema xmlns:xsd = "http://www.w3.org/2001/XMLSchema">
   <xsd:element name = "STUDENT" type = "StudentType"/>
   <xsd:complexType name = "StudentType">
      <xsd:sequence>
         <xsd:element name = "ROLL_NUMBER" type = "xsd:string"/>
         <xsd:element name = "NAME" type = "xsd:string"/>
         <xsd:element name = "MARKS" type = "xsd:integer"/>
         <xsd:element name = "RESULT" type = "xsd:string"/>
      </xsd:sequence>
   </xsd:complexType>
</xsd:schema>



Fig. 13.44

Schema for Student example

Let us understand our schema.

1. <xsd:schema xmlns:xsd = "http://www.w3.org/2001/XMLSchema">
We know that the root element of a schema is a reserved keyword called as schema. Here also, the same is the case. The namespace prefix xsd maps to the namespace URI http://www.w3.org/2001/XMLSchema, as before. In general, this will be true for any schema that we write.

2. <xsd:element name = "STUDENT" type = "StudentType"/>
This declares STUDENT as the root element of our XML document. In the schema, it is called as the top-level element. Recall that in the case of a schema, the root element is always the keyword schema. Therefore, the root element in an XML document is not the root of the corresponding schema. Instead, it appears in the schema after the root element schema. The STUDENT element is declared of type StudentType. This is a user-defined type. Conceptually, a user-defined type is similar to a structure in C/C++ or a class in Java (without the methods). It allows us to create our own custom types. In other words, the schema specification allows us to create our own custom data types. For example, we can create our own types for storing information about employees, departments, songs, friends, sports games, and so on. We recognize this as a user-defined type because it does not have our namespace prefix xsd. Recall that all the standard data types provided by the XML schema specifications reside at the namespace http://www.w3.org/2001/XMLSchema, which we have prefixed as xsd in the earlier statement.

3. <xsd:complexType name = "StudentType">
Now that we have declared our own type, we must explain what it represents and contains. That is exactly what we are doing here. This statement indicates that we have used StudentType as a type earlier, and now we want to explain what it means. Also, note that we use the keyword complexType to designate that StudentType is a complex element. This is similar to stating struct StudentType or class StudentType in C++/Java.

4. <xsd:sequence>
Schemas allow us to force a sequence of simple elements within a complex element. We can specify that a particular complex element must contain one or more simple elements in a strict sequence. Thus, if the complex element is A, containing two simple elements B and C, we can mandate that C must follow B inside A. In other words, the XML document must have:

<A>
   <B> ... </B>
   <C> ... </C>
</A>

This is accomplished by the sequence keyword.

5. <xsd:element name = "ROLL_NUMBER" type = "xsd:string"/>
This declaration specifies that the first simple element inside our complex element is ROLL_NUMBER, of type string. After this, we have NAME, MARKS, and RESULT as three more simple elements following ROLL_NUMBER. We will not discuss them. We will simply observe for now that MARKS has a different data type: an integer. We will discuss this in detail subsequently. We will also not discuss the closure of the sequence, complexType, and schema tags.


13.12 EXTENSIBLE STYLESHEET LANGUAGE TRANSFORMATIONS (XSLT)

We have had an overview of XSL earlier. We have also studied the XPath technology in detail. These two together are sufficient for us to now start discussing XSLT. We know that XSL consists of two parts: the XSL Transformation Language (XSLT) and XSL Formatting Objects (XSL-FO). In this section, we cover XSLT in detail. XSLT is used to transform an XML document from one form to another. XSLT uses XPath to perform a matching of nodes for performing these transformations. The result of applying XSLT to an XML document could be another XML document, HTML, text, or any other document. The idea is shown in Fig. 13.45.

Fig. 13.45

XSLT basics

From a technology perspective, we need to remember that XSLT code is also written inside an XML file, with an extension of .xsl. In other words, an XSLT style sheet is a different kind of XML file. Also, in order to work with XSLT, we need to make use of what is called as an XSLT processor. The XSLT processor is conceptually an XSLT interpreter. That is, it reads an XSL file as source, and interprets its contents to show its effects. Several companies provide XSLT processors. Some of the more popular ones at the time of writing this are Xalan from Apache, and MS-XML from Microsoft. These processors are programming-language specific. Therefore, the Xalan processor from Apache, for example, would be different for Java and C++. In the following sections, we take a look at the various features in XSLT.

13.12.1 Templates

An XSLT document is an XML document, which has the following.

(a) A root element called as stylesheet.
(b) A file extension of .xsl.

The syntax of XSLT, i.e., what is allowed in XSLT and what is not, is specified in an XML namespace, whose URL is http://www.w3.org/1999/XSL/Transform. Therefore, we need to include this namespace in an XSLT document. In general, an XSLT document reads the original XML document (called as the source tree) and transforms it into a target document (called as the result tree). The result tree, of course, may or may not be XML. The concept is shown in Fig. 13.46.

Let us now consider a simple example where we can use XSLT. Suppose that we have a simple XML file that contains a name. Now suppose that we want to apply an XSLT style sheet to it, so that the XML document gets displayed as an HTML document, with the name getting outputted in bold. How would we achieve this? We would need to do several things, as listed below.

1. In our XML document, we would need to specify that we want to make use of a specific XSLT document (just as we need to mention the name of a DTD or schema, when we use one, inside our source XML document).
2. Our XSLT document (i.e., the XSLT style sheet) would contain appropriate rules to display the contents of the above XML document in the HTML format. One of the things our XSLT document needs to do is to display the name in bold.
3. To view the outcome, we need to open our source XML document in a Web browser. The Web browser would apply the XSLT style sheet to the XML document, and show us the output in the desired HTML format.

Fig. 13.46  XSLT transformations using tree concept

Let us start with the source XML file. We have deliberately kept it quite simple. As we can see in Fig. 13.47, the XML document contains two elements: a root element by the name myPerson, which, in turn, contains the actual name of the person inside a sub-element called as personName. Note that it has a reference to a style sheet named one.xsl.

<?xml version = "1.0"?>
<?xml-stylesheet type = "text/xsl" href = "one.xsl"?>
<myPerson>
   <personName>Sachin Tendulkar</personName>
</myPerson>

Fig. 13.47 Source XML document (one.xml) Figure 13.48 shows the corresponding XSLT document, which would convert our XML document into HTML format.

<?xml version = "1.0"?>
<xsl:stylesheet version = "1.0" xmlns:xsl = "http://www.w3.org/1999/XSL/Transform">
   <xsl:template match = "myPerson">
      <html>
         <body>
            <b> <xsl:value-of select = "personName"/> </b>
         </body>
      </html>
   </xsl:template>
</xsl:stylesheet>


Fig. 13.48

XSLT document (one.xsl)

The resulting output is shown in Fig. 13.49.

Fig. 13.49

Resulting output

Let us now understand how we have achieved this.

Understanding changes done to the source XML file (one.xml)

Let us first see what changes we have done to our source XML document. We have simply added the following line to it:

<?xml-stylesheet type = "text/xsl" href = "one.xsl"?>

This statement indicates that we want our XML document to be processed by an XSLT style sheet contained in a file named one.xsl. Because we have not specified any directory path, it is assumed that the XSLT style sheet is present in the same directory as that of the source XML file.

Understanding the XSLT style sheet file (one.xsl)

Now, let us understand the meaning and purpose of the XSLT style sheet file. The first line declares the fact that this document is an XSLT style sheet:

<xsl:stylesheet version = "1.0" xmlns:xsl = "http://www.w3.org/1999/XSL/Transform">

The keyword stylesheet indicates that this is a style sheet. The namespace for the XSLT specifications is then provided.

<xsl:template match = "myPerson">

This line indicates a template element. It uses the attribute match to specify the condition. After match, we can specify any valid XPath expression. In the current example, in plain English, this would read as follows. If an element by the name myPerson is found … That is, we are trying to go through our XML document (i.e., one.xml) to see if we can locate an element named myPerson there. If we do find a match, we want to perform some action, which we shall discuss next.

<html>
<body>

This is clearly plain HTML. Therefore, what we are saying is that if we find a myPerson element in our XML document, we want to start outputting HTML contents. More specifically, we want to start with the <html> and <body> tags. Obviously, this has nothing to do with the XSLT technology.

<b> <xsl:value-of select = "personName"/> </b>

Now, we indicate that our output should be in bold font (indicated by the <b> tag). This is followed by some XSLT code: <xsl:value-of select = "personName"/>. This code says that we want to select the value of an element called as personName, located in our XML document. After this, we close the bold tag.

</xsl:template>

This indicates the end of our template declaration. The remaining code is plain HTML. Thus, we can summarize our observations as shown in Table 13.4.

Table 13.4  Purpose of basic template tags

Syntax                             Purpose
<xsl:template match = "xyz">       Search for a matching tag named xyz in our XML document
<xsl:value-of select = "pqr"/>     Display the value of all tags named pqr at this place

Let us study a few more examples to understand XSLT templates better.

Problem

Consider the following XML document:

<?xml version = "1.0"?>
<BOOKS>
   <BOOK>
      <TITLE>Computer Networks</TITLE>
      <AUTHOR>Andrew Tanenbaum</AUTHOR>
      <YEAR>2003</YEAR>
      <PRICE>250</PRICE>
   </BOOK>
   <BOOK>
      <TITLE>Web Technologies</TITLE>
      <AUTHOR>Achyut Godbole</AUTHOR>
      <AUTHOR>Atul Kahate</AUTHOR>
      <YEAR>2002</YEAR>
      <PRICE>250</PRICE>
   </BOOK>
</BOOKS>


Write an XSLT code to only retrieve the book titles and their prices.

Solution

We want to do the following here:

1. Search for a BOOK tag in our XML document.
2. Whenever found, display the contents of the TITLE and PRICE tags.

The corresponding syntaxes for these will be:

1. Search for a BOOK tag in our XML document.
   <xsl:template match = "BOOK">
2. Whenever found, display the contents of the TITLE and PRICE tags.
   Name: <xsl:value-of select = "TITLE"/>
   Price: <xsl:value-of select = "PRICE"/>

Therefore, our XSLT style sheet would contain the following:

<xsl:stylesheet version = "1.0" xmlns:xsl = "http://www.w3.org/1999/XSL/Transform">
   <xsl:template match = "BOOK">
      Book Name: <xsl:value-of select = "TITLE"/>
      Price: <xsl:value-of select = "PRICE"/>
   </xsl:template>
</xsl:stylesheet>

The final XML document and the corresponding XSLT style sheet are shown in Fig. 13.50.

XML document (two.xml)

<?xml version = "1.0"?>
<?xml-stylesheet type = "text/xsl" href = "two.xsl"?>
<BOOKS>
   <BOOK>
      <TITLE>Computer Networks</TITLE>
      <AUTHOR>Andrew Tanenbaum</AUTHOR>
      <YEAR>2003</YEAR>
      <PRICE>250</PRICE>
   </BOOK>
   <BOOK>
      <TITLE>Web Technologies</TITLE>
      <AUTHOR>Achyut Godbole</AUTHOR>
      <AUTHOR>Atul Kahate</AUTHOR>
      <YEAR>2002</YEAR>
      <PRICE>250</PRICE>
   </BOOK>
</BOOKS>

XSLT style sheet (two.xsl)

<xsl:stylesheet version = "1.0" xmlns:xsl = "http://www.w3.org/1999/XSL/Transform">
   <xsl:template match = "BOOK">
      Book Name: <xsl:value-of select = "TITLE"/>
      Price: <xsl:value-of select = "PRICE"/>
   </xsl:template>
</xsl:stylesheet>

Fig. 13.50 Book example The resulting output is shown in Fig. 13.51.

Fig. 13.51

Output of Book example

An interesting question at this stage is, can we specify multiple search conditions in an XSLT style sheet? That is, suppose that in the above example, we first want to display book titles with their prices. Later, as an independent activity, we want to display the titles with the authors' names and the years when published. Is this possible? It is perfectly possible, and the way it works is depicted in Fig. 13.52. Here, we have shown the generic manner in which XSLT processes an XML document (the source tree) to produce the desired output (the result tree). As we can see, the XSLT technology keeps looking for possible template matches on elements/tags in the source XML document. As and when it finds a match, XSLT outputs it as specified by the template. This also means that there can be multiple independent template matches (i.e., search conditions) in the same XSLT style sheet, as sketched below. Let us consider a few more examples to understand this.
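As a hedged sketch of this idea (it reuses the BOOK document from the earlier example, and is not one of the book's own figures), a single style sheet can carry two independent template matches:

<xsl:stylesheet version = "1.0" xmlns:xsl = "http://www.w3.org/1999/XSL/Transform">
   <!-- Independent match 1: fires whenever a TITLE element is reached -->
   <xsl:template match = "TITLE">
      Title: <xsl:value-of select = "."/>
   </xsl:template>
   <!-- Independent match 2: fires whenever an AUTHOR element is reached -->
   <xsl:template match = "AUTHOR">
      Author: <xsl:value-of select = "."/>
   </xsl:template>
</xsl:stylesheet>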


Fig. 13.52

XSLT processing overview

Problem

Consider the following XML document, titled emp.xml:

<?xml version = "1.0"?>
<EMP_INFO>
   <EMPLOYEE>
      <EMP_NAME empID = "9662">
         <FIRST>Sachin</FIRST>
         <LAST>Tendulkar</LAST>
      </EMP_NAME>
   </EMPLOYEE>
</EMP_INFO>

Write an emp.xsl file for the XML document mentioned above, which would:

(i) Display a heading Emp Name:, followed by the employee's name.
(ii) Display the employee number below this, in a smaller font.

Solution Step 1 We need to extract the values of the tags FIRST and LAST, and display them along with the text EmpName: as an HTML heading. For extracting the FIRST and LAST tags, we need the following XSLT code: <xsl:value-of select=”EMPLOYEE/EMP_NAME/FIRST”/> <xsl:value-of select=”EMPLOYEE/EMP_NAME/LAST”/>

Step 2 After this, we need to extract the value of the attribute empID and display it below this. For this purpose, we need the following XSLT code: <xsl:value-of select=”EMPLOYEE/EMP_NAME/@empID”/>

Along with these XSLT syntaxes, we need to ensure that we have the right HTML code for formatting the output. This simply means that we need to embed the FIRST and LAST tags inside an HTML heading, say H1; and the emp ID inside another heading, say H3. The resulting code for the XSLT style sheet is as follows:

<?xml version = "1.0"?>
<xsl:stylesheet version = "1.0" xmlns:xsl = "http://www.w3.org/1999/XSL/Transform">
   <xsl:template match = "EMP_INFO">
      <html>
         <head>
            <title>Emp Info!</title>
         </head>
         <body>
            <h1>
               Emp Name: <xsl:value-of select = "EMPLOYEE/EMP_NAME/FIRST"/>
               <xsl:value-of select = "EMPLOYEE/EMP_NAME/LAST"/>
            </h1>
            <h3>
               <xsl:value-of select = "EMPLOYEE/EMP_NAME/@empID"/>
            </h3>
         </body>
      </html>
   </xsl:template>
</xsl:stylesheet>



Figure 13.53 shows the original XML document in the browser, without applying the style sheet, and Fig. 13.54 shows the version when the style sheet is applied.

Fig. 13.53

Original XML document without applying the style sheet


Fig. 13.54

XML document after applying the style sheet

Problem

Consider the following XML document, titled history.xml:

<?xml version = "1.0"?>
<subject>
   <info>History of India</info>
</subject>

Write a history.xsl file for the XML document mentioned above, which would:

(i) Display a heading XSL Demo.
(ii) Display the contents of the info tag on the next line.

Solution

We will not describe the whole style sheet this time, since how to write its contents should be obvious by now. The result is shown below.

<?xml version = "1.0"?>
<xsl:stylesheet version = "1.0" xmlns:xsl = "http://www.w3.org/1999/XSL/Transform">
   <xsl:template match = "subject">
      <html>
         <head>
            <title>Hello World</title>
         </head>
         <body>
            <h1>XSL demo</h1>
            <p>
               <xsl:value-of select = "info"/>
            </p>
         </body>
      </html>
   </xsl:template>
</xsl:stylesheet>



Figures 13.55 and 13.56 show the original (without the style sheet) and the formatted (with the style sheet) outputs in the browser.


Fig. 13.55

Original XML document without applying the style sheet

Fig. 13.56

XML document after applying the style sheet

Problem

Consider the following XML document, titled portfolio.xml:

<?xml version = "1.0"?>
<portfolio>
   <stock exchange = "nyse">
      <name>zacx corp</name>
      <symbol>ZCXM</symbol>
      <price>28.875</price>
   </stock>
   <stock exchange = "nasdaq">
      <name>zaffymat inc</name>
      <symbol>ZFFX</symbol>
      <price>92.250</price>
   </stock>
   <stock exchange = "nasdaq">
      <name>zysmergy inc</name>
      <symbol>ZYSZ</symbol>
      <price>20.313</price>
   </stock>
</portfolio>

Write a portfolio.xsl file for the XML document mentioned above, which would display the stock symbols followed by their prices.

Solution

We will not describe the whole style sheet this time, since how to write its contents should be obvious by now. The result is shown below.

<?xml version = "1.0"?>
<xsl:stylesheet version = "1.0" xmlns:xsl = "http://www.w3.org/1999/XSL/Transform">
   <xsl:template match = "stock">
      Symbol: <xsl:value-of select = "symbol"/>,
      Price: <xsl:value-of select = "price"/>
   </xsl:template>
</xsl:stylesheet>


Figures 13.57 and Fig. 13.58 show the original (without style sheet) and the formatted (with style sheet) outputs in the browser.

Fig. 13.57

Original XML document without applying the style sheet


Fig. 13.58

XML document after applying the style sheet

13.12.2 Looping and Sorting

In this section, we shall look at two important XSLT constructs: iterating through a list of items by using the <xsl:for-each> syntax, and then sorting information by using the <xsl:sort> syntax.

Looping using <xsl:for-each> The XSLT <xsl:for-each> syntax is used to loop through an XML document. It allows us to embed one template inside another. In other words, it can act as an alternative to an <xsl:apply-templates> syntax. This can be slightly confusing to understand. Therefore, we illustrate this with the help of a simple example. Suppose that we want to work with an XML document containing book details. At the first level, we want to go through the BOOK element. This element can, in turn, have a number of CHAPTER sub-elements. We know that if we now want to iterate through all the CHAPTER sub-elements, we need to use an <xsl:apply-templates select = “CHAPTER”/> syntax. This causes the XSLT to find a match on every chapter element, and apply the style sheet as later defined in the <xsl:template match = “CHAPTER”> syntax. The corresponding sample code is shown in Fig. 13.59. <xsl:stylesheet …> <xsl:template match = “BOOK”> … <xsl:apply-templates select = “CHAPTER”/> … <xsl:template match = “CHAPTER”> …

Fig. 13.59 Using <xsl:apply-templates> syntax for selecting all sub-elements It is worth repeating that this syntax causes the XSLT to loop over all the CHAPTER sub-elements of the BOOK element.

Now let us re-write the code by using an <xsl:for-each> construct. Here, we eliminate the nesting which was used in the earlier syntax. Instead, the <xsl:for-each> syntax causes the XSLT to loop over all the CHAPTER sub-elements one by one. The resulting code is shown in Fig. 13.60.

<xsl:stylesheet ...>
   <xsl:template match = "BOOK">
      ...
      <xsl:for-each select = "CHAPTER">
         ...
      </xsl:for-each>
      ...
   </xsl:template>
</xsl:stylesheet>

Fig. 13.60

Using <xsl:for-each> syntax for selecting all sub-elements

Quite clearly, the <xsl:for-each> syntax is very close to the traditional programming languages. It achieves the same result as the earlier <xsl:apply-templates> syntax. So, an obvious question is, which of these syntaxes should be used? There is no clear answer. It all depends on an individual's style preferences and comfort levels. Let us consider a complete example to illustrate the usage of the <xsl:for-each> syntax. Consider an XML document containing a list of customers, as shown in Fig. 13.61.

<?xml version = "1.0"?>
<customers>
   <customer>
      <name>Mahesh Katare</name>
      <address>Eve's Plaza, Bangalore</address>
      <state>Karnataka</state>
      <phone>(80) 3247890</phone>
   </customer>
   <customer>
      <name>Naren Limaye</name>
      <address>Shanti Apartments, Thane</address>
      <state>Maharashtra</state>
      <phone>(22) 82791810</phone>
   </customer>
   <customer>
      <name>Uday Bhalerao</name>
      <address>Kothrud, Pune</address>
      <state>Maharashtra</state>
      <phone>(20) 25530834</phone>
   </customer>
   <customer>
      <name>Amol Kavthekar</name>
      <address>Station Road, Solapur</address>
      <state>Maharashtra</state>
      <phone>(217) 2729345</phone>
   </customer>
   <customer>
      <name>Meghraj Mane</name>
      <address>Connaught Place, Delhi</address>
      <state>Delhi</state>
      <phone>(11) 57814091</phone>
   </customer>
   <customer>
      <name>Sameer Joshi</name>
      <address>Gullapetti, Hyderabad</address>
      <state>Andhra Pradesh</state>
      <phone>93717-90911</phone>
   </customer>
</customers>

Fig. 13.61

XML document containing a list of customers (foreach.xml)

Now, suppose that we want to display the names, addresses, and phone numbers of the customers in the form of a table. The simplest way to do this is to read the contents of our XML document, and display the required fields inside an HTML table. For this purpose, we can make use of the <xsl:for-each> syntax as shown in Fig. 13.62.

<xsl:stylesheet version = "1.0" xmlns:xsl = "http://www.w3.org/1999/XSL/Transform">
   <xsl:template match = "/">
      <html>
         <body>
            <table border = "1">
               <xsl:for-each select = "customers/customer">
                  <tr>
                     <td><xsl:value-of select = "name"/></td>
                     <td><xsl:value-of select = "address"/></td>
                     <td><xsl:value-of select = "phone"/></td>
                  </tr>
               </xsl:for-each>
            </table>
         </body>
      </html>
   </xsl:template>
</xsl:stylesheet>


Fig. 13.62

XSLT document for tabulating customer data

The logic of this XSLT can be explained in simple terms as shown in Fig. 13.63.

1. Read the XML document.
2. Create an HTML table structure for the output.
3. For each customer sub-element in the customers element:
   (a) Display the values of the name, address, and phone elements in a row of the HTML table.
4. Next.

Fig. 13.63 Understanding the <xsl:for-each> syntax

The resulting output is shown in Fig. 13.64.

Fig. 13.64

Resulting output

Instead of this code, we could have, very well, used the standard code that does not use the <xsl:for-each> syntax. This XSLT is shown in Fig. 13.65.

<xsl:stylesheet version = "1.0" xmlns:xsl = "http://www.w3.org/1999/XSL/Transform">
   <xsl:template match = "/">
      <html>
         <body>
            <table border = "1">
               <xsl:apply-templates/>
            </table>
         </body>
      </html>
   </xsl:template>
   <xsl:template match = "customers/customer">
      <tr>
         <td><xsl:value-of select = "name"/></td>
         <td><xsl:value-of select = "address"/></td>
         <td><xsl:value-of select = "phone"/></td>
      </tr>
   </xsl:template>
</xsl:stylesheet>

Fig. 13.65 XSLT document for tabulating customer data without using <xsl:for-each>

It is needless to say that this XSLT would also produce the same output as produced by the <xsl:for-each> syntax. We will not show that output once again.
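Sorting, which this section's title also promises, is a small extension of the same idea. As a hedged sketch (not one of the book's own figures), <xsl:sort> is nested inside <xsl:for-each> to order the selected nodes before they are output; it reuses the customer example, with sorting on the name element assumed:

<xsl:stylesheet version = "1.0" xmlns:xsl = "http://www.w3.org/1999/XSL/Transform">
   <xsl:template match = "/">
      <html>
         <body>
            <table border = "1">
               <xsl:for-each select = "customers/customer">
                  <!-- Order the customers alphabetically by their name element -->
                  <xsl:sort select = "name"/>
                  <tr>
                     <td><xsl:value-of select = "name"/></td>
                     <td><xsl:value-of select = "phone"/></td>
                  </tr>
               </xsl:for-each>
            </table>
         </body>
      </html>
   </xsl:template>
</xsl:stylesheet>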

13.13 BASICS OF PARSING

13.13.1 What is Parsing?

The term parsing should not be new to the students and practitioners of information technology. We know that there are compilers for programming languages, which translate one programming language into an executable language (or something similar). For example, a C compiler translates a C program (called as the source program) into an executable version (called as the object program). These compilers use the concept of parsing quite heavily. For example, we say that such compilers parse an expression when they convert a mathematical expression such as a = b + c; from the C language to the corresponding executable code. So, what do we exactly mean? We mean that a compiler reads, interprets, and translates C into another language. More importantly, it knows how to do this job of translation, based on certain rules. For example, with reference to our earlier expression, the compiler knows that it must have exactly one variable before the = sign, and an expression after it, etc. Thus, certain rules are set, and the compiler is programmed to understand and interpret those rules. We cannot write the same expression in C as b + c = a; because the compiler is not programmed to handle this. Thus, we can define parsing in the context of the compilation process as follows.

Parsing is the process of reading and validating a program written in one format and converting it into the desired format.

Of course, this is a limited definition of parsing, when applied to compilers. Now, let us extend this concept to XML. We know that an XML document is organized as a hierarchical structure, similar to a tree. Furthermore, we know that we can have well-formed and valid XML documents. Thus, if we have something equivalent to a compiler for XML that can read, validate, and optionally convert XML, we have a parser for XML. Thus, we can define the concept of a parser for XML now.

Parsing of XML is the process of reading and validating an XML document and converting it into the desired format. The program that does this job is called as a parser.

This concept is shown in Fig. 13.66.

Fig. 13.66

Concept of XML parsing

Let us now understand what a parser would need to do to make something useful for the application programmer. Clearly, an XML file is something that exists on the disk. So, the parser has to first of all bring it from the disk into the main memory. More importantly, the parser has to make this in memory representation of an XML file available to the programmer in a form that the programmer is comfortable with.

Today's programming world is full of classes and objects. Today's popular programming languages, such as Java, C++, and C#, are object-oriented in nature. Naturally, the programmer would love to see an XML file in memory also as an object. This is exactly what a parser does. A parser reads a file from the disk, converts it into an in-memory object and hands it over to the programmer. The programmer's responsibility is then to take this object and manipulate it the way she wants. For example, the programmer may want to display the values of certain elements, add some attributes, count the total number of elements, and so on. This concept is shown in Fig. 13.67.

Fig. 13.67

The parsing process

This should clarify the role of a parser. Often, application programmers are confused in terms of where the parser starts and where it ends. We need to remember that the parser simply assists us in reading an XML file as an object. Now an obvious question is, why do we need such a parser? Why can we ourselves not do the job of a parser? For example, if we disregard XML for a minute and think about an ordinary situation where we need to read, say, an employee file from the disk and produce a report out of it, do we use any parser? Of course, we do not. We simply instruct our application program to read the contents of a file. But wait a minute. How do we instruct our program to do so? We know how the file is structured and rely on the programming environment to provide us the contents of the file. For example, in C# or Java, we can instruct our application program to read the next n bytes from the disk, which we can treat as a record (or the fields of a record). In a more programmer-friendly language such as COBOL, we need not even worry about asking the application program to read a certain number of bytes from the disk, etc. We can simply ask it to read the next record, and the program knows what we mean.

Let us come back to XML. Which of the approaches should we use now? Should we ask our application program to read the next n bytes every time, or say something like read the next element? If we go via the n bytes approach, we need to know how many bytes to read every time. Also, note that apart from just reading the next n bytes, we also need to know where an element begins, where it ends, whether all its attributes are declared properly, whether the corresponding end element tag for this element is properly defined, whether all sub-elements (if any) are correctly defined, and so on! Moreover, we also need to validate these next n bytes against an appropriate section of a DTD or schema file, if one is defined. Clearly, we are getting into the job of writing something similar to a compiler ourselves! How nice it would be, instead, if we could just say, in the COBOL style of programming, read the next record. Now whether that means reading the next 10 bytes or 10,000 bytes, ensuring logic and validity, etc., need not be handled by us! Remember that we need to deal with hundreds of XML files. In each of our application programs, we do not want to write our own logic for doing all these things ourselves. It would leave us with a humongous amount of work even before we can convert an XML file into an object. Not only that, it would be quite cumbersome and error-prone. Therefore, we rely on an XML parser to take care of all these things on our behalf, and give us an XML file as an object, provided all the validations are also successful. If we did not have XML parsers, we would need logic to read, validate, and transform every XML file ourselves, which is a very difficult task.

13.13.2 Parsing Approaches

Suppose that someone younger in your family has returned from playing a cricket match. He is very excited about it, and wants to describe what happened in the match. He can describe it in two ways, as shown in Fig. 13.68.

"We won the toss and elected to bat. Our opening pair was Sachin and Viru. They gave us an opening partnership of 100 when Viru was dismissed. It was the 16th over. Sachin was batting beautifully as usual. … … … Thus, while chasing 301 runs to win in 50 overs, they were dismissed for 275 and we won the match by 25 runs. Sachin was declared the man of the match."

“We won today! We had batted first and made 300. While chasing this target, they were dismissed for 275, thus giving us a victory by 25 runs. Sachin was declared the man of the match. The way it started was that Sachin and Viru opened the innings and added 100 for the first wicket. … … … This is what happened in the match today, and thus we won.”

Fig. 13.68  Two ways of describing events of a cricket match

Now we will leave this example for a minute and come back to it in some time, after establishing its relevance to the current technical discussion. There is tremendous confusion about the various ways in which XML documents can be processed inside a Java program. The problem is that several technologies have emerged, and there has been insufficient clarity in terms of which technology is useful for what purposes. Several have been in use for many years, most prominently SAX, DOM, JAXP, JDOM, Xerces, dom4j, and TrAX. Let us first try to make sense of them before we actually embark on the study of working with XML inside Java programs.

We have noted earlier that the job of an XML parser is to read an XML document from the disk, and present it to a Java program in the form of an object. With this central theme in mind, we need to know that over several years, many ways were developed to achieve this objective. That is what has caused the confusion, as mentioned earlier. Let us demystify this situation now. When an XML document is to be presented to a Java program as an object, there are two main possibilities.

1. Present the document in bits and pieces, as and when we encounter certain sections or portions of the document.
2. Present the entire document tree at one go. This means that the Java program has to then think of this document tree as one object, and manipulate it the way it wants.

We have discussed this concept in the context of the description of a cricket match earlier. We can either describe the match as it happened, event by event; or first describe the overall highlights and then get into specific details. For example, consider an XML document as shown in Fig. 13.69.

<?xml version = "1.0"?>
<employees>
   <employee>
      <name>Umesh</name>
      <department>EDIReader</department>
      <id>11</id>
   </employee>
   <employee>
      <name>Pallavi</name>
      <department>XSLT</department>
      <id>12</id>
   </employee>
</employees>

Fig. 13.69  Sample XML document

Now, we can look at this XML structure in two ways.

1. Go through the XML structure item by item (e.g., to start with, the line <?xml version = "1.0"?>, followed by the element <employees>, and so on).
2. Read the entire XML document in the memory as an object, and parse its contents as per the needs.

Technically, the first approach is called as Simple API for XML (SAX), whereas the latter is known as Document Object Model (DOM). We now take a look at the two approaches diagrammatically. More specifically, they tell us how the same XML document is processed differently by these two different approaches. Refer to Fig. 13.70. It is also important to know the sequence of elements as seen by SAX. If we have an XML document visualized as a tree-like structure as shown in Fig. 13.71, then the sequence of elements considered for parsing by SAX would be as shown in Fig. 13.72.


Fig. 13.70

SAX approach for our XML example

In general, we can equate the SAX approach to our example of the step-by-step description of a cricket match. The SAX approach works on an event model. This works as follows. (i) The SAX parser keeps track of various events, and whenever an event is detected, it informs our Java program. (ii) Our Java program needs to then take an appropriate action, based on the requirements of handling that event. For example, there could be an event Start element as shown in the diagram. (iii) Our Java program needs to constantly monitor such events, and take an appropriate action. (iv) Control comes back to SAX parser, and steps (i) and (ii) repeat.

This is shown in Fig. 13.73.

Fig. 13.71 An XML document depicted as a tree-like structure

Fig. 13.72 SAX view of looking at a tree-like structure In general, we can equate the DOM approach to our example of the overall description of a cricket match. This works as follows. (i) The DOM approach parses through the whole XML document at one go. It creates an in-memory tree-like structure of our XML document, the way it is depicted in Fig. 13.74. (ii) This tree-like structure is handed over to our Java program at one go, once it is ready. No events get fired unlike what happens in SAX. (iii) The Java program then takes over the control and deals with the tree the way it wants, without actively interfacing with the parser on an event-by-event basis. Thus, there is no concept of something such as Start element, Characters, End element, etc. This is shown in Fig. 13.75.


Fig. 13.73 SAX approach explained further

Fig. 13.74 DOM approach for our XML example

Fig. 13.75 DOM approach explained further


13.14 JAXP

The Java API for XML Processing (JAXP) is a Sun standard API which allows us to validate, parse, and transform XML with the help of several other APIs. It is very important to clarify that JAXP itself is not a parser API. Instead, we should consider JAXP as an abstraction layer over the actual parser APIs. That is, JAXP is not at all a replacement for SAX or DOM. Instead, it is a layer above them. This concept is shown in Fig. 13.76.

Fig. 13.76  Where JAXP fits

As we can see, our application program would need to interface with JAXP. JAXP, in turn, would interface with SAX or DOM, as appropriate. JAXP is not a new means for parsing XML. Nor does it add anything to SAX or DOM. Instead, JAXP allows us to work with SAX and DOM more easily and consistently. We must remember that without SAX, DOM, or another parser API (such as JDOM or dom4j), we cannot parse an XML document. We need to remember this. SAX, DOM, JDOM, and dom4j parse XML. JAXP provides a way to invoke and use such a parser, but does not parse an XML document itself. At this juncture, we need to clarify that even JDOM and dom4j sit on top of other parser APIs. Although both APIs provide us a different approach for parsing XML as compared to SAX and DOM, they use SAX internally. In any case, JDOM and dom4j are not popular as standards, and hence we would not discuss them. Instead, we would concentrate on JAXP, which is a standard.

13.14.1 Sun’s JAXP A lot of confusion about JAXP arises because of the way Sun’s version of it has been interpreted. When the idea of JAXP was born, the concept was very clear. JAXP was going to be an abstraction layer API that would interface with an actual parser API, as illustrated earlier. However, this was not going to be sufficient for developers, since they needed an actual parser API as well, so as to try out and work with JAXP. Otherwise, they would only have the abstract API of JAXP, which would not do any parsing itself. How would a developer then try it out?

To deal with this issue, when Sun released JAXP initially, it included the JAXP API (i.e., the abstract layer) and a parser API (called as Crimson) as well. Now, JAXP comes with the Apache Xerces parser, instead. Thus, the actual JAXP implementation in real life slightly modified our earlier diagram, as shown in Fig. 13.77.

Fig. 13.77

Understanding where JAXP fits—modified

Let us now understand how this works as the coding level. Whenever we write an application program to deal with XML documents, we need to work with JAXP. It should be clear by now. How should our application program work with JAXP? 1. Clearly, looking at the modified diagram, our application program would interface with the abstraction layer of JAXP API. 2. This abstraction layer of the JAXP API, in turn, interfaces with the actual implementation of JAXP (such as Apache Xerces). This allows our application program to be completely independent of the JAXP implementation. Tomorrow, if we replace the JAXP implementation with some other parser, our application program would remain unchanged. 3. The JAXP implementation (e.g., Apache Xerces) would then perform parsing of the XML document by using SAX or DOM, as appropriate to our given situation. Of course, whether to use SAX or DOM must be decided and declared in our application program. To facilitate this, Sun’s JAXP API first expects us to declare (a) which parser implementation we want to use (e.g., Apache Xerces), and (b) whether we want to use SAX or DOM as the parsing approach. We have discussed that the aim is to keep our application program independent of the actual parser implementation or instance. In other words, we should be expected to code our application program in exactly the same manner, regardless of which parser implementation is used. Conceptually, this is facilitated by talking to the abstraction layer of the JAXP API. This is achieved by using the design pattern of abstract factory. The subject of design patterns is separate in itself, and is not in the scope of the current discussion. Design patterns allow us to simplify our application design by conforming to certain norms. There are many design patterns, of which one is abstract factory. However, we can illustrate conceptually how the abstract factory works, as shown in Fig. 13.78 in the context of JAXP.


Fig. 13.78 How to work with JAXP at the code level—Basic concepts Let us understand this in more detail. import javax.xml.parsers.SAXParserFactory;

This import statement makes the SAX parser factory package defined in JAXP available to our application program. As we had mentioned earlier, an abstract factory design pattern allows us to create an instance of a class without worrying about the implementation details. In other words, we do not know at this stage whether we want to eventually create an instance of the Apache Xerces parser, or any other parser. This hiding of unwanted details from our code, so that it will work with any parser implementation, is what abstract factory gives us. SAXParserFactory spf = SAXParserFactory.newInstance ();

This line tells us that we want to create some instance of the SAX parser factory, and assign it to an object named spf. This statement tells JAXP that we are interested in using SAX later in the program. But at this stage, we simply want to create an instance of some SAX parser. But then which SAX parser? Is it the Apache Xerces version of SAX, or something else? This is hidden from the application programmer in a beautiful manner. Whether to use Apache Xerces or any other implementation of the parser is defined in various ways, but away from the code (to make it implementation-independent). For example, this property can be defined in a Java system property named javax.xml.parsers.SAXParserFactory, etc. There, we can set the value of this

property to Apache Xerces, or the parser name that we are using. This is how the abstract layer of the JAXP API knows which implementation of JAXP should be used.

SAXParser parser = spf.newSAXParser ();

Now that we have specified that we want to use a certain implementation of SAX as outlined above, we want to create an instance of that implementation. This instance can be used to work with the XML document we want to parse, as we shall study later. Think about this instance as similar to how a file pointer or file handle works with a file, or how a record set works with a relational database table. Of course, this example showed the basic concepts of starting to work with SAX in JAXP. These remain more or less the same for DOM, as we shall study later. What will change are the package names, class names, etc. Regardless of that, we can summarize the conceptual approach of working with JAXP as shown in Fig. 13.79. 1. Create an instance of the appropriate JAXP factory (SAX or DOM). 2. The factory will refer to some properties file to know which implementation (e.g., Apache Xerces) of the JAXP parser to invoke. 3. Get an instance of the implemented parser (SAX or DOM as implemented by Apache Xerces or another implementation, as defined in the properties file above).

Fig. 13.79 Initial steps in using JAXP Now it should be quite clear how JAXP makes our application program independent of the parser implementation. In other words, our application program talks to the abstraction layer of JAXP, and in our properties file, we specify which JAXP implementation this abstract layer should be linked with.

13.14.2 Actual Parsing

Once the above steps are performed, the program is ready to parse the XML documents. In other words, the program can either respond to events as and when they occur (i.e., the SAX approach), or ask the parser to build the document in the memory as a tree-like structure, and then call various methods to query the tree-like structure (i.e., the DOM approach). Our aim here is not to learn the details of how the parsing code works. However, for the sake of completeness, Fig. 13.80 shows a SAX example, and Fig. 13.81 shows a DOM example.

import java.io.IOException;
import java.lang.*;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.LexicalHandler;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.ParserAdapter;
import org.xml.sax.helpers.XMLReaderFactory;

public class BookCount extends DefaultHandler {

    private int count = 0;

    public void startDocument() throws SAXException {
        System.out.println("Start document ...");
    }

    public void startElement(String uri, String local, String raw, Attributes attrs) throws SAXException {
        int year = 0;
        String attrValue;
        System.out.println("Current element = " + raw);
        if (raw.equals("book")) {
            count++;
        }
    }

    public void endDocument() throws SAXException {
        System.out.println("The total number of books = " + count);
    }

    public static void main(String[] args) throws Exception {
        BookCount handler = new BookCount();
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            SAXParser parser = spf.newSAXParser();
            parser.parse("book.xml", handler);
        } catch (SAXException e) {
            System.err.println(e.getMessage());
        }
    }
}

Fig. 13.80 SAX example using JAXP

import org.w3c.dom.*;
import javax.xml.parsers.*;
import org.xml.sax.*;

public class DOMExample2 {

    public static void main(String[] args) {
        NodeList elements;
        String elementName = "cd";
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document document = builder.parse("cdcatalog.xml");
            Element root = document.getDocumentElement();
            System.out.println("In main ... XML file opened successfully ...");

            elements = document.getElementsByTagName(elementName);

            // is there anything to do?
            if (elements == null) {
                return;
            }

            // print all elements
            int elementCount = elements.getLength();
            System.out.println("Count = " + elementCount);
            for (int i = 0; i < elementCount; i++) {
                Element element = (Element) elements.item(i);
                System.out.println("Element Name = " + element.getNodeName());
                System.out.println("Element Type = " + element.getNodeType());
                System.out.println("Element Value = " + element.getNodeValue());
                System.out.println("Has attributes = " + element.hasAttributes());
            }
        } catch (ParserConfigurationException e1) {
            System.out.println("Exception: " + e1);
        } catch (SAXException e2) {
            System.out.println("Exception: " + e2);
        } catch (DOMException e2) {
            System.out.println("Exception: " + e2);
        } catch (java.io.IOException e3) {
            System.out.println("Exception: " + e3);
        }
    }
}

Fig. 13.81 DOM example using JAXP

SUMMARY

- The Extensible Markup Language (XML) can be used to exchange data across the Web.
- XML can be used to create data structures that can be shared between incompatible systems.
- XML is a common meta-language that enables data to be transformed from one format to another.
- An extremely useful feature of XML is the idea that documents describe themselves—a concept called as metadata.
- The point to note is that if the tags and attributes are well-designed and descriptive, both humans and machines can read and use the information contained in the XML document.
- An XML parser is an interface that allows a developer to manipulate XML documents.
- In an XML document, an element is a group of tags as well as data. Elements can contain character data, child elements, or a mixture of both. In addition, they can have attributes.
- The process of describing what a valid XML document would consist of, and look like, is called as creating a Document Type Definition (DTD).
- A DTD can be internal (i.e., combined with the XML content) or external (i.e., separate from the XML content).
- An XML schema is similar in concept to a DTD. Like a DTD, a schema is used to describe the data elements, attributes and their relationships in an XML document.
- A schema, unlike a DTD, is an XML document itself (but with a separate extension of .xsd).
- A schema has many powerful features as compared to a DTD. For instance, a schema supports a number of data types, flexible rules, very specific validations, and clean syntax.
- The Simple API for XML (SAX) parser approach considers an XML document to be composed of many elements and deals with it one element at a time. Therefore, this is an incremental, step-wise sequential process.
- Unlike SAX, the Document Object Model (DOM) approach treats an XML document as a tree-like hierarchical structure. It then parses this tree-like structure at once, in its entirety. Here, unlike the SAX approach, the data from the XML document can be accessed randomly in any order.
- Sun Microsystems has provided the Java API for XML Processing (JAXP), which allows a Java programmer to work with XML documents. JAXP allows a programmer to read/modify/create an XML document using either SAX or DOM. A new approach called as StAX can also be used.
- The Extensible Stylesheet Language Transformations (XSLT) specifies how to transform an XML document into another format (e.g., HTML, text). XSLT solves the problem of how to format an XML document at the time of display.
- XSL deals with two main aspects, (a) how to transform XML documents into (HTML) format, and (b) how to conditionally format XML documents.

REVIEW QUESTIONS

Multiple-choice Questions

1. XML is a ________ standard.
   (a) data representation  (b) interface  (c) database  (d) display
2. An element can be defined in a DTD by using the ________ keyword.
   (a) TAG  (b) NEW  (c) DATA  (d) ELEMENT
3. An XML document can have a DTD declaration by using the ________ keyword.
   (a) DOCTYPE  (b) DTD  (c) DOCUMENT  (d) DESIGN
4. The data type used most extensively in DTDs is ________.
   (a) #INTEGER  (b) #PCDATA  (c) #STRING  (d) #CHAR
5. Choices in a DTD can be specified by using the ________ symbol.
   (a) |  (b) OR  (c) ||  (d) ALTERNATIVE
6. In XSLT, the ________ tag should be used to retrieve and display the value of an element in the output.
   (a) <xsl:display>  (b) <xsl:output>  (c) <xsl:value-of select>  (d) <xsl:select>
7. An element in a schema that has a sub-element or an attribute automatically becomes a ________ element.
   (a) simple  (b) composite  (c) multiple  (d) complex
8. In XSLT, the ________ tag should be used to retrieve and display the value of an element in the output.
   (a) <xsl:display>  (b) <xsl:output>  (c) <xsl:value-of select>  (d) <xsl:select>
9. In the ________ approach, elements in an XML document are accessed in a sequential manner.
   (a) SAX  (b) DOM  (c) JAXP  (d) complex
10. The Java programming language supports XML by way of the ________ technology.
   (a) JAXR  (b) JAXM  (c) JAXP  (d) JAR

Detailed Questions

1. Explain the need for XML in detail.
2. What is EDI? How does it work?
3. What are the strengths of XML technology?
4. What are DTDs? How do they work?
5. Explain the differences between external and internal DTDs.
6. What are XML schemas? How are they better than DTDs?
7. Explain the XSLT technology with an example.
8. Discuss the idea of JAXP.
9. Contrast between SAX and DOM.
10. Elaborate on the practical situations where we would use either SAX or DOM.

Exercises

1. Study the real-life examples where XML is used. For example, study how the SWIFT payment messaging standard has moved to XML-based messaging.
2. Investigate the support for XML in .NET.
3. Study the concept of middleware and see how XML is used in all messaging applications based on middleware technologies.
4. What are the different languages based on XML (e.g., BPEL)? Study at least one of them.
5. Which are the situations where XML should not be used in messaging applications? Why?


Chapter 14

Web Services and Middleware

MIDDLEWARE CONCEPTS

What is Middleware?

The term middleware is used quite extensively in information technology. In the last few years, middleware has become the backbone of all critical applications almost universally. However, people use the term middleware quite vaguely. What is middleware, and how does it relate to Web technologies? Let us study this topic now. Figure 14.1 shows the basic idea of middleware at a high level.

Fig. 14.1 Middleware concept


As we can see, if two computers A and B want to (remotely) communicate with each other and perform any business operations, middleware has a big role to play in many ways. Let us examine the various aspects of middleware outlined here.

- The communication link and the protocols allow the basic communication between A and B. The physical communication between A and B can be done using wired networks (e.g., LAN or WAN), or it can be done wirelessly as well (e.g., cellular network or wireless LAN). However, what is important to understand is that there are two sets of protocols that we are talking of. The first is the lower layer communication protocol, which is responsible for the actual transmission of bits from A to B and vice versa. The other one, which allows the dialog between A and B, is the middleware protocol. The middleware protocol assumes the availability and reliability of the lower layer protocol.
- The programming interface and the common data formats specify how A and B can communicate with each other via the middleware. In other words, we are actually not worried about the communication of A and B directly in this sense, but we are worried about their communication with the middleware. The data formats used should enable the middleware to communicate between A and B in a uniform manner, and the programming interface should also be such that there are no gaps in communication.
- The other elements are add-ons. For example, the directory service would help A locate the various services available on B, and to make use of them as appropriate. Security would ensure that the communication between A and B is safe. Process control would ensure that B is able to handle multiple requests efficiently, providing reasonably good response time.

Such an architecture, where applications can utilize the services of each other without worrying about their internal details or physical characteristics, helps us create what is called as Service Oriented Architecture (SOA). One way to achieve this is to turn A and B into Web services (client and server, respectively). However, it is not always necessary that A and B must be Web services to participate in an SOA-based architecture.

Remote Procedure Calls (RPC)

Procedure calls were a major feature of most standard programming languages, such as COBOL and C. If we need to access a service (e.g., read something from a file), we can make use of a procedure to get it done. If we need a similar service that is located on a different computer, we can make use of a Remote Procedure Call (RPC). The idea behind RPC is that the basic syntax and communication mechanism between the calling program and the called program should remain the same regardless of whether they are located on the same computer, or on different ones. The way this works is as follows. Imagine that X and Y are two programs on the same computer. X wants to call a procedure that is available in Y. If X and Y are local, X will include a header file provided by Y, which will contain the callable procedure declarations of Y (but not the actual logic). For example, this header file could tell X that Y provides a procedure/function called as add, which expects two integers and returns their sum also as an integer. X can then make use of this procedure. When program X is compiled and linked, it would sort out the call to this procedure, which is actually available in Y. Now suppose that X and Y are remote. Several challenges come up, the most remarkable one being that the computing environments of X and Y could now be completely different. For instance, program X could be running on the Windows operating system, using an Intel CPU; whereas program Y could be running on a Linux server, using the Sun hardware architecture. Thus, the internal data representation, size of integer, etc., would all be different on these two computers. This needs to be carefully handled. In such cases, it is not sufficient to provide a header file of Y to X. Instead, we need to create what is called as an Interface Definition Language (IDL) file.


In terms of syntax, an IDL file is quite similar to a header file. However, it does much more than what a header file does. The IDL generates a stub on the client computer running program X, and a skeleton on the server computer running program Y. The purpose of the stub is to convert the parameters it needs to pass to the add procedure into raw bits in some universal format and send them over to the server. The skeleton needs to transform the universal format bits back into the format that program Y understands. The idea is illustrated in Fig. 14.2.

Fig. 14.2 Concept of the IDL file, stub, and skeleton

The process of the stub and the skeleton performing transformations of procedure calls into bit strings for communication in a universal format and back is called as marshalling and unmarshalling respectively. These days, it is also called as serialization and deserialization. This is explained later when we discuss CORBA.
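Purely as a conceptual illustration of marshalling and unmarshalling, the following sketch converts the parameters of the add procedure into raw bytes and back using Java's built-in serialization. This is only an analogy: a real RPC stub would use the wire format dictated by the IDL, not Java serialization, and the AddRequest class is hypothetical.

import java.io.*;

// A purely illustrative "request" carrying the two parameters of the add procedure.
class AddRequest implements Serializable {
    int a;
    int b;
    AddRequest(int a, int b) { this.a = a; this.b = b; }
}

public class MarshallingSketch {
    public static void main(String[] args) throws Exception {
        // Marshalling: turn the call parameters into raw bytes (the "universal format").
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        new ObjectOutputStream(bytes).writeObject(new AddRequest(2, 3));
        byte[] wire = bytes.toByteArray();   // this is what would travel over the network

        // Unmarshalling: the skeleton rebuilds the parameters from the bytes.
        ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(wire));
        AddRequest request = (AddRequest) in.readObject();
        System.out.println("Sum computed on the 'server': " + (request.a + request.b));
    }
}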

Object Request Brokers (ORB)

Traditionally, remote applications communicated with each other using Remote Procedure Calls (RPC). That is, a client application would typically call a procedure, say, Get-data. It would not know that Get-data actually resides on a server that is accessible not locally, but remotely over a network. However, the client calls it as if the procedure were available locally. The RPC software on the client then ensures that the call reaches the server using the underlying network, and manages the interaction between the client, the server and Get-data. This is shown in Fig. 14.3.

Fig. 14.3 Remote Procedure Call (RPC)

With the popularity of object-oriented systems increasing very rapidly over the last decade or so, the procedural way of calling procedures remotely has also changed. This procedure now also has an object-oriented flavour. Technologies such as DCOM, CORBA, IIOP and RMI are conceptually very similar to RPC. However, the main difference between them and RPC is that whereas the former are object-oriented in nature, RPC, as we have noted, is procedural. This means that a logical DCOM/CORBA/IIOP/RMI pipe exists between two objects, rather than two procedures, which are physically apart on two different networks, and which interact via this pipe. These technologies fall in the category of Object Request Brokers (ORB). These formed the backbone of a part of modern distributed applications. We shall study some of these ORB technologies. However, let us take a look at the broad-level ORB call as shown in Fig. 14.4. Note that the calls are now between objects, rather than procedures, as was the case with RPC.

Fig. 14.4 Object Request Broker (ORB)

Component Transaction Monitors (CTM)

A combination of distributed transactional objects (using object-oriented TP monitors such as Microsoft's Microsoft Transaction Server or MTS and Sun's Enterprise JavaBeans or EJB) and ORBs (such as DCOM/CORBA/RMI/IIOP) has made it possible for a new concept to emerge: the Component Transaction Monitor (CTM). When the concept of distributed objects using ORB started to become popular, a number of systems based on the idea were developed. This allowed distributed business objects to communicate with each other via a communication backbone. In this architecture, the clients access services of remote servers using the ORB backbone, in an object-oriented fashion. That is, the underlying mechanism is an ORB such as DCOM, CORBA, RMI or IIOP. The transaction monitor is MTS or EJB. Thus, CTM = TP monitor + ORB.

The idea here is that the application is made up of a number of components, which are distributed across one or more servers (hence the name distributed components or distributed objects). Client applications have the knowledge of how to invoke the services of these components (i.e., which function to call, what parameters to pass, etc.). However, they do not have to know how the components that they call work internally. Also, the clients do not need to know on which servers these components are located. Locating and invoking the functions of a component is the responsibility of the underlying ORB. Also, all these components support the concept of transactions. Therefore, we have the concept of CTMs. The basic services provided by a CTM are distributed objects, transaction management, and a server-side model that includes load balancing (which means that to handle a large number of clients, a number of server computers are set up with identical sets of data and applications, and any one of them can serve the client, depending on how busy that server and other servers are, thus balancing the client load), security and resource management.

Another important product that we must mention is BEA's Tuxedo. BEA was founded in 1995, and its basic aim was to become a transaction company. It actually bought Tuxedo from Novell. In 1997, BEA bought ObjectBroker (a CORBA ORB) and MessageQ (a Message Oriented Middleware or MOM) from Digital. In 1998, BEA acquired WebLogic, the premier application server vendor for EJB.

One of the most important features of a CTM is dealing with connection management. As the number of databases that an application needs to interact with increases, the number of open connections that the application needs with all these databases also increases. This adds to the processing load as well as network management and configuration on the client. Suppose four clients want to connect to four databases. This means that a total of 4 × 4 = 16 connections are required. This is shown in Fig. 14.5.

Fig. 14.5 Situation in the absence of a transaction monitor such as MTS

However, with the use of a CTM, the situation changes dramatically. Rather than every client needing to maintain a separate connection with each database, the procedure is as follows. A client maintains a single connection with the CTM. The CTM, in turn, maintains a single connection with each database. This enables the CTM to monitor and control all the traffic between clients and databases, passing the queries to the appropriate databases and returning the query results back to the appropriate clients, and also to maintain only as many database connections as are required currently. This reduces the network demands on both clients and databases, and results in a much better performance. This is shown in Fig. 14.6. Using this philosophy, the CTM involves creation of a transaction, its processing and finally, a commit or an abort, as in any transaction environment. Three points are important in this context:

- Begin Transaction: starts a transaction.
- Commit: commits the transaction (after completion, erases the memory where the before and after update copies/logs are maintained).
- Abort: aborts the transaction.
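In code, these three operations typically appear as shown in the following hedged sketch, which uses the standard JTA UserTransaction interface. The JNDI name java:comp/UserTransaction is the usual convention inside an application server, and the business logic is only a placeholder.

import javax.naming.InitialContext;
import javax.transaction.UserTransaction;

public class TransferFunds {
    public void transfer() throws Exception {
        // The container exposes its transaction manager under this standard JNDI name.
        UserTransaction tx = (UserTransaction)
                new InitialContext().lookup("java:comp/UserTransaction");
        tx.begin();                     // Begin Transaction
        try {
            // ... debit one account, credit another (placeholder business logic) ...
            tx.commit();                // Commit: make the changes permanent
        } catch (Exception e) {
            tx.rollback();              // Abort: undo everything done so far
            throw e;
        }
    }
}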


Fig. 14.6 Situation in the presence of a CTM

The CTM can run on the same computer that hosts the Web server, or on a separate computer, called as the application server. Therefore, the architecture of Web applications now looks as shown in Fig. 14.7.

Fig. 14.7 Application server concept

Message Queuing

Examples of middleware that we discussed so far were synchronous in nature. Synchronous communication requires that the computers of both the parties are up and running (connected). However, it is not so in the case of asynchronous communication. It allows the parties (in this case, say, software components) to communicate indirectly through a message queue. The software that manages these message queues is called as Message Oriented Middleware (MOM). The sender sends a message and continues with its other work without waiting for an acknowledgement. The message goes into a message queue and waits until the receiver is ready to receive and process it. Usually, the sender and the receiver both have message queue software set up at their respective ends. The message queues, in turn, store outgoing messages into and read incoming messages from a local messages database. This is shown in Fig. 14.8.


Here, the sender A sends a message for the recipient B. The message goes to the message queue server of A. This server has messaging software and a message database. The new message is added to the queue maintained in the database. The messaging software is responsible for depositing these messages in the database, scheduling them, retrieving them one by one when their turn comes, and then transporting them to the destination (B in this case). Here it is received by the messaging software of the message queue server at B. The software at B also stores it in the database, until B retrieves it. Thus, this operation is similar to the way an e-mail works.

Fig. 14.8 Message queues

For example, suppose the sender sends an order for 5 items through a Web page on which he has entered various details about the items, as required. When this request reaches the server program that is quite busy, the connection between the client and server will have to be held, without doing anything, if it was a synchronous communication. This is clearly wasteful. Instead, the request could be logged into message queue software, such as WebSphere MQ (earlier called as IBM MQSeries) or Microsoft's MSMQ, and when the program comes back into action, it can open its queue to see that a request has come, which it can now deal with.

Based on these details, we now review several middleware technologies, which have been in use for quite some time. We must admit that some of them are getting obsolete, but while they fade away, their concepts are key to understanding the modern middleware approaches, and hence we have retained them.
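Before turning to those technologies, it helps to see what queued sending looks like in code. The following is an illustrative sketch using the standard JMS API; the connection factory and queue names are assumptions and depend entirely on how the MOM product (WebSphere MQ, MSMQ via a bridge, or any other JMS provider) has been configured.

import javax.jms.*;
import javax.naming.InitialContext;

public class OrderSender {
    public static void main(String[] args) throws Exception {
        InitialContext jndi = new InitialContext();
        // The factory and queue names below are illustrative placeholders.
        QueueConnectionFactory factory =
                (QueueConnectionFactory) jndi.lookup("OrderConnectionFactory");
        Queue queue = (Queue) jndi.lookup("OrderQueue");

        QueueConnection connection = factory.createQueueConnection();
        QueueSession session = connection.createQueueSession(false, Session.AUTO_ACKNOWLEDGE);
        QueueSender sender = session.createSender(queue);

        // The sender deposits the message and moves on; the receiving program
        // reads it from the queue whenever it is ready.
        sender.send(session.createTextMessage("Order: 5 items"));
        connection.close();
    }
}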

14.1 CORBA

The Common Object Request Broker Architecture (CORBA) is not a product. It is a specification. It specifies how components can interact with each other over a network. CORBA allows developers to define distributed component architectures without worrying about the underlying network communications and programming languages. The language-independence comes from the fact that the components in CORBA are declared in a language called as Interface Definition Language (IDL). Developers can write the actual internal code of the components in a programming language of their choice, such as Java, C++, or even COBOL. Let us understand this.


In order to achieve programming language independence, the object-oriented principle of interfaces is used. An interface is simply a set of functions (methods) that signify what that interface can do (i.e., the behaviour of the interface). It does not contain the implementation details (i.e., how it is done). Let us understand this with a simple example. When we buy an audio system, we do not worry about the internal things like the electronic components inside, the voltages and currents required to work with them, etc. Instead, the manufacturer provides a set of interfaces to us. We can press a button to eject a CD, change the volume or skip a track. Internally, that translates to various operations at the electronic component level. This set of internal operations is called as implementation. Thus, our life becomes easy since we do not have to worry about the internal details (implementation) of an operation. We need to only know how to use them (interface). This is shown in Fig. 14.9.

Fig. 14.9 Interface and Implementation

Using these principles, let us understand how CORBA helps an application keep the interface separate from its implementation, thus resulting in an e-commerce architecture that is not tied to a specific programming language. In the above figure, we have separated the implementation from the interface with the help of a thick line. In the CORBA world, this thick line represents the glue between the interface and its implementation, and is thus the middleware.

14.1.1 Interface Basics

At the heart of CORBA is the evolution of distributed components architecture, also called as distributed objects architecture. Although these terms seem scary, they are actually quite simple. This architecture extends the n-tier (also called multi-tier) client-server e-commerce applications to their logical conclusion. Whereas the n-tier client-server model strives to differentiate between the business logic and the data access, the distributed components architecture simply considers the application to be made up of more than one component, each of which can use the functionalities provided by other components in the same or even other systems. In fact, at times, this blurs the distinction between the client and server, because components can play either role. This makes the application extremely flexible.

As we have discussed earlier, this is achieved by keeping the interface of a component distinct from its internal implementation. Most importantly, once a component's interface (i.e., how it should be used or called, etc.) is published (that is, it is made available for use by other components), it must not be changed. However, its internal implementation can very well change. Let us extend the example of our audio system. Suppose the customer upgrades to a better wattage system in the same model category. The customer would naturally expect that there are interfaces (buttons) provided in the new model, similar to the previous one. If the new model, on the other hand, expects the customer to press the ‘Stop’ button three times to eject the disk, the customer would not be very happy. In fact, the customer may reject the product itself. Therefore, manufacturers design the interfaces with a lot of care. They ensure that the customer is usually given a set of buttons that he is familiar with, and is not changed frequently. Internally, the manufacturer is free to make any changes to the technology that does not affect the external interface.

In the distributed components world, for example, if a developer creates a component that declares an interface, which accepts a book name for searching and returns the status (whether it is available or not), this interface must not ever change. Internally, however, the developer might make dramatic changes such as first using a relational database, and within that, may change the vendor (e.g., Sybase to Oracle, etc.) and then change it to an object-oriented database. As long as the interface still expects a book name to be searched, in a specific format and boundaries (e.g., maximum 30 characters, etc.), it does not matter what the developer does with its implementation! However, the same interface must not expect a book number instead of a book name now. The old users of the component would not be able to use the interface at all! Therefore, the interfaces in distributed components architecture must be designed and defined extremely carefully.

Thus, we can summarize an interface as a protocol of communication between two separate components. The interfaces describe what services are provided by a component and what protocols are required to use those services. In general, we can think of an interface as a collection of functions defined by the component, along with the input and output parameters for each one of them. CORBA allows developers to create components, define their interfaces and then takes care of the interaction between them. It takes upon itself the job of locating components/interfaces and invoking them as and when necessary. The trouble is, there were many standards before CORBA emerged. Each one of them specified its own way of calling interface methods, specifying component architectures, etc. Therefore, there were again incompatibility issues. In standardizing all these standards, CORBA was a very crucial factor.

14.1.2 CORBA Architecture

Let us first take a look at the typical architecture employed in an e-commerce application that employs CORBA, as shown in Fig. 14.10. This would help us understand the broad-level architecture. We shall then look into its specific details. As the figure shows, there are two sets of interaction involved here.

(a) The usual interaction between a browser and the Web server is via the HTTP protocol. This is used by the browser to retrieve HTML pages from the server.
(b) The second and new interaction is now directly happening between the browser and the application server using a protocol called as IIOP. We shall study this later.

First, let us understand the flow of a typical CORBA application.

1. The client requests for a Web page (say, the home page) using HTTP, as shown in Fig. 14.11. Suppose this is our bookstore e-commerce application.
2. The Web server receives the request, processes it and as a result, the client receives the Web page using HTTP response. The browser interprets this page and displays the contents on the client's screen, as shown in Fig. 14.12.


Fig. 14.10 CORBA architecture

Fig. 14.11 HTTP request from the client

Fig. 14.12 Server sends back HTTP response


Note that the Web page received by the client contains not only HTML and scripting elements but also one or more Java applets. The home page shows the various options (such as ‘Search’, ‘Purchase Books’, etc.) in the form of data entry boxes and buttons.

(a) The user can press any of these buttons. Internally, each button is associated with an applet. On the button's press, that applet is invoked. For example, suppose the user enters a book name and presses the button ‘Search Book’.
(b) The applet associated with this button has to ultimately now call a function (called as method) on the application server that corresponds to this search functionality. Suppose the method is called as ‘Search’. However, the applet does not know where the Search method is located and how to reach there. Therefore, its job is restricted to handing over the request to its local Object Request Broker (ORB).

At the heart of distributed components architecture is the ORB. ORB is the glue that holds the distributed objects together. ORB is actually a software application that makes communication between the various distributed objects possible. We shall study ORB in detail, later. The ORB is supposed to marshal the applet's request further. For this reason, the applet now hands over the book name to be searched and the name of the method to be called (Search) to the ORB on the client, as shown in Fig. 14.13.

Fig. 14.13 Applet invokes ORB method

3. The ORB is responsible for locating the Search method on the application server and invoking it. It uses the principles of remote method calling (similar to the traditional Remote Procedure Call or RPC). An interesting feature here is that the Search method would have an implementation and an interface, as discussed before. When the applets are downloaded from the Web server to the client, these interfaces are also downloaded to the client. This is possible because the interfaces are available on the application server as well as the Web server. Therefore, when we say that the applet would hand over the request to the ORB, actually it calls the local interface of the Search method. The client-side ORB realizes this and that is why we say that the ORB takes control. The client-side ORB now passes the book name and the Search method to the server-side ORB, as shown in Fig. 14.14. This communication takes place using the IIOP protocol.


Fig. 14.14 Client ORB calls server ORB

4. The server-side ORB calls the actual Search method on the server-side, passing it the book name entered by the user, as shown in Fig. 14.15.

Fig. 14.15 Server ORB calls the appropriate method on the application server


5. The Search method on the application server thus gets invoked. It now performs the server-side logic (such as performing a search on the appropriate databases with the help of the database server, etc.) and gives the result to the server-side ORB, as shown in Fig. 14.16.

Fig. 14.16 The method performs appropriate tasks and returns results to server ORB

6. The server-side ORB passes the return values and any other results back to the client-side ORB, again using the IIOP protocol, as shown in Fig. 14.17.
7. The client-side ORB gives the returned data back to the interface of the Search method on the client-side. The client-side interface hands the return values to the original applet, as shown in Fig. 14.18.

Note that from the applet's point of view, a method called as ‘Search’ was invoked. It does not know anything beyond that. There are many facts involved here, as follows. The client-side ORB called the server-side ORB, the server-side ORB called the implementation of the Search method on the application server, and the Search method processed that request and returned the results back to the server-side ORB, which, in turn, passed it on to the client-side ORB and then the same became the return values of the client-side Search method. This is hidden from the applet. The applet believes that a local Search method was invoked.


Fig. 14.17 Server ORB returns results back to client ORB using IIOP

These are the various players in the CORBA architecture, as given below.

- Object Request Broker (ORB)
- Interface Definition Language (IDL)
- Internet Inter-ORB Protocol (IIOP)

We will now study these, one by one, in detail.

14.1.3 The Object Request Broker (ORB)

We have mentioned the term Object Request Broker (ORB) many times. It is at the heart of distributed components architecture. As we mentioned, ORB is the glue that holds the distributed objects together. ORB is actually a software program that runs on the client as well as the application server, where most of the components reside. It is responsible for making the communication between various distributed objects possible. ORB performs two major tasks.


Fig. 14.18 Client ORB returns the results to the applet

(a) The ORB locates a remote component, given its reference. We would be interested in invoking one of the methods of this component.
(b) The ORB also makes sure that the called method receives the parameters over the network correctly. This process is called as marshaling. In the reverse manner, the ORB also makes sure that the calling component receives back return values, if any, from the called method. This process is called as unmarshaling.

Marshaling and un-marshaling actually carry out the conversions between the application data formats and the network data formats (which means the binary values agreed upon by the protocol). This process is shown in Fig. 14.19. As we have noted, the client is not aware of these operations. Once a reference to a method of interest is obtained, the client believes that these operations were performed locally. Internally, the ORB handles all the issues. It is for this reason that some portion of the ORB is resident at all the clients and servers participating in this scheme. When a client wants to execute a method of a component residing somewhere else, it only requests its local ORB portion (called the ORB client). The ORB client makes a connection to the appropriate server ORB, which in turn locates the component, executes that method and sends back the results to the client ORB. The client, however, believes that it was all a local operation. In the following figure, component A wants to use the services of a method Insert.


1. Component A calls the Insert method and passes two parameters, namely a and b, to this method. We are not interested in knowing what a and b are, or what they contain in the current context.

Fig. 14.19 Component calls an Insert method

2. The ORB picks up this request and realizes that the Insert method is defined in component B. How does it know this? For this to be possible, whenever a CORBA component is created by a developer, it is registered with the local operating system. Therefore, in the second step, it passes the method name and the parameters in binary form (i.e., marshals it) across the network to the ORB at the node where component B is located. Obviously, this communication presumes the underlying (inter)networking protocols such as IIOP. This is shown in Fig. 14.20.

Fig. 14.20 ORB forwards the call to its counterpart

3. Now, the ORB at component B's end picks up this request and converts the binary data back into its original form (i.e., un-marshals the request). It realizes that it needs to call the Insert method of component B with parameters as a and b. Therefore, it calls the Insert method, passing it the appropriate parameters. The Insert method is executed and its return value is returned to the ORB running on the same machine where the called component (B) resides. This is shown in Fig. 14.21.

Fig. 14.21 Actual Insert method gets called

4. Finally, the ORB at component B's end takes this return value, converts it into binary data and sends it across the network back to the ORB at component A's end, as shown in Fig. 14.22.


Fig. 14.22 Called ORB returns results to calling ORB

5. The ORB at component A’s end picks up this binary data, converts it back into the original values and gives it to component A, as shown in Fig. 14.23.

Fig. 14.23 Calling ORB returns results to the original component

VisiBroker is an example of ORB. It is a software product that is written entirely in Java and performs the job of the client-side as well as server-side ORB.

14.1.4 Interface Definition Language (IDL)

We have mentioned IDL before. IDL specifies the interfaces between the various CORBA components. IDL ensures that CORBA components can interact with each other without regard to the programming language used. Let us understand what this means. When a component is interested in invoking a method of another component, the calling component must know about the interface of the called component. For instance, in our earlier example, component A knew that there is a method called as Insert, which expects two parameters in a specific sequence (It would also know their data types. However, we have not shown this detail). As we know, IDL is used to describe the interfaces of CORBA components. Thus, no matter which programming language the component is actually written in, it has to expose its interface through IDL. From the outside world's perspective, it is the IDL interface that is seen. Internally, the component may be implemented in any language. Thus, a CORBA component can expect every other CORBA component to expose its interface using IDL.

Let us take an example of an interface defined in IDL. The example shows an interface called as StockServer that provides stock market information. The StockServer interface is expected to run at an application server, therefore, the naming conventions identify it as a server method to make it more readable. The interface has two methods.

(a) The getStockPrice method returns the current stock price of a particular stock, based on the stock symbol it receives. It expects one input parameter as a string. It has no intention of changing this parameter (which it calls as symbol), and hence, it is pre-fixed with the word ‘in’ (which means it is input only). This method would return a floating-point value to the caller.


(b) The getStockSymbolList method returns a list of all the stock symbols present in this stock exchange and does not expect any input parameters (as indicated by empty brackets after the method name). The return type is sequence <string>, which means a list of string values.

A portion of the IDL definition for this interface is as follows.

interface StockServer
{
    float getStockPrice (in string symbol);
    sequence <string> getStockSymbolList ();
};

The actual code for this interface and the two methods that it contains can be written in any programming language of the developer’s choice—Java, C++ or any other. Any component that calls the StockServer interface would not bother about its implementation, and therefore, its programming language. As a result, the caller’s implementation can be in say Java, whereas StockServer could be implemented in C++. This is perfectly fine, since both would have their respective external interfaces defined in IDL, which does not depend on either Java or C++.
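To make this concrete, a minimal Java caller of this interface might look as sketched below. This is only an illustration: the StockServer and StockServerHelper classes would be generated from the IDL by an IDL compiler (such as idlj), and the name used in the naming service lookup is an assumption.

import org.omg.CORBA.ORB;
import org.omg.CosNaming.NamingContextExt;
import org.omg.CosNaming.NamingContextExtHelper;

public class StockClient {
    public static void main(String[] args) throws Exception {
        // Initialize the client-side ORB.
        ORB orb = ORB.init(args, null);

        // Locate the remote object through the CORBA naming service.
        NamingContextExt naming = NamingContextExtHelper.narrow(
                orb.resolve_initial_references("NameService"));

        // StockServer and StockServerHelper come from the IDL compiler.
        StockServer server = StockServerHelper.narrow(naming.resolve_str("StockServer"));

        // From here on, the call looks like a local method call.
        float price = server.getStockPrice("ACME");
        System.out.println("Current price = " + price);
    }
}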

14.1.5 Internet Inter-ORB Protocol (IIOP)

As we saw in earlier diagrams, CORBA ORBs communicate with each other using the Internet Inter-ORB Protocol (IIOP). The early versions of CORBA were concerned only with the creation of portable component-based applications, i.e., a component created on machine A could be ported to machine B and executed there. However, if the component was to remain on machine B where it is desired to be executed remotely from machine A, there was no standard way of communication between these various nodes. HTTP obviously was not useful here. You needed a different protocol. Thus, the actual implementation of ORBs that facilitates the communication between these components was not a part of the standards in those days. The result was that although components were portable, they were not inter-operable. That is, there was no standard mechanism for components to interact with each other. For instance, if a calling component A wanted to call a method named as sum belonging to another component B residing on a different computer, there was no guarantee that this would be possible. This was because there was no standard mechanism for a component to remotely call a method of another component, if the two components were not on the same physical computer. This could lead to problems such as some protocols passed the parameters from left to right, others from right to left; some considered the sign bit as the first bit, others interpreted the sign bit as the last bit, and so on. Thus, remote distributed component-based computing was not standardized. There were some vendors who provided this feature with proprietary solutions. Consequently, the solution provided by one vendor was not compatible with that provided by another vendor. Therefore, even if distributing components and then facilitating communication between them was possible with some proprietary solutions, this would not be compatible with another set of components that used a different vendor's solution. In summary, only if the calling and the called components resided on the same machine was an interaction between them absolutely guaranteed. Therefore, the next version of the CORBA standard came up with IIOP. IIOP is basically an additional layer to the underlying TCP/IP communication protocol. An ORB uses this additional CORBA messaging layer for communicating with other ORBs. Thus, every ORB must now provide for an IIOP stack, just as every browser and Web server on the Internet must provide for an HTTP stack.


Since HTTP and IIOP both use the Internet infrastructure (i.e., TCP/IP) as the backbone, they can co-exist on the same network. This means that the interaction between a client and the server can be through HTTP as well as through IIOP. Of course, HTTP would be primarily used for downloading Web pages, applets and images from the Web server, whereas IIOP would be used for the component-level communication between CORBA clients (usually applets) and CORBA servers (usually component-based applications running on an application server). This situation is depicted in Fig. 14.24 in a summarized fashion.

Fig. 14.24 Use of IIOP for ORB-to-ORB communication

The figure shows what we have studied so far, albeit in a slightly different fashion. The main aim is to understand that HTTP and IIOP can co-exist. We will realize that in steps 1 and 2, there is an interaction between the browser and Web server by using HTTP for requesting and obtaining HTML pages and Java applets. In step 3, the client invokes the Java applet, which in turn, invokes the services of one or more business components on the application server using the CORBA ORB. The notable fact here is that it uses IIOP and not HTTP. The business components are shown to interact with databases and Transaction Processing monitors (TP monitors) for server-side processing. A pertinent question is, why is IIOP required? Can the Java applets and business components not use HTTP for their communication? The answer is that HTTP was specifically devised for HTML transport. Also, for this reason, it is stateless. That is why IIOP was devised, which is a stateful protocol. This means that the session between a Java applet on the client and the business components on the application server is maintained automatically by the application until one of them decides to terminate it (similar to what happens in a client-server application).


14.2 JAVA REMOTE METHOD INVOCATION (RMI)

The Java programming language has in-built support for the distributed component-based computing model. This is provided by the Remote Method Invocation (RMI) service. RMI is an alternative technology to CORBA, although functionally, it is pretty much similar to CORBA. It is the Java standard for distributed components. RMI allows components written in Java to communicate with other Java components that reside on different physical machines. For this purpose, RMI provides a set of application programming interface (API) calls, as well as the basic infrastructure, similar to what CORBA's ORB provides. Just as CORBA uses IIOP as the protocol for communication between ORBs across a network, the early versions of RMI used the Java Remote Method Protocol (JRMP), which performed the same task as IIOP. However, the latest versions of Java now support IIOP for RMI as well. It means that RMI can now have IIOP as the underlying protocol for communication between distributed components across a network. RMI and CORBA are remarkably similar in terms of concepts. RMI has two new types of components: stubs and skeletons. A stub is a client-side component, which is a representation of the actual component on the server and executes on the client machine. On the other hand, a skeleton is a server component. It has to deal with the fact that it is a remote component. That is, it has to take care of responding to other components that request for its services. This is the same interface-implementation concept. The beauty of this scheme is that a developer does not have to write the stub and skeleton interfaces. The Java environment generates them once the basic class is ready. For instance, suppose there is a Search class written in Java that allows the user to search for a specific record from a database. Then, the developer has to simply write the Search class. A special compiler can generate the stub and skeleton interfaces for this class. RMI is essentially the Java version of Remote Procedure Calls (RPCs). The basic infrastructure of an RMI-based system looks pretty similar to an RPC-based system, as shown in Fig. 14.25.

Fig. 14.25 RMI architecture

The RMI philosophy is very similar to that of CORBA. Components call upon the RMI services first. The RMI services then use the JRMP/IIOP protocols for client-server communications. Whenever any component wants to use the services of another component, the caller must obtain a reference to the component to be used. Let us understand how this works, with an example. You can skip this portion if you are not very keen about the RMI syntax. Conceptually, this would be very similar to our earlier discussion about the CORBA model.


Suppose a client component wishes to invoke a server-side component called as SearchBookServer that allows a book to be searched. This would require the following steps.

1. The client has to create a variable of type SearchBookServer. This is similar to declaring an integer variable, when we want to use that integer in some calculations. In simple terms, this means that the client component has to declare a variable that can act as a reference to a component of type SearchBookServer. This would be done with this statement:

SearchBookServer ref = null;

By setting the variable to null, we are declaring that the variable is not pointing to any object in memory. As a result, at the client side, a variable called as ref is created. However, at this moment, it is not serving any purpose. We have told the Java system that it can, in future, point to an object of type SearchBookServer (which is a class on the remote server). Recall that we would have an interface of the SearchBookServer class on the client. Therefore, the compiler would have no problems in understanding that SearchBookServer is a class on the server, whose interface is available on the client as well.

2. RMI supports certain naming services that allow a client to query about a remote component. It is like a telephone directory service. Just as we can request for a person's telephone number based on the name and address, here, the naming service accepts the component's name along with its full URL and returns a reference to it. For this, the Naming.lookup method needs to be used. This method accepts the URL name and returns a reference to the object being remotely referred to. This is done by using the following statement:

ref = Naming.lookup ("rmi://www.myserver.com/SearchBookServer");

We have avoided some more syntactical jargon, which is unnecessary while understanding the basic concepts.

3. Having obtained a reference to the remote component, we can now call a remote method of that component. Suppose the component supports a method called as getAuthor, which expects a book title as the input and returns the author name to the caller. Then, we can invoke this method as shown:

uAuthor = ref.getAuthor ("Freud for beginners");

This method would accept the string passed as the book title, call the getAuthor method of the remote component and return the author's name. This returned value would be stored in the variable uAuthor. This can be sent back to the caller's computer using JRMP or IIOP, which, in turn, uses TCP/IP as the basic method of transport. From the above discussion, it would become clear that the RMI infrastructure is extremely similar to CORBA. In fact, an e-commerce architecture based on RMI would look extremely similar to the one we have seen using CORBA. The client would be a Web browser that supports Java (which means that it has the Java Virtual Machine or JVM in-built). There would be two servers: a Web server and an application server. The interaction between the browser and the Web server would continue to be based on HTTP. This would result in the downloading of HTML pages and applets from the Web server to the browser. Once the applets are downloaded to the browser, the applets would take charge and invoke the remote methods of the server-side components using RMI. As we have seen in the example, the client applet can invoke the remote methods without worrying about their implementation details. All they need to do is to obtain a reference to the remote object and then they can invoke any methods belonging to that object as if it were a local method.


This is shown in Fig. 14.26.

Fig. 14.26 RMI architecture in detail

The obvious question now is, when CORBA already exists, why is RMI required at all? The reasons for this are as follows.

1. CORBA is a standard. Developers using Java or any other language can implement it. However, RMI is a part of the Java programming language itself. Thus, RMI is tightly integrated with Java only.
2. The goal of the Java creators was to have a full-fledged programming language that is platform independent. It means that they wanted to support the maximum functionality that is required for all sorts of applications. Since remote method calls are an important issue these days (thanks to the Internet), RMI was perceived as a necessity.

In practice, RMI is used more often than CORBA. The most popular CORBA implementations in terms of product offerings are ObjectBroker from Digital, SOM from IBM, NEO and JOE from Sun, ORB Plus from HP, PowerBroker from Expersoft, Orbix from Iona and BlackWidow from PostModern.
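Before moving on, the RMI steps discussed above can be collected into one small sketch. It assumes that a server implementation has already been created and bound in the RMI registry under the name used in Naming.lookup; the book title is the same illustrative value used earlier.

import java.rmi.Naming;
import java.rmi.Remote;
import java.rmi.RemoteException;

// The remote interface: this is all the client needs to compile against.
interface SearchBookServer extends Remote {
    String getAuthor(String title) throws RemoteException;
}

public class SearchBookClient {
    public static void main(String[] args) throws Exception {
        // Obtain a reference to the remote object from the RMI registry.
        SearchBookServer ref =
                (SearchBookServer) Naming.lookup("rmi://www.myserver.com/SearchBookServer");

        // Invoke the remote method as if it were local.
        String uAuthor = ref.getAuthor("Freud for beginners");
        System.out.println("Author: " + uAuthor);
    }
}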

14.3 MICROSOFT’S DISTRIBUTED COMPONENT OBJECT MODEL (DCOM)

Microsoft's version of the distributed component-based computing solutions is the Distributed Component Object Model (DCOM). No wonder that it is extremely similar to CORBA and RMI. DCOM is popularly known as COM with a long wire. The Component Object Model (COM) is now the basis for most Microsoft products such as its Windows operating system, Active Server Pages (ASP) and even its other successful products such as Word and Excel. The COM specification is based on the object-oriented principle of keeping an object's interface separate from its implementation. A component in COM is the same concept as any other component in C++ or Java—it is a set of methods that perform some useful tasks. For example, the SearchBookServer component in our example can very well be a COM component that searches for an author's name, based on its title. Thus, two or more COM components can interact with each other (similar to distributed CORBA components or RMI components) over the network, making it distributed COM, or DCOM.

Most concepts in DCOM are so similar to CORBA and RMI that we need not even discuss them. However, let us pinpoint the differences.

1. Java applets are the clients in case of CORBA or RMI. However, in case of DCOM, ActiveX controls are usually the clients. ActiveX controls are extremely similar to Java applets. They are hosted by a Web server and get downloaded to the browser on demand, and are then executed on the browser. However, there are two major differences between a Java applet and an ActiveX control.
(a) A Java applet is designed keeping in mind the security issues. This means that a Java applet cannot write to the local disk of the browser, for example. However, ActiveX controls do not have any such restrictions. Therefore, they can be risky, but then they can provide a richer functionality.
(b) By virtue of the Java heritage, applets are platform independent. This means that any browser that has the JVM set up in it can execute an applet. However, ActiveX controls are executable components that are meant for the Windows platform only, and therefore, they can run only on the Internet Explorer browser. Therefore, they are tied to the Microsoft platform.
2. The client-side infrastructure in case of DCOM is called as proxy and the server-side infrastructure is called as stub. Since RMI prefers to call the client-side setup as stub, there can be confusion when referring to a stub without context.
3. In DCOM, when a component wants to invoke a method of another component that is remotely located, the caller has to obtain what is called as a CLSID of the component to be called. A CLSID is nothing but a class identifier that identifies any DCOM component uniquely all over the world. This is achieved by having a CLSID of 128 bits that includes the date and detailed time values when the component was created. Every COM component on a particular computer is known to its operating system because every such component is registered with the operating system. The operating system records all local COM components into its Windows registry, which is like a repository of all the components. A registry is another Windows operating system concept wherein all the local information (such as program parameters, initialization routines, application settings, etc.) for every application on that computer is stored. For instance, when you install a new printer in Windows NT, the registry on that computer is updated to have the details of this printer. When you want to print a document, Windows NT obtains the printer information from the registry. In a similar fashion, all COM components are also recorded in the registry. Thus, when a component sends a request for another component that is stored on the local computer (using its CLSID), the operating system consults the registry and invokes an instance of the component that was called. Because COM-DCOM uses the registry so heavily, and because the registry itself is a Windows operating system concept, COM-DCOM are restricted to the Windows family of operating systems, and are not easily portable to other environments.
Keeping these differences in mind, let us draw a DCOM infrastructure, which is essentially very similar to the CORBA or RMI application, as shown in Fig. 14.27.


Fig. 14.27 DCOM architecture

As the figure shows, the DCOM architecture is extremely similar to CORBA and RMI. DCOM, however, uses its own protocol (called the DCOM network protocol) instead of either IIOP or JRMP. This protocol is again a higher-level protocol that sits on top of TCP/IP. To summarize, let us take an example of a client wanting to search the details of a book, given its name, using the DCOM architecture. Since this is exactly like a CORBA or RMI interaction, we shall not discuss it in great detail, and instead focus our attention on the specific features of DCOM.
1. The initial HTTP request-response interaction between the browser and the Web server brings an HTML document and an ActiveX control from the Web server to the browser.
2. The browser now invokes the ActiveX control on the client.
3. The ActiveX control calls the method search, belonging to an object book. However, it does not know where the book object and its search method reside. Therefore, it passes this request to the local proxy of the book object in the form: search ("The Wall Street");
4. The proxy of the book object consults its own registry to see where the stub (i.e., the actual code) of the book object resides. It realizes that it is on the application server.
5. The proxy, therefore, passes the search method call on to the stub using the DCOM protocol (which, underneath, uses TCP/IP).
6. The stub invokes the actual search method call on the application server.
7. The search method call performs the logic for searching the specified book. In the process, it might interact with one or more databases, database servers, and transaction processing monitors. When the processing is complete, it gives the results to the stub.
8. The stub then sends the results back to the proxy using the DCOM protocol.
9. The proxy returns the results back to the ActiveX control, which had invoked the search method in the first place.
10. The ActiveX control displays the results on the screen.


As you can see, this is remarkably similar to the CORBA or RMI model of distributed component interactions. A conceptual sketch of the proxy/stub idea that all three share follows.
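The following is a minimal, purely conceptual Java sketch of that idea: the client programs against an interface, and a local proxy forwards the call to the remote implementation. The interface and class names are illustrative, not part of any real DCOM, CORBA, or RMI API, and the "network" here is simulated by a direct method call.

// The interface the client programs against.
interface BookSearch {
    String search(String title);
}

// Server-side implementation (in a real system, this sits behind the stub
// on the application server).
class BookSearchImpl implements BookSearch {
    public String search(String title) {
        return "Details of the book: " + title;
    }
}

// Client-side proxy: looks like the real object, but only forwards the call.
class BookSearchProxy implements BookSearch {
    private final BookSearchImpl remoteStub;  // stands in for the remote stub

    BookSearchProxy(BookSearchImpl remoteStub) {
        this.remoteStub = remoteStub;
    }

    public String search(String title) {
        // In DCOM/CORBA/RMI, the arguments would be marshalled and sent over
        // the wire here; we simply delegate to keep the sketch self-contained.
        return remoteStub.search(title);
    }
}

public class ProxyDemo {
    public static void main(String[] args) {
        BookSearch book = new BookSearchProxy(new BookSearchImpl());
        System.out.println(book.search("The Wall Street"));
    }
}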

14.4 WEB SERVICES

14.4.1 Web Services Concept

The term Web Service has created a great deal of aura around itself in the last few years. As computer technology constantly strives to find newer ways of doing old functions, as well as doing entirely new ones, Web Services were easily one of the fancier ideas to catch on. Web Services have been called the next wave of computing. Going hand in hand with the other buzzword of Service Oriented Architecture (SOA), Web Services have become the subject of choice when someone wants to throw jargon at others! What is a Web Service, after all? Several definitions exist. While most of them are variations of one another, perhaps the simplest of them is this: A Web Service is a software system designed to support hardware-and-software-independent computer-to-computer interaction over a network. The concept is illustrated in Fig. 14.28. The server is hosting a number of Web Services, of which Web Service 1 is being called by a particular client computer.

Fig. 14.28

Web Services concept

In this context, several points are important.
1. Web Services are independent of hardware and software.
- This means that Web Services are expected to execute on any hardware (i.e., any CPU or architecture) and any software (i.e., any operating system or programming environment).
- This has a tremendous amount of implications. It means that a Web Service can allow communication between (a) a Java program running on the Windows operating system using an Intel CPU and (b) a C# program running on the UNIX operating system using Sun hardware.


- While this was not impossible earlier, it was certainly tedious to achieve, as explained subsequently.
- In the above diagram, for instance, the client could technically be a Java program, sending a request to an ASP.NET page (a C# program), to perform some task. This makes the whole architecture very flexible and loosely coupled.

2. A Web Service is a computer-to-computer interaction.
- What this actually means is that Web Services are meant for program-to-program interaction.
- In other words, Web Services are not intended for human-to-human or human-to-computer interactions.
- However, a Web Service can be the "end point" of other types of interaction, e.g., of a human-to-computer interaction. As an example, if a person sends her credit card details for making an online payment, the card validation could be performed by a Web Service. But the invoking of this card validation Web Service would normally be done by another program, and not by a human being.
3. A Web Service runs over a network.
- This means that although it is strictly not always necessary, Web Services are usually distributed in nature.

14.4.2 How are Web Services Different?

But this brings up another critical question, which we have partly answered. If all of this is what is collectively called a Web Service, how are Web Services different from the following?
(a) Traditional server-side Web technologies, such as Java Servlets/JSP, Microsoft's ASP.NET, the open source PHP, or Struts
(b) Distributed computing technologies such as CORBA, DCOM, and RMI
Let us understand this clearly.
(a) Web Services are technology-independent, and their aim is not to render HTML pages to the user. This is best done by existing Web technologies such as Java Servlets/JSP, ASP.NET, PHP, etc. In other words, the whole reason why Web Services were invented was never to replace these server-side Web technologies, but to augment them in several ways. The presentation of all sorts of information to the user would continue to be done by these technologies, and not by Web Services. However, and this may be confusing, a Web Service can be implemented by a Java Servlet or by an ASP.NET page. But in such cases, the aim of these "Web Services enabled" Servlets or ASP.NET pages changes from serving HTML content to the user to providing some business services to the calling application (i.e., the client).
(b) Web Services are different from earlier distributed object communication systems, such as CORBA, DCOM, and RMI (which were proprietary, object-based, and binary in nature). We know that the earliest way of communication between remote computers was by using the Remote Procedure Call (RPC), which allowed one computer to send a request for performing some task to another. These computers would communicate over a network. The term procedure indicates that the calls were made


to a procedure remotely. A procedure, in this case, used to be a function in the C language, almost all the time. This became very popular with C/UNIX first, followed by C++/UNIX. Then it also spread to other technologies. In RPC, a client procedure (read: function) would call a server procedure (read: function) remotely. As object-oriented systems became popular, procedural programming had its days numbered. RPC soon gave way to DCOM, CORBA, and RMI. All these technologies became popular in their respective domains, and have enjoyed widespread success. However, they allow only binary communication between objects. On the other hand, Web Services allow a client and a server to communicate with each other by exchanging a human-readable XML document. This makes the conversation between the client and the server much more human-friendly (although it may not be that machine-friendly). This is shown in Fig. 14.29. As we can see, the client has sent an XML message to the server, asking it to validate a credit card. This message is largely human-understandable. The server's response is also an XML message, informing the client that the credit card is valid.

Fig. 14.29 Web Service message exchange

Using Web Services, clients and servers exchange such readable XML messages. As we can see, they are enclosed inside a special tag called Envelope. This is just a Web Services convention, and nothing more. A more detailed version of the request and response messages is shown in Fig. 14.30, for more clarity.

Request message: an <Envelope> carrying the card number (1234567..), the card type (Visa), and the expiry date (<Expires>02-10).

Response message: an <Envelope> carrying the validation status (<Status>Valid) and an authorization reference (X0100a267-990).

Fig. 14.30 Envelopes carry Web Services messages
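A sketch of how such an envelope can be built programmatically with the standard SAAJ API (javax.xml.soap, which ships alongside Axis as saaj.jar) is shown below. The element names (ValidateCard, CardNumber, etc.) and the namespace are illustrative assumptions, not a fixed standard.

import javax.xml.soap.MessageFactory;
import javax.xml.soap.SOAPBody;
import javax.xml.soap.SOAPBodyElement;
import javax.xml.soap.SOAPEnvelope;
import javax.xml.soap.SOAPMessage;

public class CardRequestBuilder {
    public static void main(String[] args) throws Exception {
        // Create an empty SOAP message; the Envelope and Body come for free.
        SOAPMessage message = MessageFactory.newInstance().createMessage();
        SOAPEnvelope envelope = message.getSOAPPart().getEnvelope();
        SOAPBody body = envelope.getBody();

        // Add an illustrative payload similar to Fig. 14.30.
        SOAPBodyElement card = body.addBodyElement(
                envelope.createName("ValidateCard", "m", "http://example.com/payments"));
        card.addChildElement("CardNumber").addTextNode("1234567..");
        card.addChildElement("CardType").addTextNode("Visa");
        card.addChildElement("Expires").addTextNode("02-10");

        message.saveChanges();
        message.writeTo(System.out);   // prints the full SOAP envelope as XML
    }
}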


14.4.3 The Buzzwords in Web Services

There are so many buzzwords and jargons in Web Services that one article of this nature would not be able to cover them sufficiently. However, we quickly mention some of the most important ones in Fig. 14.31.

- Web Services expose useful functionality to Web users through a standard Web protocol. In most cases, the protocol used is the Simple Object Access Protocol (SOAP). In our example, the XML messages that we saw are encapsulated inside this SOAP protocol header. In other words, clients and servers exchange SOAP messages, which are internally XML messages.
- Web Services provide a way to describe their interfaces in enough detail to allow a user to build a client application to talk to them. This description is usually provided in an XML document called a Web Services Description Language (WSDL) document. For example, when the provider of the credit card validation Web Service wants to make the Web Service available to others, it describes what the Web Service can do, and what data (e.g., parameters) it needs to perform its task, using a file called a WSDL file, which it can publish.
- Web Services are registered so that potential users can find them easily. This is done with Universal Description, Discovery and Integration (UDDI). In our case, the provider of the credit card validation Web Service would register its Web Service in the UDDI directory, where it would also keep the WSDL file, so that everyone knows what the Web Service can or cannot do.

Fig. 14.31 Web Services jargon

Web Services can be created and deployed by using a number of technologies. As ever, two implementations are most popular: the Java Web Services and the Microsoft Web Services. The Java Web Services implementations are from Sun itself, or from many vendors, most popularly from Apache (using a product called Apache Axis). Microsoft's .NET platform incorporates Web Services in such a manner that an ASP.NET page can be converted into a Web Service in a matter of minutes.

14.5 WEB SERVICES USING APACHE AXIS—A CASE STUDY

Apache Axis (http://ws.apache.org/axis) is an Open Source implementation of Web Services. In simple terms, Apache Axis allows a developer to develop Web Services (on the server side), which can be accessed by clients remotely. Apache Axis is available in both Java and C++ flavours, but the former is clearly more popular. Apache Axis allows the client to send a request to a Web Service. The request is nothing more than an XML message. The XML message itself is wrapped inside another envelope. This outer envelope is called the Simple Object Access Protocol (SOAP) message format. SOAP is a Web Services standard for message exchange between clients and servers. In return, the server processes the SOAP request and sends a SOAP response back to the client. The SOAP response contains the results of the processing done by the server.

14.5.1 Installing and Configuring Apache Axis

It is surprisingly easy to get Apache Axis up and running. One needs no great training for doing this, provided one is familiar with the basics of the Tomcat Servlet container. Following are the steps for installing Apache Axis.
1. Download and install Tomcat from http://jakarta.apache.org/tomcat, unless it is already installed.
2. Download Axis 1.3 from http://ws.apache.org/axis.
3. Unzip it. (Note: Axis runs as a Servlet inside Tomcat.)

4. From the above structure, copy the webapps\axis tree to the webapps directory of Tomcat 5.x.
5. Start Tomcat and give the URL in the browser as http://localhost:8080/axis. You should see a screen similar to the one in Fig. 14.32.

Fig. 14.32

Apache Axis—Home page

6. If the above screen is visible, it indicates that Apache Axis has been installed successfully. If this screen does not appear, revisit the steps mentioned earlier and see what has gone wrong. We may also want to consult http://ws.apache.org/axis/java/install.html.
7. Click on the 'Validation' link. We should see the screen shown in Fig. 14.33. If it does not appear, then one of the likely causes is that the activation.jar file is missing.
8. In the c:\tomcat\webapps\axis directory, there is a WEB-INF sub-directory, which contains the basic configuration information and sub-directories that contain, amongst other things, the Java libraries (in lib) and the Web Service classes to be deployed (in classes). At this stage, some environment variables need to be set up to specify things like the Axis home directory, the library directory, and the Java class path needed for the compilation of this example.

set CATALINA_HOME=c:\tomcat
set AXIS_HOME=%CATALINA_HOME%\webapps\axis

set AXIS_LIB=%AXIS_HOME%\WEB-INF\lib
set AXISCLASSPATH=%AXIS_LIB%\axis.jar;%AXIS_LIB%\commons-discovery.jar;%AXIS_LIB%\commons-logging.jar;%AXIS_LIB%\jaxrpc.jar;%AXIS_LIB%\saaj.jar;%AXIS_LIB%\log4j-1.2.8.jar;%AXIS_LIB%\xml-apis.jar;%AXIS_LIB%\xercesImpl.jar
set PATH=%CATALINA_HOME%\bin;%PATH%
set CLASSPATH=.;%AXISCLASSPATH%

Fig. 14.33

Axis success page

14.6 A WEB SERVICE WRITTEN IN APACHE AXIS USING JAVA

The CalculatorService will perform the simple operations add, subtract, multiply, and divide. The code for our simple Web Service is shown in Fig. 14.34. Now, we need to compile this code and place it in the correct place. The compiled version (i.e., CalculatorService.class) should be copied into the folder c:\tomcat\webapps\axis\WEB-INF\classes.


/**
 * CalculatorService.java
 */
public class CalculatorService {

    public Object getCalculate(Object opr1, Object opr2, Object opr) {
        System.out.println("Inside CalculatorService");
        Object result = null;

        // The operator arrives as a String ("+", "-", "*" or "/");
        // the operands arrive as Integer objects.
        if (opr.toString().equals("+"))
            result = new Integer(((Integer) opr1).intValue() + ((Integer) opr2).intValue());
        else if (opr.toString().equals("-"))
            result = new Integer(((Integer) opr1).intValue() - ((Integer) opr2).intValue());
        else if (opr.toString().equals("*"))
            result = new Integer(((Integer) opr1).intValue() * ((Integer) opr2).intValue());
        else if (opr.toString().equals("/"))
            result = new Integer(((Integer) opr1).intValue() / ((Integer) opr2).intValue());

        System.out.println("Completed CalculatorService");
        return result;
    }
}

Fig. 14.34 CalculatorService code
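Before deploying, it can be handy to sanity-check the class locally. The snippet below is a small, optional test (not part of the book's listing) that exercises getCalculate() directly; the class name is ours.

public class CalculatorServiceTest {
    public static void main(String[] args) {
        CalculatorService service = new CalculatorService();
        // The operands are autoboxed to Integer; the operator is a String.
        System.out.println(service.getCalculate(2, 3, "+"));    // prints 5
        System.out.println(service.getCalculate(10, 4, "-"));   // prints 6
    }
}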

14.7 CONFIGURING A WEB SERVICE USING AXIS

To deploy the Web Service, we need to configure an Axis Web Service Deployment Descriptor (WSDD) file to describe the service we wish to deploy, as shown in Fig. 14.35.

<deployment xmlns="http://xml.apache.org/axis/wsdd/"
            xmlns:java="http://xml.apache.org/axis/wsdd/providers/java">
  <service name="CalculatorService" provider="java:RPC">
    <parameter name="className" value="CalculatorService"/>
    <parameter name="allowedMethods" value="*"/>
  </service>
</deployment>

Fig. 14.35

Deployment descriptor for the Web Service

Let us name this file as deploy.wsdd and understand the contents of this file.

Deployment element This element provides the namespace details. There is nothing specific about this element. It needs to be specified exactly as shown.

Service element The service element has a name attribute, which signifies the name of our Web Service. The provider attribute, with the value java:RPC, indicates that the underlying communication between the client and the server will happen by using the Remote Procedure Call (RPC) mechanism.


Parameter sub-element The parameter sub-element specifies the class (i.e., CalculatorService) that the Web Service needs to load and the methods in that class to be called. Here, we specify that any public method on our class may be called (by indicating an asterisk there). Alternatively, the accessible methods could be restricted by using a space or comma separated list of available method names.

14.8 DEPLOYING A WEB SERVICE USING AXIS

The WSDD description created above is passed to the Axis administration client (the AdminClient class), which registers the Web Service with Axis based on these parameters and deploys it in the appropriate place, as follows.

java org.apache.axis.client.AdminClient deploy.wsdd

It should show the screen shown in Fig. 14.36.

Fig. 14.36

Deploying a Web Service

To ensure that the Web service has been installed correctly, we can check the Web services deployed within the Axis environment using a Web browser and navigating to the View the list of deployed Web services option on the Axis configuration page. This should display something similar to the screen shown in Fig. 14.37, if the Web Service has been deployed correctly.

Fig. 14.37 Confirmation of deployment of the Web Service
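Another quick check is to fetch the WSDL that Axis generates for every deployed service by appending ?wsdl to the service URL. The following is a minimal sketch, assuming the CalculatorService endpoint used in this chapter.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class WsdlCheck {
    public static void main(String[] args) throws Exception {
        URL wsdl = new URL("http://localhost:8080/axis/services/CalculatorService?wsdl");
        BufferedReader in = new BufferedReader(new InputStreamReader(wsdl.openStream()));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(line);   // dumps the generated WSDL document
        }
        in.close();
    }
}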


14.9 TESTING THE WEB SERVICE

Once the Web Service is deployed, it can be invoked by using the Axis toolkit from a Java program or a Web client. For our example, the code shown in Fig. 14.38 provides the client-side implementation.

/**
 * CalculatorClient.java
 */
import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.ServletConfig;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.xml.namespace.QName;
import org.apache.axis.client.Call;
import org.apache.axis.client.Service;

public class CalculatorClient extends HttpServlet {

    public void init(ServletConfig config) throws ServletException {
        super.init(config);
        System.out.println("CalculatorClient Initialized.");
    }

    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        processRequest(request, response);
    }

    protected void doPost(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        processRequest(request, response);
    }

    protected void processRequest(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        System.out.println("Processing Request.");
        response.setContentType("text/html");
        PrintWriter out = response.getWriter();

        String firstOper = request.getParameter("oper1");
        String secondOper = request.getParameter("oper2");
        String operator = request.getParameter("Oper");
        String result = null;

        out.println("<html><head><title>Axis</title></head><body>");
        out.println("<h2>Webservice-Axis Demo</h2>");

        // If the form has been submitted, invoke the Web Service and show the result.
        if (firstOper != null && secondOper != null && operator != null) {
            try {
                result = callAxisWebService(firstOper, secondOper, operator);
            } catch (Exception e) {
                e.printStackTrace();
                System.out.println("Exception in processing request.");
            }
            out.println("<h3>Response from Calculate Webservice</h3>");
            out.println("Operand 1: " + firstOper + "<br>");
            out.println("Operand 2: " + secondOper + "<br>");
            out.println("Operator: " + operator + "<br>");
            out.println("Result: " + result + "<br>");
        }

        // The input form (submits back to this same servlet).
        out.println("<h3>WebService-Axis Client</h3>");
        out.println("<form method='get'>");
        out.println("Enter Operand 1: <input type='text' name='oper1'><br>");
        out.println("Enter Operand 2: <input type='text' name='oper2'><br>");
        out.println("Select operation: <select name='Oper'>");
        out.println("<option>+</option><option>-</option>");
        out.println("<option>*</option><option>/</option>");
        out.println("</select><br>");
        out.println("<input type='submit' value='Calculate'>");
        out.println("</form>");
        out.println("</body></html>");

        if (out != null)
            out.close();
        System.out.println("Process Request Completed.");
    }

    private String callAxisWebService(String firstOper, String secondOper, String operator)
            throws Exception {
        Object ret = null;
        String endpointURL = "http://localhost:8080/axis/services/CalculatorService";
        try {
            Integer first = new Integer(firstOper);
            Integer second = new Integer(secondOper);
            String oper = new String(operator);

            // Service and Call are the Axis client-side API.
            Service service = new Service();
            Call call = (Call) service.createCall();
            call.setTargetEndpointAddress(new java.net.URL(endpointURL));
            call.setOperationName(new QName("CalculatorService", "getCalculate"));
            ret = call.invoke(new Object[] { first, second, oper });

            System.out.println("Object = " + ret.getClass().getName());
            System.out.println("Number Returned : " + ret.toString());
        } catch (Exception e) {
            e.printStackTrace();
        }
        return ret.toString();
    }
}

Fig. 14.38 Web Service client code

Here, we first create new Axis Service and Call objects, which store metadata about the service to invoke. We set the endpoint address URL to specify the actual location of the service. Here, our CalculatorService is exposed at the endpoint http://localhost:8080/axis/services/CalculatorService.

We then set the operation name, i.e., the method that we wish to invoke on the service (i.e., getCalculate()). We can now invoke the service by passing it any Java Object or an array of Java Objects.

Here, we pass it an array containing the two operands (as Integer objects) and the operator (as a String). We have deployed the CalculatorClient servlet on Tomcat under the context DemoWebserviceAxis, so we invoke it like any other Web application, using the following URL.

http://localhost:8080/DemoWebserviceAxis/CalculatorClient

This will produce the output shown in Fig. 14.39 (Part 1).

Fig. 14.39 Output of the Web Service—Part 1

Enter the requested values for Operand 1 and Operand 2, and select the operation you want to perform on the operands, as shown in Fig. 14.39 (Part 2). After clicking the Calculate button, the getCalculate() method of the CalculatorService is invoked by the callAxisWebService() code in CalculatorClient.java.


Fig. 14.39

Output of the Web Service—Part 2

The significance of each line is explained below.

// The endpoint URL is used for making the call
String endpointURL = "http://localhost:8080/axis/services/CalculatorService";

// The Service object is the starting point for accessing the Web Service
Service service = new Service();

// The Call object is used to actually invoke the Web Service
Call call = (Call) service.createCall();

// Sets the Call object's endpoint address
call.setTargetEndpointAddress(new java.net.URL(endpointURL));

// Sets the operation name associated with this Call object
call.setOperationName(new QName("CalculatorService", "getCalculate"));

// Invokes the call, passing in the operands and the operator.
// The return value is stored in an object named "ret".
ret = call.invoke(new Object[] { first, second, oper });
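The same call can also be made from a plain standalone program, without any servlet container on the client side. The following sketch simply reuses the calls explained above; the service must already be deployed at the endpoint shown, and the class name is ours.

import javax.xml.namespace.QName;
import org.apache.axis.client.Call;
import org.apache.axis.client.Service;

public class CalculatorStandaloneClient {
    public static void main(String[] args) throws Exception {
        String endpointURL = "http://localhost:8080/axis/services/CalculatorService";

        Service service = new Service();
        Call call = (Call) service.createCall();
        call.setTargetEndpointAddress(new java.net.URL(endpointURL));
        call.setOperationName(new QName("CalculatorService", "getCalculate"));

        // 12 / 3, passing the operands as Integers and the operator as a String.
        Object result = call.invoke(new Object[] { new Integer(12), new Integer(3), "/" });
        System.out.println("12 / 3 = " + result);
    }
}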

The response from the CalculatorService is shown in Fig. 14.39 (Part 3).


Fig. 14.39 Output of the Web Service—Part 3

14.10 CLEANING UP AND UN-DEPLOYING

To un-deploy the Web Service, we first need to create a WSDD undeployment file corresponding to the previous deployment file, as shown in Fig. 14.40.

<undeployment xmlns="http://xml.apache.org/axis/wsdd/">
  <service name="CalculatorService"/>
</undeployment>

Fig. 14.40

Undeploying a Web Service

Be careful to check the spelling of the service name. If this is incorrect, then the service will not be undeployed, and no corresponding error will be provided to indicate this. The un-deployment file is passed to the Axis AdminClient class for processing, as shown in Fig. 14.41:

java org.apache.axis.client.AdminClient undeploy.wsdd

Again, we can verify that the service has been correctly un-deployed by checking the list of deployed services using a Web browser, as described earlier.


Fig. 14.41 Undeploying a Web Service

14.11 ENABLING THE SOAP MONITOR

The SOAP Monitor allows the monitoring of SOAP requests and responses via a Web browser with Java plug-in 1.3 or higher. By default, the SOAP Monitor is not enabled. The basic steps for enabling it are compiling the SOAP Monitor Java applet, deploying the SOAP Monitor Web Service, and adding request and response flow definitions for each monitored Web Service. In more detail:
1. Compile SOAPMonitorApplet.java and copy all resulting class files into the folder c:\tomcat\webapps\axis.
2. Deploy the SOAPMonitorService Web Service with the admin client and the deploy-monitor.wsdd file (shown below). Go to the directory where deploy-monitor.wsdd is located and execute the command below. The commands assume that /axis is the intended Web application and that it is available on port 8080.

java org.apache.axis.client.AdminClient deploy-monitor.wsdd

SOAPMonitorService Deployment Descriptor (deploy-monitor.wsdd)

<deployment xmlns="http://xml.apache.org/axis/wsdd/"
            xmlns:java="http://xml.apache.org/axis/wsdd/providers/java">
  <parameter name="wsdlURL" value="/axis/SOAPMonitorService-impl.wsdl"/>
  <parameter name="namespace" value="http://tempuri.org/wsdl/2001/12/SOAPMonitorService-impl.wsdl"/>
  <parameter name="serviceName" value="SOAPMonitorService"/>
  <parameter name="portName" value="Demo"/>
  <parameter name="port" value="5001"/>
  <service name="SOAPMonitorService" provider="java:RPC">
    <parameter name="allowedMethods" value="publishMessage"/>
    <parameter name="className" value="org.apache.axis.monitor.SOAPMonitorService"/>
    <parameter name="scope" value="Application"/>
  </service>
</deployment>

3. For each Web Service that is to be monitored, add request and response flow definitions to the service's deployment descriptor and deploy (or redeploy) the service. The requestFlow and responseFlow definitions follow the start tag of the <service> element. If a service is already deployed, undeploy it and deploy it with the modified deployment descriptor. An example is shown below:

...
<service name="xmltoday-delayed-quotes" provider="java:RPC">
  <requestFlow>
    ...

4. With a Web browser, go to http[s]://host[:port][/webapp]/SOAPMonitor (e.g., http://localhost:8080/axis/SOAPMonitor), substituting the correct values for your Web application. This will show the SOAP Monitor applet for viewing service requests and responses. Any requests to services that have been configured and deployed correctly should show up in the applet. The screen in Fig. 14.42 shows the SOAP messages monitored by the SOAP Monitor.

Fig. 14.42

SOAPMonitor screenshot


SUMMARY

- The term middleware is very important in all enterprise applications of today and tomorrow.
- Middleware allows different applications to communicate with each other, by acting as the plumbing layer.
- Middleware can be used to bridge gaps due to hardware, software, or design/architecture.
- Earlier middleware technologies were either too generic (CORBA) or too specific (DCOM, RMI).
- Modern middleware applications are based on XML as the messaging standard and Web Services as the platform for hosting and communication.
- CORBA allows different components to communicate with each other remotely in a distributed environment.
- CORBA uses the concept of an Interface Definition Language (IDL) to allow the client and the server to communicate with each other remotely. IDL is platform/language neutral.
- The CORBA client uses IDL to prepare and send a request to the CORBA server. Hence, client and server implementations may be in different languages.
- The CORBA infrastructure relays calls between the client and the server using a technique called marshalling.
- CORBA was too ambitious and too generic, and hence it has failed.
- DCOM (Distributed Component Object Model) was Microsoft's version of distributed component technology.
- DCOM is a middleware technology that works only on the Microsoft Windows family of operating systems.
- In concept, DCOM works similar to CORBA.
- Java's version of component-based middleware is the Remote Method Invocation (RMI) approach.
- RMI allows a distributed application client and server to communicate with each other over a network.
- RMI uses a protocol called JRMP for the actual communication, just as CORBA uses IIOP.
- Web Services are a newer concept that allows middleware to be completely platform neutral, and also facilitates communication between the client and the server using a text format (unlike CORBA, etc.).
- A Web Services client sends a message to a server using an XML-based messaging protocol, called the Simple Object Access Protocol (SOAP).
- SOAP is an XML-based standard with a specific message structure.
- The Web Services Description Language (WSDL) specification is used to describe what a Web Service looks like, how it can be called, and what services it provides.
- The Universal Description, Discovery and Integration (UDDI) service allows servers to register their Web Services and clients to locate them.


REVIEW QUESTIONS

Multiple-choice Questions

1. ______ holds all the components together.
   (a) Object (b) Class (c) Middleware (d) None of the above

2. CORBA is a ______.
   (a) product (b) standard (c) product and a standard (d) none of the above
3. The underlying network protocol used in CORBA is ______.
   (a) DCOM (b) RPC (c) JRMP (d) IIOP
4. ______ holds all the objects in a distributed environment together.
   (a) IDL (b) ORB (c) Directory (d) Database
5. To allow components developed in different programming languages to work together, the ______ language is used.
   (a) IDL (b) COM (c) CORBA (d) RMI
6. IIOP runs ______ TCP/IP.
   (a) as a replacement for (b) below (c) on top of (d) as a part of
7. The client in DCOM is called as ______.
   (a) proxy (b) skeleton (c) stub (d) none of the above
8. Web service messages are exchanged in the ______ format.
   (a) SOAP (b) UDDI (c) WSDL (d) UDDI

Detailed Questions

1. Elaborate on the terms IDL, interface, and implementation.
2. Describe a typical operation involving a middleware such as Web Services.
3. What are stub, proxy, and skeleton?
4. What is ORB? Why is it important?
5. How are CORBA and IIOP related?
6. How is RMI different from CORBA?
7. Is COM the same as DCOM? If not, what are the differences between them?
8. What is CLSID in DCOM?
9. Why is DCOM dependent on Microsoft technology?

Exercises

1. Find out the differences between Java and .NET Web Services at the implementation level.
2. Find out the key differences between CORBA, COM, and RMI at the implementation level.
3. Study a practical implementation of Web Services and see what is involved when developing a Web Services application.
4. Investigate why Microsoft has stopped supporting session-aware components in COM+.
5. Trace the development of middleware technologies since the beginning of distributed computing.


Chapter 15

Wireless Internet

INTRODUCTION

The Internet was created with one simple assumption in mind, that the clients, routers, and servers would be stationary. The whole concept of IP addressing, TCP connection management, and the overall delivery of datagrams between the sender and the receiver is based on this philosophy. However, as we know by now, this is not quite true! Clients can and do move freely (since a mobile phone can be an Internet client now), and even routers do (for example, when an airplane moves). Hence, the existing routing technology is inadequate when dealing with the mobile world, and we need changes in technology, such as mobile IP and mobile TCP, to deal with this situation. In the late 1990s, wireless computing stormed the world of the Internet. Until that time, the Internet was accessible only if one had a PC. However, that changed in 1997 with the arrival of a new set of standards for wireless Internet access through wireless handheld devices and Personal Digital Assistants (PDAs). The Wireless Application Protocol (WAP) had arrived. In simple terms, WAP is a communication protocol that enables wireless mobile devices to have access to the Internet.

15.1 MOBILE IP

15.1.1 Understanding Mobile IP

The best way to understand mobile IP is to consider what happens when we move house. When we relocate to a new house, we still want to receive mail addressed to us. Merely moving to a new house should not mean that we lose mail delivered to our old address. The mobile Internet is faced with a similar challenge. When a mobile station (say, a mobile phone with Internet access) moves from one place to another, it still wants to continue accessing the Internet as if nothing has happened. However, from the Internet's point of view, a lot has actually happened! Why? This is because the entire philosophy of routing datagrams to hosts is based on the principle of network address and host address. We know that every host belongs to some network address (i.e., the address of the router for that network). Now, if this host suddenly moves, where should datagrams be routed? This problem is illustrated in Fig. 15.1.


Fig. 15.1

The need for mobile IP

As we can see in the diagram, at time T2, the mobile host has moved out. Hence, router A cannot reach it. This also means that datagrams intended for the mobile host cannot be delivered by router A to the mobile host. What should we do to tackle this situation now? Let us examine the ideas illustrated in Fig. 15.2.

- Postal mail is sent to us by placing a letter (packet data) in an envelope addressed to us (IP header).
- The letter (datagram) arrives at our local post office (router) and is delivered to our home address (host address).
- When we move to a different house (new router), we inform the local post office (home agent) to forward letters (IP datagrams) to our new address (care-of address).
- The letter is then forwarded to the post office that services our new location (foreign agent) and is delivered to our new address (care-of address).

Fig. 15.2

Outline of mobile IP

Let us understand this in more detail now.

15.1.2 Mobile IP and Addressing

We know that the IP addressing philosophy is based on the assumption that a host is stationary. But when it moves, the mobile host needs a new IP address; IP addressing cannot work without such a scheme. Therefore, mobile IP has devised an elegant solution. It works on the basis that every mobile host can potentially have two IP addresses. The usual IP address that the mobile host has is called its home address. In addition, whenever it moves, it obtains a temporary address called a care-of address. Figure 15.3 shows the concept.


Fig. 15.3 Home address and Care-of address

As we can see, the usual IP address (the home address) of the mobile host is 131.5.24.8. This is true whenever the mobile host is in its home network (with address 131.5.0.0). Currently, the mobile host has moved out of its home network, and has roamed into a network with address 14.0.0.0. Here, it has acquired a temporary care-of address of 14.13.16.9. The way this works is as follows. When the mobile host is roaming and is attached to a foreign network, the home agent (router) receives all the datagrams and forwards them to the foreign agent (router) in the current location. Whenever the mobile host moves, it registers its care-of address with its home agent. The original IP datagram is encapsulated inside another datagram, with the destination IP address as the care-of address, and forwarded there. This process is called tunneling. If the mobile host acts as its own foreign agent, its care-of address is called a co-located care-of address. In this case, the mobile host can move to any network without worrying about the availability of a foreign agent. This means it needs to have extra software to act as its own foreign agent. Let us now understand how mobile IP communication happens, based on Fig. 15.4. The steps illustrated in the diagram can be explained as follows.
1. Server X transmits an IP datagram destined for mobile host A, with A's home address in the IP header. The IP datagram is routed to A's home network.
2. At the home network, the IP datagram is intercepted by the home agent. It tunnels the entire datagram inside another IP datagram, which has A's care-of address as the destination address. It routes this IP datagram to the foreign agent.
3. The foreign agent strips off the outer IP header, obtains the original IP datagram, and forwards it to A.
4. A sends a response, with X as the destination IP address. This first reaches the foreign agent.
5. The foreign agent forwards the IP datagram to X, just the way routing happens on the Internet.


Fig. 15.4 Mobile IP based communication

15.1.3 Discovery, Registration, and Tunnelling

For mobile IP to work, three things are essential.

Discovery A mobile host uses a discovery procedure to identify prospective home agents and foreign agents.

Registration A mobile host uses a registration procedure to inform its home agent of its care-of address.

Tunnelling This is used to forward IP datagrams from a home address to a care-of address.

From a protocol point of view, the overall view looks as shown in Fig. 15.5.

Fig. 15.5 Mobile IP protocol As we can see, the mobile IP steps make use of the UDP, ICMP and IP protocols. Let us understand these steps now.


Discovery The mobile host is responsible for an ongoing discovery process. It must determine if it is attached to:
- its home network (in which case, IP datagrams can be received without forwarding), or
- a foreign network (in which case, a handoff is required at the physical layer).

This is a continuous process of discovery. The discovery process is built over the ICMP router discovery and advertisement procedure. A router can detect whether a new mobile host has entered the network. The mobile host can determine if it is in a foreign network. Routers and agents periodically issue router advertisement ICMP messages. The receiving host compares the network portion of the router's IP address with the network portion of its own IP address allocated by the home network. Accordingly, it knows whether it is in a foreign network or not. If a mobile host needs a care-of address without waiting for a router advertisement, it can broadcast a solicitation request.

Registration The registration process consists of four steps.
1. The mobile host must register itself with the foreign agent.
2. The mobile host must register itself with its home agent. This is normally done by the foreign agent on behalf of the mobile host.
3. The mobile host must renew its registration if it has expired.
4. The mobile host must cancel its registration when it returns home.

The actual flow of information is shown in Fig. 15.6.

Fig. 15.6

Registration concept

Tunnelling Here, an IP-within-IP encapsulation mechanism is used. The home agent adds a new IP header, called the tunnel header. The tunnel header contains the mobile host's care-of IP address as the tunnel destination IP address. The tunnel source IP address is the home agent's IP address. A small sketch of this idea follows.
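The following is a toy Java model of this encapsulation. It is not a real IP implementation; the class and field names, the home agent address, and the sender's address are illustrative assumptions (the home and care-of addresses match Fig. 15.3). The outer datagram is addressed to the care-of address and simply carries the original datagram as its payload.

// A deliberately simplified model of IP-within-IP tunnelling.
class Datagram {
    String sourceIp;
    String destinationIp;
    Object payload;  // application data, or another (inner) Datagram

    Datagram(String sourceIp, String destinationIp, Object payload) {
        this.sourceIp = sourceIp;
        this.destinationIp = destinationIp;
        this.payload = payload;
    }
}

class HomeAgent {
    String homeAgentIp = "131.5.0.1";     // assumed address of the home agent
    String careOfAddress = "14.13.16.9";  // registered by the roaming mobile host

    // Tunnel header: source = home agent, destination = care-of address.
    Datagram tunnel(Datagram original) {
        return new Datagram(homeAgentIp, careOfAddress, original);
    }
}

public class TunnelDemo {
    public static void main(String[] args) {
        Datagram original = new Datagram("201.10.5.7", "131.5.24.8", "Hello, mobile host");
        Datagram tunnelled = new HomeAgent().tunnel(original);
        System.out.println("Outer destination: " + tunnelled.destinationIp);  // 14.13.16.9
        Datagram inner = (Datagram) tunnelled.payload;
        System.out.println("Inner destination: " + inner.destinationIp);      // 131.5.24.8
    }
}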


15.2 MOBILE TCP

Before examining mobile TCP, let us understand the principles of traditional TCP when it has to deal with congestion problems. Traditional TCP uses the sliding window protocol, based on the following concepts. Each sender maintains two windows, one that the receiver has granted to it, and another called the congestion window. Both windows reflect the number of bytes the sender can send; the minimum of the two is chosen. For example:
- If the receiver says "Send 8 KB", but the sender knows that bursts of more than 4 KB would clog the network, it sends only 4 KB.
- If the receiver says "Send 8 KB", and the sender knows that bursts of up to 32 KB would travel over the network safely, it sends the full 8 KB.

Here, the concept of slow start also emerges. This means that when the TCP connection is established, the sender initializes the congestion window to the size of the maximum segment possible and sends one maximum segment. If the sender receives an acknowledgement before the time-out, it doubles the congestion window to two segments, attempts to send two segments, and so on (a small sketch of this arithmetic follows this list). This is not slow at all, but is actually exponential! In theory, the transport layer should be independent of the lower layer protocols. In practice, it matters. Most TCP implementations assume that timeouts happen because of congestion, not because of lost datagrams. Therefore, the cure is to slow down transmissions. In wireless transmissions, on the other hand, datagram loss is quite common. Care should be taken to resend lost datagrams as quickly as possible; slowing down makes things worse! Thus, we can summarize the problem of handling lost datagrams.
- In wired networks, slow down.
- In wireless networks, try harder, and faster.
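A small sketch of the window arithmetic and slow-start growth described above (sizes in KB; the numbers match the examples given earlier):

public class SlowStartSketch {
    public static void main(String[] args) {
        int receiverWindow = 8;    // the receiver has granted "send 8 KB"
        int congestionWindow = 4;  // the sender's own estimate of what the network can absorb

        // The sender transmits the minimum of the two windows.
        int sendWindow = Math.min(receiverWindow, congestionWindow);
        System.out.println("Bytes sent this round: " + sendWindow + " KB");  // 4 KB

        // Slow start: the congestion window doubles for every acknowledged round,
        // so the growth is exponential rather than "slow".
        for (int round = 1; round <= 4; round++) {
            System.out.println("Round " + round + ": congestion window = " + congestionWindow + " KB");
            congestionWindow *= 2;  // 4, 8, 16, 32 ...
        }
    }
}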

We will now discuss some of the solutions that emerge.

In indirect TCP, we recognize that what makes matters worse is that some part of the channel may be wired, and some may be wireless. For example, the first 1000 km may be wired, and the last 1 km may be wireless. Hence, the solution is to create two TCP connections: (a) sender to base station, and (b) base station to receiver. The base station simply copies packets between the two connections in both directions. Timeouts can be handled independently.

If we use snooping TCP, we need to make several small modifications to the network layer in the base station. We need to add a snooping agent that observes and caches TCP segments going out to the mobile host, and acknowledgements coming back from it, and which uses a short timer to track this. If it sees a TCP segment going out to the mobile host but does not see an acknowledgement, it retransmits the segment, without telling the original sender.

In selective retransmission, when a base station notices a gap in the inbound sequence numbers of the TCP segments, it generates a request for selective repeat of the missing bytes by using a TCP option.

In transactional TCP, we reduce the number of steps in TCP connection management to just three. The resulting interaction between a client and a server is depicted in Fig. 15.7. Contrast this with traditional TCP, which needs as many as nine interactions, as shown in Table 15.1.


Fig. 15.7 Transactional TCP

Table 15.1 Traditional TCP

1. C-S  SYN
2. S-C  SYN + 1, ACK (SYN)
3. C-S  ACK (SYN + 1)
4. C-S  Request
5. C-S  FIN
6. S-C  ACK (Request + FIN)
7. S-C  Reply
8. S-C  FIN
9. C-S  ACK (FIN)

15.3 GENERAL PACKET RADIO SERVICE (GPRS)

15.3.1 What is GPRS?

GPRS stands for General Packet Radio Service. It is a technology that allows GSM mobile phones to send or receive data traffic, instead of the usual voice traffic. When mobile phone technology became more popular, it became clear that data services would be the next major revenue generator, rather than voice services. This led to the development and popularity of GPRS. We need to differentiate between voice traffic and data traffic, because voice calls are long and continuous in nature, but carry little traffic. On the other hand, data traffic is of shorter duration and bursty; it carries a lot of data and then suddenly no data at all. Consequently, the GSM technology had to be divided into two categories—one for voice traffic, and the other for data traffic. The category of data traffic is what we call GPRS. GPRS, therefore, is packet-switched, and not circuit-switched. The user's mobile number is mapped to a unique IP address, so that the mobile phone can double up as an Internet-enabled device. A dual-mode mobile phone allows the user to speak/hear in a voice call and browse the Internet at the same time. Single-mode mobile phones support only one of these activities.


15.3.2 GPRS Architecture

The conceptual architecture of a GPRS system is shown in Fig. 15.8.

Fig. 15.8

GPRS architecture

In the diagram:
- BS stands for the Base Station of the mobile service provider providing GSM/GPRS service.
- SGSN stands for Serving GPRS Support Node, which helps the base station connect to the Internet, as explained subsequently.
- GGSN stands for Gateway GPRS Support Node, which is the gateway of the mobile service provider to the Internet, as explained subsequently.

The way the whole thing works is as follows. Whenever the user wants to send a request to an Internet server (say a Web server), the user's mobile device prepares the IP datagram as usual, and puts the address of the Internet server (say A) as the destination address. The SGSN receives this by default, and encapsulates it inside a new IP datagram. This outer IP datagram specifies the IP address of the GGSN as the destination IP address (say B), and the SGSN sends it to the GGSN. This is because the GGSN is the gateway of the mobile GPRS network to the Internet. Once the GGSN receives this datagram, it removes the outer header, realizes that it should be routed to an Internet server with address A, and sends it like any other IP datagram to Internet server A. A response from Internet server A to the original mobile device comes back in the reverse direction. This is illustrated in Fig. 15.9.

Following are the advantages of GPRS.
1. GPRS provides data access with traditional GSM mobile phones, at a nominal extra charge, which is a great convenience.
2. Using GPRS is not very different from using the traditional mobile phone or the traditional Internet.

Following are the drawbacks of GPRS.
1. GPRS Internet access is slow as compared to the traditional Internet (data rates of up to 28.8 kbps are supported).
2. Generally, GPRS data traffic has a lower priority than voice traffic, and hence may provide lower throughput.


Fig. 15.9 How GPRS works

15.4 WIRELESS APPLICATION PROTOCOL (WAP)

15.4.1 WAP Architecture

Let us start by comparing the basic WAP architecture with the Internet architecture. Both architectures are based on the principle of client-server computing. However, the difference is in the number of entities involved. As Fig. 15.10 shows, in case of the simple Internet architecture, usually we have just two parties interacting with each other, the client and the server.

Fig. 15.10 The Internet architecture of a Web browser and a Web server However, in case of the WAP architecture, we have an additional level of interface: the WAP gateway, which stands between the client and the server. Simplistically, the job of the WAP gateway is to translate client requests to the server from WAP to HTTP, and on the way back from the server to the client, from HTTP to WAP, as shown in Fig. 15.11. The WAP requests first originate from the mobile device (usually a mobile phone), which travel to the network carrier’s base station (shown as a tower), and from there, they are relayed onto the WAP gateway where the conversion from WAP to HTTP takes place. The WAP gateway then interacts with the Web server (also called as origin server) as if it is a Web browser, i.e., it uses HTTP protocol for interacting with the Web server. On return, the Web server sends a HTTP response to the WAP gateway, where it is converted into a WAP response, which first goes to the base station, and from there on, to the mobile device. We shall discuss it in more detail in the next section.


Fig. 15.11

Interaction of a mobile phone with the Internet

15.4.2 WAP Gateway

The WAP gateway is the device that logically sits between the client (called the WAP device) and the origin server. Several new terms have been introduced here, so let us understand them. A WAP device is any mobile device, such as a mobile phone or a PDA, that can be used to access mobile computing services. The whole idea is that the device can be any mobile device as long as it supports WAP. An origin server is any Web server on the Internet. This is just another term for the same concept. The WAP gateway enables a WAP device to communicate with an origin server. In the normal Internet architecture, both the client (a Web browser) and the server (a Web server) understand the HTTP protocol. Therefore, no such gateway is required between them. However, in case of WAP, the client (a WAP device) runs WAP as the communications protocol, not HTTP. The server (the origin server) continues to run HTTP as before. Therefore, translation is required between the two. This is precisely what a WAP gateway does. It acts as an interpreter that performs two functions.
1. It takes WAP requests sent by the WAP device and translates them to HTTP requests for forwarding them on to the origin server.
2. It takes HTTP responses sent by the origin server and translates them to WAP responses for forwarding them on to the WAP device.
This is shown in Fig. 15.12. Therefore, we can describe a simple interaction between a mobile user and the Internet with the help of the following steps.
1. The user presses a button, selects an option (which internally picks up the buried URL associated with that option), or explicitly enters a URL on the mobile device. This is similar to the way an Internet user makes a request for a Web page. This is received by the WAP browser running in the mobile device. A WAP browser is a software program running on the WAP device that interprets WAP content, similar to the way a Web browser interprets HTML content. We shall subsequently see how to create WAP content. The WAP browser is responsible for sending requests from the WAP device to the WAP gateway, receiving responses from the WAP gateway, and interpreting them (i.e., displaying them on the screen of the mobile device).
2. The WAP browser sends the user's request, which travels via the wireless network set up by the network operator to the WAP gateway. This is a WAP request, which means that the request is in the

form of WAP commands. Note that this is in contrast to the normal interaction between a Web browser and a Web server, which starts with an HTTP request.

Fig. 15.12

The way WAP gateway works

3. The WAP gateway receives the WAP request, translates it to the equivalent HTTP request, and forwards it to the origin server.
4. The origin server receives the HTTP request from the WAP gateway. This request could be for obtaining a static HTML page, or for executing a dynamic server-side application written in languages such as ASP, JSP, servlets, or CGI—just like normal Internet HTTP requests. In either case, the origin server takes an appropriate action, the final result of which is an HTML page. However, a WAP browser is not created with the intention of interpreting HTML. HTML has now grown into a highly complex language that provides a number of features that are not appropriate for mobile devices. Therefore, a special program now converts the HTML output to a language called the Wireless Markup Language (WML). WML is a highly optimized language that was invented keeping in mind all the shortcomings of mobile devices, and it suits these devices very well. Of course, rather than first producing HTML output and then translating it into WML, some origin servers now directly produce WML output, bypassing the translation phase. We shall also discuss this possibility, and what the WAP gateway does in that case, later. The point is that the outcome of this process is some output that conforms to the WML standards. The origin server then encapsulates these WML contents inside an HTTP response, and sends it back to the WAP gateway.
5. The WAP gateway receives the HTTP response (which has the WML code encapsulated within it) from the origin server. It now translates this HTTP response into a WAP response, and sends it back to the mobile device. The WAP response is a representation of the HTTP response that a mobile device can understand. The WML inside remains as it was.
6. The mobile device now receives the WAP response along with the WML code from the WAP gateway, and hands it over to the WAP browser running on it. The WAP browser takes the WAP response and interprets it. The result of this is some display on the screen of the mobile device.

This should give us a good idea about the interaction of a mobile user with the Internet. Of course, it is simplistic, and many finer details are not described here. We shall elaborate on those throughout the rest of our discussion. As we have mentioned, just as HTML is the language used to write HTTP contents in case of the Internet, WML is the language that WAP speaks! The reason behind not using HTML and inventing a new language was the same as before. HTML contains many features that are unnecessary for mobile devices, which makes HTML bulky. If the browser of a mobile device had to have an HTML interpreter like a normal Web browser, it would demand too much processing power and memory. Instead, WML, which is a lightweight language, puts significantly fewer demands on the browser hardware of the mobile phone. An obvious question now would be: in case of the Internet, static HTML is no longer the norm; you also have client-side scripts in the form of scripting languages such as JavaScript and VBScript. What about WML? In case of WML, a similar concept has been developed. Client-side scripts are possible here as well. These scripts, however, are not written in the usual scripting languages such as JavaScript and VBScript. Instead, a new scripting language called Wireless Markup Language Script (WMLScript) was developed, which is conceptually similar to JavaScript and VBScript. Functionally, it provides interactivity to WAP clients. We shall study WML and WMLScript later. We have also mentioned that a mobile device contains a different browser, called a WAP browser. Therefore, a WAP device is different from a normal Web client. Let us discuss its architecture now.

15.4.3 WAP Device

A WAP device, or more commonly, a WAP client, allows the user of a mobile device (such as a mobile phone) to access the Internet. The WAP specification mandates that, to be WAP-compliant, a mobile device must have three pieces of software running in it. These software programs are: the WAE agent, the WTA agent, and the WAP stack. Let us have a look at the conceptual view of a WAP client before we discuss these three software programs. This is shown in Fig. 15.13.

Fig. 15.13

The organization of software inside a WAP client

The WAP client is classified into three main pieces of software, as follows.

WAE agent The Wireless Application Environment agent (WAE agent) is a micro browser that runs on a WAP client and is also called as a WAP browser. The main job of a WAE agent is to interpret WML contents to display the corresponding output on the screen of the WAP device. Thus, it functions pretty similar to the way a Web browser works. The WAE agent receives compiled WML, WMLScript and images, and renders them on the screen of the mobile device. It also manages the interaction between a and an application, similar to what a Web browser does.


WTA agent The Wireless Telephony Applications agent (WTA agent) receives compiled WTA files from the network operator and executes them on the mobile device. The WTA files encapsulate the services normally required by a mobile phone user, such as number dialing, call answering, phonebook organization, message management, and location information services. Note that WTA is not a requirement of only WAP-enabled mobile phones. Any mobile phone would need similar services and would employ WTA in some form.

WAP stack The WAP stack allows a mobile device to connect to a WAP gateway with the help of the WAP protocols. This is conceptually very similar to the way a Web browser runs on top of the TCP/IP stack of protocols for interacting with Web servers.

15.4.4 Internal Architecture of a WAP Gateway

Having looked at the basic concepts of a WAP gateway and a WAP client, let us now examine the internal architecture of a WAP gateway. Quite clearly, if a WAP gateway has to interact with a WAP device using WAP as the communications protocol, it must also have a WAP stack running. Interestingly, on the other hand, a WAP gateway also has to interact with traditional Web servers using HTTP. This means that it also has to have a traditional TCP/IP stack running. As a result, a WAP gateway runs the WAP stack on one side (for interacting with a mobile device) and the TCP/IP stack on the other (for interacting with a Web server), as shown in Fig. 15.14.

Fig. 15.14 Internal view of the WAP gateway

The WAP gateway acts as an interpreter between a mobile device and the origin server. Apart from this, the WAP gateway performs one more important task. In case of the wired Internet, the Web server sends HTML contents to the browser, which interprets them and produces some display on the user's screen. We had said that in case of WAP, the browser receives WML and WMLScript. Although conceptually correct, this is not entirely accurate. The bandwidth issues of wireless networks are so severe that sending WML and WMLScript to the mobile devices via them is also not a trivial issue. Instead, the WAP gateway is programmed to compile these, and send the compiled binary code (which is significantly smaller than the original WML code) to the micro browser of the mobile device. The micro browser then interprets this compiled code and produces the appropriate output on the screen of the mobile phone. This is similar to the way a Java applet is first compiled to bytecode (the compiled binary form of the original code) and the bytecode is sent to the Web browser. This is shown in Fig. 15.15.


Fig. 15.15 Logical flow of information from and to the WAP gateway

15.4.5 The WAP Stack It is now time to examine the WAP stack in detail. More specifically, we shall attempt to map the WAP stack on to the TCP/IP stack of the Internet, so as to get a feel of the similarities and differences between them. However, we must note that the WAP stack is based more on the OSI model than on the TCP/IP model. Figure 15.16 shows the WAP stack.

Fig. 15.16 WAP stack

The WAP stack consists of five protocol layers, which are:

Application Layer The application layer is also called as the Wireless Application Environment (WAE). This layer provides an environment for wireless application development, similar to the application layer of the TCP/IP stack.

Session Layer Also called as the Wireless Session Protocol (WSP), the session layer provides methods for allowing a client-server interaction in the form of sessions between a mobile device and the WAP gateway. This is conceptually similar to the session layer of the OSI model.


Transaction Layer The transaction layer is also called as the Wireless Transaction Protocol (WTP) in WAP terminology. It provides methods for performing transactions with the desired degree of reliability. Such a layer is missing from the TCP/IP and OSI models.

Security Layer The security layer in the WAP stack is also called as the Wireless Transport Layer Security (WTLS) protocol. It is an optional layer that, when present, provides features such as authentication, privacy and secure connections, as required by many modern e-commerce and m-commerce applications.

Transport Layer The transport layer of the WAP stack is also called as the Wireless Datagram Protocol (WDP), and it deals with the issues of transporting data packets between the mobile device and the WAP gateway, similar to the way TCP and UDP work.

Let us now discuss each of the five layers of the WAP stack in detail. Before we do so, let us view the WAP stack vis-à-vis its conceptual equivalents in the TCP/IP stack, to identify conceptual similarities and differences, as shown in Fig. 15.17. We shall also refer to the OSI model, whenever appropriate, because the TCP/IP stack (also called as the TCP/IP model) is based on the OSI model.

Fig. 15.17 WAP stack and its equivalents in TCP/IP

Keeping this comparison in mind, let us now study the various layers in the WAP stack in detail.

15.4.6 The Application Layer—Wireless Application Environment (WAE) The application layer of the WAP stack provides all the features that are required for wireless application development and execution. This layer specifies the text and image formats that the mobile device must comply with. The main issue that the application layer deals with is the presentation of the Web contents in a specific form. As we have noted, there are two main standards that are supported by this layer: WML and WMLScript. Therefore, to put it simply, the application layer specifies what WML and WMLScript can and cannot contain. Let us discuss WML and WMLScript now.


Wireless Markup Language (WML)

WML Basics The original intention behind the creation of the Hyper Text Markup Language (HTML) was to specify how to display the contents of pure text-based Web pages using a Web browser. However, the demands on HTML grew so rapidly that within a few years, text-only interfaces were replaced by images, audio and video. HTML is now a language that supports all these multimedia features. However, due to the limitations discussed earlier, the WAP-enabled mobile devices cannot use HTML in its current form—it is too vast for them. As a result, the Wireless Markup Language (WML) was devised with the specific aim of presenting content suitable for mobile devices, which have great limitations. WML mainly specifies how to display text contents on a mobile device. Additionally, it has limited support for images. Syntactically, it looks pretty similar to HTML as well as XML. However, technically, its basic type is defined as XML, which means that WML can be understood by any device that understands XML. WML uses tags, just as HTML does, to demarcate blocks of text and to specify how to display them.

Every WML document has the extension .wml and is called as a deck, which is composed of one or more cards. When a user accesses a WAP site, the site sends back a deck to the user's mobile device. The micro browser inside the mobile device interprets the first card in the deck, and displays its contents on the user's screen. Thereafter, based on the user's action, another card in the same deck is usually shown. Of course, if the user takes an entirely different action that is in no way related to the current scope of his actions, the deck may be discarded, and a new deck may be requested. For instance, suppose the user initially sends a request for viewing his account details to his bank's WAP site. This may result in the user's mobile device receiving a deck of cards related to his account details. Now, if he suddenly decides to read some political news, the current deck is useless, and a new deck related to the political news, from a different WAP site, is required. The most common WML features are summarized in Table 15.2.

Table 15.2 A summary of WML features

Text: Like HTML, WML supports text-enhancement features such as displaying the text in boldface, as underlined, or in italics, etc.
Images: WML supports a new image format called as Wireless Bitmap (WBMP). These are typically small images that are only in black and white, and are optimized considering the limitations of the wireless network and mobile devices.
User input: Similar to what forms offer in HTML, WML also has the concept of user input. The user can enter text, select one of the displayed options, click on a hyperlink, or go to the previous or next card in the deck, etc.
Variables: WML supports the concept of variables, which can be used for a variety of purposes such as holding hidden information, and accepting, validating and manipulating user input.

WML Example Figure 15.18 shows a sample WML document that simply displays the text Welcome to WML on the user's micro browser, when it is executed. As you can see, it looks very similar to an HTML document. However, unlike HTML, the WML document starts with a few header lines, before the actual contents of the WML page start.

<?xml version="1.0"?>
<!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.1//EN"
   "http://www.wapforum.org/DTD/wml_1.1.xml">
<wml>
   <card id="card1" title="First card">
      <p>
         Welcome to WML
      </p>
   </card>
</wml>

Fig. 15.18 WML document that displays Welcome to WML

The WML code in a WAP simulator provided by Nokia, and its corresponding output as shown on a Nokia phone, appear in Fig. 15.19. This is just to get a feel of how the output on a WAP phone looks.

Fig. 15.19 The WML code and its corresponding output as seen in a simulator

Let us examine the WML document shown, line by line.

<?xml version="1.0"?>

This line indicates that the WML document conforms to XML standards. The XML standard says that every XML document must begin with this line, and since every WML document is in turn an XML document, it must start with this line.

<!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.1//EN"

This line signifies that the current document is a WML document and that it conforms to the English version of the publicly available WML standards version 1.1, as decided by the WAP Forum. This is a standard line in a WML document, and can be safely ignored for the sake of the current discussion.

"http://www.wapforum.org/DTD/wml_1.1.xml">

This line specifies the hyperlink, which can be accessed for obtaining the WML standards, if the user is interested in finding them out. This is also a part of the WML document header.

<wml>

This tag, pretty similar to <html>, specifies that the actual WML document starts here.

<card id="card1" title="First card">

This line gives an identifier to the WML page, and also displays the title of the WML page at the top of the display screen of the mobile device, as can be verified from the figure.

<p>

Same as the HTML <p> tag, this tag starts a new paragraph.

Welcome to WML

This line displays what it says: Welcome to WML, on the screen of the user's mobile, as shown in the figure.

</p>

As expected, this tag ends the paragraph started earlier.

</card>

This tag signifies the end of the card.

</wml>

Finally, this tag signifies that the WML page ends here.

Other main features of WML Like HTML, WML provides most of the basic features, as illustrated in the table earlier. We will not examine them in detail. However, some of the WML features need to be discussed, especially how to accept input from a user.

Displaying options Options can be displayed on a WML screen by using the <do> tags. Within these tags, all the options that we want to display on the WML page can be delimited. Figure 15.20 shows one such example. Using this WML code, two options are displayed on the screen, and with each one a hyperlink is associated, with the help of the <go> tags (which work like anchors). How these anchors relate the source card to the hyperlinked cards is also shown in the figure.

<?xml version="1.0"?>
<!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.1//EN"
   "http://www.wapforum.org/DTD/wml_1.1.xml">
<wml>
   <card id="card1" title="Choose Option">
      <do type="accept" label="Apples">
         <go href="#apples"/>
      </do>
      <do type="accept" label="Oranges">
         <go href="#oranges"/>
      </do>
   </card>
   <card id="apples" title="Apples">
      <p>
         An apple a day keeps the doc away!
      </p>
   </card>
   <card id="oranges" title="Oranges">
      <p>
         Oranges are good for Vitamin C!
      </p>
   </card>
</wml>

Fig. 15.20 Displaying options for the user in WML

The two-step output produced by this code is shown in Figs 15.21 and 15.22.

Fig. 15.21 Displaying options using WML


Fig. 15.22 The result of selecting an option on the screen shown in the earlier figure

The WML code shown in Fig. 15.20 works as follows.

The first few lines (the XML declaration and the DOCTYPE) are the standard headers in any WML document, and we shall not discuss them again.

<wml>

This statement indicates the start of the WML document. At the end, the end of the WML document is signified by the closing tag </wml>.

<card id="card1" title="Choose Option">

This statement signifies the start of a WML card. The name of the card is card1, and when this card is shown on the screen, the message Choose Option is displayed as its title.

<do type="accept" label="Apples">

This statement displays a radio button, and displays the message Apples next to that button. The type="accept" attribute indicates that this radio button is expecting an input from the user. When this WML page is shown on the user's screen, it displays two options: Apples and Oranges. The user can move around these options with the help of the appropriate buttons provided on the mobile device, and finally select one of them by clicking on the selected option. When this happens, the position of the cursor is used to find out the option selected, the corresponding address of the card (either for apples or for oranges) is picked up, and the WML page transfers control to the appropriate card, i.e., the corresponding card portion of the code, which displays a line about that fruit. For example, consider the following line.

<go href="#apples"/>

This line specifies a hyperlink, which points to a card called apples, which is a part of the same WML deck. This is how a user can navigate to other cards in the same deck. Other hyperlinks work in the same fashion. The <go> tag is equivalent to the HTML <a> tag.

Accepting inputs The forms-like feature of WML allows us to accept inputs from a user. For this, the <input> tag is used. For example, take a look at the WML code shown in Fig. 15.23. It requests the user to enter his name, and when the user enters it, simply displays a greeting message for the user.

<?xml version="1.0"?>
<!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.1//EN"
   "http://www.wapforum.org/DTD/wml_1.1.xml">
<wml>
   <card id="card1" title="Enter name">
      <p>
         Enter name: <input name="person_name"/>
         <do type="accept" label="Name">
            <go href="#display"/>
         </do>
      </p>
   </card>
   <card id="display" title="Greeting">
      <p>
         Hello $(person_name)
      </p>
   </card>
</wml>

Fig. 15.23 Accepting data from the user in WML

The three-step processing of the above WML code is shown in Figs 15.24, 15.25 and 15.26. As you can see, the <input> tag accomplishes accepting inputs from the user. In this example, the <input> tag is coded as follows.

<input name="person_name"/>

This creates a variable called person_name, and also signifies that the micro browser should accept a value into this variable using an input box. The second card in the deck (display) then displays Hello along with the name the user has entered previously. Since these and other features of WML are quite similar to HTML, we shall not spend too much time discussing them. Instead, let us summarize the main WML tags, as shown in Table 15.3.


Fig. 15.24 Accepting input from the user

Fig. 15.25 The user types a name


Fig. 15.26 WML displays back the entered name

Table 15.3 Summary of the most common WML tags

<wml> ... </wml>: Signify the start and end of a WML document
<card> ... </card>: Start and end a specific card within a WML document (or deck)
<p> ... </p>: Signify the boundaries of a paragraph
<big> ... </big>: Display the text enclosed within these tags in a bigger font
<small> ... </small>: Display the text enclosed within these tags in a smaller font
<b> ... </b>: Display the specified text in boldface
<i> ... </i>: Display the specified text in italics
<u> ... </u>: Display the specified text with underline
<table> ... </table>: Create a table-like structure
<img/>: Replace this position with an image
<br/>: Put a line break here
<a> ... </a>: Specify a hyperlink
<input/>: Create a variable and accept its value from the user
<do type="..."> ... </do>: Display an option

As you can see, most of the HTML tags have the same or equivalent tags in WML. Also, like HTML, some WML tags have corresponding ending tags, whereas others have none. Due to the similarities between HTML and WML, an HTML developer can quickly learn WML.
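To give a feel of how these tags combine, a single card using a few of them might look like the following sketch; the card name and the text are invented for illustration, and the standard XML and DOCTYPE header lines shown earlier are omitted for brevity.

<wml>
   <card id="demo" title="Tag demo">
      <p>
         <big><b>Fruit news</b></big><br/>
         <i>Apples</i> and <u>oranges</u> are in season.<br/>
         <a href="#demo">Show this card again</a>
      </p>
   </card>
</wml>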


WMLScript

WMLScript Basics WMLScript allows client-side scripting on WML pages. Just as WML is pretty similar to HTML, WMLScript is similar to JavaScript. However, there is one major difference between Web-based scripting languages such as JavaScript/VBScript and WMLScript. Whereas the former languages can be executed on Web browsers as well as Web servers, WMLScript can be executed only at the client side.

One question is frequently asked. If the issues of a small amount of client-side memory and a less powerful processor at the client are so significant in the wireless world, why have the additional burden of performing client-side processing at all? Would it not put extra processing requirements on the already less powerful client? The counter-argument to this is that it is better to send all the necessary additional functionality to the WAP client only once, in the form of WMLScript, rather than making frequent trips between the WAP client and the WAP gateway. In the absence of WMLScript, the WAP client would have to use the services of some code at the WAP gateway every time even a small interaction from the user is involved. Instead, once WMLScript is sent to the client, the round trips back to the WAP gateway can be avoided, at least for the few things within the reach of the WMLScript.

There are two major differences between the Web-based scripting languages and WMLScript.

1. Whereas the client-side scripting code is embedded in HTML code in the case of Web-based scripting languages, it is not so in the case of WMLScript. Here, the WMLScript code is stored in a separate file, and that file is called externally from the WML code, when required. A WMLScript file has a wmls extension.

2. The client-side script (by virtue of being embedded in HTML code) is always sent together with the HTML page in the case of Web-based HTML pages. However, in the case of WAP-based WML pages, the WMLScript files associated with a particular WML page are not sent to the client by default. Only the compiled WML page is sent to the client. The compiled WMLScript file is sent to the client by the gateway only when some functionality in that file is requested by the client, by making an explicit call to one of the functions contained in that WMLScript. Until then, it resides at the WAP gateway. This ensures that the overhead of sending WMLScript from the WAP gateway to the client is minimized.

The process of compiling WMLScript is similar to the compilation of a program written in any other programming language. However, the compilation process itself is based on how a Java program is compiled. This is described below.

(a) A WMLScript compiler compiles the WMLScript code into virtual machine instructions. That is, the compiled WMLScript code is assumed to work on a computer that does not exist physically. Consequently, when we compile a WMLScript program, the output of the process is a set of assembly language-like instructions (pretty similar to Java bytecode) that are supposed to execute correctly on a hypothetical computer, called a virtual machine.

(b) The compiled bytecode instructions remain on the WAP gateway, until a mobile device asks for them. When this happens, the WAP gateway sends these bytecode instructions to the mobile device.

(c) The micro browser inside the WAP device then executes the bytecode, similar to how a Java interpreter inside a Web browser interprets the bytecode of a Java applet.
Note that the micro browser is programmed to interpret the compiled version (i.e., the bytecode form) of WMLScript instructions. This means that only the bytecode needs to be sent by the WAP gateway to the mobile device. In contrast, in the case of the Web-based scripting languages, the entire code (for scripting languages such as JavaScript and VBScript) is sent to the browser, which interprets the high-level, English-like instructions of these scripting languages. Thus, in the case of WAP, the transport of code is minimized by sending only the compiled code to the micro browser, which is critical considering the bandwidth limitations. Also, interpretation of the bytecode is far simpler than compilation/interpretation of English-like instructions, thus reducing the burden on the less powerful WAP device.

WMLScript Example A WMLScript file contains one or more functions. A WML deck can invoke a WMLScript function by referencing the WMLScript's file name and the name of the function to be called, joined by a hash. Let us understand how this works with a simple example, as shown in Fig. 15.27. Figure 15.27(a) shows a WML page, which calls a WMLScript, as shown in Fig. 15.27(b). Note how the function calculate from the WMLScript file sample.wmls is called by the WML page.

Fig. 15.27 WML calling a WMLScript
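Since Fig. 15.27 appears only as an image, the following is a minimal sketch of what the two files could look like; the card layout, the <do> label and the titles are assumptions, while the file name sample.wmls, the function calculate, the values 2 and 3, the variable number and the message The result ... are taken from the discussion that follows.

WML page, as in Fig. 15.27(a):

<?xml version="1.0"?>
<!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.1//EN"
   "http://www.wapforum.org/DTD/wml_1.1.xml">
<wml>
   <card id="card1" title="Calculate">
      <do type="accept" label="Add">
         <!-- Calls the function calculate in sample.wmls, passing 2 and 3 -->
         <go href="sample.wmls#calculate(2,3)"/>
      </do>
      <p>
         The result of 2+3 is $(number)
      </p>
   </card>
</wml>

WMLScript file sample.wmls, as in Fig. 15.27(b):

// Adds the two values passed by the WML page, places the result in the
// browser variable called number, and refreshes the display.
extern function calculate(a, b) {
   var sum;
   sum = a + b;
   WMLBrowser.setVar("number", sum);
   WMLBrowser.refresh();
}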

Notably, the syntax for calling a WMLScript function is quite similar to that for calling a card in the same WML deck. Let us understand how the code in the WMLScript works.

extern function calculate(a, b) {

This indicates the start of the function calculate. The keyword extern indicates that this function can be called externally, by a WML page. The function also expects two parameters, a and b.

var sum;

This statement declares a variable called sum. Any variable in WMLScript can be declared in this fashion.

sum = a + b;

This line assigns the sum of a and b to the variable sum, declared earlier. Note that a and b are received by this function as parameters from the WML page. Therefore, this statement adds whatever values the WML page passes as a and b, and stores the result in the variable sum. In this case, a = 2 and b = 3, and therefore, the sum would be computed as 2 + 3 = 5.

WMLBrowser.setVar("number", sum);

This code cannot be understood in isolation. Read this line along with the line that displays The result ... in the WML page. Note that in that line of the WML code, $(number) is specified, but number is not assigned a value there. Coming back to the above WMLScript, we can see that we want to replace number with sum. In simple terms, we are asking the WMLScript function to display the value of sum in place of the variable number (i.e., after the message The result ..., as shown in the WML code). Due to this, the line will be displayed as shown below.

The result of 2+3 is 5.

WMLBrowser.refresh();

This statement causes the contents in the micro browser to refresh. In the case of WAP, the user's screen is not automatically updated as a result of executing a WMLScript function. Why is this? Recall that the WMLScript resides on the WAP gateway, until it is executed. After it executes, the WML page must refresh the screen to reflect the results of this execution. In the case of HTML, the scripts travel along with the HTML page, and therefore, after the scripts are executed, the browser automatically refreshes the screen. Therefore, this statement is required in the case of WML, but not in the case of HTML.

Other features of WMLScript The other main features of WMLScript are quite similar to those found in any scripting language. Let us summarize them, as shown in Table 15.4.

Table 15.4 Main features of WMLScript

Arithmetic operators (+, -, *, /): for performing basic arithmetic operations such as addition, subtraction, multiplication, and division, respectively.
Logical operators (&&, ||, !): for performing logical tests such as AND, OR, and NOT, respectively.
Relational operators (==, !=, >, >=, <, <=): to compare values using constructs such as is equal to, not equal to, greater than, greater than or equal to, less than, and less than or equal to, respectively.
var (e.g., var x): declaring a variable.
if (e.g., if (x > y) z = 100;): conditional construct.
while (e.g., while (x > 0) ...): the standard while loop.
for (e.g., for (i = 1; i < 100; i++) ...): the standard for loop.

We shall not go into the details of these WMLScript features further.
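As a quick illustration, a small WMLScript function built only from the constructs listed in Table 15.4 might look like the following sketch; the function name addUpTo and its logic are invented for illustration and are not taken from the text.

// Adds the integers 1..n using var, if and the standard for loop.
extern function addUpTo(n) {
   var total = 0;
   var i;
   if (n > 0) {
      for (i = 1; i <= n; i++) {
         total = total + i;
      }
   }
   return total;
}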


15.4.7 The Session Layer—Wireless Session Protocol (WSP) In the WAP protocol stack, the session layer is represented by the Wireless Session Protocol (WSP). It is devised with the aim of implementing a request-response protocol similar to HTTP. However, it also provides some additional features. These features are required considering the mobile nature of WAP clients. For instance, if a client changes base stations while on the move, its connection with the server should not be lost. WSP has to take this fact into consideration. WSP allows data exchange between applications in one of two ways.

Connection-oriented session services This operates over the transaction layer of the WAP protocol, i.e., the Wireless Transaction Protocol (WTP). Here, a connection between a client and a server is established before the actual data is exchanged. A client sends a message to the server, requesting that a connection be established between them. In this type of data exchange, session management is possible. Thus, the data transfer between the client and the server is reliable. This is achieved by implementing an acknowledgement mechanism at the WTP layer, by which the destination acknowledges each packet on its arrival, similar to what TCP does. Also, a session can be suspended if, for some reason, there are connection problems. It can be resumed later, at the same point where it was suspended.

Connectionless session services These services operate directly over the transport layer, i.e., the Wireless Datagram Protocol (WDP). As the name suggests, there is no guarantee of successful communication between the client and the server in this method of communication. Therefore, it is a best-effort approach. Clearly, there is no acknowledgement mechanism implemented in the layers below WSP in this case.

In either case, for starting a new session, the client invokes a WSP method, which includes parameters such as the server address, the client address and other standard communication headers. This is very similar to what happens in the case of an HTTP request. That is why, many times, WSP is called as binary HTTP. WSP defines all the methods that are defined in HTTP. As mentioned previously, the difference between HTTP and WSP is that whereas HTTP commands are text commands, WSP commands are in binary form, and travel only between the WAP client and the WAP gateway, not up to the server.

The typical message exchange involved between a WAP client, a WAP gateway and the origin server in a WSP session is as follows.

1. A WAP-enabled phone user enters the URL of a specific WML document. As we know, this can be done in a number of ways. For example, it could be entered by the user, or it can be the result of clicking a hyperlink on a displayed WML page. The micro browser running inside the mobile device sends a request consisting of a Get-PDU to the WAP gateway. A PDU (Protocol Data Unit) is similar to the HTTP request/response structures. It contains the various requests/responses going between the WAP client and the WAP gateway. The PDU is slightly different depending on whether the client and server interact in the connection-oriented WSP mode or the connectionless WSP mode. The Transaction Id (TID) field must be omitted when using the connection-oriented mode (because the connection would already be established). However, it must be sent in each PDU when using the connectionless mode (to identify which transaction is going on). In connectionless WSP, the TID is passed to and from the session user as the transaction id parameter of the session primitives. Like HTTP-request and HTTP-response, there are various types of PDUs (called as PDU types), such as Push-PDU, Get-PDU and Reply-PDU. A typical PDU structure is shown in Fig. 15.28.


Fig. 15.28 Protocol Data Unit (PDU) in WSP
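Since Fig. 15.28 is reproduced here only as an image, the layout it depicts can be summarized roughly as follows; this is a sketch based on the surrounding discussion rather than an exact reproduction of the WSP field layout:

   | TID (connectionless mode only) | PDU type | Type-specific contents |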

2. The type field in the PDU tells how to interpret the type-specific contents. The WSP specification defines the allowed types and their assigned numbers.
3. The WAP gateway receives the PDU and parses it. It obtains the URL from the PDU, and uses it to make a TCP connection to the specified origin server. This happens exactly like any TCP connection between a Web browser and a Web server. This is shown in Fig. 15.29.

Fig. 15.29 Interaction from the mobile device to the WAP gateway and then to the origin server

4. Once the connection is established, the WAP gateway sends an HTTP request for the document to the origin server, as shown in Fig. 15.30.

Fig. 15.30 The HTTP request going from the WAP gateway to the origin server

5. In response, the origin server either retrieves the HTML page (if the request was for a static Web page) by requesting the operating system to find that file from the disk and load it into memory, or executes a server-side program (if the request was for executing a server-side program such as ASP, JSP, servlet, etc.). In either case, it constructs and sends an HTTP response back to the WAP gateway. This is shown in Fig. 15.31.


Fig. 15.31 The origin server sends an HTTP response back to the WAP gateway

6. The WAP gateway receives and decodes the HTTP response. Specifically, it examines the content-type variable inside the HTTP response structure.
7. If the content-type is specified as WML, the WAP gateway simply compiles the WML page into binary code (many times, the server can generate the WML code directly). Otherwise, if it is HTML, the gateway first translates the HTML code into WML, and then compiles the WML thus generated into the corresponding binary code. This is shown in Fig. 15.32.

Fig. 15.32 WAP gateway converts HTML into WML and then into bytecode

8. The gateway computes the content-length of the message thus generated, and builds a Reply-PDU. This is the counterpart of the Get-PDU, which we saw earlier. 9. The gateway now sends the Reply-PDU to the WAP client, in the form of a WAP Response. This is shown in Fig. 15.33.


Fig. 15.33 The Reply-PDU in the form of a WAP response

10. The micro browser inside the WAP client receives the Reply-PDU, extracts the binary form of the WML text inside it, and interprets it to present its contents to the user.

15.4.8 The Transaction Layer—Wireless Transaction Protocol (WTP) The Wireless Transaction Protocol is similar to the TCP or UDP protocols of the TCP/IP model. It provides services to ensure reliable or non-reliable transactions, depending on what the user has chosen to do. It runs on top of the transport layer (WDP), or over the optional security layer (WTLS). It allows applications to decide what kind of reliability and efficiency is required. Like TCP, it performs the segmentation of a message into multiple packets, and then reassembles them at the destination. Like TCP, WTP has a provision for sequencing packets, so that missing packets can be identified at the destination, and duplicate packets discarded. WTP achieves the flexibility of reliable or non-reliable communication by providing three different kinds of mechanisms: unreliable request, reliable request, and reliable request with one response message.

Unreliable request Similar to the way UDP works, in an unreliable request, the message sender sends a request and hopes that the destination gets it. However, it immediately forgets about this transmission. The destination knows this, and does not bother to send an acknowledgement back to the sender. Thus, it is a case of a best effort delivery, as shown in Fig. 15.34. Actually, the word Transaction is a misnomer in this case, since there is a single message transfer without any regard to its successful/unsuccessful delivery.

Fig. 15.34 Unreliable request


Reliable request This is very similar to how TCP works, but with a slight difference, considering the differences between the wired and the wireless worlds. In the case of TCP, the sender starts a timer, sends a message and waits for an acknowledgement from the destination. If the timer elapses before an acknowledgement arrives from the destination, the sender retransmits the message. In the case of reliable requests, however, the sender does not start a timer, unlike TCP. Instead, it just sends a message to the destination. The destination acknowledges it and also stores the received message for some time. If the sender does not receive the acknowledgement within some pre-specified time, it requests the destination for the acknowledgement. The destination then sends the acknowledgement again to the sender. This is shown in Fig. 15.35.

Fig. 15.35 Reliable request

Reliable request with one response message This is the third possibility with WTP. Here, the sender sends a request to the destination. The destination responds with an acknowledgement (called as a response message). The original sender then acknowledges the acknowledgement itself. The original sender also maintains a copy of its acknowledgement, in case the destination does not receive it the first time, so that it can retransmit it. Finally, the transaction ends at the destination, as shown in Fig. 15.36.

Fig. 15.36 Reliable Request with one Response Message

15.4.9 The Security Layer—Wireless Transport Layer Security (WTLS) The wireless world is more vulnerable to security issues as compared to the wired world, as the number of parties involved is larger, and the chances of people not taking proper security measures while on the move are significantly higher. As a result, the WAP protocol stack includes the Wireless Transport Layer Security (WTLS) as an additional layer, which is not found in other similar protocol stacks. WTLS is optional. It is based on the Transport Layer Security (TLS) protocol, which, in turn, is based on the Secure Socket Layer (SSL) protocol. When present, WTLS runs on top of the transport layer of WAP (WDP). As we know, SSL has made a tremendous impact on the way e-commerce transactions can be conducted in the traditional Internet world. SSL allows the two parties involved in a transaction to make it totally secure and reliable. WTLS makes similar attempts in the wireless world. WTLS ensures four advantages: privacy, server authentication, client authentication and data integrity.

Privacy ensures that the messages passing between the client and the server are not accessible to anybody else. Encrypting the messages, as discussed earlier, does this.

Server authentication gives the client the confidence that the server is indeed what it claims to be, and not someone posing as the server, with or without malicious intentions.

Client authentication, on similar lines, gives the server the confidence that the client is indeed what it claims to be, and not someone posing as the client, with or without malicious intentions.

Data integrity ensures that no one can tamper with the messages going between the client and the server, by modifying their contents in any manner.

Figure 15.37 shows how the communication between a WAP client and the origin server can be made secure. Between the WAP client and the WAP gateway, we have WTLS to ensure a secure-mode transaction. Between the WAP gateway and the origin server, SSL takes care of security, as usual. Thus, the WAP gateway performs the translations between WTLS and SSL in both directions.

Fig. 15.37 WTLS and SSL security

The conversion between WTLS and SSL is a major point of debate. This is because the WAP gateway first converts the WTLS-protected text into plain text and then applies SSL (or vice versa). Therefore, it has access to the non-encrypted message in its original form! The WAP gateway performs this conversion in its memory and never stores any portion of it on its disk. Clearly, if it stored it on its disk, that would be a major cause for worry. However, even the fact that it performs this conversion in its memory has not made many people quite happy about the amount of security thus provided. They feel that even a momentary lapse here could cause havoc. As a consequence, many banks, merchants and financial institutions supporting WAP transactions prefer to have their own WAP gateways, to make sure that the WTLS-to-SSL and SSL-to-WTLS conversion is under their control.

The most important difference between SSL and WTLS is that SSL needs a reliable transport layer, i.e., TCP underneath, for it to guarantee a secure mode of transaction between the client and the server. In contrast, in the case of WAP, the reliable/unreliable mode of transactions is decided by the protocols above WTLS (namely, by WTP and WSP). Therefore, WTLS does not require a reliable transmission mode. In other words, it can work equally well with an unreliable mode of transport, which is not possible with SSL. To achieve this, WTLS defines a sequence number field in its frame, which is not done in the case of SSL. Instead, SSL relies on TCP to perform sequencing and error checking.

15.4.10 The Transport Layer—Wireless Datagram Protocol (WDP) The transport layer is represented by the Wireless Datagram Protocol (WDP), which is the bottommost layer in the WAP stack. As expected from any good transport layer protocol, WDP shields the upper layers of the WAP stack from the intricacies and unnecessary details of the underlying communication media. WDP ensures smooth, error-free communication between a mobile device and its base station over the wireless network. The actual implementation of WDP depends on the underlying bearer service (i.e., the transport mechanism, such as IP). The closer the bearer service is to IP, the less WDP has to adapt to suit it. If the bearer already supplies IP as the underlying protocol, WDP uses UDP as the datagram protocol at this layer. WDP offers more or less the same functionality as UDP does. WDP uses source and destination port numbers for multiplexing and de-multiplexing data. For sending a datagram, it uses fields such as the destination address, destination port number, source address, and source port number. The source and destination addresses could simply be telephone numbers, IP addresses or any other unique identifiers, as agreed upon by all parties. As remarked before, if the bearer uses IP as the underlying protocol, WDP uses the capabilities of IP for segmentation and reassembly of data. However, if it is not IP, WDP has to provide these features itself. This more or less completes our discussion of WAP.

SUMMARY

- The Mobile IP protocol is needed to handle the case of mobile devices.
- The traditional Internet works on the principle that devices and routers/servers are stationary. However, mobile devices move; hence, we need modifications to the basic Internet architecture.
- Mobile IP works on principles that are quite similar to those of moving someone's home.
- Mobile IP uses the concepts of home address and co-located address to deliver datagrams correctly to the right mobile host, even when it is on the move.
- Mobile TCP is needed because the traditional Internet believes that the underlying communication medium is unreliable, whereas the end points are not. However, in the case of the mobile Internet, the opposite is true, and we need to handle it.
- The General Packet Radio Service (GPRS) technology allows GSM phone subscribers to access the Internet while on the move.
- GPRS data rates are low, and it is expected to be an intermediate solution.
- The Wireless Application Protocol (WAP) is a communication protocol that enables wireless mobile devices to have access to the Internet.
- In the case of the WAP architecture, we have an additional level of interface: the WAP gateway, which stands between the client (browser) and the Web server. The job of the WAP gateway is to translate client requests to the server from WAP to HTTP, and on the way back from the server to the client, from HTTP to WAP.
- The wireless Internet based on WAP uses a special tag language called as the Wireless Markup Language (WML). WML is a highly optimized language that was invented keeping in mind all the shortcomings of mobile devices, and so it suits these devices very well. A scripting language called as WMLScript can also be used for client-side scripting.
- The WAP stack consists of five layers: the application layer, the session layer, the transaction layer, the security layer and the transport layer.
- The application layer is also called as the Wireless Application Environment (WAE). This layer provides an environment for wireless application development, similar to the application layer of the TCP/IP stack.
- The session layer is also called as the Wireless Session Protocol (WSP). It provides methods for allowing a client-server interaction in the form of sessions between a mobile device and the WAP gateway. This is conceptually similar to the session layer of the OSI model.
- The transaction layer is also called as the Wireless Transaction Protocol (WTP) in WAP terminology. It provides methods for performing transactions with the desired degree of reliability. Such a layer is missing from the TCP/IP and OSI models.
- The security layer in the WAP stack is also called as the Wireless Transport Layer Security (WTLS) protocol. It is an optional layer that, when present, provides features such as authentication, privacy and secure connections, as required by many modern e-commerce and m-commerce applications.
- The transport layer of the WAP stack is also called as the Wireless Datagram Protocol (WDP), and it deals with the issues of transporting data packets between the mobile device and the WAP gateway, similar to the way TCP and UDP work.

REVIEW QUESTIONS

Multiple-choice Questions

1. The concept of ______ is used in mobile IP.
   (a) starvation (b) tunnelling (c) bridging (d) registering
2. ______ is one of the ways of handling TCP effectively in mobile networks.
   (a) Fast TCP (b) Slow TCP (c) Buffered TCP (d) Transactional TCP
3. The gateway that stands between the mobile network and the Internet in GPRS is called as ______.
   (a) CCSN (b) SGGN (c) SGSN (d) GGSN
4. The ______ transforms HTTP requests and responses to WAP.
   (a) Web server (b) WAP browser (c) WAP database (d) WAP gateway
5. WAP internally uses the ______ language.
   (a) HTML (b) WML (c) HDML (d) XML
6. ______ is the equivalent of JavaScript or VBScript in WAP.
   (a) WML (b) WMLLive (c) WMLScript (d) JScript
7. Excepting the physical layer, WAP consists of ______ protocol layers.
   (a) 4 (b) 5 (c) 6 (d) 7
8. WSP is equivalent to the ______ in the OSI model.
   (a) physical layer (b) transport layer (c) session layer (d) application layer
9. WDP is equivalent to the ______ in the OSI model.
   (a) physical layer (b) transport layer (c) session layer (d) application layer
10. WTLS stands for ______.
   (a) Wireless Transport Layer Security (b) Wireless Transaction Layer (c) Wireless Technology Layer Specifications (d) Wireless Transit Layer Security

Detailed Questions

1. Discuss how mobile IP works.
2. What is tunnelling? How does it work?
3. What will happen if mobile IP does not exist?
4. Why do we need to worry about TCP in mobile networks?
5. Discuss the various ways of overcoming TCP problems in mobile networks.
6. What is GPRS? How does it work?
7. Describe the internal architecture of a WAP gateway.
8. Describe the WAP stack in brief.
9. Discuss the main tags of WML.
10. How is WMLScript different from Web-based scripting languages?

Exercises

1. Write a small WML page that displays your name with the message 'I am happy'. Do the same in J2ME.
2. Write the necessary WML code that accepts rate and quantity, calculates the bill amount and displays it to the user.
3. In the same WML code, do not accept the quantity if it is either 0 or above 5, by using client-side validations in the form of WMLScript.
4. Investigate technologies such as CDMA, GSM, WiFi, WiMax.
5. Study at least one mobile operating system and development environment. What are their key features?

Appendix

WEB 2.0

Introduction Web 2.0 refers to the second generation of Web-based communities and hosted services, such as social networking sites, wikis (online information systems that can be edited by any visitor) and folksonomies (user-generated classification used to categorize and retrieve Web content), that facilitate collaboration and sharing between users. Web 2.0 indicates an improved form of the World Wide Web. Technologies such as Weblogs (blogs), social bookmarking, wikis, podcasts (a digital media file, or a series of such files, that is distributed over the Internet using syndication feeds for playback on portable media players and personal computers), RSS feeds (and other forms of many-to-many publishing), social software, Web APIs, Web standards and online Web services imply a significant change in Web usage. Web 2.0 can also refer to one or more of the following:

- It enables communication between otherwise incompatible information systems and sources of content and functionality.
- It facilitates generating and distributing Web content itself, characterized by open communication, decentralization of authority, and freedom to share and re-use.
- It provides enhanced organization and categorization of content, with emphasis on deep linking (making a hyperlink that points to a specific page or image on another website, instead of that website's main or home page).

Key Principles and Characteristics of Web 2.0 Web 2.0 means more than design elements like glossy buttons, large colorful fonts and the "wet-floor" effect. Any Web 2.0 Web site may exhibit some basic common properties. These may include:

- The Web as a platform: delivering (and allowing users to use) applications entirely through a browser.
- Data as the driving force.
- Architecture of participation: the system facilitates users adding their own contributions.
- A rich, interactive, user-friendly interface based on AJAX.
- Lightweight business models (keeping it simple) enabled by content and service combination.
- The end of the software release cycle ("the perpetual beta").
- Software above the level of a single device.
- Some kind of social-networking aspect.

Technical Innovations Associated with Web 2.0 The following technical innovations have set the foundation for Web 2.0.

- Web-based applications and desktops:
  - Richer user experience: AJAX, Web Office.
  - Several browser-based "operating systems" or online desktops, such as WebEx and meta.
- Rich Internet applications, with the use of AJAX, Adobe Flash, Flex and OpenLaszlo to improve the user experience.
- Server-side software.
- Client-side software.
- XML and RSS (Really Simple Syndication, also known as "Web syndication").
- Specialized protocols (FOAF and XFN for social networking).
- Web protocols (REST and SOAP).

In one way or the other, "Web 2.0" is built on principles demonstrated by the success stories of Web 1.0 and by interesting new applications.

Web 2.0 Core Principles

The Web as a Platform The Web is considered as a platform, rather than as an information medium. Google pioneered this concept and began as a native Web application delivering a "service at no cost" to its customers. Overture (now Yahoo!) and Google also figured out how to enable ad placement on virtually any Web page. Similarly, eBay enables transactions between single individuals, acting as an automated intermediary. Other Web 2.0 success stories demonstrate this same behavior of making innovative use of data.

Collective Intelligence An essential part of Web 2.0 is harnessing collective intelligence, turning the Web into a kind of global brain. Rather than merely collecting data, one should use the data to one's own advantage, as with Google's page rank. It allowed users to rank pages when they use the search, and this information is then fed back to get more relevant results. Companies like Nike are using people to get new design ideas through blogs. Some financial companies are using blogs to understand the needs of people, and are creating products like loans on terms favorable to the users.

The Architecture of Participation Web 2.0 companies set inclusive defaults for aggregating data and building value as a side-effect of ordinary use of the application. The architecture of the Internet, and of the World Wide Web, as well as of open source software projects like Linux, Apache, and Perl, is such that users build collective value as an automatic by-product. Each of these projects has a small core, well-defined extension mechanisms, and an approach that lets any well-behaved component be added by anyone.

Data Management Every significant Internet application to date has been backed by a specialized database: Google's Web crawl, Yahoo!'s directory (and Web crawl), Amazon's database of products, etc. Database management is a core competency of Web 2.0 companies, so much so that we have sometimes referred to these applications as "infoware" rather than merely software.

End of the Software Release Cycle—the Perpetual Beta Operations must become a core competency. Software will cease to perform unless it is maintained on a daily basis. Users must be treated as co-developers, in a reflection of open source development practices (even if the software in question is unlikely to be released under an open source license). The open source dictum, "release early and release often", has in fact morphed into an even more radical position, "the perpetual beta", in which the product is developed in the open, with new features slipstreamed in on a monthly, weekly or even daily basis.

Lightweight Programming Models Support lightweight programming models that allow for loosely coupled systems. Simple Web services, like RSS and REST-based Web services, syndicate data outwards. Design for "hackability". Web 2.0 will provide opportunities for companies to beat the competition by getting better at harnessing and integrating services provided by others.

Software above the Level of a Single Device One other feature of Web 2.0 that deserves mention is the fact that it is no longer limited to the PC platform, but has extended to devices like hand-held PCs, mobiles, and digital music and storage devices like iTunes and TiVo. These are not Web applications per se, but they leverage the power of the Web platform, making it a seamless, almost invisible part of their infrastructure.

Web 2.0 in the Financial Services Industry More and more financial services institutions will use Web 2.0 concepts and technologies, both internally and externally, to make their services and applications richer and more compelling to users. The following could be some of the uses of Web 2.0 in the financial industry.

1. Improved Web interfaces that mimic the real-time responsiveness of desktop applications within a browser window.
2. Improved communication between people via social-networking technologies.

3. Improved communication between separate software applications.
4. Financial services applications like social lending, in which borrowers and lenders come together without the involvement of a bank, could benefit from Web 2.0.
5. Information and knowledge gathered from people's blogs to identify target markets, create project teams and discover unvoiced conclusions.
6. Intuitive page building: the user should see on the home page what she often visits.
7. Rather than hosting on single costly machines, software can be hosted on multiple redundant, cost-effective machines, as in the case of Google and Yahoo.
8. Use of Mashup technology to build a complex site, rather than going for a big-bang solution.
9. Easier integration with the help of Rich Internet Applications (RIA), and use of technologies like SOA that complement RIA.
10. Intellectual content development via collective intelligence.
11. Use of blogs to give executives an informal channel for employee and customer discussions.
12. RSS feeds to funnel news and data into the system and to other data subscribers; the subscribers can also customize the information according to their own preferences.
13. Capture of the user's trail on the Web site to understand users' behavior and needs from the Web site, and improve on them.
14. Extending the interface to mobile and other devices.

Glossary

- Web syndication: A form of syndication in which a section of a website is made available for other sites to use (RSS).
- Syndication: A group of individuals or organizations combined or making a joint effort to undertake some specific duty or carry out specific transactions or negotiations.
- Social bookmarking: A way for Internet users to store, classify, share and search Internet bookmarks.
- Blog (Web log): A website where entries are written in chronological order and displayed in reverse chronological order.
- REST: A simple interface that transmits domain-specific data over HTTP without an additional messaging layer such as SOAP, or session tracking via HTTP cookies.
- RSS: A family of Web feed formats used to publish frequently updated content such as blog entries, news headlines or podcasts. An RSS document, which is called a "feed", "Web feed" or "channel", contains either a summary of content from an associated Web site or the full text.
- Social software: Software that enables people to rendezvous, connect or collaborate through computer-mediated communication (IM, chat, forums, Weblogs/blogs, wikis, collaborative real-time editors such as Google Docs, and prediction markets).

What is RSS? RSS (Really Simple Syndication) is one of the formats used for publishing the latest content, such as blogs, news headlines or podcasts, on Web sites. RSS is:

1. An easy way to distribute the latest news
2. A lightweight XML format
3. Used to improve traffic

An RSS document is called a "feed", "Web feed" or "channel". This document contains a summary of the actual content. The document is in XML format, and there are various links available in it. When a user clicks on a particular link, the corresponding Web page is displayed in the browser. All the major Web sites, such as Google News, BBC, CNN, and NDTV, provide the feature of RSS feeds. Interested users can subscribe to these feeds by using an RSS reader, so that they can receive updated content from such Web sites.
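As an illustration, a minimal RSS 2.0 feed document might look like the following sketch; the channel title, links and item shown here are invented placeholders, not taken from the text.

<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
   <channel>
      <title>Example News</title>
      <link>http://www.example.com/news</link>
      <description>Latest headlines from an imaginary news site</description>
      <item>
         <title>Sample headline</title>
         <link>http://www.example.com/news/sample-headline</link>
         <description>A one-line summary of the story.</description>
      </item>
   </channel>
</rss>

When a user subscribes to such a feed, the reader periodically downloads this XML document and shows the item titles as clickable links, as described above.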

Web feed and RSS Aggregators A Web feed means providing the latest content to the subscribed users. A Web feed is provided by Web sites, and is regularly updated summary content. A Web feed is a document which contains Web links. A user has to subscribe to a particular website's feed by using a feed reader. There can be many Web feeds across various Web servers during a particular time period. A feed can be downloaded using the Web sites or the programs that syndicate from the feed. All the Web feeds can be collected using an aggregator, or news reader. The RSS feed format is based on XML, and it is not easily understandable by humans. Hence, to interpret the RSS contents, "news reader" or "aggregator" programs are used. A user needs to subscribe to an aggregator. For example, Google Reader is an aggregator provided by Google, and Yahoo News is provided by Yahoo. Google Reader provides news from various top news sites, such as BBC News, ESPN and Google News. Apart from these, a user can subscribe to any other sites as well. The aggregator or reader checks continuously, or after a certain time interval as defined by the user, for new content and downloads it from that site. Thus, the user can have all the updated links from various Web sites in one single window of the browser. Clearly, this is "pulling" of information by the end users. Feeder programs can be Web-based (accessed as a Web Service) or client-based (desktop-based). If feeders support multimedia data, this kind of RSS data is called as a podcast.


Web Syndication Web syndication is a method/process in which some part of a Web site is made available to other Web sites to use. It is a process in which Web feeds are made available so that others can get recently added material on the Web site, such as news. Thus, Web syndication helps both Web sites: the one providing the information and the one displaying it. Web syndication helps in exchanging information in an automated and structured format, and it also saves time. RSS can be treated as a mini database which contains headlines and descriptions of the latest updates on the Web site.

RSS Example

1. A real-life example is shown here with the help of Google Reader. The user signs in to Google Reader. After successfully logging in to Google Reader, the home page is displayed, which has all the updated links from all the default and subscribed Web sites. This shows the latest Web feeds that are updated, and consolidation is done from all the default sites and subscribed links with the help of Web syndication.

2. If the user clicks on any URL above, an appropriate screen will be displayed.

3. From here, the user can go to the actual contents.


MASHUP

Mashup—Overview A Mashup is a Web application that integrates data from more than one source or Web site. The contents used in a mashup generally come from other sources or third parties, using public interfaces or APIs provided by those sources. These interfaces/APIs are exposed in the form of Web Services. In simple terms, a Web site that uses data, services and functionality from another Web site is called as a Mashup. However, simply linking to another Web site through an HTML hyperlink cannot be called a Mashup. There are various services provided by Web sites that generate different types of content. Mashup means integrating services and contents from multiple Web sites. The user can see this information on the screen, but does not have knowledge about the source of each particular piece of information. Integration of the services and content happens smoothly in the background. The usage of mashups is increasing at an extremely high rate. A majority of mashups use map services, such as those provided by Yahoo Maps and Google Maps. Although mashups are heavily used for map services, they are not limited to map services only.

Mashup styles There are two mashup styles: server-side mashup and client-side mashup.

Server-side mashup In a server-side mashup, the integration of services and contents happens at the server side. Here, the server acts as a proxy between the Web application on the client and the other Web sites. The client makes requests to the server, and the server makes calls to the other Web site.

The above diagram can be explained as follows:

1. The client makes a request to the server of its Web site. The request could be an AJAX request in the form of an XmlHttpRequest object.
2. The request is received by a Web component (such as a Java servlet). The request is processed by a Java class, which is called a proxy class.
3. The proxy class opens a connection to the other Web site that provides the information.
4. The mashup site receives the request, processes it, and returns a response to the proxy class.
5. The proxy class receives the response and converts it into the proper data format.
6. The Web component delivers the data to the client, and the client receives the response.
7. Finally, the client's page is updated.

The benefits of this approach are as follows:

1. The proxy is used as a buffer between the client and the other Web site.
2. In this style, only the required data can be sent to the client, and in small chunks.
3. Transformation and manipulation of data are possible before sending it to the client.
4. Security can be handled in a more efficient manner.

It suffers from the following possible issues:

1. Using a server-side mashup can result in a significant delay, since the request goes to the Web server of the main Web site and then to the other Web site. The same happens with the response also.
2. Proper security measures should be in place to protect the server-side proxy from unauthorized use.

Client-side mashup In a client-side mashup, the services and the content are integrated at the client side. Here, the client directly interacts with the other Web site's data.

The above diagram can be explained as follows:

1. The browser makes a request for the Web page on its Web site.
2. In response to the request made by the client, the Web server of the main Web site returns some data.
3. This data is decoded by the client, and the address of the other Web site is retrieved.
4. A connection to the other Web site is made, and the required data is retrieved.
5. Finally, the client's view is refreshed.

Following are the advantages of this approach:
1. No server-side Web component is required.
2. Performance-wise, a client-side mashup is better, since the request and response travel directly between the browser and the mashup server.
3. It also reduces the load on the server, since a server-side proxy is no longer responsible for processing the request and response.

Following are the issues in this approach:
1. No buffer is provided.
2. Sometimes the other Web site returns a large amount of data, and it is difficult to handle that much data at the client side.
3. No transformation or manipulation of the data happens before the data reaches the client.
4. Handling security requirements is difficult at the client.

REST PROTOCOL

What is REST?
Today, Web Services can be written in two ways:
- Using the traditional Remote Procedure Call (RPC) mechanism, which uses the Simple Object Access Protocol (SOAP) as the means of communication between a client and a server.
- Using REST, which is far simpler, as described below.

REST (Representational State Transfer) is a simple mechanism for accessing Web Services. REST describes how the resources pertaining to Web Services are defined and addressed. It is an alternative to the traditional SOAP/RPC technologies. REST has nothing to do with the implementation details or with which technology is used. REST uses the following standards/protocols:
- HTTP—For remote access to resources.
- URL—For defining end access points.
- XML/HTML/JPEG/GIF—As means of data representation.

For example, i-flex may define a resource called "flexcube". The client can then access this resource with the following URL: http://www.iflexsoltions.com/products/flexcube


When the user accesses this URL, the representation of this resource is returned (i.e., flexcube.html). At this point, this representation places the client application in a state. Flexcube.html may have several other hyperlinks (i.e., representations), and the user can access all such links. Each new representation places the client application into a different/new state. Thus, the client application changes state with each resource representation, i.e., it transfers state. Combining these keywords (representation, state, and transfer), we have the acronym REST.

REST Features
1. Statelessness
The basic highlight of the REST philosophy is the stateless approach. To overcome the drawbacks of the stateless HTTP protocol, we know that developers need to provide for session management in their applications; for instance, they use session objects, cookies, URL rewriting, etc. REST, however, goes back to the traditional stateless approach. This means that each request from the client to the server carries all the details needed to understand it, and cannot depend on any stored past information. Therefore, it should be clear that the application interacts with a resource by knowing just two things: (1) the identifier of the resource, and (2) the action required. Other things, such as past information about that client or intermediaries (i.e., session state information), are not needed. Application designers need to keep this in mind.
2. Support for only HTTP methods
How can a Web Services client access a Web Service? If it is an RPC/SOAP kind of Web Service, the client can call methods on the objects exposed as Web Services (e.g., invoking transfer(100) on a remote account object). In contrast, with REST, we can only use HTTP methods such as GET, POST, DELETE, etc. In RPC, an application is made up of remotely accessible objects, and each object has different methods that can be invoked as and when required. The client needs to be aware of the identity of an object before trying to access its methods, so that it can locate the object in the first place. In REST, the client can interact with the resources and navigate using hyperlinks, without prior knowledge of the resources.
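As a minimal sketch of this idea (reusing the flexcube URL from the example above, and assuming the resource is reachable over plain HTTP), the client needs only the resource identifier and the HTTP method; Java's standard HttpURLConnection is enough:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

// Accessing a REST resource with nothing but a URL and an HTTP method.
// There is no remote-object stub and no session: each request is self-contained.
public class RestClientSketch {

    public static void main(String[] args) throws IOException {
        // Resource identifier (taken from the example in the text).
        URL resource = new URL("http://www.iflexsoltions.com/products/flexcube");

        // Action required: a plain HTTP GET. PUT, POST, or DELETE would be
        // expressed the same way, by changing the request method.
        HttpURLConnection conn = (HttpURLConnection) resource.openConnection();
        conn.setRequestMethod("GET");

        System.out.println("HTTP status: " + conn.getResponseCode());
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);   // the representation (e.g., flexcube.html)
            }
        }
    }
}

Changing the action means changing only the request method; no session state is carried from one request to the next.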

RESTful Web Service Example
Web Services based on the REST approach are called RESTful Web Services. Let us look at an example of creating Web Services from the REST perspective. ABC Publications has deployed some Web Services to enable its customers to do the following:
- Get the list of available books
- Get detailed information about a book
- Submit a purchase order (PO)

Get the list of books
The appropriate Web Service makes available a URL for the book list resource. For example, the client would use this URL to get the book list: http://www.abublications.com/books
If the client submits this URL, an XML document containing a list of all the books is returned to the client. The implementation of this Web Service is completely transparent to the client, and ABC Publications can modify the underlying implementation of this resource without impacting clients.


Every book entry in the returned list carries a link to get the detailed information about that book. This is a key feature of REST.
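A client can follow those links mechanically. The sketch below assumes a response of the general shape <books><book>...<link>...</link></book></books>; the element names are assumptions for illustration, since the actual listing is not reproduced here. It uses the standard JAXP DOM API to pull out each detail link:

import java.io.InputStream;
import java.net.URL;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Reads the book-list resource and extracts the per-book detail links.
// Element names (book, link) are assumed for illustration.
public class BookListClient {

    public static void main(String[] args) throws Exception {
        URL listUrl = new URL("http://www.abublications.com/books");

        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try (InputStream in = listUrl.openStream()) {
            Document doc = builder.parse(in);

            NodeList books = doc.getElementsByTagName("book");
            for (int i = 0; i < books.getLength(); i++) {
                Element book = (Element) books.item(i);
                NodeList links = book.getElementsByTagName("link");
                if (links.getLength() > 0) {
                    // Each link is itself a REST resource the client can GET next.
                    System.out.println("Detail resource: " + links.item(0).getTextContent());
                }
            }
        }
    }
}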

Get detailed information about the book
The Web Service makes available a URL for each book. For example, to view the details of book 00123, the client would use the URL: http://www.abublications.com/book/00123
The document received in response describes the book: its id (00123), its title (Information Technology), a short description (This book is useful to understand the concepts of IT), and its price (200.00). Again, the document contains a link to a more detailed description of the book; each response document allows the client to drill down to get more detailed information.

Submit a purchase order
In this case, the Web Service makes available a URL to which the customer can submit a purchase order. The client creates the PO in the required format, say XML, and submits it using the HTTP POST method. The PO service takes that XML, does the necessary processing, and additionally returns a URL to the client so that the client can edit that PO in the future. The sample PO in this example uses the values 00123, ABC, and 2007-08-30-A567.
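A minimal sketch of submitting such a PO from Java is shown below. The service URL (/po), the XML element names, and the use of the Location header for the returned edit URL are all assumptions for illustration; only the sample values come from the example above.

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Submits a purchase order document to the PO service using HTTP POST.
// Endpoint and element names are assumed; sample values are from the text.
public class SubmitPurchaseOrder {

    public static void main(String[] args) throws Exception {
        String po = "<po>"
                + "<bookId>00123</bookId>"              // sample book id from the text
                + "<customer>ABC</customer>"            // sample value from the text; role assumed
                + "<poRef>2007-08-30-A567</poRef>"      // sample reference from the text
                + "</po>";

        URL poService = new URL("http://www.abublications.com/po");   // assumed endpoint
        HttpURLConnection conn = (HttpURLConnection) poService.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "text/xml;charset=UTF-8");

        try (OutputStream out = conn.getOutputStream()) {
            out.write(po.getBytes("UTF-8"));
        }

        // The service processes the PO and, as described above, can return a URL
        // that the client may use later to edit this PO. Returning it in the
        // Location header is one possible convention.
        System.out.println("HTTP status: " + conn.getResponseCode());
        System.out.println("Returned resource (if any): " + conn.getHeaderField("Location"));
    }
}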

REST VS SOAP
SOAP and REST are the two main techniques for working with Web Services. In this article, we will compare their pros and cons. But before that, let us quickly recap the basic concepts of Web Services. A Web Service is a software service provided by a server (implemented as a program) that can be used by a client. In other words, Web Services allow different providers and consumers to speak with each other over a network, irrespective of their technologies, operating systems, etc. Earlier, Remote Procedure Call (RPC) techniques were in use, and technologies such as DCOM, RMI, CORBA, or plain RPC were used for this purpose. There are some important buzzwords in Web Services.
(a) WSDL (Web Services Description Language): An XML document that provides a description of the Web Service.
(b) UDDI (Universal Description, Discovery and Integration): A registry of Web Services; the user/client can take the help of this registry to find different Web Services.
(c) SOAP and REST: We shall examine these now.

What is SOAP?
SOAP stands for Simple Object Access Protocol. SOAP uses the XML format to exchange data. It can also be considered a free-form message format based on XML standards. An XML message encapsulated inside a SOAP envelope (which consists of a header and a body) travels on top of the HTTP protocol. The use of XML makes SOAP platform and language independent. To access any resource using SOAP, the client needs to call that particular service. For example, when a client wants to check the balance in her bank account, the client would send a SOAP request to, and receive a SOAP response from, the Web Service.

What is REST?
REST stands for Representational State Transfer. It is an architectural style used for describing resources on the Internet and for accessing them as well. It is a much simpler way than the traditional RPC/SOAP approach to access Web Services. It does not use any new standard for accessing Web Services; it relies on the traditional HTTP, URL, HTML, XML, GIF, etc. It is lightweight and reduces the burden on the server. It is stateless in nature, so if session state is needed, mechanisms such as cookies and URL rewriting have to be used explicitly; it does not maintain session state automatically. According to the REST style, each resource is identified by a unique URI (Uniform Resource Identifier) and can be accessed through the Web Service. The standard HTTP interface is used, in the form of methods such as GET, POST, PUT, and DELETE. According to the principles of REST, each resource should be classified based on its usage, and a good URI should be assigned to that resource. The following samples should help us understand the differences between SOAP and REST.


RESTful Example: Online Book Purchase

SOAP Example: Online Book Purchase

Technology Comparison

REST: Uses the existing infrastructure, such as HTTP, URL, and XML/HTML/GIF.
SOAP: Uses the existing infrastructure and, additionally, the SOAP standards.

REST: A unique URL identifies one resource.
SOAP: Generic interfaces are used to group and identify many resources together.

REST: Focus is on performance.
SOAP: Focus is on integration of distributed applications.


Protocol Comparison

REST: The request is a URI and the result is XML.
SOAP: The request is SOAP and the response is also SOAP.

REST: HTTP is the application layer protocol.
SOAP: HTTP is used more like a transport protocol.

REST: Synchronous in nature.
SOAP: Supports both synchronous and asynchronous operations.

State Management

REST: Stateless—each request to the server must contain all the information necessary to process the request.
SOAP: May maintain conversation state across multiple message exchanges.

REST: Cookies, URL rewriting, and hidden form fields have to be used explicitly for session management.
SOAP: Session headers can be added to the SOAP envelope itself to maintain a session.

Security

REST: Security is handled by HTTP/HTTPS.
SOAP: SOAP security extensions are defined by WS-Security.

REST: SSL is used for transport-level security.
SOAP: XML Encryption and XML Signature can be used.

Design

REST: Identify the resources that can be exposed as services. Define URL addresses for those resources.
SOAP: Define the services and operations in a WSDL document. Define the data model for the messages exchanged by the service.

REST: Distinguish the operations on a resource based on the GET, PUT, POST, and DELETE methods.
SOAP: Choose the appropriate transport protocol, and the security and transactional policies.

REST: Implement and deploy on a Web server.
SOAP: Implement and deploy in a Web Services container.


XHTML—THE NEW HTML STANDARD

Introduction
XHTML stands for EXtensible HyperText Markup Language. It is a combination of HTML and XML. It is expected to replace HTML slowly but surely. Syntax-wise it is identical to HTML, but it addresses the poor coding standards of HTML by mandating strict adherence to coding rules. XHTML is a W3C recommendation, and all the new browsers support XHTML. There are three main parts in an XHTML document:
- DOCTYPE
- head
- body

An XHTML example is shown below (using the XHTML 1.0 Strict DOCTYPE).

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Sample XHTML</title>
</head>
<body>
<p>This is a sample XHTML file.</p>
</body>
</html>

XHTML Document Types
There are three Document Type Definition (DTD) validation types, which describe the allowed syntax and grammar in an XHTML document.

1. Strict

The XHTML Strict document type separates the HTML tags and their presentation-related specifications by using Cascading Style Sheet (CSS). For example, the font type and size for a text tag would be specified in a separate CSS file.

2. Transitional


If we are targeting older browsers that do not recognize CSS, or migrating from HTML to XHTML where the presentation is still embedded in the HTML, we can give preference to the Transitional DOCTYPE.

3. Frameset

This is simply XHTML 1.0 Transitional with added support for the HTML frame-related tags, namely <frameset>, <frame>, and <noframes>.

Benefits of XHTML

Syntax Checking
There are many Web pages containing bad HTML. The main reason behind this is that the HTML language rules are not enforced strictly by Web browsers. Following is an example of bad HTML; yet this page will work well in all kinds of browsers and will produce the intended output.

<html>
<head>
<title>HTML Sample</title>
<body>
<p>Hello
How are you?

The strictness of XHTML is useful in today's world, where there are so many channels in use, such as browsers, mobile phones, PDAs, etc.

Extensibility
XML documents are required to be well-formed (elements nesting properly). Under HTML, the addition of a new group of elements requires alteration of the entire DTD, which means waiting for the next HTML version. With an XML-based DTD, a new set of elements only has to be internally consistent and well-formed before it can be added to an existing DTD. This greatly eases the development and integration of new collections of elements.

Portability
By being strict about well-formed tags, XHTML requires less processing power and less complicated algorithms to render. This means that XHTML can be displayed by devices with less processing power, such as mobile phones.

Major Differences Between HTML and XHTML

1. Elements must be in lowercase in XHTML (optional in HTML)
Invalid Syntax:
<P>Invalid Syntax</P>

Valid Syntax:
<p>Valid Syntax</p>



2. Elements must be paired with a closing tag in XHTML (optional in HTML)
Invalid Syntax:
<p>This is a paragraph

Valid Syntax:
<p>This is a paragraph</p>

3. Elements must be properly nested in XHTML (optional in HTML)
In HTML, the closing sequence of elements can be improper, like this:
<b><i>Sample Code</b></i>

In XHTML, the closing sequence of elements must be proper, like this:
<b><i>Sample Code</i></b>

4. Structure must be separated from content
For example, the <p> tag is a content tag (paragraph), so we cannot include a table inside it, because a table is a formatting construct. We can, however, put the <p> tag inside table tags without any problem, because the content goes inside the construct, not the other way around.

5. XHTML documents must have one root element
All XHTML elements must be nested within the root element. Nested elements can have sub-elements, correctly nested within their parent element.

<html>
<head> ... </head>
<body> ... </body>
</html>

The W3C Markup Validation Service
The Markup Validator is a free service by W3C that helps check the validity of Web documents, hosted at the URL http://validator.w3.org/#validate_by_input. The examples below show a valid and an invalid XHTML document.


Valid XHTML

Invalid XHTML (which is Valid HTML)


Index

.NET 236 .NET framework 237 3-D Secure 425, 427

A abstract factory 495 access control 341, 343, 428 Access Control List (ACL) 343 acknowledgement 23, 92 acquirer 414, 423, 428 acquiring bank 410, 412 active open 89, 134 Active Server Pages (ASP) 290 active Web page 234, 332, 333 ActiveX control 234, 276, 524, 525 adaptive bridge 34 address 27 resolution 68 Address Resolution Protocol (ARP) 45, 52, 58, 68, 69 address spoofing 382 ADO.NET 255 Advanced Research Project Agency (ARPA) 39 AJAX 191, 217 Alert Protocol 368, 376 American Standard Code for Information Interchange (ASCII) 18, 438 amplifier 30 annotation 330 anonymous electronic money 430, 431 Anonymous Offline electronic money 432, 433 Apache 148 Axis 529

Tomcat 148 Xerces 495 Applet 334 application 7 gateway 385 layer 9, 19, 45, 46, 367 server 507, 522 ARP query datagram 69 request 74 request message 70 ARPA 40 ARPAnet 39 ASP.NET 148, 235, 319, 527, 529 asymmetric key cryptography 357 Asymmetric key encryption 353, 359 asynchronous 4, 11 attenuation 29 attribute 451 authentication 340, 341, 358, 366, 394, 427, 428, 430, 575 Authentication Header (AH) 389, 394, 397 authorization network 409 availability 341, 343

B Bandwidth 11 base class library 238 Base Station 551 Base-64 encoding 124, 128 bastion host 386 Bean-managed persistent entity bean best effort 74, 75, 81

330


$ best-effort delivery 56 birthday attack 362 bit rate 10 Bit synchronization 11 blinded money 431 Bootstrap Protocol (BOOTP) 73, 74 bridge 30, 31, 32, 33 broadband 11 BS 551 BSD 40 UNIX 40 bus 11 Business tier 280, 282 Business-to-Business (B2B) 445 Business-to-Consumer (B2C) 445 button 171

C card association 412 Care of address 545, 546 Cascading Style Sheets (CSS) 175, 443 Certification Authority (CA) 357, 414 character sets 18 check box 167, 169, 170 Checksum 94, 96 cipher text 351, 352 circuit gateway 385 class 60, 62 class A 66 class C 66 clear text 350 clearing 410 client computer 40 client tier 280 server 147 clogging attack 401 co-located care-of address 546 code g 278 collaborative authoring 40 collision 3, 58, 362 Common Gateway Interface (CGI) 182 Common Language Runtime (CLR) 238 Common Language Specifications (CLS) 237 Common Object Request Broker Architecture (CORBA) 508, 510, 515, 517, 518, 519, 520, 521, 522, 523, 524, 525, 528 communication modes 4

Component Object Model (COM) 523 Component Transaction Monitors (CTM) 505 compression 7, 128, 130 computer address 48 concurrent 98 confidentiality 340, 341, 358, 366, 390, 397, 427 Configurable address 50 congestion 13 attack 401 control 15 window 549 connecting device 28 connection 7 connection management 506 connectionless 74 container 284 Container-managed persistent entity bean 330 control connection 133, 134, 135, 136, 138 Controller 315 cookie 300 correction 16 CRC 5, 6, 27, 45, 51, 58, 81 credit card 408 credit card processing 410 Cryptanalysis 348 cryptography 348 Cryptology 348 CSMA/CD 27, 58, 60 CTM 506 custom tag 304 Cyclic Redundancy Check (CRC) 4, 359

D data binding 255 compression 18 connection 139 encryption 7, 18 integrity 394, 575 link 7, 8, 33 Data Encryption Standard (DES) 352 data link control layer 3 data link layer 4, 5, 9, 11, 12, 13, 45, 46, 69, 81 data transfer connection 133, 135, 136 Database Management System (DBMS) 439 datagram 14, 15, 58, 545 approach 5, 6, 12 DCOM 524, 525, 528


$ decompression 7, 18 decryption 7, 18 deserialization 504 design pattern 495 destination address 5 Destination unreachable 75, 77 Diffie-Hellman 401, 403 Diffie-Hellman Key Exchange 355, 400 digital cash 428 certificate 355, 356, 357, 420, 421, 422 envelope 357, 419, 421 enveloping 128 signature 128, 130, 131, 358, 359, 364, 365, 366, 401, 421, 422, 428 direct delivery 63 Discovery 547, 548 Distributed Component Object Model (DCOM) 523 distributed components architecture 509 distributed database 105 objects 505 objects architecture 509 processing 3 DNS poisoning 346 server 108 spoofing 346 DNSSec (Secure DNS) 348 Document Object Model (DOM) 466, 490 Document Type Definition (DTD) 455 DOM 494, 495 domain 101, 106, 107 domain name 101, 103, 104, 105, 107, 108, 114, 117, 148, 149 Domain Name System (DNS) 101, 105, 107, 346 Domain Name System Server (DNS Server) 107 dotted decimal notation 65 double spending problem 432, 433 drop-down list 167, 172 DTD 489 dual signature 418 duplex 4 duplication 95 control 82 Dynamic address 50 Dynamic Host Configuration Protocol (DH) 74

dynamic packet filter 384 Web page 233, 234

E EBCDIC 18 EIS tier 280, 282 EJB container 282 electronic cash 428 electronic commerce 408 Electronic Data Interchange (EDI) 440, 445 electronic mail (email) 19, 101, 108 electronic money 428, 429, 430, 431, 432 element 451 email 109, 112, 114 email (SMTP) 45 address 112, 113 client 112 message 123 server 111, 112, 120 Encapsulating Security Payload (ESP) 390, 397 encoding standards 18 encryption 128, 131 end to end delivery 80 end-to-end layers 8 Enterprise JavaBeans (EJB) 282, 297, 325 entity bean 327, 329 ephemeral port number 139 error control 3, 4, 5, 12, 14, 15, 26, 30, 81, 96 error detection/recovery 4, 16, 23 Ethernet 14, 24, 25, 27, 45 Ethernet frame 27 Expression Language (EL) 303 Extensible Markup Language (XML) 436 external DTD 457, 466 external style sheet 177, 179, 180

F fabrication 341 FDDI 24, 25 FDM 11 File Transfer Protocol (FTP) 19, 45, 101, 132 firewall 379, 380, 381, 383, 389, 397, 400 flow control 3, 5, 12, 13, 14, 15, 96 foreign agent 545, 546 network 548 form 167, 168


$ forwarding 5 fragment 57 fragmentation 54 Fragmentation offset 54 frame 7, 11, 46, 158, 159, 160 frame format 37, 45 freeware 40 FTP 39 client 135 command 138 server 135 full-duplex 10, 11, 17, 18, 84

G garbage collection 238 Gateway GPRS Service Node 551 General Packet Radio Service 550 go-back-n 14 GPRS 550

H half-duplex 4, 10, 11, 17, 18 handshake protocol 368, 369, 374, 375 hardware address 50 hash 359, 389 HMAC 375 home address 545, 546 agent 545, 546, 548 network 548 page 145 host id 55 name 104 number 60, 61, 62, 64 HTML control 242, 247 form 182 request 149, 150, 180 response 149 Server 243 HTML server control 243, 247 HTML table 161 hybrid 11 Hyper Text Markup Language (HTML) 147, 153, 442 Hyper Text Transfer Protocol (HTTP) 147

hyperlink 157 hypertext 146

I identified electronic money 430, 431 Identified Offline electronic money 432 Identified Online electronic money 432 IDL 518 IIOP 510, 512, 517, 520, 525 IIS 148 indirect delivery 63 T 549 inline 179 Inline style sheet 178 integrity 340, 342, 397 interception 341 Interface Definition File (IDL) 503 Interface Definition Language (IDL) 508, 515 internal DTD 457, 466 internal style declaration 179 style sheet 177, 179 internet 22 Internet Assigned Number Authority (IANA) 61 Internet Control Message Protocol (ICMP) 45, 75 Internet Explorer 146, 174, 175 Internet Inter-ORB Protocol (IIOP) 515, 519 Internet Key Exchange (IKE) 392 Internet layer 45, 80, 92 Internet Mail Access Protocol (IMAP) 122 Internet Protocol (IP) 45, 50 Internet Security Association and Key Management P 400 Internet Service Provider (ISP) 61, 346, 388 internetwork 22, 39 internetworking 22, 23 device 28 interruption 343 IP 56, 57, 60 address 49, 55, 60, 61, 62, 67, 68, 69, 70, 71, 74, 86, 87, 90, 101, 104, 105, 108, 544, 545 datagram 53, 55, 57, 84, 85, 92, 96, 97, 545, 551 header 545 new generation (IPng) 387 Next Generation (IPng) 67


$! protocol 39 Security (IPSec) 387 sniffing 344 spoofing 344 version 4 (IPv4) 67, 387 version 6 (IPv6) 67, 387 addresses 12 within-IP 548 IPSec 388, 389, 391, 392, 400, 405 ISAKMP 402, 403 ISAKMP/Oakley 400 Issuer 414, 426 issuing bank 410, 412 iterative 97

J J2EE 5.0 279 Java 2 279 Java 2 Enterprise 280 Java 2 Enterprise Edition 5.0 279 Java 2 Standard 280 Java API for XML Processing (JAXP) 494 Java applet 234, 332 Archive (JAR) 302 Database Connectivity (JDBC) 304 Development 279 EE 280 Enterprise Edition 5.0 279 Naming and Directory API (JNDI) 331 Persistence API 282 Remote Method Protocol (JRMP) 521 Runtime 280 SE 280 Server Pages (JSP) 283, 290 Servlet 148, 527 Virtual Machine (JVM) 238, 276, 282, 332, 334, 522 JavaBean 297, 298, 318 JavaScript 191, 192, 290 JavaServer Faces (JSF) 318 JAXP 495 JDK 279 JEE 5.0 279 JSP 148, 291, 298 Standard Template Library (JSTL) 302

K key agreement 355 exchange 355 pair 129, 353 ring 129

L LAN 23, 24 Layer 2 Tunnelling Protocol (L2TP) 405 layers 1, 7 learning bridge 34, 35 load balancing 505 Local Area Networks (LANs) 13 local 186 Location transparency 327 logical 12 address 15, 37, 49, 52 connection 16 Longitudinal Redundancy Check (LRC) 359 loosely-coupled 527 loss control 81 loss-less compression 375

M MAC 394 Mail Transfer Agent (MTA) 116 mailbox 110, 111, 113, 115, 120 markup 153 marshalling 504, 516 master secret 373 Maximum Transmission Unit 383 Media Access and Control (MAC) 3, 13, 14, 45 Medium 11 mesh 11 Message Authentication Code (MAC) 375 message digest 130, 359, 360, 361, 362, 363, 364, 365, 366, 389, 401 Message Driven Bean (MDB) 327 message integrity 342 Message Oriented Middleware (MOM) 506, 507 message queue 507 Microsoft Intermediate Language (MSIL) 237, 238 middleware 502 MIME header 125


$" mobile host 545, 548 IP 544, 546 T 544, 549 Model 315 Model-View-Controller (MVC) 315 modification 342 Mosaic 40, 146 MTU 383 Multi- 327 multicast address 62 multicasting 62 Multiplexing 11 multipoint 13 multiport bridge 35 Multipurpose Internet Mail Extensions (MIME) 124

N naming services 522 negative acknowledgement (NAK) 1, 2, 14 Netscape Navigator 40, 146, 174, 175 network 8, 55 congestion 6 id 55 Network Interface Card (NIC) 24, 30, 34, 51, 52, 55, 58 network layer 6, 7, 11, 12, 13, 14, 16, 25 number 60, 61, 62, 63, 64, 66 Network Virtual Terminal (NVT) 187, 188 network-layer address 37 networking device 28 next hop 58 node 3 noise 29 non-persistent 94 non-repudiation 341, 342, 343, 358

O Oakley 403 Oakley protocol 401 Object Request Broker (ORB) 505, 512, 515 offline electronic money 430, 432 one-time session 419 symmetric key 130, 421

online electronic money 430, 432 payments 408 open source code 40 Open System Interconnection (OSI) 7 ORB 514, 518, 519 ordered list 164 origin server 552 OSI 25

P packet 2, 5, 6, 7, 14, 15 filter 381, 382, 383, 384, 385 format 39 number 5 size 23, 26 sniffing 344 spoofing 344 switching 56, 80 parallel 11 parser 487 parsing 487 ive open 89, 134, 136 payment gateway 410, 411, 414, 418, 419, 420, 422 PayPal 433 Persistence 326 connection 95 pharming 346 Phishing 344 physical 7, 8 address 11, 12, 13, 14, 27, 37, 48, 49, 50, 51, 52, 68, 69, 70, 71, 73, 74 layer 3, 4, 9, 10, 11, 28, 29, 45, 47 plain text 350, 351 Point Of Sale (POS) 409 point-to-point connections 14 Point-to-Point Tunnelling Protocol (PPTP) 405 port 85 address 50 number 87, 88, 90, 93, 96, 136, 139 port-to-port communication 83 portal 146 positive acknowledgement (ACK) 1, 2, 14 Post Office Protocol (POP) 120 pre-master secret 372, 373 presentation 7 layer 9, 18


$# Pretty Good Privacy (PGP) 128 primary class 62 privacy 575 Privacy Enhanced Mail (PEM) 128, 130 private key 129, 130, 353, 354, 358, 364, 365, 366, 372, 401, 419, 420, 421, 429 proprietary source code 40 protocol 1, 4 proxy server 385 pseudo-terminal driver 188 public key 129, 130, 353, 354, 355, 356, 357, 358, 365, 418, 419, 429, 430 Public Key Cryptography Standard (PKCS) 132 public key encryption 353, 401 public-private key 129

R radio button 167, 169, 170 RARP 73 RARP query datagram 71 Record Protocol 368, 374 Redirect 76 regenerator 29, 547 registration 548 relay agent 73 reliability 95 Remote awareness 326 remote (TELNET) 19, 45, 187 Remote Method Invocation (RMI) 521 Remote Procedure Call (RPC) 503, 504, 512, 521, 527 repeater 28, 29, 30, 31, 32 replay attacks 395, 396 resolver 108 result tree 482 retransmission 14, 92 Reverse Address Resolution Protocol (RARP) 45, 71 ring 11 RMI 521, 522, 523, 524, 525, 528 root element 451 tag 451 router 6, 9, 14, 24, 25, 35, 38, 39, 66, 81, 381, 389, 544, 545, 546 routing 5, 15, 45, 544 algorithm 6, 13, 23, 25, 26 table 6, 12, 25

RPC 528 RS 232-C 10 RSA 403 Run Length Encoding (RLE)

138

S SAX 494, 495 schema 466, 489 secret key 394 encryption 352, 401 Secure Electronic Transaction (SET) 413 Secure MIME (S/MIME) 130 Secure Socket Layer (SSL) 366, 575 Security Association (SA) 392 Security Association Database (SAD) 393 segment 15, 16, 30, 31, 57, 81, 84 selective retransmission 549 sequence control 82 sequencing 16 serial 11 serial/parallel 4 serialization 504 server computer 40 control 317 Service Oriented Architecture (SOA) 503, 526 Serving GPRS Node 551 Servlet 282, 290, 291 container 282 Lifecycle 285, 287 session 7 bean 327, 328 ID 300 layer 9, 16, 17, 18, 45 management 300 state management 300 settlement 410, 411 SGSN 551 Signal encoding 11 signed ActiveX control 276 applet 276 Simple API for XML (SAX) 490 simple bridge 33 Mail Transfer Protocol (SMTP) 101, 116 Object Access Protocol (SOAP) 529 simplex 4, 10, 11, 17, 18


$$ skeleton 504, 521 sliding window 2, 14, 45, 94, 395 slow start 549 snooping 344 T 549 socket 87 number 60 source address 5 quench 76 tree 482 Specific address 50 spoofed address 344 spool 112, 116 SSL 367 handshake 371 layer 367 star 11 stateful packet filter 384 session bean 328 stateless protocol 150, 299 session bean 328 Static address 50 Web page 232 store and forward 3 store and forward packet switching 39 stub 504, 521, 524 style 175 sheet 177, 481 sub-domain 106 sub-network 30 subnet 14 symmetric key cryptography 397 encryption 352 Synchronization 14 synchronous 4, 11

T table 161 tag 451 T 56, 57, 60 connection 86, 88, 94, 95, 117, 118, 122, 134, 147, 149, 180, 544 segment 85, 92, 93

TDM 11 TELNET 39, 185, 187 client 188 server 188 terminal driver 186 TERminal NETwork 187 text box 168, 170 field 167 TFTP server 141 three-way handshake 90 Time Division Multiplexing, (TDM) 5 Time to live 55, 76 time-out 2, 4 Token Ring 25, 27 Tomcat 148 Topology 11 transaction management 310, 326 Transaction Processing monitors 520 transactional T 549 Transmission Control Protocol (T) 23, 45, 80, 95 Transmission Control Protocol/Internet Protocol 23, 44 Transmission mode 11 type 11 transport 7 layer 6, 10, 13, 15, 16, 17, 18, 45, 46, 80, 367 Transport Layer Security (TLS) 575 transport mode 391, 392, 398, 399 Trivial File Transfer Protocol (TFTP) 45, 101, 141 tunnel mode 390, 391, 392, 397, 398, 399 tunnelling 546, 547, 548

U UDP datagram 95, 96 un-marshaling 504, 516 Uniform Resource Locator (URL) 145, 148, 157 universal address 37, 49 Universal Discovery Description and Integration 529 universal protocol 44 service 23 unordered list 164 URL rewriting 301 agent 115


$% datagram 46 Datagram Protocol (UDP)

45, 95

V validation control 250 Value Added Networks (VAN) 441, 446 VBScript 290 View 315 virtual circuit 5, 6, 14, 56, 84 communication 9 connection 84, 89, 97 network 27, 45, 50 path 9 Virtual Circuit Number—VCN 6 Virtual Private Networks (VPN) 403 VPN tunnel 404

W W3 Consortium 40 WAE agent, WTA agent and WAP stack 555 WAN 23, 24 WAP browser 553 device 553 gateway 552 request 553 response 554 stack 556 Web browser 40, 87, 89, 146, 148, 149, 150, 153, 154, 157, 180, 188, 191, 366, 367 Web container 148, 280 Web page 145, 147, 150, 151, 152, 153, 180 Web server 40, 87, 89, 103, 145, 147, 148, 149, 150, 180, 184, 189, 191, 243, 315, 318, 319, 366, 367, 522, 551 Web server control 247 Web Service 237, 280, 503, 526, 527, 528, 529

Web Services Description Language (WSDL) 529 Web site 103, 145 tier 280 Web-based email 123 well-known port 73, 87, 116, 120 Wide Area Networks (WANs) 14 Wireless Application Environment (WAE) 557 Wireless Application Environment agent 555 Wireless Application Protocol (WAP) 544, 552 Wireless Bitmap (WBMP) 559 Wireless Datagram Protocol (WDP) 558, 576 Wireless Markup Language (WML) 554 Wireless Markup Language Script (WMLScript) 555 Wireless Session Protocol (WSP) 557, 570 Wireless Telephony Applications agent (WTA us 556 Wireless Transaction Protocol (WTP) 558, 573 Wireless Transport Layer Security (WTLS) 558, 574 with pipelining 95 without pipelining 95 WMLScript 567 World Wide Web (HTTP) 45 World Wide Web (WWW) 19, 39, 145, 146 World Wide Web Consortium (W3C) 146, 175

X X.25 14 XML parsing 443 Stylesheet Language (XSL) 443 XSL Formatting Objects (XSL-FO) 480 Transformation Language (XSLT) 472 XSLT processor 480 style sheet 473, 474, 479

