DNS & Proxy Server

Project

Author: Nir Ben-Yosef ID#: 02745185

Table of Contents

1. Introduction

1.1 General Description

1.2 Definitions, Acronyms and Abbreviations

1.3 References

2. High Level Design Chart

3. Protocols In Use

3.1 HTML - HyperText Markup Language

3.2 HTTP - Hyper Text Transfer Protocol

3.3 TCP/IP Protocol

3.4 DNS - Domain Name System

4. Modules

4.1 DNS module

4.2 httpProxy module

4.3 http_protocol module

4.4 http utils module

5. Appendix A

5.1 httpProxy module code

5.2 dns module code

5.3 http protocol code

5.4 http utils code

5.5 main program

5.6 makefile

1. Introduction

1.1 General Description

A proxy server is a special HTTP server that typically runs on a firewall machine. The proxy waits for a request from inside the firewall, forwards the request to the remote server outside the firewall, reads the response and then sends it back to the client.

In the usual case, the same proxy is used by all the clients within a given subnet. This makes it possible for the proxy to do efficient caching of documents that are requested by a number of clients.

The ability to cache documents also makes proxies attractive to those not inside a firewall. Setting up a proxy server is easy, and the most popular Web client programs already have proxy support built in. So, it is simple to configure an entire work group to use a caching proxy server. This cuts down on network traffic costs since many of the documents are retrieved from a local cache once the initial request has been made.

When a normal HTTP request is made by a client, the HTTP server gets only the path and keyword portion of the requested URL; the other parts, namely the protocol specifier "http:" and the host name, are already clear to the remote HTTP server: it knows that it is an HTTP server, and it knows the host machine that it is running on. The requested path specifies the document or a CGI script on the local file system of the server, or some other resource available from that server.

When a client sends a request to a proxy server the situation is slightly different. The client always uses HTTP for transactions with the proxy server, even when accessing a resource served by a remote server using another protocol, like Gopher or FTP. However, instead of specifying only the path name and possibly search keywords to the proxy server, the full URL is specified. This way the proxy server has all the information necessary to make the actual request to the remote server specified in the request URL, using the protocol specified in the URL.
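The difference between the two request forms can be sketched as follows. This is only an illustration, not the project's code: the helper name to_origin_request and the host www.example.test are made up, and port 80 is assumed whenever the URL names no port (correct for "http:" URLs only).

```python
from urllib.parse import urlsplit


def to_origin_request(request_line):
    """Split a proxy-style request line into its target and an origin-style request line.

    Hypothetical helper: returns (scheme, host, port, origin_request_line).
    """
    method, url, version = request_line.split()
    parts = urlsplit(url)
    # The origin server only needs the path and the optional search keywords.
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # Assumption: fall back to port 80 (the HTTP default) when the URL names none.
    port = parts.port or 80
    return parts.scheme, parts.hostname, port, f"{method} {path} {version}"


if __name__ == "__main__":
    # What the proxy receives: the full URL, including protocol and host name.
    line = "GET http://www.example.test/docs/index.html?topic=dns HTTP/1.0"
    print(to_origin_request(line))
```

The scheme is returned separately because, as described above, the proxy may have to speak a protocol other than HTTP to the remote server.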

From this point on, the proxy server acts like a client to retrieve the document. However, the "presentation" on the proxy actually means the creation of an HTTP reply containing the requested document to the client. For example, a Gopher or FTP directory listing is returned as an HTML document.

The DNS is a distributed database that is used by TCP/IP applications to map between host names and IP addresses. Each site maintains its own database of information and runs a server program (a DNS server) that other systems across the Internet can query.

1.2 Definitions, Acronyms and Abbreviations

HTTP - Hyper Text Transfer Protocol

HTML - Hyper Text Markup Language

DNS - Domain Name System

1.3 References

rfc1034.html

rfc1541.html

rfc1945.html

2. High Level Design Chart

3. Protocols In Use

3.1 HTML - HyperText Markup Language

The Hyper Text Markup Language (HTML) is a simple data format used to create hypertext documents that are portable from one platform to another. HTML documents are built from generic semantics that are appropriate for representing information from a wide range of domains.

Though many people speak of "HTML Programming" with a capital P, HTML is really not a programming language at all. HTML is exactly what it claims to be: a markup language. You use HTML to mark up a text document, just as you would if you were an editor using a red pencil. The marks you use indicate which format (or presentation style) should be used when displaying the marked text.

Everything you create in HTML relies on marks, or tags. To be a whiz-bang HTML programmer, all you need to learn is which tags do what.

3.2 HTTP - Hyper Text Transfer Protocol

The Hypertext Transfer Protocol (HTTP) is an application-level protocol with the lightness and speed necessary for distributed, collaborative, hypermedia information systems. It is a generic, stateless, object-oriented protocol which can be used for many tasks, such as name servers and distributed object management systems, through extension of its request methods (commands). A feature of HTTP is the typing of data representation, allowing systems to be built independently of the data being transferred.

HTTP has been in use by the World-Wide Web global information initiative since 1990.

Practical information systems require more functionality than simple retrieval, including search, front-end update, and annotation. HTTP allows an open-ended set of methods to be used to indicate the purpose of a request. It builds on the discipline of reference provided by the Uniform Resource Identifier (URI), as a location (URL) or name (URN), for indicating the resource on which a method is to be applied. Messages are passed in a format similar to that used by Internet Mail and the Multipurpose Internet Mail Extensions (MIME).

HTTP is also used as a generic protocol for communication between user agents and proxies/gateways to other Internet protocols, such as SMTP, NNTP, FTP, Gopher [1], and WAIS, allowing basic hypermedia access to resources available from diverse applications and simplifying the implementation of user agents.

The HTTP protocol is based on a request/response paradigm. A client establishes a connection with a server and sends a request to the server in the form of a request method, URI, and protocol version, followed by a MIME-like message containing request modifiers, client information, and possible body content. The server responds with a status line, including the message's protocol version and a success or error code, followed by a MIME-like message containing server information, entity meta information, and possible body content.
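The message layout described above can be sketched in a few lines. The function names build_request and parse_status_line are invented for this example and are not taken from the project's modules; the header value is likewise made up.

```python
def build_request(method, uri, headers, version="HTTP/1.0"):
    """Assemble a request: request line, MIME-like header lines, then a blank line."""
    lines = [f"{method} {uri} {version}"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def parse_status_line(line):
    """Split a status line into protocol version, numeric status code and reason phrase."""
    version, code, reason = line.rstrip("\r\n").split(" ", 2)
    return version, int(code), reason


if __name__ == "__main__":
    print(build_request("GET", "/index.html", {"User-Agent": "sketch/0.1"}))
    print(parse_status_line("HTTP/1.0 404 Not Found\r\n"))
```

The blank line after the headers is what tells the receiver where the headers end and the optional body begins.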

Most HTTP communication is initiated by a user agent and consists of a request to be applied to a resource on some origin server. In the simplest case, this may be accomplished via a single connection (v) between the user agent (UA) and the origin server (O).

request chain ------------------------>

UA -------------------v------------------- O

<----------------------- response chain

A more complicated situation occurs when one or more intermediaries are present in the request/response chain.

There are three common forms of intermediary: proxy, gateway, and tunnel. A proxy is a forwarding agent, receiving requests for a URI in its absolute form, rewriting all or parts of the message, and forwarding the reformatted request toward the server identified by the URI. A gateway is a receiving agent, acting as a layer above some other server(s) and, if necessary, translating the requests to the underlying server's protocol. A tunnel acts as a relay point between two connections without changing the messages; tunnels are used when the communication needs to pass through an intermediary (such as a firewall) even when the intermediary cannot understand the contents of the messages.

request chain -------------------------------------->

UA -----v----- A -----v----- B -----v----- C -----v----- O

<------------------------------------- response chain

The figure above shows three intermediaries (A, B, and C) between the user agent and origin server. A request or response message that travels the whole chain must pass through four separate connections. This distinction is important because some HTTP communication options may apply only to the connection with the nearest, non-tunnel neighbor, only to the end-points of the chain, or to all connections along the chain. Although the diagram is linear, each participant may be engaged in multiple, simultaneous communications. For example, B may be receiving requests from many clients other than A, and/or forwarding requests to servers other than C, at the same time that it is handling A's request.

Any party to the communication which is not acting as a tunnel may employ an internal cache for handling requests. The effect of a cache is that the request/response chain is shortened if one of the participants along the chain has a cached response applicable to that request. The following illustrates the resulting chain if B has a cached copy of an earlier response from O (via C) for a request which has not been cached by UA or A.

request chain ---------->

UA -----v----- A -----v----- B - - - - - - C - - - - - - O

<--------- response chain

Not all responses are cachable, and some requests may contain modifiers which place special requirements on cache behavior. Some HTTP/1.0 applications use heuristics to describe what is or is not a "cachable" response, but these rules are not standardized.
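How a cache shortens the chain can be shown with a small sketch. It is deliberately simplified: every response is treated as cachable and nothing ever expires, which real proxies cannot assume. The class name CachingHop and the stand-in origin function are hypothetical.

```python
class CachingHop:
    """One participant in the chain that keeps the responses it has already relayed."""

    def __init__(self, fetch_upstream):
        self.fetch_upstream = fetch_upstream  # callable: url -> response body
        self.cache = {}

    def get(self, url):
        if url not in self.cache:
            # Cache miss: the request travels further along the chain.
            self.cache[url] = self.fetch_upstream(url)
        # Cache hit: the chain ends here and the upstream is never contacted.
        return self.cache[url]


if __name__ == "__main__":
    contacted = []

    def origin(url):
        # Stand-in for the rest of the chain (C and O in the figure).
        contacted.append(url)
        return b"document for " + url.encode("ascii")

    b_hop = CachingHop(origin)
    b_hop.get("http://www.example.test/page")
    b_hop.get("http://www.example.test/page")
    print(len(contacted))  # the origin was contacted once for two requests
```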

On the Internet, HTTP communication generally takes place over TCP/IP connections. The default port is TCP 80, but other ports can be used. This does not preclude HTTP from being implemented on top of any other protocol on the Internet, or on other networks. HTTP only presumes a reliable transport; any protocol that provides such guarantees can be used, and the mapping of the HTTP/1.0 request and response structures onto the transport data units of the protocol in question is outside the scope of this document.

Except for experimental applications, current practice requires that the connection be established by the client prior to each request and closed by the server after sending the response. Both clients and servers should be aware that either party may close the connection prematurely, due to user action, automated time-out, or program failure, and should handle such closing in a predictable fashion. In any case, the closing of the connection by either or both parties always terminates the current request, regardless of its status.
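Because the server closes the connection after sending the response, a receiver can treat end-of-stream as end-of-response. A minimal sketch of that idea follows; the function name read_until_close is an assumption of this example, and the socket pair merely stands in for a real client/server connection.

```python
import socket


def read_until_close(sock, chunk_size=4096):
    """Collect bytes until the peer closes the connection (recv then returns b"")."""
    chunks = []
    while True:
        data = sock.recv(chunk_size)
        if not data:
            break
        chunks.append(data)
    return b"".join(chunks)


if __name__ == "__main__":
    server_side, client_side = socket.socketpair()
    server_side.sendall(b"HTTP/1.0 200 OK\r\n\r\nhello")
    server_side.close()  # closing marks the end of the response
    print(read_until_close(client_side))
    client_side.close()
```

A premature close looks the same to this loop as a normal one, so a careful caller would still check the status line and any Content-Length header before trusting the result.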

3.3 TCP/IP Protocol

TCP/IP is a set of protocols developed to allow cooperating computers to share resources across a network. It was developed by a community of researchers centered around the ARPAnet. Certainly the ARPAnet is the best-known TCP/IP network.

First some basic definitions. The most accurate name for the set of protocols we are describing is the "Internet protocol suite". TCP and IP are two of the protocols in this suite. (They will be described below.) Because TCP and IP are the best known of the protocols, it has become common to use the term TCP/IP or IP/TCP to refer to the whole family.

The Internet is a collection of networks, including the Arpanet, NSFnet, regional networks such as NYsernet, local networks at a number of University and research institutions, and a number of military networks. The term "Internet" applies to this entire set of networks. The subset of them that is managed by the Department of Defense is referred to as the "DDN" (Defense Data Network). This includes some research-oriented networks, such as the Arpanet, as well as more strictly military ones. (Because much of the funding for Internet protocol developments is done via the DDN organization, the terms Internet and DDN can sometimes seem equivalent.) All of these networks are connected to each other. Users can send messages from any of them to any other, except where there are security or other policy restrictions on access.

Officially speaking, the Internet protocol documents are simply standards adopted by the Internet community for its own use. More recently, the Department of Defense issued a MILSPEC definition of TCP/IP. This was intended to be a more formal definition, appropriate for use in purchasing specifications. However most of the TCP/IP community continues to use the Internet standards. The MILSPEC version is intended to be consistent with it.

Whatever it is called, TCP/IP is a family of protocols. A few provide "low-level" functions needed for many applications. These include IP, TCP, and UDP. Others are protocols for doing specific tasks, e.g. transferring files between computers, sending mail, or finding out who is logged in on another computer.

Initially TCP/IP was used mostly between minicomputers or mainframes. These machines had their own disks, and generally were self-contained.

Thus the most important "traditional" TCP/IP services are:

These services should be present in any implementation of TCP/IP. These traditional applications still play a very important role in TCP/IP-based networks. Although people are still likely to work with one specific computer, that computer will call on other systems on the net for specialized services. This has led to the "server/client" model of network services. A server is a system that provides a specific service for the rest of the network. A client is another system that uses that service. Here are the kinds of servers typically present in a modern computer setup. Note that these computer services can all be provided within the framework of TCP/IP.

3.4 DNS - Domain Name System

Originally, host name to address mappings were maintained by the Network Information Center (NIC) in a single file (HOSTS.TXT) which was FTPed by all hosts.

The total network bandwidth consumed in distributing a new version by this scheme is proportional to the square of the number of hosts in the network, and even when multiple levels of FTP are used, the outgoing FTP load on the NIC host is considerable. Explosive growth in the number of hosts didn't bode well for the future.

The Domain Name System, or DNS, is a distributed database that is used by TCP/IP applications to map between hostnames and IP addresses. The term distributed is used because no single site on the Internet knows all the information. Each site maintains its own database of information and runs a server program that other systems across the Internet (clients) can query. The DNS provides the protocol that allows clients and servers to communicate with each other.

From an application's point of view, access to the DNS is through a resolver. The resolver is accessed primarily through two library functions: gethostbyname() & gethostbyaddr().

The resolver contacts one or more name servers to do the mapping.
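The two resolver functions are available in most languages under the same names. The sketch below uses Python's socket module, whose gethostbyname() and gethostbyaddr() wrap the system resolver; the wrapper names resolve and reverse_lookup are made up for this example.

```python
import socket


def resolve(name):
    """Return the IPv4 address for a host name, or None if the lookup fails."""
    try:
        return socket.gethostbyname(name)
    except OSError:  # socket.gaierror and socket.herror are subclasses of OSError
        return None


def reverse_lookup(address):
    """Return the primary host name for an IP address, or None if the lookup fails."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(address)
        return hostname
    except OSError:
        return None


if __name__ == "__main__":
    # A dotted IPv4 address is returned unchanged, without contacting a name server.
    print(resolve("127.0.0.1"))
    print(reverse_lookup("127.0.0.1"))
```

What the second call prints depends on the local resolver configuration, so no particular output is claimed for it.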

4. Modules

4.1 DNS module

The DNS module serves as a Domain Name System server for outside Clients (Netscape, etc.).

The DNS Server initializes a UDP socket on port 53, on which it waits for DNS requests for IP addresses from outside Clients. After receiving a request, the Server extracts from it the host name whose IP address is wanted, creates a new DNS header, and appends the IP address of the specified host to the end of the original request.

If the requested host has no entry in the internal hosts table, a name error message is returned to the Client.
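The request handling described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the module's actual code: the table HOSTS, its single entry, the 60-second TTL and the function names are all invented here, only type A questions are considered, and name compression inside the question is not handled.

```python
import socket
import struct

# Hypothetical internal hosts table; the name and address are examples only.
HOSTS = {"www.example.test": "10.0.0.5"}

RCODE_NAME_ERROR = 3  # response code meaning "the queried name does not exist"


def read_qname(message, offset=12):
    """Decode the length-prefixed labels of the question name.

    Returns the dotted host name and the offset just past the terminating zero byte.
    """
    labels = []
    while True:
        length = message[offset]
        offset += 1
        if length == 0:
            break
        labels.append(message[offset:offset + length].decode("ascii"))
        offset += length
    return ".".join(labels), offset


def build_response(request, hosts=HOSTS):
    """Build a reply: new header + original question + one A record, or a name error."""
    ident, flags = struct.unpack("!HH", request[:4])
    name, end = read_qname(request)
    question = request[12:end + 4]  # the name plus QTYPE and QCLASS (2 bytes each)
    address = hosts.get(name.lower())
    # QR=1 (this is a response), AA=1 (authoritative); copy the RD bit from the request.
    out_flags = 0x8400 | (flags & 0x0100)
    if address is None:
        header = struct.pack("!HHHHHH", ident, out_flags | RCODE_NAME_ERROR, 1, 0, 0, 0)
        return header + question
    header = struct.pack("!HHHHHH", ident, out_flags, 1, 1, 0, 0)
    # Answer: pointer to the name at offset 12, TYPE A, CLASS IN, TTL, RDLENGTH 4, address.
    answer = struct.pack("!HHHIH", 0xC00C, 1, 1, 60, 4) + socket.inet_aton(address)
    return header + question + answer
```

The answer record reuses the name already present in the question through the two-byte pointer 0xC00C, which is why the IP address can simply be appended after the original question.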

DNS Module Services:

This service is called at the initiation of the program and starts the DNS Server operation: it creates the UDP socket on port # 53 and waits for Clients' requests.

This service loops forever, listening on the UDP socket on port 53 for DNS requests. After receiving a request it calls the dnsProcessRquest service, which handles the request and returns the response to the Client.

This service parses the received DNS request: it extracts the host name, looks it up in the internal hosts table, and builds a proper response message containing either the host's IP address or a name error message, which is sent back to the Client.
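The start-up and listening services could look roughly like the sketch below. The names open_dns_socket and dns_loop are assumptions of this example, and process_request is a placeholder for the request-processing service; the max_requests parameter exists only so the loop can be stopped in a demonstration.

```python
import socket

DNS_PORT = 53  # standard DNS port; binding to it usually needs elevated privileges


def open_dns_socket(host="0.0.0.0", port=DNS_PORT):
    """Create the UDP socket the server listens on."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def dns_loop(sock, process_request, max_requests=None):
    """Receive datagrams and answer each one with process_request(datagram).

    max_requests=None loops forever; a number makes the loop stop after that many requests.
    """
    handled = 0
    while max_requests is None or handled < max_requests:
        request, client = sock.recvfrom(512)  # 512 bytes: classic DNS-over-UDP message limit
        sock.sendto(process_request(request), client)
        handled += 1


if __name__ == "__main__":
    # Demonstration on an ephemeral loopback port with an echo handler.
    server = open_dns_socket("127.0.0.1", 0)
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.settimeout(2)
    client.sendto(b"example datagram", server.getsockname())
    dns_loop(server, lambda request: request, max_requests=1)
    print(client.recvfrom(512)[0])
    client.close()
    server.close()
```

UDP needs no accept step: each datagram carries the sender's address, so the reply is simply sent back to it.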

4.2 httpProxy module

The HTTP Proxy module serves as a pipe between an outside Client and an outside Server in a double Client-Server model, in which it plays both parts alternately: it acts as a server toward the Client and as a client toward the Server.

Http Proxy Module Services:

This service is called at the initiation of the program and starts the http proxy operation: it creates the TCP/IP socket on port # 80 and waits for Clients' requests.

This service loops forever, listening on the TCP/IP socket on port 80 for http requests. After receiving a request it calls the http_protocol module, which handles the request and returns the requested page to the Client.
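A minimal sketch of these two services is given below. The names open_proxy_socket and proxy_loop are made up for illustration, and handle_connection stands in for the call into the http_protocol module; max_connections is there only so the loop can be stopped in a demonstration.

```python
import socket


def open_proxy_socket(host="0.0.0.0", port=80, backlog=5):
    """Create the listening TCP socket for the proxy."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(backlog)
    return sock


def proxy_loop(listener, handle_connection, max_connections=None):
    """Accept connections one at a time and hand each to handle_connection(conn)."""
    served = 0
    while max_connections is None or served < max_connections:
        conn, _client_address = listener.accept()
        try:
            handle_connection(conn)
        finally:
            conn.close()  # one request per connection, closed by the server side
        served += 1


if __name__ == "__main__":
    def canned_reply(conn):
        conn.recv(1024)  # read (and ignore) the request
        conn.sendall(b"HTTP/1.0 200 OK\r\n\r\n")

    listener = open_proxy_socket("127.0.0.1", 0)  # ephemeral port for the demonstration
    client = socket.create_connection(listener.getsockname(), timeout=2)
    client.sendall(b"GET http://www.example.test/ HTTP/1.0\r\n\r\n")
    proxy_loop(listener, canned_reply, max_connections=1)
    print(client.recv(1024))
    client.close()
    listener.close()
```

This version serves one Client at a time; handling several at once would need threads, processes, or non-blocking sockets, none of which is shown here.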

4.3 http Protocol module

This module contains all of the HTTP Protocol services the Proxy server needs:

This service reads the request from the socket, interprets and handles it.

This service interprets the method of the http request.

This service processes the URL of the request received from the socket.

This service reads the request from the socket.

This service reads all of the mime headers from the http request.
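Taken together, the services above amount to reading a request line and the MIME headers that follow it. The sketch below shows one way to do that; read_request is a hypothetical name, and the function expects a binary file-like object such as the one returned by a socket's makefile("rb") method. Requests without a protocol version on the first line are not handled.

```python
import io


def read_request(stream):
    """Read a request line and its MIME-style headers from a binary file-like object.

    Returns (method, url, version, headers); header names are lower-cased.
    """
    method, url, version = stream.readline().decode("latin-1").split()
    headers = {}
    while True:
        line = stream.readline().decode("latin-1").rstrip("\r\n")
        if not line:  # a blank line (or end of input) ends the header section
            break
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, url, version, headers


if __name__ == "__main__":
    raw = (b"GET http://www.example.test/ HTTP/1.0\r\n"
           b"User-Agent: sketch/0.1\r\n"
           b"Accept: text/html\r\n"
           b"\r\n")
    print(read_request(io.BytesIO(raw)))
```

Header names are lower-cased because they are case-insensitive, while the method is left as received because it is case-sensitive.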

4.4 http utils module

This service inserts a mime header into the hash table.

This service returns a requested mime header value.

This service prints the mime headers to the specified socket.

This function closes the mime headers table.

This function initializes a new mime headers table.
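The five services above map naturally onto a small table type. The sketch below uses a Python dict (itself a hash table) and is only an approximation of the module: the class name MimeHeaderTable and its method names are invented, and lookups are made case-insensitive as an assumption.

```python
class MimeHeaderTable:
    """A table of MIME headers backed by a dict."""

    def __init__(self):
        # Initialize a new, empty headers table.
        self._table = {}

    def insert(self, name, value):
        """Insert a header; the lower-cased name is the key, the original spelling is kept."""
        self._table[name.lower()] = (name, value)

    def get(self, name, default=None):
        """Return the value of a requested header, or default if it is absent."""
        entry = self._table.get(name.lower())
        return entry[1] if entry else default

    def write_to(self, sock):
        """Print all headers, then the terminating blank line, to an object with sendall()."""
        for name, value in self._table.values():
            sock.sendall(f"{name}: {value}\r\n".encode("latin-1"))
        sock.sendall(b"\r\n")

    def close(self):
        """Discard all stored headers."""
        self._table.clear()
```

A typical use would be to insert every header read from the Client's request, look up the ones the proxy cares about with get(), and call write_to() on the socket connected to the remote Server.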

5. Appendix A

5.1 httpProxy module code

5.2 dns module code

5.3 http protocol code

5.4 http utils code

5.5 main program

5.6 makefile

Makefile File Code
