Lab Project #2, Part B: Proxy Cache

In this lab, you will develop a small web proxy server that can also cache web pages. The proxy is very simple: it understands only simple GET requests, but it can handle all kinds of objects, including images as well as HTML pages. This will give you a chance to get to know one of the most popular application protocols on the Internet, the Hypertext Transfer Protocol (HTTP). When you're done with the assignment, you should be able to configure your web browser to use your personal proxy server as a web proxy.

Overview: HTTP Proxies

Ordinarily, HTTP is a client-server protocol. The client (usually your web browser) communicates directly with the server (the web server software). However, in some circumstances it may be useful to introduce an intermediate entity called a proxy. Conceptually, the proxy sits between the client and the server. In the simplest case, instead of sending requests directly to the server, the client sends all its requests to the proxy. The proxy then opens a connection to the server and passes on the client's request. The proxy receives the reply from the server and sends that reply back to the client. Notice that the proxy is essentially acting as both an HTTP client (to the remote server) and an HTTP server (to the initial client). Why use a proxy? There are a few possible reasons:

Assignment Details

Reference Code

The code is divided into three classes as follows:

Your work will be to complete the proxy so that it is able to receive requests, forward them, read replies, and return those to the clients. You will need to complete the classes ProxyCache, HttpRequest, and HttpResponse. The places where you need to fill in code are marked with /* Fill in */. Each place may require one or more lines of code.

NOTE: As explained below, the proxy uses DataInputStreams for processing the replies from servers. This is because the replies are a mixture of textual and binary data, and DataInputStream is the standard Java input stream that offers both line-oriented and binary reads on the same stream. Its readLine() method is deprecated, so compile with the -deprecation argument to see the details of the resulting warnings:

        javac -deprecation *.java

If you do not use the -deprecation flag, the compiler still compiles your code, but it prints only a brief note that a deprecated API is used.

Running the Proxy

Run the proxy as follows:

       java ProxyCache port
where port is the port number on which you want the proxy to listen for incoming connections from clients.

Configuring Your Browser

You will also need to configure your web browser to use your proxy. This depends on your browser. In Internet Explorer, you can set the proxy in "Internet Options" in the Connections tab under LAN Settings. In Netscape (and derived browsers, such as Mozilla), you can set the proxy in Edit->Preferences and then select Advanced and Proxies.

In both cases you need to give the address of the proxy and the port number which you gave when you started the proxy. You can run the proxy and browser on the same computer without any problems.

Proxy Functionality

The proxy works as follows.

  1. The proxy listens for requests from clients.
  2. When there is a request, the proxy spawns a new thread for handling the request and creates an HttpRequest object which contains the request.
  3. The new thread sends the request to the server and reads the server's reply into an HttpResponse object.
  4. The thread sends the response back to the requesting client.
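
The four steps above can be sketched roughly as follows. This is only an illustration of the thread-per-request pattern, not the reference code: the class AcceptLoopSketch, the Handler interface, and the maxConnections parameter are hypothetical names introduced here, and the handler stands in for the work the lab does with HttpRequest and HttpResponse.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

final class AcceptLoopSketch {
    /** Hypothetical stand-in for the forwarding work done per request. */
    interface Handler {
        void handle(Socket client) throws IOException;
    }

    /** Accepts maxConnections clients and hands each one to its own thread. */
    static void serve(ServerSocket server, Handler handler, int maxConnections)
            throws IOException {
        for (int i = 0; i < maxConnections; i++) {
            Socket client = server.accept();        // step 1: wait for a client
            new Thread(() -> {                      // step 2: one thread per request
                try (Socket c = client) {
                    handler.handle(c);              // steps 3 and 4 would happen here
                } catch (IOException e) {
                    // Simple error handling: give up on this request
                    // without informing the client.
                    System.err.println("Request failed: " + e);
                }
            }).start();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: java AcceptLoopSketch port");
            return;
        }
        try (ServerSocket server = new ServerSocket(Integer.parseInt(args[0]))) {
            // Placeholder handler: closes the connection without replying.
            serve(server, client -> { }, Integer.MAX_VALUE);
        }
    }
}
```

Closing the client socket in the try-with-resources statement means each thread cleans up after itself, whether the handler finishes normally or fails.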

Your task is to complete the code which handles the above process. Most of the error handling in the proxy is very simple and it does not inform the client about errors. When there are errors, the proxy will simply stop processing the request and the client will eventually get a timeout.

Note that some browsers send their requests one at a time, without using parallel connections. On pages with a lot of inlined images, this can make the page load very slowly.

Programming Hints

Most of the code you need to write relates to processing HTTP requests and responses as well as handling Java sockets.
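
As one example of request processing, a browser that is configured to use a proxy normally puts the full URL in the request line, for example "GET http://example.com/index.html HTTP/1.0". The sketch below shows one possible way to pull the server's host and port out of such a line. The class RequestLineSketch and its fields are hypothetical and are not part of the reference code; defaulting to port 80 when the URL names no port is an assumption that fits plain HTTP.

```java
import java.net.URI;

final class RequestLineSketch {
    final String host;     // server to connect to
    final int port;        // server port, 80 if the URL gives none
    final String uri;      // the URL exactly as the client sent it
    final String version;  // e.g. "HTTP/1.0"

    private RequestLineSketch(String host, int port, String uri, String version) {
        this.host = host;
        this.port = port;
        this.uri = uri;
        this.version = version;
    }

    /** Parses a line of the form "GET absolute-URL HTTP-version". */
    static RequestLineSketch parse(String requestLine) {
        String[] parts = requestLine.trim().split("\\s+");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Malformed request line: " + requestLine);
        }
        if (!parts[0].equals("GET")) {
            throw new IllegalArgumentException("Only GET is supported: " + parts[0]);
        }
        URI target = URI.create(parts[1]);   // throws IllegalArgumentException if invalid
        if (target.getHost() == null) {
            throw new IllegalArgumentException("No host in URL: " + parts[1]);
        }
        // URI.getPort() returns -1 when the URL does not name a port.
        int port = target.getPort() == -1 ? 80 : target.getPort();
        return new RequestLineSketch(target.getHost(), port, parts[1], parts[2]);
    }
}
```

With the host and port in hand, the proxy can open a Socket to the server and forward the request.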

One point worth noting is the processing of replies from the server. In an HTTP response, the headers are sent as ASCII lines, separated by CRLF character sequences. The headers are followed by an empty line and the response body, which can be binary data in the case of images, for example.

Java separates the input streams according to whether they are text-based or binary, which presents a small problem in this case. Only DataInputStreams are able to handle both text and binary data simultaneously; all other streams are either pure text (e.g., BufferedReader), or pure binary (e.g., BufferedInputStream), and mixing them on the same socket does not generally work.

DataInputStream has a small gotcha: its readLine() method cannot guarantee that the bytes it reads are correctly converted to characters on every platform. In this lab the conversion usually works, but the compiler will flag DataInputStream.readLine() as deprecated. Compiling with the -deprecation flag shows the details of that warning.

It is highly recommended that you use the DataInputStream for reading the response.
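
A minimal sketch of this approach is shown below: it reads the status line and headers as text with readLine(), stops at the empty line, and then reads the body as raw bytes from the same stream. The class ResponseReaderSketch and its fields are hypothetical names, not the reference HttpResponse class, and the sketch assumes the server closes the connection at the end of the body instead of relying on a Content-Length header.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

final class ResponseReaderSketch {
    /** Status line followed by the header lines, without CRLF. */
    final List<String> headLines = new ArrayList<>();
    /** Raw body bytes, which may be binary (e.g. an image). */
    byte[] body = new byte[0];

    @SuppressWarnings("deprecation")  // DataInputStream.readLine() is deprecated
    static ResponseReaderSketch read(InputStream in) throws IOException {
        ResponseReaderSketch response = new ResponseReaderSketch();
        DataInputStream data = new DataInputStream(in);

        // Text part: ASCII lines up to the empty line that ends the headers.
        String line;
        while ((line = data.readLine()) != null && !line.isEmpty()) {
            response.headLines.add(line);
        }

        // Binary part: copy bytes unchanged until the stream ends.
        // Assumption: the server signals the end of the body by closing.
        ByteArrayOutputStream bodyBytes = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = data.read(buffer)) != -1) {
            bodyBytes.write(buffer, 0, n);
        }
        response.body = bodyBytes.toByteArray();
        return response;
    }
}
```

Because both the text and the binary reads go through the same DataInputStream, no body bytes are lost in a separate reader's buffer, which is the problem that arises when a BufferedReader and a binary stream are mixed on one socket.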


(Bonus points) Possible Extensions

While it may not be obvious at first, proxies are very flexible tools that can serve a number of different purposes on the web. Common uses for proxies include giving performance boosts to dial-up users (through caching and pre-fetching), privacy protection (through anonymous proxies), content filtering and blocking (used in many "NetNanny"-type applications), and content transformation. Sample Proxy Applications:

When you have finished the basic assignment, you can try the following extensions for bonus points.