HTTP协议的那些事儿

2017/03/22 网络

HTTP is a simple protocol. The client establishes a TCP connection to the server, issues a request, and reads back the server’s response. The server denotes the end of its response by closing the connection. The file returned by the server normally contains pointers to other files that can reside on other servers. The simplicity seen by the user is the apparent ease of following these links from server to server.

The client requests are simple ASCII lines and the server’s response begins with ASCII lines (headers) followed by the data (which can be ASCII or binary). It is the client software (the browser) that parses the server’s response, formatting the output and highlighting links to other documents.

HTTP协议工作在应用层,是目前WWW万维网的基础,大部分的网络访问都是基于HTTP协议。在安全需求高的地方需要使用HTTPS协议来增强安全性,这一点暂时不在这里的讨论范围。

这篇文章主要通过wireshark抓包来分析一次HTTP交互过程中通信的内容,HTTP协议是借助TCP协议来实现的,由于HTTP是无状态的协议,所以每一次交互都需要进行TCP的三次握手和四次挥手。下面我们重点会分析一下协议头、状态码、以及常见的头域。

Message Types

对于HTTP 1.0版本,请求报文的格式是

request-line (request request-URI HTTP-version)
headers (0 or more)
<blank line>
body (only for a POST request)

而响应报文的格式是

status-line(HTTP-version response-code response-phrase)
headers (0 or more)
<blank line>
body

request

Three requests are supported.

  1. The GET request, which returns whatever information is identified by the request-URI.
  2. The HEAD request is similar to the GET request, but only the server’s header information is returned, not the actual contents (the body) of the specified document. This request is often used to test a hypertext link for validity, accessibility, and recent modification.
  3. The POST request is used for posting electronic mail, news, or sending forms that can be filled in by an interactive user. This is the only request that sends a body with the request. A valid Content-Length header field (described later) is required to specify the length of the body.

response-code

状态码出现在响应报文中,也就是服务器发送给客户端的数据包中。下面是常见的状态码:

200 OK, request succeeded
204 OK, but no content to return
200 OK, request succeeded
304 Document has not been modified
401 Unauthorized; request requires user authentication
404 Not found
500 Internal server error

交互过程

使用浏览器访问我的博客alienx.cn,使用wireshark完整的记录下HTTP报文的交互过程。 注意到这里主要是使用GET请求获得网页的内容,并且可以简单的看到每一次响应的状态码,其中出现了200、304状态码,前者表示一切正常,后者表示文件未被修改。我们挑选其中一个请求报文,查看下面的内容: 这是一个GET请求,HTTP版本是1.1,访问的路径是主机的根目录,每一个头域都由域名、双引号、域值和换行(\r\n)组成。 这是服务器对上一个GET请求的响应报文,其中状态码200表示一切正常,还包含了Date、Server等头域。


我们再分析另一个数据包,其中请求报文如下: 注意到其中的If-Modified-Since头域,它表示客户端想询问服务器,自己已经缓存了这个请求的文件,请问服务器那边这个文件有没有修改,如果有修改的话请重新发送一份。 因为这个文件并没有修改,所以服务器返回一个304状态码,不需要重复传送这个文件。但是,请不要忘记,这个过程依旧需要TCP的三次握手和四次挥手。

总结

这篇文章主要讲了HTTP协议的头部以及常见的交互过程,观察上图会发现HTTP是无状态协议,每一次传送文档都需要单独建立TCP连接,在《TCP/IP详解 卷I》中关于这点的描述是“The Biggest performance problem associated with HTTP is its use of one TCP connection per line.”。

思考

HTTP协议是无状态的,为了保持用户会话状态使用了什么技术去弥补?如果使用该技术时,用户禁用cookie之后,还有什么方式实现?

Search

    Table of Contents