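Real-time HTTP processing is a signature feature of our company's product. Today, while crafting test packets, I got genuinely stuck on a few HTTP fields. Every time I open the HTTP PARSER code, the sheer number of cases it has to handle drains my motivation to keep reading. Yet compared with the mature TCP/IP layers beneath it, application-layer HTTP feels more like a puzzle: complex, and fascinating.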
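Web content is stored on Web servers, which are also called HTTP servers because they speak the HTTP protocol. The resources they serve include text files, images, video, and so on. The HTTP client is typically a Web browser: after the client sends an HTTP request, the server sends the requested data back in an HTTP response, provided the request succeeds.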
Words only go so far; capturing and analyzing packets is where the truth is!
Start tshark listening on port 80 and save the captured packets to a file in PCAP format; tcpdump would work just as well.
[root@2015 ~]# tshark port 80 -w 80.pcap
Running as user "root" and group "root". This could be dangerous.
Capturing on eth0
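On the same machine, use wget to issue an HTTP request. The wget output shows the server's IP address (IP layer) and port number 80 (TCP layer), which shows that HTTP runs on top of the reliable TCP protocol.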
[root@2015 ~]# wget www.baidu.com
--2015-02-02 21:51:35-- http://www.baidu.com/
Resolving www.baidu.com... 180.97.33.108, 180.97.33.107
Connecting to www.baidu.com|180.97.33.108|:80... connected.
HTTP request sent, awaiting response... 200 OK
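Stop the capture, then have tshark dissect it. The -V option prints the full details of every packet; redirect the output to a file so it can be analyzed.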
[root@2015 ~]# tshark port 80 -w 80.pcap
Running as user "root" and group "root". This could be dangerous.
Capturing on eth0
64 ^C
[root@pprobe2 ~]# tshark -r 80.pcap -V > 80
Running as user "root" and group "root". This could be dangerous.
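Open the file 80 in vim to see the HTTP message in detail. An HTTP message is nothing more than a series of simple text lines. The request's start line gives the Method, GET, which asks the server to send a resource to the client. Lines 294 to 297 are the HTTP headers: each field has the form KEY: VALUE and ends with \r\n, and the header section as a whole is terminated by an empty line (line 298).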
285 Hypertext Transfer Protocol
286 GET / HTTP/1.0\r\n
287 [Expert Info (Chat/Sequence): GET / HTTP/1.0\r\n]
288 [Message: GET / HTTP/1.0\r\n]
289 [Severity level: Chat]
290 [Group: Sequence]
291 Request Method: GET
292 Request URI: /
293 Request Version: HTTP/1.0
294 User-Agent: Wget/1.12 (linux-gnu)\r\n
295 Accept: */*\r\n
296 Host: www.baidu.com\r\n
297 Connection: Keep-Alive\r\n
298 \r\n
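To make the "start line plus KEY: VALUE lines plus empty line" layout concrete, here is a minimal Python sketch that splits the request bytes shown above into their parts. The function name parse_request_head is made up for this illustration, and the sketch assumes a well-formed request; a real HTTP parser has to cope with far more cases.

```python
def parse_request_head(raw: bytes):
    """Split an HTTP request head into (method, uri, version, headers).

    Hypothetical helper for illustration; assumes a well-formed request.
    """
    # The head ends at the first empty line, i.e. the first CRLF CRLF.
    head, _, _body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")

    # Start line: METHOD SP URI SP VERSION
    method, uri, version = lines[0].split(" ", 2)

    # Every following line is "KEY: VALUE"; header names are case-insensitive,
    # so the keys are lowercased here.
    headers = {}
    for line in lines[1:]:
        key, _, value = line.partition(":")
        headers[key.strip().lower()] = value.strip()
    return method, uri, version, headers


# The request captured above (lines 286 and 294-298 of the dump).
raw = (
    b"GET / HTTP/1.0\r\n"
    b"User-Agent: Wget/1.12 (linux-gnu)\r\n"
    b"Accept: */*\r\n"
    b"Host: www.baidu.com\r\n"
    b"Connection: Keep-Alive\r\n"
    b"\r\n"
)

method, uri, version, headers = parse_request_head(raw)
print(method, uri, version)   # GET / HTTP/1.0
print(headers["host"])        # www.baidu.com
```

Storing the headers in a dict is a simplification that only works when no header name repeats, which happens to be true for this request.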
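Next comes the response message. It also begins with a start line, which gives the HTTP version and the status code. The response headers start at line 4866, again in KEY: VALUE form, and end with an empty line. The Content-Type of text/html indicates that the message body is HTML text, namely the index.html that wget fetched; the body is the content of index.html.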
4858 Hypertext Transfer Protocol
4859 HTTP/1.1 200 OK\r\n
4860 [Expert Info (Chat/Sequence): HTTP/1.1 200 OK\r\n]
4861 [Message: HTTP/1.1 200 OK\r\n]
4862 [Severity level: Chat]
4863 [Group: Sequence]
4864 Request Version: HTTP/1.1
4865 Response Code: 200
4866 Date: Mon, 02 Feb 2015 13:56:20 GMT\r\n
4867 Content-Type: text/html; charset=utf-8\r\n
4868 Connection: Close\r\n
4869 Vary: Accept-Encoding\r\n
4870 Set-Cookie: BAIDUID=F05D5D1A512A5C4D616936F1EDB6051D:FG=1; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com\r\n
4871 Set-Cookie: BAIDUPSID=F05D5D1A512A5C4D616936F1EDB6051D; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com\r\n
4872 Set-Cookie: BDSVRTM=0; path=/\r\n
4873 Set-Cookie: BD_HOME=0; path=/\r\n
4874 Set-Cookie: H_PS_PSSID=11384_11077_1463_11362_11055_11394_11400_11276_11240_11151_11242_11404_10618_10634; path=/; domain=.baidu.com\r\n
4875 P3P: CP=" OTI DSP COR IVA OUR IND COM "\r\n
4876 Cache-Control: private\r\n
4877 Cxy_all: baidu+f70fde394ab8b4d754146d21edcf1d7b\r\n
4878 Expires: Mon, 02 Feb 2015 13:55:37 GMT\r\n
4879 X-Powered-By: HPHP\r\n
4880 Server: BWS/1.1\r\n
4881 BDPAGETYPE: 1\r\n
4882 BDQID: 0x928f07640000a1b8\r\n
4883 BDUSERID: 0\r\n
4884 \r\n
4885 Line-based text data: text/html
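The response above repeats the Set-Cookie header several times, so a plain key-to-value map would lose data. The following Python sketch, under the assumption of a well-formed response, keeps the headers as an ordered list of pairs and separates the body at the empty line. The names parse_response_head and get_all are hypothetical, and the sample bytes are a shortened stand-in for the captured response with a placeholder body.

```python
def parse_response_head(raw: bytes):
    """Return (version, status_code, reason, headers, body).

    Hypothetical helper for illustration; headers is a list of
    (name, value) pairs so that repeated names are preserved.
    """
    # Everything before the first empty line is the head; the rest is the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")

    # Start line: VERSION SP STATUS-CODE SP REASON (the reason may be empty).
    parts = lines[0].split(" ", 2)
    version, code = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""

    headers = []
    for line in lines[1:]:
        key, _, value = line.partition(":")
        headers.append((key.strip(), value.strip()))
    return version, code, reason, headers, body


def get_all(headers, name):
    """Collect every value of a header, matching the name case-insensitively."""
    return [v for k, v in headers if k.lower() == name.lower()]


# Shortened sample modeled on the capture; the body is a placeholder.
raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"Connection: Close\r\n"
    b"Set-Cookie: BDSVRTM=0; path=/\r\n"
    b"Set-Cookie: BD_HOME=0; path=/\r\n"
    b"\r\n"
    b"<html>...</html>"
)

version, code, reason, headers, body = parse_response_head(raw)
print(version, code, reason)
print(get_all(headers, "set-cookie"))
```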
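So the process by which a Web client talks to a Web server is:
1: The client extracts the server's host name from the URL the user entered
2: The client resolves the host name to an IP address through DNS
3: The client extracts the port number from the URL
4: The client and server establish a TCP connection with the three-way handshake
5: The client sends the server an HTTP request message
6: The server returns an HTTP response message to the client
7: The connection is closed, and the client displays the returned information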
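The seven steps above can be sketched in Python with only the standard library. This is an illustration under a few assumptions, not how wget or a browser actually does it: the URL must include the http:// scheme, the port defaults to 80 when the URL does not name one, and the helper names split_url, build_request and fetch are made up here.

```python
import socket
from urllib.parse import urlsplit


def split_url(url):
    """Steps 1 and 3: pull the host name and port out of the URL."""
    parts = urlsplit(url)
    host = parts.hostname
    port = parts.port or 80          # assumed default for plain http
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return host, port, path


def build_request(host, path):
    """Step 5: a minimal GET request, similar in shape to the one wget sent."""
    return (
        f"GET {path} HTTP/1.0\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")


def fetch(url, timeout=10.0):
    """Steps 2 and 4-7: resolve, connect, send, read until the server closes."""
    host, port, path = split_url(url)
    # create_connection resolves the host name (step 2) and opens the TCP
    # connection; the kernel performs the three-way handshake (step 4).
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(host, path))      # step 5
        chunks = []
        while True:                                  # step 6
            data = sock.recv(4096)
            if not data:                             # peer closed: step 7
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling fetch("http://www.baidu.com/") would return the raw response bytes, start line and headers included. Nothing in the block touches the network until fetch is called, so split_url and build_request can be tried on their own.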