In-Depth Analysis of Server Web Cache

  

Expires, Cache-Control, Last-Modified, and ETag are the header fields related to web caching in RFC 2616 (HTTP/1.1). The first two control when a cached copy expires, and the latter two are used to validate a cached page. Note that HTTP/1.0 has only a weaker cache control mechanism, Pragma, and a cache that speaks only HTTP/1.0 does not understand the Cache-Control header. This article uses the Apache 2.0 server as an example and discusses only the HTTP/1.1 protocol.

Expires

The Expires field declares the time after which a cached web page or URL is considered stale. Once this time has passed, the browser should contact the origin server again. RFC 2616 cautions that because heuristic (inferred) expiration times may compromise semantic transparency, they should be used with care, and it encourages origin servers to provide explicit expiration times as much as possible.

For ordinary static files such as html, gif, jpg, css, and js, a default Apache installation does not add this field to the response header. When Firefox receives such a response and finds no Expires field, it infers a suitable expiration time from the file type and the Last-Modified field and stores it on the client. The estimated time is generally about three days after the response was received.
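The inference described above can be illustrated with a short Python sketch. It assumes the common heuristic suggested by RFC 2616, which treats a fraction (here 10%) of the time since Last-Modified as the freshness lifetime. The function name and the fraction are illustrative assumptions, and this is not necessarily the algorithm Firefox actually uses.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def heuristic_expiry(date_header: str, last_modified_header: str,
                     fraction: float = 0.1) -> datetime:
    """Estimate an expiration time for a response without an Expires header.

    Assumption: freshness lifetime = fraction * (Date - Last-Modified).
    """
    received = parsedate_to_datetime(date_header)
    modified = parsedate_to_datetime(last_modified_header)
    # A Last-Modified in the future would give a negative age; clamp it to zero.
    age = max(received - modified, timedelta(0))
    return received + age * fraction


# A file last modified 30 days before the response was generated.
expiry = heuristic_expiry("Tue, 10 Jan 2006 00:00:00 GMT",
                          "Sun, 11 Dec 2005 00:00:00 GMT")
print(expiry.isoformat())
```

With these sample dates the file is 30 days old, so a 10% heuristic yields a lifetime of three days, which is in line with the estimate mentioned above.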

Apache's expires_module can add the Expires field to the HTTP response header automatically. Configure it in the Apache httpd.conf file as follows:

# Load the expires_module module
LoadModule expires_module modules/mod_expires.so
# Enable expiration headers
ExpiresActive On
# GIF images: valid for 1 month after access
ExpiresByType image/gif A2592000
# HTML documents: valid for one week after the last modification time
ExpiresByType text/html M604800
# The following have similar meanings
ExpiresByType text/css "now plus 2 months"
ExpiresByType text/js "now plus 2 days"
ExpiresByType image/jpeg "access plus 2 months"
ExpiresByType image/bmp "access plus 2 months"
ExpiresByType image/x-icon "access plus 2 months"
ExpiresByType image/png "access plus 2 months"
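The compact codes used with ExpiresByType, such as A2592000 and M604800, consist of a base letter (A for access time, M for modification time) followed by a number of seconds. The following Python sketch shows how such a code maps to an expiration time. The function name is hypothetical and the code only mirrors the arithmetic, not Apache's implementation.

```python
from datetime import datetime, timedelta, timezone


def expires_from_code(code: str, access: datetime, modified: datetime) -> datetime:
    """Resolve a mod_expires style code such as 'A2592000' or 'M604800'."""
    # 'A' counts from the time of the request, 'M' from the file's mtime.
    base = {"A": access, "M": modified}[code[0]]
    return base + timedelta(seconds=int(code[1:]))


access_time = datetime(2006, 1, 1, tzinfo=timezone.utc)
modified_time = datetime(2005, 12, 25, tzinfo=timezone.utc)

print(expires_from_code("A2592000", access_time, modified_time))  # 30 days after access
print(expires_from_code("M604800", access_time, modified_time))   # 7 days after modification
```

Note that with the M base, a file that has not changed for more than the configured interval is already expired when it is served.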

For dynamic pages, if the page does not set Expires itself, for example with header("Expires: " . gmdate("D, d M Y H:i:s") . " GMT"), the Apache server returns Wed, 11 Jan 1984 05:00:00 GMT to the browser as the value of the Expires field. In other words, a dynamic page is always treated as already expired. The browser nevertheless stores the expired dynamic page.

In practice, Firefox caches every page, whether it has expired, has not expired, or declares no expiration time at all. Even if the expiration time recorded in the cache is 1970-01-01 08:00:00, the browser still sends the cached file's Last-Modified and ETag values back to the server. If the server validates them and returns status 304, the browser continues to use the cached copy.

Cache-Control

The Cache-Control field can carry more directives, such as no-cache, must-revalidate, and max-age=0. These directives indicate how long a page may be cached, how it is cached, whether it may be transformed into a different medium, and whether it may be written to persistent storage. No Cache-Control directive guarantees privacy or data security, however. The "private" and "no-store" directives offer some protection for privacy and security, but they are not a substitute for authentication and encryption.
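To make the structure of the field concrete, here is a minimal Python sketch that splits a Cache-Control value into its directives. The function name is hypothetical, and the parser is deliberately simple: it assumes no directive argument contains a comma inside quotes.

```python
def parse_cache_control(value: str) -> dict:
    """Split a Cache-Control header value into {directive: argument or None}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        # Directive names are case-insensitive; flags such as no-cache have no argument.
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


parsed = parse_cache_control("no-cache, must-revalidate, max-age=0")
print(parsed)
```

A cache could then look up parsed.get("max-age") or test for "no-store" in parsed before deciding whether to store the response.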

Apache's mod_cern_meta module allows HTTP response headers to be controlled per file, so it can also set the Cache-Control header (or any other header). The headers are read from a meta file that lives in a subdirectory of the original file's directory and is named after the original file. See the official Apache documentation for details.

Here, Cache-Control: max-age gives the freshness lifetime. If the mod_cern_meta module is not enabled, the Apache server converts the date in the Expires field to a delta value in seconds and assigns it to max-age. If mod_cern_meta is enabled and a max-age value is configured, Apache overrides that value with the one derived from the Expires field. In addition, max-age implies Cache-Control: public. As a result, the Cache-Control: max-age and Expires values that the browser receives agree with each other.

If Cache-Control: max-age is 0 or negative, the browser sets Expires to 1970-01-01 08:00:00 in the corresponding cache entry.
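The relationship between an Expires date and a max-age delta can be sketched in Python as follows. The function name is hypothetical, and clamping negative results to zero is an assumption made here so that an Expires date in the past maps to "already stale".

```python
from email.utils import parsedate_to_datetime


def max_age_from_expires(expires_header: str, date_header: str) -> int:
    """Convert an Expires date into a max-age delta in seconds, relative to Date."""
    delta = parsedate_to_datetime(expires_header) - parsedate_to_datetime(date_header)
    # An Expires date in the past means the response is already stale.
    return max(int(delta.total_seconds()), 0)


print(max_age_from_expires("Wed, 11 Jan 2006 00:00:00 GMT",
                           "Tue, 10 Jan 2006 00:00:00 GMT"))
print(max_age_from_expires("Wed, 11 Jan 1984 05:00:00 GMT",
                           "Tue, 10 Jan 2006 00:00:00 GMT"))
```

The first call covers a response that expires one day after it was generated. The second uses the 1984 date that dynamic pages receive and yields a lifetime of zero.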

Last-Modified

Last-Modified and ETag are the two fields used for conditional requests. When a cache holding a stored copy receives a request for that page, it sends a validation request asking the server whether the page has changed, carrying the stored ETag in the If-None-Match header and the stored Last-Modified value in the If-Modified-Since header. The server uses this information to decide whether the page has been updated. If not, it returns HTTP 304 (Not Modified); if so, it returns HTTP 200 with the updated page content and the new ETag and Last-Modified values.
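The server-side decision can be sketched in Python as shown below. This is a simplified illustration, not Apache's logic: the function name is hypothetical, entity tags are compared as exact strings (no weak-validator handling), and If-None-Match is given precedence when both headers are present, which is an assumption of this sketch.

```python
from email.utils import parsedate_to_datetime


def conditional_status(request_headers: dict, etag: str, last_modified: str) -> int:
    """Return 304 if the client's cached copy is still valid, otherwise 200."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        return 304 if etag in candidates or "*" in candidates else 200

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        current = parsedate_to_datetime(last_modified)
        cached = parsedate_to_datetime(if_modified_since)
        # Unchanged if the resource is not newer than the client's copy.
        return 304 if current <= cached else 200

    # No validators sent: this is an ordinary, unconditional request.
    return 200


stamp = "Sun, 01 Jan 2006 00:00:00 GMT"
print(conditional_status({"If-None-Match": '"abc"'}, '"abc"', stamp))
print(conditional_status({"If-Modified-Since": "Sat, 31 Dec 2005 00:00:00 GMT"}, '"abc"', stamp))
```

On a 304 the server sends only headers, so the body is not transferred again.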

This mechanism avoids sending the same file to the browser repeatedly, although each check still costs an HTTP request.

A purely static page generally has Last-Modified information. The Apache server reads the file's last modification time and adds it to the HTTP response header as Last-Modified.
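As an illustration of how a file's modification time becomes a Last-Modified value, the following Python sketch formats a timestamp as an HTTP date using the standard library. The function names are hypothetical, and this only mirrors the idea rather than Apache's code.

```python
import os
from email.utils import formatdate


def http_date(timestamp: float) -> str:
    """Format a Unix timestamp as an HTTP date, always expressed in GMT."""
    return formatdate(timestamp, usegmt=True)


def last_modified_header(path: str) -> str:
    """Build a Last-Modified value from the file's modification time."""
    return http_date(os.stat(path).st_mtime)


print(http_date(0))
print(last_modified_header(__file__) if "__file__" in globals() else "no file context")
```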

For dynamic pages, if the page does not set Last-Modified itself, for example with header("Last-Modified: " . gmdate("D, d M Y H:i:s") . " GMT"), the Apache server returns the current time as Last-Modified to the browser.

Whether the page is static or dynamic, Firefox cleverly sets the Last-Modified of the cached page to the time at which it received the response from the server, rather than to the value of the Last-Modified field in the HTTP response header.

ETag

If Last-Modified already exists, why is an ETag field needed? Because Last-Modified has a resolution of one second: if a file is changed twice within the same second, Last-Modified cannot tell the two versions apart. HTTP/1.1 therefore provides stricter validation through the entity tag (ETag) header.
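The limitation can be demonstrated with a small Python sketch. It assumes, purely for illustration, an ETag derived from a hash of the content. That is not how Apache builds its ETags; it is simply one way to obtain a validator that changes whenever the content does.

```python
import hashlib
from email.utils import formatdate


def validators(content: bytes, mtime: float) -> tuple:
    """Return (Last-Modified, ETag) for one version of a resource."""
    last_modified = formatdate(mtime, usegmt=True)  # HTTP dates drop sub-second precision
    etag = '"' + hashlib.sha1(content).hexdigest() + '"'  # assumed content-hash ETag
    return last_modified, etag


# Two different versions saved half a second apart, within the same second.
first = validators(b"version 1", 1136073600.2)
second = validators(b"version 2", 1136073600.7)

print(first[0] == second[0])  # both versions share the same Last-Modified value
print(first[1] == second[1])  # the ETags differ
```

A client validating with If-Modified-Since alone would be told the first version is still current, while a comparison of entity tags detects the change.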

By default, the Apache server adds ETag fields to the response headers of all static and dynamic files.

This behavior can be configured in the Apache httpd.conf file with the FileETag directive. FileETag selects the file attributes used to create the ETag (entity tag) response header when the document is based on a file. In Apache 1.3.22 and earlier, the ETag value was always derived from the file's inode number (INode), size (Size), and last modification time (MTime). If a directory's configuration contains ‘FileETag INode MTime Size’ and a subdirectory's contains ‘FileETag -INode’, the effective setting for that subdirectory (which is inherited by any of its subdirectories that do not override it) is equivalent to ‘FileETag MTime Size’.

In a load-balanced environment with multiple servers, the same file can have a different ETag or modification date on each server, so the browser downloads it again each time. Setting ‘FileETag None’ removes the ETag field from the response header.
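The effect of the FileETag components can be sketched in Python. Both function names are hypothetical, and the tag format (hexadecimal fields joined by hyphens) is an illustrative assumption rather than Apache's exact encoding. The sketch resolves a relative directive such as -INode against the inherited set, then shows why including the inode produces different tags for the same file on different servers.

```python
ALL_COMPONENTS = {"INode", "MTime", "Size"}


def apply_file_etag(inherited: set, directive: str) -> set:
    """Resolve FileETag keywords against the inherited set of components."""
    tokens = directive.split()
    # Keywords prefixed with + or - adjust the inherited set; bare keywords replace it.
    relative = all(token[0] in "+-" for token in tokens)
    result = set(inherited) if relative else set()
    for token in tokens:
        if token == "All":
            result |= ALL_COMPONENTS
        elif token == "None":
            result.clear()
        elif token.startswith("+"):
            result.add(token[1:])
        elif token.startswith("-"):
            result.discard(token[1:])
        else:
            result.add(token)
    return result


def file_etag(stat_result, components: set):
    """Build an illustrative ETag from the selected stat fields, or None if none selected."""
    fields = [("INode", "st_ino"), ("Size", "st_size"), ("MTime", "st_mtime")]
    parts = [format(int(getattr(stat_result, attr)), "x")
             for name, attr in fields if name in components]
    return '"' + "-".join(parts) + '"' if parts else None


effective = apply_file_etag({"INode", "MTime", "Size"}, "-INode")
print(sorted(effective))
```

Passing the result of os.stat for the same file on two servers to file_etag gives equal tags only when INode is excluded and the size and modification time match, since inode numbers are specific to each filesystem.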
