From e8548fc9cd5d97e049061b3f29c1528ef9a25186 Mon Sep 17 00:00:00 2001 From: sspencerwire Date: Tue, 13 Aug 2024 15:24:00 -0500 Subject: [PATCH] editing `052-load-balancer-proxies-varnish.md` (#2258) * remove as much passive voice as possible (still some remaining) * replace conjunctions with words * some sentence simplification * fix bare html issue (insert <>) * other minor fixes --- .../052-load-balancer-proxies-varnish.md | 130 +++++++++--------- 1 file changed, 68 insertions(+), 62 deletions(-) diff --git a/docs/books/web_services/052-load-balancer-proxies-varnish.md b/docs/books/web_services/052-load-balancer-proxies-varnish.md index 22fffee31e..bb86bafbf2 100644 --- a/docs/books/web_services/052-load-balancer-proxies-varnish.md +++ b/docs/books/web_services/052-load-balancer-proxies-varnish.md @@ -13,7 +13,7 @@ In this chapter, you will learn about the web accelerator proxy cache : Varnish. **Objectives**: In this chapter, you will learn how to: :heavy_check_mark: Install and configure Varnish; -:heavy_check_mark: Cache the content of a website. +:heavy_check_mark: Cache the content of a website. :checkered_flag: **reverse-proxy**, **cache** @@ -26,61 +26,65 @@ In this chapter, you will learn about the web accelerator proxy cache : Varnish. ### Generalities -Varnish is an HTTP reverse-proxy-cache service, in other words, a websites accelerator. +Varnish is an HTTP reverse-proxy-cache service, or a website accelerator. Varnish receives HTTP requests from visitors: -* if the response to the cached request is available, it is returned directly to the client from the server's memory, -* if it doesn't have the response, Varnish addresses the web server. Varnish then sends the request to the web server, retrieves the response, stores it in its cache and responds to the client. 
+* if the response to the request is available in the cache, it returns the response to the client directly from the server's memory,
+* if it does not have the response, Varnish sends the request to the web server, retrieves the response, stores it in its cache, and responds to the client.

Providing the response from the in-memory cache improves response times for clients. In this case, there is no access to physical disks.

-By default, Varnish listens on port **6081** and uses **VCL** (**V**arnish **C**onfiguration **L**anguage) for its configuration. Thanks to VCL, it's possible to decide what should or shouldn't be transmitted to the client, what should be cached, from which site and how the response can be modified.
+By default, Varnish listens on port **6081** and uses **VCL** (**V**arnish **C**onfiguration **L**anguage) for its configuration. Thanks to VCL, it is possible to:

-Varnish can be extended with VMOD modules (Varnish Modules).
+* Decide what content the client receives
+* Decide what content to cache
+* Decide from which site, and how, to modify the response
+
+Varnish is extensible with VMOD modules (Varnish Modules).

#### Ensuring high availability

-Several mechanisms are used to ensure high availability throughout a web chain:
+The use of several mechanisms ensures high availability throughout a web chain:

-* if varnish is behind load balancers: as the LBs are generally in cluster mode, they are already in HA mode. A check from the LBs verifies varnish availability. If a varnish server no longer responds, it is automatically removed from the pool of available servers. In this case, varnish is in ACTIVE/ACTIVE mode.
-* if varnish isn't behind a LB cluster, clients address a VIP (see Heartbeat chapter), which is shared between the 2 varnishes. In this case, varnish is in ACTIVE/PASSIVE mode. If the active server is no longer available, the VIP switches to the second varnish node. 
-* When a backend is no longer available, it can be removed from the varnish backend pool either automatically (with a health check) or manually in CLI mode (useful to ease the upgrades/updates).
+* if varnish is behind load balancers (LBs): as the LBs are generally in cluster mode, they are already in HA mode. A check from the LBs verifies varnish availability. If a varnish server no longer responds, it is automatically removed from the pool of available servers. In this case, varnish is in ACTIVE/ACTIVE mode.
+* if varnish is not behind an LB cluster, clients address a VIP (see Heartbeat chapter) shared between the 2 varnishes. In this case, varnish is in ACTIVE/PASSIVE mode. If the active server is no longer available, the VIP switches to the second varnish node.
+* When a backend is no longer available, you can remove it from the varnish backend pool, either automatically (with a health check), or manually in CLI mode (useful to ease upgrades or updates).

#### Ensuring scalability

-If the backends are no longer sufficient to support the workload :
+If the backends are no longer sufficient to support the workload:

* either add more resources to the backends and reconfigure the middleware
-* or add a new backend to the varnish backend pool.
+* or add another backend to the varnish backend pool.

#### Facilitating scalability

-As a web page is made up of an HTML page (often dynamically generated by php, for example) and more static resources (jpg, gif, css, js, etc.), it quickly becomes interesting to cache the resources that can be cached (the static ones), which obviously offloads a large number of requests from the backends.
+A web page is made up of HTML (often dynamically generated by PHP) and more static resources (jpg, gif, css, js, and so on). It quickly becomes interesting to cache the cacheable resources (the static ones), which offloads a large number of requests from the backends.

!!! 
NOTE

    It is possible to cache web pages (html, php, asp, jsp, etc.), but this is more complicated. You need to know the application and whether the pages are cacheable, which should be the case with a REST API, for example.

-When a web server is accessed directly by clients, in this case the server will be called upon for the same image as many times as there are clients. Once the client has received the image for the first time, it is cached on the browser side, depending on the configuration of the site and the web application.
+When a client accesses a web server directly, the server must return the same image as many times as there are clients requesting it. Once the client has received the image for the first time, it is cached on the browser side, depending on the configuration of the site and the web application.

-When the server is accessed behind a properly configured cache server, the first client requesting the image will result in an initial request to the backend, but the image will be cached for a certain period of time and delivered directly to subsequent clients.
+When the server sits behind a properly configured cache server, the first client's request for the image triggers an initial request to the backend. Varnish then caches the image for a certain period and delivers it directly to subsequent clients requesting the same resource.

Although a well-configured browser-side cache reduces the number of requests to the backend, it is complementary to the use of a varnish proxy-cache.

#### TLS certificate management

-Varnish cannot communicate in HTTPS (and it's not its role to do so).
+Varnish cannot communicate in HTTPS (and it is not its role to do so).

The certificate must therefore be either :

-* carried by the LB when the flow passes through it (which is the recommended solution: centralization of the certificate, etc.). 
The flow then passes unencrypted through the datacenter
-* carried by an Apache, Nginx or HAProxy service on the varnish server itself, which only acts as a proxy to the varnish (from port 443 to port 80). This solution is useful if varnish is accessed directly.
+* carried by the LB when the flow passes through it (which is the recommended solution: centralization of the certificate, etc.). The flow then passes unencrypted through the data center.
+* carried by an Apache, Nginx or HAProxy service on the varnish server itself, which only acts as a proxy to the varnish (from port 443 to port 80). This solution is useful if accessing varnish directly.
* Similarly, Varnish cannot communicate with backends on port 443. When necessary, you need to use an nginx or apache reverse proxy to decrypt the request for varnish.

#### How it works

-As we saw earlier, in a basic Web service, the client communicates directly with the service via TCP on port 80.
+In a basic Web service, the client communicates directly with the service over TCP on port 80.

![How a standard website works](img/varnish_website.png)

@@ -88,11 +92,11 @@ To take advantage of the cache, the client must communicate with the web service

![How Varnish works by default](img/varnish_website_with_varnish.png)

-To make the service transparent to the client, you'll need to change the default listening port for Varnish and the web service vhosts.
+To make the service transparent to the client, you will need to change the default listening port for Varnish and the web service vhosts.

![Transparent implementation for the customer](img/varnish_website_with_varnish_port_80.png)

-To provide an HTTPS service, you'll need to add either a load balancer upstream of the varnish service or a proxy service on the varnish server, such as Apache, Nginx or HAProxy. 
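For the second option above, a TLS-terminating proxy on the varnish server itself, a minimal sketch could look like the following. This is a hedged example: the server name, certificate paths, and file name are hypothetical, and it assumes varnish has already been moved to port 80 locally, as described in the transparent setup:

```bash
# /etc/nginx/conf.d/tls-proxy.conf -- hypothetical example: nginx terminates
# TLS on port 443 and forwards plain HTTP to varnish listening locally.
server {
    listen 443 ssl;
    server_name www.example.org;                  # hypothetical vhost name

    ssl_certificate     /etc/pki/tls/certs/www.example.org.crt;
    ssl_certificate_key /etc/pki/tls/private/www.example.org.key;

    location / {
        proxy_pass http://127.0.0.1:80;           # varnish, moved to port 80
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto https; # tell backends TLS was used
    }
}
```

The same pattern works with Apache (`mod_proxy`) or HAProxy; only the termination layer changes, varnish itself stays HTTP-only.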
+To provide an HTTPS service, you will need to add either a load balancer upstream of the varnish service or a proxy service on the varnish server, such as Apache, Nginx or HAProxy.

### Configuration

@@ -106,7 +110,7 @@ systemctl start varnish

#### Configuring the varnish daemon

-Since `systemctl`, varnish params are setup thanks to the service file `/usr/lib/systemd/system/varnish.service`:
+With `systemd`, you set varnish parameters in the service file `/usr/lib/systemd/system/varnish.service`:

```bash
[Unit]
@@ -147,15 +151,15 @@ $ sudo systemctl edit varnish.service
ExecStart=/usr/sbin/varnishd -a :6081 -f /etc/varnish/default.vcl -s malloc,512m
```

-To specify a cache storage backend, the option can be specified several times. Possible storage types are malloc (cache in memory, then swap if needed), or file (create a file on disk, then map to memory). Sizes are expressed in K/M/G/T (kilobytes, megabytes, ...).
+To specify a cache storage backend, you can pass the option several times. Possible storage types are `malloc` (cache in memory, then swap if needed), or `file` (create a file on disk, then map to memory). Sizes are expressed in K/M/G/T (kilobytes, megabytes, gigabytes or terabytes).

#### Configuring the backends

-Varnish is configured using a specific language called VCL.
+Varnish uses a specific language called VCL for its configuration.

-This involves compiling the VCL configuration file in C language. The service can be restarted if compilation is successful and no alarms are displayed.
+This involves compiling the VCL configuration file into C. You can restart the service if compilation succeeds and no alarms display. 
-The varnish configuration can be tested with the following command:
+You can test the varnish configuration with the following command:

```bash
varnishd -C -f /etc/varnish/default.vcl
```

@@ -165,23 +169,25 @@ varnishd -C -f /etc/varnish/default.vcl

It is advisable to check the VCL syntax before restarting the `varnishd` daemon.

-The configuration is reloaded with the command :
+Reload the configuration with the command:

```bash
systemctl reload varnishd
```

-Warning: a `systemctl restart varnishd` empties the varnish cache and causes a peak load on the backends. You should therefore avoid reloading varnishd.
+!!! warning

+    A `systemctl restart varnishd` empties the varnish cache and causes a peak load on the backends. You should therefore avoid restarting `varnishd`.

!!! NOTE

-    To configure Varnish, please follow the recommendations on this page: https://www.getpagespeed.com/server-setup/varnish/varnish-virtual-hosts.
+    To configure Varnish, please follow the recommendations on this page: <https://www.getpagespeed.com/server-setup/varnish/varnish-virtual-hosts>.

### VCL language

#### Subroutines

-Varnish uses VCL files, segmented into subroutines containing the actions to be executed. These subroutines are executed only in the specific cases they define. The default `/etc/varnish/default.vcl` file contains the `vcl_recv`, `vcl_backend_response` and `vcl_deliver` routines:
+Varnish uses VCL files, segmented into subroutines containing the actions to run. These subroutines run only in the specific cases they define. The default `/etc/varnish/default.vcl` file contains the `vcl_recv`, `vcl_backend_response` and `vcl_deliver` routines:

```bash
#
@@ -217,8 +223,8 @@ sub vcl_deliver {
}
```

-* **vcl_recv**: This routine is called before the request is sent to the backend. In this routine, you can modify HTTP headers, cookies, choose the backend, etc. See actions `set req`.
-* **vcl_backend_response**: This routine is called after reception of the backend response (`beresp` means BackEnd RESPonse). See `set bereq.` and `set beresp.` actions. 
+* **vcl_recv**: routine called before sending the request to the backend. In this routine, you can modify HTTP headers, cookies, choose the backend, and so on. See actions `set req`.
+* **vcl_backend_response**: routine called after reception of the backend response (`beresp` means BackEnd RESPonse). See `set bereq.` and `set beresp.` actions.
* **vcl_deliver**: This routine is useful for modifying Varnish output. If you need to modify the final object (add or remove a header, etc.), you can do so in `vcl_deliver`.

#### VCL operators

@@ -232,24 +238,24 @@ sub vcl_deliver {

#### Varnish objects

-* **req**: the request object. When Varnish receives the request, `req` is created. Most of the work in the `vcl_recv` subroutine concerns this object.
+* **req**: the request object. Varnish creates `req` when it receives the request. Most of the work in the `vcl_recv` subroutine concerns this object.
* **bereq**: the request object destined for the web server. Varnish creates this object from `req`.
* **beresp**: the web server response object. It contains the object headers from the application. You can modify the server response in the `vcl_backend_response` subroutine.
-* **resp**: the HTTP response to be sent to the client. This object is modified in the `vcl_deliver` subroutine.
+* **resp**: the HTTP response sent to the client. Modify this object in the `vcl_deliver` subroutine.
* **obj**: the cached object. Read-only.

#### Varnish actions

The most frequent actions:

-* **pass**: When `pass` is returned, the request and subsequent response will come from the application server. No cache is applied. `pass` is returned from the `vcl_recv` subroutine.
-* **hash**: When `hash` is returned from `vcl_recv`, Varnish will serve the content from the cache even if the request is configured to pass without cache.
-* **pipe**: This action is used to manage flows. In this case, Varnish will no longer inspect each request, but will let all bytes pass to the server. 
`pipe` is used, for example, by websockets or video stream management. +* **pass**: When returned, the request and subsequent response will come from the application server. No application of cache occurs. `pass` returns from the `vcl_recv` subroutine. +* **hash**: When returned from `vcl_recv`, Varnish will serve the content from the cache even if the configuration of request specifies passing without cache. +* **pipe**: Used to manage flows. In this case, Varnish will no longer inspect each request, but will let all bytes pass to the server. Websockets or video stream management, for example use `pipe`. * **deliver**: Delivers the object to the client. Usually from the `vcl_backend_response` subroutine. -* **restart**: Restarts the request processing process. Modifications to the `req` object are retained. -* **retry**: The request is transferred back to the application server. Used from `vcl_backend_response` or `vcl_backend_error` if the application response is unsatisfactory. +* **restart**: Restarts the request processing process. Retains modifications to the `req` object. +* **retry**: Transfers the request back to the application server. Used from `vcl_backend_response` or `vcl_backend_error` if the application response is unsatisfactory. -In summary, the possible interactions between subroutines and actions are illustrated in the diagram below: +In summary, illustrated in the diagram below are the possible interactions between subroutines and actions: ![Transparent implementation for the customer](img/varnish_interactions.png) @@ -263,9 +269,9 @@ It is possible to verify that a page comes from the varnish cache from the HTTP Varnish uses the term `backend` for the vhosts it needs to proxy. -Several backends can be defined on the same Varnish server. +You can define several backends on the same Varnish server. -Backends are configured in `/etc/varnish/default.vcl`. +Configuring backends is through `/etc/varnish/default.vcl`. 
#### ACL management

@@ -299,9 +305,9 @@ if (req.url ~ "/(login|admin)") {

Varnish never caches HTTP POST requests or requests containing cookies (whether from the client or the backend).

-If the backend uses cookies, then no content will be cached.
+If the backend uses cookies, no content caching occurs.

-To correct this behavior, we can dereference the cookies in our requests:
+To correct this behavior, you can unset the cookies in your requests:

```bash
sub vcl_recv {
@@ -315,7 +321,7 @@ sub vcl_backend_response {

#### Distribute requests to different backends

-When hosting several sites, such as a document server (doc.rockylinux.org) and a wiki (wiki.rockylinux.org), it's possible to distribute requests to the right backend.
+When hosting several sites, such as a document server (<doc.rockylinux.org>) and a wiki (<wiki.rockylinux.org>), it is possible to distribute requests to the right backend.

Backends declaration:

```bash
backend docs {
@@ -331,7 +337,7 @@ backend blog {
}
```

-The req.backend object is modified according to the host called in the HTTP request in the `vcl_recv` subroutine:
+The `vcl_recv` subroutine modifies the `req.backend` object according to the host called in the HTTP request:

```bash
sub vcl_recv {
@@ -347,11 +353,11 @@ sub vcl_recv {

#### Load distribution

-Varnish can handle load balancing via specific backends called directors.
+Varnish can handle load balancing with specific backends called directors.

-The round-robin director distributes requests to the round-robin backends (alternately). Each backend can be assigned a weighting.
+The round-robin director distributes requests to the round-robin backends (alternately). You can assign a weight to each backend.

-The client director distributes requests according to a sticky session affinity on any header element (e.g. with a session cookie). In this case, a client is always returned to the same backend. 
+The client director distributes requests according to a sticky session affinity on any header element (that is, with a session cookie). In this case, a client is always returned to the same backend.

Backends declaration

@@ -386,9 +392,9 @@ sub vcl_recv {
}
```

-#### Managing backends via CLI
+#### Managing backends with the CLI

-Backends can be marked as **sick** or **healthy** for administration or maintenance purposes. This action allows you to remove a node from the pool without having to modify the Varnish server configuration (and therefore without restarting it) or stop the backend service.
+You can mark backends as **sick** or **healthy** for administration or maintenance purposes. This action allows you to remove a node from the pool without having to modify the Varnish server configuration (and therefore without restarting it) or stop the backend service.

View backend status : The `backend.list` command displays all backends, even those without a health check (probe).

@@ -426,7 +432,7 @@ To let varnish decide on the state of its backends, it is imperative to switch b

varnishadm backend.set_health site.front01 auto
```

-The backends can be declared following: https://github.com/mattiasgeniar/varnish-6.0-configuration-templates
+You can declare the backends by following: <https://github.com/mattiasgeniar/varnish-6.0-configuration-templates>.

### Apache logs

@@ -444,7 +450,7 @@
and take this new format into account in the website vhost:

```bash
CustomLog /var/log/httpd/www-access.log.formatux.fr varnishcombined
```

-and make Varnish compatible:
+and make it Varnish compatible:

```bash
if (req.restarts == 0) {
@@ -466,7 +472,7 @@ on the command line:

varnishadm 'ban req.url ~ .'
```

-using a secret and a port other than the default :
+using a secret and a port other than the default:

```bash
varnishadm -S /etc/varnish/secret -T 127.0.0.1:6082 'ban req.url ~ .' 
@@ -493,7 +499,7 @@ via an HTTP PURGE request:

```bash
curl -X PURGE http://www.example.org/foo.txt
```

-Varnish must be configured to accept this request:
+Configure Varnish to accept this request:

```bash
acl local {
@@ -517,7 +523,7 @@ sub vcl_recv {

Varnish writes its logs in memory and in binary so as not to penalize its performance. When it runs out of memory space, it rewrites new records on top of old ones, starting from the beginning of its memory space.

-Logs can be consulted using the `varnishstat` (statistics), `varnishtop` (top for Varnish), `varnishlog` (verbose logging) or `varnishnsca` (logs in NCSA format, like Apache) tools:
+It is possible to consult the logs with the `varnishstat` (statistics), `varnishtop` (top for Varnish), `varnishlog` (verbose logging) or `varnishncsa` (logs in NCSA format, like Apache) tools:

```bash
varnishstat
@@ -526,14 +532,14 @@ varnishlog
varnishncsa
```

-The `-q` option can be used to apply filters to logs using the preceding commands:
+Use the `-q` option to apply filters to the logs of the preceding commands:

```bash
varnishlog -q 'TxHeader eq MISS' -q "ReqHeader ~ '^Host: rockylinux\.org$'"
varnishncsa -q "ReqHeader eq 'X-Cache: MISS'"
```

-Logging to disk is performed by the varnishlog and varnishnsca daemons independently of the `varnishd` daemon. The `varnishd` daemon continues to populate its logs in memory without penalizing performance towards clients, then the other daemons copy the logs to disk.
+The `varnishlog` and `varnishncsa` daemons log to disk independently of the `varnishd` daemon. The `varnishd` daemon continues to populate its logs in memory without penalizing performance towards clients, then the other daemons copy the logs to disk.

### Workshop

@@ -626,11 +632,11 @@ $ curl http://server1.rockylinux.lan:6081

As you can see, Apache serves the index page. 
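As a quick side check, you can pull the cache-related headers out of a response. The sketch below parses a sample response captured beforehand (the header values are hypothetical); against the live workshop server you would feed it `curl -sI http://server1.rockylinux.lan:6081` output instead:

```shell
# Sample response headers as returned through varnish (hypothetical values).
response='HTTP/1.1 200 OK
Age: 120
Via: 1.1 varnish (Varnish/6.6)
X-Cache: HIT'

# Extract the Age (seconds the object has spent in cache) and Via headers.
age=$(printf '%s\n' "$response" | awk -F': ' '/^Age:/ {print $2}')
via=$(printf '%s\n' "$response" | awk -F': ' '/^Via:/ {print $2}')

# A non-zero Age plus a Via header mentioning varnish means the page came
# from the varnish cache rather than straight from Apache.
echo "Age=$age"
echo "Via=$via"
```

A `Via` header naming varnish with `Age: 0` would instead indicate a cache miss served through varnish on its first fetch.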
-Some headers have been added, giving us information that our request was handled by varnish (header `Via`), and the cached time of the page (header `Age`), giving us the information that our page was served directly from the varnish memory instead of from the disk via Apache.
+Some headers have been added: the `Via` header tells us that varnish handled our request, and the `Age` header gives the cached age of the page, telling us that varnish served the page directly from memory instead of from disk through Apache.

#### Task 4: Remove some headers

-We will remove some headers that can give unneeded informations to hackers.
+We will remove some headers that can give unneeded information to hackers.

In the sub `vcl_deliver`, add the following:

```bash
sub vcl_deliver {
@@ -641,7 +647,7 @@ sub vcl_deliver {
     unset resp.http.Via;
     set resp.http.node = "F01";
     set resp.http.X-Cache-Hits = obj.hits;
-    if (obj.hits > 0) { # Add debug header to see if it's a HIT/MISS and the number of hits, disable when not needed
+    if (obj.hits > 0) { # Add debug header to see if it is a HIT/MISS and the number of hits, disable when not needed
        set resp.http.X-Cache = "HIT";
    } else {
        set resp.http.X-Cache = "MISS";
@@ -670,7 +676,7 @@ Accept-Ranges: bytes
Connection: keep-alive
```

-As you can see, the unwanted headers have been removed and the necessary one (to troubleshoot for example) have been added.
+As you can see, the unwanted headers are gone, while the necessary ones (to troubleshoot, for example) have been added.

### Conclusion