Discussion:
[squid-users] Squid 4.2: caching is not working
Hariharan Sethuraman
2018-09-07 16:46:35 UTC
Hi team,

I created the cache directories using "squid -z" and then started Squid with
"squid -f /etc/squid/squid.conf -NYCd 1". Find the debug info in (1) below;
(2) shows the cache directory and the squid config.

(1) - debug info:
squidclient -h localhost cache_object://localhost/ mgr:objects >>> this
showed an entry while the download was in progress, but the entry
disappeared after the download (~290MB) completed in the browser. When I
checked the du of the cache directory, it was still only about 200KB.
---------------------------------------------------------------------------
bash-4.4# squid -v
Squid Cache: Version 4.2
Service Name: squid
This binary uses LibreSSL 2.7.4. For legal restrictions on distribution see
https://www.openssl.org/source/license.html
configure options: '--build=x86_64-alpine-linux-musl'
'--host=x86_64-alpine-linux-musl' '--prefix=/usr'
'--datadir=/usr/share/squid' '--sysconfdir=/etc/squid'
'--libexecdir=/usr/lib/squid' '--localstatedir=/var'
'--with-logdir=/var/log/squid' '--disable-strict-error-checking'
'--disable-arch-native' '--enable-removal-policies=lru,heap'
'--enable-auth-digest'
'--enable-auth-basic=getpwnam,NCSA,SMB,SMB_LM,RADIUS' '--enable-epoll'
'--enable-external-acl-helpers=file_userip,unix_group,wbinfo_group,session'
'--enable-auth-ntlm=fake,SMB_LM' '--enable-auth-negotiate=kerberos,wrapper'
'--disable-mit' '--enable-heimdal' '--enable-delay-pools'
'--enable-openssl' '--enable-ssl-crtd' '--enable-linux-netfilter'
'--enable-ident-lookups' '--enable-useragent-log' '--enable-cache-digests'
'--enable-referer-log' '--enable-async-io' '--enable-truncate'
'--enable-arp-acl' '--enable-htcp' '--enable-carp' '--enable-poll'
'--enable-follow-x-forwarded-for' '--with-large-files'
'--with-default-user=squid' '--with-openssl'
'build_alias=x86_64-alpine-linux-musl'
'host_alias=x86_64-alpine-linux-musl' 'CC=gcc' 'CFLAGS=-Os
-fomit-frame-pointer' 'LDFLAGS=-Wl,--as-needed' 'CPPFLAGS=-Os
-fomit-frame-pointer' 'CXXFLAGS=-Os -fomit-frame-pointer'


(2) - cache directory and squid-config
---------------------------------------------------------------------------
bash-4.4# ls /var/spool/squid/cache
00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F swap.state
bash-4.4#
..
cache allow all
strip_query_terms off
..
cache_dir ufs /var/spool/squid/cache 2000 16 256
maximum_object_size 300 MB
..
range_offset_limit -1
..
url_rewrite_access allow all
url_rewrite_program /usr/bin/python /usr/share/proxypass.py

http_access deny all
...
always_direct deny all

(a) Please let me know what I am missing to enable caching.
(b) Also, I hope "squidclient -h localhost cache_object://localhost/
mgr:objects" will show the entry even after the object has been cached.


Thanks,
Hari
Alex Rousskov
2018-09-07 17:30:20 UTC
Post by Hariharan Sethuraman
squidclient -h localhost cache_object://localhost/ mgr:objects >>> this
was showing the entry when the download was going on and disappeared
after the download complete(~290MB) on the browser. When I checked the
du of cache directory, it is intact with 200KB
Was the response cachable? You can use the redbot.org service to examine
the corresponding resource (URL).

If the service tells you that the resource was cachable in principle (or
if you cannot use the service), then you can post both HTTP request and
response headers (as received by Squid) here for further analysis. You
can collect those headers in cache.log by setting debug_options to ALL,2.
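In squid.conf that looks like the fragment below (ALL,2 raises every debug
section to level 2; worth reverting after testing, since cache.log grows
quickly at that level):

```
# squid.conf: log section-level-2 detail for all debug sections, which
# includes the HTTP request and response headers Squid sends/receives.
# Revert to the default (ALL,1) when done troubleshooting.
debug_options ALL,2
```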
Post by Hariharan Sethuraman
(a) Please let me know what am missing to enable cache.
I think your cache is enabled, but Squid refused to cache a particular
response you tested with. There is not enough information to say why.
Post by Hariharan Sethuraman
(b) Also "squidclient -h localhost cache_object://localhost/
mgr:objects" hope this command will show the entry even after caching.
AFAICT, mgr:objects shows both in-progress transactions and cached
entries that do not belong to any in-progress transaction. However,
those cached entries will only be shown for UFS-based disk caches and
for a non-shared memory cache.

You are using a UFS-based cache. You should be (and probably are) using
a non-shared memory cache because you are using a UFS-based cache. In
summary, you most likely can use mgr:objects to see if the response was
cached. The above paragraph just answers your question in a way that may
be useful for others that have a different Squid configuration.


HTH,

Alex.
Amos Jeffries
2018-09-07 17:30:54 UTC
Post by Hariharan Sethuraman
Hi team,
I have created directories using squid -z and then triggered squid -f
/etc/squid/squid.conf -NYCd 1. Find (1) debug info below. And below (2)
are the cache directory and squid-config. 
squidclient -h localhost cache_object://localhost/ mgr:objects >>> this
You do not need to pass squidclient the cache_object: URLs, nor
localhost as server. Just:

squidclient mgr:objects

Also, what *exactly* did that report tell you?
"cache" is more than just the disk storage area.
Post by Hariharan Sethuraman
was showing the entry when the download was going on and disappeared
after the download complete(~290MB) on the browser.
What I am thinking reading that is that probably Squid used the cache
storage area as a temporary location for the bytes of a very large
object, but then removed it once the response was completely delivered
since it was not cacheable.

Details matter. The "~" means "approximately" and your config says
*exactly* 300 MByte is the upper limit.

So an object which is "approximately 290" may in truth be *over* 300
and thus not permitted to cache.
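To double-check the size arithmetic: Squid's size units are binary, so
"maximum_object_size 300 MB" means 300 * 2^20 bytes. A quick sketch (the
Content-Length value here is the one reported later in this thread; treat it
as illustrative):

```python
# "300 MB" in squid.conf -> Squid treats MB as 2^20 bytes
limit = 300 * 1024 * 1024            # 314572800 bytes

# Content-Length reported for the download elsewhere in this thread
content_length = 291846144

# True -> the object fits under maximum_object_size, so size alone
# would not forbid caching it
print(content_length <= limit)
```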


NP: you can use the tool at redbot.org to check URL cacheability. It
will also tell you about any caching related HTTP compliance issues with
that resource.
Or you can set "debug_options 11,2" in your squid.conf and check the
exact HTTP messages your proxy is dealing with.


When I checked the
Post by Hariharan Sethuraman
du of cache directory, it is intact with 200KB
...
Post by Hariharan Sethuraman
..
cache allow all
strip_query_terms off
Above are defaults. No need to configure since Squid-3.
Post by Hariharan Sethuraman
..
cache_dir ufs /var/spool/squid/cache 2000 16 256
maximum_object_size 300 MB
..
range_offset_limit -1
..
url_rewrite_access allow all
url_rewrite_program  /usr/bin/python /usr/share/proxypass.py
Not relevant, except that when testing the URL like with redbot.org you
need to use the URL this helper produces instead of what was passed into
Squid by the client.
Post by Hariharan Sethuraman
http_access deny all
...
always_direct deny all
(a) Please let me know what am missing to enable cache.
Cache is enabled and Squid caches as much as it can by default - within
the limits prescribed by HTTP specification and your config settings.

So the only thing to do is ensure that you do not actively *prevent*
caching from happening somehow.
Post by Hariharan Sethuraman
(b) Also "squidclient -h localhost cache_object://localhost/
mgr:objects" hope this command will show the entry even after caching.
It (well, "squidclient mgr:objects") should show all objects currently
known to the proxy. That will mostly be cached objects (both on disk and
in memory), but it also includes temporary in-transit objects.

Amos
Hariharan Sethuraman
2018-09-08 04:14:28 UTC
Hi,
I see the response can be cached. I will try increasing the logging level
for cache.log.


HTTP/1.1 200 OK
Date: Sat, 08 Sep 2018 04:10:38 GMT
Server: Apache/2.2
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/plain;; charset=ISO-8859-1

response headers: 203 bytes body: 21 bytes transfer overhead: 9 bytes
General

- The Content-Type header's syntax isn't valid.
- The Keep-Alive header is deprecated.
- The server's clock is correct.

Caching

- This response allows all caches to store it.
- This response allows a cache to assign its own freshness lifetime.

Thanks,
Hari
_______________________________________________
squid-users mailing list
http://lists.squid-cache.org/listinfo/squid-users
Hariharan Sethuraman
2018-09-08 04:18:56 UTC
Hi Amos,

This is what I see when the download is in progress:

KEY 44000000000000000902000000000000
STORE_PENDING NOT_IN_MEMORY SWAPOUT_NONE PING_DONE
RELEASE_REQUEST,DISPATCHED,PRIVATE,VALIDATED
LV:1536379799 LU:1536379801 LM:1532110990 EX:-1
4 locks, 1 clients, 1 refs
Swap Dir -1, File 0XFFFFFFFF
GET
https://example.com/DhAskLOUpvRG2oeR_f_FxYTyLVHIN5esRF-LXOUKwkwyT0TOf6xO-AUm3KaM
inmem_lo: 99225582
inmem_hi: 99324372
swapout: 0 bytes queued
Hariharan Sethuraman
2018-09-08 14:43:16 UTC
But the partial data was being continuously sent back to the client. Squid
didn't wait for the complete file to download.
1. HEAD request to read image information
2. GET request to download the image
Not quite. GET partial / Range request to fetch the content.
Squid converted it into a full request for the backend server due to
range_offset_limit -1. But that does mean Squid had to download ~240MB
of data before anything starts being sent to the client.
(2)
---------
GET /DcKbz9kqMQXK-zp95pv9LH11kjhTpxOJsJ-1FYEL4
Host: example.com:3129^M
Range: bytes=242819145-^M
User-Agent: curl/7.56.1^M
Accept: */*^M
2018/09/08 07:28:39.938| 11,2| http.cc(2261) sendRequest: HTTP Server
---------
GET /DcKbz9kqMQXK-zp95pv9LH11kjhTpxOJsJ-1FYEL4
User-Agent: curl/7.56.1^M
Accept: */*^M
Host: exampletarget.com^M
Via: 1.1 jb7mgd (squid/4.2)^M
Surrogate-Capability: jb7mgd="Surrogate/1.0"^M
X-Forwarded-For: **.**.**.**^M
Cache-Control: max-age=0^M
Connection: keep-alive^M
2018/09/08 07:28:44.359| 11,2| http.cc(723) processReplyHeader: HTTP
---------
HTTP/1.1 200 OK
Date: Sat, 08 Sep 2018 07:28:40 GMT^M
Server: Apache/2.2^M
Content-Disposition: attachment; filename=somefile.iso;^M
Last-Modified: Fri, 20 Jul 2018 18:23:10 GMT^M
ETag: "4a54c59-11653800-571726350bf80"^M
Accept-Ranges: bytes^M
Content-Length: 291846144^M
Keep-Alive: timeout=5, max=100^M
Connection: Keep-Alive^M
Content-Type: application/unknown^M
2018/09/08 07:28:44.361| 11,2| Stream.cc(267) sendStartOfMessage: HTTP
---------
HTTP/1.1 206 Partial Content^M
Date: Sat, 08 Sep 2018 07:28:40 GMT^M
Server: Apache/2.2^M
Content-Disposition: attachment; filename=somefile.iso;^M
Last-Modified: Fri, 20 Jul 2018 18:23:10 GMT^M
ETag: "4a54c59-11653800-571726350bf80"^M
Accept-Ranges: bytes^M
Content-Type: application/unknown^M
X-Cache: MISS from jb7mgd^M
X-Cache-Lookup: MISS from jb7mgd:3128^M
Via: 1.1 jb7mgd (squid/4.2)^M
Connection: keep-alive^M
Content-Range: bytes 242819145-291846143/291846144^M
Content-Length: 49026999^M
Thanks,
Hari
Hi Amos,
KEY 44000000000000000902000000000000
STORE_PENDING NOT_IN_MEMORY SWAPOUT_NONE PING_DONE
RELEASE_REQUEST,DISPATCHED,PRIVATE,VALIDATED
So file stored in memory and scheduled for removal.
LV:1536379799 LU:1536379801 LM:1532110990 EX:-1
4 locks, 1 clients, 1 refs
Swap Dir -1, File 0XFFFFFFFF
GET
https://example.com/DhAskLOUpvRG2oeR_f_FxYTyLVHIN5esRF-LXOUKwkwyT0TOf6xO-AUm3KaM
inmem_lo: 99225582
inmem_hi: 99324372
swapout: 0 bytes queued
Amos
Hariharan Sethuraman
2018-09-10 12:18:55 UTC
Hi All,

I have two things to clarify:
1) In an earlier email (snipped below), Amos said that it is caching and
scheduled to download - does that mean we have the answer and can apply
some override?
--------------------
Post by Hariharan Sethuraman
KEY 44000000000000000902000000000000
STORE_PENDING NOT_IN_MEMORY SWAPOUT_NONE PING_DONE
RELEASE_REQUEST,DISPATCHED,PRIVATE,VALIDATED
So file stored in memory and scheduled for removal.
--------------------

2) With more debug_options enabled, I see that it is not caching because
the response is part of an authenticated flow. Is there a way I can
override this?
- Initially, when this file download starts, it gets authorized (I think
that is why I see a HEAD request sent to the target with Cache-Control:
max-age=0), and then the subsequent GET requests (which don't have any
Cache-Control header) download the chunks using an adjusted range and offset.
- But the subsequent GET reply seems to be flagged as authenticated; see
the code below, snipped from the source (
https://github.com/squid-cache/squid/blob/4f1c93a7a0d14eec223e199275ce570d840f71bc/src/http.cc
).
    // RFC 2068, sec 14.9.4 - MUST NOT cache any response with Authentication
    // UNLESS certain CC controls are present
    // allow HTTP violations to IGNORE those controls (ie re-block caching Auth)
    if (request && (request->flags.auth || request->flags.authSent)) {
        if (!rep->cache_control)
            return decision.make(ReuseDecision::reuseNot,
                                 "authenticated and server reply missing Cache-Control");
- I tried adding override-expire to the cgi-bin refresh_pattern, but that
only overrides expiry information; it is not relevant to the auth flag.

Please find the logs below, and let me know if I am missing something.

Thanks,
Hari

Refresh pattern (URL:
https://example.com/pcgi-bin/swdld/download.cgi?dwnld_code=xhM...):
refresh_pattern ^ftp: 1440 20% 10080
refresh_pattern ^gopher: 1440 0% 1440
refresh_pattern -i (/cgi-bin/|\?) 0 0% 0
refresh_pattern . 0 20% 4320
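For reference, the numeric fields in these refresh_pattern lines are min,
percent, and max. An annotated sketch of the line that matches this
cgi-bin URL:

```
# refresh_pattern [-i] regex  min  percent  max
#   min:     minutes a response with no explicit expiry stays fresh
#   percent: LM-factor; fresh while age < percent of (Date - Last-Modified)
#   max:     cap, in minutes, on that heuristic freshness
# So "0 0% 0" disables heuristic freshness for query/cgi-bin URLs; it does
# not block caching of responses that carry explicit expiry information.
refresh_pattern -i (/cgi-bin/|\?) 0 0% 0
```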

Logs:
2018/09/10 09:57:24.864| 11,2| http.cc(723) processReplyHeader: HTTP Server
RESPONSE:
---------
HTTP/1.1 206 Partial Content
Date: Mon, 10 Sep 2018 09:57:20 GMT^M
Server: Apache/2.2^M
Content-Disposition: attachment; filename=somefile.iso;^M
Last-Modified: Thu, 22 Mar 2018 02:11:11 GMT^M
ETag: "4ad8a61-1193b000-567f6d2466dc0"^M
Accept-Ranges: bytes^M
Content-Length: 294342471^M
Content-Range: bytes 549049-294891519/294891520^M
Keep-Alive: timeout=5, max=100^M
Connection: Keep-Alive^M
Content-Type: application/unknown^M
^M
----------
2018/09/10 09:57:24.864| 11,5| Client.cc(134) setVirginReply:
0x5560bf163d78 setting virgin reply to 0x5560bf162c20
2018/09/10 09:57:24.864| ctx: exit level 0
2018/09/10 09:57:24.864| 83,3| AccessCheck.cc(42) Start: adaptation off,
skipping
2018/09/10 09:57:24.864| 11,5| Client.cc(969) adaptOrFinalizeReply:
adaptationAccessCheckPending=0
2018/09/10 09:57:24.864| 11,5| Client.cc(152) setFinalReply: 0x5560bf163d78
setting final reply to 0x5560bf162c20
2018/09/10 09:57:24.864| 20,3| store.cc(1807) replaceHttpReply:
StoreEntry::replaceHttpReply:
https://example.com/pcgi-bin/swdld/download.cgi?dwnld_code=xhMnkw8Z-oECuFusb12luTTCm0rP8jZiRFu8gsXRtc
2018/09/10 09:57:24.864| ctx: enter level 0: '
https://example.com/pcgi-bin/swdld/download.cgi?dwnld_code=xhMnkw8Z-oECuFusb12luTTCm0rP8jZiRFu8gsXRtcoacGcTu6dv-dkLcT4lqtgvM70n8-ucJsj09lRYt_a0t7_M5
2018/09/10 09:57:24.864| 11,3| http.cc(907) haveParsedReplyHeaders: HTTP
CODE: 206
2018/09/10 09:57:24.864| 73,3| HttpRequest.cc(664) storeId: sent back
effectiveRequestUrl:
https://dl.cisco.com/pcgi-bin/swdld/download.cgi?dwnld_code=xhMnkw8Z-oECuFusb12luTTCm0rP8jZiRFu8gsXRtcoac
2018/09/10 09:57:24.864| 20,3| Controller.cc(386) peek:
76E544615E001DBF49EF0F94EE0A8F9A
2018/09/10 09:57:24.865| 20,4| Controller.cc(420) peek: cannot locate
76E544615E001DBF49EF0F94EE0A8F9A
2018/09/10 09:57:24.865| 20,3| store.cc(450) releaseRequest: 0
e:=p2IV/0x5560bf16ac60*3
2018/09/10 09:57:24.865| 20,3| store.cc(580) setPrivateKey: 01
e:=p2IV/0x5560bf16ac60*3
2018/09/10 09:57:24.865| 11,3| http.cc(982) haveParsedReplyHeaders:
decided: do not cache and do not share because authenticated and server
reply missing Cache-Control; HTTP status 206 e:=p2XIV/0x
2018/09/10 09:57:24.865| ctx: exit level 0
Amos Jeffries
2018-09-10 14:16:03 UTC
Post by Hariharan Sethuraman
Hi All,
1) In earlier email (snipped below), Amos told that is caching and
scheduled to download
That's not what I wrote. There is data, and it is scheduled for removal
(erase) as soon as the current client has been responded to. It is
specifically *not* caching.

That confirms why you are not seeing anything in the disk cache.
Post by Hariharan Sethuraman
- does it mean that we got the answer and do some
override?
--------------------
KEY 44000000000000000902000000000000
STORE_PENDING NOT_IN_MEMORY SWAPOUT_NONE PING_DONE
RELEASE_REQUEST,DISPATCHED,PRIVATE,VALIDATED
So file stored in memory and scheduled for removal.
--------------------
2) With more debug_options enabled, I see that it is not caching because
the response is part of authenticated flow. Is there a way I can
override this?
No. The server is supplying enough header information to make it appear
that the site authors are intentionally sending exactly what does get
delivered.
Post by Hariharan Sethuraman
-   Initially when this file download starts, it gets authorized (I
think that is why I see HEAD request is sent to target which has
Cache-Control: max-age=0) and then the subsequent GET requests (dont
have any cache-control header) download the chunk using adjusted range
and offset.
Client-delivered Cache-Control does not matter. It is the *server*
Cache-Control which matters here.

The client can also *not* send the authentication header. That would let
Squid cache the object IF the server sent this same object without
credentials being needed.
Post by Hariharan Sethuraman
-   But the subsequent GET reply seems set with auth flag, see the code
below snipped from source
(https://github.com/squid-cache/squid/blob/4f1c93a7a0d14eec223e199275ce570d840f71bc/src/http.cc). 
        // RFC 2068, sec 14.9.4 - MUST NOT cache any response with
Authentication UNLESS certain CC controls are present
    // allow HTTP violations to IGNORE those controls (ie re-block
caching Auth)
    if (request && (request->flags.auth || request->flags.authSent)) {
        if (!rep->cache_control)
            return decision.make(ReuseDecision::reuseNot,
                                 "authenticated and server reply missing
Cache-Control");
-   I tried adding override-expire in the cgi-bin refresh pattern, but
that will override only for max-age in Cache-control but not relevant
for auth flag.
Indeed. It also will only do anything for certain outdated dynamic
content URLs.

As far as Squid is able to tell the content was generated specifically
for this authenticated user. You need the server to send Cache-Control
with one of public, must-revalidate, or s-maxage which indicate that it
is actually cacheable by a shared cache (ie Squid).

Otherwise the object is "private", and the cache related settings are
intended for a client-specific cache, such as a Browser has.
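As a sketch, a server response header that would satisfy a shared cache
looks like the fragment below (the max-age value is illustrative):

```
HTTP/1.1 200 OK
Cache-Control: public, max-age=86400
Content-Type: application/unknown
```

Per RFC 7234 section 3.2, "public", "must-revalidate", and "s-maxage" are
the directives that explicitly permit a shared cache to store a response to
a request that carried an Authorization header.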



Amos
Hariharan Sethuraman
2018-09-10 14:27:39 UTC
[message hidden by the list archive]
Hariharan Sethuraman
2018-09-10 14:37:19 UTC
Also, can I achieve this using the reply_header_replace directive? I know
it is a violation; I just want to understand the available options.
Post by Hariharan Sethuraman
Post by Amos Jeffries
Post by Hariharan Sethuraman
Hi All,
1) In earlier email (snipped below), Amos told that is caching and
scheduled to download
Thats not what I wrote. There is data and it is scheduled for removal
(erase) as soon as the current client gets responded to. It is
specifically *not* caching.
That confirms why you are not seeing anything in the disk cache.
Post by Hariharan Sethuraman
- does it mean that we got the answer and do some
override?
--------------------
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Post by Hariharan Sethuraman
Post by Hariharan Sethuraman
KEY 44000000000000000902000000000000
STORE_PENDING NOT_IN_MEMORY SWAPOUT_NONE PING_DONE
RELEASE_REQUEST,DISPATCHED,PRIVATE,VALIDATED
So file stored in memory and scheduled for removal.
--------------------
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Post by Hariharan Sethuraman
2) With more debug_options enabled, I see that it is not caching because
the response is part of an authenticated flow. Is there a way I can
override this?
No. The server is supplying enough headers to make it appear that the
site authors are intentionally sending exactly what does get delivered.
If I understand correctly, you are saying the caching will not be done on
squid as the content is authorised by the specific client. We can't do
anything until I ask site owners to change cache control as public?
Post by Amos Jeffries
Post by Hariharan Sethuraman
- Initially, when this file download starts, it gets authorized (I
think that is why I see a HEAD request sent to the target with
Cache-Control: max-age=0), and then the subsequent GET requests (which
don't have any Cache-Control header) download the chunks using adjusted
range and offset.
Client-delivered Cache-Control headers do not matter. It is the *server*
Cache-Control which matters here.
The client can also *not* send the authentication header. That would let
Squid cache the object IF the server sent this same object without
credentials being needed.
Post by Hariharan Sethuraman
- But the subsequent GET reply seems to have the auth flag set; see the
code below, snipped from the source
(https://github.com/squid-cache/squid/blob/4f1c93a7a0d14eec223e199275ce570d840f71bc/src/http.cc).
// RFC 2068, sec 14.9.4 - MUST NOT cache any response with Authentication
// UNLESS certain CC controls are present
// allow HTTP violations to IGNORE those controls (ie re-block caching Auth)
if (request && (request->flags.auth || request->flags.authSent)) {
    if (!rep->cache_control)
        return decision.make(ReuseDecision::reuseNot,
                             "authenticated and server reply missing Cache-Control");
- I tried adding override-expire in the cgi-bin refresh_pattern, but
that overrides only max-age in Cache-Control and is not relevant to the
auth flag.
Indeed. It also will only do anything for certain outdated dynamic
content URLs.
As far as Squid is able to tell the content was generated specifically
for this authenticated user. You need the server to send Cache-Control
with one of public, must-revalidate, or s-maxage which indicate that it
is actually cacheable by a shared cache (ie Squid).
Otherwise the object is "private", and the cache related settings are
intended for a client-specific cache, such as a Browser has.
Amos
Amos Jeffries
2018-09-10 14:45:52 UTC
Permalink
Post by Hariharan Sethuraman
Also can I achieve using reply_header_replace directive? I know it is
violation, just to understand the available options.
No, the header replacement only alters the messages leaving Squid.

Amos
Amos Jeffries
2018-09-10 14:50:59 UTC
Permalink
Post by Alex Rousskov
Post by Hariharan Sethuraman
 
2) With more debug_options enabled, I see that it is not caching because
the response is part of authenticated flow. Is there a way I can
override this?
No. The server is supplying sufficient headers for caching to make it
appear that the site authors intentionally are sending what does get
delivered.
If I understand correctly, you are saying the caching will not be done
on squid as the content is authorised by the specific client. We can't
Authenticated, not authorized. This is one place where the difference
matters.
Post by Alex Rousskov
do anything until I ask site owners to change cache control as public?
Pretty much, yes. I know Chrome at least used to deliver binaries whose
installer contained details of the Google account of the user fetching
it. So it's not very safe to assume even downloaders are safely transferable.

Amos
Hariharan Sethuraman
2018-09-10 15:03:50 UTC
Permalink
Thanks a lot Amos.

1) OK, the client does a GET /<resource> with an Authorization header. So I
can't cache unless I ask the site owner to send a Cache-Control value
that enables the intermediate cache server to persist it.
2) Does squid-cache allow a way where I can upload the file into cache?
Post by Amos Jeffries
Post by Alex Rousskov
Post by Hariharan Sethuraman
2) With more debug_options enabled, I see that it is not caching because
the response is part of authenticated flow. Is there a way I can
override this?
No. The server is supplying sufficient headers for caching to make it
appear that the site authors intentionally are sending what does get
delivered.
If I understand correctly, you are saying the caching will not be done
on squid as the content is authorised by the specific client. We can't
Authenticated, not authorized. This is one place where the difference
matters.
Post by Alex Rousskov
do anything until I ask site owners to change cache control as public?
Pretty much, yes. I know Chrome at least used to deliver binaries whose
installer contained details of the Google account of the user fetching
it. So it's not very safe to assume even downloaders are safely
transferable.
Amos
Amos Jeffries
2018-09-10 15:26:11 UTC
Permalink
Post by Hariharan Sethuraman
Thanks a lot Amos.
1) ok, the client does a GET /<resource> with authorization header. So I
cant cache unless I ask the site-owner to send the cache-control to
whatever it can enable the intermediate cache-server to persist it.
2) Does squid-cache allow a way where I can upload the file into cache?
If you can fetch it without sending credentials, that response should be
cacheable.

Amos
Hariharan Sethuraman
2018-09-10 15:32:14 UTC
Permalink
It requires auth for the download. Thanks, I will find a way.
Post by Amos Jeffries
Post by Hariharan Sethuraman
Thanks a lot Amos.
1) ok, the client does a GET /<resource> with authorization header. So I
cant cache unless I ask the site-owner to send the cache-control to
whatever it can enable the intermediate cache-server to persist it.
2) Does squid-cache allow a way where I can upload the file into cache?
If you can fetch it without sending credentials, that response should be
cacheable.
Amos
Alex Rousskov
2018-09-10 15:40:57 UTC
Permalink
Post by Hariharan Sethuraman
1) ok, the client does a GET /<resource> with authorization header. So I
cant cache unless I ask the site-owner to send the cache-control to
whatever it can enable the intermediate cache-server to persist it.
2) Does squid-cache allow a way where I can upload the file into cache?
You may have one or two Squid-related options AFAICT:

1. Configure Squid to remove the authentication headers going to the
origin server (see request_header_access). If the origin server does not
actually require authentication for this specific resource, then Squid
will get a cacheable response back. Assuming the server is not broken,
this approach is safe. However, this approach will _not_ work if the
server requires authentication for this resource.
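
A minimal squid.conf sketch of this option, with a hypothetical origin name,
could look like:

```
# Hypothetical sketch: strip the client's Authorization header for one origin.
# Only safe if that origin serves the resource without requiring credentials.
acl bigdownloads dstdomain .downloads.example.com
request_header_access Authorization deny bigdownloads
```

With the header removed, Squid evaluates the reply as an unauthenticated
response, so the usual Cache-Control rules decide cacheability.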

2. Configure Squid to (use an adaptation service to) add Cache-Control
response headers that would allow Squid to cache the authenticated
response. Adding response headers pre-cache probably requires using an
adaptation service -- Squid itself does not have a directive that would
add response headers before the response is evaluated for cacheability
(reply_header_access is a post-cache directive so it will not work
here). This approach should "work" regardless of the server behavior. As
Amos has said, this approach is _unsafe_ -- you may cache and share a
response with user-specific info in it. You should not do this unless
you are absolutely sure that the response is safe to share!
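
If you do try the adaptation route, the Squid side is roughly the following
sketch (service name, port, and URL are hypothetical; the ICAP RESPMOD service
that actually injects the Cache-Control header is not shown):

```
# Hypothetical sketch: send replies to a pre-cache ICAP RESPMOD service
# that adds a Cache-Control header before Squid decides cacheability.
icap_enable on
icap_service cc_inject respmod_precache icap://127.0.0.1:1344/respmod
adaptation_access cc_inject allow all
```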

Changing the origin server behavior is the best option if it is
available to you.

Alex.
Post by Hariharan Sethuraman
     > 2) With more debug_options enabled, I see that it is not caching because
     > the response is part of authenticated flow. Is there a way I can
     > override this?
     No. The server is supplying sufficient headers for caching to make it
     appear that the site authors intentionally are sending what does get
     delivered.
If I understand correctly, you are saying the caching will not be done
on squid as the content is authorised by the specific client. We can't
Authenticated, not authorized. This is one place where the difference
matters.
do anything until I ask site owners to change cache control as public?
Pretty much, yes. I know Chrome at least used to deliver binaries whose
installer contained details of the Google account of the user fetching
it. So it's not very safe to assume even downloaders are safely transferable.
Amos
_______________________________________________
squid-users mailing list
http://lists.squid-cache.org/listinfo/squid-users
Hariharan Sethuraman
2018-09-10 15:48:26 UTC
Permalink
Many thanks Alex, option 2 could work. Will check on security aspects.

Thanks again to everyone.
Post by Alex Rousskov
Post by Hariharan Sethuraman
1) ok, the client does a GET /<resource> with authorization header. So I
cant cache unless I ask the site-owner to send the cache-control to
whatever it can enable the intermediate cache-server to persist it.
2) Does squid-cache allow a way where I can upload the file into cache?
1. Configure Squid to remove the authentication headers going to the
origin server (see request_header_access). If the origin server does not
actually require authentication for this specific resource, then Squid
will get a cachable response back. Assuming the server is not broken,
this approach is safe. However, this approach will _not_ work if the
server requires authentication for this resource.
2. Configure Squid to (use an adaptation service to) add Cache-Control
response headers that would allow Squid to cache the authenticated
response. Adding response headers pre-cache probably requires using an
adaptation service -- Squid itself does not have a directive that would
add response headers before the response is evaluated for cachability
(reply_header_access is a post-cache directive so it will not work
here). This approach should "work" regardless of the server behavior. As
Amos has said, this approach is _unsafe_ -- you may cache and share a
response with user-specific info in it. You should not do this unless
you are absolutely sure that the response is safe to share!
Changing the origin server behavior is the best option if it is
available to you.
Alex.
Post by Hariharan Sethuraman
Post by Alex Rousskov
Post by Hariharan Sethuraman
2) With more debug_options enabled, I see that it is not caching because
the response is part of authenticated flow. Is there a way I can
override this?
No. The server is supplying sufficient headers for caching to make it
appear that the site authors intentionally are sending what does get
delivered.
If I understand correctly, you are saying the caching will not be done
on squid as the content is authorised by the specific client. We can't
Authenticated, not authorized. This is one place where the difference
matters.
Post by Alex Rousskov
do anything until I ask site owners to change cache control as public?
Pretty much, yes. I know Chrome at least used to deliver binaries whose
installer contained details of the Google account of the user fetching
it. So it's not very safe to assume even downloaders are safely transferable.
Amos