Discussion:
[squid-users] HTTPS cache for Java application - only getting TCP_MISS
baretomas
2018-06-13 19:28:27 UTC
Permalink
Hello,

I'm setting up a Squid proxy as a cache for a number (as many as possible)
of identical Java applications to run their web calls through. The calls are
of course identical, and the responses they get can safely be cached for 5-10
seconds.
I do this because most of the calls are directed at a single server on the
internet that I don't want to hammer, since I would of course be locked out
of it then.

Currently I'm simply testing this on a single computer: the application and
Squid.

The calls from the application are made using SSL/HTTPS by telling Java to
use Squid as a proxy (-Dhttps.proxyHost and -Dhttp.proxyHost). I've set up
Squid and Java with self-signed certificates, and the application sends its
calls through Squid and gets the response. No problem there (that wasn't easy
either, I must say :P ).
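For reference, the proxy settings those -D flags control can be sketched programmatically as well (the host and port here are assumptions matching the config below, where Squid listens on 3128):

```java
// Sketch: the programmatic equivalent of starting the JVM with
// -Dhttps.proxyHost=localhost -Dhttps.proxyPort=3128 (host/port assumed).
public class ProxySetup {
    public static void main(String[] args) {
        System.setProperty("https.proxyHost", "localhost");
        System.setProperty("https.proxyPort", "3128");
        // http.proxyHost covers any plain-HTTP calls the app also makes
        System.setProperty("http.proxyHost", "localhost");
        System.setProperty("http.proxyPort", "3128");
        // Subsequent HttpURLConnection requests will be routed via the proxy.
        System.out.println(System.getProperty("https.proxyHost")
                + ":" + System.getProperty("https.proxyPort"));
    }
}
```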

The problem is that none of the calls get cached: all rows in the access.log
have a TCP_MISS/200 tag in them.

I've searched all through the web for a solution to this and have tried
everything people have suggested. So I was hoping someone could help me?

Anyone have any tips on what to try?

My config (note I've set the refresh_pattern like that just to see if I could
catch anything. The plan is to modify it so it actually does refresh the
responses from the web calls at 5-10 second intervals. There are commented-out
parts I've tried with no luck there too):


#
# Recommended minimum configuration:
#

debug_options ALL,2

# Example rule allowing access from your local networks.
# Adapt to list your (internal) IP networks from where browsing
# should be allowed

acl localnet src 10.0.0.0/8 # RFC1918 possible internal network
#acl localnet src 172.16.0.0/12 # RFC1918 possible internal network
#acl localnet src 192.168.0.0/16 # RFC1918 possible internal network
acl localnet src fc00::/7 # RFC 4193 local private network range
acl localnet src fe80::/10 # RFC 4291 link-local (directly plugged) machines

acl SSL_ports port 443
acl Safe_ports port 80 # http
acl Safe_ports port 21 # ftp
acl Safe_ports port 443 # https
acl Safe_ports port 70 # gopher
acl Safe_ports port 210 # wais
acl Safe_ports port 1025-65535 # unregistered ports
acl Safe_ports port 280 # http-mgmt
acl Safe_ports port 488 # gss-http
acl Safe_ports port 591 # filemaker
acl Safe_ports port 777 # multiling http
acl CONNECT method CONNECT

#
# Recommended minimum Access Permission configuration:
#

# Only allow cachemgr access from localhost
http_access allow localhost manager
http_access deny manager

# Deny requests to certain unsafe ports
http_access deny !Safe_ports

# Deny CONNECT to other than secure SSL ports
http_access deny CONNECT !SSL_ports

# We strongly recommend the following be uncommented to protect innocent
# web applications running on the proxy server who think the only
# one who can access services on "localhost" is a local user
#http_access deny to_localhost

#
# INSERT YOUR OWN RULE(S) HERE TO ALLOW ACCESS FROM YOUR CLIENTS
#



# Example rule allowing access from your local networks.
# Adapt localnet in the ACL section to list your (internal) IP networks
# from where browsing should be allowed
http_access allow localnet
http_access allow localhost

# And finally deny all other access to this proxy
http_access deny all


# Squid normally listens to port 3128
#http_port 3128 ssl-bump generate-host-certificates=on dynamic_cert_mem_cache_size=4MB cert=/cygdrive/c/squid/etc/squid/correct.pem key=/cygdrive/c/squid/etc/squid/ssl/myca.key

http_port 3128 ssl-bump generate-host-certificates=on dynamic_cert_mem_cache_size=4MB cert=/cygdrive/c/squid/etc/squid/proxyCAx.pem key=/cygdrive/c/squid/etc/squid/proxyCA.pem

#https_port 3129 cert=/cygdrive/c/squid/etc/squid/proxyCAx.pem key=/cygdrive/c/squid/etc/squid/proxyCA.pem



# Uncomment the line below to enable disk caching - path format is
# /cygdrive/<full path to cache folder>, i.e.
#cache_dir aufs /cygdrive/c/squid/var/cache/ 3000 16 256

# certificate generation program
sslcrtd_program /cygdrive/c/squid/lib/squid/ssl_crtd -s /cygdrive/c/squid/var/cache/squid_ssldb -M 4MB

# Leave coredumps in the first cache dir
coredump_dir /var/cache/squid

# Add any of your own refresh_pattern entries above these.
#refresh_pattern ^ftp: 1440 20% 10080
#refresh_pattern ^gopher: 1440 0% 1440
#refresh_pattern -i (/cgi-bin/|\?) 0 0% 0
#refresh_pattern -i (/cgi-bin/|\?) 1440 100% 4320 ignore-no-store override-lastmod override-expire ignore-must-revalidate ignore-reload ignore-private ignore-auth
refresh_pattern . 1440 100% 4320 ignore-no-store override-lastmod override-expire ignore-must-revalidate ignore-reload ignore-private ignore-auth

# Bumped requests have relative URLs so Squid has to use reverse proxy
# or accelerator code. By default, that code denies direct forwarding.
# The need for this option may disappear in the future.
#always_direct allow all

dns_nameservers 8.8.8.8 208.67.222.222

max_filedescriptors 3200

# Max Object Size Cache
maximum_object_size 10240 KB


acl step1 at_step SslBump1

ssl_bump peek step1
ssl_bump bump all


#acl step1 at_step SslBump1
#acl step2 at_step SslBump2
#acl step3 at_step SslBump3

#acl ssl_exclude_domains ssl::server_name "/cygdrive/c/squid/etc/squid/ssl_exclude_domains.conf"
#acl ssl_exclude_ips dst "/cygdrive/c/squid/etc/squid/ssl_exclude_ips.conf"

#ssl_bump splice localhost
#ssl_bump peek step1 all
#ssl_bump splice ssl_exclude_domains
#ssl_bump splice ssl_exclude_ips
#ssl_bump stare step2 all
#ssl_bump bump all




--
Sent from: http://squid-web-proxy-cache.1019090.n4.nabble.com/Squid-Users-f1019091.html
Antony Stone
2018-06-13 19:37:08 UTC
Permalink
Post by baretomas
Hello,
I'm setting up a Squid proxy as a cache for a number (as many as possible)
of identical JAVA applications to run their web calls through.
The problem is that none of the calls get cached: All rows in the
access.log have a TCP_MISS/200 tag in them.
I've searched all through the web for a solution to this, and have tried
everything people have suggested. So I was hoping someone could help me?
Show us the response you get from the remote server (at least the full
headers; the content is neither here nor there).

My bet is that the website manager has used one or more "don't cache"
directives which Squid is simply faithfully obeying.


Antony.
--
Please apologise my errors, since I have a very small device.

Please reply to the list;
please *don't* CC me.
Antony Stone
2018-06-13 19:44:51 UTC
Permalink
Post by baretomas
The calls from the application is done using ssl / https by telling java to
use Squid as a proxy (-Dhttps.proxyHost and -Dhttp.proxyHost).
Okay, but...
Post by baretomas
http_port 3128 ssl-bump generate-host-certificates=on
dynamic_cert_mem_cache_size=4MB
cert=/cygdrive/c/squid/etc/squid/proxyCAx.pem
key=/cygdrive/c/squid/etc/squid/proxyCA.pem
# certificate generation program
sslcrtd_program /cygdrive/c/squid/lib/squid/ssl_crtd -s
/cygdrive/c/squid/var/cache/squid_ssldb -M 4MB
acl step1 at_step SslBump1
ssl_bump peek step1
ssl_bump bump all
Surely all this peeking and bumping is only needed if you're running Squid in
interception mode, whereas you've said that you've configured your Java
application to explicitly use Squid as a proxy?


Have you tried your Squid configuration with a plain browser, configured to use
the proxy, with (a) a few random websites, and (b) the specific resource you're
trying to access from your Java application, to see whether it is actually
working as a caching proxy?


Antony.
--
This sentence contains exacly three erors.

Please reply to the list;
please *don't* CC me.
Amos Jeffries
2018-06-14 11:33:36 UTC
Permalink
Post by Antony Stone
Post by baretomas
The calls from the application is done using ssl / https by telling java to
use Squid as a proxy (-Dhttps.proxyHost and -Dhttp.proxyHost).
Okay, but...
Post by baretomas
http_port 3128 ssl-bump generate-host-certificates=on
dynamic_cert_mem_cache_size=4MB
cert=/cygdrive/c/squid/etc/squid/proxyCAx.pem
key=/cygdrive/c/squid/etc/squid/proxyCA.pem
# certificate generation program
sslcrtd_program /cygdrive/c/squid/lib/squid/ssl_crtd -s
/cygdrive/c/squid/var/cache/squid_ssldb -M 4MB
acl step1 at_step SslBump1
ssl_bump peek step1
ssl_bump bump all
Surely all this peeking and bumping is only needed if you're running Squid in
interception mode,
Not quite. SSL-Bump is interception of the TLS layer. Regular / forward
/ explicit proxies use it to decrypt the CONNECT messages transporting
HTTPS traffic through tunnels.
Post by Antony Stone
whereas you've said that you've configured your Java
application to explicitly use Squid as a proxy?
The proxy port and SSL-Bump config is consistent with a SSL-Bumping
forward proxy.

I suspect the -Dhttp.proxyHost is probably the Java app's equivalent of
the Linux http_proxy environment variables we are more familiar with
seeing applications use to connect to that type of proxy.
Post by Antony Stone
Have you tried your Squid configuration with a plain browser, configured to use
the proxy, with (a) a few random websites, and (b) the specific resource you're
trying to access from your Java application, to see whether it is actually
working as a caching proxy?
Good idea.


Amos
Tomas Finnøy
2018-06-14 07:09:05 UTC
Permalink
Post by Antony Stone
Surely all this peeking and bumping is only needed if you're running Squid in
interception mode, whereas you've said that you've configured your Java
application to explicitly use Squid as a proxy?
I found some how-tos and posts that were explaining how to make an HTTPS cache proxy, and they were all mentioning bumping. Isn't the bump needed to decrypt the response, so it is possible to store it in the cache? I don't need any acl with peek and bump for my scenario at all - is that what you are saying?
Post by Antony Stone
Have you tried your Squid configuration with a plain browser, configured to use
the proxy, with (a) a few random websites, and (b) the specific resource you're
trying to access from your Java application, to see whether it is actually
working as a caching proxy?
No. And that is something I will do now. Thanks for the tips.

Sorry for the messy formatting here, but I didn't get your responses to my mail. I only saw it in the archives and copied it over to my mail here...

/Tomas
Antony Stone
2018-06-14 08:25:54 UTC
Permalink
Post by Tomas Finnøy
Post by Antony Stone
Surely all this peeking and bumping is only needed if you're running
Squid in interception mode, whereas you've said that you've configured
your Java application to explicitly use Squid as a proxy?
I found some "how-to's" and posts that were explaining how to make a https
cache proxy, and they were all mentioning bumping. Isn't the bump needed
to decrypt the response, so it is possible to store it in the cache?
No, because when you explicitly configure a browser (or in your case a Java
application) to use a proxy, it sends a request to the proxy saying "please go
and fetch something from this URI for me", and Squid then does all the HTTPS
negotiations needed to talk to the remote server. What Squid gets back is the
plain unencrypted content, which it can then pass on to the browser (or
application), and if it's allowed to (by whatever it finds in the headers of
the response) it can also cache it.
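The request flow described above can be sketched as follows (the URL and proxy host/port here are assumptions for illustration, not from the thread); Java can also address an explicit proxy per-connection rather than via the -D flags:

```java
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;
import java.net.URLConnection;

// Sketch: addressing an explicit (forward) proxy per-connection.
// Nothing is sent over the network until connect() is called,
// so this runs safely offline.
public class ExplicitProxySketch {
    public static void main(String[] args) throws Exception {
        Proxy squid = new Proxy(Proxy.Type.HTTP,
                new InetSocketAddress("localhost", 3128)); // assumed host:port
        URL url = new URL("https://example.com/api");      // hypothetical URL
        URLConnection conn = url.openConnection(squid);    // would go via Squid
        System.out.println(squid.type());
    }
}
```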
Post by Tomas Finnøy
I dont need any acl with peek and bump for my scenario at all, is what you
are saying?
Correct.
Post by Tomas Finnøy
Post by Antony Stone
Have you tried your Squid configuration with a plain browser, configured
to use the proxy, with (a) a few random websites, and (b) the specific
resource you're trying to access from your Java application, to see
whether it is actually working as a caching proxy?
No. And something I will do now. Thanks for tips.
No problem. Just suggesting "start simple" before moving on to several
complex things interacting with each other...
Post by Tomas Finnøy
Sorry for the messy formatting here, but I didnt get your responses to my
mail. I only saw it in the archives and copied it over to my mail here....
Hm, odd, I see my reply on the list just as normal.


Antony.
--
I thought of going into banking, until I lost interest.

Please reply to the list;
please *don't* CC me.
Tomas Finnøy
2018-06-14 08:31:48 UTC
Permalink
Post by Antony Stone
Post by Tomas Finnøy
Post by Antony Stone
Surely all this peeking and bumping is only needed if you're running
Squid in interception mode, whereas you've said that you've configured
your Java application to explicitly use Squid as a proxy?
I found some "how-to's" and posts that were explaining how to make a https
cache proxy, and they were all mentioning bumping. Isn't the bump needed
to decrypt the response, so it is possible to store it in the cache?
No, because when you explicitly configure a browser (or in your case a Java
application) to use a proxy, it sends a request to the proxy saying "please go
and fetch something from this URI for me", and Squid then does all the HTTPS
negotiations needed to talk to the remote server. What Squid gets back is the
plain unencrypted content, which it can then pass on to the browser (or
application), and if it's allowed to (by whatever it finds in the headers of
the response) it can also cache it.
Post by Tomas Finnøy
I dont need any acl with peek and bump for my scenario at all, is what you
are saying?
Correct.
Post by Tomas Finnøy
Post by Antony Stone
Have you tried your Squid configuration with a plain browser, configured
to use the proxy, with (a) a few random websites, and (b) the specific
resource you're trying to access from your Java application, to see
whether it is actually working as a caching proxy?
No. And something I will do now. Thanks for tips.
No problem. Just suggesting "start simple" before moving on to several
complex things interacting with each other...
Post by Tomas Finnøy
Sorry for the messy formatting here, but I didnt get your responses to my
mail. I only saw it in the archives and copied it over to my mail here....
Hm, odd, I see my reply on the list just as normal.
Ok now it arrived like it should!

Thanks for your tips! Very much appreciated!

/Tomas
Amos Jeffries
2018-06-14 11:25:29 UTC
Permalink
Post by baretomas
Hello,
I'm setting up a Squid proxy as a cache for a number (as many as possible)
of identical JAVA applications to run their web calls through. The calls are
ofc identical, and the response they get can safely be cached for 5-10
seconds.
I do this because most of the calls is directed at a single server on the
internet that I don't want to hammer, since I will ofc be locked out of it
then.
Currently Im simply testing this on a single computer: the application and
squid
The calls from the application is done using ssl / https by telling java to
use Squid as a proxy (-Dhttps.proxyHost and -Dhttp.proxyHost). I've set up
squid and JAVA with self-signed certificates, and the application sends its
calls through squid and gets the reponse. No problem there (wasnt easy that
either I must say :P ).
I was going to ask what was so hard about it. Then I looked at your
config and saw that you are in fact using NAT interception instead of
the easy way.

So what _exactly_ do those -D options cause the Java applications to do
with the proxy?
I have some suspicions, but I am not familiar enough with the Java API,
and the specific details are critical to what you need the proxy to be doing.
Post by baretomas
The problem is that none of the calls get cached: All rows in the access.log
have a TCP_MISS/200 tag in them.
I've searched all through the web for a solution to this, and have tried
everything people have suggested. So I was hoping someone could help me?
Anyone have any tips on what to try?
There are three ways to do this:

1) if you own the domain the apps are connecting to: set up the proxy as
a normal TLS / HTTPS reverse-proxy.

2) if you have enough control of the apps to get them connecting with
TLS *to the proxy* and sending their requests there: do that.

3) the (relatively) complicated SSL-Bump way you found. The proxy is
fully at the mercy of the messages sent by apps and servers. Caching
is a luxury here, easily broken / prevented.

Well, there is a fourth way with intercept. But that is a VERY last
resort, and you already have (3) going, which is already better than
intercept. Getting to (1) or (2) would be simplest if you meet the "if
..." requirements for those.
Post by baretomas
My config (note I've set the refresh_pattern like that just to see if I could
catch anything. The plan is to modify it so it actually does refresh the
responses from the web calls at 5-10 second intervals. There are commented
...

Ah. The way you write that implies a misunderstanding about refresh_pattern.

HTTP has some fixed algorithms written into the protocol that caches are
required to perform to determine if any object stored can be used or
requires replacement.

The parameters used by these algorithms come in the form of headers in
the originally stored reply message and the current client's request.
Sometimes they require revalidation, which is a quick check with the
server for updated instructions and/or content.

What refresh_pattern actually does is provide default values for those
algorithm parameters IF any one (or more) of them are missing from those
HTTP messages.


The proper way to make caching happen with your desired behaviour is for
the server to present an HTTP Cache-Control header saying the object is
cacheable (i.e. it does not forbid caching), but not for more than 10 seconds:
Cache-Control: max-age=10
OR to say that objects need revalidation, but present a 304 status for
revalidation checks (i.e. Cache-Control: no-cache) (yes, that's right,
"no-cache" means *do* cache).

That said, I doubt you really want to force that; you would probably be
happy if the server instructed the proxy that it is safe to cache
an object for several minutes, or any value larger than 10 sec.


So what we circle back to is that you are probably trying to force
things to cache and be used long past their actual safe-to-use lifetimes
as specified by the devs most authoritative on that subject (under
10 sec?). As you should be aware, this is a highly unsafe thing to be doing
unless you are one of those devs - be very careful what you choose to do.
Post by baretomas
# Squid normally listens to port 3128
#http_port 3128 ssl-bump generate-host-certificates=on dynamic_cert_mem_cache_size=4MB cert=/cygdrive/c/squid/etc/squid/correct.pem key=/cygdrive/c/squid/etc/squid/ssl/myca.key
http_port 3128 ssl-bump generate-host-certificates=on dynamic_cert_mem_cache_size=4MB cert=/cygdrive/c/squid/etc/squid/proxyCAx.pem key=/cygdrive/c/squid/etc/squid/proxyCA.pem
#https_port 3129 cert=/cygdrive/c/squid/etc/squid/proxyCAx.pem key=/cygdrive/c/squid/etc/squid/proxyCA.pem
Hmm. This is a Windows machine running Cygwin?
FYI: Performance is going to be terrible. It may not be super relevant
yet. Just be aware that Windows imposes limits on usable sockets
per application - much smaller than a typical proxy requires.
The Cygwin people do a lot, but they cannot solve some OS limitation
problems.

To meet your very first sentence "as many as possible" requirement you
will need a non-Windows machine to run the proxy on. That simple change
will get you something around 3 orders of magnitude higher peak client
capacity on the proxy.
Post by baretomas
# Uncomment the line below to enable disk caching - path format is
/cygdrive/<full path to cache folder>, i.e.
#cache_dir aufs /cygdrive/c/squid/var/cache/ 3000 16 256
# certificate generation program
sslcrtd_program /cygdrive/c/squid/lib/squid/ssl_crtd -s
/cygdrive/c/squid/var/cache/squid_ssldb -M 4MB
# Leave coredumps in the first cache dir
coredump_dir /var/cache/squid
# Add any of your own refresh_pattern entries above these.
#refresh_pattern ^ftp: 1440 20% 10080
#refresh_pattern ^gopher: 1440 0% 1440
#refresh_pattern -i (/cgi-bin/|\?) 0 0% 0
#refresh_pattern -i (/cgi-bin/|\?) 1440 100% 4320 ignore-no-store
override-lastmod override-expire ignore-must-revalidate ignore-reload
ignore-private ignore-auth
refresh_pattern . 1440 100% 4320 ignore-no-store override-lastmod
override-expire ignore-must-revalidate ignore-reload ignore-private
ignore-auth override-lastmod
* ignore-must-revalidate actively *reduces* caching, because it disables
several of the widely used HTTP mechanisms that rely on revalidation to
allow things to be stored in a cache.
It is *only* beneficial if the server is broken: requiring revalidation
but not supporting revalidation.


* ignore-auth has the same un-intuitive effects as ignoring revalidation,
again reducing caching ability.
This is only useful if you want to prevent caching of contents which
require any form of login to view. High-security networks dealing with
classified or confidential materials find this useful - regular Internet
admins, not so much.


* ignore-no-store is highly dangerous and rarely necessary - the "nuclear
option" for caching. It has the potential to eradicate user privacy and
scramble up any server-personalized content (not in a good way).
This is a last resort intended only to cope with severely braindead
applications. YMMV whether you have to deal with any of those - just
treat this as an absolute last resort rather than something to play with.


Overall - in order to use these refresh_pattern controls you *need* to
know what the HTTP(S) messages going through your proxy contain in terms
of caching headers AND what those messages are doing semantically /
content-wise for the client application. Using any of them as a generic
"makes caching better" knob only leads to problems in today's HTTP protocol.
Post by baretomas
# Bumped requests have relative URLs so Squid has to use reverse proxy
# or accelerator code. By default, that code denies direct forwarding.
# The need for this option may disappear in the future.
#always_direct allow all
dns_nameservers 8.8.8.8 208.67.222.222
Use of 8.8.8.8 is known to be explicitly detrimental to caching
intercepted traffic.

Those servers present different result sets based on the timing and the IP
sending the query. The #1 requirement for caching intercepted (or
SSL-Bump'ed) content is that the client and proxy have the exact same
view of the DNS system contents. Having the DNS reply contents change
between two consecutive and identical queries breaks that requirement.
Post by baretomas
max_filedescriptors 3200
# Max Object Size Cache
maximum_object_size 10240 KB
acl step1 at_step SslBump1
ssl_bump peek step1
ssl_bump bump all
This causes the proxy to attempt decryption of the traffic using crypto
algorithms based solely on the ClientHello details and its own
capabilities. There are zero server crypto capabilities known to the
proxy to ensure traffic can actually make it to the server.

You are rather lucky that it actually worked at all. Almost any
deviation (e.g. emergency security updates in future) at the client,
server or proxy endpoint risks breaking the communication through this
proxy.

Ideally there would be a stare action for step 2 and then bump only at
step 3.




So in summary to the things to try to get better caching:

* ditch 8.8.8.8. Use a local DNS resolver within your own network,
shared by clients and proxy. That resolver can use 8.8.8.8 itself; the
important part is that it should be responsible for caching DNS results and
ensuring the app clients and Squid see the same records as much as possible.

* try "debug_options 11,2" to get a cache.log of the HTTP(S) headers for
message being decrypted in the proxy. Look at those headers to see why
they are not caching normally. Use that info to inform your next
actions. It cannot tell you how the message is used by the application,
hopefully you can figure that out somehow before forcing anything unnatural.

* if you can, try pasting some of the transaction URLs into the tool at
redbot.org to see if there are any HTTP-level mistakes in the apps that
could be fixed for better cacheability.
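The first suggestion could be sketched in squid.conf roughly as follows (the resolver address is an assumption; use whatever local caching resolver your network actually runs, and point the Java clients' OS at the same one):

```
# hypothetical local caching resolver shared by clients and proxy,
# replacing the public 8.8.8.8 / 208.67.222.222 entries
dns_nameservers 192.168.1.10
```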

Amos
Tomas Finnøy
2018-06-14 15:49:59 UTC
Permalink
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
Post by Amos Jeffries
Post by baretomas
Hello,
I'm setting up a Squid proxy as a cache for a number (as many as possible)
of identical JAVA applications to run their web calls through. The calls are
ofc identical, and the response they get can safely be cached for 5-10
seconds.
I do this because most of the calls is directed at a single server on the
internet that I don't want to hammer, since I will ofc be locked out of it
then.
Currently Im simply testing this on a single computer: the application and
squid
The calls from the application is done using ssl / https by telling java to
use Squid as a proxy (-Dhttps.proxyHost and -Dhttp.proxyHost). I've set up
squid and JAVA with self-signed certificates, and the application sends its
calls through squid and gets the reponse. No problem there (wasnt easy that
either I must say :P ).
I was going to ask what was so hard about it. Then I looked at your
config and see that your are in fact using NAT interception instead of
the easy way.
So what exactly do those -D options cause the Java applications to do
with the proxy?
I have some suspicions, but am not familiar enough with Java API and
the specific details are critical to what you need the proxy to be doing.
Post by baretomas
The problem is that none of the calls get cached: All rows in the access.log
hava a TCP_MISS/200 tag in them.
I've searched all through the web for a solution to this, and have tried
everything people have suggested. So I was hoping someone could help me?
Anyone have any tips on what to try?
1. if you own the domain the apps are connecting to. Setup the proxy as
a normal TLS / HTTPS reverse-proxy.
2. if you have enough control of the apps to get them connecting with
TLS to the proxy and sending their requests there. Do that.
3. the (relatively) complicated SSL-Bump way you found. The proxy is
fully at the mercy of the the messages sent by apps and servers. Caching
is a luxury here, easily broken / prevented.
Well, there is a forth way with intercept. But that is a VERY last
resort and you already have (3) going and that is already better than
intercept. Getting to (1) or (2) would be simplest if you meet the "if
..." requirements for those.
Post by baretomas
MY config (note Ive set the refresh_pattern like that just to see if I could
catch anything. The plan is to modify it so it actualyl does refresh the
responses frmo the web calls in 5-10 seconds intervals. There are commented
...
Ah. The way you write that implies a misunderstanding about refresh_pattern.
HTTP has some fixed algorithms written into the protocol that caches are
required to perform to determine if any object stored can be used or
requires replacement.
The parameters used by these algorithms come in the form of headers in
the originally stored reply message, the current clients request.
Sometimes they require revalidation, which is a quick check with the
server for updated instructions and/or content.
What refresh_pattern actually does is provide default values for those
algorithm parameters IF any one (or more) of them are missing from those
HTTP messages.
The proper way to make caching happen with your desired behaviour is for
the server to present HTTP Cache-Control header saying the object is
cacheable (ie does not forbid caching), but not for more than 10seconds.
Cache-Control: max-age=10
OR to say that objects need revalidation, but presents a 304 status for
revalidation checks. (ie Cache-Control:no-cache) (yeah, thats right,
"no-cache" means do cache).
That said, I doubt you really are wanting to force that and would be
happy if the server was instructing the the proxy as being safe to cache
an object for several minutes or any value larger than 10sec.
So what we circle back to is that you are probably trying to force
things to cache and be used long past their actual safe-to-use lifetimes
as specified by the devs most authoritative on that subject (under
10sec?). As you should be aware, this is highly unsafe thing to be doing
unless you are one of those devs - be very careful what you choose to do.
Post by baretomas
Squid normally listens to port 3128
===================================
#http_port 3128 ssl-bump generate-host-certificates=on
dynamic_cert_mem_cache_size=4MB cert=/cygdrive/c/squid/etc/squid/correct.pem
key=/cygdrive/c/squid/etc/squid/ssl/myca.key
http_port 3128 ssl-bump generate-host-certificates=on
dynamic_cert_mem_cache_size=4MB
cert=/cygdrive/c/squid/etc/squid/proxyCAx.pem
key=/cygdrive/c/squid/etc/squid/proxyCA.pem
#https_port 3129 cert=/cygdrive/c/squid/etc/squid/proxyCAx.pem
key=/cygdrive/c/squid/etc/squid/proxyCA.pem
Hmm. This is a Windows machine running Cygwin?
FYI: Performance is going to be terrible. It may not be super relevant
yet. Just be aware that Windows imposes limitations on usable sockets
per application - which is much smaller than a typical proxy requires.
The Cygwin people do a lot but they cannot solve some OS limitation
problems.
To meet your very first sentence "as many as possible" requirement you
will need a non-Windows machine to run the proxy on. That simple change
will get you something around 3 orders of magnitude higher peak client
capacity on the proxy.
Post by baretomas
Uncomment the line below to enable disk caching - path format is
================================================================
/cygdrive/<full path to cache folder>, i.e.
#cache_dir aufs /cygdrive/c/squid/var/cache/ 3000 16 256
certificate generation program
==============================
sslcrtd_program /cygdrive/c/squid/lib/squid/ssl_crtd -s
/cygdrive/c/squid/var/cache/squid_ssldb -M 4MB
Leave coredumps in the first cache dir
======================================
coredump_dir /var/cache/squid
Add any of your own refresh_pattern entries above these.
========================================================
#refresh_pattern ^ftp: 1440 20% 10080
#refresh_pattern ^gopher: 1440 0% 1440
#refresh_pattern -i (/cgi-bin/|?) 0 0% 0
#refresh_pattern -i (/cgi-bin/|?) 1440 100% 4320 ignore-no-store
override-lastmod override-expire ignore-must-revalidate ignore-reload
ignore-private ignore-auth
refresh_pattern . 1440 100% 4320 ignore-no-store override-lastmod
override-expire ignore-must-revalidate ignore-reload ignore-private
ignore-auth override-lastmod
- ignore-must-revalidate actively reduces caching. Because it disables
several of the widely used HTTP mechanisms that rely on revalidation to
allow things to be stored in a cache.
It is only beneficial if the server is broken; requiring revalidation
plus not supporting revalidation.
- ignore-auth same un-intuitive effects as ignoring revalidation, again
reducing caching ability.
This is only useful if you want to prevent caching of contents which
require any form of login to view. High security networks dealing with
classified or confidential materials find this useful - regular Internet
admin not so much.
- ignore-no-store is highly dangerous and rarely necessary. The "nuclear
option" for caching. It has the potential to eradicate user privacy and
scramble up any server personalized content (not in a good way).
This is a last resort intended only to copy with severely braindead
applications. YMMV whether you have to deal with any of those - just
treat this an absolute last resort rather than something to play with.
Overall - in order to use these refresh-pattern controls you need to
know what the HTTP(S) messages going through your proxy contain in terms
of caching headers AND what those messages are doing semantically /
content wise for the client application. Using any of them as a generic
"makes caching better" thing only leads to problems in todays HTTP protocol.
Post by baretomas
# Bumped requests have relative URLs so Squid has to use reverse proxy
# or accelerator code. By default, that code denies direct forwarding.
# The need for this option may disappear in the future.
#always_direct allow all
dns_nameservers 8.8.8.8 208.67.222.222
Use of 8.8.8.8 is known to be explicitly detrimental to caching
intercepted traffic.
Those servers present different result sets based on the timing and IP
sending the query. The #1 requirement of caching intercepted (or
SSL-Bump'ed) content is that the client and proxy have the exact same
view of DNS system contents. Having the DNS reply contents change
between two consecutive and identical queries breaks that requirement.
Post by baretomas
max_filedescriptors 3200
# Max Object Size Cache
maximum_object_size 10240 KB
acl step1 at_step SslBump1
ssl_bump peek step1
ssl_bump bump all
This causes the proxy to attempt decryption of the traffic using crypto
algorithms based solely on the ClientHello details and its own
capabilities. There is zero server crypto capabilities known for the
proxy to use to ensure traffic can actually make it to the server.
You are rather lucky that it actually worked at all. Almost any
deviation (ie emergency security updates in future) at either client or
server or proxy endpoints risks breaking the communication through this
proxy.
Ideally there would be a stare action for step2 and then bump only at
step 3.
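The suggested sequence can be sketched as (a minimal illustration, not a
drop-in config):

```
acl step1 at_step SslBump1
acl step2 at_step SslBump2
acl step3 at_step SslBump3
ssl_bump peek step1    # read the client SNI from the ClientHello
ssl_bump stare step2   # read the server certificate details as well
ssl_bump bump step3    # decrypt knowing both endpoints' capabilities
```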
- ditch 8.8.8.8. Use a local DNS resolver within your own network,
shared by clients and proxy. That resolver can use 8.8.8.8 itself; the
important part is that it is responsible for caching DNS results,
ensuring the app clients and Squid see records that are as close to
identical as possible.
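A sketch of the relevant squid.conf change (the resolver address is a
hypothetical example of a local resolver shared by apps and proxy):

```
# Point Squid at the shared local resolver instead of public anycast DNS,
# so clients and proxy resolve hostnames to the same addresses.
dns_nameservers 192.168.1.1
```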
- try "debug_options 11,2" to get a cache.log of the HTTP(S) headers for
messages being decrypted in the proxy. Look at those headers to see why
they are not caching normally, and use that info to inform your next
actions. It cannot tell you how a message is used by the application;
hopefully you can figure that out somehow before forcing anything unnatural.
- if you can, try pasting some of the transaction URLs into the tool at
redbot.org to see if there are any HTTP level mistakes in the apps that
could be fixed for better cacheability.
Amos
Many thanks for this very informative reply to my question! I will spend some time understanding it and try out the things you suggest!
Thanks again!
baretomas
2018-06-14 19:32:20 UTC
Permalink
OK, I'm back, still as confused as ever. See below for my story.
Post by Amos Jeffries
1. if you own the domain the apps are connecting to. Setup the proxy as
a normal TLS / HTTPS reverse-proxy.
2. if you have enough control of the apps to get them connecting with
TLS to the proxy and sending their requests there. Do that.
3. the (relatively) complicated SSL-Bump way you found. The proxy is
fully at the mercy of the messages sent by apps and servers. Caching
is a luxury here, easily broken / prevented.
Well, there is a fourth way with intercept. But that is a VERY last
resort and you already have (3) going and that is already better than
intercept. Getting to (1) or (2) would be simplest if you meet the "if
..." requirements for those.
1. Both the proxy and the apps are on the same machine on my home network.
The server they are calling is not mine, and I have no way of modifying its
behaviour. That rules out 1, if I understood correctly?

2. According to the Java docs, the HTTPS proxy properties (-Dhttps.proxyHost
and -Dhttps.proxyPort, which direct all SSL traffic to that destination)
should cover this. And I have already done that.

So I seemed to have combined 2 and 3 here?

But I *only* need 2 you are saying.
Post by Amos Jeffries
The proper way to make caching happen with your desired behaviour is for
the server to present HTTP Cache-Control header saying the object is
cacheable (ie does not forbid caching), but not for more than 10seconds.
Cache-Control: max-age=10
OR to say that objects need revalidation, but present a 304 status for
revalidation checks (ie Cache-Control: no-cache; yeah, that's right,
"no-cache" means do cache, just revalidate before each reuse).
That said, I doubt you really want to force that; you would probably be
happy if the server instructed the proxy that an object is safe to cache
for several minutes, or any value larger than 10 sec.
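As raw response headers, the two server-side alternatives described above
might look like this (the ETag value is hypothetical):

```
# Alternative 1: explicit short freshness lifetime
Cache-Control: max-age=10

# Alternative 2: store, but revalidate on every reuse;
# the server answers revalidation checks with 304 Not Modified
Cache-Control: no-cache
ETag: "abc123"
```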
So what we circle back to is that you are probably trying to force things
to cache and be used long past their actual safe-to-use lifetimes as
specified by the devs most authoritative on that subject (under 10 sec?).
As you should be aware, this is a highly unsafe thing to be doing unless
you are one of those devs - be very careful what you choose to do.
I'm well aware of the issues this might pose, and yes: the server *is*
sending out Cache-Control. Look below for the header. My wish is to ignore
those headers and still cache the content.

To repeat, I'm well aware of the issues this might pose, and I am ready to run
side-by-side tests continuously to make certain all the apps behave like they
should, even if they are only getting cached content.

I've analyzed the app's web calls for about a day of data and figured out how
many and what type of calls it makes, and I know the response content very
well, since I have written apps against that API myself.

What I am actually doing is writing a test bench for the application in
question by running many instances of it simultaneously against the same data
sets, with differing configurations, to compare the results.

No other person is to touch this project, so I'm fairly sure it won't affect
anyone but my own free time :)

It's not meant for a production environment of any kind, and I'm aware of the
potential failures of the project should anything outside my control happen.

But thanks for the warnings, of course. You had no way of knowing what my
intentions with the project were when I first asked my questions, and I
probably should have explained that from the start.

I hope that makes everything a bit clearer? Anyway,

So... is it at all possible to override the cache controls of the headers below?

server reply header:

HTTP/1.1 200 OK
Content-Type: application/json
Transfer-Encoding: chunked
Connection: keep-alive
Date: Wed, 13 Jun 2018 17:18:33 GMT
Server: nginx
Vary: Accept-Encoding
Strict-Transport-Security: max-age=31536000; includeSubdomains
X-Frame-Options: SAMEORIGIN
X-Xss-Protection: 1; mode=block
X-Content-Type-Options: nosniff
Content-Security-Policy: default-src 'self'
X-Content-Security-Policy: default-src 'self'
X-WebKit-CSP: default-src 'self'
Cache-Control: no-cache, no-store, must-revalidate
Pragma: no-cache
Expires: 0
X-Cache: Miss from cloudfront
Via: 1.1 21258ec71c1aa4499bcd08c6ad0eba38.cloudfront.net (CloudFront)
X-Amz-Cf-Id: gdqZScePve6zvtHqlFa8TmCmmh0rKGrwD2Gwx46PbUSqd94QiJhkPQ==
Post by Amos Jeffries
Post by baretomas
# Squid normally listens to port 3128
#http_port 3128 ssl-bump generate-host-certificates=on dynamic_cert_mem_cache_size=4MB cert=/cygdrive/c/squid/etc/squid/correct.pem key=/cygdrive/c/squid/etc/squid/ssl/myca.key
http_port 3128 ssl-bump generate-host-certificates=on dynamic_cert_mem_cache_size=4MB cert=/cygdrive/c/squid/etc/squid/proxyCAx.pem key=/cygdrive/c/squid/etc/squid/proxyCA.pem
#https_port 3129 cert=/cygdrive/c/squid/etc/squid/proxyCAx.pem key=/cygdrive/c/squid/etc/squid/proxyCA.pem
Hmm. This is a Windows machine running Cygwin?
FYI: Performance is going to be terrible. It may not be super relevant
yet. Just be aware that Windows imposes limitations on usable sockets
per application - which is much smaller than a typical proxy requires.
The Cygwin people do a lot but they cannot solve some OS limitation
problems.
To meet your very first sentence "as many as possible" requirement you
will need a non-Windows machine to run the proxy on. That simple change
will get you something around 3 orders of magnitude higher peak client
capacity on the proxy.
The current setup is on my Windows laptop. I'm only doing PoCs on it before
doing it all over again on a Debian box with Docker, where the applications
are to run. (I'm already looking forward to bug-hunting the differences in
behaviour for the whole thing after moving it, btw!)
Does that sound OK? If anyone has any suggestions on how I could run this even
more efficiently, I would very much like to know! It's a critical facet of the
project to get as many of these applications running at the same time.
Post by Amos Jeffries
Overall - in order to use these refresh-pattern controls you need to
know what the HTTP(S) messages going through your proxy contain in terms
of caching headers AND what those messages are doing semantically /
content wise for the client application. Using any of them as a generic
"makes caching better" thing only leads to problems in todays HTTP protocol.
I've pulled out the relevant bits of the headers:

Cache-Control: no-cache, no-store, must-revalidate
Pragma: no-cache
Expires: 0

I guess I went slightly overboard with my config. I tried to add one
directive after another to see if there was any change. I couldn't spot any.
Would this be the correct setup for my refresh_pattern to be able to cache
responses with this header?

ignore-no-store ignore-must-revalidate override-expire ?

What about the Expires: 0? Is that covered by override-expire?
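A sketch of such a rule (the URL pattern and one-minute lifetime are
assumptions, and this is not a recommendation): override-expire does make
Squid prefer the configured minimum age over the server's Expires header,
so "Expires: 0" would be covered by it.

```
# Forces ~1 minute of caching despite no-store / must-revalidate / Expires: 0.
# Times are in minutes; the URL pattern is hypothetical.
refresh_pattern -i ^https://api\.example\.com/ 1 0% 1 ignore-no-store ignore-must-revalidate override-expire
```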
Post by Amos Jeffries
Post by baretomas
dns_nameservers 8.8.8.8 208.67.222.222
Use of 8.8.8.8 is known to be explicitly detrimental to caching
intercepted traffic.
Those servers present different result sets based on the timing and IP
sending the query. The #1 requirement of caching intercepted (or
SSL-Bump'ed) content is that the client and proxy have the exact same
view of DNS system contents. Having the DNS reply contents change
between two consecutive and identical queries breaks that requirement.
Thanks. Using my own dns now!
Post by Amos Jeffries
Post by baretomas
max_filedescriptors 3200
Max Object Size Cache
=====================
maximum_object_size 10240 KB
acl step1 at_step SslBump1
ssl_bump peek step1
ssl_bump bump all
This causes the proxy to attempt decryption of the traffic using crypto
algorithms based solely on the ClientHello details and its own
capabilities. There is zero server crypto capabilities known for the
proxy to use to ensure traffic can actually make it to the server.
You are rather lucky that it actually worked at all. Almost any
deviation (ie emergency security updates in future) at either client or
server or proxy endpoints risks breaking the communication through this
proxy.
Ideally there would be a stare action for step2 and then bump only at
step 3.
- ditch 8.8.8.8. Use a local DNS resolver within your own network,
shared by clients and proxy. That can use 8.8.8.8 itself, the important
part is that it should be responsible for caching DNS results and
ensuring the app clients and Squid see as much the same records as possible.
- try "debug_options 11,2" to get a cache.log of the HTTP(S) headers for
message being decrypted in the proxy. Look at those headers to see why
they are not caching normally. Use that info to inform your next
actions. It cannot tell you how the message is used by the application,
hopefully you can figure that out somehow before forcing anything unnatural.
- if you can, try pasting some of the transaction URLs into the tool at
redbot.org to see if there are any HTTP level mistakes in the apps that
could be fixed for better cacheability.
I'm still confused about all this, probably because I don't understand
the Squid system well enough, and because I have possibly read too many
suggestions on other forums that have pointed me in the wrong direction.

What I have gathered from your posts and the others:

If I want to cache https/ssl, I need to decrypt the requests and responses
using certificates.
Otherwise the application will only tunnel its requests to the destination
and the proxy won't know what's in the messages.

However, I *don't* have to bump and peek and do a fancy dance to get Squid to
cache it.

I also see that Amos is suggesting a reverse proxy (accel mode) on the
proxy. So...

... I assume this is not needed?
http_port 3128 ssl-bump generate-host-certificates=on dynamic_cert_mem_cache_size=4MB cert=/cygdrive/c/squid/etc/squid/proxyCAx.pem key=/cygdrive/c/squid/etc/squid/proxyCA.pem

And I need something like this:

https_port 3128 accel generate-host-certificates=on dynamic_cert_mem_cache_size=4MB cert=/cygdrive/c/squid/etc/squid/proxyCAx.pem key=/cygdrive/c/squid/etc/squid/proxyCA.pem

I tried this here now, and it seems to fail even more than before. No idea
why. I get this error message that seems to point me in the right direction,
but I haven't found anything on Google to give me an indication of what it
actually points towards.

Squid_SSL_accept: Error negotiating SSL connection on FD 10:
error:00000005:lib(0):func(0):DH lib

Am I right in my musings so far? :)

Oh, I've tried using the browser as the source for my tests. There is little
difference in the results compared to what the Java apps are doing.

Any thoughts or ideas are wildly appreciated!







--
Sent from: http://squid-web-proxy-cache.1019090.n4.nabble.com/Squid-Users-f1019091.html
Alex Rousskov
2018-06-14 21:25:10 UTC
Permalink
Post by baretomas
Post by Amos Jeffries
2. if you have enough control of the apps to get them connecting with
TLS to the proxy and sending their requests there. Do that.
You are not doing this if your Squid receives CONNECT requests. If you
can get your apps to do the right thing, then Squid would be receiving
GET requests (and such) with https:// URLs instead of CONNECT requests.
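On the wire the difference looks roughly like this (hostname and path are
hypothetical):

```
# Tunnel (what the JVM's CONNECT-based proxying produces):
# Squid sees only the endpoint, never the URL inside the TLS stream.
CONNECT api.example.com:443 HTTP/1.1

# Direct proxy request (option 2): Squid sees the full URL and can cache.
GET https://api.example.com/v1/data HTTP/1.1
```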
Post by baretomas
Post by Amos Jeffries
3. the (relatively) complicated SSL-Bump way you found. The proxy is
fully at the mercy of the messages sent by apps and servers.
You are doing this right now. Some Java magic encrypts your app requests
and sends encrypted requests through Squid via CONNECT tunnels. You bump
those encrypted tunnels to get to the HTTP requests and cache responses.

Alex.
Post by baretomas
According to the java docs, the https_proxy (-Dhttps.proxyHost and
-Dhttps.proxyPort should redirect all ssl traffic to that destination.)