Discussion:
[squid-users] How to create a simple whitelist using regexes?
RB
2018-10-15 05:04:39 UTC
Hi everyone,

I'm trying to deny all urls except those matching whitelisted regular
expressions. My file "squid_sites.txt" contains only this regular
expression:

^https://wiki.squid-cache.org/SquidFaq/SquidAcl.*


My "squid.conf"


debug_options 28,7

###
### Global settings define
###

http_port 3128

###
### Authorization rules define
###

###
### Networks define
###

acl localnet src 10.5.0.0/1
acl localnet src 172.16.0.0/16
acl localnet src fc00::/7
acl localnet src fe80::/10

###
### Ports define
###

acl SSL_ports port 443 # https
acl SSL_ports port 22 # SSH
acl Safe_ports port 80 # http
acl Safe_ports port 443 # https
acl Safe_ports port 22 # SSH

acl purge method PURGE

acl CONNECT method CONNECT

acl bastion src 10.5.0.0/1
acl whitelist url_regex "/vagrant/squid_sites.txt"

###
### Rules define
###

http_access allow manager localhost
http_access deny manager
http_access deny !Safe_ports
http_access allow localhost
http_access allow purge localhost
http_access deny purge
http_access deny CONNECT !SSL_ports

http_access allow bastion whitelist
http_access deny bastion all

# http_access deny all

###
### Secondary global settings define
###


# icp_access allow localnet
# icp_access deny all
#
# htcp_access allow localnet
# htcp_access deny all

# Add any of your own refresh_pattern entries above these.
access_log /var/log/squid3/access.log squid
cache_log /var/log/squid3/cache.log squid
cache_store_log /var/log/squid3/store.log squid

refresh_pattern ^ftp: 1440 20% 10080
refresh_pattern ^gopher: 1440 0% 1440
refresh_pattern -i (/cgi-bin/|\?) 0 0% 0
refresh_pattern (Release|Package(.gz)*)$ 0 20% 2880

coredump_dir /var/spool/squid3
maximum_object_size 1024 MB
cache_mem 2048 MB


I tried enabling debugging and tailing /var/log/squid3/cache.log but my
curl statement keeps matching "all".

$ curl -sSL --proxy localhost:3128 -D - "https://wiki.squid-cache.org/SquidFaq/SquidAcl" -o /dev/null 2>&1 | grep Squid
X-Squid-Error: ERR_ACCESS_DENIED 0


Any ideas what I'm doing wrong?

Thank you.
Matus UHLAR - fantomas
2018-10-15 08:49:24 UTC
Post by RB
I'm trying to deny all urls except those matching whitelisted regular
expressions. My file "squid_sites.txt" contains only this regular
expression:
^https://wiki.squid-cache.org/SquidFaq/SquidAcl.*
are you aware that you can only see CONNECT in https requests, unless using
ssl_bump?
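
i.e. without bumping, all squid sees of your https request is roughly this
(an illustration, not taken from your logs):

CONNECT wiki.squid-cache.org:443 HTTP/1.1
Host: wiki.squid-cache.org:443

no scheme, no path, no query string.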
Post by RB
acl bastion src 10.5.0.0/1
acl whitelist url_regex "/vagrant/squid_sites.txt"
[...]
Post by RB
http_access allow manager localhost
http_access deny manager
http_access deny !Safe_ports
http_access allow localhost
http_access allow purge localhost
http_access deny purge
http_access deny CONNECT !SSL_ports
http_access allow bastion whitelist
http_access deny bastion all
I tried enabling debugging and tailing /var/log/squid3/cache.log but my
curl statement keeps matching "all".
of course it matches "all" - everything matches "all".

I rather wonder why it doesn't match "http_access allow localhost"
Post by RB
$ curl -sSL --proxy localhost:3128 -D - "https://wiki.squid-cache.org/SquidFaq/SquidAcl" -o /dev/null 2>&1 | grep Squid
X-Squid-Error: ERR_ACCESS_DENIED 0
Any ideas what I'm doing wrong?
have you reloaded squid config after changing it?
Did squid confirm it?
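
e.g. (binary name depends on your package, squid or squid3):

sudo squid3 -k parse && sudo squid3 -k reconfigure

parse errors are printed before the reload is attempted.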
--
Matus UHLAR - fantomas, ***@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
It's now safe to throw off your computer.
RB
2018-10-15 15:11:31 UTC
Hi Matus,

Thanks for responding so quickly. I uploaded my configurations here if that
is more helpful: https://bit.ly/2NF4zNb

The config that I previously shared is called squid_corp.conf. I also
noticed that if I don't use regular expressions and instead use domains, it
works correctly:

# acl whitelist url_regex "/vagrant/squid_sites.txt"
acl whitelist url_regex .squid-cache.org


Every time my squid.conf or my squid_sites.txt is modified, I restart the
squid service

sudo service squid3 restart


Then I use curl to test and now the url works.

$ curl -sSL --proxy localhost:3128 -D - https://wiki.squid-cache.org/SquidFaq/SquidAcl -o /dev/null 2>&1
HTTP/1.1 200 Connection established

HTTP/1.1 200 OK
Date: Mon, 15 Oct 2018 14:47:33 GMT
Server: Apache/2.4.7 (Ubuntu)
Vary: Cookie,User-Agent,Accept-Encoding
Content-Length: 101912
Cache-Control: max-age=3600
Expires: Mon, 15 Oct 2018 15:47:33 GMT
Content-Type: text/html; charset=utf-8


But this does not let me get more granular. I can allow all subdomains and
paths for the domain squid-cache.org, but I'm unable to allow only the
specific URLs when I put the regular expressions inline or in
squid_sites.txt.

# acl whitelist url_regex "/vagrant/squid_sites.txt"
acl whitelist url_regex ^https://wiki.squid-cache.org/SquidFaq/SquidAcl.*
acl whitelist url_regex .*squid-cache.org/SquidFaq/SquidAcl.*


If I put them inline as above, squid reports the following when I restart:

2018/10/15 14:54:48 kid1| strtokFile: .*squid-cache.org/SquidFaq/SquidAcl.* not found


If I put the expressions in the squid_sites.txt the above "not found"
message isn't shown and this is the debug output in
/var/log/squid3/cache.log (full output https://pastebin.com/NVwRxVmQ).

2018/10/15 15:05:45.083 kid1| Checklist.cc(275) matchNode: 0x7fb0068da2b8 matched=1 async=0 finished=0
2018/10/15 15:05:45.083 kid1| Acl.cc(336) matches: ACLList::matches: checking whitelist
2018/10/15 15:05:45.083 kid1| Acl.cc(319) checklistMatches: ACL::checklistMatches: checking 'whitelist'
2018/10/15 15:05:45.083 kid1| RegexData.cc(71) match: aclRegexData::match: checking 'wiki.squid-cache.org:443'
2018/10/15 15:05:45.084 kid1| RegexData.cc(82) match: aclRegexData::match: looking for '(^https://wiki.squid-cache.org/SquidFaq/SquidAcl.*)|(squid-cache.org/SquidFaq/SquidAcl.*)'
2018/10/15 15:05:45.084 kid1| Acl.cc(321) checklistMatches: ACL::ChecklistMatches: result for 'whitelist' is 0
2018/10/15 15:05:45.084 kid1| Acl.cc(349) matches: whitelist mismatched.
2018/10/15 15:05:45.084 kid1| Acl.cc(354) matches: whitelist result is false


So it's failing the regular expression check. But if I use grep to verify
the regex, it matches:

$ echo https://wiki.squid-cache.org/SquidFaq/SquidAcl | grep "^https://wiki.squid-cache.org/SquidFaq/SquidAcl.*"
https://wiki.squid-cache.org/SquidFaq/SquidAcl
Post by Matus UHLAR - fantomas
are you aware that you can only see CONNECT in https requests, unless using
ssl_bump?

Ah interesting. Are you saying that my https connections will always fail
unless I use ssl_bump to decrypt https to http connections? How would this
work correctly in production? Does squid proxy only block urls if it
detects http? How do you configure ssl_bump to work in this case? and is
that viable in production?
Post by Matus UHLAR - fantomas
of course it matches "all" - everything matches "all".
I rather wonder why it doesn't match "http_access allow localhost"
have you reloaded squid config after changing it?
Did squid confirm it?
Would you have an example of one entire config file that would work to
whitelist an http/https url using a regular expression?

Best,
RB
2018-10-15 15:56:50 UTC
I think I know what the issue is, which gives us a clue about what is
going on.

2018/10/15 15:05:45.083 kid1| RegexData.cc(71) match: aclRegexData::match: checking 'wiki.squid-cache.org:443'
2018/10/15 15:05:45.084 kid1| RegexData.cc(82) match: aclRegexData::match: looking for '(^https://wiki.squid-cache.org/SquidFaq/SquidAcl.*)|(squid-cache.org/SquidFaq/SquidAcl.*)'
2018/10/15 15:05:45.084 kid1| Acl.cc(321) checklistMatches: ACL::ChecklistMatches: result for 'whitelist' is 0

The above seems to be applying the regex to "wiki.squid-cache.org:443"
instead of to "https://wiki.squid-cache.org/SquidFaq/SquidAcl". I added the
regex ".*squid-cache.org.*" to my list of regular expressions and now I see
this.

2018/10/15 15:16:03.641 kid1| RegexData.cc(71) match: aclRegexData::match: checking 'wiki.squid-cache.org:443'
2018/10/15 15:16:03.641 kid1| RegexData.cc(82) match: aclRegexData::match: looking for '(^https?://[^/]+/wiki.squid-cache.org/SquidFaq/SquidAcl.*)|(squid-cache.org.*)'
2018/10/15 15:16:03.641 kid1| RegexData.cc(93) match: aclRegexData::match: match '(^https?://[^/]+/wiki.squid-cache.org/SquidFaq/SquidAcl.*)|(squid-cache.org.*)' found in 'wiki.squid-cache.org:443'
2018/10/15 15:16:03.641 kid1| Acl.cc(321) checklistMatches: ACL::ChecklistMatches: result for 'whitelist' is 1
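
Re-running my earlier grep check, but against the host:port string that
squid is actually checking, confirms the mismatch:

$ echo "wiki.squid-cache.org:443" | grep -c "^https://wiki.squid-cache.org/SquidFaq/SquidAcl.*"
0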


Any idea why url_regex wouldn't try to match the full url and instead only
matches on the subdomain, host domain, and port?

The Squid FAQ <https://wiki.squid-cache.org/SquidFaq/SquidAcl> says the
following:

*url_regex*: URL regular expression pattern matching
*urlpath_regex*: URL-path regular expression pattern matching, leaves out
the protocol and hostname


with this example given

acl special_url url_regex ^http://www.squid-cache.org/Doc/FAQ/$
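
If I read that correctly, the urlpath_regex equivalent of that example
(my guess, untested; the acl name is mine) would drop the scheme and host:

acl special_path urlpath_regex ^/Doc/FAQ/$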


This behavior seems to be the same on both 3.3.8 (the default on Ubuntu
14.04) and 3.5.12 (the default on Ubuntu 16.04).

Is there another configuration that forces url_regex to match the entire
URL, or should I use a different ACL type?
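
For example, would a dstdomain ACL, which matches the CONNECT host
directly, be the right approach here? A sketch (untested, names are mine):

# matches the request host (and any subdomain), no regex involved
acl whitelist_domains dstdomain .squid-cache.org
http_access allow bastion whitelist_domains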

Best,
RB
2018-10-15 16:48:53 UTC
Hi again...

After some more research, it looks like squid only has access to the url
domain when the request is HTTPS, and the only way to get the url path and
query string is to use ssl_bump to decrypt the HTTPS so squid can see them.

To use ssl_bump, I have to compile the code from source with --enable-ssl,
create a certificate, and add it to the chain of certs on every other VM
that proxies through squid. Then squid can decrypt the HTTPS URLs to see
paths and query args, and finally apply the regexes to those URLs so only
the explicitly whitelisted ones are allowed.

Is this correct?
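
Something like this, maybe? A rough sketch pieced together from the wiki,
assuming squid 3.5 built with --with-openssl --enable-ssl-crtd; the paths,
helper location and acl name are my guesses, untested:

# listen with SSL-Bump enabled, minting per-host certs signed by my CA
http_port 3128 ssl-bump cert=/etc/squid/bump-ca.pem \
    generate-host-certificates=on dynamic_cert_mem_cache_size=4MB
# helper that generates the fake server certificates
sslcrtd_program /usr/lib/squid/ssl_crtd -s /var/lib/squid/ssl_db -M 4MB
# peek at the TLS ClientHello first, then bump (decrypt) everything
acl step1 at_step SslBump1
ssl_bump peek step1
ssl_bump bump all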
Alex Rousskov
2018-10-15 17:08:05 UTC
Post by RB
After some more research, it looks like squid only has access to the url
domain when the request is HTTPS, and the only way to get the url path and
query string is to use ssl_bump to decrypt the HTTPS so squid can see them.
Replace "url domain" with "service name". In many cases, they are about
the same today, but there is a trend for SNI values to migrate from
identifying specific sites (e.g., foo.example.com) to identifying broad
services (e.g., everything.example.com), making SNIs increasingly imprecise.

Please note that you cannot bump sites that pin their certificates or
use other measures that prevent bumping. Long-term, most sites will
probably fall into that category by switching to TLS v1.3 and hiding
their true names behind essentially fake/generic SNIs.
Post by RB
To use ssl_bump, I have to compile the code from source with --enable-ssl,
create a certificate, and add it to the chain of certs on every other VM
that proxies through squid. Then squid can decrypt the HTTPS URLs to see
paths and query args, and finally apply the regexes to those URLs so only
the explicitly whitelisted ones are allowed.
Is this correct?
Replace "add it to the chain of certs" with "add it to the set of
trusted CA certificates". CA certificates are not chained... And, yes,
every client (every "vm" in your case?) that proxies through Squid would
have to trust your CA certificate.
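
On Debian/Ubuntu-style clients, for example, that usually means something
like (file names assumed; browsers may keep their own trust stores):

# install the bumping CA into the system trust store
sudo cp bump-ca.crt /usr/local/share/ca-certificates/
sudo update-ca-certificates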

The above sounds correct (and will be painful) if your clients cannot
send unencrypted requests for https:... URLs to Squid. On the other
hand, if your clients can send unencrypted requests for https:... URLs
to Squid, then no bumping is necessary at all. Please note that those
unencrypted requests may be inside an encrypted TLS connection -- they
are not necessarily insecure or unsafe. Unfortunately, popular browsers
do _not_ support sending unencrypted requests for https:... URLs to proxies.


HTH,

Alex.
Matus UHLAR - fantomas
2018-10-15 17:25:38 UTC
Post by RB
After some more research it looks like squid only has access to the url
domain if it's HTTPS and the only way to get the url path and query string
is to use ssl_bump to decrypt https so squid can see url path and query
arguments.
this is what I wrote before. Looking at it now, I should have explained it
in more depth...
Post by RB
Post by Matus UHLAR - fantomas
are you aware that you can only see CONNECT in https requests, unless
using ssl_bump?
To use ssl_bump, I have to compile the code from source with --enable-ssl,
create a certificate, and add it to the chain of certs on every other VM
that proxies through squid. Then squid can decrypt the HTTPS URLs to see
paths and query args, and finally apply the regexes to those URLs so only
the explicitly whitelisted ones are allowed.
Is this correct?
Alex has already explained this.

I would like to note that the whole purpose of SSL encryption in HTTPS is
to prevent anyone between the client and the server from seeing what the
client is accessing. That includes your proxy.

And we often see complaints about SSL bump not working because different
clients expect certificates signed by their own certificate authorities,
not by yours.
--
Matus UHLAR - fantomas, ***@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Windows 2000: 640 MB ought to be enough for anybody
Amos Jeffries
2018-10-17 02:46:04 UTC
In addition to what Matus and Alex have already said about your problem,
you do not appear to understand regex patterns properly.
Post by RB
Hi Matus,
Thanks for responding so quickly. I uploaded my configurations here if
that is more helpful: https://bit.ly/2NF4zNb
The config that I previously shared is called squid_corp.conf. I also
noticed that if I don't use regular expressions and instead use domains,
# acl whitelist url_regex "/vagrant/squid_sites.txt"
acl whitelist url_regex .squid-cache.org
This is still a regex. The ACL type is "url_regex" which makes the
string a regex - no matter what it looks like to your human eyes. To
Squid it is a regex.

It will match things like http://example.com/sZsquid-cachexorg just as
easily as any sub-domain of squid-cache.org, for example any traffic
injecting our squid-cache.org domain into its path or query-string.
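
You can demonstrate that with grep (a lowercase sample, since the matching
is case-sensitive):

$ echo "http://example.com/sZsquid-cachexorg" | grep -c ".squid-cache.org"
1
$ echo "http://example.com/sZsquid-cachexorg" | grep -c "\.squid-cache\.org"
0

Escaping the dots at least avoids that particular false match.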
Post by RB
Every time my squid.conf or my squid_sites.txt is modified, I restart
the squid service
sudo service squid3 restart
If Squid does not accept the config file it will not necessarily restart.

You should always run "squid -k parse" or "squid3 -k parse" to check the
config before attempting a restart.
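
For example, only restarting when the parse succeeds:

sudo squid3 -k parse && sudo service squid3 restart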


The old Debian sysV init scripts had some checks that would protect you
from problems like this. The newer systemd "service" tooling is not able
to do that in a nice way. The habit is a good one to get into anyway.
Post by RB
Then I use curl to test and now the url works.
$ curl -sSL --proxy localhost:3128 -D - https://wiki.squid-cache.org/SquidFaq/SquidAcl -o /dev/null 2>&1
HTTP/1.1 200 Connection established
HTTP/1.1 200 OK
Date: Mon, 15 Oct 2018 14:47:33 GMT
Server: Apache/2.4.7 (Ubuntu)
Vary: Cookie,User-Agent,Accept-Encoding
Content-Length: 101912
Cache-Control: max-age=3600
Expires: Mon, 15 Oct 2018 15:47:33 GMT
Content-Type: text/html; charset=utf-8
But this does not let me get more granular. I can allow all subdomains and
paths for the domain squid-cache.org, but I'm unable to allow only the
specific URLs when I put the regular expressions inline or in
squid_sites.txt.
# acl whitelist url_regex "/vagrant/squid_sites.txt"
acl whitelist url_regex ^https://wiki.squid-cache.org/SquidFaq/SquidAcl.*
acl whitelist url_regex .*squid-cache.org/SquidFaq/SquidAcl.*
Any regex pattern that lacks the beginning (^) and ending ($) anchor
symbols is always a match against *anywhere* in the input string.

So starting it with an optional prefix (.* or .?) or ending it with an
optional suffix (.* or .?) is pointless and confusing.
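
A quick grep illustration - the unanchored patterns behave identically,
and only the fully anchored one is restrictive:

$ echo "see http://example.com/foo" | grep -c "http://example.com/foo"
1
$ echo "see http://example.com/foo" | grep -c ".*http://example.com/foo.*"
1
$ echo "see http://example.com/foo" | grep -c "^http://example.com/foo$"
0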


Notice how the pattern Squid is actually using has already dropped that
pointless .* prefix:
Post by RB
aclRegexData::match: looking for '(^https://wiki.squid-cache.org/SquidFaq/SquidAcl.*)|(squid-cache.org/SquidFaq/SquidAcl.*)'
Post by Matus UHLAR - fantomas
are you aware that you can only see CONNECT in https requests, unless using
ssl_bump?
Ah interesting. Are you saying that my https connections will always fail
They will always fail to match your current regexes, because those regexes
contain characters which only ever exist in the path portion of URLs (note
the *L*), never in a CONNECT message URI (note the *I*), which never
contains any path portion.
Post by RB
unless I use ssl_bump to decrypt https to http connections? How
would this work correctly in production? Does squid proxy only block
urls if it detects http? How do you configure ssl_bump to work in this
case? and is that viable in production?
SSL-Bump takes the CONNECT tunnel data/payload portion and _attempts_ to
decrypt any TLS inside. *If* the tunnel contains HTTPS traffic (not
guaranteed), that is where the full https:// ... URLs are found.

Matus and Alex have already mentioned the issues with that, so I won't
cover them again.

Amos
