[squid-users] Centralized Squid - design and implementation

Discussion:

Kinkie

2014-11-16 16:22:43 UTC

Hello everyone,
first of all thanks to the community of squid for such a great job.

Hello Alberto,

[...]

1. I would like to move away from the solution we are using now (wpad
balancing). In a situation like the one I have described, with centralized
squid serving the spokes/branches, which is the best solution for
clustering/HA? If one of the centralized nodes were to "die", I would like
client machines not to be left "hanging" but to continue working on an
active node without disruption. Would a hierarchy of proxies be the solution?

If you want to maximize the efficiency of your balancing solution, you
probably want a slightly different approach: instead of using the
client-ip as hashing mechanism, you want to hash on the destination
host.
e.g. have a pac-file like (untested, and to be adjusted):

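function FindProxyForURL(url, host) {
  // Resolve the destination; dnsResolve() returns null on failure.
  var dest_ip = dnsResolve(host);
  // Hash on the parity of the last digit of the destination IP.
  var dest_hash = dest_ip ? parseInt(dest_ip.slice(-1), 10) % 2 : 0;
  if (dest_hash)
    return "PROXY local_proxy1:port; PROXY local_proxy2:port; DIRECT";
  return "PROXY local_proxy2:port; PROXY local_proxy1:port; DIRECT";
}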
This will balance on the parity of the final digit of the destination IP
of the service. The downside is that it requires DNS lookups by the
clients, and that if the primary local proxy fails, it takes a few seconds
(up to 30) for clients to give up and fail over to the secondary.
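
A variation on the PAC file above, as a minimal sketch only: it hashes the host name instead of the resolved IP, which avoids the client-side DNS lookup, and it rotates the failover list so that it works with any number of proxies. The proxy names and port 3128 are placeholders, and the string hash is an arbitrary choice for illustration, not something the PAC format or Squid prescribes.

```javascript
// Hypothetical proxy list; replace with your own names and port.
var PROXIES = ["local_proxy1:3128", "local_proxy2:3128"];

// Simple deterministic string hash (an arbitrary choice for this sketch).
function hashHost(host) {
  var h = 0;
  for (var i = 0; i < host.length; i++) {
    h = (h * 31 + host.charCodeAt(i)) % 65536;
  }
  return h;
}

// Build a failover chain that starts at the proxy selected by the hash,
// then lists the remaining proxies, then DIRECT as a last resort.
function buildChain(host, proxies) {
  var start = hashHost(host.toLowerCase()) % proxies.length;
  var parts = [];
  for (var i = 0; i < proxies.length; i++) {
    parts.push("PROXY " + proxies[(start + i) % proxies.length]);
  }
  parts.push("DIRECT");
  return parts.join("; ");
}

function FindProxyForURL(url, host) {
  return buildChain(host, PROXIES);
}
```

Because the hash depends only on the host name, every client sends a given site to the same first-choice proxy, which is what keeps the caches from duplicating content.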

local_proxies can then either go direct to the origin server (if
intranet) or use a balancing mechanism such as carp (see the
documentation for the cache_peer directive in squid) to maximize
efficiency, especially for Internet destinations.

The only single point of failure at the HTTP level in this design is
the PAC-file server; it will be up to you to make that reliable.

2. Bearing in mind that all users will be AD authenticated, which url
filtering/blacklist solution do you suggest?
In the past I have worked a lot with squidguard and dansguardian but now
they don't seem to be the state of the art anymore.
2a. To use the native acl squid with the squidblacklist.org lists
(http://www.squidblacklist.org/)
2b. To use urlfilterdb (http://www.urlfilterdb.com/products/overview.html)

I don't know, sorry.

3. Which GNU/Linux distro do you suggest? I was thinking about Debian
Jessie (just frozen) or CentOS 7.

http://wiki.squid-cache.org/BestOsForSquid

--
Francesco

brendan kearney

2014-11-16 21:51:01 UTC

HTTPS is no issue. The SSL session persists to the same proxy for the
duration of the session. I have no problems at all.

Ok, thank you very much. I think this is a good solution, maybe with an
active/passive HAProxy pair using keepalived.
Are you also able to serve HTTPS through HAProxy without any problem, or
only HTTP requests?
regards,
a.

I use Kerberos auth and do not have issues. You have to pay attention to
the details with Kerberos auth (DNS names and principals need to match, and
specific options must be set in the Squid configs), but it is working very
well for me.

Hi Brendan

I use HAProxy to load balance based on the least number of connections.

Do you use Kerberos/AD authentication?
Any issues with HAProxy in front of the Squid nodes?
Thx,
a.

alberto

2014-11-17 09:08:06 UTC

Let me start by saying that I am biased since I am the author of ufdbGuard.
If you have worked with squidGuard then you will find that ufdbGuard is an
excellent replacement, since ufdbGuard was forked from squidGuard in 2005
and has since gained many features.

Hi Marcus,
thank you for your reply.
I know you (i'm an old lurker of the squid list :-)) and the urlfilterdb
project.
I am very interested in the project and I will give it a chance without any
doubt, starting from the trial license.
FYI, there are about 1000 users in total.
Thank you to everyone, i'll come back soon!:-)

a.

Carlos Defoe

2014-11-17 11:39:15 UTC

Use a load balancer. HAproxy will do the trick, if you don't want to
spend some money on a professional load balancer like F5 big-ip.

Don't drop the use of wpad. You can send the balancer name (eg.
proxy.your.domain) as a default for every client, and send the names
of the proxy nodes as a failover.

_______________________________________________
squid-users mailing list
http://lists.squid-cache.org/listinfo/squid-users

Alexander Samad

2014-11-17 21:01:29 UTC

Why HAProxy instead of Pacemaker? I have 2 DMZ boxes that I set up in a
cluster, so I have 2 VIPs for the Squid proxies, and DNS set up to
round-robin across the VIPs.

I see a fairly even distribution, and I don't have a single point of
failure: if one node fails, its VIP moves over to the other node.

Antony Stone

2014-11-17 21:17:12 UTC

Post by Alexander Samad
Why HAProxy instead of Pacemaker? I have 2 DMZ boxes that I set up in a
cluster, so I have 2 VIPs for the Squid proxies, and DNS set up to
round-robin across the VIPs.
I see a fairly even distribution, and I don't have a single point of
failure: if one node fails, its VIP moves over to the other node.

Pacemaker is a fairly "dumb" (no offence meant, see below) network-level
failover system, and if you do master-master failover, it can end up doing
load balancing for you.

However, it only knows about node availability, whereas HAProxy can monitor
many more things about your nodes, and can also very easily expand to more
than two nodes, doing true load balancing based on node availability, node
load, node response times, number of connections to each node... it's a lot
more "intelligent" (maybe "aware" is a better term) than Pacemaker.

The downside of HAproxy is that you need an HAproxy machine in addition to the
(Squid, in this case) nodes, and for real High Availability you should have
two HAproxy nodes running Pacemaker between them, to avoid the HAproxy itself
being a Single Point of Failure. It doesn't need to be a big machine, though.

However the benefits of being able to send new connections to the machine with
the lowest load, the fastest response, the fewest current connections, or
several other things, means it's a lot more flexible, not to mention expandable
if you decide to grow your Squid farm to 3, 4 or however many more servers.


Regards,

Antony.

--
You can tell that the day just isn't going right when you find yourself using
the telephone before the toilet.

Please reply to the list;
please *don't* CC me.

Amos Jeffries

2014-11-17 21:36:32 UTC

Post by Carlos Defoe
Use a load balancer. HAproxy will do the trick, if you don't want
to spend some money on a professional load balancer like F5
big-ip.

Or even, taddah ... Squid!

see cache_peer for the many load balancing algorithms available.

Post by Carlos Defoe
Don't drop the use of wpad. You can send the balancer name (eg.
proxy.your.domain) as a default for every client, and send the
names of the proxy nodes as a failover.

Carlos Defoe

2014-11-18 03:07:01 UTC

I don't mean to use wpad as a load balancer. I would not do it; wpad
and pac are not designed for that, although it is (roughly)
possible to do it.

The load balancer device, if there is one, has one and only one name, e.g.
"proxy.your.domain". All the clients must point to that very same
name and let the balancer do its job, that is, balancing between
nodes with a chosen algorithm, acting as a proxy itself.

With such a device in the scenario, wpad is only useful to send the
node names as a failover, for the (very very rare) case in which the
balancer is offline.
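
A minimal PAC sketch of that arrangement, with the balancer name first and the individual nodes listed only as failover. Only "proxy.your.domain" comes from this thread; the node names and port 3128 are made up for illustration.

```javascript
// Hypothetical names and port; only "proxy.your.domain" is from the thread.
var BALANCER = "proxy.your.domain:3128";
var NODES = ["node1.your.domain:3128", "node2.your.domain:3128"];

function FindProxyForURL(url, host) {
  // Balancer first; individual nodes are only tried if it does not answer.
  var chain = ["PROXY " + BALANCER];
  for (var i = 0; i < NODES.length; i++) {
    chain.push("PROXY " + NODES[i]);
  }
  return chain.join("; ");
}
```

Clients walk the list from left to right, so the nodes are contacted directly only while the balancer is unreachable.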

As for my scenario, I also use wpad to configure some exceptions, some
clients that will use a completely different proxy, etc...

But, in fact, when using wpad, you have a new point of failure, that
is, the webserver that is serving the pac (wpad.dat) file. As a fair
solution, every proxy node can have a light webserver serve the pac
file and let the DNS balance the requests for the name
"http://wpad.your.domain/wpad.dat".

Post by Carlos Defoe
Use a load balancer. HAproxy will do the trick, if you don't want
to spend some money on a professional load balancer like F5
big-ip.

Or even, taddah ... Squid!
see cache_peer for the many load balancing algorithms available.

Post by Carlos Defoe
Don't drop the use of wpad. You can send the balancer name (eg.
proxy.your.domain) as a default for every client, and send the
names of the proxy nodes as a failover.

But, but, WPAD *is* the first layer of load balancing.
Kinkie already posted the way to do clean load balancing with failover
between proxies, without any need for a bottleneck at a balancer server
or IP.
Amos

Jason Haar

2014-11-18 06:31:26 UTC

Post by Carlos Defoe
As for my scenario, I also use wpad to configure some exceptions, some
clients that will use a completely different proxy, etc...

--
Cheers

Jason Haar
Corporate Information Security Manager, Trimble Navigation Ltd.
Phone: +1 408 481 8171
PGP Fingerprint: 7A2E 0407 C9A6 CAF6 2B9F 8422 C063 5EBB FE1D 66D1

Carlos Defoe

2014-11-18 11:35:39 UTC

Well, you just wrote a load balancer in PHP, with a load balancing
algorithm in it. It serves the same purpose as HAProxy (I don't really
use HAProxy, so I don't know, but I use the F5 BIG-IP, which is
perfectly capable of testing Internet links behind squid). In your
scheme, WPAD is being used to tell the clients where the load balancer
(a webserver with a PHP script) is, and PAC is probably the answer
format, which returns a currently valid proxy node address directly to
the client. But as far as I know, once the client gets the PAC answer,
it will not refresh it until the browser is restarted, so there might be a
small problem there.

But it is a good solution, as proved by your decade of using it, and
much cheaper than a F5. As for the DNS trick, it is intended to
increase high availability of the web servers that are serving
wpad.dat (or your php script), because if it runs on only one
webserver, at some point no clients will find anything at all.

Well, there's a lot of ways of doing the same thing, including ucarp,
squid cache_peer as Amos said... It's just a matter of picking the one
that fits.

Post by Jason Haar

Post by Carlos Defoe
As for my scenario, I also use wpad to configure some exceptions, some
clients that will use a completely different proxy, etc...

Our "wpad.dat" is actually a PHP script which tests that the "official"
proxy (per client subnet) is actually working (with caching of the
results for performance reasons, of course); if not, it flicks them off to
another site's proxy server. Much better than trying to do dynamic DNS
tricks with a local HAProxy, i.e. if you have actually lost local Internet
access due to an ISP outage, HAProxy isn't going to help. But if WPAD
knows that a WAN-connected proxy is still working, why not point your
users at that instead?
We've been doing this for 10+ years; 99% of the time it's never needed,
but when it's needed, it works :-)
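
The thread does not show the actual PHP script, so the following is only a rough sketch of the idea in JavaScript: pick the client's "official" proxy by subnet, consult a (cached) health table, and render the PAC text to return. The subnet prefixes, proxy names, port, and the DIRECT fallback when nothing is healthy are all assumptions made for the example.

```javascript
// All names below are hypothetical; the real script is not shown in the thread.
var SITE_PROXIES = {
  "10.1.": "proxy-site1.example.com:3128",
  "10.2.": "proxy-site2.example.com:3128"
};

// Pick the "official" proxy for a client by subnet prefix, or null.
function officialProxy(clientIp) {
  for (var prefix in SITE_PROXIES) {
    if (clientIp.indexOf(prefix) === 0) return SITE_PROXIES[prefix];
  }
  return null;
}

// health maps proxy address -> true/false, e.g. filled in by a cached probe.
// Returns the healthy proxies, with the client's own site first.
function chooseProxies(clientIp, health) {
  var preferred = officialProxy(clientIp);
  var chain = [];
  if (preferred && health[preferred]) chain.push(preferred);
  for (var prefix in SITE_PROXIES) {
    var p = SITE_PROXIES[prefix];
    if (p !== preferred && health[p]) chain.push(p);
  }
  return chain;
}

// Render the PAC file text that would be sent back to this client.
function renderPac(clientIp, health) {
  var chain = chooseProxies(clientIp, health);
  var answer = chain.length
    ? chain.map(function (p) { return "PROXY " + p; }).join("; ")
    : "DIRECT"; // assumption: fall back to DIRECT if no proxy is healthy
  return 'function FindProxyForURL(url, host) { return "' + answer + '"; }';
}
```

How the health table gets filled (probe interval, what counts as "working") is the part that matters most in practice and is deliberately left out here.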

Brendan Kearney

2014-11-18 12:39:37 UTC

web servers providing pac/wpad dont need to be a single point of
failure, given that multiple instances of web servers can be behind a
load balancer, just like squid. i have this arrangement, and get plenty
of reliability out of it. it scales well too.

i have setup my VIP for the proxies in such a way that if you hit port
8080 you get load balanced to the pool with all members in it. if you
hit the VIP on port 8081, you get load balanced to a pool with only the
first proxy in it, 8082 goes to the second proxy, etc. this allows me
to test each proxy individually, and because the VIP name is the same,
the same kerberos ticket satisfies the auth requests. at work, we have
F5s as well, and as a service check we attempt to GET some content we
host, and attempt to GET google or cnn. the check requires that at
least one of the GETs succeed, in order to mark the device up. i dont
have the external check in my HAProxy configs, but might have to look
into it.

as for my pac/wpad script, i have logic in it to send requests proxied
or unproxied, based on my design or security decisions. i have logic
for direct access domains, direct access hosts, direct access networks,
proxied domains (forces the use of the proxy, overriding any other
logic), proxied hosts (again, override logic), and hosts that are forced
via a specific proxy by sending the request to a specific port on the
VIP.
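
As a rough illustration of that kind of rule ordering (not the actual script, which is not shown in the thread), here is a PAC sketch with hypothetical rule tables. Ports 8080, 8081 and 8082 follow the description above; the host names, the order in which the rule classes are checked, and the omission of the direct-access network rules (which would need the PAC built-ins dnsResolve() and isInNet()) are assumptions.

```javascript
// Hypothetical rule tables; the thread describes the categories, not the data.
var VIP = "proxy.example.com";
var DIRECT_DOMAINS = [".example.com"];
var PROXIED_HOSTS = ["extranet.example.com"];    // overrides the direct rules
var FORCED_PORT = { "test1.example.net": 8081 }; // pin a host to one proxy

// True if host equals the domain (without the leading dot) or ends with it.
function inDomain(host, domain) {
  return host === domain.slice(1) ||
         host.slice(-domain.length) === domain;
}

function FindProxyForURL(url, host) {
  host = host.toLowerCase();
  var all = "PROXY " + VIP + ":8080; PROXY " + VIP + ":8081; PROXY " + VIP + ":8082";

  // 1. Hosts pinned to a specific proxy via its dedicated VIP port.
  if (FORCED_PORT.hasOwnProperty(host)) {
    return "PROXY " + VIP + ":" + FORCED_PORT[host];
  }
  // 2. Override rules: always proxied, whatever the direct rules say.
  if (PROXIED_HOSTS.indexOf(host) !== -1) return all;
  // 3. Direct-access domains.
  for (var i = 0; i < DIRECT_DOMAINS.length; i++) {
    if (inDomain(host, DIRECT_DOMAINS[i])) return "DIRECT";
  }
  // 4. Everything else: pool port first, then each single-proxy port.
  return all;
}
```

Checking the override tables before the direct rules is what makes "proxied hosts" win over a matching direct-access domain.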

the bulk of my access will be proxied, and i return the VIP on port 8080
as the primary proxy, and then ports 8081, 8082, etc as secondary,
tertiary, and so on. that way the browser will always get all possible
avenues for access, should something be wrong with one or more of the
VIPs. what i am not sure of is if HAProxy will reply with a RST when no
pool member(s) is/are available for a given VIP/pool. we have this
setup at work on the F5s, and i'm not sure if i have it in HAProxy (or
if i can do it at all).

i would suggest that if you use a pac/wpad solution, you look into
pactester, which is a google summer of code project that executes pac
files and provides output indicating what actions would be returned to
the browser, given a URL. so, with my setup if i call pactester and
give it http://www.google.com, it returns to me:

PROXY proxy.bpk2.com:8080; PROXY proxy.bpk2.com:8081; PROXY
proxy.bpk2.com:8082

if i call pactester with http://www.bpk2.com, it returns to me:

DIRECT

with a bit of scripting and a couple of files with URLs in them, i can
quickly evaluate my proxy script, validate the logic and perform a
rudimentary syntax and punctuation check on any changes i make to the
script.
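
The "couple of files with URLs" idea can be sketched without pactester itself: a table of URLs with expected answers, run against the PAC function. This is only an illustration in plain JavaScript; the stand-in FindProxyForURL below is hypothetical and just mimics the two outputs shown above, and hostOf() and checkPac() are helper names invented for the example.

```javascript
// Stand-in PAC function for the demo; substitute the real one under test.
function FindProxyForURL(url, host) {
  if (host === "www.bpk2.com") return "DIRECT";
  return "PROXY proxy.bpk2.com:8080; PROXY proxy.bpk2.com:8081; PROXY proxy.bpk2.com:8082";
}

// Extract the host part of a URL without relying on PAC built-ins.
function hostOf(url) {
  var m = /^[a-z][a-z0-9+.-]*:\/\/([^\/:?#]+)/i.exec(url);
  return m ? m[1] : "";
}

// Run every case and return the list of mismatches (empty means all passed).
function checkPac(pacFn, cases) {
  var failures = [];
  for (var i = 0; i < cases.length; i++) {
    var c = cases[i];
    var got = pacFn(c.url, hostOf(c.url));
    if (got !== c.expect) {
      failures.push({ url: c.url, expect: c.expect, got: got });
    }
  }
  return failures;
}

var CASES = [
  { url: "http://www.bpk2.com", expect: "DIRECT" },
  { url: "http://www.google.com",
    expect: "PROXY proxy.bpk2.com:8080; PROXY proxy.bpk2.com:8081; PROXY proxy.bpk2.com:8082" }
];
```

Calling checkPac(FindProxyForURL, CASES) before each rollout gives a quick regression check on the routing logic, though it does not replace running the file through a real PAC engine.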

Jason Haar

2014-11-18 20:44:31 UTC

Post by Brendan Kearney
i would suggest that if you use a pac/wpad solution, you look into
pactester, which is a google summer of code project that executes pac
files and provides output indicating what actions would be returned to
the browser, given a URL.

couldn't agree more. We have it built into our QA to run before we ever
roll out any change to our WPAD php script (a bug in there means
everyone loses Internet access - so we have to be careful).

Auto-generating a PAC script per client allows us to change behaviour
based on User-Agent, client IP, proxy and destination - and allows us to
control what web services should be DIRECT and what should be proxied.
There is no other way of achieving those outcomes.

Oh yes, and now that both Chrome and Firefox support proxies over HTTPS,
I'm starting to ponder putting up some form of proxy on the Internet for
our staff to use (authenticated of course!) - WPAD makes that something
we could implement with no client changes - pretty cool :-)

--
Cheers

Jason Haar
Corporate Information Security Manager, Trimble Navigation Ltd.
Phone: +1 408 481 8171
PGP Fingerprint: 7A2E 0407 C9A6 CAF6 2B9F 8422 C063 5EBB FE1D 66D1

Kinkie

2014-11-19 11:17:51 UTC

One word of caution: pactester uses the Firefox JavaScript engine, which is
more forgiving than MSIE's. So while it is a very useful tool, it may let
some errors slip through.

brendan kearney

2014-11-19 13:11:44 UTC

Yes, and it seems Java is even more sensitive. I had an array member
defined on a line that was not terminated with a semicolon; browsers did
not throw errors, but Java did. Pactester did not catch this. Missing
curly braces and, I think, quotes are caught.

Also of note, you have to set the content type header for a pac file or
else you run into weird issues. I found that browsers are forgiving and
will execute the script and take its output if the header is not set.
Flash does not do this. It might call for the script but does not use it
if the Content-Type header is not set to
"application/x-ns-proxy-autoconfig".
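
For illustration, a minimal Node.js request handler that serves a PAC file with that Content-Type. This is a sketch, not a recommendation of a particular web server; the PAC body is a placeholder, and the "/proxy.pac" path is an assumption added alongside "/wpad.dat".

```javascript
// Hypothetical PAC body; serve your real wpad.dat contents instead.
var PAC_BODY = 'function FindProxyForURL(url, host) { return "DIRECT"; }\n';

// Request handler compatible with Node's http.createServer().
function servePac(req, res) {
  if (req.url === "/wpad.dat" || req.url === "/proxy.pac") {
    res.writeHead(200, {
      "Content-Type": "application/x-ns-proxy-autoconfig",
      "Content-Length": Buffer.byteLength(PAC_BODY)
    });
    res.end(PAC_BODY);
  } else {
    res.writeHead(404, { "Content-Type": "text/plain" });
    res.end("not found\n");
  }
}

// To run it: require("http").createServer(servePac).listen(8080);
```

With an existing web server the same effect is usually a one-line MIME type mapping for the .dat and .pac extensions.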

GoToMeeting has also pissed me off. The client parses the script and takes
any value found in it, before executing the script and taking the output of
the execution. This has the result of finding inappropriate proxies to use,
when you are in a corporate environment and have proxies dedicated to
client access or other functions that should not be leveraged in all
cases. I got their technical team on a call because we have a large citrix
install base (both products have the same parent company) and complained to
no avail. I had to write a doc on how to correct the client config for
anyone needing to use GoTo... products.

Nishant Sharma

2014-11-19 13:32:41 UTC

Post by brendan kearney
Yes and it seems java is even more sensitive. I had an array member
defined on a line that was not terminated with a semicolon and browsers did
not throw errors, but java did. Pactester did not catch this. Missing
curly braces and I think quotes are caught.
Also of note, you have to set the content type header for a pac file or
else you run into weird issues. I found that browsers are forgiving and
will execute the script and take its output if the header is not set.
Flash does not do this. It might call for the script but does not use it
if the Content-Type header is not set to
"application/x-ns-proxy-autoconfig".
GoToMeeting has also pissed me off. The client parses the script and takes
any value found in it, before executing the script and taking the output of
the execution. This has the result of finding inappropriate proxies to use,
when you are in a corporate environment and have proxies dedicated to
client access or other functions that should not be leveraged in all
cases. I got their technical team on a call because we have a large citrix
install base (both products have the same parent company) and complained to
no avail. I had to write a doc on how to correct the client config for
anyone needing to use GoTo... products.

Post by Kinkie
One word of caution: pactester uses the Firefox JavaScript engine, which
is more forgiving than MSIE's. So while it is a very useful tool, it may
let some errors slip through.

Post by Jason Haar

Post by Brendan Kearney
i would suggest that if you use a pac/wpad solution, you look into
pactester, which is a google summer of code project that executes

pac