Discussion:
[squid-users] Large text ACL lists
Darren
2016-09-29 09:44:28 UTC
Permalink
Hi All

I have been tinkering with Squidguard for a while, using it to manage ACL lists and time limits etc.

While it works OK, it's not in active development and has its issues.

What are the limitations with just pumping ACL lists directly into Squid and letting it do all the work internally without running a team of squidguards?

How efficient is Squid now at parsing the text files directly? Will I need more RAM as the list grows? Is it slower, or are there optimizations that I can do?

thanks all

Darren Breeze





Sent from Mailbird [http://www.getmailbird.com/?utm_source=Mailbird&utm_medium=email&utm_campaign=sent-from-mailbird]
Antony Stone
2016-09-29 10:08:09 UTC
Permalink
Post by Darren
Hi All
I have been tinkering with Squidguard for a while, using it to manage ACL
lists and time limits etc.
While it works OK, it's not in active development and has its issues.
Have you considered https://www.urlfilterdb.com/products/ufdbguard.html ?

Their database is not free, but the filtering plugin is.
Post by Darren
What are the limitations with just pumping ACL lists directly into Squid
and letting it do all the work internally without running a team of
squidguards?
how efficient is squid now at parsing the text files directly, will i Need
more ram as the list grows? Is it slower or are their optimizations that I
can do?
Maybe someone else can answer this part of your question; I have no data on
this.


Antony.
--
This email was created using 100% recycled electrons.

Please reply to the list;
please *don't* CC me.
Benjamin E. Nichols
2016-09-29 17:42:49 UTC
Permalink
The other issue is that shalla and urlblacklist produce garbage
blacklists, and neither of them is actively developing or improving the
backend technology required to produce high-quality blacklists.

We are the leading publisher of blacklists tailored for web filtering
purposes.

We are also the only commercial source for Squid Native ACL. Yes, we
have it.
Post by Darren
Hi All
I have been tinkering with Squidguard for a while, using it to manage
ACL lists and time limits etc.
While it works OK, it's not in active development and has it's issues.
What are the limitations with just pumping ACL lists directly into
Squid and letting it do all the work internally without running a team
of squidguards?
how efficient is squid now at parsing the text files directly, will i
Need more ram as the list grows? Is it slower or are their
optimizations that I can do?
thanks all
Darren Breeze
_______________________________________________
squid-users mailing list
http://lists.squid-cache.org/listinfo/squid-users
--

Signed,

Benjamin E. Nichols
http://www.squidblacklist.org

1-405-397-1360 - Call Anytime.
Darren
2016-09-29 21:29:12 UTC
Permalink
Hi

What I am trying to do is to simplify everything and remove the external re-writers from the workflow, because they are either old, with sporadic development, or wrap their own lists into the solution.

I am also producing my own ACL lists for this project, so third-party blacklists will not work for me.

Squid has a lot more smarts and is very active in development, so I think it would be a more complete, robust solution if I can get a handle on how it behaves when parsing large ACL files.

My ACLs will be stored on a RAM-based drive, so speed there should not be an issue.

Looking at the config samples at squidblacklist.org, you seem to pump massive ACL lists through the dstdomain ACL, so maybe that is anecdotal evidence that this will work OK.
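For reference, loading a large list natively looks something like this in squid.conf (the file path and ACL name here are my own invention, just for illustration):

```
# Hypothetical path and ACL name; the file holds one domain per line,
# with a leading dot (".example.com") to match all subdomains.
acl blacklisted_domains dstdomain "/etc/squid/acl/blacklist.txt"
http_access deny blacklisted_domains
```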

Darren B.


Benjamin E. Nichols
2016-09-29 21:42:04 UTC
Permalink
Well, forgive me for bad-mouthing the developers here, but I think this
is a good reason.

You see, you are going to have to eliminate all the redundant subdomains
in your blacklists, because they are going to crash modern versions of
Squid. To do this I would recommend using an older version of Squid for
your blacklist validation, because a few years ago the developers
decided it was a good idea to stop throwing errors in the logs when
there is a duplicate entry in the blacklists, the way Squid used to. I
have no idea who is making these idiotic decisions over there, because
clearly it would be better to actually have something in your error log
indicating where the problem is, rather than just having squid shit on
itself and have zero indication of how or why it happened. But the
latest versions of Squid will do just that: shit on themselves and give
you zero indication of why, or of where the problems in your ACL lists
are.

And yes, Squid Native ACL blacklisting does work. Unlike the
competitors, we are also actually removing dead domains daily to
minimize wasteful bulk; in other words, we are actually doing our job
rather than just boasting about line counts on lists where 50% of the
domains are dead and really should be removed to keep the list size
efficient.

We also offer the lists in various other formats to ensure maximum
compatibility.

I would love to share thoughts with you regarding the matter.
Post by Darren
Hi
What I am trying to do is to simplify everything and remove the
external re-writers from the workflow due to the fact that they are
either old with sporadic development or wrap their own lists into the
solution.
I am also producing my own ACL lists for this project so third party
blacklists will not work for me.
Squid has a lot more smarts and is very active in development so I
think it would be a more complete robust solution if I can get a
handle on how it behaves when parsing large ACL files.
My ACL's will be stored on a Ram based drive so speed there should not be an issue.
Looking at the config samples at squidblackist.org, you seem to pump
massive ACL lists through the dstdomain acl so maybe that is anecdotal
evidence that this will work OK.
Darren B.
--

Signed,

Benjamin E. Nichols
http://www.squidblacklist.org

1-405-397-1360 - Call Anytime.
Alex Rousskov
2016-09-29 22:48:37 UTC
Permalink
Post by Benjamin E. Nichols
Well, forgive me for bad mouthing the developers here, but I think this
is a good reason.
It is not. Badmouthing, for any reason, has no positive side effects and
may have many negative ones.
Post by Benjamin E. Nichols
it would be better
to actually have something in your error log indicating where the
problem is, rather than just having squid shit on itself and have zero
indication of how or why it happened
Please file a bug report with specifics if you have not already.
Post by Benjamin E. Nichols
but we are also, unlike the competitors, ...
we are actually doing our job rather than ...
Please do your best to avoid disparaging competitors on this mailing
list, regardless of whether you think your comments are 100% accurate.
Many of the "blacklisting" posts (not just yours) are already in the
"advertisement" red zone, and with the attacks added, the Project would
have to police the mailing list. That would not be a good use of
volunteers' time!


Thank you,

Alex.
Benjamin E. Nichols
2016-09-29 23:04:22 UTC
Permalink
Dear Mr Alex Rousskov.

Please kindly take your opinions, take them and shove them directly
up your bloated arrogant ass. I have little need to cater to you, or to
dignify your mindless criticism of my opinions, which only serve to
demonstrate that your ego is larger than you are sir.


Signed,

Benjamin E. Nichols

http://www.squidblacklist.org
Amos Jeffries
2016-09-30 03:04:36 UTC
Permalink
<snip>

That is more than enough, please.

Some people on this list are competitors. There will necessarily be
private issues between people and/or organisations.

And that is exactly where those issues should stay. Private. It benefits
us all to interact politely on the list(s) no matter what is going on in
the background.

What gets written here is on permanent public record and might come back
to bite later in life.


If you have problems with how Squid code decisions are going, please
join in over at squid-dev; likewise if you simply want to get involved.
We could do with more interested people.


Amos Jeffries
The Squid Software Foundation
Amos Jeffries
2016-09-30 02:41:57 UTC
Permalink
Post by Darren
Hi All
I have been tinkering with Squidguard for a while, using it to manage
ACL lists and time limits etc.
While it works OK, it's not in active development and has it's
issues.
What are the limitations with just pumping ACL lists directly into
Squid and letting it do all the work internally without running a
team of squidguards?
CPU, mostly. The helpers will use N times the RAM for N helpers, so
Squid technically uses less memory that way. But since Squid workers are
internally single-threaded, the CPU time spent checking requests against
the lists does slow down the workers' handling of other transactions.
There is also the time spent on startup loading the data into memory.
With big data lists, both of those differences can be noticeable.

There are some RAM differences purely due to the storage formats. We
have not particularly optimized Squid ACLs recently for large data sets.
Post by Darren
how efficient is squid now at parsing the text files directly, will i
Need more ram as the list grows? Is it slower or are their
optimizations that I can do?
You will need more RAM as the list grows, regardless of whether you use a helper or Squid.

Optimizations center on reducing the list sizes: removing duplication,
overlaps, and dead entries.

For regex ACLs, compacting the patterns down helps a lot. Squid will do
that itself now, but it is not very smart about it, so manual
optimization can still have a big impact.
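As a concrete illustration of the de-duplication point (the tooling below is my own sketch, not anything Squid ships): a dstdomain-style file can be shrunk by dropping exact duplicates and any entry already covered by a broader ".parent" entry:

```shell
# Sample dstdomain-style list (a leading dot matches all subdomains).
cat > /tmp/domains.txt <<'EOF'
.example.com
www.example.com
mail.example.com
.example.org
.example.org
EOF

# sort -u drops exact duplicates; the awk pass then drops any entry
# whose parent domain is already listed in ".parent" form.
deduped="$(sort -u /tmp/domains.txt | awk '
  { lines[NR] = $0; if ($0 ~ /^\./) parents[$0] = 1 }
  END {
    for (i = 1; i <= NR; i++) {
      d = lines[i]; s = d; covered = 0
      while (!covered && index(s, ".") > 0) {
        s = substr(s, index(s, ".") + 1)      # strip one leading label
        if (("." s) in parents && ("." s) != d) covered = 1
      }
      if (!covered) print d
    }
  }')"
echo "$deduped"
```

Here www.example.com and mail.example.com disappear because .example.com already covers them, and the duplicate .example.org collapses to one line.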


Amos
Darren
2016-09-30 05:58:32 UTC
Permalink
Thank you Amos

The resources I save by not running multiple Squidguards will make more RAM available, as you say, and having a simpler setup is never a bad thing either.

Just to clarify: when Squid fires up, does it cache the ACL file into RAM in its entirety and then do some optimizations? If that is the case, I would need to budget the RAM to allow for this.

This sounds great, and I get the bonus reverse DNS on dstdomain ACLs too, something Squidguard didn't do.

happy days

thanks

Darren B.





Amos Jeffries
2016-09-30 06:59:07 UTC
Permalink
Post by Darren
Thank you Amos
The resources I save not running multiple Squidguards will make more
ram available as you say and having a simpler setup is never a bad
thing either.
Just to clarify, so when squid fires up, it caches the ACL file into
ram in it's entirety and then does some optimizations? If that is
the case I would need to budget the ram to allow for this.
Not quite. Squid still reads the files line by line into a memory
structure for whatever type of ACL is being loaded. That is part of why
it is so much slower to load than the helpers (which generally do as you
describe).

The optimizations are type-dependent and fairly simplistic: ignoring
duplicate entries, concatenating regexes into bigger "A|B" patterns
(faster to check against), and so on.
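To illustrate the "A|B" point with a toy example (patterns invented for illustration): three url_regex-style entries checked one at a time, versus one combined alternation that needs only a single regex pass per URL:

```shell
url="http://ads.example.net/img.gif"

# Three patterns as they might sit one-per-line in a url_regex file:
p1='ads\.'; p2='banner\.'; p3='tracker\.'

# Checked individually: up to three regex passes over the URL.
verdict_separate=allowed
for p in "$p1" "$p2" "$p3"; do
  echo "$url" | grep -Eq "$p" && verdict_separate=blocked
done

# Concatenated into one alternation: one pass, same answer.
combined="$p1|$p2|$p3"
echo "$url" | grep -Eq "$combined" && verdict_combined=blocked || verdict_combined=allowed

echo "$verdict_separate $verdict_combined"
```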

Amos
Yuri Voinov
2016-09-30 11:21:32 UTC
Permalink
Amos, I'm afraid that this is not a solution. Block lists have become so
huge that only compressing them and/or placing them in an external
database (as Marcus does) can save the situation.
Darren
2016-09-30 22:05:04 UTC
Permalink
Hi

My main issue with Squidguard is that when I try to block, say, www.facebook.com and the user goes to https://www.facebook.com, Squidguard only sees the target IP in the initial CONNECT, so it doesn't match against the domain entry.

If Squidguard did a reverse DNS lookup, I could keep using that more complex filtering solution. That is where the dstdomain ACL is a better option, though it has the RAM overhead.
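For what it's worth, a dstdomain ACL matches the hostname carried in the CONNECT request itself when the browser sends one, so a sketch like the following (file path invented for illustration) should catch https://www.facebook.com without any reverse DNS:

```
# Hypothetical path; the file would contain e.g. ".facebook.com",
# which matches www.facebook.com in both plain GET and CONNECT
# requests whenever the client sends a hostname rather than a bare IP.
acl social_block dstdomain "/etc/squid/acl/social.txt"
http_access deny social_block
```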

Time for some experimentation

thanks again for the feedback




Darren
2016-10-01 01:02:15 UTC
Permalink
One further question

If I have to reload the ACL lists, do I restart Squid, or is there a way to update without impacting the users too much?

In some of the scenarios, some ACL lists may change frequently.

thanks again.



Benjamin E. Nichols
2016-10-01 01:11:55 UTC
Permalink
I would recommend you stop Squid and start it again; simply doing a -k
reconfigure is a bad idea, because sometimes Squid will not reload the
new blacklists. I have no idea why it is unpredictable in this manner,
or whether they have since fixed the problem; I didn't write the
software. But what I do know, from experience, is that the most reliable
way to ensure the lists actually get reloaded when using large ACL
domain lists in the manner you are is to stop squid3 and start it again.
That is also somewhat slow, but it is sure to work.

Anyway, that's my two cents.
Post by Darren
One further question
If I have to reload the ACL lists do I restart squid or is there a way
to update without impacting the users to much?
In some of the scenarios, some acl lists may change frequently
thanks again.
--

Signed,

Benjamin E. Nichols
http://www.squidblacklist.org

1-405-397-1360 - Call Anytime.
Benjamin E. Nichols
2016-10-01 01:16:30 UTC
Permalink
Also, if you are going to use Squid Native ACL blacklists and reload
while you are updating, it's a good idea to have a parent proxy
configured so that your traffic/users won't be interrupted. Squid will
default to the next available proxy while it is unavailable/reloading
the blacklists and forward traffic to it; otherwise your proxy will be
down during the reload process and your users will be unable to surf.
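If anyone wants to try the parent-proxy setup being described, a minimal squid.conf sketch might look like this (hostname and port are placeholders, and whether traffic actually fails over during a reload is as claimed above, not something I have verified):

```
# Hypothetical upstream proxy used as a parent; 'default' marks it
# as the last-resort peer when other routes are unavailable.
cache_peer upstream.example.net parent 3128 0 no-query default
```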
Post by Darren
One further question
If I have to reload the ACL lists do I restart squid or is there a way
to update without impacting the users to much?
In some of the scenarios, some acl lists may change frequently
thanks again.
--

Signed,

Benjamin E. Nichols
http://www.squidblacklist.org

1-405-397-1360 - Call Anytime.
Alex Rousskov
2016-10-02 00:38:18 UTC
Permalink
Post by Darren
If I have to reload the ACL lists do I restart squid or is there a way
to update without impacting the users to much?
You can reconfigure Squid instead of restarting it. Reconfiguration is
usually better than a complete restart as far as user impact is
concerned, but reconfiguration is currently still pretty disruptive for
users because Squid closes its listening ports while reconfiguring and
does a lot of useless work which slows reconfiguration down. Also, there
have been many cases where reconfiguration led to memory leaks and other
problems.

Seamless hot reconfiguration has been on many admin wish lists for a
long time, and ACL refreshing is a big part of that demand. We are
moving in that direction but the progress has been slow.
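In practice the reconfigure path is usually driven like this (a common admin workflow, not an official recommendation): validate the new configuration first, then signal the running instance.

```
# 'squid -k parse' checks squid.conf and the ACL files it references
# without touching the running process; 'squid -k reconfigure' then
# tells the running Squid to re-read its configuration.
squid -k parse && squid -k reconfigure
```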

Alex.
Benjamin E. Nichols
2016-10-02 01:08:46 UTC
Permalink
I wouldn't advise reconfigure when you update your blacklists. Sure, it
sounds great, but in reality, as I said, in my experience it will only
sometimes actually reload the ACL from disk; sometimes it won't. You'll
do a reconfigure and discover your Squid is still running the old ACLs,
which presumably are memory resident. This may have been a bug that has
since been fixed, but I'm not messing with it. In our testing
environment we don't have time to deal with Squid loading an ACL from
disk only when it feels like it; we need it to load from disk every
time.

Once you do your own testing you'll see what I'm talking about. Go ahead
and add some URLs to your ACL and -k reconfigure; do this a few times,
and I am certain you'll eventually find that what I'm telling you is
true.
Post by Alex Rousskov
Post by Darren
If I have to reload the ACL lists do I restart squid or is there a way
to update without impacting the users to much?
You can reconfigure Squid instead of restarting it. Reconfiguration is
usually better than a complete restart as far as user impact is
concerned, but reconfiguration is currently still pretty disruptive for
users because Squid closes its listening ports while reconfiguring and
does a lot of useless work which slows reconfiguration down. Also, there
have been many cases where reconfiguration led to memory leaks and other
problems.
Seamless hot reconfiguration has been on many admin wish lists for a
long time, and ACL refreshing is a big part of that demand. We are
moving in that direction but the progress has been slow.
Alex.
--

Signed,

Benjamin E. Nichols
http://www.squidblacklist.org

1-405-397-1360 - Call Anytime.
Amos Jeffries
2016-10-02 04:15:30 UTC
Permalink
Post by Benjamin E. Nichols
I wouldn't advise reconfigure for when you update your blacklists. Sure,
it sounds great, but in reality, as I said, in my experience, only
sometimes will it actually reload the ACL from disk; sometimes it won't.
You'll do a reconfigure and discover your Squid is still running the old
ACLs, which presumably are memory resident. Now, this may have been a bug
that's since been fixed, but I'm not messing with it. In our testing
environment we don't have time to be dealing with Squid deciding to load
an ACL from disk when it feels like it should do so; we need it to load
from disk every time.
Once you do your own testing you'll see what I'm talking about: go ahead
and add some URLs to your ACL and -k reconfigure. Do this a few times,
So you are sending Squid a series of reload signals so fast that it does
not have time to complete one before the next arrives?

There are many fixes in Squid-4 and the latest 3.5 for those situations,
but there are still some open bug reports about the behaviour. Those are
not related to ACLs specifically: any reconfig task that takes longer
than the time between -k reconfigure signals will trigger issues.

... Ironically, using a helper is one of the things which breaks: Squid
loses track of whether any given new helper being started was for the
current or the previous -k reconfigure signal.

And of course connections and transactions which are already underway
are not affected by newly loaded config details.

Amos
Darren
2016-10-02 04:24:52 UTC
Permalink
Hi

I have now opened the Pandora's box of writing my own helper, as per Bob's suggestion.

I am playing with the idea of pre-processing my ACL lists and using memcached as a KV store. This way I should be able to update ACL members whilst keeping everything as available as possible.

I would update the ACL members outside of Squid, so it should be fast and, if I get my tree model right, scale well too.

I have had great success with memcached on various large web applications, so again, pending a clever tree algorithm, this could give me what I need without having to reload or restart Squid.
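A minimal sketch of the lookup such a helper could do, with a plain dict standing in for the memcached client and a right-to-left suffix walk as one possible "tree model" (the acl: key scheme is made up):

```python
def blocked(host, store):
    """Walk the domain labels right-to-left so that one stored key,
    e.g. 'acl:example.com', also covers www.example.com."""
    labels = host.lower().rstrip(".").split(".")
    for i in range(len(labels) - 1):  # never match the bare TLD
        suffix = ".".join(labels[i:])
        if store.get("acl:" + suffix):  # memcached get() in real use
            return True
    return False

# Dict standing in for memcached; an external updater would write
# these keys, so Squid itself never needs a reload.
store = {"acl:example.com": "1"}
print(blocked("www.example.com", store))  # True
print(blocked("example.org", store))      # False
```

An external ACL helper would read request lines from stdin and answer based on this lookup; updating the block list is then just a set or delete against memcached.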

Darren B.






Sent from Mailbird [http://www.getmailbird.com/?utm_source=Mailbird&amp;utm_medium=email&amp;utm_campaign=sent-from-mailbird]
Nishant Sharma
2016-10-02 06:16:32 UTC
Permalink
Hi,
Post by Darren
Hi
I have now opened the Pandora box of writing my own helper as per Bobs
suggestion. 
We are working on a redirector which we are currently using at around 100 geographically distributed Squids. These Squids are running on OpenWrt and pfSense embedded boxes like the Mikrotik RouterBOARD and PC Engines Alix & APU.

The helper is written in Perl, while the server uses PostgreSQL, memcached and a daemon.

You may check it out at:

https://github.com/codemarauder/charcoal

http://charcoal.io

If you wish to do alpha testing, I would be more than happy to provide access to you on the hosted service.

Regards,
Nishant
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
Bob Cochran
2016-10-03 17:25:51 UTC
Permalink
Post by Nishant Sharma
Hi,
Post by Darren
Hi
I have now opened the Pandora box of writing my own helper as per Bobs
suggestion.
We are working on a redirector which we are currently using at around 100 geographically distributed squids. These squid are running on OpenWRT and PfSense embedded boxes like Mikrotik Routerboard, PCEngine Alix & APU.
The helper is written in Perl while server uses Postgresql, memcached and a deamon.
https://github.com/codemarauder/charcoal
http://charcoal.io
It may be helpful at this point to remind everyone that there is a page
on the squid site that lists redirectors:
http://www.squid-cache.org/Misc/redirectors.html

Nishant, perhaps you should list Charcoal here.

I searched through the list for python-based redirectors. Two come up,
but the links seem to be stale / broken and probably should be removed:
iredir and pyredir.
Post by Nishant Sharma
If you wish to do alpha testing, I would be more than happy to provide access to you on the hosted service.
Regards,
Nishant
Darren
2016-10-03 21:58:46 UTC
Permalink
Hi Nishant

Thanks for the lead, I will have a look.

Redis is also interesting in this case due to its ability to iterate through keys with a wildcard match and cursors. It looks like just what I need, as I have to swap sets of sites in and out on demand.
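In case it's useful, a runnable sketch of that wildcard-scan idea (the key scheme is invented, and a tiny fake stands in for the redis-py client so the sketch runs without a server; real redis-py exposes scan_iter() with the same shape):

```python
import fnmatch

class FakeRedis:
    """Stand-in for redis.Redis. Real SCAN uses cursors under the
    hood, which redis-py's scan_iter() hides in the same way."""
    def __init__(self, keys):
        self._keys = set(keys)

    def scan_iter(self, match="*"):
        return (k for k in self._keys if fnmatch.fnmatch(k, match))

def active_sites(r, list_name):
    # Keys like "acl:<list>:<domain>"; swapping a whole set of
    # sites in or out is just writing/deleting under one prefix.
    prefix = "acl:%s:" % list_name
    return sorted(k[len(prefix):] for k in r.scan_iter(match=prefix + "*"))

r = FakeRedis(["acl:social:facebook.com", "acl:social:twitter.com",
               "acl:news:bbc.co.uk"])
print(active_sites(r, "social"))  # ['facebook.com', 'twitter.com']
```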

I have also been using Perl for over 20 years so my rewriter will be a child of Larry Wall.

Darren B.







Sent from Mailbird [http://www.getmailbird.com/?utm_source=Mailbird&amp;utm_medium=email&amp;utm_campaign=sent-from-mailbird]
Bob Cochran
2016-10-01 01:45:01 UTC
Permalink
Darren,

Have you also considered writing your own redirector/rewriter in a
language like python? There seems to be a nice starting example in the
"Squid Book", which I was able to get working along with extending it.

Good luck,

Bob