Discussion:
[squid-users] Performance issue /cache_dir / cache_mem
pacolo
2018-11-08 15:46:24 UTC
Hello,

I am having performance issues with a farm of 5 servers
(CentOS Linux release 7.5.1804) running Squid 3.5.20-12.el7, used to
provide internet access for a school community.
Traffic is around 7 Gbps at peak hour, including the 60% that is HTTPS and
not currently processed by Squid (we will try to intercept and process
HTTPS in the near future).

I have noticed several error events in /var/log/audit/audit.log
type=ANOM_ABEND msg=audit(30/10/18 10:30:54.557:18355) : auid=unset
uid=squid gid=squid ses=unset pid=567 comm=squid reason="memory violation"
sig=SIGABRT

That corresponded with other events in /var/log/squid/cache.log
2018/10/30 10:26:15 kid1| assertion failed: filemap.cc:50: "capacity_ <= (1
<< 24)"
2018/10/30 10:26:19 kid1| Set Current Directory to /cache
2018/10/30 10:26:19 kid1| Starting Squid Cache version 3.5.20 for
x86_64-redhat-linux-gnu...
2018/10/30 10:26:19 kid1| Service Name: squid
2018/10/30 10:26:19 kid1| Process ID 567


There were thousands of Squid restarts per day, which appear to be the
main problem.
I have noticed that this problem could be related to the maximum value of
our cache_dir size, according to...
https://bugs.squid-cache.org/show_bug.cgi?id=3566

I have been looking for relevant information regarding the cache_dir
maximum sizes, but all the posts I found seem a little bit old, for example...

http://squid-web-proxy-cache.1019090.n4.nabble.com/size-of-cache-dir-td1033280.html

http://squid-web-proxy-cache.1019090.n4.nabble.com/cache-dir-size-td1033774.html


This is deployed in a virtual environment on a storage platform with disks
of different rotational speeds; resources aren't the problem, and more
could be added if needed.
Each server has 4 CPUs, 8 GB RAM, and LVM with a 30 GB OS disk and an
8 TB cache disk.
What we need is to deploy 8 TB per server, or as much as possible, and we
could deploy another virtual server to reach 40 TB in total.

I have noticed that our first approach could be wrong, as we only
referenced one cache_dir with the whole 8000000 MB...
cache_dir aufs /cache 800000 16 256

Then, the following error was returned ((squid-1): xcalloc: Unable to
allocate 18446744073566526858 blocks of 1 bytes!), until we noticed the
maximum value accepted...
These are the directives related to the memory and disk options.

memory_replacement_policy heap GDSF
cache_mem 1024 MB
maximum_object_size_in_memory 10 MB

cache_replacement_policy heap LFUDA
cache_dir aufs /cache 5242880 16 256
maximum_object_size 1024 MB
cache_swap_low 90
cache_swap_high 95

We noticed the errors mentioned above when service degradation was
reported by the customer.
By the way, with this configuration only about 2 TB was cached on each
server.

I suppose more RAM would be needed, according to the rule of thumb "14 MB
of memory per 1 GB on disk for 64-bit Squid".
But I would need some clarification on this: I suppose the 14 MB of
memory refers to our total RAM, and the 1 GB on disk refers to 1 GB in
the cache_dir (as our 8 TB are not detected, only 5 TB).
So, taking into account that we need to deploy a 40 TB cache in total, for
example on 5 servers with 8 TB per server, at least 112 GB of RAM per
server would be needed.
Am I right?
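For what it's worth, the arithmetic behind that estimate can be checked with a quick shell sketch (the 14 MB per GB figure is only the rule of thumb quoted above, not an exact requirement):

```shell
# Rule of thumb: ~14 MB of index RAM per 1 GB of disk cache (64-bit Squid).
disk_gb=$((8 * 1024))            # 8 TB of cache_dir per server, in GB
mem_mb=$((disk_gb * 14))         # estimated index memory, in MB
echo "$((mem_mb / 1024)) GB"     # → 112 GB
```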


Please, could somebody point me in the right direction?
I have read about https://wiki.squid-cache.org/Features/SmpScale, but
before testing it I would like to know whether there is any maximum value
for cache_dir.

Thanks!
Paco.




--
Sent from: http://squid-web-proxy-cache.1019090.n4.nabble.com/Squid-Users-f1019091.html
Alex Rousskov
2018-11-08 17:42:31 UTC
assertion failed: filemap.cc:50: "capacity_ <= (1 << 24)"
I have noticed that this problem could be related to the maximum value of
our cache_dir size, according to...
https://bugs.squid-cache.org/show_bug.cgi?id=3566
As that bug discussion attempts to clarify, it is not the cache_dir size
as such, but the number of objects in that cache_dir. The latter is
limited to 16777216 objects (a hard-coded limit).
What we need is to deploy 8 TB per server,
Divide 8 TB by your average disk-cached object size, then divide by
16777216 to find out the minimum number of cache_dirs per server.

For example, with 13KB average disk-cached object size, caching 8 TB on
disk requires 40 cache_dirs (8 * 1024 * 1024 * 1024 / 13 / 16777216).

See also: store_avg_object_size
cache_dir aufs /cache 5242880 16 256
With the default store_avg_object_size of 13KB, the above yields 412977624
objects, which is about 24x more than the 16777216 limit.
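Alex's arithmetic can be reproduced with a small shell sketch (13 KB is the store_avg_object_size default he mentions; substitute your measured average):

```shell
# Minimum cache_dirs = ceil(objects / 2^24),
# where objects = cache size / average object size.
cache_kb=$((8 * 1024 * 1024 * 1024))   # 8 TB per server, expressed in KB
avg_obj_kb=13                          # assumed average disk-cached object size
limit=$((1 << 24))                     # 16777216 objects per cache_dir (hard-coded)
objects=$((cache_kb / avg_obj_kb))
dirs=$(( (objects + limit - 1) / limit ))
echo "$dirs"                           # → 40
```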

I do not remember if Squid has a hard-coded maximum for the number of
cache_dirs. However, even if Squid does not, please note that several
algorithms iterate through all cache_dirs. Each iteration is relatively
fast (e.g., a complex hash lookup), but having lots of cache_dirs may
slow your Squid down because of these linear searches.

Alex.
P.S. If you move to SMP Squid, please note that you should not continue
to use aufs cache_dirs.
pacolo
2018-11-22 12:52:48 UTC
Hello people,

@Alex, thanks for your help. We took your advice and changed the
deployment to SMP; some issues were solved, but others appeared :-(.

The config files are attached, in case anybody could help, or to help
others with the same issues, as we have been searching for several days
and didn't find any solution or a recent configuration guide.

df_-hT.txt
<http://squid-web-proxy-cache.1019090.n4.nabble.com/file/t377599/df_-hT.txt>
LVM_info.txt
<http://squid-web-proxy-cache.1019090.n4.nabble.com/file/t377599/LVM_info.txt>
fdisk_-l.txt
<http://squid-web-proxy-cache.1019090.n4.nabble.com/file/t377599/fdisk_-l.txt>
fstab.fstab
<http://squid-web-proxy-cache.1019090.n4.nabble.com/file/t377599/fstab.fstab>
dev_shm_permissions.txt
<http://squid-web-proxy-cache.1019090.n4.nabble.com/file/t377599/dev_shm_permissions.txt>
cat_proc_cpuinfo.txt
<http://squid-web-proxy-cache.1019090.n4.nabble.com/file/t377599/cat_proc_cpuinfo.txt>
systecl_-a.log
<http://squid-web-proxy-cache.1019090.n4.nabble.com/file/t377599/systecl_-a.log>
99-sysctl.conf
<http://squid-web-proxy-cache.1019090.n4.nabble.com/file/t377599/99-sysctl.conf>
squid.conf
<http://squid-web-proxy-cache.1019090.n4.nabble.com/file/t377599/squid.conf>
frontend.conf
<http://squid-web-proxy-cache.1019090.n4.nabble.com/file/t377599/frontend.conf>
backend.conf
<http://squid-web-proxy-cache.1019090.n4.nabble.com/file/t377599/backend.conf>


1. VMWare - Hardware changes
Changed the servers' CPUs from 2 CPUs with 2 sockets (2*2) to 5 CPUs with
1 socket each (5*1).
Changed from 8 GB to 32 GB RAM.
LVM with a 30 GB OS disk, and changed the single 8 TB disk to several 2 TB
disks, as suggested (several smaller disks are better than one big disk).


2. Server - LVM disk config
Attached are the LVM info, df -hT, fstab, and fdisk -l.
4 disks of 2 TB.
Changed the file system from the initial xfs to ext4.
We checked the ReiserFS suggestion and tried to change to it, but found
several problems (it is discontinued and not compatible with our system),
so we decided to stay with ext4.
Mounted ext4 with noatime.


3. Server - CPU affinity
Attached is cat /proc/cpuinfo
I suppose the processor id in /proc/cpuinfo is the id that should be
referenced in the cpu_affinity_map, isn't it?

cpu_affinity_map process_numbers=1,2,3,4 cores=1,2,3,4
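As a sanity check on the ids, this lists the logical CPUs as the kernel sees them (note this is only a sketch; Squid's cpu_affinity_map documentation numbers cores starting from 1, while /proc/cpuinfo processor ids start at 0):

```shell
# List logical CPU ids as reported by the kernel (0-based "processor" field).
grep '^processor' /proc/cpuinfo
```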


4. sysctl
Taking into account several posts we found, we changed some sysctl
settings to improve the servers' performance.


Something very important is the file descriptor limit, which we also
configured with a parameter in squid.conf.
This actually solved the main problem, as we checked that the descriptors
were being quickly exhausted.

There are several memory-related parameters, but we found some
discrepancies between the examples, so we haven't changed them.
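For reference, descriptor headroom can be checked like this (the 16384 value below is an illustrative assumption, not a recommendation for this load):

```shell
# Kernel-wide and per-process open-file limits.
cat /proc/sys/fs/file-max    # system-wide maximum open files
ulimit -n                    # soft limit inherited from this shell
# squid.conf equivalent (illustrative value):
#   max_filedescriptors 16384
```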


5. shm
Attached dev_shm permissions.

We saw the following error in /var/log/messages at Squid
startup...
Ipc::Mem::Segment::open failed to shm_open(/squid-cache4_spaces.shm): (2)
No such file or directory

So we added the following line to the /etc/fstab file:
shm /dev/shm tmpfs nodev,nosuid,noexec 0 0
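A quick way to confirm the mount took effect after remounting (assumption: tmpfs on /dev/shm, as in the fstab line above):

```shell
# Verify that /dev/shm is mounted as tmpfs and has space available.
mount | grep ' /dev/shm '
df -h /dev/shm
```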

The errors continued, so we created the files (squid-cache2_spaces.shm,
etc.) that appeared to be missing, and the errors changed to...
FATAL: Ipc::Mem::Segment::attach failed to mmap(/squid-cache4_spaces.shm):
(22) Invalid argument

Regarding this error, we disabled SELinux, as suggested in some posts,
but the errors kept appearing.


6. Squid configuration
Attached the squid.conf, frontend.conf and backend.conf.
We configured them following the wiki example, although that example
appears to be quite old; furthermore, it suggests using aufs, which is
not SMP-aware.
Then we changed several things as suggested in several posts on the
Nabble forum.

We found several posts where these issues are reported and Alex suggests
updating, so we updated Squid from 3.5.28 to 4.1, but the errors continued.
We don't know whether it is a coincidence, but almost everybody reporting
these issues has the same OS as us, CentOS.
Our system is CentOS Linux release 7.5.1804 (Core).


Furthermore, we have attached the log files, in case it helps.

var_log_messages.txt
<http://squid-web-proxy-cache.1019090.n4.nabble.com/file/t377599/var_log_messages.txt>
frontend.log
<http://squid-web-proxy-cache.1019090.n4.nabble.com/file/t377599/frontend.log>
backend0.log
<http://squid-web-proxy-cache.1019090.n4.nabble.com/file/t377599/backend0.log>
backend2.log
<http://squid-web-proxy-cache.1019090.n4.nabble.com/file/t377599/backend2.log>
backend3.log
<http://squid-web-proxy-cache.1019090.n4.nabble.com/file/t377599/backend3.log>
backend4.log
<http://squid-web-proxy-cache.1019090.n4.nabble.com/file/t377599/backend4.log>
backend5.log
<http://squid-web-proxy-cache.1019090.n4.nabble.com/file/t377599/backend5.log>
backend6.log
<http://squid-web-proxy-cache.1019090.n4.nabble.com/file/t377599/backend6.log>

In summary, we don't know how to continue troubleshooting; please help.

Thanks in advance, and sorry for the long message.
Best regards,
Paco.

References:
https://wiki.squid-cache.org/Features/SmpScale
https://wiki.squid-cache.org/ConfigExamples/SmpCarpCluster
https://wiki.squid-cache.org/Features/RockStore
https://wiki.squid-cache.org/BestOsForSquid
http://vietlux.blogspot.com/2012/07/squid-proxy-tuning-for-high-perfomance.html
https://wwwx.cs.unc.edu/~sparkst/howto/network_tuning.php
http://squid-web-proxy-cache.1019090.n4.nabble.com/FATAL-Ipc-Mem-Segment-create-failed-to-shm-open-squid-cf-queues-shm-17-File-exists-td4680157.html
http://squid-web-proxy-cache.1019090.n4.nabble.com/FATAL-shm-open-squid-ssl-session-cache-shm-td4683398.html




pacolo
2018-11-23 14:21:22 UTC
Hello again,

We have found an issue in backend.conf: since the Rock cache_dir is
SMP-aware, it should not use the per-process path.

We changed this...
#cache_dir rock /cache${process_number} 2097152
to this...
cache_dir rock /cache1 2097152


... then the new errors are:
Nov 23 14:55:28 px06 squid[12559]: ERROR: /cache1/rock communication channel
establishment timeout
Nov 23 14:55:28 px06 squid[12559]: FATAL: Rock cache_dir at /cache1/rock
failed to open db file: (0) No error.

We have searched the forum
(http://squid-web-proxy-cache.1019090.n4.nabble.com/RockStore-quot-Fatal-Error-quot-td4666691.html),
and tried what other people suggested, without success.


/cache1
drwxr-xr-x 2 squid squid 16384 nov 21 13:05 lost+found
drwxr-xr-x 2 squid squid 4096 nov 23 12:38 rock

The permissions in /dev/shm are correct, too. Squid is writing some files.
ls_-l_dev_shm.txt
<http://squid-web-proxy-cache.1019090.n4.nabble.com/file/t377599/ls_-l_dev_shm.txt>

In addition, it appears that Squid can write to the localstatedir...

--localstatedir=/var
squid_-v.txt
<http://squid-web-proxy-cache.1019090.n4.nabble.com/file/t377599/squid_-v.txt>

/var/run/squid
srwxr-x--- 1 squid squid 0 nov 23 14:55 squid-coordinator.ipc
srwxr-x--- 1 squid squid 0 nov 23 14:55 squid-kid-1.ipc
srwxr-x--- 1 squid squid 0 nov 23 14:55 squid-kid-2.ipc
srwxr-x--- 1 squid squid 0 nov 23 14:55 squid-kid-3.ipc
srwxr-x--- 1 squid squid 0 nov 23 13:00 squid-kid-4.ipc
srwxr-x--- 1 squid squid 0 nov 23 13:00 squid-kid-5.ipc


SELinux status: disabled

var_log_messages.txt
<http://squid-web-proxy-cache.1019090.n4.nabble.com/file/t377599/var_log_messages.txt>

Any idea?

Best regards,
Paco.




Amos Jeffries
2018-11-24 07:10:37 UTC
Post by pacolo
Hello again,
We have found an issue in backend.conf, as the Rock cache_dir is SMP aware.
Change this...
#cache_dir rock /cache${process_number} 2097152
to this...
cache_dir rock /cache1 2097152
Nov 23 14:55:28 px06 squid[12559]: ERROR: /cache1/rock communication channel
establishment timeout
Nov 23 14:55:28 px06 squid[12559]: FATAL: Rock cache_dir at /cache1/rock
failed to open db file: (0) No error.
We have search for in the forum
(http://squid-web-proxy-cache.1019090.n4.nabble.com/RockStore-quot-Fatal-Error-quot-td4666691.html),
and tried what other people suggested without success.
If you have a mix of "/cache${process_number}" and "/cache1" in your
config files you may still be mixing SMP-aware and SMP-disabled access
to the "/cache1" path.

By your mention of "backend.conf" I assume you are trying to use
something based on our example SMP CARP cluster configuration.
If that is correct please compare what you have to the current example
config <https://wiki.squid-cache.org/ConfigExamples/SmpCarpCluster>.

It has had a few changes since it was initially written, and people
copy-pasting it into tutorials without linking back to our info have
introduced various bugs into their texts. Sometimes because they copied
old versions that no longer work, or because they made arbitrary changes
without properly understanding the consequences.
Post by pacolo
/cache1
drwxr-xr-x 2 squid squid 16384 nov 21 13:05 lost+found
drwxr-xr-x 2 squid squid 4096 nov 23 12:38 rock
The permissions in /dev/shm are correct, too. Squid is writing some files.
ls_-l_dev_shm.txt
<http://squid-web-proxy-cache.1019090.n4.nabble.com/file/t377599/ls_-l_dev_shm.txt>
In addition, it appears that Squid can write in the localstatedir...
--localstatedir=/var'
squid_-v.txt
<http://squid-web-proxy-cache.1019090.n4.nabble.com/file/t377599/squid_-v.txt>
/var/run/squid
srwxr-x--- 1 squid squid 0 nov 23 14:55 squid-coordinator.ipc
srwxr-x--- 1 squid squid 0 nov 23 14:55 squid-kid-1.ipc
srwxr-x--- 1 squid squid 0 nov 23 14:55 squid-kid-2.ipc
srwxr-x--- 1 squid squid 0 nov 23 14:55 squid-kid-3.ipc
srwxr-x--- 1 squid squid 0 nov 23 13:00 squid-kid-4.ipc
srwxr-x--- 1 squid squid 0 nov 23 13:00 squid-kid-5.ipc
SELinux status: disabled
var_log_messages.txt
<http://squid-web-proxy-cache.1019090.n4.nabble.com/file/t377599/var_log_messages.txt>
The log you provided has a mixture of output from multiple processes, but
appears to be lacking the "kidN" information Squid attaches to every log
line indicating which ${process_number} is writing to the log.

That makes it very hard to determine the source of SMP issues from a log
like this. Luckily you did provide the whole log, and Squid-4 logs this
detail at startup:

(squid-coord-4) process 12557 started
(squid-disk-3) process 12558 started
(squid-2) process 12559 started
(squid-1) process 12560 started

The Disker cannot open the configured rock cache_dir:

Nov 23 14:55:21 px06 squid[12558]: ERROR: cannot open /cache1/rock:
(21) Is a directory

The SMP worker did not receive any registration response from the
Disker, so the cache_dir access fails. The worker aborts and enters a
loop of constantly dying due to the unresponsive Disker.

Nov 23 14:55:28 px06 squid[12559]: ERROR: /cache1/rock communication
channel establishment timeout
Nov 23 14:55:28 px06 squid[12559]: Not currently OK to rewrite swap log.
Nov 23 14:55:28 px06 squid[12559]: storeDirWriteCleanLogs: Operation
aborted.
Nov 23 14:55:28 px06 squid[12559]: FATAL: Rock cache_dir at
/cache1/rock failed to open db file: (0) No error.


So the fix is to:

1) stop Squid.

2) make sure it is fully shutdown with no residual instances or
processes running.

3) make sure the SMP /dev/shm sockets opened by Squid are fully gone.
Delete manually if necessary.

4) make sure the PID file is fully gone. Delete manually if necessary.

5) erase everything in the /cache1 directory.

5a) optionally: erase any other caches you may have.
This will speed up the -z process, but only the cache showing errors
actually needs to be fully clean to fix this error message.

6) run "squid -z" manually and wait until it completes.

7) start Squid.
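The steps above can be sketched as a shell sequence; the paths (/cache1, /var/run/squid.pid, /dev/shm/squid-*) are assumptions based on this thread and may differ per build:

```shell
#!/bin/sh
# Clean-restart sketch for a broken SMP rock cache (paths are assumptions).
squid -k shutdown                     # 1) ask Squid to stop
while pgrep -x squid >/dev/null; do   # 2) wait until no squid processes remain
    sleep 1
done
rm -f /dev/shm/squid-*                # 3) remove leftover shared-memory segments
rm -f /var/run/squid.pid              # 4) remove a stale PID file
rm -rf /cache1/*                      # 5) erase the failing cache's contents
squid -z                              # 6) rebuild cache structures; wait for it
squid                                 # 7) start Squid again
```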


Amos
pacolo
2018-12-03 15:42:57 UTC
Hello Amos,
Post by Amos Jeffries
If you have a mix of "/cache${process_number}" and "/cache1" in your
config files you may still be mixing SMP-aware and SMP-disabled access
to the "/cache1" path.
That's exactly what happened to us. As I mentioned, we are new to the
Squid world; I have several years of experience with Blue Coat proxies,
but these settings were preconfigured in their system.
Post by Amos Jeffries
By your mention of "backend.conf" I assume you are trying to use
something based on our example SMP CARP cluster configuration.
If that is correct please compare what you have to the current example
It has had a few changes since initially written, and people
copy-pasting it into tutorials without linking back to our info have got
various bugs in their texts. Sometimes because they copied old versions
that no longer work, or because they made arbitrary changes without
properly understanding the consequences.
We followed that example, but it was last edited on 2013-03-23 04:15, so
it has a mixture of AUFS and Rock caches; that's why I confused the
/cache${process_number} path with the cache modes.

As you say, there's a lot of info in the tutorials published on the web,
but it's difficult to find information that is up to date with the latest
version of Squid.

Maybe it would be worth updating that example; it would save you some
time not having to reply to people who misunderstood that info.
Post by Amos Jeffries
1) stop Squid.
2) make sure it is fully shutdown with no residual instances or
processes running.
3) make sure the SMP /dev/shm sockets opened by Squid are fully gone.
Delete manually if necessary.
4) make sure the PID file is fully gone. Delete manually if necessary.
5) erase everything in the /cache1 directory.
5a) optionally: erase any other caches you may have.
This will speed up the -z process, but only the cache showing errors
actually needs to be fully clean to fix this error message.
6) run "squid -z" manually and wait until it completes.
7) start Squid.
The procedure worked perfectly.

But unfortunately, we faced other issues...

kid1| TCP connection to localhost/4003 failed
kid1| Detected DEAD Parent: backend-kid3
kid1| temporary disabling (Service Unavailable) digest from localhost

So we changed the cache_peer definition from localhost to 127.0.0.1, as
suggested in https://wiki.squid-cache.org/Features/IPv6.
And that was all it took to get Squid running.
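For others hitting the same symptom, the change amounts to something like the following in the frontend config (the port follows the kid3/4003 log line above; the other options are assumptions, not our exact file):

```
# before: peer resolved via "localhost", which may prefer ::1 (IPv6)
#cache_peer localhost parent 4003 0 carp
# after: force the IPv4 loopback explicitly
cache_peer 127.0.0.1 parent 4003 0 carp
```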

Your help is much appreciated.

Thanks!
Paco.


