Discussion:
[squid-users] 3.5.27 to 4.4: a worker is dead
Heiler Bemerguy
2018-11-07 11:24:57 UTC
Permalink
After some hours the worker 4 died unexpectely and didn't come back. I
have cleaned the rock cache store before upgrading just to be safe..
COLD start........

proxy    15478  0.6  1.8 9518972 1207212 ?     S    Nov06   6:17
(squid-5) --kid squid-5 -s
proxy    15480  0.4  1.4 9391808 954248 ?      S    Nov06   3:56
(squid-3) --kid squid-3 -s
proxy    15481  1.1  2.5 9734420 1679360 ?     S    Nov06  10:45
(squid-2) --kid squid-2 -s
proxy    26933  0.0  0.0 6501296 26768 ?       S    00:25   0:01
(squid-coord-9) --kid squid-coord-9 -s
proxy    26936  0.0  0.6 7424208 401788 ?      S    00:25   0:07
(squid-disk-6) --kid squid-disk-6 -s
proxy    27011  0.0  0.2 7424208 185580 ?      S    00:28   0:03
(squid-disk-7) --kid squid-disk-7 -s
proxy    27227  0.0  0.0 7424208 54980 ?       S    00:37   0:01
(squid-disk-8) --kid squid-disk-8 -s

AMD64 12-core, 64gb ram

cache_mem 5200 MB
maximum_object_size_in_memory 2 MB
maximum_object_size 4 GB
workers 5
cache_dir rock /cache2 131000 min-size=1 max-size=196608
cache_dir rock /cache3 131000 min-size=196609 max-size=624288
cache_dir rock /cache4 131000 min-size=624289 max-swap-rate=500
swap-timeout=500

Squid Cache: Version 4.4
Service Name: squid

This binary uses OpenSSL 1.0.1t  3 May 2016. For legal restrictions on
distribution see https://www.openssl.org/source/license.html

configure options:  '--enable-delay-pools' '--with-openssl'
'--enable-htcp' '--disable-maintainer-mode'
'--disable-dependency-tracking' '--disable-wccp' '--disable-snmp'
'--enable-inline' '--enable-async-io=32' '--enable-storeio=aufs,rock'
'--enable-underscores' '--enable-removal-policies=lru,heap'
'--enable-http-violations' '--disable-ident-lookups'
'--with-large-files' '--enable-ssl' '--enable-ltdl-convenience'
'--prefix=/usr' '--localstatedir=/var' '--libexecdir=/lib/squid'
'--srcdir=.' '--datadir=/usr/share/squid' '--sysconfdir=/etc/squid'
'--with-default-user=proxy' '--with-logdir=/var/log'
'--with-pidfile=/var/run/squid.pid' '--with-filedescriptors=16384'
'--with-aufs-threads=32' '--disable-translation'


2018/11/07 07:06:19 kid4| clientProcessHit: URL mismatch,
'[unknown_URI]' !=
'http://ocsp.godaddy.com//MEQwQjBAMD4wPDAJBgUrDgMCGgUABBTkIInKBAzXkF0Qh0pel3lfHJ9GPAQU0sSw0pHUTBFxs2HLPaH%2B3ahq1OMCAxvnFQ%3D%3D'
2018/11/07 07:07:11 kid4| WARNING: communication with /cache2/rock may
be too slow or disrupted for about 7.00s; rescued 1 out of 1 I/Os
2018/11/07 07:07:11 kid4| clientProcessHit: URL mismatch,
'[unknown_URI]' != 'http://www.orm.com.br/templates/noticiaDev.php'
2018/11/07 07:13:26 kid4| clientProcessHit: URL mismatch,
'[unknown_URI]' !=
'http://storage.googleapis.com/update-delta/hfnkpimlhhgieaddgfemjhofmfblmnib/4803/4802/b767e40b59c48c2ec52977502ac10e35b84c00600197162f42dd941b5095cafd.crxd'
2018/11/07 07:17:03 kid4| FATAL: Dying from an exception handling
failure; exception: check failed: false
    exception location: mem/PageStack.cc(106) push

2018/11/07 07:17:03 kid4| Current Directory is /etc/init.d
2018/11/07 07:17:03 kid4| Starting Squid Cache version 4.4 for
x86_64-pc-linux-gnu...
2018/11/07 07:17:03 kid4| Service Name: squid
2018/11/07 07:17:03 kid4| Process ID 6320
2018/11/07 07:17:03 kid4| Process Roles: worker
2018/11/07 07:17:03 kid4| With 65536 file descriptors available
2018/11/07 07:17:03 kid4| Initializing IP Cache...
2018/11/07 07:17:03 kid4| DNS Socket created at [::], FD 5
2018/11/07 07:17:03 kid4| DNS Socket created at 0.0.0.0, FD 13
2018/11/07 07:17:03 kid4| Adding domain cinbesa.pmb from /etc/resolv.conf
2018/11/07 07:17:03 kid4| Adding nameserver 127.0.0.1 from /etc/resolv.conf
2018/11/07 07:17:03 kid4| helperOpenServers: Starting 8/16 'ufdbgclient'
processes
2018/11/07 07:17:03 kid4| helperOpenServers: Starting 10/30
'store-id.pl' processes
2018/11/07 07:17:03 kid4| Logfile: opening log
daemon:/var/log/squid/access.log
2018/11/07 07:17:03 kid4| Logfile Daemon: opening log
/var/log/squid/access.log
2018/11/07 07:17:03 kid4| Store logging disabled
2018/11/07 07:17:03 kid4| WARNING: disk-cache maximum object size is too
large for mem-cache: 4194304.00 KB > 2048.00 KB
2018/11/07 07:17:03 kid4| Swap maxSize 0 + 5324800 KB, estimated 118328
objects
2018/11/07 07:17:03 kid4| Target number of buckets: 11832
2018/11/07 07:17:03 kid4| Using 16384 Store buckets
2018/11/07 07:17:03 kid4| Max Mem  size: 5324800 KB [shared]
2018/11/07 07:17:03 kid4| Max Swap size: 0 KB
2018/11/07 07:17:03 kid4| Using Least Load store dir selection
2018/11/07 07:17:03 kid4| Current Directory is /etc/init.d
2018/11/07 07:17:03 kid4| Finished loading MIME types and icons.
2018/11/07 07:17:03 kid4| Configuring Parent 10.1.10.10/8081/0
2018/11/07 07:17:03 kid4| Squid plugin modules loaded: 0
2018/11/07 07:17:03 kid4| Adaptation support is off.
2018/11/07 07:17:09 kid4| Closing HTTP(S) port [::]:3080
2018/11/07 07:17:09 kid4| Not currently OK to rewrite swap log.
2018/11/07 07:17:09 kid4| storeDirWriteCleanLogs: Operation aborted.
2018/11/07 07:17:09 kid4| FATAL: kid4 registration timed out
2018/11/07 07:17:09 kid4| Squid Cache (Version 4.4): Terminated abnormally.
CPU Usage: 0.036 seconds = 0.024 user + 0.012 sys
Maximum Resident Size: 156240 KB
Page faults with physical i/o: 0
2018/11/07 07:17:09 kid4| Current Directory is /etc/init.d
2018/11/07 07:17:09 kid4| Starting Squid Cache version 4.4 for
x86_64-pc-linux-gnu...
2018/11/07 07:17:09 kid4| Service Name: squid
2018/11/07 07:17:09 kid4| Process ID 6341
2018/11/07 07:17:09 kid4| Process Roles: worker
2018/11/07 07:17:09 kid4| With 65536 file descriptors available
2018/11/07 07:17:09 kid4| Initializing IP Cache...
2018/11/07 07:17:09 kid4| DNS Socket created at [::], FD 5
2018/11/07 07:17:09 kid4| DNS Socket created at 0.0.0.0, FD 13
2018/11/07 07:17:09 kid4| Adding domain cinbesa.pmb from /etc/resolv.conf
2018/11/07 07:17:09 kid4| Adding nameserver 127.0.0.1 from /etc/resolv.conf
2018/11/07 07:17:09 kid4| helperOpenServers: Starting 8/16 'ufdbgclient'
processes
2018/11/07 07:17:09 kid4| helperOpenServers: Starting 10/30
'store-id.pl' processes
2018/11/07 07:17:09 kid4| Logfile: opening log
daemon:/var/log/squid/access.log
2018/11/07 07:17:09 kid4| Logfile Daemon: opening log
/var/log/squid/access.log
2018/11/07 07:17:09 kid4| Store logging disabled
2018/11/07 07:17:09 kid4| WARNING: disk-cache maximum object size is too
large for mem-cache: 4194304.00 KB > 2048.00 KB
2018/11/07 07:17:09 kid4| Swap maxSize 0 + 5324800 KB, estimated 118328
objects
2018/11/07 07:17:09 kid4| Target number of buckets: 11832
2018/11/07 07:17:09 kid4| Using 16384 Store buckets
2018/11/07 07:17:09 kid4| Max Mem  size: 5324800 KB [shared]
2018/11/07 07:17:09 kid4| Max Swap size: 0 KB
2018/11/07 07:17:09 kid4| Using Least Load store dir selection
2018/11/07 07:17:09 kid4| Current Directory is /etc/init.d
2018/11/07 07:17:09 kid4| Finished loading MIME types and icons.
2018/11/07 07:17:09 kid4| Configuring Parent 10.1.10.10/8081/0
2018/11/07 07:17:09 kid4| Squid plugin modules loaded: 0
2018/11/07 07:17:09 kid4| Adaptation support is off.
2018/11/07 07:17:15 kid4| Closing HTTP(S) port [::]:3080
2018/11/07 07:17:15 kid4| Not currently OK to rewrite swap log.
2018/11/07 07:17:15 kid4| storeDirWriteCleanLogs: Operation aborted.
2018/11/07 07:17:15 kid4| FATAL: kid4 registration timed out
2018/11/07 07:17:15 kid4| Squid Cache (Version 4.4): Terminated abnormally.
CPU Usage: 0.040 seconds = 0.024 user + 0.016 sys
Maximum Resident Size: 156496 KB
Page faults with physical i/o: 0

(loop)
--
Atenciosamente,

Heiler Bensimon Bemerguy - CINBESA
Analista de Redes, Wi-Fi,
Virtualização e Serviços Internet
(55) 91 98151-4894
Amos Jeffries
2018-11-08 03:32:03 UTC
Permalink
Post by Heiler Bemerguy
After some hours the worker 4 died unexpectely and didn't come back. I
have cleaned the rock cache store before upgrading just to be safe..
COLD start........
proxy    15478  0.6  1.8 9518972 1207212 ?     S    Nov06   6:17
(squid-5) --kid squid-5 -s
proxy    15480  0.4  1.4 9391808 954248 ?      S    Nov06   3:56
(squid-3) --kid squid-3 -s
proxy    15481  1.1  2.5 9734420 1679360 ?     S    Nov06  10:45
(squid-2) --kid squid-2 -s
proxy    26933  0.0  0.0 6501296 26768 ?       S    00:25   0:01
(squid-coord-9) --kid squid-coord-9 -s
proxy    26936  0.0  0.6 7424208 401788 ?      S    00:25   0:07
(squid-disk-6) --kid squid-disk-6 -s
proxy    27011  0.0  0.2 7424208 185580 ?      S    00:28   0:03
(squid-disk-7) --kid squid-disk-7 -s
proxy    27227  0.0  0.0 7424208 54980 ?       S    00:37   0:01
(squid-disk-8) --kid squid-disk-8 -s
AMD64 12-core, 64gb ram
cache_mem 5200 MB
maximum_object_size_in_memory 2 MB
maximum_object_size 4 GB
workers 5
cache_dir rock /cache2 131000 min-size=1 max-size=196608
cache_dir rock /cache3 131000 min-size=196609 max-size=624288
cache_dir rock /cache4 131000 min-size=624289 max-swap-rate=500
swap-timeout=500
The worker "kid4" died with an exception doing something with shared-memory.
Post by Heiler Bemerguy
2018/11/07 07:06:19 kid4| clientProcessHit: URL mismatch,
'[unknown_URI]' !=
'http://ocsp.godaddy.com//MEQwQjBAMD4wPDAJBgUrDgMCGgUABBTkIInKBAzXkF0Qh0pel3lfHJ9GPAQU0sSw0pHUTBFxs2HLPaH%2B3ahq1OMCAxvnFQ%3D%3D'
Your rock cache apparently contains at least several objects with the
exact string "[unknown URI]" as their URL / store-ID key. These are
attempting to be delivered to the client when it requested that OSCP URL.

Squid has detected the problem and will fetch a new object from the
network instead of the cache. It also begins the process of purging that
corrupt object from the cache.

... but then ...
Post by Heiler Bemerguy
2018/11/07 07:07:11 kid4| WARNING: communication with /cache2/rock may
be too slow or disrupted for about 7.00s; rescued 1 out of 1 I/Os
Activity the worker is asking the Disker to do is taking a long time,
some actions have started to timeout. If they were fetching objects
those client transactions will be treated as MISS.
I'm not sure what happens if those were deletions.
Post by Heiler Bemerguy
2018/11/07 07:07:11 kid4| clientProcessHit: URL mismatch,
'[unknown_URI]' != 'http://www.orm.com.br/templates/noticiaDev.php'
2018/11/07 07:13:26 kid4| clientProcessHit: URL mismatch,
'[unknown_URI]' !=
'http://storage.googleapis.com/update-delta/hfnkpimlhhgieaddgfemjhofmfblmnib/4803/4802/b767e40b59c48c2ec52977502ac10e35b84c00600197162f42dd941b5095cafd.crxd'
2018/11/07 07:17:03 kid4| FATAL: Dying from an exception handling
failure; exception: check failed: false
    exception location: mem/PageStack.cc(106) push
The SMP shared-memory space has been asked to allocate more memory than
it has total capacity.

...
Post by Heiler Bemerguy
2018/11/07 07:17:03 kid4| Starting Squid Cache version 4.4 for
x86_64-pc-linux-gnu...
...
Post by Heiler Bemerguy
2018/11/07 07:17:03 kid4| WARNING: disk-cache maximum object size is too
large for mem-cache: 4194304.00 KB > 2048.00 KB
The above might have something to do with the shared-memory problem.

It is part of rock design that the cache contents have to be able to be
stored in memory.

...
Post by Heiler Bemerguy
2018/11/07 07:17:09 kid4| storeDirWriteCleanLogs: Operation aborted.
2018/11/07 07:17:09 kid4| FATAL: kid4 registration timed out
The auto-recovery process keeps dying trying to re-start it because of
the very long time it takes to startup when the proxy is already under
traffic load and memory pressure. These type of delays are usually due
to Squid being configured with very large rock caches.




So, to solve this try reducing the cache size. rock was designed for
caches in the MB ranges with many small objects. So start small then
push it up until problems start to appear again.


Keep in mind that rock was designed and optimized for caches of several
hundred *MB* with millions of small 0-32KB sized objects. For large
objects the UFS caches are optimal.

Squid caching is intended for the two types to work together for best
performance at all object size ranges.

Amos
Alex Rousskov
2018-11-08 06:07:36 UTC
Permalink
Post by Amos Jeffries
Post by Heiler Bemerguy
2018/11/07 07:17:03 kid4| WARNING: disk-cache maximum object size is too
large for mem-cache: 4194304.00 KB > 2048.00 KB
It is part of rock design that the cache contents have to be able to be
stored in memory.
FTR, the above statement is incorrect: There is no such requirement in
rock design. Needless to say, on-disk objects larger than maximum
in-memory object size cannot be cached in memory, but that is true for
any Squid configurations, not just those using rock cache_dirs.
Post by Amos Jeffries
rock was designed for caches in the MB ranges
The above statement is also incorrect: Rock was designed for GBs and
even TBs worth of cached content. There are still many things that can
be done in rock to support large caches better, of course, but
optimizing MBs-sized caches specifically was never a design objective.
Post by Amos Jeffries
Keep in mind that rock was designed and optimized for caches of several
hundred *MB* with millions of small 0-32KB sized objects.
Optimized? Probably -- most cached objects in most deployment
environments are indeed small.

Designed? No. One of the LargeRock project explicit goals was support
for cached objects that are several GBs in size:
https://wiki.squid-cache.org/Features/LargeRockStore
Post by Amos Jeffries
Squid caching is intended for the two types to work together for best
performance at all object size ranges.
Not really: Rock cache_dirs should not be used together with UFS-based
disk caches, especially in SMP environments where UFS-based disk caches
should not be used at all.


I realize that the above corrections may not help resolve the problem OP
is facing. I just did not want those misstatements to be archived
unaddressed.

Alex.
Heiler Bemerguy
2018-11-08 15:49:48 UTC
Permalink
Post by Amos Jeffries
Your rock cache apparently contains at least several objects with the
exact string "[unknown URI]" as their URL / store-ID key. These are
attempting to be delivered to the client when it requested that OSCP URL.
Squid has detected the problem and will fetch a new object from the
network instead of the cache. It also begins the process of purging that
corrupt object from the cache.
This always ocurred, even with 3.5.27.. and I think it's related to
cache_peers asking for objects. Am I the only one seeing it?
Post by Amos Jeffries
Post by Heiler Bemerguy
2018/11/07 07:17:03 kid4| FATAL: Dying from an exception handling
failure; exception: check failed: false
    exception location: mem/PageStack.cc(106) push
The SMP shared-memory space has been asked to allocate more memory than
it has total capacity.
Why would it do that? 3.5.27 worked forever flawlessly with the same
config. Should I decrease cache_mem?
--
Atenciosamente,

Heiler Bensimon Bemerguy - CINBESA
Analista de Redes, Wi-Fi,
Virtualização e Serviços Internet
(55) 91 98151-4894
Alex Rousskov
2018-11-08 16:42:40 UTC
Permalink
Post by Heiler Bemerguy
Post by Amos Jeffries
Post by Heiler Bemerguy
2018/11/07 07:17:03 kid4| FATAL: Dying from an exception handling
failure; exception: check failed: false
     exception location: mem/PageStack.cc(106) push
The SMP shared-memory space has been asked to allocate more memory than
it has total capacity.
Why would it do that?
I am not sure that explanation is accurate. The exception is thrown when
Squid's shared memory manager is given a (previously used/allocated)
shared memory page to add/return to the index of free (i.e. available
for future use) memory pages. When trying to add that free page, Squid
found that the entire index of free shared memory pages is full -- all
previously allocated shared memory pages for the given scope/segment
were already freed/indexed, and there could not have been one more free
page.

It is not clear whether Squid is wrong about the index being full or the
calling code is giving a free page that was already index (i.e. an
equivalent of a double-free bug). There could also be some (currently
unknown) and mishandled race condition when navigating the shared memory
index.

My bet is on a bug in Squid code that attempts to recover from disker
errors (you have reported some disker-related errors/timeouts that
preceded this exception IIRC). If I am right, then this exception is a
symptom of a Squid bug located elsewhere.

Alex.

Continue reading on narkive:
Loading...