Discussion:
[squid-users] Identifying the source of Invalid-request (squid 3) -> error:transaction-end-before-headers (Squid 4)
Jester Purtteman
2016-10-15 13:36:04 UTC
Permalink
Greetings!



I am running a transparent proxy for plain http traffic, memory caching
only, I have something like 500 devices that are using the proxy at any
given time over a satellite and I am averaging in the range of 2,000
requests per minute across the proxy (again, no SSL bump, I do not control
the devices at all). I am using 3.5.22 compiled from sources (if it
matters). I have been seeing lines in my access log like the following:



1476535967.570 0 xxx.xxx.xxx.xxx TAG_NONE/400 4538 NONE
error:invalid-request - HIER_NONE/- text/html



After some digging on this list I began to suspect websockets or other
non-http traffic coming across port 80. After additional reading, and as
much as anything to test the hypothesis, I decided to try squid 4.0.15 with
on-unsupported-protocol. I get what I am guessing to be the same result
with new error text around it:



1476536369.742 0 xxx.xxx.xxx.xxx NONE/000 0 NONE
error:transaction-end-before-headers - HIER_NONE/- -



An interesting point to interject here is that my "Hits as % bytes sent" in
3.5.x has always been in the 2 to 5% range, but there are periods (sometimes
long ones) where the inbound traffic to squid is much higher than the
outbound. When I switch to 4.0.x, I am now running about -27% (note,
negative twenty-seven) as bytes, which makes me suspect it is logging the
higher inbound than outbound now. So, apparently, this unsupported protocol
is triggering some sort of large download, but does not end up going to the
client. Obviously, this is not good, so I'm digging deeper and I'd
appreciate any pointers that come to mind.



I would like to know a couple things, first: is there some debugging level
other than ALL,9 that might give me some illumination? ALL,9 generates
about 15 MB of debug log per second at my current load level, and these
errors aren't real frequent, so I end up with ~ 400 MB of text that needs to
be sifted through. As you can imagine, that can be a bit brutal. If I
could even identify the other end point, I would at least be able to figure
out if this is Apple, Microsoft, Android, something else, and perhaps get
closer to being able to replicate the error. Thoughts would be appreciated.
In case its relevant, my compile options were:



./configure --prefix=/usr --localstatedir=/var
--libexecdir=/usr/lib/squid --srcdir=. --datadir=/usr/share/squid
--sysconfdir=/etc/squid --with-default-user=proxy --with-logdir=/var/log
--with-pidfile=/var/run/squid.pid --enable-linux-netfilter
--enable-cache-digests --enable-storeio=ufs,aufs,diskd,rock
--enable-async-io=30 --enable-http-violations --enable-zph-qos
--with-netfilter-conntrack --with-filedescriptors=65536 --with-large-files



Note that, a lot of those are based on a very long and tedious
guess-and-check session last year, and some of them probably are totally
irrelevant to my setup (I'm looking at you --enable-http-violations and
--enable-zph-qos) but hey, what is life without the unnecessary noise from
lazily copy-and-pasting old compile lines.



My configuration, edited to eliminate my numerous comments and hashed out
lines of experiments and to hide network identifiers, is pasted below.



///BEGIN /etc/squid/squid.conf

workers 4



acl localnet src 10.0.0.0/8 # RFC1918 possible internal network

acl localnet src 172.16.0.0/12 # RFC1918 possible internal network

acl localnet src 192.168.0.0/16 # RFC1918 possible internal network



acl localnet src fc00::/7 # RFC 4193 local private network range

acl localnet src fe80::/10 # RFC 4291 link-local (directly plugged)
machines



acl SSL_ports port 443

acl Safe_ports port 80 # http

acl Safe_ports port 21 # ftp

acl Safe_ports port 443 # https

acl Safe_ports port 70 # gopher

acl Safe_ports port 210 # wais

acl Safe_ports port 1025-65535 # unregistered ports

acl Safe_ports port 280 # http-mgmt

acl Safe_ports port 488 # gss-http

acl Safe_ports port 591 # filemaker

acl Safe_ports port 777 # multiling http

acl CONNECT method CONNECT



#Note that I added this line when testing Squid-4, it is commented out when
running Squid-3

on_unsupported_protocol tunnel all



http_access allow localnet

http_access allow localhost



http_access deny !Safe_ports

http_access deny CONNECT !SSL_Ports



# And finally deny all other access to this proxy

http_access deny all



http_port 3128

http_port 3129 tproxy



visible_hostname squid-proxy.mydomain.tld



acl updatesites dstdom_regex "/etc/squid/updatesites.txt"



icp_port 3130

htcp_port 4827

icp_access allow localnet

icp_access deny all



#Testing QoS Marks

qos_flows tos local-hit=0x30

qos_flows mark local-hit=0x30

qos_flows mark miss=0x0



maximum_object_size 800 MB updatesites

maximum_object_size 80 MB !updatesites

range_offset_limit 0

quick_abort_min 0 KB



store_id_program /usr/lib/squid/storeid_file_rewrite
/etc/squid/storeid_rewrite.conf

store_id_children 10 startup=3 idle=1 concurrency=0



cache_mem 16384 MB

maximum_object_size_in_memory 8 MB



cache_swap_low 90

cache_swap_high 95



cache_store_log daemon:/var/log/squid/store.log

access_log daemon:/var/log/squid/access.log squid

cache_log /var/log/squid/cache.log

logfile_rotate 40

max_open_disk_fds 64000



refresh_pattern ^ftp: 1440 20% 10080

refresh_pattern ^gopher: 1440 0% 1440

refresh_pattern -i (/cgi-bin/|\?) 0 0% 0

refresh_pattern . 0 20% 4320



cache_mgr ***@mydomain.td

cache_effective_user proxy

cache_effective_group proxy

///END /etc/squid/squid.conf



So, I have a few questions I guess:

(1) For one thing, what are the implications of "on_unsupported_protocol
tunnel all"? I did it as a quick attempt to see if that had any new and
interesting impacts, but is it safe-ish? Am I letting the bad-guys come
pouring through with that?

(2) What debug levels should I be thinking about to try and figure out
what is happening. Seems like we won't get very far without identifying
what is throwing that error.

(3) Has anyone else seen this? Right now, for example (after 10 minutes
of typing an email) I'm actually running -61% Hits as Bytes! (Negative!)
Ouch!



Thanks!



--Jester
Alex Rousskov
2016-10-15 17:57:38 UTC
Permalink
Post by Jester Purtteman
1476535967.570 0 xxx.xxx.xxx.xxx TAG_NONE/400 4538 NONE
error:invalid-request - HIER_NONE/- text/html
After some digging on this list I began to suspect websockets or other
non-http traffic coming across port 80. I decided to try squid
4.0.15 with on-unsupported-protocol. I get what I am guessing to be the
1476536369.742 0 xxx.xxx.xxx.xxx NONE/000 0 NONE
error:transaction-end-before-headers - HIER_NONE/- -
An interesting point to interject here is that my “Hits as % bytes sent”
in 3.5.x has always been in the 2 to 5% range, but there are periods
(sometimes long ones) where the inbound traffic to squid is much higher
than the outbound. When I switch to 4.0.x, I am now running about -27%
(note, negative twenty-seven) as bytes, which makes me suspect it is
logging the higher inbound than outbound now.
That difference sounds potentially important to me. I encourage you to
figure out what causes it (which is exactly what you have started doing,
of course).
Post by Jester Purtteman
So, apparently, this
unsupported protocol is triggering some sort of large download, but does
not end up going to the client.
There might be some exceptions, but non-tunneled
error:transaction-end-before-headers are not supposed to trigger a
download. Squid does not know what to download because Squid cannot
parse the request...

When enabled, tunneled error:transaction-end-before-headers do download
data, of course, but you may be able to measure how much they download
then by finding the corresponding transactions in access log.
Post by Jester Purtteman
I would like to know a couple things, first: is there some debugging
level other than ALL,9 that might give me some illumination?
I am sure there is. Once you know what the problem/cause is, it is easy
to come up with the corresponding optimal debug_options settings to show
the cause. Before that? You can try various settings (debugging sections
are semi-documented in doc/debug-sections.txt), but it is often not
worth your time.
Post by Jester Purtteman
ALL,9
generates about 15 MB of debug log per second at my current load level,
and these errors aren’t real frequent, so I end up with ~ 400 MB of text
that needs to be sifted through. As you can imagine, that can be a bit
brutal.
I do not quite understand the problem of a 400MB ALL,9 cache.log. IMHO,
it is not much more difficult to deal with than a 1MB ALL,9 cache.log:
Either you can navigate ALL,9 noise or you cannot; the total log size
does not really matter much beyond a few MB levels IMO (provided you
have enough disk space to store it and logging itself does not slow
Squid down too much to reproduce the problem).

Please note that I am not saying that you are doing something wrong or
even complaining about a non-problem. I am only saying that I do not see
a [solvable in the context of this email thread] problem with ALL,9 logs
so I cannot help you solve it.
Post by Jester Purtteman
(1) For one thing, what are the implications of
“on_unsupported_protocol tunnel all”?
In rough terms, everything that is not SSL or HTTP will be tunneled.
AFAIK, non-HTTP inside SSL will not be tunneled yet (there is an
important patch for that going through squid-dev review right now).
Post by Jester Purtteman
I did it as a quick attempt to
see if that had any new and interesting impacts, but is it safe-ish?
I do not know what you mean by "safe", but, in a sense, it is more
"safe" than having no proxy at all because your access.log will show you
those tunnels.
Post by Jester Purtteman
Am I letting the bad-guys come pouring through with that?
I personally do not know -- in general, it depends on the bad guys in
your environment. Others here may have deployment-specific stories that
I lack.
Post by Jester Purtteman
(2) What debug levels should I be thinking about to try and figure
out what is happening. Seems like we won’t get very far without
identifying what is throwing that error.
If you do not want to deal with ALL,9, I would recommend this combination:

* a packet capture (you can limit the captured packet size if needed)
* access log format that logs all IPs and all TCP ports so that you can
match an access log line with captured packets/connection.
Post by Jester Purtteman
(3) Has anyone else seen this? Right now, for example (after 10
minutes of typing an email) I’m actually running -61% Hits as Bytes!
(Negative!) Ouch!
As I said earlier, I am not sure the negative byte hit ratio is actually
related to these errors, but it could be. Squid v4 fixed a few
size-related accounting bugs. It is possible that we screw something up
in the process or that your actual byte hit ratio was always bad (but
you did not know about it because Squid was lying to you).

Can you compare Squid-reported numbers with OS/interfaces-reported
numbers somehow? If OS/interface numbers confirm v3.5 report but not
v4.0 report, then there is a bug we need to fix.


HTH,

Alex.
Jester Purtteman
2016-10-15 23:16:55 UTC
Permalink
Thanks for the reply Alex, I've embedded some comments below, and I will
get back to you with additional info after some testing.
Post by Alex Rousskov
Post by Jester Purtteman
1476535967.570 0 xxx.xxx.xxx.xxx TAG_NONE/400 4538 NONE
error:invalid-request - HIER_NONE/- text/html
After some digging on this list I began to suspect websockets or other
non-http traffic coming across port 80. I decided to try squid
4.0.15 with on-unsupported-protocol. I get what I am guessing to be the
1476536369.742 0 xxx.xxx.xxx.xxx NONE/000 0 NONE
error:transaction-end-before-headers - HIER_NONE/- -
An interesting point to interject here is that my “Hits as % bytes sent”
in 3.5.x has always been in the 2 to 5% range, but there are periods
(sometimes long ones) where the inbound traffic to squid is much higher
than the outbound. When I switch to 4.0.x, I am now running about -27%
(note, negative twenty-seven) as bytes, which makes me suspect it is
logging the higher inbound than outbound now.
That difference sounds potentially important to me. I encourage you to
figure out what causes it (which is exactly what you have started doing,
of course).
Post by Jester Purtteman
So, apparently, this
unsupported protocol is triggering some sort of large download, but does
not end up going to the client.
There might be some exceptions, but non-tunneled
error:transaction-end-before-headers are not supposed to trigger a
download. Squid does not know what to download because Squid cannot
parse the request...
When enabled, tunneled error:transaction-end-before-headers do download
data, of course, but you may be able to measure how much they download
then by finding the corresponding transactions in access log.
Post by Jester Purtteman
I would like to know a couple things, first: is there some debugging
level other than ALL,9 that might give me some illumination?
I am sure there is. Once you know what the problem/cause is, it is easy
to come up with the corresponding optimal debug_options settings to show
the cause. Before that? You can try various settings (debugging sections
are semi-documented in doc/debug-sections.txt), but it is often not
worth your time.
Post by Jester Purtteman
ALL,9
generates about 15 MB of debug log per second at my current load level,
and these errors aren’t real frequent, so I end up with ~ 400 MB of text
that needs to be sifted through. As you can imagine, that can be a bit
brutal.
I do not quite understand the problem of a 400MB ALL,9 cache.log. IMHO,
Either you can navigate ALL,9 noise or you cannot; the total log size
does not really matter much beyond a few MB levels IMO (provided you
have enough disk space to store it and logging itself does not slow
Squid down too much to reproduce the problem).
Please note that I am not saying that you are doing something wrong or
even complaining about a non-problem. I am only saying that I do not see
a [solvable in the context of this email thread] problem with ALL,9 logs
so I cannot help you solve it.
I gotcha, I'll start the digging. I was curious what it would take to
get the a dump of what the request looked like, but its some non-HTTP,
so that probably doesn't make sense anyway
Post by Alex Rousskov
Post by Jester Purtteman
(1) For one thing, what are the implications of
“on_unsupported_protocol tunnel all”?
In rough terms, everything that is not SSL or HTTP will be tunneled.
AFAIK, non-HTTP inside SSL will not be tunneled yet (there is an
important patch for that going through squid-dev review right now).
Okay, that i think is the desired behavoir in my case, I basically want
squid to handle the requests that it knows how deal with, and ignore and
pass along the ones it cannot. The environment is pretty promiscuous,
and i don't need to restrict clients from using non-http stuff.
Post by Alex Rousskov
Post by Jester Purtteman
I did it as a quick attempt to
see if that had any new and interesting impacts, but is it safe-ish?
I do not know what you mean by "safe", but, in a sense, it is more
"safe" than having no proxy at all because your access.log will show you
those tunnels.
Safe as in, not causing security holes.
Post by Alex Rousskov
Post by Jester Purtteman
Am I letting the bad-guys come pouring through with that?
I personally do not know -- in general, it depends on the bad guys in
your environment. Others here may have deployment-specific stories that
I lack.
I was concerned that by permitting connects through the proxy there may
be security concerns, but i have the thing bolted down so that only our
clients can get to it, and it doesn't have much access to anywhere else
on the network, so I am thinking that is probably fine.
Post by Alex Rousskov
Post by Jester Purtteman
(2) What debug levels should I be thinking about to try and figure
out what is happening. Seems like we won’t get very far without
identifying what is throwing that error.
* a packet capture (you can limit the captured packet size if needed)
* access log format that logs all IPs and all TCP ports so that you can
match an access log line with captured packets/connection.
I'll just suck it up and do an ALL,9 capture. It will take me a couple
tries to get a capture that includes the problem, but it isn't a huge
problem. The packet capture idea is a good one too, I'll do that as
well. Similar issue (sifting a small amount of info out of an ocean of
data) but I think valuable.
Post by Alex Rousskov
Post by Jester Purtteman
(3) Has anyone else seen this? Right now, for example (after 10
minutes of typing an email) I’m actually running -61% Hits as Bytes!
(Negative!) Ouch!
As I said earlier, I am not sure the negative byte hit ratio is actually
related to these errors, but it could be. Squid v4 fixed a few
size-related accounting bugs. It is possible that we screw something up
in the process or that your actual byte hit ratio was always bad (but
you did not know about it because Squid was lying to you).
Can you compare Squid-reported numbers with OS/interfaces-reported
numbers somehow? If OS/interface numbers confirm v3.5 report but not
v4.0 report, then there is a bug we need to fix.
I can, i have interface statistics that I should be able to get pretty
close with. It won't be perfect (my ssh session will throw it off a
tad, etc) but it will be able to detect a 30%+ problem without issue.
I'll do some experiments and share the result.
Post by Alex Rousskov
HTH,
Alex.
_______________________________________________
squid-users mailing list
http://lists.squid-cache.org/listinfo/squid-users
Alex Rousskov
2016-10-16 00:27:00 UTC
Permalink
The packet capture idea is a good one too, I'll do that as well.
Similar issue (sifting a small amount of info out of an ocean of data)
but I think valuable.
With a packet capture and a matching access.log, it is easy to find the
offending connections without Squid-specific knowledge because you can
ask Wireshark or a similar tool to locate the packets that match the
logged IPs and ports (the ones on the error:... lines in access.log).
After that, you just follow the TCP stream you found and look at its
packet payload to identify the protocol/intent...

With cache.log, the procedure is similar but there is no nice interface
to "follow the identified transaction". There are some very useful
scripts that can follow descriptors and internal Squid "jobs", but they
do require some low-level Squid-specific knowledge and experience to
operate correctly (unfortunately). Besides, you may not see the payload,
especially if Squid does not try to parse it.

Alex.
Jester Purtteman
2016-11-02 13:47:42 UTC
Permalink
Hello Alex et al.

So, life got on top of me instead, and I haven't had time to do much testing.... Well, not totally true, did some testing but haven't discovered anything useful. But, from the "new and interesting" department, I updated 3.5 build r14102, and I am not seeing any of the 'invalid-request' lines anymore. Don't know what to make of that, but could be, problem solved.

Jester Purtteman, P.E.
OptimERA Inc
-----Original Message-----
Sent: Saturday, October 15, 2016 5:27 PM
Subject: Re: [squid-users] Identifying the source of Invalid-request (squid 3) -
error:transaction-end-before-headers (Squid 4)
The packet capture idea is a good one too, I'll do that as well.
Similar issue (sifting a small amount of info out of an ocean of data)
but I think valuable.
With a packet capture and a matching access.log, it is easy to find the
offending connections without Squid-specific knowledge because you can
ask Wireshark or a similar tool to locate the packets that match the logged IPs
and ports (the ones on the error:... lines in access.log).
After that, you just follow the TCP stream you found and look at its packet
payload to identify the protocol/intent...
With cache.log, the procedure is similar but there is no nice interface to
"follow the identified transaction". There are some very useful scripts that
can follow descriptors and internal Squid "jobs", but they do require some
low-level Squid-specific knowledge and experience to operate correctly
(unfortunately). Besides, you may not see the payload, especially if Squid
does not try to parse it.
Alex.
Amos Jeffries
2016-11-03 01:58:19 UTC
Permalink
Post by Jester Purtteman
Hello Alex et al.
So, life got on top of me instead, and I haven't had time to do much testing.... Well, not totally true, did some testing but haven't discovered anything useful. But, from the "new and interesting" department, I updated 3.5 build r14102, and I am not seeing any of the 'invalid-request' lines anymore. Don't know what to make of that, but could be, problem solved.
Interesting. Nothing changed between 14098 (3.5.22 release) and 14102
relevant to connections. Just build fixes and documentation.

Cheers
Amos
Jester Purtteman
2016-11-03 14:13:36 UTC
Permalink
That is how I felt about it, since nothing in the changelog indicated anything that would impact that. I'll switch back to the old version tonight, maybe the client traffic that was inducing the entries that isn't trying anymore? I don't have a good answer, I'll try the old version tomorrow and see if the invalid-request lines come back.

Jester
-----Original Message-----
Behalf Of Amos Jeffries
Sent: Wednesday, November 2, 2016 6:58 PM
Subject: Re: [squid-users] Identifying the source of Invalid-request (squid 3) -
error:transaction-end-before-headers (Squid 4)
Hello Alex et al.
So, life got on top of me instead, and I haven't had time to do much
testing.... Well, not totally true, did some testing but haven't discovered
anything useful. But, from the "new and interesting" department, I updated
3.5 build r14102, and I am not seeing any of the 'invalid-request' lines
anymore. Don't know what to make of that, but could be, problem solved.
Interesting. Nothing changed between 14098 (3.5.22 release) and 14102
relevant to connections. Just build fixes and documentation.
Cheers
Amos
_______________________________________________
squid-users mailing list
http://lists.squid-cache.org/listinfo/squid-users
Loading...