Bug 13023 - Core Update 172: squid-asnbl-helper segfaulting and shutdown squid
Summary: Core Update 172: squid-asnbl-helper segfaulting and shutdown squid
Status: CLOSED FIXED
Alias: None
Product: IPFire
Classification: Unclassified
Component: --- (show other bugs)
Version: 2
Hardware: unspecified Unspecified
: Will affect an average number of users Major Usability
Assignee: Adolf Belka
QA Contact:
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-12-30 19:30 UTC by Nicolas Pöhlmann
Modified: 2023-08-15 08:26 UTC (History)
4 users (show)

See Also:


Attachments
Make squid-asnbl compatible with python3 (4.03 KB, patch)
2023-03-08 06:12 UTC, Nicolas Pöhlmann
Details
Make squid-asnbl compatible with python3 (V2) (3.72 KB, patch)
2023-03-18 01:56 UTC, Nicolas Pöhlmann
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Nicolas Pöhlmann 2022-12-30 19:30:58 UTC
After upgrading to Core Update 172 from 171, squid-asnbl-helper crashes squid after a few minutes with the following error:

Traceback (most recent call last):
  File "/usr/bin/asnbl-helper.py", line 488, in <module>
    resolvedasn = resolve_asn(singleip, ASNDB)
  File "/usr/bin/asnbl-helper.py", line 184, in resolve_asn
    result = asndb.lookup(str(ipaddr))

Which results in squid parent to detect a too fast restart of the children(s) and to shutdown.

squid[-]: 	Squid Parent: squid-1 process - exited with status 1
squid[-]: 	Squid Parent: (squid-1) process - started
squid[-]: 	Squid Parent: squid-1 process - exited with status 1
squid[-]: 	Squid Parent: (squid-1) process - started
squid[-]: 	Squid Parent: squid-1 process - exited with status 1
squid[-]: 	Squid Parent: (squid-1) process - started
squid[-]: 	Squid Parent: squid-1 process - exited with status 1
squid[-]: 	Squid Parent: (squid-1) process - started
squid[-]: 	Squid Parent: squid-1 process - exited with status 1
squid[-]: 	Squid Parent: (squid-1) process - started
squid[-]: 	Squid Parent: squid-1 process - exited with status 1
squid[-]: 	Squid Parent: squid-1 process - will not be restarted for 3600 seconds due t o repeated, frequent failures
squid[-]: 	Exiting due to repeated, frequent failures

Also when fast flux is activated, squid-asnbl-helper is complaining about a missing ASN-Blocklist source:

squid-asnbl-helper[-] WARN: No ASNBL configured. This is acceptable as long as this script is configured to do anything, you just have been warned...
squid-asnbl-helper[-] INFO: ASN database operational - excellent. Waiting for input...

The option for ACTIVE_ASNBLS is empty in /var/ipfire/proxy/asnbl-helper.conf maybe as Spamhaus has closed it's ASN-Blocklist?

As a workaround, the whole anomaly detection needs to be deselected for a normal squid behaviour.
Comment 1 Matthias Fischer 2022-12-31 14:25:26 UTC
Confirmed.

=> https://lists.ipfire.org/pipermail/development/2022-December/015115.html

Thanks for reporting.
Comment 2 Peter Müller 2023-01-07 21:08:24 UTC
This is odd, I haven't seen this error on my machines. Will investigate - perhaps a spillover effect from the libloc update?
Comment 3 Nicolas Pöhlmann 2023-01-15 07:59:02 UTC
(In reply to Peter Müller from comment #2)
> This is odd, I haven't seen this error on my machines. Will investigate -
> perhaps a spillover effect from the libloc update?

Don't know much about it, but when running a location db update I get the following error message:

# location --debug update
HTTP GET Request to location.ipfire.org
        URL: https://location.ipfire.org/databases/1/location.db.xz
        Headers:
                If-modified-since: Sun, 15 Jan 2023 05:24:18 GMT
                User-agent: location/0.9.15
HTTP Response: 200
        Headers:
                date: Sun, 15 Jan 2023 07:54:05 GMT
                last-modified: Sun, 15 Jan 2023 05:26:54 GMT
                etag: "513f2c-5f246b6f4867e"
                accept-ranges: bytes
                content-length: 5324588
                x-content-type-options: nosniff
                x-frame-options: deny
                referrer-policy: strict-origin
                x-xss-protection: 1; mode=block
                content-type: application/x-xz
                strict-transport-security: max-age=31536000; includeSubDomains; preload
                connection: close
Opening downloaded database at /var/lib/location/tmp3p87n16h
Downloaded new database from Sun, 15 Jan 2023 05:24:18 GMT
libloc: loc_database_verify: Error verifying the first signature: error:0D07207B:asn1 encoding routines:ASN1_get_object:header too long
Comment 4 Matthias Fischer 2023-01-23 17:36:11 UTC
Confirmed.

Exactly.
Comment 6 Nicolas Pöhlmann 2023-03-01 01:00:39 UTC
Sorry to report, but this problem isn't fixed completely in Core 173. Shortly after an automatic update of the blacklist, following error shutting down squid completely again:

Mar 01 01:27:14 squid-asnbl-helper[32004] WARN: Destination 'IP_ADDRESS_OF_PROXY' resolves to IP addresses 'IP_ADDRESS_OF_PROXY' without corresponding ASN, probably selectively announced
Mar 01 01:27:14 squid-asnbl-helper[32004] INFO: Denying access to destination 'IP_ADDRESS_OF_PROXY' due to suspected selective announcements
Traceback (most recent call last):
  File "/usr/bin/asnbl-helper.py", line 493, in <module>
    resolvedasn = resolve_asn(singleip, ASNDB)
  File "/usr/bin/asnbl-helper.py", line 193, in resolve_asn
    if not result.asn:
UnboundLocalError: local variable 'result' referenced before assignment
2023/03/01 01:34:23 kid1| WARNING: external_acl_type #Hlpr3 exited
...
Mar 01 01:34:24 squid-asnbl-helper[32478] WARN: No ASNBL configured. This is acceptable as long as this script is configured to do anything, you just have been warned...
Traceback (most recent call last):
  File "/usr/bin/asnbl-helper.py", line 493, in <module>
    resolvedasn = resolve_asn(singleip, ASNDB)
  File "/usr/bin/asnbl-helper.py", line 193, in resolve_asn
    if not result.asn:
UnboundLocalError: local variable 'result' referenced before assignment
...
2023/03/01 01:34:24 kid1| FATAL: The external_acl_type helpers are crashing too rapidly, need help!

So I think, this bug should be reopened until a full fix is available.
Comment 7 Nicolas Pöhlmann 2023-03-02 03:06:33 UTC
After some more debugging and log file analysis, the crash has no dependency to the blacklist update. The helper crashes multiple times every hour the last 24 hours, sometimes so often that it shutdown squid completely. After the deactivation of the fast flux detection, squid is running stable without a problem.

And I already checked that /usr/bin/asnbl-helper.py is identical to V0.2.4. So the latest available patch was correctly installed with the update.
Comment 8 Peter Müller 2023-03-05 14:49:04 UTC
Thank you for reporting back. I'm getting the feeling that we are dealing with multiple bugs in squid-asnbl here, and some of these are triggered only under certain circumstances. Thus far, I was unsuccessful trying to spot or reproduce this behaviour on any IPFire installations I have access to.

Having updated to Core Update 173, does libloc work properly again? Precisely, what does the output of "location --debug update" read?
Comment 9 Matthias Fischer 2023-03-05 15:53:43 UTC
Jm2C:

...
root@ipfire: /var/log/squid # location --debug update
Already on the latest version
...

On Core 173: no seen problems here. 'squid' is running (v5.8).
Comment 10 Nicolas Pöhlmann 2023-03-08 06:12:44 UTC
Created attachment 1140 [details]
Make squid-asnbl compatible with python3

This patch makes the functions 'resolve_asn' and 'asndb_response_tests' fully compatible with python3 and should resolve their 'UnboundLocalError' errors.
Comment 11 Nicolas Pöhlmann 2023-03-08 06:57:30 UTC
(In reply to Matthias Fischer from comment #9)
> On Core 173: no seen problems here. 'squid' is running (v5.8).

What's the expected squid version for CU173? Is it v5.7 or v5.8?

# squid --version
Squid Cache: Version 5.7
Service Name: squid
...

Location db reports also: "Already on the latest version".

Nethertheless, even when it was hard to deal with this bug, I found the problem and tested the patch for about 32 hours right now to make squid-asnbl-helper fully compatible to python3. Also the patch deals with how V3 sometimes falsely "recognizes" local result variables as global. Benefit about the changes will be a narrowed function definition, but I don't think there will be a measurable lower memory consumption or speed improvement.

For all who want more details what this patch do and why this error is so hard to reproduce:

First I discovered that the helper only sometimes throwing the error and quits even for the same values and queries. Also the timespan until the error happens was quite different for every restart of squid  (minutes to hours). And it does not depend on the traffic on the proxy, even one connection could cause a crash while ten or hundrets won't. After a few days of testing different solutions and done a lot of debugging, redesigning the function did not fully solve the problem. Such standard things like checking the result variable for NULL (or it's equivalent "is None" in python) before evaluating it's subfunction produces the exact same error message. But with that knowledge it more and more turns out that python3 sometimes 'detects' the local return variable if it was a misused global. So for a full fix, the return variable also has to be initialized that python3 won't detect it's usage as an 'UnboundLocalError' to succesfully fix this bug.
Comment 12 Adolf Belka 2023-03-08 13:56:15 UTC
(In reply to Nicolas Pöhlmann from comment #11)
> 
> What's the expected squid version for CU173? Is it v5.7 or v5.8?
> 
> # squid --version
> Squid Cache: Version 5.7
> Service Name: squid
> ...
> 
CU173 is running with 5.7

5.8 has been submitted by Matthias into what will become CU174.
Comment 13 Nicolas Pöhlmann 2023-03-18 01:56:38 UTC
Created attachment 1141 [details]
Make squid-asnbl compatible with python3 (V2)

Slightly updated the patch. V2 removed the global keyword as it's not needed anymore. The patch was tested and running 10 days without any problem.
Comment 14 Adolf Belka 2023-03-22 13:54:23 UTC
I will pick this up and submit a patch into the dev mailing list as just having it in Bugzilla won't change anything.

I would be good if you could also flag this up as an issue to the git site for the squid-asnbl package so they can fix it at source.
Comment 15 Adolf Belka 2023-03-22 18:32:27 UTC
Submission patch has been created and submitted to the dev mailing list and to patchwork.

https://lists.ipfire.org/pipermail/development/2023-March/015692.html
https://patchwork.ipfire.org/project/ipfire/patch/20230322182852.3449030-1-adolf.belka@ipfire.org/
Comment 16 Peter Müller 2023-07-01 19:33:25 UTC
https://git.ipfire.org/?p=ipfire-2.x.git;a=commit;h=df30842927aac053cd669cee8105d2f39345c3b1

Apologies again for my tardy reaction on this. Will ping upstream again to get this fixed. Patch by Adolf has been merged for Core Update 177.
Comment 17 Adolf Belka 2023-07-27 20:25:47 UTC
Core Update 177 Testing released
Comment 18 Adolf Belka 2023-07-29 16:35:55 UTC
Can the original poster @Nicolas confirm if the problem is resolved in Core Update 177 Testing?
Comment 19 Nicolas Pöhlmann 2023-07-30 01:10:18 UTC
(In reply to Adolf Belka from comment #18)
> Can the original poster @Nicolas confirm if the problem is resolved in Core
> Update 177 Testing?

I currently didn't have a chance to setup a dev-machine to check it for CU 177-Testing until mid september (a little bit too late for CU 177 I think). Sorry.

But I can confirm that this patch has been running stable on 3 different production machines since CU 173 without a single issue in almost 5 months now.
Also I've done stress tests for this patch with up to 100 childrens on a 1 Gbit/s-Port. Result: PASS! ;)

Also other people with this problem reported it as fully functional on the community forum.

I wouldn't expect any complications for CU 177 as neither squid nor python isn't changed in any dramatic way on the git like CU 172 did, when the problem started.

Also keep in mind that this patch is designed as a "lowlevel" one with as few changes as possible. It will only make sure that the python interpreter doesn't sometimes detect the local variable as a global on fast machines. It has not a single more code change than needed to achieve that goal and to make it as stable as it can be.

And last but not least if the old script will run on squid in CU 174, this patch will definitely too.
Comment 20 Adolf Belka 2023-07-30 08:31:01 UTC
(In reply to Nicolas Pöhlmann from comment #19)
> I currently didn't have a chance to setup a dev-machine to check it for CU
> 177-Testing until mid september (a little bit too late for CU 177 I think).
> Sorry.
> 
No problems. I understand.

The test is more to ensure that the patch as merged into CU177 is doing things correctly and that no error occurred in my patch creation translation and the merging process.

I am pretty confident that everything will be fine. When CU177 is released then this bug will get closed as fixed.
When you do your update to CU177 then if everything is fine no problems.
If you were to find some issue you can just re-open the bug again but I don't expect that to be needed.
Comment 21 Adolf Belka 2023-07-30 08:44:13 UTC
The git site for the package that is used in IPFire is

https://github.com/twesterhever/squid-asnbl

I notice that this bug is not listed in the issues list at that git repo.

Is there a possibility that @Nicolas could raise this issue on that git site. When fixed I will do the patch update for the new released version.

I am not familiar enough with the coding to be able to answer any questions from the squid-asnbl developer.

It would just seem good to get the problem actually fixed upstream.
Comment 22 Adolf Belka 2023-07-30 08:49:24 UTC
Sorry I should have looked at the commits first rather than the issues page on the git repo before I sent the last bug comment.

@Nicolas, you don't have to do anything.

It looks like the developer has raised a commit which references this bug number and it was raised on January 7th 2023.

https://github.com/twesterhever/squid-asnbl/commit/c10ccaa35e0e0ab7a3216f261cfef0cdfaf3f185

I can look at raising a patch that applies that commit in place of @Nicolas changes.
Comment 23 Adolf Belka 2023-07-30 08:58:25 UTC
Sorry for all confusion.

I understand better now the sequence of things.

The commit I mentioned is actually in the current released version of squid-asnbl-0.2.4 (confirmed by looking in the IPFire source tarball)

It references this bug but was obviously related to some of the earlier comments in this bug and was believed to have dealt with the problem.

So the squid-asnbl developer probably thinks they did the bug fix and are not aware that the root cause of this bug eventually ended up related to how python3 is dealing with global variables.

So it would still be good if @Nicolas could raise an issue on the squid-asnbl git site.
Comment 24 Nicolas Pöhlmann 2023-07-30 21:57:47 UTC
No problem, here is the bug report on the squid-asnbl git site:

https://github.com/twesterhever/squid-asnbl/issues/2
Comment 26 Adolf Belka 2023-08-15 08:26:13 UTC
Just for final completion on this bug, the patch from @Nicolas has been merged into squid-asnbl in version 0.2.5 which will be installed into CU179.