Bug 12595 - location-importer.in is miscalculating CIDR prefix for inetnum objects not aligned to subnet boundaries
Status: CLOSED FIXED
Alias: None
Product: Location Database
Classification: Unclassified
Component: libloc
Version: unspecified
Hardware: all All
Importance: Will affect most users, Major Usability
Assignee: Peter Müller
QA Contact: Michael Tremer
Reported: 2021-03-21 17:17 UTC by Peter Müller
Modified: 2021-04-01 15:39 UTC

Description Peter Müller 2021-03-21 17:17:53 UTC
Sample as reported in https://bugzilla.ipfire.org/show_bug.cgi?id=12421#c8:

% Information related to '140.117.0.0 - 140.138.255.255'

% Abuse contact for '140.117.0.0 - 140.138.255.255' is 'hostmaster@twnic.net.tw'

inetnum:        140.117.0.0 - 140.138.255.255
netname:        TANET-BNETA
descr:          imported inetnum object for MOEC
country:        TW
admin-c:        TA61-AP
tech-c:         TA61-AP
status:         ALLOCATED PORTABLE
mnt-by:         MAINT-TW-TWNIC
mnt-irt:        IRT-TWNIC-AP
last-modified:  2015-12-01T22:24:36Z
source:         APNIC

irt:            IRT-TWNIC-AP
address:        Taipei, Taiwan, 100
e-mail:         hostmaster@twnic.net.tw
abuse-mailbox:  hostmaster@twnic.net.tw
admin-c:        TWA2-AP
tech-c:         TWA2-AP
auth:           # Filtered
remarks:        Please note that TWNIC is not an ISP and is not empowered
remarks:        to investigate complaints of network abuse.
mnt-by:         MAINT-TW-TWNIC
last-modified:  2015-10-08T07:58:24Z
source:         APNIC

person:         TANET ADMIN
nic-hdl:        TA61-AP
e-mail:         tanetadm@moe.edu.tw
address:        12F, No 106, Sec. 2, Heping E. Rd., Taipei
address:        Taipei, 106, R.O.C
phone:          +886-2-2737-7044
fax-no:         +886-2-2737-7043
country:        TW
mnt-by:         MAINT-TW-TWNIC
last-modified:  2009-02-12T02:40:31Z
source:         APNIC

% This query was served by the APNIC Whois Service version 1.88.15-SNAPSHOT (WHOIS-NODE2)

We need to investigate the availability of data for TWNIC assignments, and perhaps for other registries as well (KRNIC?).
Comment 1 Peter Müller 2021-03-28 07:25:35 UTC
We are apparently missing the entire /13:

[root@maverick ~]# location lookup 140.135.36.163
140.135.36.163:
  Network                 : 140.128.0.0/13
  Autonomous System       : AS1659

Oddly enough, it does not show up in the APNIC database either. :-/
Comment 2 Peter Müller 2021-03-28 07:56:17 UTC
For IPv4 networks not strictly aligned to subnet boundaries, line 613 in "location-importer.in" (https://git.ipfire.org/?p=location/libloc.git;a=blob;f=src/python/location-importer.in;h=25069250135666a2f1c206e60695f4f3c5bfede0;hb=HEAD#l613) is miscalculating the CIDR prefix:

user@machine:~> python3
Python 3.6.12 (default, Dec 02 2020, 09:44:23) [GCC] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import ipaddress
>>> start_address = ipaddress.ip_address("140.117.0.0")
>>> end_address = ipaddress.ip_address("140.138.255.255")
>>> num_addresses = int(end_address) - int(start_address)
>>> num_addresses
1441791
>>> prefix = 32
>>> import math
>>> math.log(num_addresses, 2)
20.459430618010618
>>> prefix -= math.log(num_addresses, 2)
>>> prefix
11.540569381989382
>>> "%s/%.0f" % (start_address, prefix)
'140.117.0.0/12'

140.117.0.0/12, however, is by no means equal to 140.117.0.0 - 140.138.255.255:

user@machine:~> sipcalc 140.117.0.0/12
-[ipv4 : 140.117.0.0/12] - 0

[CIDR]
Host address		- 140.117.0.0
Host address (decimal)	- 2356477952
Host address (hex)	- 8C750000
Network address		- 140.112.0.0
Network mask		- 255.240.0.0
Network mask (bits)	- 12
Network mask (hex)	- FFF00000
Broadcast address	- 140.127.255.255
Cisco wildcard		- 0.15.255.255
Addresses in network	- 1048576
Network range		- 140.112.0.0 - 140.127.255.255
Usable range		- 140.112.0.1 - 140.127.255.254

-

We need a different method of calculating the CIDR prefix when an inetnum object in the RIR data is not strictly aligned to subnet boundaries. :-/
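
To illustrate what "not strictly aligned to subnet boundaries" means here, the following is a minimal sketch of an alignment check. The helper name is_single_cidr is hypothetical and is not part of location-importer.in. It assumes an inclusive range, so the size is end - start + 1, and it treats a range as one CIDR block only if that size is a power of two and the start address is a multiple of it.

```python
import ipaddress

def is_single_cidr(first, last):
    """Return True if the inclusive range first..last is exactly one CIDR block.

    Hypothetical helper for illustration only.
    """
    start = int(ipaddress.ip_address(first))
    end = int(ipaddress.ip_address(last))
    size = end - start + 1  # inclusive range, hence the +1

    # One CIDR block needs a power-of-two size and a start aligned to that size
    return size > 0 and size & (size - 1) == 0 and start % size == 0

# The /12 reported by sipcalc above is a properly aligned block
print(is_single_cidr("140.112.0.0", "140.127.255.255"))  # True

# The TANET-BNETA inetnum spans 22 /16 networks, which is not a power of two
print(is_single_cidr("140.117.0.0", "140.138.255.255"))  # False
```

A range that fails this check cannot be expressed as a single prefix, no matter how the logarithm is rounded.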
Comment 3 Peter Müller 2021-03-28 07:57:47 UTC
Next question to myself: How often is this happening?
Comment 4 Peter Müller 2021-03-28 08:44:12 UTC
(In reply to Peter Müller from comment #3)
> Next question to myself: How often is this happening?

Answer: Quite often:

root@location02:~/libloc# grep "Miscalculated CIDR mask:" /tmp/log | wc -l
8795
Comment 5 Peter Müller 2021-03-28 09:05:44 UTC
Working on a patch...
Comment 6 Peter Müller 2021-03-28 15:35:40 UTC
Seems like APNIC closes the connection after 6.5 minutes *sigh*:

SQL Query: ROLLBACK
Traceback (most recent call last):
  File "/usr/bin/location-importer", line 1074, in <module>
    main()
  File "/usr/bin/location-importer", line 1072, in main
    c.run()
  File "/usr/bin/location-importer", line 129, in run
    ret = args.func(args)
  File "/usr/bin/location-importer", line 393, in handle_update_whois
    for block in f:
  File "/usr/lib/python3/dist-packages/location/importer.py", line 175, in iterate_over_blocks
    for line in f:
  File "/usr/lib/python3.7/gzip.py", line 374, in readline
    return self._buffer.readline(size)
  File "/usr/lib/python3.7/_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "/usr/lib/python3.7/gzip.py", line 482, in read
    raise EOFError("Compressed file ended before the "
EOFError: Compressed file ended before the end-of-stream marker was reached

real    6m31.118s
user    6m2.278s
sys     0m43.801s
Comment 7 Peter Müller 2021-03-28 16:08:05 UTC
Since computing the subnets in case of misalignments takes significantly longer, we apparently need to download RIR data files fully before processing them. This requires some modifications to the downloader routines in libloc.
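
A rough sketch of the "download fully, then process" idea, not the actual libloc downloader: the function name iterate_after_full_download is hypothetical, and the source argument is assumed to be any binary file-like object (for example, an HTTP response body). The stream is copied to a temporary file first, so slow per-line processing no longer keeps the remote connection open.

```python
import gzip
import shutil
import tempfile

def iterate_after_full_download(source):
    """Spool a gzip-compressed stream to disk, then yield its decoded lines.

    Hypothetical helper for illustration only; `source` is any binary
    file-like object.
    """
    with tempfile.TemporaryFile() as spool:
        # Fetch everything up front; this is the only step that reads `source`
        shutil.copyfileobj(source, spool)
        spool.seek(0)

        # Decompress and process from the local copy at whatever pace is needed
        with gzip.open(spool, mode="rt", encoding="utf-8", errors="replace") as f:
            for line in f:
                yield line.rstrip("\n")
```

With the standard library, source could be the object returned by urllib.request.urlopen(), but that pairing is an assumption for this sketch and not a description of how libloc fetches RIR data.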
Comment 8 Michael Tremer 2021-03-29 20:05:31 UTC
Please use ipaddress.summarize_address_range from the Python 3 standard library. It will give you the smallest list of networks that exactly covers the range from the start to the end IP address.

I have tested this with a lot of input and am quite surprised that they are summarising the data like this, but I guess it makes sense in the bigger picture of keeping these databases as small as possible.
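
For reference, a short sketch of what ipaddress.summarize_address_range returns for the inetnum from the description. This is only an illustration of the standard library call, not the patch itself; the variable names mirror the interpreter session in comment 2.

```python
import ipaddress

start_address = ipaddress.ip_address("140.117.0.0")
end_address = ipaddress.ip_address("140.138.255.255")

# summarize_address_range() yields the networks that exactly cover the
# inclusive range from start_address to end_address
networks = list(ipaddress.summarize_address_range(start_address, end_address))

for network in networks:
    print(network)

# 140.117.0.0/16
# 140.118.0.0/15
# 140.120.0.0/13
# 140.128.0.0/13
# 140.136.0.0/15
# 140.138.0.0/16
```

The fourth entry, 140.128.0.0/13, is the network that comment 1 reports for 140.135.36.163.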
Comment 9 Peter Müller 2021-03-29 20:25:05 UTC
Done: https://patchwork.ipfire.org/patch/3997/
Comment 10 Michael Tremer 2021-03-29 20:27:42 UTC
Lovely :)
Comment 12 Peter Müller 2021-04-01 15:39:12 UTC
libloc 0.9.6 contains the patch mentioned above. It will be shipped to IPFire users in the upcoming Core Update 156 as well. However, our location database server is already using it, so the databases generated within the last two days already contain correct network information for RIR data not aligned to subnet boundaries.

IPFire users will have new databases in place in at most five days from now.

Therefore, I am closing this as being FIXED. In case someone disagrees, please reopen it (and explain why). :-)

https://git.ipfire.org/?p=location/libloc.git;a=commit;h=577e5edb0cf7039d3c05c59b311a3365662fb213