Bug 12676

Summary: Blue zone interface doesn’t get created and fails to start if nic is unset and being used for vlan only
Product: IPFire Reporter: Matt <gitarman94>
Component: --- Assignee: Adolf Belka <adolf.belka>
Status: CLOSED FIXED QA Contact:
Severity: Major Usability    
Priority: - Unknown - CC: adolf.belka, info, jonatan.schlag, michael.tremer
Version: 2   
Hardware: all   
OS: All   

Description Matt 2021-08-11 11:08:56 UTC
The issue appears to have started after a recent software update. The blue0 interface does not appear to get created because it is not assigned to a physical NIC port; however, this was not always the case. The blue interface used to work as a VLAN interface without an assigned NIC. This functionality needs to be restored: an interface should not require a network card to be assigned in order to be used (i.e., as required for VLANs).

Additional details are in the following thread:
https://community.ipfire.org/t/no-blue0-interface-anymore-after-core-155-update/5364
Comment 1 Michael Tremer 2021-08-12 09:01:34 UTC
Could you please post the output of:

> ACTION=add INTERFACE=green0 bash -x /lib/udev/network-hotplug-vlan
Comment 2 Matt 2021-08-12 10:26:20 UTC
+ '[' -n green0 ']'
+ CONFIG_FILE=/var/ipfire/ethernet/vlans
+ '[' -e /var/ipfire/ethernet/vlans ']'
++ /usr/local/bin/readhash /var/ipfire/ethernet/vlans
+ eval GREEN_MAC_ADDRESS= BLUE_PARENT_DEV=00:0d:b9:49:be:dc GREEN_PARENT_DEV= RED_VLAN_ID= BLUE_VLAN_ID=1003 RED_PARENT_DEV= RED_MAC_ADDRESS= ORANGE_MAC_ADDRESS= BLUE_MAC_ADDRESS=02:1b:96:5a:8e:81 GREEN_VLAN_ID= ORANGE_PARENT_DEV= ORANGE_VLAN_ID=
++ GREEN_MAC_ADDRESS=
++ BLUE_PARENT_DEV=00:0d:b9:49:be:dc
++ GREEN_PARENT_DEV=
++ RED_VLAN_ID=
++ BLUE_VLAN_ID=1003
++ RED_PARENT_DEV=
++ RED_MAC_ADDRESS=
++ ORANGE_MAC_ADDRESS=
++ BLUE_MAC_ADDRESS=02:1b:96:5a:8e:81
++ GREEN_VLAN_ID=
++ ORANGE_PARENT_DEV=
++ ORANGE_VLAN_ID=
+ for interface in green0 red0 blue0 orange0
+ case "${interface}" in
+ PARENT_DEV=
+ VLAN_ID=
+ MAC_ADDRESS=
+ '[' '' = green0 ']'
+ continue
+ for interface in green0 red0 blue0 orange0
+ case "${interface}" in
+ PARENT_DEV=
+ VLAN_ID=
+ MAC_ADDRESS=
+ '[' '' = green0 ']'
+ continue
+ for interface in green0 red0 blue0 orange0
+ case "${interface}" in
+ PARENT_DEV=00:0d:b9:49:be:dc
+ VLAN_ID=1003
+ MAC_ADDRESS=02:1b:96:5a:8e:81
+ '[' 00:0d:b9:49:be:dc = green0 ']'
+ continue
+ for interface in green0 red0 blue0 orange0
+ case "${interface}" in
+ PARENT_DEV=
+ VLAN_ID=
+ MAC_ADDRESS=
+ '[' '' = green0 ']'
+ continue
+ exit 0
Comment 3 Michael Tremer 2021-08-12 10:30:43 UTC
It looks like the shell code in line 65 is misinterpreted:

> [ "${PARENT_DEV}" = "${INTERFACE}" ] || [ "${PARENT_DEV}" = "$(</sys/class/net/${INTERFACE}/address)" ] || continue

https://git.ipfire.org/?p=ipfire-2.x.git;a=blob;f=config/udev/network-hotplug-vlan;h=178e1a67ba16647808354522e19d4f81e1e91fb5;hb=HEAD#l65

Could you please change it so that it reads like this and run the command again?

> [ "${PARENT_DEV}" = "${INTERFACE}" -o "${PARENT_DEV}" = "$(</sys/class/net/${INTERFACE}/address)" ] || continue
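A minimal sketch of the check proposed above, wrapped in a hypothetical helper `parent_matches` so it can be exercised outside udev. The `SYSFS_NET` override is an assumption added for testing; on a real system it stays at `/sys/class/net`. The sketch also rejects an empty parent explicitly, because with `-o` an empty `PARENT_DEV` and an unreadable address would otherwise compare equal.

```shell
#!/bin/bash
# Hypothetical wrapper around the parent-device check from comment 3.
# SYSFS_NET is an assumption for testing; it defaults to the real sysfs path.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

# parent_matches PARENT_DEV INTERFACE
# Succeeds if PARENT_DEV is the interface's name or its MAC address.
parent_matches() {
	local parent="$1" iface="$2" addr=""

	# An unset parent (e.g. GREEN_PARENT_DEV= in the trace) never matches.
	[ -n "${parent}" ] || return 1

	# Read the MAC address the kernel reports for the interface, if any.
	if [ -r "${SYSFS_NET}/${iface}/address" ]; then
		addr="$(<"${SYSFS_NET}/${iface}/address")"
	fi

	# Same test as the proposed line: match by name or by MAC address.
	[ "${parent}" = "${iface}" -o "${parent}" = "${addr}" ]
}
```

In the udev script this would be used as `parent_matches "${PARENT_DEV}" "${INTERFACE}" || continue`.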
Comment 4 Matt 2021-08-12 11:18:20 UTC
It looks like both the file in Git and the line you told me to change differ from what I have on my system. I'm going to paste my file contents here before making any modifications, as I don't know whether changing that one line will break anything...

[ -n "${INTERFACE}" ] || exit 2

CONFIG_FILE="/var/ipfire/ethernet/vlans"

# Skip immediately if no configuration file has been found.
[ -e "${CONFIG_FILE}" ] || exit 0

eval $(/usr/local/bin/readhash ${CONFIG_FILE})

for interface in green0 red0 blue0 orange0; do
	case "${interface}" in
		green*)
			PARENT_DEV=${GREEN_PARENT_DEV}
			VLAN_ID=${GREEN_VLAN_ID}
			MAC_ADDRESS=${GREEN_MAC_ADDRESS}
			;;
		red*)
			PARENT_DEV=${RED_PARENT_DEV}
			VLAN_ID=${RED_VLAN_ID}
			MAC_ADDRESS=${RED_MAC_ADDRESS}
			;;
		blue*)
			PARENT_DEV=${BLUE_PARENT_DEV}
			VLAN_ID=${BLUE_VLAN_ID}
			MAC_ADDRESS=${BLUE_MAC_ADDRESS}
			;;
		orange*)
			PARENT_DEV=${ORANGE_PARENT_DEV}
			VLAN_ID=${ORANGE_VLAN_ID}
			MAC_ADDRESS=${ORANGE_MAC_ADDRESS}
			;;
	esac

	# If the parent device does not match the interface that
	# has just come up, we will go on for the next one.
	[ "${PARENT_DEV}" = "${INTERFACE}" ] || continue

	# Check if the interface does already exists.
	# If so, we skip creating it.
	if [ -d "/sys/class/net/${interface}" ]; then
		echo "Interface ${interface} already exists." >&2
		continue
	fi

	if [ -z "${VLAN_ID}" ]; then
		echo "${interface}: You did not set the VLAN ID." >&2
		continue
	fi

	# Build command line.
	command="ip link add link ${PARENT_DEV} name ${interface}"
	if [ -n "${MAC_ADDRESS}" ]; then
		command="${command} address ${MAC_ADDRESS}"
	fi
	command="${command} type vlan id ${VLAN_ID}"

	echo "Creating VLAN interface ${interface}..."
	${command}

	# Bring up the parent device.
	ip link set ${PARENT_DEV} up
done

exit 0
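In this older copy, the `[ "${PARENT_DEV}" = "${INTERFACE}" ] || continue` line only compares names, while the trace in comment 2 shows `BLUE_PARENT_DEV` stored as a MAC address (00:0d:b9:49:be:dc). The test therefore never succeeds, and `ip link add link ${PARENT_DEV}` would be given a MAC address where `ip link` expects a device name. The Git version in comment 7 instead matches the MAC against the hotplugged interface and uses `green0` as the link. As a sketch of the same idea in another form (a hypothetical helper, not code from IPFire), a configured parent can be resolved to an interface name by scanning sysfs:

```shell
#!/bin/bash
# Hypothetical helper, not part of IPFire: map a configured parent device
# (interface name or MAC address) to an interface name for "ip link add link".
# SYSFS_NET is an assumption for testing; it defaults to the real sysfs path.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

resolve_parent_dev() {
	local parent="$1" want dev
	[ -n "${parent}" ] || return 1

	# Already an interface name: use it as-is.
	if [ -d "${SYSFS_NET}/${parent}" ]; then
		printf '%s\n' "${parent}"
		return 0
	fi

	# Otherwise treat it as a MAC address; sysfs reports lowercase hex.
	want="${parent,,}"
	for dev in "${SYSFS_NET}"/*; do
		[ -r "${dev}/address" ] || continue
		if [ "$(<"${dev}/address")" = "${want}" ]; then
			printf '%s\n' "${dev##*/}"
			return 0
		fi
	done

	# No interface carries that name or address (yet).
	return 1
}
```

If a VLAN device shares its parent's MAC address (the kernel's default when no `address` is given), several entries can match and the first one in glob order wins, so a real implementation would need to skip VLAN devices.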
Comment 5 Matt 2021-08-12 11:21:19 UTC
Also, in case you were wondering, my build version is "IPFire 2.25 (x86_64) - Core Update 158".
Comment 6 Michael Tremer 2021-08-12 11:26:13 UTC
Could you import the version of this script from Git and try again?
Comment 7 Matt 2021-08-12 11:45:20 UTC
+ '[' -n green0 ']'
+ VLAN_CONFIG_FILE=/var/ipfire/ethernet/vlans
+ MAIN_CONFIG_FILE=/var/ipfire/ethernet/settings
+ '[' -e /var/ipfire/ethernet/vlans ']'
+ '[' -e /var/ipfire/ethernet/settings ']'
++ /usr/local/bin/readhash /var/ipfire/ethernet/vlans
+ eval GREEN_MAC_ADDRESS= BLUE_PARENT_DEV=00:0d:b9:49:be:dc GREEN_PARENT_DEV= RED_VLAN_ID= BLUE_VLAN_ID=1003 RED_PARENT_DEV= RED_MAC_ADDRESS= ORANGE_MAC_ADDRESS= BLUE_MAC_ADDRESS=02:1b:96:5a:8e:81 GREEN_VLAN_ID= ORANGE_PARENT_DEV= ORANGE_VLAN_ID=
++ GREEN_MAC_ADDRESS=
++ BLUE_PARENT_DEV=00:0d:b9:49:be:dc
++ GREEN_PARENT_DEV=
++ RED_VLAN_ID=
++ BLUE_VLAN_ID=1003
++ RED_PARENT_DEV=
++ RED_MAC_ADDRESS=
++ ORANGE_MAC_ADDRESS=
++ BLUE_MAC_ADDRESS=02:1b:96:5a:8e:81
++ GREEN_VLAN_ID=
++ ORANGE_PARENT_DEV=
++ ORANGE_VLAN_ID=
++ /usr/local/bin/readhash /var/ipfire/ethernet/settings
+ eval RED_NETMASK=0.0.0.0 CONFIG_TYPE=3 ORANGE_SLAVES= RED_DHCP_FORCE_MTU= RED_SLAVES= RED_DEV=red0 GREEN_MACADDR=00:0d:b9:49:be:dc RED_TYPE=DHCP BLUE_MODE= GREEN_SLAVES= GREEN_STP= RED_MACADDR=00:0d:b9:49:be:de GREEN_MODE= GREEN_DEV=green0 BLUE_STP= RED_ADDRESS=0.0.0.0 BLUE_NETMASK=255.255.255.0 RED_DHCP_HOSTNAME=ipfire BLUE_NETADDRESS=192.168.10.0 GREEN_BROADCAST=192.168.0.255 RED_MODE= BLUE_ADDRESS=192.168.10.1 RED_BROADCAST=255.255.255.255 BLUE_SLAVES= GREEN_NETMASK=255.255.255.0 ORANGE_STP= GREEN_NETADDRESS=192.168.0.0 ORANGE_MODE= BLUE_MACADDR= BLUE_BROADCAST=192.168.10.255 GREEN_DRIVER=igb BLUE_DEV=blue0 RED_NETADDRESS=0.0.0.0 ORANGE_MACADDR= RED_STP= BLUE_DRIVER=igb GREEN_ADDRESS=192.168.0.1 RED_DRIVER=igb
++ RED_NETMASK=0.0.0.0
++ CONFIG_TYPE=3
++ ORANGE_SLAVES=
++ RED_DHCP_FORCE_MTU=
++ RED_SLAVES=
++ RED_DEV=red0
++ GREEN_MACADDR=00:0d:b9:49:be:dc
++ RED_TYPE=DHCP
++ BLUE_MODE=
++ GREEN_SLAVES=
++ GREEN_STP=
++ RED_MACADDR=00:0d:b9:49:be:de
++ GREEN_MODE=
++ GREEN_DEV=green0
++ BLUE_STP=
++ RED_ADDRESS=0.0.0.0
++ BLUE_NETMASK=255.255.255.0
++ RED_DHCP_HOSTNAME=ipfire
++ BLUE_NETADDRESS=192.168.10.0
++ GREEN_BROADCAST=192.168.0.255
++ RED_MODE=
++ BLUE_ADDRESS=192.168.10.1
++ RED_BROADCAST=255.255.255.255
++ BLUE_SLAVES=
++ GREEN_NETMASK=255.255.255.0
++ ORANGE_STP=
++ GREEN_NETADDRESS=192.168.0.0
++ ORANGE_MODE=
++ BLUE_MACADDR=
++ BLUE_BROADCAST=192.168.10.255
++ GREEN_DRIVER=igb
++ BLUE_DEV=blue0
++ RED_NETADDRESS=0.0.0.0
++ ORANGE_MACADDR=
++ RED_STP=
++ BLUE_DRIVER=igb
++ GREEN_ADDRESS=192.168.0.1
++ RED_DRIVER=igb
+ for interface in green0 red0 blue0 orange0
+ case "${interface}" in
+ ZONE_MODE=
+ PARENT_DEV=
+ VLAN_ID=
+ MAC_ADDRESS=
+ '[' '' = green0 -o '' = 00:0d:b9:49:be:dc ']'
+ continue
+ for interface in green0 red0 blue0 orange0
+ case "${interface}" in
+ ZONE_MODE=
+ PARENT_DEV=
+ VLAN_ID=
+ MAC_ADDRESS=
+ '[' '' = green0 -o '' = 00:0d:b9:49:be:dc ']'
+ continue
+ for interface in green0 red0 blue0 orange0
+ case "${interface}" in
+ ZONE_MODE=
+ PARENT_DEV=00:0d:b9:49:be:dc
+ VLAN_ID=1003
+ MAC_ADDRESS=02:1b:96:5a:8e:81
+ '[' 00:0d:b9:49:be:dc = green0 -o 00:0d:b9:49:be:dc = 00:0d:b9:49:be:dc ']'
+ '[' '' = bridge ']'
+ '[' -d /sys/class/net/blue0 ']'
+ '[' -z 1003 ']'
+ command='ip link add link green0 name blue0'
+ '[' -n 02:1b:96:5a:8e:81 ']'
+ command='ip link add link green0 name blue0 address 02:1b:96:5a:8e:81'
+ command='ip link add link green0 name blue0 address 02:1b:96:5a:8e:81 type vlan id 1003'
+ echo 'Creating VLAN interface blue0...'
Creating VLAN interface blue0...
+ ip link add link green0 name blue0 address 02:1b:96:5a:8e:81 type vlan id 1003
+ ip link set green0 up
+ for interface in green0 red0 blue0 orange0
+ case "${interface}" in
+ ZONE_MODE=
+ PARENT_DEV=
+ VLAN_ID=
+ MAC_ADDRESS=
+ '[' '' = green0 -o '' = 00:0d:b9:49:be:dc ']'
+ continue
+ exit 0
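The trace shows the Git version assembling the `ip link add` command as a string and running it unquoted, relying on word splitting; it also now uses `green0`, the hotplugged interface, as the link rather than the configured MAC address. A minimal alternative sketch (the `create_vlan` helper and `DRY_RUN` switch are hypothetical, not part of IPFire) builds the same command as a bash array, which keeps each argument intact:

```shell
#!/bin/bash
# Hypothetical helper: build "ip link add ... type vlan" as an array.
# create_vlan PARENT NAME VLAN_ID [MAC]
# With DRY_RUN=1 the command is printed instead of executed.
create_vlan() {
	local parent="$1" name="$2" vlan_id="$3" mac="$4"
	local -a cmd=(ip link add link "${parent}" name "${name}")

	# The MAC address is optional, as in the script.
	if [ -n "${mac}" ]; then
		cmd+=(address "${mac}")
	fi
	cmd+=(type vlan id "${vlan_id}")

	if [ "${DRY_RUN:-0}" = 1 ]; then
		printf '%s\n' "${cmd[*]}"
	else
		"${cmd[@]}"
	fi
}
```

`DRY_RUN=1 create_vlan green0 blue0 1003 02:1b:96:5a:8e:81` prints the same command line as the trace above.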
Comment 8 Michael Tremer 2021-08-13 11:20:28 UTC
So this creates the interface, which fixes your problem, I suppose?
Comment 9 Michael Tremer 2021-08-13 11:22:51 UTC
@Arne: Could we ship this script again in the next updater?
Comment 10 Matt 2021-08-13 19:50:31 UTC
It didn't start the blue interface, but once I started it, I didn't get that error again, so it appears that the interface is up. However, connecting to my blue network still doesn't get me an IP address. I have it set to allow the whole blue subnet, so I'm not sure where it's failing to get an IP.
Comment 11 Matt 2021-08-14 12:55:14 UTC
I did a system reboot this morning and also tried enabling and disabling the captive portal, but that didn't seem to help. DHCP appears to be trying to hand out an IP (according to the system DHCP logs), but I don't think it's reaching the device? (Blue Access is set to allow 192.168.10.0/24.)

Log:
08:06:17	dhcpd:	DHCPOFFER on 192.168.10.10 to 46:64:8a:ff:73:e3 (iPhone) via blue0
08:06:17	dhcpd:	DHCPDISCOVER from 46:64:8a:ff:73:e3 (iPhone) via blue0
08:06:09	dhcpd:	DHCPOFFER on 192.168.10.10 to 46:64:8a:ff:73:e3 (iPhone) via blue0
08:06:09	dhcpd:	DHCPDISCOVER from 46:64:8a:ff:73:e3 (iPhone) via blue0
08:06:05	dhcpd:	DHCPOFFER on 192.168.10.10 to 46:64:8a:ff:73:e3 (iPhone) via blue0
08:06:05	dhcpd:	DHCPDISCOVER from 46:64:8a:ff:73:e3 (iPhone) via blue0
08:06:02	dhcpd:	DHCPOFFER on 192.168.10.10 to 46:64:8a:ff:73:e3 (iPhone) via blue0
08:06:02	dhcpd:	DHCPDISCOVER from 46:64:8a:ff:73:e3 (iPhone) via blue0
08:06:02	dhcpd:	DHCPOFFER on 192.168.10.10 to 46:64:8a:ff:73:e3 (iPhone) via blue0
08:06:01	dhcpd:	DHCPDISCOVER from 46:64:8a:ff:73:e3 via blue
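In a normal DHCP exchange the client answers a DHCPOFFER with a DHCPREQUEST and the server confirms with a DHCPACK. The log above shows only repeated DISCOVER/OFFER pairs, which suggests the offers are not reaching (or not being accepted by) the client. A hypothetical helper for tallying message types per client MAC in such logs; it assumes lowercase MAC addresses, as dhcpd prints them:

```shell
#!/bin/bash
# Hypothetical helper: read dhcpd log lines on stdin and print, per client
# MAC, how many DISCOVER/OFFER/REQUEST/ACK messages were seen.
dhcp_summary() {
	awk '
	{
		type = ""; mac = ""
		for (i = 1; i <= NF; i++) {
			if ($i ~ /^DHCP[A-Z]+$/) {
				type = $i            # e.g. DHCPDISCOVER
			} else if (length($i) == 17 && $i ~ /^[0-9a-f:]+$/) {
				mac = $i             # e.g. 46:64:8a:ff:73:e3
			}
		}
		if (type != "" && mac != "") {
			count[mac, type]++
			seen[mac] = 1
		}
	}
	END {
		for (m in seen)
			printf "%s discover=%d offer=%d request=%d ack=%d\n", m,
				count[m, "DHCPDISCOVER"], count[m, "DHCPOFFER"],
				count[m, "DHCPREQUEST"], count[m, "DHCPACK"]
	}'
}
```

Fed the ten log lines above, it reports five DISCOVERs, five OFFERs and no REQUEST or ACK for 46:64:8a:ff:73:e3.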
Comment 12 Michael Tremer 2021-08-14 15:04:14 UTC
DHCP is always allowed, no matter how blue access is configured.

There could be other stuff dropping the packet. Misconfigured switch maybe?
Comment 13 Matt 2021-08-14 17:26:54 UTC
The switch configuration hasn't changed in years; something recently changed in IPFire. Not sure what yet.
Comment 14 Matt 2021-08-15 21:56:23 UTC
Whoa... OK, just got it working. It was a really weird series of events; something reset or changed, but I'm not sure what. Here's what I did (over a period of two days):

Rebooted router
Rebooted switch
Rebooted Wi-Fi
Rebooted phone (test device)
Disconnected from and reconnected to the guest network multiple times
***
Told the iPhone to "forget" the network multiple times
Turned Captive Portal on and off
Removed the DHCP WPAD proxy autoconfig settings
Restarted DNS


In the final moments leading up to the success, I was mostly doing the second part of the list above.
I was first able to get an IP address on the phone (not entirely sure how, probably some combination of reboots) and was then able to ping the firewall on the blue network. The next issue was that DNS wasn't working and couldn't resolve anything on the internet (pages weren't resolving/loading). I had the phone forget the network and reconnected: didn't work. I removed the proxy WPAD settings: didn't work. I started the captive portal, "forgot" the network on the phone and reconnected, and it worked... totally not sure why. I then disabled the captive portal, deleted the token, disconnected/forgot the network again and reconnected, and it stayed working: obtaining IP addresses and resolving websites without the portal or anything special configured.

Soooo... long story short, the file update you guys made the other day definitely worked, but I'm not sure whether my messing around with the network put me in a bad place after it had been fixed early on. I can post an update to the community thread that started all this and ask the other user to replace the file and line as you told me, restart his device, and see whether it works for him right out of the box. If so, I think we found the original issue.


Comment 15 Marco Paland 2021-08-18 13:41:14 UTC
The file date of /lib/udev/network-hotplug-vlan on the affected machine is March 19th, 2019.
So I think this script hasn't been updated in a very long time. Please ship it again with the next update.

Meanwhile, I replaced the old file with the current one from the repository and changed line 65 as given in comment #3.

I can confirm that this fixed the issue. blue0 is up again.

Thank you so much, Matt and Michael
Comment 16 Matt 2021-08-18 19:34:24 UTC
Hey guys, just did the update to 159 and my blue interface is up and assigning IP addresses, however, I’m running into the issue where DNS does not appear to be resolving correctly. For whatever reason I’m not able to access Internet sites due to DNS problems on the blue interface. Switching back to green works perfectly fine. Per my testing above I ended up cycling captive portal to get it to work, however that should not be required. Any thoughts on what’s going on?
Comment 17 Matt 2021-08-18 20:19:13 UTC
OK, I need to add to my previous comment. I was able to ping google.com from blue, and it worked; however, loading their website by its IP failed. The same tests on green worked.

Also, no firewall rule changes were made during all my testing over the past while. Although the proxy is turned on (including transparent mode), I'm not using it and have no rules blocking any traffic that bypasses the proxy. The default is to allow outbound.
Comment 18 Matt 2021-08-19 10:22:09 UTC
A few more pieces of the puzzle… I checked the box for the captive portal on blue and clicked save, then unchecked the same box and saved again (cycling the captive portal on and off without connecting anything). Internet pages then loaded and everything worked fine. I rebooted the firewall and tested again: broken again. I cycled the captive portal again and changed a setting in the firewall rules so I could apply changes and restart the firewall: still working. I stopped the blue DHCP and started it again: still working. It seems like the captive portal adds or turns on something that is only temporary (a temp table, iptables rules, something in memory), and a reboot wipes the goodness.
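One way to test the theory that cycling the captive portal changes only in-memory state is to snapshot the firewall rules with `iptables-save` before and after the toggle and compare them. A minimal sketch (the helper names are hypothetical); it drops the `#` comment lines and zeroes the packet/byte counters, which differ between any two dumps:

```shell
#!/bin/bash
# Hypothetical helpers: compare two iptables-save dumps, ignoring the
# "# Generated/Completed" comment lines and [packets:bytes] counters.
normalize_rules() {
	sed -e '/^#/d' -e 's/\[[0-9]*:[0-9]*]/[0:0]/g' "$1"
}

# diff_rules BEFORE AFTER
# Exit status 0 if the rule sets are identical, 1 if they differ.
diff_rules() {
	diff -u <(normalize_rules "$1") <(normalize_rules "$2")
}
```

Typical use: `iptables-save > /tmp/rules.before`, toggle the captive portal in the web UI, `iptables-save > /tmp/rules.after`, then `diff_rules /tmp/rules.before /tmp/rules.after`. Repeating the snapshot right after a reboot would show which rules the reboot removes.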
Comment 19 Marco Paland 2021-08-19 15:59:20 UTC
Just one additional comment from me:
On two machines, blue0 came up again after updating the script, but without internet access. IP and gateway were correctly assigned.

"Firewall"/"Blue Access" was set correctly to 10.0.0.0/24, allowing all WLAN 10.0.0.x devices to access the internet.
Turned out that I needed to delete this entry in the GUI and had to create it again. Internet access was possible then.
So I guess the underlying "Blue Access" format changed some time ago, so that the stored config rule no longer matched.
Comment 20 Matt 2021-08-19 17:17:01 UTC
I tried the suggestion above, however, it did not work for me. I rebooted the firewall and tried again, still no internet page loads.
Comment 21 Matt 2021-08-26 13:23:27 UTC
Hey guys, still having a problem unless I cycle the captive portal after every system reboot. I’m wondering if there are other files on my box that didn’t get updated along the way? If someone could tell me which files I would need to replace from github, I’m willing to try and see if that fixes the problem.
Comment 22 Matt 2021-09-07 18:45:15 UTC
I found a way to see which files get modified when I cycle the captive portal (find . -mmin -1 -not -path "./proc/*" -not -path "./var/log/*" -not -path "./sys/*"). It appears to be the dhcp.leases and dhcp.leases~ files; however, I'm not seeing anything being modified in them that would "fix" the issue. Not sure whether any services are being restarted or other actions performed that don't write to files... like into a temp table in memory?
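The `-mmin -1` approach depends on searching within a minute of the change. A variant (a sketch; `changed_since` is a hypothetical name) uses a marker file and `find -newer`, with `-xdev` to stay on one filesystem so `/proc` and `/sys` are skipped without explicit exclusions:

```shell
#!/bin/bash
# Hypothetical helper: list regular files modified after MARKER was touched.
# changed_since MARKER [ROOT]
# -xdev keeps find on ROOT's filesystem, which skips /proc and /sys;
# /var/log is excluded as in the original command.
changed_since() {
	local marker="$1" root="${2:-/}"
	find "${root}" -xdev -type f -newer "${marker}" ! -path '/var/log/*' -print
}
```

Typical use: `touch /tmp/marker`, toggle the captive portal, then `changed_since /tmp/marker /`. Because `-xdev` also skips any tmpfs mounted below the starting point, files kept only in memory-backed directories will not show up; those need a separate run rooted at that mount.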

I started diffing all files on the system against what's in GitHub, which is extremely slow going: since the folder structure in Git differs from the system, it's taking me hours just to do a few. However... I've found a number of files that were different from GitHub and should not have been. For whatever reason they were not updated, or missed an update along the way (firstsetup, partresize, random, udev, network-functions.pl). I've replaced the contents of my files with the online versions and will reboot soon to see if that helped.
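Because the repository layout differs from the installed layout, a hand-maintained map from repository path to installed path makes the comparison scriptable. The only pair taken from this bug is `config/udev/network-hotplug-vlan`, installed as `/lib/udev/network-hotplug-vlan`; the helper and the map format (one "repo-path installed-path" pair per line) are assumptions for illustration:

```shell
#!/bin/bash
# Hypothetical helper: compare installed files with a local git checkout.
# check_installed CHECKOUT MAPFILE
# MAPFILE lines: "<path in repo> <installed path>", e.g.
#   config/udev/network-hotplug-vlan /lib/udev/network-hotplug-vlan
# Prints MISSING/DIFFERS lines and returns 1 if anything differs.
check_installed() {
	local checkout="$1" map="$2" repo_path installed rc=0
	while read -r repo_path installed; do
		# Skip blank lines and comments in the map file.
		case "${repo_path}" in ''|'#'*) continue ;; esac
		if [ ! -e "${installed}" ]; then
			echo "MISSING  ${installed}"
			rc=1
		elif ! cmp -s "${checkout}/${repo_path}" "${installed}"; then
			echo "DIFFERS  ${installed}"
			rc=1
		fi
	done < "${map}"
	return "${rc}"
}
```

Checking out the revision that matches the installed Core Update, rather than the tip of the branch, avoids flagging files that are simply newer in Git; if any files are processed during the build, they may also differ legitimately.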

captive.index.cgi didn't provide much info, so I'm not sure whether it's in there or whether captivectl is doing something in memory.

Really hoping someone can assist in providing a better place to look, or what's happening when captive portal gets turned on.
Comment 23 Adolf Belka 2024-07-22 21:17:33 UTC
Is this bug still valid now with Core Update 186?
Comment 24 Matt 2024-07-22 21:49:03 UTC
(In reply to Adolf Belka from comment #23)
> Is this bug still valid now with Core Update 186.

I’ve not experienced the issue since manually correcting the file per the instructions.
Comment 25 Adolf Belka 2024-07-29 11:14:21 UTC
The network-hotplug-vlan udev script was modified in Core Update 132 to the version that is currently in IPFire.

No change was made to it in Core Update 155.

Maybe there was a change in some other package that meant that the shell code misinterpretation mentioned by @ms in comment 3 took effect.

The proposed change to network-hotplug-vlan has not been implemented.

I will submit a patch to make the change proposed by @ms.
Comment 27 Michael Tremer 2024-08-03 09:46:30 UTC
(In reply to Adolf Belka from comment #26)
> Patch submitted.
> 
> https://lists.ipfire.org/hyperkitty/list/development@lists.ipfire.org/thread/N4XWXAKJTYJSAF6RO7Y4AI4AP2QM7FCD/
> 
> https://patchwork.ipfire.org/project/ipfire/list/?series=4385

Thank you. Merged into next.
Comment 28 Adolf Belka 2024-09-22 14:00:41 UTC
Core Update 188 has been released.

https://www.ipfire.org/blog/ipfire-2-29-core-update-188-has-been-released