Bug 13657 - The updated apache initscript from CU181 was not installed for all users
Summary: The updated apache initscript from CU181 was not installed for all users
Status: MODIFIED
Alias: None
Product: IPFire
Classification: Unclassified
Component: ---
Version: 2
Hardware: all Unspecified
Importance: - Unknown - Minor Usability
Assignee: Michael Tremer
QA Contact:
URL:
Keywords:
Depends on: 13656
Blocks:
Reported: 2024-04-24 11:19 UTC by Adolf Belka
Modified: 2024-05-10 20:10 UTC
CC List: 1 user

See Also:


Attachments
CU185 upgrade log restart section after restore of previous config (1.61 KB, application/x-troff-man)
2024-04-25 11:50 UTC, Adolf Belka
CU185 upgrade log restart section after restore of another previous config (1.37 KB, application/x-troff-man)
2024-04-25 11:56 UTC, Adolf Belka

Description Adolf Belka 2024-04-24 11:19:26 UTC
When investigating Bug 13656 it became clear that not all users had the updated apache initscript installed in Core Update 181.

The users who did get it are the ones who suffered from bug 13656.

I did not have this issue on any of my three VM systems, nor on my production physical system.

In all these cases the apache initscript is still the version from before Core Update 181.

All the systems were updated at each Core Update.
Comment 1 Adolf Belka 2024-04-24 11:24:24 UTC
I have checked the update-core-upgrade-181.log on my production machine and on one of the VMs, and init.d/apache is not mentioned anywhere, which suggests it was never dealt with by the update.sh script.

Checking the repo, the /etc/rc.d/init.d/apache entry was committed on 28/11/2023 in

config/rootfiles/core/181/filelists/files

and then again on 6/12/2023 in

config/rootfiles/oldcore/181/filelists/files

We probably need to ship it again to get the latest version out, but we also need to understand why it was only upgraded in some users' Core Update 180 to 181 upgrades.


In the bug description when I wrote

> I did not have this issue on any of my three vm systems and also not on my
> production physical system.

The issue I mean is a freezing WUI as per bug 13656.
Comment 2 Adolf Belka 2024-04-24 11:32:22 UTC
A separate ship of the existing apache initscript from CU181 won't be needed, as I am doing a patch for that initscript as part of bug 13656.

So that updated initscript will be shipped, although we will need to check it carefully as part of CU186 Testing.
Comment 3 Michael Tremer 2024-04-24 12:55:26 UTC
> https://git.ipfire.org/?p=ipfire-2.x.git;a=shortlog;h=refs/heads/core181

We did a rebuild of the Core Update after its release that added the updated initscript. So people who updated early will not have received the script.

The intention was to ship it again with the following update which we didn't do.

If we add it to the updater now, things should be fine.
Comment 4 Adolf Belka 2024-04-24 13:15:48 UTC
Okay, that is good that we know why some have the updated script and others not.

I update pretty quickly when new updates come out, so it makes sense that none of my systems had the updated file.

So for this bug there isn't anything more to do, as it will be solved by shipping the initscript with CU186.
Comment 5 Michael Tremer 2024-04-24 13:40:33 UTC
Do you feel like it would be a good idea to rebuild c185 or even c184 to ship the fix?
Comment 6 Adolf Belka 2024-04-24 14:39:54 UTC
(In reply to Michael Tremer from comment #5)
> Do you feel like it would be a good idea to rebuilt c185 or even c184 to
> ship the fix?

My first thought when I saw your question was no, but after thinking about it, it might be good to do. Currently around 18% of users are on CU185, so there could still be many more users who will upgrade and be hit by the freezing WUI problem before CU186 is released.

I am not sure that CU184 needs to have the fix built into it.

CU185 is the current latest version, so any update will go to CU185. As long as that has the correct apache initscript with the delay, that should be fine in my understanding, and we don't need any earlier released version to have the update.

Also CU184 does not have any apache restart command in the update.sh script.

So in conclusion I think it would be good to have it built into CU185 to catch people still updating to CU185 and then added into CU186 to ensure everyone already on CU185 has the latest apache initscript.

I will be looking closely at that in the CU186 Testing phase.

Does the above seem reasonable in its argument and rationale?
Comment 7 Michael Tremer 2024-04-24 15:37:53 UTC
Yes, this makes perfect sense:

> https://git.ipfire.org/?p=ipfire-2.x.git;a=commitdiff;h=6af919ec6cee61235bdecfeb806cc0456494578c

This should fix it then as soon as the build is through and I have published it.
Comment 8 Adolf Belka 2024-04-24 16:15:03 UTC
(In reply to Michael Tremer from comment #7)
> Yes, this makes perfect sense:
> 
> > https://git.ipfire.org/?p=ipfire-2.x.git;a=commitdiff;h=6af919ec6cee61235bdecfeb806cc0456494578c
> 
> This should fix it then as soon as the build is through and I have published
> it.

Before shipping the apache initscript it needs to be modified to have a delay between the stop and start commands.

The frozen WUI is what was experienced by the users who got the updated initscript from CU181.

Users who did not get the updated script have had no problems.

In the forum it was found that with the new script the update log shows that Apache was stopped with an OK but then the start command had the message that apache was already running as a pid had been found. I believe that the stop command has not finished the pid removal before the start command finds that the pid is still there.

That is what bug 13656 is about.

Here is an example of the update log message from a user on the forum.

Stopping Apache daemon…
[ OK ]
Starting Apache daemon…
httpd (pid 2812) already running
[ OK ]

Unless my understanding of what is causing the problem is flawed, I believe that the start command finds that a pid is already there (not yet fully removed after the stop command) and therefore does not start apache. That pid is then eventually removed, but by then apache is not running.

This matches what the users on the forum found: apache was stopped when the WUI froze.
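One way to express the proposed delay is a bounded poll for the PID file to disappear, rather than a fixed sleep. The following is a minimal Python sketch, assuming the PID file lives at /var/run/httpd.pid (a hypothetical path; the real initscript may use another one) and that it is called between the stop and the start. The function name and its `timeout`/`interval` parameters are illustrative, not part of IPFire.

```python
import os
import time

# Hypothetical path; the real IPFire initscript may use a different PID file.
PIDFILE = "/var/run/httpd.pid"


def wait_for_pidfile_removal(pidfile: str = PIDFILE,
                             timeout: float = 10.0,
                             interval: float = 0.1) -> bool:
    """Poll until `pidfile` no longer exists.

    Returns True as soon as the file is gone, or False if it is still
    present after `timeout` seconds. Unlike a fixed sleep, this does not
    wait longer than necessary, and it reports a timeout instead of
    letting the start step run blindly.
    """
    deadline = time.monotonic() + timeout
    while os.path.exists(pidfile):
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True
```

In a restart, the start step would only run once this returns True; a False result could be logged, instead of leaving apache silently stopped.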
Comment 9 Michael Tremer 2024-04-25 10:11:42 UTC
(In reply to Adolf Belka from comment #8)
> (In reply to Michael Tremer from comment #7)
> > Yes, this makes perfect sense:
> > 
> > > https://git.ipfire.org/?p=ipfire-2.x.git;a=commitdiff;h=6af919ec6cee61235bdecfeb806cc0456494578c
> > 
> > This should fix it then as soon as the build is through and I have published
> > it.
> 
> Before shipping the apache initscript it needs to be modified to have a
> delay between the stop and start commands.

So the version that we had in the repository is out now. This might be better than what we had before I believe...

I also believe that the "apachectl -k stop" command should wait until the process has properly terminated. The reason that the old script complained that httpd was already running was because we only sent SIGTERM to the main process and didn't wait until all child processes have terminated, too. Correct me if that is a wrong assumption.
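The stop-and-wait behaviour described here can be sketched as: send SIGTERM to the main httpd PID, then poll until that process has really exited. According to the Apache documentation on stopping, the parent kills its children before exiting itself, so waiting for the parent should also cover the children. This is a minimal Python sketch of the idea, not what apachectl does internally; `process_alive` and `stop_and_wait` are hypothetical helpers, and the zombie check reads the Linux-specific /proc/<pid>/stat.

```python
import os
import signal
import time


def process_alive(pid: int) -> bool:
    """Return True if `pid` exists and is not a zombie."""
    try:
        os.kill(pid, 0)  # signal 0 only checks that the PID exists
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but owned by another user
    try:
        with open(f"/proc/{pid}/stat") as f:
            # Format is "pid (comm) state ..."; comm may contain spaces or
            # parentheses, so split on the last ")" to reach the state field.
            state = f.read().rsplit(")", 1)[1].split()[0]
    except FileNotFoundError:
        # On Linux the process has just exited; without /proc (non-Linux)
        # fall back to the kill(pid, 0) result above.
        return not os.path.isdir("/proc")
    return state != "Z"  # a zombie has exited but has not been reaped yet


def stop_and_wait(pid: int, timeout: float = 10.0,
                  interval: float = 0.1) -> bool:
    """Send SIGTERM to `pid` and wait until the process has really exited."""
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        return True  # nothing to stop
    deadline = time.monotonic() + timeout
    while process_alive(pid):
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True
```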

> In the forum it was found that with the new script the update log shows that
> Apache was stopped with an OK but then the start command had the message
> that apache was already running as a pid had been found. I believe that the
> stop command has not finished the pid removal before the start command finds
> that the pid is still there.

I hate PID files. Just for the record.
Comment 10 Adolf Belka 2024-04-25 10:30:32 UTC
(In reply to Michael Tremer from comment #9)
> (In reply to Adolf Belka from comment #8)
> > (In reply to Michael Tremer from comment #7)
> > > Yes, this makes perfect sense:
> > > 
> > > > https://git.ipfire.org/?p=ipfire-2.x.git;a=commitdiff;h=6af919ec6cee61235bdecfeb806cc0456494578c
> > > 
> > > This should fix it then as soon as the build is through and I have published
> > > it.
> > 
> > Before shipping the apache initscript it needs to be modified to have a
> > delay between the stop and start commands.
> 
> So the version that we had in the repository is out now. This might be
> better than what we had before I believe...

This will ensure that everyone has the new script but in its current form it might increase the number of people having frozen WUI screens when apache is restarted.

All the users who have had the frozen problem and have shown me their log files have had the new script. The script stops apache and gets an OK. Then it tries to start apache and says "Oh, there is already a pid here. I don't need to do anything."

After the update is completed apache is not running.

> 
> I also believe that the "apachectl -k stop" command should wait until the
> process has properly terminated. The reason that the old script complained
> that httpd was already running was because we only sent SIGTERM to the main
> process and didn't wait until all child processes have terminated, too.
> Correct me if that is a wrong assumption.

The new script might be getting rid of all the child processes before it considers itself stopped, but it looks to me that at that point the pid file might not yet be removed. The start command then says "oh, there is a pid file here, so it is already running and I don't need to start it".

At least the above is my interpretation of what is happening.
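If a stale PID file is the culprit, one way to make the "already running" decision more robust is to check whether the PID in the file still belongs to a live process, instead of trusting the file's existence. A minimal Python sketch with a hypothetical `already_running` helper; it does not detect zombies or a recycled PID, and it cannot help if the old httpd is genuinely still alive, so it would complement waiting for the stop rather than replace it.

```python
import os


def already_running(pidfile: str) -> bool:
    """Return True only if `pidfile` names a PID that still exists.

    A PID file that is missing, unparsable, or left behind by a process
    that has already exited is treated as "not running".
    """
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
    except (FileNotFoundError, ValueError):
        return False
    if pid <= 0:
        return False  # 0 and negative values would address process groups
    try:
        os.kill(pid, 0)  # existence check only, no signal is delivered
    except ProcessLookupError:
        return False  # stale PID file
    except PermissionError:
        return True  # process exists but belongs to another user
    return True
```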

I will look at installing a CU184 VM, which should give it the new script, record the pid that is used, then update it to CU185 and see if I get the message "httpd (pid xyz) already running". If I do, I can compare the pid value given with what was there before the update and see whether it is the old pid not being removed quickly enough or some other problem.

Hopefully I don't just get a clean update with no frozen WUI, or else we won't be able to reproduce the problem easily.
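The comparison described above can be done mechanically: extract the PID from the "already running" message in the update log and intersect it with the PIDs recorded before the update. A small Python sketch; the regular expression assumes the exact message format seen in the forum log (`httpd (pid 2812) already running`), and `stale_pids` is a hypothetical helper.

```python
from __future__ import annotations

import re

# Matches the initscript message seen in the forum log,
# e.g. "httpd (pid 2812) already running".
ALREADY_RUNNING = re.compile(r"httpd \(pid (\d+)\) already running")


def stale_pids(log_text: str, pids_before_update: set[int]) -> set[int]:
    """Return reported 'already running' PIDs that predate the update."""
    reported = {int(m.group(1)) for m in ALREADY_RUNNING.finditer(log_text)}
    return reported & pids_before_update
```

A non-empty result would indicate that the start step saw a PID left over from before the stop, rather than a new httpd.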

> 
> > In the forum it was found that with the new script the update log shows that
> > Apache was stopped with an OK but then the start command had the message
> > that apache was already running as a pid had been found. I believe that the
> > stop command has not finished the pid removal before the start command finds
> > that the pid is still there.
> 
> I hate PID files. Just for the record.
:-)
Comment 11 Adolf Belka 2024-04-25 10:50:32 UTC
For clarity.

I did not suffer from the frozen WUI problem on any of my IPFire systems. I had the old apache script on all of them.

All the people having the frozen WUI that provided info on their systems have had the new apache script.
Comment 12 Adolf Belka 2024-04-25 11:50:18 UTC
Created attachment 1525
CU185 upgrade log restart section after restore of previous config

I created a fresh CU184 VM install and, without changing anything, did the upgrade to CU185.

The update went without any problems. Apache restarted without issues and the WUI did not freeze.

I then created another CU184 VM install and restored a previous configuration into it.

I then did the update to CU185, and the WUI froze. When I checked the logs, the update had completed and apache was not running.

The attached file shows the restart section of the update log.

In this case there was no mention of the pid but apache gave an "Address already in use" message.
Comment 13 Adolf Belka 2024-04-25 11:56:03 UTC
Created attachment 1526
CU185 upgrade log restart section after restore of another previous config

I then created another CU184 VM install and did a restore from another config file.

The apache status gave the following pids:

6114 6110 6109

I then ran the update and the WUI froze again.

This time the message in the log was about httpd already running with pid 6109.

So clearly the end of the apache stop command is not a point at which the pid can be guaranteed to have been removed. However, the other test shows that you can also get "Address already in use" messages.

This suggests that some change is needed to the stop command, or that there needs to be a delay between the stop and start commands for the restart.

However, my patch proposal in bug 13656 may not be the best approach, due to the "Address already in use" message that can also occur.
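The "Address already in use" case suggests that the old listener may still hold the port when the start runs. One possible extra check is to wait until the port can be bound again before starting. A minimal Python sketch, assuming the WUI listens on ports 81 and 444 (an assumption; check the actual Listen directives) and that httpd sets SO_REUSEADDR, so that a TIME_WAIT leftover does not block it but a socket still listening does. Binding ports below 1024 needs root, which the initscript has. `port_is_free` and `wait_for_ports_free` are hypothetical helpers.

```python
import errno
import socket
import time

# Assumption: the WUI listens on these ports; check the actual Listen directives.
WUI_PORTS = (81, 444)


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a new listener could bind host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # Assuming httpd also sets SO_REUSEADDR, TIME_WAIT leftovers do not
        # block it; on Linux a socket still listening on the port does.
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            s.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return False
            raise  # e.g. EACCES when not root: a real error, not "busy"
    return True


def wait_for_ports_free(ports=WUI_PORTS, timeout: float = 10.0,
                        interval: float = 0.2, host: str = "0.0.0.0") -> bool:
    """Poll until every port in `ports` can be bound, or `timeout` expires."""
    deadline = time.monotonic() + timeout
    while not all(port_is_free(p, host) for p in ports):
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True
```

This check could sit next to a wait for the PID file, since the two failure messages seen in the logs point at two slightly different leftovers of the stop.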
Comment 14 Michael Tremer 2024-05-10 12:11:57 UTC
I am not sure whether I have asked this before or not, but does this change fix the problem?

> https://git.ipfire.org/?p=ipfire-2.x.git;a=commitdiff;h=1724e5ac0ac4a139e9f7d574129f53a027197676
Comment 15 Adolf Belka 2024-05-10 19:43:55 UTC
(In reply to Michael Tremer from comment #14)
> I am not sure whether I have asked this before or not, but does this change
> fix the problem?
> 
> > https://git.ipfire.org/?p=ipfire-2.x.git;a=commitdiff;h=1724e5ac0ac4a139e9f7d574129f53a027197676


You asked that question for the previous fix and at the time I thought it had fixed the problem.

Given the history of this issue, I am not sure that I can say with real confidence that it will definitely work.

I will test it out in CU186 Testing. Apache is being stopped and started in the update.sh script, but with a lot of other actions/commands etc. in between the stop and the start.
So it might work okay with CU186 but still fail in the future.

The only thing I can think of is to add an apache restart command into the update.sh script.

That should then be a reasonable test of the restart working :crossed_fingers:
Comment 16 Adolf Belka 2024-05-10 20:10:44 UTC
I just tested out this patch fix with the update from CU185 to CU186.

As it is a Testing release, the existing Core Update (185) is rerun before the CU186 version.

The WUI froze during the CU185 update, but behind the browser IPFire continued with the update.

When that update reached the apache start command in the update.sh script, then the WUI screen unfroze.

So the fix has definitely worked from that limited test.