Summary: | nagios_nrpe update to 4.1.0-11 breaks "status information"-"Addon - Services" | ||
---|---|---|---|
Product: | IPFire | Reporter: | Edgar Wiesmann <edgar.wiesmann> |
Component: | --- | Assignee: | Robin Roevens <robin.roevens> |
Status: | CLOSED FIXED | QA Contact: | |
Severity: | Minor Usability | ||
Priority: | - Unknown - | CC: | adolf.belka, michael.tremer, peter.mueller, robin.roevens, wcavedog |
Version: | 2 | ||
Hardware: | unspecified | ||
OS: | Unspecified | ||
See Also: | https://bugzilla.ipfire.org/show_bug.cgi?id=11546 |
Description
Edgar Wiesmann
2022-09-19 12:40:11 UTC
addonctrl does not work with nrpe because it checks if the addon exosts and it looks for meta-nrpe when the file is actually called meta-nagios_nrpe. The old services.cgi code checked first if nagios_nrpe existed in the initscript directory and when it did not find an initscript called nagios_nrpe it never even used addonctrl and the nrpe service was never shown on the services page in the past. I have installed both nagios_nrpe and dbus and confirmed that neither of them showed up in the services addon table in that earlier CU version. The problem is occurring with addons that have an initscript name that is different from the package name. package name initscript name nagios_nrpe nrpe dbus messagebus I will look further into this but I suspect the simplest fix will be to make the initscript the same as the package name. I will also check for any other addons that have the same issue, such as dbus/messagebus. The simple approach won't work. There is another package called libvirt and it has two service files libvirtd and logvirtd So I am going to have to have a deeper think about how to resolve this. I have been trying to work on this bug but have got to the point where I recognise that I don't know enough top be able to figure out how to solve it. I suspect that @Robin might be better to work on this as he is very familiar with the new code. The problem is that with the new Services: meta data that some addons have a different Services initscript name than the package name used in the meta-* filename. This means that when the command from services.cgi is sent to addonctrl it is the Services meta data that is sent and addonctrl checks if that name is present in the meta- directory which then fails because of the different name. meta directory name initscript name nagios_nrpe nrpe dbus messagebus nfs nfs-server libvirt libvirtd and virtlogd I thought at first the simplest answer would be to make the initscript name the same as the package name used in the meta-* filename but then I found libvirt which has two services defined for its metadata. This problem existed in the past but was hidden because a check was first made in the initscript for if the addon package name was present there. If it was not then it was not shown in the addon services table. So in the past none of the above packages would ever have been visible in the table. Now the packages show up in the table but they cannot be started from that table and always show Stopped even if they are actually running. Hopefully someone better than me can pick this up. I seem to have missed the fact that addonctrl counts on the initscript name being the same as the addon name. Which indeed makes it unable to (re)start addons that have different initscript name(s) or have multiple initscripts. I do think it is a good evolution that now all real services are listed in services.cgi. And I think, it is best that we adapt addonctrl to also use the Service line from the metafile to know which service(s) to (re)start. How I think would be the best implementation of addonctrl: addonctrl <addon> <start|stop|....> -> should look in file meta-<addon> for the service name(s) and then restart that/those service(s). So in case of libvirt it should handle both services at once. But that still would make it difficult for services.cgi as there the user can choose to enable/disable/start/stop single real services. So here I can think of 2 possible solutions: * Adapt addonctrl even more to also support something like this: addonctrl <addon> <servicename> <start|stop|...> -> here addonctrl should check if <servicename> is listed in the meta-<addon> Service line (for safety so one can't use addonctrl to restart whatever :-)) and if so it can handle the initscript <servicename> OR * Adapt the layout of services.cgi so that the services table groups services by addon: | Addon / Service | Boot | Action | State | PID | Memory | |---------------------|---------|--------|---------|------|-----------| | <Addon> | [ ] | ^ v | | | <Servicename1> | <state> | <pid>|<mem usage>| | <Servicename2> | <state> | <pid>|<mem usage>| | <Addon> | [ ] | ^ v | | | <Servicename1> | <state> | <pid>|<mem usage>| |---------------------------------------------------------------------| I'm a bit hesitant to start working on addonctrl itself as that is C code and my C is soo rusty that it almost disintegrated :-).. But personally I do prefer my proposed addonctrl change to let users have fine grained control over the service(s) installed by addons. I would also agree with the conclusion to modify addonctrl I have never coded in c yet so maybe this is my opportunity to explore c. The addonctrl program is not too large so hopefully would be manageable for me. I am willing to give this a try if I can be given support/advice when I hit problems with the c coding. addonctrl is also used in guardian.cgi, tor.cgi and wlanap.cgi so I will also need to make sure that whatever is done works in all of those cgi pages. Thanks for the input @Robin I couldn't hold myself and I already started on addonctrl, currently with satisfying progress. So I'll take over the bug. I'll make sure to check also guardian.cgi, tor.cgi and wlanap.cgi I tested out your initial update of addonctrl on my vm testbed after doing a full build and installing using the created iso. I have to say it is working very well, at least when running it manually on the command line but it is an impressive status. I tested out the various commands and everything worked as expected. You do get an error message if you try and enable something that is not disabled but when running from services.cgi it will only be used when you click on the boot checkbox so it will always be swapping from enabled to disabled and vice versa and in that mode it worked fine. I also tested out all the fault conditions, such as invalid addon name, string too long, missing arguments etc. All the conditions reported back as expected. I did notice that the usage message does not include the status command. Running addonctrl libvirt start started both the services and running addonctrl libvirt stop libvirtd stopped only that service and left the other one running. So that is very good. I also noticed that you put the service argument at the end of the command line and made it optional which worked fine and will also mean that the guardian.cgi, tor.cgi and wlanap.cgi pages can work as existing. When you have done the mods to the services.cgi code then I will also test that out to confirm everything works as expected. I tested the addonctrl command with nagios_nrpe, libvirt, bacula and alsa. One thing I noticed with alsa is that the initscript usage only contains start and stop. Without a status command then services.cgi will not be able to identify if it has been started or not, otherwise I suspect it will always show as stopped. This initscript needs to have a status option added to it to make services.cgi work correctly. I will work on the alsa initscript separately. Thanks for the tests. I will check if I can do something about the enabled/disabled command; but indeed the webui should not have problems with that. And I will send you an updated service.cgi for testing later today. About alsa. I know it was hardcoded in services.cgi in the past to ignore that service, and that was removed by me after discussion about it with Michael where it turned out nobody really knew why it was hardcoded there and why it should be ignored. So we concluded that it should at least not be hardcoded there and if the service would (still) pose problems, they then should be handled differently than trying to hide them in services.cgi itself. So in case there is more going on with the alsa service, which would prevent users to manually start/stop the service correctly, I think it is best to open a new bug report where we can explore how to disable the start/stop buttons for this (and possibly other services where we don't want users to easily start/stop them). If it is only a missing status command, then it is probably not that hard to just implement that indeed. I've sent you the new services.cgi. Now all services where the name does not match the addon name, the addon name is also printend between parenthesis next to the service so the user knows where the service comes from. And ofcourse addonctrl is now always called with a specific service name next to the addon name to make sure only a single service is stopped/started. I was pondering on eventually splitting the table up in addon - service addon - service - service ... and also add start/stop buttons on addon-level to start/stop all services belonging to that addon. But I'm not sure if that would be handy or rather confusing to the enduser. (In reply to Robin Roevens from comment #9) > I've sent you the new services.cgi. > > Now all services where the name does not match the addon name, the addon > name is also printend between parenthesis next to the service so the user > knows where the service comes from. > And ofcourse addonctrl is now always called with a specific service name > next to the addon name to make sure only a single service is stopped/started. > > I was pondering on eventually splitting the table up in > addon > - service > addon > - service > - service > ... > and also add start/stop buttons on addon-level to start/stop all services > belonging to that addon. > But I'm not sure if that would be handy or rather confusing to the enduser. The modified services.cgi has been tested on my vm testbed together with the modified addonctrl file you previously provided. libvirt and nagios_nrpe have their names in parentheses next to the initscript name. Pressing the red arrow for one of the services causes it to change from Running to Stopped. The only one that this does not work for is alsa because it does not have the status option in its initscript. I will raise this as a separate bug and submit a patch to include the status option. Clicking on the boot checkbox switched it from enabled to disabled and moved the symlink from the rc3.d directory to the rc3.d/off/ directory. Clicking the checkbox again made it checked and enabled it agaion and the symlink was moved back from the rc3.d/off/ directory back to the rc3.d dierctory. The only addon that this did not work with was nrpe (nagios_nrpe) For this addon the installation does not add the boot symlinks as the configuration has to be set up first Creating a symlink for nrpe caused the boot checkbox to become checked and then the box can be toggled between enabled and disabled as the others. If someone tries to toggle the boot checkbox before creating the initscript symlink then it will become checked but I don't see that as a problem. If they had followed the nrpe addon page in the wiki they would have created the symlinks. So everything is working as planned. Re your last point I would leave it with the individual services and not try and group them. People who have installed the addon services should know what they are doing with those services by reading the wiki pages and searching for the addon documentation online. Tried out your updated addonctrl.c file. Built a complete fresh IPFire using the updated addonctrl and services.cgi and then re-tested in my vm testbed. Everything worked as before but there was a peculiarity with nagios_nrpe. No symlink was created for nrpe, although the nrpe initscript is in /etc/init.d/ I ran addonctrl nagios_nrpe enable and got Service nrpe is already enabled. Skipping... However it is not enabled. Then running addonctrl nagios_nrpe disable I got Service nrpe is already disabled. Skipping... So when a service has an initscript but no symlink has been created yet the addonctrl indicates it is both enabled and disabled. The code is much more complicated now and not so easy for me to understand what is happening but I suspect that the code checks if it is in the /off/ directory and if not there then it must be already enabled and conversely if it is not in the rc3.d directory then it must already be disabled. I think it would be better if a initscript was not in either the rc3.d or rc3.d/off directories then the error message should flag up that no symlink has yet been created. One additional thing I found is that when testing for invalid addon names. addonctrl rte$jk:?ii status Invalid add-on name: rte:?ii where obviously using a $ makes bash look for a variable name. addonctrl n^&*(%^$##yy status bash: syntax error near unexpected token `%^$##yy' This is probably because I am just using a very invalid name for the addon name and it looks like some form of bash command. In reality in IPFire it will never be anything like that so the code is probably okay as it is for checking for an invalid add-on name. (In reply to Adolf Belka from comment #11) > Everything worked as before but there was a peculiarity with nagios_nrpe. > > No symlink was created for nrpe, although the nrpe initscript is in > /etc/init.d/ > > I ran addonctrl nagios_nrpe enable and got > > Service nrpe is already enabled. Skipping... > > However it is not enabled. Then running addonctrl nagios_nrpe disable I got > > Service nrpe is already disabled. Skipping... How comes ? also with the original addonctrl one would not be able to enable or disable that service if no symlink was already created. > > > So when a service has an initscript but no symlink has been created yet the > addonctrl indicates it is both enabled and disabled. > > The code is much more complicated now and not so easy for me to understand > what is happening but I suspect that the code checks if it is in the /off/ > directory and if not there then it must be already enabled and conversely if > it is not in the rc3.d directory then it must already be disabled. That is right. It checks if the symlink is available in srcpath (rc3.d when disabling en in rc3.d/off when enabling) and if it is not there, it assumes it is already enabled/disabled. In the original addonctrl both commands would have given a file not found error returned from the 'mv' command. > > I think it would be better if a initscript was not in either the rc3.d or > rc3.d/off directories then the error message should flag up that no symlink > has yet been created. I don't think that situation should be able to happen? But for sake of completeness and to help 'debugging' the error should indeed be more clear if no symlink exists. I'll work on that later today. Alternatively addonctrl could create such a symlink, but I'm not sure where to get the correct startup order (S??), and if that should be a task for addonctrl at all. > > One additional thing I found is that when testing for invalid addon names. > > addonctrl rte$jk:?ii status > Invalid add-on name: rte:?ii > > where obviously using a $ makes bash look for a variable name. indeed "$jk" is parsed as variable (since ':' is not a valid character for use in a variable name), which does not exist, so it returns "", passing the string "rte:?ii" to addonctrl > > addonctrl n^&*(%^$##yy status > bash: syntax error near unexpected token `%^$##yy' > > This is probably because I am just using a very invalid name for the addon > name and it looks like some form of bash command. the & operator also has special meanings in bash. Not completely sure but I think here it is parsed as "end of command to run in background". And in that case bash would try to run "addonctrl n^" in the background and then execute whatever comes after the "&" which in this case is mostly gibberish to bash; '*' (expanding to all files in current dir) and '(' (grouping) could still be valid, but the '%' (string manipulation token) is then unexpected for bash, hence the 'unexpected token' error. And (also not 100% sure) since there is a syntax error on the same line, also the "addonctrl n^" command is not executed, otherwise you would see something like [#] <pid> [#]+ Finished also being printed out. To actually pass that string to addonctrl, you'll have to enclose it in (single) quotes. And addonctrl should then just throw the Invalid add-on name error. (In reply to Robin Roevens from comment #12) > > > > I think it would be better if a initscript was not in either the rc3.d or > > rc3.d/off directories then the error message should flag up that no symlink > > has yet been created. > > I don't think that situation should be able to happen? But for sake of > completeness and to help 'debugging' the error should indeed be more clear > if no symlink exists. I'll work on that later today. > Alternatively addonctrl could create such a symlink, but I'm not sure where > to get the correct startup order (S??), and if that should be a task for > addonctrl at all. I don't believe that is a task for addonctrl. Some addons don't provide any initscript and you are expected to create one, some have a default working initscript that is then symlinked as part of the install. Some addons have an initscript as a starting point but don't symlink it as there is no guaranteed working default initscript and you are expected to tune it to your needs and then symlink it. > > > > > addonctrl n^&*(%^$##yy status > > bash: syntax error near unexpected token `%^$##yy' > > > > This is probably because I am just using a very invalid name for the addon > > name and it looks like some form of bash command. > > the & operator also has special meanings in bash. Not completely sure but I > think here it is parsed as "end of command to run in background". And in > that case bash would try to run "addonctrl n^" in the background and then > execute whatever comes after the "&" which in this case is mostly gibberish > to bash; '*' (expanding to all files in current dir) and '(' (grouping) > could still be valid, but the '%' (string manipulation token) is then > unexpected for bash, hence the 'unexpected token' error. And (also not 100% > sure) since there is a syntax error on the same line, also the "addonctrl > n^" command is not executed, otherwise you would see something like > [#] <pid> > [#]+ Finished > also being printed out. > > To actually pass that string to addonctrl, you'll have to enclose it in > (single) quotes. > And addonctrl should then just throw the Invalid add-on name error. Would it be a good idea to put into the usage line that the addon name has to be in single quotes or is it better to just leave it as it is because getting such a gibberish addon name should be virtually impossible in the IPFire coding. Forgot to mention that when trying to check the boot box on the services.cgi page for nrpe then it just leaves the checkbox unchecked so that is a good result for any user. Effectively not being able to make the boot box checked in the WUI page is saying that no initscript symlink exists. I will look at adding that info to the services.cgi wiki page. (In reply to Adolf Belka from comment #13) > (In reply to Robin Roevens from comment #12) > > I don't believe that is a task for addonctrl. Some addons don't provide any > initscript and you are expected to create one, some have a default working > initscript that is then symlinked as part of the install. > Some addons have an initscript as a starting point but don't symlink it as > there is no guaranteed working default initscript and you are expected to > tune it to your needs and then symlink it. Ok, but then the original addonctrl more or less had the same problem. This version will have a clear error message pointing out what exactly is wrong if I implement the check if a symlink even exists at all. So that will definitively be better. > Would it be a good idea to put into the usage line that the addon name has > to be in single quotes or is it better to just leave it as it is because > getting such a gibberish addon name should be virtually impossible in the > IPFire coding. I don't think that is necessary as that is basic bash behaviour. But it is probably not a bad idea to implement quotes around addon name and service name when calling from services.cgi (or any other cgi) in case such a reserved character would be used in one of those in the future (I do think we need to avoid that.. but you never know, and since it 'is possible' we should make everything as less error-prone as we can.) (In reply to Adolf Belka from comment #14) > Forgot to mention that when trying to check the boot box on the services.cgi > page for nrpe then it just leaves the checkbox unchecked so that is a good > result for any user. Effectively not being able to make the boot box checked > in the WUI page is saying that no initscript symlink exists. > > I will look at adding that info to the services.cgi wiki page. maybe even a nicer solution is to just hide the check box when no symlink exists. I'll check if that is easy to implement. Otherwise, I think it would be appropriate to display a message to the user indicating that the service can't be enabled due to a missing symlink (as a result of the no symlink found error that I will implement in addonctrl). So that it doesn't do 'just nothing' from user perspective (and prevent bug reports or forum question about "why can't I enable service ..."). I've sent you a new version of both addonctrl and services.cgi. Addonctrl now displays an explicit error when you try to enable/disable a service which has no runlevel symlink. A new command 'boot-status' was added to make addonctrl show the current on-boot status of a service. Services.cgi is adapted to use this new 'boot-status' instead of it itself searching for initscript symlinks, resulting in less and cleaner code in services.cgi and solely using addonctrl for any addon service related state. If no runlevel symlink is found, a small exclamation point is now displayed instead of a checkbox in the 'Boot' column. The alt-text (tooltip) of that image is "No valid runlevel symlink was found for service initscript."; So the user knows why he can't enable the service. Bug 11546 contains additional lines desired for services.cgi - should this bug (12935) supercede it (11546)? (In reply to Paul Simmons from comment #18) > Bug 11546 contains additional lines desired for services.cgi - should this > bug (12935) supercede it (11546)? I don't think so. This bug is related to the Addon-Services table. rngd is not an addon but a system package which is in the Services table at the top of the WUI page. Very well, then. I had hopes 11546, which has languished for nearly 5 years, might be resolved, as services.cgi is being updated here. <sigh> (In reply to Paul Simmons from comment #20) > Very well, then. I had hopes 11546, which has languished for nearly 5 > years, might be resolved, as services.cgi is being updated here. > > <sigh> See my comment on bug#12900. If that is done then rngd would automatically get picked up by the changes being implemented for this bug. I have tested out the new code but had problems. Everything in sevices.cgi is showing the red triangle with an exclamation mark for the Boot status even though there are symlinks available. Tested out the addonctrl code from the command line with boot-status and got the following for code that has a symlink in place. # addonctrl bacula boot-status Unable to open directory /etc/rc.d/rc3.d/off. bacula is enabled on boot. For nagios_nrpe that has no symlink in place the following was found. # addonctrl nagios_nrpe boot-status Unable to open directory /etc/rc.d/rc3.d/off. nrpe is not available for boot. (Service has no valid symlink in either /etc/rc.d/rc3.d or /etc/rc.d/rc3.d/off). Then tried to disable libvirtd from libvirt # addonctrl libvirt disable libvirtd Error creating /etc/rc.d/rc3.d/off. (Error: 9) The directory off was not present in rc3.d Then tried to disable bacula # addonctrl bacula disable disabled service bacula So this worked and the off directory was successfully created. Now services.cgi shows the correct status for the Boot column. Then I removed the off directory and services.cgi went back to all red triangles for Boot column. Then ran the command to disable bacula twice and got the following result. # addonctrl bacula disable Error creating /etc/rc.d/rc3.d/off. (Error: 9) # addonctrl bacula disable disabled service bacula So the first time it gave an error for creating the off directory but running the same command a second time it was successful in creating the off directory. Let me know if you need more checks done or info to help you fault find. All other addonctrl commands work fine. (In reply to Adolf Belka from comment #22) > I have tested out the new code but had problems. > > Everything in sevices.cgi is showing the red triangle with an exclamation > mark for the Boot status even though there are symlinks available. This seems to be caused by the 'Unable to open directory' errors you have. > > Tested out the addonctrl code from the command line with boot-status and got > the following for code that has a symlink in place. > > > # addonctrl bacula boot-status > Unable to open directory /etc/rc.d/rc3.d/off. > bacula is enabled on boot. > > For nagios_nrpe that has no symlink in place the following was found. > > # addonctrl nagios_nrpe boot-status > Unable to open directory /etc/rc.d/rc3.d/off. > nrpe is not available for boot. (Service has no valid symlink in either > /etc/rc.d/rc3.d or /etc/rc.d/rc3.d/off). > So basically it does the job and gives correct info. Only an error is also displayed. I removed that error, since it has no real purpose there. The 'off' dir is actually allowed to not exists at that point. > Then tried to disable libvirtd from libvirt > > # addonctrl libvirt disable libvirtd > Error creating /etc/rc.d/rc3.d/off. (Error: 9) > > The directory off was not present in rc3.d > > Then tried to disable bacula > > # addonctrl bacula disable > disabled service bacula > > So this worked and the off directory was successfully created. Now > services.cgi shows the correct status for the Boot column. What happened is, the directory is created successfully. But the error is (falsely) triggered and the program bails out. Next try, the dir exists, and everything works. I corrected the check on the directory (which currently only succeeded when the dir already existed and not when it was successfully created). So now it should not give an error when the dir is successfully created or when the dir already exists. It should still fail when it fails to create dir for some reason. > > > Then I removed the off directory and services.cgi went back to all red > triangles for Boot column. > > Then ran the command to disable bacula twice and got the following result. > > # addonctrl bacula disable > Error creating /etc/rc.d/rc3.d/off. (Error: 9) > # addonctrl bacula disable > disabled service bacula > > So the first time it gave an error for creating the off directory but > running the same command a second time it was successful in creating the off > directory. > > Let me know if you need more checks done or info to help you fault find. > > > All other addonctrl commands work fine. Good to hear this. We are getting closer to submitting this patch :-) The latest version of addonctrl.c now works fine with no problems. Can turn things on and off and enable/disable them from either services.cgi or from the command line with addonctrl. Also the triangle with exclamation mark symbol is present if there is no symlink in place and this is replaced by a checked checkbox when the symlink is created and when deleted it goes back to the red triangle again. I also tried the guardian addon as this uses addonctrl for starting guardian and this also still works as previously with no problems. I think this is now at the stage where you can submit patches for the addonctrl.c, services.cgi and lang files. Thanks Adolf for the extensive testing and feedback. Patch was submitted: https://patchwork.ipfire.org/project/ipfire/list/?series=3083 A bug moves to MODIFIED when the patch is reviewed and merged into the next branch. Then it moves to ON_QA when the Testing branch is released for evaluation and testing. New version of patchset implementing remarks/improvements as pointed out on mailinglist: https://patchwork.ipfire.org/project/ipfire/list/?series=3086 New version of patchset implementing remarks/improvements as pointed out on mailinglist: https://patchwork.ipfire.org/project/ipfire/list/?series=3093 *** Bug 12965 has been marked as a duplicate of this bug. *** https://git.ipfire.org/?p=ipfire-2.x.git;a=commit;h=24168c8898eb6a9bcc1a0f6103f0c87eb092e7ef https://git.ipfire.org/?p=ipfire-2.x.git;a=commit;h=2b9b31b71d9b6211276b60ce9583198163a0c582 https://git.ipfire.org/?p=ipfire-2.x.git;a=commit;h=207ca1141c1e1b1b2d384b82ac6b8238ad8d566c https://git.ipfire.org/?p=ipfire-2.x.git;a=commit;h=4f205b54422afa069dd3956ebc4df2efe8600f75 https://git.ipfire.org/?p=ipfire-2.x.git;a=commit;h=8ed997102eeaee19e06e13aea4dfe08f83338489 Checked with CU172 Testing. The various addon services are shown together with their service names if they are different to the package name and the icons for starting/stopping/restarting are working. Also the icon for a missing initscript is shown. Everything works as expected in the testing I have carried out. Looks good. |