<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://wiki.iwillfearnoevil.com/mediawiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Chubbard</id>
	<title>I Will Fear No Evil - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://wiki.iwillfearnoevil.com/mediawiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Chubbard"/>
	<link rel="alternate" type="text/html" href="https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Special:Contributions/Chubbard"/>
	<updated>2026-05-08T15:46:10Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.39.2</generator>
	<entry>
		<id>https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Main_Page&amp;diff=688</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Main_Page&amp;diff=688"/>
		<updated>2026-02-11T17:34:15Z</updated>

		<summary type="html">&lt;p&gt;Chubbard: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;strong&amp;gt;Getting an account here...&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Anyone requesting an account from me DIRECTLY can have an account.&lt;br /&gt;
* The account request PAGE is being hammered by spammers and AI bots (so don't bother using it).&lt;br /&gt;
* It is not hard to find my personal email address, however I am not posting it here, to keep jerks from hitting me directly.&lt;br /&gt;
* &amp;lt;strong&amp;gt;Anyone who DOES use the account request page ends up in fail2ban for at least 30 days. (sick of bots)&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt;Fault Management notes, thoughts, and example code&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
My code is finally up on github.com!  Right now I am squishing bugs that are not critical issues, but look bad.  Additionally I have begun the ECE (event correlation engine) work for the different dashboards.  I still have not gotten to an installer, or added the seed data for the database.  I will be focusing on that soon.  I want to get a little more functionality in place to prove the tool is useful before focusing on this part.  Additionally I have had some thoughts on how to templatize some of the checks better.  I am still kinda kicking the idea around right now to see what kind of weak points I will run into, but the basic idea is along the lines of: give me an SNMP table, and if the return data is numeric, graph/save it.  If args are passed for what we care about, parse and raise events on what is found, all at the template level.  I think that I can build a simple skeleton for this where just about anything can be added for the check by doing this: string compare, existence, or number comparison.  I kinda like this idea, since we can get a lot of data, but 99% of the time we only care about specific parts of a given table.  I would like to make this something that can be created from the web page if possible, but I am not sure how ugly that would get.&lt;br /&gt;
&lt;br /&gt;
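A rough shell sketch of the three comparison types mentioned above (existence, string compare, number comparison).  The function name evaluate, the mode names, and the sample values are all made up for illustration; this is not code from the NMS itself.&lt;br /&gt;

```shell
# Hypothetical helper: evaluate MODE VALUE [EXPECTED]
# Returns 0 when the polled value passes, 1 when it should raise an event,
# and 2 when the mode is unknown.
evaluate() {
  mode=$1
  value=$2
  expected=${3-}
  case $mode in
    exists) [ -n "$value" ] ;;                 # any non-empty value passes
    string) [ "$value" = "$expected" ] ;;      # exact string match
    max)    [ "$value" -le "$expected" ] ;;    # integer at or below the limit
    *)      return 2 ;;
  esac
}

# Example with a made-up interface status and disk usage percentage
if evaluate string up up; then echo "ifOperStatus ok"; fi
if ! evaluate max 95 90; then echo "disk usage over threshold"; fi
```

A template would only need to carry the mode and the expected value; which column of the table feeds the value is left out of this sketch.&lt;br /&gt;
&lt;br /&gt;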
&lt;br /&gt;
&lt;br /&gt;
I am mainly updating this wiki for the NMS work I am doing, and attempting to get some more exposure in the greater world for assistance on code.  I am not a developer, just someone who is frustrated at a lack of good tools.  I will be adding the NMS up to github in the future as two separate repos.  The first one will be the API server.  The second is the front-end UI portion.  &lt;br /&gt;
&lt;br /&gt;
The backend is written in PHP, and has recently been migrated from PHP7.4 to 8.1.  I used the Slim4 skeleton as the base of the application.&lt;br /&gt;
* &amp;lt;strong&amp;gt; The backend has been updated to use PHP 8.2 from 8.1&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The frontend is using bootstrap, PHP and javascript.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 02-10-26&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Work has been taking up a lot of time.  Slow progress on UI updates.&lt;br /&gt;
* Minor bugfixes on API&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 10-28-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Updating the polling system is taking much longer than I expected to really support independent polling.  Will continue focusing on this.&lt;br /&gt;
* might goof off on the UI and continue the more fun conversion to the dark / light mode support changes&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 10-12-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* I am still cleaning the UX for dark and light mode in the dev branch.  Running behind on merging to main.&lt;br /&gt;
* Refactoring the Polling system to better support remote pollers out of the box&lt;br /&gt;
* Set base rules that host specific always overrides host or device groups for monitor checks&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 09-08-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Working on the dashboard UX.  Have something I am not embarrassed about.  Now need to wire it in and get that format standard..  Hope to merge into main branch in a week or two&lt;br /&gt;
* Couple of new API routes setup for the dashboard to get summary data and hotspot information&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 08-12-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* More UX work and seeing if I can improve the poller daemon a bit&lt;br /&gt;
* Looking at refactor of defined monitor association with hosts.  Right now it is clunky and not flexible for one offs per host&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 07-30-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* More making the frontend pretty.  I am really not AS embarrassed as I was.&lt;br /&gt;
* investigating breakout of array of monitors to allow for unique checks from the defined monitors.  Every monitor being the same across all hosts is causing some issues.  We need to be able to say &amp;quot;host1 needs value2&amp;quot; and &amp;quot;host2 needs value6&amp;quot; for the same service check.&lt;br /&gt;
* Still need to update the screenshots in the git repos to reflect the better looking UI&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 07-03-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Frontend variable cleanup.  Focus on devices.. It is messy and not maintainable as is.&lt;br /&gt;
* API error cleanup, specifically nulls in event modifications&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 06-20-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* building more annotations&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 06-12-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Have basic swagger UI working on API server (unmerged branch openApiAndCleanup)&lt;br /&gt;
* building more annotations&lt;br /&gt;
* Some PHP syntax error bug squishing&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 04-21-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Still running 2X2 on server.  Load Average consistent 1.10 15 min, and peaks 3.50 on 1min when iteration starts up&lt;br /&gt;
* Not been focused on this too much lately.  Continuing to try to get Devices to look the way they do in my head...&lt;br /&gt;
* Still eating Crayons...&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 03-26-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Still working on UX, as I know I do not make pretty pages&lt;br /&gt;
* Looks a little better on the branch I am working on, but still not amazing or great... Sigh...&lt;br /&gt;
* I should stick to eating Crayons instead of trying to use them to make something good looking...&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 01-02-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Currently working on getting the UI to not look like a first time high school project&lt;br /&gt;
* Minor bug squishing&lt;br /&gt;
* Investigating template rework to better calculate percentages to store in metrics&lt;br /&gt;
* Beginning to investigate storing metric data in influxdb&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 10-30-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Currently hitting performance limits with 2 core/2G RAM with 61 servers.  Server getting laggy..&lt;br /&gt;
* Will encourage setups as &amp;lt; 50 servers, 2x2 is fine.&lt;br /&gt;
* &amp;gt; 51 and &amp;lt; 125 testing as 4x2.  This is not a RAM intense app, it is more thready and core bound.  Then Drive IO is going to become the bottleneck before RAM&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 10-15-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Purchased ChartJS license (I may hate JS, but it makes pretty pics)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 10-08-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Minor bug squishing&lt;br /&gt;
* Testing using evil Javascript to make pretty graphs&lt;br /&gt;
* Going to license CanvasJS for this (even a n00b can figure it out)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 08-18-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Some minor bug fixes, but not too much coding done&lt;br /&gt;
* beginning focus on ECE and seeing what a mess it is.  Not happy about it.  Will likely redesign this since it was a skel anyway.  This needs to be simple and understandable dammit.&lt;br /&gt;
* Looking at changing frontend UI for main page.  &lt;br /&gt;
* Building more templates for standards&lt;br /&gt;
* Double-checking my logic for NRPE or shell commands.  This could use more work.  NRPE failing for a reason other than the command itself should have discrete alarm values.  Need to think about this more..&lt;br /&gt;
* SNMP checks need to be smarter, and need parsable thresholds on a per host basis, not a per check basis&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 04-16-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Still squishy-squishy on bugs&lt;br /&gt;
* Focusing on stability and clean UI&lt;br /&gt;
&lt;br /&gt;
Current Loads: 47 devices&lt;br /&gt;
* 2 core 2 GB RAM&lt;br /&gt;
* Load averages consistent: 0.70,1.5,1.6&lt;br /&gt;
* From this, with an average number of checks ( ~10 per host ), reliable monitoring can be done on 50 devices with decent results and minimal hardware for small environments.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update on status: 03-21-2024 &amp;lt;/strong&amp;gt;&lt;br /&gt;
* Many Many squishes of bugs&lt;br /&gt;
* Reporting engine and templates much more usable&lt;br /&gt;
* More documentation links for application with base examples written&lt;br /&gt;
* Initial Event Correlation Engine (ECE) rules written&lt;br /&gt;
* Some pages written for ECE&lt;br /&gt;
* Did I mention squishing bugs?&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update on status: 02-11-2024 &amp;lt;/strong&amp;gt;&lt;br /&gt;
* Upgraded API to use PHP 8.2, not 8.1.&lt;br /&gt;
* GUI code was sanitized and released to github&lt;br /&gt;
* [https://github.com/Guyverix/Vigilare-NMS-GUI Vigilare GUI]&lt;br /&gt;
* [https://github.com/Guyverix/Vigilare-NMS-API Vigilare API]&lt;br /&gt;
* [https://github.com/Guyverix/Vigilare-NMS-POLLER Vigilare Pollers]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The focus of this wiki is for notes and gotchas for things relating to technology.  It is mainly focused on the fault management tool that I am writing.  However oddball things I have found or commonly forget since I rarely use them are also present.  I do not go too in depth on the notes or what I find.  In general it is more a quick summary and if reasonable an example that shows what the result is.  I suspect that as time passes it will become more focused on the tool that I am writing.  However there will likely be oddball stuff thrown in here as well that does not have to do with fault management at all....&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt;Overall idea&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I believe that fault management is commonly overlooked in many companies, and treated as more of a bolt-on after they have been embarrassed by an outage or network event that a customer noticed.  This feels backwards to me.  Fault management really should be one of the first things put into place and grow with your software and network over time.  Being able to tell at a glance the &amp;quot;truth&amp;quot; of your network is almost as critical as the network running itself!  Customer monitoring isn't monitoring and simply makes you look bad.  If a customer or end user has to tell you something is down, you have failed.  If it happens more than once then that should set off all kinds of warnings to technicians.  This should not happen.  The problem happens in my opinion due to management seeing monitoring as a time suck that does not really return money on their investment of time.  The only time they care is when they get embarrassed, and after that they lose interest until they are embarrassed again.  This is stupid.  A company or technician must be able to say with confidence that their applications are working AND PROVE IT.  If you can't prove anything it is just a hope that things are working as designed.  Until you know, you simply are guessing at the health and availability of an application.&lt;br /&gt;
&lt;br /&gt;
I have noticed a trend recently that companies are moving to what I call &amp;quot;implied&amp;quot; monitoring.  More looking at metrics than active validation of an application or service.  While this does somewhat fit the bill for monitoring it does a major disservice to the technicians that need to support the application.  Usually this kind of monitoring will cry when something is down but does not tell you what, or where the issue happens.  Only that it is happening.  It is also usually slower to report the failures.  That gives a customer more time to find the issue before the technician does.  Parsing logs, and dropping them in search indexes, or forwarding matches as events are all very useful, but they are not really watching for specific application issues at the host level.  I believe that an organization must work up to this kind of monitoring.  It must be built on the basics.  If you do not have the basics in place, then the advanced monitoring is much less useful for an org.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt;&lt;br /&gt;
My common way to approach monitoring is to start as fundamental as possible.  On Linux based systems this is the way that has given me the best results.&lt;br /&gt;
&lt;br /&gt;
* daemon&lt;br /&gt;
** dead daemon, well dead app :)&lt;br /&gt;
* port&lt;br /&gt;
** zombie processes holding a port, or different daemons attempting to use the same port is bad&lt;br /&gt;
* challenge and response.&lt;br /&gt;
** Verify which app has control of the port (did you get an http, email, ssh header?  Did it respond at all?)&lt;br /&gt;
* performance&lt;br /&gt;
** This is the n+1 point.  After confirming the application is running, NOW is the time to verify it is at a basic level performant.&lt;br /&gt;
* log and event parsing&lt;br /&gt;
** In depth log parsing, and event correlations.  Without the above this is much less useful to a technician.&lt;br /&gt;
*** I have seen in RARE cases this done well, where it states what failed and which host.  However that is much less likely to happen as the logs parsed do not always state which application is at fault, only that there is one present.&lt;br /&gt;
&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
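The ordering in the list above can be sketched as a small shell helper that walks the layers from most fundamental to least and reports the first one that fails.  The helper name first_failed_layer and the example commands are assumptions for illustration only, not part of the NMS.&lt;br /&gt;

```shell
# Hypothetical helper: first_failed_layer NAME COMMAND [NAME COMMAND ...]
# Runs each check in order and stops at the first one that fails, so the
# event names the most fundamental broken layer instead of a symptom.
first_failed_layer() {
  while [ "$#" -ge 2 ]; do
    layer=$1
    check=$2
    shift 2
    if ! sh -c "$check" >/dev/null 2>/dev/null; then
      echo "$layer"
      return 1
    fi
  done
  echo ok
}

# Example for an ssh daemon (assumes pgrep and nc are installed; nc -z only
# tests that the port accepts a connection):
#   first_failed_layer daemon 'pgrep -x sshd' port 'nc -z localhost 22'
```

Because the loop stops at the first failure, a dead daemon is reported as daemon and never as a port or performance problem.&lt;br /&gt;
&lt;br /&gt;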
How you get the answers is less important than getting the answers, with one gotcha.  There is no reason to degrade the host with a complex and slow service check.  If the validation cannot be done cheaply, then it is likely something that should be broken down into more basic pieces.  Killing your servers or pod with service checks is exactly backwards of what you need.  Simple, fast, and accurate are what you need to focus on.  Inaccurate data is WORSE than no data.  Another useful trick is to assume failure until the application proves that it is doing what you expect.  I always try to avoid a bias of assuming something works until it can prove it in a service check.  Until that time, it is only your opinion that things are working.&lt;br /&gt;
&lt;br /&gt;
Historically I have done this with SNMP and NRPE service checks on a common 5 minute iteration cycle.  This allows for scale of many checks across a fleet of hosts within the cycle, as well as a decent starting point for monitoring hosts.  I realize that some technicians prefer a faster cycle time, but at the end of the day, you are still talking about human response times and investigation times.  Getting things like sub-minute reporting does no good at all to a technician who is troubleshooting an issue.  Additionally, fast cycle times do not really lend themselves well to the concept of retry.  You will lose packets from time to time, your application will do weird crap from time to time.  That's just fact.  Make sure your monitoring does not lose its mind and scream the world is burning due to a transient packet-loss issue.  Using something like retry will make it more likely you are picking up a legitimate event and not a transient failure.&lt;br /&gt;
&lt;br /&gt;
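A minimal sketch of the retry idea in shell, assuming the check is any command that exits 0 on success.  The wrapper name check_with_retry and the OK / CRITICAL strings are illustrative choices, not what the pollers actually emit.&lt;br /&gt;

```shell
# Hypothetical wrapper: check_with_retry ATTEMPTS DELAY_SECONDS COMMAND...
# Only reports CRITICAL after every attempt has failed, so a single
# dropped packet does not raise an event.
check_with_retry() {
  attempts=$1
  delay=$2
  shift 2
  try=1
  while [ "$try" -le "$attempts" ]; do
    if "$@" >/dev/null 2>/dev/null; then
      echo OK
      return 0
    fi
    try=$((try + 1))
    # Pause between attempts, but not after the last one
    if [ "$try" -le "$attempts" ]; then
      sleep "$delay"
    fi
  done
  echo CRITICAL
  return 1
}

# Example: three attempts, ten seconds apart (ping flags as on Linux):
#   check_with_retry 3 10 ping -c 1 -W 2 192.168.15.58
```

With a 5 minute cycle there is plenty of room for a few spaced retries inside one iteration; on a sub-minute cycle the retries would start to overlap the next run.&lt;br /&gt;
&lt;br /&gt;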
Every time there is an &amp;quot;Outage&amp;quot; or severe service degrade, one of the first things that should be brought up is: did the monitoring catch this issue?  Was it actually granular enough to state what the issue was, or did it just catch a side effect of the issue?  Getting an alarm for website down vs. database down are two very different things.  Both will imply a 100% outage, but usually one is faster to repair than the other.  Also, looking at the wrong thing (say, the webserver when the database is toast) simply slows down getting the database back online.&lt;br /&gt;
&lt;br /&gt;
== History of this wiki ==&lt;br /&gt;
&amp;lt;strong&amp;gt;MediaWiki destination for random notes and examples I do not want to lose or forget.&amp;lt;/strong&amp;gt;&lt;br /&gt;
* The site is for myself and friends who commonly use bash and other utilities and can have a one-stop-shop to find that oddball thing that was found six months ago and vaguely remembered.  The site overall is not for the general public, however if you make it in here feel free to browse.  &lt;br /&gt;
* Keep in mind however I do have security measures in place and poking hard at stuff will block you from the domain for 30 days +  if you hammer really hard.&lt;br /&gt;
* '''If someone wants access to actually ADD information in here please sign up for an account, and I will likely grant access.'''  I do try to set everything into some kind of category for easier searches as well as an attempt to keep this somewhat organized.  Whenever possible (or I remember to do it) I do try to link to the original sources of the information.  They are usually from SE, or other Q&amp;amp;A sites, so some of the comments are useful as well.&lt;br /&gt;
* This is not wikipedia, likely the site is not going to be polished, since it is more of a catchall wiki on doing different things.  There is not going to be too much rhyme or reason on what is posted on this wiki.  Overall if it is something that I have had to do more than once and look up every time, I will have notes on it in here so I will not have to search next time.&lt;br /&gt;
* There will be times that there are references to personal servers on my network, or oddball hostnames that are tied to iwillfearnoevil.com.  It is unlikely that access will be granted to those hosts unless there is a very specific reason to do so.  So don't bother asking :P&lt;br /&gt;
* '''Category NMS has notes on my progress of my NMS design, and thoughts on monitoring overall'''    [https://wiki.iwillfearnoevil.com/mediawiki/index.php/Category:NMS Category NMS]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Misc Notes on using Mediawiki ==&lt;br /&gt;
Consult the [https://www.mediawiki.org/wiki/Special:MyLanguage/Help:Contents User's Guide] for information on using the wiki software.&lt;br /&gt;
&lt;br /&gt;
== Getting started ==&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:Configuration_settings Configuration settings list]&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:FAQ MediaWiki FAQ]&lt;br /&gt;
* [https://lists.wikimedia.org/postorius/lists/mediawiki-announce.lists.wikimedia.org/ MediaWiki release mailing list]&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Localisation#Translation_resources Localise MediaWiki for your language]&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:Combating_spam Learn how to combat spam on your wiki]&lt;br /&gt;
[[Category:General]]&lt;/div&gt;</summary>
		<author><name>Chubbard</name></author>
	</entry>
	<entry>
		<id>https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Bash_sed&amp;diff=687</id>
		<title>Bash sed</title>
		<link rel="alternate" type="text/html" href="https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Bash_sed&amp;diff=687"/>
		<updated>2025-12-30T17:44:10Z</updated>

		<summary type="html">&lt;p&gt;Chubbard: /* Inline commenting by line range */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==== Useful sed commands ====&lt;br /&gt;
&lt;br /&gt;
== Ignore until EOF ==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat foo | sed &amp;quot;1,/^stringMatch/d&amp;quot;&lt;br /&gt;
* Deletes everything from line 1 through the first line matching stringMatch, then prints the rest until EOF&lt;br /&gt;
* stringMatch must be start of line (^)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Match all until EOF ==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat foo | sed &amp;quot;/^stringMatch2/q&amp;quot; | grep -v &amp;quot;^stringMatch2&amp;quot;&lt;br /&gt;
* Prints everything before the first matching line; sed quits at the match, and grep -v drops the matching line itself&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
== Strip pattern match ==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo &amp;quot;192.168.15.58 172.17.0.1 123.10.130.201&amp;quot; | sed 's/172\.[[:digit:]]\+\.[[:digit:]]\+\.[[:digit:]]\+//'&lt;br /&gt;
192.168.15.58  123.10.130.201&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Will strip out the first IP address beginning with 172. (add the g flag to strip all of them).  The dots are escaped so they only match a literal dot.&lt;br /&gt;
&lt;br /&gt;
== Replace in file only when matching regex ==&lt;br /&gt;
This will only alter lines that match the leading address regex (here /^foo/).&lt;br /&gt;
Nice for changing values in a script or ini.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sed -i -e '/^foo/s/4/20/g' /tmp/b&lt;br /&gt;
&lt;br /&gt;
cat /tmp/b&lt;br /&gt;
foo=20&lt;br /&gt;
 foo=4&lt;br /&gt;
bar=5&lt;br /&gt;
 bar=5&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Strip leading and trailing whitespace ==&lt;br /&gt;
The awk version is shorter, but I like this a little better because it makes sense in my head.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
| sed 's/^[[:blank:]]*//;s/[[:blank:]]*$//'&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Inline commenting by line range ==&lt;br /&gt;
This is an interesting one.  I needed to add the # character to every line from 483 through 828 but still keep the data.  This sed was one I had never used but will need to remember.  Very useful.&lt;br /&gt;
&lt;br /&gt;
It did successfully comment out all the lines without breaking stuff.  &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sed '483,828 s/^/#/' clean_me.yml &amp;gt; filteredCleanFile.yml&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
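The same line-range address also works in reverse.  A small sketch, with made-up function names, that comments a range and then strips the # again; like the command above it writes to stdout and leaves the input file alone.&lt;br /&gt;

```shell
# Hypothetical helpers; FIRST and LAST are line numbers, FILE is the input.
# comment_range FIRST LAST FILE: prefix each line in the range with '#'
comment_range() {
  sed "$1,$2 s/^/#/" "$3"
}

# uncomment_range FIRST LAST FILE: remove one leading '#' from each line
# in the range (lines without a leading '#' are left untouched)
uncomment_range() {
  sed "$1,$2 s/^#//" "$3"
}

# Example: comment_range 483 828 clean_me.yml > filteredCleanFile.yml
```

Only a single leading # is removed, so a line that was already a comment before the range was commented out keeps its original #.&lt;br /&gt;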
&lt;br /&gt;
&lt;br /&gt;
[[Category:bash]]&lt;/div&gt;</summary>
		<author><name>Chubbard</name></author>
	</entry>
	<entry>
		<id>https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Bash_sed&amp;diff=686</id>
		<title>Bash sed</title>
		<link rel="alternate" type="text/html" href="https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Bash_sed&amp;diff=686"/>
		<updated>2025-12-30T17:43:20Z</updated>

		<summary type="html">&lt;p&gt;Chubbard: /* Strip leading and trailing whitespace */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==== Useful sed commands ====&lt;br /&gt;
&lt;br /&gt;
== Ignore until EOF ==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat foo | sed &amp;quot;1,/^stringMatch/d&amp;quot;&lt;br /&gt;
* Deletes everything from line 1 through the first line matching stringMatch, then prints the rest until EOF&lt;br /&gt;
* stringMatch must be start of line (^)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Match all until EOF ==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat foo | sed &amp;quot;/^stringMatch2/q&amp;quot; | grep -v &amp;quot;^stringMatch2&amp;quot;&lt;br /&gt;
* Prints everything before the first matching line; sed quits at the match, and grep -v drops the matching line itself&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
== Strip pattern match ==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo &amp;quot;192.168.15.58 172.17.0.1 123.10.130.201&amp;quot; | sed 's/172\.[[:digit:]]\+\.[[:digit:]]\+\.[[:digit:]]\+//'&lt;br /&gt;
192.168.15.58  123.10.130.201&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Will strip out the first IP address beginning with 172. (add the g flag to strip all of them).  The dots are escaped so they only match a literal dot.&lt;br /&gt;
&lt;br /&gt;
== Replace in file only when matching regex ==&lt;br /&gt;
This will only alter lines that match the leading address regex (here /^foo/).&lt;br /&gt;
Nice for changing values in a script or ini.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sed -i -e '/^foo/s/4/20/g' /tmp/b&lt;br /&gt;
&lt;br /&gt;
cat /tmp/b&lt;br /&gt;
foo=20&lt;br /&gt;
 foo=4&lt;br /&gt;
bar=5&lt;br /&gt;
 bar=5&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Strip leading and trailing whitespace ==&lt;br /&gt;
The awk version is shorter, but I like this a little better because it makes sense in my head.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
| sed 's/^[[:blank:]]*//;s/[[:blank:]]*$//'&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Inline commenting by line range ==&lt;br /&gt;
This is an interesting one.  I needed to add the # character to every line from 483 through 828 but still keep the data.  This sed was one I had never used but will need to remember.  Very useful.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sed '483,828 s/^/#/' clean_me.yml &amp;gt; filteredCleanFile.yml&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[Category:bash]]&lt;/div&gt;</summary>
		<author><name>Chubbard</name></author>
	</entry>
	<entry>
		<id>https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Main_Page&amp;diff=685</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Main_Page&amp;diff=685"/>
		<updated>2025-10-28T16:32:42Z</updated>

		<summary type="html">&lt;p&gt;Chubbard: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;strong&amp;gt;Getting an account here...&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Anyone requesting an account from me DIRECTLY can have an account.&lt;br /&gt;
* The account request PAGE is being hammered by spammers and AI bots (so don't bother using it).&lt;br /&gt;
* It is not hard to find my personal email address, however I am not posting it here, to keep jerks from hitting me directly.&lt;br /&gt;
* &amp;lt;strong&amp;gt;Anyone who DOES use the account request page ends up in fail2ban for at least 30 days. (sick of bots)&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt;Fault Management notes, thoughts, and example code&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
My code is finally up on github.com!  Right now I am squishing bugs that are not critical issues, but look bad.  Additionally I have begun the ECE (event correlation engine) work for the different dashboards.  I still have not gotten to an installer, or added the seed data for the database.  I will be focusing on that soon.  I want to get a little more functionality in place to prove the tool is useful before focusing on this part.  Additionally I have had some thoughts on how to templatize some of the checks better.  I am still kinda kicking the idea around right now to see what kind of weak points I will run into, but the basic idea is along the lines of: give me an SNMP table, and if the return data is numeric, graph/save it.  If args are passed for what we care about, parse and raise events on what is found, all at the template level.  I think that I can build a simple skeleton for this where just about anything can be added for the check by doing this: string compare, existence, or number comparison.  I kinda like this idea, since we can get a lot of data, but 99% of the time we only care about specific parts of a given table.  I would like to make this something that can be created from the web page if possible, but I am not sure how ugly that would get.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I am mainly updating this wiki for the NMS work I am doing, and attempting to get some more exposure in the greater world for assistance on code.  I am not a developer, just someone who is frustrated at a lack of good tools.  I will be adding the NMS up to github in the future as two separate repos.  The first one will be the API server.  The second is the front-end UI portion.  &lt;br /&gt;
&lt;br /&gt;
The backend is written in PHP, and has recently been migrated from PHP7.4 to 8.1.  I used the Slim4 skeleton as the base of the application.&lt;br /&gt;
* &amp;lt;strong&amp;gt; The backend has been updated to use PHP 8.2 from 8.1&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The frontend is using bootstrap, PHP and javascript.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 10-28-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Updating the polling system is taking much longer than I expected to really support independent polling.  Will continue focusing on this.&lt;br /&gt;
* might goof off on the UI and continue the more fun conversion to the dark / light mode support changes&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 10-12-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* I am still cleaning the UX for dark and light mode in the dev branch.  Running behind on merging to main.&lt;br /&gt;
* Refactoring the Polling system to better support remote pollers out of the box&lt;br /&gt;
* Set base rules that host specific always overrides host or device groups for monitor checks&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 09-08-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Working on the dashboard UX.  Have something I am not embarrassed about.  Now need to wire it in and get that format standard..  Hope to merge into main branch in a week or two&lt;br /&gt;
* Couple of new API routes setup for the dashboard to get summary data and hotspot information&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 08-12-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* More UX work and seeing if I can improve the poller daemon a bit&lt;br /&gt;
* Looking at refactor of defined monitor association with hosts.  Right now it is clunky and not flexible for one offs per host&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 07-30-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* More making the frontend pretty.  I am really not AS embarrassed as I was.&lt;br /&gt;
* investigating breakout of array of monitors to allow for unique checks from the defined monitors.  Every monitor being the same across all hosts is causing some issues.  We need to be able to say &amp;quot;host1 needs value2&amp;quot; and &amp;quot;host2 needs value6&amp;quot; for the same service check.&lt;br /&gt;
* Still need to update the screenshots in the git repos to reflect the better looking UI&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 07-03-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Frontend variable cleanup.  Focus on devices.. It is messy and not maintainable as is.&lt;br /&gt;
* API error cleanup, specifically nulls in event modifications&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 06-20-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* building more annotations&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 06-12-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Have basic swagger UI working on API server (unmerged branch openApiAndCleanup)&lt;br /&gt;
* building more annotations&lt;br /&gt;
* Some PHP syntax error bug squishing&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 04-21-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Still running 2X2 on server.  Load Average consistent 1.10 15 min, and peaks 3.50 on 1min when iteration starts up&lt;br /&gt;
* Not been focused on this too much lately.  Continuing to try to get Devices to look the way they do in my head...&lt;br /&gt;
* Still eating Crayons...&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 03-26-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Still working on UX, as I know I do not make pretty pages&lt;br /&gt;
* Looks a little better on the branch I am working on, but still not amazing or great... Sigh...&lt;br /&gt;
* I should stick to eating Crayons instead of trying to use them to make something good looking...&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 01-02-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Currently working on getting the UI to not look like a first time high school project&lt;br /&gt;
* Minor bug squishing&lt;br /&gt;
* Investigating template rework to better calculate percentages to store in metrics&lt;br /&gt;
* Beginning to investigate storing metric data in influxdb&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 10-30-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Currently hitting performance limits with 2 core/2G RAM with 61 servers.  Server getting laggy..&lt;br /&gt;
* Will encourage setups as &amp;lt; 50 servers, 2x2 is fine.&lt;br /&gt;
* &amp;gt; 51 and &amp;lt; 125 testing as 4x2.  This is not a RAM intense app, it is more thready and core bound.  Then Drive IO is going to become the bottleneck before RAM&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 10-15-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Purchased ChartJS license (I may hate JS, but it makes pretty pics)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 10-08-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Minor bug squishing&lt;br /&gt;
* Testing using evil Javascript to make pretty graphs&lt;br /&gt;
* Going to license CanvasJS for this (even a n00b can figure it out)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 08-18-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Some minor bug fixes, but not too much coding done&lt;br /&gt;
* beginning focus on ECE and seeing what a mess it is.  Not happy about it.  Will likely redesign this since it was a skel anyway.  This needs to be simple and understandable dammit.&lt;br /&gt;
* Looking at changing frontend UI for main page.  &lt;br /&gt;
* Building more templates for standards&lt;br /&gt;
* Double-checking my logic for NRPE or shell commands.  This could use more work.  NRPE failing for a reason other than the command itself should have discrete alarm values.  Need to think about this more..&lt;br /&gt;
* SNMP checks need to be smarter, and need parsable thresholds on a per host basis, not a per check basis&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 04-16-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Still squishy-squishy on bugs&lt;br /&gt;
* Focusing on stability and clean UI&lt;br /&gt;
&lt;br /&gt;
Current Loads: 47 devices&lt;br /&gt;
* 2 core 2 GB RAM&lt;br /&gt;
* Load averages consistent: 0.70,1.5,1.6&lt;br /&gt;
* From this, with an average number of checks ( ~10 per host ), reliable monitoring can be done on 50 devices with decent results and minimal hardware for small environments.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update on status: 03-21-2024 &amp;lt;/strong&amp;gt;&lt;br /&gt;
* Many Many squishes of bugs&lt;br /&gt;
* Reporting engine and templates much more usable&lt;br /&gt;
* More documentation links for application with base examples written&lt;br /&gt;
* Initial Event Correlation Engine (ECE) rules written&lt;br /&gt;
* Some pages written for ECE&lt;br /&gt;
* Did I mention squishing bugs?&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update on status: 02-11-2024 &amp;lt;/strong&amp;gt;&lt;br /&gt;
* Upgraded API to use PHP 8.2, not 8.1.&lt;br /&gt;
* GUI code was sanitized and released to github&lt;br /&gt;
* [https://github.com/Guyverix/Vigilare-NMS-GUI Vigilare GUI]&lt;br /&gt;
* [https://github.com/Guyverix/Vigilare-NMS-API Vigilare API]&lt;br /&gt;
* [https://github.com/Guyverix/Vigilare-NMS-POLLER Vigilare Pollers]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The focus of this wiki is for notes and gotchas for things relating to technology.  It is mainly focused on the fault management tool that I am writing.  However oddball things I have found or commonly forget since I rarely use them are also present.  I do not go too in depth on the notes or what I find.  In general it is more a quick summary and if reasonable an example that shows what the result is.  I suspect that as time passes it will become more focused on the tool that I am writing.  However there will likely be oddball stuff thrown in here as well that does not have to do with fault management at all....&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt;Overall idea&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I believe that fault management is commonly overlooked in many companies and treated as a bolt-on after they have been embarrassed by an outage or network event that a customer noticed.  This feels backwards to me.  Fault management really should be one of the first things put into place and grow with your software and network over time.  Being able to tell at a glance the &amp;quot;truth&amp;quot; of your network is almost as critical as the network running itself!  Customer monitoring isn't monitoring and simply makes you look bad.  If a customer or end user has to tell you something is down, you have failed.  If it happens more than once then that should set off all kinds of warnings to technicians.  This should not happen.  The problem happens in my opinion due to management seeing monitoring as a time suck that does not really return money on their investment of time.  The only time they care is when they get embarrassed, and after that they lose interest until they are embarrassed again.  This is stupid.  A company or technician must be able to say with confidence that their applications are working AND PROVE IT.  If you can't prove anything it is just a hope that things are working as designed.  Until you know, you simply are guessing at the health and availability of an application.&lt;br /&gt;
&lt;br /&gt;
I have noticed a trend recently that companies are going to what I call &amp;quot;implied&amp;quot; monitoring.  More looking at metrics than active validation of an application or service.  While this does somewhat fit the bill for monitoring it does a major disservice to the technicians that need to support the application.  Usually this kind of monitoring will cry when something is down but does not tell you what, or where the issue happens.  Only that it is happening.  It is also usually slower to report the failures.  That gives a customer more time to find the issue before the technician does.  Parsing logs, and dropping them in search indexes, or forwarding matches as events are all very useful, but they are not really watching for specific application issues at the host level.  I believe that an organization must work up to this kind of monitoring.  It must be built on the basics.  If you do not have the basics in place, then the advanced monitoring is much less useful for an org.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt;&lt;br /&gt;
My common way to approach monitoring is to start as fundamental as possible.  On Linux based systems this is the way that has given me the best results.&lt;br /&gt;
&lt;br /&gt;
* daemon&lt;br /&gt;
** dead daemon, well dead app :)&lt;br /&gt;
* port&lt;br /&gt;
** zombie processes holding a port, or different daemons attempting to use the same port is bad&lt;br /&gt;
* challenge and response.&lt;br /&gt;
** Verify which app has control of the port (did you get an http, email, ssh header?  Did it respond at all?)&lt;br /&gt;
* performance&lt;br /&gt;
** This is the n+1 point.  After confirming the application is running, NOW is the time to verify it is at a basic level performant.&lt;br /&gt;
* log and event parsing&lt;br /&gt;
** In depth log parsing, and event correlations.  Without the above this is much less useful to a technician.&lt;br /&gt;
*** I have seen in RARE cases this done well, where it states what failed and which host.  However that is much less likely to happen as the logs parsed do not always state which application is at fault, only that there is one present.&lt;br /&gt;
&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
How you get the answers is less important than getting the answers, with one gotcha.  There is no reason to degrade the host with a complex and slow service check.  If the validation cannot be done cheaply, then it is likely something that should be broken down into more basic pieces.  Killing your servers or pod with service checks is exactly backwards of what you need.  Simple, fast, and accurate are what you need to focus on.  Inaccurate data is WORSE than no data.  Another useful trick is to assume failure until the application proves that it is doing what you expect.  I always try to avoid a bias of assuming something works until it can prove it in a service check.  Until that time, it is only your opinion that things are working.&lt;br /&gt;
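&lt;br /&gt;
The layered order above (port, then challenge and response, then timing) can be sketched in Python roughly as follows.  This is only an illustration under stated assumptions: the function names, the 3 second timeout, and the banner text are made up for the example, and the daemon layer is left as any callable you supply.  Each layer is treated as failed until its check proves otherwise, and the run stops at the first failure so a dead port never triggers the slower checks.&lt;br /&gt;
&lt;br /&gt;
```python
import socket
import time


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def banner_matches(host, port, expected, timeout=3.0):
    """Return True if the first bytes the service sends contain `expected`.

    Only suits services that speak first (ssh, smtp); http needs a request.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            data = sock.recv(256)
    except OSError:
        return False
    return expected in data


def run_layers(layers):
    """Run (name, check) pairs in order and stop at the first failure.

    Returns (ok, failed_layer_name_or_None, elapsed_seconds).
    """
    start = time.monotonic()
    for name, check in layers:
        passed = False  # assume failure until the check proves otherwise
        try:
            passed = bool(check())
        except Exception:
            passed = False  # a crashing check counts as a failed layer
        if not passed:
            return False, name, time.monotonic() - start
    return True, None, time.monotonic() - start
```
&lt;br /&gt;
A caller would pass something like a port layer built from port_open and a banner layer built from banner_matches with b'SSH-' for an ssh daemon, then compare the elapsed time against whatever budget counts as performant for that service.&lt;br /&gt;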
&lt;br /&gt;
Historically this has been with SNMP and NRPE service checks on a common 5 minute iteration cycle.  This allows for scale of many checks across a fleet of hosts within the cycle, as well as a decent starting point for monitoring hosts.  I realize that some technicians prefer a faster cycle time, but at the end of the day, you are still talking about human response times and investigation times.  Getting things like sub-minute reporting does no good at all to a technician who is troubleshooting an issue.  Additionally, fast cycle times do not really lend themselves well to the concept of retry.  You will lose packets from time to time, your application will do weird crap from time to time.  That's just fact.  Make sure your monitoring does not lose its mind and scream the world is burning due to a transient packet-loss issue.  Using something like retry will make it more likely you are picking up a legitimate event and not a transient failure.&lt;br /&gt;
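&lt;br /&gt;
A minimal Python sketch of the retry idea, assuming an alarm should only fire after a set number of consecutive failed polls.  The class name and the default of 3 failures are assumptions for the example, not values taken from any particular tool.&lt;br /&gt;
&lt;br /&gt;
```python
class RetryGate:
    """Turn raw poll results into an alarm only after repeated failures."""

    def __init__(self, max_failures=3):
        # At least one failure is always required before alarming.
        self.max_failures = max(1, int(max_failures))
        self.consecutive_failures = 0

    def update(self, check_passed):
        """Record one poll result; return True while the alarm is active."""
        if check_passed:
            # Any success clears the streak, so a single lost packet is forgotten.
            self.consecutive_failures = 0
        else:
            # Cap the counter so the equality test below stays true while failing.
            self.consecutive_failures = min(self.consecutive_failures + 1,
                                            self.max_failures)
        return self.consecutive_failures == self.max_failures


gate = RetryGate(max_failures=3)
polls = [False, True, False, False, False, False, True]
print([gate.update(result) for result in polls])
```
&lt;br /&gt;
In this run the single early failure never alarms, the alarm turns on at the third failure in a row, and the first good poll clears it.&lt;br /&gt;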
&lt;br /&gt;
Every time there is an &amp;quot;Outage&amp;quot; or severe service degrade, one of the first things that should be brought up is did the monitoring catch this issue?  Was it actually granular enough to state what the issue was, or just a side effect of the issue?  Getting an alarm for website down vs. database down are two very different things.  Both will imply a 100% outage, but usually one is faster to repair than the other.  Also, looking at the wrong thing (say, the webserver when the database is toast) simply slows down getting the database back online.&lt;br /&gt;
&lt;br /&gt;
== History of this wiki ==&lt;br /&gt;
&amp;lt;strong&amp;gt;MediaWiki destination for random notes and examples I do not want to lose or forget.&amp;lt;/strong&amp;gt;&lt;br /&gt;
* The site is for myself and friends who commonly use bash and other utilities and can have a one-stop-shop to find that oddball thing that was found six months ago and vaguely remembered.  The site overall is not for the general public, however if you make it in here feel free to browse.  &lt;br /&gt;
* Keep in mind however I do have security measures in place and poking hard at stuff will block you from the domain for 30 days +  if you hammer really hard.&lt;br /&gt;
* '''If someone wants access to actually ADD information in here please sign up for an account, and I will likely grant access.'''  I do try to set everything into some kind of category for easier searches as well as an attempt to keep this somewhat organized.  Whenever possible (or I remember to do it) I do try to link to the original sources of the information.  They are usually from SE, or other Q&amp;amp;A sites, so some of the comments are useful as well.&lt;br /&gt;
* This is not wikipedia, likely the site is not going to be polished, since it is more of a catchall wiki on doing different things.  There is not going to be too much rhyme or reason on what is posted on this wiki.  Overall if it is something that I have had to do more than once and look up every time, I will have notes on it in here so I will not have to search next time.&lt;br /&gt;
* There will be times that there are references to personal servers on my network, or oddball hostnames that are tied to iwillfearnoevil.com.  It is unlikely that access will be granted to those hosts unless there is a very specific reason to do so.  So don't bother asking :P&lt;br /&gt;
* '''Category NMS has notes on my progress of my NMS design, and thoughts on monitoring overall'''    [https://wiki.iwillfearnoevil.com/mediawiki/index.php/Category:NMS Category NMS]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Misc Notes on using Mediawiki ==&lt;br /&gt;
Consult the [https://www.mediawiki.org/wiki/Special:MyLanguage/Help:Contents User's Guide] for information on using the wiki software.&lt;br /&gt;
&lt;br /&gt;
== Getting started ==&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:Configuration_settings Configuration settings list]&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:FAQ MediaWiki FAQ]&lt;br /&gt;
* [https://lists.wikimedia.org/postorius/lists/mediawiki-announce.lists.wikimedia.org/ MediaWiki release mailing list]&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Localisation#Translation_resources Localise MediaWiki for your language]&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:Combating_spam Learn how to combat spam on your wiki]&lt;br /&gt;
[[Category:General]]&lt;/div&gt;</summary>
		<author><name>Chubbard</name></author>
	</entry>
	<entry>
		<id>https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Main_Page&amp;diff=684</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Main_Page&amp;diff=684"/>
		<updated>2025-10-28T15:23:01Z</updated>

		<summary type="html">&lt;p&gt;Chubbard: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;strong&amp;gt;Getting an account here...&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Anyone requesting an account from me DIRECTLY can have an account.&lt;br /&gt;
* The account request PAGE is being hammered by spammers and AI bots. ( so don't bother using it )&lt;br /&gt;
* It is not hard to find my personal email address, however I am not posting it directly here to keep jerks from hitting me directly&lt;br /&gt;
* &amp;lt;strong&amp;gt;Anyone who DOES use the account request page ends up in fail2ban for at least 30 days. (sick of bots)&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt;Fault Management notes, thoughts, and example code&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
My code is finally up on github.com!  Right now I am squishing bugs that I am finding which are not critical issues, but look bad.  Additionally I have begun the ECE or event correlation engine work for the different dashboards.  I still have not gotten to an installer, or added the seed data for the database.  I will be focusing on that soon.  I want to get a little more functionality in place to prove the tool is useful before focusing on this part.  Additionally I have had some thoughts on how to templatize some of the checks better.  I am still kinda kicking the idea around right now to see what kind of weak points I will run into, but the basic idea is along the lines of: give me an SNMP table, and if the returned data is numeric, graph/save it.  If args are passed for what we care about, parse and raise events on what is found, all at the template level.  I think that I can build a simple skeleton for this where just about anything can be added for the check by doing this.  String compare, existence, or number comparison.  I kinda like this idea, since we can get a lot of data, but 99% of the time we only care about specific parts of a given table.  I would like to make this something that can be created from the web page if possible, but I am not sure how ugly that would get..&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I am mainly updating this wiki for the NMS work I am doing, and attempting to get some more exposure in the greater world for assistance on code.  I am not a developer, just someone who is frustrated at a lack of good tools.  I will be adding the NMS up to github in the future as two separate repos.  The first one will be the API server.  The second is the front-end UI portion.  &lt;br /&gt;
&lt;br /&gt;
The backend is written in PHP, and has recently been migrated from PHP7.4 to 8.1.  I used the Slim4 skeleton as the base of the application.&lt;br /&gt;
* &amp;lt;strong&amp;gt; The backend has been updated to use PHP 8.2 from 8.1&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The frontend is using bootstrap, PHP and javascript.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 10-28-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Updating the polling system is taking much longer than I expected to really support independent polling.  Will continue focusing on this.&lt;br /&gt;
* might goof off on the UI and continue the more fun conversion to the dark / light mode support changes&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 10-12-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* I am still cleaning the UX for dark and light mode in the dev branch.  Running behind on merging to main.&lt;br /&gt;
* Refactoring the Polling system to better support remote pollers out of the box&lt;br /&gt;
* Set base rules that host specific always overrides host or device groups for monitor checks&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 09-08-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Working on the dashboard UX.  Have something I am not embarrassed about.  Now need to wire it in and get that format standard..  Hope to merge into main branch in a week or two&lt;br /&gt;
* Couple of new API routes set up for the dashboard to get summary data and hotspot information&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 08-12-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* More UX work and seeing if I can improve the poller daemon a bit&lt;br /&gt;
* Looking at refactor of defined monitor association with hosts.  Right now it is clunky and not flexible for one offs per host&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 07-30-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* More making the frontend pretty.  I am really not AS embarrassed as I was.&lt;br /&gt;
* investigating breakout of array of monitors to allow for unique checks from the defined monitors.  Every monitor being the same across all hosts is causing some issues.  We need to be able to say &amp;quot;host1 needs value2&amp;quot; and &amp;quot;host2 needs value6&amp;quot; for the same service check.&lt;br /&gt;
* Still need to update the screenshots in the git repos to reflect the better looking UI&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 07-03-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Frontend variable cleanup.  Focus on devices.. It is messy and not maintainable as is.&lt;br /&gt;
* API error cleanup, specifically nulls in event modifications&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 06-20-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* building more annotations&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 06-12-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Have basic swagger UI working on API server (unmerged branch openApiAndCleanup)&lt;br /&gt;
* building more annotations&lt;br /&gt;
* Some PHP syntax error bug squishing&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 04-21-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Still running 2X2 on server.  Load Average consistent 1.10 15 min, and peaks 3.50 on 1min when iteration starts up&lt;br /&gt;
* Not been focused on this too much lately.  Continuing to try to get Devices to look the way they do in my head...&lt;br /&gt;
* Still eating Crayons...&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 03-26-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Still working on UX, as I know I do not make pretty pages&lt;br /&gt;
* Looks a little better on the branch I am working on, but still not amazing or great... Sigh...&lt;br /&gt;
* I should stick to eating Crayons instead of trying to use them to make something good looking...&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 01-02-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Currently working on getting the UI to not look like a first time high school project&lt;br /&gt;
* Minor bug squishing&lt;br /&gt;
* Investigating template rework to better calculate percentages to store in metrics&lt;br /&gt;
* Beginning to investigate storing metric data in influxdb&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 10-30-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Currently hitting performance limits with 2 core/2G RAM with 61 servers.  Server getting laggy..&lt;br /&gt;
* Will encourage setups as &amp;lt; 50 servers, 2x2 is fine.&lt;br /&gt;
* &amp;gt; 51 and &amp;lt; 125 testing as 4x2.  This is not a RAM intense app, it is more thready and core bound.  Then Drive IO is going to become the bottleneck before RAM&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 10-15-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Purchased ChartJS license (I may hate JS, but it makes pretty pics)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 10-08-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Minor bug squishing&lt;br /&gt;
* Testing using evil Javascript to make pretty graphs&lt;br /&gt;
* Going to license CanvasJS for this (even a n00b can figure it out)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 08-18-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Some minor bug fixes, but not too much coding done&lt;br /&gt;
* beginning focus on ECE and seeing what a mess it is.  Not happy about it.  Will likely redesign this since it was a skel anyway.  This needs to be simple and understandable dammit.&lt;br /&gt;
* Looking at changing frontend UI for main page.  &lt;br /&gt;
* Building more templates for standards&lt;br /&gt;
* Double-checking my logic for NRPE or shell commands.  This could use more work.  NRPE failing for a reason other than the command itself should have discrete alarm values.  Need to think about this more..&lt;br /&gt;
* SNMP checks need to be smarter, and need parsable thresholds on a per host basis, not a per check basis&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 04-16-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Still squishy-squishy on bugs&lt;br /&gt;
* Focusing on stability and clean UI&lt;br /&gt;
&lt;br /&gt;
Current Loads: 47 devices&lt;br /&gt;
* 2 core 2 GB RAM&lt;br /&gt;
* Load averages consistent: 0.70,1.5,1.6&lt;br /&gt;
* From this, with an average number of checks ( ~10 per host ), reliable monitoring can be done on 50 devices with decent results and minimal hardware for small environments.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update on status: 03-21-2024 &amp;lt;/strong&amp;gt;&lt;br /&gt;
* Many Many squishes of bugs&lt;br /&gt;
* Reporting engine and templates much more usable&lt;br /&gt;
* More documentation links for application with base examples written&lt;br /&gt;
* Initial Event Correlation Engine (ECE) rules written&lt;br /&gt;
* Some pages written for ECE&lt;br /&gt;
* Did I mention squishing bugs?&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update on status: 02-11-2024 &amp;lt;/strong&amp;gt;&lt;br /&gt;
* Upgraded API to use PHP 8.2, not 8.1.&lt;br /&gt;
* GUI code was sanitized and released to github&lt;br /&gt;
* [https://github.com/Guyverix/Vigilare-NMS-GUI Vigilare GUI]&lt;br /&gt;
* [https://github.com/Guyverix/Vigilare-NMS-API Vigilare API]&lt;br /&gt;
* [https://github.com/Guyverix/Vigilare-NMS-POLLER Vigilare Pollers]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The focus of this wiki is for notes and gotchas for things relating to technology.  It is mainly focused on the fault management tool that I am writing.  However oddball things I have found or commonly forget since I rarely use them are also present.  I do not go too in depth on the notes or what I find.  In general it is more a quick summary and if reasonable an example that shows what the result is.  I suspect that as time passes it will become more focused on the tool that I am writing.  However there will likely be oddball stuff thrown in here as well that does not have to do with fault management at all....&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt;Overall idea&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I believe that fault management is commonly overlooked in many companies and treated as a bolt-on after they have been embarrassed by an outage or network event that a customer noticed.  This feels backwards to me.  Fault management really should be one of the first things put into place and grow with your software and network over time.  Being able to tell at a glance the &amp;quot;truth&amp;quot; of your network is almost as critical as the network running itself!  Customer monitoring isn't monitoring and simply makes you look bad.  If a customer or end user has to tell you something is down, you have failed.  If it happens more than once then that should set off all kinds of warnings to technicians.  This should not happen.  The problem happens in my opinion due to management seeing monitoring as a time suck that does not really return money on their investment of time.  The only time they care is when they get embarrassed, and after that they lose interest until they are embarrassed again.  This is stupid.  A company or technician must be able to say with confidence that their applications are working AND PROVE IT.  If you can't prove anything it is just a hope that things are working as designed.  Until you know, you simply are guessing at the health and availability of an application.&lt;br /&gt;
&lt;br /&gt;
I have noticed a trend recently that companies are going to what I call &amp;quot;implied&amp;quot; monitoring.  More looking at metrics than active validation of an application or service.  While this does somewhat fit the bill for monitoring it does a major disservice to the technicians that need to support the application.  Usually this kind of monitoring will cry when something is down but does not tell you what, or where the issue happens.  Only that it is happening.  It is also usually slower to report the failures.  That gives a customer more time to find the issue before the technician does.  Parsing logs, and dropping them in search indexes, or forwarding matches as events are all very useful, but they are not really watching for specific application issues at the host level.  I believe that an organization must work up to this kind of monitoring.  It must be built on the basics.  If you do not have the basics in place, then the advanced monitoring is much less useful for an org.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt;&lt;br /&gt;
My common way to approach monitoring is to start as fundamental as possible.  On Linux based systems this is the way that has given me the best results.&lt;br /&gt;
&lt;br /&gt;
* daemon&lt;br /&gt;
** dead daemon, well dead app :)&lt;br /&gt;
* port&lt;br /&gt;
** zombie processes holding a port, or different daemons attempting to use the same port is bad&lt;br /&gt;
* challenge and response.&lt;br /&gt;
** Verify which app has control of the port (did you get an http, email, ssh header?  Did it respond at all?)&lt;br /&gt;
* performance&lt;br /&gt;
** This is the n+1 point.  After confirming the application is running, NOW is the time to verify it is at a basic level performant.&lt;br /&gt;
* log and event parsing&lt;br /&gt;
** In depth log parsing, and event correlations.  Without the above this is much less useful to a technician.&lt;br /&gt;
*** I have seen in RARE cases this done well, where it states what failed and which host.  However that is much less likely to happen as the logs parsed do not always state which application is at fault, only that there is one present.&lt;br /&gt;
&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
How you get the answers is less important than getting the answers, with one gotcha.  There is no reason to degrade the host with a complex and slow service check.  If the validation cannot be done cheaply, then it is likely something that should be broken down into more basic pieces.  Killing your servers or pod with service checks is exactly backwards of what you need.  Simple, fast, and accurate are what you need to focus on.  Inaccurate data is WORSE than no data.  Another useful trick is to assume failure until the application proves that it is doing what you expect.  I always try to avoid a bias of assuming something works until it can prove it in a service check.  Until that time, it is only your opinion that things are working.&lt;br /&gt;
&lt;br /&gt;
Historically this has been with SNMP and NRPE service checks on a common 5 minute iteration cycle.  This allows for scale of many checks across a fleet of hosts within the cycle, as well as a decent starting point for monitoring hosts.  I realize that some technicians prefer a faster cycle time, but at the end of the day, you are still talking about human response times and investigation times.  Getting things like sub-minute reporting does no good at all to a technician who is troubleshooting an issue.  Additionally, fast cycle times do not really lend themselves well to the concept of retry.  You will lose packets from time to time, your application will do weird crap from time to time.  That's just fact.  Make sure your monitoring does not lose its mind and scream the world is burning due to a transient packet-loss issue.  Using something like retry will make it more likely you are picking up a legitimate event and not a transient failure.&lt;br /&gt;
&lt;br /&gt;
Every time there is an &amp;quot;Outage&amp;quot; or severe service degrade, one of the first things that should be brought up is did the monitoring catch this issue?  Was it actually granular enough to state what the issue was, or just a side effect of the issue?  Getting an alarm for website down vs. database down are two very different things.  Both will imply a 100% outage, but usually one is faster to repair than the other.  Also, looking at the wrong thing (say, the webserver when the database is toast) simply slows down getting the database back online.&lt;br /&gt;
&lt;br /&gt;
== History of this wiki ==&lt;br /&gt;
&amp;lt;strong&amp;gt;MediaWiki destination for random notes and examples I do not want to lose or forget.&amp;lt;/strong&amp;gt;&lt;br /&gt;
* The site is for myself and friends who commonly use bash and other utilities and can have a one-stop-shop to find that oddball thing that was found six months ago and vaguely remembered.  The site overall is not for the general public, however if you make it in here feel free to browse.  &lt;br /&gt;
* Keep in mind however I do have security measures in place and poking hard at stuff will block you from the domain for 30 days +  if you hammer really hard.&lt;br /&gt;
* '''If someone wants access to actually ADD information in here please sign up for an account, and I will likely grant access.'''  I do try to set everything into some kind of category for easier searches as well as an attempt to keep this somewhat organized.  Whenever possible (or I remember to do it) I do try to link to the original sources of the information.  They are usually from SE, or other Q&amp;amp;A sites, so some of the comments are useful as well.&lt;br /&gt;
* This is not Wikipedia, and the site is not likely to be polished, since it is more of a catchall wiki on doing different things.  There is not going to be too much rhyme or reason to what is posted on this wiki.  Overall, if it is something that I have had to do more than once and look up every time, I will have notes on it in here so I will not have to search next time.&lt;br /&gt;
* There will be times that there are references to personal servers on my network, or oddball hostnames that are tied to iwillfearnoevil.com.  It is unlikely that access will be granted to those hosts unless there is a very specific reason to do so.  So don't bother asking :P&lt;br /&gt;
* '''Category NMS has notes on my progress of my NMS design, and thoughts on monitoring overall'''    [https://wiki.iwillfearnoevil.com/mediawiki/index.php/Category:NMS Category NMS]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Misc Notes on using MediaWiki ==&lt;br /&gt;
Consult the [https://www.mediawiki.org/wiki/Special:MyLanguage/Help:Contents User's Guide] for information on using the wiki software.&lt;br /&gt;
&lt;br /&gt;
== Getting started ==&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:Configuration_settings Configuration settings list]&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:FAQ MediaWiki FAQ]&lt;br /&gt;
* [https://lists.wikimedia.org/postorius/lists/mediawiki-announce.lists.wikimedia.org/ MediaWiki release mailing list]&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Localisation#Translation_resources Localise MediaWiki for your language]&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:Combating_spam Learn how to combat spam on your wiki]&lt;br /&gt;
[[Category:General]]&lt;/div&gt;</summary>
		<author><name>Chubbard</name></author>
	</entry>
	<entry>
		<id>https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Bash-interesting-command-examples&amp;diff=683</id>
		<title>Bash-interesting-command-examples</title>
		<link rel="alternate" type="text/html" href="https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Bash-interesting-command-examples&amp;diff=683"/>
		<updated>2025-10-28T15:21:07Z</updated>

		<summary type="html">&lt;p&gt;Chubbard: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;====Interesting one-liners====&lt;br /&gt;
* Find all drives and ignore loop devices&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@kvm03:/var/log# lsblk | grep -v &amp;quot;loop\|NAME&amp;quot; | grep &amp;quot;^[a-z]\|^[A-Z]&amp;quot; | awk '{print $1}'&lt;br /&gt;
sda&lt;br /&gt;
sdb&lt;br /&gt;
root@kvm03:/var/log# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@kvm03:/var/log# lsblk | grep disk | awk '{print $1}'&lt;br /&gt;
sda&lt;br /&gt;
sdb&lt;br /&gt;
root@kvm03:/var/log# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Keep printing from one match until another match is found&lt;br /&gt;
* This is using awk, and seems quite powerful as a tool&lt;br /&gt;
* Found this little gem at [https://unix.stackexchange.com/questions/21076/how-to-show-lines-after-each-grep-match-until-other-specific-match Stack Exchange]&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
awk '/Word A/,/Word D/' filename&lt;br /&gt;
&lt;br /&gt;
/From/CONTINUE/Until/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
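&lt;br /&gt;
A small runnable sketch of the same awk range pattern, using made-up sample lines (the function name range_demo is hypothetical, only here so the example is self-contained).  Both the starting and the ending line are included in the output.  If the ending pattern never shows up, awk keeps printing to the end of the input.&lt;br /&gt;

```shell
# Print every line from the first match of "Word A" through the next match of "Word D".
# The sample lines are invented for illustration.
range_demo() {
  printf '%s\n' 'before' 'Word A' 'middle' 'Word D' 'after' | awk '/Word A/,/Word D/'
}

range_demo
```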
&lt;br /&gt;
* Remove non-English directories (the pattern only checks that the name does not start with an ASCII letter or digit)&lt;br /&gt;
* Change the type to f if you are looking for non-English files&lt;br /&gt;
* ALWAYS test find results before deleting, duh!&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sudo find . -type d -not -name &amp;quot;[a-zA-Z0-9]*&amp;quot; -exec rm -rf {} \;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
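&lt;br /&gt;
One way to do the test first is a dry run that only prints what would match.  This is a sketch, not a replacement for the command above: the function name preview_non_alnum_dirs and the demo directory names are hypothetical, and it assumes a find that supports -mindepth (GNU and BSD/macOS find both do).  -mindepth 1 keeps the starting directory itself out of the results.&lt;br /&gt;

```shell
# Dry run: list the directories the delete command would match, without removing anything.
preview_non_alnum_dirs() {
  find "$1" -mindepth 1 -type d -not -name '[a-zA-Z0-9]*' -print
}

# Demo in a throwaway directory (names invented for illustration).
demo=$(mktemp -d)
mkdir "$demo/keep" "$demo/_tmp" "$demo/.cache"
preview_non_alnum_dirs "$demo"   # lists _tmp and .cache, but not keep
rm -rf "$demo"
```

Note that names starting with an underscore or a dot are matched too, so the preview is worth reading before swapping -print for the rm.&lt;br /&gt;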
&lt;br /&gt;
Who the hell thought it was a good idea to NOT have xml2 available in both Mac and Linux?  What the hell!?!?&lt;br /&gt;
* On Mac install gawk via brew&lt;br /&gt;
* will leverage xmllint, and then stupidity ensues... WTF?&lt;br /&gt;
* Never trust this on prod-ish servers without a lot of testing.  ChatGPT gave this suggestion as a workaround to get the xml2 behavior.&lt;br /&gt;
* I am betting that complex xml will make this choke or turn into a pumpkin but meh, it works well enough for simple stuff.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo '&amp;lt;root&amp;gt;&amp;lt;name&amp;gt;Chris&amp;lt;/name&amp;gt;&amp;lt;/root&amp;gt;' | xmllint --format - 2&amp;gt;/dev/null | gawk '&lt;br /&gt;
  /&amp;lt;[[:alnum:]_:-]+&amp;gt;/ {&lt;br /&gt;
    tag = gensub(/.*&amp;lt;([[:alnum:]_:-]+)&amp;gt;.*/, &amp;quot;\\1&amp;quot;, &amp;quot;g&amp;quot;)&lt;br /&gt;
    path = (path ? path &amp;quot;/&amp;quot; tag : &amp;quot;/&amp;quot; tag)&lt;br /&gt;
  }&lt;br /&gt;
  /&amp;lt;\/[[:alnum:]_:-]+&amp;gt;/ {&lt;br /&gt;
    if ($0 ~ /&amp;lt;[[:alnum:]_:-]+&amp;gt;[^&amp;lt;]+&amp;lt;\/[[:alnum:]_:-]+&amp;gt;/) {&lt;br /&gt;
      val = gensub(/.*&amp;lt;[[:alnum:]_:-]+&amp;gt;([^&amp;lt;]+)&amp;lt;\/[[:alnum:]_:-]+&amp;gt;.*/, &amp;quot;\\1&amp;quot;, &amp;quot;g&amp;quot;)&lt;br /&gt;
      print path &amp;quot;=&amp;quot; val&lt;br /&gt;
    }&lt;br /&gt;
    sub(/\/[^/]+$/, &amp;quot;&amp;quot;, path)&lt;br /&gt;
  }'&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[Category:Bash]]&lt;/div&gt;</summary>
		<author><name>Chubbard</name></author>
	</entry>
	<entry>
		<id>https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Main_Page&amp;diff=682</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Main_Page&amp;diff=682"/>
		<updated>2025-10-13T18:49:33Z</updated>

		<summary type="html">&lt;p&gt;Chubbard: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;strong&amp;gt;Getting an account here...&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Anyone requesting an account from me DIRECTLY can have an account.&lt;br /&gt;
* The account request PAGE is being hammered by spammers and AI bots. ( so don't bother using it )&lt;br /&gt;
* It is not hard to find my personal email address, however I am not posting directly here to keep jerks from hitting me directly&lt;br /&gt;
* &amp;lt;strong&amp;gt;Anyone who DOES use the account request page ends up in fail2ban for at least 30 days. (sick of bots)&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt;Fault Management notes, thoughts, and example code&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
My code is finally up on github.com!  Right now I am squishing bugs that I am finding which are not critical issues, but look bad.  Additionally I have begun the ECE or event correlation engine work for the different dashboards.  I still have not gotten to an installer, or added the seed data for the database.  I will be focusing on that soon.  I want to get a little more functionality in place to prove the tool is useful before focusing on this part.  Additionally I have had some thoughts on how to templatize some of the checks better.  I am  still kinda kicking the idea around right now to see what kind of weak points I will run into, but the basic idea is along the lines of give me a SNMP table, and if return data is numbers graph/save it.  If args are passed for what we care about, parse and event on what is found all at the template level.  I think that I can build a simple skeleton for this where just about anything can be added for the check by doing this.  String compare, existence, or number comparison.  I kinda like this idea, since we can get a lot of data, but 99% of the time we only care about specific parts of a given table.  I would like to make this something that can be created from the web page if possible, but I am not sure how ugly that would get..&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I am mainly updating this wiki for the NMS work I am doing, and attempting to get some more exposure in the greater world for assistance on code.  I am not a developer, just someone who is frustrated at a lack of good tools.  I will be adding the NMS to GitHub in the future as two separate repos.  The first one will be the API server.  The second is the front-end UI portion.  &lt;br /&gt;
&lt;br /&gt;
The backend is written in PHP, and has recently been migrated from PHP7.4 to 8.1.  I used the Slim4 skeleton as the base of the application.&lt;br /&gt;
* &amp;lt;strong&amp;gt; The backend has been updated to use PHP 8.2 from 8.1&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The frontend is using bootstrap, PHP and javascript.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 10-12-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* I am still cleaning the UX for dark and light mode in the dev branch.  Running behind on merging to main.&lt;br /&gt;
* Refactoring the Polling system to better support remote pollers out of the box&lt;br /&gt;
* Set base rules that host specific always overrides host or device groups for monitor checks&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 09-08-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Working on the dashboard UX.  Have something I am not embarrassed about.  Now need to wire it in and get that format standard..  Hope to merge into main branch in a week or two&lt;br /&gt;
* Couple of new API routes setup for the dashboard to get summary data and hotspot information&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 08-12-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* More UX work and seeing if I can improve the poller daemon a bit&lt;br /&gt;
* Looking at refactor of defined monitor association with hosts.  Right now it is clunky and not flexible for one offs per host&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 07-30-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* More making the frontend pretty.  I am really not AS embarrassed as I was.&lt;br /&gt;
* investigating breakout of array of monitors to allow for unique checks from the defined monitors.  Every monitor being the same across all hosts is causing some issues.  We need to be able to say &amp;quot;host1 needs value2&amp;quot; and &amp;quot;host2 needs value6&amp;quot; for the same service check.&lt;br /&gt;
* Still need to update the screenshots in the git repos to reflect the better looking UI&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 07-03-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Frontend variable cleanup.  Focus on devices.. It is messy and not maintainable as is.&lt;br /&gt;
* API error cleanup, specifically nulls in event modifications&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 06-20-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* building more annotations&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 06-12-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Have basic swagger UI working on API server (unmerged branch openApiAndCleanup)&lt;br /&gt;
* building more annotations&lt;br /&gt;
* Some PHP syntax error bug squishing&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 04-21-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Still running 2X2 on server.  Load Average consistent 1.10 15 min, and peaks 3.50 on 1min when iteration starts up&lt;br /&gt;
* Not been focused on this too much lately.  Continuing to try to get Devices to look the way they do in my head...&lt;br /&gt;
* Still eating Crayons...&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 03-26-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Still working on UX, as I know I do not make pretty pages&lt;br /&gt;
* Looks a little better on the branch I am working on, but still not amazing or great... Sigh...&lt;br /&gt;
* I should stick to eating Crayons instead of trying to use them to make something good looking...&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 01-02-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Currently working on getting the UI to not look like a first time high school project&lt;br /&gt;
* Minor bug squishing&lt;br /&gt;
* Investigating template rework to better calculate percentages to store in metrics&lt;br /&gt;
* Beginning to investigate storing metric data in influxdb&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 10-30-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Currently hitting performance limits with 2 core/2G RAM with 61 servers.  Server getting laggy..&lt;br /&gt;
* Will encourage setups as &amp;lt; 50 servers, 2x2 is fine.&lt;br /&gt;
* &amp;gt; 51 and &amp;lt; 125 testing as 4x2.  This is not a RAM intense app, it is more thready and core bound.  Then Drive IO is going to become the bottleneck before RAM&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 10-15-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Purchased ChartJS license (I may hate JS, but it makes pretty pics)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 10-08-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Minor bug squishing&lt;br /&gt;
* Testing using evil Javascript to make pretty graphs&lt;br /&gt;
* Going to license CanvasJS for this (even a n00b can figure it out)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 08-18-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Some minor bug fixes, but not too much coding done&lt;br /&gt;
* beginning focus on ECE and seeing what a mess it is.  Not happy about it.  Will likely redesign this since it was a skel anyway.  This needs to be simple and understandable dammit.&lt;br /&gt;
* Looking at changing frontend UI for main page.  &lt;br /&gt;
* Building more templates for standards&lt;br /&gt;
* Dblchecking my logic for NRPE or shell commands.  This could use more work.  NRPE failing for something not the command itself should have discrete alarm values.  Need to think about this more..&lt;br /&gt;
* SNMP checks need to be smarter, and need parsable thresholds on a per host basis, not a per check basis&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 04-16-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Still squishy-squishy on bugs&lt;br /&gt;
* Focusing on stability and clean UI&lt;br /&gt;
&lt;br /&gt;
Current Loads: 47 devices&lt;br /&gt;
* 2 core 2 GB RAM&lt;br /&gt;
* Load averages consistent: 0.70,1.5,1.6&lt;br /&gt;
* From this, with an average number of checks ( ~10 per host ), reliable monitoring can be done on 50 devices with decent results and minimal hardware for small environments.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update on status: 03-21-2024 &amp;lt;/strong&amp;gt;&lt;br /&gt;
* Many Many squishes of bugs&lt;br /&gt;
* Reporting engine and templates much more usable&lt;br /&gt;
* More documentation links for application with base examples written&lt;br /&gt;
* Initial Event Correlation Engine (ECE) rules written&lt;br /&gt;
* Some pages written for ECE&lt;br /&gt;
* Did I mention squishing bugs?&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update on status: 02-11-2024 &amp;lt;/strong&amp;gt;&lt;br /&gt;
* Upgraded API to use PHP 8.2, not 8.1.&lt;br /&gt;
* GUI code was sanitized and released to github&lt;br /&gt;
* [https://github.com/Guyverix/Vigilare-NMS-GUI Vigilare GUI]&lt;br /&gt;
* [https://github.com/Guyverix/Vigilare-NMS-API Vigilare API]&lt;br /&gt;
* [https://github.com/Guyverix/Vigilare-NMS-POLLER Vigilare Pollers]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The focus of this wiki is for notes and gotchas for things relating to technology.  It is mainly focused on the fault management tool that I am writing.  However oddball things I have found or commonly forget since I rarely use them are also present.  I do not go too in depth on the notes or what I find.  In general it is more a quick summary and if reasonable an example that shows what the result is.  I suspect that as time passes it will become more focused on the tool that I am writing.  However there will likely be oddball stuff thrown in here as well that does not have to do with fault management at all....&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt;Overall idea&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I believe that fault management is commonly very much overlooked in many companies and more of a bolt-on after they have been embarrassed by an outage or network event that a customer noticed.  This feels backwards to me.  Fault management really should be one of the first things put into place and grow with your software and network over time.  Being able to tell at a glance the &amp;quot;truth&amp;quot; of your network is almost as critical as the network running itself!  Customer monitoring isn't monitoring and simply makes you look bad.  If a customer or end user has to tell you something is down, you have failed.  If it happens more than once then that should set off all kinds of warnings to technicians.  This should not happen.  The problem happens in my opinion due to management seeing monitoring as a time suck that does not really return money on their investment of time.  The only time they care is when they get embarrassed, and after that they lose interest until they are embarrassed again.  This is stupid.  A company or technician must be able to say with confidence that their applications are working AND PROVE IT.  If you can't prove anything it is just a hope that things are working as designed.  Until you know, you simply are guessing at the health and availability of an application.&lt;br /&gt;
&lt;br /&gt;
I have noticed a trend recently that companies are going to what I call &amp;quot;implied&amp;quot; monitoring.  More looking at metrics than active validation of an application or service.  While this does somewhat fit the bill for monitoring it does a major disservice to the technicians that need to support the application.  Usually this kind of monitoring will cry when something is down but does not tell you what, or where the issue happens.  Only that it is happening.  It is also usually slower to report the failures.  That means more time for a customer to find issues before the technician does.  Parsing logs, and dropping them in search indexes, or forwarding matches as events are all very useful, but they are not really watching for specific application issues at the host level.  I believe that an organization must work up to this kind of monitoring.  It must be built on the basics.  If you do not have the basics in place, then the advanced monitoring is much less useful for an org.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt;&lt;br /&gt;
My common way to approach monitoring is to start as fundamental as possible.  On Linux based systems this is the way that has given me the best results.&lt;br /&gt;
&lt;br /&gt;
* daemon&lt;br /&gt;
** dead daemon, well dead app :)&lt;br /&gt;
* port&lt;br /&gt;
** zombie processes holding a port, or different daemons attempting to use the same port is bad&lt;br /&gt;
* challenge and response.&lt;br /&gt;
** Verify which app has control of the port (did you get an http, email, ssh header?  Did it respond at all?)&lt;br /&gt;
* performance&lt;br /&gt;
** This is the n+1 point.  After confirming the application is running, NOW is the time to verify it is at a basic level performant.&lt;br /&gt;
* log and event parsing&lt;br /&gt;
** In depth log parsing, and event correlations.  Without the above this is much less useful to a technician.&lt;br /&gt;
*** I have seen in RARE cases this done well, where it states what failed and which host.  However that is much less likely to happen as the logs parsed do not always state which application is at fault, only that there is one present.&lt;br /&gt;
&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
How you get the answers is less important than getting the answers, with one gotcha.  There is no reason to degrade the host with a complex and slow service check.  If the validation cannot be done cheaply, then it is likely something that should be broken down into more basic pieces.  Killing your servers or pod with service checks is exactly backwards of what you need.  Simple, fast, and accurate are what you need to focus on.  Inaccurate data is WORSE than no data.  Another useful trick is to assume failure until the application proves that it is doing what you expect.  I always try to avoid a bias of assuming something works until it can prove it in a service check.  Until that time, it is only your opinion that things are working.&lt;br /&gt;
&lt;br /&gt;
Historically I have done this with SNMP and NRPE service checks on a common 5-minute iteration cycle.  This allows many checks to scale across a fleet of hosts within the cycle, and is a decent starting point for monitoring hosts.  I realize that some technicians prefer a faster cycle time, but at the end of the day, you are still talking about human response times and investigation times.  Getting things like sub-minute reporting does no good at all to a technician who is troubleshooting an issue.  Additionally, fast cycle times do not really lend themselves well to the concept of retry.  You will lose packets from time to time, and your application will do weird crap from time to time.  That's just fact.  Make sure your monitoring does not lose its mind and scream that the world is burning due to a transient packet-loss issue.  Using something like retry makes it more likely you are picking up a legitimate event and not a transient failure.&lt;br /&gt;
&lt;br /&gt;
Every time there is an &amp;quot;Outage&amp;quot; or severe service degradation, one of the first questions that should be raised is: did the monitoring catch this issue?  Was it actually granular enough to state what the issue was, or did it only catch a side effect of the issue?  Getting an alarm for website down vs. database down are two very different things.  Both will imply a 100% outage, but usually one is faster to repair than the other.  Also, looking at the wrong thing (say, the webserver when the database is toast) simply slows down getting the database back online.&lt;br /&gt;
&lt;br /&gt;
== History of this wiki ==&lt;br /&gt;
&amp;lt;strong&amp;gt;MediaWiki destination for random notes and examples I do not want to lose or forget.&amp;lt;/strong&amp;gt;&lt;br /&gt;
* The site is for myself and friends who commonly use bash and other utilities and want a one-stop shop to find that oddball thing that was found six months ago and vaguely remembered.  The site overall is not for the general public, however if you make it in here feel free to browse.  &lt;br /&gt;
* Keep in mind however I do have security measures in place and poking hard at stuff will block you from the domain for 30 days +  if you hammer really hard.&lt;br /&gt;
* '''If someone wants access to actually ADD information in here please sign up for an account, and I will likely grant access.'''  I do try to set everything into some kind of category for easier searches as well as an attempt to keep this somewhat organized.  Whenever possible (or I remember to do it) I do try to link to the original sources of the information.  They are usually from SE, or other Q&amp;amp;A sites, so some of the comments are useful as well.&lt;br /&gt;
* This is not Wikipedia, and the site is not likely to be polished, since it is more of a catchall wiki on doing different things.  There is not going to be too much rhyme or reason to what is posted on this wiki.  Overall, if it is something that I have had to do more than once and look up every time, I will have notes on it in here so I will not have to search next time.&lt;br /&gt;
* There will be times that there are references to personal servers on my network, or oddball hostnames that are tied to iwillfearnoevil.com.  It is unlikely that access will be granted to those hosts unless there is a very specific reason to do so.  So don't bother asking :P&lt;br /&gt;
* '''Category NMS has notes on my progress of my NMS design, and thoughts on monitoring overall'''    [https://wiki.iwillfearnoevil.com/mediawiki/index.php/Category:NMS Category NMS]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Misc Notes on using MediaWiki ==&lt;br /&gt;
Consult the [https://www.mediawiki.org/wiki/Special:MyLanguage/Help:Contents User's Guide] for information on using the wiki software.&lt;br /&gt;
&lt;br /&gt;
== Getting started ==&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:Configuration_settings Configuration settings list]&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:FAQ MediaWiki FAQ]&lt;br /&gt;
* [https://lists.wikimedia.org/postorius/lists/mediawiki-announce.lists.wikimedia.org/ MediaWiki release mailing list]&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Localisation#Translation_resources Localise MediaWiki for your language]&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:Combating_spam Learn how to combat spam on your wiki]&lt;br /&gt;
[[Category:General]]&lt;/div&gt;</summary>
		<author><name>Chubbard</name></author>
	</entry>
	<entry>
		<id>https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Main_Page&amp;diff=681</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Main_Page&amp;diff=681"/>
		<updated>2025-10-13T18:49:14Z</updated>

		<summary type="html">&lt;p&gt;Chubbard: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;strong&amp;gt;Getting an account here...&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Anyone requesting an account from me DIRECTLY can have an account.&lt;br /&gt;
* The account request PAGE is being hammered by spammers and AI bots. ( so don't bother using it )&lt;br /&gt;
* It is not hard to find my personal email address, however I am not posting directly here to keep jerks from hitting me directly&lt;br /&gt;
* &amp;lt;strong&amp;gt;Anyone who DOES use the account request page ends up in fail2ban for at least 30 days. (sick of bots)&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt;Fault Management notes, thoughts, and example code&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
My code is finally up on github.com!  Right now I am squishing bugs that I am finding which are not critical issues, but look bad.  Additionally I have begun the ECE or event correlation engine work for the different dashboards.  I still have not gotten to an installer, or added the seed data for the database.  I will be focusing on that soon.  I want to get a little more functionality in place to prove the tool is useful before focusing on this part.  Additionally I have had some thoughts on how to templatize some of the checks better.  I am  still kinda kicking the idea around right now to see what kind of weak points I will run into, but the basic idea is along the lines of give me a SNMP table, and if return data is numbers graph/save it.  If args are passed for what we care about, parse and event on what is found all at the template level.  I think that I can build a simple skeleton for this where just about anything can be added for the check by doing this.  String compare, existence, or number comparison.  I kinda like this idea, since we can get a lot of data, but 99% of the time we only care about specific parts of a given table.  I would like to make this something that can be created from the web page if possible, but I am not sure how ugly that would get..&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I am mainly updating this wiki for the NMS work I am doing, and attempting to get some more exposure in the greater world for assistance on code.  I am not a developer, just someone who is frustrated at a lack of good tools.  I will be adding the NMS to GitHub in the future as two separate repos.  The first one will be the API server.  The second is the front-end UI portion.  &lt;br /&gt;
&lt;br /&gt;
The backend is written in PHP, and has recently been migrated from PHP7.4 to 8.1.  I used the Slim4 skeleton as the base of the application.&lt;br /&gt;
* &amp;lt;strong&amp;gt; The backend has been updated to use PHP 8.2 from 8.1&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The frontend is using bootstrap, PHP and javascript.&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 10-12-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* I am still cleaning the UX for dark and light mode in the dev branch.  Running behind on merging to main.&lt;br /&gt;
* Refactoring the Polling system to better support remote pollers out of the box&lt;br /&gt;
* Set base rules that host specific always overrides host or device groups for monitor checks&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 09-08-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Working on the dashboard UX.  Have something I am not embarrassed about.  Now need to wire it in and get that format standard..  Hope to merge into main branch in a week or two&lt;br /&gt;
* Couple of new API routes setup for the dashboard to get summary data and hotspot information&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 08-12-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* More UX work and seeing if I can improve the poller daemon a bit&lt;br /&gt;
* Looking at refactor of defined monitor association with hosts.  Right now it is clunky and not flexible for one offs per host&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 07-30-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* More making the frontend pretty.  I am really not AS embarrassed as I was.&lt;br /&gt;
* investigating breakout of array of monitors to allow for unique checks from the defined monitors.  Every monitor being the same across all hosts is causing some issues.  We need to be able to say &amp;quot;host1 needs value2&amp;quot; and &amp;quot;host2 needs value6&amp;quot; for the same service check.&lt;br /&gt;
* Still need to update the screenshots in the git repos to reflect the better looking UI&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 07-03-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Frontend variable cleanup.  Focus on devices.. It is messy and not maintainable as is.&lt;br /&gt;
* API error cleanup, specifically nulls in event modifications&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 06-20-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* building more annotations&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 06-12-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Have basic swagger UI working on API server (unmerged branch openApiAndCleanup)&lt;br /&gt;
* building more annotations&lt;br /&gt;
* Some PHP syntax error bug squishing&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 04-21-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Still running 2X2 on server.  Load Average consistent 1.10 15 min, and peaks 3.50 on 1min when iteration starts up&lt;br /&gt;
* Not been focused on this too much lately.  Continuing to try to get Devices to look the way they do in my head...&lt;br /&gt;
* Still eating Crayons...&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 03-26-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Still working on UX, as I know I do not make pretty pages&lt;br /&gt;
* Looks a little better on the branch I am working on, but still not amazing or great... Sigh...&lt;br /&gt;
* I should stick to eating Crayons instead of trying to use them to make something good looking...&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 01-02-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Currently working on getting the UI to not look like a first time high school project&lt;br /&gt;
* Minor bug squishing&lt;br /&gt;
* Investigating template rework to better calculate percentages to store in metrics&lt;br /&gt;
* Beginning to investigate storing metric data in influxdb&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 10-30-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Currently hitting performance limits with 2 core/2G RAM with 61 servers.  Server getting laggy..&lt;br /&gt;
* Will encourage setups as &amp;lt; 50 servers, 2x2 is fine.&lt;br /&gt;
* &amp;gt; 51 and &amp;lt; 125 testing as 4x2.  This is not a RAM intense app, it is more thready and core bound.  Then Drive IO is going to become the bottleneck before RAM&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 10-15-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Purchased ChartJS license (I may hate JS, but it makes pretty pics)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 10-08-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Minor bug squishing&lt;br /&gt;
* Testing using evil Javascript to make pretty graphs&lt;br /&gt;
* Going to license CanvasJS for this (even a n00b can figure it out)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 08-18-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Some minor bug fixes, but not too much coding done&lt;br /&gt;
* beginning focus on ECE and seeing what a mess it is.  Not happy about it.  Will likely redesign this since it was a skel anyway.  This needs to be simple and understandable dammit.&lt;br /&gt;
* Looking at changing frontend UI for main page.  &lt;br /&gt;
* Building more templates for standards&lt;br /&gt;
* Double-checking my logic for NRPE or shell commands.  This could use more work.  An NRPE failure caused by something other than the command itself should have discrete alarm values.  Need to think about this more..&lt;br /&gt;
* SNMP checks need to be smarter, and need parsable thresholds on a per host basis, not a per check basis&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 04-16-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Still squishy-squishy on bugs&lt;br /&gt;
* Focusing on stability and clean UI&lt;br /&gt;
&lt;br /&gt;
Current Loads: 47 devices&lt;br /&gt;
* 2 core 2 GB RAM&lt;br /&gt;
* Load averages consistent: 0.70,1.5,1.6&lt;br /&gt;
* From this, with an average number of checks ( ~10 per host ), reliable monitoring can be done on 50 devices with decent results and minimal hardware for small environments.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update on status: 03-21-2024 &amp;lt;/strong&amp;gt;&lt;br /&gt;
* Many Many squishes of bugs&lt;br /&gt;
* Reporting engine and templates much more usable&lt;br /&gt;
* More documentation links for application with base examples written&lt;br /&gt;
* Initial Event Correlation Engine (ECE) rules written&lt;br /&gt;
* Some pages written for ECE&lt;br /&gt;
* Did I mention squishing bugs?&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update on status: 02-11-2024 &amp;lt;/strong&amp;gt;&lt;br /&gt;
* Upgraded API to use PHP 8.2, not 8.1.&lt;br /&gt;
* GUI code was sanitized and released to github&lt;br /&gt;
* [https://github.com/Guyverix/Vigilare-NMS-GUI Vigilare GUI]&lt;br /&gt;
* [https://github.com/Guyverix/Vigilare-NMS-API Vigilare API]&lt;br /&gt;
* [https://github.com/Guyverix/Vigilare-NMS-POLLER Vigilare Pollers]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The focus of this wiki is for notes and gotchas for things relating to technology.  It is mainly focused on the fault management tool that I am writing.  However oddball things I have found or commonly forget since I rarely use them are also present.  I do not go too in depth on the notes or what I find.  In general it is more a quick summary and if reasonable an example that shows what the result is.  I suspect that as time passes it will become more focused on the tool that I am writing.  However there will likely be oddball stuff thrown in here as well that does not have to do with fault management at all....&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt;Overall idea&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I believe that fault management is commonly very much overlooked in many companies and more of a bolt-on after they have been embarrassed by an outage or network event that a customer noticed.  This feels backwards to me.  Fault management really should be one of the first things put into place and grow with your software and network over time.  Being able to tell at a glance the &amp;quot;truth&amp;quot; of your network is almost as critical as the network running itself!  Customer monitoring isn't monitoring and simply makes you look bad.  If a customer or end user has to tell you something is down, you have failed.  If it happens more than once then that should set off all kinds of warnings to technicians.  This should not happen.  The problem happens in my opinion due to management seeing monitoring as a time suck that does not really return money on their investment of time.  The only time they care is when they get embarrassed, and after that they lose interest until they are embarrassed again.  This is stupid.  A company or technician must be able to say with confidence that their applications are working AND PROVE IT.  If you can't prove anything it is just a hope that things are working as designed.  Until you know, you simply are guessing at the health and availability of an application.&lt;br /&gt;
&lt;br /&gt;
I have noticed a trend recently that companies are going to what I call &amp;quot;implied&amp;quot; monitoring.  More looking at metrics than active validation of an application or service.  While this does somewhat fit the bill for monitoring it does a major disservice to the technicians that need to support the application.  Usually this kind of monitoring will cry when something is down but does not tell you what, or where the issue happens.  Only that it is happening.  It is also usually slower to report the failures.  That means more time for a customer to find issues than the technician.  Parsing logs, and dropping them in search indexes, or forwarding matches as events are all very useful, but they are not really watching for specific application issues at the host level.  I believe that an organization must work up to this kind of monitoring.  It must be built on the basics.  If you do not have the basics in place, then the advanced monitoring is much less useful for an org.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt;&lt;br /&gt;
My common way to approach monitoring is to start as fundamental as possible.  On Linux based systems this is the way that has given me the best results.&lt;br /&gt;
&lt;br /&gt;
* daemon&lt;br /&gt;
** dead daemon, well dead app :)&lt;br /&gt;
* port&lt;br /&gt;
** zombie processes holding a port, or different daemons attempting to use the same port is bad&lt;br /&gt;
* challenge and response.&lt;br /&gt;
** Verify which app has control of the port (did you get an http, email, ssh header?  Did it respond at all?)&lt;br /&gt;
* performance&lt;br /&gt;
** This is the n+1 point.  After confirming the application is running, NOW is the time to verify it is at a basic level performant.&lt;br /&gt;
* log and event parsing&lt;br /&gt;
** In depth log parsing, and event correlations.  Without the above this is much less useful to a technician.&lt;br /&gt;
*** I have seen in RARE cases this done well, where it states what failed and which host.  However that is much less likely to happen as the logs parsed do not always state which application is at fault, only that there is one present.&lt;br /&gt;
&amp;lt;/strong&amp;gt;&lt;br /&gt;
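&lt;br /&gt;
The port and challenge-and-response layers above can be sketched in a few lines.  This is only a rough Python sketch and not code from the tool: the function name probe_banner and its return strings are made up for illustration, and it assumes a service that speaks first on connect (SSH or SMTP style).  An HTTP check would need to send a request before reading.&lt;br /&gt;

```python
import socket


def probe_banner(host, port, expected_prefix, timeout=3.0):
    """Hypothetical helper: classify a TCP service by the banner it sends.

    Returns one of "port_closed", "no_response", "wrong_service", or "ok".
    Failure is assumed until the service proves otherwise.
    """
    try:
        # Port layer: can we connect at all?
        conn = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "port_closed"
    with conn:
        try:
            # Challenge and response layer: did anything answer?
            banner = conn.recv(256)
        except OSError:
            return "no_response"
    if not banner:
        return "no_response"
    # Is the expected application the one holding the port?
    if banner.startswith(expected_prefix):
        return "ok"
    return "wrong_service"
```

Each return value can map to its own alarm, so a closed port and the wrong daemon answering on the port do not look like the same failure.&lt;br /&gt;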
&lt;br /&gt;
How you get the answers is less important than getting the answers, with one gotcha.  There is no reason to degrade the host with a complex and slow service check.  If the validation cannot be done cheaply, then it is likely something that should be broken down into more basic pieces.  Killing your servers or pod with service checks is exactly backwards of what you need.  Simple, fast, and accurate are what you need to focus on.  Inaccurate data is WORSE than no data.  Another useful trick is to assume failure until the application proves that it is doing what you expect.  I always try to avoid a bias of assuming something works until it can prove it in a service check.  Until that time, it is only your opinion that things are working.&lt;br /&gt;
&lt;br /&gt;
Historically this has been with SNMP and NRPE service checks on a common 5 minute iteration cycle.  This allows for scale of many checks across a fleet of hosts within the cycle, as well as a decent starting point for monitoring hosts.  I realize that some technicians prefer a faster cycle time, but at the end of the day, you are still talking about human response times and investigation times.  Getting things like sub-minute reporting does no good at all to a technician who is troubleshooting an issue.  Additionally, fast cycle times do not really lend themselves well to the concept of retry.  You will lose packets from time to time, your application will do weird crap from time to time.  That's just fact.  Make sure your monitoring does not lose its mind and scream the world is burning due to a transient packet-loss issue.  Using something like retry will make it more likely you are picking up a legitimate event and not a transient failure.&lt;br /&gt;
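&lt;br /&gt;
The retry idea can be sketched as follows.  This is a minimal Python sketch under stated assumptions, not the poller's actual logic: check_with_retry, attempts, and delay_seconds are hypothetical names, and the check is any callable that returns True on success.&lt;br /&gt;

```python
import time


def check_with_retry(check, attempts=3, delay_seconds=0.0):
    """Hypothetical helper: report failure only if every attempt fails."""
    for attempt in range(attempts):
        try:
            if check():
                # One good answer is enough to call the earlier misses transient.
                return True
        except Exception:
            # A check that blows up counts as a failed attempt, not a pass.
            pass
        if attempt != attempts - 1:
            # Pause before the next try so a brief blip has time to clear.
            time.sleep(delay_seconds)
    return False
```

With three attempts, a single dropped packet no longer raises an alarm, while a service that is really down still fails all three and gets reported within the same cycle.&lt;br /&gt;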
&lt;br /&gt;
Every time there is an &amp;quot;Outage&amp;quot; or severe service degrade, one of the first things that should be brought up is did the monitoring catch this issue?  Was it actually granular enough to state what the issue was, or just a side effect of the issue?  Getting an alarm for website down, vs. database down are two very different things.  Both will imply a 100% outage, but usually one is faster to repair than the other.  Also, looking at the wrong thing (say the webserver when the database is toast) simply slows down the response for getting the database back online.&lt;br /&gt;
&lt;br /&gt;
== History of this wiki ==&lt;br /&gt;
&amp;lt;strong&amp;gt;MediaWiki destination for random notes and examples I do not want to lose or forget.&amp;lt;/strong&amp;gt;&lt;br /&gt;
* The site is for myself and friends who commonly use bash and other utilities and can have a one-stop-shop to find that oddball thing that was found six months ago and vaguely remembered.  The site overall is not for the general public, however if you make it in here feel free to browse.  &lt;br /&gt;
* Keep in mind however I do have security measures in place and poking hard at stuff will block you from the domain for 30 days +  if you hammer really hard.&lt;br /&gt;
* '''If someone wants access to actually ADD information in here please sign up for an account, and I will likely grant access.'''  I do try to set everything into some kind of category for easier searches as well as an attempt to keep this somewhat organized.  Whenever possible (or I remember to do it) I do try to link to the original sources of the information.  They are usually from SE, or other Q&amp;amp;A sites, so some of the comments are useful as well.&lt;br /&gt;
* This is not wikipedia, likely the site is not going to be polished, since it is more of a catchall wiki on doing different things.  There is not going to be too much rhyme or reason on what is posted on this wiki.  Overall if it is something that I have had to do more than once and look up every time, I will have notes on it in here so I will not have to search next time.&lt;br /&gt;
* There will be times that there are references to personal servers on my network, or oddball hostnames that are tied to iwillfearnoevil.com.  It is unlikely that access will be granted to those hosts unless there is a very specific reason to do so.  So don't bother asking :P&lt;br /&gt;
* '''Category NMS has notes on my progress of my NMS design, and thoughts on monitoring overall'''    [https://wiki.iwillfearnoevil.com/mediawiki/index.php/Category:NMS Category NMS]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Misc Notes on using Mediawiki ==&lt;br /&gt;
Consult the [https://www.mediawiki.org/wiki/Special:MyLanguage/Help:Contents User's Guide] for information on using the wiki software.&lt;br /&gt;
&lt;br /&gt;
== Getting started ==&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:Configuration_settings Configuration settings list]&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:FAQ MediaWiki FAQ]&lt;br /&gt;
* [https://lists.wikimedia.org/postorius/lists/mediawiki-announce.lists.wikimedia.org/ MediaWiki release mailing list]&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Localisation#Translation_resources Localise MediaWiki for your language]&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:Combating_spam Learn how to combat spam on your wiki]&lt;br /&gt;
[[Category:General]]&lt;/div&gt;</summary>
		<author><name>Chubbard</name></author>
	</entry>
	<entry>
		<id>https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Main_Page&amp;diff=680</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Main_Page&amp;diff=680"/>
		<updated>2025-09-09T16:50:44Z</updated>

		<summary type="html">&lt;p&gt;Chubbard: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;strong&amp;gt;Getting an account here...&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Anyone requesting an account from me DIRECTLY can have an account.&lt;br /&gt;
* The account request PAGE is being hammered by spammers and AI bots. ( so don't bother using it )&lt;br /&gt;
* It is not hard to find my personal email address, however I am not posting directly here to keep jerks from hitting me directly&lt;br /&gt;
* &amp;lt;strong&amp;gt;Anyone who DOES use the account request page ends up in fail2ban for at least 30 days. (sick of bots)&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt;Fault Management notes, thoughts, and example code&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
My code is finally up on github.com!  Right now I am squishing bugs that I am finding which are not critical issues, but look bad.  Additionally I have begun the ECE or event correlation engine work for the different dashboards.  I still have not gotten to an installer, or added the seed data for the database.  I will be focusing on that soon.  I want to get a little more functionality in place to prove the tool is useful before focusing on this part.  Additionally I have had some thoughts on how to templatize some of the checks better.  I am  still kinda kicking the idea around right now to see what kind of weak points I will run into, but the basic idea is along the lines of give me a SNMP table, and if return data is numbers graph/save it.  If args are passed for what we care about, parse and event on what is found all at the template level.  I think that I can build a simple skeleton for this where just about anything can be added for the check by doing this.  String compare, existence, or number comparison.  I kinda like this idea, since we can get a lot of data, but 99% of the time we only care about specific parts of a given table.  I would like to make this something that can be created from the web page if possible, but I am not sure how ugly that would get..&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I am mainly updating this wiki for the NMS work I am doing, and attempting to get some more exposure in the greater world for assistance on code.  I am not a developer, just someone who is frustrated at a lack of good tools.  I will be adding the NMS to GitHub in the future as two separate repos.  The first one will be the API server.  The second is the front-end UI portion.  &lt;br /&gt;
&lt;br /&gt;
The backend is written in PHP, and has recently been migrated from PHP7.4 to 8.1.  I used the Slim4 skeleton as the base of the application.&lt;br /&gt;
* &amp;lt;strong&amp;gt; The backend has been updated to use PHP 8.2 from 8.1&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The frontend is using bootstrap, PHP and javascript.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 09-08-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Working on the dashboard UX.  Have something I am not embarrassed about.  Now need to wire it in and get that format standard..  Hope to merge into main branch in a week or two&lt;br /&gt;
* Couple of new API routes setup for the dashboard to get summary data and hotspot information&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 08-12-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* More UX work and seeing if I can improve the poller daemon a bit&lt;br /&gt;
* Looking at refactor of defined monitor association with hosts.  Right now it is clunky and not flexible for one offs per host&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 07-30-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* More making the frontend pretty.  I am really not AS embarrassed as I was.&lt;br /&gt;
* investigating breakout of array of monitors to allow for unique checks from the defined monitors.  Every monitor being the same across all hosts is causing some issues.  We need to be able to say &amp;quot;host1 needs value2&amp;quot; and &amp;quot;host2 needs value6&amp;quot; for the same service check.&lt;br /&gt;
* Still need to update the screenshots in the git repos to reflect the better looking UI&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 07-03-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Frontend variable cleanup.  Focus on devices.. It is messy and not maintainable as is.&lt;br /&gt;
* API error cleanup, specifically nulls in event modifications&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 06-20-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* building more annotations&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 06-12-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Have basic swagger UI working on API server (unmerged branch openApiAndCleanup)&lt;br /&gt;
* building more annotations&lt;br /&gt;
* Some PHP syntax error bug squishing&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 04-21-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Still running 2X2 on server.  Load Average consistent 1.10 15 min, and peaks 3.50 on 1min when iteration starts up&lt;br /&gt;
* Not been focused on this too much lately.  Continuing to try to get Devices to look the way they do in my head...&lt;br /&gt;
* Still eating Crayons...&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 03-26-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Still working on UX, as I know I do not make pretty pages&lt;br /&gt;
* Looks a little better on the branch I am working on, but still not amazing or great... Sigh...&lt;br /&gt;
* I should stick to eating Crayons instead of trying to use them to make something good looking...&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 01-02-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Currently working on getting the UI to not look like a first time high school project&lt;br /&gt;
* Minor bug squishing&lt;br /&gt;
* Investigating template rework to better calculate percentages to store in metrics&lt;br /&gt;
* Beginning to investigate storing metric data in influxdb&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 10-30-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Currently hitting performance limits with 2 core/2G RAM with 61 servers.  Server getting laggy..&lt;br /&gt;
* Will encourage setups as &amp;lt; 50 servers, 2x2 is fine.&lt;br /&gt;
* &amp;gt; 51 and &amp;lt; 125 testing as 4x2.  This is not a RAM intense app, it is more thready and core bound.  Then Drive IO is going to become the bottleneck before RAM&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 10-15-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Purchased ChartJS license (I may hate JS, but it makes pretty pics)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 10-08-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Minor bug squishing&lt;br /&gt;
* Testing using evil Javascript to make pretty graphs&lt;br /&gt;
* Going to license CanvasJS for this (even a n00b can figure it out)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 08-18-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Some minor bug fixes, but not too much coding done&lt;br /&gt;
* beginning focus on ECE and seeing what a mess it is.  Not happy about it.  Will likely redesign this since it was a skel anyway.  This needs to be simple and understandable dammit.&lt;br /&gt;
* Looking at changing frontend UI for main page.  &lt;br /&gt;
* Building more templates for standards&lt;br /&gt;
* Double-checking my logic for NRPE or shell commands.  This could use more work.  An NRPE failure caused by something other than the command itself should have discrete alarm values.  Need to think about this more..&lt;br /&gt;
* SNMP checks need to be smarter, and need parsable thresholds on a per host basis, not a per check basis&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 04-16-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Still squishy-squishy on bugs&lt;br /&gt;
* Focusing on stability and clean UI&lt;br /&gt;
&lt;br /&gt;
Current Loads: 47 devices&lt;br /&gt;
* 2 core 2 GB RAM&lt;br /&gt;
* Load averages consistent: 0.70,1.5,1.6&lt;br /&gt;
* From this, with an average number of checks ( ~10 per host ), reliable monitoring can be done on 50 devices with decent results and minimal hardware for small environments.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update on status: 03-21-2024 &amp;lt;/strong&amp;gt;&lt;br /&gt;
* Many Many squishes of bugs&lt;br /&gt;
* Reporting engine and templates much more usable&lt;br /&gt;
* More documentation links for application with base examples written&lt;br /&gt;
* Initial Event Correlation Engine (ECE) rules written&lt;br /&gt;
* Some pages written for ECE&lt;br /&gt;
* Did I mention squishing bugs?&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update on status: 02-11-2024 &amp;lt;/strong&amp;gt;&lt;br /&gt;
* Upgraded API to use PHP 8.2, not 8.1.&lt;br /&gt;
* GUI code was sanitized and released to github&lt;br /&gt;
* [https://github.com/Guyverix/Vigilare-NMS-GUI Vigilare GUI]&lt;br /&gt;
* [https://github.com/Guyverix/Vigilare-NMS-API Vigilare API]&lt;br /&gt;
* [https://github.com/Guyverix/Vigilare-NMS-POLLER Vigilare Pollers]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The focus of this wiki is for notes and gotchas for things relating to technology.  It is mainly focused on the fault management tool that I am writing.  However oddball things I have found or commonly forget since I rarely use them are also present.  I do not go too in depth on the notes or what I find.  In general it is more a quick summary and if reasonable an example that shows what the result is.  I suspect that as time passes it will become more focused on the tool that I am writing.  However there will likely be oddball stuff thrown in here as well that does not have to do with fault management at all....&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt;Overall idea&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I believe that fault management is commonly very much overlooked in many companies and more of a bolt-on after they have been embarrassed by an outage or network event that a customer noticed.  This feels backwards to me.  Fault management really should be one of the first things put into place and grow with your software and network over time.  Being able to tell at a glance the &amp;quot;truth&amp;quot; of your network is almost as critical as the network running itself!  Customer monitoring isn't monitoring and simply makes you look bad.  If a customer or end user has to tell you something is down, you have failed.  If it happens more than once then that should set off all kinds of warnings to technicians.  This should not happen.  The problem happens in my opinion due to management seeing monitoring as a time suck that does not really return money on their investment of time.  The only time they care is when they get embarrassed, and after that they lose interest until they are embarrassed again.  This is stupid.  A company or technician must be able to say with confidence that their applications are working AND PROVE IT.  If you can't prove anything it is just a hope that things are working as designed.  Until you know, you simply are guessing at the health and availability of an application.&lt;br /&gt;
&lt;br /&gt;
I have noticed a trend recently that companies are going to what I call &amp;quot;implied&amp;quot; monitoring.  More looking at metrics than active validation of an application or service.  While this does somewhat fit the bill for monitoring it does a major disservice to the technicians that need to support the application.  Usually this kind of monitoring will cry when something is down but does not tell you what, or where the issue happens.  Only that it is happening.  It is also usually slower to report the failures.  That means more time for a customer to find issues than the technician.  Parsing logs, and dropping them in search indexes, or forwarding matches as events are all very useful, but they are not really watching for specific application issues at the host level.  I believe that an organization must work up to this kind of monitoring.  It must be built on the basics.  If you do not have the basics in place, then the advanced monitoring is much less useful for an org.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt;&lt;br /&gt;
My common way to approach monitoring is to start as fundamental as possible.  On Linux based systems this is the way that has given me the best results.&lt;br /&gt;
&lt;br /&gt;
* daemon&lt;br /&gt;
** dead daemon, well dead app :)&lt;br /&gt;
* port&lt;br /&gt;
** zombie processes holding a port, or different daemons attempting to use the same port is bad&lt;br /&gt;
* challenge and response.&lt;br /&gt;
** Verify which app has control of the port (did you get an http, email, ssh header?  Did it respond at all?)&lt;br /&gt;
* performance&lt;br /&gt;
** This is the n+1 point.  After confirming the application is running, NOW is the time to verify it is at a basic level performant.&lt;br /&gt;
* log and event parsing&lt;br /&gt;
** In depth log parsing, and event correlations.  Without the above this is much less useful to a technician.&lt;br /&gt;
*** I have seen in RARE cases this done well, where it states what failed and which host.  However that is much less likely to happen as the logs parsed do not always state which application is at fault, only that there is one present.&lt;br /&gt;
&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
How you get the answers is less important than getting the answers, with one gotcha.  There is no reason to degrade the host with a complex and slow service check.  If the validation cannot be done cheaply, then it is likely something that should be broken down into more basic pieces.  Killing your servers or pod with service checks is exactly backwards of what you need.  Simple, fast, and accurate are what you need to focus on.  Inaccurate data is WORSE than no data.  Another useful trick is to assume failure until the application proves that it is doing what you expect.  I always try to avoid a bias of assuming something works until it can prove it in a service check.  Until that time, it is only your opinion that things are working.&lt;br /&gt;
&lt;br /&gt;
Historically this has been with SNMP and NRPE service checks on a common 5 minute iteration cycle.  This allows for scale of many checks across a fleet of hosts within the cycle, as well as a decent starting point for monitoring hosts.  I realize that some technicians prefer a faster cycle time, but at the end of the day, you are still talking about human response times and investigation times.  Getting things like sub-minute reporting does no good at all to a technician who is troubleshooting an issue.  Additionally, fast cycle times do not really lend themselves well to the concept of retry.  You will lose packets from time to time, your application will do weird crap from time to time.  That's just fact.  Make sure your monitoring does not lose its mind and scream the world is burning due to a transient packet-loss issue.  Using something like retry will make it more likely you are picking up a legitimate event and not a transient failure.&lt;br /&gt;
&lt;br /&gt;
Every time there is an &amp;quot;Outage&amp;quot; or severe service degrade, one of the first things that should be brought up is did the monitoring catch this issue?  Was it actually granular enough to state what the issue was, or just a side effect of the issue?  Getting an alarm for website down, vs. database down are two very different things.  Both will imply a 100% outage, but usually one is faster to repair than the other.  Also, looking at the wrong thing (say the webserver when the database is toast) simply slows down the response for getting the database back online.&lt;br /&gt;
&lt;br /&gt;
== History of this wiki ==&lt;br /&gt;
&amp;lt;strong&amp;gt;MediaWiki destination for random notes and examples I do not want to lose or forget.&amp;lt;/strong&amp;gt;&lt;br /&gt;
* The site is for myself and friends who commonly use bash and other utilities and can have a one-stop-shop to find that oddball thing that was found six months ago and vaguely remembered.  The site overall is not for the general public, however if you make it in here feel free to browse.  &lt;br /&gt;
* Keep in mind however I do have security measures in place and poking hard at stuff will block you from the domain for 30 days +  if you hammer really hard.&lt;br /&gt;
* '''If someone wants access to actually ADD information in here please sign up for an account, and I will likely grant access.'''  I do try to set everything into some kind of category for easier searches as well as an attempt to keep this somewhat organized.  Whenever possible (or I remember to do it) I do try to link to the original sources of the information.  They are usually from SE, or other Q&amp;amp;A sites, so some of the comments are useful as well.&lt;br /&gt;
* This is not wikipedia, likely the site is not going to be polished, since it is more of a catchall wiki on doing different things.  There is not going to be too much rhyme or reason on what is posted on this wiki.  Overall if it is something that I have had to do more than once and look up every time, I will have notes on it in here so I will not have to search next time.&lt;br /&gt;
* There will be times that there are references to personal servers on my network, or oddball hostnames that are tied to iwillfearnoevil.com.  It is unlikely that access will be granted to those hosts unless there is a very specific reason to do so.  So don't bother asking :P&lt;br /&gt;
* '''Category NMS has notes on my progress of my NMS design, and thoughts on monitoring overall'''    [https://wiki.iwillfearnoevil.com/mediawiki/index.php/Category:NMS Category NMS]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Misc Notes on using Mediawiki ==&lt;br /&gt;
Consult the [https://www.mediawiki.org/wiki/Special:MyLanguage/Help:Contents User's Guide] for information on using the wiki software.&lt;br /&gt;
&lt;br /&gt;
== Getting started ==&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:Configuration_settings Configuration settings list]&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:FAQ MediaWiki FAQ]&lt;br /&gt;
* [https://lists.wikimedia.org/postorius/lists/mediawiki-announce.lists.wikimedia.org/ MediaWiki release mailing list]&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Localisation#Translation_resources Localise MediaWiki for your language]&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:Combating_spam Learn how to combat spam on your wiki]&lt;br /&gt;
[[Category:General]]&lt;/div&gt;</summary>
		<author><name>Chubbard</name></author>
	</entry>
	<entry>
		<id>https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Category:Mediawiki&amp;diff=679</id>
		<title>Category:Mediawiki</title>
		<link rel="alternate" type="text/html" href="https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Category:Mediawiki&amp;diff=679"/>
		<updated>2025-08-27T17:19:49Z</updated>

		<summary type="html">&lt;p&gt;Chubbard: Created page with &amp;quot;Simply anything that is specific to Mediawiki that I have had to do over time or things I learned.&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Simply anything that is specific to Mediawiki that I have had to do over time or things I learned.&lt;/div&gt;</summary>
		<author><name>Chubbard</name></author>
	</entry>
	<entry>
		<id>https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Mediawiki&amp;diff=678</id>
		<title>Mediawiki</title>
		<link rel="alternate" type="text/html" href="https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Mediawiki&amp;diff=678"/>
		<updated>2025-08-27T17:08:22Z</updated>

		<summary type="html">&lt;p&gt;Chubbard: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:mediawiki]]&lt;br /&gt;
===Mediawiki Notes===&lt;br /&gt;
General tasks done over time to maintain mediawiki.&lt;br /&gt;
&lt;br /&gt;
==Nuke SPAM accounts==&lt;br /&gt;
Log in to the MySQL database to do this work.&lt;br /&gt;
* Spammers are adding the bot email address into the notes field, so we can use that as a better filter&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
UPDATE account_requests SET   acr_deleted = 1,   acr_rejected = DATE_FORMAT(NOW(), '%Y%m%d%H%i%s'),   acr_comment = 'Mass rejection due to account bot or spammer abuse' WHERE acr_deleted = 0 AND acr_email = acr_notes;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
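* This is the old version.  It rejects every request that is not already deleted, not just the ones where the notes match the email address.&lt;br /&gt;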
&amp;lt;pre&amp;gt;&lt;br /&gt;
UPDATE account_requests SET   acr_deleted = 1,   acr_rejected = DATE_FORMAT(NOW(), '%Y%m%d%H%i%s'),   acr_comment = 'Mass rejection due to account abuse spike' WHERE acr_deleted = 0;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Attempt indexing==&lt;br /&gt;
* TBD&lt;/div&gt;</summary>
		<author><name>Chubbard</name></author>
	</entry>
	<entry>
		<id>https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Mediawiki&amp;diff=677</id>
		<title>Mediawiki</title>
		<link rel="alternate" type="text/html" href="https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Mediawiki&amp;diff=677"/>
		<updated>2025-08-27T17:03:48Z</updated>

		<summary type="html">&lt;p&gt;Chubbard: Created page with &amp;quot;Category:mediawiki ===Mediawiki Notes=== General tasks done over time to maintain mediawiki.  ==Nuke SPAM accounts== Login to the MySQL database to do this work.. * UPDATE account_requests SET   acr_deleted = 1,   acr_rejected = DATE_FORMAT(NOW(), '%Y%m%d%H%i%s'),   acr_comment = 'Mass rejection due to account bot or spammer abuse' WHERE acr_deleted = 0 AND acr_email = acr_notes;  ==Attempt indexing== * TBD&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:mediawiki]]&lt;br /&gt;
===Mediawiki Notes===&lt;br /&gt;
General tasks done over time to maintain mediawiki.&lt;br /&gt;
&lt;br /&gt;
==Nuke SPAM accounts==&lt;br /&gt;
Log in to the MySQL database to do this work.&lt;br /&gt;
* UPDATE account_requests SET   acr_deleted = 1,   acr_rejected = DATE_FORMAT(NOW(), '%Y%m%d%H%i%s'),   acr_comment = 'Mass rejection due to account bot or spammer abuse' WHERE acr_deleted = 0 AND acr_email = acr_notes;&lt;br /&gt;
&lt;br /&gt;
==Attempt indexing==&lt;br /&gt;
* TBD&lt;/div&gt;</summary>
		<author><name>Chubbard</name></author>
	</entry>
	<entry>
		<id>https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Bash-sorting-apache&amp;diff=676</id>
		<title>Bash-sorting-apache</title>
		<link rel="alternate" type="text/html" href="https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Bash-sorting-apache&amp;diff=676"/>
		<updated>2025-08-27T16:58:35Z</updated>

		<summary type="html">&lt;p&gt;Chubbard: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:bash]][[Category:fail2ban]]&lt;br /&gt;
Example of how to sort a logfile for top hits (apache in this case)&lt;br /&gt;
&lt;br /&gt;
Why is this useful?  Think of building a case statement from the results, so that anything that looks like a script kiddie attack gets auto-blocked via fail2ban.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
awk '$9 == &amp;quot;404&amp;quot; {print $7}' access.log |sort|uniq -c|sort -rn| head -n 30&lt;br /&gt;
Bonus: do this for nginx logs now :P&lt;br /&gt;
SRC: https://twitter.com/climagic/status/1448297516571762691&lt;br /&gt;
&lt;br /&gt;
Something like this could be used to further update the fail2ban files&lt;br /&gt;
&lt;br /&gt;
Returned:&lt;br /&gt;
      7 /.env&lt;br /&gt;
      6 /ecp/Current/exporttool/microsoft.exchange.ediscovery.exporttool.application&lt;br /&gt;
      5 /GponForm/diag_Form?style/&lt;br /&gt;
      3 /owa/auth/x.js&lt;br /&gt;
      3 /owa/auth/logon.aspx?url=https%3a%2f%2f1%2fecp%2f&lt;br /&gt;
      3 /owa/auth/logon.aspx&lt;br /&gt;
      3 /actuator/health&lt;br /&gt;
      2 /wp-includes/js/jquery/jquery.js&lt;br /&gt;
      2 /vendor/phpunit/phpunit/build.xml&lt;br /&gt;
      2 /plugins/system/debug/debug.xml&lt;br /&gt;
      2 /OA_HTML/AppsLocalLogin.jsp&lt;br /&gt;
      2 /nice%20ports%2C/Tri%6Eity.txt%2ebak&lt;br /&gt;
      2 /misc/ajax.js&lt;br /&gt;
      2 /login&lt;br /&gt;
      2 /js/header-rollup-554.js&lt;br /&gt;
      2 /images/editor/separator.gif&lt;br /&gt;
      2 /.git/config&lt;br /&gt;
      2 /fckeditor/editor/filemanager/connectors/php/upload.php?Type=Media&lt;br /&gt;
      2 /admin/view/javascript/common.js&lt;br /&gt;
      2 /administrator/language/en-GB/install.xml&lt;br /&gt;
      2 /administrator/help/en-GB/toc.json&lt;br /&gt;
      2 /administrator/&lt;br /&gt;
      2 /admin/includes/general.js&lt;br /&gt;
      2 /aab9&lt;br /&gt;
      2 /aaa9&lt;br /&gt;
      1 /xmrlpc.php?daksldlkdsadas=1&lt;br /&gt;
      1 /wp-login.php&lt;br /&gt;
      1 /wp-includes/css/buttons.css&lt;br /&gt;
      1 /.well-known/security.txt&lt;br /&gt;
      1 /ucmdb-api/connect&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;/div&gt;</summary>
		<author><name>Chubbard</name></author>
	</entry>
	<entry>
		<id>https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Iptables&amp;diff=675</id>
		<title>Iptables</title>
		<link rel="alternate" type="text/html" href="https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Iptables&amp;diff=675"/>
		<updated>2025-08-27T16:57:40Z</updated>

		<summary type="html">&lt;p&gt;Chubbard: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==iptables notes==&lt;br /&gt;
Just some generic notes on working with iptables.  It can be a PITA to remember the details while an issue is happening.&lt;br /&gt;
&lt;br /&gt;
REMEMBER:  You can block YOURSELF if you do not use your head!&lt;br /&gt;
&lt;br /&gt;
Don't block:&lt;br /&gt;
* Internal IP addresses&lt;br /&gt;
* YOUR OWN external IP address (if logging in via public interfaces)&lt;br /&gt;
* Loopbacks...  Local services talk to each other over 127.0.0.1, so expect blocking it to break things..&lt;br /&gt;
&lt;br /&gt;
Simple blocks&lt;br /&gt;
* iptables -A INPUT -s 47.245.124.200 -j DROP&lt;br /&gt;
&lt;br /&gt;
Reminders:&lt;br /&gt;
* This will not persist across reboots&lt;br /&gt;
* setup of fail2ban would be a heck of a lot easier than manual blocking&lt;br /&gt;
* iptables -L -n is your friend&lt;br /&gt;
&lt;br /&gt;
[[Category:fail2ban]][[Category:iptables]]&lt;/div&gt;</summary>
		<author><name>Chubbard</name></author>
	</entry>
	<entry>
		<id>https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Category:Fail2ban&amp;diff=674</id>
		<title>Category:Fail2ban</title>
		<link rel="alternate" type="text/html" href="https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Category:Fail2ban&amp;diff=674"/>
		<updated>2025-08-27T16:56:27Z</updated>

		<summary type="html">&lt;p&gt;Chubbard: Created page with &amp;quot;Anything related to fail2ban can be added to this category&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Anything related to fail2ban can be added to this category&lt;/div&gt;</summary>
		<author><name>Chubbard</name></author>
	</entry>
	<entry>
		<id>https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Fail2ban&amp;diff=673</id>
		<title>Fail2ban</title>
		<link rel="alternate" type="text/html" href="https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Fail2ban&amp;diff=673"/>
		<updated>2025-08-27T16:56:08Z</updated>

		<summary type="html">&lt;p&gt;Chubbard: Created page with &amp;quot;===Fail2ban Notes=== There are a whole bunch of options and commands that I forget for fail2ban.  This is simply a list of things that can be done..  ==Ban IP== * sudo  fail2ban-client set JAIL banip 107.175.27.212  ==Status Jail== * sudo fail2ban-client status  ==UnBan IP== * sudo  fail2ban-client set JAIL unbanip 192.168.0.1  ==Testing Regex== * fail2ban-regex /var/log/haproxy.log 'haproxy(?:\[\d+\])?: &amp;lt;HOST&amp;gt;:\d+ \[.*\] default_ssl_http_in~ wiki/wiki01 .* &amp;quot;GET /mediawi...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;===Fail2ban Notes===&lt;br /&gt;
There are a whole bunch of options and commands that I forget for fail2ban.  This is simply a list of things that can be done..&lt;br /&gt;
&lt;br /&gt;
==Ban IP==&lt;br /&gt;
* sudo  fail2ban-client set JAIL banip 107.175.27.212&lt;br /&gt;
&lt;br /&gt;
==Status Jail==&lt;br /&gt;
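* sudo fail2ban-client status (lists the active jails)&lt;br /&gt;
* sudo fail2ban-client status JAIL (details for one jail, including its banned IPs)&lt;br /&gt;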
&lt;br /&gt;
==UnBan IP==&lt;br /&gt;
* sudo  fail2ban-client set JAIL unbanip 192.168.0.1&lt;br /&gt;
&lt;br /&gt;
==Testing Regex==&lt;br /&gt;
* fail2ban-regex /var/log/haproxy.log 'haproxy(?:\[\d+\])?: &amp;lt;HOST&amp;gt;:\d+ \[.*\] default_ssl_http_in~ wiki/wiki01 .* &amp;quot;GET /mediawiki/index\.php\?title.*.RequestAccount.*$'&lt;br /&gt;
&lt;br /&gt;
==Ban Loops==&lt;br /&gt;
* for x in $(grep mediawiki haproxy.log | grep -v 192.168.0.1 | grep Special | awk '{print $6}' | sed 's/:.*//' | sort -u); do fail2ban-client set recidive banip &amp;quot;$x&amp;quot;; done&lt;br /&gt;
&lt;br /&gt;
[[Category:fail2ban]]&lt;/div&gt;</summary>
		<author><name>Chubbard</name></author>
	</entry>
	<entry>
		<id>https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Main_Page&amp;diff=672</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Main_Page&amp;diff=672"/>
		<updated>2025-08-15T21:14:02Z</updated>

		<summary type="html">&lt;p&gt;Chubbard: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;strong&amp;gt;Getting an account here...&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Anyone requesting an account from me DIRECTLY can have an account.&lt;br /&gt;
* The account request PAGE is being hammered by spammers and AI bots (so don't bother using it).&lt;br /&gt;
* It is not hard to find my personal email address; however, I am not posting it here, to keep jerks from hitting me directly&lt;br /&gt;
* &amp;lt;strong&amp;gt;Anyone who DOES use the account request page ends up in fail2ban for at least 30 days. (sick of bots)&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt;Fault Management notes, thoughts, and example code&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
My code is finally up on github.com!  Right now I am squishing bugs that are not critical issues, but look bad.  Additionally I have begun the ECE (event correlation engine) work for the different dashboards.  I still have not gotten to an installer, or added the seed data for the database.  I will be focusing on that soon, but I want to get a little more functionality in place to prove the tool is useful first.  I have also had some thoughts on how to templatize some of the checks better.  I am still kinda kicking the idea around to see what kind of weak points I will run into, but the basic idea is: give me an SNMP table, and if the returned data is numeric, graph/save it.  If args are passed for what we care about, parse the table and raise events on what is found, all at the template level.  I think that I can build a simple skeleton for this where just about any check can be added: string compare, existence, or number comparison.  I kinda like this idea, since we can get a lot of data, but 99% of the time we only care about specific parts of a given table.  I would like to make this something that can be created from the web page if possible, but I am not sure how ugly that would get.&lt;br /&gt;
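&lt;br /&gt;
A minimal Python sketch of the template idea, only to make the shape concrete.  It is not the Vigilare code: the rule keys (column, test, value), the comparison names (eq, ne, gt, lt) and the function names are all made up for illustration, and the table is assumed to arrive as a plain mapping of column name to string value.&lt;br /&gt;

```python
import operator

# Hypothetical comparison names; none of these come from the real NMS.
COMPARATORS = {
    "eq": operator.eq,
    "ne": operator.ne,
    "gt": operator.gt,
    "lt": operator.lt,
}


def is_number(value):
    """Return True when the raw value parses as a number."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


def evaluate_table(table, rules=None):
    """Split one polled table into metrics to graph and events to raise.

    table: dict of column name to raw string value.
    rules: optional list of dicts with "column", "test" and, for
           comparisons, "value".  "test" is "exists", "string" or one
           of the COMPARATORS keys (anything else raises KeyError).
    """
    # Every numeric value is kept for graphing, rule or no rule.
    metrics = {k: float(v) for k, v in table.items() if is_number(v)}
    events = []
    for rule in rules or []:
        column, test = rule["column"], rule["test"]
        if test == "exists":
            if column not in table:
                events.append(column + " is missing")
            continue
        if column not in table:
            continue  # nothing to compare against
        if test == "string":
            if table[column] != rule["value"]:
                events.append(column + " is " + table[column])
        elif column in metrics and COMPARATORS[test](metrics[column], rule["value"]):
            # Numeric rules are skipped silently for non-numeric columns.
            events.append(column + " " + test + " " + str(rule["value"]))
    return metrics, events


# Made-up interface row: one string rule, one number rule, one existence rule.
table = {"ifOperStatus": "down", "ifInErrors": "42", "ifDescr": "eth0"}
rules = [
    {"column": "ifOperStatus", "test": "string", "value": "up"},
    {"column": "ifInErrors", "test": "gt", "value": 10},
    {"column": "ifAlias", "test": "exists"},
]
metrics, events = evaluate_table(table, rules)
print(metrics)
print(events)
```

With no rules passed, the function only collects the numeric columns, which matches the "graph/save it" half of the idea; the rules are what turn a table into events.&lt;br /&gt;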
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I am mainly updating this wiki for the NMS work I am doing, and attempting to get some more exposure in the greater world for assistance on code.  I am not a developer, just someone who is frustrated at a lack of good tools.  I will be adding the NMS to github in the future as two separate repos.  The first one will be the API server.  The second is the front-end UI portion.&lt;br /&gt;
&lt;br /&gt;
The backend is written in PHP, and has recently been migrated from PHP7.4 to 8.1.  I used the Slim4 skeleton as the base of the application.&lt;br /&gt;
* &amp;lt;strong&amp;gt; The backend has been updated to use PHP 8.2 from 8.1&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The frontend is using bootstrap, PHP and javascript.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 08-12-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* More UX work and seeing if I can improve the poller daemon a bit&lt;br /&gt;
* Looking at refactor of defined monitor association with hosts.  Right now it is clunky and not flexible for one offs per host&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 07-30-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* More making the frontend pretty.  I am really not AS embarrassed as I was.&lt;br /&gt;
* investigating breakout of array of monitors to allow for unique checks from the defined monitors.  Every monitor being the same across all hosts is causing some issues.  We need to be able to say &amp;quot;host1 needs value2&amp;quot; and &amp;quot;host2 needs value6&amp;quot; for the same service check.&lt;br /&gt;
* Still need to update the screenshots in the git repos to reflect the better looking UI&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 07-03-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Frontend variable cleanup.  Focus on devices.. It is messy and not maintainable as is.&lt;br /&gt;
* API error cleanup, specifically nulls in event modifications&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 06-20-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* building more annotations&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 06-12-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Have basic swagger UI working on API server (unmerged branch openApiAndCleanup)&lt;br /&gt;
* building more annotations&lt;br /&gt;
* Some PHP syntax error bug squishing&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 04-21-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Still running 2X2 on server.  Load Average consistent 1.10 15 min, and peaks 3.50 on 1min when iteration starts up&lt;br /&gt;
* Not been focused on this too much lately.  Continuing to try to get Devices to look the way they do in my head...&lt;br /&gt;
* Still eating Crayons...&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 03-26-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Still working on UX, as I know I do not make pretty pages&lt;br /&gt;
* Looks a little better on the branch I am working on, but still not amazing or great... Sigh...&lt;br /&gt;
* I should stick to eating Crayons instead of trying to use them to make something good looking...&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 01-02-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Currently working on getting the UI to not look like a first time high school project&lt;br /&gt;
* Minor bug squishing&lt;br /&gt;
* Investigating template rework to better calculate percentages to store in metrics&lt;br /&gt;
* Beginning to investigate storing metric data in influxdb&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 10-30-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Currently hitting performance limits with 2 core/2G RAM with 61 servers.  Server getting laggy..&lt;br /&gt;
* Will encourage setups as: fewer than 50 servers, 2x2 (2 core/2G RAM) is fine.&lt;br /&gt;
* Between 51 and 125 servers, testing as 4x2.  This is not a RAM intense app; it is more thread and core bound.  Drive IO is then going to become the bottleneck before RAM&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 10-15-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Purchased ChartJS license (I may hate JS, but it makes pretty pics)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 10-08-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Minor bug squishing&lt;br /&gt;
* Testing using evil Javascript to make pretty graphs&lt;br /&gt;
* Going to license CanvasJS for this (even a n00b can figure it out)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 08-18-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Some minor bug fixes, but not too much coding done&lt;br /&gt;
* beginning focus on ECE and seeing what a mess it is.  Not happy about it.  Will likely redesign this since it was a skel anyway.  This needs to be simple and understandable dammit.&lt;br /&gt;
* Looking at changing frontend UI for main page.  &lt;br /&gt;
* Building more templates for standards&lt;br /&gt;
* Double-checking my logic for NRPE or shell commands.  This could use more work.  NRPE failing for a reason other than the command itself should have discrete alarm values.  Need to think about this more..&lt;br /&gt;
* SNMP checks need to be smarter, and need parsable thresholds on a per host basis, not a per check basis&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 04-16-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Still squishy-squishy on bugs&lt;br /&gt;
* Focusing on stability and clean UI&lt;br /&gt;
&lt;br /&gt;
Current Loads: 47 devices&lt;br /&gt;
* 2 core 2 GB RAM&lt;br /&gt;
* Load averages consistent: 0.70,1.5,1.6&lt;br /&gt;
* From this, with an average number of checks ( ~10 per host ), reliable monitoring can be done on 50 devices with decent results and minimal hardware for small environments.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update on status: 03-21-2024 &amp;lt;/strong&amp;gt;&lt;br /&gt;
* Many Many squishes of bugs&lt;br /&gt;
* Reporting engine and templates much more usable&lt;br /&gt;
* More documentation links for application with base examples written&lt;br /&gt;
* Initial Event Correlation Engine (ECE) rules written&lt;br /&gt;
* Some pages written for ECE&lt;br /&gt;
* Did I mention squishing bugs?&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update on status: 02-11-2024 &amp;lt;/strong&amp;gt;&lt;br /&gt;
* Upgraded API to use PHP 8.2, not 8.1.&lt;br /&gt;
* GUI code was sanitized and released to github&lt;br /&gt;
* [https://github.com/Guyverix/Vigilare-NMS-GUI Vigilare GUI]&lt;br /&gt;
* [https://github.com/Guyverix/Vigilare-NMS-API Vigilare API]&lt;br /&gt;
* [https://github.com/Guyverix/Vigilare-NMS-POLLER Vigilare Pollers]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The focus of this wiki is for notes and gotchas for things relating to technology.  It is mainly focused on the fault management tool that I am writing.  However oddball things I have found or commonly forget since I rarely use them are also present.  I do not go too in depth on the notes or what I find.  In general it is more a quick summary and if reasonable an example that shows what the result is.  I suspect that as time passes it will become more focused on the tool that I am writing.  However there will likely be oddball stuff thrown in here as well that does not have to do with fault management at all....&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt;Overall idea&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I believe that fault management is commonly overlooked in many companies and is more of a bolt-on after they have been embarrassed by an outage or network event that a customer noticed.  This feels backwards to me.  Fault management really should be one of the first things put into place, and should grow with your software and network over time.  Being able to tell at a glance the &amp;quot;truth&amp;quot; of your network is almost as critical as the network running itself!  Customer monitoring isn't monitoring and simply makes you look bad.  If a customer or end user has to tell you something is down, you have failed.  If it happens more than once then that should set off all kinds of warnings to technicians.  This should not happen.  The problem happens, in my opinion, due to management seeing monitoring as a time suck that does not really return money on their investment of time.  The only time they care is when they get embarrassed, and after that they lose interest until they are embarrassed again.  This is stupid.  A company or technician must be able to say with confidence that their applications are working AND PROVE IT.  If you can't prove anything, it is just a hope that things are working as designed.  Until you know, you are simply guessing at the health and availability of an application.&lt;br /&gt;
&lt;br /&gt;
I have noticed a trend recently that companies are going to what I call &amp;quot;implied&amp;quot; monitoring: more looking at metrics than active validation of an application or service.  While this does somewhat fit the bill for monitoring, it does a major disservice to the technicians that need to support the application.  Usually this kind of monitoring will cry when something is down but does not tell you what the issue is or where it happens, only that it is happening.  It is also usually slower to report the failures.  That means more time for a customer to find issues before the technician does.  Parsing logs and dropping them in search indexes, or forwarding matches as events, are all very useful, but they are not really watching for specific application issues at the host level.  I believe that an organization must work up to this kind of monitoring.  It must be built on the basics.  If you do not have the basics in place, then the advanced monitoring is much less useful for an org.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt;&lt;br /&gt;
My common way to approach monitoring is to start as fundamental as possible.  On Linux based systems this is the way that has given me the best results.&lt;br /&gt;
&lt;br /&gt;
* daemon&lt;br /&gt;
** dead daemon, well dead app :)&lt;br /&gt;
* port&lt;br /&gt;
** zombie processes holding a port, or different daemons attempting to use the same port, are bad&lt;br /&gt;
* challenge and response.&lt;br /&gt;
** Verify which app has control of the port (did you get an http, email, ssh header?  Did it respond at all?)&lt;br /&gt;
* performance&lt;br /&gt;
** This is the n+1 point.  After confirming the application is running, NOW is the time to verify it is at a basic level performant.&lt;br /&gt;
* log and event parsing&lt;br /&gt;
** In depth log parsing, and event correlations.  Without the above this is much less useful to a technician.&lt;br /&gt;
*** I have seen in RARE cases this done well, where it states what failed and which host.  However that is much less likely to happen as the logs parsed do not always state which application is at fault, only that there is one present.&lt;br /&gt;
&amp;lt;/strong&amp;gt;&lt;br /&gt;
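&lt;br /&gt;
The port and challenge-and-response steps above can be sketched in a few lines of Python.  This is only an illustration under assumptions: the function name check_banner is made up, and it only works for services that speak first (SSH, SMTP, FTP).  HTTP would need a request sent before anything comes back.&lt;br /&gt;

```python
import socket


def check_banner(host, port, expected_prefix, timeout=3.0):
    """Return (ok, detail) for a port check followed by a banner check.

    expected_prefix is bytes, for example b"SSH-" or b"220 ".
    """
    # Step 1: port.  Can anything be reached at all?
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except OSError as err:
        return False, "port closed or unreachable: " + str(err)
    # Step 2: challenge and response.  Did the expected app answer?
    with conn:
        try:
            banner = conn.recv(256)
        except OSError as err:
            return False, "port open but no response: " + str(err)
    if banner.startswith(expected_prefix):
        return True, "banner ok"
    # Something holds the port, but it is not the application we expected.
    return False, "unexpected banner: " + repr(banner[:40])


# Example call (the address is a documentation placeholder, not a real host):
# ok, detail = check_banner("192.0.2.10", 22, b"SSH-")
```

Returning a different detail string for each failure mode is the point of the exercise: "port closed" and "wrong app on the port" send a technician to different places.&lt;br /&gt;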
&lt;br /&gt;
How you get the answers is less important than getting the answers, with one gotcha.  There is no reason to degrade the host with a complex and slow service check.  If the validation cannot be done cheaply, then it is likely something that should be broken down into more basic pieces.  Killing your servers or pod with service checks is exactly backwards of what you need.  Simple, fast, and accurate are what you need to focus on.  Inaccurate data is WORSE than no data.  Another useful trick is to assume failure until the application proves that it is doing what you expect.  I always try to avoid a bias of assuming something works until it can prove it in a service check.  Until that time, it is only your opinion that things are working.&lt;br /&gt;
&lt;br /&gt;
Historically this has been done with SNMP and NRPE service checks on a common 5 minute iteration cycle.  This allows for scale of many checks across a fleet of hosts within the cycle, as well as a decent starting point for monitoring hosts.  I realize that some technicians prefer a faster cycle time, but at the end of the day, you are still talking about human response times and investigation times.  Getting things like sub-minute reporting does no good at all to a technician who is troubleshooting an issue.  Additionally, fast cycle times do not really lend themselves well to the concept of retry.  You will lose packets from time to time, and your application will do weird crap from time to time.  That's just fact.  Make sure your monitoring does not lose its mind and scream the world is burning due to a transient packet-loss issue.  Using something like retry will make it more likely you are picking up a legitimate event and not a transient failure.&lt;br /&gt;
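&lt;br /&gt;
A rough Python sketch of the retry idea, assuming each service check is a callable that returns (ok, detail).  The name check_with_retry and its parameters are invented for the example; a real poller would pick its own retry count and delay.&lt;br /&gt;

```python
import time


def check_with_retry(check, retries=3, delay=0.0, sleep=time.sleep):
    """Run check() up to `retries` times.

    Returns (ok, detail, attempts).  A failure is only reported when
    every attempt failed, so one dropped packet does not raise an alarm.
    """
    last_detail = ""
    for attempt in range(1, retries + 1):
        ok, detail = check()
        if ok:
            return True, detail, attempt
        last_detail = detail
        if attempt != retries:
            sleep(delay)  # give a transient problem time to clear
    return False, last_detail, retries


# A transient failure followed by a success is not reported as an outage.
outcomes = iter([(False, "timeout"), (True, "ok")])
result = check_with_retry(lambda: next(outcomes), retries=3)
print(result)  # (True, 'ok', 2)
```

The retry count times the delay has to fit comfortably inside the polling cycle, which is one more reason very short cycles and retries do not mix well.&lt;br /&gt;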
&lt;br /&gt;
Every time there is an &amp;quot;Outage&amp;quot; or severe service degradation, one of the first things that should be brought up is: did the monitoring catch this issue?  Was it actually granular enough to state what the issue was, or did it only catch a side effect of the issue?  Getting an alarm for website down vs. database down are two very different things.  Both will imply a 100% outage, but usually one is faster to repair than the other.  Also, looking at the wrong thing (say the webserver when the database is toast) simply slows down the response for getting the database back online.&lt;br /&gt;
&lt;br /&gt;
== History of this wiki ==&lt;br /&gt;
&amp;lt;strong&amp;gt;MediaWiki destination for random notes and examples I do not want to lose or forget.&amp;lt;/strong&amp;gt;&lt;br /&gt;
* The site is for myself and friends who commonly use bash and other utilities and want a one-stop-shop to find that oddball thing that was found six months ago and is only vaguely remembered.  The site overall is not for the general public; however, if you make it in here feel free to browse.&lt;br /&gt;
* Keep in mind however I do have security measures in place and poking hard at stuff will block you from the domain for 30 days +  if you hammer really hard.&lt;br /&gt;
* '''If someone wants access to actually ADD information in here please sign up for an account, and I will likely grant access.'''  I do try to set everything into some kind of category for easier searches as well as an attempt to keep this somewhat organized.  Whenever possible (or I remember to do it) I do try to link to the original sources of the information.  They are usually from SE, or other Q&amp;amp;A sites, so some of the comments are useful as well.&lt;br /&gt;
* This is not Wikipedia, and the site is unlikely to be polished, since it is more of a catchall wiki on doing different things.  There is not going to be too much rhyme or reason to what is posted on this wiki.  Overall, if it is something that I have had to do more than once and look up every time, I will have notes on it in here so I will not have to search next time.&lt;br /&gt;
* There will be times that there are references to personal servers on my network, or oddball hostnames that are tied to iwillfearnoevil.com.  It is unlikely that access will be granted to those hosts unless there is a very specific reason to do so.  So don't bother asking :P&lt;br /&gt;
* '''Category NMS has notes on my progress of my NMS design, and thoughts on monitoring overall'''    [https://wiki.iwillfearnoevil.com/mediawiki/index.php/Category:NMS Category NMS]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Misc Notes on using Mediawiki ==&lt;br /&gt;
Consult the [https://www.mediawiki.org/wiki/Special:MyLanguage/Help:Contents User's Guide] for information on using the wiki software.&lt;br /&gt;
&lt;br /&gt;
== Getting started ==&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:Configuration_settings Configuration settings list]&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:FAQ MediaWiki FAQ]&lt;br /&gt;
* [https://lists.wikimedia.org/postorius/lists/mediawiki-announce.lists.wikimedia.org/ MediaWiki release mailing list]&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Localisation#Translation_resources Localise MediaWiki for your language]&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:Combating_spam Learn how to combat spam on your wiki]&lt;br /&gt;
[[Category:General]]&lt;/div&gt;</summary>
		<author><name>Chubbard</name></author>
	</entry>
	<entry>
		<id>https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Main_Page&amp;diff=671</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Main_Page&amp;diff=671"/>
		<updated>2025-08-13T17:16:09Z</updated>

		<summary type="html">&lt;p&gt;Chubbard: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;strong&amp;gt;Fault Management notes, thoughts, and example code&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
My code is finally up on github.com!  Right now I am squishing bugs that are not critical issues, but look bad.  Additionally I have begun the ECE (event correlation engine) work for the different dashboards.  I still have not gotten to an installer, or added the seed data for the database.  I will be focusing on that soon, but I want to get a little more functionality in place to prove the tool is useful first.  I have also had some thoughts on how to templatize some of the checks better.  I am still kinda kicking the idea around to see what kind of weak points I will run into, but the basic idea is: give me an SNMP table, and if the returned data is numeric, graph/save it.  If args are passed for what we care about, parse the table and raise events on what is found, all at the template level.  I think that I can build a simple skeleton for this where just about any check can be added: string compare, existence, or number comparison.  I kinda like this idea, since we can get a lot of data, but 99% of the time we only care about specific parts of a given table.  I would like to make this something that can be created from the web page if possible, but I am not sure how ugly that would get.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I am mainly updating this wiki for the NMS work I am doing, and attempting to get some more exposure in the greater world for assistance on code.  I am not a developer, just someone who is frustrated at a lack of good tools.  I will be adding the NMS to github in the future as two separate repos.  The first one will be the API server.  The second is the front-end UI portion.&lt;br /&gt;
&lt;br /&gt;
The backend is written in PHP, and has recently been migrated from PHP7.4 to 8.1.  I used the Slim4 skeleton as the base of the application.&lt;br /&gt;
* &amp;lt;strong&amp;gt; The backend has been updated to use PHP 8.2 from 8.1&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The frontend is using bootstrap, PHP and javascript.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 08-12-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* More UX work and seeing if I can improve the poller daemon a bit&lt;br /&gt;
* Looking at refactor of defined monitor association with hosts.  Right now it is clunky and not flexible for one offs per host&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 07-30-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* More making the frontend pretty.  I am really not AS embarrassed as I was.&lt;br /&gt;
* investigating breakout of array of monitors to allow for unique checks from the defined monitors.  Every monitor being the same across all hosts is causing some issues.  We need to be able to say &amp;quot;host1 needs value2&amp;quot; and &amp;quot;host2 needs value6&amp;quot; for the same service check.&lt;br /&gt;
* Still need to update the screenshots in the git repos to reflect the better looking UI&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 07-03-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Frontend variable cleanup.  Focus on devices.. It is messy and not maintainable as is.&lt;br /&gt;
* API error cleanup, specifically nulls in event modifications&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 06-20-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* building more annotations&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 06-12-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Have basic swagger UI working on API server (unmerged branch openApiAndCleanup)&lt;br /&gt;
* building more annotations&lt;br /&gt;
* Some PHP syntax error bug squishing&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 04-21-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Still running 2X2 on server.  Load Average consistent 1.10 15 min, and peaks 3.50 on 1min when iteration starts up&lt;br /&gt;
* Not been focused on this too much lately.  Continuing to try to get Devices to look the way they do in my head...&lt;br /&gt;
* Still eating Crayons...&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 03-26-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Still working on UX, as I know I do not make pretty pages&lt;br /&gt;
* Looks a little better on the branch I am working on, but still not amazing or great... Sigh...&lt;br /&gt;
* I should stick to eating Crayons instead of trying to use them to make something good looking...&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 01-02-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Currently working on getting the UI to not look like a first time high school project&lt;br /&gt;
* Minor bug squishing&lt;br /&gt;
* Investigating template rework to better calculate percentages to store in metrics&lt;br /&gt;
* Beginning to investigate storing metric data in influxdb&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 10-30-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Currently hitting performance limits with 2 core/2G RAM with 61 servers.  Server getting laggy..&lt;br /&gt;
* Will recommend 2x2 (2 core/2G RAM) for setups with fewer than 50 servers.&lt;br /&gt;
* Testing 4x2 for setups with more than 51 and fewer than 125 servers.  This is not a RAM intense app, it is more thread and core bound.  Drive IO is then going to become the bottleneck before RAM.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 10-15-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Purchased ChartJS license (I may hate JS, but it makes pretty pics)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 10-08-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Minor bug squishing&lt;br /&gt;
* Testing using evil Javascript to make pretty graphs&lt;br /&gt;
* Going to license CanvasJS for this (even a n00b can figure it out)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 08-18-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Some minor bug fixes, but not too much coding done&lt;br /&gt;
* beginning focus on ECE and seeing what a mess it is.  Not happy about it.  Will likely redesign this since it was a skel anyway.  This needs to be simple and understandable dammit.&lt;br /&gt;
* Looking at changing frontend UI for main page.  &lt;br /&gt;
* Building more templates for standards&lt;br /&gt;
* Double-checking my logic for NRPE or shell commands.  This could use more work.  An NRPE failure caused by something other than the command itself should have discrete alarm values.  Need to think about this more.&lt;br /&gt;
* SNMP checks need to be smarter, and need parsable thresholds on a per host basis, not a per check basis&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 04-16-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Still squishy-squishy on bugs&lt;br /&gt;
* Focusing on stability and clean UI&lt;br /&gt;
&lt;br /&gt;
Current Loads: 47 devices&lt;br /&gt;
* 2 core 2 GB RAM&lt;br /&gt;
* Load averages consistent: 0.70,1.5,1.6&lt;br /&gt;
* From this, with an average number of checks ( ~10 per host ), reliable monitoring can be done on 50 devices with decent results and minimal hardware for small environments.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update on status: 03-21-2024 &amp;lt;/strong&amp;gt;&lt;br /&gt;
* Many Many squishes of bugs&lt;br /&gt;
* Reporting engine and templates much more usable&lt;br /&gt;
* More documentation links for application with base examples written&lt;br /&gt;
* Initial Event Correlation Engine (ECE) rules written&lt;br /&gt;
* Some pages written for ECE&lt;br /&gt;
* Did I mention squishing bugs?&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update on status: 02-11-2024 &amp;lt;/strong&amp;gt;&lt;br /&gt;
* Upgraded API to use PHP 8.2, not 8.1.&lt;br /&gt;
* GUI code was sanitized and released to github&lt;br /&gt;
* [https://github.com/Guyverix/Vigilare-NMS-GUI Vigilare GUI]&lt;br /&gt;
* [https://github.com/Guyverix/Vigilare-NMS-API Vigilare API]&lt;br /&gt;
* [https://github.com/Guyverix/Vigilare-NMS-POLLER Vigilare Pollers]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The focus of this wiki is for notes and gotchas for things relating to technology.  It is mainly focused on the fault management tool that I am writing.  However oddball things I have found or commonly forget since I rarely use them are also present.  I do not go too in depth on the notes or what I find.  In general it is more a quick summary and if reasonable an example that shows what the result is.  I suspect that as time passes it will become more focused on the tool that I am writing.  However there will likely be oddball stuff thrown in here as well that does not have to do with fault management at all....&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt;Overall idea&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I believe that fault management is commonly overlooked in many companies and treated as a bolt-on after they have been embarrassed by an outage or network event that a customer noticed.  This feels backwards to me.  Fault management really should be one of the first things put into place and grow with your software and network over time.  Being able to tell at a glance the &amp;quot;truth&amp;quot; of your network is almost as critical as the network running itself!  Customer monitoring isn't monitoring and simply makes you look bad.  If a customer or end user has to tell you something is down, you have failed.  If it happens more than once then that should set off all kinds of warnings to technicians.  This should not happen.  The problem happens in my opinion due to management seeing monitoring as a time suck that does not really return money on their investment of time.  The only time they care is when they get embarrassed, and after that they lose interest until they are embarrassed again.  This is stupid.  A company or technician must be able to say with confidence that their applications are working AND PROVE IT.  If you can't prove anything, it is just a hope that things are working as designed.  Until you know, you are simply guessing at the health and availability of an application.&lt;br /&gt;
&lt;br /&gt;
I have noticed a trend recently that companies are going to what I call &amp;quot;implied&amp;quot; monitoring.  This means looking at metrics more than actively validating an application or service.  While this does somewhat fit the bill for monitoring it does a major disservice to the technicians that need to support the application.  Usually this kind of monitoring will cry when something is down but does not tell you what, or where the issue happens.  Only that it is happening.  It is also usually slower to report the failures.  That means more time for a customer to find issues than the technician.  Parsing logs, and dropping them in search indexes, or forwarding matches as events are all very useful, but they are not really watching for specific application issues at the host level.  I believe that an organization must work up to this kind of monitoring.  It must be built on the basics.  If you do not have the basics in place, then the advanced monitoring is much less useful for an org.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt;&lt;br /&gt;
My common way to approach monitoring is to start as fundamental as possible.  On Linux based systems this is the way that has given me the best results.&lt;br /&gt;
&lt;br /&gt;
* daemon&lt;br /&gt;
** dead daemon, well dead app :)&lt;br /&gt;
* port&lt;br /&gt;
** zombie processes holding a port, or different daemons attempting to use the same port is bad&lt;br /&gt;
* challenge and response.&lt;br /&gt;
** Verify which app has control of the port (did you get an http, email, ssh header?  Did it respond at all?)&lt;br /&gt;
* performance&lt;br /&gt;
** This is the n+1 point.  After confirming the application is running, NOW is the time to verify it is at a basic level performant.&lt;br /&gt;
* log and event parsing&lt;br /&gt;
** In depth log parsing, and event correlations.  Without the above this is much less useful to a technician.&lt;br /&gt;
*** I have seen in RARE cases this done well, where it states what failed and which host.  However that is much less likely to happen as the logs parsed do not always state which application is at fault, only that there is one present.&lt;br /&gt;
&amp;lt;/strong&amp;gt;&lt;br /&gt;
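&lt;br /&gt;
The first layers of the list above can be sketched roughly as follows, in Python.  This is only an illustration under assumptions, not code from the NMS: the names port_open, banner_matches, and run_layers are hypothetical, and the banner check only suits protocols where the server speaks first (SSH, SMTP); HTTP would need a request sent before reading.&lt;br /&gt;

```python
import socket


def port_open(host, port, timeout=3.0):
    """Port layer: is anything accepting connections on the port?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def banner_matches(host, port, expected, timeout=3.0):
    """Challenge and response layer: does the listener greet us like the expected app?

    expected is a bytes prefix, for example b"SSH-" for an SSH daemon.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.settimeout(timeout)
            banner = conn.recv(256)
    except OSError:
        return False
    return banner.startswith(expected)


def run_layers(layers):
    """Run (name, probe) pairs in order, assuming failure until proven otherwise.

    Returns the name of the first layer that fails, or None when every
    layer passes.  A probe that raises is treated as a failure.
    """
    for name, probe in layers:
        try:
            ok = bool(probe())
        except Exception:
            ok = False
        if not ok:
            return name
    return None
```

A call such as run_layers([("port", lambda: port_open("host1", 22)), ("banner", lambda: banner_matches("host1", 22, b"SSH-"))]) would report which basic layer broke first, which is exactly the detail that implied monitoring tends to lose.  The hostname host1 is a placeholder.&lt;br /&gt;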
&lt;br /&gt;
How you get the answers is less important than getting the answers, with one gotcha.  There is no reason to degrade the host with a complex and slow service check.  If the validation cannot be done cheaply, then it is likely something that should be broken down into more basic pieces.  Killing your servers or pod with service checks is exactly backwards of what you need.  Simple, fast, and accurate are what you need to focus on.  Inaccurate data is WORSE than no data.  Another useful trick is to assume failure until the application proves that it is doing what you expect.  I always try to avoid a bias of assuming something works until it can prove it in a service check.  Until that time, it is only your opinion that things are working.&lt;br /&gt;
&lt;br /&gt;
Historically this has been with SNMP and NRPE service checks on a common 5 minute iteration cycle.  This allows for scale of many checks across a fleet of hosts within the cycle, as well as a decent starting point for monitoring hosts.  I realize that some technicians prefer a faster cycle time, but at the end of the day, you are still talking about human response times and investigation times.  Getting things like sub-minute reporting does no good at all to a technician who is troubleshooting an issue.  Additionally, fast cycle times do not really lend themselves well to the concept of retry.  You will lose packets from time to time, and your application will do weird crap from time to time.  That's just fact.  Make sure your monitoring does not lose its mind and scream the world is burning due to a transient packet-loss issue.  Using something like retry will make it more likely you are picking up a legitimate event and not a transient failure.&lt;br /&gt;
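&lt;br /&gt;
A minimal sketch of the retry idea, in Python.  The function name check_with_retry, its parameters, and the default of three attempts ten seconds apart are assumptions chosen for illustration; the only real constraint taken from the text is that all attempts should fit comfortably inside the 5 minute cycle.&lt;br /&gt;

```python
import time


def check_with_retry(check, retries=3, delay=10.0, sleep=time.sleep):
    """Run check() up to `retries` times before declaring a failure.

    The check is assumed failed until it proves otherwise, so an exception
    counts the same as a False result.  Returns (ok, attempts_used).
    The sleep argument exists so the wait can be replaced in tests.
    """
    attempts = max(1, retries)
    for attempt in range(1, attempts + 1):
        try:
            if check():
                return True, attempt
        except Exception:
            # A crashed check is a failed attempt, not a crashed poller.
            pass
        if attempt != attempts:
            sleep(delay)
    return False, attempts
```

An event would only be raised when the result is (False, 3), meaning every attempt failed.  A single dropped packet shows up as (True, 2) instead, which can still be logged without paging anyone.&lt;br /&gt;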
&lt;br /&gt;
Every time there is an &amp;quot;Outage&amp;quot; or severe service degradation, one of the first questions that should be brought up is: did the monitoring catch this issue?  Was it actually granular enough to state what the issue was, or did it only catch a side effect of the issue?  Getting an alarm for website down vs. database down are two very different things.  Both will imply a 100% outage, but usually one is faster to repair than the other.  Also, looking at the wrong thing (say, the webserver when the database is toast) simply slows down the response for getting the database back online.&lt;br /&gt;
&lt;br /&gt;
== History of this wiki ==&lt;br /&gt;
&amp;lt;strong&amp;gt;MediaWiki destination for random notes and examples I do not want to lose or forget.&amp;lt;/strong&amp;gt;&lt;br /&gt;
* The site is for myself and friends who commonly use bash and other utilities and can have a one-stop-shop to find that oddball thing that was found six months ago and vaguely remembered.  The site overall is not for the general public, however if you make it in here feel free to browse.  &lt;br /&gt;
* Keep in mind however I do have security measures in place and poking hard at stuff will block you from the domain for 30 days +  if you hammer really hard.&lt;br /&gt;
* '''If someone wants access to actually ADD information in here please sign up for an account, and I will likely grant access.'''  I do try to set everything into some kind of category for easier searches as well as an attempt to keep this somewhat organized.  Whenever possible (or I remember to do it) I do try to link to the original sources of the information.  They are usually from SE, or other Q&amp;amp;A sites, so some of the comments are useful as well.&lt;br /&gt;
* This is not Wikipedia, and the site is likely not going to be polished, since it is more of a catchall wiki on doing different things.  There is not going to be too much rhyme or reason on what is posted on this wiki.  Overall if it is something that I have had to do more than once and look up every time, I will have notes on it in here so I will not have to search next time.&lt;br /&gt;
* There will be times that there are references to personal servers on my network, or oddball hostnames that are tied to iwillfearnoevil.com.  It is unlikely that access will be granted to those hosts unless there is a very specific reason to do so.  So don't bother asking :P&lt;br /&gt;
* '''Category NMS has notes on my progress of my NMS design, and thoughts on monitoring overall'''    [https://wiki.iwillfearnoevil.com/mediawiki/index.php/Category:NMS Category NMS]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Misc Notes on using Mediawiki ==&lt;br /&gt;
Consult the [https://www.mediawiki.org/wiki/Special:MyLanguage/Help:Contents User's Guide] for information on using the wiki software.&lt;br /&gt;
&lt;br /&gt;
== Getting started ==&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:Configuration_settings Configuration settings list]&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:FAQ MediaWiki FAQ]&lt;br /&gt;
* [https://lists.wikimedia.org/postorius/lists/mediawiki-announce.lists.wikimedia.org/ MediaWiki release mailing list]&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Localisation#Translation_resources Localise MediaWiki for your language]&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:Combating_spam Learn how to combat spam on your wiki]&lt;br /&gt;
[[Category:General]]&lt;/div&gt;</summary>
		<author><name>Chubbard</name></author>
	</entry>
	<entry>
		<id>https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Main_Page&amp;diff=670</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Main_Page&amp;diff=670"/>
		<updated>2025-08-13T17:15:24Z</updated>

		<summary type="html">&lt;p&gt;Chubbard: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;strong&amp;gt;Fault Management notes, thoughts, and example code&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
My code is finally up on github.com!  Right now I am squishing bugs that are not critical but look bad.  Additionally I have begun the ECE (event correlation engine) work for the different dashboards.  I still have not gotten to an installer, or added the seed data for the database.  I will be focusing on that soon, but I want to get a little more functionality in place to prove the tool is useful first.  I have also had some thoughts on how to templatize some of the checks better.  I am still kicking the idea around to see what kind of weak points I will run into, but the basic idea is along the lines of: give me an SNMP table, and if the returned data is numeric, graph/save it.  If args are passed for what we care about, parse the table and raise events on what is found, all at the template level.  I think that I can build a simple skeleton for this where just about any check can be added: string compare, existence, or number comparison.  I like this idea, since we can get a lot of data, but 99% of the time we only care about specific parts of a given table.  I would like to make this something that can be created from the web page if possible, but I am not sure how ugly that would get.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I am mainly updating this wiki for the NMS work I am doing, and attempting to get some more exposure in the greater world for assistance on code.  I am not a developer, just someone who is frustrated at a lack of good tools.  I will be adding the NMS up to GitHub in the future as two separate repos.  The first one will be the API server.  The second is the front-end UI portion.&lt;br /&gt;
&lt;br /&gt;
The backend is written in PHP, and has recently been migrated from PHP 7.4 to 8.1.  I used the Slim4 skeleton as the base of the application.&lt;br /&gt;
* &amp;lt;strong&amp;gt; The backend has been updated to use PHP 8.2 from 8.1&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The frontend is using Bootstrap, PHP, and JavaScript.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 08-12-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* More UX work and seeing if I can improve the poller daemon a bit&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 07-30-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* More making the frontend pretty.  I am really not AS embarrassed as I was.&lt;br /&gt;
* investigating breakout of array of monitors to allow for unique checks from the defined monitors.  Every monitor being the same across all hosts is causing some issues.  We need to be able to say &amp;quot;host1 needs value2&amp;quot; and &amp;quot;host2 needs value6&amp;quot; for the same service check.&lt;br /&gt;
* Still need to update the screenshots in the git repos to reflect the better looking UI&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 07-03-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Frontend variable cleanup.  Focus on devices.. It is messy and not maintainable as is.&lt;br /&gt;
* API error cleanup, specifically nulls in event modifications&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 06-20-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* building more annotations&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 06-12-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Have basic swagger UI working on API server (unmerged branch openApiAndCleanup)&lt;br /&gt;
* building more annotations&lt;br /&gt;
* Some PHP syntax error bug squishing&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 04-21-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Still running 2X2 on server.  Load Average consistent 1.10 15 min, and peaks 3.50 on 1min when iteration starts up&lt;br /&gt;
* Not been focused on this too much lately.  Continuing to try to get Devices to look the way they do in my head...&lt;br /&gt;
* Still eating Crayons...&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 03-26-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Still working on UX, as I know I do not make pretty pages&lt;br /&gt;
* Looks a little better on the branch I am working on, but still not amazing or great... Sigh...&lt;br /&gt;
* I should stick to eating Crayons instead of trying to use them to make something good looking...&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 01-02-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Currently working on getting the UI to not look like a first time high school project&lt;br /&gt;
* Minor bug squishing&lt;br /&gt;
* Investigating template rework to better calculate percentages to store in metrics&lt;br /&gt;
* Beginning to investigate storing metric data in influxdb&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 10-30-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Currently hitting performance limits with 2 core/2G RAM with 61 servers.  Server getting laggy..&lt;br /&gt;
* Will recommend 2x2 (2 core/2G RAM) for setups with fewer than 50 servers.&lt;br /&gt;
* Testing 4x2 for setups with more than 51 and fewer than 125 servers.  This is not a RAM intense app, it is more thread and core bound.  Drive IO is then going to become the bottleneck before RAM.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 10-15-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Purchased ChartJS license (I may hate JS, but it makes pretty pics)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 10-08-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Minor bug squishing&lt;br /&gt;
* Testing using evil Javascript to make pretty graphs&lt;br /&gt;
* Going to license CanvasJS for this (even a n00b can figure it out)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 08-18-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Some minor bug fixes, but not too much coding done&lt;br /&gt;
* beginning focus on ECE and seeing what a mess it is.  Not happy about it.  Will likely redesign this since it was a skel anyway.  This needs to be simple and understandable dammit.&lt;br /&gt;
* Looking at changing frontend UI for main page.  &lt;br /&gt;
* Building more templates for standards&lt;br /&gt;
* Double-checking my logic for NRPE or shell commands.  This could use more work.  An NRPE failure caused by something other than the command itself should have discrete alarm values.  Need to think about this more.&lt;br /&gt;
* SNMP checks need to be smarter, and need parsable thresholds on a per host basis, not a per check basis&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 04-16-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Still squishy-squishy on bugs&lt;br /&gt;
* Focusing on stability and clean UI&lt;br /&gt;
&lt;br /&gt;
Current Loads: 47 devices&lt;br /&gt;
* 2 core 2 GB RAM&lt;br /&gt;
* Load averages consistent: 0.70,1.5,1.6&lt;br /&gt;
* From this, with an average number of checks ( ~10 per host ), reliable monitoring can be done on 50 devices with decent results and minimal hardware for small environments.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update on status: 03-21-2024 &amp;lt;/strong&amp;gt;&lt;br /&gt;
* Many Many squishes of bugs&lt;br /&gt;
* Reporting engine and templates much more usable&lt;br /&gt;
* More documentation links for application with base examples written&lt;br /&gt;
* Initial Event Correlation Engine (ECE) rules written&lt;br /&gt;
* Some pages written for ECE&lt;br /&gt;
* Did I mention squishing bugs?&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update on status: 02-11-2024 &amp;lt;/strong&amp;gt;&lt;br /&gt;
* Upgraded API to use PHP 8.2, not 8.1.&lt;br /&gt;
* GUI code was sanitized and released to github&lt;br /&gt;
* [https://github.com/Guyverix/Vigilare-NMS-GUI Vigilare GUI]&lt;br /&gt;
* [https://github.com/Guyverix/Vigilare-NMS-API Vigilare API]&lt;br /&gt;
* [https://github.com/Guyverix/Vigilare-NMS-POLLER Vigilare Pollers]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The focus of this wiki is for notes and gotchas for things relating to technology.  It is mainly focused on the fault management tool that I am writing.  However oddball things I have found or commonly forget since I rarely use them are also present.  I do not go too in depth on the notes or what I find.  In general it is more a quick summary and if reasonable an example that shows what the result is.  I suspect that as time passes it will become more focused on the tool that I am writing.  However there will likely be oddball stuff thrown in here as well that does not have to do with fault management at all....&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt;Overall idea&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I believe that fault management is commonly overlooked in many companies and treated as a bolt-on after they have been embarrassed by an outage or network event that a customer noticed.  This feels backwards to me.  Fault management really should be one of the first things put into place and grow with your software and network over time.  Being able to tell at a glance the &amp;quot;truth&amp;quot; of your network is almost as critical as the network running itself!  Customer monitoring isn't monitoring and simply makes you look bad.  If a customer or end user has to tell you something is down, you have failed.  If it happens more than once then that should set off all kinds of warnings to technicians.  This should not happen.  The problem happens in my opinion due to management seeing monitoring as a time suck that does not really return money on their investment of time.  The only time they care is when they get embarrassed, and after that they lose interest until they are embarrassed again.  This is stupid.  A company or technician must be able to say with confidence that their applications are working AND PROVE IT.  If you can't prove anything, it is just a hope that things are working as designed.  Until you know, you are simply guessing at the health and availability of an application.&lt;br /&gt;
&lt;br /&gt;
I have noticed a trend recently that companies are going to what I call &amp;quot;implied&amp;quot; monitoring.  This means looking at metrics more than actively validating an application or service.  While this does somewhat fit the bill for monitoring it does a major disservice to the technicians that need to support the application.  Usually this kind of monitoring will cry when something is down but does not tell you what, or where the issue happens.  Only that it is happening.  It is also usually slower to report the failures.  That means more time for a customer to find issues than the technician.  Parsing logs, and dropping them in search indexes, or forwarding matches as events are all very useful, but they are not really watching for specific application issues at the host level.  I believe that an organization must work up to this kind of monitoring.  It must be built on the basics.  If you do not have the basics in place, then the advanced monitoring is much less useful for an org.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt;&lt;br /&gt;
My common way to approach monitoring is to start as fundamental as possible.  On Linux based systems this is the way that has given me the best results.&lt;br /&gt;
&lt;br /&gt;
* daemon&lt;br /&gt;
** dead daemon, well dead app :)&lt;br /&gt;
* port&lt;br /&gt;
** zombie processes holding a port, or different daemons attempting to use the same port is bad&lt;br /&gt;
* challenge and response.&lt;br /&gt;
** Verify which app has control of the port (did you get an http, email, ssh header?  Did it respond at all?)&lt;br /&gt;
* performance&lt;br /&gt;
** This is the n+1 point.  After confirming the application is running, NOW is the time to verify it is at a basic level performant.&lt;br /&gt;
* log and event parsing&lt;br /&gt;
** In depth log parsing, and event correlations.  Without the above this is much less useful to a technician.&lt;br /&gt;
*** I have seen in RARE cases this done well, where it states what failed and which host.  However that is much less likely to happen as the logs parsed do not always state which application is at fault, only that there is one present.&lt;br /&gt;
&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
How you get the answers is less important than getting the answers, with one gotcha.  There is no reason to degrade the host with a complex and slow service check.  If the validation cannot be done cheaply, then it is likely something that should be broken down into more basic pieces.  Killing your servers or pod with service checks is exactly backwards of what you need.  Simple, fast, and accurate are what you need to focus on.  Inaccurate data is WORSE than no data.  Another useful trick is to assume failure until the application proves that it is doing what you expect.  I always try to avoid a bias of assuming something works until it can prove it in a service check.  Until that time, it is only your opinion that things are working.&lt;br /&gt;
&lt;br /&gt;
Historically this has been with SNMP and NRPE service checks on a common 5 minute iteration cycle.  This allows for scale of many checks across a fleet of hosts within the cycle, as well as a decent starting point for monitoring hosts.  I realize that some technicians prefer a faster cycle time, but at the end of the day, you are still talking about human response times and investigation times.  Getting things like sub-minute reporting does no good at all to a technician who is troubleshooting an issue.  Additionally, fast cycle times do not really lend themselves well to the concept of retry.  You will lose packets from time to time, and your application will do weird crap from time to time.  That's just fact.  Make sure your monitoring does not lose its mind and scream the world is burning due to a transient packet-loss issue.  Using something like retry will make it more likely you are picking up a legitimate event and not a transient failure.&lt;br /&gt;
&lt;br /&gt;
Every time there is an &amp;quot;Outage&amp;quot; or severe service degradation, one of the first questions that should be brought up is: did the monitoring catch this issue?  Was it actually granular enough to state what the issue was, or did it only catch a side effect of the issue?  Getting an alarm for website down vs. database down are two very different things.  Both will imply a 100% outage, but usually one is faster to repair than the other.  Also, looking at the wrong thing (say, the webserver when the database is toast) simply slows down the response for getting the database back online.&lt;br /&gt;
&lt;br /&gt;
== History of this wiki ==&lt;br /&gt;
&amp;lt;strong&amp;gt;MediaWiki destination for random notes and examples I do not want to lose or forget.&amp;lt;/strong&amp;gt;&lt;br /&gt;
* The site is for myself and friends who commonly use bash and other utilities and can have a one-stop-shop to find that oddball thing that was found six months ago and vaguely remembered.  The site overall is not for the general public, however if you make it in here feel free to browse.  &lt;br /&gt;
* Keep in mind however I do have security measures in place and poking hard at stuff will block you from the domain for 30 days +  if you hammer really hard.&lt;br /&gt;
* '''If someone wants access to actually ADD information in here please sign up for an account, and I will likely grant access.'''  I do try to set everything into some kind of category for easier searches as well as an attempt to keep this somewhat organized.  Whenever possible (or I remember to do it) I do try to link to the original sources of the information.  They are usually from SE, or other Q&amp;amp;A sites, so some of the comments are useful as well.&lt;br /&gt;
* This is not Wikipedia, and the site is likely not going to be polished, since it is more of a catchall wiki on doing different things.  There is not going to be too much rhyme or reason on what is posted on this wiki.  Overall if it is something that I have had to do more than once and look up every time, I will have notes on it in here so I will not have to search next time.&lt;br /&gt;
* There will be times that there are references to personal servers on my network, or oddball hostnames that are tied to iwillfearnoevil.com.  It is unlikely that access will be granted to those hosts unless there is a very specific reason to do so.  So don't bother asking :P&lt;br /&gt;
* '''Category NMS has notes on my progress of my NMS design, and thoughts on monitoring overall'''    [https://wiki.iwillfearnoevil.com/mediawiki/index.php/Category:NMS Category NMS]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Misc Notes on using MediaWiki ==&lt;br /&gt;
Consult the [https://www.mediawiki.org/wiki/Special:MyLanguage/Help:Contents User's Guide] for information on using the wiki software.&lt;br /&gt;
&lt;br /&gt;
== Getting started ==&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:Configuration_settings Configuration settings list]&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:FAQ MediaWiki FAQ]&lt;br /&gt;
* [https://lists.wikimedia.org/postorius/lists/mediawiki-announce.lists.wikimedia.org/ MediaWiki release mailing list]&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Localisation#Translation_resources Localise MediaWiki for your language]&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:Combating_spam Learn how to combat spam on your wiki]&lt;br /&gt;
[[Category:General]]&lt;/div&gt;</summary>
		<author><name>Chubbard</name></author>
	</entry>
	<entry>
		<id>https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Main_Page&amp;diff=669</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Main_Page&amp;diff=669"/>
		<updated>2025-07-30T19:09:11Z</updated>

		<summary type="html">&lt;p&gt;Chubbard: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;strong&amp;gt;Fault Management notes, thoughts, and example code&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
My code is finally up on github.com!  Right now I am squishing bugs that are not critical but look bad.  Additionally, I have begun the event correlation engine (ECE) work for the different dashboards.  I still have not gotten to an installer or added the seed data for the database.  I will be focusing on that soon, but I want to get a little more functionality in place first to prove the tool is useful.  I have also had some thoughts on how to templatize some of the checks better.  I am still kinda kicking the idea around to see what kind of weak points I will run into, but the basic idea is: give me an SNMP table, and if the returned data is numeric, graph and save it.  If args are passed for what we care about, parse the data and raise events on what is found, all at the template level.  I think that I can build a simple skeleton for this where just about any check can be added: string compare, existence, or number comparison.  I like this idea, since we can get a lot of data, but 99% of the time we only care about specific parts of a given table.  I would like to make this something that can be created from the web page if possible, but I am not sure how ugly that would get.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I am mainly updating this wiki for the NMS work I am doing, and attempting to get some more exposure in the greater world for assistance on code.  I am not a developer, just someone who is frustrated at a lack of good tools.  I will be putting the NMS up on GitHub in the future as two separate repos.  The first one will be the API server.  The second is the front-end UI portion.&lt;br /&gt;
&lt;br /&gt;
The backend is written in PHP, and has recently been migrated from PHP 7.4 to 8.1.  I used the Slim 4 skeleton as the base of the application.&lt;br /&gt;
* &amp;lt;strong&amp;gt; The backend has been updated to use PHP 8.2 from 8.1&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The frontend uses Bootstrap, PHP, and JavaScript.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 07-30-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* More making the frontend pretty.  I am really not AS embarrassed as I was.&lt;br /&gt;
* Investigating breaking out the array of monitors to allow unique per-host checks from the defined monitors.  Every monitor being the same across all hosts is causing some issues.  We need to be able to say &amp;quot;host1 needs value2&amp;quot; and &amp;quot;host2 needs value6&amp;quot; for the same service check.&lt;br /&gt;
* Still need to update the screenshots in the git repos to reflect the better looking UI&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 07-03-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Frontend variable cleanup.  Focus on devices.  It is messy and not maintainable as is.&lt;br /&gt;
* API error cleanup, specifically nulls in event modifications&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 06-20-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* building more annotations&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 06-12-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Have basic swagger UI working on API server (unmerged branch openApiAndCleanup)&lt;br /&gt;
* building more annotations&lt;br /&gt;
* Some PHP syntax error bug squishing&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 04-21-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Still running 2x2 (2 cores, 2 GB RAM) on the server.  Load average is a consistent 1.10 over 15 min, and peaks at 3.50 over 1 min when an iteration starts up&lt;br /&gt;
* Not been focused on this too much lately.  Continuing to try to get Devices to look the way they do in my head...&lt;br /&gt;
* Still eating Crayons...&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 03-26-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Still working on UX, as I know I do not make pretty pages&lt;br /&gt;
* Looks a little better on the branch I am working on, but still not amazing or great... Sigh...&lt;br /&gt;
* I should stick to eating Crayons instead of trying to use them to make something good looking...&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 01-02-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Currently working on getting the UI to not look like a first time high school project&lt;br /&gt;
* Minor bug squishing&lt;br /&gt;
* Investigating template rework to better calculate percentages to store in metrics&lt;br /&gt;
* Beginning to investigate storing metric data in influxdb&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 10-30-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Currently hitting performance limits on 2 cores / 2 GB RAM with 61 servers.  Server is getting laggy.&lt;br /&gt;
* Will recommend sizing as follows: for fewer than 50 servers, 2x2 is fine.&lt;br /&gt;
* Between 51 and 125 servers, testing as 4x2.  This is not a RAM-intensive app; it is more thread and core bound.  Drive IO is going to become the bottleneck before RAM does.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 10-15-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Purchased ChartJS license (I may hate JS, but it makes pretty pics)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 10-08-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Minor bug squishing&lt;br /&gt;
* Testing using evil Javascript to make pretty graphs&lt;br /&gt;
* Going to license CanvasJS for this (even a n00b can figure it out)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 08-18-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Some minor bug fixes, but not too much coding done&lt;br /&gt;
* Beginning to focus on ECE and seeing what a mess it is.  Not happy about it.  Will likely redesign this since it was a skeleton anyway.  This needs to be simple and understandable dammit.&lt;br /&gt;
* Looking at changing frontend UI for main page.  &lt;br /&gt;
* Building more templates for standards&lt;br /&gt;
* Double-checking my logic for NRPE or shell commands.  This could use more work.  An NRPE failure caused by something other than the command itself should have discrete alarm values.  Need to think about this more.&lt;br /&gt;
* SNMP checks need to be smarter, and need parsable thresholds on a per-host basis, not a per-check basis&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 04-16-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Still squishy-squishy on bugs&lt;br /&gt;
* Focusing on stability and clean UI&lt;br /&gt;
&lt;br /&gt;
Current Loads: 47 devices&lt;br /&gt;
* 2 core 2 GB RAM&lt;br /&gt;
* Load averages consistent: 0.70,1.5,1.6&lt;br /&gt;
* From this, with an average number of checks (~10 per host), reliable monitoring can be done on 50 devices with decent results and minimal hardware for small environments.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update on status: 03-21-2024 &amp;lt;/strong&amp;gt;&lt;br /&gt;
* Many Many squishes of bugs&lt;br /&gt;
* Reporting engine and templates much more usable&lt;br /&gt;
* More documentation links for application with base examples written&lt;br /&gt;
* Initial Event Correlation Engine (ECE) rules written&lt;br /&gt;
* Some pages written for ECE&lt;br /&gt;
* Did I mention squishing bugs?&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update on status: 02-11-2024 &amp;lt;/strong&amp;gt;&lt;br /&gt;
* Upgraded API to use PHP 8.2, not 8.1.&lt;br /&gt;
* GUI code was sanitized and released to github&lt;br /&gt;
* [https://github.com/Guyverix/Vigilare-NMS-GUI Vigilare GUI]&lt;br /&gt;
* [https://github.com/Guyverix/Vigilare-NMS-API Vigilare API]&lt;br /&gt;
* [https://github.com/Guyverix/Vigilare-NMS-POLLER Vigilare Pollers]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The focus of this wiki is for notes and gotchas for things relating to technology.  It is mainly focused on the fault management tool that I am writing.  However oddball things I have found or commonly forget since I rarely use them are also present.  I do not go too in depth on the notes or what I find.  In general it is more a quick summary and if reasonable an example that shows what the result is.  I suspect that as time passes it will become more focused on the tool that I am writing.  However there will likely be oddball stuff thrown in here as well that does not have to do with fault management at all....&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt;Overall idea&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I believe that fault management is very commonly overlooked in companies, and is more of a bolt-on after they have been embarrassed by an outage or network event that a customer noticed.  This feels backwards to me.  Fault management really should be one of the first things put into place, and it should grow with your software and network over time.  Being able to tell at a glance the &amp;quot;truth&amp;quot; of your network is almost as critical as the network running itself!  Customer monitoring isn't monitoring and simply makes you look bad.  If a customer or end user has to tell you something is down, you have failed.  If it happens more than once, that should set off all kinds of warnings to technicians.  This should not happen.  In my opinion, the problem comes from management seeing monitoring as a time suck that does not really return money on their investment of time.  The only time they care is when they get embarrassed, and after that they lose interest until they are embarrassed again.  This is stupid.  A company or technician must be able to say with confidence that their applications are working AND PROVE IT.  If you can't prove anything, it is just a hope that things are working as designed.  Until you know, you are simply guessing at the health and availability of an application.&lt;br /&gt;
&lt;br /&gt;
I have noticed a trend recently of companies moving to what I call &amp;quot;implied&amp;quot; monitoring: looking at metrics more than actively validating an application or service.  While this does somewhat fit the bill for monitoring, it does a major disservice to the technicians who need to support the application.  Usually this kind of monitoring will cry when something is down, but does not tell you what failed or where the issue is, only that it is happening.  It is also usually slower to report failures, which gives a customer more time to find the issue before the technician does.  Parsing logs and dropping them in search indexes, or forwarding matches as events, are all very useful, but they are not really watching for specific application issues at the host level.  I believe that an organization must work up to this kind of monitoring.  It must be built on the basics.  If you do not have the basics in place, then the advanced monitoring is much less useful for an org.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt;&lt;br /&gt;
My usual approach to monitoring is to start as fundamental as possible.  On Linux-based systems this order has given me the best results.&lt;br /&gt;
&lt;br /&gt;
* daemon&lt;br /&gt;
** dead daemon, well dead app :)&lt;br /&gt;
* port&lt;br /&gt;
** zombie processes holding a port, or different daemons attempting to use the same port, are bad&lt;br /&gt;
* challenge and response.&lt;br /&gt;
** Verify which app has control of the port (did you get an http, email, ssh header?  Did it respond at all?)&lt;br /&gt;
* performance&lt;br /&gt;
** This is the n+1 point.  After confirming the application is running, NOW is the time to verify it is at a basic level performant.&lt;br /&gt;
* log and event parsing&lt;br /&gt;
** In depth log parsing, and event correlations.  Without the above this is much less useful to a technician.&lt;br /&gt;
*** I have seen in RARE cases this done well, where it states what failed and which host.  However that is much less likely to happen as the logs parsed do not always state which application is at fault, only that there is one present.&lt;br /&gt;
&amp;lt;/strong&amp;gt;&lt;br /&gt;
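&lt;br /&gt;
The ordering above can be sketched in bash as a small runner that executes checks from most basic to most advanced and stops at the first failure, so the alarm names the lowest layer that is broken.  This is only a rough sketch: the function name run_layers, the layer names, and the placeholder commands are assumptions for illustration and are not taken from the NMS code.&lt;br /&gt;

```shell
#!/usr/bin/env bash
# run_layers takes pairs of arguments: LAYER_NAME "COMMAND STRING" ...
# Layers run in the order given.  The first failing layer is reported and
# the more advanced layers after it are skipped.
# (run_layers is a hypothetical helper, not part of any existing tool.)
run_layers() {
  while (( $# >= 2 )); do
    local name="$1" cmd="$2"
    shift 2
    if ! bash -c "$cmd" >/dev/null 2>/dev/null; then
      echo "FAIL: $name"
      return 1
    fi
  done
  echo "OK: all layers passed"
}

# Placeholder commands (true/false) stand in for real checks such as
# "pgrep -x sshd" for the daemon layer; swap in whatever fits the host.
status=$(run_layers \
  daemon   "true" \
  port     "true" \
  response "false" \
  perf     "true") || true
echo "$status"
```

Because the runner stops at the first failure, a dead daemon is never reported as a vague performance or log-parsing problem.&lt;br /&gt;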
&lt;br /&gt;
How you get the answers is less important than getting the answers, with one gotcha.  There is no reason to degrade the host with a complex and slow service check.  If the validation cannot be done cheaply, then it is likely something that should be broken down into more basic pieces.  Killing your servers or pod with service checks is exactly backwards of what you need.  Simple, fast, and accurate are what you need to focus on.  Inaccurate data is WORSE than no data.  Another useful trick is to assume failure until the application proves that it is doing what you expect.  I always try to avoid a bias of assuming something works until it can prove it in a service check.  Until that time, it is only your opinion that things are working.&lt;br /&gt;
&lt;br /&gt;
Historically this has been done with SNMP and NRPE service checks on a common 5 minute iteration cycle.  This allows many checks to scale across a fleet of hosts within the cycle, and is a decent starting point for monitoring hosts.  I realize that some technicians prefer a faster cycle time, but at the end of the day, you are still talking about human response times and investigation times.  Getting things like sub-minute reporting does no good at all to a technician who is troubleshooting an issue.  Additionally, fast cycle times do not really lend themselves well to the concept of retry.  You will lose packets from time to time, and your application will do weird crap from time to time.  That's just fact.  Make sure your monitoring does not lose its mind and scream that the world is burning due to a transient packet-loss issue.  Using something like retry makes it more likely that you are picking up a legitimate event and not a transient failure.&lt;br /&gt;
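&lt;br /&gt;
As a rough illustration of the retry idea, here is a minimal bash sketch.  The names check_with_retry and flaky_check, the argument order, and the zero-second delay are assumptions made up for this example; they are not how NRPE or the NMS pollers implement retries.&lt;br /&gt;

```shell
#!/usr/bin/env bash
# check_with_retry MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_TRIES attempts have failed.
# Returns 0 on the first success, 1 if every attempt failed.
# (Hypothetical helper for illustration only.)
check_with_retry() {
  local max_tries="$1" delay="$2" try=1
  shift 2
  while (( max_tries >= try )); do
    if "$@"; then
      return 0
    fi
    # Only wait when another attempt is still coming.
    if (( max_tries > try )); then
      sleep "$delay"
    fi
    try=$(( try + 1 ))
  done
  return 1
}

# Simulated transient failure: fails on the first call, passes afterwards.
attempts=0
flaky_check() {
  attempts=$(( attempts + 1 ))
  (( attempts >= 2 ))
}

if check_with_retry 3 0 flaky_check; then
  result="OK after $attempts attempts"
else
  result="CRITICAL"
fi
echo "$result"
```

With a 5 minute cycle there is room for a few retries spaced some seconds apart before the next iteration starts; with a sub-minute cycle there is not.&lt;br /&gt;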
&lt;br /&gt;
Every time there is an &amp;quot;Outage&amp;quot; or severe service degradation, one of the first questions should be: did the monitoring catch this issue?  Was it actually granular enough to state what the issue was, or did it only catch a side effect of the issue?  Getting an alarm for website down vs. database down are two very different things.  Both will imply a 100% outage, but usually one is faster to repair than the other.  Also, looking at the wrong thing (say, the webserver when the database is toast) simply slows down getting the database back online.&lt;br /&gt;
&lt;br /&gt;
== History of this wiki ==&lt;br /&gt;
&amp;lt;strong&amp;gt;MediaWiki destination for random notes and examples I do not want to lose or forget.&amp;lt;/strong&amp;gt;&lt;br /&gt;
* The site is for myself and friends who commonly use bash and other utilities, as a one-stop shop for that oddball thing that was found six months ago and is only vaguely remembered.  The site overall is not for the general public, but if you make it in here, feel free to browse.&lt;br /&gt;
* Keep in mind that I do have security measures in place.  Poking hard at stuff will get you blocked from the domain for 30 days or more if you hammer really hard.&lt;br /&gt;
* '''If someone wants access to actually ADD information in here please sign up for an account, and I will likely grant access.'''  I do try to set everything into some kind of category for easier searches as well as an attempt to keep this somewhat organized.  Whenever possible (or I remember to do it) I do try to link to the original sources of the information.  They are usually from SE, or other Q&amp;amp;A sites, so some of the comments are useful as well.&lt;br /&gt;
* This is not Wikipedia, and the site is unlikely to be polished, since it is more of a catchall wiki on doing different things.  There is not going to be much rhyme or reason to what is posted here.  Overall, if it is something that I have had to do more than once and look up every time, I will have notes on it in here so I do not have to search next time.&lt;br /&gt;
* There will be times that there are references to personal servers on my network, or oddball hostnames that are tied to iwillfearnoevil.com.  It is unlikely that access will be granted to those hosts unless there is a very specific reason to do so.  So don't bother asking :P&lt;br /&gt;
* '''Category NMS has notes on my progress of my NMS design, and thoughts on monitoring overall'''    [https://wiki.iwillfearnoevil.com/mediawiki/index.php/Category:NMS Category NMS]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Misc Notes on using MediaWiki ==&lt;br /&gt;
Consult the [https://www.mediawiki.org/wiki/Special:MyLanguage/Help:Contents User's Guide] for information on using the wiki software.&lt;br /&gt;
&lt;br /&gt;
== Getting started ==&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:Configuration_settings Configuration settings list]&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:FAQ MediaWiki FAQ]&lt;br /&gt;
* [https://lists.wikimedia.org/postorius/lists/mediawiki-announce.lists.wikimedia.org/ MediaWiki release mailing list]&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Localisation#Translation_resources Localise MediaWiki for your language]&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:Combating_spam Learn how to combat spam on your wiki]&lt;br /&gt;
[[Category:General]]&lt;/div&gt;</summary>
		<author><name>Chubbard</name></author>
	</entry>
	<entry>
		<id>https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Url_encode&amp;diff=668</id>
		<title>Url encode</title>
		<link rel="alternate" type="text/html" href="https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Url_encode&amp;diff=668"/>
		<updated>2025-07-10T17:29:04Z</updated>

		<summary type="html">&lt;p&gt;Chubbard: /* URL encoding function */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==URL encoding function==&lt;br /&gt;
* A simple way to URL-encode strings in bash scripts.  Brainless, but it seems to work OK for most use cases.&lt;br /&gt;
* Yes, this encodes '/' as '%2F', so use it on individual path segments or query values rather than on a whole URL.  curl does not complain, and I have not seen a failure doing things this way.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
urlencode() {&lt;br /&gt;
  # Percent-encode every character of $1 except the RFC 3986 unreserved set.&lt;br /&gt;
  local string=&amp;quot;$1&amp;quot;&lt;br /&gt;
  local encoded=&amp;quot;&amp;quot;&lt;br /&gt;
  local pos c o&lt;br /&gt;
  for (( pos=0 ; pos&amp;lt;${#string} ; pos++ )); do&lt;br /&gt;
    c=${string:$pos:1}&lt;br /&gt;
    case &amp;quot;$c&amp;quot; in&lt;br /&gt;
      # Unreserved characters pass through unchanged.&lt;br /&gt;
      [a-zA-Z0-9.~_-]) o=&amp;quot;$c&amp;quot; ;;&lt;br /&gt;
      # A leading quote makes printf use the character's numeric code, printed here as %XX.&lt;br /&gt;
      *) printf -v o '%%%02X' &amp;quot;'$c&amp;quot; ;;&lt;br /&gt;
    esac&lt;br /&gt;
    encoded+=&amp;quot;$o&amp;quot;&lt;br /&gt;
  done&lt;br /&gt;
  echo &amp;quot;$encoded&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
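&lt;br /&gt;
A short usage sketch follows.  The function is repeated in compact form so the snippet runs on its own; the endpoint https://example.com/api/search and the parameter name q are placeholders assumed for illustration only.&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Same logic as the function above, repeated so this snippet is self-contained.
urlencode() {
  local string="$1" encoded="" pos c o
  for (( pos=0 ; ${#string} > pos ; pos++ )); do
    c=${string:$pos:1}
    case "$c" in
      [a-zA-Z0-9.~_-]) o="$c" ;;
      *) printf -v o '%%%02X' "'$c" ;;
    esac
    encoded+="$o"
  done
  echo "$encoded"
}

# Hypothetical endpoint and parameter name, used only for illustration.
base_url="https://example.com/api/search"
query="disk usage /var"

# Encode only the value, never the whole URL, so ':' and '/' in the base stay intact.
url="${base_url}?q=$(urlencode "$query")"
echo "$url"
# A real call would then be something like: curl -s "$url"
```

Running this prints https://example.com/api/search?q=disk%20usage%20%2Fvar, with the space and the '/' in the value encoded.&lt;br /&gt;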
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[Category:Bash]]&lt;/div&gt;</summary>
		<author><name>Chubbard</name></author>
	</entry>
	<entry>
		<id>https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Url_encode&amp;diff=667</id>
		<title>Url encode</title>
		<link rel="alternate" type="text/html" href="https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Url_encode&amp;diff=667"/>
		<updated>2025-07-10T17:28:26Z</updated>

		<summary type="html">&lt;p&gt;Chubbard: /* URL encoding function */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==URL encoding function==&lt;br /&gt;
* A simple way to URL-encode strings in bash scripts.  Brainless, but it seems to work OK for most use cases.&lt;br /&gt;
* Yes, this will encode the '/' character, but curl does not complain, and I have not seen a failure doing things this way.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
urlencode() {&lt;br /&gt;
  local string=&amp;quot;$1&amp;quot;&lt;br /&gt;
  local encoded=&amp;quot;&amp;quot;&lt;br /&gt;
  local pos c o&lt;br /&gt;
  for (( pos=0 ; pos&amp;lt;${#string} ; pos++ )); do&lt;br /&gt;
    c=${string:$pos:1}&lt;br /&gt;
    case &amp;quot;$c&amp;quot; in&lt;br /&gt;
      [a-zA-Z0-9.~_-]) o=&amp;quot;$c&amp;quot; ;;&lt;br /&gt;
      *) printf -v o '%%%02X' &amp;quot;'$c&amp;quot;&lt;br /&gt;
    esac&lt;br /&gt;
    encoded+=&amp;quot;$o&amp;quot;&lt;br /&gt;
  done&lt;br /&gt;
  echo &amp;quot;$encoded&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[Category:Bash]]&lt;/div&gt;</summary>
		<author><name>Chubbard</name></author>
	</entry>
	<entry>
		<id>https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Url_encode&amp;diff=666</id>
		<title>Url encode</title>
		<link rel="alternate" type="text/html" href="https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Url_encode&amp;diff=666"/>
		<updated>2025-07-10T17:26:31Z</updated>

		<summary type="html">&lt;p&gt;Chubbard: Created page with &amp;quot;==URL encoding function== * Simple way to encode URLs for bash scripts..  Brainless but seems to work ok for most use cases.. &amp;lt;pre&amp;gt; urlencode() {   local string=&amp;quot;$1&amp;quot;   local encoded=&amp;quot;&amp;quot;   local pos c o   for (( pos=0 ; pos&amp;lt;${#string} ; pos++ )); do     c=${string:$pos:1}     case &amp;quot;$c&amp;quot; in       [a-zA-Z0-9.~_-]) o=&amp;quot;$c&amp;quot; ;;       *) printf -v o '%%%02X' &amp;quot;'$c&amp;quot;     esac     encoded+=&amp;quot;$o&amp;quot;   done   echo &amp;quot;$encoded&amp;quot; } &amp;lt;/pre&amp;gt;    Category:Bash&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==URL encoding function==&lt;br /&gt;
* A simple way to URL-encode strings in bash scripts.  Brainless, but it seems to work OK for most use cases.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
urlencode() {&lt;br /&gt;
  local string=&amp;quot;$1&amp;quot;&lt;br /&gt;
  local encoded=&amp;quot;&amp;quot;&lt;br /&gt;
  local pos c o&lt;br /&gt;
  for (( pos=0 ; pos&amp;lt;${#string} ; pos++ )); do&lt;br /&gt;
    c=${string:$pos:1}&lt;br /&gt;
    case &amp;quot;$c&amp;quot; in&lt;br /&gt;
      [a-zA-Z0-9.~_-]) o=&amp;quot;$c&amp;quot; ;;&lt;br /&gt;
      *) printf -v o '%%%02X' &amp;quot;'$c&amp;quot;&lt;br /&gt;
    esac&lt;br /&gt;
    encoded+=&amp;quot;$o&amp;quot;&lt;br /&gt;
  done&lt;br /&gt;
  echo &amp;quot;$encoded&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[Category:Bash]]&lt;/div&gt;</summary>
		<author><name>Chubbard</name></author>
	</entry>
	<entry>
		<id>https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Main_Page&amp;diff=665</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Main_Page&amp;diff=665"/>
		<updated>2025-07-05T02:30:50Z</updated>

		<summary type="html">&lt;p&gt;Chubbard: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;strong&amp;gt;Fault Management notes, thoughts, and example code&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
My code is finally up on github.com!  Right now I am squishing bugs that are not critical but look bad.  Additionally, I have begun the event correlation engine (ECE) work for the different dashboards.  I still have not gotten to an installer or added the seed data for the database.  I will be focusing on that soon, but I want to get a little more functionality in place first to prove the tool is useful.  I have also had some thoughts on how to templatize some of the checks better.  I am still kinda kicking the idea around to see what kind of weak points I will run into, but the basic idea is: give me an SNMP table, and if the returned data is numeric, graph and save it.  If args are passed for what we care about, parse the data and raise events on what is found, all at the template level.  I think that I can build a simple skeleton for this where just about any check can be added: string compare, existence, or number comparison.  I like this idea, since we can get a lot of data, but 99% of the time we only care about specific parts of a given table.  I would like to make this something that can be created from the web page if possible, but I am not sure how ugly that would get.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I am mainly updating this wiki for the NMS work I am doing, and attempting to get some more exposure in the greater world for assistance on code.  I am not a developer, just someone who is frustrated at a lack of good tools.  I will be putting the NMS up on GitHub in the future as two separate repos.  The first one will be the API server.  The second is the front-end UI portion.&lt;br /&gt;
&lt;br /&gt;
The backend is written in PHP, and has recently been migrated from PHP 7.4 to 8.1.  I used the Slim 4 skeleton as the base of the application.&lt;br /&gt;
* &amp;lt;strong&amp;gt; The backend has been updated to use PHP 8.2 from 8.1&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The frontend uses Bootstrap, PHP, and JavaScript.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 07-03-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Frontend variable cleanup.  Focus on devices.  It is messy and not maintainable as is.&lt;br /&gt;
* API error cleanup, specifically nulls in event modifications&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 06-20-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* building more annotations&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 06-12-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Have basic swagger UI working on API server (unmerged branch openApiAndCleanup)&lt;br /&gt;
* building more annotations&lt;br /&gt;
* Some PHP syntax error bug squishing&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 04-21-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Still running 2x2 (2 cores, 2 GB RAM) on the server.  Load average is a consistent 1.10 over 15 min, and peaks at 3.50 over 1 min when an iteration starts up&lt;br /&gt;
* Not been focused on this too much lately.  Continuing to try to get Devices to look the way they do in my head...&lt;br /&gt;
* Still eating Crayons...&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 03-26-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Still working on UX, as I know I do not make pretty pages&lt;br /&gt;
* Looks a little better on the branch I am working on, but still not amazing or great... Sigh...&lt;br /&gt;
* I should stick to eating Crayons instead of trying to use them to make something good looking...&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 01-02-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Currently working on getting the UI to not look like a first time high school project&lt;br /&gt;
* Minor bug squishing&lt;br /&gt;
* Investigating template rework to better calculate percentages to store in metrics&lt;br /&gt;
* Beginning to investigate storing metric data in influxdb&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 10-30-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Currently hitting performance limits on 2 cores / 2 GB RAM with 61 servers.  Server is getting laggy.&lt;br /&gt;
* Will recommend sizing as follows: for fewer than 50 servers, 2x2 is fine.&lt;br /&gt;
* Between 51 and 125 servers, testing as 4x2.  This is not a RAM-intensive app; it is more thread and core bound.  Drive IO is going to become the bottleneck before RAM does.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 10-15-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Purchased ChartJS license (I may hate JS, but it makes pretty pics)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 10-08-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Minor bug squishing&lt;br /&gt;
* Testing using evil Javascript to make pretty graphs&lt;br /&gt;
* Going to license CanvasJS for this (even a n00b can figure it out)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 08-18-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Some minor bug fixes, but not too much coding done&lt;br /&gt;
* Beginning to focus on ECE and seeing what a mess it is.  Not happy about it.  Will likely redesign this since it was a skeleton anyway.  This needs to be simple and understandable dammit.&lt;br /&gt;
* Looking at changing frontend UI for main page.  &lt;br /&gt;
* Building more templates for standards&lt;br /&gt;
* Double-checking my logic for NRPE or shell commands.  This could use more work.  An NRPE failure caused by something other than the command itself should have discrete alarm values.  Need to think about this more.&lt;br /&gt;
* SNMP checks need to be smarter, and need parsable thresholds on a per-host basis, not a per-check basis&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 04-16-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Still squishy-squishy on bugs&lt;br /&gt;
* Focusing on stability and clean UI&lt;br /&gt;
&lt;br /&gt;
Current Loads: 47 devices&lt;br /&gt;
* 2 core 2 GB RAM&lt;br /&gt;
* Load averages consistent: 0.70,1.5,1.6&lt;br /&gt;
* From this, with an average number of checks (~10 per host), reliable monitoring can be done on 50 devices with decent results and minimal hardware for small environments.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update on status: 03-21-2024 &amp;lt;/strong&amp;gt;&lt;br /&gt;
* Many Many squishes of bugs&lt;br /&gt;
* Reporting engine and templates much more usable&lt;br /&gt;
* More documentation links for application with base examples written&lt;br /&gt;
* Initial Event Correlation Engine (ECE) rules written&lt;br /&gt;
* Some pages written for ECE&lt;br /&gt;
* Did I mention squishing bugs?&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update on status: 02-11-2024 &amp;lt;/strong&amp;gt;&lt;br /&gt;
* Upgraded API to use PHP 8.2, not 8.1.&lt;br /&gt;
* GUI code was sanitized and released to github&lt;br /&gt;
* [https://github.com/Guyverix/Vigilare-NMS-GUI Vigilare GUI]&lt;br /&gt;
* [https://github.com/Guyverix/Vigilare-NMS-API Vigilare API]&lt;br /&gt;
* [https://github.com/Guyverix/Vigilare-NMS-POLLER Vigilare Pollers]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The focus of this wiki is for notes and gotchas for things relating to technology.  It is mainly focused on the fault management tool that I am writing.  However oddball things I have found or commonly forget since I rarely use them are also present.  I do not go too in depth on the notes or what I find.  In general it is more a quick summary and if reasonable an example that shows what the result is.  I suspect that as time passes it will become more focused on the tool that I am writing.  However there will likely be oddball stuff thrown in here as well that does not have to do with fault management at all....&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt;Overall idea&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I believe that fault management is commonly overlooked in many companies and treated as a bolt-on after they have been embarrassed by an outage or network event that a customer noticed.  This feels backwards to me.  Fault management really should be one of the first things put into place, and it should grow with your software and network over time.  Being able to tell at a glance the &amp;quot;truth&amp;quot; of your network is almost as critical as the network running itself!  Customer monitoring isn't monitoring and simply makes you look bad.  If a customer or end user has to tell you something is down, you have failed.  If it happens more than once then that should set off all kinds of warnings to technicians.  This should not happen.  The problem happens, in my opinion, because management sees monitoring as a time suck that does not really return money on their investment of time.  The only time they care is when they get embarrassed, and after that they lose interest until they are embarrassed again.  This is stupid.  A company or technician must be able to say with confidence that their applications are working AND PROVE IT.  If you cannot prove anything, it is just a hope that things are working as designed.  Until you know, you are simply guessing at the health and availability of an application.&lt;br /&gt;
&lt;br /&gt;
I have noticed a trend recently that companies are going to what I call &amp;quot;implied&amp;quot; monitoring: looking at metrics more than actively validating an application or service.  While this does somewhat fit the bill for monitoring, it does a major disservice to the technicians who need to support the application.  Usually this kind of monitoring will cry when something is down but does not tell you what the issue is or where it is happening, only that it is happening.  It is also usually slower to report the failures.  That means more time for a customer to find issues before the technician does.  Parsing logs and dropping them in search indexes, or forwarding matches as events, are all very useful, but they are not really watching for specific application issues at the host level.  I believe that an organization must work up to this kind of monitoring.  It must be built on the basics.  If you do not have the basics in place, then the advanced monitoring is much less useful for an org.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt;&lt;br /&gt;
My common way to approach monitoring is to start as fundamental as possible.  On Linux based systems this is the way that has given me the best results.&lt;br /&gt;
&lt;br /&gt;
* daemon&lt;br /&gt;
** dead daemon, well dead app :)&lt;br /&gt;
* port&lt;br /&gt;
** zombie processes holding a port, or different daemons attempting to use the same port is bad&lt;br /&gt;
* challenge and response.&lt;br /&gt;
** Verify which app has control of the port (did you get an http, email, ssh header?  Did it respond at all?)&lt;br /&gt;
* performance&lt;br /&gt;
** This is the n+1 point.  After confirming the application is running, NOW is the time to verify it is at a basic level performant.&lt;br /&gt;
* log and event parsing&lt;br /&gt;
** In depth log parsing, and event correlations.  Without the above this is much less useful to a technician.&lt;br /&gt;
*** I have seen in RARE cases this done well, where it states what failed and which host.  However that is much less likely to happen as the logs parsed do not always state which application is at fault, only that there is one present.&lt;br /&gt;
&amp;lt;/strong&amp;gt;&lt;br /&gt;
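&lt;br /&gt;
The port and challenge-and-response layers above can be sketched in a few lines of Python.  This is only a minimal illustration and not code from the Vigilare repos: the function names and return strings are made up for the example, and it assumes the service speaks first when you connect (SSH and SMTP do; HTTP would need a request sent before anything comes back).&lt;br /&gt;

```python
import socket


def check_port(host, port, timeout=3.0):
    """Port layer: True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def read_banner(host, port, timeout=3.0, max_bytes=256):
    """Challenge/response layer: first bytes the service sends, or None on failure."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            return sock.recv(max_bytes)
    except OSError:
        return None


def classify(host, port, expected_prefix):
    """Assume failure until the service proves it is the one we expect."""
    if not check_port(host, port):
        return "CRITICAL: port closed"
    banner = read_banner(host, port)
    if banner is None or not banner.startswith(expected_prefix):
        return "CRITICAL: unexpected or missing response"
    return "OK"
```

For example, classify(&amp;quot;somehost&amp;quot;, 22, b&amp;quot;SSH-&amp;quot;) only reports OK when the thing holding port 22 actually answers like an SSH daemon.&lt;br /&gt;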
&lt;br /&gt;
How you get the answers is less important than getting the answers, with one gotcha.  There is no reason to degrade the host with a complex and slow service check.  If the validation cannot be done cheaply, then it is likely something that should be broken down into more basic pieces.  Killing your servers or pod with service checks is exactly backwards of what you need.  Simple, fast, and accurate are what you need to focus on.  Inaccurate data is WORSE than no data.  Another useful trick is to assume failure until the application proves that it is doing what you expect.  I always try to avoid a bias of assuming something works until it can prove it in a service check.  Until that time, it is only your opinion that things are working.&lt;br /&gt;
&lt;br /&gt;
Historically this has been done with SNMP and NRPE service checks on a common 5 minute iteration cycle.  This allows for scale of many checks across a fleet of hosts within the cycle, as well as a decent starting point for monitoring hosts.  I realize that some technicians prefer a faster cycle time, but at the end of the day, you are still talking about human response times and investigation times.  Getting things like sub-minute reporting does no good at all to a technician who is troubleshooting an issue.  Additionally, fast cycle times do not really lend themselves well to the concept of retry.  You will lose packets from time to time, and your application will do weird crap from time to time.  That's just fact.  Make sure your monitoring does not lose its mind and scream the world is burning due to a transient packet-loss issue.  Using something like retry will make it more likely you are picking up a legitimate event and not a transient failure.&lt;br /&gt;
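&lt;br /&gt;
As a rough sketch of the retry idea, assuming a check is just a Python callable that returns True on success: the helper name, the retry count and the delay below are all illustrative and are not taken from any particular poller.&lt;br /&gt;

```python
import time


def check_with_retry(check, retries=3, delay=5.0, sleep=time.sleep):
    """Run check() up to `retries` times; report failure only if every attempt fails.

    `check` is any zero-argument callable returning True on success.
    """
    for attempt in range(1, retries + 1):
        try:
            if check():
                return True
        except Exception:
            pass  # an exception counts as one failed attempt
        if attempt != retries:
            sleep(delay)  # give a transient problem time to clear
    return False
```

A single dropped packet then costs one retry instead of an alarm.  The trade-off is that retries multiplied by the delay has to fit comfortably inside the polling cycle, which is one reason very short cycles and retry do not mix well.&lt;br /&gt;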
&lt;br /&gt;
Every time there is an &amp;quot;Outage&amp;quot; or severe service degradation, one of the first things that should be brought up is: did the monitoring catch this issue?  Was it actually granular enough to state what the issue was, or did it only catch a side effect of the issue?  Getting an alarm for website down vs. database down are two very different things.  Both will imply a 100% outage, but usually one is faster to repair than the other.  Also, looking at the wrong thing (say the webserver when the database is toast) simply slows down the response for getting the database back online.&lt;br /&gt;
&lt;br /&gt;
== History of this wiki ==&lt;br /&gt;
&amp;lt;strong&amp;gt;MediaWiki destination for random notes and examples I do not want to lose or forget.&amp;lt;/strong&amp;gt;&lt;br /&gt;
* The site is for myself and friends who commonly use bash and other utilities and can have a one-stop-shop to find that oddball thing that was found six months ago and vaguely remembered.  The site overall is not for the general public, however if you make it in here feel free to browse.  &lt;br /&gt;
* Keep in mind however I do have security measures in place and poking hard at stuff will block you from the domain for 30 days +  if you hammer really hard.&lt;br /&gt;
* '''If someone wants access to actually ADD information in here please sign up for an account, and I will likely grant access.'''  I do try to set everything into some kind of category for easier searches as well as an attempt to keep this somewhat organized.  Whenever possible (or I remember to do it) I do try to link to the original sources of the information.  They are usually from SE, or other Q&amp;amp;A sites, so some of the comments are useful as well.&lt;br /&gt;
* This is not Wikipedia; the site is likely not going to be polished, since it is more of a catchall wiki on doing different things.  There is not going to be too much rhyme or reason on what is posted on this wiki.  Overall if it is something that I have had to do more than once and look up every time, I will have notes on it in here so I will not have to search next time.&lt;br /&gt;
* There will be times that there are references to personal servers on my network, or oddball hostnames that are tied to iwillfearnoevil.com.  It is unlikely that access will be granted to those hosts unless there is a very specific reason to do so.  So don't bother asking :P&lt;br /&gt;
* '''Category NMS has notes on my progress of my NMS design, and thoughts on monitoring overall'''    [https://wiki.iwillfearnoevil.com/mediawiki/index.php/Category:NMS Category NMS]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Misc Notes on using Mediawiki ==&lt;br /&gt;
Consult the [https://www.mediawiki.org/wiki/Special:MyLanguage/Help:Contents User's Guide] for information on using the wiki software.&lt;br /&gt;
&lt;br /&gt;
== Getting started ==&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:Configuration_settings Configuration settings list]&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:FAQ MediaWiki FAQ]&lt;br /&gt;
* [https://lists.wikimedia.org/postorius/lists/mediawiki-announce.lists.wikimedia.org/ MediaWiki release mailing list]&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Localisation#Translation_resources Localise MediaWiki for your language]&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:Combating_spam Learn how to combat spam on your wiki]&lt;br /&gt;
[[Category:General]]&lt;/div&gt;</summary>
		<author><name>Chubbard</name></author>
	</entry>
	<entry>
		<id>https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Main_Page&amp;diff=664</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Main_Page&amp;diff=664"/>
		<updated>2025-06-20T17:05:08Z</updated>

		<summary type="html">&lt;p&gt;Chubbard: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;strong&amp;gt;Fault Management notes, thoughts, and example code&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
My code is finally up on github.com!  Right now I am squishing bugs that I am finding which are not critical issues, but look bad.  Additionally I have begun the ECE or event correlation engine work for the different dashboards.  I still have not gotten to an installer, or added the seed data for the database.  I will be focusing on that soon.  I want to get a little more functionality in place to prove the tool is useful before focusing on this part.  Additionally I have had some thoughts on how to templatize some of the checks better.  I am still kinda kicking the idea around right now to see what kind of weak points I will run into, but the basic idea is along the lines of: give me an SNMP table, and if the returned data is numeric, graph/save it.  If args are passed for what we care about, parse and event on what is found, all at the template level.  I think that I can build a simple skeleton for this where just about anything can be added for the check by doing this: string compare, existence, or number comparison.  I kinda like this idea, since we can get a lot of data, but 99% of the time we only care about specific parts of a given table.  I would like to make this something that can be created from the web page if possible, but I am not sure how ugly that would get.&lt;br /&gt;
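&lt;br /&gt;
To make the template idea a bit more concrete, here is a minimal Python sketch of the eventing half only (string compare, existence, number comparison).  Everything in it is an assumption for illustration: the rule layout, the operator names and the shape of the table (row index mapped to a dict of column values) are invented here and are not how the Vigilare templates are actually stored.&lt;br /&gt;

```python
import operator

# Hypothetical comparison names for number rules.
NUMERIC_OPS = {
    "gt": operator.gt,
    "ge": operator.ge,
    "lt": operator.lt,
    "le": operator.le,
}


def evaluate(rule, row):
    """Return True when `row` (dict of column name to value) violates `rule`."""
    value = row.get(rule["column"])
    kind = rule["type"]
    if kind == "exists":
        return value is None  # event when the column is missing
    if value is None:
        return False  # nothing to compare; an "exists" rule covers this case
    if kind == "string":
        return str(value) != rule["expect"]
    if kind == "number":
        return NUMERIC_OPS[rule["op"]](float(value), rule["threshold"])
    raise ValueError("unknown rule type: " + kind)


def events_for(table, rules):
    """Yield (row index, rule) for every rule that a row violates."""
    for index, row in table.items():
        for rule in rules:
            if evaluate(rule, row):
                yield index, rule
```

Because the rules are plain data, the same skeleton could be filled in from a web form, which is the part I am not sure how ugly it would get.&lt;br /&gt;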
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I am mainly updating this wiki for the NMS work I am doing, and attempting to get some more exposure in the greater world for assistance on code.  I am not a developer, just someone who is frustrated at a lack of good tools.  I will be adding the NMS up to GitHub in the future as two separate repos.  The first one will be the API server.  The second is the front-end UI portion.  &lt;br /&gt;
&lt;br /&gt;
The backend is written in PHP, and has recently been migrated from PHP7.4 to 8.1.  I used the Slim4 skeleton as the base of the application.&lt;br /&gt;
* &amp;lt;strong&amp;gt; The backend has been updated to use PHP 8.2 from 8.1&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The frontend is using bootstrap, PHP and javascript.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 06-20-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* building more annotations&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 06-12-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Have basic swagger UI working on API server (unmerged branch openApiAndCleanup)&lt;br /&gt;
* building more annotations&lt;br /&gt;
* Some PHP syntax error bug squishing&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 04-21-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Still running 2X2 on server.  Load Average consistent 1.10 15 min, and peaks 3.50 on 1min when iteration starts up&lt;br /&gt;
* Not been focused on this too much lately.  Continuing to try to get Devices to look the way they do in my head...&lt;br /&gt;
* Still eating Crayons...&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 03-26-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Still working on UX, as I know I do not make pretty pages&lt;br /&gt;
* Looks a little better on the branch I am working on, but still not amazing or great... Sigh...&lt;br /&gt;
* I should stick to eating Crayons instead of trying to use them to make something good looking...&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 01-02-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Currently working on getting the UI to not look like a first time high school project&lt;br /&gt;
* Minor bug squishing&lt;br /&gt;
* Investigating template rework to better calculate percentages to store in metrics&lt;br /&gt;
* Beginning to investigate storing metric data in influxdb&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 10-30-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Currently hitting performance limits with 2 core/2G RAM with 61 servers.  Server getting laggy..&lt;br /&gt;
* Will encourage setups as &amp;lt; 50 servers, 2x2 is fine.&lt;br /&gt;
* &amp;gt; 51 and &amp;lt; 125 testing as 4x2.  This is not a RAM intense app, it is more thready and core bound.  Then Drive IO is going to become the bottleneck before RAM&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 10-15-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Purchased ChartJS license (I may hate JS, but it makes pretty pics)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 10-08-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Minor bug squishing&lt;br /&gt;
* Testing using evil Javascript to make pretty graphs&lt;br /&gt;
* Going to license CanvasJS for this (even a n00b can figure it out)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 08-18-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Some minor bug fixes, but not too much coding done&lt;br /&gt;
* beginning focus on ECE and seeing what a mess it is.  Not happy about it.  Will likely redesign this since it was a skel anyway.  This needs to be simple and understandable dammit.&lt;br /&gt;
* Looking at changing frontend UI for main page.  &lt;br /&gt;
* Building more templates for standards&lt;br /&gt;
* Double-checking my logic for NRPE or shell commands.  This could use more work.  NRPE failing for a reason other than the command itself should have discrete alarm values.  Need to think about this more.&lt;br /&gt;
* SNMP checks need to be smarter, and need parsable thresholds on a per host basis, not a per check basis&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 04-16-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Still squishy-squishy on bugs&lt;br /&gt;
* Focusing on stability and clean UI&lt;br /&gt;
&lt;br /&gt;
Current Loads: 47 devices&lt;br /&gt;
* 2 core 2 GB RAM&lt;br /&gt;
* Load averages consistent: 0.70,1.5,1.6&lt;br /&gt;
* Based on this, with an average number of checks (~10 per host), reliable monitoring can be done on 50 devices with decent results on minimal hardware for small environments.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update on status: 03-21-2024 &amp;lt;/strong&amp;gt;&lt;br /&gt;
* Many Many squishes of bugs&lt;br /&gt;
* Reporting engine and templates much more usable&lt;br /&gt;
* More documentation links for application with base examples written&lt;br /&gt;
* Initial Event Correlation Engine (ECE) rules written&lt;br /&gt;
* Some pages written for ECE&lt;br /&gt;
* Did I mention squishing bugs?&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update on status: 02-11-2024 &amp;lt;/strong&amp;gt;&lt;br /&gt;
* Upgraded API to use PHP 8.2, not 8.1.&lt;br /&gt;
* GUI code was sanitized and released to github&lt;br /&gt;
* [https://github.com/Guyverix/Vigilare-NMS-GUI Vigilare GUI]&lt;br /&gt;
* [https://github.com/Guyverix/Vigilare-NMS-API Vigilare API]&lt;br /&gt;
* [https://github.com/Guyverix/Vigilare-NMS-POLLER Vigilare Pollers]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The focus of this wiki is for notes and gotchas for things relating to technology.  It is mainly focused on the fault management tool that I am writing.  However oddball things I have found or commonly forget since I rarely use them are also present.  I do not go too in depth on the notes or what I find.  In general it is more a quick summary and if reasonable an example that shows what the result is.  I suspect that as time passes it will become more focused on the tool that I am writing.  However there will likely be oddball stuff thrown in here as well that does not have to do with fault management at all....&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt;Overall idea&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I believe that fault management is commonly overlooked in many companies and treated as a bolt-on after they have been embarrassed by an outage or network event that a customer noticed.  This feels backwards to me.  Fault management really should be one of the first things put into place, and it should grow with your software and network over time.  Being able to tell at a glance the &amp;quot;truth&amp;quot; of your network is almost as critical as the network running itself!  Customer monitoring isn't monitoring and simply makes you look bad.  If a customer or end user has to tell you something is down, you have failed.  If it happens more than once then that should set off all kinds of warnings to technicians.  This should not happen.  The problem happens, in my opinion, because management sees monitoring as a time suck that does not really return money on their investment of time.  The only time they care is when they get embarrassed, and after that they lose interest until they are embarrassed again.  This is stupid.  A company or technician must be able to say with confidence that their applications are working AND PROVE IT.  If you cannot prove anything, it is just a hope that things are working as designed.  Until you know, you are simply guessing at the health and availability of an application.&lt;br /&gt;
&lt;br /&gt;
I have noticed a trend recently that companies are going to what I call &amp;quot;implied&amp;quot; monitoring: looking at metrics more than actively validating an application or service.  While this does somewhat fit the bill for monitoring, it does a major disservice to the technicians who need to support the application.  Usually this kind of monitoring will cry when something is down but does not tell you what the issue is or where it is happening, only that it is happening.  It is also usually slower to report the failures.  That means more time for a customer to find issues before the technician does.  Parsing logs and dropping them in search indexes, or forwarding matches as events, are all very useful, but they are not really watching for specific application issues at the host level.  I believe that an organization must work up to this kind of monitoring.  It must be built on the basics.  If you do not have the basics in place, then the advanced monitoring is much less useful for an org.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt;&lt;br /&gt;
My common way to approach monitoring is to start as fundamental as possible.  On Linux based systems this is the way that has given me the best results.&lt;br /&gt;
&lt;br /&gt;
* daemon&lt;br /&gt;
** dead daemon, well dead app :)&lt;br /&gt;
* port&lt;br /&gt;
** zombie processes holding a port, or different daemons attempting to use the same port is bad&lt;br /&gt;
* challenge and response.&lt;br /&gt;
** Verify which app has control of the port (did you get an http, email, ssh header?  Did it respond at all?)&lt;br /&gt;
* performance&lt;br /&gt;
** This is the n+1 point.  After confirming the application is running, NOW is the time to verify it is at a basic level performant.&lt;br /&gt;
* log and event parsing&lt;br /&gt;
** In depth log parsing, and event correlations.  Without the above this is much less useful to a technician.&lt;br /&gt;
*** I have seen in RARE cases this done well, where it states what failed and which host.  However that is much less likely to happen as the logs parsed do not always state which application is at fault, only that there is one present.&lt;br /&gt;
&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
How you get the answers is less important than getting the answers, with one gotcha.  There is no reason to degrade the host with a complex and slow service check.  If the validation cannot be done cheaply, then it is likely something that should be broken down into more basic pieces.  Killing your servers or pod with service checks is exactly backwards of what you need.  Simple, fast, and accurate are what you need to focus on.  Inaccurate data is WORSE than no data.  Another useful trick is to assume failure until the application proves that it is doing what you expect.  I always try to avoid a bias of assuming something works until it can prove it in a service check.  Until that time, it is only your opinion that things are working.&lt;br /&gt;
&lt;br /&gt;
Historically this has been done with SNMP and NRPE service checks on a common 5 minute iteration cycle.  This allows for scale of many checks across a fleet of hosts within the cycle, as well as a decent starting point for monitoring hosts.  I realize that some technicians prefer a faster cycle time, but at the end of the day, you are still talking about human response times and investigation times.  Getting things like sub-minute reporting does no good at all to a technician who is troubleshooting an issue.  Additionally, fast cycle times do not really lend themselves well to the concept of retry.  You will lose packets from time to time, and your application will do weird crap from time to time.  That's just fact.  Make sure your monitoring does not lose its mind and scream the world is burning due to a transient packet-loss issue.  Using something like retry will make it more likely you are picking up a legitimate event and not a transient failure.&lt;br /&gt;
&lt;br /&gt;
Every time there is an &amp;quot;Outage&amp;quot; or severe service degradation, one of the first things that should be brought up is: did the monitoring catch this issue?  Was it actually granular enough to state what the issue was, or did it only catch a side effect of the issue?  Getting an alarm for website down vs. database down are two very different things.  Both will imply a 100% outage, but usually one is faster to repair than the other.  Also, looking at the wrong thing (say the webserver when the database is toast) simply slows down the response for getting the database back online.&lt;br /&gt;
&lt;br /&gt;
== History of this wiki ==&lt;br /&gt;
&amp;lt;strong&amp;gt;MediaWiki destination for random notes and examples I do not want to lose or forget.&amp;lt;/strong&amp;gt;&lt;br /&gt;
* The site is for myself and friends who commonly use bash and other utilities and can have a one-stop-shop to find that oddball thing that was found six months ago and vaguely remembered.  The site overall is not for the general public, however if you make it in here feel free to browse.  &lt;br /&gt;
* Keep in mind however I do have security measures in place and poking hard at stuff will block you from the domain for 30 days +  if you hammer really hard.&lt;br /&gt;
* '''If someone wants access to actually ADD information in here please sign up for an account, and I will likely grant access.'''  I do try to set everything into some kind of category for easier searches as well as an attempt to keep this somewhat organized.  Whenever possible (or I remember to do it) I do try to link to the original sources of the information.  They are usually from SE, or other Q&amp;amp;A sites, so some of the comments are useful as well.&lt;br /&gt;
* This is not Wikipedia; the site is likely not going to be polished, since it is more of a catchall wiki on doing different things.  There is not going to be too much rhyme or reason on what is posted on this wiki.  Overall if it is something that I have had to do more than once and look up every time, I will have notes on it in here so I will not have to search next time.&lt;br /&gt;
* There will be times that there are references to personal servers on my network, or oddball hostnames that are tied to iwillfearnoevil.com.  It is unlikely that access will be granted to those hosts unless there is a very specific reason to do so.  So don't bother asking :P&lt;br /&gt;
* '''Category NMS has notes on my progress of my NMS design, and thoughts on monitoring overall'''    [https://wiki.iwillfearnoevil.com/mediawiki/index.php/Category:NMS Category NMS]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Misc Notes on using Mediawiki ==&lt;br /&gt;
Consult the [https://www.mediawiki.org/wiki/Special:MyLanguage/Help:Contents User's Guide] for information on using the wiki software.&lt;br /&gt;
&lt;br /&gt;
== Getting started ==&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:Configuration_settings Configuration settings list]&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:FAQ MediaWiki FAQ]&lt;br /&gt;
* [https://lists.wikimedia.org/postorius/lists/mediawiki-announce.lists.wikimedia.org/ MediaWiki release mailing list]&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Localisation#Translation_resources Localise MediaWiki for your language]&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:Combating_spam Learn how to combat spam on your wiki]&lt;br /&gt;
[[Category:General]]&lt;/div&gt;</summary>
		<author><name>Chubbard</name></author>
	</entry>
	<entry>
		<id>https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Main_Page&amp;diff=663</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Main_Page&amp;diff=663"/>
		<updated>2025-06-16T19:15:05Z</updated>

		<summary type="html">&lt;p&gt;Chubbard: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;strong&amp;gt;Fault Management notes, thoughts, and example code&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
My code is finally up on github.com!  Right now I am squishing bugs that I am finding which are not critical issues, but look bad.  Additionally I have begun the ECE or event correlation engine work for the different dashboards.  I still have not gotten to an installer, or added the seed data for the database.  I will be focusing on that soon.  I want to get a little more functionality in place to prove the tool is useful before focusing on this part.  Additionally I have had some thoughts on how to templatize some of the checks better.  I am still kinda kicking the idea around right now to see what kind of weak points I will run into, but the basic idea is along the lines of: give me an SNMP table, and if the returned data is numeric, graph/save it.  If args are passed for what we care about, parse and event on what is found, all at the template level.  I think that I can build a simple skeleton for this where just about anything can be added for the check by doing this: string compare, existence, or number comparison.  I kinda like this idea, since we can get a lot of data, but 99% of the time we only care about specific parts of a given table.  I would like to make this something that can be created from the web page if possible, but I am not sure how ugly that would get.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I am mainly updating this wiki for the NMS work I am doing, and attempting to get some more exposure in the greater world for assistance on code.  I am not a developer, just someone who is frustrated at a lack of good tools.  I will be adding the NMS up to GitHub in the future as two separate repos.  The first one will be the API server.  The second is the front-end UI portion.  &lt;br /&gt;
&lt;br /&gt;
The backend is written in PHP, and has recently been migrated from PHP7.4 to 8.1.  I used the Slim4 skeleton as the base of the application.&lt;br /&gt;
* &amp;lt;strong&amp;gt; The backend has been updated to use PHP 8.2 from 8.1&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The frontend is using bootstrap, PHP and javascript.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 06-12-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Have basic swagger UI working on API server (unmerged branch openApiAndCleanup)&lt;br /&gt;
* building more annotations&lt;br /&gt;
* Some PHP syntax error bug squishing&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 04-21-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Still running 2X2 on server.  Load Average consistent 1.10 15 min, and peaks 3.50 on 1min when iteration starts up&lt;br /&gt;
* Not been focused on this too much lately.  Continuing to try to get Devices to look the way they do in my head...&lt;br /&gt;
* Still eating Crayons...&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 03-26-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Still working on UX, as I know I do not make pretty pages&lt;br /&gt;
* Looks a little better on the branch I am working on, but still not amazing or great... Sigh...&lt;br /&gt;
* I should stick to eating Crayons instead of trying to use them to make something good looking...&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 01-02-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Currently working on getting the UI to not look like a first time high school project&lt;br /&gt;
* Minor bug squishing&lt;br /&gt;
* Investigating template rework to better calculate percentages to store in metrics&lt;br /&gt;
* Beginning to investigate storing metric data in influxdb&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 10-30-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Currently hitting performance limits with 2 core/2G RAM with 61 servers.  Server getting laggy..&lt;br /&gt;
* Will encourage setups as &amp;lt; 50 servers, 2x2 is fine.&lt;br /&gt;
* &amp;gt; 51 and &amp;lt; 125 testing as 4x2.  This is not a RAM intense app, it is more thready and core bound.  Then Drive IO is going to become the bottleneck before RAM&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 10-15-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Purchased ChartJS license (I may hate JS, but it makes pretty pics)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 10-08-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Minor bug squishing&lt;br /&gt;
* Testing using evil Javascript to make pretty graphs&lt;br /&gt;
* Going to license CanvasJS for this (even a n00b can figure it out)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 08-18-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Some minor bug fixes, but not too much coding done&lt;br /&gt;
* beginning focus on ECE and seeing what a mess it is.  Not happy about it.  Will likely redesign this since it was a skel anyway.  This needs to be simple and understandable dammit.&lt;br /&gt;
* Looking at changing frontend UI for main page.  &lt;br /&gt;
* Building more templates for standards&lt;br /&gt;
* Double-checking my logic for NRPE or shell commands.  This could use more work.  NRPE failing for a reason other than the command itself should have discrete alarm values.  Need to think about this more.&lt;br /&gt;
* SNMP checks need to be smarter, and need parsable thresholds on a per host basis, not a per check basis&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 04-16-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Still squishy-squishy on bugs&lt;br /&gt;
* Focusing on stability and clean UI&lt;br /&gt;
&lt;br /&gt;
Current Loads: 47 devices&lt;br /&gt;
* 2 core 2 GB RAM&lt;br /&gt;
* Load averages consistent: 0.70,1.5,1.6&lt;br /&gt;
* From this, with an average number of checks ( ~10 per host ), reliable monitoring can be done on 50 devices with decent results and minimal hardware for small environments.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update on status: 03-21-2024 &amp;lt;/strong&amp;gt;&lt;br /&gt;
* Many Many squishes of bugs&lt;br /&gt;
* Reporting engine and templates much more usable&lt;br /&gt;
* More documentation links for application with base examples written&lt;br /&gt;
* Initial Event Correlation Engine (ECE) rules written&lt;br /&gt;
* Some pages written for ECE&lt;br /&gt;
* Did I mention squishing bugs?&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update on status: 02-11-2024 &amp;lt;/strong&amp;gt;&lt;br /&gt;
* Upgraded API to use PHP 8.2, not 8.1.&lt;br /&gt;
* GUI code was sanitized and released to github&lt;br /&gt;
* [https://github.com/Guyverix/Vigilare-NMS-GUI Vigilare GUI]&lt;br /&gt;
* [https://github.com/Guyverix/Vigilare-NMS-API Vigilare API]&lt;br /&gt;
* [https://github.com/Guyverix/Vigilare-NMS-POLLER Vigilare Pollers]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The focus of this wiki is for notes and gotchas for things relating to technology.  It is mainly focused on the fault management tool that I am writing.  However oddball things I have found or commonly forget since I rarely use them are also present.  I do not go too in depth on the notes or what I find.  In general it is more a quick summary and if reasonable an example that shows what the result is.  I suspect that as time passes it will become more focused on the tool that I am writing.  However there will likely be oddball stuff thrown in here as well that does not have to do with fault management at all....&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt;Overall idea&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I believe that fault management is commonly very much overlooked in many companies and more of a bolt-on after they have been embarrassed by an outage or network event that a customer noticed.  This feels backwards to me.  Fault management really should be one of the first things put into place and grow with your software and network over time.  Being able to tell at a glance the &amp;quot;truth&amp;quot; of your network is almost as critical as the network running itself!  Customer monitoring isn't monitoring and simply makes you look bad.  If a customer or end user has to tell you something is down, you have failed.  If it happens more than once then that should set off all kinds of warnings to technicians.  This should not happen.  The problem happens in my opinion due to management seeing monitoring as a time suck that does not really return money on their investment of time.  The only time they care is when they get embarrassed, and after that they lose interest until they are embarrassed again.  This is stupid.  A company or technician must be able to say with confidence that their applications are working AND PROVE IT.  If you can't prove anything it is just a hope that things are working as designed.  Until you know, you simply are guessing on the health and availability of an application.&lt;br /&gt;
&lt;br /&gt;
I have noticed a trend recently that companies are going to what I call &amp;quot;implied&amp;quot; monitoring.  More looking at metrics than active validation of an application or service.  While this does somewhat fit the bill for monitoring it does a major disservice to the technicians that need to support the application.  Usually this kind of monitoring will cry when something is down but does not tell you what, or where the issue happens.  Only that it is happening.  It is also usually slower to report the failures.  That means more time for a customer to find issues than the technician.  Parsing logs, and dropping them in search indexes, or forwarding matches as events are all very useful, but they are not really watching for specific application issues at the host level.  I believe that an organization must work up to this kind of monitoring.  It must be built on the basics.  If you do not have the basics in place, then the advanced monitoring is much less useful for an org.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt;&lt;br /&gt;
My common way to approach monitoring is to start as fundamental as possible.  On Linux based systems this is the way that has given me the best results.&lt;br /&gt;
&lt;br /&gt;
* daemon&lt;br /&gt;
** dead daemon, well dead app :)&lt;br /&gt;
* port&lt;br /&gt;
** zombie processes holding a port, or different daemons attempting to use the same port is bad&lt;br /&gt;
* challenge and response.&lt;br /&gt;
** Verify which app has control of the port (did you get an http, email, ssh header?  Did it respond at all?)&lt;br /&gt;
* performance&lt;br /&gt;
** This is the n+1 point.  After confirming the application is running, NOW is the time to verify it is at a basic level performant.&lt;br /&gt;
* log and event parsing&lt;br /&gt;
** In depth log parsing, and event correlations.  Without the above this is much less useful to a technician.&lt;br /&gt;
*** I have seen in RARE cases this done well, where it states what failed and which host.  However that is much less likely to happen as the logs parsed do not always state which application is at fault, only that there is one present.&lt;br /&gt;
&amp;lt;/strong&amp;gt;&lt;br /&gt;
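The first three layers in the list above can be sketched roughly like this.  It is a minimal illustration and not the actual poller code: the function names are made up, the daemon check assumes pgrep is installed, and the banner check assumes the service speaks first (the way SSH or SMTP do).&lt;br /&gt;

```python
import socket
import subprocess


def daemon_running(name):
    """Layer 1: is a process with this exact name alive?  Assumes pgrep exists."""
    result = subprocess.run(["pgrep", "-x", name], stdout=subprocess.DEVNULL)
    return result.returncode == 0


def port_open(host, port, timeout=2.0):
    """Layer 2: does anything accept a TCP connection on the port?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def banner_matches(host, port, expected, timeout=2.0):
    """Layer 3: challenge and response.  Did the expected app answer?"""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            return sock.recv(128).startswith(expected)
    except OSError:
        return False


def run_layers(layers):
    """Run checks in order, most basic first, and name the first one that fails."""
    for name, check in layers:
        if not check():
            return "FAIL:" + name
    return "OK"
```

For an SSH host the layers could be daemon_running("sshd"), port_open(host, 22) and banner_matches(host, 22, b"SSH-"), each wrapped in a lambda.  Stopping at the first failure keeps the checks cheap and tells the technician which layer is broken instead of only that something is down.&lt;br /&gt;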
&lt;br /&gt;
How you get the answers is less important than getting the answers, with one gotcha.  There is no reason to degrade the host with a complex and slow service check.  If the validation cannot be done cheaply, then it is likely something that should be broken down into more basic pieces.  Killing your servers or pod with service checks is exactly backwards of what you need.  Simple, fast, and accurate are what you need to focus on.  Inaccurate data is WORSE than no data.  Another useful trick is to assume failure until the application proves that it is doing what you expect.  I always try to avoid a bias of assuming something works until it can prove it in a service check.  Until that time, it is only your opinion that things are working.&lt;br /&gt;
&lt;br /&gt;
Historically this has been with SNMP and NRPE service checks on a common 5 minute iteration cycle.  This allows for scale of many checks across a fleet of hosts within the cycle, as well as a decent starting point for monitoring hosts.  I realize that some technicians prefer a faster cycle time, but at the end of the day, you are still talking about human response times and investigation times.  Getting things like sub-minute reporting does no good at all to a technician who is troubleshooting an issue.  Additionally, fast cycle times do not really lend themselves well to the concept of retry.  You will lose packets from time to time, your application will do weird crap from time to time.  That's just fact.  Make sure your monitoring does not lose its mind and scream the world is burning due to a transient packet-loss issue.  Using something like retry will make it more likely you are picking up a legitimate event and not a transient failure.&lt;br /&gt;
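The retry idea can be sketched in a few lines.  This is only an illustration: the function name, the attempt count and the delay are assumptions, not values taken from any real poller.&lt;br /&gt;

```python
import time


def check_with_retry(check, attempts=3, delay=1.0, sleep=time.sleep):
    """Return True on the first pass; report failure only if every attempt fails.

    check is any callable returning True or False.  The sleep argument is
    injectable so the pause between attempts can be replaced in testing.
    """
    for attempt in range(attempts):
        if check():
            return True
        # Pause between attempts, but not after the final one.
        if attempt + 1 != attempts:
            sleep(delay)
    return False
```

A single dropped packet fails one attempt and passes the next, so no alarm is raised.  A real outage fails every attempt and still gets reported well inside a 5 minute cycle.&lt;br /&gt;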
&lt;br /&gt;
Every time there is an &amp;quot;Outage&amp;quot; or severe service degrade, one of the first things that should be brought up is did the monitoring catch this issue?  Was it actually granular enough to state what the issue was, or just a side effect of the issue?  Getting an alarm for website down, vs. database down are two very different things.  Both will imply a 100% outage, but usually one is faster to repair than the other.  Also, looking at the wrong thing (say, the webserver when the database is toast) simply slows down the response for getting the database back online.&lt;br /&gt;
&lt;br /&gt;
== History of this wiki ==&lt;br /&gt;
&amp;lt;strong&amp;gt;MediaWiki destination for random notes and examples I do not want to lose or forget.&amp;lt;/strong&amp;gt;&lt;br /&gt;
* The site is for myself and friends who commonly use bash and other utilities and can have a one-stop-shop to find that oddball thing that was found six months ago and vaguely remembered.  The site overall is not for the general public, however if you make it in here feel free to browse.  &lt;br /&gt;
* Keep in mind however I do have security measures in place and poking hard at stuff will block you from the domain for 30 days +  if you hammer really hard.&lt;br /&gt;
* '''If someone wants access to actually ADD information in here please sign up for an account, and I will likely grant access.'''  I do try to set everything into some kind of category for easier searches as well as an attempt to keep this somewhat organized.  Whenever possible (or I remember to do it) I do try to link to the original sources of the information.  They are usually from SE, or other Q&amp;amp;A sites, so some of the comments are useful as well.&lt;br /&gt;
* This is not wikipedia, likely the site is not going to be polished, since it is more of a catchall wiki on doing different things.  There is not going to be too much rhyme or reason on what is posted on this wiki.  Overall if it is something that I have had to do more than once and look up every time, I will have notes on it in here so I will not have to search next time.&lt;br /&gt;
* There will be times that there are references to personal servers on my network, or oddball hostnames that are tied to iwillfearnoevil.com.  It is unlikely that access will be granted to those hosts unless there is a very specific reason to do so.  So don't bother asking :P&lt;br /&gt;
* '''Category NMS has notes on my progress of my NMS design, and thoughts on monitoring overall'''    [https://wiki.iwillfearnoevil.com/mediawiki/index.php/Category:NMS Category NMS]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Misc Notes on using Mediawiki ==&lt;br /&gt;
Consult the [https://www.mediawiki.org/wiki/Special:MyLanguage/Help:Contents User's Guide] for information on using the wiki software.&lt;br /&gt;
&lt;br /&gt;
== Getting started ==&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:Configuration_settings Configuration settings list]&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:FAQ MediaWiki FAQ]&lt;br /&gt;
* [https://lists.wikimedia.org/postorius/lists/mediawiki-announce.lists.wikimedia.org/ MediaWiki release mailing list]&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Localisation#Translation_resources Localise MediaWiki for your language]&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:Combating_spam Learn how to combat spam on your wiki]&lt;br /&gt;
[[Category:General]]&lt;/div&gt;</summary>
		<author><name>Chubbard</name></author>
	</entry>
	<entry>
		<id>https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Main_Page&amp;diff=662</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Main_Page&amp;diff=662"/>
		<updated>2025-06-12T15:54:05Z</updated>

		<summary type="html">&lt;p&gt;Chubbard: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;strong&amp;gt;Fault Management notes, thoughts, and example code&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
My code is finally up on github.com!  Right now I am squishing bugs that I am finding which are not critical issues, but look bad.  Additionally I have begun the ECE or event correlation engine work for the different dashboards.  I still have not gotten to an installer, or added the seed data for the database.  I will be focusing on that soon.  I want to get a little more functionality in place to prove the tool is useful before focusing on this part.  Additionally I have had some thoughts on how to templatize some of the checks better.  I am  still kinda kicking the idea around right now to see what kind of weak points I will run into, but the basic idea is along the lines of give me a SNMP table, and if return data is numbers graph/save it.  If args are passed for what we care about, parse and event on what is found all at the template level.  I think that I can build a simple skeleton for this where just about anything can be added for the check by doing this.  String compare, existence, or number comparison.  I kinda like this idea, since we can get a lot of data, but 99% of the time we only care about specific parts of a given table.  I would like to make this something that can be created from the web page if possible, but I am not sure how ugly that would get..&lt;br /&gt;
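A rough skeleton of that template idea might look like the following.  It is a sketch under stated assumptions and not the real template code: the table shape, the comparator names and the function names are all made up, and the number comparisons assume the polled value is numeric.&lt;br /&gt;

```python
import operator

# Hypothetical comparator names a template could accept.
COMPARATORS = {
    "equals": lambda value, want: str(value) == str(want),
    "exists": lambda value, want: True,  # a missing value is handled before lookup
    "above": lambda value, want: operator.gt(float(value), float(want)),
    "below": lambda value, want: operator.lt(float(value), float(want)),
}


def is_number(value):
    """True when the polled value can be graphed or saved as a metric."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


def evaluate_table(table, expectations=()):
    """Split one polled table row into metrics to store and events to raise.

    table: dict of column name to polled value (made-up shape).
    expectations: iterable of (column, comparator name, wanted value).
    An event is recorded for every expectation that is not met.
    """
    metrics = {key: float(val) for key, val in table.items() if is_number(val)}
    events = []
    for column, comparator, want in expectations:
        value = table.get(column)
        met = value is not None and COMPARATORS[comparator](value, want)
        if not met:
            events.append((column, comparator, want, value))
    return metrics, events
```

Everything numeric is kept as a metric whether or not anyone asked about it, while events come only from the few columns named in the expectations, which matches caring about a small part of a given table.&lt;br /&gt;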
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I am mainly updating this wiki for the NMS work I am doing, and attempting to get some more exposure in the greater world for assistance on code.  I am not a developer, just someone who is frustrated at a lack of good tools.  I will be adding the NMS up to GitHub in the future as two separate repos.  The first one will be the API server.  The second is the front-end UI portion.  &lt;br /&gt;
&lt;br /&gt;
The backend is written in PHP, and has recently been migrated from PHP7.4 to 8.1.  I used the Slim4 skeleton as the base of the application.&lt;br /&gt;
* &amp;lt;strong&amp;gt; The backend has been updated to use PHP 8.2 from 8.1&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The frontend is using bootstrap, PHP and javascript.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 06-12-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Have basic swagger UI working on API server (unmerged)&lt;br /&gt;
* building more annotations&lt;br /&gt;
* Some PHP syntax error bug squishing&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 04-21-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Still running 2X2 on server.  Load Average consistent 1.10 15 min, and peaks 3.50 on 1min when iteration starts up&lt;br /&gt;
* Not been focused on this too much lately.  Continuing to try to get Devices to look the way they do in my head...&lt;br /&gt;
* Still eating Crayons...&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 03-26-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Still working on UX, as I know I do not make pretty pages&lt;br /&gt;
* Looks a little better on the branch I am working on, but still not amazing or great... Sigh...&lt;br /&gt;
* I should stick to eating Crayons instead of trying to use them to make something good looking...&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 01-02-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Currently working on getting the UI to not look like a first time high school project&lt;br /&gt;
* Minor bug squishing&lt;br /&gt;
* Investigating template rework to better calculate percentages to store in metrics&lt;br /&gt;
* Beginning to investigate storing metric data in influxdb&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 10-30-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Currently hitting performance limits with 2 core/2G RAM with 61 servers.  Server getting laggy..&lt;br /&gt;
* Will encourage setups as &amp;lt; 50 servers, 2x2 is fine.&lt;br /&gt;
* &amp;gt; 51 and &amp;lt; 125 testing as 4x2.  This is not a RAM intense app, it is more thready and core bound.  Then Drive IO is going to become the bottleneck before RAM&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 10-15-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Purchased ChartJS license (I may hate JS, but it makes pretty pics)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 10-08-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Minor bug squishing&lt;br /&gt;
* Testing using evil Javascript to make pretty graphs&lt;br /&gt;
* Going to license CanvasJS for this (even a n00b can figure it out)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 08-18-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Some minor bug fixes, but not too much coding done&lt;br /&gt;
* beginning focus on ECE and seeing what a mess it is.  Not happy about it.  Will likely redesign this since it was a skel anyway.  This needs to be simple and understandable dammit.&lt;br /&gt;
* Looking at changing frontend UI for main page.  &lt;br /&gt;
* Building more templates for standards&lt;br /&gt;
* Double-checking my logic for NRPE or shell commands.  This could use more work.  An NRPE failure caused by something other than the command itself should have discrete alarm values.  Need to think about this more..&lt;br /&gt;
* SNMP checks need to be smarter, and need parsable thresholds on a per host basis, not a per check basis&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 04-16-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Still squishy-squishy on bugs&lt;br /&gt;
* Focusing on stability and clean UI&lt;br /&gt;
&lt;br /&gt;
Current Loads: 47 devices&lt;br /&gt;
* 2 core 2 GB RAM&lt;br /&gt;
* Load averages consistent: 0.70,1.5,1.6&lt;br /&gt;
* From this, with an average number of checks ( ~10 per host ), reliable monitoring can be done on 50 devices with decent results and minimal hardware for small environments.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update on status: 03-21-2024 &amp;lt;/strong&amp;gt;&lt;br /&gt;
* Many Many squishes of bugs&lt;br /&gt;
* Reporting engine and templates much more usable&lt;br /&gt;
* More documentation links for application with base examples written&lt;br /&gt;
* Initial Event Correlation Engine (ECE) rules written&lt;br /&gt;
* Some pages written for ECE&lt;br /&gt;
* Did I mention squishing bugs?&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update on status: 02-11-2024 &amp;lt;/strong&amp;gt;&lt;br /&gt;
* Upgraded API to use PHP 8.2, not 8.1.&lt;br /&gt;
* GUI code was sanitized and released to github&lt;br /&gt;
* [https://github.com/Guyverix/Vigilare-NMS-GUI Vigilare GUI]&lt;br /&gt;
* [https://github.com/Guyverix/Vigilare-NMS-API Vigilare API]&lt;br /&gt;
* [https://github.com/Guyverix/Vigilare-NMS-POLLER Vigilare Pollers]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The focus of this wiki is for notes and gotchas for things relating to technology.  It is mainly focused on the fault management tool that I am writing.  However oddball things I have found or commonly forget since I rarely use them are also present.  I do not go too in depth on the notes or what I find.  In general it is more a quick summary and if reasonable an example that shows what the result is.  I suspect that as time passes it will become more focused on the tool that I am writing.  However there will likely be oddball stuff thrown in here as well that does not have to do with fault management at all....&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt;Overall idea&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I believe that fault management is commonly very much overlooked in many companies and more of a bolt-on after they have been embarrassed by an outage or network event that a customer noticed.  This feels backwards to me.  Fault management really should be one of the first things put into place and grow with your software and network over time.  Being able to tell at a glance the &amp;quot;truth&amp;quot; of your network is almost as critical as the network running itself!  Customer monitoring isn't monitoring and simply makes you look bad.  If a customer or end user has to tell you something is down, you have failed.  If it happens more than once then that should set off all kinds of warnings to technicians.  This should not happen.  The problem happens in my opinion due to management seeing monitoring as a time suck that does not really return money on their investment of time.  The only time they care is when they get embarrassed, and after that they lose interest until they are embarrassed again.  This is stupid.  A company or technician must be able to say with confidence that their applications are working AND PROVE IT.  If you can't prove anything it is just a hope that things are working as designed.  Until you know, you simply are guessing on the health and availability of an application.&lt;br /&gt;
&lt;br /&gt;
I have noticed a trend recently that companies are going to what I call &amp;quot;implied&amp;quot; monitoring.  More looking at metrics than active validation of an application or service.  While this does somewhat fit the bill for monitoring it does a major disservice to the technicians that need to support the application.  Usually this kind of monitoring will cry when something is down but does not tell you what, or where the issue happens.  Only that it is happening.  It is also usually slower to report the failures.  That means more time for a customer to find issues than the technician.  Parsing logs, and dropping them in search indexes, or forwarding matches as events are all very useful, but they are not really watching for specific application issues at the host level.  I believe that an organization must work up to this kind of monitoring.  It must be built on the basics.  If you do not have the basics in place, then the advanced monitoring is much less useful for an org.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt;&lt;br /&gt;
My common way to approach monitoring is to start as fundamental as possible.  On Linux based systems this is the way that has given me the best results.&lt;br /&gt;
&lt;br /&gt;
* daemon&lt;br /&gt;
** dead daemon, well dead app :)&lt;br /&gt;
* port&lt;br /&gt;
** zombie processes holding a port, or different daemons attempting to use the same port is bad&lt;br /&gt;
* challenge and response.&lt;br /&gt;
** Verify which app has control of the port (did you get an http, email, ssh header?  Did it respond at all?)&lt;br /&gt;
* performance&lt;br /&gt;
** This is the n+1 point.  After confirming the application is running, NOW is the time to verify it is at a basic level performant.&lt;br /&gt;
* log and event parsing&lt;br /&gt;
** In depth log parsing, and event correlations.  Without the above this is much less useful to a technician.&lt;br /&gt;
*** I have seen in RARE cases this done well, where it states what failed and which host.  However that is much less likely to happen as the logs parsed do not always state which application is at fault, only that there is one present.&lt;br /&gt;
&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
How you get the answers is less important than getting the answers, with one gotcha.  There is no reason to degrade the host with a complex and slow service check.  If the validation cannot be done cheaply, then it is likely something that should be broken down into more basic pieces.  Killing your servers or pod with service checks is exactly backwards of what you need.  Simple, fast, and accurate are what you need to focus on.  Inaccurate data is WORSE than no data.  Another useful trick is to assume failure until the application proves that it is doing what you expect.  I always try to avoid a bias of assuming something works until it can prove it in a service check.  Until that time, it is only your opinion that things are working.&lt;br /&gt;
&lt;br /&gt;
Historically this has been with SNMP and NRPE service checks on a common 5 minute iteration cycle.  This allows for scale of many checks across a fleet of hosts within the cycle, as well as a decent starting point for monitoring hosts.  I realize that some technicians prefer a faster cycle time, but at the end of the day, you are still talking about human response times and investigation times.  Getting things like sub-minute reporting does no good at all to a technician who is troubleshooting an issue.  Additionally, fast cycle times do not really lend themselves well to the concept of retry.  You will lose packets from time to time, your application will do weird crap from time to time.  That's just fact.  Make sure your monitoring does not lose its mind and scream the world is burning due to a transient packet-loss issue.  Using something like retry will make it more likely you are picking up a legitimate event and not a transient failure.&lt;br /&gt;
&lt;br /&gt;
Every time there is an &amp;quot;Outage&amp;quot; or severe service degrade, one of the first things that should be brought up is did the monitoring catch this issue?  Was it actually granular enough to state what the issue was, or just a side effect of the issue?  Getting an alarm for website down, vs. database down are two very different things.  Both will imply a 100% outage, but usually one is faster to repair than the other.  Also, looking at the wrong thing (say, the webserver when the database is toast) simply slows down the response for getting the database back online.&lt;br /&gt;
&lt;br /&gt;
== History of this wiki ==&lt;br /&gt;
&amp;lt;strong&amp;gt;MediaWiki destination for random notes and examples I do not want to lose or forget.&amp;lt;/strong&amp;gt;&lt;br /&gt;
* The site is for myself and friends who commonly use bash and other utilities and can have a one-stop-shop to find that oddball thing that was found six months ago and vaguely remembered.  The site overall is not for the general public, however if you make it in here feel free to browse.  &lt;br /&gt;
* Keep in mind however I do have security measures in place and poking hard at stuff will block you from the domain for 30 days +  if you hammer really hard.&lt;br /&gt;
* '''If someone wants access to actually ADD information in here please sign up for an account, and I will likely grant access.'''  I do try to set everything into some kind of category for easier searches as well as an attempt to keep this somewhat organized.  Whenever possible (or I remember to do it) I do try to link to the original sources of the information.  They are usually from SE, or other Q&amp;amp;A sites, so some of the comments are useful as well.&lt;br /&gt;
* This is not wikipedia, likely the site is not going to be polished, since it is more of a catchall wiki on doing different things.  There is not going to be too much rhyme or reason on what is posted on this wiki.  Overall if it is something that I have had to do more than once and look up every time, I will have notes on it in here so I will not have to search next time.&lt;br /&gt;
* There will be times that there are references to personal servers on my network, or oddball hostnames that are tied to iwillfearnoevil.com.  It is unlikely that access will be granted to those hosts unless there is a very specific reason to do so.  So don't bother asking :P&lt;br /&gt;
* '''Category NMS has notes on my progress of my NMS design, and thoughts on monitoring overall'''    [https://wiki.iwillfearnoevil.com/mediawiki/index.php/Category:NMS Category NMS]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Misc Notes on using Mediawiki ==&lt;br /&gt;
Consult the [https://www.mediawiki.org/wiki/Special:MyLanguage/Help:Contents User's Guide] for information on using the wiki software.&lt;br /&gt;
&lt;br /&gt;
== Getting started ==&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:Configuration_settings Configuration settings list]&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:FAQ MediaWiki FAQ]&lt;br /&gt;
* [https://lists.wikimedia.org/postorius/lists/mediawiki-announce.lists.wikimedia.org/ MediaWiki release mailing list]&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Localisation#Translation_resources Localise MediaWiki for your language]&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:Combating_spam Learn how to combat spam on your wiki]&lt;br /&gt;
[[Category:General]]&lt;/div&gt;</summary>
		<author><name>Chubbard</name></author>
	</entry>
	<entry>
		<id>https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Main_Page&amp;diff=661</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Main_Page&amp;diff=661"/>
		<updated>2025-06-12T15:53:54Z</updated>

		<summary type="html">&lt;p&gt;Chubbard: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;strong&amp;gt;Fault Management notes, thoughts, and example code&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
My code is finally up on github.com!  Right now I am squishing bugs that I am finding which are not critical issues, but look bad.  Additionally I have begun the ECE or event correlation engine work for the different dashboards.  I still have not gotten to an installer, or added the seed data for the database.  I will be focusing on that soon.  I want to get a little more functionality in place to prove the tool is useful before focusing on this part.  Additionally I have had some thoughts on how to templatize some of the checks better.  I am  still kinda kicking the idea around right now to see what kind of weak points I will run into, but the basic idea is along the lines of give me a SNMP table, and if return data is numbers graph/save it.  If args are passed for what we care about, parse and event on what is found all at the template level.  I think that I can build a simple skeleton for this where just about anything can be added for the check by doing this.  String compare, existence, or number comparison.  I kinda like this idea, since we can get a lot of data, but 99% of the time we only care about specific parts of a given table.  I would like to make this something that can be created from the web page if possible, but I am not sure how ugly that would get..&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I am mainly updating this wiki for the NMS work I am doing, and attempting to get some more exposure in the greater world for assistance on code.  I am not a developer, just someone who is frustrated at a lack of good tools.  I will be adding the NMS up to GitHub in the future as two separate repos.  The first one will be the API server.  The second is the front-end UI portion.  &lt;br /&gt;
&lt;br /&gt;
The backend is written in PHP, and has recently been migrated from PHP7.4 to 8.1.  I used the Slim4 skeleton as the base of the application.&lt;br /&gt;
* &amp;lt;strong&amp;gt; The backend has been updated to use PHP 8.2 from 8.1&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The frontend is using bootstrap, PHP and javascript.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 06-12-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Have basic swagger UI working on API server (unmerged)&lt;br /&gt;
* building more annotations&lt;br /&gt;
* Some PHP syntax error bug squishing&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 04-21-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Still running 2X2 on server.  Load Average consistent 1.10 15 min, and peaks 3.50 on 1min when iteration starts up&lt;br /&gt;
* Not been focused on this too much lately.  Continuing to try to get Devices to look the way they do in my head...&lt;br /&gt;
* Still eating Crayons...&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 03-26-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Still working on UX, as I know I do not make pretty pages&lt;br /&gt;
* Looks a little better on the branch I am working on, but still not amazing or great... Sigh...&lt;br /&gt;
* I should stick to eating Crayons instead of trying to use them to make something good looking...&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 01-02-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Currently working on getting the UI to not look like a first time high school project&lt;br /&gt;
* Minor bug squishing&lt;br /&gt;
* Investigating template rework to better calculate percentages to store in metrics&lt;br /&gt;
* Beginning to investigate storing metric data in influxdb&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 10-30-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Currently hitting performance limits with 2 core/2G RAM with 61 servers.  Server getting laggy..&lt;br /&gt;
* Will encourage setups as &amp;lt; 50 servers, 2x2 is fine.&lt;br /&gt;
* &amp;gt; 51 and &amp;lt; 125 testing as 4x2.  This is not a RAM intense app, it is more thready and core bound.  Then Drive IO is going to become the bottleneck before RAM&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 10-15-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Purchased ChartJS license (I may hate JS, but it makes pretty pics)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 10-08-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Minor bug squishing&lt;br /&gt;
* Testing using evil Javascript to make pretty graphs&lt;br /&gt;
* Going to license CanvasJS for this (even a n00b can figure it out)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 08-18-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Some minor bug fixes, but not too much coding done&lt;br /&gt;
* beginning focus on ECE and seeing what a mess it is.  Not happy about it.  Will likely redesign this since it was a skel anyway.  This needs to be simple and understandable dammit.&lt;br /&gt;
* Looking at changing frontend UI for main page.  &lt;br /&gt;
* Building more templates for standards&lt;br /&gt;
* Dblchecking my logic for NRPE or shell commands.  This could use more work.  NRPE failing for something not the command itself should have discrete alarm values.  Need to think about this more..&lt;br /&gt;
* SNMP checks need to be smarter, and need parsable thresholds on a per-host basis, not a per-check basis&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 04-16-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Still squishy-squishy on bugs&lt;br /&gt;
* Focusing on stability and clean UI&lt;br /&gt;
&lt;br /&gt;
Current Loads: 47 devices&lt;br /&gt;
* 2 core 2 GB RAM&lt;br /&gt;
* Load averages consistent: 0.70,1.5,1.6&lt;br /&gt;
* From this, with an average number of checks ( ~10 per host ), reliable monitoring can be done on 50 devices with decent results and minimal hardware for small environments.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update on status: 03-21-2024 &amp;lt;/strong&amp;gt;&lt;br /&gt;
* Many Many squishes of bugs&lt;br /&gt;
* Reporting engine and templates much more usable&lt;br /&gt;
* More documentation links for application with base examples written&lt;br /&gt;
* Initial Event Correlation Engine (ECE) rules written&lt;br /&gt;
* Some pages written for ECE&lt;br /&gt;
* Did I mention squishing bugs?&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update on status: 02-11-2024 &amp;lt;/strong&amp;gt;&lt;br /&gt;
* Upgraded API to use PHP 8.2, not 8.1.&lt;br /&gt;
* GUI code was sanitized and released to github&lt;br /&gt;
* [https://github.com/Guyverix/Vigilare-NMS-GUI Vigilare GUI]&lt;br /&gt;
* [https://github.com/Guyverix/Vigilare-NMS-API Vigilare API]&lt;br /&gt;
* [https://github.com/Guyverix/Vigilare-NMS-POLLER Vigilare Pollers]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The focus of this wiki is for notes and gotchas for things relating to technology.  It is mainly focused on the fault management tool that I am writing.  However oddball things I have found or commonly forget since I rarely use them are also present.  I do not go too in depth on the notes or what I find.  In general it is more a quick summary and if reasonable an example that shows what the result is.  I suspect that as time passes it will become more focused on the tool that I am writing.  However there will likely be oddball stuff thrown in here as well that does not have to do with fault management at all....&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt;Overall idea&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I believe that fault management is commonly very much overlooked in many companies and more of a bolt-on after they have been embarrassed by an outage or network event that a customer noticed.  This feels backwards to me.  Fault management really should be one of the first things put into place and grow with your software and network over time.  Being able to tell at a glance the &amp;quot;truth&amp;quot; of your network is almost as critical as the network running itself!  Customer monitoring isn't monitoring and simply makes you look bad.  If a customer or end user has to tell you something is down, you have failed.  If it happens more than once then that should set off all kinds of warnings to technicians.  This should not happen.  The problem happens in my opinion due to management seeing monitoring as a time suck that does not really return money on their investment of time.  The only time they care is when they get embarrassed, and after that they lose interest until they are embarrassed again.  This is stupid.  A company or technician must be able to say with confidence that their applications are working AND PROVE IT.  If you can't prove anything it is just a hope that things are working as designed.  Until you know, you simply are guessing on the health and availability of an application.&lt;br /&gt;
&lt;br /&gt;
I have noticed a trend recently that companies are going to what I call &amp;quot;implied&amp;quot; monitoring.  More looking at metrics than active validation of an application or service.  While this does somewhat fit the bill for monitoring it does a major disservice to the technicians that need to support the application.  Usually this kind of monitoring will cry when something is down but does not tell you what, or where the issue happens.  Only that it is happening.  It is also usually slower to report the failures.  That means more time for a customer to find issues than the technician.  Parsing logs, and dropping them in search indexes, or forwarding matches as events are all very useful, but they are not really watching for specific application issues at the host level.  I believe that an organization must work up to this kind of monitoring.  It must be built on the basics.  If you do not have the basics in place, then the advanced monitoring is much less useful for an org.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt;&lt;br /&gt;
My common way to approach monitoring is to start as fundamental as possible.  On Linux based systems this is the way that has given me the best results.&lt;br /&gt;
&lt;br /&gt;
* daemon&lt;br /&gt;
** dead daemon, well dead app :)&lt;br /&gt;
* port&lt;br /&gt;
** zombie processes holding a port, or different daemons attempting to use the same port is bad&lt;br /&gt;
* challenge and response.&lt;br /&gt;
** Verify which app has control of the port (did you get an http, email, ssh header?  Did it respond at all?)&lt;br /&gt;
* performance&lt;br /&gt;
** This is the n+1 point.  After confirming the application is running, NOW is the time to verify it is at a basic level performant.&lt;br /&gt;
* log and event parsing&lt;br /&gt;
** In depth log parsing, and event correlations.  Without the above this is much less useful to a technician.&lt;br /&gt;
*** I have seen in RARE cases this done well, where it states what failed and which host.  However that is much less likely to happen as the logs parsed do not always state which application is at fault, only that there is one present.&lt;br /&gt;
&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
How you get the answers is less important than getting the answers, with one gotcha.  There is no reason to degrade the host with a complex and slow service check.  If the validation cannot be done cheaply, then it is likely something that should be broken down into more basic pieces.  Killing your servers or pod with service checks is exactly backwards of what you need.  Simple, fast, and accurate are what you need to focus on.  Inaccurate data is WORSE than no data.  Another useful trick is to assume failure until the application proves that it is doing what you expect.  I always try to avoid a bias of assuming something works until it can prove it in a service check.  Until that time, it is only your opinion that things are working.&lt;br /&gt;
&lt;br /&gt;
Historically this has been with SNMP and NRPE service checks on a common 5 minute iteration cycle.  This allows for scale of many checks across a fleet of hosts within the cycle, as well as a decent starting point for monitoring hosts.  I realize that some technicians prefer a faster cycle time, but at the end of the day, you are still talking about human response times and investigation times.  Getting things like sub-minute reporting does no good at all to a technician who is troubleshooting an issue.  Additionally, fast cycle times do not really lend themselves well to the concept of retry.  You will lose packets from time to time, your application will do weird crap from time to time.  That's just fact.  Make sure your monitoring does not lose its mind and scream the world is burning due to a transient packet-loss issue.  Using something like retry will make it more likely you are picking up a legitimate event and not a transient failure.&lt;br /&gt;
&lt;br /&gt;
Every time there is an &amp;quot;Outage&amp;quot; or severe service degrade, one of the first things that should be brought up is did the monitoring catch this issue?  Was it actually granular enough to state what the issue was, or just a side effect of the issue?  Getting an alarm for website down, vs. database down are two very different things.  Both will imply a 100% outage, but usually one is faster to repair than the other.  Also, looking at the wrong things, say the webserver when the database is toast, simply slows down the response for getting the database back online.&lt;br /&gt;
&lt;br /&gt;
== History of this wiki ==&lt;br /&gt;
&amp;lt;strong&amp;gt;MediaWiki destination for random notes and examples I do not want to lose or forget.&amp;lt;/strong&amp;gt;&lt;br /&gt;
* The site is for myself and friends who commonly use bash and other utilities and can have a one-stop-shop to find that oddball thing that was found six months ago and vaguely remembered.  The site overall is not for the general public, however if you make it in here feel free to browse.  &lt;br /&gt;
* Keep in mind however I do have security measures in place and poking hard at stuff will block you from the domain for 30 days +  if you hammer really hard.&lt;br /&gt;
* '''If someone wants access to actually ADD information in here please sign up for an account, and I will likely grant access.'''  I do try to set everything into some kind of category for easier searches as well as an attempt to keep this somewhat organized.  Whenever possible (or I remember to do it) I do try to link to the original sources of the information.  They are usually from SE, or other Q&amp;amp;A sites, so some of the comments are useful as well.&lt;br /&gt;
* This is not wikipedia, likely the site is not going to be polished, since it is more of a catchall wiki on doing different things.  There is not going to be too much rhyme or reason on what is posted on this wiki.  Overall if it is something that I have had to do more than once and look up every time, I will have notes on it in here so I will not have to search next time.&lt;br /&gt;
* There will be times that there are references to personal servers on my network, or oddball hostnames that are tied to iwillfearnoevil.com.  It is unlikely that access will be granted to those hosts unless there is a very specific reason to do so.  So don't bother asking :P&lt;br /&gt;
* '''Category NMS has notes on my progress of my NMS design, and thoughts on monitoring overall'''    [https://wiki.iwillfearnoevil.com/mediawiki/index.php/Category:NMS Category NMS]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Misc Notes on using Mediawiki ==&lt;br /&gt;
Consult the [https://www.mediawiki.org/wiki/Special:MyLanguage/Help:Contents User's Guide] for information on using the wiki software.&lt;br /&gt;
&lt;br /&gt;
== Getting started ==&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:Configuration_settings Configuration settings list]&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:FAQ MediaWiki FAQ]&lt;br /&gt;
* [https://lists.wikimedia.org/postorius/lists/mediawiki-announce.lists.wikimedia.org/ MediaWiki release mailing list]&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Localisation#Translation_resources Localise MediaWiki for your language]&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:Combating_spam Learn how to combat spam on your wiki]&lt;br /&gt;
[[Category:General]]&lt;/div&gt;</summary>
		<author><name>Chubbard</name></author>
	</entry>
	<entry>
		<id>https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Main_Page&amp;diff=660</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Main_Page&amp;diff=660"/>
		<updated>2025-06-12T15:33:38Z</updated>

		<summary type="html">&lt;p&gt;Chubbard: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;strong&amp;gt;Fault Management notes, thoughts, and example code&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
My code is finally up on github.com!  Right now I am squishing bugs that I am finding which are not critical issues, but look bad.  Additionally I have begun the ECE or event correlation engine work for the different dashboards.  I still have not gotten to an installer, or added the seed data for the database.  I will be focusing on that soon.  I want to get a little more functionality in place to prove the tool is useful before focusing on this part.  Additionally I have had some thoughts on how to templatize some of the checks better.  I am  still kinda kicking the idea around right now to see what kind of weak points I will run into, but the basic idea is along the lines of give me a SNMP table, and if return data is numbers graph/save it.  If args are passed for what we care about, parse and event on what is found all at the template level.  I think that I can build a simple skeleton for this where just about anything can be added for the check by doing this.  String compare, existence, or number comparison.  I kinda like this idea, since we can get a lot of data, but 99% of the time we only care about specific parts of a given table.  I would like to make this something that can be created from the web page if possible, but I am not sure how ugly that would get..&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I am mainly updating this wiki for the NMS work I am doing, and attempting to get some more exposure in the greater world for assistance on code.  I am not a developer, just someone who is frustrated at a lack of good tools.  I will be adding the NMS to GitHub in the future as two separate repos.  The first one will be the API server.  The second is the front-end UI portion.  &lt;br /&gt;
&lt;br /&gt;
The backend is written in PHP, and has recently been migrated from PHP7.4 to 8.1.  I used the Slim4 skeleton as the base of the application.&lt;br /&gt;
&lt;br /&gt;
The frontend is using bootstrap, PHP and javascript.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 06-12-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Have basic swagger UI working on API server (unmerged)&lt;br /&gt;
* building more annotations&lt;br /&gt;
* Some PHP syntax error bug squishing&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 04-21-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Still running 2X2 on server.  Load Average consistent 1.10 15 min, and peaks 3.50 on 1min when iteration starts up&lt;br /&gt;
* Not been focused on this too much lately.  Continuing to try to get Devices to look the way they do in my head...&lt;br /&gt;
* Still eating Crayons...&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 03-26-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Still working on UX, as I know I do not make pretty pages&lt;br /&gt;
* Looks a little better on the branch I am working on, but still not amazing or great... Sigh...&lt;br /&gt;
* I should stick to eating Crayons instead of trying to use them to make something good looking...&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 01-02-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Currently working on getting the UI to not look like a first time high school project&lt;br /&gt;
* Minor bug squishing&lt;br /&gt;
* Investigating template rework to better calculate percentages to store in metrics&lt;br /&gt;
* Beginning to investigate storing metric data in influxdb&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 10-30-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Currently hitting performance limits with 2 core/2G RAM with 61 servers.  Server getting laggy..&lt;br /&gt;
* Will encourage setups as &amp;lt; 50 servers, 2x2 is fine.&lt;br /&gt;
* &amp;gt; 51 and &amp;lt; 125 testing as 4x2.  This is not a RAM intense app, it is more thready and core bound.  Then Drive IO is going to become the bottleneck before RAM&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 10-15-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Purchased ChartJS license (I may hate JS, but it makes pretty pics)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 10-08-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Minor bug squishing&lt;br /&gt;
* Testing using evil Javascript to make pretty graphs&lt;br /&gt;
* Going to license CanvasJS for this (even a n00b can figure it out)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 08-18-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Some minor bug fixes, but not too much coding done&lt;br /&gt;
* beginning focus on ECE and seeing what a mess it is.  Not happy about it.  Will likely redesign this since it was a skel anyway.  This needs to be simple and understandable dammit.&lt;br /&gt;
* Looking at changing frontend UI for main page.  &lt;br /&gt;
* Building more templates for standards&lt;br /&gt;
* Dblchecking my logic for NRPE or shell commands.  This could use more work.  NRPE failing for something not the command itself should have discrete alarm values.  Need to think about this more..&lt;br /&gt;
* SNMP checks need to be smarter, and need parsable thresholds on a per-host basis, not a per-check basis&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 04-16-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Still squishy-squishy on bugs&lt;br /&gt;
* Focusing on stability and clean UI&lt;br /&gt;
&lt;br /&gt;
Current Loads: 47 devices&lt;br /&gt;
* 2 core 2 GB RAM&lt;br /&gt;
* Load averages consistent: 0.70,1.5,1.6&lt;br /&gt;
* From this, with an average number of checks ( ~10 per host ), reliable monitoring can be done on 50 devices with decent results and minimal hardware for small environments.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update on status: 03-21-2024 &amp;lt;/strong&amp;gt;&lt;br /&gt;
* Many Many squishes of bugs&lt;br /&gt;
* Reporting engine and templates much more usable&lt;br /&gt;
* More documentation links for application with base examples written&lt;br /&gt;
* Initial Event Correlation Engine (ECE) rules written&lt;br /&gt;
* Some pages written for ECE&lt;br /&gt;
* Did I mention squishing bugs?&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update on status: 02-11-2024 &amp;lt;/strong&amp;gt;&lt;br /&gt;
* Upgraded API to use PHP 8.2, not 8.1.&lt;br /&gt;
* GUI code was sanitized and released to github&lt;br /&gt;
* [https://github.com/Guyverix/Vigilare-NMS-GUI Vigilare GUI]&lt;br /&gt;
* [https://github.com/Guyverix/Vigilare-NMS-API Vigilare API]&lt;br /&gt;
* [https://github.com/Guyverix/Vigilare-NMS-POLLER Vigilare Pollers]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The focus of this wiki is for notes and gotchas for things relating to technology.  It is mainly focused on the fault management tool that I am writing.  However oddball things I have found or commonly forget since I rarely use them are also present.  I do not go too in depth on the notes or what I find.  In general it is more a quick summary and if reasonable an example that shows what the result is.  I suspect that as time passes it will become more focused on the tool that I am writing.  However there will likely be oddball stuff thrown in here as well that does not have to do with fault management at all....&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt;Overall idea&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I believe that fault management is commonly very much overlooked in many companies and more of a bolt-on after they have been embarrassed by an outage or network event that a customer noticed.  This feels backwards to me.  Fault management really should be one of the first things put into place and grow with your software and network over time.  Being able to tell at a glance the &amp;quot;truth&amp;quot; of your network is almost as critical as the network running itself!  Customer monitoring isn't monitoring and simply makes you look bad.  If a customer or end user has to tell you something is down, you have failed.  If it happens more than once then that should set off all kinds of warnings to technicians.  This should not happen.  The problem happens in my opinion due to management seeing monitoring as a time suck that does not really return money on their investment of time.  The only time they care is when they get embarrassed, and after that they lose interest until they are embarrassed again.  This is stupid.  A company or technician must be able to say with confidence that their applications are working AND PROVE IT.  If you can't prove anything it is just a hope that things are working as designed.  Until you know, you simply are guessing on the health and availability of an application.&lt;br /&gt;
&lt;br /&gt;
I have noticed a trend recently that companies are going to what I call &amp;quot;implied&amp;quot; monitoring.  More looking at metrics than active validation of an application or service.  While this does somewhat fit the bill for monitoring it does a major disservice to the technicians that need to support the application.  Usually this kind of monitoring will cry when something is down but does not tell you what, or where the issue happens.  Only that it is happening.  It is also usually slower to report the failures.  That means more time for a customer to find issues than the technician.  Parsing logs, and dropping them in search indexes, or forwarding matches as events are all very useful, but they are not really watching for specific application issues at the host level.  I believe that an organization must work up to this kind of monitoring.  It must be built on the basics.  If you do not have the basics in place, then the advanced monitoring is much less useful for an org.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt;&lt;br /&gt;
My common way to approach monitoring is to start as fundamental as possible.  On Linux based systems this is the way that has given me the best results.&lt;br /&gt;
&lt;br /&gt;
* daemon&lt;br /&gt;
** dead daemon, well dead app :)&lt;br /&gt;
* port&lt;br /&gt;
** zombie processes holding a port, or different daemons attempting to use the same port is bad&lt;br /&gt;
* challenge and response.&lt;br /&gt;
** Verify which app has control of the port (did you get an http, email, ssh header?  Did it respond at all?)&lt;br /&gt;
* performance&lt;br /&gt;
** This is the n+1 point.  After confirming the application is running, NOW is the time to verify it is at a basic level performant.&lt;br /&gt;
* log and event parsing&lt;br /&gt;
** In depth log parsing, and event correlations.  Without the above this is much less useful to a technician.&lt;br /&gt;
*** I have seen in RARE cases this done well, where it states what failed and which host.  However that is much less likely to happen as the logs parsed do not always state which application is at fault, only that there is one present.&lt;br /&gt;
&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
How you get the answers is less important than getting the answers, with one gotcha.  There is no reason to degrade the host with a complex and slow service check.  If the validation cannot be done cheaply, then it is likely something that should be broken down into more basic pieces.  Killing your servers or pod with service checks is exactly backwards of what you need.  Simple, fast, and accurate are what you need to focus on.  Inaccurate data is WORSE than no data.  Another useful trick is to assume failure until the application proves that it is doing what you expect.  I always try to avoid a bias of assuming something works until it can prove it in a service check.  Until that time, it is only your opinion that things are working.&lt;br /&gt;
&lt;br /&gt;
Historically this has been with SNMP and NRPE service checks on a common 5 minute iteration cycle.  This allows for scale of many checks across a fleet of hosts within the cycle, as well as a decent starting point for monitoring hosts.  I realize that some technicians prefer a faster cycle time, but at the end of the day, you are still talking about human response times and investigation times.  Getting things like sub-minute reporting does no good at all to a technician who is troubleshooting an issue.  Additionally, fast cycle times do not really lend themselves well to the concept of retry.  You will lose packets from time to time, your application will do weird crap from time to time.  That's just fact.  Make sure your monitoring does not lose its mind and scream the world is burning due to a transient packet-loss issue.  Using something like retry will make it more likely you are picking up a legitimate event and not a transient failure.&lt;br /&gt;
&lt;br /&gt;
Every time there is an &amp;quot;Outage&amp;quot; or severe service degrade, one of the first things that should be brought up is did the monitoring catch this issue?  Was it actually granular enough to state what the issue was, or just a side effect of the issue?  Getting an alarm for website down, vs. database down are two very different things.  Both will imply a 100% outage, but usually one is faster to repair than the other.  Also, looking at the wrong things, say the webserver when the database is toast, simply slows down the response for getting the database back online.&lt;br /&gt;
&lt;br /&gt;
== History of this wiki ==&lt;br /&gt;
&amp;lt;strong&amp;gt;MediaWiki destination for random notes and examples I do not want to lose or forget.&amp;lt;/strong&amp;gt;&lt;br /&gt;
* The site is for myself and friends who commonly use bash and other utilities and can have a one-stop-shop to find that oddball thing that was found six months ago and vaguely remembered.  The site overall is not for the general public, however if you make it in here feel free to browse.  &lt;br /&gt;
* Keep in mind however I do have security measures in place and poking hard at stuff will block you from the domain for 30 days +  if you hammer really hard.&lt;br /&gt;
* '''If someone wants access to actually ADD information in here please sign up for an account, and I will likely grant access.'''  I do try to set everything into some kind of category for easier searches as well as an attempt to keep this somewhat organized.  Whenever possible (or I remember to do it) I do try to link to the original sources of the information.  They are usually from SE, or other Q&amp;amp;A sites, so some of the comments are useful as well.&lt;br /&gt;
* This is not wikipedia, likely the site is not going to be polished, since it is more of a catchall wiki on doing different things.  There is not going to be too much rhyme or reason on what is posted on this wiki.  Overall if it is something that I have had to do more than once and look up every time, I will have notes on it in here so I will not have to search next time.&lt;br /&gt;
* There will be times that there are references to personal servers on my network, or oddball hostnames that are tied to iwillfearnoevil.com.  It is unlikely that access will be granted to those hosts unless there is a very specific reason to do so.  So don't bother asking :P&lt;br /&gt;
* '''Category NMS has notes on my progress of my NMS design, and thoughts on monitoring overall'''    [https://wiki.iwillfearnoevil.com/mediawiki/index.php/Category:NMS Category NMS]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Misc Notes on using Mediawiki ==&lt;br /&gt;
Consult the [https://www.mediawiki.org/wiki/Special:MyLanguage/Help:Contents User's Guide] for information on using the wiki software.&lt;br /&gt;
&lt;br /&gt;
== Getting started ==&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:Configuration_settings Configuration settings list]&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:FAQ MediaWiki FAQ]&lt;br /&gt;
* [https://lists.wikimedia.org/postorius/lists/mediawiki-announce.lists.wikimedia.org/ MediaWiki release mailing list]&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Localisation#Translation_resources Localise MediaWiki for your language]&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:Combating_spam Learn how to combat spam on your wiki]&lt;br /&gt;
[[Category:General]]&lt;/div&gt;</summary>
		<author><name>Chubbard</name></author>
	</entry>
	<entry>
		<id>https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=PHP_Notes_and_Examples&amp;diff=659</id>
		<title>PHP Notes and Examples</title>
		<link rel="alternate" type="text/html" href="https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=PHP_Notes_and_Examples&amp;diff=659"/>
		<updated>2025-06-11T18:05:46Z</updated>

		<summary type="html">&lt;p&gt;Chubbard: /* Slim4 Framework Notes */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;===PHP Notes and Examples===&lt;br /&gt;
==Change Array to JSON==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
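// The second argument of json_encode() is a flags bitmask, not a boolean (1 is JSON_HEX_TAG), so it is left off here.&lt;br /&gt;
$foo=json_encode($array);&lt;br /&gt;
echo $foo . &amp;quot;\n&amp;quot;;&lt;br /&gt;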
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Convert a VALID JSON string to array==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
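// true returns associative arrays instead of objects; the result is null if $json is not valid JSON.&lt;br /&gt;
$foo=json_decode($json, true);&lt;br /&gt;
print_r($foo);&lt;br /&gt;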
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Example foreach loop (simple)==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
foreach ($nestedArray1 as $array1) {&lt;br /&gt;
  // $array1 is one inner array per pass; variable names are case sensitive&lt;br /&gt;
  print_r($array1);&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Example Try Catch block==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
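use Exception;  // only needed inside a namespaced file&lt;br /&gt;
try {&lt;br /&gt;
  someRandoFunction();  // placeholder for the call that may throw&lt;br /&gt;
}&lt;br /&gt;
catch (ErrorName $e) {  // placeholder class: most specific exception goes first&lt;br /&gt;
  cryToMamma();&lt;br /&gt;
}&lt;br /&gt;
catch (Exception $e) {  // generic fallback, checked last&lt;br /&gt;
  doSomethingWhenErrorFound();&lt;br /&gt;
}&lt;br /&gt;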
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Checking if file or dir exists.  Starting point:==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;?php&lt;br /&gt;
$path_parts = pathinfo('/www/htdocs/inc/lib.inc.php');&lt;br /&gt;
&lt;br /&gt;
echo $path_parts['dirname'], &amp;quot;\n&amp;quot;;&lt;br /&gt;
echo $path_parts['basename'], &amp;quot;\n&amp;quot;;&lt;br /&gt;
echo $path_parts['extension'], &amp;quot;\n&amp;quot;;&lt;br /&gt;
echo $path_parts['filename'], &amp;quot;\n&amp;quot;;&lt;br /&gt;
?&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above example will output:&lt;br /&gt;
&lt;br /&gt;
/www/htdocs/inc&lt;br /&gt;
lib.inc.php&lt;br /&gt;
php&lt;br /&gt;
lib.inc&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
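&lt;br /&gt;
The pathinfo() example above only splits a path string and never touches the filesystem.  A minimal sketch of the actual existence check follows.  describePath() is a made-up helper name and the temp file is only there for illustration, so treat this as a starting point and not the one true way.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;lt;?php&lt;br /&gt;
// describePath() is a hypothetical helper, not part of PHP.&lt;br /&gt;
function describePath(string $path): string {&lt;br /&gt;
  clearstatcache(true, $path);   // stat results are cached within a request&lt;br /&gt;
  if (is_dir($path))  { return 'dir'; }&lt;br /&gt;
  if (is_file($path)) { return 'file'; }&lt;br /&gt;
  return file_exists($path) ? 'other' : 'missing';&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
$dir  = sys_get_temp_dir();&lt;br /&gt;
$file = tempnam($dir, 'chk');    // creates a real, empty temp file&lt;br /&gt;
echo describePath($dir), &amp;quot;\n&amp;quot;;   // dir&lt;br /&gt;
echo describePath($file), &amp;quot;\n&amp;quot;;  // file&lt;br /&gt;
unlink($file);&lt;br /&gt;
echo describePath($file), &amp;quot;\n&amp;quot;;  // missing&lt;br /&gt;
?&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;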
&lt;br /&gt;
==Go BACKWARDS in array==&lt;br /&gt;
From: [https://stackoverflow.com/questions/2194388/php-how-to-get-the-element-before-the-last-element-from-an-array php-how-to-get-the-element-before-the-last-element-from-an-array]&lt;br /&gt;
* Note that the array needs at least 2 elements (and sequential numeric keys) for this to work properly.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;lt;?php&lt;br /&gt;
$foo=[&amp;quot;one&amp;quot;,&amp;quot;two&amp;quot;,&amp;quot;three&amp;quot;,&amp;quot;four&amp;quot;];&lt;br /&gt;
echo &amp;quot;TEST &amp;quot; . $foo[count($foo)-2];&lt;br /&gt;
?&amp;gt;&lt;br /&gt;
Returns TEST three&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
All arrays have an &amp;quot;internal array pointer&amp;quot; which points to the current array element. PHP has several functions which allow you to navigate through the array and view the current element's key and value.&lt;br /&gt;
&lt;br /&gt;
    end() - Set the internal pointer of an array to its last element&lt;br /&gt;
    reset() - Set the internal pointer of an array to its first element&lt;br /&gt;
    prev() - Rewind the internal array pointer&lt;br /&gt;
    next() - Advance the internal array pointer of an array&lt;br /&gt;
    current() - Return the current element in an array&lt;br /&gt;
    key() - Fetch a key from an array&lt;br /&gt;
    each() - Return the current key and value pair from an array and advance the array cursor (deprecated in PHP 7.2, removed in PHP 8.0)&lt;br /&gt;
&lt;br /&gt;
== Add Space before every capital letter==&lt;br /&gt;
From [https://bytes.com/topic/php/answers/8589-how-insert-space-before-capital-letters-string -how-insert-space-before-capital-letters-string]&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
preg_replace('/(\w+)([A-Z])/U', '\\1 \\2', $cleanStorage); &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Working with file creation and removal ==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
      $content = '&amp;lt;?php ' . $preProcessing . ' ?&amp;gt;';&lt;br /&gt;
      $file = dirname(__FILE__) . '/'. $event_name . '.php';&lt;br /&gt;
      file_put_contents($file, $content);&lt;br /&gt;
      include &amp;quot;$file&amp;quot;;&lt;br /&gt;
      unlink($file);&lt;br /&gt;
      $result['file'] = $file;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Set defaults if not provided to function ==&lt;br /&gt;
[https://stackoverflow.com/questions/9166914/using-default-arguments-in-a-function Using defaults in a function in PHP]&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
function foo($blah, $x = null, $y = null) {&lt;br /&gt;
    if (null === $x) {&lt;br /&gt;
        $x = &amp;quot;some value&amp;quot;;&lt;br /&gt;
    }&lt;br /&gt;
    if (null === $y) {&lt;br /&gt;
        $y = &amp;quot;some other value&amp;quot;;&lt;br /&gt;
    }&lt;br /&gt;
    // code here&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Make Indexed Array Unique ==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
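// Keep only the first row seen for each 'name' value&lt;br /&gt;
$temp_array = array();&lt;br /&gt;
foreach ($array as &amp;amp;$v) {&lt;br /&gt;
    if (!isset($temp_array[$v['name']]))&lt;br /&gt;
        $temp_array[$v['name']] =&amp;amp; $v;&lt;br /&gt;
}&lt;br /&gt;
unset($v); // break the reference left over from the loop&lt;br /&gt;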
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Array by reference ==&lt;br /&gt;
Using &amp;amp; in front of the loop variable makes it a reference, so changing it changes the value within the original array.  This simple example came from asking ChatGPT.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
// Without reference&lt;br /&gt;
$array1 = [1, 2, 3];&lt;br /&gt;
foreach ($array1 as $v) {&lt;br /&gt;
    $v = $v * 2;&lt;br /&gt;
}&lt;br /&gt;
print_r($array1);  // Outputs: Array ( [0] =&amp;gt; 1 [1] =&amp;gt; 2 [2] =&amp;gt; 3 )&lt;br /&gt;
&lt;br /&gt;
// With reference&lt;br /&gt;
$array2 = [1, 2, 3];&lt;br /&gt;
foreach ($array2 as &amp;amp;$v) {&lt;br /&gt;
    $v = $v * 2;&lt;br /&gt;
}&lt;br /&gt;
unset($v); // break the reference so later use of $v does not touch $array2&lt;br /&gt;
print_r($array2);  // Outputs: Array ( [0] =&amp;gt; 2 [1] =&amp;gt; 4 [2] =&amp;gt; 6 )&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Slim4 Framework Notes==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
return $this-&amp;gt;respondWithData($data);&lt;br /&gt;
throw new HttpBadRequestException($this-&amp;gt;request, &amp;quot;Error message details&amp;quot;);&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Simple PHP validation==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
find src -type f -name '*.php' -exec php -l {} \;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Some cheat-sheets ==&lt;br /&gt;
Random cheat-sheets found on the net that are kinda useful&lt;br /&gt;
&lt;br /&gt;
# [https://cheatography.com/davechild/cheat-sheets/php/ Looks like a pretty good PHP sheet]&lt;br /&gt;
# [https://cheatography.com/krabat1/cheat-sheets/php/ This one is REALLY big, but looks to have good info]&lt;br /&gt;
[[Category:PHP]]&lt;/div&gt;</summary>
		<author><name>Chubbard</name></author>
	</entry>
	<entry>
		<id>https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Git_notes&amp;diff=658</id>
		<title>Git notes</title>
		<link rel="alternate" type="text/html" href="https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Git_notes&amp;diff=658"/>
		<updated>2025-05-27T16:18:29Z</updated>

		<summary type="html">&lt;p&gt;Chubbard: /* mis-synced between parent and submodule commits */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=== General git commands that are useful ===&lt;br /&gt;
== Change git remote:==&lt;br /&gt;
&lt;br /&gt;
Use the first (SSH) form if ssh keys are added to your user account, or the second (HTTPS) form if not.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git remote set-url origin git@gitlab01.iwillfearnoevil.com:monitoring/nmsui.git&lt;br /&gt;
or:&lt;br /&gt;
git remote set-url origin https://gitlab01.iwillfearnoevil.com/monitoring/nmsui.git&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
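&lt;br /&gt;
The two URLs above differ only in the prefix and in the separator after the host name. A minimal sketch of converting the SSH form into the HTTPS form, assuming the usual git@host:path layout; ssh_to_https is a hypothetical helper name, not a git command.&lt;br /&gt;

```shell
# Hypothetical helper: rewrite git@host:group/repo.git as https://host/group/repo.git
ssh_to_https() {
  # $1 = remote URL in SSH form; anything that does not match is printed unchanged
  printf '%s\n' "$1" | sed -E 's|^git@([^:]+):(.*)$|https://\1/\2|'
}

# Possible use inside a repository (assumes the remote is named origin):
# git remote set-url origin "$(ssh_to_https "$(git remote get-url origin)")"
```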
&lt;br /&gt;
== Sort git branches by last commit ==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git for-each-ref --sort=committerdate refs/heads/ --format='%(HEAD) %(color:yellow)%(refname:short)%(color:reset) - %(color:red)%(objectname:short)%(color:reset) - %(contents:subject) - %(authorname) (%(color:green)%(committerdate:relative)%(color:reset))'&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Show git commit hashes for each branch sorted by date ==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git for-each-ref --sort=-committerdate refs/heads/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Do a git diff between two branches ==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git diff &amp;lt;branch&amp;gt;..origin/&amp;lt;branch2&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Update Submodules ==&lt;br /&gt;
[https://stackoverflow.com/questions/1030169/pull-latest-changes-for-all-git-submodules pull-latest-changes-for-all-git-submodules]&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git submodule update --recursive --init&lt;br /&gt;
git submodule update --recursive --remote&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
== Checkout specific commit hash ==&lt;br /&gt;
This can be useful when the hash is not in the expected branch, or when you are in a detached head state&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone [remote_address_here] my_repo&lt;br /&gt;
cd my_repo&lt;br /&gt;
git reset --hard [ENTER HERE THE COMMIT HASH YOU WANT]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Change submodule URL ==&lt;br /&gt;
This is very much a hammer way of doing this.  I have seen elegant ways online, but they seem inconsistent when something goes wrong.  This way is reproducible as far as I am concerned.&lt;br /&gt;
&lt;br /&gt;
This example assumes that directory/ is where your submodule lives.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
edit .gitmodules&lt;br /&gt;
change values&lt;br /&gt;
rm -rf directory/&lt;br /&gt;
git submodule update --init --recursive --remote&lt;br /&gt;
cd into directory/&lt;br /&gt;
git pull whatever submodule branch you need&lt;br /&gt;
cd ..&lt;br /&gt;
git add -A&lt;br /&gt;
git commit -m 'save new submodule changes'&lt;br /&gt;
git push&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Save image in README.md on github.com ==&lt;br /&gt;
Per: [https://stackoverflow.com/questions/14494747/how-to-add-images-to-readme-md-on-github Stackoverflow]&lt;br /&gt;
&lt;br /&gt;
Works:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Very Simple : Can be done using Ctrl + C/V&lt;br /&gt;
&lt;br /&gt;
Most of the answers here directly or indirectly involve uploading the image somewhere else &amp;amp; then providing a link to it.&lt;br /&gt;
&lt;br /&gt;
It can be done very simply by just copying any image and pasting it while editing Readme.md&lt;br /&gt;
&lt;br /&gt;
    Copying the image - You can just click on the image file and use Ctrl + C or may copy the screenshot image to your clipboard using the snipping tool&lt;br /&gt;
    You can then simply do Ctrl + V while editing Readme.md&lt;br /&gt;
&lt;br /&gt;
GitHub will automatically upload it to user-images.githubusercontent.com and a link to it will be inserted there&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Change remote URL ==&lt;br /&gt;
[https://stackoverflow.com/questions/2432764/how-do-i-change-the-uri-url-for-a-remote-git-repository Change Git remote URL]&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git remote set-url origin new.git.url/here&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Ignore local changes and pull ==&lt;br /&gt;
When you simply want to start over and nuke all your local mess....&lt;br /&gt;
&lt;br /&gt;
[https://stackoverflow.com/questions/4157189/how-to-git-pull-while-ignoring-local-changes Stackoverflow on doing this]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git fetch --all&lt;br /&gt;
git reset --hard origin/&amp;lt;branch_name&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Clean up local orphan branches ==&lt;br /&gt;
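When you have local branches that have already been merged, this works to clean out your local repo.  Note that git branch --merged lists the branches merged into whatever branch you currently have checked out, so run it from an up-to-date copy of the branch the others were merged into.&lt;br /&gt;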
&lt;br /&gt;
If there are still some that you want to keep, add in a 'grep -v &amp;quot;someName&amp;quot;' before xargs&lt;br /&gt;
&lt;br /&gt;
[https://stackoverflow.com/questions/7726949/remove-tracking-branches-no-longer-on-remote remove tracking branches no longer on remote]&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git branch --merged | grep -v &amp;quot;\*&amp;quot; | xargs -n 1 git branch -d&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
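&lt;br /&gt;
The keep-list idea above (an extra grep -v before xargs) can be sketched as a small filter. This is only a sketch: filter_merged is a hypothetical name, and main and develop are assumed branch names to protect, so swap in your own.&lt;br /&gt;

```shell
# Hypothetical helper: read `git branch --merged` style output on stdin and
# print only the branch names that are candidates for deletion.
filter_merged() {
  # drop the current branch (marked with *), drop the branches we want to keep
  # (main and develop are assumptions), then strip the leading indentation
  grep -v "\*" | grep -v -E "^ *(main|develop)$" | sed "s/^ *//"
}

# Dry run, shows what would be deleted:
# git branch --merged | filter_merged
# Real run:
# git branch --merged | filter_merged | xargs -n 1 git branch -d
```

Running the dry run first is cheap insurance, since git branch -d only refuses branches that are not fully merged.&lt;br /&gt;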
&lt;br /&gt;
== Get latest commit hash for scripts ==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git rev-parse HEAD&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
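&lt;br /&gt;
A minimal sketch of using the hash in a script, assuming all that is needed is a short label; build_label and the build- prefix are made up for illustration. git rev-parse --short HEAD also prints an abbreviated hash directly.&lt;br /&gt;

```shell
# Hypothetical helper: turn a full commit hash into a short build label
build_label() {
  # $1 = full commit hash, e.g. the output of: git rev-parse HEAD
  short=$(printf '%s' "$1" | cut -c1-7)
  printf 'build-%s\n' "$short"
}

# Possible use inside a repository:
# build_label "$(git rev-parse HEAD)"
```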
&lt;br /&gt;
== Remove submodules of submodules ==&lt;br /&gt;
* This has been a recurring painful process.  This is one solution that has worked for me&lt;br /&gt;
[https://stackoverflow.com/questions/4185365/no-submodule-mapping-found-in-gitmodule-for-a-path-thats-not-a-submodule Stack Overflow discussion on this]&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone --recursive REPO&lt;br /&gt;
cd down/to/submodule&lt;br /&gt;
git checkout branch needed&lt;br /&gt;
rm -f .gitmodules&lt;br /&gt;
git submodule init&lt;br /&gt;
git submodule update&lt;br /&gt;
git rm --cached file/path/with/error/message&lt;br /&gt;
git status&lt;br /&gt;
git add -A&lt;br /&gt;
git commit -m 'Remove submodules of submodules'&lt;br /&gt;
git push&lt;br /&gt;
cd parent repository&lt;br /&gt;
cat .gitmodules&lt;br /&gt;
Verify no references to submodules of submodules&lt;br /&gt;
git submodule status --recursive&lt;br /&gt;
git status&lt;br /&gt;
git add -A&lt;br /&gt;
git commit -m 'Remove references in submodule to nested submodules'&lt;br /&gt;
git push&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Nuke a bad commit before push ==&lt;br /&gt;
* When you have saved a commit but it has not been pushed and you want to get rid of it&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Keep the work done:&lt;br /&gt;
git reset --soft HEAD~1&lt;br /&gt;
&lt;br /&gt;
Destroy all work done and get last full commit&lt;br /&gt;
git reset --hard HEAD~1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Drop changes and pull submodules ==&lt;br /&gt;
* When you have a submodule that is out of whack with the main repo and you cannot figure out which commit it is supposed to be at easily&lt;br /&gt;
&lt;br /&gt;
* reset --hard gets rid of all local changes&lt;br /&gt;
* clean -fdx removes untracked files and directories&lt;br /&gt;
* The submodule update resets the submodule’s HEAD to what the parent repo wants&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cd path/to/submodule&lt;br /&gt;
git reset --hard&lt;br /&gt;
git clean -fdx&lt;br /&gt;
cd ..&lt;br /&gt;
git submodule update --init --recursive --force&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== mis-synced between parent and submodule commits ==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git submodule update --init --recursive --force&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Find branch in detached head ==&lt;br /&gt;
[https://stackoverflow.com/questions/6059336/how-to-find-the-current-git-branch-in-detached-head-state Find branch in detached head]&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git show-ref | grep $(git log --pretty=%h -1) | sed 's|.*/\(.*\)|\1|' | sort -u | grep -v HEAD&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
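&lt;br /&gt;
A sketch of what the tail of that pipeline does, pulled out into a function so it can be tried on sample text without a repository. refs_to_branches is a hypothetical name. Because the sed expression keeps only the text after the last slash, a branch such as feature/x would be reported as x.&lt;br /&gt;

```shell
# Hypothetical helper: reduce `git show-ref` lines (already filtered to one commit)
# down to bare branch names
refs_to_branches() {
  # keep the text after the last slash, de-duplicate, and drop the HEAD entry
  sed 's|.*/\(.*\)|\1|' | sort -u | grep -v HEAD
}

# Possible use inside a repository:
# git show-ref | grep "$(git log --pretty=%h -1)" | refs_to_branches
```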
&lt;br /&gt;
[[Category:Git]]&lt;/div&gt;</summary>
		<author><name>Chubbard</name></author>
	</entry>
	<entry>
		<id>https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Git_notes&amp;diff=657</id>
		<title>Git notes</title>
		<link rel="alternate" type="text/html" href="https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Git_notes&amp;diff=657"/>
		<updated>2025-05-16T15:24:40Z</updated>

		<summary type="html">&lt;p&gt;Chubbard: /* Drop changes and pull submodules */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=== General git commands that are useful ===&lt;br /&gt;
== Change git remote:==&lt;br /&gt;
&lt;br /&gt;
With or without ssh keys added to your user account..&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git remote set-url origin git@gitlab01.iwillfearnoevil.com:monitoring/nmsui.git&lt;br /&gt;
or:&lt;br /&gt;
git remote set-url origin https://gitlab01.iwillfearnoevil.com/monitoring/nmsui.git&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Sort git branches by last commit ==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git for-each-ref --sort=committerdate refs/heads/ --format='%(HEAD) %(color:yellow)%(refname:short)%(color:reset) - %(color:red)%(objectname:short)%(color:reset) - %(contents:subject) - %(authorname) (%(color:green)%(committerdate:relative)%(color:reset))'&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Show git commit hashes for each branch sorted by date ==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git for-each-ref --sort=-committerdate refs/heads/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Do a git diff between two branches ==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git diff &amp;lt;branch&amp;gt;..origin/&amp;lt;branch2&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Update Submodules ==&lt;br /&gt;
[https://stackoverflow.com/questions/1030169/pull-latest-changes-for-all-git-submodules pull-latest-changes-for-all-git-submodules]&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git submodule update --recursive --init&lt;br /&gt;
git submodule update --recursive --remote&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
== Checkout specific commit hash ==&lt;br /&gt;
This can be useful when the hash is not in the expected branch, or when you are in a detached head state&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone [remote_address_here] my_repo&lt;br /&gt;
cd my_repo&lt;br /&gt;
git reset --hard [ENTER HERE THE COMMIT HASH YOU WANT]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Change submodule URL ==&lt;br /&gt;
This is very much a hammer way of doing this.  I have seen elegant ways online, but they seem inconsistent when something goes wrong.  This way is reproducible as far as I am concerned.&lt;br /&gt;
&lt;br /&gt;
This example assumes that directory/ is where your submodule lives.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
edit .gitmodules&lt;br /&gt;
change values&lt;br /&gt;
rm -rf directory/&lt;br /&gt;
git submodule update --init --recursive --remote&lt;br /&gt;
cd into directory/&lt;br /&gt;
git pull whatever submodule branch you need&lt;br /&gt;
cd ..&lt;br /&gt;
git add -A&lt;br /&gt;
git commit -m 'save new submodule changes'&lt;br /&gt;
git push&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Save image in README.md on github.com ==&lt;br /&gt;
Per: [https://stackoverflow.com/questions/14494747/how-to-add-images-to-readme-md-on-github Stackoverflow]&lt;br /&gt;
&lt;br /&gt;
Works:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Very Simple : Can be done using Ctrl + C/V&lt;br /&gt;
&lt;br /&gt;
Most of the answers here directly or indirectly involve uploading the image somewhere else &amp;amp; then providing a link to it.&lt;br /&gt;
&lt;br /&gt;
It can be done very simply by just copying any image and pasting it while editing Readme.md&lt;br /&gt;
&lt;br /&gt;
    Copying the image - You can just click on the image file and use Ctrl + C or may copy the screenshot image to your clipboard using the snipping tool&lt;br /&gt;
    You can then simply do Ctrl + V while editing Readme.md&lt;br /&gt;
&lt;br /&gt;
GitHub will automatically upload it to user-images.githubusercontent.com and a link to it will be inserted there&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Change remote URL ==&lt;br /&gt;
[https://stackoverflow.com/questions/2432764/how-do-i-change-the-uri-url-for-a-remote-git-repository Change Git remote URL]&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git remote set-url origin new.git.url/here&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Ignore local changes and pull ==&lt;br /&gt;
When you simply want to start over and nuke all your local mess....&lt;br /&gt;
&lt;br /&gt;
[https://stackoverflow.com/questions/4157189/how-to-git-pull-while-ignoring-local-changes Stackoverflow on doing this]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git fetch --all&lt;br /&gt;
git reset --hard origin/&amp;lt;branch_name&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Clean up local orphan branches ==&lt;br /&gt;
When you have branches locally present that have been merged on the remote server this works to clean out your local repo.&lt;br /&gt;
&lt;br /&gt;
If there are still some that you want to keep, add in a 'grep -v &amp;quot;someName&amp;quot;' before xargs&lt;br /&gt;
&lt;br /&gt;
[https://stackoverflow.com/questions/7726949/remove-tracking-branches-no-longer-on-remote remove tracking branches no longer on remote]&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git branch --merged | grep -v &amp;quot;\*&amp;quot; | xargs -n 1 git branch -d&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Get latest commit hash for scripts ==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git rev-parse HEAD&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Remove submodules of submodules ==&lt;br /&gt;
* This has been a recurring painful process.  This is one solution that has worked for me&lt;br /&gt;
[https://stackoverflow.com/questions/4185365/no-submodule-mapping-found-in-gitmodule-for-a-path-thats-not-a-submodule Stack Overflow discussion on this]&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone --recursive REPO&lt;br /&gt;
cd down/to/submodule&lt;br /&gt;
git checkout branch needed&lt;br /&gt;
rm -f .gitmodules&lt;br /&gt;
git submodule init&lt;br /&gt;
git submodule update&lt;br /&gt;
git rm --cached file/path/with/error/message&lt;br /&gt;
git status&lt;br /&gt;
git add -A&lt;br /&gt;
git commit -m 'Remove submodules of submodules'&lt;br /&gt;
git push&lt;br /&gt;
cd parent repository&lt;br /&gt;
cat .gitmodules&lt;br /&gt;
Verify no references to submodules of submodules&lt;br /&gt;
git submodule status --recursive&lt;br /&gt;
git status&lt;br /&gt;
git add -A&lt;br /&gt;
git commit -m 'Remove references in submodule to nested submodules'&lt;br /&gt;
git push&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Nuke a bad commit before push ==&lt;br /&gt;
* When you have saved a commit but it has not been pushed and you want to get rid of it&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Keep the work done:&lt;br /&gt;
git reset --soft HEAD~1&lt;br /&gt;
&lt;br /&gt;
Destroy all work done and get last full commit&lt;br /&gt;
git reset --hard HEAD~1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Drop changes and pull submodules ==&lt;br /&gt;
* When you have a submodule that is out of whack with the main repo and you cannot figure out which commit it is supposed to be at easily&lt;br /&gt;
&lt;br /&gt;
* reset --hard gets rid of all local changes&lt;br /&gt;
* clean -fdx removes untracked files and directories&lt;br /&gt;
* The submodule update resets the submodule’s HEAD to what the parent repo wants&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cd path/to/submodule&lt;br /&gt;
git reset --hard&lt;br /&gt;
git clean -fdx&lt;br /&gt;
cd ..&lt;br /&gt;
git submodule update --init --recursive --force&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== mis-synced between parent and submodule commits ==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git submodule update --init --recursive --force&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[Category:Git]]&lt;/div&gt;</summary>
		<author><name>Chubbard</name></author>
	</entry>
	<entry>
		<id>https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Git_notes&amp;diff=656</id>
		<title>Git notes</title>
		<link rel="alternate" type="text/html" href="https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Git_notes&amp;diff=656"/>
		<updated>2025-05-16T15:23:39Z</updated>

		<summary type="html">&lt;p&gt;Chubbard: /* Nuke a bad commit before push */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=== General git commands that are useful ===&lt;br /&gt;
== Change git remote:==&lt;br /&gt;
&lt;br /&gt;
With or without ssh keys added to your user account..&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git remote set-url origin git@gitlab01.iwillfearnoevil.com:monitoring/nmsui.git&lt;br /&gt;
or:&lt;br /&gt;
git remote set-url origin https://gitlab01.iwillfearnoevil.com/monitoring/nmsui.git&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Sort git branches by last commit ==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git for-each-ref --sort=committerdate refs/heads/ --format='%(HEAD) %(color:yellow)%(refname:short)%(color:reset) - %(color:red)%(objectname:short)%(color:reset) - %(contents:subject) - %(authorname) (%(color:green)%(committerdate:relative)%(color:reset))'&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Show git commit hashes for each branch sorted by date ==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git for-each-ref --sort=-committerdate refs/heads/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Do a git diff between two branches ==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git diff &amp;lt;branch&amp;gt;..origin/&amp;lt;branch2&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Update Submodules ==&lt;br /&gt;
[https://stackoverflow.com/questions/1030169/pull-latest-changes-for-all-git-submodules pull-latest-changes-for-all-git-submodules]&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git submodule update --recursive --init&lt;br /&gt;
git submodule update --recursive --remote&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
== Checkout specific commit hash ==&lt;br /&gt;
This can be useful when the hash is not in the expected branch, or when you are in a detached head state&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone [remote_address_here] my_repo&lt;br /&gt;
cd my_repo&lt;br /&gt;
git reset --hard [ENTER HERE THE COMMIT HASH YOU WANT]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Change submodule URL ==&lt;br /&gt;
This is very much a hammer way of doing this.  I have seen elegant ways online, but they seem inconsistent when something goes wrong.  This way is reproducible as far as I am concerned.&lt;br /&gt;
&lt;br /&gt;
This example assumes that directory/ is where your submodule lives.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
edit .gitmodules&lt;br /&gt;
change values&lt;br /&gt;
rm -rf directory/&lt;br /&gt;
git submodule update --init --recursive --remote&lt;br /&gt;
cd into directory/&lt;br /&gt;
git pull whatever submodule branch you need&lt;br /&gt;
cd ..&lt;br /&gt;
git add -A&lt;br /&gt;
git commit -m 'save new submodule changes'&lt;br /&gt;
git push&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Save image in README.md on github.com ==&lt;br /&gt;
Per: [https://stackoverflow.com/questions/14494747/how-to-add-images-to-readme-md-on-github Stackoverflow]&lt;br /&gt;
&lt;br /&gt;
Works:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Very Simple : Can be done using Ctrl + C/V&lt;br /&gt;
&lt;br /&gt;
Most of the answers here directly or indirectly involve uploading the image somewhere else &amp;amp; then providing a link to it.&lt;br /&gt;
&lt;br /&gt;
It can be done very simply by just copying any image and pasting it while editing Readme.md&lt;br /&gt;
&lt;br /&gt;
    Copying the image - You can just click on the image file and use Ctrl + C or may copy the screenshot image to your clipboard using the snipping tool&lt;br /&gt;
    You can then simply do Ctrl + V while editing Readme.md&lt;br /&gt;
&lt;br /&gt;
GitHub will automatically upload it to user-images.githubusercontent.com and a link to it will be inserted there&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Change remote URL ==&lt;br /&gt;
[https://stackoverflow.com/questions/2432764/how-do-i-change-the-uri-url-for-a-remote-git-repository Change Git remote URL]&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git remote set-url origin new.git.url/here&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Ignore local changes and pull ==&lt;br /&gt;
When you simply want to start over and nuke all your local mess....&lt;br /&gt;
&lt;br /&gt;
[https://stackoverflow.com/questions/4157189/how-to-git-pull-while-ignoring-local-changes Stackoverflow on doing this]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git fetch --all&lt;br /&gt;
git reset --hard origin/&amp;lt;branch_name&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Clean up local orphan branches ==&lt;br /&gt;
When you have branches locally present that have been merged on the remote server this works to clean out your local repo.&lt;br /&gt;
&lt;br /&gt;
If there are still some that you want to keep, add in a 'grep -v &amp;quot;someName&amp;quot;' before xargs&lt;br /&gt;
&lt;br /&gt;
[https://stackoverflow.com/questions/7726949/remove-tracking-branches-no-longer-on-remote remove tracking branches no longer on remote]&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git branch --merged | grep -v &amp;quot;\*&amp;quot; | xargs -n 1 git branch -d&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Get latest commit hash for scripts ==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git rev-parse HEAD&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Remove submodules of submodules ==&lt;br /&gt;
* This has been a recurring painful process.  This is one solution that has worked for me&lt;br /&gt;
[https://stackoverflow.com/questions/4185365/no-submodule-mapping-found-in-gitmodule-for-a-path-thats-not-a-submodule Stack Overflow discussion on this]&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone --recursive REPO&lt;br /&gt;
cd down/to/submodule&lt;br /&gt;
git checkout branch needed&lt;br /&gt;
rm -f .gitmodules&lt;br /&gt;
git submodule init&lt;br /&gt;
git submodule update&lt;br /&gt;
git rm --cached file/path/with/error/message&lt;br /&gt;
git status&lt;br /&gt;
git add -A&lt;br /&gt;
git commit -m 'Remove submodules of submodules'&lt;br /&gt;
git push&lt;br /&gt;
cd parent repository&lt;br /&gt;
cat .gitmodules&lt;br /&gt;
Verify no references to submodules of submodules&lt;br /&gt;
git submodule status --recursive&lt;br /&gt;
git status&lt;br /&gt;
git add -A&lt;br /&gt;
git commit -m 'Remove references in submodule to nested submodules'&lt;br /&gt;
git push&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Nuke a bad commit before push ==&lt;br /&gt;
* When you have saved a commit but it has not been pushed and you want to get rid of it&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Keep the work done:&lt;br /&gt;
git reset --soft HEAD~1&lt;br /&gt;
&lt;br /&gt;
Destroy all work done and get last full commit&lt;br /&gt;
git reset --hard HEAD~1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Drop changes and pull submodules ==&lt;br /&gt;
* When you have a submodule that is out of whack with the main repo and you cannot figure out which commit it is supposed to be at easily&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cd path/to/submodule&lt;br /&gt;
git reset --hard&lt;br /&gt;
git clean -fdx&lt;br /&gt;
cd ..&lt;br /&gt;
git submodule update --init --recursive --force&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== mis-synced between parent and submodule commits ==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git submodule update --init --recursive --force&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[Category:Git]]&lt;/div&gt;</summary>
		<author><name>Chubbard</name></author>
	</entry>
	<entry>
		<id>https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Git_notes&amp;diff=655</id>
		<title>Git notes</title>
		<link rel="alternate" type="text/html" href="https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Git_notes&amp;diff=655"/>
		<updated>2025-05-02T17:59:07Z</updated>

		<summary type="html">&lt;p&gt;Chubbard: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=== General git commands that are useful ===&lt;br /&gt;
== Change git remote:==&lt;br /&gt;
&lt;br /&gt;
With or without ssh keys added to your user account..&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git remote set-url origin git@gitlab01.iwillfearnoevil.com:monitoring/nmsui.git&lt;br /&gt;
or:&lt;br /&gt;
git remote set-url origin https://gitlab01.iwillfearnoevil.com/monitoring/nmsui.git&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Sort git branches by last commit ==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git for-each-ref --sort=committerdate refs/heads/ --format='%(HEAD) %(color:yellow)%(refname:short)%(color:reset) - %(color:red)%(objectname:short)%(color:reset) - %(contents:subject) - %(authorname) (%(color:green)%(committerdate:relative)%(color:reset))'&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Show git commit hashes for each branch sorted by date ==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git for-each-ref --sort=-committerdate refs/heads/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Do a git diff between two branches ==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git diff &amp;lt;branch&amp;gt;..origin/&amp;lt;branch2&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Update Submodules ==&lt;br /&gt;
[https://stackoverflow.com/questions/1030169/pull-latest-changes-for-all-git-submodules pull-latest-changes-for-all-git-submodules]&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git submodule update --recursive --init&lt;br /&gt;
git submodule update --recursive --remote&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
== Checkout specific commit hash ==&lt;br /&gt;
This can be useful when the hash is not in the expected branch, or when you are in a detached head state&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone [remote_address_here] my_repo&lt;br /&gt;
cd my_repo&lt;br /&gt;
git reset --hard [ENTER HERE THE COMMIT HASH YOU WANT]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Change submodule URL ==&lt;br /&gt;
This is very much a hammer way of doing this.  I have seen elegant ways online, but they seem inconsistent when something goes wrong.  This way is reproducible as far as I am concerned.&lt;br /&gt;
&lt;br /&gt;
This example assumes that directory/ is where your submodule lives.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# Edit .gitmodules and change the values&lt;br /&gt;
rm -rf directory/&lt;br /&gt;
git submodule update --init --recursive --remote&lt;br /&gt;
cd directory/&lt;br /&gt;
git pull origin [submodule_branch_you_need]&lt;br /&gt;
cd ..&lt;br /&gt;
git add -A&lt;br /&gt;
git commit -m 'save new submodule changes'&lt;br /&gt;
git push&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Save image in README.md on github.com ==&lt;br /&gt;
Per: [https://stackoverflow.com/questions/14494747/how-to-add-images-to-readme-md-on-github Stackoverflow]&lt;br /&gt;
&lt;br /&gt;
Works:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Very Simple : Can be done using Ctrl + C/V&lt;br /&gt;
&lt;br /&gt;
Most of the answers here directly or indirectly involve uploading the image somewhere else &amp;amp; then providing a link to it.&lt;br /&gt;
&lt;br /&gt;
It can be done very simply by just copying any image and pasting it while editing Readme.md&lt;br /&gt;
&lt;br /&gt;
    Copying the image - You can just click on the image file and use Ctrl + C or may copy the screenshot image to your clipboard using the snipping tool&lt;br /&gt;
    You can then simply do Ctrl + V while editing Readme.md&lt;br /&gt;
&lt;br /&gt;
GitHub will automatically upload it to user-images.githubusercontent.com and a link to it will be inserted there&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Change remote URL ==&lt;br /&gt;
[https://stackoverflow.com/questions/2432764/how-do-i-change-the-uri-url-for-a-remote-git-repository Change Git remote URL]&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git remote set-url origin new.git.url/here&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Ignore local changes and pull ==&lt;br /&gt;
When you simply want to start over and nuke all your local mess....&lt;br /&gt;
&lt;br /&gt;
[https://stackoverflow.com/questions/4157189/how-to-git-pull-while-ignoring-local-changes Stackoverflow on doing this]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git fetch --all&lt;br /&gt;
git reset --hard origin/&amp;lt;branch_name&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Clean up local orphan branches ==&lt;br /&gt;
When you have local branches that have already been merged on the remote server, this cleans them out of your local repo.  Note that git branch --merged lists branches merged into the branch you currently have checked out.&lt;br /&gt;
&lt;br /&gt;
If there are still some that you want to keep, add in a 'grep -v &amp;quot;someName&amp;quot;' before xargs&lt;br /&gt;
&lt;br /&gt;
[https://stackoverflow.com/questions/7726949/remove-tracking-branches-no-longer-on-remote remove tracking branches no longer on remote]&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git branch --merged | grep -v &amp;quot;\*&amp;quot; | xargs -n 1 git branch -d&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
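&lt;br /&gt;
The grep -v idea mentioned above can be sketched as a small filter.  This is only a sketch: filter_merged is a hypothetical helper name, and the branch names in the usage line are assumptions.  It reads the output of git branch --merged on stdin, drops the current branch, and drops any branch names you want to keep.&lt;br /&gt;

```shell
# filter_merged is a hypothetical helper (not a git command).
# stdin: output of "git branch --merged"
# $1:    extended regex of branch names to keep, for example 'main|develop'
filter_merged() {
  keep="$1"
  # Drop the current branch (marked with *), trim indentation,
  # then drop any branch whose full name matches the keep list.
  grep -v '\*' | sed 's/^[[:space:]]*//' | grep -Ev "^(${keep})\$"
}
```

Usage, assuming main and develop are the branches to keep: git branch --merged | filter_merged 'main|develop' | xargs -n 1 git branch -d&lt;br /&gt;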
&lt;br /&gt;
== Get latest commit hash for scripts ==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git rev-parse HEAD&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
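&lt;br /&gt;
A minimal sketch of using this inside a script.  The variable names and the build- prefix are assumptions for illustration; --verify and --short are standard git rev-parse options, and the fallback covers running outside of a repository.&lt;br /&gt;

```shell
# Capture the abbreviated hash of HEAD; fall back to "unknown" when this is
# not a git repository (or the repository has no commits yet).
commit_hash="$(git rev-parse --verify --short HEAD 2>/dev/null || echo unknown)"

# "build-" is just an illustrative prefix for a version label.
build_label="build-${commit_hash}"
echo "${build_label}"
```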
&lt;br /&gt;
== Remove submodules of submodules ==&lt;br /&gt;
* This has been a recurring painful process.  This is one solution that has worked for me&lt;br /&gt;
[https://stackoverflow.com/questions/4185365/no-submodule-mapping-found-in-gitmodule-for-a-path-thats-not-a-submodule Stack Overflow discussion on this]&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone --recursive REPO&lt;br /&gt;
cd down/to/submodule&lt;br /&gt;
git checkout [branch_needed]&lt;br /&gt;
rm -f .gitmodules&lt;br /&gt;
git submodule init&lt;br /&gt;
git submodule update&lt;br /&gt;
git rm --cached file/path/with/error/message&lt;br /&gt;
git status&lt;br /&gt;
git add -A&lt;br /&gt;
git commit -m 'Remove submodules of submodules'&lt;br /&gt;
git push&lt;br /&gt;
cd [parent_repository]&lt;br /&gt;
cat .gitmodules&lt;br /&gt;
# Verify no references to submodules of submodules&lt;br /&gt;
git submodule status --recursive&lt;br /&gt;
git status&lt;br /&gt;
git add -A&lt;br /&gt;
git commit -m 'Remove references in submodule to nested submodules'&lt;br /&gt;
git push&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Nuke a bad commit before push ==&lt;br /&gt;
* When you have saved a commit but it has not been pushed and you want to get rid of it&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# Keep the work done (changes stay staged):&lt;br /&gt;
git reset --soft HEAD~1&lt;br /&gt;
&lt;br /&gt;
# Destroy all work done and go back to the previous commit:&lt;br /&gt;
git reset --hard HEAD~1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[Category:Git]]&lt;/div&gt;</summary>
		<author><name>Chubbard</name></author>
	</entry>
	<entry>
		<id>https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Main_Page&amp;diff=654</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Main_Page&amp;diff=654"/>
		<updated>2025-04-21T18:31:49Z</updated>

		<summary type="html">&lt;p&gt;Chubbard: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;strong&amp;gt;Fault Management notes, thoughts, and example code&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
My code is finally up on github.com!  Right now I am squishing bugs that I am finding which are not critical issues, but look bad.  Additionally I have begun the Event Correlation Engine (ECE) work for the different dashboards.  I still have not gotten to an installer, or added the seed data for the database.  I will be focusing on that soon.  I want to get a little more functionality in place to prove the tool is useful before focusing on this part.  Additionally I have had some thoughts on how to templatize some of the checks better.  I am still kinda kicking the idea around right now to see what kind of weak points I will run into, but the basic idea is along the lines of give me an SNMP table, and if the returned data is numeric, graph/save it.  If args are passed for what we care about, parse and event on what is found, all at the template level.  I think that I can build a simple skeleton for this where just about anything can be added for the check by doing this.  String compare, existence, or number comparison.  I kinda like this idea, since we can get a lot of data, but 99% of the time we only care about specific parts of a given table.  I would like to make this something that can be created from the web page if possible, but I am not sure how ugly that would get.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I am mainly updating this wiki for the NMS work I am doing, and attempting to get some more exposure in the greater world for assistance on code.  I am not a developer, just someone who is frustrated at a lack of good tools.  I will be adding the NMS to GitHub in the future as two separate repos.  The first one will be the API server.  The second is the front-end UI portion.&lt;br /&gt;
&lt;br /&gt;
The backend is written in PHP, and has recently been migrated from PHP7.4 to 8.1.  I used the Slim4 skeleton as the base of the application.&lt;br /&gt;
&lt;br /&gt;
The frontend uses Bootstrap, PHP, and JavaScript.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 04-21-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Still running 2X2 on server.  Load Average consistent 1.10 15 min, and peaks 3.50 on 1min when iteration starts up&lt;br /&gt;
* Not been focused on this too much lately.  Continuing to try to get Devices to look the way they do in my head...&lt;br /&gt;
* Still eating Crayons...&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 03-26-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Still working on UX, as I know I do not make pretty pages&lt;br /&gt;
* Looks a little better on the branch I am working on, but still not amazing or great... Sigh...&lt;br /&gt;
* I should stick to eating Crayons instead of trying to use them to make something good looking...&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 01-02-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Currently working on getting the UI to not look like a first time high school project&lt;br /&gt;
* Minor bug squishing&lt;br /&gt;
* Investigating template rework to better calculate percentages to store in metrics&lt;br /&gt;
* Beginning to investigate storing metric data in influxdb&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 10-30-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Currently hitting performance limits with 2 core/2G RAM with 61 servers.  Server getting laggy.&lt;br /&gt;
* Will recommend 2x2 for setups with &amp;lt; 50 servers.&lt;br /&gt;
* Testing 4x2 for &amp;gt; 51 and &amp;lt; 125 servers.  This is not a RAM-intensive app; it is more thread and core bound.  After that, drive IO is going to become the bottleneck before RAM.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 10-15-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Purchased ChartJS license (I may hate JS, but it makes pretty pics)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 10-08-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Minor bug squishing&lt;br /&gt;
* Testing using evil Javascript to make pretty graphs&lt;br /&gt;
* Going to license CanvasJS for this (even a n00b can figure it out)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 08-18-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Some minor bug fixes, but not too much coding done&lt;br /&gt;
* Beginning to focus on ECE and seeing what a mess it is.  Not happy about it.  Will likely redesign this since it was a skeleton anyway.  This needs to be simple and understandable dammit.&lt;br /&gt;
* Looking at changing frontend UI for main page.&lt;br /&gt;
* Building more templates for standards&lt;br /&gt;
* Double-checking my logic for NRPE or shell commands.  This could use more work.  An NRPE failure caused by something other than the command itself should have discrete alarm values.  Need to think about this more.&lt;br /&gt;
* SNMP checks need to be smarter, and need parsable thresholds on a per-host basis, not a per-check basis&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 04-16-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Still squishy-squishy on bugs&lt;br /&gt;
* Focusing on stability and clean UI&lt;br /&gt;
&lt;br /&gt;
Current Loads: 47 devices&lt;br /&gt;
* 2 core 2 GB RAM&lt;br /&gt;
* Load averages consistent: 0.70,1.5,1.6&lt;br /&gt;
* From this, with an average number of checks ( ~10 per host ), reliable monitoring can be done on 50 devices with decent results and minimal hardware for small environments.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update on status: 03-21-2024 &amp;lt;/strong&amp;gt;&lt;br /&gt;
* Many Many squishes of bugs&lt;br /&gt;
* Reporting engine and templates much more usable&lt;br /&gt;
* More documentation links for application with base examples written&lt;br /&gt;
* Initial Event Correlation Engine (ECE) rules written&lt;br /&gt;
* Some pages written for ECE&lt;br /&gt;
* Did I mention squishing bugs?&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update on status: 02-11-2024 &amp;lt;/strong&amp;gt;&lt;br /&gt;
* Upgraded API to use PHP 8.2, not 8.1.&lt;br /&gt;
* GUI code was sanitized and released to github&lt;br /&gt;
* [https://github.com/Guyverix/Vigilare-NMS-GUI Vigilare GUI]&lt;br /&gt;
* [https://github.com/Guyverix/Vigilare-NMS-API Vigilare API]&lt;br /&gt;
* [https://github.com/Guyverix/Vigilare-NMS-POLLER Vigilare Pollers]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The focus of this wiki is for notes and gotchas for things relating to technology.  It is mainly focused on the fault management tool that I am writing.  However oddball things I have found or commonly forget since I rarely use them are also present.  I do not go too in depth on the notes or what I find.  In general it is more a quick summary and if reasonable an example that shows what the result is.  I suspect that as time passes it will become more focused on the tool that I am writing.  However there will likely be oddball stuff thrown in here as well that does not have to do with fault management at all....&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt;Overall idea&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I believe that fault management is commonly very much overlooked in many companies and more of a bolt-on after they have been embarrassed by an outage or network event that a customer noticed.  This feels backwards to me.  Fault management really should be one of the first things put into place and grow with your software and network over time.  Being able to tell at a glance the &amp;quot;truth&amp;quot; of your network is almost as critical as the network running itself!  Customer monitoring isn't monitoring and simply makes you look bad.  If a customer or end user has to tell you something is down, you have failed.  If it happens more than once then that should set off all kinds of warnings to technicians.  This should not happen.  The problem happens in my opinion due to management seeing monitoring as a time suck that does not really return money on their investment of time.  The only time they care is when they get embarrassed, and after that they lose interest until they are embarrassed again.  This is stupid.  A company or technician must be able to say with confidence that their applications are working AND PROVE IT.  If you can't prove anything it is just a hope that things are working as designed.  Until you know, you simply are guessing on the health and availability of an application.&lt;br /&gt;
&lt;br /&gt;
I have noticed a trend recently that companies are going to what I call &amp;quot;implied&amp;quot; monitoring.  More looking at metrics than active validation of an application or service.  While this does somewhat fit the bill for monitoring it does a major disservice to the technicians that need to support the application.  Usually this kind of monitoring will cry when something is down but does not tell you what, or where the issue happens.  Only that it is happening.  It is also usually slower to report the failures.  That means more time for a customer to find issues than the technician.  Parsing logs, and dropping them in search indexes, or forwarding matches as events are all very useful, but they are not really watching for specific application issues at the host level.  I believe that an organization must work up to this kind of monitoring.  It must be built on the basics.  If you do not have the basics in place, then the advanced monitoring is much less useful for an org.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt;&lt;br /&gt;
My common way to approach monitoring is to start as fundamental as possible.  On Linux based systems this is the way that has given me the best results.&lt;br /&gt;
&lt;br /&gt;
* daemon&lt;br /&gt;
** dead daemon, well dead app :)&lt;br /&gt;
* port&lt;br /&gt;
** zombie processes holding a port, or different daemons attempting to use the same port is bad&lt;br /&gt;
* challenge and response.&lt;br /&gt;
** Verify which app has control of the port (did you get an http, email, ssh header?  Did it respond at all?)&lt;br /&gt;
* performance&lt;br /&gt;
** This is the n+1 point.  After confirming the application is running, NOW is the time to verify it is at a basic level performant.&lt;br /&gt;
* log and event parsing&lt;br /&gt;
** In depth log parsing, and event correlations.  Without the above this is much less useful to a technician.&lt;br /&gt;
*** I have seen in RARE cases this done well, where it states what failed and which host.  However that is much less likely to happen as the logs parsed do not always state which application is at fault, only that there is one present.&lt;br /&gt;
&amp;lt;/strong&amp;gt;&lt;br /&gt;
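&lt;br /&gt;
The challenge and response step above can be sketched as a tiny classifier for the first line a port sends back.  This is only an illustration under stated assumptions: classify_banner is a hypothetical helper, and the patterns cover just the ssh, http, and email examples mentioned in the list.&lt;br /&gt;

```shell
# classify_banner is a hypothetical helper: given the first line read from a
# port, report which kind of service appears to own that port.
classify_banner() {
  case "$1" in
    SSH-*)   echo ssh ;;      # ssh servers identify themselves first, e.g. SSH-2.0-...
    HTTP/*)  echo http ;;     # http status line, e.g. HTTP/1.1 200 OK
    220*)    echo smtp ;;     # SMTP greeting code (FTP also greets with 220, so the port number matters)
    "")      echo none ;;     # nothing came back at all
    *)       echo unknown ;;  # something answered, but not what we expected
  esac
}
```

Anything other than the expected answer for that port (including none) would be treated as a failed check, in line with assuming failure until the application proves otherwise.&lt;br /&gt;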
&lt;br /&gt;
How you get the answers is less important than getting the answers, with one gotcha.  There is no reason to degrade the host with a complex and slow service check.  If the validation cannot be done cheaply, then it is likely something that should be broken down into more basic pieces.  Killing your servers or pod with service checks is exactly backwards of what you need.  Simple, fast, and accurate are what you need to focus on.  Inaccurate data is WORSE than no data.  Another useful trick is to assume failure until the application proves that it is doing what you expect.  I always try to avoid a bias of assuming something works until it can prove it in a service check.  Until that time, it is only your opinion that things are working.&lt;br /&gt;
&lt;br /&gt;
Historically this has been with SNMP and NRPE service checks on a common 5 minute iteration cycle.  This allows for scale of many checks across a fleet of hosts within the cycle, as well as a decent starting point for monitoring hosts.  I realize that some technicians prefer a faster cycle time, but at the end of the day, you are still talking about human response times and investigation times.  Getting things like sub-minute reporting does no good at all to a technician who is troubleshooting an issue.  Additionally, fast cycle times do not really lend themselves well to the concept of retry.  You will lose packets from time to time, your application will do weird crap from time to time.  That's just fact.  Make sure your monitoring does not lose its mind and scream the world is burning due to a transient packet-loss issue.  Using something like retry will make it more likely you are picking up a legitimate event and not a transient failure.&lt;br /&gt;
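&lt;br /&gt;
The retry idea can be sketched in a few lines of shell.  This is an illustration, not how any particular NMS or NRPE does it: check_with_retry is a hypothetical helper, and the attempt count and delay are assumptions to tune for your own cycle time.&lt;br /&gt;

```shell
# check_with_retry is a hypothetical helper.
# $1: maximum number of attempts
# $2: seconds to wait between attempts
# remaining arguments: the check command to run
check_with_retry() {
  attempts="$1"
  delay="$2"
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0          # one success is enough: earlier failures were transient
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"    # pause before the next attempt
    fi
    i=$((i + 1))
  done
  return 1              # every attempt failed: more likely a legitimate event
}
```

Usage would look like check_with_retry 3 5 some_check_command, where some_check_command stands in for whatever service check you already run.  Only when all three attempts fail does the check report a failure.&lt;br /&gt;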
&lt;br /&gt;
Every time there is an &amp;quot;Outage&amp;quot; or severe service degrade, one of the first things that should be brought up is did the monitoring catch this issue?  Was it actually granular enough to state what the issue was, or just a side effect of the issue?  Getting an alarm for website down vs. database down are two very different things.  Both will imply a 100% outage, but usually one is faster to repair than the other.  Also, looking at the wrong things (say, the webserver when the database is toast) simply slows down the response for getting a database back online.&lt;br /&gt;
&lt;br /&gt;
== History of this wiki ==&lt;br /&gt;
&amp;lt;strong&amp;gt;MediaWiki destination for random notes and examples I do not want to lose or forget.&amp;lt;/strong&amp;gt;&lt;br /&gt;
* The site is for myself and friends who commonly use bash and other utilities and can have a one-stop-shop to find that oddball thing that was found six months ago and vaguely remembered.  The site overall is not for the general public, however if you make it in here feel free to browse.  &lt;br /&gt;
* Keep in mind however I do have security measures in place and poking hard at stuff will block you from the domain for 30 days +  if you hammer really hard.&lt;br /&gt;
* '''If someone wants access to actually ADD information in here please sign up for an account, and I will likely grant access.'''  I do try to set everything into some kind of category for easier searches as well as an attempt to keep this somewhat organized.  Whenever possible (or I remember to do it) I do try to link to the original sources of the information.  They are usually from SE, or other Q&amp;amp;A sites, so some of the comments are useful as well.&lt;br /&gt;
* This is not wikipedia, likely the site is not going to be polished, since it is more of a catchall wiki on doing different things.  There is not going to be too much rhyme or reason on what is posted on this wiki.  Overall if it is something that I have had to do more than once and look up every time, I will have notes on it in here so I will not have to search next time.&lt;br /&gt;
* There will be times that there are references to personal servers on my network, or oddball hostnames that are tied to iwillfearnoevil.com.  It is unlikely that access will be granted to those hosts unless there is a very specific reason to do so.  So don't bother asking :P&lt;br /&gt;
* '''Category NMS has notes on my progress of my NMS design, and thoughts on monitoring overall'''    [https://wiki.iwillfearnoevil.com/mediawiki/index.php/Category:NMS Category NMS]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Misc Notes on using Mediawiki ==&lt;br /&gt;
Consult the [https://www.mediawiki.org/wiki/Special:MyLanguage/Help:Contents User's Guide] for information on using the wiki software.&lt;br /&gt;
&lt;br /&gt;
== Getting started ==&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:Configuration_settings Configuration settings list]&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:FAQ MediaWiki FAQ]&lt;br /&gt;
* [https://lists.wikimedia.org/postorius/lists/mediawiki-announce.lists.wikimedia.org/ MediaWiki release mailing list]&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Localisation#Translation_resources Localise MediaWiki for your language]&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:Combating_spam Learn how to combat spam on your wiki]&lt;br /&gt;
[[Category:General]]&lt;/div&gt;</summary>
		<author><name>Chubbard</name></author>
	</entry>
	<entry>
		<id>https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Category:Ssh&amp;diff=653</id>
		<title>Category:Ssh</title>
		<link rel="alternate" type="text/html" href="https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Category:Ssh&amp;diff=653"/>
		<updated>2025-04-15T20:27:11Z</updated>

		<summary type="html">&lt;p&gt;Chubbard: Created page with &amp;quot;Generic &amp;quot;anything&amp;quot; related specifically to ssh and ssh services&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Generic &amp;quot;anything&amp;quot; related specifically to ssh and ssh services&lt;/div&gt;</summary>
		<author><name>Chubbard</name></author>
	</entry>
	<entry>
		<id>https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Ssh&amp;diff=652</id>
		<title>Ssh</title>
		<link rel="alternate" type="text/html" href="https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Ssh&amp;diff=652"/>
		<updated>2025-04-15T17:37:06Z</updated>

		<summary type="html">&lt;p&gt;Chubbard: Created page with &amp;quot;== SSH examples for abnormal tasks == Yes, I can never remember exactly how to get the !#$!@# port forwarding done correctly.  Bah!  ===SSH port forwarding for postgres=== * Accounting for possibility of using abnormal ssh ports and strange ssh keys. * Connect pgadmin to localhost:5432 and set your authentication  &amp;lt;pre&amp;gt; export KEY=~/.ssh/some_key export PORT=2345 export JUMPBOX=192.168.15.58 ssh -i ${KEY} -p ${PORT} -L 5432:192.168.15.250:5432 SSH_USER@$${JUMPBOX} -N &amp;lt;/p...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== SSH examples for abnormal tasks ==&lt;br /&gt;
Yes, I can never remember exactly how to get the !#$!@# port forwarding done correctly.  Bah!&lt;br /&gt;
&lt;br /&gt;
===SSH port forwarding for postgres===&lt;br /&gt;
* Accounting for possibility of using abnormal ssh ports and strange ssh keys.&lt;br /&gt;
* Connect pgadmin to localhost:5432 and set your authentication&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
export KEY=~/.ssh/some_key&lt;br /&gt;
export PORT=2345&lt;br /&gt;
export JUMPBOX=192.168.15.58&lt;br /&gt;
ssh -i ${KEY} -p ${PORT} -L 5432:192.168.15.250:5432 SSH_USER@${JUMPBOX} -N&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
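&lt;br /&gt;
As a reminder of how the -L argument is laid out (local port first, then the target host and port as seen from the jumpbox), here is a small sketch.  forward_spec is a hypothetical helper name; the addresses are the ones from the example above.&lt;br /&gt;

```shell
# forward_spec is a hypothetical helper that assembles the argument for ssh -L.
# $1: local port to listen on (what pgadmin connects to on localhost)
# $2: target host, resolved from the jumpbox side of the tunnel
# $3: target port on that host
forward_spec() {
  printf '%s:%s:%s\n' "$1" "$2" "$3"
}

# Prints the same spec used in the example above.
forward_spec 5432 192.168.15.250 5432
```

The result can be dropped straight into the command, for example: ssh -i "${KEY}" -p "${PORT}" -L "$(forward_spec 5432 192.168.15.250 5432)" "SSH_USER@${JUMPBOX}" -N&lt;br /&gt;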
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[Category:Bash]][[Category:Ssh]]&lt;/div&gt;</summary>
		<author><name>Chubbard</name></author>
	</entry>
	<entry>
		<id>https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Bash-ps-examples&amp;diff=651</id>
		<title>Bash-ps-examples</title>
		<link rel="alternate" type="text/html" href="https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Bash-ps-examples&amp;diff=651"/>
		<updated>2025-04-10T16:35:10Z</updated>

		<summary type="html">&lt;p&gt;Chubbard: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Generic Page of PS options ==&lt;br /&gt;
&lt;br /&gt;
=== threads from process ===&lt;br /&gt;
See the threads for a given process&lt;br /&gt;
* [https://serverfault.com/questions/932406/how-to-tell-threads-from-processes-in-top-and-ps-on-linux show threads from process]&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ps aux |grep processName&lt;br /&gt;
ps -fly -T -p PID&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
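&lt;br /&gt;
On Linux, a quick cross-check for the ps -T output is to count the entries under /proc/PID/task, one per thread.  This is a sketch that assumes a Linux /proc filesystem; thread_count is a hypothetical helper name.&lt;br /&gt;

```shell
# thread_count is a hypothetical helper; it assumes a Linux /proc filesystem.
# Each thread of a process has its own directory under /proc/PID/task.
thread_count() {
  ls "/proc/$1/task" | wc -l
}

# Count the threads of the current shell.
thread_count "$$"
```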
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[Category:Bash]]&lt;/div&gt;</summary>
		<author><name>Chubbard</name></author>
	</entry>
	<entry>
		<id>https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Bash-ps-examples&amp;diff=650</id>
		<title>Bash-ps-examples</title>
		<link rel="alternate" type="text/html" href="https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Bash-ps-examples&amp;diff=650"/>
		<updated>2025-04-10T16:34:59Z</updated>

		<summary type="html">&lt;p&gt;Chubbard: Created page with &amp;quot;== Generic Page of PS options ==  === threads from process === See the threads for a given process * [https://serverfault.com/questions/932406/how-to-tell-threads-from-processes-in-top-and-ps-on-linux show threads from process] &amp;lt;pre&amp;gt; ps aux |grep processName ps -fly -T -p PID &amp;lt;/pre&amp;gt;&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Generic Page of PS options ==&lt;br /&gt;
&lt;br /&gt;
=== threads from process ===&lt;br /&gt;
See the threads for a given process&lt;br /&gt;
* [https://serverfault.com/questions/932406/how-to-tell-threads-from-processes-in-top-and-ps-on-linux show threads from process]&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ps aux |grep processName&lt;br /&gt;
ps -fly -T -p PID&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;/div&gt;</summary>
		<author><name>Chubbard</name></author>
	</entry>
	<entry>
		<id>https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Postgres&amp;diff=649</id>
		<title>Postgres</title>
		<link rel="alternate" type="text/html" href="https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Postgres&amp;diff=649"/>
		<updated>2025-04-10T16:29:58Z</updated>

		<summary type="html">&lt;p&gt;Chubbard: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Postgres Notes ==&lt;br /&gt;
I generally use MySQL, however there have been cases where I need to use Postgres.  I can never remember the exact syntax to do basic things, so here we are...&lt;br /&gt;
&lt;br /&gt;
=== Create User and Database ===&lt;br /&gt;
Admin login to Postgres (fresh install Pg14)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sudo -u postgres psql postgres&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Create User&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
CREATE ROLE someUser LOGIN PASSWORD 'somePassword';&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Create database and add someUser as the owner&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
CREATE DATABASE databaseName with owner = someUser;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Validate that this worked&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
psql -h localhost -d databaseName -U someUser -p 5432&lt;br /&gt;
Password for user someUser: somePassword&lt;br /&gt;
psql (14.11 (Ubuntu 14.11-0ubuntu0.22.04.1))&lt;br /&gt;
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)&lt;br /&gt;
Type &amp;quot;help&amp;quot; for help.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Useful commands to remember ===&lt;br /&gt;
* \q   quit&lt;br /&gt;
* \dt  display tables&lt;br /&gt;
* \l   list databases&lt;br /&gt;
&lt;br /&gt;
=== Migrate from MySQL to Psql ===&lt;br /&gt;
Install the pgloader application&lt;br /&gt;
&lt;br /&gt;
Edit script.lisp&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
/* content of the script.lisp */&lt;br /&gt;
LOAD DATABASE&lt;br /&gt;
FROM mysql://mysqlUser:mysqlPassword@localhost|IP/oldDatabaseName&lt;br /&gt;
INTO postgresql://someUser:somePassword@localhost/databaseName;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Run: pgloader script.lisp&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
2024-04-03T18:56:34.028000Z LOG pgloader version &amp;quot;3.6.3~devel&amp;quot;&lt;br /&gt;
2024-04-03T18:56:34.036000Z LOG Data errors in '/tmp/pgloader/'&lt;br /&gt;
2024-04-03T18:56:34.036000Z LOG Parsing commands from file #P&amp;quot;/home/chubbard/script.lisp&amp;quot;&lt;br /&gt;
2024-04-03T18:56:34.268005Z LOG Migrating from #&amp;lt;MYSQL-CONNECTION mysql://mysqlUser@192.168.15.250:3306/oldDatabaseName {10080B3B03}&amp;gt;&lt;br /&gt;
2024-04-03T18:56:34.272005Z LOG Migrating into #&amp;lt;PGSQL-CONNECTION pgsql://someUser@localhost:5432/databaseName {10080B4783}&amp;gt;&lt;br /&gt;
2024-04-03T18:56:36.244042Z LOG report summary reset&lt;br /&gt;
             table name     errors       rows      bytes      total time&lt;br /&gt;
-----------------------  ---------  ---------  ---------  --------------&lt;br /&gt;
        fetch meta data          0          3                     1.204s&lt;br /&gt;
         Create Schemas          0          0                     0.024s&lt;br /&gt;
       Create SQL Types          0          0                     0.024s&lt;br /&gt;
          Create tables          0          2                     0.076s&lt;br /&gt;
         Set Table OIDs          0          1                     0.020s&lt;br /&gt;
-----------------------  ---------  ---------  ---------  --------------&lt;br /&gt;
     databaseName.state          0          8     0.7 kB          0.084s&lt;br /&gt;
-----------------------  ---------  ---------  ---------  --------------&lt;br /&gt;
COPY Threads Completion          0          4                     0.092s&lt;br /&gt;
 Index Build Completion          0          2                     0.080s&lt;br /&gt;
         Create Indexes          0          2                     0.016s&lt;br /&gt;
        Reset Sequences          0          1                     0.092s&lt;br /&gt;
           Primary Keys          0          1                     0.004s&lt;br /&gt;
    Create Foreign Keys          0          0                     0.000s&lt;br /&gt;
        Create Triggers          0          0                     0.000s&lt;br /&gt;
        Set Search Path          0          1                     0.000s&lt;br /&gt;
       Install Comments          0          0                     0.000s&lt;br /&gt;
-----------------------  ---------  ---------  ---------  --------------&lt;br /&gt;
      Total import time          ✓          8     0.7 kB          0.284s&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Dump a Postgres database ===&lt;br /&gt;
To dump just the schema and indexes&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pg_dump -s databaseName -U someUser -h localhost &amp;gt; databaseName_database_schema_postgres.sql &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Find active queries ===&lt;br /&gt;
See what is actually happening at the time of a query&lt;br /&gt;
[https://stackoverflow.com/questions/27435839/how-to-list-active-connections-on-postgresql List active connections on postgresql]&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
someDb=# SELECT&lt;br /&gt;
someDb-#     pid&lt;br /&gt;
someDb-#     ,datname&lt;br /&gt;
someDb-#     ,usename&lt;br /&gt;
someDb-#     ,application_name&lt;br /&gt;
someDb-#     ,client_hostname&lt;br /&gt;
someDb-#     ,client_port&lt;br /&gt;
someDb-#     ,backend_start&lt;br /&gt;
someDb-#     ,query_start&lt;br /&gt;
someDb-#     ,query&lt;br /&gt;
someDb-#     ,state&lt;br /&gt;
someDb-# FROM pg_stat_activity&lt;br /&gt;
someDb-# WHERE state = 'active';&lt;br /&gt;
 pid  | datname | usename | application_name | client_hostname | client_port |         backend_start         |          query_start          |                  query                  | state&lt;br /&gt;
------+---------+---------+------------------+-----------------+-------------+-------------------------------+-------------------------------+-----------------------------------------+--------&lt;br /&gt;
 9132 | someDb  | someDb  | psql             |                 |          -1 | 2025-04-10 16:23:07.854726+00 | 2025-04-10 16:25:08.334369+00 | SELECT                                 +| active&lt;br /&gt;
      |         |         |                  |                 |             |                               |                               |     pid                                +|&lt;br /&gt;
      |         |         |                  |                 |             |                               |                               |     ,datname                           +|&lt;br /&gt;
      |         |         |                  |                 |             |                               |                               |     ,usename                           +|&lt;br /&gt;
      |         |         |                  |                 |             |                               |                               |     ,application_name                  +|&lt;br /&gt;
      |         |         |                  |                 |             |                               |                               |     ,client_hostname                   +|&lt;br /&gt;
      |         |         |                  |                 |             |                               |                               |     ,client_port                       +|&lt;br /&gt;
      |         |         |                  |                 |             |                               |                               |     ,backend_start                     +|&lt;br /&gt;
      |         |         |                  |                 |             |                               |                               |     ,query_start                       +|&lt;br /&gt;
      |         |         |                  |                 |             |                               |                               |     ,query                             +|&lt;br /&gt;
      |         |         |                  |                 |             |                               |                               |     ,state                             +|&lt;br /&gt;
      |         |         |                  |                 |             |                               |                               | FROM pg_stat_activity                  +|&lt;br /&gt;
      |         |         |                  |                 |             |                               |                               | WHERE state = 'active';                 |&lt;br /&gt;
 5416 |         | someDb  | walreceiver      |                 |       57680 | 2025-04-10 15:38:21.861295+00 | 2025-04-10 15:38:21.874055+00 | START_REPLICATION A/E2000000 TIMELINE 1 | active&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Links ===&lt;br /&gt;
[https://stackoverflow.com/questions/2172569/how-to-login-and-authenticate-to-postgresql-after-a-fresh-install Stack Overflow fresh install notes]&lt;br /&gt;
&lt;br /&gt;
[https://serverfault.com/questions/198002/postgresql-what-does-grant-all-privileges-on-database-do Server Fault Grants and permissions information]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[Category:Postgres]]&lt;/div&gt;</summary>
		<author><name>Chubbard</name></author>
	</entry>
	<entry>
		<id>https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Postgres&amp;diff=648</id>
		<title>Postgres</title>
		<link rel="alternate" type="text/html" href="https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Postgres&amp;diff=648"/>
		<updated>2025-04-10T16:28:58Z</updated>

		<summary type="html">&lt;p&gt;Chubbard: /* Dump a Postgres database */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Postgres Notes ==&lt;br /&gt;
I generally use MySQL, however there have been cases where I need to use Postgres.  I can never remember the exact syntax to do basic things, so here we are...&lt;br /&gt;
&lt;br /&gt;
=== Create User and Database ===&lt;br /&gt;
Admin login to Postgres (fresh install of Pg14)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sudo -u postgres psql postgres&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Create User&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
CREATE ROLE someUser LOGIN PASSWORD 'somePassword';&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Create database and add someUser as the owner&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
CREATE DATABASE databaseName with owner = someUser;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Validate that this worked&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
psql -h localhost -d databaseName -U someUser -p 5432&lt;br /&gt;
Password for user someUser: somePassword&lt;br /&gt;
psql (14.11 (Ubuntu 14.11-0ubuntu0.22.04.1))&lt;br /&gt;
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)&lt;br /&gt;
Type &amp;quot;help&amp;quot; for help.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Useful commands to remember ===&lt;br /&gt;
* \q   quit&lt;br /&gt;
* \dt  display tables&lt;br /&gt;
* \l   list databases&lt;br /&gt;
&lt;br /&gt;
=== Migrate from MySQL to Psql ===&lt;br /&gt;
Install the pgloader application&lt;br /&gt;
&lt;br /&gt;
Edit script.lisp&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
/* content of the script.lisp */&lt;br /&gt;
LOAD DATABASE&lt;br /&gt;
FROM mysql://mysqlUser:mysqlPassword@localhost|IP/oldDatabaseName&lt;br /&gt;
INTO postgresql://someUser:somePassword@localhost/databaseName;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Run: pgloader script.lisp&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
2024-04-03T18:56:34.028000Z LOG pgloader version &amp;quot;3.6.3~devel&amp;quot;&lt;br /&gt;
2024-04-03T18:56:34.036000Z LOG Data errors in '/tmp/pgloader/'&lt;br /&gt;
2024-04-03T18:56:34.036000Z LOG Parsing commands from file #P&amp;quot;/home/chubbard/script.lisp&amp;quot;&lt;br /&gt;
2024-04-03T18:56:34.268005Z LOG Migrating from #&amp;lt;MYSQL-CONNECTION mysql://mysqlUser@192.168.15.250:3306/oldDatabaseName {10080B3B03}&amp;gt;&lt;br /&gt;
2024-04-03T18:56:34.272005Z LOG Migrating into #&amp;lt;PGSQL-CONNECTION pgsql://someUser@localhost:5432/databaseName {10080B4783}&amp;gt;&lt;br /&gt;
2024-04-03T18:56:36.244042Z LOG report summary reset&lt;br /&gt;
             table name     errors       rows      bytes      total time&lt;br /&gt;
-----------------------  ---------  ---------  ---------  --------------&lt;br /&gt;
        fetch meta data          0          3                     1.204s&lt;br /&gt;
         Create Schemas          0          0                     0.024s&lt;br /&gt;
       Create SQL Types          0          0                     0.024s&lt;br /&gt;
          Create tables          0          2                     0.076s&lt;br /&gt;
         Set Table OIDs          0          1                     0.020s&lt;br /&gt;
-----------------------  ---------  ---------  ---------  --------------&lt;br /&gt;
     databaseName.state          0          8     0.7 kB          0.084s&lt;br /&gt;
-----------------------  ---------  ---------  ---------  --------------&lt;br /&gt;
COPY Threads Completion          0          4                     0.092s&lt;br /&gt;
 Index Build Completion          0          2                     0.080s&lt;br /&gt;
         Create Indexes          0          2                     0.016s&lt;br /&gt;
        Reset Sequences          0          1                     0.092s&lt;br /&gt;
           Primary Keys          0          1                     0.004s&lt;br /&gt;
    Create Foreign Keys          0          0                     0.000s&lt;br /&gt;
        Create Triggers          0          0                     0.000s&lt;br /&gt;
        Set Search Path          0          1                     0.000s&lt;br /&gt;
       Install Comments          0          0                     0.000s&lt;br /&gt;
-----------------------  ---------  ---------  ---------  --------------&lt;br /&gt;
      Total import time          ✓          8     0.7 kB          0.284s&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Dump a Postgres database ===&lt;br /&gt;
To dump just the schema and indexes&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pg_dump -s databaseName -U someUser -h localhost &amp;gt; databaseName_database_schema_postgres.sql &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Find active queries ===&lt;br /&gt;
See what is actually happening at the time of a query&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
someDb=# SELECT&lt;br /&gt;
someDb-#     pid&lt;br /&gt;
someDb-#     ,datname&lt;br /&gt;
someDb-#     ,usename&lt;br /&gt;
someDb-#     ,application_name&lt;br /&gt;
someDb-#     ,client_hostname&lt;br /&gt;
someDb-#     ,client_port&lt;br /&gt;
someDb-#     ,backend_start&lt;br /&gt;
someDb-#     ,query_start&lt;br /&gt;
someDb-#     ,query&lt;br /&gt;
someDb-#     ,state&lt;br /&gt;
someDb-# FROM pg_stat_activity&lt;br /&gt;
someDb-# WHERE state = 'active';&lt;br /&gt;
 pid  | datname | usename | application_name | client_hostname | client_port |         backend_start         |          query_start          |                  query                  | state&lt;br /&gt;
------+---------+---------+------------------+-----------------+-------------+-------------------------------+-------------------------------+-----------------------------------------+--------&lt;br /&gt;
 9132 | someDb  | someDb  | psql             |                 |          -1 | 2025-04-10 16:23:07.854726+00 | 2025-04-10 16:25:08.334369+00 | SELECT                                 +| active&lt;br /&gt;
      |         |         |                  |                 |             |                               |                               |     pid                                +|&lt;br /&gt;
      |         |         |                  |                 |             |                               |                               |     ,datname                           +|&lt;br /&gt;
      |         |         |                  |                 |             |                               |                               |     ,usename                           +|&lt;br /&gt;
      |         |         |                  |                 |             |                               |                               |     ,application_name                  +|&lt;br /&gt;
      |         |         |                  |                 |             |                               |                               |     ,client_hostname                   +|&lt;br /&gt;
      |         |         |                  |                 |             |                               |                               |     ,client_port                       +|&lt;br /&gt;
      |         |         |                  |                 |             |                               |                               |     ,backend_start                     +|&lt;br /&gt;
      |         |         |                  |                 |             |                               |                               |     ,query_start                       +|&lt;br /&gt;
      |         |         |                  |                 |             |                               |                               |     ,query                             +|&lt;br /&gt;
      |         |         |                  |                 |             |                               |                               |     ,state                             +|&lt;br /&gt;
      |         |         |                  |                 |             |                               |                               | FROM pg_stat_activity                  +|&lt;br /&gt;
      |         |         |                  |                 |             |                               |                               | WHERE state = 'active';                 |&lt;br /&gt;
 5416 |         | someDb  | walreceiver      |                 |       57680 | 2025-04-10 15:38:21.861295+00 | 2025-04-10 15:38:21.874055+00 | START_REPLICATION A/E2000000 TIMELINE 1 | active&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Links ===&lt;br /&gt;
[https://stackoverflow.com/questions/2172569/how-to-login-and-authenticate-to-postgresql-after-a-fresh-install Stack Overflow fresh install notes]&lt;br /&gt;
&lt;br /&gt;
[https://serverfault.com/questions/198002/postgresql-what-does-grant-all-privileges-on-database-do Server Fault Grants and permissions information]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[Category:Postgres]]&lt;/div&gt;</summary>
		<author><name>Chubbard</name></author>
	</entry>
	<entry>
		<id>https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Ffmpeg_notes&amp;diff=647</id>
		<title>Ffmpeg notes</title>
		<link rel="alternate" type="text/html" href="https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Ffmpeg_notes&amp;diff=647"/>
		<updated>2025-04-09T03:12:56Z</updated>

		<summary type="html">&lt;p&gt;Chubbard: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;ffmpeg when dealing with mp4 that skips and stutters in playback.  Seen in MythTV.&lt;br /&gt;
&lt;br /&gt;
[https://superuser.com/questions/895335/how-to-convert-an-mp4-to-mp4-using-ffmpeg Superuser.com source link]&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ffmpeg -i ./foo.mp4 -c:v libx264 -crf 24 -pix_fmt yuv420p -tune film -c:a aac -b:a 192k -ar 44100 -vol 300 -strict -2 -speed fastest ./bar.mp4&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[https://www.reddit.com/r/ffmpeg/comments/cq9bbv/repairing_avi_file_brokenmissing_index/ Reddit link with more details]&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ffmpeg -i 'input' -map 0 -c copy 'output.mkv'&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[https://www.reddit.com/r/ffmpeg/comments/9gm17g/convert_only_audio_dts_to_ac3_in_a_mkv_file/ Reddit comment]&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ffmpeg -i input.mkv -c:v copy -c:a ac3 -b:a 320k /path/to/usb/output.mkv&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[https://www.technibble.com/forums/threads/guide-to-making-mkv-audio-work-on-your-device-roku-etc.81333/ Remux audio to work for Roku]&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
To get 2 channel:&lt;br /&gt;
the -ac 2 switch is used to tell it to use 2 channels&lt;br /&gt;
ffmpeg -i &amp;quot;National Treasure.mkv&amp;quot; -c:v copy -c:a aac -ac 2 -b:a 256K &amp;quot;National Treasure.mp4&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Convert from H265 to H264&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ffmpeg -xerror -i 'h265/The_Flash.mkv' -hide_banner -threads 2 -map 0 -c:a copy -c:s copy -c:v libx264 -pix_fmt yuv420p 'The_Flash.mkv'&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
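Repair a file where ffmpeg reports that the mp3 header is missing.  Two-step process: extract the audio, then remux it with the original video.  The third command just reads the repaired file back as a check.&lt;br /&gt;
[https://video.stackexchange.com/questions/32718/fix-mp3float-header-missing-without-re-encode Fix mp3 err without a re-encode]&lt;br /&gt;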
&amp;lt;pre&amp;gt;&lt;br /&gt;
ffmpeg -i ./Busou_Renkin_s01e01.avi -c:a copy Clean_Busou_Renkin_s01e01.mp3&lt;br /&gt;
ffmpeg -i ./Busou_Renkin_s01e01.avi -i ./Clean_Busou_Renkin_s01e01.mp3 -c:v copy -c:a copy -map 0:v:0 -map 1:a:0 Repaired_Busou_Renkin_s01e01.avi&lt;br /&gt;
ffmpeg -i Repaired_Busou_Renkin_s01e01.avi&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[Category:ffmpeg]]&lt;/div&gt;</summary>
		<author><name>Chubbard</name></author>
	</entry>
	<entry>
		<id>https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Bash_rename_files&amp;diff=646</id>
		<title>Bash rename files</title>
		<link rel="alternate" type="text/html" href="https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Bash_rename_files&amp;diff=646"/>
		<updated>2025-04-03T21:43:23Z</updated>

		<summary type="html">&lt;p&gt;Chubbard: Created page with &amp;quot;==Rename files that have reserved characters==  When ls is showing that funny diamond-question value in the output or you see a litteral $ in the filename, here is the fix I was able to use... &amp;lt;pre&amp;gt; ls output via -b N\343o_Wave   via ls -i 213764100 'N'$'\343''o_Wave'  # Use the inode value to move the file to something typable find . -maxdepth 1 -inum 213764100 -exec mv {} N_o_Wave \; &amp;lt;/pre&amp;gt;  Category:Bash&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Rename files that have reserved characters==&lt;br /&gt;
&lt;br /&gt;
When ls shows that funny diamond question mark in its output, or you see a literal $ in the filename, here is the fix I was able to use...&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ls output via -b&lt;br /&gt;
N\343o_Wave &lt;br /&gt;
&lt;br /&gt;
via ls -i&lt;br /&gt;
213764100 'N'$'\343''o_Wave'&lt;br /&gt;
&lt;br /&gt;
# Use the inode value to move the file to something typable&lt;br /&gt;
find . -maxdepth 1 -inum 213764100 -exec mv {} N_o_Wave \;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
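A minimal sketch of the same inode trick, run in a scratch directory so nothing real is touched.  The scratch directory and the file it creates are only illustrative, and it assumes a Linux filesystem that accepts arbitrary bytes in filenames.&lt;br /&gt;

```shell
# Scratch directory so the demo never touches real files (illustrative only).
workdir=$(mktemp -d)

# Create a file whose name contains the raw byte \343, like the ls example above.
touch "$workdir/$(printf 'N\343o_Wave')"

# ls -i prints "inode name"; the first field is the inode number.
inode=$(ls -i "$workdir" | awk '{print $1}')

# Rename by inode, so the odd name never has to be typed.
find "$workdir" -maxdepth 1 -type f -inum "$inode" -exec mv {} "$workdir/N_o_Wave" \;

# Record what is left in the directory, then clean up.
renamed=$(ls "$workdir")
printf '%s\n' "$renamed"
rm -rf "$workdir"
```

Looking the inode up with ls -i first means the find command only ever sees a number, which is easy to type and cannot be mangled by the terminal.&lt;br /&gt;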
&lt;br /&gt;
[[Category:Bash]]&lt;/div&gt;</summary>
		<author><name>Chubbard</name></author>
	</entry>
	<entry>
		<id>https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Bash-interesting-command-examples&amp;diff=645</id>
		<title>Bash-interesting-command-examples</title>
		<link rel="alternate" type="text/html" href="https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Bash-interesting-command-examples&amp;diff=645"/>
		<updated>2025-03-27T16:35:23Z</updated>

		<summary type="html">&lt;p&gt;Chubbard: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;====Interesting one-liners====&lt;br /&gt;
* Find all drives and ignore loop devices&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@kvm03:/var/log# lsblk | grep -v &amp;quot;loop\|NAME&amp;quot; | grep &amp;quot;^[a-z]\|^[A-Z]&amp;quot; | awk '{print $1}'&lt;br /&gt;
sda&lt;br /&gt;
sdb&lt;br /&gt;
root@kvm03:/var/log# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@kvm03:/var/log# lsblk | grep disk | awk '{print $1}'&lt;br /&gt;
sda&lt;br /&gt;
sdb&lt;br /&gt;
root@kvm03:/var/log# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Print lines from one match until another match is found&lt;br /&gt;
* This is using awk, and seems quite powerful as a tool&lt;br /&gt;
* found this little gem at [https://unix.stackexchange.com/questions/21076/how-to-show-lines-after-each-grep-match-until-other-specific-match Stack Exchange]&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
awk '/Word A/,/Word D/' filename&lt;br /&gt;
&lt;br /&gt;
/From/CONTINUE/Until/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
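A small runnable sketch of the range pattern, assuming a throwaway sample file whose contents are made up for the demo.  It should print the line matching Word A, everything after it, and stop after the line matching Word D.&lt;br /&gt;

```shell
# Build a small sample file (made-up contents) to demonstrate the range pattern.
sample=$(mktemp)
printf '%s\n' 'header' 'Word A starts here' 'middle line' 'Word D ends here' 'footer' > "$sample"

# Print every line from the first match of /Word A/ through the next
# match of /Word D/, both ends included.
range_output=$(awk '/Word A/,/Word D/' "$sample")
printf '%s\n' "$range_output"

rm -f "$sample"
```

The header and footer lines fall outside the range, so they are skipped.&lt;br /&gt;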
&lt;br /&gt;
* Remove non-English directories&lt;br /&gt;
* change the type to f if you are looking for non-English files&lt;br /&gt;
* ALWAYS test find results before deleting, duh!&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sudo find . -type d -not -name &amp;quot;[a-zA-Z0-9]*&amp;quot; -exec rm -rf {} \;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
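One way to test the find results first is a dry run that swaps the rm for -print.  This is only a sketch: the scratch directory and its folder names are made up, LC_ALL=C is my assumption to keep the bracket ranges to plain ASCII, and -mindepth 1 is added so the starting directory itself is never listed.  Note that hidden directories (leading dot) also match, since the pattern only tests the first character.&lt;br /&gt;

```shell
# Scratch tree with a mix of names (all made up for the demo).
demo=$(mktemp -d)
mkdir "$demo/english" "$demo/2024_logs" "$demo/日本語" "$demo/.hidden"

# -mindepth 1 keeps the starting directory out of the results.
# -print only lists what WOULD be removed; nothing is deleted here.
matches=$(LC_ALL=C find "$demo" -mindepth 1 -type d -not -name "[a-zA-Z0-9]*" -print | sort)
printf '%s\n' "$matches"

rm -rf "$demo"
```

Only once the printed list looks right would the -print be swapped back for the -exec rm -rf.&lt;br /&gt;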
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[Category:Bash]]&lt;/div&gt;</summary>
		<author><name>Chubbard</name></author>
	</entry>
	<entry>
		<id>https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Bash-interesting-command-examples&amp;diff=644</id>
		<title>Bash-interesting-command-examples</title>
		<link rel="alternate" type="text/html" href="https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Bash-interesting-command-examples&amp;diff=644"/>
		<updated>2025-03-27T16:34:26Z</updated>

		<summary type="html">&lt;p&gt;Chubbard: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;====Interesting one-liners====&lt;br /&gt;
* Find all drives and ignore loop devices&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@kvm03:/var/log# lsblk | grep -v &amp;quot;loop\|NAME&amp;quot; | grep &amp;quot;^[a-z]\|^[A-Z]&amp;quot; | awk '{print $1}'&lt;br /&gt;
sda&lt;br /&gt;
sdb&lt;br /&gt;
root@kvm03:/var/log# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
root@kvm03:/var/log# lsblk | grep disk | awk '{print $1}'&lt;br /&gt;
sda&lt;br /&gt;
sdb&lt;br /&gt;
root@kvm03:/var/log# &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Print lines from one match until another match is found&lt;br /&gt;
* This is using awk, and seems quite powerful as a tool&lt;br /&gt;
* found this little gem at [https://unix.stackexchange.com/questions/21076/how-to-show-lines-after-each-grep-match-until-other-specific-match Stack Exchange]&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
awk '/Word A/,/Word D/' filename&lt;br /&gt;
&lt;br /&gt;
/From/CONTINUE/Until/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Remove non-English directories&lt;br /&gt;
* change the type to f if you are looking for non-English files&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sudo find . -type d -not -name &amp;quot;[a-zA-Z0-9]*&amp;quot; -exec rm -rf {} \;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[Category:Bash]]&lt;/div&gt;</summary>
		<author><name>Chubbard</name></author>
	</entry>
	<entry>
		<id>https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Main_Page&amp;diff=643</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Main_Page&amp;diff=643"/>
		<updated>2025-03-26T15:53:44Z</updated>

		<summary type="html">&lt;p&gt;Chubbard: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;strong&amp;gt;Fault Management notes, thoughts, and example code&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
My code is finally up on github.com!  Right now I am squishing bugs that are not critical issues, but look bad.  Additionally I have begun the ECE (event correlation engine) work for the different dashboards.  I still have not gotten to an installer, or added the seed data for the database.  I will be focusing on that soon.  I want to get a little more functionality in place to prove the tool is useful before focusing on this part.  Additionally I have had some thoughts on how to templatize some of the checks better.  I am still kinda kicking the idea around right now to see what kind of weak points I will run into, but the basic idea is along the lines of: give me an SNMP table, and if the return data is numeric, graph/save it.  If args are passed for what we care about, parse and event on what is found, all at the template level.  I think that I can build a simple skeleton for this where just about any check can be added this way: string compare, existence, or number comparison.  I kinda like this idea, since we can get a lot of data, but 99% of the time we only care about specific parts of a given table.  I would like to make this something that can be created from the web page if possible, but I am not sure how ugly that would get.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I am mainly updating this wiki for the NMS work I am doing, and attempting to get some more exposure in the greater world for assistance on code.  I am not a developer, just someone who is frustrated at a lack of good tools.  I will be adding the NMS to GitHub in the future as two separate repos.  The first one will be the API server.  The second is the front-end UI portion.  &lt;br /&gt;
&lt;br /&gt;
The backend is written in PHP, and has recently been migrated from PHP7.4 to 8.1.  I used the Slim4 skeleton as the base of the application.&lt;br /&gt;
&lt;br /&gt;
The frontend is using Bootstrap, PHP and JavaScript.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 03-26-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Still working on UX, as I know I do not make pretty pages&lt;br /&gt;
* Looks a little better on the branch I am working on, but still not amazing or great... Sigh...&lt;br /&gt;
* I should stick to eating Crayons instead of trying to use them to make something good looking...&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 01-02-25&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Currently working on getting the UI to not look like a first time high school project&lt;br /&gt;
* Minor bug squishing&lt;br /&gt;
* Investigating template rework to better calculate percentages to store in metrics&lt;br /&gt;
* Beginning to investigate storing metric data in influxdb&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 10-30-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Currently hitting performance limits with 2 core/2G RAM with 61 servers.  Server getting laggy..&lt;br /&gt;
* For setups of &amp;lt; 50 servers, 2x2 is fine.&lt;br /&gt;
* &amp;gt; 51 and &amp;lt; 125 is being tested as 4x2.  This is not a RAM intense app, it is more thread and core bound.  After that, drive IO is going to become the bottleneck before RAM.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 10-15-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Purchased ChartJS license (I may hate JS, but it makes pretty pics)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 10-08-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Minor bug squishing&lt;br /&gt;
* Testing using evil Javascript to make pretty graphs&lt;br /&gt;
* Going to license CanvasJS for this (even a n00b can figure it out)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 08-18-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Some minor bug fixes, but not too much coding done&lt;br /&gt;
* Beginning to focus on ECE and seeing what a mess it is.  Not happy about it.  Will likely redesign this since it was a skeleton anyway.  This needs to be simple and understandable dammit.&lt;br /&gt;
* Looking at changing frontend UI for main page.  &lt;br /&gt;
* Building more templates for standards&lt;br /&gt;
* Double-checking my logic for NRPE or shell commands.  This could use more work.  NRPE failing for a reason other than the command itself should have discrete alarm values.  Need to think about this more.&lt;br /&gt;
* SNMP checks need to be smarter, and need parsable thresholds on a per host basis, not a per check basis&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update status: 04-16-24&amp;lt;/strong&amp;gt;&lt;br /&gt;
* Still squishy-squishy on bugs&lt;br /&gt;
* Focusing on stability and clean UI&lt;br /&gt;
&lt;br /&gt;
Current Loads: 47 devices&lt;br /&gt;
* 2 core 2 GB RAM&lt;br /&gt;
* Load averages consistent: 0.70,1.5,1.6&lt;br /&gt;
* From this, with an average number of checks ( ~10 per host ), reliable monitoring can be done on 50 devices with decent results and minimal hardware for small environments.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update on status: 03-21-2024 &amp;lt;/strong&amp;gt;&lt;br /&gt;
* Many Many squishes of bugs&lt;br /&gt;
* Reporting engine and templates much more usable&lt;br /&gt;
* More documentation links for application with base examples written&lt;br /&gt;
* Initial Event Correlation Engine (ECE) rules written&lt;br /&gt;
* Some pages written for ECE&lt;br /&gt;
* Did I mention squishing bugs?&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt; Update on status: 02-11-2024 &amp;lt;/strong&amp;gt;&lt;br /&gt;
* Upgraded API to use PHP 8.2, not 8.1.&lt;br /&gt;
* GUI code was sanitized and released to github&lt;br /&gt;
* [https://github.com/Guyverix/Vigilare-NMS-GUI Vigilare GUI]&lt;br /&gt;
* [https://github.com/Guyverix/Vigilare-NMS-API Vigilare API]&lt;br /&gt;
* [https://github.com/Guyverix/Vigilare-NMS-POLLER Vigilare Pollers]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The focus of this wiki is for notes and gotchas for things relating to technology.  It is mainly focused on the fault management tool that I am writing.  However oddball things I have found or commonly forget since I rarely use them are also present.  I do not go too in depth on the notes or what I find.  In general it is more a quick summary and if reasonable an example that shows what the result is.  I suspect that as time passes it will become more focused on the tool that I am writing.  However there will likely be oddball stuff thrown in here as well that does not have to do with fault management at all....&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt;Overall idea&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I believe that fault management is commonly overlooked in many companies and treated as a bolt-on after they have been embarrassed by an outage or network event that a customer noticed.  This feels backwards to me.  Fault management really should be one of the first things put into place and grow with your software and network over time.  Being able to tell at a glance the &amp;quot;truth&amp;quot; of your network is almost as critical as the network running itself!  Customer monitoring isn't monitoring and simply makes you look bad.  If a customer or end user has to tell you something is down, you have failed.  If it happens more than once then that should set off all kinds of warnings to technicians.  This should not happen.  The problem happens, in my opinion, due to management seeing monitoring as a time suck that does not really return money on their investment of time.  The only time they care is when they get embarrassed, and after that they lose interest until they are embarrassed again.  This is stupid.  A company or technician must be able to say with confidence that their applications are working AND PROVE IT.  If you can't prove anything, it is just a hope that things are working as designed.  Until you know, you are simply guessing at the health and availability of an application.&lt;br /&gt;
&lt;br /&gt;
I have noticed a trend recently that companies are going to what I call &amp;quot;implied&amp;quot; monitoring: more looking at metrics than active validation of an application or service.  While this does somewhat fit the bill for monitoring, it does a major disservice to the technicians that need to support the application.  Usually this kind of monitoring will cry when something is down but does not tell you what or where the issue is, only that it is happening.  It is also usually slower to report the failures.  That gives a customer more time to find the issue before the technician does.  Parsing logs, and dropping them in search indexes, or forwarding matches as events are all very useful, but they are not really watching for specific application issues at the host level.  I believe that an organization must work up to this kind of monitoring.  It must be built on the basics.  If you do not have the basics in place, then the advanced monitoring is much less useful for an org.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt;&lt;br /&gt;
My common way to approach monitoring is to start as fundamental as possible.  On Linux based systems this is the way that has given me the best results.&lt;br /&gt;
&lt;br /&gt;
* daemon&lt;br /&gt;
** dead daemon, well dead app :)&lt;br /&gt;
* port&lt;br /&gt;
** zombie processes holding a port, or different daemons attempting to use the same port is bad&lt;br /&gt;
* challenge and response.&lt;br /&gt;
** Verify which app has control of the port (did you get an http, email, ssh header?  Did it respond at all?)&lt;br /&gt;
* performance&lt;br /&gt;
** This is the n+1 point.  After confirming the application is running, NOW is the time to verify it is at a basic level performant.&lt;br /&gt;
* log and event parsing&lt;br /&gt;
** In depth log parsing, and event correlations.  Without the above this is much less useful to a technician.&lt;br /&gt;
*** I have seen in RARE cases this done well, where it states what failed and which host.  However that is much less likely to happen as the logs parsed do not always state which application is at fault, only that there is one present.&lt;br /&gt;
&amp;lt;/strong&amp;gt;&lt;br /&gt;
&lt;br /&gt;
How you get the answers is less important than getting the answers, with one gotcha.  There is no reason to degrade the host with a complex and slow service check.  If the validation cannot be done cheaply, then it is likely something that should be broken down into more basic pieces.  Killing your servers or pod with service checks is exactly backwards of what you need.  Simple, fast, and accurate are what you need to focus on.  Inaccurate data is WORSE than no data.  Another useful trick is to assume failure until the application proves that it is doing what you expect.  I always try to avoid a bias of assuming something works until it can prove it in a service check.  Until that time, it is only your opinion that things are working.&lt;br /&gt;
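&lt;br /&gt;
A minimal sketch of the port and challenge-and-response layers from the list above, assuming Python 3 and only the standard library.  The function names (classify_banner, probe), the banner prefixes, and the timeout are illustrative assumptions, not taken from any particular tool.  The check starts from failure and only reports success once the service has answered the way we expect.&lt;br /&gt;

```python
import socket


def classify_banner(banner):
    """Guess the protocol from the first bytes a service sends back.

    The prefixes are a simplification for illustration; for example,
    FTP servers also greet with 220, so a real check would look further.
    """
    if banner.startswith(b"SSH-"):
        return "ssh"
    if banner.startswith(b"220"):
        return "smtp"
    if banner.startswith(b"HTTP/"):
        return "http"
    return "unknown"


def probe(host, port, expected, challenge=b"", timeout=3.0):
    """Return (ok, detail).  Assume failure until the service proves itself."""
    try:
        # Port layer: can we connect at all within the timeout?
        with socket.create_connection((host, port), timeout=timeout) as conn:
            # Some protocols (HTTP) wait for the client to speak first.
            if challenge:
                conn.sendall(challenge)
            # Challenge and response layer: read whatever the service says.
            banner = conn.recv(256)
    except OSError as err:
        return False, "port check failed: %s" % err
    found = classify_banner(banner)
    if found != expected:
        return False, "port open but answered as %s, expected %s" % (found, expected)
    return True, "%s answered on port %d" % (found, port)
```

For a service that waits for the client to speak first, the call would look something like probe("web01.example.com", 80, "http", challenge=b"HEAD / HTTP/1.0\r\n\r\n"), where the hostname is a placeholder.  The detail string is the point: it says whether the port was dead or whether the wrong thing answered, which is what the technician needs to know.&lt;br /&gt;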
&lt;br /&gt;
Historically this has been done with SNMP and NRPE service checks on a common 5 minute iteration cycle.  This allows many checks across a fleet of hosts to complete within the cycle, and is a decent starting point for monitoring hosts.  I realize that some technicians prefer a faster cycle time, but at the end of the day, you are still talking about human response times and investigation times.  Sub-minute reporting does no good at all for a technician who is troubleshooting an issue.  Additionally, fast cycle times do not really lend themselves well to the concept of retry.  You will lose packets from time to time, and your application will do weird crap from time to time.  That's just a fact.  Make sure your monitoring does not lose its mind and scream that the world is burning due to a transient packet-loss issue.  Using something like retry will make it more likely you are picking up a legitimate event and not a transient failure.&lt;br /&gt;
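&lt;br /&gt;
The retry idea can be sketched roughly like this, assuming Python 3.  The name check_with_retry, the default of 3 attempts, and the 10 second delay are assumptions for illustration only; the check argument is any callable that returns True when the service looks healthy.&lt;br /&gt;

```python
import time


def check_with_retry(check, attempts=3, delay=10.0, sleep=time.sleep):
    """Run check() up to `attempts` times and alarm only if all of them fail.

    Returns (ok, failures) so the caller can still record transient failures
    without raising an event for them.
    """
    failures = 0
    for attempt in range(attempts):
        try:
            if check():
                return True, failures
        except Exception:
            # A check that blows up counts as a failed attempt, not a crash.
            pass
        failures += 1
        # Wait before the next attempt, but not after the final one.
        if attempt + 1 != attempts:
            sleep(delay)
    return False, failures
```

With a 5 minute cycle there is plenty of room for a few attempts spaced seconds apart.  A single dropped packet then shows up as one recorded failure instead of an alarm.&lt;br /&gt;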
&lt;br /&gt;
Every time there is an &amp;quot;Outage&amp;quot; or severe service degradation, one of the first questions that should be asked is: did the monitoring catch this issue?  Was it actually granular enough to state what the issue was, or did it only catch a side effect of the issue?  Getting an alarm for website down vs. database down are two very different things.  Both will imply a 100% outage, but usually one is faster to repair than the other.  Also, looking at the wrong thing (say, the webserver when the database is toast) simply slows down getting the database back online.&lt;br /&gt;
&lt;br /&gt;
== History of this wiki ==&lt;br /&gt;
&amp;lt;strong&amp;gt;MediaWiki destination for random notes and examples I do not want to lose or forget.&amp;lt;/strong&amp;gt;&lt;br /&gt;
* The site is for myself and friends who commonly use bash and other utilities and want a one-stop shop to find that oddball thing that was found six months ago and is only vaguely remembered.  The site overall is not for the general public, however if you make it in here feel free to browse.&lt;br /&gt;
* Keep in mind however I do have security measures in place and poking hard at stuff will block you from the domain for 30 days +  if you hammer really hard.&lt;br /&gt;
* '''If someone wants access to actually ADD information in here please sign up for an account, and I will likely grant access.'''  I do try to set everything into some kind of category for easier searches as well as an attempt to keep this somewhat organized.  Whenever possible (or I remember to do it) I do try to link to the original sources of the information.  They are usually from SE, or other Q&amp;amp;A sites, so some of the comments are useful as well.&lt;br /&gt;
* This is not Wikipedia.  The site is likely not going to be polished, since it is more of a catchall wiki on doing different things.  There is not going to be too much rhyme or reason to what is posted on this wiki.  Overall if it is something that I have had to do more than once and look up every time, I will have notes on it in here so I will not have to search next time.&lt;br /&gt;
* There will be times that there are references to personal servers on my network, or oddball hostnames that are tied to iwillfearnoevil.com.  It is unlikely that access will be granted to those hosts unless there is a very specific reason to do so.  So don't bother asking :P&lt;br /&gt;
* '''Category NMS has notes on my progress of my NMS design, and thoughts on monitoring overall'''    [https://wiki.iwillfearnoevil.com/mediawiki/index.php/Category:NMS Category NMS]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Misc Notes on using MediaWiki ==&lt;br /&gt;
Consult the [https://www.mediawiki.org/wiki/Special:MyLanguage/Help:Contents User's Guide] for information on using the wiki software.&lt;br /&gt;
&lt;br /&gt;
== Getting started ==&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:Configuration_settings Configuration settings list]&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:FAQ MediaWiki FAQ]&lt;br /&gt;
* [https://lists.wikimedia.org/postorius/lists/mediawiki-announce.lists.wikimedia.org/ MediaWiki release mailing list]&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Localisation#Translation_resources Localise MediaWiki for your language]&lt;br /&gt;
* [https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:Combating_spam Learn how to combat spam on your wiki]&lt;br /&gt;
[[Category:General]]&lt;/div&gt;</summary>
		<author><name>Chubbard</name></author>
	</entry>
	<entry>
		<id>https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Category:Ansible&amp;diff=642</id>
		<title>Category:Ansible</title>
		<link rel="alternate" type="text/html" href="https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Category:Ansible&amp;diff=642"/>
		<updated>2025-03-26T15:49:18Z</updated>

		<summary type="html">&lt;p&gt;Chubbard: Created page with &amp;quot;Just a catchall category for Ansible stuff.&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Just a catchall category for Ansible stuff.&lt;/div&gt;</summary>
		<author><name>Chubbard</name></author>
	</entry>
	<entry>
		<id>https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Ansible&amp;diff=641</id>
		<title>Ansible</title>
		<link rel="alternate" type="text/html" href="https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Ansible&amp;diff=641"/>
		<updated>2025-03-26T15:48:19Z</updated>

		<summary type="html">&lt;p&gt;Chubbard: Created page with &amp;quot;==Ansible Notes==  * Find details about ansible collection versions (RHEL) &amp;lt;pre&amp;gt; ~/.local/bin/ansible-galaxy collection list  # /home/USER/.local/lib/python3.?/site-packages/ansible_collections Collection                    Version ----------------------------- ------- amazon.aws                    1.5.1 ansible.netcommon             2.5.0 ansible.posix                 1.3.0 ***** SNIP ***** # /usr/share/ansible/collections/ansible_collections Collection           Versio...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Ansible Notes==&lt;br /&gt;
&lt;br /&gt;
* Find details about ansible collection versions (RHEL)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
~/.local/bin/ansible-galaxy collection list&lt;br /&gt;
&lt;br /&gt;
# /home/USER/.local/lib/python3.?/site-packages/ansible_collections&lt;br /&gt;
Collection                    Version&lt;br /&gt;
----------------------------- -------&lt;br /&gt;
amazon.aws                    1.5.1&lt;br /&gt;
ansible.netcommon             2.5.0&lt;br /&gt;
ansible.posix                 1.3.0&lt;br /&gt;
***** SNIP *****&lt;br /&gt;
# /usr/share/ansible/collections/ansible_collections&lt;br /&gt;
Collection           Version&lt;br /&gt;
-------------------- -------&lt;br /&gt;
amazon.aws           8.0.0&lt;br /&gt;
ansible.netcommon    6.1.1&lt;br /&gt;
ansible.posix        1.5.4&lt;br /&gt;
ansible.utils        4.1.0&lt;br /&gt;
**** SNIP ****&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[Category:Ansible]]&lt;/div&gt;</summary>
		<author><name>Chubbard</name></author>
	</entry>
	<entry>
		<id>https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Iptables&amp;diff=640</id>
		<title>Iptables</title>
		<link rel="alternate" type="text/html" href="https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Iptables&amp;diff=640"/>
		<updated>2025-03-11T18:02:15Z</updated>

		<summary type="html">&lt;p&gt;Chubbard: /* iptables notes */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==iptables notes==&lt;br /&gt;
Just some generic notes on working with iptables.  It can be a PITA to remember the details while an issue is happening.&lt;br /&gt;
&lt;br /&gt;
REMEMBER:  You can block YOURSELF if you do not use your head!&lt;br /&gt;
&lt;br /&gt;
Don't block:&lt;br /&gt;
* Internal IP addresses&lt;br /&gt;
* YOUR OWN external IP address (if logging in via public interfaces)&lt;br /&gt;
* Loopbacks...  Not sure what would happen but expect it would brick things..&lt;br /&gt;
&lt;br /&gt;
Simple blocks&lt;br /&gt;
* iptables -A INPUT -s 47.245.124.200 -j DROP&lt;br /&gt;
&lt;br /&gt;
Reminders:&lt;br /&gt;
* This will not persist across reboots&lt;br /&gt;
* Setting up fail2ban would be a heck of a lot easier than manual blocking&lt;br /&gt;
* iptables -L -n is your friend&lt;/div&gt;</summary>
		<author><name>Chubbard</name></author>
	</entry>
	<entry>
		<id>https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Iptables&amp;diff=639</id>
		<title>Iptables</title>
		<link rel="alternate" type="text/html" href="https://wiki.iwillfearnoevil.com/mediawiki/index.php?title=Iptables&amp;diff=639"/>
		<updated>2025-03-11T18:01:15Z</updated>

		<summary type="html">&lt;p&gt;Chubbard: Created page with &amp;quot;==iptables notes== Just some generic notes on working with IP tables.  Can be a PITA when an issue is happening to remember the details..  REMEMBER:  You can block YOURSELF if you do not use your head!  Dont block: Internal IP address YOURSELF external IP address (if logging in via public interfaces) Loopbacks...  Not sure what would happen but expect it would brick things..  Simple blocks * iptables -A INPUT -s 47.245.124.200 -j DROP  Reminders: * This will not persist...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==iptables notes==&lt;br /&gt;
Just some generic notes on working with iptables.  It can be a PITA to remember the details while an issue is happening.&lt;br /&gt;
&lt;br /&gt;
REMEMBER:  You can block YOURSELF if you do not use your head!&lt;br /&gt;
&lt;br /&gt;
Don't block:&lt;br /&gt;
Internal IP addresses&lt;br /&gt;
YOUR OWN external IP address (if logging in via public interfaces)&lt;br /&gt;
Loopbacks...  Not sure what would happen but expect it would brick things..&lt;br /&gt;
&lt;br /&gt;
Simple blocks&lt;br /&gt;
* iptables -A INPUT -s 47.245.124.200 -j DROP&lt;br /&gt;
&lt;br /&gt;
Reminders:&lt;br /&gt;
* This will not persist across reboots&lt;br /&gt;
* Setting up fail2ban would be a heck of a lot easier than manual blocking&lt;/div&gt;</summary>
		<author><name>Chubbard</name></author>
	</entry>
</feed>