Nms reminders
Latest revision as of 09:45, 14 July 2023
Notes on behaviors to remember
The app is getting complex now. Write down notes on the expected behavior so we do not get surprised.
Service checks and Events
- Service checks are NOT run on hosts whose state is "dead"
  - The exception is the alive check, so we know when the host recovers
- We can run service checks on hosts that are "unknown"
- The log shows "bypassed" when a service check is called on a host that is dead
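
The rules above can be sketched as a small gate function. This is a minimal sketch, not the app's actual code: the function name `should_run_check`, the check name "alive", and the logger name are assumptions for illustration. Only the states "dead" and "unknown" and the "bypassed" log text come from these notes.

```python
import logging

log = logging.getLogger("nms.checks")  # assumed logger name

ALIVE_CHECK = "alive"  # assumed name of the alive check


def should_run_check(host_state: str, check_name: str) -> bool:
    """Return True if check_name should run against a host in host_state."""
    if host_state == "dead" and check_name != ALIVE_CHECK:
        # Dead hosts only get the alive check, so recovery is still detected.
        log.info("bypassed %s: host is dead", check_name)
        return False
    # "unknown" (and any other state) is allowed to run every check.
    return True
```

Keeping the rule in one function means the poller and the API would answer the same way when asked whether a check applies.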
Daemon Notes
- genericPoller currently runs all external service checks.
- Need a database alteration to filter checks down to a specific poller host, so that remote pollers are supported as well
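
One possible shape for that filter, assuming each service check row gains a hypothetical `poller_host` column. The column name, the `ServiceCheck` type, and the rule that an empty value falls back to a default poller are all assumptions, not the current schema.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class ServiceCheck:
    name: str
    target: str
    poller_host: Optional[str] = None  # hypothetical column; None = unassigned


def checks_for_poller(checks: Iterable[ServiceCheck], poller: str,
                      default_poller: str) -> List[ServiceCheck]:
    """Select the checks that the given poller should run.

    Unassigned checks fall back to default_poller, so existing rows keep
    running on the current genericPoller host after the schema change.
    """
    return [c for c in checks if (c.poller_host or default_poller) == poller]
```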
RRD Templates
- Need more examples on creating new templates
- Need a database alteration to state that file 123.rrd can be used against template foo_bar, based on the file PATH
- A single rrd file can have multiple graphs. A single template can only match a certain type of rrd. Do not put more logic in there to make templates support different things. That will cause bugs and make troubleshooting harder for no good reason.
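
A path-based mapping could look roughly like the sketch below. The rule list stands in for the proposed database rows; the glob patterns, the directory layout, and the `load_avg` template name are made up for illustration. Each rule points at exactly one template, which keeps the "one template, one type of rrd" rule intact.

```python
from fnmatch import fnmatchcase
from typing import List, Optional, Tuple

# Stand-in for the proposed database table: (path glob, template name).
# Patterns and paths here are illustrative assumptions.
TEMPLATE_RULES: List[Tuple[str, str]] = [
    ("/rrd/*/foo/*.rrd", "foo_bar"),
    ("/rrd/*/load/*.rrd", "load_avg"),
]


def template_for(rrd_path: str) -> Optional[str]:
    """Return the first template whose path pattern matches, else None."""
    for pattern, template in TEMPLATE_RULES:
        if fnmatchcase(rrd_path, pattern):
            return template
    return None
```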
Graphite Templates
- Needs a full rewrite to behave like RRD. How the rendering URL is built should be defined by templates, so that changes are reasonable to make.
- Remove the horrible regexes from the database to make graphs easier to work with. Using regexes this way to build URLs is dumb.
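
As a rough sketch of template-driven URL building: the template below is a plain dict with `{host}` placeholders filled by `str.format`, and the query string is produced by `urllib.parse.urlencode` instead of regexes. `target`, `from`, `width`, `height`, and `format` are standard Graphite render API parameters; the template layout, metric paths, and base URL are assumptions.

```python
from urllib.parse import urlencode

# Hypothetical template; in the app this would come from the database.
CPU_TEMPLATE = {
    "targets": ["servers.{host}.cpu.user", "servers.{host}.cpu.system"],
    "params": {"from": "-24h", "width": "800", "height": "300", "format": "png"},
}


def render_url(base_url: str, template: dict, **values: str) -> str:
    """Build a Graphite /render URL from a template and substitution values."""
    # One "target" query parameter per metric, in template order.
    query = [("target", t.format(**values)) for t in template["targets"]]
    query.extend(template["params"].items())
    return f"{base_url.rstrip('/')}/render?{urlencode(query)}"
```

Changing a graph then means editing the template's targets or params, not a pattern that rewrites URLs.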
Database
- Write a backup system for the database. A streaming dump of the db to a tar.gz file is fine.
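
A minimal sketch of such a backup, under stated assumptions: these notes do not name the database engine, so the dump command is passed in as an argument list, and the member name `dump.sql` is a placeholder. Because a tar header needs the member size up front, the sketch spools the dump output to a temporary file on disk first and then adds it to the archive, so the dump is never held in memory.

```python
import os
import subprocess
import tarfile
import tempfile
import time
from typing import Sequence


def backup_to_targz(dump_cmd: Sequence[str], archive_path: str,
                    member_name: str = "dump.sql") -> None:
    """Run dump_cmd and store its stdout as one member of a .tar.gz archive."""
    with tempfile.TemporaryFile() as spool:
        # The dump tool writes straight to the spool file descriptor.
        subprocess.run(dump_cmd, stdout=spool, check=True)

        info = tarfile.TarInfo(name=member_name)
        info.size = os.fstat(spool.fileno()).st_size
        info.mtime = int(time.time())

        spool.seek(0)
        with tarfile.open(archive_path, "w:gz") as tar:
            tar.addfile(info, fileobj=spool)
```

`check=True` makes a failed dump raise before any archive is written, so a broken dump does not silently become the latest backup.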
API
- Continue cleanup of APIs
- Since we are not using Swagger, write a doc listing the APIs available
- Write authentication middleware
- Some API functions are still missing (monitoring renames/updates, event hostname changes when a host is added after the event happens, etc.)
- API needs to begin supporting ephemeral hosts that can go away and come back (think k8s, or even docker). AWS EC2 hosts? (Possible, but likely an edge case only.)
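
For the authentication middleware item, here is a minimal sketch written as plain WSGI middleware. The framework, the bearer-token scheme, and the class name `TokenAuthMiddleware` are assumptions; these notes do not say how the API is served or how clients should authenticate.

```python
import hmac
from typing import Callable, Iterable


class TokenAuthMiddleware:
    """WSGI middleware that rejects requests lacking the expected bearer token."""

    def __init__(self, app: Callable, token: str) -> None:
        self.app = app
        self.expected = f"Bearer {token}".encode()

    def __call__(self, environ: dict, start_response: Callable) -> Iterable[bytes]:
        supplied = environ.get("HTTP_AUTHORIZATION", "").encode()
        # compare_digest avoids leaking the token through timing differences.
        if not hmac.compare_digest(supplied, self.expected):
            start_response("401 Unauthorized",
                           [("Content-Type", "text/plain"),
                            ("WWW-Authenticate", "Bearer")])
            return [b"unauthorized\n"]
        return self.app(environ, start_response)
```

Wrapping the whole app this way keeps the check out of individual handlers, which fits the ongoing API cleanup.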
Cloud Providers
- Begin investigation of AWS ingestion into the system
Business Display
- Begin logic for UX display support for different functions.
- Customers do not need to see events beyond what affects the app at a HIGH level
- Reporting Engine needs some basic built-in defaults and a UI
- Alter the service checks AND event database to note whether something is application vs. OS or infrastructure.
  - Default is OS/Infra unless explicitly told this is an application check
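
The application vs. OS/Infra split could be modeled roughly as below. The `Scope` enum, the `Event` fields, and `customer_view` are hypothetical names; the only behavior taken from these notes is that OS/Infra is the default and that customers see application-level events only.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List


class Scope(Enum):
    OS_INFRA = "os_infra"
    APPLICATION = "application"


@dataclass
class Event:
    host: str
    message: str
    # Default is OS/Infra unless explicitly marked as an application event.
    scope: Scope = Scope.OS_INFRA


def customer_view(events: Iterable[Event]) -> List[Event]:
    """Keep only application-level events for the customer-facing display."""
    return [e for e in events if e.scope is Scope.APPLICATION]
```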
Frontend UI
- Bribe someone who knows what good UX looks like to assist in getting something sane.
- What you have now works, but SUCKS from a usability and "looks" perspective. Get help!