Nms reminders

From I Will Fear No Evil
Revision as of 09:45, 14 July 2023 by Chubbard (talk | contribs) (→‎API)

Notes on behaviors to remember

The app is getting complex now. Write down notes on the expected behavior so we do not get surprised.

Service checks and Events

  • Service checks are NOT run on hosts whose state is "dead"
    • The exception is the alive check, which still runs so we know when the host recovers
  • Service checks CAN run on hosts whose state is "unknown"
  • The log shows "bypassed" when a service check is called on a host that is dead
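
The rules above can be sketched as a single gate function. This is a minimal illustration, not the app's actual code: the function name, the logger name, and the "alive" check name are all assumptions.

```python
import logging

logger = logging.getLogger("nms.checks")  # hypothetical logger name

ALIVE_CHECK = "alive"  # hypothetical name of the alive check


def should_run_check(host_state: str, check_name: str) -> bool:
    """Return True if the check may run against a host in this state."""
    # Dead hosts only get the alive check, so recovery is still detected.
    if host_state == "dead" and check_name != ALIVE_CHECK:
        logger.info("bypassed %s: host is dead", check_name)
        return False
    # Every other state, including "unknown", is allowed to run checks.
    return True
```

Keeping the decision in one place means the "bypassed" log line is emitted from exactly one spot, which should make surprises easier to trace.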

Daemon Notes

  • genericPoller currently runs all external service checks.
  • Need a database alteration to filter checks down to a specific poller host, so that remote pollers can be supported as well

RRD Templates

  • Need more examples of creating new templates
  • Need a database alteration to state that file 123.rrd can be used against template foo_bar, based on the file PATH
  • A single RRD file can have multiple graphs. A single template can only match one type of RRD file. Do not put more logic in the templates to make them support different things. This will cause bugs and make troubleshooting harder for no good reason
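
One way the PATH-based mapping could look is sketched below. The rule list stands in for the database rows that do not exist yet. The glob patterns, the example paths, and the `cpu_load` template name are invented for illustration; only `foo_bar` and `123.rrd` come from the notes above.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Hypothetical rows: (path pattern, template name). Each pattern maps to
# exactly one template, so a template only ever sees one type of RRD file.
TEMPLATE_RULES = [
    ("*/cpu/*.rrd", "cpu_load"),
    ("*/net/*.rrd", "foo_bar"),
]


def template_for(rrd_path: str) -> Optional[str]:
    """Return the template for an RRD file path, or None if nothing matches."""
    for pattern, template in TEMPLATE_RULES:
        if fnmatchcase(rrd_path, pattern):
            return template
    return None
```

For example, `template_for("/var/rrd/host1/net/123.rrd")` resolves to `foo_bar` under these assumed rules, and an unmatched path returns `None` instead of falling through to a catch-all template.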

Graphite Templates

  • Needs a full rewrite to behave like the RRD templates. How the rendering URL is built should be defined by templates, so that changes can be made reasonably.
  • Remove the horrible regexes from the database to make graphs easier to work with. Using regexes this way to build URLs is dumb.
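
A template-driven URL builder could be as small as the sketch below. The template layout, the metric paths, and the base URL are assumptions; `target`, `from`, `format`, `width`, and `height` are standard Graphite render API query parameters.

```python
from urllib.parse import urlencode

# Hypothetical template record, as it might be stored in place of a regex.
LOAD_TEMPLATE = {
    "targets": [
        "servers.{host}.load.shortterm",
        "servers.{host}.load.midterm",
    ],
    "params": {"from": "-1h", "format": "png", "width": "600", "height": "300"},
}


def render_url(base_url: str, template: dict, host: str) -> str:
    """Build a Graphite /render URL from a template and a host name."""
    targets = [t.format(host=host) for t in template["targets"]]
    # Graphite accepts a repeated "target" parameter, one per series.
    query = [("target", t) for t in targets] + list(template["params"].items())
    return base_url.rstrip("/") + "/render?" + urlencode(query)
```

Changing the time range or image size then means editing one template value, with no regex involved.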

Database

  • Write a backup system for the database. A streaming dump of the DB to a tar.gz file is fine
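
A rough sketch of the dump-to-tar.gz step follows. It assumes the dump arrives as an iterable of byte chunks (for example, read from the stdout of whatever dump tool the database provides). The function name and the `dump.sql` member name are made up for illustration.

```python
import tarfile
import tempfile
import time
from typing import Iterable


def write_backup(chunks: Iterable[bytes], archive_path: str,
                 member_name: str = "dump.sql") -> int:
    """Write a chunked dump into a gzip-compressed tar file; return its size."""
    # A tar header records the member size up front, so the stream is
    # spooled first (in memory up to 16 MiB, then on disk).
    with tempfile.SpooledTemporaryFile(max_size=16 * 1024 * 1024) as spool:
        size = 0
        for chunk in chunks:
            spool.write(chunk)
            size += len(chunk)
        spool.seek(0)

        info = tarfile.TarInfo(name=member_name)
        info.size = size
        info.mtime = int(time.time())
        with tarfile.open(archive_path, "w:gz") as tar:
            tar.addfile(info, fileobj=spool)
    return size
```

If the tar wrapper turns out not to matter, piping the dump straight through gzip would avoid the spooling step entirely.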

API

  • Continue cleanup of the APIs
  • Since we are not using Swagger, write a doc showing the APIs available
  • Write authentication middleware
  • Some API functions are still missing (monitoring renames/updates, event hostname changes when a host is added after the event happens, etc.)
  • The API needs to begin supporting ephemeral hosts that can go away and come back (think k8s, or even Docker). AWS EC2 hosts? (possible, but likely an edge case only)
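
For the authentication middleware item, one possible shape is a wrapper that rejects requests before they reach any API handler. The sketch below assumes a WSGI-style app and a single shared bearer token. Neither is confirmed by these notes, and the class name is hypothetical.

```python
import hmac


class TokenAuthMiddleware:
    """Reject requests that do not carry the expected bearer token."""

    def __init__(self, app, token: str):
        self.app = app
        self.expected = f"Bearer {token}"

    def __call__(self, environ, start_response):
        supplied = environ.get("HTTP_AUTHORIZATION", "")
        # compare_digest avoids leaking the token through timing differences.
        if not hmac.compare_digest(supplied.encode(), self.expected.encode()):
            start_response(
                "401 Unauthorized",
                [("Content-Type", "text/plain"), ("WWW-Authenticate", "Bearer")],
            )
            return [b"unauthorized\n"]
        # Token matched: hand the request to the wrapped application.
        return self.app(environ, start_response)
```

Wrapping the whole app keeps the check out of individual API functions, which fits the ongoing API cleanup.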

Cloud Providers

  • Begin investigating AWS ingestion into the system

Business Display

  • Begin logic for UX display support for different functions.
  • Customers do not need to see events beyond what affects the app at a HIGH level
  • The reporting engine needs some basic built-in defaults and a UI
  • Alter the service checks AND event database to note whether something is application vs. OS or infrastructure.
    • Default is OS/Infra unless explicitly told this is an application check
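
The application vs. OS/Infra default can be illustrated with a small data model. The `Event` fields and the `customer_view` helper are assumptions. The sketch also reads "what affects the app" as "events flagged as application", which is an interpretation of the notes, not a settled rule.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Event:
    host: str
    message: str
    # Default is OS/Infra: the flag must be set explicitly for app checks.
    is_application: bool = False


def customer_view(events: Iterable[Event]) -> List[Event]:
    """Return only the events a customer should see (application-level)."""
    return [e for e in events if e.is_application]
```

With the flag defaulting to `False`, anything not explicitly classified stays out of the customer display.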

Frontend UI

  • Bribe someone who knows what good UX looks like to help get something sane.
  • What you have now works, but SUCKS from a usability and "looks" perspective. Get help!