This is where a good incident/problem management tracking system comes in handy. Sure, you can't chase down all the oddness happening on Windows, but there is nothing stopping you from logging the incidents. What you do is clone the incident to an open problem record (regardless of whether you have a workaround or not) with all the details of what you saw and everything you did. Then keep it open until you determine the root cause of the problem.
When other incidents are logged, if you have defined the problem well enough, then you search for a problem record that matches the incident symptoms and link the incident to it. The problem record also holds the workaround that you used to get the end user up and running, so you can reuse this if the issue is super critical.
If you find that you've linked a certain number of incidents to the problem, then you know it's actually worthwhile doing root cause analysis and spending the time figuring out what is causing the error, so you go down the rabbit hole - and you can justify the time to do so.
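To make the incident-to-problem linking concrete, here's a minimal sketch in Python. The data model, symptom matching, and the `RCA_THRESHOLD` value are all hypothetical illustrations (a real ITSM tool models this for you), but the idea - link matching incidents to one problem record, and justify root cause analysis once enough pile up - is the same:

```python
# Hypothetical sketch of incident/problem linking; the class names,
# matching rule, and threshold are illustrative assumptions only.
from dataclasses import dataclass, field

RCA_THRESHOLD = 5  # linked incidents before root cause analysis is justified


@dataclass
class Problem:
    summary: str
    symptoms: set[str]
    workaround: str = ""               # kept here so critical incidents get a quick fix
    incident_ids: list[int] = field(default_factory=list)

    def matches(self, incident_symptoms: set[str]) -> bool:
        # Crude matching: any overlap with the recorded symptoms.
        return bool(self.symptoms & incident_symptoms)

    def link(self, incident_id: int) -> None:
        self.incident_ids.append(incident_id)

    def worth_rca(self) -> bool:
        # Enough linked incidents to justify going down the rabbit hole.
        return len(self.incident_ids) >= RCA_THRESHOLD
```

In practice the matching is done by a human searching the problem records, but even this crude version shows why well-defined symptoms matter: they are the search keys.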
When you figure out the root cause, if it's a simple resolution that doesn't require a major change to the environment, then you may not have to do much to prevent the issue in future - it sort of depends on the complexity/needs of your environment and organization. But regardless, you raise a known error record and link the problem to it. In the known error record you document the problem and as many symptoms as possible (some people list workarounds here, others list them in the problem record, and others prefer to keep workaround info strictly in incident records), the root cause, and how you fully resolved the issue.
Regardless, if - mainly from the known error record - you find you need to make a scheduled change that may impact environments, then you lodge a change request through the CAB processes you have in place.
Normally I find that for server and network infrastructure the change just requires coordination with the teams who use the infrastructure, which, if you've set up your CMDB properly, you can work out by backtracking from the infrastructure configuration items to linked services. I've found that if you have defined your service catalog properly, then you will have defined your operational services and linked these to business services that are mostly customer facing. This helps with impact analysis and finding the correct window in which to make the change.
For things like fixing application bugs, I have found that it's still worthwhile raising a change request, then have that change moved into the development fix process with all appropriate testing, etc - normally this then links into a wider release management process which may actually require a new overarching change management request as other fixes are part of the change - sometimes you need to review the impact of how deploying the release might impact the environment in unexpected ways.
I that's basically a big chunk of ITIL, and I found that if it's done correctly and busy-work is reduced (mainly by asking for too much info), when an appropriate setup of the service layer is made and the CMDB has been mapped well, then it actually can help medium to large organizations pretty effectively. The key is to define a catalog of services across the business, without this it's hard to know the impact of incidents, bugs and any changes you may want to make in your environment.
You can start small though. Just go with broad categories like "printer", "CAD/CAM", etc and log reported/solved times and some text on both.
It's been a while ago since I worked with a Windows network, 15-18 years or so, but graphs from that was enough to prove investing in multi-purpose on-site free support network printers was a good idea. Support logs dropped by about 20% if I remember correctly (lots of crappy inkjets) and it saved the company some money in printer repairs and not buying ink cartridges and toners all over the place.
After that budget talks and time for in-depth problem solving became easier.
We ended up in something ITIL like naturally. We just started scripting solutions naturally and shared them between each other. Some of those scripts ended up being pushed to clients so traveling sales people could remap network drives and other simple things. Then we wrote a GUI for them - because clicking is easier apparently. That didn't work properly but proved the case for remote control/monitoring/inventory software (well, control really but it was IT buying the software so..)
It probably did help a little that I wrote in C and my co-worker at the time thinks x86 assembly is self documenting.
Now days I develop and use Linux for basically everything except gaming. Friday horrors persists though. This week it was trying to find a solution to a problem in others code that include sql triggers, framework triggers, various code components and quite a few custom sql tables/relations that I haven't worked with before.
When other incidents are logged, if you have defined the problem well enough then you search for a problem record that matches the incident symptoms and link it to the problem record. The problem record also holds the workaround that use used to get the end user up and running so you can use this if the issue is super critical.
If you find that you've linked a certain number of incidents to the problem, then you know you it's actually worthwhile doing root cause analysis and spend the time figuring out what is causing the error, so you go down the rabbit hole - and you can justify the time to do so.
When you figure out the root cause, if it's a simple resolution that doesn't require a major change to the environment then you may not have to do much to prevent the issue in future - sort of depends on the complexity/needs of your environment and organization. But regardless you raise a known error record and link the problem to this. In the known error record you document the problem and as many symptoms as possible (some people list workarounds here, others list the workarounds in the problem record, other prefer to keep workaround info strictly in incident records), the root cause and how you resolved the issue fully.
Regardless, mainly from the known issue record if you find you need to make a scheduled change that may impact environments then you lodge a change request through the CAB processes you have in place.
Normally I find for server and network infrastructure the change just requires coordination with teams who use the infrastructure, which if you've setup your CMDB properly you can work out by backtracking the infrastructure configuration items to linked services. I've found that if you have defined your service catalog properly then you will have defined your operational services and linked these to business services that are mostly customer facing. This helps impact analysis and finding the correct window in which to make the change.
For things like fixing application bugs, I have found that it's still worthwhile raising a change request, then have that change moved into the development fix process with all appropriate testing, etc - normally this then links into a wider release management process which may actually require a new overarching change management request as other fixes are part of the change - sometimes you need to review the impact of how deploying the release might impact the environment in unexpected ways.
I that's basically a big chunk of ITIL, and I found that if it's done correctly and busy-work is reduced (mainly by asking for too much info), when an appropriate setup of the service layer is made and the CMDB has been mapped well, then it actually can help medium to large organizations pretty effectively. The key is to define a catalog of services across the business, without this it's hard to know the impact of incidents, bugs and any changes you may want to make in your environment.