-
Task
-
Resolution: Won't Do
-
Major
We have chronic provisioning problems on Nebula and no documented process for troubleshooting. This has led to hours of wasted time.
A few examples:
- The 8/2/16 MTU change caused a number of problems for us, even though we were apparently on the notification list
- We do not receive notifications when servers are blackholed for security reasons
- A networking problem, possibly related to changes in zone policies, prevented us from provisioning new test clusters for ~48 hours
We need to document a process for troubleshooting Nebula issues. The most straightforward approach seems to me to be:
- Find a repeatable test case for the problem
- File an issue with the Nebula team
- Document interactions and resolution
Most importantly, we should not try to fix these problems ourselves until we've confirmed that nothing has changed in the OpenStack configuration.
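One possible way to make the "has anything changed?" check repeatable is to keep a snapshot of the settings we care about from a known-good day and diff it against a fresh snapshot before debugging anything on our side. The sketch below is only an illustration: the function name `diff_snapshots`, the flat key names, and the values are all hypothetical, and how the snapshot is actually collected from OpenStack is left open.

```python
from typing import Any, Dict, List


def diff_snapshots(before: Dict[str, Any], after: Dict[str, Any]) -> List[str]:
    """Return human-readable differences between two flat config snapshots."""
    changes = []
    for key in sorted(set(before) | set(after)):
        if key not in after:
            changes.append(f"removed: {key} (was {before[key]!r})")
        elif key not in before:
            changes.append(f"added: {key} = {after[key]!r}")
        elif before[key] != after[key]:
            changes.append(f"changed: {key}: {before[key]!r} -> {after[key]!r}")
    return changes


# Hypothetical snapshots: keys and values are illustrative only,
# not real Nebula or OpenStack settings.
last_known_good = {"network.mtu": 1500, "zone.policy": "default"}
current = {"network.mtu": 1450, "zone.policy": "default"}

for line in diff_snapshots(last_known_good, current):
    print(line)  # prints: changed: network.mtu: 1500 -> 1450
```

An empty result would suggest the problem is on our side. Any reported change could be attached to the issue filed with the Nebula team.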