- All Implemented Interfaces:
- org.quartz.Job
@PersistJobDataAfterExecution
@DisallowConcurrentExecution
public class OrphanedBuildMonitorJob
extends Object
implements org.quartz.Job
This class looks for orphaned builds, i.e. builds that claim to be in a certain state even though the server status
makes it clear that they will never be able to transition out of that state.
Currently, the following situations are detected:
- Build claims to be queued, but it has been absent from the queue for an extended period of time.
- Build claims to be active, but no agent is actually building it.
- Build claims to be finished, but it actually failed to finish after the database status change.
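The three checks above can be sketched as a small classification function. This is only an illustration: the class, enum, and parameter names below are hypothetical and are not taken from the actual implementation, and the boolean inputs stand in for whatever the server really consults (queue contents, agent assignments, finish bookkeeping).

```java
import java.util.Optional;

// Hypothetical sketch; none of these names come from the real OrphanedBuildMonitorJob.
final class OrphanCheckSketch {

    /** The state the build claims to be in. */
    enum ClaimedState { QUEUED, ACTIVE, FINISHED }

    /** The reason a build looks orphaned. */
    enum Suspicion { MISSING_FROM_QUEUE, NO_AGENT_BUILDING, FINISH_INCOMPLETE }

    /**
     * Compares the claimed state with what the server actually observes.
     * Returns an empty Optional when the claim and the observation agree.
     */
    static Optional<Suspicion> classify(ClaimedState claimed,
                                        boolean inQueue,
                                        boolean agentBuilding,
                                        boolean finishCompleted) {
        switch (claimed) {
            case QUEUED:
                return inQueue ? Optional.empty() : Optional.of(Suspicion.MISSING_FROM_QUEUE);
            case ACTIVE:
                return agentBuilding ? Optional.empty() : Optional.of(Suspicion.NO_AGENT_BUILDING);
            case FINISHED:
                return finishCompleted ? Optional.empty() : Optional.of(Suspicion.FINISH_INCOMPLETE);
            default:
                return Optional.empty();
        }
    }
}
```

A non-empty result only marks the build as suspicious at this moment; it does not by itself justify removing the build.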
When we find a problematic build, we cannot simply remove it:
1. maybe an agent is already working on it, but has not reported in yet (agentId == null, but everything is OK).
2. maybe the agent is not responsive (agentId != null, agent may or may not come back).
3. maybe 1. happened and will be followed by 2.
4. maybe the build is currently finishing and the mechanism catches it by accident: a build that is finishing properly is indistinguishable from a build that is stuck in the finishing state.
etc.
In case of 1., we should give the agent some time to report in.
In case of 2. or 3., we should let the AgentManager remove the agent and the build; it will do so in heartbeatTimeoutSeconds + heartbeat seconds.
In case of 4., we should let the build finish naturally before intervening at the chain level.
So waiting heartbeatTimeoutSeconds + 2 x heartbeat seconds before taking action seems like a good idea.
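The waiting rule can be written down as a short calculation. The sketch below is a minimal illustration under stated assumptions: the class and method names are hypothetical, the heartbeat interval is assumed to be expressed in seconds like heartbeatTimeoutSeconds, and the monitor is assumed to remember when it first saw a build in a suspicious state.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch of the grace period described above; not the actual implementation.
final class OrphanGracePeriodSketch {

    /** Grace period: heartbeatTimeoutSeconds + 2 x heartbeat, both in seconds. */
    static Duration gracePeriod(long heartbeatTimeoutSeconds, long heartbeatSeconds) {
        return Duration.ofSeconds(heartbeatTimeoutSeconds + 2 * heartbeatSeconds);
    }

    /**
     * True once the build has looked suspicious for at least the whole grace period.
     * firstSeenSuspicious is an assumed timestamp recorded by the monitor.
     */
    static boolean mayIntervene(Instant firstSeenSuspicious, Instant now, Duration grace) {
        return !now.isBefore(firstSeenSuspicious.plus(grace));
    }
}
```

For example, with an illustrative timeout of 60 seconds and a heartbeat of 10 seconds, `gracePeriod(60, 10)` is 80 seconds. That leaves the AgentManager its full 70 seconds to clean up first, plus one extra heartbeat of slack.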