Some of our customers have builds that run for a long time. Cancelling those builds as soon as we know they will fail gives developers a quicker feedback loop, saving developer time and reducing the cost of running agents.
What is fail fast?
If a particular job fails, you can fail the whole build immediately and report that to the user, rather than waiting for every other job to finish. In other words, once a job has finished in a way that means the build can't pass, there's no point running the rest of it, so we can fail the build right away.
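The rule can be expressed as a small predicate. This is a minimal sketch, assuming a simplified job record: the `Job` type, its `Passed` field and the `isHardFailure` name are illustrative, not part of Buildkite's API.

```go
package main

import "fmt"

// Job is a simplified, hypothetical view of a Buildkite job.
// Only the fields the fail-fast rule needs are modelled here.
type Job struct {
	Label      string
	State      string // e.g. "running" or "finished"
	Passed     bool   // assumed: true when the job's command succeeded
	SoftFailed bool   // soft failures are allowed without failing the build
}

// isHardFailure reports whether a job has finished in a way that means the
// build can no longer pass: it finished, did not pass, and was not soft-failed.
func isHardFailure(j Job) bool {
	return j.State == "finished" && !j.Passed && !j.SoftFailed
}

func main() {
	jobs := []Job{
		{Label: "lint", State: "finished", Passed: true},
		{Label: "flaky e2e", State: "finished", Passed: false, SoftFailed: true},
		{Label: "unit tests", State: "finished", Passed: false},
	}
	for _, j := range jobs {
		// Only "unit tests" should trigger fail fast.
		fmt.Printf("%-10s fail fast: %v\n", j.Label, isHardFailure(j))
	}
}
```

One design caveat: if a step is configured with automatic retries, a failed attempt may still be retried, so a production version may want to skip fail fast for such steps.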
Detecting failed builds
The recommended way to detect failed builds is to run code that is always listening for events. We recommend using the Amazon EventBridge notification service to receive Buildkite events, and extracting the build state from each event's payload. You can use this example Go code in an AWS Lambda function to do the processing.
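A minimal sketch of the parsing step, using only the Go standard library. It assumes the event arrives as raw JSON in the standard EventBridge envelope; the shape of `detail` (fields such as `graphql_id` and `passed`) is an assumption, so compare it with a real event from your bus before relying on it. The `parseJobFinished` name and the sample values are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// eventEnvelope mirrors the standard EventBridge event envelope;
// only the fields used here are declared.
type eventEnvelope struct {
	DetailType string          `json:"detail-type"`
	Source     string          `json:"source"`
	Detail     json.RawMessage `json:"detail"`
}

// jobFinishedDetail is an assumed, trimmed-down shape of the Buildkite
// "detail" object. Only job.state and job.soft_failed are confirmed by the
// event pattern; the other field names are assumptions.
type jobFinishedDetail struct {
	Build struct {
		GraphQLID string `json:"graphql_id"`
		State     string `json:"state"`
	} `json:"build"`
	Job struct {
		GraphQLID  string `json:"graphql_id"`
		Label      string `json:"label"`
		State      string `json:"state"`
		Passed     bool   `json:"passed"`
		SoftFailed bool   `json:"soft_failed"`
	} `json:"job"`
}

// parseJobFinished decodes a raw EventBridge event and returns its Buildkite detail.
func parseJobFinished(raw []byte) (*jobFinishedDetail, error) {
	var env eventEnvelope
	if err := json.Unmarshal(raw, &env); err != nil {
		return nil, fmt.Errorf("decode envelope: %w", err)
	}
	if env.DetailType != "Job Finished" {
		return nil, fmt.Errorf("unexpected detail-type %q", env.DetailType)
	}
	var d jobFinishedDetail
	if err := json.Unmarshal(env.Detail, &d); err != nil {
		return nil, fmt.Errorf("decode detail: %w", err)
	}
	return &d, nil
}

func main() {
	// A hand-written sample event with hypothetical IDs; real events carry many more fields.
	raw := []byte(`{
		"detail-type": "Job Finished",
		"source": "aws.partner/buildkite.com/example-org/example-id",
		"detail": {
			"build": {"graphql_id": "QnVpbGQtLS0x", "state": "failing"},
			"job": {"graphql_id": "Sm9iLS0tMQ==", "label": "unit tests",
			        "state": "finished", "passed": false, "soft_failed": false}
		}
	}`)
	d, err := parseJobFinished(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("build state=%s, job %q passed=%v\n", d.Build.State, d.Job.Label, d.Job.Passed)
}
```

In a deployed function you would register a handler with `lambda.Start` from the `github.com/aws/aws-lambda-go/lambda` package rather than decoding a hard-coded sample; the decoding logic stays the same.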
The event we are looking for is job.finished (delivered to EventBridge with the detail type "Job Finished"), which contains information about how and when the job finished. If the job failed, we use the Buildkite GraphQL API to cancel the build and annotate it, so the reason for the cancellation is visible and links to the failed job.
Creating an EventBridge notification in Buildkite
Here’s how to set up your own fail fast notification:
In the Buildkite dashboard, create an Amazon EventBridge notification service.
In the AWS dashboard, associate the notification's partner event source with an event bus.
Select that event bus and create a rule, which defines which events are sent to your Lambda function.
Define an event pattern. With a pattern, the Lambda function is only invoked for events that match it, not for every job event.
For fail fast, we recommend the following event pattern:
```json
{
  "detail-type": [
    "Job Finished"
  ],
  "detail": {
    "job": {
      "state": [
        "finished"
      ],
      "soft_failed": [
        false
      ]
    }
  }
}
```
Select the event bus you associated with the notification.
Select your Lambda function as the target, and enable event logging.
Select Create.
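Before deploying the rule, it can help to check locally which events the pattern lets through. The sketch below hand-codes the same three conditions as the pattern above; it is a test helper, not EventBridge's own matcher, and `matchesFailFastPattern` is a hypothetical name. It treats a missing `soft_failed` field as a non-match, because EventBridge only matches a field that is present with one of the listed values.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// matchesFailFastPattern mirrors the fail-fast event pattern for local testing:
// detail-type must be "Job Finished", detail.job.state must be "finished",
// and detail.job.soft_failed must be present and false.
func matchesFailFastPattern(raw []byte) (bool, error) {
	var ev struct {
		DetailType string `json:"detail-type"`
		Detail     struct {
			Job struct {
				State      string `json:"state"`
				SoftFailed *bool  `json:"soft_failed"` // pointer: nil means the field was absent
			} `json:"job"`
		} `json:"detail"`
	}
	if err := json.Unmarshal(raw, &ev); err != nil {
		return false, err
	}
	j := ev.Detail.Job
	return ev.DetailType == "Job Finished" &&
		j.State == "finished" &&
		j.SoftFailed != nil && !*j.SoftFailed, nil
}

func main() {
	samples := []struct{ name, event string }{
		{"hard failure", `{"detail-type":"Job Finished","detail":{"job":{"state":"finished","passed":false,"soft_failed":false}}}`},
		{"passing job", `{"detail-type":"Job Finished","detail":{"job":{"state":"finished","passed":true,"soft_failed":false}}}`},
		{"soft failure", `{"detail-type":"Job Finished","detail":{"job":{"state":"finished","passed":false,"soft_failed":true}}}`},
	}
	for _, s := range samples {
		matched, err := matchesFailFastPattern([]byte(s.event))
		if err != nil {
			panic(err)
		}
		// The passing job matches too: the pattern never looks at "passed".
		fmt.Printf("%-12s invokes Lambda: %v\n", s.name, matched)
	}
}
```

A passing job also matches the pattern, because the pattern only checks `state` and `soft_failed`. The Lambda function therefore still needs to confirm that the job failed before cancelling the build.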
What to expect
Run a build. When a job fails, its job.finished event triggers the Lambda function, and the rest of the build is cancelled.
As soon as the user opens the build, they can see that it failed because of a hard failure. They can then click the link in the annotation to go straight to the failed job's log output.
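The annotation text itself can be a short Markdown message with a link to the job. This sketch assumes a job can be linked by appending `#` and the job's UUID to the build's web URL; `annotationBody` and the example URLs are hypothetical. The returned string could be used as the `body` of the annotation.

```go
package main

import (
	"fmt"
	"strings"
)

// annotationBody renders a Markdown message explaining the cancellation.
// The job link format (build URL + "#" + job UUID) is an assumption.
// The label is inserted as-is, so labels containing "]" would need escaping.
func annotationBody(jobLabel, buildURL, jobUUID string) string {
	jobURL := strings.TrimRight(buildURL, "/") + "#" + jobUUID
	return fmt.Sprintf("Build cancelled early: job [%s](%s) failed with a hard failure.", jobLabel, jobURL)
}

func main() {
	fmt.Println(annotationBody(
		"unit tests",
		"https://buildkite.com/example-org/example-pipeline/builds/42",
		"0190e6c1-0000-0000-0000-000000000000",
	))
}
```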
Try it out
If you’re interested in learning more about Buildkite integrations including Amazon EventBridge notifications, start your free trial today.