Introduction
Your workflow file has a syntax error on line 12, and GitHub's error message is "Invalid workflow file." No line number in the UI. No hint about what's wrong. You stare at YAML indentation for ten minutes. Welcome to Actions.
But the alternative is worse. Jenkins needs a server. CircleCI needs a credit card and a webhook. GitHub Actions lives where your code already lives, runs when things change, and costs nothing on public repos. The tradeoff is YAML. Lots of YAML.
This covers what goes wrong and how to fix it. Triggers that fire when you don't expect them. Jobs that fail silently. Caching that doesn't actually cache. Secrets that leak into logs if you're not careful. The "how it works" stuff is woven into the "how it breaks" stuff, because that's how you actually learn Actions.
Workflow File Anatomy
Workflows live in .github/workflows/. One YAML file per workflow. GitHub picks them up automatically.
```yaml
name: CI Pipeline

# on controls when the workflow runs
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

# jobs is where the work happens; each job gets a fresh VM
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'

      - name: Install dependencies
        run: npm ci

      - name: Run tests
        run: npm test
```
The comments tell you everything. on controls when it runs, jobs is where work happens, each job gets a fresh VM. Every workflow includes actions/checkout@v4 because without it the runner has no code.
Each job starts completely clean. No leftover node_modules. No state from last run. Reproducible, but slow. That's the price.
Triggers and Events
Triggers are where things go wrong first. Your workflow runs on every push to every branch and burns through your Actions minutes in a week. Or it doesn't run on PRs at all because you forgot to add pull_request. Or it fires twice on the same commit because you have both push and pull_request targeting main.
Push and Pull Request
Path filtering saves you.
```yaml
on:
  push:
    branches:
      - main
      - 'release/**'
    paths:
      - 'src/**'
      - 'package.json'
      - 'package-lock.json'
  pull_request:
    branches: [main]
    # paths and paths-ignore are mutually exclusive per event -- pick one
    paths-ignore:
      - 'docs/**'
      - '*.md'
    types:
      - opened
      - synchronize
      - reopened
```
Someone fixes a README typo and your full test suite runs for eight minutes. The paths filter kills that. For monorepos, not optional.
paths-ignore does the inverse -- and note that paths and paths-ignore can't be combined on the same event, so pick one per trigger. The types key on pull requests controls which PR activities trigger the workflow. The defaults -- opened, synchronize, reopened -- cover most cases. You can also trigger on labeled or closed for lifecycle automations, but most teams never touch this.
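If you do use label triggers, the event fires for every label, so you guard on the name. A sketch of a label-driven automation -- the deploy-preview label name is hypothetical:

```yaml
on:
  pull_request:
    types: [labeled]

jobs:
  preview:
    # react to one specific label, not every labeling event
    if: github.event.label.name == 'deploy-preview'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: echo "Deploying preview for PR #${{ github.event.pull_request.number }}"
```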
Scheduled Workflows
Dependency audits. Nightly builds. Stale issue cleanup. Cron handles these.
```yaml
on:
  schedule:
    # Run every day at 2:00 AM UTC
    - cron: '0 2 * * *'
    # Run every Monday at 9:00 AM UTC
    - cron: '0 9 * * 1'
```
Scheduled workflows only run on the default branch. This has bitten me. You add a cron trigger in a feature branch, push, wait -- nothing happens. Merge to main first. And execution times are approximate. GitHub delays runs by several minutes during high load, sometimes more.
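One workaround worth knowing: pair the cron trigger with workflow_dispatch so you can exercise the workflow by hand before it ever lands on the default branch. A minimal sketch:

```yaml
on:
  schedule:
    - cron: '0 2 * * *'   # nightly at 2:00 AM UTC, default branch only
  workflow_dispatch:       # manual trigger works from any branch, for testing
```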
Manual Triggers with workflow_dispatch
workflow_dispatch puts a "Run workflow" button in the Actions tab. Deployments, one-off tasks, anything you want triggered by hand.
```yaml
on:
  workflow_dispatch:
    inputs:
      environment:
        description: 'Target deployment environment'
        required: true
        type: choice
        options:
          - staging
          - production
      dry_run:
        description: 'Run in dry-run mode'
        required: false
        type: boolean
        default: false

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Deploy to ${{ inputs.environment }}
        run: |
          echo "Deploying to ${{ inputs.environment }}"
          echo "Dry run: ${{ inputs.dry_run }}"
```
A dropdown forcing the choice between "staging" and "production" prevents someone from typing "prod" or "Production" and having the deploy script silently do nothing. The boolean dry-run input -- always add one. Verifying what a deployment would do before it does it has saved me more than once.
Jobs, Steps and Actions
Jobs that fail silently are the worst kind of CI problem. Everything looks green, but the deploy job quietly skipped because a conditional evaluated wrong. Or two jobs that should run sequentially ran in parallel and stomped on each other's artifacts.
By default, jobs run in parallel. The needs keyword creates dependencies. if expressions skip steps or entire jobs conditionally. And this is where most silent failures hide.
```yaml
name: Build, Test and Deploy

on:
  push:
    branches: [main]

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      - run: npm ci
      - run: npm run lint

  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      - run: npm ci
      - run: npm test

  deploy:
    needs: [lint, test]   # wait for both jobs to succeed
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    steps:
      - uses: actions/checkout@v4
      - name: Deploy to production
        run: |
          echo "Deploying application..."
          # Your deployment commands here
```
lint and test run in parallel. deploy waits for both via needs: [lint, test]. If either fails, deployment is skipped. The if condition adds another guard: deploy only on main.
But here's the failure mode nobody warns you about. If you add if: always() to a job because you want it to run even when tests fail (maybe it posts a Slack notification), that job also runs when the workflow is cancelled. Now you get phantom Slack messages for workflows someone manually stopped. Use if: success() || failure() instead. Subtle difference, big impact.
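A sketch of that notification pattern -- the echo step stands in for whatever Slack action or webhook call your team actually uses:

```yaml
jobs:
  notify:
    needs: [test]
    # runs whether test passed or failed, but NOT when the run was cancelled
    if: success() || failure()
    runs-on: ubuntu-latest
    steps:
      # replace this echo with your Slack webhook call
      - run: echo "Tests finished with result: ${{ needs.test.result }}"
```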
Keep jobs focused. One thing each. Stuffing lint, test, build, and deploy into a single monolithic job feels simpler until the build fails and you re-run twenty minutes of tests just to get past a lint error. Smaller jobs let you retry individual pieces. So split them.
Matrix Builds and Parallel Testing
You define a set of parameters -- OS versions, runtime versions, whatever varies. GitHub Actions creates a job for every combination and runs them all in parallel. Wall-clock time stays roughly the same as a single run. The matrix below tests 3 operating systems against 3 Node versions, minus one exclusion, producing 8 parallel jobs. It also adds a coverage flag to just one combination so you upload coverage reports once instead of eight times. fail-fast: false keeps all jobs running even if one fails -- the default behavior cancels everything the moment any combination breaks, which makes debugging across environments nearly impossible.
```yaml
jobs:
  test:
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
        node-version: [18, 20, 22]
        include:
          # add a coverage flag to one existing combination
          - os: ubuntu-latest
            node-version: 20
            coverage: true
        exclude:
          - os: windows-latest
            node-version: 18
    steps:
      - uses: actions/checkout@v4
      - name: Use Node.js ${{ matrix.node-version }}
        uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node-version }}
      - run: npm ci
      - run: npm test
      - name: Upload coverage
        if: matrix.coverage == true
        uses: codecov/codecov-action@v4
```
Standard stuff.
Most teams overuse matrix builds. If you're building an application -- not a library -- and your Dockerfile pins Node 20, testing against 18 and 22 is wasted compute. Nobody will ever run the app on those versions. Matrix builds are for libraries where users bring their own runtime. For applications, build the matrix over something that actually varies: database versions, browser engines, feature flags.
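For instance, an application pinned to one Node version might still vary its database. A sketch, assuming Postgres runs as a service container:

```yaml
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        postgres: [14, 15, 16]
    services:
      db:
        image: postgres:${{ matrix.postgres }}
        env:
          POSTGRES_PASSWORD: postgres
        ports:
          - 5432:5432
        # wait until the database accepts connections before steps run
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm test
```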
Secrets and Environment Variables
Secrets leak into logs more often than people admit. The automatic redaction works for exact matches, but if your secret gets base64-encoded, URL-encoded, or split across multiple log lines, GitHub won't catch it. This has bitten me.
Secrets go in Settings > Secrets and variables > Actions. Encrypted, invisible to forks, redacted in logs (mostly). Environment variables are set at the workflow, job, or step level. Not encrypted. Do not put anything sensitive in a plain env block.
Never hardcode credentials in workflow files. Sensitive values go in GitHub secrets. Configuration that varies between environments goes in environment variables populated from those secrets.
GitHub Environments add deployment gates. Required reviewers. Wait timers. Environment-specific secrets. Staging deploys can proceed automatically while production requires approval.
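A sketch of how this wires together -- the production environment, DEPLOY_TOKEN secret, and deploy script are assumptions you'd create first under Settings:

```yaml
jobs:
  deploy:
    runs-on: ubuntu-latest
    # pauses here until the environment's required reviewers approve
    environment: production
    steps:
      - uses: actions/checkout@v4
      - name: Deploy
        run: ./scripts/deploy.sh   # hypothetical deploy script
        env:
          # resolved from the production environment's secrets,
          # not from repo-level secrets
          DEPLOY_TOKEN: ${{ secrets.DEPLOY_TOKEN }}
```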
And secrets are not available in PR workflows from forks. Intentional. But if your CI needs secrets for fork PRs, you can use pull_request_target. Be careful though -- it runs in the context of the base repository, with access to secrets and write permissions. If you check out and execute the PR's code under that trigger, one malicious PR can exfiltrate every secret you have.
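If you do reach for it, the rule is: never check out or execute the PR's head under that trigger. A sketch of a comparatively safe use (base-branch checkout only):

```yaml
on: pull_request_target

jobs:
  triage:
    runs-on: ubuntu-latest
    steps:
      # checks out the BASE branch by default -- do NOT add
      # ref: ${{ github.event.pull_request.head.sha }} here
      - uses: actions/checkout@v4
      - run: echo "Triaging PR #${{ github.event.pull_request.number }}"
```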
Caching Dependencies for Speed
The cache key is the whole trick. Get it wrong and you're caching nothing.
```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Node.js with caching
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'

      - name: Cache Playwright browsers
        uses: actions/cache@v4
        with:
          path: ~/.cache/ms-playwright
          key: playwright-${{ runner.os }}-${{ hashFiles('package-lock.json') }}
          restore-keys: |
            playwright-${{ runner.os }}-

      - run: npm ci
      - run: npx playwright install --with-deps
      - run: npm test
```
hashFiles('package-lock.json') hashes your lockfile. Dependency change busts the cache. Unrelated changes don't. The restore-keys provide fallback patterns when there's no exact match.
On a cache miss, npm ci takes 40-50 seconds. Playwright browsers take another 90. On a hit, near-instant. Over hundreds of runs per month, that's real money on the Actions bill.
Do not cache node_modules directly. Cache the npm/Yarn/pnpm global store and let npm ci handle the rest. There's a 10 GB cache limit per repository. Caching Docker layers and Playwright browsers across matrix variants hits that ceiling fast. And caches are branch-scoped -- a PR branch reads from the base branch's cache but not from other PRs, which means your first PR run is always slower than you expect.
Reusable Workflows and Composite Actions
Copy-pasting the same "checkout, setup Node, install deps, run tests" across twelve repos is where most teams are right now. Upgrading the Node version means twelve PRs. Reusable workflows fix this, but honestly most teams adopt them too early and create abstractions that are harder to debug than the duplication they replaced.
Wait until you have at least five repos with near-identical workflows. Then extract.
```yaml
name: Reusable Test Workflow

on:
  workflow_call:
    inputs:
      node-version:
        description: 'Node.js version to use'
        required: false
        type: string
        default: '20'
      working-directory:
        description: 'Directory to run commands in'
        required: false
        type: string
        default: '.'
    secrets:
      CODECOV_TOKEN:
        required: false

jobs:
  test:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: ${{ inputs.working-directory }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ inputs.node-version }}
          cache: 'npm'
      - run: npm ci
      - run: npm test -- --coverage
      - name: Upload coverage
        if: secrets.CODECOV_TOKEN != ''
        uses: codecov/codecov-action@v4
        with:
          token: ${{ secrets.CODECOV_TOKEN }}
```
Calling it from another repository:
```yaml
name: CI

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  test:
    uses: your-org/shared-workflows/.github/workflows/reusable-test.yml@main
    with:
      node-version: '20'
    secrets:
      CODECOV_TOKEN: ${{ secrets.CODECOV_TOKEN }}
```
One edit to the shared workflow. Every calling repo picks it up. That's the payoff.
workflow_call makes a workflow reusable. Secrets are not inherited by default -- you pass them explicitly or use secrets: inherit if you trust the called workflow with everything. And here's an opinion that'll get pushback: secrets: inherit is almost always fine for internal repos. The explicit passing is ceremony that protects against nothing when both repos are owned by the same team.
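With inheritance, the caller shrinks to a sketch like this:

```yaml
jobs:
  test:
    uses: your-org/shared-workflows/.github/workflows/reusable-test.yml@main
    secrets: inherit   # the called workflow sees every secret the caller has
```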
Composite actions are the step-level counterpart. Where reusable workflows replace entire jobs, composite actions bundle groups of steps. "Set up our standard toolchain" or "authenticate to AWS" type things. Defined with an action.yml file.
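A minimal composite action sketch -- the path and step contents here are illustrative, not a standard layout:

```yaml
# .github/actions/setup-toolchain/action.yml (hypothetical path)
name: Set up standard toolchain
description: Shared Node setup and install, reused across workflows
runs:
  using: composite
  steps:
    - uses: actions/setup-node@v4
      with:
        node-version: '20'
        cache: 'npm'
    - run: npm ci
      shell: bash   # run steps in composite actions must declare a shell
```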
Most organizations end up with a mix. A few reusable workflows for standard pipelines. Several composite actions for repeated step sequences. But the debugging experience is worse for both -- when something fails inside a reusable workflow called from a composite action called from another reusable workflow, good luck reading that log output.
What's Coming
GitHub is shipping larger runners, arm64 support, and better reusable workflow inputs. GPU runners are in preview. The platform keeps expanding.
The YAML isn't going anywhere though.
Set up OIDC with your cloud provider from day one so you never have long-lived secrets to rotate. Add caching early -- the speed difference changes how often people push. And hold off on reusable workflows until the copy-paste actually hurts. Premature abstraction in CI is just as bad as in application code.
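An OIDC sketch for AWS, assuming you've already created the IAM role and its trust policy -- the account ID and role name below are placeholders:

```yaml
permissions:
  id-token: write   # lets the job request an OIDC token from GitHub
  contents: read

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          # placeholder ARN -- substitute your own role
          role-to-assume: arn:aws:iam::123456789012:role/github-actions-deploy
          aws-region: us-east-1
      - run: aws sts get-caller-identity   # verify the assumed identity
```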