Skip to main content

Git Repo Scanner

License Apache-2.0GitHub release (latest SemVer)OWASP Lab ProjectArtifact HUBGitHub Repo starsMastodon Follower

What is Git-Repo-Scanner?

Git-Repo-Scanner is a small Python script which discovers repositories on GitHub or GitLab. The main purpose of this scanner is to provide a cascading input for the gitleaks and semgrep scanners.

Deployment

The git-repo-scanner chart can be deployed via helm:

# Install HelmChart (use -n to configure another namespace)
helm upgrade --install git-repo-scanner oci://ghcr.io/securecodebox/helm/git-repo-scanner

Scanner Configuration

The scanner options can be divided into two groups for Gitlab and GitHub. You can choose the git repository type with the option:

--git-type github
or
--git-type Gitlab

GitHub

For type GitHub you can use the following options:

  • --organization: The name of the GitHub organization you want to scan.
  • --url: The url of the api for a GitHub enterprise server. Skip this option for repos on https://github.com.
  • --access-token: Your personal GitHub access token (needs full repo rights if you want to also find private repositories, otherwise repo:status and public_repo is sufficient).
  • --ignore-repos: A list of GitHub repository ids you want to ignore
  • --obey-rate-limit: True to obey the rate limit of the GitHub server (default), otherwise False
  • --activity-since-duration: Return git repo findings with repo activity (e.g. commits) more recent than a specific date expressed by a duration (now + duration). A duration string is a possibly signed sequence of decimal numbers, each with optional fraction and a unit suffix, such as '1h' or '2h45m'. Valid time units are 'm', 'h', 'd', 'w'.
  • --activity-until-duration: Return git repo findings with repo activity (e.g. commits) older than a specific date expressed by a duration (now + duration). A duration string is a possibly signed sequence of decimal numbers, each with optional fraction and a unit suffix, such as '1h' or '2h45m'. Valid time units are 'm', 'h', 'd', 'w'.
  • --annotate-latest-commit-id: Set to True to annotate the results with the SHA1 of the latest commit on the main branch. Causes an extra API hit per repository. False by default.

For now only organizations are supported, so the option is mandatory. We strongly recommend providing an access token for authentication, otherwise the API rate limiting will kick in after about 30 repositories scanned.

GitLab

For type GitLab you can use the following options:

  • --url: The url of the GitLab server.
  • --access-token: Your personal GitLab access token (needs at least read_api and read_repository scopes).
  • --group: A specific GitLab group id you want to san, including subgroups.
  • --ignore-groups: A list of GitLab group ids you want to ignore
  • --ignore-repos: A list of GitLab project ids you want to ignore
  • --obey-rate-limit: True to obey the rate limit of the GitLab server (default), otherwise False
  • --activity-since-duration: Return git repo findings with repo activity (e.g. commits) more recent than a specific date expressed by a duration (now + duration). A duration string is a possibly signed sequence of decimal numbers, each with optional fraction and a unit suffix, such as '1h' or '2h45m'. Valid time units are 'm', 'h', 'd', 'w'.
  • --activity-until-duration: Return git repo findings with repo activity (e.g. commits) older than a specific date expressed by a duration (now + duration). A duration string is a possibly signed sequence of decimal numbers, each with optional fraction and a unit suffix, such as '1h' or '2h45m'. Valid time units are 'm', 'h', 'd', 'w'.
  • --annotate-latest-commit-id: Set to True to annotate the results with the SHA1 of the latest commit on the main branch. Causes an extra API hit per repository. False by default.

For Gitlab, the url and the access token is mandatory. If you don't provide a specific group id, all projects on the Gitlab server are going to be discovered.

Requirements

Kubernetes: >=v1.11.0-0

Values

KeyTypeDefaultDescription
cascadingRules.enabledboolfalseEnables or disables the installation of the default cascading rules for this scanner
imagePullSecretslist[]Define imagePullSecrets when a private registry is used (see: https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/)
parser.affinityobject{}Optional affinity settings that control how the parser job is scheduled (see: https://kubernetes.io/docs/tasks/configure-pod-container/assign-pods-nodes-using-node-affinity/)
parser.envlist[]Optional environment variables mapped into each parseJob (see: https://kubernetes.io/docs/tasks/inject-data-application/define-environment-variable-container/)
parser.image.pullPolicystring"IfNotPresent"Image pull policy. One of Always, Never, IfNotPresent. Defaults to Always if :latest tag is specified, or IfNotPresent otherwise. More info: https://kubernetes.io/docs/concepts/containers/images#updating-images
parser.image.repositorystring"docker.io/securecodebox/parser-git-repo-scanner"Parser image repository
parser.image.tagstringdefaults to the charts versionParser image tag
parser.nodeSelectorobject{}Optional nodeSelector settings that control how the scanner job is scheduled (see: https://kubernetes.io/docs/tasks/configure-pod-container/assign-pods-nodes/)
parser.resourcesobject{ requests: { cpu: "200m", memory: "100Mi" }, limits: { cpu: "400m", memory: "200Mi" } }Optional resources lets you control resource limits and requests for the parser container. See https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
parser.scopeLimiterAliasesobject{}Optional finding aliases to be used in the scopeLimiter.
parser.tolerationslist[]Optional tolerations settings that control how the parser job is scheduled (see: https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/)
parser.ttlSecondsAfterFinishedstringnilseconds after which the Kubernetes job for the parser will be deleted. Requires the Kubernetes TTLAfterFinished controller: https://kubernetes.io/docs/concepts/workloads/controllers/ttlafterfinished/
scanner.activeDeadlineSecondsstringnilThere are situations where you want to fail a scan Job after some amount of time. To do so, set activeDeadlineSeconds to define an active deadline (in seconds) when considering a scan Job as failed. (see: https://kubernetes.io/docs/concepts/workloads/controllers/job/#job-termination-and-cleanup)
scanner.affinityobject{}Optional affinity settings that control how the scanner job is scheduled (see: https://kubernetes.io/docs/tasks/configure-pod-container/assign-pods-nodes-using-node-affinity/)
scanner.backoffLimitint3There are situations where you want to fail a scan Job after some amount of retries due to a logical error in configuration etc. To do so, set backoffLimit to specify the number of retries before considering a scan Job as failed. (see: https://kubernetes.io/docs/concepts/workloads/controllers/job/#pod-backoff-failure-policy)
scanner.envlist[]Optional environment variables mapped into each scanJob (see: https://kubernetes.io/docs/tasks/inject-data-application/define-environment-variable-container/)
scanner.extraContainerslist[]Optional additional Containers started with each scanJob (see: https://kubernetes.io/docs/concepts/workloads/pods/init-containers/)
scanner.extraVolumeMountslist[]Optional VolumeMounts mapped into each scanJob (see: https://kubernetes.io/docs/concepts/storage/volumes/)
scanner.extraVolumeslist[]Optional Volumes mapped into each scanJob (see: https://kubernetes.io/docs/concepts/storage/volumes/)
scanner.image.pullPolicystring"IfNotPresent"Image pull policy. One of Always, Never, IfNotPresent. Defaults to Always if :latest tag is specified, or IfNotPresent otherwise. More info: https://kubernetes.io/docs/concepts/containers/images#updating-images
scanner.image.repositorystring"docker.io/securecodebox/scanner-git-repo-scanner"Container Image to run the scan
scanner.image.tagstringnildefaults to the charts version
scanner.nameAppendstringnilappend a string to the default scantype name.
scanner.nodeSelectorobject{}Optional nodeSelector settings that control how the scanner job is scheduled (see: https://kubernetes.io/docs/tasks/configure-pod-container/assign-pods-nodes/)
scanner.podSecurityContextobject{}Optional securityContext set on scanner pod (see: https://kubernetes.io/docs/tasks/configure-pod-container/security-context/)
scanner.resourcesobject{}CPU/memory resource requests/limits (see: https://kubernetes.io/docs/tasks/configure-pod-container/assign-memory-resource/, https://kubernetes.io/docs/tasks/configure-pod-container/assign-cpu-resource/)
scanner.securityContextobject{"allowPrivilegeEscalation":false,"capabilities":{"drop":["all"]},"privileged":false,"readOnlyRootFilesystem":true,"runAsNonRoot":true}Optional securityContext set on scanner container (see: https://kubernetes.io/docs/tasks/configure-pod-container/security-context/)
scanner.securityContext.allowPrivilegeEscalationboolfalseEnsure that users privileges cannot be escalated
scanner.securityContext.capabilities.drop[0]string"all"This drops all linux privileges from the container.
scanner.securityContext.privilegedboolfalseEnsures that the scanner container is not run in privileged mode
scanner.securityContext.readOnlyRootFilesystembooltruePrevents write access to the containers file system
scanner.securityContext.runAsNonRootbooltrueEnforces that the scanner image is run as a non root user
scanner.suspendboolfalseif set to true the scan job will be suspended after creation. You can then resume the job using kubectl resume <jobname> or using a job scheduler like kueue
scanner.tolerationslist[]Optional tolerations settings that control how the scanner job is scheduled (see: https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/)
scanner.ttlSecondsAfterFinishedstringnilseconds after which the Kubernetes job for the scanner will be deleted. Requires the Kubernetes TTLAfterFinished controller: https://kubernetes.io/docs/concepts/workloads/controllers/ttlafterfinished/

License

License

Code of secureCodeBox is licensed under the Apache License 2.0.

CPU architectures

The scanner is currently supported for these CPU architectures:

  • linux/amd64

Examples

github-secureCodeBox-scan

This example scans the organization secureCodeBox on github. Remember to add an access token to not encounter rate limiting:

# SPDX-FileCopyrightText: the secureCodeBox authors
#
# SPDX-License-Identifier: Apache-2.0

apiVersion: "execution.securecodebox.io/v1"
kind: Scan
metadata:
name: "scan-github"
spec:
scanType: "git-repo-scanner"
parameters:
- "--git-type"
- "github"
- "--organization"
- "secureCodeBox"
cascades:
matchLabels:
securecodebox.io/intensive: medium
securecodebox.io/invasive: non-invasive

gitlab-group-scan

This example shows how to scan a specific group on a GitLab server. It also excludes certain subgroups and projects contained in this group:

# SPDX-FileCopyrightText: the secureCodeBox authors
#
# SPDX-License-Identifier: Apache-2.0

apiVersion: "execution.securecodebox.io/v1"
kind: Scan
metadata:
name: "scan-company-gitlab-group"
spec:
scanType: "git-repo-scanner"
parameters:
- "--git-type"
- "gitlab"
- "--url"
- "https://gitlab.your-company.com"
- "--access-token"
- "<YOUR-GITLAB-TOKEN>"
- "--group" #Gitlab group id
- "542"
- "--ignore-groups" #A group can contain subgroups
- "723"
- "--ignore-projects" #Gitlab project id
- "423"
- "123"
cascades:
matchLabels:
securecodebox.io/intensive: medium
securecodebox.io/invasive: non-invasive