
 

1. Install WSL

wsl --install -d Ubuntu-26.04

 

 

2. Connect, then update and upgrade packages

sudo apt update && sudo apt upgrade -y

 

 

3. Install k3s

# Install
curl -sfL https://get.k3s.io | sh -

# Optional, for the check below: make the k3s kubeconfig readable without sudo
sudo chmod 644 /etc/rancher/k3s/k3s.yaml

# Optional: verify the node
kubectl get node --kubeconfig /etc/rancher/k3s/k3s.yaml
NAME              STATUS   ROLES           AGE    VERSION
desktop   Ready    control-plane   4m5s   v1.35.5+k3s1

 

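After installation, move the kubeconfig to the default location

# Create the directory
mkdir -p ~/.kube

# Copy the k3s kubeconfig
sudo cp /etc/rancher/k3s/k3s.yaml ~/.kube/config

# Change ownership to the current user
sudo chown $(id -u):$(id -g) ~/.kube/config

# Set the environment variable
echo 'export KUBECONFIG=~/.kube/config' >> ~/.bashrc
source ~/.bashrc

# Test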
kubectl get node
NAME              STATUS   ROLES           AGE    VERSION
desktop   Ready    control-plane   102s   v1.35.5+k3s1

 

 

4. Check the GPU inside WSL

Run nvidia-smi; if the output looks like the following, the GPU is usable.

nvidia-smi
Wed Jun 17 23:10:50 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 575.64.01              Driver Version: 576.88         CUDA Version: 12.9     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 5xxx        On  |   00000000:01:00.0  On |                  N/A |
|  0%   43C    P8             xxW /  xxxW |    xxxxMiB /  xxxxxMiB |      9%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

 

 

5. Install the NVIDIA Container Toolkit

Install it by following the official guide:

https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html

sudo apt-get update && sudo apt-get install -y --no-install-recommends \
   ca-certificates \
   curl \
   gnupg2
   
  curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
    
sudo apt-get update

export NVIDIA_CONTAINER_TOOLKIT_VERSION=1.19.1-1
sudo apt-get install -y \
      nvidia-container-toolkit=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
      nvidia-container-toolkit-base=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
      libnvidia-container-tools=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
      libnvidia-container1=${NVIDIA_CONTAINER_TOOLKIT_VERSION}

 

 

6. Configure containerd and restart

k3s regenerates config.toml every time it restarts, so create a config.toml.tmpl template and let k3s render the config from it.

Copy config.toml in /var/lib/rancher/k3s/agent/etc/containerd to config.toml.tmpl, then append the settings below.

# Copy
sudo cp /var/lib/rancher/k3s/agent/etc/containerd/config.toml /var/lib/rancher/k3s/agent/etc/containerd/config.toml.tmpl

# Edit
sudo vi /var/lib/rancher/k3s/agent/etc/containerd/config.toml.tmpl

# Append to the end of the file
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
  privileged_without_host_devices = false
  runtime_engine = ""
  runtime_root = ""
  runtime_type = "io.containerd.runc.v2"
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
    BinaryName = "/usr/bin/nvidia-container-runtime"
    
# Restart
sudo systemctl restart k3s
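
The manual vi edit could also be scripted. The sketch below is one possible way, assuming the stanza from this step is exactly what should be appended. `add_nvidia_runtime` is a hypothetical helper name, and because the real template is owned by root, it would have to run as root or against a copy.

```shell
# Hypothetical helper: append the nvidia runtime stanza to a containerd
# template unless a runtimes.nvidia table is already present.
# $1: path to the template (for example a copy of config.toml.tmpl)
add_nvidia_runtime() {
  tmpl=$1
  # -F: fixed-string match, so the dots and bracket are taken literally
  if grep -qF 'containerd.runtimes.nvidia]' "$tmpl"; then
    return 0
  fi
  cat >> "$tmpl" <<'EOF'

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
  privileged_without_host_devices = false
  runtime_engine = ""
  runtime_root = ""
  runtime_type = "io.containerd.runc.v2"
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
    BinaryName = "/usr/bin/nvidia-container-runtime"
EOF
}
```

Running it twice leaves a single stanza, so it is safe to call again after the template has been edited once.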

 

 

7. Install the GPU Operator

Install Helm

sudo apt-get install curl gpg apt-transport-https --yes
curl -fsSL https://packages.buildkite.com/helm-linux/helm-debian/gpgkey | gpg --dearmor | sudo tee /usr/share/keyrings/helm.gpg > /dev/null
echo "deb [signed-by=/usr/share/keyrings/helm.gpg] https://packages.buildkite.com/helm-linux/helm-debian/any/ any main" | sudo tee /etc/apt/sources.list.d/helm-stable-debian.list
sudo apt-get update
sudo apt-get install helm

 

Add the Helm repo

helm repo add nvidia https://helm.ngc.nvidia.com/nvidia \
    && helm repo update

 

Install gpu-operator

The driver is managed by WSL and the toolkit was installed above, so install with both driver and toolkit set to enabled=false.

# Install
helm install gpu-operator nvidia/gpu-operator \
-n gpu-operator \
--create-namespace \
--set driver.enabled=false \
--set toolkit.enabled=false

# Check the pods
kubectl get pods -n gpu-operator
NAME                                                          READY   STATUS    RESTARTS   AGE
gpu-operator-fdfdb94d4-k5hgv                                  1/1     Running   0          98s
gpu-operator-node-feature-discovery-gc-585b876f9c-ln6fh       1/1     Running   0          98s
gpu-operator-node-feature-discovery-master-7f6684fb45-srbnx   1/1     Running   0          98s
gpu-operator-node-feature-discovery-worker-prczd              1/1     Running   0          98s

 

 

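Four pods come up after installation, but more NVIDIA pods should actually be running.

This seems to happen because the node is not labeled automatically in this k3s setup, so label the node manually and restart the workloads.

# The operator log reports that no NFD label was found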
kubectl logs -f gpu-operator-fdfdb94d4-k5hgv    -n gpu-operator
{"level":"info","ts":1781708654.030799,"logger":"controllers.ClusterPolicy","msg":"No NFD label found, polling for new nodes.","requeueAfter":45}

# Label the node
kubectl label node <node-name> nvidia.com/gpu.present=true feature.node.kubernetes.io/pci-10de.present=true

# Restart the deployments and daemonsets
kubectl rollout restart deploy -n gpu-operator
kubectl rollout restart ds -n gpu-operator
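
On a single-node cluster the node name does not have to be typed by hand. A minimal sketch, assuming every node in the cluster has an NVIDIA GPU (true for the one-node WSL setup here, not in general); `label_gpu_nodes` is a hypothetical helper name.

```shell
# Hypothetical helper: apply the two GPU labels to every node in the cluster.
# Assumes all nodes have an NVIDIA GPU (fine for a single-node WSL cluster).
label_gpu_nodes() {
  # `kubectl get nodes -o name` prints entries such as "node/desktop"
  for node in $(kubectl get nodes -o name); do
    kubectl label "$node" \
      nvidia.com/gpu.present=true \
      feature.node.kubernetes.io/pci-10de.present=true \
      --overwrite
  done
}
```

`--overwrite` keeps the call from failing when the labels already exist, so it can be repeated after a reinstall.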

 

Listing the pods again shows the NVIDIA pods starting.

However, they fail with an error, because WSL does not mount / as a shared mount by default.

# Check the pods
kubectl get pods -n gpu-operator
NAME                                                          READY   STATUS                      RESTARTS   AGE
gpu-feature-discovery-z5scz                                   0/1     Init:0/1                    0          35s
gpu-operator-54f9874dfb-2jktc                                 1/1     Running                     0          72s
gpu-operator-node-feature-discovery-gc-77c98d7b86-tgspd       1/1     Running                     0          72s
gpu-operator-node-feature-discovery-master-6b7c5c5c7c-9ctf5   1/1     Running                     0          72s
gpu-operator-node-feature-discovery-worker-nh68d              1/1     Running                     0          52s
nvidia-dcgm-exporter-k9xwp                                    0/1     Init:0/1                    0          36s
nvidia-device-plugin-daemonset-4zrkn                          0/1     Init:0/1                    0          38s
nvidia-operator-validator-fnd8l                               0/1     Init:CreateContainerError   0          39s


# Error reported by describe
kubectl describe pods nvidia-operator-validator-fnd8l -n gpu-operator
Error: failed to generate container "87d5f73a778cafe98bcc8b47b2fdfcd374483328f6eab18bca95b132040e7de5" spec: failed to generate spec: path "/" is mounted on "/" but it is not a shared or slave mount

 

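There are several other ways to fix this, but the simplest is shown below. The drawback is that it has to be repeated every time WSL starts.

# Must be run again after every WSL restart
sudo mount --make-rshared /

# After that, listing the pods again shows them running normally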
kubectl get pods -n gpu-operator
NAME                                                          READY   STATUS      RESTARTS   AGE
gpu-feature-discovery-z5scz                                   1/1     Running     0          3m11s
gpu-operator-54f9874dfb-2jktc                                 1/1     Running     0          3m48s
gpu-operator-node-feature-discovery-gc-77c98d7b86-tgspd       1/1     Running     0          3m48s
gpu-operator-node-feature-discovery-master-6b7c5c5c7c-9ctf5   1/1     Running     0          3m48s
gpu-operator-node-feature-discovery-worker-nh68d              1/1     Running     0          3m28s
nvidia-cuda-validator-p2sdv                                   0/1     Completed   0          22s
nvidia-dcgm-exporter-k9xwp                                    0/1     Running     0          3m12s
nvidia-device-plugin-daemonset-4zrkn                          1/1     Running     0          3m14s
nvidia-operator-validator-fnd8l                               1/1     Running     0          3m15s
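
Since the remount is lost on every WSL restart, it helps to check whether it is still needed before running it. The sketch below is one way to do that by reading mountinfo. `root_is_shared` is a hypothetical helper name, and the optional file argument exists only so the function can be tried on sample input.

```shell
# Hypothetical helper: succeed (exit 0) if the mount at "/" is shared.
# $1: mountinfo-style file to read (defaults to /proc/self/mountinfo)
root_is_shared() {
  awk '
    $5 == "/" {
      found = 1; shared = 0
      # optional fields run from field 7 up to the "-" separator
      for (i = 7; i <= NF && $i != "-"; i++)
        if ($i ~ /^shared:/) shared = 1
    }
    END { exit !(found && shared) }
  ' "${1:-/proc/self/mountinfo}"
}

# Example: remount only when needed
#   root_is_shared || sudo mount --make-rshared /
```

On WSL versions that support the `[boot]` `command` setting in /etc/wsl.conf, the same mount command could presumably be run at startup from there; treat that as an assumption to verify on your setup.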

 

Run describe node and check that nvidia.com/gpu: 1 appears under Allocatable.

If it does not, the setup is not working correctly.

kubectl describe node
...
Allocatable:
  ...
  nvidia.com/gpu:     1
...
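
Scanning the full describe output by eye is easy to get wrong, because nvidia.com/gpu is also listed under Capacity. A small sketch that extracts only the Allocatable value, assuming the describe layout shown above; `allocatable_gpus` is a hypothetical helper name.

```shell
# Hypothetical helper: read `kubectl describe node` output on stdin and print
# the nvidia.com/gpu value from the "Allocatable:" section (0 if absent).
# With several nodes in the input, the value of the last node is printed.
allocatable_gpus() {
  awk '
    /^Allocatable:/ { in_alloc = 1; next }
    /^[^ ]/         { in_alloc = 0 }
    in_alloc && $1 == "nvidia.com/gpu:" { n = $2 }
    END { print n + 0 }
  '
}

# Example:
#   kubectl describe node | allocatable_gpus
```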

 

 

8. Run a test pod

Run a test pod. Some notes on GPU resources:

- When GPU limits are specified, requests may be omitted.

- Both limits and requests may be specified, but the two values must be equal.

- GPU requests cannot be specified without limits.

apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: "nvidia/cuda:12.3.0-base-ubuntu22.04"
      command:
      - nvidia-smi
      resources:
        limits:
          nvidia.com/gpu: 1

 

List the pod, then print its log to confirm.

# List the pod
kubectl get pods
NAME       READY   STATUS      RESTARTS   AGE
gpu-test   0/1     Completed   0          15s

# Print the log
kubectl logs -f gpu-test
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 575.64.01              Driver Version: 576.88         CUDA Version: 12.9     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 5xxx        On  |   00000000:01:00.0  On |                  N/A |
|  0%   43C    P8             xxW /  xxxW |    xxxxMiB /  xxxxxMiB |      9%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

 

 

Appendix. Troubleshooting errors

The error 'failed to create containerd container: cdi device injection failed unresolvable cdi devices k8s.device-plugin.nvidia.com/gpu' occasionally appears. It can be handled in two ways.

 

1) Reconfigure containerd

Add the enable_cdi option.

# Edit
sudo vi /var/lib/rancher/k3s/agent/etc/containerd/config.toml.tmpl

# Find this section and add the two lines below to its existing contents
[plugins."io.containerd.grpc.v1.cri"]
  ...
  enable_cdi = true
  cdi_spec_dirs = ["/etc/cdi", "/var/run/cdi"]
  
    
# Restart
sudo systemctl restart k3s

 

 

2) Disable CDI

Disable CDI in the ClusterPolicy.

kubectl patch clusterpolicies.nvidia.com/cluster-policy --type='json' \
    -p='[{"op": "replace", "path": "/spec/cdi/enabled", "value":false}]'

 

 

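This is the basic way to use the GPU Operator on WSL. It is typically combined with time-slicing, Volcano, KAI, MIG (DRA), and so on.

* Note that MIG is not supported on consumer desktop graphics cards, so it is hard to test at home.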


 

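Exporting a Kibana dashboard to PDF is a paid feature, but this Chrome extension exports a Kibana dashboard as an image for free.

It works only on Kibana dashboards. Open the dashboard you want to export and use the loaded Chrome extension to export the image.

The image is downloaded automatically as kibana-dashboard.png.

 

For the download and usage guide, see the Git URL below.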

https://github.com/wonkwangyeon/Kibana-Dashboard-Image-Exporter

 


 


 

If you skip Elastic APM and the Elastic APM agent and send data as

OpenTelemetry Agent -> OpenTelemetry Collector -> Elasticsearch,

the data does not show up in the Observability UI in Kibana.

(Of course, OpenTelemetry Collector -> Elastic APM -> Elasticsearch, or EDOT, would work.)

 

If EDOT feels too heavy and running Elastic APM is not appealing either,

you need to build a custom OpenTelemetry Collector that includes the Elastic APM components.

 

 

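1. Install OCB

https://opentelemetry.io/docs/collector/extend/ocb/

Install it by following the official guide, but use a slightly older version, possibly because of the versions the Elastic custom collector components expect.

The guide currently uses 0.143.0, but I installed 0.138.0.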

curl --proto '=https' --tlsv1.2 -fL -o ocb https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/cmd%2Fbuilder%2Fv0.138.0/ocb_0.138.0_linux_arm64

 

Make it executable

chmod +x ocb
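
The URL above downloads the linux_arm64 binary, so on an x86_64 machine the asset name would differ. A minimal sketch that derives the URL from `uname -m`, assuming the release assets follow the same `ocb_<version>_linux_<arch>` naming as the URL above; `ocb_url` is a hypothetical helper name.

```shell
# Hypothetical helper: print the ocb download URL for a version and machine type.
# $1: ocb version, e.g. 0.138.0
# $2: machine type as printed by `uname -m` (defaults to the current machine)
ocb_url() {
  version=$1
  machine=${2:-$(uname -m)}
  case "$machine" in
    x86_64|amd64)  arch=amd64 ;;
    aarch64|arm64) arch=arm64 ;;
    *) echo "unsupported architecture: $machine" >&2; return 1 ;;
  esac
  # %% prints a literal percent sign, keeping the URL-encoded slashes (%2F)
  printf 'https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/cmd%%2Fbuilder%%2Fv%s/ocb_%s_linux_%s\n' \
    "$version" "$version" "$arch"
}

# Example:
#   curl --proto '=https' --tlsv1.2 -fL -o ocb "$(ocb_url 0.138.0)"
```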

 

 

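2. Check the Go version and set up builder-config

As of this writing, the custom collector build did not work with Go 1.25.5, so I installed Go 1.24.11.

 

https://www.elastic.co/docs/reference/edot-collector/custom-collector

 

The builder-config can be set up from the guide above, but the guide looks current while actually being outdated, so the config has to be changed as shown below.

I set dist.name and output_path to the names I wanted to use.

 

Note that the guide omits basicauthextension. I needed it, so I added it separately.

Without basicauthextension, you can instead set the header directly in the collector config, as follows.

headers:
    authorization: Basic <base64-encoded user:password>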
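
The value after `Basic` is the base64 encoding of `user:password`. A minimal sketch for producing it; `basic_auth_value` is a hypothetical helper name, and the credentials in the example are placeholders.

```shell
# Hypothetical helper: print the value for an "authorization" header.
# $1: user name, $2: password
basic_auth_value() {
  # printf avoids the trailing newline that echo would add to the encoded input;
  # tr removes the line wraps that base64 inserts for long input
  printf 'Basic %s\n' "$(printf '%s:%s' "$1" "$2" | base64 | tr -d '\n')"
}

# Example (placeholder credentials):
#   basic_auth_value elastic changeme
```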

dist:
  name: otelcol-edot
  description: Elastic Distribution of OpenTelemetry Collectors
  output_path: ./otelcol-edot

receivers:
  - gomod:
      github.com/open-telemetry/opentelemetry-collector-contrib/receiver/apachereceiver v0.139.0
  - gomod:
      github.com/open-telemetry/opentelemetry-collector-contrib/receiver/dockerstatsreceiver v0.139.0
  - gomod:
      github.com/elastic/opentelemetry-collector-components/receiver/elasticapmintakereceiver v0.21.0
  - gomod:
      github.com/open-telemetry/opentelemetry-collector-contrib/receiver/filelogreceiver v0.139.0
  - gomod:
      github.com/open-telemetry/opentelemetry-collector-contrib/receiver/hostmetricsreceiver v0.139.0
  - gomod:
      github.com/open-telemetry/opentelemetry-collector-contrib/receiver/httpcheckreceiver v0.139.0
  - gomod:
      github.com/open-telemetry/opentelemetry-collector-contrib/receiver/iisreceiver v0.139.0
  - gomod:
      github.com/open-telemetry/opentelemetry-collector-contrib/receiver/jaegerreceiver v0.139.0
  - gomod:
      github.com/open-telemetry/opentelemetry-collector-contrib/receiver/jmxreceiver v0.139.0
  - gomod:
      github.com/open-telemetry/opentelemetry-collector-contrib/receiver/k8sclusterreceiver v0.139.0
  - gomod:
      github.com/open-telemetry/opentelemetry-collector-contrib/receiver/k8seventsreceiver v0.139.0
  - gomod:
      github.com/open-telemetry/opentelemetry-collector-contrib/receiver/k8sobjectsreceiver v0.139.0
  - gomod:
      github.com/open-telemetry/opentelemetry-collector-contrib/receiver/kafkareceiver v0.139.0
  - gomod:
      github.com/open-telemetry/opentelemetry-collector-contrib/receiver/kubeletstatsreceiver v0.139.0
  - gomod:
      github.com/open-telemetry/opentelemetry-collector-contrib/receiver/mysqlreceiver v0.139.0
  - gomod:
      github.com/open-telemetry/opentelemetry-collector-contrib/receiver/nginxreceiver v0.139.0
  - gomod:
      go.opentelemetry.io/collector/receiver/nopreceiver v0.139.0
  - gomod:
      go.opentelemetry.io/collector/receiver/otlpreceiver v0.139.0
  - gomod:
      github.com/open-telemetry/opentelemetry-collector-contrib/receiver/postgresqlreceiver v0.139.0
  - gomod:
      github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver v0.139.0
  - gomod:
      github.com/open-telemetry/opentelemetry-collector-contrib/receiver/receivercreator v0.139.0
  - gomod:
      github.com/open-telemetry/opentelemetry-collector-contrib/receiver/redisreceiver v0.139.0
  - gomod:
      github.com/open-telemetry/opentelemetry-collector-contrib/receiver/sqlserverreceiver v0.139.0
  - gomod:
      github.com/open-telemetry/opentelemetry-collector-contrib/receiver/windowseventlogreceiver v0.139.0
  - gomod:
      github.com/open-telemetry/opentelemetry-collector-contrib/receiver/windowsperfcountersreceiver v0.139.0
  - gomod:
      github.com/open-telemetry/opentelemetry-collector-contrib/receiver/zipkinreceiver v0.139.0

processors:
  - gomod:
      github.com/open-telemetry/opentelemetry-collector-contrib/processor/attributesprocessor v0.139.0
  - gomod:
      go.opentelemetry.io/collector/processor/batchprocessor v0.139.0
  - gomod:
      github.com/open-telemetry/opentelemetry-collector-contrib/processor/cumulativetodeltaprocessor v0.139.0
  - gomod:
      github.com/elastic/opentelemetry-collector-components/processor/elasticapmprocessor v0.21.0
  - gomod:
      github.com/elastic/opentelemetry-collector-components/processor/elasticinframetricsprocessor v0.20.0
  - gomod:
      github.com/elastic/opentelemetry-collector-components/processor/elastictraceprocessor v0.20.0
  - gomod:
      github.com/open-telemetry/opentelemetry-collector-contrib/processor/filterprocessor v0.139.0
  - gomod:
      github.com/open-telemetry/opentelemetry-collector-contrib/processor/geoipprocessor v0.139.0
  - gomod:
      github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sattributesprocessor v0.139.0
  - gomod:
      go.opentelemetry.io/collector/processor/memorylimiterprocessor v0.139.0
  - gomod:
      github.com/open-telemetry/opentelemetry-collector-contrib/processor/resourcedetectionprocessor v0.139.0
  - gomod:
      github.com/open-telemetry/opentelemetry-collector-contrib/processor/resourceprocessor v0.139.0
  - gomod:
      github.com/open-telemetry/opentelemetry-collector-contrib/processor/tailsamplingprocessor v0.139.0
  - gomod:
      github.com/open-telemetry/opentelemetry-collector-contrib/processor/transformprocessor v0.139.0

exporters:
  - gomod:
      go.opentelemetry.io/collector/exporter/debugexporter v0.139.0
  - gomod:
      github.com/open-telemetry/opentelemetry-collector-contrib/exporter/elasticsearchexporter v0.139.0
  - gomod:
      github.com/open-telemetry/opentelemetry-collector-contrib/exporter/fileexporter v0.139.0
  - gomod:
      github.com/open-telemetry/opentelemetry-collector-contrib/exporter/kafkaexporter v0.139.0
  - gomod:
      github.com/open-telemetry/opentelemetry-collector-contrib/exporter/loadbalancingexporter v0.139.0
  - gomod:
      go.opentelemetry.io/collector/exporter/nopexporter v0.139.0
  - gomod:
      go.opentelemetry.io/collector/exporter/otlpexporter v0.139.0
  - gomod:
      go.opentelemetry.io/collector/exporter/otlphttpexporter v0.139.0

connectors:
  - gomod:
      github.com/elastic/opentelemetry-collector-components/connector/elasticapmconnector v0.20.0
  - gomod:
      go.opentelemetry.io/collector/connector/forwardconnector v0.139.0
  - gomod:
      github.com/elastic/opentelemetry-collector-components/connector/profilingmetricsconnector v0.20.0
  - gomod:
      github.com/open-telemetry/opentelemetry-collector-contrib/connector/routingconnector v0.139.0
  - gomod:
      github.com/open-telemetry/opentelemetry-collector-contrib/connector/spanmetricsconnector v0.139.0

extensions:
  - gomod:
      github.com/elastic/opentelemetry-collector-components/extension/apikeyauthextension v0.22.0
  - gomod:
      github.com/elastic/opentelemetry-collector-components/extension/apmconfigextension v0.20.0
  - gomod:
      github.com/open-telemetry/opentelemetry-collector-contrib/extension/bearertokenauthextension v0.139.0
  - gomod:
      github.com/open-telemetry/opentelemetry-collector-contrib/extension/storage/filestorage v0.139.0
  - gomod:
      github.com/open-telemetry/opentelemetry-collector-contrib/extension/headerssetterextension v0.139.0
  - gomod:
      github.com/open-telemetry/opentelemetry-collector-contrib/extension/healthcheckextension v0.139.0
  - gomod:
      github.com/open-telemetry/opentelemetry-collector-contrib/extension/healthcheckv2extension v0.139.0
  - gomod:
      github.com/open-telemetry/opentelemetry-collector-contrib/extension/k8sleaderelector v0.139.0
  - gomod:
      github.com/open-telemetry/opentelemetry-collector-contrib/extension/observer/k8sobserver v0.139.0
  - gomod:
      go.opentelemetry.io/collector/extension/memorylimiterextension v0.139.0
  - gomod:
      github.com/open-telemetry/opentelemetry-collector-contrib/extension/pprofextension v0.139.0
  - gomod:
      github.com/open-telemetry/opentelemetry-collector-contrib/extension/basicauthextension v0.139.0 ## added

providers:
  - gomod:
      go.opentelemetry.io/collector/confmap/provider/envprovider v1.45.0
  - gomod:
      go.opentelemetry.io/collector/confmap/provider/fileprovider v1.45.0
  - gomod:
      go.opentelemetry.io/collector/confmap/provider/httpprovider v1.45.0
  - gomod:
      go.opentelemetry.io/collector/confmap/provider/httpsprovider v1.45.0
  - gomod:
      go.opentelemetry.io/collector/confmap/provider/yamlprovider v1.45.0
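
The config above mixes several version lines (v0.139.0 for the upstream modules, v0.20.0 to v0.22.0 for the Elastic components, v1.45.0 for the providers), and a mismatch among them is a plausible cause of build failures. A small sketch for listing which versions a builder config pins and how often; `pinned_versions` is a hypothetical helper name.

```shell
# Hypothetical helper: list each module version pinned in a builder config
# together with the number of modules that use it ("<version> <count>").
# $1: path to the builder config, e.g. builder-config.yaml
pinned_versions() {
  grep -oE ' v[0-9]+\.[0-9]+\.[0-9]+' "$1" \
    | tr -d ' ' \
    | sort \
    | uniq -c \
    | awk '{ print $2, $1 }'
}

# Example:
#   pinned_versions builder-config.yaml
```

A version that appears only once or twice is a reasonable first place to look when the build fails.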

 

3. Test the build

Check that the build works with the command below.

(If it fails, it is usually a Go version or ocb version problem.)

./ocb --config builder-config.yaml

 

On success, the directory looks like this:

$ ls -al
builder-config.yaml ocb  otelcol-edot

 

 

4. Build the image

Now create collector-config.yaml and a Dockerfile.

 

I created collector-config.yaml as in the official guide:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

exporters:
  debug:
    verbosity: detailed

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [debug]
    metrics:
      receivers: [otlp]
      exporters: [debug]
    logs:
      receivers: [otlp]
      exporters: [debug]

 

For the Dockerfile, I started from the official guide and changed the golang version, the ocb version, and the ENTRYPOINT.

FROM alpine:3.19 AS certs
RUN apk --update add ca-certificates

FROM golang:1.24.11 AS build-stage
WORKDIR /build

COPY ./builder-config.yaml builder-config.yaml

RUN --mount=type=cache,target=/root/.cache/go-build GO111MODULE=on go install go.opentelemetry.io/collector/cmd/builder@v0.138.0
RUN --mount=type=cache,target=/root/.cache/go-build builder --config builder-config.yaml

FROM gcr.io/distroless/base:latest

ARG USER_UID=10001
USER ${USER_UID}

COPY ./collector-config.yaml /otelcol/collector-config.yaml
COPY --from=certs /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ca-certificates.crt
COPY --chmod=755 --from=build-stage /build/otelcol-edot /otelcol

ENTRYPOINT ["/otelcol/otelcol-edot"]
CMD ["--config", "/otelcol/collector-config.yaml"]

EXPOSE 4317 4318 12001

 

Once everything is created, the directory looks like this:

.
├── builder-config.yaml
├── collector-config.yaml
├── Dockerfile
├── ocb			# can be deleted
└── otelcol-edot	# can be deleted

 

 

Build exactly as the guide describes.

# Enable Docker multi-arch builds
docker run --rm --privileged tonistiigi/binfmt --install all
docker buildx create --name mybuilder --use

# Build the Docker image as Linux AMD and ARM
# and load the result to "docker images"
docker buildx build --load \
  -t <collector_distribution_image_name>:<version> \
  --platform=linux/amd64,linux/arm64 .

# Test the newly built image
docker run -it --rm -p 4317:4317 -p 4318:4318 \
    --name otelcol <collector_distribution_image_name>:<version>

 

 

I built it as otel-custom:1.0, so the image list looks like this:

IMAGE                           ID             DISK USAGE   CONTENT SIZE
otel-custom:1.0                 c6bd6908f678        318MB         64.1M

 

 

5. Install the OpenTelemetry Collector

Now push the image to an image registry and deploy it.

For testing, I installed the operator and the collector separately.

 

operator

helm install opentelemetry-operator open-telemetry/opentelemetry-operator --set admissionWebhooks.certManager.enabled=false --set admissionWebhooks.autoGenerateCert.enabled=true -n tracing

 

collector

helm install opentelemetry-collector open-telemetry/opentelemetry-collector --set image.repository=docker.io/library/otel-custom --set image.tag=1.0 --set image.pullPolicy=IfNotPresent --set mode=deployment -n tracing

 

After installing the collector, edit its ConfigMap

I added elasticsearch to exporters and elasticapm to connectors,

and wired the additions into the logs and traces pipelines.

apiVersion: v1
data:
  relay: |
    exporters:
      debug: {}
      elasticsearch:  ## added
        endpoint: https://127.0.0.1:9200
        headers:
          authorization: Basic 
        tls:
          insecure: false
          insecure_skip_verify: true
    extensions:
      health_check:
        endpoint: ${env:MY_POD_IP}:13133
    processors:
      batch: {}
      elasticapm: {}
      memory_limiter:
        check_interval: 5s
        limit_percentage: 80
        spike_limit_percentage: 25
    connectors:       ## added
      elasticapm: {}  ## added
    receivers:
      jaeger:
        protocols:
          grpc:
            endpoint: ${env:MY_POD_IP}:14250
          thrift_compact:
            endpoint: ${env:MY_POD_IP}:6831
          thrift_http:
            endpoint: ${env:MY_POD_IP}:14268
      otlp:
        protocols:
          grpc:
            endpoint: ${env:MY_POD_IP}:4317
          http:
            endpoint: ${env:MY_POD_IP}:4318
      prometheus:
        config:
          scrape_configs:
          - job_name: opentelemetry-collector
            scrape_interval: 10s
            static_configs:
            - targets:
              - ${env:MY_POD_IP}:8888
      zipkin:
        endpoint: ${env:MY_POD_IP}:9411
    service:
      extensions:
      - health_check
      pipelines:
        logs:
          exporters:
          - debug
          - elasticsearch ## added
          processors:
          - memory_limiter
          - batch
          receivers:
          - otlp
        metrics:
          exporters:
          - debug
          processors:
          - memory_limiter
          - batch
          receivers:
          - otlp
          - prometheus
        traces:
          exporters:
          - debug
          - elasticsearch   ## added
          processors:
          - memory_limiter
          - batch
          - elasticapm ## added
          receivers:
          - otlp
          - jaeger
          - zipkin
      telemetry:
        metrics:
          readers:
          - pull:
              exporter:
                prometheus:
                  host: ${env:MY_POD_IP}
                  port: 8888

 

 

6. Install aws-sample for testing

The aws-sample retail store app, used for testing:

kubectl apply -f https://github.com/aws-containers/retail-store-sample-app/releases/latest/download/kubernetes.yaml -n aws

 

 

Add auto-instrumentation

apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: demo-instrumentation
spec:
  exporter:
    endpoint: http://opentelemetry-collector.tracing.svc.cluster.local:4318
  propagators:
    - tracecontext
    - baggage
  sampler:
    type: parentbased_traceidratio
    argument: "1"

 

inject-java

## Enable the agent through an environment variable
OTEL_JAVAAGENT_ENABLED: "true"

## Add to the pod annotations
instrumentation.opentelemetry.io/inject-java: "true"
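
The annotation has to sit on the pod template (spec.template.metadata.annotations), not on the Deployment's own metadata, for the operator to inject the agent. One possible way to add it without editing the manifest is sketched below; `inject_java_patch` is a hypothetical helper name and the deployment name is a placeholder.

```shell
# Hypothetical helper: print a patch that sets the inject-java annotation
# on the pod template of a Deployment.
inject_java_patch() {
  printf '%s\n' '{"spec":{"template":{"metadata":{"annotations":{"instrumentation.opentelemetry.io/inject-java":"true"}}}}}'
}

# Example (replace <deployment-name>; the namespace matches the sample app):
#   kubectl patch deployment <deployment-name> -n aws -p "$(inject_java_patch)"
```

Patching the pod template triggers a rollout, so the new pods pass through the operator's webhook.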

 

 

7. Open aws-sample and generate some traffic. The Observability UI, which stayed empty with a plain OpenTelemetry Collector, now shows the data.
