SadServers

Realistic-interviews Troubleshooting Scenarios


Realistic-interviews

Scenarios that are more realistic and suited for job interviews
# Name Level Time Type
1 "Hamburg": Find the AWS EC2 volume Easy 60 m Do Pro

Scenario: "Hamburg": Find the AWS EC2 volume

Level: Easy

Type: Do

Access: Paid

Description: We have a lot of AWS EBS volumes, whose descriptions we saved to a file with: aws ec2 describe-volumes > aws-volumes.json.
One of the volumes contains important data and we need to identify it, but we only remember these characteristics: gp3, created before 30/09/2025, Size < 64, Iops < 1500, Throughput > 300.

Find the correct volume and put the InstanceId of the instance it is attached to into the ~/mysolution file, e.g.: echo "i-00000000000000000" > ~/mysolution

Test: Running md5sum /home/admin/mysolution returns e7e34463823bf7e39358bf6bb24336d8 (we also accept the file without a new line at the end).
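
A minimal Python sketch of the filtering step follows. It assumes aws-volumes.json has the usual describe-volumes shape: a top-level Volumes list whose entries carry VolumeType, CreateTime, Size, Iops, Throughput and an Attachments list with InstanceId. The helper names are hypothetical, and "created before 30/09/2025" is read as a CreateTime date strictly earlier than 2025-09-30.

```python
import json


def matches(vol):
    """True if a volume fits every remembered characteristic."""
    created = vol.get("CreateTime")
    if not created:
        return False
    return (
        vol.get("VolumeType") == "gp3"
        # ISO 8601 dates compare correctly as strings; keep only YYYY-MM-DD
        and str(created)[:10] < "2025-09-30"
        and (vol.get("Size") or 0) < 64
        and (vol.get("Iops") or 0) < 1500
        and (vol.get("Throughput") or 0) > 300
    )


def matching_volumes(data):
    """Return the volumes from a describe-volumes document that match."""
    return [vol for vol in data.get("Volumes", []) if matches(vol)]


def instance_ids(vol):
    """InstanceIds of the instances a volume is attached to."""
    return [a["InstanceId"] for a in vol.get("Attachments", []) if "InstanceId" in a]


def main(path="aws-volumes.json"):
    """Print each matching VolumeId followed by its attached InstanceIds."""
    with open(path) as f:
        data = json.load(f)
    for vol in matching_volumes(data):
        print(vol.get("VolumeId"), *instance_ids(vol))
```

Calling main() from the directory holding aws-volumes.json prints each candidate. If more than one volume survives the filter, revisit the criteria before writing ~/mysolution.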

The "Check My Solution" button runs the script /home/admin/agent/check.sh, which you can see and execute.

Time to Solve: 60 minutes.

2 "Woluwe": Too many images Medium 30 m Fix Pro

Scenario: "Woluwe": Too many images

Level: Medium

Type: Fix

Access: Paid

Description: A pipeline created a lot of Docker images locally for a web app. All of these images except one contain a typo introduced by a developer: an image instruction pipes "HelloWorld" into "index.htmlz" instead of the correct "index.html".
Find the image that doesn't have the typo (and uses the correct "index.html"), tag it as "prod" (rather than fixing the current prod image), and then deploy it with docker run -d --name prod -p 3000:3000 prod so that it responds correctly to HTTP requests on port 3000 instead of returning "404 Not Found".

Test: curl http://localhost:3000 should respond with HelloWorld;529
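
One way to find the odd image out is to read each image's build history, which records the command behind every layer. A minimal Python sketch follows. It assumes the typo is visible in docker history --no-trunc output, and the helper names are hypothetical.

```python
import re
import subprocess

# The typo writes to index.htmlz; a correct image writes index.html (not followed by "z")
TYPO_RE = re.compile(r"index\.htmlz")
GOOD_RE = re.compile(r"index\.html(?!z)")


def is_correct(history_lines):
    """True if an image's build history writes index.html and never index.htmlz."""
    text = "\n".join(history_lines)
    return bool(GOOD_RE.search(text)) and not TYPO_RE.search(text)


def _docker(*args):
    """Run a docker CLI command and return its stdout."""
    return subprocess.run(["docker", *args], capture_output=True, text=True, check=True).stdout


def image_ids():
    """Unique local image IDs (docker lists an ID once per tag)."""
    return sorted(set(_docker("images", "-q", "--no-trunc").split()))


def image_history(image_id):
    """The full command behind each layer of an image."""
    return _docker("history", "--no-trunc", "--format", "{{.CreatedBy}}", image_id).splitlines()


def find_correct_images():
    """IDs of local images whose history passes is_correct()."""
    return [i for i in image_ids() if is_correct(image_history(i))]
```

With the single match from find_correct_images(), docker tag <IMAGE_ID> prod retags it. If a container named prod already exists, docker rm -f prod frees the name before running the docker run command from the description.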

The "Check My Solution" button runs the script /home/admin/agent/check.sh, which you can see and execute.

Time to Solve: 30 minutes.

3 "Torino": Optimize grande Docker image Medium 30 m Do Pro

Scenario: "Torino": Optimize grande Docker image

Level: Medium

Type: Do

Access: Paid

Description: A Torino Node.js application is located in the ~/torino-app directory.
You can run it directly with: nohup node app.js > app.log 2>&1 &. You can also verify that it works by running: curl localhost:3000

There is already a torino Docker image built with the Dockerfile in ~/torino-app, but the resulting image size is 916 MB.

Your task is to optimize the Docker image size:
1. Build a new Docker image for the Torino application, also tagged torino:latest, with a total size under 122 MB.
2. Create and run a container using this optimized image.

NOTE: You can only use the Docker images already on the server.
Because there is no Internet access and you cannot RUN npm install, your Dockerfile needs to COPY the node_modules directory along with app.js and the package*.json files.

Test: The torino Docker image is less than 122 MB and curl http://localhost:3000 returns Hello from Torino!
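
The usual route to the size target is a Dockerfile that starts FROM a smaller Node base image, such as a slim or Alpine variant, if docker images shows one on the server. Nothing can be pulled. To confirm the result before checking the solution, you can read the image size in bytes from docker image inspect. A minimal Python sketch follows. It assumes the 122 MB limit means decimal megabytes (10^6 bytes), which is how the docker CLI prints sizes, and the function names are hypothetical.

```python
import subprocess

LIMIT_MB = 122  # from the scenario's test


def image_size_bytes(image="torino:latest"):
    """Size of a local image in bytes, as reported by docker image inspect."""
    out = subprocess.run(
        ["docker", "image", "inspect", "--format", "{{.Size}}", image],
        capture_output=True, text=True, check=True,
    ).stdout
    return int(out.strip())


def under_limit(size_bytes, limit_mb=LIMIT_MB):
    """Assumes decimal megabytes (1 MB = 1,000,000 bytes), as the docker CLI displays them."""
    return size_bytes < limit_mb * 1_000_000
```

For example, under_limit(image_size_bytes()) should be True for the rebuilt image and False for the original 916 MB one.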

The "Check My Solution" button runs the script /home/admin/agent/check.sh, which you can see and execute.

Time to Solve: 30 minutes.

4 "San Juan": mucho Traefik Medium 20 m Fix

Scenario: "San Juan": mucho Traefik

Level: Medium

Type: Fix

Access: Email

Description: There is a Traefik load balancer that must be up and running. The server and the backend services are managed by Docker Compose. Running curl -s app.sadserver | head -n1 must return the host ID of one of the backend servers, and running the command again must return a different host ID. The server seems to work sometimes; other times it fails or just times out.

The round-robin configuration should make the webserver iterate through the back-end servers.

Test: curl -s app.sadserver | head -n1 returns a line starting with Hostname: followed by a backend host ID.
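
Because the failures are intermittent, a single curl proves little. Sending several requests and recording every first line shows both the errors and whether different backends answer. A minimal Python sketch follows. The URL, attempt count and timeout are assumptions, and the "Hostname:" prefix is taken from the test above. The sketch treats "every request succeeded and more than one host answered" as round-robin rather than checking strict alternation.

```python
import urllib.request

URL = "http://app.sadserver/"  # assumption: the same host the scenario's curl uses


def first_line(url=URL, timeout=3.0):
    """First line of the response body, like curl -s URL | head -n1."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = resp.read().decode("utf-8", errors="replace")
    lines = body.splitlines()
    return lines[0] if lines else ""


def probe(url=URL, attempts=6, timeout=3.0):
    """Send several requests, recording either the first line or the error."""
    results = []
    for _ in range(attempts):
        try:
            results.append(first_line(url, timeout))
        except OSError as exc:  # URLError, HTTPError (e.g. 502) and timeouts are OSError subclasses
            results.append(f"ERROR: {exc}")
    return results


def looks_round_robin(lines):
    """Every request answered with a Hostname line and more than one backend was seen."""
    hosts = [line for line in lines if line.startswith("Hostname:")]
    return bool(lines) and len(hosts) == len(lines) and len(set(hosts)) > 1
```

Printing probe() while the problem is present shows which attempts fail. Once the configuration is fixed, looks_round_robin(probe()) should be True.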

The "Check My Solution" button runs the script /home/admin/agent/check.sh, which you can see and execute.

Time to Solve: 20 minutes.

5 "Lyon": Migrate Ingress-NGINX to Traefik Medium 20 m Do New

Scenario: "Lyon": Migrate Ingress-NGINX to Traefik

Level: Medium

Type: Do

Access: Email

Description: Ingress-NGINX is being retired. As the DevOps Engineer, you will replace it with Traefik on the production Kubernetes cluster in a private VPC. This scenario is a local proof-of-concept for that migration.

The current K8s cluster has a "Hello World" pod running, that is, curl hello.lyon.local returns "Hello world" (see note 1). You should be able to see the same content delivered via Traefik once ingress-nginx is down.

Notes:
1: Wait at the start until k8s is fully up before running curl; otherwise you get a 503. You can check, for example, with k get pod -n ingress-nginx
2: The k8s manifests are under the ~/app dir.
3: ingress-nginx was deployed with a Helm chart.
4: The Helm chart for Traefik is available under /home/admin/traefik (the Traefik image is already loaded in k3s).
5: The Traefik dashboard and probes/metrics port is :8080 by default, but that port is already used by the system; use a different port or disable them.
6: The domain hello.lyon.local actually points to localhost.
7: The ingress must listen on port 80 on all addresses so it responds on localhost:80, or in fact on *:80.

TIP: You can use k as an alias for kubectl, and it has autocomplete enabled.

Test: Running curl -i hello.lyon.local returns the message Hello World, and only the traefik pod is present (no ingress-nginx pod).
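
After uninstalling the ingress-nginx Helm release and installing the Traefik chart from /home/admin/traefik, you can confirm the "only the traefik pod" condition by inspecting kubectl get pods -A -o json. A minimal Python sketch follows. Matching on the substrings "traefik" and "ingress-nginx" in namespace/name is an assumption about how the pods are named, and the helper names are hypothetical.

```python
import json
import subprocess


def pod_names(pods):
    """namespace/name for every pod in a kubectl get pods -o json document."""
    return [f'{p["metadata"]["namespace"]}/{p["metadata"]["name"]}' for p in pods.get("items", [])]


def migration_done(pods):
    """Assumed check: some pod mentions traefik and none mentions ingress-nginx."""
    names = pod_names(pods)
    return any("traefik" in n for n in names) and not any("ingress-nginx" in n for n in names)


def cluster_pods():
    """All pods in all namespaces (equivalent to k get pods -A -o json)."""
    out = subprocess.run(
        ["kubectl", "get", "pods", "-A", "-o", "json"],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)
```

migration_done(cluster_pods()) should be True before you run the final curl -i hello.lyon.local.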

The "Check My Solution" button runs the script /home/admin/agent/check.sh, which you can see and execute.

Time to Solve: 20 minutes.

6 "Stockholm": DNS health check issue Medium 20 m Fix Pro New

Scenario: "Stockholm": DNS health check issue

Level: Medium

Type: Fix

Access: Paid

Description: The internal status portal on this host should answer on http://127.0.0.1:9167/ with a body containing OK.

It worked until operations ran a package cleanup.

The portal service (stockholm-portal) only runs after a DNS health check at /usr/local/bin/stockholm-dns-check.sh succeeds.

Make the necessary changes so the portal works again.

Do not modify /usr/local/bin/stockholm-dns-check.sh.

Test: The health script /usr/local/bin/stockholm-dns-check.sh runs successfully, stockholm-portal is active, and curl http://127.0.0.1:9167/ returns a response whose body contains OK.
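
Since the trouble began with a package cleanup, a reasonable first check is whether the commands the DNS health script calls still exist on PATH. A small Python sketch follows. The tool names in the usage example (dig, host, nslookup) are guesses, so read /usr/local/bin/stockholm-dns-check.sh to see which commands it really uses. The helper names are hypothetical.

```python
import shutil
import urllib.request


def missing_tools(names):
    """Commands from `names` that are no longer found on PATH."""
    return [name for name in names if shutil.which(name) is None]


def portal_ok(url="http://127.0.0.1:9167/", timeout=3.0):
    """True if the portal answers and its body contains OK."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return "OK" in resp.read().decode("utf-8", errors="replace")
    except OSError:  # connection refused, HTTP errors and timeouts
        return False
```

For example, missing_tools(["dig", "host", "nslookup"]) lists whichever of those are gone. After the package that provides the missing piece is reinstalled and stockholm-portal is restarted, portal_ok() should return True.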

The "Check My Solution" button runs the script /home/admin/agent/check.sh, which you can see and execute.

Time to Solve: 20 minutes.

7 "Cordoba": df is lying (or is it du?) Medium 10 m Fix New

Scenario: "Cordoba": df is lying (or is it du?)

Level: Medium

Type: Fix

Access: Email

Description: Monitoring reports that the root filesystem is under pressure, but a quick du of /var/log shows almost nothing in the logs of the running application at /var/log/cordoba-app.

Find what is holding the space and reclaim it so that df and du agree again for practical purposes; currently there is a ~300 MB discrepancy on the root partition (/).

The service unit is cordoba-hoarder.service.

Test: df -h / and sudo du -sh / report the same used space after reclaiming the ~300 MB discrepancy.
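
A common cause of df reporting more used space than du is a process that keeps a file open after it has been deleted. du cannot see the unlinked file, but its blocks stay allocated until the last descriptor is closed. The sketch below lists such descriptors by scanning /proc/<pid>/fd, which is roughly what lsof +L1 shows. It is Linux-only and needs root to see other users' processes, and the function name is hypothetical.

```python
import os


def deleted_open_files(proc_root="/proc"):
    """Return (size_bytes, pid, fd, target) for open descriptors whose file was deleted, largest first."""
    found = []
    for pid in os.listdir(proc_root):
        if not pid.isdigit():
            continue
        fd_dir = os.path.join(proc_root, pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:  # process exited or we lack permission
            continue
        for fd in fds:
            link = os.path.join(fd_dir, fd)
            try:
                target = os.readlink(link)
                if not target.endswith(" (deleted)"):
                    continue
                # stat() follows the /proc magic link to the still-allocated inode
                size = os.stat(link).st_size
            except OSError:
                continue
            found.append((size, int(pid), int(fd), target))
    return sorted(found, reverse=True)
```

Restarting the owning service (presumably cordoba-hoarder.service here) closes the descriptor and frees the space. When a restart is not acceptable, truncating through the /proc path as root, for example : > /proc/<pid>/fd/<fd>, is an alternative.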

The "Check My Solution" button runs the script /home/admin/agent/check.sh, which you can see and execute.

Time to Solve: 10 minutes.

8 "Tallinn": BuildKit & Docker build mismatch Medium 15 m Fix New

Scenario: "Tallinn": BuildKit & Docker build mismatch

Level: Medium

Type: Fix

Access: Email

Description: This VM runs a tiny container app, tallinn-service, whose only job is to print an API version string (for example tallinn-api-version=1.4.0). The image is built from /home/admin/tallinn-app with docker build.

The dev team raised the API contract to 2.0.0 in src/api_version.txt and ran a new build, but QA still rejects the image tagged tallinn-app:current: it reports 1.4.0 at runtime. A recent CI log is in /home/admin/build.log.

Fix the docker build outcome so the deploy image matches what the sources ask for: in the image tagged tallinn-app:current, both the on-disk contract file and the shipped binary must report API 2.0.0.

Test: Image tallinn-app:current exists, /etc/tallinn/api_version is 2.0.0, and /usr/local/bin/tallinn-service prints tallinn-api-version=2.0.0.
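
To see which side is stale, compare the version the sources declare with what the built image actually ships. A minimal Python sketch follows. It assumes the image contains cat, overrides the entrypoint with docker run --entrypoint, and uses hypothetical helper names. If a rebuild still reports 1.4.0, look in /home/admin/build.log for the reason rather than guessing one.

```python
import subprocess

IMAGE = "tallinn-app:current"
PREFIX = "tallinn-api-version="


def parse_version(output):
    """Return the version from tallinn-service output, or None if no such line exists."""
    for line in output.splitlines():
        line = line.strip()
        if line.startswith(PREFIX):
            return line[len(PREFIX):]
    return None


def _run(*args):
    """docker run --rm with the given arguments; returns stdout."""
    return subprocess.run(
        ["docker", "run", "--rm", *args], capture_output=True, text=True, check=True
    ).stdout


def image_versions(image=IMAGE):
    """(contract file version, binary version) as shipped in the image."""
    contract = _run("--entrypoint", "cat", image, "/etc/tallinn/api_version").strip()
    binary = parse_version(_run("--entrypoint", "/usr/local/bin/tallinn-service", image))
    return contract, binary


def source_version(src="/home/admin/tallinn-app/src/api_version.txt"):
    """The API version the sources ask for."""
    with open(src) as f:
        return f.read().strip()
```

The image is fixed when image_versions() returns ("2.0.0", "2.0.0") and both values match source_version().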

The "Check My Solution" button runs the script /home/admin/agent/check.sh, which you can see and execute.

Time to Solve: 15 minutes.

9 "Anatolia": compromised server Hard 40 m Fix Pro

Scenario: "Anatolia": compromised server

Level: Hard

Type: Fix

Access: Paid

Description: This web server has been compromised and no longer serves the home page. Your DevOps troubleshooting skills are urgently needed to solve the mystery of the missing home page and restore the integrity of the server.

Note: The default configuration files under /etc/apache2 are not the problem.

This scenario is based on a real server that was "hacked". Ideally you'd recover from infrastructure-as-code playbooks and clean data backups on a new server with the vulnerabilities fixed. Instead, in this exercise you are asked to manually clean the compromised server, restore it to a working condition and, ideally, find out how the server was broken into. The solution test only checks that the web service is working.
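
A common first forensic pass on a compromised host is to list recently changed files in places an attacker is likely to touch, such as the web root, /etc, /tmp and cron directories. A minimal Python sketch follows. It uses the later of mtime and ctime because an attacker can reset mtime with touch, while the kernel updates ctime on any metadata change. The function name and the seven-day default window are assumptions.

```python
import os
import time


def recently_changed(root, days=7, now=None):
    """Paths under root whose mtime or ctime is within the last `days` days, newest first."""
    cutoff = (time.time() if now is None else now) - days * 86400
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):  # unreadable dirs are skipped
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks out of root
            except OSError:
                continue
            changed = max(st.st_mtime, st.st_ctime)
            if changed >= cutoff:
                hits.append((changed, path))
    return [path for _changed, path in sorted(hits, reverse=True)]
```

For example, recently_changed("/var/www") and recently_changed("/etc") are reasonable starting points. Walking / as a whole would also descend into /proc and /sys, which is slow and noisy.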

Test: curl localhost must return SadServer - Anatolia

The "Check My Solution" button runs the script /home/admin/agent/check.sh, which you can see and execute.

Time to Solve: 40 minutes.

10 "Sapporo": ephemeral tokens Hard 20 m Fix New

Scenario: "Sapporo": ephemeral tokens

Level: Hard

Type: Fix

Access: Email

Description: The Sapporo gate API on this host should answer on http://127.0.0.1:9180/ with a body containing OK.

A background service writes short-lived tokens to /var/lib/sapporo/pulse (each value is visible for only a fraction of a second, then the file is cleared again). The gate compares /home/admin/sapporo/active-token against the latest emitted token.

The installed collector at /home/admin/sapporo-collector.sh (triggered by sapporo-collector.timer once per minute) never keeps up; active-token stays empty or stale and the gate keeps failing.

Fix collection so the current token is captured reliably and the gate returns OK.
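
A timer that fires once per minute has little chance of catching a value that lives for a fraction of a second. One alternative is a long-running loop that polls the pulse file far more often and keeps the last token it saw. A minimal Python sketch follows. The 20 ms interval, the atomic write via os.replace, and writing the token without a trailing newline are assumptions, so check how the gate compares the file. The function names are hypothetical. Running the loop as its own systemd service, with sapporo-collector.timer stopped so the two do not race, is one way to keep it alive.

```python
import os
import re
import time

PULSE = "/var/lib/sapporo/pulse"
ACTIVE = "/home/admin/sapporo/active-token"
TOKEN_RE = re.compile(r"SAPPORO-[0-9A-Fa-f]{8}\b")


def extract_token(text):
    """Return the first SAPPORO-xxxxxxxx token in text, or None."""
    match = TOKEN_RE.search(text)
    return match.group(0) if match else None


def write_atomic(path, data):
    """Write data to a temp file and rename it over path, so readers never see a partial file."""
    tmp = f"{path}.tmp.{os.getpid()}"
    with open(tmp, "w") as f:
        f.write(data)
    os.replace(tmp, path)


def collect(pulse=PULSE, active=ACTIVE, interval=0.02):
    """Poll the pulse file forever and keep the most recent non-empty token in active."""
    last = None
    while True:
        try:
            with open(pulse) as f:
                token = extract_token(f.read())
        except FileNotFoundError:
            token = None
        # An empty (cleared) pulse is ignored, so active keeps the latest token seen
        if token and token != last:
            write_atomic(active, token)
            last = token
        time.sleep(interval)
```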

Test: curl http://127.0.0.1:9180/ returns a response whose body contains OK, and /home/admin/sapporo/active-token holds a token matching the current pulse (format SAPPORO- followed by eight hex digits).

The "Check My Solution" button runs the script /home/admin/agent/check.sh, which you can see and execute.

Time to Solve: 20 minutes.


Real-world Linux and DevOps scenarios for hands-on learning and technical assessment.

Connect With Us
info@sadservers.com

Made in Canada 🇨🇦
Updated: 2026-06-02 21:12 UTC – 5583953