SadServers - Linux & DevOps Troubleshooting Interviews

#	Name	Level	Time	Type	Details
1	"Bilbao": Basic Kubernetes Problems	Easy	10 m	Fix	"Bilbao": Basic Kubernetes Problems Scenario: "Bilbao": Basic Kubernetes Problems Level: Easy Type: Fix Access: Email Description: There's a Kubernetes Deployment with an Nginx pod and a Load Balancer declared in the manifest.yml file. The pod is not coming up. Fix it so that you can access the Nginx container through the Load Balancer. There's no "sudo" (root) access. Test: Running `curl 10.43.216.196` returns the default Nginx Welcome page. See /home/admin/agent/check.sh for the test that "Check My Solution" runs. Time to Solve: 10 minutes.
2	Linux Server Review - Guided Learning	Easy	30 m	Do	Linux Server Review - Guided Learning Scenario: Linux Server Review - Guided Learning Level: Easy Type: Do Access: Email Description: This is a guided learning scenario. Follow the Linux Server Review Scenario Guide. The purpose of this scenario is to review a Linux server and be able to answer questions like: What's the purpose of the server? What's the hardware (CPU / RAM / disk / net) utilization of the server? Is there a problem there? What is running and what's going on in the server? Note: This challenge doesn't have a specific solution (there's no "Check My Solution"). Test: (there's no test) Time to Solve: 30 minutes.
3	"Manhattan": can't write data into database.	Medium	20 m	Fix No Registration	"Manhattan": can't write data into database. Scenario: "Manhattan": can't write data into database. Level: Medium Type: Fix Access: Public Description: Your objective is to be able to insert a row in an existing Postgres database. The issue is not specific to Postgres and you don't need to know details about it (although it may help). Helpful Postgres information: it's a service that listens to a port (:5432) and writes to disk in a data directory, the location of which is defined in the data_directory parameter of the configuration file /etc/postgresql/14/main/postgresql.conf. In our case Postgres is managed by systemd as a unit with name postgresql. Test: (from default admin user) `sudo -u postgres psql -c "insert into persons(name) values ('jane smith');" -d dt` Should return:`INSERT 0 1` Time to Solve: 20 minutes.
4	"Tokyo": can't serve web file	Medium	15 m	Fix	"Tokyo": can't serve web file Scenario: "Tokyo": can't serve web file Level: Medium Type: Fix Access: Email Description: There's a web server serving a file /var/www/html/index.html with content "hello sadserver" but when we try to check it locally with an HTTP client like `curl 127.0.0.1:80`, nothing is returned. This scenario is not about the particular web server configuration and you only need to have general knowledge about how web servers work. Test: `curl 127.0.0.1:80` should return: `hello sadserver` Time to Solve: 15 minutes.
5	"Cape Town": Borked Nginx	Medium	15 m	Fix	"Cape Town": Borked Nginx Scenario: "Cape Town": Borked Nginx Level: Medium Type: Fix Access: Email Description: There's an Nginx web server installed and managed by systemd. Running `curl -I 127.0.0.1:80` returns `curl: (7) Failed to connect to localhost port 80: Connection refused`. Fix it so that curl returns the default Nginx page. Test: `curl -Is 127.0.0.1:80 | head -1` returns `HTTP/1.1 200 OK` Time to Solve: 15 minutes.
6	"Salta": Docker container won't start.	Medium	15 m	Fix	"Salta": Docker container won't start. Scenario: "Salta": Docker container won't start. Level: Medium Type: Fix Access: Email Description: There's a "dockerized" Node.js web application in the `/home/admin/app` directory. Create a Docker container so you get a web app on port :8888 and can curl to it. For the solution to be valid, there should be only one running Docker container. Test: `curl localhost:8888` returns `Hello World!` from a running container. Time to Solve: 15 minutes.
7	"Melbourne": WSGI with Gunicorn	Medium	20 m	Fix Pro	"Melbourne": WSGI with Gunicorn Scenario: "Melbourne": WSGI with Gunicorn Level: Medium Type: Fix Access: Paid Description: There is a Python WSGI web application file at /home/admin/wsgi.py , the purpose of which is to serve the string "Hello, world!". This file is served by a Gunicorn server which is fronted by an nginx server (both servers managed by systemd). So the flow of an HTTP request is: Web Client (curl) -> Nginx -> Gunicorn -> wsgi.py . The objective is to be able to curl the localhost (on default port :80) and get back "Hello, world!", using the current setup. Test: `curl -s http://localhost` returns `Hello, world!` (serving the wsgi.py file via Gunicorn and Nginx) Time to Solve: 20 minutes.
8	"Lisbon": etcd SSL cert troubles	Medium	20 m	Fix	"Lisbon": etcd SSL cert troubles Scenario: "Lisbon": etcd SSL cert troubles Level: Medium Type: Fix Access: Email Description: There's an etcd server running on https://localhost:2379. Get the value for the key "foo", i.e. `etcdctl get foo` or `curl https://localhost:2379/v2/keys/foo`. Test: `etcdctl get foo` returns `bar`. Time to Solve: 20 minutes.
9	"Kihei": Surely Not Another Disk Space Scenario	Medium	30 m	Fix	"Kihei": Surely Not Another Disk Space Scenario Scenario: "Kihei": Surely Not Another Disk Space Scenario Level: Medium Type: Fix Access: Email Description: There is a /home/admin/kihei program. Make the changes necessary so it runs successfully, without deleting the /home/admin/datafile file. Test: Running `/home/admin/kihei` returns `Done.`. Time to Solve: 30 minutes.
10	"Unimak Island": Fun with Mr Jason	Medium	15 m	Do Pro	"Unimak Island": Fun with Mr Jason Scenario: "Unimak Island": Fun with Mr Jason Level: Medium Type: Do Access: Paid Description: Using the file station_information.json , find the station_id where "has_kiosk" is false and "capacity" is greater than 30. Save the station_id of the solution in the /home/admin/mysolution file, for example: `echo "ec040a94-4de7-4fb3-aea0-ec5892034a69" > ~/mysolution` You can use the installed utilities jq, gron, jid as well as Python3 and Golang. Test: `md5sum /home/admin/mysolution` returns `8d8414808b15d55dad857fd5aeb2aebc` Time to Solve: 15 minutes.
11	"Ivujivik": Parlez-vous Français?	Medium	20 m	Do Pro	"Ivujivik": Parlez-vous Français? Scenario: "Ivujivik": Parlez-vous Français? Level: Medium Type: Do Access: Paid Description: Given the CSV file /home/admin/table_tableau11.csv, find the Electoral District Name/Nom de circonscription that has the largest number of Rejected Ballots/Bulletins rejetés and also has a population of less than 100,000. The initial CSV file may be corrupted or invalid in a way that can be fixed without changing its data. Installed in the VM are: Python3, Go, sqlite3, miller directly and PostgreSQL, MySQL in Docker images. Save the solution in the /home/admin/mysolution file, with the name as it is in the file, for example: `echo "Trois-Rivières" > ~/mysolution` (the solution must be terminated by newline). Test: `md5sum /home/admin/mysolution` returns `e399d171f21839a65f8f8ab55ed1e1a1` Time to Solve: 20 minutes.
12	"Buenos Aires": Kubernetes Pod Crashing	Medium	20 m	Fix	"Buenos Aires": Kubernetes Pod Crashing Scenario: "Buenos Aires": Kubernetes Pod Crashing Level: Medium Type: Fix Access: Email Description: There are two pods: "logger" and "logshipper" living in the default namespace. Unfortunately, logshipper has an issue (crashlooping) and is forbidden to see what logger is trying to say. Could you help fix Logshipper? Do not change the K8S definition of the logshipper pod. Use "sudo". Because k8s takes a minute or two to change the pod state initially, the check for the scenario is made to fail in the first two minutes. Credit Srivatsav Kondragunta Test: `kubectl get pods -l app=logshipper --no-headers -o json | jq -r '.items[] | "\(.status.containerStatuses[0].ready)"'` returns `true` Time to Solve: 20 minutes.
13	"Tarifa": Between Two Seas	Medium	20 m	Fix Pro	"Tarifa": Between Two Seas Scenario: "Tarifa": Between Two Seas Level: Medium Type: Fix Access: Paid Description: There are three Docker containers defined in the docker-compose.yml file: an HAProxy accepting connections on port :5000 of the host, and two nginx containers, not exposed to the host. The person who tried to set this up wanted to have HAProxy in front of the (backend or upstream) nginx containers load-balancing them but something is not working. Test: Running `curl localhost:5000` several times returns both `hello there from nginx_0` and `hello there from nginx_1` Check /home/admin/agent/check.sh for the test that "Check My Solution" runs. Time to Solve: 20 minutes.
14	"Warsaw": Prometheus can't scrape the webserver	Medium	30 m	Fix Pro	"Warsaw": Prometheus can't scrape the webserver Scenario: "Warsaw": Prometheus can't scrape the webserver Level: Medium Type: Fix Access: Paid Description: A developer created a golang application that is exposing the /metrics endpoint. They have a problem with scraping the metrics from the application. They asked you to help find the problem. Full source code of the application is available at the /home/admin/app directory. Credit Kamil Błaż Test: The endpoint http://localhost:9000/metrics should return HTTP code 200. The "Check My Solution" button runs the script /home/admin/agent/check.sh, which you can see and execute. Time to Solve: 30 minutes.
15	"Moyogalpa": Security Snag. The Trials of Mary and John	Medium	30 m	Fix	"Moyogalpa": Security Snag. The Trials of Mary and John Scenario: "Moyogalpa": Security Snag. The Trials of Mary and John Level: Medium Type: Fix Access: Email Description: Mary and John are working on a Golang web application, and the security team has asked them to implement security measures. Unfortunately, they have broken the application, and it no longer functions. They need your help to fix it. The fixed application should allow clients to communicate with it over HTTPS without ignoring any checks (e.g. `curl https://webapp:7000/users.html`) and serve its static files. Test: `curl https://webapp:7000/users.html` should return the content of the file. The "Check My Solution" button runs the script /home/admin/agent/check.sh, which you can see and execute. Time to Solve: 30 minutes.
16	"Helsingør": The first walls of postgres physical replication	Medium	20 m	Fix	"Helsingør": The first walls of postgres physical replication Scenario: "Helsingør": The first walls of postgres physical replication Level: Medium Type: Fix Access: Email Description: You're setting up a PostgreSQL database with replication, and you decided to use Docker along with Docker Compose to make it easier to manage and test. After a few hours of work you finished the job and the master database is up and running, but you're having trouble with the replica. You need to figure out what's wrong with the replica and fix it. Since you are using Docker Compose, you can check the status of the running containers using `docker compose ps` (`docker ps` will do the job too). You may also want to check the logs of the containers. All definitions for the containers are inside the docker-compose.yml file. You can stand up the environment by running `docker compose up -d` and take it down by running `docker compose down`. If you make any change to the docker-compose.yml file, you can restart the containers by running `docker compose up -d --force-recreate`. Test: Postgres replica container works. The "Check My Solution" button runs the script /home/admin/agent/check.sh, which you can see and execute. Time to Solve: 20 minutes.
17	"Bekasi": Supervisor is still around	Medium	20 m	Fix Pro	"Bekasi": Supervisor is still around Scenario: "Bekasi": Supervisor is still around Level: Medium Type: Fix Access: Paid Description: There is an nginx service running on port 443. It is the main web server for the company, and it looks like a new employee has deployed some changes to the configuration of supervisor, so it is no longer working as expected. Running `curl -k https://bekasi` should return `Hello SadServers!` but for some reason it does not. To pass check.sh, you must not modify files in the /home/admin/bekasi folder. You must find out what the issue is and fix it. Test: `curl -k https://bekasi` returns `Hello SadServers!` The "Check My Solution" button runs the script /home/admin/agent/check.sh, which you can see and execute. Time to Solve: 20 minutes.
18	"Batumi": Troubleshoot "A" cannot connect to "B"	Medium	20 m	Fix Pro	"Batumi": Troubleshoot "A" cannot connect to "B" Scenario: "Batumi": Troubleshoot "A" cannot connect to "B" Level: Medium Type: Fix Access: Paid Description: (To learn the skills to solve this challenge, see Can't Connect to a Service: Linux Troubleshooting Guide) There is a web server (Caddy) on HTTP port :80 but `curl http://127.0.0.1` doesn't work. Find out what's wrong and make the necessary fixes so the web server returns a URL. Note: as a limitation, the file /home/admin/db_connector.py must not be modified so that the challenge is considered solved properly. The web server has to respond on the IP address 127.0.0.1; not only on "localhost". Test: The command `curl http://127.0.0.1` returns a URL address. The "Check My Solution" button runs the script /home/admin/agent/check.sh, which you can see and execute. Time to Solve: 20 minutes.
19	"Karakorum": WTFIT – What The Fun Is This?	Hard	20 m	Fix Pro	"Karakorum": WTFIT – What The Fun Is This? Scenario: "Karakorum": WTFIT – What The Fun Is This? Level: Hard Type: Fix Access: Paid Description: There's a binary at `/home/admin/wtfit` that nobody knows how it works or what it does ("what the fun is this"). Someone remembers something about wtfit needing to communicate to a service in order to start. Run this wtfit program so it doesn't exit with an error, fixing or working around things that you need but are broken in this server. (Note that you can open more than one web "terminal"). Test: Running `/home/admin/wtfit` returns `OK.` Time to Solve: 20 minutes.
20	"Hong-Kong": can't write data into database.	Hard	20 m	Fix	"Hong-Kong": can't write data into database. Scenario: "Hong-Kong": can't write data into database. Level: Hard Type: Fix Access: Email Description: (Similar to "Manhattan" scenario but harder). Your objective is to be able to insert a row in an existing Postgres database. The issue is not specific to Postgres and you don't need to know details about it (although it may help). Postgres information: it's a service that listens to a port (:5432) and writes to disk in a data directory, the location of which is defined in the data_directory parameter of the configuration file /etc/postgresql/14/main/postgresql.conf. In our case Postgres is managed by systemd as a unit with name postgresql. Test: `sudo -u postgres psql -c "insert into persons(name) values ('jane smith');" -d dt` Should return:`INSERT 0 1` Time to Solve: 20 minutes.
21	"Pokhara": SSH and other sshenanigans	Hard	30 m	Fix Pro	"Pokhara": SSH and other sshenanigans Scenario: "Pokhara": SSH and other sshenanigans Level: Hard Type: Fix Access: Paid Description: A user `client` was added to the server, as well as their SSH public key. The objective is to be able to SSH locally (there's only one server) as this user `client` using their SSH keys. That is, if as root you change to this user (`sudo su; su client`), you should be able to log in with ssh: `ssh localhost`. Test: As user admin: `sudo -u client ssh client@localhost 'pwd'` returns `/home/client` Time to Solve: 30 minutes.
22	"Belo-Horizonte": A Java Enigma	Hard	20 m	Fix Pro	"Belo-Horizonte": A Java Enigma Scenario: "Belo-Horizonte": A Java Enigma Level: Hard Type: Fix Access: Paid Description: (Credit for the idea: fuero) There is a one-class Java application in your /home/admin directory. Running the program will print out a secret code, or you may be able to extract the secret from the class file without executing it but I'm not providing any special tools for that. Put the secret code in a /home/admin/solution file, e.g. `echo "code" > /home/admin/solution`. Test: `md5sum /home/admin/solution | awk '{print $1}'` returns `9d2bd7aabb26681eacd9444da6b6643c` Time to Solve: 20 minutes.
23	"Chennai": Pull a Rabbit from a Hat	Hard	30 m	Fix Pro	"Chennai": Pull a Rabbit from a Hat Scenario: "Chennai": Pull a Rabbit from a Hat Level: Hard Type: Fix Access: Paid Description: There is a RabbitMQ (RMQ) cluster defined in a docker-compose.yml file. Bring this system up and then run the producer.py script in such a way that it is able to send messages to RMQ. In particular you have to send the message "hello-lwc". - RMQ is a queuing system: messages are put in the queue with a "producer" and they are taken out from the other side by a "consumer". The queue name has to be the same for both. - To send the message "hello-lwc": `python3 ~/producer.py hello-lwc`. Should return `Message sent to RabbitMQ`. "IncompatibleProtocolError" means RMQ is not working properly. - To test consuming it: `python3 ~/consumer.py`, this will retrieve the next message from the queue and print it. Once everything is working, send more than one message so there's at least one in the queue when the validation runs. - Do not change the consumer.py and producer.py files; if you do, the Check My Solution will fail. Test: `python3 ~/consumer.py` returns `hello-lwc` See /home/admin/agent/check.sh for the exact test. Time to Solve: 30 minutes.
24	"Florence": Database Migration Hell	Hard	30 m	Fix Pro	"Florence": Database Migration Hell Scenario: "Florence": Database Migration Hell Level: Hard Type: Fix Access: Paid Description: You are working as a DevOps Engineer in a company and another team member left the company and left the docker-compose.yml of a database-backed web application unfinished. Generally, the problem revolves around the database migration and docker compose. Additionally, in front of the application there is an Nginx server and you need to fix the proper access to it as well. The source code is in /home/admin/app Credit Kamil Błaż Test: `curl --cacert /etc/nginx/certs/sadserver.crt https://sadserver.local` returns a message containing "ready to serve requests" The "Check My Solution" button runs the script /home/admin/agent/check.sh, which you can see and execute. Time to Solve: 30 minutes.
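
For data-lookup scenarios such as "Unimak Island", where Python3 is listed among the installed tools, the filter can be sketched roughly as follows. This is only an illustration under stated assumptions: the scenario does not describe the layout of station_information.json, so the `data.stations` nesting, the inline `SAMPLE` records and the helper names `find_station_ids` and `solution_md5` are all hypothetical. The hash helper mirrors the fact that `echo "..." > ~/mysolution` writes a trailing newline, which changes what `md5sum` reports.

```python
import hashlib
import json

# Hypothetical sample data; the real station_information.json may be shaped differently.
SAMPLE = json.loads("""
{"data": {"stations": [
  {"station_id": "a1", "has_kiosk": true,  "capacity": 45},
  {"station_id": "b2", "has_kiosk": false, "capacity": 12},
  {"station_id": "c3", "has_kiosk": false, "capacity": 31}
]}}
""")


def find_station_ids(doc, min_capacity=30):
    """Return station_ids where has_kiosk is false and capacity exceeds min_capacity."""
    stations = doc["data"]["stations"]  # assumed layout, not confirmed by the scenario
    return [
        s["station_id"]
        for s in stations
        if s.get("has_kiosk") is False and s.get("capacity", 0) > min_capacity
    ]


def solution_md5(answer):
    """MD5 of the answer as `echo` would write it, i.e. followed by a newline."""
    return hashlib.md5((answer + "\n").encode("utf-8")).hexdigest()


if __name__ == "__main__":
    ids = find_station_ids(SAMPLE)
    print(ids)  # prints ['c3'] for the sample above
    for station_id in ids:
        # Compare this digest with the one given in the scenario's Test line.
        print(station_id, solution_md5(station_id))
```

To try it against the real file, load it with `json.load(open("station_information.json"))` instead of `SAMPLE` and adjust the key path if the layout differs.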

Troubleshooting Scenarios

Realistic-interviews

Scenarios that are more realistic and suited for job interviews
