The annoying online course
With a sneaky meta-challenge
Introduction
Depending on your job and the industry your employer is in, you may need to take mandatory courses on topics like money laundering, organized crime, and data protection at regular intervals. Many of these courses are web applications that present information and then have a quiz where you need to achieve a certain score to pass.
In my case, my colleagues and I had to take a course about fraud, which was created by a fellow employee. The course was quite frustrating because there was no way to see which questions you got wrong if you failed the final test. After two failures, you couldn't try again and had to contact the trainer to unlock your course. This inconvenience sparked our curiosity and led us down an interesting path.
The AI Path
A colleague of mine came up with the idea of using a large language model to do the job. He copied all the informational text into a Markdown document and then gave the test questions to the LLM along with the text. This method worked very well, resulting in a success rate of 70% to 80%, which was enough to pass the test.
The downside was that extracting the data was a bit tedious because the course wasn't just a large text document. It included additional elements, like text cards that needed to be clicked to view the content. We considered using an automation tool like Playwright to capture the text but decided not to spend more time on it at that point.
The Reversing Path
In a second approach, I looked into how the testing application actually works. I knew that the courses could be created by non-technical people, so there should be some artifact defining the course contents, not just plain HTML/JavaScript. The LMS we use is called SABA Cloud. Saba was once a standalone company that was eventually acquired by Cornerstone OnDemand, a vendor with a wide range of software products in the HR space. Interestingly, I used to work for another vendor in that space, which was also acquired by Cornerstone OnDemand.
To find out which artifacts the course application consists of, I used the browser developer tools to inspect the network traffic. Opening the course alone triggers about two hundred requests for JavaScript, CSS, HTML, images, and fonts. The application opens a nested frameset, and the JavaScript seems to be a mix of different eras, from plain JavaScript to webpack-bundled Angular compiled from TypeScript. This appears to be an application that has grown over a long time, and no one ever wanted to invest in a rewrite.
I did not find any actual course text in the network traffic. I expected to see some request returning the course text as a JSON object. However, what I did find was this:

There is a massive Base64 string that starts with eyJ. If you have worked with JWT before, this might seem familiar because it decodes to {", the typical start of a JSON object. I copied the string into CyberChef and used From Base64 and JSON Beautify. This gave me a document containing all the course text, the questionnaire, and all the answers!
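The two CyberChef steps can also be reproduced in a few lines of Python. This is only a minimal sketch: the helper name decode_course_payload and the tiny sample payload are made up for illustration, and the real string copied from the developer tools is far larger.

```python
import base64
import json


def decode_course_payload(payload: str) -> dict:
    """Decode a Base64 string into a JSON document (hypothetical helper)."""
    # Pad defensively in case the copied string lost its trailing "=".
    padded = payload + "=" * (-len(payload) % 4)
    return json.loads(base64.b64decode(padded).decode("utf-8"))


# Made-up stand-in for the real payload from the network traffic.
sample = base64.b64encode(b'{"course": {"lessons": []}}').decode("ascii")

# A JSON object whose first key starts with a letter encodes to "eyJ...".
print(sample[:3])

# Equivalent of "From Base64" followed by "JSON Beautify".
course = decode_course_payload(sample)
print(json.dumps(course, indent=2, ensure_ascii=False))
```

Saving the beautified output under a name such as course.json makes it easy to process the document further.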
The basic structure of the document is as follows:
{
  "course": {
    "lessons": [
      {
        "title": "Quiz",
        "items": [
          {
            "type": "MULTIPLE_RESPONSE",
            "title": "Question",
            "answers": [
              {
                "id": "1",
                "title": "Answer #1"
              },
              {
                "id": "2",
                "title": "Answer #2"
              },
              {
                "id": "3",
                "title": "Answer #3"
              }
            ],
            "corrects": [
              "1",
              "2"
            ]
          }
        ]
      }
    ]
  }
}
So, a course consists of lessons, which define items. The item type indicates whether it’s just text, a multiple choice question, a single choice question, or something else.
In this test, there are only two types of questions:
MULTIPLE_CHOICE
Only one answer is correct.
MULTIPLE_RESPONSE
Multiple answers may be correct.
There are probably more types of questions available (I think I remember some drag & drop actions from previous tests).
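To check which item types a given course actually uses, the decoded document can be tallied by type. The following is a rough sketch that assumes the structure shown above; the function name count_item_types and the demo data are made up, and "TEXT" is only a placeholder for whatever the real non-question type is called.

```python
from collections import Counter


def count_item_types(course_json: dict) -> Counter:
    """Count how often each item type occurs across all lessons."""
    counts = Counter()
    for lesson in course_json["course"]["lessons"]:
        for item in lesson.get("items", []):
            # Items without a "type" key are grouped under a marker value.
            counts[item.get("type", "<missing>")] += 1
    return counts


# Made-up miniature course; "TEXT" is a hypothetical type name.
demo_course = {
    "course": {
        "lessons": [
            {"title": "Intro", "items": [{"type": "TEXT"}]},
            {
                "title": "Quiz",
                "items": [
                    {"type": "MULTIPLE_RESPONSE"},
                    {"type": "MULTIPLE_RESPONSE"},
                    {"type": "MULTIPLE_CHOICE"},
                ],
            },
        ]
    }
}

for item_type, count in count_item_types(demo_course).most_common():
    print(f"{item_type}: {count}")
```

Any type name that shows up here besides the two known ones would be a candidate for one of those other question formats.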
The Tool
With this information, it’s quite easy to extract the correct answers from the course. I created a small Python script that does the job and creates a markdown document with the correct answers. Make sure to have Beautiful Soup installed to run it.
pip install beautifulsoup4
import json

from bs4 import BeautifulSoup


def get_questions(lesson):
    """Return only the quiz items of a lesson, skipping all other item types."""
    result = []
    for item in lesson["items"]:
        if item["type"] == "MULTIPLE_RESPONSE" or item["type"] == "MULTIPLE_CHOICE":
            result.append(item)
    return result


with open("course.json", "r", encoding="utf8") as course_file:
    with open("solution.md", "wt", encoding="utf8") as solution:
        solution.write("# Solution\n\n")
        course_json = json.load(course_file)
        for lesson in course_json["course"]["lessons"]:
            questions = get_questions(lesson)
            if len(questions) == 0:
                continue
            for question in questions:
                # Titles may contain HTML markup, so reduce them to plain text.
                soup = BeautifulSoup(question["title"], features="html.parser")
                solution.write(f"## {soup.get_text()}\n\n")
                for answer in question["answers"]:
                    solution.write("- ")
                    # MULTIPLE_RESPONSE lists several IDs in "corrects",
                    # MULTIPLE_CHOICE stores a single ID in "correct".
                    if (
                        question["type"] == "MULTIPLE_RESPONSE"
                        and answer["id"] in question["corrects"]
                    ) or (
                        question["type"] == "MULTIPLE_CHOICE"
                        and answer["id"] == question["correct"]
                    ):
                        solution.write("✅")
                    else:
                        solution.write("❌️")
                    soup2 = BeautifulSoup(answer["title"], features="html.parser")
                    solution.write(f"{soup2.get_text()}\n")
                solution.write("\n")
My Takeaways
This was a one-time test with just one course. I'm not claiming it works for every course, but it doesn't seem specific to this course alone.
Shipping the correct answers to the client as part of the course payload is a poor choice. In my opinion, Base64 encoding was used to hide the course contents, indicating the author knew it was a bad idea. The right solution would have been to send the course description without the solutions and to evaluate the submitted answers on the server.
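To make that alternative concrete, here is a rough sketch of what it could look like. It is not how the vendor's product works: the function names strip_solutions and is_correct are made up, the field names follow the decoded document shown earlier, and the grading rule (the selected IDs must match exactly) is an assumption.

```python
import copy

# Fields that reveal the solution, as seen in the decoded document.
SOLUTION_KEYS = ("correct", "corrects")


def strip_solutions(course_json: dict) -> dict:
    """Return a copy of the course that is safe to send to the browser."""
    public = copy.deepcopy(course_json)
    for lesson in public["course"]["lessons"]:
        for item in lesson.get("items", []):
            for key in SOLUTION_KEYS:
                item.pop(key, None)
    return public


def is_correct(item: dict, submitted_ids: list[str]) -> bool:
    """Grade one question on the server, using the private copy."""
    if item["type"] == "MULTIPLE_RESPONSE":
        # Assumption: order does not matter, but the set must match exactly.
        return set(submitted_ids) == set(item["corrects"])
    return submitted_ids == [item["correct"]]


# Made-up private course definition that never leaves the server.
private_course = {
    "course": {
        "lessons": [
            {
                "title": "Quiz",
                "items": [
                    {
                        "type": "MULTIPLE_RESPONSE",
                        "title": "Question",
                        "answers": [
                            {"id": "1", "title": "Answer #1"},
                            {"id": "2", "title": "Answer #2"},
                            {"id": "3", "title": "Answer #3"},
                        ],
                        "corrects": ["1", "2"],
                    }
                ],
            }
        ]
    }
}

public_course = strip_solutions(private_course)
public_item = public_course["course"]["lessons"][0]["items"][0]
private_item = private_course["course"]["lessons"][0]["items"][0]

print("corrects" in public_item)             # the browser never sees it
print(is_correct(private_item, ["2", "1"]))  # graded against the private copy
```

With this split, the browser only ever receives the questions and answer texts, and inspecting the network traffic no longer reveals the solutions.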
I hope you found this insight somewhat interesting. I'm not trying to criticize that specific vendor. I've seen both better and worse in software development.
Update
I had the chance to check two additional courses. It seems that courses handle the questionnaire in completely different ways. One of the courses also returned the results to the browser, but this was done openly as an XHR request to a resource called response.json. I couldn't identify the mechanism for the other course.