Google Code Jam 2021- Cheating Detection (Qualification Round Question #5)

Hashem Alsaket
5 min readMar 29, 2021

Cheating Detection

Over the 50 coding contests I’ve competed in through the past year, this question became my favorite for many reasons:

Cheating Detection. Google Code Jam- Qualification Round 2021.

In my experience, it is rare to come across a coding question in contest with so many qualities that show up in many real world problems (some of which require approximated solutions several times per second). Fraud detection, anomaly detection and signal processing, to name a few, all share many qualities with this problem. A person who regularly displays unexpected behavior among thousands goes unnoticed until the behavior is measured against many others’ behaviors. One of many trading algorithms sending trades several times per second abruptly stops sending trades every third second of the 12 o’clock hour. These are real world examples where qualities found in this problem directly apply.

In addition to applicability, another rare sighting in this problem is a required understanding of algorithms, data structures, probability and statistics to approximate a solution.

Part 1: Simulating The Problem

First, we need to identify the goal set for two separate approximations as points are awarded based on the strength of the model:

A. The ability to detect the cheater in ≥10% of all cases

B. The ability to detect the cheater in ≥86% of all cases

The python simulation below works like this:

  • Store the 100 test takers’ ids in a dictionary
  • For each of the 100 simulations, do the following:
  • 1. Identify the test takers’ skill levels with a random uniform distribution between [-3, 3]
  • 2. Identify the question difficulty for each of the 10,000 questions using a random uniform distribution between [-3, 3]
  • 3. For each test taker, use the sigmoid function to compute the probability that each test taker answered a given question correctly considering the test taker’s skill level and the question’s difficulty then use the probability to identify correct answers
  • 4. If the id is 59 (the cheater we are using for our simulation), iterate through each question once more and mark a random 50% (the coin flip from the prompt) of the questions as correct- if the cheater already answered the question correctly before cheating and the random draw was going to mark that question as correct, the remaining questions are still unaffected by that coin toss (coin tosses are independent events)
  • 5. Find the number of correct answers for each test taker for each run
import numpy as np
import pandas as pd
import random

d_test_takers = {i: [] for i in range(100)}

for run in range(100):
# skill levels with uniform
# distribution for 100 test takers
skill_levels = np.random.uniform(-3, 3, 100)
# question difficulty with uniform
# distribution for 10,000 questions
question_difficulty = np.random.uniform(-3, 3, 10000)

# iterate through each test taker
for i in range(len(skill_levels)):
# the sigmoid function is used to
# compute probability of correct answer based
# on skill level and question difficulty
probs = [(1 / (1 + np.exp(-(skill_levels[i] - question_difficulty[j])))) for j in range(10000)]
# collect answers
qs = []
for k in range(10000):
# get answer based on probability correct from line 20
qs.append(random.choices(population=[0, 1], weights=[1-probs[k], probs[k]])[0])
# if i is 59 (the cheater's id),
# cheat 50% of the time
if i == 59:
for j in range(10000):
if qs[j] == 0:
qs[j] = random.choices(population=[0, 1], weights=[0.5, 0.5])[0]
# get number of correct answers
correct_answers = sum(qs)
# append to respective id for current run
d_test_takers[i].append(correct_answers)

Part 2: Analysis And Rudimentary Approximation

At this point, a few things come to mind:

  • What is the cheater’s expected ranking for any given game?
  • What are the lowest and highest possible rankings this cheater can achieve?
  • How different is the cheater’s behavior from the rest of the test takers?
  • Is standard deviation sufficient?
  • Do we need more powerful statistical testing tools that leverage Benford’s Law, t-test, Chi-squared test…machine learning? (desperation sets in as time winds down in these low time contests- no we do not need machine learning although I did consider it as time winded down lol)

Before we dig into some approaches, let’s take note of something immediately useful. For a typical test, the cheater is expected to correctly answer 50% of questions not answered correctly using the form of cheating identified in the prompt. So, without answering a single question correctly left to own ability, the cheater will still answer 50% of the questions correctly using the cheating tactic. The expectation is as follows:

After 100 simulations, the cheater receives the highest score 19% of the time. So before we move forward with a more intense analysis, we have correctly identified the cheater in at least 10% of cases.

df = pd.DataFrame(d_test_takers)
# how many times does the cheater have the highest score?
# after 100 simulations of 100 test takers answering
# 10,000 questions, the cheater receives the highest
# score 19% of the time
print([x for x in df.idxmax(axis="columns")].count(59))

Part 3: Approaches, Thought Process, Results, Overkill

This section will describe, at a high level, some approaches and their outcomes.

Building a case through many tests:

  • Kolmogorov-Smirnov test for goodness of fit: not enough evidence as too many other test takers yield very low p-values
  • Chi-squared test: expected frequency of cheater is not given so this test cannot be used
  • Benford’s Law: cheater’s distribution of first digit of probabilities is too close to too many other test takers’ to yield any usable results
  • Simulate the cheater’s performance across the 10,000 questions several thousand times, collect the max score for each simulation- from the first approximation, we know the cheater will yield the max score most frequently. This worked and can catch the cheater >99% of the time!…but it takes too long and Time Limit Exceeded will be returned, resulting in no points for the second batch of points

What Works

As with many other problems in life (although especially when it comes to timed competition), fatigue sets in and the lens through which we look at a problem becomes distorted until after the fact with some rest- when it’s a bit too late. Through all these complicated tests, the more powerful (still less powerful than the several thousand simulation in the 4th bullet point described above) secondary approximation can be built on a closely related thought process. Rather than take the maximum number of questions answered correctly, we can take the maximum standard deviation relative to skill level and question difficulty to achieve ≥86% correct identification rate. That’s it.

Me

As always, thank you for reading and I hope you learned something useful! Feel free to check out my GitHub and connect with me on LinkedIn. Thank you for reading!

--

--