Skip to main content

SQL Murder Mystery - Knight Lab

 SQL Murder mystery is a project by Knight Lab team. Which focusses on using SQL Queries to solve a case which involves in finding the killer behind a murder crime.

This blog walks through the approach i did in order to find the killer behind this murder.




So this is the Schema Diagram of the given tables, which we should use to figure out the killer.

  • get_fit_now_check_in - this table gives information about checkin details of each member of the gym
  • get_fit_now_member - this table contains information about the general details of a member of the gym
  • person - this table contains information about the address details of the person
  • drivers_license - this table contains information about the vehicle used by the person
  • interview - this table contains the information regarding what the person revealed during the interview with the detectives
  • facebook_event_checkin - this table contains information about the events the person attended and uploaded about on facebook
  • crime_scene_report - this table provides overall information about the crime
  • income - this table gives income of each person
  • solution - this table contains the answer using which we should cross-check later

The Given clue was - The detective gave you the crime scene report, but you somehow lost it. You vaguely remember that the crime was a murder that occurred sometime on Jan 15 2018 and that it took place in SQL city.

Using the clues that is given, we know the details about place,crime and date. Hence on querying the crime_scene_report table,

we get details about 2 witnesses. Let's split further approach into 2 witnesses.

                                 WITNESS - 1

  1. He lives in last house
  2. Lives at an area called "Northwestern Dr"



As we can get name and id and other details of this witness, we next can query and see what his interview was.

  1. The membership number on the bag started with "48Z" (only gold members)
  2. The man got into a car with a plate that included "H42W"
  3. Also that man had a "Get fit now" gym bag
        So we can conclude that killer is a "gold" member of get fit now gym.

                   WITNESS - 2

  1. Name is Annabel
  2. Lives at Franklin Avenue

      On querying and finding what she had said in her interview, we find that

  1. The killer was from her gym
  2. She saw him during her time at gym on Jan 9 th
    So , by now we know that the killer was at the gym at the same time as the witness.

 

Using above query, we found out the check_in and check_out time of Annabel on Jan 9, 2018.

Now we need to find out the names of people who were inside the gym during that same time.


We have narrowed it down to 2 Suspects from Annabel witness, hence on examining the evidence from witness 1 and querying we get


 Thus we find out that the killer is Jeremy Bowers,  which we now have to cross-check with the solution.


The killer has been figured out, but here is a twist, the result is prompting us to read the interview of the killer.


Upon querying and reading his interview, we get to know that he was hired by another person. The hints that he revealed are,
  1. Rich woman
  2. Red hair
  3. Height around 5'5 to 5'7
  4. Drives Tesla Model S
  5. Attended SQL Symphony concert 3 times in Dec 2007
Using these we should further investigate and find the woman.


The name of the woman is "Miranda Priestly".


On cross-checking with the solution,The answer was found to be Correct. 
And Miranda Priestly was the mastermind behind this Crime.







Comments

Popular posts from this blog

Cdf, Pdf and P - Value Explained

The probability density function ( pdf ) and cumulative distribution function ( cdf ) are two of the most important statistical functions in reliability and are very closely related. When these functions are known, almost any other reliability measure of interest can be derived or obtained. Figure 1 Figure 2 PDF - Probability Density Function or density  of a continuous random variable , is a function  whose value at any given sample (or point) in the sample space  (the set of possible values taken by the random variable) can be interpreted as providing a relative likelihood that the value of the random variable would equal that sample In a more precise sense , the PDF is used to specify the probability of a random variable falling within a particular range of values , as opposed to taking one value.   CDF -  cumulative distribution function  ( CDF ) of a real-valued random variable   X, or just  distribution function  of  X, evaluated at  x, is the probability  that  X will take

Sigmoid Function Simplified

                             Sigmoid function Equation Definition  Technical Wikipedia Definition :  A sigmoid function is a bounded , differentiable , real function that is defined for all real input values and has a non-negative derivative at each point  and exactly one inflection point. Simple Definition - It is basically a 'S' curve or activator function which is commonly used in Machine Learning, it is used to bound/transform/squash a wide range of values within [0,1]. Basic illustration of how Sigmoid Function works, Defining the Input - Array containing 99999 equally spaced numbers between -10,10 Defining the Sigmoid Function -  Plot of the transformed values -   Graph between input and transformed output                                              Distplot of the Transformed values As we can see from the above plots the values are bounded between 0,1 for any given value. Plot if it had been e^x instead of e^-x : Now going back to the Technical Definition, A sigmoid