Microsoft has released a Kusto Query Language(KQL) challenge for us to solve. It consists of five different challenges.
Lets take a look at the first challenge – “The rarest book is missing!”
One thing I’ve learned solving these cases is to carefully read the challenge description. A few clues that sticks out:
- Rarest book is missing – De Revolutionibus Magnis Data
- Published the year of – 1613
- The author is – Gustav Kustov
- RFID tags are useless as the one attached to our book is missing
- Weight of books and shelves are recorded
Objective – find the self the book is located on
Let us start by looking at the missing book itself. By running this query we’re able to get results about the book
Books | where book_title == "De Revolutionibus Magnis Data"
The query outputs the book we need with some attributes; rf_id, book_title, publish_date, author, language, number_of_pages and weight_gram. If you notice we have the exact weight of the book in grams, this may become useful.
As we’re trying to locate which shelf the book is placed on, we need to take a look at what the shelves data looks like. To get insights into what kind of information a shelf has, run the following query
Shelves | top 100 by shelf
We’re seeing a unique shelf number, rf_ids of the books located on the shelf and total weight of the shelf with books. My first thought after running these queries was to try locate the book based on rf_id. Can it be that easy?
Shelves | mv-expand rf_id = rf_ids to typeof(string) | lookup Books on rf_id | where rf_id == "0cbc13e0aa7d487e8e797d3de3823161"
Unfortunately not. Remember, the book is located on one of the shelves, but are missing the rf_id tag. So we’re unable to locate the self the book is located on by rf_id. So lets investigate the weight aspect. We have the total weight of each book as well as each self. Lets construct a query together –
We need to query the table Shelves. Our shelves rf_ids can consist of more than one book, so we need to convert it to a string. We then need to look up the book based on rf_id in order to sum the actual weight of the book in our shelves. Calculate the difference of shelves weight and aggrigated book weight. Finally filter on which shelf that has approximately our missing weight. My query is constructed as follows:
Shelves | mv-expand rf_id = rf_ids to typeof(string) | lookup Books on rf_id | summarize sum(weight_gram) by shelf, total_weight | project shelf, sum_weight_gram, margin = sum_weight_gram - total_weight | where margin < - 1000
This gives us the correct result! But why is that so? Let us deconstruct it!
Shelves – the table we’re querying
mv-expand – expanding the multi-value array into single records of type string
loopup – finds books based on the rf_ids in each shelf
summarize – produces a table that aggregates total weight by shelf as well as summarises actual total weight of the shelf measured by each book on the shelf
project – project and calculates columns. Projects shelf, actual weight of shelf and the margin between theoretical weight and actual weight.
where – spits out all records where the margin is less than – 1000 grams.
Our result consists of one shelf; 4242 where the margin is -1798 grams. This is the shelf the book is located! Congratulations, we’ve solved the challenge!