Problem
I was implementing a product API that returned JSON with product data. In a meeting the question came up whether there were any duplicate entries in it. There was too much data to search through the result manually. I wanted to find duplicate productId values in the result.
Idea
I could store the result in a JSON file on my computer and search through it in the terminal. The terminal is fast, and I already knew that there are good tools for this task.
Solution with example
Grep and awk are the tools to use for this task.
With grep I tell the terminal which JSON field to search for.
The data is stored in a file named `blog_test_data.json` in the current directory. Point your terminal to that directory. The file content looks like this:
{
  "productId": "456",
  "productName": "Product B",
  "price": 29.99,
  "category": "Household",
  "description": "Practical device for the household.",
  "stock": 100,
  "rating": 4.7,
  "manufacturer": "Manufacturer Y",
  "releaseDate": "2023-03-15",
  "color": "Blue"
},
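The snippet above is just one object. For the commands below I assume the full file contains many such objects, for example wrapped in a JSON array; the values in this sketch are made up for illustration:
[
  { "productId": "123", "productName": "Product A", "price": 19.99 },
  { "productId": "456", "productName": "Product B", "price": 29.99 },
  { "productId": "123", "productName": "Product A", "price": 19.99 }
]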
Just a grep with a regex:
grep -o '"productId": "[^"]*"' blog_test_data.json
"productId": "123"
"productId": "456"
"productId": "789"
"productId": "123"
"productId": "456"
"productId": "112"
This returns all productIds. As we can see, there are duplicates, but we only want the duplicated values, sorted by id.
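To make the pipeline easier to follow, here is the intermediate step on the same file (a sketch; the final command is below): `awk -F: '{print $2}'` keeps only the part after the colon, and `tr -d '"'` strips the quotes, leaving one bare id per line.
grep -o '"productId": "[^"]*"' blog_test_data.json | awk -F: '{print $2}' | tr -d '"'
`sort` then groups identical ids next to each other, and `uniq -d` prints only the values that appear more than once.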
With pipes and awk the full working command looks like this:
grep -o '"productId": "[^"]*"' blog_test_data.json | awk -F: '{print $2}' | tr -d '"' | sort | uniq -d
123
456
This results in a list of duplicate `productId` values, sorted. Since the sort is lexicographic, the same approach also works when the ids are text values rather than numbers.
With these ids you now know which products are duplicated.
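If you also want to know how often each duplicate occurs, a small variation of the same pipeline (a sketch, not needed for the original task) counts the occurrences with `uniq -c` and keeps only counts greater than one with `awk`:
grep -o '"productId": "[^"]*"' blog_test_data.json | awk -F: '{print $2}' | tr -d '"' | sort | uniq -c | awk '$1 > 1'
For the example data this would print each duplicated id together with a count of 2.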