WhatsApp Data Analysis

Paul Nnakwe
3 min read · Aug 19, 2022

I collected data from a WhatsApp group I belonged to and analyzed it to identify patterns and individual activities.

To export the chat, I opened the group chat, tapped the More options icon (…) in the header, then tapped ‘More’ and ‘Export Chat’.

Here’s a link to guide you:

The chat is exported as a text file.
This was my first time analyzing a text file and I encountered some great surprises.

Data Cleaning and Transformation

Here is what the data looks like when imported into Power BI:

You can see the format: Date, Time — Username: — Message

This format suggests how to split the data into separate columns.
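For readers who prefer code to the Power BI editor, the same split can be sketched in Python. This is only an illustration: it assumes each line looks like "19/08/2022, 22:15 - Username: Message", and real exports differ by phone and locale, so the pattern and the name split_line are hypothetical.

```python
import re

# Assumed line layout: "19/08/2022, 22:15 - Username: Message".
# Real exports vary by phone and locale, so adjust the pattern to your file.
LINE_PATTERN = re.compile(
    r"^(?P<date>\d{1,2}/\d{1,2}/\d{2,4}), (?P<time>\d{1,2}:\d{2}) - "
    r"(?P<user>[^:]+): (?P<message>.*)$"
)

def split_line(line):
    """Return a dict of date, time, user, and message, or None if no match."""
    match = LINE_PATTERN.match(line)
    return match.groupdict() if match else None

row = split_line("19/08/2022, 22:15 - Ada: Good evening everyone")
print(row)
```

Lines that do not follow the layout return None, which makes them easy to set aside for separate handling.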

Here’s where it gets more challenging. Very long broadcast messages spill over into the rows and columns below the row where the message starts.

I noticed this pattern applied to the broadcasts alone.
To resolve this, I simply removed the rows that showed an error in the date column, since they were spillovers of broadcast messages (there are more efficient ways to go about this).
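One alternative to dropping the spillover rows, which is not the approach used in this analysis, is to join them back onto the message they belong to. The Python sketch below assumes every new message starts with a "dd/mm/yyyy, hh:mm - " stamp. The function name and sample lines are made up for illustration.

```python
import re

# Assumption: every new message starts with "dd/mm/yyyy, hh:mm - ".
STARTS_WITH_DATE = re.compile(r"^\d{1,2}/\d{1,2}/\d{2,4}, \d{1,2}:\d{2} - ")

def merge_spillovers(lines):
    """Append lines that lack a leading date stamp to the previous message."""
    merged = []
    for line in lines:
        if STARTS_WITH_DATE.match(line) or not merged:
            merged.append(line)
        else:
            # Continuation of a long message: keep it with its parent row.
            merged[-1] += "\n" + line
    return merged

raw = [
    "19/08/2022, 22:15 - Ada: BROADCAST part one",
    "part two of the same broadcast",
    "19/08/2022, 22:20 - Bola: Noted",
]
messages = merge_spillovers(raw)
print(len(messages))
```

This keeps the full text of each broadcast, at the cost of one extra pass over the file before loading it.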

I then cleaned the data further by filtering, extracting, adding new columns for date and time, trimming, and so on.

Codes

I also noticed a pattern while skimming through the data:

  • <Media omitted> wherever a media message was supposed to be
  • “A added B” where a new user is introduced
  • “A left” where a user exits

These are records of individual actions excluding sending messages.
I had to create a list to codify these actions to bring out more insights.
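As a rough illustration of this kind of coding, here is a Python sketch. The labels MEDIA, ADDED, LEFT and MESSAGE are hypothetical and are not the actual list used in this analysis. It assumes English export wording and that the date stamp has already been stripped from the text.

```python
def classify_action(text):
    """Map the text that follows the date stamp to an action code."""
    # WhatsApp writes a placeholder instead of the media file itself.
    if "<media omitted>" in text.lower():
        return "MEDIA"
    # System lines carry no "Username:" prefix, so they contain no colon.
    if ":" not in text and " added " in text:
        return "ADDED"
    if ":" not in text and text.rstrip().endswith(" left"):
        return "LEFT"
    return "MESSAGE"

for sample in ["Ada: <Media omitted>", "Ada added Bola", "Bola left", "Ada: hello"]:
    print(sample, "->", classify_action(sample))
```

The colon check is what stops an ordinary message such as "Ada: I added the file" from being counted as an invitation.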

“ADDED” Data

I created a new query to show the invitation data.
In the screenshot, I have isolated the records so that we can see who invited each user and the date and time of the addition.

I then transformed it into the form shown in the image below.
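The same extraction can be sketched in Python. The sketch assumes system lines look like "19/08/2022, 22:15 - Ada added Bola", with a day-first date and a 24-hour clock. The pattern, the function name and the field names are illustrative, not the Power BI query itself.

```python
import re
from datetime import datetime

# Assumed system-line layout: "19/08/2022, 22:15 - Ada added Bola".
ADDED_PATTERN = re.compile(
    r"^(?P<stamp>\d{1,2}/\d{1,2}/\d{4}, \d{1,2}:\d{2}) - "
    r"(?P<inviter>[^:]+?) added (?P<invitee>[^:]+)$"
)

def parse_added(line):
    """Return the inviter, invitee, and datetime of an "added" event, or None."""
    match = ADDED_PATTERN.match(line)
    if match is None:
        return None
    return {
        "datetime": datetime.strptime(match["stamp"], "%d/%m/%Y, %H:%M"),
        "inviter": match["inviter"],
        "invitee": match["invitee"],
    }

event = parse_added("19/08/2022, 22:15 - Ada added Bola")
print(event)
```

Excluding colons from both names keeps ordinary chat messages that happen to contain the word "added" out of the invitation data.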

Code Tables

I added a user table. Since I wanted to anonymize the data, I created a table of all the users and assigned each one a code of the form “USER N”.
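A minimal Python sketch of such a lookup is shown here. Numbering users in order of first appearance is an assumption made for the example, and the sample names are invented.

```python
def build_user_codes(usernames):
    """Assign "USER 1", "USER 2", ... in order of first appearance."""
    codes = {}
    for name in usernames:
        if name not in codes:
            codes[name] = f"USER {len(codes) + 1}"
    return codes

senders = ["Ada", "Bola", "Ada", "Chidi"]
codes = build_user_codes(senders)
anonymized = [codes[name] for name in senders]
print(anonymized)
```

Keeping the mapping in its own table means the real names never need to appear in the published report.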

Finally, a date table was added to the queries. This will be useful for any time intelligence analysis.
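For illustration, a date table is simply one row per calendar day with a few descriptive columns. The Python sketch below builds one. The column choice and the date range are assumptions, not the table used in this report.

```python
from datetime import date, timedelta

def build_date_table(start, end):
    """Return one row per day from start to end, inclusive."""
    rows = []
    current = start
    while current <= end:
        rows.append({
            "date": current,
            "year": current.year,
            "month": current.strftime("%B"),
            "weekday": current.strftime("%A"),
        })
        current += timedelta(days=1)
    return rows

table = build_date_table(date(2022, 8, 1), date(2022, 8, 19))
print(len(table), table[0]["weekday"])
```

Because every day is present, even days with no messages, the table lets you chart quiet periods as zeros rather than gaps.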

Dashboard

I wanted to show the following:

  • Number of messages sent
  • Media shared
  • Most active participants by time etc.
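The measures behind a list like this are simple counts. The Python sketch below computes them on a few invented rows. The field names and the action codes are hypothetical, and the real dashboard would compute its measures inside Power BI.

```python
from collections import Counter

# Invented sample rows; a real table would come from the cleaned export.
records = [
    {"user": "USER 1", "hour": 9, "action": "MESSAGE"},
    {"user": "USER 2", "hour": 9, "action": "MEDIA"},
    {"user": "USER 1", "hour": 21, "action": "MESSAGE"},
    {"user": "USER 1", "hour": 21, "action": "MEDIA"},
]

# Number of text messages sent by each user.
messages_sent = Counter(r["user"] for r in records if r["action"] == "MESSAGE")
# Number of media items shared by each user.
media_shared = Counter(r["user"] for r in records if r["action"] == "MEDIA")
# Activity per (hour, user) pair, for "most active by time" views.
activity_by_hour = Counter((r["hour"], r["user"]) for r in records)
# Overall most active participant across all actions.
most_active = Counter(r["user"] for r in records).most_common(1)[0]

print(messages_sent, media_shared, most_active)
```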

My audience was mainly the group members. I wanted them to be able to filter the data to see their own activity. I created a web view and a mobile view to support their devices.

Here’s what the dashboard looks like:

Click the link below to see the interactive dashboard:

Paul Nnakwe

Social Media Analyst, Researcher and Content Strategist. Content 🤝 Data 🤝 Trends