DeepProfiling using Clickstream Logs

Neeraj Agarwal
Published in Algoscale
3 min read · Jan 29, 2019


The advertising industry has changed drastically as new technologies and platforms have emerged. Online purchasing has become far more prevalent, so advertisers need to be meticulous about targeting the right audience effectively and creatively.

Marketers invest heavily in advertising their businesses online, yet they often do not receive the expected benefits. Why? Let’s take a simple example. Say you sell baseball bats. The right audience for your ad might be males aged 10–35, and the right place to show it would be somewhere the context is sports. Now, what if someone searches for bats (the animal) and your ad shows up? Or it gets displayed on a website about cosmetics? That doesn’t make sense at all, right? All your money and effort go to waste. Using your digital data in the right way has therefore become vital to knowing who your high-value customers are. Research has shown that digital targeting significantly improves the response to advertisements.

These are some of the points to take care of in online advertising:

  1. Keep ads relevant to the topics and context of the business.
  2. Track real-time data on your interested customers.
  3. Target the right consumers by their demographics.

One of our clients wanted to use their clickstream data to suggest to advertisers on their network the right channels for reaching their customers. They also wanted to track which ads attracted more impressions and clicks so that they could plan and allocate media budgets accordingly.
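To make the impressions-and-clicks tracking concrete, here is a minimal sketch of computing a click-through rate per ad from a list of events. The event shape (`type` and `ad_id` fields) and the function name `ctr_by_ad` are assumptions made for illustration, not the client's actual schema.

```python
from collections import defaultdict


def ctr_by_ad(events):
    """Return {ad_id: clicks / impressions} for ads with at least one impression."""
    impressions = defaultdict(int)
    clicks = defaultdict(int)
    for event in events:
        # Assumed event shape: {"type": "impression" | "click", "ad_id": str}
        if event["type"] == "impression":
            impressions[event["ad_id"]] += 1
        elif event["type"] == "click":
            clicks[event["ad_id"]] += 1
    return {ad: clicks[ad] / n for ad, n in impressions.items()}
```

Ranking ads by this ratio is one simple input to a budget decision; a real report would likely also weigh volume and cost.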

Our team crawled a large number of websites to extract content for training the model. The crawled data contained a great deal of unusable text surrounding the real content of the web pages, so the first job was to clean up the junk and extract the core content of each page. The compiled content was then tagged with one of the 32 IAB categories and one of 200+ subcategories. A classification model was built in Spark from the tagged data and put into production.
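The cleanup step can be illustrated with a small sketch that drops markup and obvious boilerplate elements from a crawled page. It uses only Python's standard `html.parser`; the set of tags treated as junk (`SKIP_TAGS`) and the helper names are assumptions for illustration, and the production pipeline may well have used a different extraction approach.

```python
from html.parser import HTMLParser

# Tags whose contents are assumed to be boilerplate rather than page content.
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside", "form"}


class MainTextExtractor(HTMLParser):
    """Collects visible text while ignoring anything nested inside SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self._chunks.append(text)

    def text(self):
        return " ".join(self._chunks)


def extract_main_text(html):
    """Return the text of an HTML page with skipped elements removed."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

The extracted text is what would then be tagged with a category and fed to the classifier as training data.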

Real-time clickstream data was then captured by our trackers placed on the advertiser’s website, with Kafka used to stream the data live. User profiling was done based on the semantics of the content, captured through word embeddings and the classifier described above. Profiling drew on various data points, such as age, gender, entities found through Named Entity Recognition (NER), location, purchase intent, and vertical preferences, to segment and target customers on various parameters. Translation APIs were used to convert content in multiple languages into English. Finally, the Kafka data, augmented with the derived data points, was pushed into Cassandra to target these users and provide insights to the advertisers.
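The enrichment and profiling flow can be sketched in plain Python. In this illustration an in-memory list stands in for the Kafka stream, a dictionary stands in for the Cassandra table, and a hypothetical keyword lookup (`CATEGORY_KEYWORDS`) replaces the trained embedding-based classifier. The event fields `user_id` and `page_text` are likewise assumptions.

```python
from collections import Counter, defaultdict

# Hypothetical keyword lists standing in for the trained classifier.
CATEGORY_KEYWORDS = {
    "Sports": {"baseball", "bat", "bats", "league", "pitcher"},
    "Style & Fashion": {"lipstick", "cosmetics", "makeup", "skincare"},
}


def classify(text):
    """Return the category with the most keyword hits, or None if no hits."""
    words = Counter(text.lower().split())
    scores = {
        category: sum(words[k] for k in keywords)
        for category, keywords in CATEGORY_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def enrich(event):
    """Return a copy of the event with a derived 'category' field."""
    return {**event, "category": classify(event["page_text"])}


def build_profiles(events):
    """Aggregate enriched events into per-user category counts."""
    profiles = defaultdict(Counter)
    for event in events:
        enriched = enrich(event)
        if enriched["category"] is not None:
            profiles[enriched["user_id"]][enriched["category"]] += 1
    return profiles
```

Each user's counter gives a rough picture of vertical preferences; the other data points mentioned above (location, purchase intent, and so on) would be added as further derived fields in the same way.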

We helped the client surface insights that let them provide data-driven reports to advertisers, so that advertisers could invest efficiently in ads and attract the right traffic.


Data Science | Big-Data | Product Engineering @ Algoscale