top of page


Updated: 5 days ago


This telecommunications company wanted to grow their customer base and increase their profitability via customer segmentation analysis. However, there were two issues that required solutions before they could proceed:

  1.  The data required for this analysis was stored in several cloud-based and on-premise applications utilizing different databases.

  2. The data sources did not have standardized fields to match observations across the applications.


To solve their first issue, a dynamically scalable and expandable data lake architecture was deployed in a modern DevOps environment using Azure Storage and Snowflake to handle current and future data consumption over the next 3-5 years. Data import and replication tools were developed in Python to refresh the data from eight of their applications on a daily basis. In addition, a CI/CD pipeline through GitHub was implemented to test and push changes automatically to an Azure Function App.


Their second issue was resolved using a combination of regular expressions (regexp) and fuzzy string matching algorithms written in SQL and Python to identify records across the data sources that could be matched. This initiative also uncovered several data quality issues in the systems that needed to be fixed prior to analysis.


The telecommunications company successfully developed and implemented a customer segmentation scoring model to better understand how to increase current customer engagement and surface promising or untapped business opportunities.





bottom of page