As a follow-up to my article about the most important step, I want to talk more about why good business understanding is the most crucial part of developing a solution.
Clarifying the business needs has some immediate advantages:
- It guarantees that we answer the right question
- It makes it easier to present the solutions in the same language that business people speak and understand well
- It ultimately makes it easier to get stakeholder buy-in
The success of Data Science depends on how much stakeholder buy-in exists for data-related solutions. Stakeholders often cannot tell the difference between a good and a bad technical implementation of a model, but they can tell whether the presenter understands the business well or poorly. It's easier for them to trust your results if they feel like you understand the business.
- Lincoln, who would've been a very slow lumberjack or developer
Even from a technical perspective, it is important that all goals are aligned (algorithm <-> business question <-> KPI of the department/how the success of the department is measured).
Understanding the business has a big impact on the implementation:
- Stating success criteria: When is the solution good enough, how does it translate to business value, and is a comparison to a benchmark possible?
- Defining guard rail metrics: metrics that should not be affected negatively while testing/using the model
- Loss function: What is the business impact of false positives vs. false negatives, how much deviation is bad, and which general metrics best fit the business intuition?
- Whatever metric we focus on, will optimizing for it create unwanted incentives? (Goodhart’s Law)
- Human-in-the-loop: How can a human override the model if necessary? Is it even necessary?
- Look at available data: Which data do I need for it? (Business knowledge is useful here, and a business person can help.)
- Which data is available and are there any special things to consider? (Data analysts know a lot about the data, and data engineers know about potential quirks in the ETL pipeline.)
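To make the loss function point concrete, here is a minimal Python sketch of how the business cost of false positives and false negatives could drive the choice of a decision threshold. Everything in it is an assumption for illustration: the names `BusinessCosts`, `total_cost` and `best_threshold` are made up, the cost values are arbitrary, and the labels and scores are toy data. In a real project the costs would come from the business side and the scores from a trained model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BusinessCosts:
    """Hypothetical cost per error, in whatever unit the business uses."""
    false_positive: float  # e.g. contacting a customer unnecessarily
    false_negative: float  # e.g. missing a customer who then churns


def total_cost(y_true, y_score, threshold, costs):
    """Business cost of all errors made at a given decision threshold."""
    fp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(y_true, y_score) if s < threshold and y == 1)
    return fp * costs.false_positive + fn * costs.false_negative


def best_threshold(y_true, y_score, costs, candidates):
    """Pick the candidate threshold with the lowest total business cost."""
    return min(candidates, key=lambda t: total_cost(y_true, y_score, t, costs))


# Toy labels and model scores, invented for illustration only.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
y_score = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
candidates = [0.2, 0.5, 0.75]

# Two hypothetical business settings with opposite error costs.
fn_heavy = BusinessCosts(false_positive=1.0, false_negative=10.0)
fp_heavy = BusinessCosts(false_positive=10.0, false_negative=1.0)

print(best_threshold(y_true, y_score, fn_heavy, candidates))  # 0.2
print(best_threshold(y_true, y_score, fp_heavy, candidates))  # 0.5
```

The same model scores lead to different thresholds depending on which error the business considers more expensive, which is why this question should be answered before any model is tuned.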
If all of this is handled well, you can in theory create almost the entire final presentation before having trained any model or written any line of code! A general outline usually looks like this:
- State business problem
- How are we trying to solve it? (Mapping business problems to quantifiable statistics / KPI for model/analysis, how a model will be deployed to solve it)
- Assumptions we made along the way (different kinds of loss in classification, data assumptions, missing important variables etc.)
- Solution, showing metrics and KPIs in pretty graphs