It’s Hard To Tell The Difference Between Mondays
Thank you all for subscribing and for your feedback this week. The new posts with external content have gotten positive reviews, so they will become a permanent part of the newsletter. I will experiment with a few formats to find the one you get the most value from. Please keep the comments and emails coming so I can quickly land on the best design.
This is another ‘watch your models for drift’ moment. Macroeconomic factors are shifting rapidly, and you should see the impacts in your datasets.
Some inflationary forces are subsiding while the cost of oil is rising. Those changes will impact pricing and supply chain datasets the most. The danger here is the uneven nature of those impacts. The total inflation picture will probably be stable as these changes offset each other. However, the individual business impacts will be positive for some and negative for others.
Consumer spending patterns are shifting. The move from products to services will continue. Those shifts impact customer lifetime value, segmentation, demand forecasting, and product mix. Consumers are also spending less across the board, and the impacts are just starting to enter datasets. The initial impacts will be obvious, but watch for network effects.
Two banks are experiencing the consequences of questionable financial decisions. Credit Suisse and Deutsche Bank are both issuing some negative guidance. The financial experts are worried about contagion, meaning these banks’ problems could spread. They are so large and touch so much of the financial world that their problems quickly become other financial institutions’ problems.
The network effects from pension funds and potential bailouts are always significant. In the 2008 financial crisis, impacts on lending were the most pronounced. I don’t believe this is a 2008-level event, but we can expect a tighter credit market which means lower levels of liquidity going to consumers and businesses.
Data science’s value centers on highly reliable models, but what makes models reliable is poorly understood. How do I know the most likely impacts of macroeconomic factors? I understand the assumptions baked into models. Whenever I see a change in data that breaks a common model assumption, I know it’s worth calling out.
Input cost stability is a common assumption. Customer preferences are another. Input costs and customer preferences are emergent behaviors of complex systems. The globalized production and supply chain is a complex system prone to network effects. The war in Ukraine can impact mining productivity in Mauritania.
Our models rarely extend to that depth. A supply chain model assumes stability in the systems it interacts with. Monitoring can gather data from those systems and detect impacts before they make their way into the supply chain. There’s often advance warning if we monitor our assumptions.
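Monitoring an assumption can be as simple as watching an upstream signal for a statistically large shift away from its historical baseline. A minimal sketch, assuming weekly input-cost observations (the function name, threshold, and numbers below are illustrative, not from any real pipeline):

```python
import statistics

def assumption_drift(baseline, current, z_threshold=3.0):
    """Flag drift when the current window's mean moves more than
    z_threshold baseline standard deviations from the baseline mean."""
    mu = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    z = abs(statistics.mean(current) - mu) / sigma
    return z > z_threshold, z

# Hypothetical weekly input-cost index values (illustrative numbers).
baseline_costs = [100, 102, 98, 101, 99, 103, 100, 97]
current_costs = [115, 118, 120, 117]

drifted, z = assumption_drift(baseline_costs, current_costs)
# A True result is the advance warning: the input-cost stability
# assumption no longer holds, so downstream models need review.
```

A mean-shift check like this is deliberately crude; in practice, teams often prefer distributional tests or a population stability index, but the principle is the same: alert on the assumption, not just on model error.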
I just got off a call with a C-level leader at a data science R&D services company. After 10 minutes, I saw a knowledge gap that’s all too common. They didn’t know the difference between analytics and data science. The line between descriptive models and predictive, prescriptive, and diagnostic models does not exist for them.
That knowledge gap is present across the field. Descriptive models reveal patterns in datasets that are not obvious to people. The larger the dataset, the more insights the model can surface that people have missed. Those insights are precious, as I explained in Saturday’s post. Basic data gathering improved my business in a matter of weeks.
The value locked in large datasets is massive, as is the value of gathering data about parts of the business that previously generated none. These projects can take a few weeks, cost less than $25K, and start returning value in under 3 months. These gateway projects generate most of the interest in data science.
The highest returns and competitive advantages are generated by predictive, prescriptive, and diagnostic models. When companies talk about scaling to handle more use cases, those are the models they need. To achieve the next maturity phase, businesses must understand the difference.
The challenge is perception. Descriptive models seem predictive. Insights and trends are assumed to be stable. Descriptive models have an excellent understanding of the data but a near-zero understanding of the systems that generate it.
The dataset is not complete enough to fully represent the systems involved in creating it. The assumptions baked into those models are unknown and cannot be monitored for stability. They fail in silence, but the belief in their accuracy persists.
Nike’s demand forecasting models are good examples. They have excess inventory because suppliers are back online after COVID shutdowns, and transit times have dropped. Their actual inventory is a sum of what is in their warehouses, in transit, and ordered but running late. Sellable inventory and total inventory are two different features that models must consider.
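The sellable-versus-total distinction can be made concrete as two separate features derived from one inventory snapshot. A minimal sketch, with hypothetical field names and quantities (nothing here reflects Nike's actual systems):

```python
from dataclasses import dataclass

@dataclass
class InventorySnapshot:
    in_warehouse: int   # units on hand, ready to sell
    in_transit: int     # units shipped but not yet received
    ordered_late: int   # units ordered and overdue

    @property
    def total_inventory(self) -> int:
        # Everything committed: on hand, moving, and running late.
        return self.in_warehouse + self.in_transit + self.ordered_late

    @property
    def sellable_inventory(self) -> int:
        # Only stock that can actually be sold today.
        return self.in_warehouse

# Hypothetical snapshot: late orders arriving all at once would move
# units from ordered_late into in_warehouse without changing the total.
snap = InventorySnapshot(in_warehouse=5000, in_transit=2000, ordered_late=1500)
```

A model trained only on total inventory would see no change when late deliveries all land at once, while sellable inventory would spike, which is exactly the failure mode described above.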
They had a large volume of shoes all come in simultaneously when they expected deliveries to be spread out. They ordered more to meet the seasonal holiday demand, compounding their problems. Consumer demand is in decline. When models fail to monitor their assumptions continuously, this is the result.
Nike used descriptive models to make forward-looking decisions. The result is a 10% drop in share price. They will be forced to put shoes on sale, which will damage their premium price point in the long run and drop margins next quarter.
I can’t fault Nike’s leadership or decision-makers. The assumptions baked into their demand forecasting models had been stable for so long that decision-makers trusted the models. No one advertised the risks and scenarios that would cause models to behave unpredictably. Basic experimental methods would have revealed the models’ structural weaknesses. Monitoring would have given some advance warning.
There is no such thing as perfect. Apple recently told their suppliers to reduce total iPhone 14 output. Apple had originally ordered an increase to offset the same issues Nike was experiencing. However, they were far more measured with their production increase requests. Apple cut back sooner, too, because they realized their current inventory levels and original production output were sufficient to meet demand.
I expect Apple will still report excess iPhone 14 inventory, but not to the same extent as Nike. Their models were not omniscient. Apple knew about their models’ assumptions and saw those breaking. It’s hard to predict the future when complex systems are involved. It’s easier to see the present playing out and evaluate the larger implications when decision-makers know what to look for.
Maturity happens in phases. Early experiments discover assumptions because they are focused on model explainability. Data and model literacy training helps users to articulate their reliability requirements and use data with full knowledge of its limitations. Working together, data teams and users mitigate risks.