It's a fair question, isn't it? Out in Hypeland, we're showcasing GPT-4 passing the bar exam and getting admitted to med school. Meanwhile, in the real world, we're stuck with Siri and Alexa. We're supposed to be on the eve of AI replacing us all, yet I can't even make a dinner reservation on my phone without repeating myself half a dozen times.
If I try to get Alexa to turn the lights on, it fails to understand me roughly 20% of the time. Forget about asking Alexa to do anything with the TV on in the background. I don't think it's unreasonable to expect Alexa and Siri to work better than they do today. Siri is over 11 years old, and Alexa turns 9 later this year. There's no way we should still struggle with basic functionality at this stage in those products' evolution.
Siri is so bad that Apple engineers want to use something else in their AR headsets. That's not a vote of no confidence from someone outside the company. That's Apple's own people. When we data scientists are forced to eat our own dog food, we find it lacking. Yet we continue to ship substandard products to customers.
The United States Immigration Service delivered a mobile app to help asylum seekers with the application process. It uses facial recognition to confirm that the person using the app is who they say they are. Unfortunately, the facial recognition model doesn't work very well on darker-skinned individuals, who are the majority of asylum seekers. According to John Oliver's segment on Last Week Tonight, some people have resorted to using high-powered lights to get the app to recognize their faces.
It's 2023. We fixed this problem back in 2016. Asylum seekers are using facial recognition to unlock the iPhones that the US Immigration app runs on. How is anyone still deploying facial recognition systems that haven't been adequately trained on a representative data set? As data science becomes a mature field, we can't keep embarrassing ourselves this way.
Do you want to build better data and AI products? My instructor-led courses will help.
1) Data and AI Strategist Certification
2) Data and AI Product Management
It’s your last chance to reserve your seat and lock in early pricing. Subscribers get 15% off the early pricing, but only for a limited time.
The More You Look, The Worse It Gets
Canva has a text-to-image app. I typed in "Intelligent assistant sitting behind a desk in an office building," and this is the best of 4 terrible options. The AI product obviously doesn't work well enough to ship. It's cool for a controlled demo proving the technology is progressing, but it is not ready for customers. However, listening to people talk about it on social media, you'd think it was nothing short of amazing.
It should be professionally embarrassing, but data and AI product failure has been normalized. Tesla advertised autonomous cars. There were caveats, but these vehicles were supposed to be reliable on the road. They absolutely aren't, and the autopilot should never have been so hyped. It was oversold, and people got into serious accidents as a result.
With any other type of product, that lack of reliability would have led to an uproar from customers. The fact that we haven't seen one yet shows how low the bar is set for AI products. The response so far has been to blame the people who trusted autopilot instead of holding Tesla accountable and demanding more transparency. Let that sink in. We blame people for being silly enough to trust a machine learning-based product.
AI products' dubious track record stretches all the way back to IBM's Watson. It may have won Jeopardy, but it was kicked out of the medical community. Doctors used it for a short time, and even though it was free, they still gave it back. That should have been the end of the road for Watson, but it's making another appearance. SAP recently signed a partnership agreement with IBM to integrate Watson as SAP's AI engine.
I'm headed to IBM Think next week to learn about the latest rollout of Watson's AI capabilities. I expect this time, it will be far more capable and successful. IBM's come a long way, and SAP wouldn't partner with them if Watson hadn't made significant improvements. Still, in any other field, they would have changed the name because the stigma of earlier failures would have been too much.
The bar isn't just low, it's nonexistent. Data and AI products have gotten a pass on quality for quite some time, and I have a feeling that's ending. As a field, we must tackle the root causes of these failures and build solutions that meet the marketplace's expectations. We must figure out how to create finished products, not flimsy prototypes.
We may be amazing at math and manipulating data, but we are absolute trash at delivering products that work. Our track record must change if we are going to achieve any level of traction in the business world or customer ecosystems. The stakes were pretty low when no one knew they were using AI. ChatGPT changed all of that. The eyes of CEOs, boards of directors, and investors are on AI products. Customers are actually using AI.
With expectations comes increased scrutiny. When people realize how much AI is really out there and how badly it works, there will be a backlash. We have a short window to fix this, and I have some…recommendations.
Serving Recommendations
This image is the latest in a long, sad line of recommendations that I've been served by one of the top AI companies on Earth. Infinite compute power and talent are at Amazon's fingertips. Is this really the best they can do? They recommended my own book to me.
Most recommendation e-mails I get from Amazon, and every other retailer, contain products I've frequently bought or looked at in the past. Recommendation systems are ubiquitous in the data science world. It's one of the most common products that we build. As you can see from the Amazon example, they perform terribly in the real world.
Take a step back and ask yourself, what's the point or business objective of this e-mail? They're trying to encourage me to buy something. Beyond that, the goal is to get me to buy something I wouldn't have without the e-mail. To do this well, we must have some grasp on the causal relationships between interventions and customer actions.
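To make that concrete, here is a minimal sketch of what getting a grip on the causal relationship could look like: hold out a random group of customers from the e-mail and compare purchase rates. The column names and numbers below are entirely hypothetical; the point is the incremental lift calculation, not the data.

```python
import pandas as pd

# Hypothetical campaign log: customers randomly assigned to receive
# the recommendation e-mail (treatment) or not (control).
campaign = pd.DataFrame({
    "customer_id": range(8),
    "got_email":   [1, 1, 1, 1, 0, 0, 0, 0],
    "purchased":   [1, 0, 1, 0, 1, 0, 0, 0],
})

# Incremental lift: purchase rate with the e-mail minus purchase rate without it.
# This answers "did the e-mail cause sales that wouldn't have happened anyway?"
# Raw clicks and conversions can't answer that question.
treated_rate = campaign.loc[campaign["got_email"] == 1, "purchased"].mean()
control_rate = campaign.loc[campaign["got_email"] == 0, "purchased"].mean()

print(f"Purchase rate with e-mail:    {treated_rate:.0%}")
print(f"Purchase rate without e-mail: {control_rate:.0%}")
print(f"Estimated incremental lift:   {treated_rate - control_rate:+.0%}")
```

With a real sample size, a lift near zero tells you the e-mail is only reminding people of purchases they would have made anyway.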
Cheap recommender systems are a cop-out and don't address the business need. All the system is doing is parroting anything I've looked at more than once and not bought. Poorly performing data and AI products are so common because we don't dive into the actual problem we're trying to solve.
Reminding me of a product I have looked at isn't going to accomplish the business goal, and I'm sure Amazon knows that. If the sale is going to happen, reminding me of the item will only succeed in pulling the sale forward in time. Will the recommendation keep me from buying this book from another seller? Probably not, since the price across sites is the same, and Amazon must have data on that too.
I've bought dozens of books from Amazon over the last 8 years. The company should have my buying patterns nailed down by now. I typically look at a book once and buy it. I come to Amazon already knowing which book I want to purchase. Examining my workflow with respect to buying books should reveal that this particular e-mail would have no effect. If I were going to buy the book, I would have.
I'm covering this case in depth to make it clear that this is not a complex problem to solve, nor does Amazon lack the data to tackle it. This isn't a technology problem or a lack of talent. This e-mail and all the other examples indicate a disconnect between the need and the product that gets delivered. The deeper problem is most data products check a box instead of solving the problem, and no one is looking at the business impacts.
As I said earlier, that's changing. C-level leaders track initiatives when their expectations are set. Believe me, their expectations are set, and attention is on us. AI has been a side project until very recently. Now, data and AI products are in the spotlight. Failures will be more publicized than data scientists are used to, which will bring consequences.
I Recommend A Better Approach
Several retailers have brought me in to build recommendation systems like Amazon's. In most cases, the best solution is not to build the recommendation system in the first place. It's the wrong tool for this business problem, but it sounds like a great idea when you don't fully understand the business objective.
I start initiatives by clarifying the problem space. For this one, the opening question is simple: what's the best way to get customers to buy more? Working through it, I help clients realize they have a visibility problem. The business doesn't understand its customers well enough to serve high-quality recommendations.
Instead of a recommendation system, the business really needs a picture of what customers are buying from competitors. There is a segment of customers buying items from competitors that they could buy from the business. When you evaluate the problem this way, it changes how we think about a solution. The first data product the company needs is something to identify customer segments with the greatest opportunity for new business.
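Here is a rough sketch of what that first data product could look like: score each segment by the gap between what it spends in the category overall and what it spends with the business. Every segment name and figure below is invented; a real version would pull from market panels, surveys, or whatever share-of-wallet data is available.

```python
import pandas as pd

# Hypothetical segment-level view: estimated total category spend per segment
# versus what that segment currently spends with us.
segments = pd.DataFrame({
    "segment":            ["urban_parents", "college_students", "retirees"],
    "est_category_spend": [1_200_000, 800_000, 500_000],  # total spend in the category
    "spend_with_us":      [300_000, 650_000, 100_000],    # our share of that spend
})

# Opportunity = spend currently going to competitors.
segments["competitor_spend"] = segments["est_category_spend"] - segments["spend_with_us"]
segments["wallet_share"] = segments["spend_with_us"] / segments["est_category_spend"]

# Rank segments by the size of the prize, not by how often they browse our site.
print(segments.sort_values("competitor_spend", ascending=False))
```

Ranking by competitor spend instead of browsing history points offers, pricing, and targeting at the customers who can actually generate new business.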
It no longer makes sense to show customers a recommendation from their browsing history. The business needs to find customers that shop with competitors and give them a reason to switch. In Amazon's case, I don't buy books from its competitors. This e-mail would have been more valuable if it targeted something I buy from their competitors.
Electronics are one example of a category I don't buy from Amazon. I typically get those from Best Buy. Does Amazon want that business? Probably not. With shipping costs and already low margins, that's not the right product category to target. Again, a data product can help clarify the business need and define the product categories Amazon wants to win away from competitors.
Reliability Requirements Are Another Missing Piece
Once the first series of data products are delivered to clarify the problem space, I work with clients to define how well a solution must perform to deliver value. Models work with varying degrees of reliability. If the solution works 3% of the time, will that justify the costs? Is there a higher-value initiative the business should consider instead?
I connect business metrics to model metrics at this phase. Reliability requirements should be defined to show how model performance impacts business outcomes. With Canva's text-to-image app, "Blue Cat" returns a high-quality result, and "Blue Person" looks like a terrible idea for several reasons.
The business must decide to release or hold off on this AI product. The first question I ask touches on revenue. How much more are people paying with this app than without it? Right now, the answer is 0, which makes the decision a whole lot simpler. The feature isn't polished enough and could do more harm than good. Let's wait until it's ready and there is a path to monetization.
"But Adobe has a generative AI image creator, so we must too!" A statement like this one is driving a lot of AI product development. I advise clients to build a data product that validates or refutes that statement. C-level leaders should need more than FOMO to greenlight a data or AI product.
The number of customers that will leave Canva for Adobe if there isn't a generative AI image app should be quantified. Then reliability requirements must be defined. How well does the text-to-image need to work so those customers stay with Canva? I'm pretty sure that the current reliability levels are not cutting it.
The higher the reliability requirements, the more expensive the initiative becomes. If Canva isn't generating new revenue from text-to-image functionality, this initiative will likely cost more than the revenue it preserves. Just because it could be built doesn't mean it should be built.
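Here is a back-of-envelope version of that assessment. Every figure and the shape of the cost curve are invented for illustration; the point is that the decision to build, and the reliability target, should come out of an explicit calculation like this rather than FOMO.

```python
# Hypothetical break-even check for a defensive AI feature (all numbers invented).
at_risk_customers = 50_000          # customers who might leave for a competitor's feature
annual_revenue_per_customer = 120   # average subscription revenue per customer

def preserved_revenue(reliability: float) -> float:
    """Revenue retained at a given reliability level (0 to 1).

    Assumes, purely for illustration, that the share of at-risk customers who stay
    scales linearly with how often the feature produces a usable result.
    """
    return at_risk_customers * reliability * annual_revenue_per_customer

def build_and_run_cost(reliability: float) -> float:
    """Cost climbs steeply as the reliability requirement goes up (again, an invented curve)."""
    return 1_000_000 + 8_000_000 * reliability ** 3

for r in (0.5, 0.7, 0.9):
    net = preserved_revenue(r) - build_and_run_cost(r)
    print(f"reliability {r:.0%}: preserved ${preserved_revenue(r):,.0f}, "
          f"cost ${build_and_run_cost(r):,.0f}, net ${net:,.0f}")
```

Under these made-up numbers, pushing reliability from adequate to flawless flips the initiative from positive to negative. That's the trade-off that rarely gets examined when a feature is greenlit defensively.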
In the next year, we will see a lot of data and AI products rushed to market without this assessment done. They won't meet customer reliability requirements and will tank. Businesses will lose trust if our field racks up too many public failures. The future of data and AI will be decided by products. We need frameworks to support delivering quality and evaluating what we should work on. Without them, this moment in the spotlight will be brief and end badly.