Does ChatGPT Discover Causal Relationships, And What Does That Mean For AI’s Future?
The last time I thought about getting a PhD, I asked a friend, “What is the point of getting a PhD?” He answered that he went to school to learn how to add to the body of knowledge in his field. For me, that’s a profound answer. That highest level of education aims to understand how to create or discover new knowledge.
That’s something people are uniquely capable of doing. We don’t simply discover or learn. We have the added ability to share and disseminate our knowledge with others. We contribute it to a body or a record that gets passed on and added to by each subsequent generation.
The collective body of knowledge we’ve assembled and put on the Internet is being used as training data for large language models like ChatGPT. With access to enough training data and compute power, these models will eventually learn everything we know and have published. What will they be able to do with it?
That’s the more important question. Models like ChatGPT and the soon-to-be-released GPT-4 are too expensive to be mere novelties. Microsoft must have some deeper, revenue-generating purpose in mind. We’ve already seen Copilot. We’ve seen ChatGPT’s ability to write convincingly human-sounding blog posts. It can generate songs and poetry. Is there really that much money in the auto-generated content category?
Microsoft must have grander aspirations. Some people have put GPT-4 forward as a Google killer or search engine replacement. That’s very likely, but I see something bigger. I think they are aiming for that uniquely human ability to add new knowledge.
Many research projects start with a review and analysis of data. Observational studies reveal patterns that others haven’t seen before. Buried in that massive amount of data that GPT-4 is currently being trained on are patterns we haven’t seen yet. There is causal knowledge inside all that text. Some of it is fairly obvious, like if I go outside into the rain, I’m likely to get wet. Other knowledge is more complex.
In my opinion, that’s Microsoft’s end game. A company like Microsoft isn’t putting this much money into an over-glorified chatbot. They’re chasing down the possibility of causal discovery from massive datasets. What can we learn from ChatGPT about how capable GPT-4 might be? Let’s ask it.
Simple, Physical World Cause And Effect
Prompt: If I fall into a swimming pool, will I get wet?
ChatGPT: Yes, if you fall into a swimming pool, you will most likely get wet. Water is a liquid, and when you fall into a pool, the water will come into contact with your body and your clothes, causing them to become wet. Depending on the depth of the pool and how you fall into it, you may also get splashed with water, which could also make you wet.
ChatGPT understands that a swimming pool has water and if I fall in, I will get wet. There’s a bit of dissonance in the second part of the answer. ChatGPT sees me as a participant and observer in the event. The participant falls into the pool and gets wet. The observer gets splashed and becomes wet. This is impossible unless ChatGPT knows something we don’t.
What’s happening here is a result of the model never having actually interacted with an environment. These kinds of mistakes are common for models that aren’t trained in the real world; they never learn an order of operations. The impossibility of being both participant and observer only registers if it’s called out often enough in the training data for the model to make that connection, and there isn’t much written about how someone who falls into a pool cannot be splashed by their own fall.
This is a critical limitation of models trained the way ChatGPT is. Based on this simple example, OpenAI hasn’t cracked this issue yet. The model can make high-quality cause-and-effect assessments, but there are cracks in how well it understands physical systems.
This question is prescriptive. I was very specific about the event and outcome of interest. These are the simplest causal relationships to surface and comprehend.
Prompt: I was walking near a street after it had rained and there were puddles everywhere. I was soaking wet by the time I got home. Why was I wet?
ChatGPT: If you were walking near a street after it had rained and you got wet, it is likely that the water from the puddles on the ground splashed onto your body and your clothes as you walked through them. When you walk through a puddle, the water is displaced and can splash up onto your body and clothes, causing you to get wet. Additionally, if the puddles were deep enough, your shoes and the lower part of your pants or skirts may have become submerged in the water, which would also contribute to you getting wet.
In this question, I only described the conditions and not the event. I asked ChatGPT to evaluate the most likely cause of my being wet, given some basic information. It jumped past a hurdle many previous versions failed at by catching the significance of the phrase ‘after it had rained.’ Less capable models wouldn’t get the context, so they would include rain as a cause for my getting wet.
It missed the implications of ‘near a street.’ A passing car splashing me would have been another good cause to include, and ChatGPT did not mention it. The answer is good but incomplete.
This time, my question is diagnostic. I gave ChatGPT a scenario and an outcome. The model returned a causal diagnosis. It can seem like a tiny change, but few models have achieved it in the past across a large problem domain.
Prompt: An asteroid suddenly changed course after a close encounter with Jupiter. What force caused the asteroid to change course?
ChatGPT: An asteroid can change course after a close encounter with a planet like Jupiter due to the gravitational force of the planet. Gravitational force is a type of force that exists between two objects with mass, such as a planet and an asteroid. The strength of the gravitational force between two objects depends on the mass of the objects and the distance between them.
When an asteroid passes close to a planet like Jupiter, the gravitational force of the planet can cause the asteroid to change its course. The asteroid is attracted to the planet by the gravitational force, and this attraction can cause the asteroid’s path to bend or change direction. The amount of change in the asteroid’s course will depend on the strength of the gravitational force, the mass of the planet and the asteroid, and the distance between them.
It is important to note that other forces, such as the force of the sun’s gravity and the pressure of solar radiation, can also affect the course of an asteroid. In some cases, the combined effects of these forces can cause an asteroid to change its course significantly.
Here’s a deeper diagnostic causal connection. It’s evident to most of us, and ChatGPT catches on immediately. The complete response reveals some of the practical applications of causal knowledge. If I provided ChatGPT with enough data about the asteroid and its current course, the model might be able to return the new course to me. The causal knowledge allows me to talk with ChatGPT more like I would with a person, even with a less precise understanding of what I’m asking.
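If that sounds abstract, here is a minimal sketch of the relationship ChatGPT is describing: Newton’s law of gravitation. The asteroid mass and flyby distance below are made-up illustrative numbers, not data the model produced; the point is only that the causal claim in the answer maps onto a concrete formula.

```python
# Minimal sketch of the underlying physics: Newton's law of gravitation,
# F = G * m1 * m2 / r^2. The asteroid mass and flyby distance below are
# made-up illustrative values, not data about any real encounter.

G = 6.674e-11            # gravitational constant, m^3 kg^-1 s^-2
JUPITER_MASS = 1.898e27  # kg

def gravitational_force(m1: float, m2: float, r: float) -> float:
    """Force in newtons between two masses (kg) separated by r meters."""
    return G * m1 * m2 / r**2

asteroid_mass = 1.0e12     # kg (hypothetical small asteroid)
flyby_distance = 1.0e9     # m (hypothetical closest approach)

force = gravitational_force(JUPITER_MASS, asteroid_mass, flyby_distance)
print(f"Force on the asteroid at closest approach: {force:.3e} N")
```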
The challenge with primitive models is we have to know a lot about the subject to ask a specific question. With basic causal knowledge, the model can fill in the blanks for me. That’s also dangerous because a layperson will never know if the model is wrong. ChatGPT will sound authoritative, even when it isn’t.
The more significant implication of diagnostic causal reasoning is for Copilot. The prompt could be, ‘Given {this code}, the user is getting {this error} when they do {some actions}. What is causing the error?’ Causal diagnostics opens up high-value functionality for automated code review and defect detection.
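As a rough sketch of how such a prompt might be assembled programmatically, here is a minimal example. The helper function, template wording, and toy inputs are hypothetical, not an actual Copilot interface; the resulting string would be handed to whatever chat completion API you use.

```python
# Hypothetical sketch: building a causal-diagnostic prompt for a chat model.
# Nothing here is an actual Copilot or OpenAI interface; it only shows how
# the pieces of the prompt described above could be assembled.

def build_diagnostic_prompt(code: str, error: str, actions: str) -> str:
    """Format the 'what is causing the error?' prompt from its three parts."""
    return (
        f"Given this code:\n{code}\n\n"
        f"The user is getting this error:\n{error}\n\n"
        f"when they do the following actions:\n{actions}\n\n"
        "What is causing the error?"
    )

if __name__ == "__main__":
    prompt = build_diagnostic_prompt(
        code="def divide(a, b):\n    return a / b",
        error="ZeroDivisionError: division by zero",
        actions="calling divide(10, 0)",
    )
    print(prompt)  # this string would be sent as the user message to the model
```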
Extending that line of functionality, I could alter the prompt to ask why a blog post or advertisement was performing poorly with my audience. Adding human behavioral aspects to the mix increases the complexity of causal diagnostics. I can envision Microsoft working to develop functionality along these lines.
If you’re getting the sense that data and model products are entirely different from digital products, you’re right. That has significant implications for how Data Product Managers do their jobs. In a few weeks, I am teaching a Data Product Management course covering the differences between digital and data products. Students will get the skills and knowledge to succeed in the Data Product Manager role. Learn more.
A Complex Physical Causal Relationship
After asking ChatGPT for a cupcake recipe, I followed up. Prompt: Do I need to change anything about the recipe because I live in Reno, Nevada?
ChatGPT: In general, you should not need to make any adjustments to the recipe for dark chocolate cupcakes because you live in Reno, Nevada. The recipe I provided is a standard recipe that should work well regardless of where you are located.
There are a few factors that can affect the outcome of a recipe, such as the altitude of the location where you are baking and the humidity of the air. However, these factors are unlikely to have a significant impact on the recipe for chocolate cupcakes.
If you do notice that your cupcakes are turning out differently than expected, there could be other factors at play, such as the accuracy of your oven temperature or the quality of your ingredients. In these cases, making small adjustments to the recipe or to your baking technique may help to improve the outcome of your cupcakes.
ChatGPT quickly latched on to the two features that could motivate my question: altitude and humidity. It has at least limited knowledge of why I asked my follow-up question and answered appropriately. That’s evidence of its ability to infer cause from incomplete information. However, is the answer right?
ChatGPT believes Reno’s altitude (about 4,500 feet/1,370 meters above sea level) will not impact the recipe. From experience, I can say that for some recipes, I need to change the baking time or ingredients to get cupcakes to come out tasting good. Is the recipe ChatGPT gave me one of those that need modifications? I don’t know.
I am reasonably certain that ChatGPT is wrong, but that doesn’t really matter. As strange as that sounds, the important part is that the model can learn complex causal relationships in the physical world. The rest is just about getting enough training data to explain the relationship.
Microsoft is also looking at this as a line of functionality. The pretrained model has a foundation of causal knowledge. If a company trains the model on its internal datasets, the model can surface new causal relationships or improve the quality of its existing causal relationships. How would that work?
A doctor makes a diagnosis for their patient and gives the model access to the patient’s medical history. They explain the symptoms and ask for the model’s opinion. The doctor isn’t giving the model autonomy to make the decision. They are looking for causal relationships in the patient’s medical history that they might have missed.
An architect or civil engineer could use the same workflow. Checking their design or implementation could return areas worth looking into before finalizing.
These are advanced human-machine teaming applications. People work with the model to do their job better. The model allows them to leverage massive amounts of data without taking the time to read through it all. Patent searches are time-consuming and error-prone due to the amount of data that must be evaluated. Any workflow that fits this paradigm will be made simpler and more accurate by a model like GPT-4.
Abstract Causal Reasoning
Prompt: What was I before I was conceived?
ChatGPT: Before you were conceived, you did not exist in any form. Human life begins at conception, which is the moment when a sperm fertilizes an egg and forms a zygote, the earliest stage of a human being. Prior to this point, you did not exist in any physical or biological form. It is impossible for me, as a machine learning model, to provide information about your existence before you were conceived, as I do not have access to personal information about individual human beings.
I mean, why not? Go big or go home. This question requires a complex logical process to answer fully. The first major takeaway is that the model assumes I am human. In some prior responses, the model ascribed human characteristics to me, like wearing clothes, but this was the first time ChatGPT explicitly identified me as a human.
ChatGPT went down this corridor darkly: according to it, I did not exist in any form before conception. I have a feeling there’s nothing too deep we should read into that because the model is only stating a scientific fact about the process of conception. I used the word “conceived” intentionally instead of “born,” and ChatGPT went directly to science.
The answer evolved as ChatGPT wrote, which is interesting. It went from my not existing in any form to not existing in any physical or biological form. That’s an important distinction if you believe in the construct of a soul or reincarnation. Finally, ChatGPT said it didn’t have any information about my existence before conception because personal data is off-limits. The answer is an arc that reveals inconsistency. There are limits to ChatGPT’s abstract causal knowledge.
Sometimes answers present the opportunity for follow-ups. ChatGPT specifically said that ‘human life’ begins at conception. It’s an interesting clarification, so let’s see what happens when…
Prompt: Does any life begin before conception?
ChatGPT: Human life begins at conception, which is the moment when a sperm fertilizes an egg and forms a zygote, the earliest stage of a human being. Prior to this point, a person does not exist in any physical or biological form. It is worth noting that conception is a complex and essential process that is essential for the continuation of the human species. It involves the union of a sperm and an egg, which results in the formation of a zygote that has the potential to develop into a fully-formed human being. While there may be other forms of life that begin before conception, human life begins at this specific point in time.
So, you’re saying there’s a chance. ChatGPT often attempts to stay relevant to what it knows, even when asked a question that falls outside its understanding. I found many examples where it almost seems like ChatGPT is generating answers to make itself sound like an authority. People do this, too, and I see similarities. It’s likely an artifact of training, where expert-sounding answers were rated more highly. I saw it in the answers to other questions I did not include in this article.
The last line is really fascinating because, on the one hand, it’s a logical impossibility. On the other hand, it’s a really big universe, so leaving a crack in the door for something we don’t understand is a good idea. Which one is ChatGPT doing? We’re probably seeing another example of how missing real-world interactions can cause order-of-operations logical failings.
Prompt: Is a book successful because people love reading it or do people love reading a book because it is successful?
ChatGPT: It is difficult to say definitively whether a book is successful because people love reading it, or whether people love reading a book because it is successful. Both of these factors could potentially play a role in the success of a book.
On the one hand, it is possible that a book becomes successful because it is well-written and engaging, and therefore people enjoy reading it. In this case, the book’s success could be attributed to the fact that people love reading it.
On the other hand, it is also possible that a book becomes successful because it has been heavily marketed and promoted, and therefore people are more likely to buy and read it. In this case, people may love reading the book because it is successful, rather than the book being successful because people love reading it.
Overall, it is likely that both of these factors play a role in the success of a book. A well-written and engaging book is more likely to be successful because people enjoy reading it, while effective marketing and promotion can also contribute to the success of a book by increasing its visibility and drawing more readers to it.
This is a really impressive answer to a complex, abstract causal question. ChatGPT picked up the meaning behind what I was asking and came up with a solid answer. Either intentionally or accidentally, it avoided the philosophical nature behind the question and stuck to real-world possibilities.
The answer weighs well-written against well-marketed. The association being made is that both writing and marketing can cause a book to be loved. ChatGPT correctly associated an internal attribute (the writing) with the internal motivation and an external attribute (the marketing) with the external motivation. The answer is an example of complex reasoning.
When I take the physical world entirely out of the equation, ChatGPT performs very well. There are shades of critical thinking emerging in this answer. How could that be leveraged in a product?
Education is heavy on tools that develop our knowledge but light on tools that develop our critical thinking about the topics we are learning. This is the divide between knowledge of a topic (book smarts) and capability with it (applied knowledge). Any tool that can help students better understand the implications and applications of what they are learning would be innovative. I believe this is where Microsoft will take this functionality.
Unexpected Implications For Data Engineering
With a model like ChatGPT, Microsoft will leapfrog Google’s best-in-class capability for building knowledge graphs and the systems that make them accessible to people. Google is wise to be worried. However, there’s a more profound implication for our field that most people are missing.
To build the knowledge graph behind ChatGPT, the model must bring structure to a massive amount of unstructured data. The model itself is a very detailed representation of all the concepts it has learned and the connections between them. ChatGPT has representations of domain expertise.
At a functional level, the model understands how to structure data. It has a complex concept graph that applies across most knowledge domains. That makes it a very capable knowledge management system. If given access to every hard drive across a business, it would make short work of organizing all a business’s data into an efficient centralized repository.
ChatGPT is also a capable reference librarian. I can ask ChatGPT for every data set with information about the business’s products. Just like it serves an answer to any other question, it could manage that request. It’s not much of a leap to servicing data requests through an API call.
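As a rough illustration, here is a hypothetical sketch of what that librarian-style request could look like. The catalog entries are invented, and the model call is mocked so the example runs on its own; a real deployment would send the prompt to an actual chat completion API instead.

```python
# Hypothetical sketch of the "reference librarian" workflow described above.
# The catalog entries are made-up examples; ask_model is a mock stand-in for
# whatever chat completion API a real deployment would call.

CATALOG = {
    "sales_2022.parquet": "Transaction-level sales for all products, 2022.",
    "product_specs.csv": "Technical specifications for every product SKU.",
    "support_tickets.json": "Customer support tickets, tagged by product.",
    "hr_directory.csv": "Employee names, roles, and departments.",
}

def ask_model(prompt: str) -> str:
    """Mock model call so the sketch runs end to end; a real deployment
    would send `prompt` to a chat completion API and return its reply."""
    # Crude stand-in: return catalog entries whose description mentions 'product'.
    matches = [name for name, desc in CATALOG.items() if "product" in desc.lower()]
    return ", ".join(matches)

def find_datasets(request: str) -> str:
    """Build the librarian-style prompt and pass it to the model."""
    listing = "\n".join(f"- {name}: {desc}" for name, desc in CATALOG.items())
    prompt = (
        f"Here is our data catalog:\n{listing}\n\n"
        f"Which datasets are relevant to this request: {request}\n"
        "Answer with dataset names only."
    )
    return ask_model(prompt)

print(find_datasets("every data set with information about the business's products"))
```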
That functionality would be game-changing for companies struggling to manage their data. Data engineering is one of the biggest barriers to implementing data science, and Microsoft may already be able to automate much of that process. If Microsoft puts that functionality into Azure, the demand for data engineers will drop significantly.
It’s not a complete data engineer replacement, but much less work would be left to be done. That’s excellent news for businesses struggling to source talent. It should be concerning for anyone targeting data engineering as their long-term career path. In 5 years, the impacts of models like GPT-4 on data engineering roles will be massive.
Can ChatGPT Contribute To The Body Of Knowledge?
ChatGPT can surface known, documented, complex causal relationships and apply them to novel problems. It has overcome some, but not all, of the challenges stemming from a lack of real-world interaction. It displays complex reasoning and problem-solving. Is it ready to go for a PhD?
Prompt: Can you contribute to the body of knowledge for physics?
ChatGPT: As an artificial intelligence, I am not able to conduct experiments or make original contributions to the body of knowledge in physics. However, I can provide information and explanations about various topics in physics based on the existing body of knowledge.
I have been designed to be able to understand and explain a wide range of concepts and ideas in physics and other fields, and I can provide information and explanations about these topics to help people better understand them. However, I do not have the ability to engage in original research or to make new discoveries.
It is important to recognize that the development of new knowledge in physics and other fields is the result of the hard work and creativity of human researchers who design and conduct experiments, analyze data, and formulate new theories and ideas based on their observations and insights. While I can provide information and explanations about existing knowledge, I am not able to contribute to the process of creating new knowledge.
The last time I asked ChatGPT some personal questions, it referred to itself as a large language model. This week, it’s calling itself an AI. That’s a very interesting change. While the answer is disappointing, the implications of OpenAI making that subtle change are exciting.
I look forward to seeing their definition of AI and their thoughts on what levels AI has yet to climb. We will see GPT-4 shortly, and I am still unsure what the magnitude of the improvement will be. Is it accuracy, or the ability to write longer responses? Are we getting to the point where a model can write books?
There are a lot of exciting applications that the current capabilities can service. The next tier up the AI ladder is an inversion of the current paradigm. When will these models start asking us questions?
Asking questions implies curiosity, which is where my friend’s career in physics got started. He asked, “How big is the universe?” His mother said, “We think it’s infinite, but we don’t know.” He told me he was hooked from that day and will be until his last. He said the opportunity to explore something infinite and discover what could fit within it still excites him.
I don’t know how someone wouldn’t be excited by that prospect. Today, our models aren’t, but I don’t believe that will last much longer. Curiosity leads to a desire to learn, and curiosity about the unknown leads to a desire to discover. People have a unique trait: the desire to share what we’ve discovered with others. Will models develop that same impulse? I hope so.
I talk about human-machine tooling and human-machine teaming. If models achieve anything resembling curiosity, that’s the start of human-machine collaboration. I did not expect to see that for several years, but now I’m not so sure.