Episode 3

Graph Neural Networks and Generative AI with Jure Leskovec

On this episode, we’re joined by Jure Leskovec, Stanford professor and co-founder at Kumo.ai.

Graph neural networks (GNNs) are gaining popularity in the AI community, helping ML teams build advanced AI applications that provide deep insights to tackle real-world problems. Stanford professor and co-founder at Kumo.AI, Jure Leskovec, whose work is at the intersection of graph neural networks, knowledge graphs, and generative AI, will explore how organizations can incorporate GNNs in their generative AI initiatives.

About the Guest

Parul Pandey has a background in Electrical Engineering and currently works as a Principal Data Scientist at H2O.ai. Prior to this, she was working as a Machine Learning Engineer at Weights & Biases. She is also a Kaggle Grandmaster in the notebooks category and was one of Linkedin’s Top Voices in the Software Development category in 2019. Parul has written multiple articles focused on Data Science and Software development for various publications and mentors, speaks, and delivers workshops on topics related to Responsible AI. She is currently part of the “The 2023 Kaggle AI Report” as an area chair and section editor, focusing on the section dedicated to the theme of continued studies of AI ethics.

Transcript

Krishna Gade: Thank you for joining us today at AI Explained. The topic today is Graph Neural Networks and Generative AI. I'm Krishna Gade and I'm the CEO of Fiddler AI. I'm very excited to welcome our special guest, Jure Leskovec, who is a Professor at Stanford School of Computer Science and Co-founder at Kumo.

Krishna Gade: I had the good fortune to work with Jure briefly at Pinterest when he was Chief Scientist. Also, congratulations, Jure, on the significant accomplishment of the SIGKDD 2023 Innovation Award.

Jure Leskovec: Yeah, thank you.

Krishna Gade: For your contributions to graph mining in network science and applied ML. Great. So welcome, Jure. You've been working on graph neural networks, which is the topic of the show. Maybe you could give us an overview of graph neural networks and how they differ from other neural networks?

Jure Leskovec: Sure. Hi everyone, excited to be here. What to say? I've been working on graphs for a very long time, and I actually started working on this when social networks emerged. At that point in time, it was fascinating to say, "Hey, these natural graphs are emerging. Can we quantify them? Can we understand them? Can we measure them?"

Jure Leskovec: Actually, just yesterday I realized that it was 15 years ago when this was happening, and I was an intern as a PhD student at Microsoft at that time. We took the Microsoft Instant Messenger network. It was the first kind of planetary-scale view of how people talk to each other and of what the social network of the entire world looks like.

Jure Leskovec: And we kind of verified the six degrees of separation hypothesis that, you know, goes back in the social sciences to the 1960s. So that's kind of where we started, but really, in networks, it was always about how do we model, how do we predict? And it was extremely, extremely hard, right? If you think about it, if you have a graph and you wanna build a machine learning model on top of it, you know, nodes have attributes, features. Edges have properties. Usually everything is timestamped. You have different types of entities, and the question is, how would you take that data and make it ready for a machine learning model to consume?

Jure Leskovec: And in the old days, it meant you had to do feature engineering. And then the problem with feature engineering is, how do you capture the network structure around you? So if you just think, let's say, from the social network point of view, you start with yourself. So maybe, what's your degree? How many connections do you have? Then how many connections do your neighbors have?

Jure Leskovec: Are your neighbors connected with each other? What types of neighbors do you have? What are their features, properties, genders, ages, locations, and things like that? And you can see how this is exponentially harder to do manually than the traditional feature engineering that people have been doing, let's say, in traditional ML.
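As a rough illustration of the manual graph features listed above (degree, neighbors' degrees, whether neighbors are connected to each other), here is a minimal Python sketch. The toy graph, the names, and the `handcrafted_features` helper are assumptions made for illustration; none of this is code discussed in the episode.

```python
from itertools import combinations

# Toy undirected friendship graph as an adjacency dict (made-up data).
graph = {
    "ana": {"bo", "cy", "di"},
    "bo": {"ana", "cy"},
    "cy": {"ana", "bo"},
    "di": {"ana"},
}

def handcrafted_features(graph, node):
    """Compute a few manual structural features for one node."""
    neighbors = graph[node]
    degree = len(neighbors)
    # How many connections do my neighbors have, on average?
    avg_neighbor_degree = (
        sum(len(graph[n]) for n in neighbors) / degree if degree else 0.0
    )
    # Are my neighbors connected with each other? (local clustering coefficient)
    possible_pairs = degree * (degree - 1) / 2
    linked_pairs = sum(
        1 for a, b in combinations(sorted(neighbors), 2) if b in graph[a]
    )
    clustering = linked_pairs / possible_pairs if possible_pairs else 0.0
    return {
        "degree": degree,
        "avg_neighbor_degree": avg_neighbor_degree,
        "clustering": clustering,
    }

features = handcrafted_features(graph, "ana")
print(features)
```

Every extra question (neighbor types, ages, locations, two hops out) would need another hand-written function like this one, which is the scaling problem being described.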

Jure Leskovec: So it was very hard. Then the first set of methods we invented was around embeddings. So basically saying, can we build task-agnostic node embeddings? And this was back in 2015 or so, when methods like Word2vec were invented and we generalized those to graphs.

Jure Leskovec: And the methods on the graph side are called DeepWalk or Node2vec. And those were basically task-agnostic, shallow embeddings. So the assumption was you are given a graph, no node types, no edge types, just the wiring structure, and you wanna learn the embedding of every node. So we wanna estimate coordinates for every node.

Krishna Gade: The Node2vec work is basically built out of the adjacency matrix of the graph.

Jure Leskovec: Essentially the input is the adjacency matrix, and then what you really do is you do random walks on that adjacency matrix, and you are basically trying to predict, if you start from a node, what other nodes get visited. And that prediction is based on a kind of softmax logistic model of the embeddings, and you are estimating the coordinates of every node. So it means the number of parameters in this model is linear in the size of the graph, because for every node you have to estimate its coordinates.

Krishna Gade: Basically looking at an N-dimensional vector, where N is the number of vertices, and you are basically trying to create an...

Jure Leskovec: N x D, N x D

Krishna Gade: Exactly

Jure Leskovec: Number of nodes x the dimension. And that has been a huge evolution, and it really improved performance on link prediction and node classification and things like that.

Jure Leskovec: And, for example, this was heavily used by Facebook, and it's implemented in PyTorch BigGraph.

Jure Leskovec: They built an industrial-scale distributed system to learn the embeddings of all these things, but it's very finicky, because first, you are learning this in a task-agnostic way. So it's not specific to the task you wanna predict. And whenever the graph changes, whenever new nodes get added, you have to recompute all the embeddings; you have to re-estimate the positions of all the nodes.

Jure Leskovec: So it's kind of not...
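To make the random-walk idea and the N x D parameter count concrete, here is a minimal Python sketch. It uses plain uniform random walks (closer to DeepWalk than to Node2vec's biased walks) and stops before the softmax training step; the toy adjacency matrix, the window size, and D = 8 are arbitrary assumptions, not values from the episode.

```python
import random

# Toy graph as an adjacency matrix (N = 4 nodes); a made-up example.
adjacency = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
N, D = len(adjacency), 8  # D is an arbitrary embedding dimension

def random_walk(adjacency, start, length, rng):
    """Uniform random walk: repeatedly hop to a random neighbor."""
    walk = [start]
    for _ in range(length - 1):
        neighbors = [j for j, w in enumerate(adjacency[walk[-1]]) if w]
        if not neighbors:
            break
        walk.append(rng.choice(neighbors))
    return walk

rng = random.Random(0)
walks = [random_walk(adjacency, start, 5, rng) for start in range(N) for _ in range(3)]

# (center, context) pairs within a window: training would adjust the embedding
# of `center` so that it predicts which `context` nodes the walk visits.
window = 2
pairs = [
    (walk[i], walk[j])
    for walk in walks
    for i in range(len(walk))
    for j in range(max(0, i - window), min(len(walk), i + window + 1))
    if i != j
]

# The only parameters are the coordinates themselves: an N x D table.
embeddings = [[rng.uniform(-0.5, 0.5) for _ in range(D)] for _ in range(N)]
num_parameters = N * D
print(len(walks), len(pairs), num_parameters)
```

Because the table has one row per node, adding a node means adding a row and re-estimating coordinates, which is why the parameter count grows linearly with the graph.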

Krishna Gade: Pretty expensive.

Jure Leskovec: Yeah, very expensive. Right. So then this brings us to the notion of graph neural networks, right? These embeddings, we call them shallow embeddings, task-agnostic embeddings. Shallow in the sense that, as a part of your learning process, you estimate these coordinates, and that's it.

Jure Leskovec: There's no neural network. So then this brings us to graph neural networks. And graph neural networks are an extremely general way to look at neural networks. If you look at it mathematically, a convolutional neural network, for example, or even a transformer, is a subclass of a graph neural network. So I can write a CNN in the graph neural network formalism. So it's extremely general. I can say more, but it's general in the sense that it allows us now to apply deep learning, to apply representation learning, to complex data, right? And I say complex data because, you know, an image is a fixed-size matrix, and I have my convolutional operator that I'm sliding over it. Text is a linear sequence, and I'm learning these self-attentions.

Jure Leskovec: But a graph is neither. A graph is much more complex, in the sense that I can take text and represent it as a line graph. I can take a matrix and I can represent it as a grid graph. But if I have a general graph, then it's unclear how those models would generalize.
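The point that text and images are special cases of graphs can be sketched in a few lines of Python. Here "line graph" is read as a simple path (each token linked to the next), and the grid uses right and lower neighbors only; both helper functions are hypothetical illustrations rather than anything named in the episode.

```python
def line_graph(tokens):
    """Text as a path: each token is linked to the next one."""
    return [(i, i + 1) for i in range(len(tokens) - 1)]

def grid_graph(rows, cols):
    """Image as a grid: each pixel is linked to its right and lower neighbor."""
    edges = []
    for r in range(rows):
        for c in range(cols):
            node = r * cols + c
            if c + 1 < cols:
                edges.append((node, node + 1))
            if r + 1 < rows:
                edges.append((node, node + cols))
    return edges

sentence_edges = line_graph(["graphs", "are", "everywhere"])
image_edges = grid_graph(2, 3)
print(sentence_edges)
print(image_edges)
```

A general graph has no such regular pattern, so the edge list itself has to be given as data.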

Krishna Gade: It captures the relationships.

Krishna Gade: If you can model a sentence as a graph, it's not only looking at adjacent words, but it's also able to look at relationships between words wherever they are in that sentence.

Jure Leskovec: Exactly. Exactly. And here I think the way to maybe think of these types of models, these types of graph neural networks, is that this is a very flexible modeling architecture. And it's a modeling architecture that kind of adapts to the shape of the data, it adapts to the local shape of the data. So that's the key here. And another way to think of graph neural networks is that they are a message passing architecture on top of the graph. And another way to say it, and I'm using a lot of my hands.

Jure Leskovec: It's the following: think that you have a node and you'd like to predict something about that node.

Krishna Gade: Yeah.

Jure Leskovec: Of course, that node has a set of features, so the easiest way to predict about the node is to use the node features to predict something, maybe whether the customer is going to churn or something.

Krishna Gade: Let's maybe concretize it with a use case, Jure. What have you found to be a very compelling use case for GNNs in practice?

Jure Leskovec: Oh, yeah.

Jure Leskovec: I can say a few things. So we pioneered this technology and really scaled it up, like a thousand times bigger than what was done before, at Pinterest.

Jure Leskovec: We basically built a GNN learning platform for recommendations. And it had a huge impact on the bottom line of Pinterest across a number of different use cases, from recommendations, shopping, and advertising to fraud, this kind of trust and safety, or integrity, however you want to call it.

Jure Leskovec: Things like that. And let me explain now why GNNs work so well. Let's take social media as an example, and then we can talk about other domains where this can be applied, right? But let's say you are a social media company, right?

Jure Leskovec: You have users, you have, I don't know, posts, you have interactions, you have followers and so on, right? In the traditional way, you can take the user and predict something about the user. Let's say we are predicting churn, in the sense of, will the user log in during the next two weeks or not? So another way to think of this is that when I wanna make a prediction about you, I can take information from your neighbors, whatever posts you have written, whatever other posts you have liked, whatever other users you are connected to. So this means that for you, I take these neighbors of yours, and then these neighbors can also take the neighbors of neighbors. So I can do almost like a BFS tree around you.

Jure Leskovec: And that BFS tree I can now arrange as a tree, and that's a neural network architecture now, right? Where you basically take information from, let's say, 2, 3, 4 hops away from you. Take the features of those nodes and learn operators that take those features and transform them, and then every node on the top aggregates from the children and sends it to the parent.

Jure Leskovec: So it means, now if I'm predicting about you, I'm using information about all your interactions and all the interactions of those interactions in an optimal end-to-end way for your predictive task. And what this means when I say the model adapts is that if you have a lot of friendships and a few posts, the BFS tree around you will look very different than for, I don't know, some different user who has different posts and things like that. So basically every node in a graph, in some sense, defines its own neural network structure.

Jure Leskovec: It defines its own forward-pass computation graph. And usually we do this a number of steps deep, as many as you like. In social networks it is fewer. If you have molecules or proteins or these spatial graphs that are very big, we can go hundreds of levels deep, to basically learn how to optimally propagate information from faraway points of the network to the center, so that you make an accurate prediction.
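The aggregate-from-children, send-to-parent idea described above can be sketched as a tiny message-passing loop in plain Python. This is a simplified illustration under stated assumptions: mean aggregation, scalar weights `w_self` and `w_neigh` standing in for learned matrices, and a made-up four-node graph with made-up features. It is not the architecture used at Pinterest or Kumo.

```python
# Toy social graph and node features (made-up numbers): [num_posts, num_logins].
graph = {"you": ["ann", "ben"], "ann": ["you", "cat"], "ben": ["you"], "cat": ["ann"]}
features = {"you": [1.0, 0.0], "ann": [0.0, 2.0], "ben": [4.0, 0.0], "cat": [0.0, 6.0]}

def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def message_passing_layer(graph, hidden, w_self, w_neigh):
    """One round: every node averages its neighbors' vectors and mixes in its own."""
    updated = {}
    for node, neighbors in graph.items():
        aggregated = mean([hidden[n] for n in neighbors])
        # Scalar weights stand in for the learned matrices of a real GNN; ReLU on top.
        updated[node] = [
            max(0.0, w_self * h + w_neigh * a)
            for h, a in zip(hidden[node], aggregated)
        ]
    return updated

hidden = features
for _ in range(2):  # two layers = information from up to two hops away
    hidden = message_passing_layer(graph, hidden, w_self=0.5, w_neigh=0.5)

print(hidden["you"])
```

After two rounds, the vector for "you" depends on "cat", who is two hops away, even though the two are not directly connected. The shape of that dependency follows the neighborhood of each node, which is the sense in which every node defines its own computation graph.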

Jure Leskovec: And what is good about this is what it is able to learn. We basically have math and theorems to prove that it's able to learn both the structure of the network around you as well as the properties of the nodes that are around you.

Jure Leskovec: So it's naturally able, for example, to learn collaborative filtering. To say, oh, you are connected to these products that other users are connected to. What are those users connected to on the other end? So let's recommend that. That's one way.

Krishna Gade: So, in recommender systems, right, you mentioned collaborative filtering. At one point in the last few years, two-tower networks became very popular, and lots of companies implemented them for recommendation models. Is a GNN a generalization of a two-tower model, or how does it differ from that?

Jure Leskovec: I think with a two-tower model, at the end, you are basically scoring something against something. Yes, you are on top of the tower, but what's below the tower? Usually you would have a GNN below the tower. The reason for that is that in these use cases, really, what defines an entity is its interactions. Right. You know, I'm user 1, 2, 3, 4. This is my age. You know, I live in Silicon Valley.

Jure Leskovec: I mean, that's all you know about me. It's like you're very data poor about me. But what you really know about me is, what do I click? What do I buy? What do I interact with, right? So your knowledge about me comes from these interactions. So you have to account for these interactions in the model naturally, to enrich it, because otherwise you just have my static profile or whatever.

Jure Leskovec: Even, for example, at Pinterest, right? At Pinterest, yes, we have images and text. Every pin, right, it's an image and text and so on. You'd say, wow, so much information, right? But then when you build computer vision models on top of these images, the vision model gets confused. Is this soil or is it ground meat?

Jure Leskovec: Is this a rug to put on the floor or is it a tapestry to hang on the wall? It looks about the same. Is this a garden fence or is it a bed railing? They both look the same, right? But if you look at user interactions, then these get totally separated out in the graph.

Krishna Gade: So what people tagged each one will give you a lot more information.

Jure Leskovec: For example, when I'm creating an embedding of the image, I build a neural network that takes other images that these users also interact with. So you can be like, okay, I don't know whether I'm a rug or a tapestry, but people who clicked on me also clicked on other images, on rugs, on something that's obviously a rug. So maybe, you know, I am a rug as well.

Krishna Gade: Yep. Yep. Makes sense.

Jure Leskovec: So that's kind of where this becomes very, very powerful.

Krishna Gade: Social networks are obviously amenable to being modeled as graphs, and it makes a ton of sense to use this type of technology for recommendations. In one of your talks, you mentioned tabular data, you know, the kind of bread-and-butter models most people build: your customer acquisition models, churn prediction models, fraud detection models.

Krishna Gade: You mentioned that machine learning on that tabular data is brittle. Could you elaborate more there?

Jure Leskovec: Yeah, exactly. So this is something I'm really excited about because it always has bothered me that you needed to have a graph and then if you were a social network company, you said, "okay, I admit I have a graph" but everyone else says, "no, I don't have a graph. Go away." Right?

Jure Leskovec: And then really creating a graph was left to human imagination. So what we are pioneering at my current startup, Kumo.AI, is basically a way to automatically generate graphs and apply deep learning to your data. So now, basically, the claim I'd like to make is that everyone has a graph. And why does everyone have a graph? Because your most valuable data is stored in a relational database. You have multiple tables that are connected with primary-foreign key relations. And the way to do machine learning over a database today, over your relational tables, is to join these tables and aggregate them. And that's a terrible way to do it, because you have this rich data, richly connected, and now you are manually joining these tables to say, I will join the user table with the purchases table, and now I'll compute the average purchase price. And then two weeks later, another data scientist will come and say, "Oh, I won't compute the average. I'll compute the average on a Sunday. Look, my dear manager. I provided lift to our model. Where is my promotion?" You know what I mean?

Krishna Gade: So you build lots of feature engineering and try to come up with clever features that will improve the model.

Jure Leskovec: Exactly. And I'll tell you, at one of the leading companies that does this kind of rental of rooms, the price of a room is encoded as 120 different features.

Jure Leskovec: Okay, so it means that there are 120 Spark jobs, or Databricks jobs, whatever you wanna call them, that take that single price input and compute 120 different values. So it's 120 different jobs that have to run every day, every night, every hour, so that some data scientist can then take these 120 features and build a model, right? And this means that years of time went into thinking about how to encode the price of a room.

Krishna Gade: So you're explicitly modeling interactions in these features basically.

Jure Leskovec: Exactly. So what we can do at Kumo is that you basically come in and say, here's my set of tables, here are the primary-foreign key relations between them. You register your schema, and now you can start building machine learning models on top of this without any feature engineering. So no feature engineering, no feature experimentation, because the insight here is that you can take your database, your tables, represent them as a heterogeneous hypergraph, and now you can apply graph neural networks to learn how to propagate information across the graph, which is how to propagate information across the tables, to give you an accurate prediction. And the cool thing here is that you never do a join. It's the neurons that are joining and aggregating data on the fly, right? So the neuron learns how to take your age and your location, and how to combine this with, cross that with, some other locations to give you the optimal feature for whatever you are predicting.
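One way to picture the tables-to-graph step is the following Python sketch: each row becomes a typed node and each foreign-key reference becomes a typed edge. This is an assumption-laden illustration, not Kumo's implementation; it builds a plain heterogeneous graph rather than a hypergraph, and the table names, columns, and `tables_to_graph` helper are hypothetical.

```python
# Two toy tables (made-up rows). purchases.user_id is a foreign key into users.id.
tables = {
    "users": [{"id": 1, "age": 34}, {"id": 2, "age": 27}],
    "purchases": [
        {"id": 10, "user_id": 1, "price": 20.0},
        {"id": 11, "user_id": 1, "price": 35.0},
        {"id": 12, "user_id": 2, "price": 5.0},
    ],
}
# (child table, foreign key column, parent table); a hypothetical schema description.
foreign_keys = [("purchases", "user_id", "users")]

def tables_to_graph(tables, foreign_keys, primary_key="id"):
    """Each row becomes a typed node; each foreign-key reference becomes a typed edge."""
    nodes = {
        (table, row[primary_key]): {k: v for k, v in row.items() if k != primary_key}
        for table, rows in tables.items()
        for row in rows
    }
    edges = [
        ((child, row[primary_key]), f"{child}.{column}", (parent, row[column]))
        for child, column, parent in foreign_keys
        for row in tables[child]
    ]
    return nodes, edges

nodes, edges = tables_to_graph(tables, foreign_keys)
print(len(nodes), len(edges))
for edge in edges:
    print(edge)
```

No join or aggregate is computed here. A GNN running over these edges would learn how to combine each user's purchases, instead of a person choosing "average price" or "average on a Sunday" by hand.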

Krishna Gade: Lots of people still claim today that they've not been able to successfully apply neural networks on tabular data and XGBoost is still the kind of king of models for tabular data, right?

Jure Leskovec: There is a huge sloppiness in your statement.

Krishna Gade: <<laugh>>

Jure Leskovec: Sorry. So what's sloppy? Because it's terrible, right? When people say tabular data, you think, oh yeah, probably they mean multiple tables. No, they don't. They mean one table. Now, show me a real use case where you have one table. You don't have one table, you have multiple tables. How do you go from multiple tables to one table?

Jure Leskovec: You go through feature engineering. And after you've done your feature engineering, yeah, now you can apply XGBoost, you can apply tabular transformers and things like that. To me, that's not the big problem, because you've already done so much work to join these tables and come up with the features. The big problem is how to go from multiple tables to one table.

Krishna Gade: So, in this case, GNNs are not replacing XGBoost, they're making your feature engineering better or faster?

Jure Leskovec: The GNNs just make predictions of the stuff you want.

Krishna Gade: Okay. So they can actually also make the end prediction, not only the feature...

Jure Leskovec: It's end to end. It's end to end. So that's the beauty, right? Because with feature engineering, you are throwing data away.

Jure Leskovec: You only learn from whatever you summarized in your feature, whatever you thought to include there. While the GNN can really now learn from the entirety of your data, from all the entities, from the entire schema, and identify a signal. If it's two tables away, three tables away, it's going to bring that signal there, and you would never do a three-way join to do feature engineering.

Jure Leskovec: You wouldn't come up with that ever. So that's the beauty. I'm excited about this, so I can keep talking, but the key innovation is that you can finally do representation learning on multi-tabular data. When people say tabular machine learning, they mean one table.

Jure Leskovec: That's not useful. You want multiple tables and Kumo can...

Krishna Gade: So let's say I am a data scientist and I'm building these types of tabular data models, and I have a database, and now I want to represent it as a graph and get started with GNNs. How should I go about it?

Jure Leskovec: I think you have two options. Option number one is that you go open source and use PyTorch Geometric, which is the open source library we developed at Stanford. We have strong backing from and a strong partnership with Nvidia and Intel, who are making sure that the library is optimized and runs really well on their hardware. It was even showcased in Jensen Huang's keynote and things like that.

Jure Leskovec: And we have a big user base. I think it's almost 400 contributors, nearly 20,000 GitHub stars, an active Slack channel of 4,000 to 5,000 people, and so on. So it's a huge community. And it's the default GNN library for all the research. So all the latest and greatest is implemented on top of PyG, or PyTorch Geometric as it's called. So it's pyg.org. So that's one way to do it.

Jure Leskovec: The problem with that is that even if you have the library, it took us a team of engineers and about four years at Pinterest to build an entire machine learning platform around it. So what you have to take care of is how to get from raw data to the graph.

Jure Leskovec: How are you going to store that graph? How are you going then to create mini batches from that graph so that now you can specify your GNN architecture in PyTorch and train that, and then where will you store the model? How are you going to serve that? How are you going to manage the lifecycle and so on.

Jure Leskovec: So there's a lot of stuff to do. So what we built at Kumo is a scalable, industrial-grade graph learning platform, where you don't even have to worry about the graph. All you worry about is what tables you wanna use and what the primary-foreign key relations between the tables are. And the tables can include text, they can include images.

Jure Leskovec: All that is fine. And then on top of that, basically the platform automatically creates a graph out of that, distributes it across the machines, and it's ready to be used for machine learning.

Krishna Gade: So before I move on to the generative AI topic, there are some interesting questions that have come in on the chat.

Krishna Gade: So let me pick up a few of these. A few questions are like, "Can we use GNNs in financial forecasting?" Or, how do you prevent overfitting in GNNs, because you are basically trying to model so many relationships, first-order and second-order relationships, that are present in the data? How do you take care of those things?

Jure Leskovec: Great point. So we've seen a huge range of applications for these GNNs, and the reason for that is that if you think in graph terms, you have different types of tasks in a graph. You can have tasks that are predicting about a single node.

Jure Leskovec: So what's an example of that? An example is predicting churn, predicting lifetime value, right? You are maybe predicting the sales volume of a product. That's about one node, one entity. So that's great. So you can do all these types of prediction problems. Then you can do link, or pairwise, prediction problems.

Jure Leskovec: So what's an example of that? An example is affinity, brand affinity, recommendation. What product is the customer going to buy next? So you are predicting something about the customer and the product, and you're predicting something about this link. So specifying all kinds of recommender system problems, brand affinity, things like that, is very easy in the same framework.

Jure Leskovec: And then last, you can also do graph- or subgraph-level prediction tasks, which, for example, is very natural if you ask, is this molecule toxic or not? Because a molecule is a graph of friendships, bonds between the atoms. So it's a graph. A molecule is not a string.

Jure Leskovec: The most natural representation is graphical. I have the atoms and how the atoms bond with each other. So you can do these types of things, or if you wanna do some kind of fraud or anti-money laundering and things like that. And of course the question was about forecasting. You can build beautiful forecasting models because the network can learn from your own time series, so you can do this kind of autoregressive thing, but also from the correlations and connections with other time series that you have, let's say, sales of other products. You can really learn how to borrow information optimally from other entities to make an accurate prediction.

Jure Leskovec: So we've seen also really good performance on forecasting.

Krishna Gade: Yeah. So one of the things with this activity data, right? Let's say you're trying to model churn on a high-scale website where the data is constantly changing. Now we have all this sophisticated data infrastructure that's built to collect the data and update these databases, right?

Krishna Gade: How scalable are these GNNs when it comes to updating and retraining them? Do we have to retrain them every time there's new data that shows up? How do they work in practice?

Jure Leskovec: That's a great question. So earlier I was saying that these embedding methods need to be recomputed every time.

Jure Leskovec: Graph neural networks are not like that. Graph neural networks are inductive. What this means is that when your graph is changing and you wanna make a prediction for a node, all you have to do is quickly identify the breadth-first search tree around that node and make a prediction. So you can train on one part of the graph and apply the model to the next part of the graph.
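The inductive step described above starts with finding the breadth-first search neighborhood of the node you want to score. A minimal Python sketch follows, assuming a toy adjacency dict and a hypothetical `bfs_neighborhood` helper; a trained GNN (not shown) would then be applied to the features of exactly these nodes.

```python
from collections import deque

def bfs_neighborhood(graph, root, max_hops):
    """Return {node: hop distance} for every node within max_hops of root."""
    distance = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance

# Graph seen at training time (made-up data).
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}

# A node that arrives after training: it is simply added to the graph.
graph["new"] = ["a"]
graph["a"].append("new")

neighborhood = bfs_neighborhood(graph, "new", max_hops=2)
print(neighborhood)
```

Nothing is re-estimated for the existing nodes. The new node gets a prediction from the same model weights applied to its own two-hop neighborhood, in contrast to the shallow embedding table, which would need a new row and retraining.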

Jure Leskovec: So what people, for example, do in practice is that maybe you would retrain your model at some cadence, maybe weekly or monthly or whatever, but then you would keep applying your model daily or in real time, or however you are doing this. And the platform scales. You know, at Kumo we scaled to 50 billion entities.

Jure Leskovec: So we are scaling to about two Pinterests from a few years ago. So we can definitely scale. The last thing I will say before we move on: what is super cool about this view of doing, let's say, deep learning over multi-tabular data, basically over a relational store, over a data warehouse?

Jure Leskovec: It's that, the same way as in a data warehouse you use SQL to query and aggregate the past, at Kumo we have this SQL-like language that allows you to predict the future. And what I mean by that is, if in the data warehouse you would say, "select me all the users whose transaction amount in the last two months was less than a hundred dollars," I can do the same in Kumo by saying "select me the users whose transaction amount is predicted to be less than $200 in the next two months." And based on that specification, what Kumo does is basically say, "Uh-huh, you are trying to predict the sum of transaction values inside that time window. Okay? So I'm going to now do a sliding window over your data and interpret this query to create a target label that is time consistent. I'm going to attach this to the graph and I'm going to automatically build a model to predict that for you."
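The sliding-window, time-consistent label idea can be sketched in plain Python as follows. This is only an illustration of the general technique, not Kumo's query language or implementation; the transaction rows, the 60-day horizon, the 100.0 threshold, and the `make_examples` helper are all made-up assumptions.

```python
from datetime import date, timedelta

# Toy transaction log (made-up rows): (user, day, amount).
transactions = [
    ("u1", date(2023, 1, 10), 50.0),
    ("u1", date(2023, 2, 20), 120.0),
    ("u2", date(2023, 1, 5), 30.0),
    ("u2", date(2023, 3, 15), 40.0),
]

def make_examples(transactions, cutoffs, horizon_days, threshold):
    """For each cutoff: inputs come strictly before it, the label is summed after it."""
    users = sorted({user for user, _, _ in transactions})
    examples = []
    for cutoff in cutoffs:
        window_end = cutoff + timedelta(days=horizon_days)
        for user in users:
            history = [(d, a) for u, d, a in transactions if u == user and d < cutoff]
            future_sum = sum(
                a for u, d, a in transactions if u == user and cutoff <= d < window_end
            )
            examples.append({
                "user": user,
                "cutoff": cutoff,
                "history": history,              # what the model may look at
                "label": future_sum < threshold,  # what the model must predict
            })
    return examples

cutoffs = [date(2023, 1, 1), date(2023, 2, 1)]  # sliding window positions
examples = make_examples(transactions, cutoffs, horizon_days=60, threshold=100.0)
for example in examples:
    print(example["user"], example["cutoff"], len(example["history"]), example["label"])
```

Because every input row is dated strictly before its cutoff and every label is computed from rows on or after it, a feature can never contain the outcome it is supposed to predict.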

Jure Leskovec: So you as a data scientist, all you have to do is you have to specify what target is predicted for what entity. So you don't even need to create a training data set. You don't need to create a target label.

Jure Leskovec: You don't need to worry about time travel and time consistency, and all these bugs that we all had. When you create a feature, your model works great, and then two weeks later you realize that you are predicting something that has already happened and is captured in that feature.

Jure Leskovec: It happened to me so many times, I cannot say. But that's the beauty. You can really iterate quickly and get your models out the door. And maybe the last thing I'll say: this view of, let's say, no need to feature engineer on a database means you can get models that are much more performant.

Jure Leskovec: You can get them out of the door very quickly, and you can also easily put them in production, because all you need to do is refresh the tables and apply the model, refresh the tables, apply the model. So really, after you've defined what your model is, which is basically what quantity is predicted, you just have two REST API calls, retrain or apply. And that's it. So from design to deployment, it's a REST API call. It's not that, "Oh, now we have to productionize this pipeline. We have to productionize these features, we have to have all these workflows." Not really.

Krishna Gade: So at Fiddler, we work with a lot of customers that care about explainability of these models, right? So when you're applying these GNNs on tabular data, is it even possible to explain the predictions that are coming out of GNNs? What are your thoughts there?

Jure Leskovec: That's a great question. I think, let's say, the way in the old world we generally thought of explainability was usually through some kind of feature importances, trying to understand how features affect the prediction.

Jure Leskovec: In the deep learning world, things are different, because there are no explicit features, right? It's all learned embeddings, if you want to think of it that way. So that's the explainability we built at Kumo, and I think it's actually even more powerful because it allows you to point back to the raw data, right?

Jure Leskovec: It allows you to go back and say, this column, this row, this event affected my prediction. And this means that it's much more explainable, because the space of explanations is much richer than saying, okay, you have three features, here are the feature importances.

Jure Leskovec: We can do a lot of that, and we see that it really resonates with customers that you can get more detailed explanations. And then of course you can do what-if type things. You can do cohort analysis, you can do a lot of different things to really build trust in the model and really try to understand how the model is thinking, how the model is learning.

Krishna Gade: Awesome. So, switching gears a little bit, I think the talk of the town now is generative AI. It has been for the last six to eight months. How are GNNs useful in building generative AI models? What are some application domains that you are seeing where GNN-based generative AI models could be promising?

Jure Leskovec: I would say there are very different views. There's a lot to say here. When we say generative AI, right? For most people that seems to mean ChatGPT or something like it, right? But at Stanford we established what we call the Center for Research on Foundation Models.

Jure Leskovec: So really these large language models are just one example of what we call a foundation model. A foundation model is a large pre-trained model that is pre-trained on a lot of data and has this kind of zero-shot type capability. And these LLMs are amazing at this broad, I would call it common sense, common knowledge, broad internet type stuff, right?

Jure Leskovec: Which is amazing. But you can think of foundation models for other domains as well. You could think of a foundation model for biology, right? You can think of a foundation model for medicine. You can think of a foundation model for drug discovery, for molecules. All right. So now it's not only natural language, but the models have become multimodal, where the modalities can be images, text, and then graphs, structured data, relational data as well.

Jure Leskovec: So we see a lot of benefit there. That's one way, and I'm happy to tell more about these kinds of foundation models beyond natural language and this, I dunno, common sense knowledge or whatever you wanna call it. And then another place where we see huge use of GNNs or graphs for generative AI is knowledge.

Jure Leskovec: It's knowledge bases, like your private data. The most valuable private data, again, is stored in relational tables. Some of it might be text, and you could index that text through this kind of retrieval-augmented generation. But now imagine if you have live data in a database, or if you have a knowledge graph of some sort that describes your business, and you want your chatbot to be accurate, not to hallucinate, right?

Jure Leskovec: Say maybe Fiddler has a knowledge graph about "how does Krishna like his coffee," and when, I dunno, somebody asks "how does Krishna like his coffee?" we'd like to make sure we have the right answer. And if you change your preference or whatever, that is readily reflected in the answers of the model.

Jure Leskovec: And this is not about, you know, having some document database or something like that. It's having this structure. Maybe it's how much of each product do we have in stock? What's the price? What is this? And that's where this becomes very important. It's now basically thinking of this as multimodal.

Jure Leskovec: But not modality as image and text, but more as text and a database, or text and a relational knowledge graph that can be updated live and can be retrieved from to give more accurate answers. We've developed some research and have papers around this, and we are also working on this at Kumo.

Jure Leskovec: So we are very excited about this direction as well.

Krishna Gade: So let's say if we were to develop a foundation model on a particular set of structured data, let's say it could be biological data sets or maybe other structured dataset, financial data. Now could you then use that foundation model and you know, kind of like fine-tune it for your own data sets?

Krishna Gade: Or do you kind of, some of the practices that LLMs are exposing, are they applicable for these foundational GNNs or is that like still in research right now?

Jure Leskovec: No, no, definitely. I think you have two ways. One way is to think of this as fine-tuning. Another one is to basically have a retriever that doesn't retrieve passages, but retrieves parts of your knowledge base in a structured form, right? So you can think of this as background knowledge or as up-to-date knowledge that you are retrieving back to the LLM, and then you are communicating with that through the natural language interface.
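
To make the structured-retriever idea concrete, here is a minimal sketch in Python. It is not Kumo's implementation: the `Fact` type, the toy inventory, and the naive string-matching retriever are all hypothetical, chosen only to show facts being pulled from a live store in structured form and placed ahead of the question in a prompt.

```python
from typing import NamedTuple


class Fact(NamedTuple):
    subject: str
    relation: str
    value: str


def retrieve_facts(question: str, facts: list[Fact]) -> list[Fact]:
    """Return facts whose subject is mentioned in the question (naive string match)."""
    q = question.lower()
    return [f for f in facts if f.subject.lower() in q]


def build_prompt(question: str, facts: list[Fact]) -> str:
    """Serialize the retrieved facts as structured context ahead of the question."""
    rows = [f"({f.subject}, {f.relation}, {f.value})" for f in retrieve_facts(question, facts)]
    context = "\n".join(rows) if rows else "(no matching facts)"
    return f"Known facts:\n{context}\n\nQuestion: {question}"


# Hypothetical toy knowledge base standing in for live relational tables.
inventory = [
    Fact("espresso machine", "units_in_stock", "12"),
    Fact("coffee grinder", "units_in_stock", "40"),
]

question = "How many espresso machine units are in stock?"
prompt = build_prompt(question, inventory)

# A live update to the store is reflected the next time the prompt is built.
inventory[0] = Fact("espresso machine", "units_in_stock", "11")
updated_prompt = build_prompt(question, inventory)
```

Because the facts are looked up at question time rather than baked into model weights, changing a row changes the next answer's context without any retraining.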

Jure Leskovec: It also becomes very interesting, right? Because what this also allows you to do is predictions, right? Especially with the Kumo functionality. Today, LLMs are able to give you this kind of expected common sense answers. Maybe they're able to retrieve something from some static or dynamic knowledge base.

Jure Leskovec: But as soon as you start asking predictive questions, Kumo also nicely comes in as a predictive platform that's end-to-end data driven. It's not like you ask a predictive question and then the chatbot says, "Okay, thank you. Now please engineer the features for me." It's not like that.

Krishna Gade: So you could layer on like a conversation interface on top of the GNN to not only just get knowledge, but also just have some predictions being done on the database.

Jure Leskovec: Exactly, exactly. You can start conversing about what's going to happen, what is predicted to happen, or even if you think about missing data imputation and things like that. GNNs are state of the art for missing data, for incomplete data, for something that hasn't happened yet or something that you haven't observed yet, and so on.

Krishna Gade: So great. This is awesome. Let me take a few more audience questions at this point. There's an interesting question. This is probably where a lot of people are at this stage, right? So, you know, how do you decide to move from collaborative filtering type approaches for recommendation models, based on similarity and past history, to a graph neural network?

Krishna Gade: When would you do that, given scale and cost considerations? Obviously collaborative filtering has been there for a long time, and obviously GNNs are taking it to the next level. What are your thoughts?

Jure Leskovec: I think my answer would be, you always do it.

Jure Leskovec: The question is, can you do it? Can you pull it off? I think that's really the honest answer. I would say it requires a lot of expertise in system building if you wanna do it yourself. It took a team of really strong people, plus myself, to do this at Pinterest.

Jure Leskovec: So you will see benefits immediately, or through experimentation. You are going to see them, but the system becomes much more complex. And then the problem is that if you're starting to stitch this system together, you will say, oh, let me, I don't know, use some graph database.

Jure Leskovec: There are vendors, you know, there's Neptune, there is Neo4j, and so on. But then those things are super slow, not optimized for machine learning. So in the end, it'll be a disaster in a sense. You'll be working on that and it'll be like, "Hey, it doesn't work. This is not for us," and so on.

Jure Leskovec: And that's the worst that can happen. So what we did at Pinterest, we built from the ground up. We built our own graph store, optimized for machine learning and so on. And we got amazing results. At Kumo, we now have the platform for others to try out and to use. And we have clients who, for example, are doing product or order recommendations for over 200 million users. And we are beating internal teams with real-time capabilities and years of model tuning. So another option is to simply go and try out Kumo.

Jure Leskovec: And I would say one last thing, right? These models are not that heavy, in the sense that it's not like you have billions of parameters. They're more lightweight. So even on 200 million users it takes a day to train.

Krishna Gade: So, there are also governance related concerns here, right?

Krishna Gade: People are worried about you know, if we are encoding all of this information in a graph, and let's say you have to delete some parts of the graph. You have to delete some users. How would these GNNs allow for that? You know, could you modify these things?

Jure Leskovec: Oh yeah. The graph gets refreshed. You can refresh the graph daily or whatever. So it's no problem. And all this kind of updating, forgetting, adding, and removing, that's very easy.

Jure Leskovec: And yeah, you want your underlying data structure to be up to date. That, of course, we take care of.

Krishna Gade: So you basically delete the, delete whatever nodes or rules...

Jure Leskovec: You can delete or you know, you make sure

Krishna Gade: You should refresh the model basically.

Jure Leskovec: Exactly. The model doesn't need to be retrained or refreshed because it's so flexible. What's going to change is that the structure of the BFS tree is going to change a bit. But the GNN is so flexible that it can ingest any tree, any BFS tree, if you want to think of it that way.

Jure Leskovec: The graph changes, the model can still be applied. You drop all the connections of one type, the model can still be applied. You drop 20% of the connections, the model can still be applied. So these models are super robust to changes, to noise, to dropout, things like that.

Jure Leskovec: So that's not an issue at all. All you have to do is make sure that your tables in a data warehouse are up to date. So if you know how to delete a row from a table, the rest will be taken care of.
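
As a rough illustration of why a graph refresh does not require a new model, the sketch below rebuilds a graph from toy tables after a row is deleted and reapplies the same fixed weights. Everything here is an assumption for illustration: the table layout, the scalar features, and the single mean-aggregation layer standing in for a trained GNN.

```python
def build_graph(users, items, purchases):
    """Derive node features and adjacency lists from three toy 'tables'."""
    features = {("user", u): x for u, x in users.items()}
    features.update({("item", i): x for i, x in items.items()})
    adj = {node: [] for node in features}
    for user_id, item_id in purchases:
        u, i = ("user", user_id), ("item", item_id)
        if u in adj and i in adj:  # ignore rows pointing at deleted entities
            adj[u].append(i)
            adj[i].append(u)
    return features, adj


def apply_model(features, adj, w_self=0.5, w_neigh=0.5):
    """One mean-aggregation layer with fixed weights, standing in for a trained GNN."""
    scores = {}
    for node, neighbors in adj.items():
        neigh_mean = sum(features[n] for n in neighbors) / len(neighbors) if neighbors else 0.0
        scores[node] = w_self * features[node] + w_neigh * neigh_mean
    return scores


users = {"u1": 1.0, "u2": 3.0}
items = {"i1": 2.0}
purchases = [("u1", "i1"), ("u2", "i1")]

before = apply_model(*build_graph(users, items, purchases))

# Delete user u2 from the tables, rebuild the graph, and reapply the same weights.
del users["u2"]
purchases = [row for row in purchases if row[0] != "u2"]
after = apply_model(*build_graph(users, items, purchases))
```

The function only ever sees a node and its current neighbors, so it runs unchanged on the smaller graph; the deleted user simply no longer contributes to any neighbor's aggregate.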

Krishna Gade: So there's a more detailed technical question. You know, When you are training these models, right?

Krishna Gade: How do you fix the number of layers? Is it based on the data volume? And have you ever faced oversmoothing problems in industrial applications? This is more of a technical question: as you built these, what were the things that you had to go through?

Jure Leskovec: No, that's a great, that's a great question.

Jure Leskovec: I think the specific answers to this are a bit domain or use case specific. But what I would say is, for example, the depth will depend on the structure of your graph, right? So if you have a kind of natural, social-network-like graph, then you will go a couple of steps deep.

Jure Leskovec: But your individual layers can be very expressive and very powerful. Because really the way to think of this is you have two types of depth. You have the depth in terms of neural network layers, and you have the depth in terms of the graph. And the two shouldn't be confused.

Jure Leskovec: So in a social network, if you go six steps away, you reach every human on earth. You don't wanna learn from every human on earth. That's the oversmoothing and all that we were discussing. So you wanna go maybe two steps away, but you want your layers to be much more expressive, or each layer to include multiple sublayers, if you wanna think of it that way, of data processing that really learns how to combine and aggregate this information. That's really the trick: to have pre-processing layers, then the message passing layers, then post-processing layers, and then stack this together.

Jure Leskovec: So you need to think of this hierarchically to do this right. And of course, if you have, I don't know, a long molecule, a long protein, then you wanna have a deep network that propagates information from one part of the molecule to the other part. Which is different, right?
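
The distinction between the two kinds of depth can be sketched in a few lines of Python. This is a toy under stated assumptions, not a production architecture: features are scalars, the weights are fixed rather than learned, and `node_mlp`, `message_passing`, and `gnn` are hypothetical names. The point is only that per-node layers add expressiveness without widening the neighborhood, while hops widen the neighborhood.

```python
def relu(x):
    return max(0.0, x)


def node_mlp(h, num_layers, weight=1.5):
    """Per-node transforms (no bias); no information crosses edges here."""
    for _ in range(num_layers):
        h = {v: relu(weight * x) for v, x in h.items()}
    return h


def message_passing(h, adj, hops):
    """Each hop mixes a node's state with the mean of its neighbors' states."""
    for _ in range(hops):
        h = {
            v: relu(0.5 * h[v] + 0.5 * sum(h[u] for u in adj[v]) / len(adj[v]))
            for v in adj
        }
    return h


def gnn(features, adj, pre_layers, hops, post_layers):
    h = node_mlp(features, pre_layers)   # pre-processing layers
    h = message_passing(h, adj, hops)    # graph depth
    return node_mlp(h, post_layers)      # post-processing layers


# Path graph 0-1-2-3-4 with a signal placed only on node 4.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
features = {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 1.0}

shallow = gnn(features, adj, pre_layers=1, hops=2, post_layers=1)
more_layers = gnn(features, adj, pre_layers=4, hops=2, post_layers=4)
more_hops = gnn(features, adj, pre_layers=1, hops=4, post_layers=1)
```

With two hops, node 0 never sees the signal from node 4 no matter how many pre- or post-processing layers are stacked; only raising the hop count to four lets it through, which is the long-molecule case described above.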

Krishna Gade: And these days people are using lots of vector databases to store their private knowledge and private data. Would this change when you're trying to use GNNs? Do vector databases coexist with GNNs, or do you have to go to a graph store? How does that work?

Jure Leskovec: So first I think maybe you said graph store. So definitely vector databases are great for using embeddings and retrieving them.

Jure Leskovec: What is nice with GNNs, of course, is that you can output your embeddings. For example, if people follow, let's say, the Pinterest work and the research papers that we published at Pinterest, we embedded everything in the same space: users, queries, pins. And it was amazing. It actually improved our search as well.

Jure Leskovec: Like people would type in queries, we would embed that and retrieve based on the learned embeddings. And we were really able to do search in a data driven human feedback type of way to a huge benefit. So definitely that is the case. What I will, I will say is in some sense you need both, right?

Jure Leskovec: You need a graph store to generate these BFS trees that define the structure of the GNN. And of course, in the end you don't really generate a BFS tree, but you are very smart about how you generate that tree. Because if you hit Kim Kardashian, then you are kind of...

Krishna Gade: Yeah. Fan out is huge.

Jure Leskovec: The fan out is huge. All connections are meaningless. There's nothing to do there, right? So you wanna be more strategic about how you avoid nodes like that, right? I'm joking a bit, right? But Kumo provides a platform that allows you to do that.

Jure Leskovec: Otherwise, you have to build this yourself and innovate it yourself. But we are very smart and very careful about how we sample that tree structure from which we learn, which nodes get selected, and which nodes are informative for the prediction task. So there is a lot of smarts in there. It's not just, oh, do the BFS and so on, because yeah, you hit a high degree node and then what do you do?
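
One simple way to keep a sampled tree from exploding at high-degree nodes is to cap the fan-out per node and refuse to expand hubs. The sketch below shows that heuristic only; it is an assumption for illustration and not the sampler Kumo or Pinterest uses, and `sample_tree`, `fanout`, and `max_degree` are hypothetical names.

```python
import random


def sample_tree(adj, root, depth, fanout, max_degree, rng):
    """Sample a computation tree around `root` as a list of (parent, child) edges.

    At most `fanout` neighbors are expanded per node, and nodes with more than
    `max_degree` neighbors are kept in the tree but never expanded.
    """
    edges = []
    frontier = [root]
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            neighbors = adj.get(node, [])
            if len(neighbors) > max_degree:
                continue  # hub node: too many connections to be informative
            picked = rng.sample(neighbors, min(fanout, len(neighbors)))
            edges.extend((node, child) for child in picked)
            next_frontier.extend(picked)
        frontier = next_frontier
    return edges


# Toy graph: "celebrity" is connected to 1,000 fans plus alice.
fans = [f"fan{i}" for i in range(1000)]
adj = {
    "alice": ["bob", "celebrity"],
    "bob": ["alice", "carol"],
    "carol": ["bob"],
    "celebrity": ["alice"] + fans,
}
adj.update({fan: ["celebrity"] for fan in fans})

edges = sample_tree(adj, "alice", depth=2, fanout=2, max_degree=50, rng=random.Random(0))
```

Here the celebrity still appears as a neighbor of alice, but none of its 1,000 fans enter the tree, so the sampled structure stays small. A real sampler would likely also weigh which neighbors are informative for the prediction task rather than picking uniformly.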

Jure Leskovec: So that is the case. And then of course you can then index the embeddings and retrieve. What is interesting, and we've seen this actually in some use cases, is that embeddings can be very limiting. Especially if you think in, let's say this kind of recommender system setting there is an embedding of the user and there is an embedding of the item.

Jure Leskovec: But really the embedding of items for you, Krishna, is different than it is for me. So it means that if I have fixed embeddings for items and, let's say, embeddings for users, then users can only move around these items. What we find is that if you use the GNN directly, then basically what the GNN is doing internally is almost like creating a per-user specific embedding of all the products.

Jure Leskovec: So you are getting a huge improvement in accuracy because you are not materializing the embeddings, but you let the GNN actually give you the score.

Krishna Gade: You're using the BFS approach to dynamically figure out the user.

Jure Leskovec: Exactly. So conceptually what it's doing, it's almost like giving you personalized product embeddings so you can really retrieve in a personalized way.

Krishna Gade: And a question related to embedding systems. You know, how does node embedding compare with KG embeddings like TransE and RotatE, and how could it be effective in link prediction?

Jure Leskovec: Good. So all these knowledge graph completion methods, like TransE, ComplEx, TransR, RotatE, and all those guys, right? These are all based on shallow embeddings. They all just estimate an embedding of each node in the graph. And in those cases, the underlying assumption is that you have a set of node IDs and a set of relation types.

Jure Leskovec: So you're usually learning something per relation. So it's shallow. There's no neural network. I would say it's the old school of doing this. Now, if you have this attribute-less, information-less knowledge graph that only has the relational structure, then those methods, I think, are good.
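
For readers unfamiliar with shallow knowledge graph embeddings, here is a minimal sketch of TransE-style scoring, where a triple (head, relation, tail) is considered plausible when head + relation lands close to tail. The two-dimensional vectors below are hand-picked for illustration, not trained, and the lookup tables are hypothetical.

```python
import math

# Shallow model: one vector per node ID and one per relation type, nothing else.
entity_emb = {
    "paris": [1.0, 0.0],
    "france": [1.0, 1.0],
    "berlin": [0.0, 0.0],
    "germany": [0.0, 1.0],
}
relation_emb = {"capital_of": [0.0, 1.0]}


def transe_score(head, relation, tail):
    """Negative Euclidean distance between (head + relation) and tail; higher is more plausible."""
    h, r, t = entity_emb[head], relation_emb[relation], entity_emb[tail]
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


true_score = transe_score("paris", "capital_of", "france")
false_score = transe_score("paris", "capital_of", "germany")

# A node that was not in the training graph has no row in the table at all.
can_score_new_node = "madrid" in entity_emb
```

The score depends only on table lookups keyed by node ID, which is why this family cannot use node attributes and has nothing to say about a node it has never seen.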

Jure Leskovec: But what you have in reality is rich data associated with the nodes. So you wanna be using GNNs to do link prediction. For example, one of the methods that works really well for link prediction with GNNs is called ID-GNN. That goes back to those personalized embeddings that I was talking about earlier.

Jure Leskovec: And of course what you then get with the graph neural network is a function that computes the embedding. So you can apply this function to any node. The graph structure can change and you just reapply the function. So you are inductive, you can generalize, you can transfer to new graphs and so on.

Krishna Gade: Maybe like one last question from my side as we wrap up. Thank you so much for spending time with us. What are you looking forward to Jure in this field? This is a very exciting new space. I have dabbled myself in some graph theory in grad school and worked in graph partitioning and all, and I'm always fascinated by graphs.

Krishna Gade: Applying graphs to neural networks is very interesting, and creating these personalized embedding models seems very expensive computationally. But it seems like you guys have cracked something here. What are you excited about in this space, and what are you looking forward to this year?

Jure Leskovec: Yeah. I'll answer this from several viewpoints. What always bothered me in this space is, where does the graph come from? A graph feels like this abstract mathematical concept, and people were pushing it away.

Jure Leskovec: So from that point of view I'm really excited about what we are doing with Kumo, because it's not about graphs. It's bigger than graphs. It's about relational data. It's about data and data warehouses. And of course there are graphs underneath, which is awesome, which makes me happy.

Jure Leskovec: But from the user or customer point of view, they don't need to think about this, I don't know, dry mathematical representation. They can really think about the data and the relationships between the data. So that's something I think can really change things and have a huge impact in industry. So that's what I'm really excited about.

Jure Leskovec: On the research side, we are very excited about these notions of pre-training foundation models for this type of data: zero-shot capability, few-shot capability, and how you would transfer learning across graphs, across data sets. That becomes very interesting. I think it's really cool, now kind of going back to the notion of Kumo, right?

Jure Leskovec: Because traditionally in machine learning with manual feature engineering, you don't even know what transfer learning is, what pre-training is. Those concepts don't exist. But now you have a neural network that you can apply to your data warehouse.

Jure Leskovec: Now you can start thinking about pre-training, you can start thinking about multitask training. You have a task with lots of data. You have a task with little data. You can have the bottom layers of the neural network shared, or whatever, right? So you can do all these kinds of things that were unachievable, or it wasn't even clear what they mean, in the old world of "single table, let's do feature engineering."

Jure Leskovec: But now you can pre-train, you can multitask. Especially your data-poor or data-imbalanced tasks benefit in a huge way from this new view on learning in data warehouses. So that's what I'm really excited about, and thinking through how this vision would materialize and how we make it performant and beautiful and useful.

Krishna Gade: Very exciting you know, super fascinating. Thank you so much for spending time with us today. I learned a ton talking to you. I'm sure our listeners have learned a lot. For those of you, I don't think you need any introduction to Jure. You can go and search his Stanford page, his startup Kumo AI. If you have any questions, feel free to reach out to him.

Jure Leskovec: And it's easy to find my email address so absolutely.

Krishna Gade: Awesome. Thank you so much.

Jure Leskovec: Thank you so much. I enjoyed it a lot. Thank you, Krishna. And thanks everyone for your attention and for the really good, insightful questions.

Jure Leskovec: Awesome. Thank you.