Collaborative Filtering: How Companies Use Collaborative Filtering to Know Exactly What You Want
Published: 2019-05-11


Collaborative Filtering

How do companies like Amazon and Netflix know precisely what you want? Whether it’s that new set of speakers that you’ve been eyeballing, or the next Black Mirror episode — their use of predictive algorithms has made the job of selling you stuff ridiculously efficient.


But as much as we’d all like a juicy conspiracy theory, no, they don’t employ psychics.


They use something far more magical — mathematics. Today, we’ll look at an approach called collaborative filtering.


What exactly is collaborative filtering?

As mentioned in the fast.ai deep learning course, structured deep learning models don’t get much love these days.

Probably because you wouldn’t get to see stuff like this:


But structured algorithms like collaborative filtering are the ones being used most often in the real world. They’re the reason that the stuff that shows up on the bottom of the page on Amazon seems so tempting to buy.


Collaborative filtering works on a fundamental principle: you are likely to like what someone similar to you likes.


The algorithm’s job is to find someone who has buying or watching habits similar to yours, and suggest to you what he/she gave a high rating to.


It can also work the other way around.


The algorithm can recommend a product that is similar to another product that you previously gave a high rating to. All of this similarity checking and comparison is done by some fairly straightforward linear algebra (matrix math).

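To make that "fairly straightforward linear algebra" concrete, here is a minimal sketch of the item-to-item case: comparing two movies by how the same users rated them. All the rating numbers are invented for illustration, and cosine similarity is just one common choice of comparison.

```python
import numpy as np

# Made-up rating columns: how five users rated each of three movies (0 = not watched).
movie_a = np.array([5, 4, 0, 5, 1])
movie_b = np.array([4, 5, 1, 4, 0])
movie_c = np.array([1, 0, 5, 1, 4])

def cosine_similarity(x, y):
    # Cosine of the angle between two rating vectors: closer to 1 = more similar audiences.
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

sim_ab = cosine_similarity(movie_a, movie_b)  # high: the same users liked both
sim_ac = cosine_similarity(movie_a, movie_c)  # low: near-opposite audiences
```

If you rated movie a highly, a system like this would suggest movie b over movie c.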

Is it really that easy?

Not so fast. Before we start throwing vectors and dot-products around, let’s address a significant problem faced by any recommender system algorithm — the cold start problem.

You see, collaborative filtering works well when you have two things:


  • a lot of data on what each customer likes (based on what they previously rated high)

  • a lot of data on what audience each movie or product might cater to (based on the type of people who rated it high).


But how about new users and new products, for which you don’t have much information?


Collaborative filtering doesn’t work well in these scenarios, so you might have to try something else. Some common solutions involve analyzing metadata or making new users go through a few questions to learn their initial preferences.


Ok, now onto the cool stuff

Like most machine learning problems, it’s probably a good idea to first take a look at the data. From now on, I’ll be using the example of movies and ratings (mostly inspired by the dataset used in the fast.ai course).


We’re going to visualize it by building a table of users against the score they gave to movies.


Each row represents a user, and each column a movie.


Cross-referencing will tell you what rating a user assigned to a film (on a scale of 1–5, where 0 means ‘didn’t watch’).


We’ll consider our collaborative filtering model a success if it’s able to fill in the zeros. This would mean that it’s able to predict how each user would rate a movie, based on both what the user is like and what the film is like.

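The table described above can be sketched as a small matrix. The particular numbers here are invented for illustration; the only convention carried over from the text is that 0 means "didn't watch":

```python
import numpy as np

# Toy ratings table: rows = users, columns = movies, 0 = didn't watch.
ratings = np.array([
    [5, 0, 1, 4],
    [4, 5, 0, 4],
    [0, 2, 5, 0],
])

# The entries we can learn from are the non-zero ones; the model succeeds
# if it can fill in the zeros with sensible predicted ratings.
observed = ratings > 0
n_missing = int((~observed).sum())
```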

Now for the algorithm. We’re going to set up 2 matrices: one for the users and another for the movies. These are called embedding matrices. Let’s call them W_u (for the users) and W_m (for the movies).

Each matrix is going to be filled with e-dimensional vectors (basically arrays of size e). What is e, you ask? It’s a magic number that I’ll address later. For now, just let e be your favorite natural number.


Notice that the table above, if you remove the row and column headings, also looks like a matrix. This is no coincidence. If you’re familiar with matrix multiplication, you’ll know that a 2*3 matrix times a 3*2 matrix gives a 2*2 matrix.


If you want to learn more about matrix multiplication, it’s worth checking out a refresher before continuing.

Using the same logic, we can multiply our movie and user matrices. The dimensions will work out exactly right to give a matrix that’s the size of the original table dataset (well, technically you have to transpose one of them, but I’m skipping the implementation details).


If we can learn the values of the entries in our movies matrix and user matrix, we could, in theory, get our original table back by multiplying the two.

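The shape argument above can be checked in a few lines. The sizes (3 users, 4 movies, e = 2) and the random values are arbitrary choices for the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_movies, e = 3, 4, 2  # e is the embedding size we get to choose

W_u = rng.normal(size=(n_users, e))   # one e-dimensional vector per user
W_m = rng.normal(size=(n_movies, e))  # one e-dimensional vector per movie

# Transposing the movie matrix makes the dimensions line up:
# (n_users x e) @ (e x n_movies) -> (n_users x n_movies), the shape of the table.
predictions = W_u @ W_m.T
```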

We have our ground truth: the original table. All we need to do is figure out the numbers (also known as the weights) that somehow multiply together to give us back the original table.


Enter the mystic art of machine learning.


Here’s how we’re going to do it:


  • We start off with completely random numbers in the movie matrix and user matrix.

  • Then, we multiply the two to get another matrix (which, at this point is also completely random) that looks like our original table.

  • By comparing our predicted values with the real values from the table, we define a loss function. This is basically a measure of how far off our predicted rating was from the actual rating.


Note, we also have to skip the zeros, since we don’t want our model predicting a rating of 0 for anyone. That would be pretty useless.

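The steps above, including the zero-skipping note, can be sketched end to end in plain numpy. The data, embedding size, learning rate, and step count here are all arbitrary choices for the sketch, and the gradients are written out by hand rather than via an autodiff library:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy ratings table: rows = users, columns = movies, 0 = not watched.
R = np.array([
    [5., 0., 1., 4.],
    [4., 5., 0., 4.],
    [0., 2., 5., 1.],
])
mask = R > 0                 # only observed ratings contribute to the loss

e = 2                        # embedding size (a small choice for the sketch)
W_u = rng.normal(scale=0.1, size=(R.shape[0], e))  # step 1: random user matrix
W_m = rng.normal(scale=0.1, size=(R.shape[1], e))  # step 1: random movie matrix

lr = 0.05
for _ in range(5000):
    pred = W_u @ W_m.T                      # step 2: multiply to get predictions
    err = np.where(mask, pred - R, 0.0)     # skip the zeros entirely
    loss = (err ** 2).sum() / mask.sum()    # step 3: mean squared error on observed ratings
    grad_u = 2 * err @ W_m / mask.sum()     # gradients of the masked loss
    grad_m = 2 * err.T @ W_u / mask.sum()
    W_u -= lr * grad_u                      # gradient descent update
    W_m -= lr * grad_m
```

After training, `W_u @ W_m.T` gives a prediction for every cell, including the ones that started as zeros.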

If you want more info on loss functions, I’d recommend reading a dedicated explanation of them.

After finding the losses, we use backpropagation and gradient descent to optimize the two matrices to get just the right values.

BOOM! We’re done!


Ok, a quick recap:


  • We have a table with ratings that each user gave each movie. If a user didn’t watch the movie, the table says ‘0’. We want to predict the zeros.

  • To do so, we built two matrices, one for the users, and one for the movies. Each matrix is basically just a stack of e-dimensional vectors.


  • To predict ratings, we multiply the matrices together to get another matrix that’s the same shape as the table and has our predictions in it. Initially, the table has only gibberish.
  • But after using loss functions to find our mistakes, and employing the dynamic duo of backpropagation and gradient descent, we now have a model that can accurately predict what rating a user would give to a movie. Sweet.


Ok… but why does it work?

Now if you’re like me, you get it. But you don’t really get it. How do these random multiplications read minds? Why can’t we just backprop the original table and fill up the zeros? Why go through the elaborate scheme of cooking up two separate matrices and then rebuild the table? Why? Why? Why? Patience, young grasshopper. All is as the force wills it.


Remember how I said that I was going to address the ‘e’ mystery? Well, now I am.

Recall that the matrices we constructed were essentially stacks of vectors. One vector per user, and one vector per movie. This was not a meaningless decision.


Each vector is a representation of what kind of person the corresponding user is. It condenses your likes and dislikes, your thoughts and feelings, your hopes and fears, into a numpy.array[] .


To understand this better, let’s zoom into a particular user vector, assuming that e=3:


Here, the three components of the vector are [100, 0, 50]. Each component represents some characteristic of the user, which the machine learns from looking at his/her previous ratings.

Suppose (and this is not really accurate, it’s just an analogy) that the three components have the following meaning:


Hopefully, you can get a sense of how the vector represents the idea of the user’s preferences.


So in the example above, our good friend u apparently loves action movies, isn’t big on romance movies, and likes comedy movies too, but not as much as action movies.


This is how our machine learning model comprehends human complexity — by embedding it in an e dimensional vector space.


So e is nothing but a little number that we choose (called a hyper-parameter). The bigger it is, the more nuanced information we can capture about our users. But make it too big, and computation will take too long.


But wait. It gets cooler. Take a look at a movie vector:

And now, analyze the (human-interpreted) meaning of the components:


Our blockbuster, m, seems to be primarily a romance movie, with a fair dose of comedy sprinkled on top. And we know all this without even watching the movie or reading a single review ourselves!

By looking at what types of users gave high and low ratings to movies, the algorithm can now build vectors that represent the essence of what a movie is like.


For the grand finale, consider how we might use this information. We have a user, u, and a movie, m. Both are vectors. How do we predict what rating u might give to m? We use the dot product.

The dot product is what you get when you multiply the components of one vector with the components of another, and add up the results. The result is a scalar (a regular, no-strings-attached, good ol’ fashioned real number).


So for our case, the dot product of u and m will be:


A measly 1350. Well, everything’s relative. But we would have gotten a considerably larger number had we not been multiplying two of the components by 0.

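That arithmetic can be checked directly. The user vector is the [100, 0, 50] from earlier; the movie vector’s values are my own assumption, picked to be consistent with a romance-heavy, somewhat-comedic movie and the article’s 1350 total:

```python
# User vector from the text: [action, romance, comedy] affinities.
u = [100, 0, 50]
# Hypothetical movie vector (assumed values): no action, mostly romance, some comedy.
m = [0, 90, 27]

# Dot product: multiply component-wise, then add up.
dot = sum(u_i * m_i for u_i, m_i in zip(u, m))
# 100*0 + 0*90 + 50*27 = 1350 -- the action term vanishes because m has no action,
# and the romance term vanishes because u's romance component is 0.
```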

It’s pretty clear that it would be a bad idea to recommend m to u. A terrible idea, in fact.


We Can Make Our Model Even Better

To get the actual rating prediction, we squish the scalar value through a scaled sigmoid function that bounds the result between 0 and 5.

If you’re a little concerned about all the hand-wavy tricks we’re doing, rest assured, the computer can figure it all out.


In fact, we’re just making its job easier, by doing things like explicitly telling it that all ratings must be greater than 0 and less than 5.

Here’s another trick — before squishing our scalar value (called an activation) into the sigmoid function, we can add a little number called the bias, b. There will be two biases, one for each user and one for each movie.


Stacking these together, we get a bias vector for all users (together), and a bias vector for all movies (together). The biases account for some movies being universally loved/hated and some users loving/hating movies in general.

And with that, I present to you the equation that can control your life (or at least, your online shopping/viewing habits):

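A hedged sketch of that equation in code: dot product plus both biases, squashed by a sigmoid and scaled into the rating range. The exact scaling used in practice may differ slightly (fast.ai, for instance, scales into a configurable range); this version assumes a plain sigmoid scaled to 0–5, and the example vectors and biases are invented just to exercise the function:

```python
import math

def predict_rating(u, m, b_u, b_m, max_rating=5.0):
    # Activation: dot product of the user and movie vectors, plus both biases.
    activation = sum(u_i * m_i for u_i, m_i in zip(u, m)) + b_u + b_m
    # Squash into (0, 1) with a sigmoid, then scale to the rating range.
    return max_rating / (1.0 + math.exp(-activation))

# Invented example vectors and biases, just to show the bounded output.
r = predict_rating(u=[0.9, -0.2, 0.4], m=[-0.1, 0.8, 0.3], b_u=0.2, b_m=-0.1)
```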

What does any of this mean?

To me, the craziest part about all this is that we are talking about human concepts. Action, romance, comedy, likes, dislikes. All of them are human ideas. And to think that they could all be communicated in a mathematical object is truly fascinating.


Now I know that it’s all just really clearly defined algorithms and human data. But I think there’s still something incredible in the fact that matrix multiplications can teach computers about who we are as individuals.


After all, despite all the things that make us different — what we like, what we look like, who we spend time with, where we are, how we think, how we interact, and how we feel — to the machines that determine what we buy, what we watch, who we talk to, what we do, where we spend our time, and where we don’t, we are all elements of the same linear vector space.


There’s beauty in that.


Translated from the original English post, “Collaborative Filtering”.

Repost source: http://dywzd.baihongyu.com/
