Sparse vector and Dense vector
date
Dec 21, 2021
tags
Math.NET
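A dense vector is just an ordinary array of doubles. For example, the vector (1, 0, 1, 3) is written in dense format as [1, 0, 1, 3].
A sparse vector consists of two parallel arrays, indices and values. For the vector above, the values array is (1, 1, 3), which holds only the non-zero entries, and the indices array is (0, 2, 3): position 0 holds the value 1, position 2 holds the value 1, and position 3 holds the value 3. Every other position is 0, so the sparse form is {(0, 2, 3), (1, 1, 3)}.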
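As a rough illustration of the two-array layout described above, here is a minimal Python sketch. The `SparseVector` class and its `size` field are hypothetical names introduced for this example, not the API of Math.NET or any other library. The `size` field is an assumption: the two arrays alone do not record how long the original vector was, so it has to be kept somewhere.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SparseVector:
    """Hypothetical container for illustration only."""
    size: int             # total number of positions, zeros included
    indices: List[int]    # positions of the non-zero entries
    values: List[float]   # the non-zero entries, parallel to indices

    def to_dense(self) -> List[float]:
        dense = [0.0] * self.size          # start from all zeros
        for i, v in zip(self.indices, self.values):
            dense[i] = v                   # fill in the stored entries
        return dense


sv = SparseVector(size=4, indices=[0, 2, 3], values=[1.0, 1.0, 3.0])
print(sv.to_dense())  # [1.0, 0.0, 1.0, 3.0]
```

Expanding back to dense form is mostly useful for checking; in practice the point of the sparse layout is to avoid materializing the zeros at all.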
Conceptually the two are the same thing: just a vector.
The data structure behind them is different, though. Being sparse means that the vector does not explicitly store every coordinate. I'll explain.
Consider a $d$-dimensional vector $u \in \mathbb{R}^d$, $u = (u_1, \dots, u_d)$.
Sometimes you know in advance that many entries of your vector satisfy $u_i = 0$. To avoid wasting memory, you can store only the non-zero values and treat every other entry as zero. This is hugely useful with one-hot encoding, where exactly one entry is non-zero.
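To make the memory argument concrete, the sketch below counts how many numbers each form has to store for a one-hot vector. The helper names `one_hot_dense` and `one_hot_sparse`, the size of 10,000, and the index 42 are all made up for illustration; counting stored numbers is only a rough proxy for actual memory use.

```python
def one_hot_dense(index: int, size: int) -> list:
    """Dense one-hot vector: `size` stored numbers, all zero except one."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def one_hot_sparse(index: int) -> dict:
    """Sparse one-hot vector: one stored index and one stored value."""
    return {"id": [index], "values": [1.0]}


size = 10_000                      # hypothetical vocabulary size
dense = one_hot_dense(42, size)
sparse = one_hot_sparse(42)

dense_count = len(dense)                                  # every entry is stored
sparse_count = len(sparse["id"]) + len(sparse["values"])  # one index + one value
print(dense_count, sparse_count)  # 10000 2
```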
Usually a sparse vector is represented by a tuple (id, values) such that:
$u_i = \text{values}[j]$ if $\text{id}[j] = i$;
$u_i = 0$ otherwise (if $i$ is not in id).
From a developer's point of view, building a sparse vector from a dense vector looks like this:
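dense_vec = [1, 2, 0, 0, 5, 0, 9, 0, 0]  # example input
sparse_vec = {"id": [], "values": []}
d = len(dense_vec)
for i in range(d):
    # keep only the non-zero entries and remember their positions
    if dense_vec[i] != 0:
        sparse_vec["id"].append(i)
        sparse_vec["values"].append(dense_vec[i])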
For example, the dense vector (1, 2, 0, 0, 5, 0, 9, 0, 0) is represented as {(0, 1, 4, 6), (1, 2, 5, 9)}.
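Going the other way, the lookup rule given earlier (an entry equals values[j] when id[j] equals i, and 0 otherwise) can be sketched as a small function. `sparse_get` is a hypothetical helper name, and the binary search assumes the id list is sorted in ascending order, which holds when it is built by scanning the dense vector from left to right.

```python
from bisect import bisect_left


def sparse_get(sparse_vec: dict, i: int) -> float:
    """Return u_i: values[j] if id[j] == i, otherwise 0."""
    ids = sparse_vec["id"]
    j = bisect_left(ids, i)            # binary search; requires sorted ids
    if j < len(ids) and ids[j] == i:
        return sparse_vec["values"][j]
    return 0


sparse_vec = {"id": [0, 1, 4, 6], "values": [1, 2, 5, 9]}
d = 9  # length of the original dense vector, which must be tracked separately
rebuilt = [sparse_get(sparse_vec, i) for i in range(d)]
print(rebuilt)  # [1, 2, 0, 0, 5, 0, 9, 0, 0]
```

A linear scan over id would also work; the binary search only matters when the number of stored entries is large.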