Sparse vector and Dense vector

date

Dec 21, 2021

tags

Math.NET

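A dense vector is just an ordinary array of doubles. For example, the vector (1, 0, 1, 3) is written in dense format as [1, 0, 1, 3].

A sparse vector consists of two parallel arrays, indices and values. For the vector above, the values array is (1, 1, 3), which holds only the non-zero entries. The indices array is (0, 2, 3), meaning that position 0 holds 1, position 2 holds 1, and position 3 holds 3; every other position is 0. The sparse form is therefore {(0, 2, 3), (1, 1, 3)}.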

Conceptually, the two are the same: each is just a vector.

The data structure behind them is different, though. Being sparse means that the vector does not explicitly store every coordinate. I'll explain.

Consider a $d$-dimensional vector $u \in \mathbb{R}^d$, $u = (u_1, \dots, u_d)$.

Sometimes you know that your vector will have many entries with $u_i = 0$. To avoid wasting memory, you may then want to store only the values that are not 0 and treat all other values as zero. This is hugely useful when one-hot encoding is used.
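To make the memory argument concrete, here is a minimal sketch comparing how many numbers each format has to store for a one-hot vector. The helper names `one_hot_dense` and `one_hot_sparse`, and the extra `dim` field that records the vector's length, are assumptions made for this illustration and are not part of any library.

```python
def one_hot_dense(index, dim):
    """Build a one-hot vector as a plain list of length dim."""
    vec = [0.0] * dim
    vec[index] = 1.0
    return vec


def one_hot_sparse(index, dim):
    """Build the same vector as parallel id/values lists.

    The 'dim' key is an assumption of this sketch: it keeps the
    vector's length, which the id/values pair alone does not record.
    """
    return {"dim": dim, "id": [index], "values": [1.0]}


dim = 10_000
dense = one_hot_dense(42, dim)
sparse = one_hot_sparse(42, dim)

# Count the numbers each representation actually stores.
dense_count = len(dense)
sparse_count = len(sparse["id"]) + len(sparse["values"])
print(dense_count, sparse_count)
```

The dense form stores one number per coordinate, while the sparse form stores one index and one value per non-zero coordinate, so the gap grows with the dimension.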

Usually a sparse vector is represented by a tuple (id, values) such that:

$u_i = \text{values}[j]$ if $\text{id}[j] = i$;

$u_i = 0$ otherwise (if $i$ is not in id)
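The lookup rule for a sparse vector (return the stored value when the index is listed, and zero otherwise) can be sketched as below. The function name `sparse_get` is hypothetical, and the linear scan is chosen for clarity rather than speed.

```python
def sparse_get(sparse_vec, i):
    """Return coordinate i of a sparse vector stored as parallel lists."""
    for j, idx in enumerate(sparse_vec["id"]):
        if idx == i:
            # i is listed in id, so its value sits at the same position j
            return sparse_vec["values"][j]
    # i is not in id, so the coordinate is implicitly zero
    return 0


# The vector (1, 0, 1, 3) in sparse form
sparse_vec = {"id": [0, 2, 3], "values": [1, 1, 3]}
coords = [sparse_get(sparse_vec, i) for i in range(4)]
print(coords)  # [1, 0, 1, 3]
```

If the indices are kept sorted, a binary search (for instance with the standard `bisect` module) would avoid scanning the whole list.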

From a developer's point of view, getting a sparse vector from a dense vector looks like this:

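dense_vec = [1, 2, 0, 0, 5, 0, 9, 0, 0]  # example input
sparse_vec = {"id": [], "values": []}
d = len(dense_vec)
for i in range(d):
    # keep only the non-zero coordinates, together with their positions
    if dense_vec[i] != 0:
        sparse_vec["id"].append(i)
        sparse_vec["values"].append(dense_vec[i])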

For example, the dense vector (1, 2, 0, 0, 5, 0, 9, 0, 0) is represented as {(0, 1, 4, 6), (1, 2, 5, 9)}.
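Going the other way, from the sparse pair back to a dense vector, could look like the sketch below. The function name `to_dense` and its `dim` parameter are assumptions for this illustration. The dimension has to be supplied separately, because the index and value lists alone do not say how many trailing zeros the vector has.

```python
def to_dense(ids, values, dim):
    """Rebuild a dense list of length dim from parallel ids/values lists."""
    dense = [0] * dim  # every coordinate starts as an implicit zero
    for i, v in zip(ids, values):
        dense[i] = v   # write each stored value back at its position
    return dense


restored = to_dense([0, 1, 4, 6], [1, 2, 5, 9], 9)
print(restored)  # [1, 2, 0, 0, 5, 0, 9, 0, 0]
```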