Truncated SVD for Dimensionality Reduction in Sparse Feature Matrices
Discussing how truncated SVD differs from normal SVD
Sparse feature matrices, in which most of the values are zero, call for a dimensionality reduction technique that can work on the sparse representation directly. Truncated Singular Value Decomposition (Truncated SVD) is one such technique.
Sparse representation of a matrix
A feature matrix holds all the input features, with one row per observation and one column per feature, and is typically denoted by the variable X. It is the training data that we use to fit the model. When most of the values in the feature matrix are zero, it is often stored as a sparse matrix to save memory and computation time.
The following matrix has many zero elements.
We can convert it to a sparse matrix using the following code.
from scipy.sparse import csr_matrix
X_sparse = csr_matrix(X)  # X is the matrix shown above (a NumPy array)
print(X_sparse)
We get the following output.
In the sparse representation, only the non-zero elements are stored, and each one is printed in the form (row, column) value. For example, (1, 0) 1 means that the value 1 sits in the second row and the first column of the matrix (indices start at zero).
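To make the (row, column) value idea concrete, here is a minimal sketch using a small made-up matrix. The name X_demo and its values are assumptions chosen for illustration; they are not the matrix from this article. The sketch also prints the three arrays (data, indices, indptr) that SciPy's CSR format keeps internally, which is where the printed (row, column) value pairs come from.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical 4x5 feature matrix, invented for this illustration only
X_demo = np.array([
    [0, 0, 3, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 2],
    [0, 4, 0, 0, 0],
])

X_demo_sparse = csr_matrix(X_demo)

# Number of explicitly stored (non-zero) elements
print(X_demo_sparse.nnz)

# Walk over the stored entries as (row, column) value triples
coo = X_demo_sparse.tocoo()
for r, c, v in zip(coo.row, coo.col, coo.data):
    print((int(r), int(c)), int(v))

# The three arrays that the CSR format keeps internally
print(X_demo_sparse.data)     # non-zero values, row by row
print(X_demo_sparse.indices)  # column index of each stored value
print(X_demo_sparse.indptr)   # offsets marking where each row starts

# Converting back to a dense array recovers the original matrix
assert np.array_equal(X_demo_sparse.toarray(), X_demo)
```

Only the four non-zero values are stored here; the zeros are implied by their absence. Calling toarray() is a convenient check on small examples like this one, but on a large sparse matrix it materializes every zero and gives up the memory savings.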