Sparse feature matrices, in which most of the values are zero, call for dimensionality reduction techniques that can work on the sparse representation directly, such as Truncated Singular Value Decomposition (Truncated SVD).
Sparse representation of a matrix
A feature matrix is a matrix that holds all input features and is typically represented by the variable X. It is the dataset that we use for training the model. When most of the values in the feature matrix are zero, it is often stored as a sparse matrix to save memory and computing time.
The following matrix has many zero elements.
We can convert it to a sparse matrix using the following code.
from scipy.sparse import csr_matrix
X_sparse = csr_matrix(X) # Where X refers to the above matrix (numpy array)
We get the following output.
In the sparse representation, only the non-zero elements are stored, and they are printed in the form (row, column) value. For example, (1, 0) 1 denotes that the value 1 is in the second row and the first column of the matrix (indices begin at zero).
TruncatedSVD() accepts this type of sparse matrix directly.
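To make the (row, column) value format concrete, here is a minimal sketch. The matrix below is a hypothetical 3x3 example chosen for illustration, not the matrix from the article, and the variable names are assumptions.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical 3x3 matrix with two non-zero entries (illustration only)
X = np.array([[0, 0, 0],
              [1, 0, 0],
              [0, 0, 2]])

X_sparse = csr_matrix(X)

# Convert to coordinate (COO) form to read the stored entries
# as (row, column, value) triplets
coo = X_sparse.tocoo()
entries = [(int(r), int(c), int(v)) for r, c, v in zip(coo.row, coo.col, coo.data)]

for r, c, v in entries:
    print(f"({r}, {c}) {v}")
```

Only the two non-zero entries appear in the listing; the seven zeros are implied by the matrix shape rather than stored.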
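As a rough sketch of passing a sparse matrix straight to TruncatedSVD(), the example below uses a randomly generated sparse matrix. The matrix size, density, and n_components=5 are arbitrary assumptions made for illustration.

```python
from scipy.sparse import random as sparse_random
from sklearn.decomposition import TruncatedSVD

# Hypothetical sparse feature matrix: 100 samples, 50 features, 5% non-zero
X_sparse = sparse_random(100, 50, density=0.05, format="csr", random_state=0)

# Reduce 50 features to 5 components without converting to a dense array
svd = TruncatedSVD(n_components=5, random_state=0)
X_reduced = svd.fit_transform(X_sparse)

print(X_reduced.shape)
```

The reduced output is a regular dense NumPy array with one row per sample and one column per component.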
We can also compute the number of non-zero elements in the sparse matrix.
X_sparse.count_nonzero() # This outputs 2
However, the sparse matrix still has the shape (3, 3).
X_sparse.shape # This outputs (3, 3)
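The point that the shape is preserved even though only the non-zero values are stored can be checked with a short sketch. The matrix is again a hypothetical 3x3 example, not the one from the article.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical 3x3 matrix with two non-zero entries (illustration only)
X = np.array([[0, 0, 0],
              [1, 0, 0],
              [0, 0, 2]])
X_sparse = csr_matrix(X)

n_nonzero = X_sparse.count_nonzero()  # number of non-zero elements
shape = X_sparse.shape                # logical shape, zeros included

# Converting back to a dense array restores the implicit zeros
round_trip_ok = np.array_equal(X_sparse.toarray(), X)

print(n_nonzero, shape, round_trip_ok)
```

Because the shape is kept alongside the stored values, nothing is lost in the conversion: the dense matrix can be rebuilt exactly.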
What is truncated SVD?
As we have discussed previously, normal SVD is a type of matrix factorization method that comes from linear algebra.
In normal SVD, we decompose the input matrix into a product of three special matrices.
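The three-matrix product is commonly written as X = U Σ Vᵀ. A minimal sketch with NumPy is shown below; the random 5x3 input and the variable names are assumptions for illustration, and full_matrices=False is used so that the three factors multiply back without padding.

```python
import numpy as np

# Hypothetical dense input matrix (5 samples, 3 features)
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))

# U: left singular vectors, s: singular values, Vt: transposed right singular vectors
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Multiplying the three factors recovers the original matrix
X_rebuilt = U @ np.diag(s) @ Vt

print(U.shape, s.shape, Vt.shape)
print(np.allclose(X, X_rebuilt))
```

NumPy returns the singular values as a 1-D array in descending order, so np.diag(s) is needed to form the diagonal matrix Σ.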
Just like normal SVD, truncated SVD is also a type of matrix factorization method used for linear dimensionality reduction in machine…