2.4. Biclustering

Biclustering can be performed with the module sklearn.cluster.bicluster. Biclustering algorithms simultaneously cluster rows and columns of a data matrix. These clusters of rows and columns are known as biclusters. Each determines a submatrix of the original data matrix with some desired properties.

For instance, given a matrix of shape (10, 10), one possible bicluster with three rows and two columns induces a submatrix of shape (3, 2):

>>> import numpy as np
>>> data = np.arange(100).reshape(10, 10)
>>> rows = np.array([0, 2, 3])[:, np.newaxis]
>>> columns = np.array([1, 2])
>>> data[rows, columns]
array([[ 1,  2],
       [21, 22],
       [31, 32]])

For visualization purposes, given a bicluster, the rows and columns of the data matrix may be rearranged to make the bicluster contiguous.

Algorithms differ in how they define biclusters. Some of the common types include:

  • constant values, constant rows, or constant columns

  • unusually high or low values

  • submatrices with low variance

  • correlated rows or columns

Algorithms also differ in how rows and columns may be assigned to biclusters, which leads to different bicluster structures. Block diagonal or checkerboard structures occur when rows and columns are divided into partitions.

If each row and each column belongs to exactly one bicluster, then rearranging the rows and columns of the data matrix reveals the biclusters on the diagonal. Here is an example of this structure where biclusters have higher average values than the other rows and columns:

[Figure: An example of biclusters formed by partitioning rows and columns.]

In the checkerboard case, each row belongs to all column clusters, and each column belongs to all row clusters. Here is an example of this structure where the variance of the values within each bicluster is small:

[Figure: An example of checkerboard biclusters.]

After fitting a model, row and column cluster membership can be found in the rows_ and columns_ attributes. rows_[i] is a binary vector with nonzero entries corresponding to rows that belong to bicluster i. Similarly, columns_[i] indicates which columns belong to bicluster i.

Some models also have row_labels_ and column_labels_ attributes. These models partition the rows and columns, such as in the block diagonal and checkerboard bicluster structures.
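
For instance, the following minimal sketch fits SpectralCoclustering (described below) on data from the make_biclusters helper in sklearn.datasets and inspects these attributes:

>>> from sklearn.datasets import make_biclusters
>>> from sklearn.cluster import SpectralCoclustering
>>> data, true_rows, true_columns = make_biclusters(
...     shape=(30, 30), n_clusters=3, noise=0.5, random_state=0)
>>> model = SpectralCoclustering(n_clusters=3, random_state=0).fit(data)
>>> model.rows_.shape  # one binary row-membership vector per bicluster
(3, 30)
>>> model.columns_.shape  # one binary column-membership vector per bicluster
(3, 30)
>>> row_labels = model.row_labels_  # partition label of each of the 30 rows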

Note

Biclustering has many other names in different fields, including co-clustering, two-mode clustering, two-way clustering, block clustering, coupled two-way clustering, etc. The names of some algorithms, such as the Spectral Co-Clustering algorithm, reflect these alternate names.

2.4.1. Spectral Co-Clustering

The SpectralCoclustering algorithm finds biclusters with values higher than those in the corresponding other rows and columns. Each row and each column belongs to exactly one bicluster, so rearranging the rows and columns to make partitions contiguous reveals these high values along the diagonal.

Note

The algorithm treats the input data matrix as a bipartite graph: the rows and columns of the matrix correspond to the two sets of vertices, and each entry corresponds to an edge between a row and a column. The algorithm approximates the normalized cut of this graph to find heavy subgraphs.

2.4.1.1. Mathematical formulation

An approximate solution to the optimal normalized cut may be found via the generalized eigenvalue decomposition of the Laplacian of the graph. Usually this would mean working directly with the Laplacian matrix. If the original data matrix \(A\) has shape \(m \times n\), the Laplacian matrix for the corresponding bipartite graph has shape \((m + n) \times (m + n)\). However, in this case it is possible to work directly with \(A\), which is smaller and more efficient.

The input matrix \(A\) is preprocessed as follows:

\[A_n = R^{-1/2} A C^{-1/2}\]

where \(R\) is the diagonal matrix with entry \(i\) equal to \(\sum_{j} A_{ij}\) and \(C\) is the diagonal matrix with entry \(j\) equal to \(\sum_{i} A_{ij}\).
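
As a rough numpy sketch of this preprocessing step (illustrative only; it assumes a dense matrix with positive row and column sums):

>>> import numpy as np
>>> rng = np.random.RandomState(0)
>>> A = rng.rand(6, 4) + 0.1                     # data matrix with positive entries
>>> R_inv_sqrt = np.diag(A.sum(axis=1) ** -0.5)  # R^{-1/2} from the row sums
>>> C_inv_sqrt = np.diag(A.sum(axis=0) ** -0.5)  # C^{-1/2} from the column sums
>>> A_n = R_inv_sqrt @ A @ C_inv_sqrt            # normalized matrix A_n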

The singular value decomposition, \(A_n = U \Sigma V^{\top}\), provides the partitions of the rows and columns of \(A_n\). A subset of the left singular vectors gives the row partitions, and a subset of the right singular vectors gives the column partitions.

The \(\ell = \lceil \log_2 k \rceil\) singular vectors, starting from the second, provide the desired partitioning information. They are used to form the matrix \(Z\):

\[Z = \begin{bmatrix} R^{-1/2} U \\ C^{-1/2} V \end{bmatrix}\]

where the columns of \(U\) are \(u_2, \dots, u_{\ell + 1}\), and similarly for \(V\).

Then the rows of \(Z\) are clustered using k-means. The first n_rows labels provide the row partitioning, and the remaining n_columns labels provide the column partitioning.
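
Continuing the sketch above (with A, A_n, R_inv_sqrt, and C_inv_sqrt as defined there), the partitioning step might look like:

>>> from sklearn.cluster import KMeans
>>> k = 2                                 # desired number of biclusters
>>> n_sv = int(np.ceil(np.log2(k)))       # l = ceil(log2(k)) singular vectors
>>> U, s, Vt = np.linalg.svd(A_n)
>>> u = U[:, 1:n_sv + 1]                  # left vectors, starting from the second
>>> v = Vt[1:n_sv + 1].T                  # right vectors, starting from the second
>>> Z = np.vstack([R_inv_sqrt @ u, C_inv_sqrt @ v])  # row embeddings, then column embeddings
>>> labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(Z)
>>> row_labels, column_labels = labels[:A.shape[0]], labels[A.shape[0]:]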

Examples:

  • A demo of the Spectral Co-Clustering algorithm: a simple example showing how to generate a data matrix with biclusters and apply this method to it.

  • Biclustering documents with the Spectral Co-clustering algorithm: an example of finding biclusters in the twenty newsgroups dataset.

References:

  • Dhillon, Inderjit S, 2001. Co-clustering documents and words using bipartite spectral graph partitioning.

2.4.2. Spectral Biclustering

The SpectralBiclustering algorithm assumes that the input data matrix has a hidden checkerboard structure. The rows and columns of a matrix with this structure may be partitioned so that the entries of any bicluster in the Cartesian product of row clusters and column clusters are approximately constant. For instance, if there are two row partitions and three column partitions, each row will belong to three biclusters, and each column will belong to two biclusters.

The algorithm partitions the rows and columns of a matrix so that a corresponding blockwise-constant checkerboard matrix provides a good approximation to the original matrix.
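
As a minimal sketch, the make_checkerboard helper in sklearn.datasets generates data with this structure, which SpectralBiclustering can then recover:

>>> from sklearn.datasets import make_checkerboard
>>> from sklearn.cluster import SpectralBiclustering
>>> data, true_rows, true_columns = make_checkerboard(
...     shape=(30, 30), n_clusters=(2, 3), noise=0.5, random_state=0)
>>> model = SpectralBiclustering(n_clusters=(2, 3), method='log',
...                              random_state=0).fit(data)
>>> model.rows_.shape  # 2 row clusters x 3 column clusters = 6 biclusters
(6, 30)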

2.4.2.1. Mathematical formulation

The input matrix \(A\) is first normalized to make the checkerboard pattern more obvious. There are three possible methods:

  • Independent row and column normalization, as in Spectral Co-Clustering. This method makes the rows sum to a constant and the columns sum to a different constant.

  • Bistochastization: repeated row and column normalization until convergence. This method makes both rows and columns sum to the same constant.

  • Log normalization: the log of the data matrix is computed: \(L = \log A\). Then the column mean \(\overline{L_{\cdot j}}\), row mean \(\overline{L_{i \cdot}}\), and overall mean \(\overline{L_{\cdot \cdot}}\) of \(L\) are computed. The final matrix is computed according to the formula

\[K_{ij} = L_{ij} - \overline{L_{i \cdot}} - \overline{L_{\cdot j}} + \overline{L_{\cdot \cdot}}\]
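
In numpy, the log normalization above might look like the following rough sketch (illustrative only; it assumes a dense matrix with positive entries):

>>> import numpy as np
>>> rng = np.random.RandomState(0)
>>> A = rng.rand(6, 4) + 1.0                   # positive data matrix
>>> L = np.log(A)
>>> row_means = L.mean(axis=1, keepdims=True)  # one mean per row
>>> col_means = L.mean(axis=0, keepdims=True)  # one mean per column
>>> K = L - row_means - col_means + L.mean()   # final normalized matrix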

After normalizing, the first few singular vectors are computed, just as in the Spectral Co-Clustering algorithm.

If log normalization was used, all the singular vectors are meaningful. However, if independent normalization or bistochastization were used, the first singular vectors, \(u_1\) and \(v_1\), are discarded. From now on, the “first” singular vectors refers to \(u_2, \dots, u_{p+1}\) and \(v_2, \dots, v_{p+1}\) except in the case of log normalization.

These singular vectors are then ranked according to which can be best approximated by a piecewise-constant vector. The approximations for each vector are found using one-dimensional k-means and scored using the Euclidean distance. Some subset of the best left and right singular vectors are selected. Next, the data is projected to this best subset of singular vectors and clustered.

For instance, if \(p\) singular vectors were calculated, the \(q\) best are found as described, where \(q < p\). Let \(U\) be the matrix with columns the \(q\) best left singular vectors, and similarly \(V\) for the right. To partition the rows, the rows of \(A\) are projected to a \(q\) dimensional space: \(A V\). Treating the \(m\) rows of this \(m \times q\) matrix as samples and clustering using k-means yields the row labels. Similarly, projecting the columns to \(A^{\top} U\) and clustering this \(n \times q\) matrix yields the column labels.
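
As a rough sketch of this projection-and-clustering step (illustrative only: for simplicity it takes the top q singular vectors instead of ranking them by the piecewise-constant criterion, and all variable names are hypothetical):

>>> import numpy as np
>>> from sklearn.cluster import KMeans
>>> rng = np.random.RandomState(0)
>>> A = rng.rand(8, 6)                   # stand-in for the normalized matrix
>>> q, n_row_clusters, n_col_clusters = 2, 2, 3
>>> U, s, Vt = np.linalg.svd(A, full_matrices=False)
>>> U_best, V_best = U[:, :q], Vt[:q].T  # top q vectors as a stand-in for the "best"
>>> row_labels = KMeans(n_clusters=n_row_clusters, n_init=10,
...     random_state=0).fit_predict(A @ V_best)    # cluster the m x q row projection
>>> column_labels = KMeans(n_clusters=n_col_clusters, n_init=10,
...     random_state=0).fit_predict(A.T @ U_best)  # cluster the n x q column projection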

Examples:

  • A demo of the Spectral Biclustering algorithm: a simple example showing how to generate a checkerboard matrix and bicluster it.

References:

  • Kluger, Yuval, et al., 2003. Spectral biclustering of microarray data: coclustering genes and conditions.

2.4.3. Biclustering evaluation

There are two ways of evaluating a biclustering result: internal and external. Internal measures, such as cluster stability, rely only on the data and the result themselves. Currently there are no internal bicluster measures in scikit-learn. External measures refer to an external source of information, such as the true solution. When working with real data the true solution is usually unknown, but biclustering artificial data may be useful for evaluating algorithms precisely because the true solution is known.

To compare a set of found biclusters to the set of true biclusters, two similarity measures are needed: a similarity measure for individual biclusters, and a way to combine these individual similarities into an overall score.

To compare individual biclusters, several measures have been used. For now, only the Jaccard index is implemented:

\[J(A, B) = \frac{|A \cap B|}{|A| + |B| - |A \cap B|}\]

where \(A\) and \(B\) are biclusters, and \(|A \cap B|\) is the number of elements in their intersection. The Jaccard index achieves its minimum of 0 when the biclusters do not overlap at all and its maximum of 1 when they are identical.
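
For two biclusters given as sets of row and column indices, the Jaccard index could be computed as in the following sketch (not a scikit-learn API; a bicluster's elements are the cells in the Cartesian product of its rows and columns):

>>> def jaccard(rows_a, cols_a, rows_b, cols_b):
...     a = {(i, j) for i in rows_a for j in cols_a}  # cells of bicluster A
...     b = {(i, j) for i in rows_b for j in cols_b}  # cells of bicluster B
...     return len(a & b) / (len(a) + len(b) - len(a & b))
>>> jaccard({0, 1}, {0, 1}, {1, 2}, {1, 2})  # they share only the cell (1, 1)
0.14285714285714285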

Several methods have been developed to compare two sets of biclusters. For now, only consensus_score (Hochreiter et al., 2010) is available:

  • Compute bicluster similarities for pairs of biclusters, one in each set, using the Jaccard index or a similar measure.

  • Assign biclusters from one set to another in a one-to-one fashion to maximize the sum of their similarities. This step is performed using the Hungarian algorithm.

  • The final sum of similarities is divided by the size of the larger set.

The minimum consensus score, 0, occurs when all pairs of biclusters are totally dissimilar. The maximum score, 1, occurs when both sets are identical.
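
Here is a minimal sketch using consensus_score from sklearn.metrics; the biclusters_ attribute bundles rows_ and columns_ into the (rows, columns) tuple format the function expects, and make_biclusters supplies the true biclusters:

>>> from sklearn.datasets import make_biclusters
>>> from sklearn.cluster import SpectralCoclustering
>>> from sklearn.metrics import consensus_score
>>> data, true_rows, true_columns = make_biclusters(
...     shape=(30, 30), n_clusters=3, noise=0.5, random_state=0)
>>> model = SpectralCoclustering(n_clusters=3, random_state=0).fit(data)
>>> score = consensus_score(model.biclusters_, (true_rows, true_columns))
>>> 0.0 <= score <= 1.0
True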

References:

  • Hochreiter, Bodenhofer, et al., 2010. FABIA: factor analysis for bicluster acquisition. Bioinformatics 26(12), 1520-1527.