question@mail.ru:

What does the random_state parameter mean in sklearn.manifold.TSNE and other Scikit-Learn classes?

I tried 3 different values for random_state: None, 0, and 1.

I still do not understand what this parameter actually does. I read the documentation and an answer on this site, but it did not help.

answer@mail.ru:

The purpose of the random_state parameter (in all functions and methods from Scikit-Learn) is reproducible random values. That is, if you explicitly set random_state to a value other than None, the generated pseudorandom values will be the same on every call.

Example:

In [1]: import numpy as np

In [2]: np.random.seed(31415)

In [3]: np.random.randint(10, size=(5, 5))
Out[3]:
array([[7, 3, 5, 8, 2],
       [6, 6, 3, 5, 6],
       [0, 0, 8, 3, 6],
       [1, 6, 8, 5, 1],
       [4, 6, 9, 2, 7]])

In [4]: np.random.seed(31415)

In [5]: np.random.randint(10, size=(5, 5))
Out[5]:
array([[7, 3, 5, 8, 2],
       [6, 6, 3, 5, 6],
       [0, 0, 8, 3, 6],
       [1, 6, 8, 5, 1],
       [4, 6, 9, 2, 7]])

In [6]: np.random.seed(31415)

In [7]: np.random.randint(10, size=(5, 5))
Out[7]:
array([[7, 3, 5, 8, 2],
       [6, 6, 3, 5, 6],
       [0, 0, 8, 3, 6],
       [1, 6, 8, 5, 1],
       [4, 6, 9, 2, 7]])

P.S. If you run this code on your own computer, you will get the same values in these matrices.
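The question mentions trying None, 0, and 1. A minimal numpy sketch of how those choices differ (sklearn's random_state is backed by the same numpy machinery): a fixed seed is reproducible, different fixed seeds give different but still reproducible streams, and None draws fresh entropy on every run.

```python
import numpy as np

# Same fixed seed -> identical stream on every call
a = np.random.RandomState(0).randint(10, size=5)
b = np.random.RandomState(0).randint(10, size=5)

# Different fixed seed -> a different (but also reproducible) stream
c = np.random.RandomState(1).randint(10, size=5)

print((a == b).all())  # True
print((a == c).all())  # False

# np.random.RandomState(None) would seed from system entropy,
# so its output changes from run to run.
```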

Why is this necessary?

In machine learning tasks (and beyond), a pseudorandom number generator is often used to initialize various parameters, such as the weights of a neural network, and to randomly split a data set into training and validation sets.

Accordingly, if we want to compare several methods or several sets of parameters, then for a fair comparison we need to use the same training and validation sets every time.
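In practice you would pass random_state to sklearn.model_selection.train_test_split; the sketch below (the split helper is a hypothetical stand-in, numpy only) shows the mechanism it relies on: a seeded shuffle makes the split identical across calls, so every method sees the same data.

```python
import numpy as np

X = np.arange(100).reshape(50, 2)
y = np.arange(50)

def split(X, y, test_size=10, random_state=None):
    # Hypothetical helper: reproducible shuffle-then-split,
    # mimicking what train_test_split does with random_state.
    rng = np.random.RandomState(random_state)
    idx = rng.permutation(len(X))
    test_idx, train_idx = idx[:test_size], idx[test_size:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Two methods evaluated on the exact same split:
X_tr_a, X_te_a, *_ = split(X, y, random_state=42)
X_tr_b, X_te_b, *_ = split(X, y, random_state=42)
print((X_te_a == X_te_b).all())  # True
```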

It can also be useful to create datasets in a random but reproducible way. For example, if you have implemented several different computational methods and want to compare them or verify their correctness, you must feed them the same input data.
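A minimal sketch of such reproducible dataset generation (make_dataset is a hypothetical helper; sklearn's own generators like sklearn.datasets.make_classification take random_state the same way):

```python
import numpy as np

def make_dataset(n_samples=100, n_features=5, random_state=None):
    # Hypothetical helper: a synthetic linear-regression dataset
    # that is identical across calls for a fixed random_state.
    rng = np.random.RandomState(random_state)
    X = rng.randn(n_samples, n_features)
    w = rng.randn(n_features)
    y = X @ w + 0.1 * rng.randn(n_samples)
    return X, y

X1, y1 = make_dataset(random_state=7)
X2, y2 = make_dataset(random_state=7)
print(np.allclose(X1, X2) and np.allclose(y1, y2))  # True
```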


UPD: If you set the same random_state value, the result of t-SNE on the same input data will also be the same:

In [120]: from sklearn.manifold import TSNE

In [121]: a = np.random.rand(1000, 50)

In [122]: res1 = TSNE(n_components=2, random_state=123).fit_transform(a)

In [123]: res2 = TSNE(n_components=2, random_state=123).fit_transform(a)

In [124]: res1.sum()
Out[124]: -205.98636

In [125]: res2.sum()
Out[125]: -205.98636

In [126]: res1 == res2
Out[126]:
array([[ True,  True],
       [ True,  True],
       [ True,  True],
       ...,
       [ True,  True],
       [ True,  True],
       [ True,  True]])

In [127]: (res1 == res2).all()
Out[127]: True
