Upload
vuongminh
View
222
Download
1
Embed Size (px)
Citation preview
Word2vec for windows 10 64bit
Install python
Download python install package from
https://www.python.org/downloads/windows/
Install python with default settings
input `python`, if the windows output like this means the installation is successful
Install Anaconda
Download anaconda for windows 64bit from https://www.anaconda.com/download/
Install genism
Input `conda install gensim`
Training word vectors
Save the code as train.pyPS: Change the corpus directory path 1-billion-word-language-modeling-benchmark-r13output/toy
contains several tokenized txt filesclass gensim.models.word2vec.Word2Vec(sentences=None, size=100, alpha=0.025, window=5, min_count=5, max_vocab_size=None, sample=0.001, seed=1, workers=3, min_alpha=0.0001, sg=0, hs=0, negative=5, cbow_mean=1, hashfxn=<built-in function hash>, iter=5, null_word=0, trim_rule=None, sorted_vocab=1, batch_words=10000, compute_loss=False)
input `python train.py` and the program will save the word vectors as `wv.txt`
Check the word vectors
Input `python` and enter the interactive mode of pythonInput following code to load the word vectors in the `wv.txt`import genismmodel=genism.models.Word2Vec(‘wv.txt’)
input `model.most_similar(positive=[‘good’],topn=10)` to output the 10 most similar words of `good` in the word vector space.
input model.wv[‘good’] to retrieve the vector of `good`
Analogy
v(king)-v(man)=v(queen)-v(woman)
input model.most_similar(positive=['king','woman'],negative=['man'])
More information
https://radimrehurek.com/gensim/models/word2vec.html
Word2vec for Ubuntu 16.04 64bit
Install virtualenv
Open terminal and input `sudo apt-get install virtualenv`
New a virtual environment
Input `mkdir env` and input `virtualenv env`
Install genism
Input `sudo apt-get install python-dev`