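1 Installing Ubuntu 16.04 LTS
This machine is a TensorFlow training box, so stability matters most and I chose the latest LTS release. Intel's 7th-generation CPUs were too new for Ubuntu to support well, so the installation media could not boot into the installer at all. The workaround:
- Boot from the installation USB stick (or other media) to the GRUB boot menu
- Use the up/down keys to highlight the install entry, then press e to edit its boot script
- Find quiet splash and replace it with nomodeset
- Then boot the edited entry (F10 or Ctrl+X) and install the system as usual
- When the installation is done, give the system a fixed IP address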
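For the fixed IP step, a server-style Ubuntu 16.04 install usually reads static addresses from /etc/network/interfaces. The sketch below only prints a candidate stanza for review. The interface name enp3s0, the address 192.168.1.88, the /24 netmask and the gateway 192.168.1.1 are assumptions for illustration, not values taken from this machine.

```shell
#!/bin/sh
# Print an ifupdown stanza for a static IPv4 address.
# Arguments (all hypothetical example values): interface, address, gateway.
print_static_stanza() {
    iface=$1; addr=$2; gw=$3
    printf 'auto %s\n' "$iface"
    printf 'iface %s inet static\n' "$iface"
    printf '    address %s\n' "$addr"
    printf '    netmask 255.255.255.0\n'      # assumes a /24 network
    printf '    gateway %s\n' "$gw"
    printf '    dns-nameservers %s\n' "$gw"   # assumes the router also serves DNS
}

# Review the output, then add it to /etc/network/interfaces by hand.
print_static_stanza enp3s0 192.168.1.88 192.168.1.1
```

On a desktop install managed by NetworkManager the address is normally set in the network settings dialog instead.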
2 Samba server
Samba is installed to make file exchange easy, which helps with the later steps and with copying code remotely afterwards.
2.1 Install Samba
apt-get update
apt-get install samba
2.2 Modify Config
vi /etc/samba/smb.conf
security = user # add
[homes] # modify
comment = Home Directories # modify
browseable = no # modify
path = /home/user/work # add, edit user to your user name
read only = no # modify
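The settings above can be tried out on a scratch file before touching the real configuration. This is a minimal sketch: placing security = user under [global] reflects that it is a global parameter, "user" in the path is still a placeholder, and the "# add" / "# modify" notes are left out because, as far as I know, smb.conf only treats # as a comment at the start of a line.

```shell
#!/bin/sh
# Write the share settings from this section to a scratch file for review.
# $1: path of the scratch file to create.
write_share_sketch() {
    cat > "$1" <<'EOF'
[global]
security = user

[homes]
comment = Home Directories
browseable = no
path = /home/user/work
read only = no
EOF
}

scratch=$(mktemp)
write_share_sketch "$scratch"

# testparm ships with Samba; validate the scratch file only if it is installed.
if command -v testparm >/dev/null 2>&1; then
    testparm -s "$scratch" || echo "testparm reported problems"
fi
```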
2.3 Add Samba passwd
pdbedit -a -u username
2.4 service restart
/etc/init.d/samba restart
3 SSH key-based login
3.1 Windows
3.1.1 key generate
- use puttygen to generate an RSA key pair (private key and public key)
- save the private key
- copy the public key in OpenSSH format and save it as authorized_keys
- use Samba to send authorized_keys to the server
3.1.2 server setup
cd ~
mkdir .ssh
cd .ssh
cat xxx/authorized_keys >> authorized_keys
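sshd commonly refuses keys when ~/.ssh or authorized_keys is writable by other users, so it can help to set permissions while appending the key. The helper below is a sketch of the same steps with that addition; the function name install_pubkey and the example path are hypothetical.

```shell
#!/bin/sh
# Append a public key to authorized_keys and tighten permissions.
# $1: file holding the public key
# $2: target .ssh directory (defaults to ~/.ssh)
install_pubkey() {
    key_file=$1
    ssh_dir=${2:-"$HOME/.ssh"}
    mkdir -p "$ssh_dir"                        # no error if it already exists
    chmod 700 "$ssh_dir"                       # only the owner may enter
    cat "$key_file" >> "$ssh_dir/authorized_keys"
    chmod 600 "$ssh_dir/authorized_keys"       # only the owner may read or write
}

# Example (the path is a placeholder for wherever Samba dropped the file):
#   install_pubkey /path/to/share/authorized_keys
```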
3.1.3 putty client setup
- Session -> IP & Port: 192.168.1.27 & 22
- Connection -> Data -> user name: file
- Connection -> SSH -> Auth: load the private key
- Session -> Saved Sessions: save all these settings as XXX
3.2 Linux
3.2.1 key generate
ssh-keygen -t rsa -b 2048 # 1024-bit RSA keys are considered weak; use 2048 bits or more
ssh-keygen -y -f ~/.ssh/id_rsa >> keytext
sudo ssh-keygen -f "/home/user/.ssh/known_hosts" -R 192.168.1.89 # remove any old host key for the server from known_hosts
3.2.2 server setup
cat keytext >> authorized_keys
3.2.3 client login
ssh user@192.168.1.88
3.3 Safer server login setup
Disable password login over SSH:
vim /etc/ssh/sshd_config
PasswordAuthentication no ### change
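The same change can be scripted. The function below is a sketch that rewrites an existing (possibly commented-out) PasswordAuthentication line or appends one if none is found; the function name is hypothetical, and it is worth confirming that key login works before restarting sshd, since a mistake here can lock you out.

```shell
#!/bin/sh
# Set "PasswordAuthentication no" in an sshd_config-style file.
# $1: path to the config file (try it on a copy first).
disable_password_auth() {
    cfg=$1
    if grep -Eq '^[#[:space:]]*PasswordAuthentication[[:space:]]' "$cfg"; then
        # Rewrite the existing (possibly commented-out) directive in place.
        sed -i -E 's/^[#[:space:]]*PasswordAuthentication[[:space:]].*/PasswordAuthentication no/' "$cfg"
    else
        # No directive present: append one.
        printf 'PasswordAuthentication no\n' >> "$cfg"
    fi
}

# Example on the real file, run as root:
#   disable_password_auth /etc/ssh/sshd_config
#   sshd -t && service ssh restart   # validate first; service name "ssh" is assumed
```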
4 Installing the GPU version of TensorFlow for Python 3
apt-get update
apt-get upgrade
apt-get install python3-pip
pip3 install --upgrade pip
pip3 install pylint numpy scipy matplotlib
pip3 install tensorflow-gpu
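5 Installing CUDA (run as root)
5.1 Downloading CUDA
- Download the CUDA installer from the official CUDA download page
- Download the cuDNN package from the official cuDNN download page
5.2 Installation
5.2.1 Blacklist nouveau, the system's default graphics driver
vim /etc/modprobe.d/blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0
5.2.2 Regenerate the initramfs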
sudo update-initramfs -u
5.2.3 Reboot the system
reboot
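After the reboot it is worth checking that the blacklist took effect before running the CUDA installer. The sketch below looks for nouveau in /proc/modules; the function name is hypothetical.

```shell
#!/bin/sh
# Succeeds if the nouveau module appears in a /proc/modules-style listing.
# $1: file to inspect (defaults to /proc/modules).
nouveau_loaded() {
    grep -q '^nouveau ' "${1:-/proc/modules}" 2>/dev/null
}

if nouveau_loaded; then
    echo "nouveau is still loaded; recheck the blacklist file and the initramfs"
else
    echo "nouveau is not loaded"
fi
```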
5.2.4 Log in to the TensorFlow PC over SSH and stop the X server
service lightdm stop
5.2.5 Installing CUDA
Change to the CUDA download directory and run the following commands to install (my downloaded file was named cuda_8.0.61_375.26_linux.run)
chmod +x cuda_8.0.61_375.26_linux.run
./cuda_8.0.61_375.26_linux.run
Accept the default for every installer prompt, except OpenGL
5.2.6 Installing cuDNN
Change to the cuDNN download directory and run the following commands to install (my downloaded file was named cudnn-8.0-linux-x64-v5.1.solitairetheme8)
tar -zxvf cudnn-8.0-linux-x64-v5.1.solitairetheme8
cd cuda
mv ./lib64/* /usr/local/cuda/lib64/
mv include/cudnn.h /usr/local/cuda/include/
cd /usr/local/cuda/lib64/
chmod +r libcudnn.so.5.1.10
ln -sf libcudnn.so.5.1.10 libcudnn.so.5
ln -sf libcudnn.so.5 libcudnn.so
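To confirm that the two ln -sf commands left a working chain, the link can be resolved back to the real file. This is a sketch with a hypothetical function name; the expected file name libcudnn.so.5.1.10 is the one used in the commands above.

```shell
#!/bin/sh
# Check that libcudnn.so resolves through the symlink chain to the real library.
# $1: library directory (defaults to /usr/local/cuda/lib64).
check_cudnn_links() {
    dir=${1:-/usr/local/cuda/lib64}
    target=$(readlink -f "$dir/libcudnn.so") || return 1
    # The final target must exist and be the versioned library file.
    [ -f "$target" ] && [ "$(basename "$target")" = "libcudnn.so.5.1.10" ]
}

if check_cudnn_links; then
    echo "cuDNN links look fine"
else
    echo "cuDNN links are missing or broken"
fi
```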
5.2.7 Environment configuration
- Add the CUDA environment variable to /etc/profile
vim /etc/profile
PATH=/usr/local/cuda/bin:$PATH # add at the end of profile
export PATH # add at the end of profile
- Reload the environment variables
source /etc/profile
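Each time /etc/profile is sourced, the plain assignment prepends /usr/local/cuda/bin again. A guarded variant avoids the duplicates; the sketch below uses a hypothetical helper name and then reports the nvcc version if the compiler is reachable.

```shell
#!/bin/sh
# Prepend a directory to PATH only if it is not already listed.
# $1: directory to add.
prepend_path_once() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present, leave PATH unchanged
        *) PATH="$1:$PATH" ;;
    esac
    export PATH
}

prepend_path_once /usr/local/cuda/bin

# nvcc is installed with the CUDA toolkit; print its version if it is on PATH.
if command -v nvcc >/dev/null 2>&1; then
    nvcc --version
fi
```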
5.2.8 Add the library path
- Create a new file, cuda.conf, in /etc/ld.so.conf.d/
vim /etc/ld.so.conf.d/cuda.conf
/usr/local/cuda/lib64
/lib
- Refresh the system's shared library links and cache
ldconfig -v
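ldconfig -v prints a long listing, so a narrower check can be easier to read. The sketch below filters the output of ldconfig -p for the libraries this guide installs; the helper name is hypothetical, and the library names are the ones that appear in the TensorFlow log in the verification step.

```shell
#!/bin/sh
# Succeeds if a library name appears in `ldconfig -p` style output read from stdin.
# $1: library name, e.g. libcudnn.so.5 (dots are treated as regex wildcards).
lib_in_cache() {
    grep -q "^[[:space:]]*$1 "
}

# Run the check only where ldconfig is available on PATH.
if command -v ldconfig >/dev/null 2>&1; then
    for lib in libcudnn.so.5 libcublas.so.8.0; do
        if ldconfig -p | lib_in_cache "$lib"; then
            echo "$lib: found"
        else
            echo "$lib: missing"
        fi
    done
fi
```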
5.2.9 Verify the installation
#!/usr/bin/python3
import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))
a = tf.constant(10)
b = tf.constant(32)
print(sess.run(a+b))
If the output contains the following messages, CUDA is configured correctly
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
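Scanning the log by eye is easy to get wrong, so the check can be automated. The sketch below reads log text on stdin and reports any of the five libraries listed above that has no "successfully opened" line. The function name and the script name hello_tf.py are hypothetical, and the library versions are the ones from this particular CUDA 8.0 / cuDNN 5 setup.

```shell
#!/bin/sh
# Read TensorFlow log text on stdin and report any CUDA library
# that lacks a "successfully opened CUDA library" line.
# Returns 0 if all five libraries were opened, 1 otherwise.
check_cuda_log() {
    log=$(cat)
    missing=0
    for lib in libcublas.so.8.0 libcudnn.so.5 libcufft.so.8.0 libcuda.so.1 libcurand.so.8.0; do
        if ! printf '%s\n' "$log" | grep -qF "successfully opened CUDA library $lib locally"; then
            echo "missing: $lib"
            missing=1
        fi
    done
    return "$missing"
}

# Example (TensorFlow logs to stderr, hence 2>&1):
#   python3 hello_tf.py 2>&1 | check_cuda_log && echo "all CUDA libraries opened"
```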