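Bug 6089 - pybind library is missing after installing the pytorch framework; building apex against the version from the system yum repository fails with compile errors
Summary: pybind library is missing after installing the pytorch framework; building apex against the version from the system yum repository fails with compile errors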
Status: RESOLVED FIXED
Alias: None
Product: Anolis OS 23
Classification: Anolis OS
Component: BaseOS Packages
Version: 23.0
Hardware: All Linux
Importance: P2-High S2-major
Target Milestone: ---
Assignee: xuchunmei
QA Contact: bolong_tbl
Reported: 2023-08-03 19:31 UTC by feitian200603
Modified: 2023-09-05 11:10 UTC
Description feitian200603 alibaba_cloud_group 2023-08-03 19:31:03 UTC
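In the official Ubuntu-based pytorch container image, pytorch ships its own copy of the pybind library:
/opt/conda/lib/python3.10/site-packages/torch/include/pybind11
However, pytorch installed from the system rpm package lacks this library, so python3-pybind11 has to be installed manually as an extra step.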
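A quick way to see which of the two locations actually holds the headers is to look for the main pybind11 header file. This is only a minimal sketch: `has_pybind11` is a hypothetical helper name, and `/usr/include` is assumed to be the system include path because that is where the compiler log in this report finds `pybind11/cast.h`.

```shell
#!/bin/sh
# Sketch: report whether an include directory contains the pybind11 headers.
# has_pybind11 is a hypothetical helper, not part of pytorch or pybind11.
has_pybind11() {
    # pybind11/pybind11.h is the library's main header.
    [ -f "$1/pybind11/pybind11.h" ]
}

# Candidate locations: the torch include directory from the container image
# described above, and the system include directory (assumed).
for dir in /opt/conda/lib/python3.10/site-packages/torch/include /usr/include; do
    if has_pybind11 "$dir"; then
        echo "found: $dir/pybind11"
    else
        echo "missing: $dir/pybind11"
    fi
done
```

On a machine where neither directory exists, both lines simply report "missing".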
Comment 1 xuchunmei alibaba_cloud_group 2023-08-08 15:38:56 UTC
pytorch is built against the system pybind11, so if you need pybind11 you can install the one provided by the system directly.
To install:
dnf install python3-pybind11
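After installing, one way to confirm that the package is visible to the interpreter is to ask the pybind11 Python module for its include flags. This is a sketch under the assumption that python3-pybind11 installs the module for the system `python3`; `show_pybind11_includes` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print the compiler include flags reported by the pybind11 module.
# show_pybind11_includes is a hypothetical helper used only for illustration.
show_pybind11_includes() {
    # "python3 -m pybind11 --includes" prints -I flags when the module is installed.
    if python3 -m pybind11 --includes 2>/dev/null; then
        return 0
    fi
    echo "pybind11 Python module not found for python3" >&2
    return 1
}

show_pybind11_includes || echo "install it first, for example: dnf install python3-pybind11"
```

The printed flags show which header directory a build would pick up, which helps tell the system copy apart from a copy bundled with pytorch.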
Comment 2 feitian200603 alibaba_cloud_group 2023-08-21 10:26:08 UTC
The pybind11 from the system yum repository fails during compilation:
/usr/include/pybind11/cast.h: In function ‘typename pybind11::detail::type_caster<typename pybind11::detail::intrinsic_type<T>::type>::cast_op_type<T> pybind11::detail::cast_op(make_caster<T>&)’:
/usr/include/pybind11/cast.h:45:120: error: expected template-name before ‘<’ token
   45 |     return caster.operator typename make_caster<T>::template cast_op_type<T>();
      |                                                                                                                        ^
/usr/include/pybind11/cast.h:45:120: error: expected identifier before ‘<’ token
/usr/include/pybind11/cast.h:45:123: error: expected primary-expression before ‘>’ token
   45 |     return caster.operator typename make_caster<T>::template cast_op_type<T>();
      |                                                                                                                           ^
/usr/include/pybind11/cast.h:45:126: error: expected primary-expression before ‘)’ token
   45 |     return caster.operator typename make_caster<T>::template cast_op_type<T>();
Comment 3 xuchunmei alibaba_cloud_group 2023-08-25 14:21:37 UTC
(In reply to feitian200603 from comment #2)
What were you compiling when this error occurred? Please describe the reproduction steps clearly.
Comment 4 feitian200603 alibaba_cloud_group 2023-09-04 14:48:54 UTC
Steps to reproduce:
yum install pbzip2  bzip2  iputils gcc mpich nvidia-modprobe
wget https://github.com/NVIDIA/apex/archive/refs/tags/22.03.tar.gz
tar -zxvf 22.03.tar.gz
cd apex-22.03 && pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --global-option="--cpp_ext" --global-option="--cuda_ext" ./
Comment 5 xuchunmei alibaba_cloud_group 2023-09-05 11:10:51 UTC
(In reply to feitian200603 from comment #4)
pytorch-devel on anolis23 has been updated and now includes the pybind11 header files by default.
With the following steps, apex builds successfully:
wget https://github.com/NVIDIA/apex/archive/refs/tags/22.03.tar.gz
tar -zxvf 22.03.tar.gz
cd apex-22.03
python3 setup.py build
python3 setup.py bdist_wheel