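Bug 8029 - [ModelScope][Container image] libnccl is not installed by default, so 20 official Tongyi Lab models cannot run out of the box
Summary: [ModelScope][Container image] libnccl is not installed by default, so 20 official Tongyi Lab models cannot run out of the box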
Status: RESOLVED FIXED
Alias: None
Product: Anolis OS 8
Classification: Anolis OS
Component: Images&Installations
Version: 8.8
Hardware: All Linux
Importance: P3-Medium S2-major
Target Milestone: ---
Assignee: zhongling
QA Contact: shuming
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-01-24 10:17 UTC by bolong_tbl
Modified: 2024-02-21 10:10 UTC (History)
0 users

See Also:


Attachments
Failed models 1-10 (95.14 KB, image/png)
2024-01-24 10:17 UTC, bolong_tbl
Failed models 11-20 (93.96 KB, image/png)
2024-01-24 10:19 UTC, bolong_tbl

Description bolong_tbl alibaba_cloud_group 2024-01-24 10:17:03 UTC
Created attachment 982 [details]
Failed models 1-10

Description of problem:
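The image does not install the libnccl library by default, so 20 official Tongyi Lab models cannot run out of the box. The ModelScope community Ubuntu image installs libnccl by default and does not have this problem.

See the attachments for the list of affected models, which includes text-to-video-synthesis among others.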

Version-Release number of selected component (if applicable):
Image: registry.openanolis.cn/openanolis/modelscope:1.10.0-an8

Steps to Reproduce:
1. Start the container: docker run -it registry.openanolis.cn/openanolis/modelscope:1.10.0-an8
2. On the ModelScope website, find the code example for the model. Using text-to-video-synthesis as an example:

from modelscope.pipelines import pipeline
from modelscope.outputs import OutputKeys

p = pipeline('text-to-video-synthesis', 'damo/text-to-video-synthesis')
test_text = {
    'text': 'A panda eating bamboo on a rock.',
}
output_video_path = p(test_text, output_video='./output.mp4')[OutputKeys.OUTPUT_VIDEO]
print('output_video_path:', output_video_path)

3. Run the code example.

Actual results:

Execution fails with the following error:
/lib/python3.8/site-packages/tensorflow_core/__init__.py", line 28, in <module>
    from tensorflow.python import pywrap_tensorflow  # pylint: disable=unused-import
  File "/opt/conda/lib/python3.8/site-packages/tensorflow/__init__.py", line 50, in __getattr__
    module = self._load()
  File "/opt/conda/lib/python3.8/site-packages/tensorflow/__init__.py", line 44, in _load
    module = _importlib.import_module(self.__name__)
  File "/opt/conda/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/opt/conda/lib/python3.8/site-packages/tensorflow_core/python/__init__.py", line 49, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/opt/conda/lib/python3.8/site-packages/tensorflow_core/python/pywrap_tensorflow.py", line 74, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/tensorflow_core/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/opt/conda/lib/python3.8/site-packages/tensorflow_core/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/opt/conda/lib/python3.8/site-packages/tensorflow_core/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/opt/conda/lib/python3.8/imp.py", line 242, in load_module
    return load_dynamic(name, filename, file)
  File "/opt/conda/lib/python3.8/imp.py", line 342, in load_dynamic
    return _load(spec)
ImportError: libnccl.so.2: cannot open shared object file: No such file or directory


Failed to load the native TensorFlow runtime.

See https://www.tensorflow.org/install/errors

for some common reasons and solutions.  Include the entire stack trace
above this error message when asking for help.
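
The key line in the traceback is the final ImportError naming libnccl.so.2. As a rough sketch (not part of the original report), the missing library can be pulled out of such a message and probed directly with the standard ctypes module, without importing TensorFlow. The helper names missing_shared_library and can_load are hypothetical, chosen only for this illustration.

```python
import ctypes
import re
from typing import Optional

# Matches the dynamic loader message seen in the traceback, e.g.
# "libnccl.so.2: cannot open shared object file: No such file or directory".
_MISSING_LIB = re.compile(r"(\S+\.so(?:\.\d+)*): cannot open shared object file")


def missing_shared_library(message: str) -> Optional[str]:
    """Return the library name from a loader error message, or None."""
    match = _MISSING_LIB.search(message)
    return match.group(1) if match else None


def can_load(library: str) -> bool:
    """Return True if the dynamic loader can open the named library."""
    try:
        ctypes.CDLL(library)
    except OSError:
        # Raised when the library (or one of its dependencies) is not found.
        return False
    return True


if __name__ == "__main__":
    error = (
        "ImportError: libnccl.so.2: cannot open shared object file: "
        "No such file or directory"
    )
    name = missing_shared_library(error)
    if name is not None:
        print(name, "loadable:", can_load(name))
```

Run inside the affected container, a check like this would be expected to report the library as not loadable until libnccl is installed; the result depends entirely on the environment it runs in.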


Expected results:
The code runs without errors.

Additional info:
The ModelScope community Ubuntu image has libnccl installed by default:

root@6610060b5bc0:/# apt list |grep libnccl

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

libnccl-dev/now 2.15.5-1+cuda11.8 amd64 [installed,local]
libnccl2/now 2.15.5-1+cuda11.8 amd64 [installed,local]
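
To compare images, the package listing above can be turned into structured data. The following is a minimal sketch that assumes the "name/suite version arch [status]" line layout shown in this report. parse_apt_list is a hypothetical helper, and the bracketed status text is ignored because it varies with the system locale.

```python
import re
from typing import Dict, Tuple

# One package line looks like:
#   libnccl2/now 2.15.5-1+cuda11.8 amd64 [installed,local]
_APT_LINE = re.compile(r"^(?P<name>[^/\s]+)/\S+\s+(?P<version>\S+)\s+(?P<arch>\S+)")


def parse_apt_list(output: str) -> Dict[str, Tuple[str, str]]:
    """Map package name to (version, architecture) for each package line."""
    packages: Dict[str, Tuple[str, str]] = {}
    for line in output.splitlines():
        match = _APT_LINE.match(line.strip())
        if match:  # skips blank lines and the apt CLI warning
            packages[match["name"]] = (match["version"], match["arch"])
    return packages


if __name__ == "__main__":
    sample = """
WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

libnccl-dev/now 2.15.5-1+cuda11.8 amd64 [installed,local]
libnccl2/now 2.15.5-1+cuda11.8 amd64 [installed,local]
"""
    for name, (version, arch) in parse_apt_list(sample).items():
        print(name, version, arch)
```

Because apt itself warns that its CLI output is not stable, a parser like this is only suitable for ad hoc comparison, not for long-lived scripts.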
Comment 1 bolong_tbl alibaba_cloud_group 2024-01-24 10:19:17 UTC
Created attachment 983 [details]
Failed models 11-20
Comment 2 zhongling 2024-02-21 10:10:25 UTC
Resolved by installing libnccl in the image by default.