Description of problem:
The tensorboard used by default in the image appears to be missing a dependency, which causes a test case shipped with the modelscope community repository to fail. The traceback below ends in AttributeError: module 'numpy' has no attribute 'typeDict', raised while h5py is being imported by TensorFlow.

Version-Release number of selected component (if applicable):
Container image information:
[root@localhost modelscope]# docker images
REPOSITORY                                     TAG          IMAGE ID       CREATED       SIZE
registry.openanolis.cn/openanolis/modelscope   1.10.0-an8   736efcaf9a7c   5 weeks ago   25.3GB

How reproducible:
Create and start the container with GPU access:
docker create --gpus all -it -v /tmp:/tmp 736efcaf9a7c
Enter the container and clone the modelscope community code:
git clone https://github.com/modelscope/modelscope.git
cd modelscope
Run the test case:
[root@1a186991ac23 modelscope]# python3 tests/trainers/test_translation_evaluation_trainer.py
2024-01-25 22:52:38,805 - modelscope - INFO - PyTorch version 2.0.1+cu118 Found.
2024-01-25 22:52:38,809 - modelscope - INFO - TensorFlow version 2.9.2 Found.
2024-01-25 22:52:38,809 - modelscope - INFO - Loading ast index from /root/.cache/modelscope/ast_indexer
2024-01-25 22:52:38,850 - modelscope - INFO - Loading done! Current index file version is 1.10.0, with md5 da292646adab6334a8f0cd8a272bf9b1 and a total number of 946 components indexed
2024-01-25 22:52:45,061 - modelscope - WARNING - Model revision not specified, use revision: v2.6.0
2024-01-25 22:52:45,942 - modelscope - WARNING - Model revision not specified, use revision: v2.6.0
/opt/conda/lib/python3.8/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:326: DeprecationWarning: `np.bool8` is a deprecated alias for `np.bool_`. (Deprecated NumPy 1.24)
  np.bool8: (False, True),
/opt/conda/lib/python3.8/site-packages/tensorflow/python/framework/dtypes.py:205: DeprecationWarning: `np.bool8` is a deprecated alias for `np.bool_`. (Deprecated NumPy 1.24)
  np.bool8: (False, True),
<frozen importlib._bootstrap>:219: RuntimeWarning: numpy.ndarray size changed, may indicate binary incompatibility.
Expected 80 from C header, got 96 from PyObject
2024-01-25 22:52:47,942 - modelscope - WARNING - Model revision not specified, use revision: v2.6.0
2024-01-25 22:52:48,687 - modelscope - WARNING - Model revision not specified, use revision: v2.6.0
E2024-01-25 22:52:49,647 - modelscope - WARNING - Model revision not specified, use revision: v2.6.0
2024-01-25 22:52:50,525 - modelscope - WARNING - Model revision not specified, use revision: v2.6.0
2024-01-25 22:52:51,175 - modelscope - WARNING - Model revision not specified, use revision: v2.6.0
2024-01-25 22:52:51,861 - modelscope - WARNING - Model revision not specified, use revision: v2.6.0
E
======================================================================
ERROR: test_run_with_unite_mup_base (__main__.TranslationEvaluationTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/tensorboard/compat/__init__.py", line 42, in tf
    from tensorboard.compat import notf  # noqa: F401
ImportError: cannot import name 'notf' from 'tensorboard.compat' (/opt/conda/lib/python3.8/site-packages/tensorboard/compat/__init__.py)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "tests/trainers/test_translation_evaluation_trainer.py", line 37, in test_run_with_unite_mup_base
    trainer = build_trainer(name=self.name, default_args=default_args)
  File "/opt/conda/lib/python3.8/site-packages/modelscope/trainers/builder.py", line 39, in build_trainer
    return build_from_cfg(cfg, TRAINERS, default_args=default_args)
  File "/opt/conda/lib/python3.8/site-packages/modelscope/utils/registry.py", line 184, in build_from_cfg
    LazyImportModule.import_module(sig)
  File "/opt/conda/lib/python3.8/site-packages/modelscope/utils/import_utils.py", line 475, in import_module
    importlib.import_module(module_name)
  File "/opt/conda/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 843, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/opt/conda/lib/python3.8/site-packages/modelscope/trainers/nlp/translation_evaluation_trainer.py", line 17, in <module>
    from torch.utils.tensorboard import SummaryWriter
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/tensorboard/__init__.py", line 12, in <module>
    from .writer import FileWriter, SummaryWriter  # noqa: F401
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/tensorboard/writer.py", line 16, in <module>
    from ._embedding import (
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/tensorboard/_embedding.py", line 9, in <module>
    _HAS_GFILE_JOIN = hasattr(tf.io.gfile, "join")
  File "/opt/conda/lib/python3.8/site-packages/tensorboard/lazy.py", line 65, in __getattr__
    return getattr(load_once(self), attr_name)
  File "/opt/conda/lib/python3.8/site-packages/tensorboard/lazy.py", line 97, in wrapper
    cache[arg] = f(arg)
  File "/opt/conda/lib/python3.8/site-packages/tensorboard/lazy.py", line 50, in load_once
    module = load_fn()
  File "/opt/conda/lib/python3.8/site-packages/tensorboard/compat/__init__.py", line 45, in tf
    import tensorflow
  File "/opt/conda/lib/python3.8/site-packages/tensorflow/__init__.py", line 37, in <module>
    from tensorflow.python.tools import module_util as _module_util
  File "/opt/conda/lib/python3.8/site-packages/tensorflow/python/__init__.py", line 45, in <module>
    from tensorflow.python.feature_column import feature_column_lib as feature_column
  File "/opt/conda/lib/python3.8/site-packages/tensorflow/python/feature_column/feature_column_lib.py", line 18, in <module>
    from tensorflow.python.feature_column.feature_column import *
  File "/opt/conda/lib/python3.8/site-packages/tensorflow/python/feature_column/feature_column.py", line 143, in <module>
    from tensorflow.python.layers import base
  File "/opt/conda/lib/python3.8/site-packages/tensorflow/python/layers/base.py", line 16, in <module>
    from tensorflow.python.keras.legacy_tf_layers import base
  File "/opt/conda/lib/python3.8/site-packages/tensorflow/python/keras/__init__.py", line 25, in <module>
    from tensorflow.python.keras import models
  File "/opt/conda/lib/python3.8/site-packages/tensorflow/python/keras/models.py", line 22, in <module>
    from tensorflow.python.keras.engine import functional
  File "/opt/conda/lib/python3.8/site-packages/tensorflow/python/keras/engine/functional.py", line 32, in <module>
    from tensorflow.python.keras.engine import training as training_lib
  File "/opt/conda/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 52, in <module>
    from tensorflow.python.keras.saving import hdf5_format
  File "/opt/conda/lib/python3.8/site-packages/tensorflow/python/keras/saving/hdf5_format.py", line 37, in <module>
    import h5py
  File "/opt/conda/lib/python3.8/site-packages/h5py/__init__.py", line 46, in <module>
    from ._conv import register_converters as _register_converters
  File "h5py/h5t.pxd", line 14, in init h5py._conv
  File "h5py/h5t.pyx", line 293, in init h5py.h5t
  File "/opt/conda/lib/python3.8/site-packages/numpy/__init__.py", line 320, in __getattr__
    raise AttributeError("module {!r} has no attribute "
AttributeError: module 'numpy' has no attribute 'typeDict'

======================================================================
ERROR: test_run_with_unite_mup_large (__main__.TranslationEvaluationTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/tensorboard/compat/__init__.py", line 42, in tf
    from tensorboard.compat import notf  # noqa: F401
ImportError: cannot import name 'notf' from 'tensorboard.compat' (/opt/conda/lib/python3.8/site-packages/tensorboard/compat/__init__.py)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "tests/trainers/test_translation_evaluation_trainer.py", line 31, in test_run_with_unite_mup_large
    trainer = build_trainer(name=self.name, default_args=default_args)
  File "/opt/conda/lib/python3.8/site-packages/modelscope/trainers/builder.py", line 39, in build_trainer
    return build_from_cfg(cfg, TRAINERS, default_args=default_args)
  File "/opt/conda/lib/python3.8/site-packages/modelscope/utils/registry.py", line 184, in build_from_cfg
    LazyImportModule.import_module(sig)
  File "/opt/conda/lib/python3.8/site-packages/modelscope/utils/import_utils.py", line 475, in import_module
    importlib.import_module(module_name)
  File "/opt/conda/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 843, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/opt/conda/lib/python3.8/site-packages/modelscope/trainers/nlp/translation_evaluation_trainer.py", line 17, in <module>
    from torch.utils.tensorboard import SummaryWriter
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/tensorboard/__init__.py", line 12, in <module>
    from .writer import FileWriter, SummaryWriter  # noqa: F401
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/tensorboard/writer.py", line 16, in <module>
    from ._embedding import (
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/tensorboard/_embedding.py", line 9, in <module>
    _HAS_GFILE_JOIN = hasattr(tf.io.gfile, "join")
  File "/opt/conda/lib/python3.8/site-packages/tensorboard/lazy.py", line 65, in __getattr__
    return getattr(load_once(self), attr_name)
  File "/opt/conda/lib/python3.8/site-packages/tensorboard/lazy.py", line 97, in wrapper
    cache[arg] = f(arg)
  File "/opt/conda/lib/python3.8/site-packages/tensorboard/lazy.py", line 50, in load_once
    module = load_fn()
  File "/opt/conda/lib/python3.8/site-packages/tensorboard/compat/__init__.py", line 45, in tf
    import tensorflow
  File "/opt/conda/lib/python3.8/site-packages/tensorflow/__init__.py", line 37, in <module>
    from tensorflow.python.tools import module_util as _module_util
  File "/opt/conda/lib/python3.8/site-packages/tensorflow/python/__init__.py", line 45, in <module>
    from tensorflow.python.feature_column import feature_column_lib as feature_column
  File "/opt/conda/lib/python3.8/site-packages/tensorflow/python/feature_column/feature_column_lib.py", line 18, in <module>
    from tensorflow.python.feature_column.feature_column import *
  File "/opt/conda/lib/python3.8/site-packages/tensorflow/python/feature_column/feature_column.py", line 143, in <module>
    from tensorflow.python.layers import base
  File "/opt/conda/lib/python3.8/site-packages/tensorflow/python/layers/base.py", line 16, in <module>
    from tensorflow.python.keras.legacy_tf_layers import base
  File "/opt/conda/lib/python3.8/site-packages/tensorflow/python/keras/__init__.py", line 25, in <module>
    from tensorflow.python.keras import models
  File "/opt/conda/lib/python3.8/site-packages/tensorflow/python/keras/models.py", line 22, in <module>
    from tensorflow.python.keras.engine import functional
  File "/opt/conda/lib/python3.8/site-packages/tensorflow/python/keras/engine/functional.py", line 32, in <module>
    from tensorflow.python.keras.engine import training as training_lib
  File "/opt/conda/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 52, in <module>
    from tensorflow.python.keras.saving import
hdf5_format
  File "/opt/conda/lib/python3.8/site-packages/tensorflow/python/keras/saving/hdf5_format.py", line 37, in <module>
    import h5py
  File "/opt/conda/lib/python3.8/site-packages/h5py/__init__.py", line 46, in <module>
    from ._conv import register_converters as _register_converters
  File "h5py/h5t.pxd", line 14, in init h5py._conv
  File "h5py/h5t.pyx", line 293, in init h5py.h5t
  File "/opt/conda/lib/python3.8/site-packages/numpy/__init__.py", line 320, in __getattr__
    raise AttributeError("module {!r} has no attribute "
AttributeError: module 'numpy' has no attribute 'typeDict'

----------------------------------------------------------------------
Ran 2 tests in 7.831s

FAILED (errors=2)

Steps to Reproduce:
See above.

Actual results:
The test fails.

Expected results:
The test passes.

Additional info:
For comparison, the same test run on the official modelscope Ubuntu image:
root@ee8906738fa2:/tmp/modelscope# python3 tests/trainers/test_translation_evaluation_trainer.py
2024-01-25 22:53:33,569 - modelscope - INFO - PyTorch version 2.1.0+cu118 Found.
2024-01-25 22:53:33,571 - modelscope - INFO - TensorFlow version 2.14.0 Found.
2024-01-25 22:53:33,571 - modelscope - INFO - Loading ast index from /mnt/workspace/.cache/modelscope/ast_indexer
2024-01-25 22:53:33,616 - modelscope - INFO - Loading done! Current index file version is 1.10.0, with md5 44f0b88effe82ceea94a98cf99709694 and a total number of 946 components indexed
2024-01-25 22:53:41,046 - modelscope - WARNING - Model revision not specified, use revision: v2.6.0
2024-01-25 22:53:41,904 - modelscope - WARNING - Model revision not specified, use revision: v2.6.0
/opt/conda/lib/python3.10/site-packages/tensorflow/__init__.py:29: DeprecationWarning: The distutils package is deprecated and slated for removal in Python 3.12.
Use setuptools or check PEP 632 for potential alternatives import distutils as _distutils 2024-01-25 22:53:42.491129: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered 2024-01-25 22:53:42.491173: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered 2024-01-25 22:53:42.491216: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered 2024-01-25 22:53:42.501214: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags. /opt/conda/lib/python3.10/site-packages/tensorflow/python/framework/dtypes.py:35: DeprecationWarning: ml_dtypes.float8_e4m3b11 is deprecated. 
Use ml_dtypes.float8_e4m3b11fnuz from tensorflow.tsl.python.lib.core import pywrap_ml_dtypes 2024-01-25 22:53:43.462065: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT 2024-01-25 22:53:44,670 - modelscope - WARNING - Model revision not specified, use revision: v2.6.0 2024-01-25 22:53:45,076 - modelscope - INFO - initialize model from /mnt/workspace/.cache/modelscope/damo/nlp_unite_mup_translation_evaluation_multilingual_base 2024-01-25 22:53:53,190 - modelscope - INFO - ==========================Training Config Start========================== 2024-01-25 22:53:53,191 - modelscope - INFO - { "framework": "pytorch", "task": "translation-evaluation", "pipeline": { "type": "translation-evaluation" }, "preprocessor": { "type": "translation-evaluation-preprocessor", "max_len": 510, "pad_token_id": 1, "eos_token_id": 2 }, "model": { "attention_probs_dropout_prob": 0.1, "bos_token_id": 0, "eos_token_id": 2, "hidden_act": "gelu", "hidden_dropout_prob": 0.1, "hidden_size": 768, "initializer_range": 0.02, "intermediate_size": 3072, "layer_norm_eps": 1e-05, "max_position_embeddings": 514, "model_type": "unite", "num_attention_heads": 12, "num_hidden_layers": 12, "output_past": true, "pad_token_id": 1, "type_vocab_size": 1, "use_cache": true, "vocab_size": 250002, "mlp_hidden_sizes": [ 3072, 1024 ], "mlp_act": "tanh", "mlp_final_act": null, "mlp_dropout": 0.1, "type": "unite" }, "dataset": { "train": { "name": "train.csv", "split": "train" }, "valid": { "name": "eval.csv", "split": "eval" } }, "train": { "initialize_model_with_checkpoint": true, "num_gpus": 1, "batch_size": 2, "seed": 12, "optimizer": { "type": "AdamW", "plm_lr": 1e-05, "betas": [ 0.9, 0.98 ], "eps": 1e-09, "weight_decay": 0.0, "plm_lr_layerwise_decay": 0.95, "mlp_lr": 3e-05, "options": { "cumulative_iters": 4, "grad_clip": null } }, "lr_scheduler": { "type": "ConstantLR", "factor": 1.0, "total_iters": 3 }, "max_epochs": 3, "work_dir": "experiments_unite_base/", 
"hooks": [ { "type": "TensorboardHook", "interval": 1 }, { "type": "IterTimerHook" } ], "logging": { "interval": 1 }, "checkpoint": { "best": { "metric_key": "src-ref_avg", "rule": "max" }, "period": { "interval": 1 } } }, "evaluation": { "batch_size": 4, "save_outputs": true, "metrics": [ { "type": "translation-evaluation-metric", "gap_threshold": 25.0 } ], "period": { "interval": 1 } } } 2024-01-25 22:53:53,191 - modelscope - INFO - ===========================Training Config End=========================== 2024-01-25 22:53:53,191 - modelscope - INFO - Building dataloader for training ... 2024-01-25 22:53:53,191 - modelscope - INFO - Reading train csv file from train.csv ... /opt/conda/lib/python3.10/site-packages/datasets/load.py:2096: FutureWarning: 'ignore_verifications' was deprecated in favor of 'verification_mode' in version 2.9.1 and will be removed in 3.0.0. You can remove this warning by passing 'verification_mode=no_checks' instead. warnings.warn( 2024-01-25 22:53:54,475 - modelscope - INFO - 109 samples are given for training. Using 36 samples for each input format. Leaving the last 1 samples unused. 2024-01-25 22:53:54,476 - modelscope - INFO - Reading done, 109 items in total 2024-01-25 22:53:54,476 - modelscope - INFO - Building AdamW optimizer ... 
2024-01-25 22:53:54,478 - modelscope - WARNING - ('LR_SCHEDULER', 'default', 'ConstantLR') not found in ast index file 2024-01-25 22:53:54,479 - modelscope - INFO - Stage: before_run: (ABOVE_NORMAL) OptimizerHook (LOW ) LrSchedulerHook (LOW ) BestCkptSaverHook (LOW ) CheckpointHook (VERY_LOW ) TextLoggerHook (VERY_LOW ) TensorboardHook -------------------- Stage: before_train_epoch: (LOW ) LrSchedulerHook -------------------- Stage: before_train_iter: (ABOVE_NORMAL) OptimizerHook -------------------- Stage: after_train_iter: (ABOVE_NORMAL) OptimizerHook (NORMAL ) EvaluationHook (LOW ) LrSchedulerHook (LOW ) BestCkptSaverHook (LOW ) CheckpointHook (VERY_LOW ) TextLoggerHook (VERY_LOW ) TensorboardHook -------------------- Stage: after_train_epoch: (NORMAL ) EvaluationHook (LOW ) LrSchedulerHook (LOW ) BestCkptSaverHook (LOW ) CheckpointHook (VERY_LOW ) TextLoggerHook (VERY_LOW ) TensorboardHook -------------------- Stage: after_val_epoch: (VERY_LOW ) TextLoggerHook (VERY_LOW ) TensorboardHook -------------------- Stage: after_run: (LOW ) BestCkptSaverHook (LOW ) CheckpointHook (VERY_LOW ) TensorboardHook -------------------- 2024-01-25 22:53:54,495 - modelscope - INFO - Checkpoints will be saved to experiments_unite_base/ 2024-01-25 22:53:54,509 - modelscope - INFO - Checkpoints will be saved to experiments_unite_base/ 2024-01-25 22:53:54,509 - modelscope - INFO - Text logs will be saved to experiments_unite_base/ 2024-01-25 22:53:54,509 - modelscope - INFO - tensorboard files will be saved to experiments_unite_base/tensorboard_output 2024-01-25 22:53:57,830 - modelscope - INFO - epoch [1][1/18] lr: 1.000e-05, eta: 0:02:52, iter_time: 3.246, data_load_time: 2.394, memory: 1639, loss: 1.5867 2024-01-25 22:53:58,017 - modelscope - INFO - epoch [1][2/18] lr: 1.000e-05, eta: 0:01:28, iter_time: 0.173, data_load_time: 0.077, memory: 2315, loss: 0.4219 2024-01-25 22:53:58,131 - modelscope - INFO - epoch [1][3/18] lr: 1.000e-05, eta: 0:01:00, iter_time: 0.144, 
data_load_time: 0.089, memory: 2315, loss: 1.2569 2024-01-25 22:53:58,306 - modelscope - INFO - epoch [1][4/18] lr: 1.000e-05, eta: 0:00:46, iter_time: 0.147, data_load_time: 0.061, memory: 2410, loss: 0.9410 2024-01-25 22:53:58,539 - modelscope - INFO - epoch [1][5/18] lr: 1.000e-05, eta: 0:00:38, iter_time: 0.185, data_load_time: 0.087, memory: 3746, loss: 1.7758 2024-01-25 22:53:58,648 - modelscope - INFO - epoch [1][6/18] lr: 1.000e-05, eta: 0:00:32, iter_time: 0.189, data_load_time: 0.134, memory: 3746, loss: 1.5786 2024-01-25 22:53:58,832 - modelscope - INFO - epoch [1][7/18] lr: 1.000e-05, eta: 0:00:28, iter_time: 0.137, data_load_time: 0.056, memory: 3746, loss: 1.5095 2024-01-25 22:53:58,972 - modelscope - INFO - epoch [1][8/18] lr: 1.000e-05, eta: 0:00:25, iter_time: 0.162, data_load_time: 0.101, memory: 3746, loss: 1.4335 2024-01-25 22:53:59,087 - modelscope - INFO - epoch [1][9/18] lr: 1.000e-05, eta: 0:00:22, iter_time: 0.134, data_load_time: 0.079, memory: 3746, loss: 0.2827 2024-01-25 22:53:59,195 - modelscope - INFO - epoch [1][10/18] lr: 1.000e-05, eta: 0:00:20, iter_time: 0.114, data_load_time: 0.061, memory: 3746, loss: 0.7218 2024-01-25 22:53:59,294 - modelscope - INFO - epoch [1][11/18] lr: 1.000e-05, eta: 0:00:18, iter_time: 0.104, data_load_time: 0.055, memory: 3746, loss: 1.3647 2024-01-25 22:53:59,465 - modelscope - INFO - epoch [1][12/18] lr: 1.000e-05, eta: 0:00:16, iter_time: 0.122, data_load_time: 0.049, memory: 3746, loss: 1.1365 2024-01-25 22:53:59,551 - modelscope - INFO - epoch [1][13/18] lr: 1.000e-05, eta: 0:00:15, iter_time: 0.143, data_load_time: 0.099, memory: 3746, loss: 1.1352 2024-01-25 22:53:59,719 - modelscope - INFO - epoch [1][14/18] lr: 1.000e-05, eta: 0:00:14, iter_time: 0.118, data_load_time: 0.043, memory: 3746, loss: 0.3775 2024-01-25 22:53:59,865 - modelscope - INFO - epoch [1][15/18] lr: 1.000e-05, eta: 0:00:13, iter_time: 0.158, data_load_time: 0.091, memory: 3746, loss: 0.9682 2024-01-25 22:54:00,055 - 
modelscope - INFO - epoch [1][16/18] lr: 1.000e-05, eta: 0:00:12, iter_time: 0.160, data_load_time: 0.080, memory: 3746, loss: 0.8070 2024-01-25 22:54:00,176 - modelscope - INFO - epoch [1][17/18] lr: 1.000e-05, eta: 0:00:12, iter_time: 0.165, data_load_time: 0.110, memory: 3746, loss: 2.2128 2024-01-25 22:54:00,270 - modelscope - INFO - epoch [1][18/18] lr: 1.000e-05, eta: 0:00:11, iter_time: 0.114, data_load_time: 0.066, memory: 3746, loss: 5.4629 2024-01-25 22:54:00,330 - modelscope - INFO - Building dataloader for evaluating ... 2024-01-25 22:54:00,331 - modelscope - INFO - Reading eval csv file from eval.csv ... 2024-01-25 22:54:01,364 - modelscope - INFO - Reading done, 120 items in total Total test samples: 100%|█████████████████████████████████████| 120/120 [00:04<00:00, 29.08it/s] 2024-01-25 22:54:05,495 - modelscope - INFO - Evaluation results for src-ref input format 2024-01-25 22:54:05,508 - modelscope - INFO - zh-en: 33.061224 2024-01-25 22:54:05,508 - modelscope - INFO - Average evaluation result for src-ref input format: 0.330612 2024-01-25 22:54:05,508 - modelscope - INFO - Total test samples: 100%|█████████████████████████████████████| 120/120 [00:03<00:00, 34.11it/s] 2024-01-25 22:54:09,029 - modelscope - INFO - Evaluation results for src input format 2024-01-25 22:54:09,040 - modelscope - INFO - zh-en: 40.408163 2024-01-25 22:54:09,041 - modelscope - INFO - Average evaluation result for src input format: 0.404082 2024-01-25 22:54:09,041 - modelscope - INFO - Total test samples: 100%|█████████████████████████████████████| 120/120 [00:03<00:00, 33.15it/s] 2024-01-25 22:54:12,664 - modelscope - INFO - Evaluation results for ref input format 2024-01-25 22:54:12,675 - modelscope - INFO - zh-en: 7.755102 2024-01-25 22:54:12,675 - modelscope - INFO - Average evaluation result for ref input format: 0.077551 2024-01-25 22:54:12,675 - modelscope - INFO - 2024-01-25 22:54:12,676 - modelscope - INFO - Saving checkpoint at 1 epoch 2024-01-25 22:54:17,963 - 
modelscope - INFO - Saving checkpoint at 1 epoch 2024-01-25 22:54:23,228 - modelscope - INFO - epoch(eval) [1][30] memory: 3746, evaluation/src-ref_avg: 0.3306, evaluation/src-ref_zh-en: 0.3306, evaluation/src_avg: 0.4041, evaluation/src_zh-en: 0.4041, evaluation/ref_avg: 0.0776, evaluation/ref_zh-en: 0.0776 2024-01-25 22:54:25,752 - modelscope - INFO - epoch [2][1/18] lr: 1.000e-05, eta: 0:00:15, iter_time: 2.463, data_load_time: 2.399, memory: 3746, loss: 2.6457 2024-01-25 22:54:25,987 - modelscope - INFO - epoch [2][2/18] lr: 1.000e-05, eta: 0:00:14, iter_time: 0.165, data_load_time: 0.062, memory: 3746, loss: 0.8232 2024-01-25 22:54:26,108 - modelscope - INFO - epoch [2][3/18] lr: 1.000e-05, eta: 0:00:13, iter_time: 0.187, data_load_time: 0.132, memory: 3746, loss: 0.6192 2024-01-25 22:54:26,217 - modelscope - INFO - epoch [2][4/18] lr: 1.000e-05, eta: 0:00:12, iter_time: 0.121, data_load_time: 0.067, memory: 3746, loss: 0.1971 2024-01-25 22:54:26,324 - modelscope - INFO - epoch [2][5/18] lr: 1.000e-05, eta: 0:00:11, iter_time: 0.106, data_load_time: 0.053, memory: 3746, loss: 1.7651 2024-01-25 22:54:26,458 - modelscope - INFO - epoch [2][6/18] lr: 1.000e-05, eta: 0:00:11, iter_time: 0.113, data_load_time: 0.053, memory: 3746, loss: 0.4991 2024-01-25 22:54:26,694 - modelscope - INFO - epoch [2][7/18] lr: 1.000e-05, eta: 0:00:10, iter_time: 0.174, data_load_time: 0.075, memory: 3747, loss: 1.0642 2024-01-25 22:54:26,820 - modelscope - INFO - epoch [2][8/18] lr: 1.000e-05, eta: 0:00:09, iter_time: 0.195, data_load_time: 0.136, memory: 3747, loss: 0.3571 2024-01-25 22:54:26,969 - modelscope - INFO - epoch [2][9/18] lr: 1.000e-05, eta: 0:00:09, iter_time: 0.134, data_load_time: 0.066, memory: 3747, loss: 0.9691 2024-01-25 22:54:27,085 - modelscope - INFO - epoch [2][10/18] lr: 1.000e-05, eta: 0:00:08, iter_time: 0.134, data_load_time: 0.082, memory: 3747, loss: 0.4898 2024-01-25 22:54:27,175 - modelscope - INFO - epoch [2][11/18] lr: 1.000e-05, eta: 0:00:08, 
iter_time: 0.109, data_load_time: 0.063, memory: 3747, loss: 5.4889 2024-01-25 22:54:27,283 - modelscope - INFO - epoch [2][12/18] lr: 1.000e-05, eta: 0:00:07, iter_time: 0.098, data_load_time: 0.044, memory: 3747, loss: 1.4372 2024-01-25 22:54:27,432 - modelscope - INFO - epoch [2][13/18] lr: 1.000e-05, eta: 0:00:07, iter_time: 0.121, data_load_time: 0.054, memory: 3747, loss: 0.5515 2024-01-25 22:54:27,603 - modelscope - INFO - epoch [2][14/18] lr: 1.000e-05, eta: 0:00:06, iter_time: 0.154, data_load_time: 0.082, memory: 3747, loss: 1.0365 2024-01-25 22:54:27,719 - modelscope - INFO - epoch [2][15/18] lr: 1.000e-05, eta: 0:00:06, iter_time: 0.153, data_load_time: 0.099, memory: 3747, loss: 0.4552 2024-01-25 22:54:27,823 - modelscope - INFO - epoch [2][16/18] lr: 1.000e-05, eta: 0:00:06, iter_time: 0.114, data_load_time: 0.061, memory: 3747, loss: 1.9504 2024-01-25 22:54:28,067 - modelscope - INFO - epoch [2][17/18] lr: 1.000e-05, eta: 0:00:05, iter_time: 0.155, data_load_time: 0.052, memory: 4064, loss: 0.7299 2024-01-25 22:54:28,256 - modelscope - INFO - epoch [2][18/18] lr: 1.000e-05, eta: 0:00:05, iter_time: 0.221, data_load_time: 0.141, memory: 4064, loss: 0.1739 2024-01-25 22:54:28,326 - modelscope - INFO - Building dataloader for evaluating ... 
2024-01-25 22:54:28,326 - modelscope - INFO - Reading done, 120 items in total Total test samples: 100%|█████████████████████████████████████| 120/120 [00:04<00:00, 29.07it/s] 2024-01-25 22:54:32,458 - modelscope - INFO - Evaluation results for src-ref input format 2024-01-25 22:54:32,470 - modelscope - INFO - zh-en: 33.061224 2024-01-25 22:54:32,470 - modelscope - INFO - Average evaluation result for src-ref input format: 0.330612 2024-01-25 22:54:32,470 - modelscope - INFO - Total test samples: 100%|█████████████████████████████████████| 120/120 [00:03<00:00, 34.04it/s] 2024-01-25 22:54:35,998 - modelscope - INFO - Evaluation results for src input format 2024-01-25 22:54:36,010 - modelscope - INFO - zh-en: 39.591837 2024-01-25 22:54:36,010 - modelscope - INFO - Average evaluation result for src input format: 0.395918 2024-01-25 22:54:36,010 - modelscope - INFO - Total test samples: 100%|█████████████████████████████████████| 120/120 [00:03<00:00, 33.36it/s] 2024-01-25 22:54:39,610 - modelscope - INFO - Evaluation results for ref input format 2024-01-25 22:54:39,621 - modelscope - INFO - zh-en: 23.265306 2024-01-25 22:54:39,621 - modelscope - INFO - Average evaluation result for ref input format: 0.232653 2024-01-25 22:54:39,621 - modelscope - INFO - 2024-01-25 22:54:39,622 - modelscope - INFO - Saving checkpoint at 2 epoch 2024-01-25 22:54:44,904 - modelscope - INFO - epoch(eval) [2][30] memory: 4064, evaluation/src-ref_avg: 0.3306, evaluation/src-ref_zh-en: 0.3306, evaluation/src_avg: 0.3959, evaluation/src_zh-en: 0.3959, evaluation/ref_avg: 0.2327, evaluation/ref_zh-en: 0.2327 2024-01-25 22:54:47,449 - modelscope - INFO - epoch [3][1/18] lr: 1.000e-05, eta: 0:00:06, iter_time: 2.480, data_load_time: 2.415, memory: 4064, loss: 0.9880 2024-01-25 22:54:47,664 - modelscope - INFO - epoch [3][2/18] lr: 1.000e-05, eta: 0:00:05, iter_time: 0.160, data_load_time: 0.066, memory: 4064, loss: 0.7404 2024-01-25 22:54:47,763 - modelscope - INFO - epoch [3][3/18] lr: 
1.000e-05, eta: 0:00:05, iter_time: 0.169, data_load_time: 0.119, memory: 4064, loss: 1.2241 2024-01-25 22:54:47,915 - modelscope - INFO - epoch [3][4/18] lr: 1.000e-05, eta: 0:00:04, iter_time: 0.116, data_load_time: 0.049, memory: 4064, loss: 0.4311 2024-01-25 22:54:48,003 - modelscope - INFO - epoch [3][5/18] lr: 1.000e-05, eta: 0:00:04, iter_time: 0.130, data_load_time: 0.085, memory: 4064, loss: 2.2497 2024-01-25 22:54:48,116 - modelscope - INFO - epoch [3][6/18] lr: 1.000e-05, eta: 0:00:03, iter_time: 0.099, data_load_time: 0.044, memory: 4064, loss: 1.6233 2024-01-25 22:54:48,208 - modelscope - INFO - epoch [3][7/18] lr: 1.000e-05, eta: 0:00:03, iter_time: 0.105, data_load_time: 0.056, memory: 4064, loss: 1.0620 2024-01-25 22:54:48,380 - modelscope - INFO - epoch [3][8/18] lr: 1.000e-05, eta: 0:00:03, iter_time: 0.116, data_load_time: 0.043, memory: 4064, loss: 0.7042 2024-01-25 22:54:48,483 - modelscope - INFO - epoch [3][9/18] lr: 1.000e-05, eta: 0:00:02, iter_time: 0.150, data_load_time: 0.100, memory: 4064, loss: 0.7766 2024-01-25 22:54:48,616 - modelscope - INFO - epoch [3][10/18] lr: 1.000e-05, eta: 0:00:02, iter_time: 0.113, data_load_time: 0.052, memory: 4064, loss: 0.8106 2024-01-25 22:54:48,715 - modelscope - INFO - epoch [3][11/18] lr: 1.000e-05, eta: 0:00:02, iter_time: 0.122, data_load_time: 0.073, memory: 4064, loss: 1.2490 2024-01-25 22:54:48,851 - modelscope - INFO - epoch [3][12/18] lr: 1.000e-05, eta: 0:00:01, iter_time: 0.109, data_load_time: 0.049, memory: 4064, loss: 1.4198 2024-01-25 22:54:48,980 - modelscope - INFO - epoch [3][13/18] lr: 1.000e-05, eta: 0:00:01, iter_time: 0.135, data_load_time: 0.076, memory: 4064, loss: 1.2823 2024-01-25 22:54:49,148 - modelscope - INFO - epoch [3][14/18] lr: 1.000e-05, eta: 0:00:01, iter_time: 0.146, data_load_time: 0.071, memory: 4064, loss: 0.8499 2024-01-25 22:54:49,392 - modelscope - INFO - epoch [3][15/18] lr: 1.000e-05, eta: 0:00:00, iter_time: 0.196, data_load_time: 0.093, memory: 4086, loss: 
0.8231 2024-01-25 22:54:49,566 - modelscope - INFO - epoch [3][16/18] lr: 1.000e-05, eta: 0:00:00, iter_time: 0.213, data_load_time: 0.140, memory: 4086, loss: 1.3057 2024-01-25 22:54:49,658 - modelscope - INFO - epoch [3][17/18] lr: 1.000e-05, eta: 0:00:00, iter_time: 0.146, data_load_time: 0.101, memory: 4086, loss: 0.7469 2024-01-25 22:54:49,824 - modelscope - INFO - epoch [3][18/18] lr: 1.000e-05, eta: 0:00:00, iter_time: 0.123, data_load_time: 0.046, memory: 4086, loss: 0.7546 2024-01-25 22:54:49,892 - modelscope - INFO - Building dataloader for evaluating ... 2024-01-25 22:54:49,892 - modelscope - INFO - Reading done, 120 items in total Total test samples: 100%|█████████████████████████████████████| 120/120 [00:04<00:00, 29.18it/s] 2024-01-25 22:54:54,008 - modelscope - INFO - Evaluation results for src-ref input format 2024-01-25 22:54:54,020 - modelscope - INFO - zh-en: 11.020408 2024-01-25 22:54:54,020 - modelscope - INFO - Average evaluation result for src-ref input format: 0.110204 2024-01-25 22:54:54,020 - modelscope - INFO - Total test samples: 100%|█████████████████████████████████████| 120/120 [00:03<00:00, 33.83it/s] 2024-01-25 22:54:57,570 - modelscope - INFO - Evaluation results for src input format 2024-01-25 22:54:57,581 - modelscope - INFO - zh-en: 18.367347 2024-01-25 22:54:57,582 - modelscope - INFO - Average evaluation result for src input format: 0.183673 2024-01-25 22:54:57,582 - modelscope - INFO - Total test samples: 100%|█████████████████████████████████████| 120/120 [00:03<00:00, 33.40it/s] 2024-01-25 22:55:01,177 - modelscope - INFO - Evaluation results for ref input format 2024-01-25 22:55:01,189 - modelscope - INFO - zh-en: 3.673469 2024-01-25 22:55:01,189 - modelscope - INFO - Average evaluation result for ref input format: 0.036735 2024-01-25 22:55:01,189 - modelscope - INFO - 2024-01-25 22:55:01,189 - modelscope - INFO - Saving checkpoint at 3 epoch 2024-01-25 22:55:06,473 - modelscope - INFO - epoch(eval) [3][30] memory: 4086, 
evaluation/src-ref_avg: 0.1102, evaluation/src-ref_zh-en: 0.1102, evaluation/src_avg: 0.1837, evaluation/src_zh-en: 0.1837, evaluation/ref_avg: 0.0367, evaluation/ref_zh-en: 0.0367
2024-01-25 22:55:06,474 - modelscope - INFO - Train finished. Uploading models, waiting...
2024-01-25 22:55:06,557 - modelscope - INFO - {'done': True}
2024-01-25 22:55:06,967 - modelscope - WARNING - Model revision not specified, use revision: v2.6.0
2024-01-25 22:55:08,777 - modelscope - WARNING - Model revision not specified, use revision: v2.6.0
.2024-01-25 22:55:09,731 - modelscope - WARNING - Model revision not specified, use revision: v2.6.0
2024-01-25 22:55:10,548 - modelscope - WARNING - Model revision not specified, use revision: v2.6.0
2024-01-25 22:55:11,271 - modelscope - WARNING - Model revision not specified, use revision: v2.6.0
2024-01-25 22:55:11,563 - modelscope - INFO - initialize model from /mnt/workspace/.cache/modelscope/damo/nlp_unite_mup_translation_evaluation_multilingual_large
2024-01-25 22:55:21,075 - modelscope - INFO - ==========================Training Config Start==========================
2024-01-25 22:55:21,076 - modelscope - INFO - { "framework": "pytorch", "task": "translation-evaluation", "pipeline": { "type": "translation-evaluation" }, "preprocessor": { "type": "translation-evaluation-preprocessor", "max_len": 510, "pad_token_id": 1, "eos_token_id": 2 }, "model": { "attention_probs_dropout_prob": 0.1, "bos_token_id": 0, "eos_token_id": 2, "hidden_act": "gelu", "hidden_dropout_prob": 0.1, "hidden_size": 1024, "initializer_range": 0.02, "intermediate_size": 4096, "layer_norm_eps": 1e-05, "max_position_embeddings": 514, "model_type": "unite", "num_attention_heads": 16, "num_hidden_layers": 24, "output_past": true, "pad_token_id": 1, "type_vocab_size": 1, "use_cache": true, "vocab_size": 250002, "mlp_hidden_sizes": [ 3072, 1024 ], "mlp_act": "tanh", "mlp_final_act": null, "mlp_dropout": 0.1, "type": "unite" }, "dataset": { "train": { "name": "train.csv", "split": "train" }, "valid": { "name": "eval.csv", "split": "eval" } }, "train": { "initialize_model_with_checkpoint": true, "num_gpus": 1, "batch_size": 2, "seed": 12, "optimizer": { "type": "AdamW", "plm_lr": 1e-05, "betas": [ 0.9, 0.98 ], "eps": 1e-09, "weight_decay": 0.0, "plm_lr_layerwise_decay": 0.95, "mlp_lr": 3e-05, "options": { "cumulative_iters": 4, "grad_clip": null } }, "lr_scheduler": { "type": "ConstantLR", "factor": 1.0, "total_iters": 3 }, "max_epochs": 3, "work_dir": "experiments_unite_large/", "hooks": [ { "type": "TensorboardHook", "interval": 1 }, { "type": "IterTimerHook" } ], "logging": { "interval": 1 }, "checkpoint": { "best": { "metric_key": "src-ref_avg", "rule": "max" }, "period": { "interval": 1 } } }, "evaluation": { "batch_size": 4, "save_outputs": true, "metrics": [ { "type": "translation-evaluation-metric", "gap_threshold": 25.0 } ], "period": { "interval": 1 } } }
2024-01-25 22:55:21,076 - modelscope - INFO - ===========================Training Config End===========================
2024-01-25 22:55:21,076 - modelscope - INFO - Building dataloader for training ...
2024-01-25 22:55:21,076 - modelscope - INFO - Reading train csv file from train.csv ...
/opt/conda/lib/python3.10/site-packages/datasets/load.py:2096: FutureWarning: 'ignore_verifications' was deprecated in favor of 'verification_mode' in version 2.9.1 and will be removed in 3.0.0.
You can remove this warning by passing 'verification_mode=no_checks' instead.
warnings.warn(
2024-01-25 22:55:22,253 - modelscope - INFO - 109 samples are given for training. Using 36 samples for each input format. Leaving the last 1 samples unused.
2024-01-25 22:55:22,254 - modelscope - INFO - Reading done, 109 items in total
2024-01-25 22:55:22,254 - modelscope - INFO - Building AdamW optimizer ...
2024-01-25 22:55:22,259 - modelscope - WARNING - ('LR_SCHEDULER', 'default', 'ConstantLR') not found in ast index file
2024-01-25 22:55:22,260 - modelscope - INFO - Stage: before_run:
(ABOVE_NORMAL) OptimizerHook
(LOW ) LrSchedulerHook
(LOW ) BestCkptSaverHook
(LOW ) CheckpointHook
(VERY_LOW ) TextLoggerHook
(VERY_LOW ) TensorboardHook
--------------------
Stage: before_train_epoch:
(LOW ) LrSchedulerHook
--------------------
Stage: before_train_iter:
(ABOVE_NORMAL) OptimizerHook
--------------------
Stage: after_train_iter:
(ABOVE_NORMAL) OptimizerHook
(NORMAL ) EvaluationHook
(LOW ) LrSchedulerHook
(LOW ) BestCkptSaverHook
(LOW ) CheckpointHook
(VERY_LOW ) TextLoggerHook
(VERY_LOW ) TensorboardHook
--------------------
Stage: after_train_epoch:
(NORMAL ) EvaluationHook
(LOW ) LrSchedulerHook
(LOW ) BestCkptSaverHook
(LOW ) CheckpointHook
(VERY_LOW ) TextLoggerHook
(VERY_LOW ) TensorboardHook
--------------------
Stage: after_val_epoch:
(VERY_LOW ) TextLoggerHook
(VERY_LOW ) TensorboardHook
--------------------
Stage: after_run:
(LOW ) BestCkptSaverHook
(LOW ) CheckpointHook
(VERY_LOW ) TensorboardHook
--------------------
2024-01-25 22:55:22,277 - modelscope - INFO - Checkpoints will be saved to experiments_unite_large/
2024-01-25 22:55:22,291 - modelscope - INFO - Checkpoints will be saved to experiments_unite_large/
2024-01-25 22:55:22,291 - modelscope - INFO - Text logs will be saved to experiments_unite_large/
2024-01-25 22:55:22,292 - modelscope - INFO - tensorboard files will be saved to experiments_unite_large/tensorboard_output
2024-01-25 22:55:25,202 - modelscope - INFO - epoch [1][1/18] lr: 1.000e-05, eta: 0:02:26, iter_time: 2.769, data_load_time: 2.557, memory: 5773, loss: 0.2842
2024-01-25 22:55:25,492 - modelscope - INFO - epoch [1][2/18] lr: 1.000e-05, eta: 0:01:20, iter_time: 0.316, data_load_time: 0.140, memory: 6946, loss: 1.8496
2024-01-25 22:55:25,789 - modelscope - INFO - epoch [1][3/18] lr: 1.000e-05, eta: 0:00:57, iter_time: 0.286,
data_load_time: 0.117, memory: 7067, loss: 0.9591
2024-01-25 22:55:26,447 - modelscope - INFO - epoch [1][4/18] lr: 1.000e-05, eta: 0:00:50, iter_time: 0.670, data_load_time: 0.125, memory: 8401, loss: 0.1568
2024-01-25 22:55:26,852 - modelscope - INFO - epoch [1][5/18] lr: 1.000e-05, eta: 0:00:42, iter_time: 0.329, data_load_time: 0.113, memory: 8782, loss: 1.4333
2024-01-25 22:55:27,424 - modelscope - INFO - epoch [1][6/18] lr: 1.000e-05, eta: 0:00:39, iter_time: 0.572, data_load_time: 0.189, memory: 10944, loss: 0.5080
2024-01-25 22:55:27,851 - modelscope - INFO - epoch [1][7/18] lr: 1.000e-05, eta: 0:00:36, iter_time: 0.459, data_load_time: 0.190, memory: 10944, loss: 1.4123
2024-01-25 22:55:28,422 - modelscope - INFO - epoch [1][8/18] lr: 1.000e-05, eta: 0:00:33, iter_time: 0.493, data_load_time: 0.157, memory: 10944, loss: 0.5202
2024-01-25 22:55:28,829 - modelscope - INFO - epoch [1][9/18] lr: 1.000e-05, eta: 0:00:31, iter_time: 0.449, data_load_time: 0.235, memory: 10944, loss: 0.7124
2024-01-25 22:55:29,125 - modelscope - INFO - epoch [1][10/18] lr: 1.000e-05, eta: 0:00:29, iter_time: 0.374, data_load_time: 0.193, memory: 10944, loss: 1.3857
2024-01-25 22:55:29,493 - modelscope - INFO - epoch [1][11/18] lr: 1.000e-05, eta: 0:00:27, iter_time: 0.347, data_load_time: 0.116, memory: 10944, loss: 1.0442
2024-01-25 22:55:29,958 - modelscope - INFO - epoch [1][12/18] lr: 1.000e-05, eta: 0:00:26, iter_time: 0.395, data_load_time: 0.136, memory: 10944, loss: 1.0455
2024-01-25 22:55:30,376 - modelscope - INFO - epoch [1][13/18] lr: 1.000e-05, eta: 0:00:24, iter_time: 0.426, data_load_time: 0.207, memory: 10944, loss: 1.3795
2024-01-25 22:55:30,891 - modelscope - INFO - epoch [1][14/18] lr: 1.000e-05, eta: 0:00:24, iter_time: 0.527, data_load_time: 0.199, memory: 10944, loss: 1.1123
2024-01-25 22:55:31,459 - modelscope - INFO - epoch [1][15/18] lr: 1.000e-05, eta: 0:00:23, iter_time: 0.565, data_load_time: 0.188, memory: 11147, loss: 0.3557
2024-01-25 22:55:31,946 -
modelscope - INFO - epoch [1][16/18] lr: 1.000e-05, eta: 0:00:22, iter_time: 0.464, data_load_time: 0.190, memory: 11147, loss: 5.1896
2024-01-25 22:55:32,301 - modelscope - INFO - epoch [1][17/18] lr: 1.000e-05, eta: 0:00:21, iter_time: 0.399, data_load_time: 0.213, memory: 11147, loss: 0.7192
2024-01-25 22:55:32,712 - modelscope - INFO - epoch [1][18/18] lr: 1.000e-05, eta: 0:00:20, iter_time: 0.428, data_load_time: 0.169, memory: 11147, loss: 0.5781
2024-01-25 22:55:32,798 - modelscope - INFO - Building dataloader for evaluating ...
2024-01-25 22:55:32,798 - modelscope - INFO - Reading eval csv file from eval.csv ...
Downloading data files: 100%|███████████████████████████████████| 1/1 [00:00<00:00, 9554.22it/s]
Extracting data files: 100%|█████████████████████████████████████| 1/1 [00:00<00:00, 786.33it/s]
Generating train split: 0 examples [00:00, ? examples/s]/opt/conda/lib/python3.10/site-packages/pandas/io/common.py:131: ResourceWarning: unclosed file <_io.BufferedReader name='/mnt/workspace/.cache/modelscope/damo/nlp_unite_mup_translation_evaluation_multilingual_large/eval.csv'>
self.handle.detach()
ResourceWarning: Enable tracemalloc to get the object allocation traceback
Generating train split: 120 examples [00:00, 6942.10 examples/s]
2024-01-25 22:55:34,031 - modelscope - INFO - Reading done, 120 items in total
Total test samples: 100%|█████████████████████████████████████| 120/120 [00:08<00:00, 14.92it/s]
2024-01-25 22:55:42,077 - modelscope - INFO - Evaluation results for src-ref input format
2024-01-25 22:55:42,089 - modelscope - INFO - zh-en: -31.428571
2024-01-25 22:55:42,089 - modelscope - INFO - Average evaluation result for src-ref input format: -0.314286
2024-01-25 22:55:42,089 - modelscope - INFO -
Total test samples: 100%|█████████████████████████████████████| 120/120 [00:06<00:00, 19.51it/s]
2024-01-25 22:55:48,243 - modelscope - INFO - Evaluation results for src input format
2024-01-25 22:55:48,255 - modelscope - INFO - zh-en: -15.918367
2024-01-25 22:55:48,255 - modelscope - INFO - Average evaluation result for src input format: -0.159184
2024-01-25 22:55:48,255 - modelscope - INFO -
Total test samples: 100%|█████████████████████████████████████| 120/120 [00:06<00:00, 19.27it/s]
2024-01-25 22:55:54,485 - modelscope - INFO - Evaluation results for ref input format
2024-01-25 22:55:54,497 - modelscope - INFO - zh-en: -24.081633
2024-01-25 22:55:54,497 - modelscope - INFO - Average evaluation result for ref input format: -0.240816
2024-01-25 22:55:54,497 - modelscope - INFO -
2024-01-25 22:55:54,498 - modelscope - INFO - Saving checkpoint at 1 epoch
2024-01-25 22:56:07,231 - modelscope - INFO - Saving checkpoint at 1 epoch
2024-01-25 22:56:19,765 - modelscope - INFO - epoch(eval) [1][30] memory: 11147, evaluation/src-ref_avg: -0.3143, evaluation/src-ref_zh-en: -0.3143, evaluation/src_avg: -0.1592, evaluation/src_zh-en: -0.1592, evaluation/ref_avg: -0.2408, evaluation/ref_zh-en: -0.2408
2024-01-25 22:56:22,925 - modelscope - INFO - epoch [2][1/18] lr: 1.000e-05, eta: 0:00:24, iter_time: 2.979, data_load_time: 2.565, memory: 11147, loss: 4.8808
2024-01-25 22:56:23,279 - modelscope - INFO - epoch [2][2/18] lr: 1.000e-05, eta: 0:00:23, iter_time: 0.362, data_load_time: 0.182, memory: 11147, loss: 1.4385
2024-01-25 22:56:23,639 - modelscope - INFO - epoch [2][3/18] lr: 1.000e-05, eta: 0:00:21, iter_time: 0.354, data_load_time: 0.177, memory: 11147, loss: 0.6582
2024-01-25 22:56:24,201 - modelscope - INFO - epoch [2][4/18] lr: 1.000e-05, eta: 0:00:21, iter_time: 0.555, data_load_time: 0.179, memory: 11147, loss: 1.1768
2024-01-25 22:56:24,578 - modelscope - INFO - epoch [2][5/18] lr: 1.000e-05, eta: 0:00:20, iter_time: 0.417, data_load_time: 0.185, memory: 11147, loss: 0.7451
2024-01-25 22:56:24,975 - modelscope - INFO - epoch [2][6/18] lr: 1.000e-05, eta: 0:00:19, iter_time: 0.343, data_load_time: 0.145, memory: 11147, loss: 1.2105
2024-01-25 22:56:25,385 - modelscope - INFO - epoch [2][7/18] lr:
1.000e-05, eta: 0:00:18, iter_time: 0.418, data_load_time: 0.199, memory: 11147, loss: 0.8132
2024-01-25 22:56:25,822 - modelscope - INFO - epoch [2][8/18] lr: 1.000e-05, eta: 0:00:17, iter_time: 0.467, data_load_time: 0.190, memory: 11147, loss: 0.3505
2024-01-25 22:56:26,120 - modelscope - INFO - epoch [2][9/18] lr: 1.000e-05, eta: 0:00:16, iter_time: 0.343, data_load_time: 0.161, memory: 11147, loss: 1.6900
2024-01-25 22:56:26,699 - modelscope - INFO - epoch [2][10/18] lr: 1.000e-05, eta: 0:00:15, iter_time: 0.457, data_load_time: 0.116, memory: 11147, loss: 0.7491
2024-01-25 22:56:27,104 - modelscope - INFO - epoch [2][11/18] lr: 1.000e-05, eta: 0:00:15, iter_time: 0.454, data_load_time: 0.239, memory: 11147, loss: 0.8758
2024-01-25 22:56:27,343 - modelscope - INFO - epoch [2][12/18] lr: 1.000e-05, eta: 0:00:14, iter_time: 0.337, data_load_time: 0.189, memory: 11147, loss: 0.4616
2024-01-25 22:56:27,907 - modelscope - INFO - epoch [2][13/18] lr: 1.000e-05, eta: 0:00:13, iter_time: 0.474, data_load_time: 0.091, memory: 11147, loss: 0.8847
2024-01-25 22:56:28,294 - modelscope - INFO - epoch [2][14/18] lr: 1.000e-05, eta: 0:00:12, iter_time: 0.373, data_load_time: 0.180, memory: 11147, loss: 0.1583
2024-01-25 22:56:28,651 - modelscope - INFO - epoch [2][15/18] lr: 1.000e-05, eta: 0:00:12, iter_time: 0.374, data_load_time: 0.195, memory: 11147, loss: 1.1917
2024-01-25 22:56:29,201 - modelscope - INFO - epoch [2][16/18] lr: 1.000e-05, eta: 0:00:11, iter_time: 0.545, data_load_time: 0.177, memory: 11147, loss: 0.6198
2024-01-25 22:56:29,501 - modelscope - INFO - epoch [2][17/18] lr: 1.000e-05, eta: 0:00:10, iter_time: 0.354, data_load_time: 0.182, memory: 11147, loss: 0.5833
2024-01-25 22:56:30,035 - modelscope - INFO - epoch [2][18/18] lr: 1.000e-05, eta: 0:00:10, iter_time: 0.437, data_load_time: 0.128, memory: 11147, loss: 0.5790
2024-01-25 22:56:30,122 - modelscope - INFO - Building dataloader for evaluating ...
2024-01-25 22:56:30,123 - modelscope - INFO - Reading done, 120 items in total
Total test samples: 100%|█████████████████████████████████████| 120/120 [00:08<00:00, 14.96it/s]
2024-01-25 22:56:38,147 - modelscope - INFO - Evaluation results for src-ref input format
2024-01-25 22:56:38,159 - modelscope - INFO - zh-en: -23.265306
2024-01-25 22:56:38,159 - modelscope - INFO - Average evaluation result for src-ref input format: -0.232653
2024-01-25 22:56:38,159 - modelscope - INFO -
Total test samples: 100%|█████████████████████████████████████| 120/120 [00:06<00:00, 19.64it/s]
2024-01-25 22:56:44,271 - modelscope - INFO - Evaluation results for src input format
2024-01-25 22:56:44,282 - modelscope - INFO - zh-en: -33.061224
2024-01-25 22:56:44,283 - modelscope - INFO - Average evaluation result for src input format: -0.330612
2024-01-25 22:56:44,283 - modelscope - INFO -
Total test samples: 100%|█████████████████████████████████████| 120/120 [00:06<00:00, 19.26it/s]
2024-01-25 22:56:50,515 - modelscope - INFO - Evaluation results for ref input format
2024-01-25 22:56:50,526 - modelscope - INFO - zh-en: -23.265306
2024-01-25 22:56:50,527 - modelscope - INFO - Average evaluation result for ref input format: -0.232653
2024-01-25 22:56:50,527 - modelscope - INFO -
2024-01-25 22:56:50,527 - modelscope - INFO - Saving checkpoint at 2 epoch
2024-01-25 22:57:03,227 - modelscope - INFO - deleting checkpoint: experiments_unite_large/best_epoch1_src-ref_avg-0.3142857142857143
2024-01-25 22:57:03,822 - modelscope - INFO - Saving checkpoint at 2 epoch
2024-01-25 22:57:16,294 - modelscope - INFO - epoch(eval) [2][30] memory: 11147, evaluation/src-ref_avg: -0.2327, evaluation/src-ref_zh-en: -0.2327, evaluation/src_avg: -0.3306, evaluation/src_zh-en: -0.3306, evaluation/ref_avg: -0.2327, evaluation/ref_zh-en: -0.2327
2024-01-25 22:57:19,375 - modelscope - INFO - epoch [3][1/18] lr: 1.000e-05, eta: 0:00:10, iter_time: 2.887, data_load_time: 2.608, memory: 11147, loss: 0.5929
2024-01-25 22:57:19,759 - modelscope - INFO - epoch [3][2/18] lr: 1.000e-05, eta: 0:00:09, iter_time: 0.430, data_load_time: 0.195, memory: 11147, loss: 2.2666
2024-01-25 22:57:20,053 - modelscope - INFO - epoch [3][3/18] lr: 1.000e-05, eta: 0:00:09, iter_time: 0.322, data_load_time: 0.150, memory: 11147, loss: 0.3718
2024-01-25 22:57:20,445 - modelscope - INFO - epoch [3][4/18] lr: 1.000e-05, eta: 0:00:08, iter_time: 0.326, data_load_time: 0.119, memory: 11147, loss: 1.7617
2024-01-25 22:57:20,694 - modelscope - INFO - epoch [3][5/18] lr: 1.000e-05, eta: 0:00:07, iter_time: 0.313, data_load_time: 0.186, memory: 11147, loss: 1.5502
2024-01-25 22:57:20,935 - modelscope - INFO - epoch [3][6/18] lr: 1.000e-05, eta: 0:00:07, iter_time: 0.271, data_load_time: 0.122, memory: 11147, loss: 0.6524
2024-01-25 22:57:21,308 - modelscope - INFO - epoch [3][7/18] lr: 1.000e-05, eta: 0:00:06, iter_time: 0.323, data_load_time: 0.091, memory: 11147, loss: 1.2068
2024-01-25 22:57:21,885 - modelscope - INFO - epoch [3][8/18] lr: 1.000e-05, eta: 0:00:05, iter_time: 0.480, data_load_time: 0.141, memory: 11147, loss: 0.3999
2024-01-25 22:57:22,429 - modelscope - INFO - epoch [3][9/18] lr: 1.000e-05, eta: 0:00:05, iter_time: 0.558, data_load_time: 0.238, memory: 11147, loss: 1.3902
2024-01-25 22:57:22,944 - modelscope - INFO - epoch [3][10/18] lr: 1.000e-05, eta: 0:00:04, iter_time: 0.554, data_load_time: 0.224, memory: 11147, loss: 0.9272
2024-01-25 22:57:23,327 - modelscope - INFO - epoch [3][11/18] lr: 1.000e-05, eta: 0:00:04, iter_time: 0.423, data_load_time: 0.185, memory: 11147, loss: 0.6970
2024-01-25 22:57:23,814 - modelscope - INFO - epoch [3][12/18] lr: 1.000e-05, eta: 0:00:03, iter_time: 0.416, data_load_time: 0.145, memory: 11147, loss: 0.5413
2024-01-25 22:57:24,240 - modelscope - INFO - epoch [3][13/18] lr: 1.000e-05, eta: 0:00:02, iter_time: 0.442, data_load_time: 0.216, memory: 11147, loss: 0.8826
2024-01-25 22:57:24,708 - modelscope - INFO - epoch [3][14/18] lr:
1.000e-05, eta: 0:00:02, iter_time: 0.499, data_load_time: 0.200, memory: 11147, loss: 0.5340
2024-01-25 22:57:24,943 - modelscope - INFO - epoch [3][15/18] lr: 1.000e-05, eta: 0:00:01, iter_time: 0.311, data_load_time: 0.169, memory: 11147, loss: 0.6754
2024-01-25 22:57:25,618 - modelscope - INFO - epoch [3][16/18] lr: 1.000e-05, eta: 0:00:01, iter_time: 0.511, data_load_time: 0.091, memory: 11147, loss: 0.6638
2024-01-25 22:57:26,036 - modelscope - INFO - epoch [3][17/18] lr: 1.000e-05, eta: 0:00:00, iter_time: 0.481, data_load_time: 0.256, memory: 11147, loss: 1.0455
2024-01-25 22:57:26,480 - modelscope - INFO - epoch [3][18/18] lr: 1.000e-05, eta: 0:00:00, iter_time: 0.474, data_load_time: 0.192, memory: 11147, loss: 0.3829
2024-01-25 22:57:26,566 - modelscope - INFO - Building dataloader for evaluating ...
2024-01-25 22:57:26,567 - modelscope - INFO - Reading done, 120 items in total
Total test samples: 100%|█████████████████████████████████████| 120/120 [00:08<00:00, 14.86it/s]
2024-01-25 22:57:34,646 - modelscope - INFO - Evaluation results for src-ref input format
2024-01-25 22:57:34,657 - modelscope - INFO - zh-en: -24.897959
2024-01-25 22:57:34,657 - modelscope - INFO - Average evaluation result for src-ref input format: -0.248980
2024-01-25 22:57:34,657 - modelscope - INFO -
Total test samples: 100%|█████████████████████████████████████| 120/120 [00:06<00:00, 19.50it/s]
2024-01-25 22:57:40,815 - modelscope - INFO - Evaluation results for src input format
2024-01-25 22:57:40,826 - modelscope - INFO - zh-en: -42.857143
2024-01-25 22:57:40,826 - modelscope - INFO - Average evaluation result for src input format: -0.428571
2024-01-25 22:57:40,827 - modelscope - INFO -
Total test samples: 100%|█████████████████████████████████████| 120/120 [00:06<00:00, 19.13it/s]
2024-01-25 22:57:47,103 - modelscope - INFO - Evaluation results for ref input format
2024-01-25 22:57:47,114 - modelscope - INFO - zh-en: -24.897959
2024-01-25 22:57:47,114 - modelscope - INFO -
Average evaluation result for ref input format: -0.248980
2024-01-25 22:57:47,115 - modelscope - INFO -
2024-01-25 22:57:47,115 - modelscope - INFO - Saving checkpoint at 3 epoch
2024-01-25 22:57:59,812 - modelscope - INFO - epoch(eval) [3][30] memory: 11147, evaluation/src-ref_avg: -0.2490, evaluation/src-ref_zh-en: -0.2490, evaluation/src_avg: -0.4286, evaluation/src_zh-en: -0.4286, evaluation/ref_avg: -0.2490, evaluation/ref_zh-en: -0.2490
2024-01-25 22:57:59,814 - modelscope - INFO - Train finished. Uploading models, waiting...
2024-01-25 22:57:59,861 - modelscope - INFO - {'done': True}
2024-01-25 22:58:00,930 - modelscope - WARNING - Model revision not specified, use revision: v2.6.0
2024-01-25 22:58:01,641 - modelscope - WARNING - Model revision not specified, use revision: v2.6.0
.
----------------------------------------------------------------------
Ran 2 tests in 263.994s

OK