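Bug 5854 - During migration from the sysom web UI, a task's migration status stays stuck at "Running" and is never updated, although the corresponding task command on the host to be upgraded has already finished
Summary: During migration from the sysom web UI, a task's migration status stays stuck at "Running" and is never updated, although the corresponding task command on the host to be upgraded has already finished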
Status: NEW
Alias: None
Product: Anolis OS 8
Classification: Anolis OS
Component: OsMigration
Version: 8.2
Hardware: All Linux
Importance: P3-Medium S3-normal
Target Milestone: ---
Assignee: yunqi-zwt
QA Contact: shuming
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-07-12 21:05 UTC by camel
Modified: 2023-08-04 15:56 UTC
Description camel cmss_group 2023-07-12 21:05:28 UTC
Description of problem:
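On the "Migration Implementation" (迁移实施) page of the sysom web UI, when migrating a single machine or batch-migrating several machines, an operation occasionally starts and its "Migration Status" (迁移状态) then stays at "Running" (运行中) for a long time without updating. Checking the machine to be upgraded shows that the command for that operation has already finished. At that point no other operation can be performed on the machine; the only way out is to modify the "Migration Status" field directly in the database.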

Version-Release number of selected component (if applicable):
Custom code based on the commit below; the task dispatch and status update logic has not been modified: https://gitee.com/anolis/sysom/commit/0e2c489be1fdec87e8067c68fcff30e8c7c0cc62

Two updates related to this issue were merged later:
https://gitee.com/anolis/sysom/commit/7fd2c54e451e0f878694af76d0dd00f391cd6dd2
https://gitee.com/anolis/sysom/commit/546fb1753cb70fc041320820fc49dbf6831f5765

How reproducible:
Intermittent; occurs only in certain environments.

Steps to Reproduce:
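1. Add the machines to be upgraded to sysom and open the "Migration Implementation" (迁移实施) page.
2. Run a migration task from the "Operation" (操作) drop-down menu, or batch-migrate the added machines.
3. Observe whether the "Migration Status" (迁移状态) column updates correctly.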

Actual results:
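The command for a migration task (for example, system backup or risk assessment) finished on the machine to be upgraded long ago, yet the machine's "Migration Status" on the sysom page still shows "Running".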

Expected results:
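Once the command for a migration task has finished on the machine to be upgraded, the machine's "Migration Status" on the sysom page changes from "Running" to a normal state such as "Ready" (就绪中) or "Success" (成功).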

Additional info:
For machines stuck at "Running", the job_result field of the corresponding task in the sysom.mig_job table is empty.
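
The check described above can be sketched as a query for rows whose job_result is NULL or empty. This is only an illustration under stated assumptions: an in-memory SQLite table stands in for the real MySQL table, the columns id and ip are hypothetical (only mig_job and job_result come from this report), and find_stuck_jobs is a made-up helper name.

```python
import sqlite3


def find_stuck_jobs(conn):
    """Return ids of mig_job rows whose job_result is NULL or empty."""
    cur = conn.execute(
        "SELECT id FROM mig_job "
        "WHERE job_result IS NULL OR job_result = '' "
        "ORDER BY id"
    )
    return [row[0] for row in cur.fetchall()]


# In-memory stand-in for sysom.mig_job; id and ip are hypothetical columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mig_job (id INTEGER PRIMARY KEY, ip TEXT, job_result TEXT)"
)
conn.executemany(
    "INSERT INTO mig_job (id, ip, job_result) VALUES (?, ?, ?)",
    [
        (1, "192.0.2.1", '{"code": 0}'),  # result was written
        (2, "192.0.2.2", None),           # result never written
        (3, "192.0.2.3", ""),             # result empty
    ],
)

stuck = find_stuck_jobs(conn)
print(stuck)
```

Against the real database the same WHERE clause would be run on sysom.mig_job with a MySQL client; the actual column names there should be checked first.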
Comment 1 camel cmss_group 2023-08-03 08:42:55 UTC
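After working with community experts to locate the problem, we found an error in the logs: when the task result was being written to the database, the connection was reset, so the result was never written, which in turn left the migration status not updated. As a temporary workaround, increase the database's maximum number of connections. Edit /etc/my.cnf and add:
[mysqld]
max_connections=2000
Save and exit, then restart MySQL:
systemctl restart mysqld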
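
One way to confirm that the edit landed in the file is to parse it. The following is a minimal sketch, not part of sysom: read_max_connections is a made-up helper, and it assumes the setting sits directly in the given file under [mysqld] rather than in a file pulled in by an include directive.

```python
import configparser


def read_max_connections(cnf_text):
    """Return max_connections from the [mysqld] section, or None if unset."""
    # configparser cannot parse MySQL's "!include"/"!includedir" directives,
    # so drop them before parsing.
    lines = [
        line for line in cnf_text.splitlines()
        if not line.lstrip().startswith("!")
    ]
    parser = configparser.ConfigParser(
        allow_no_value=True,  # my.cnf allows bare flags such as "skip-name-resolve"
        strict=False,         # tolerate repeated sections or options
        interpolation=None,   # treat "%" literally
    )
    parser.read_string("\n".join(lines))
    if not parser.has_section("mysqld"):
        return None
    # MySQL accepts both the underscore and the dash spelling in option files.
    for key in ("max_connections", "max-connections"):
        if parser.has_option("mysqld", key):
            value = parser.get("mysqld", key)
            if value is not None:
                return int(value)
    return None


sample = (
    "[client]\nport=3306\n\n"
    "[mysqld]\ndatadir=/var/lib/mysql\nmax_connections=2000\n\n"
    "!includedir /etc/my.cnf.d\n"
)
print(read_max_connections(sample))
```

The value actually in effect after the restart can be read from the server itself with the SQL statement SHOW VARIABLES LIKE 'max_connections'.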
Comment 2 camel cmss_group 2023-08-03 09:04:14 UTC
(In reply to camel from comment #1)
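> After working with community experts to locate the problem, we found an error in the logs: when the task result was being written to the database, the connection was reset, so the result was never written, which in turn left the
> migration status not updated. As a temporary workaround, increase the database's maximum number of connections. Edit /etc/my.cnf and add:
> [mysqld]
> max_connections=2000
> Save and exit, then restart MySQL:
> systemctl restart mysqld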

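After raising the MySQL maximum connection count as described above, the problem did not recur during initial use in that environment. It still occurs, however, in other sysom deployments. The migration-error log file shows that database connection failures are still happening, as follows:
1. An environment-preparation yum install task was dispatched. After the yum command finished successfully on the host to be migrated, the following error was reported: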
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/local/sysom/server/virtualenv/lib64/python3.6/site-packages/pymysql/connections.py", line 756, in _write_bytes
    self._sock.sendall(data)
ConnectionResetError: [Errno 104] Connection reset by peer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/sysom/server/virtualenv/lib64/python3.6/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
  File "/usr/local/sysom/server/virtualenv/lib64/python3.6/site-packages/django/db/backends/mysql/base.py", line 73, in execute
    return self.cursor.execute(query, args)
  File "/usr/local/sysom/server/virtualenv/lib64/python3.6/site-packages/pymysql/cursors.py", line 148, in execute
    result = self._query(query)
  File "/usr/local/sysom/server/virtualenv/lib64/python3.6/site-packages/pymysql/cursors.py", line 310, in _query
    conn.query(q)
  File "/usr/local/sysom/server/virtualenv/lib64/python3.6/site-packages/pymysql/connections.py", line 547, in query
    self._execute_command(COMMAND.COM_QUERY, sql)
  File "/usr/local/sysom/server/virtualenv/lib64/python3.6/site-packages/pymysql/connections.py", line 814, in _execute_command
    self._write_bytes(packet)
  File "/usr/local/sysom/server/virtualenv/lib64/python3.6/site-packages/pymysql/connections.py", line 760, in _write_bytes
    CR.CR_SERVER_GONE_ERROR, "MySQL server has gone away (%r)" % (e,)
pymysql.err.OperationalError: (2006, "MySQL server has gone away (ConnectionResetError(104, 'Connection reset by peer'))")

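2. A leapp migration-assessment command was dispatched. leapp finished and the assessment report leapp-report.txt could be retrieved, but the database connection failed as follows: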
[INFO] -- 2023-07-31 19:14:02 -- P_ 6773_T_140342677989120 - <channel:67>: host 10.170.113.101 get file /var/log/bclinux-sysmt/leapp-report.txt to /tmp/migration/imp/10.170.113.101/mig_ass_report.log
[INFO] -- 2023-07-31 19:14:02 -- P_ 6773_T_140342677989120 - <channel:68>: {'code': 0, 'err_msg': '', 'result': '', 'echo': {}, 'job_id': '', 'is_finished': True}
[INFO] -- 2023-07-31 19:14:02 -- P_ 6773_T_140343228966656 - <channel:67>: host 10.170.113.100 get file /var/tmp/state.json to /tmp/migration/imp/10.170.113.100/mig_imp_rate.log
[INFO] -- 2023-07-31 19:14:02 -- P_ 6773_T_140343228966656 - <channel:68>: {'code': 0, 'err_msg': '', 'result': '', 'echo': {}, 'job_id': '', 'is_finished': True}
/usr/local/sysom/server/virtualenv/lib64/python3.6/site-packages/pymysql/_auth.py:8: CryptographyDeprecationWarning: Python 3.6 is no longer supported by the Python core team. Therefore, support for it is deprecated in cryptography and will be removed in a future release.
  from cryptography.hazmat.backends import default_backend
[INFO] -- 2023-07-31 19:14:06 -- P_ 11450_T_140343521056576 - <apps:24>: >>> Migration module loading success
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/local/sysom/server/virtualenv/lib64/python3.6/site-packages/pymysql/connections.py", line 756, in _write_bytes
    self._sock.sendall(data)
BrokenPipeError: [Errno 32] Broken pipe

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/sysom/server/virtualenv/lib64/python3.6/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
  File "/usr/local/sysom/server/virtualenv/lib64/python3.6/site-packages/django/db/backends/mysql/base.py", line 73, in execute
    return self.cursor.execute(query, args)
  File "/usr/local/sysom/server/virtualenv/lib64/python3.6/site-packages/pymysql/cursors.py", line 148, in execute
    result = self._query(query)
  File "/usr/local/sysom/server/virtualenv/lib64/python3.6/site-packages/pymysql/cursors.py", line 310, in _query
    conn.query(q)
  File "/usr/local/sysom/server/virtualenv/lib64/python3.6/site-packages/pymysql/connections.py", line 547, in query
    self._execute_command(COMMAND.COM_QUERY, sql)
  File "/usr/local/sysom/server/virtualenv/lib64/python3.6/site-packages/pymysql/connections.py", line 814, in _execute_command
    self._write_bytes(packet)
  File "/usr/local/sysom/server/virtualenv/lib64/python3.6/site-packages/pymysql/connections.py", line 760, in _write_bytes
    CR.CR_SERVER_GONE_ERROR, "MySQL server has gone away (%r)" % (e,)
……
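
Both tracebacks show a write failing in a background thread because the connection it holds has already been closed by the peer. A common mitigation for this class of failure is to discard the dead connection and retry the write. The sketch below shows that pattern in plain Python. It is not the fix that was merged into sysom; run_with_reconnect and the fake callables are hypothetical. In a Django service, one might pass django.db.OperationalError as the retryable exception and django.db.close_old_connections as the reset callback, which is an assumption about how it could be wired up, not something this report confirms.

```python
import time


def run_with_reconnect(operation, reset_connection,
                       retryable=(ConnectionError,), attempts=3, delay=0.0):
    """Call operation(); on a retryable error, reset the connection and retry."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retryable as exc:
            last_error = exc
            # Drop the broken connection so the next attempt opens a new one.
            reset_connection()
            if attempt < attempts and delay:
                time.sleep(delay)
    raise last_error


# Demonstration with fakes: the first write hits a reset connection,
# the second one succeeds.
calls = {"writes": 0, "resets": 0}


def flaky_save():
    calls["writes"] += 1
    if calls["writes"] == 1:
        raise ConnectionResetError(104, "Connection reset by peer")
    return "saved"


def fake_reset():
    calls["resets"] += 1


result = run_with_reconnect(flaky_save, fake_reset)
print(result, calls)
```

Retrying is only safe when the write is idempotent, for example when it sets a job's result and status to fixed values rather than appending a new row.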
Comment 3 wenlylinux alibaba_cloud_group 2023-08-04 15:56:43 UTC
"MySQL server has gone away": this is most likely caused by the database connection timing out.
It can be fixed as follows:

1. Edit /etc/my.cnf and add:
[mysqld]
max_connections=2000
Save and exit, then restart MySQL:
systemctl restart mysqld

2. Open the file /sysom_server/sysom_migration/conf/common.py under the sysom installation directory,
find DATABASES, and add 'CONN_MAX_AGE': 3600 to the default entry, for example:
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'xx',
        'USER': 'xx',
        'PASSWORD': 'xx',
        'HOST': 'xx',
        'PORT': 'xx',
        'CONN_MAX_AGE': 3600,
    }
}
Then restart the sysom_migration service:
supervisorctl restart sysom-migration
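
In Django, CONN_MAX_AGE is the lifetime of a persistent connection in seconds: 0 closes the connection at the end of each request, and None keeps it open with no limit. Django's documentation advises setting it lower than the database's own idle timeout, which for MySQL is the wait_timeout variable (28800 seconds by default). The sketch below checks that relationship. check_conn_max_age is a hypothetical helper, and the wait_timeout of the actual deployment should be read from the server rather than assumed.

```python
MYSQL_DEFAULT_WAIT_TIMEOUT = 28800  # seconds; MySQL's documented default


def check_conn_max_age(conn_max_age, wait_timeout=MYSQL_DEFAULT_WAIT_TIMEOUT):
    """Return (ok, message) for a CONN_MAX_AGE value against wait_timeout."""
    if conn_max_age is None:
        return (False, "unlimited lifetime: the server may close idle "
                       "connections before Django does")
    if conn_max_age == 0:
        return (True, "connection is closed at the end of each request")
    if conn_max_age >= wait_timeout:
        return (False, "CONN_MAX_AGE should be lower than wait_timeout "
                       "(%d s)" % wait_timeout)
    return (True, "connection is recycled before the server idle timeout")


for value in (3600, 30000, None):
    print(value, check_conn_max_age(value))
```

With the value suggested in this comment (3600) and the default wait_timeout, the check passes. Django closes expired persistent connections at the start and end of each request, so code that writes from a long-running background thread may still need to close stale connections itself.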