National Preprint Platform

Canonical Workflow for Machine Learning Tasks



There is a huge gap between (1) the state of workflow technology on the one hand and the practices in the many labs working with data-driven methods on the other, and (2) the awareness of the FAIR principles and the lack of changes in practices during the last 5 years. The CWFR concept has been defined to combine these two intentions: increasing the use of workflow technology and improving FAIR compliance. In the study described in this paper we indicate how this could be applied to machine learning, which is now used by almost all research disciplines, with the well-known effect of a huge lack of repeatability and reproducibility.

Researchers will only change practices if they can work efficiently and are not loaded with additional tasks. A comprehensive CWFR framework would be an umbrella for all steps that need to be carried out to do machine learning on selected data collections and would immediately create comprehensive and FAIR-compliant documentation. The researcher is guided by such a framework, and information once entered can easily be shared and reused. The many iterations normally required in machine learning can be dealt with efficiently using CWFR methods.

Libraries of components that can be easily orchestrated, using FAIR Digital Objects (FDOs) as a common entity to document all actions and to exchange information between steps, without the researcher needing to understand anything about PIDs and FDO details, are probably the way to increase efficiency in repeating research workflows. As the Galaxy project indicates, the availability of supporting tools will be important to let researchers use these methods. Unlike what the Galaxy framework suggests, however, it would be necessary to include all steps necessary for doing a machine learning task, including those that require human interaction, and to document all phases with the help of structured FDOs.


Binyam Gebre, Peter Wittenburg, Christophe Blanchi

10.12074/202211.00447V1

Computing technology, computer technology

Workflow; Machine learning; Digital objects; FAIR; Data management


Binyam Gebre, Peter Wittenburg, Christophe Blanchi. Canonical Workflow for Machine Learning Tasks [EB/OL]. (2022-11-28) [2025-08-03]. https://chinaxiv.org/abs/202211.00447.
