Using a Workflow Management Platform in Textual Data Management
This paper gives a brief introduction to the workflow management platform Flowable and how it is used for textual data management. Flowable is relatively new; its first release was on 13 October 2016. Despite its short time on the market, it has quickly attracted attention, with 4.6 thousand stars on GitHub at the time of writing. The focus of our project is to build a platform for large-scale text analysis that draws on many different text resources. So far, we have connected to four text resources and obtained more than one million works. Some resources are dynamic, meaning that they may add new data or modify existing data. It is therefore necessary to keep our copy of the data, both the metadata and the raw data, up to date with the resources. In addition, to comply with the FAIR principles, each work is assigned a persistent identifier (PID) and indexed for searching. In the last step, we perform some standard analyses on the data to enhance our search engine and to generate a knowledge graph. End users can use our platform to search our data or to access the knowledge graph. Furthermore, they can submit their own analysis code to the system. The code is executed on a High-Performance Cluster (HPC), and users receive the results later. In this case, Flowable can take advantage of PIDs for the identification and management of digital objects to facilitate communication with the HPC system. The whole process can be expressed as a workflow. A workflow, including error handling and notification, has been created and deployed. Workflow execution can be triggered manually or at predefined time intervals. According to our evaluation, the Flowable platform proves to be powerful and flexible. Further use of the platform is already planned or implemented for many of our projects.
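The sequence described in the abstract (harvest works, assign PIDs, index them, and notify on errors) can be illustrated with a minimal plain-Python sketch. This is not the authors' implementation and it does not use the Flowable API; the step functions, the `example-prefix/` PID scheme, and the `run_workflow` helper are all hypothetical names chosen for illustration. Triggering (manual or at time intervals) is left out.

```python
import uuid
from typing import Callable, Dict, List, Tuple

Context = Dict[str, object]
Step = Tuple[str, Callable[[Context], None]]


def harvest(ctx: Context) -> None:
    # Stand-in for fetching metadata and raw data from a text resource.
    ctx["works"] = [{"title": "Example work", "text": "some raw text"}]


def assign_pids(ctx: Context) -> None:
    # Hypothetical PID scheme; a real system would register PIDs with a PID service.
    for work in ctx["works"]:
        work["pid"] = f"example-prefix/{uuid.uuid4()}"


def index_works(ctx: Context) -> None:
    # Stand-in for a search index: a dictionary keyed by PID.
    ctx["index"] = {work["pid"]: work for work in ctx["works"]}


STEPS: List[Step] = [
    ("harvest", harvest),
    ("assign_pids", assign_pids),
    ("index_works", index_works),
]


def run_workflow(steps: List[Step], notify: Callable[[str], None]) -> Context:
    """Run steps in order; on the first failure, notify and stop."""
    ctx: Context = {}
    for name, step in steps:
        try:
            step(ctx)
        except Exception as exc:
            notify(f"step '{name}' failed: {exc}")
            ctx["failed_step"] = name
            break
    return ctx


if __name__ == "__main__":
    result = run_workflow(STEPS, notify=print)
    print(f"indexed {len(result['index'])} work(s)")
```

In a workflow engine, each step would typically be a separate task in a process definition, with error handling and notification modelled explicitly rather than in a `try`/`except` block.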
Triet, Ho Anh Doan; Ramin, Yahyapour; Sven, Bingert
Computing technology, computer technology; automation technology, automation technology equipment
Flowable; workflow; text analysis; knowledge graph; persistent identifier
Triet, Ho Anh Doan; Ramin, Yahyapour; Sven, Bingert. Using a Workflow Management Platform in Textual Data Management [EB/OL]. (2022-11-28) [2025-05-04]. https://chinaxiv.org/abs/202211.00431.