Towards an Open Science Platform for the Evaluation of Data Fusion

Weinan Huang, Junyi Chen, Lei Meng and David Lillis

In 3rd IEEE International Conference on Cloud Computing and Big Data Analysis (ICCCBDA 2018), pages 290–294, Chengdu, China, 2018.

Abstract

Combining the results of different search engines in order to improve upon their performance has been the subject of many research papers. This has become known as the "Data Fusion" task, and has great promise in dealing with the vast quantity of unstructured textual data that is a feature of many Big Data scenarios. However, no universally accepted evaluation methodology has emerged in the community. This makes it difficult to make meaningful comparisons between the various proposed techniques from reading the literature alone. Variations in the datasets, metrics, and baseline results have all contributed to this difficulty. This paper argues that a more unified approach is required, and that a centralised software platform should be developed to aid researchers in comparing their algorithms with those of others. The desirable qualities of such a system have been identified and proposed, and an early prototype has been developed. Re-implementing algorithms published by other researchers is a great burden on those proposing new techniques. The prototype system has the potential to greatly reduce this burden and thus to encourage the generation and publication of more comparable results.