A fast big data collection system using MapReduce framework

Bing Li, Chun Chung Chan

Research output: Chapter in book / Conference proceeding; Conference article published in proceeding or book; Academic research; peer-review

3 Citations (Scopus)


Social networks, as a corpus of valuable data, have attracted much attention from researchers in various fields in recent years, especially in big data analytics. However, efficient and accurate data collection, the foundation of such analytics, has received little attention in previously published work. As the volume of data on the web grows rapidly, this article identifies two major challenges that traditional distributed web crawler systems cannot meet: handling the big data of social networks quickly, and adapting to multiple web sources with a uniform collection model. To address these two challenges, and thereby build a foundation for big data analytics, this article proposes an Ontology-based adapted web crawler system called the OACM system. It uses the MapReduce model to balance processing resources effectively and thus speed up the collection procedure, and it designs a uniform Ontology model that estimates the semantic content of both social networks and collection tasks in order to adapt to different web sources. In a set of experiments, the proposed OACM system scheduled system resources efficiently and collected a large amount of data from multiple web sources.
Original language: English
Title of host publication: CCIS 2014 - Proceedings of 2014 IEEE 3rd International Conference on Cloud Computing and Intelligence Systems
Number of pages: 6
ISBN (Electronic): 9781479947201
Publication status: Published - 1 Jan 2014
Event: 3rd IEEE International Conference on Cloud Computing and Intelligence Systems, CCIS 2014 - Shenzhen, China
Duration: 27 Nov 2014 - 29 Nov 2014


Conference: 3rd IEEE International Conference on Cloud Computing and Intelligence Systems, CCIS 2014


Keywords

  • Big Data Analytics
  • MapReduce
  • Ontology Model
  • Social Network
  • Web Crawler

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Artificial Intelligence
  • Computational Theory and Mathematics
