TY - GEN
T1 - Title extraction from Loosely Structured Data Records
AU - Wu, Yi Pu
AU - Zhang, Xue Jie
AU - Li, Qing
AU - Chen, Jing
PY - 2008/12/25
Y1 - 2008/12/25
N2 - In this paper, we present a novel method for extracting titles from Loosely Structured Data Records (LSDRs). We first automatically identify the format of the titles and then extract them accordingly. For a Web page whose title occurs in all the Data Records, we take as the accurate title the candidate title with the greatest length of "same content". For a Web page whose title occurs before the first Data Record, we take as the accurate title the candidate title with the greatest length of "different content". Our experiment demonstrates that the automatic algorithm is robust and effective on two databases collected from the Internet.
AB - In this paper, we present a novel method for extracting titles from Loosely Structured Data Records (LSDRs). We first automatically identify the format of the titles and then extract them accordingly. For a Web page whose title occurs in all the Data Records, we take as the accurate title the candidate title with the greatest length of "same content". For a Web page whose title occurs before the first Data Record, we take as the accurate title the candidate title with the greatest length of "different content". Our experiment demonstrates that the automatic algorithm is robust and effective on two databases collected from the Internet.
KW - Forum data
KW - Loosely structured data records
KW - Structured data records
KW - Title extraction
UR - http://www.scopus.com/inward/record.url?scp=57849155380&partnerID=8YFLogxK
U2 - 10.1109/ICMLC.2008.4620851
DO - 10.1109/ICMLC.2008.4620851
M3 - Conference article published in proceeding or book
AN - SCOPUS:57849155380
SN - 9781424420964
T3 - Proceedings of the 7th International Conference on Machine Learning and Cybernetics, ICMLC
SP - 2623
EP - 2628
BT - Proceedings of the 7th International Conference on Machine Learning and Cybernetics, ICMLC
T2 - 7th International Conference on Machine Learning and Cybernetics, ICMLC
Y2 - 12 July 2008 through 15 July 2008
ER -