Approximation algorithms for Hamming clustering problems

Leszek Gąsieniec, Jesper Andreas Jansson, Andrzej Lingas

Research output: Chapter in book / Conference proceedingConference article published in proceeding or bookAcademic researchpeer-review

6 Citations (Scopus)

Abstract

We study Hamming versions of two classical clustering problems. The Hamming radius p-clustering problem (HRC) for a set S of k binary strings, each of length n, is to find p binary strings of length n that minimize the maximum Hamming distance between a string in S and the closest of the p strings; this minimum value is termed the p-radius of S and is denoted by ϱ: The related Hamming diameter p-clustering problem (HDC) is to split S into p groups so that the maximum of the Hamming group diameters is minimized; this latter value is called the p-diameter of S. First, we provide an integer programming formulation of HRC which yields exact solutions in polynomial time whenever k and p are constant. We also observe that HDC admits straightforward polynomial- time solutions when k = O(log n) or p = 2. Next, by reduction from the corresponding geometric p-clustering problems in the plane under the L1 metric, we show that neither HRC nor HDC can be approximated within any constant factor smaller than two unless P=NP. We also prove that for any ε > 0 it is NP-hard to split S into at most pk<sup>1/7–ε</sup> clusters whose Hamming diameter doesn't exceed the p-diameter. Fur- thermore, we note that by adapting Gonzalez' farthest-point clustering algorithm [6], HRC and HDC can be approximated within a factor of two in time O(pkn). Next, we describe a 2<sup>O(pϱ/ε)</sup>k<sup>O(p/ε)</sup>n<sup>2</sup>-time (1+ε)-approximation algorithm for HRC. In particular, it runs in polynomial time when p = O(1) and ϱ = O(log(k+n)): Finally, we show how to find in O((formula presented) + kn log n + k<sup>2</sup> log n)(2<sup>ϱ</sup>k)<sup>2/ε</sup>) time a set L of O(p log k) strings of length n such that for each string in S there is at least one string in L within distance (1 + ε)%, for any constant 0 < ε < 1.
Original languageEnglish
Title of host publicationCombinatorial Pattern Matching - 11th Annual Symposium, CPM 2000, Proceedings
PublisherSpringer Verlag
Pages108-118
Number of pages11
ISBN (Electronic)3540676333, 9783540676331
Publication statusPublished - 1 Jan 2000
Externally publishedYes
Event11th Annual Symposium on Combinatorial Pattern Matching, CPM 2000 - Montreal, Canada
Duration: 21 Jun 200023 Jun 2000

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume1848
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Conference

Conference11th Annual Symposium on Combinatorial Pattern Matching, CPM 2000
Country/TerritoryCanada
CityMontreal
Period21/06/0023/06/00

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science(all)

Cite this