|Name:||(07) Managed Database Caching for Massively Parallel Sequence Alignment Tasks|
|Time:||Monday, June 23, 2014
05:04 pm - 05:11 pm
CCL - Congress Center Leipzig
|Presenter:||Rikky Wenang Purbojati, Nanyang Technological University|
|Abstract:||Sequence alignment is one of the most common algorithms used in analyzing short DNA sequences in genomics. Because next-generation DNA sequencing technologies produce massive amounts of data, alignment is typically run on a high-performance computing cluster for efficiency. Given how the algorithm works, invoking large batches of alignments involves repeatedly loading multiple identical indexing databases from network-attached storage into working memory. This redundancy wastes resources and, in extreme cases, degrades overall cluster performance. We addressed this problem by implementing an aligner-specific caching mechanism. It avoids redundant I/O requests by caching the database in local storage on the first request; any subsequent request for that database is then served from local storage. Benchmarks of this mechanism showed that it improves job execution performance, especially for very large batches of jobs.
Rikky W. Purbojati, Nanyang Technological University
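
The caching idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `fetch_database`, its parameters, and the copy-then-rename strategy are assumptions made for illustration only. The first request copies the index database from network-attached storage into a node-local cache directory, and later requests are served from the local copy.

```python
import os
import shutil
import tempfile
from pathlib import Path


def fetch_database(remote_path, cache_dir):
    """Return a node-local path to the database, copying it on first request.

    Hypothetical helper: the name, arguments, and layout are illustrative.
    """
    remote = Path(remote_path)
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    local = cache / remote.name

    if local.exists():
        # Cache hit: no read from network-attached storage is needed.
        return local

    # Cache miss: copy to a temporary file first, then rename atomically so
    # that concurrent jobs never see a partially written database.
    fd, tmp_name = tempfile.mkstemp(dir=cache, prefix=remote.name + ".")
    os.close(fd)
    try:
        shutil.copyfile(remote, tmp_name)
        os.replace(tmp_name, local)
    except BaseException:
        # Clean up the partial copy before propagating the error.
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise
    return local
```

In this sketch, an alignment job would call `fetch_database` before launching the aligner and pass the returned local path to it. Two jobs that miss the cache at the same moment may both copy the file; a lock file could prevent that, at the cost of extra complexity.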