I received my Bachelor of Science in Electrical Engineering from the Sharif University of Technology, Tehran, Iran, in 2011. I became a Ph.D. student at the University of California, Davis, in September 2011, where I began my studies in communication systems and wireless sensor networks. In 2013, I became interested in distributed systems and parallel algorithms, and started working with Prof. John Owens on using Graphics Processing Units (GPUs) for general-purpose parallel computation. I received my Ph.D. in December 2017.

Research Interests:

I am interested in research on general-purpose GPU computing. My main interest is in the design and analysis of parallel algorithms that suit the GPU hardware, followed by efficient implementation of those algorithms on the GPU. More specifically, I am interested in large-scale problems that involve high-performance primitive algorithms, data structures, graph processing, linear algebra, data analysis, machine learning, etc.

Although the GPU provides a specific framework for parallel processing, my interests are not limited to GPUs. I am also interested in high-performance computing using other parallel frameworks to deal with large-scale problems, as well as doing research on distributed systems and cloud-computing frameworks.

Research Experience:

My work involves developing new theoretical approaches and high-performance implementations of complex algorithms and data structures. To name a few:

  1. String matching: I designed a two-stage string matching procedure based on the classic Rabin-Karp algorithm. My implementation provides best-in-class performance for single-pattern exact matching on the GPU.
  2. Multisplit: I designed and implemented a set of primitive algorithms called “multisplit”. Multisplit permutes the input elements into contiguous segments, one per bucket, where each element's bucket is given by a user-defined bucket identifier. It is a useful primitive that can serve as a building block for fast radix sort and histogram implementations on the GPU [GitHub repo].
  3. Dynamic data structures: I design data structures for the GPU that can be updated efficiently at runtime while still providing fast queries. So far, I have designed and implemented an ordered dictionary with batch updates that also supports search and range queries, and a dynamic hash table that efficiently supports concurrent updates and search queries. I am currently working on a data structure suitable for dynamic graphs on the GPU.
  4. Apache Spark with GPUs: I designed and implemented a framework for using GPUs in Apache Spark, so that certain computationally expensive tasks can be done more efficiently. Our goal was to hide the complexity of exploiting GPUs from the Apache Spark user.
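
To illustrate what multisplit (item 2 above) computes, here is a minimal sequential sketch in Python. It is only a reference for the input/output behavior, not the GPU implementation: the names `multisplit`, `bucket_of`, and `num_buckets` are made up for this illustration, and the histogram / prefix-sum / scatter structure and the stable ordering within each bucket are assumptions of the sketch.

```python
from typing import Callable, List, Sequence, Tuple


def multisplit(
    keys: Sequence[int],
    bucket_of: Callable[[int], int],  # hypothetical user-defined bucket identifier
    num_buckets: int,
) -> Tuple[List[int], List[int]]:
    """Sequential reference: group keys into contiguous per-bucket segments.

    Returns the permuted keys and the starting offset of each bucket.
    """
    # 1. Histogram: count how many keys fall into each bucket.
    counts = [0] * num_buckets
    for k in keys:
        counts[bucket_of(k)] += 1

    # 2. Exclusive prefix sum: starting offset of each bucket's segment.
    offsets = [0] * num_buckets
    for b in range(1, num_buckets):
        offsets[b] = offsets[b - 1] + counts[b - 1]

    # 3. Scatter: place each key at the next free slot of its bucket.
    #    Scanning the input in order keeps each bucket's keys in input order.
    out = [0] * len(keys)
    cursor = list(offsets)
    for k in keys:
        b = bucket_of(k)
        out[cursor[b]] = k
        cursor[b] += 1

    return out, offsets


if __name__ == "__main__":
    keys = [7, 2, 9, 4, 1, 8]
    result, offsets = multisplit(keys, lambda k: k % 3, 3)
    print(result)   # [9, 7, 4, 1, 2, 8]
    print(offsets)  # [0, 1, 4]
```

With the bucket identifier chosen as a digit of the key, repeating this step once per digit gives a radix sort, and the per-bucket counts from step 1 are a histogram of the input.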

Publications:

  1. Saman Ashkiani, Parallel Algorithms and Dynamic Data Structures on the Graphics Processing Unit: a warp-centric approach, PhD Dissertation, University of California, Davis, December 2017.
  2. Saman Ashkiani, Martin Farach-Colton, John D. Owens, A Dynamic Hash Table for the GPU, CoRR, abs/1710.11246, October 2017.
  3. Saman Ashkiani, Shengren Li, Martin Farach-Colton, Nina Amenta, John D. Owens, GPU LSM: A Dynamic Dictionary Data Structure for the GPU, CoRR, abs/1707.05354, July 2017.
  4. Saman Ashkiani, Andrew Davidson, Ulrich Meyer, John D. Owens, GPU Multisplit: An Extended Study of a Parallel Algorithm, ACM Transactions on Parallel Computing (TOPC), Special Issue: Invited Papers from PPoPP 2016, Volume 4, Issue 1, October 2017.
  5. Saman Ashkiani, Andrew Davidson, Ulrich Meyer, John D. Owens, GPU Multisplit, Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2016).
  6. Saman Ashkiani, Nina Amenta, John D. Owens, Parallel Approaches to the String Matching Problem on the GPU, Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2016).
  7. Saman Ashkiani, Anna Scaglione, Pulse Coupled Discrete Oscillators Dynamics for Network Scheduling, the 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton 2012).
  8. Saman Ashkiani, Massoud Babaie-Zadeh, Christian Jutten, Error Correction via Smoothed L0-norm Recovery, Statistical Signal Processing Workshop (SSP 2011).
  9. Andrea Rueetschi, Saman Ashkiani, Anna Scaglione, On Scheduling Without a Master Clock: Coupled Oscillator Time Division Multiplexing, 45th Asilomar Conference on Signals, Systems and Computers (Asilomar 2012).
  10. Saman Ashkiani, Anna Scaglione, Discrete Dithered Desynchronization, arXiv preprint arXiv:1210.2122, 2012.