Refereed Publications

Behrooz Zarebavani, Ahmed H. Mahmoud, Ana Dodik, Changcheng Yuan, Serban D. Porumbescu, John D. Owens, Maryam Mehri Dehnavi, and Justin Solomon. Fast Sparse Matrix Permutation for Mesh-Based Direct Solvers. ACM Transactions on Graphics, 45(4), July 2026. [ bib | DOI | http ]

Jiayi Yuan, Cameron Shinn, Kai Xu, Jingze Cui, George Klimiashvili, Guangxuan Xiao, Perkz Zheng, Bo Li, Yuxin Zhou, Zhouhai Ye, Weijie You, Tian Zheng, Dominic Brown, Pengbo Wang, Markus Hoehnerbach, Richard Cai, Julien Demouth, John D. Owens, Xia Hu, Song Han, Timmy Liu, and Huizi Mao. BLASST: Dynamic BLocked Attention Sparsity via Softmax Thresholding. In Proceedings of Machine Learning and Systems, volume 8 of MLSys 2026, May 2026. Best Research Paper Award. [ bib | http ]

Thomas Smith, Raph Levien, and John D. Owens. Decoupled Fallback: A Portable Single-Pass GPU Scan. In Proceedings of the 37th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA '25, pages 255–268, July 2025. [ bib | DOI | http ]

Ahmed H. Mahmoud, Serban D. Porumbescu, and John D. Owens. Dynamic Mesh Processing on the GPU (Abstract). In Proceedings of the 3rd Highlights of Parallel Computing Workshop, HOPC '25, pages 7–9, July 2025. [ bib | DOI ]

Ahmed H. Mahmoud, Serban D. Porumbescu, and John D. Owens. Dynamic Mesh Processing on the GPU. ACM Transactions on Graphics, 44(4):136:1–19, July 2025. [ bib | DOI | http ]

Zhongyi Lin, Ning Sun, Pallab Bhattacharya, Xizhou Feng, Louis Feng, and John D. Owens. Towards Universal Performance Modeling for Machine Learning Training on Multi-GPU Platforms. IEEE Transactions on Parallel and Distributed Systems, 36(2):226–238, February 2025. [ bib | DOI | code | http ]

Yuxin Chen, Aydın Buluç, Katherine Yelick, and John D. Owens. Accelerating Multi-GPU Embedding Retrieval with PGAS-Style Communication for Deep Learning Recommendation Systems. In Parallel Applications Workshop, Alternatives To MPI+X, PAW-ATM2024, pages 1262–1273, November 2024. [ bib | DOI | http ]

John D. Owens and Bruce Hoppe. Helping Faculty Teach Software Performance Engineering. In Proceedings of the 14th NSF/TCPP Workshop on Parallel and Distributed Computing Education, EduPar-24, pages 338–341, May 2024. [ bib | DOI | http ]

Scott Rixner, William J. Dally, Ujval J. Kapasi, Peter Mattson, and John D. Owens. RETROSPECTIVE: Memory Access Scheduling. In José F. Martínez and Lizy K. John, editors, ISCA@50 25-Year Retrospective: 1996–2020. ACM SIGARCH and IEEE TCCA, June 2023. [ bib | http ]

Afton Geil, Serban D. Porumbescu, and John D. Owens. Maximum Clique Enumeration on the GPU. In Proceedings of the Workshop on Graphs, Architectures, Programming, and Learning, GrAPL 2023, pages 234–244, May 2023. [ bib | DOI | http ]

Toluwanimi O. Odemuyiwa, Hadi Asghari-Moghaddam, Michael Pellauer, Kartik Hegde, Po-An Tsai, Neal Crago, Aamer Jaleel, John D. Owens, Edgar Solomonik, Joel Emer, and Christopher Fletcher. Accelerating Sparse Data Orchestration via Dynamic Reflexive Tiling. In Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, volume 3 of ASPLOS '23, pages 18–32, March 2023. [ bib | DOI | http ]

Jonathan D. Wapman, Sean Treichler, Serban D. Porumbescu, and John D. Owens. Harmonic CUDA: Asynchronous Programming on GPUs. In Proceedings of the 14th International Workshop on Programming Models and Applications for Multicores and Manycores, PMAM '23, pages 39–49, February 2023. [ bib | DOI | http ]

Muhammad Osama, Duane Merrill, Cris Cecka, Michael Garland, and John D. Owens. Stream-K: Work-Centric Parallel Decomposition for Dense Matrix-Matrix Multiplication on the GPU. In Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, PPoPP '23, pages 429–431. ACM, February 2023. [ bib | DOI ]

Muhammad Osama, Serban D. Porumbescu, and John D. Owens. A Programming Model for GPU Load Balancing. In Proceedings of the 28th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2023, pages 79–91, February/March 2023. [ bib | DOI | code | http ]

Muhammad A. Awad, Saman Ashkiani, Serban D. Porumbescu, Martín Farach-Colton, and John D. Owens. Analyzing and Implementing GPU Hash Tables. In SIAM Symposium on Algorithmic Principles of Computer Systems, APOCS23, pages 33–50, January 2023. [ bib | DOI | code | http ]

Zhongyi Lin, Louis Feng, Ehsan K. Ardestani, Jaewon Lee, John Lundell, Changkyu Kim, Arun Kejariwal, and John D. Owens. Building a Performance Model for Deep Learning Recommendation Model Training on GPUs. In 2022 IEEE 29th International Conference on High Performance Computing, Data, and Analytics, HiPC 2022, pages 48–58. IEEE, December 2022. [ bib | DOI | http ]

Yuxin Chen, Benjamin Brock, Serban Porumbescu, Aydın Buluç, Katherine Yelick, and John D. Owens. Scalable Irregular Parallelism with GPUs: Getting CPUs Out of the Way. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, SC '22, pages 708–723, November 2022. [ bib | DOI | code | http ]

Muhammad A. Awad, Serban D. Porumbescu, and John D. Owens. A GPU Multiversion B-Tree. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, PACT 2022, pages 481–493, October 2022. [ bib | DOI | code | http ]

Yuxin Chen, Benjamin Brock, Serban Porumbescu, Aydın Buluç, Katherine Yelick, and John D. Owens. Atos: A Task-Parallel GPU Scheduler for Graph Analytics. In Proceedings of the International Conference on Parallel Processing, ICPP 2022, August/September 2022. [ bib | DOI | arXiv | full talk ]

Kerry A. Seitz, Jr., Theresa Foley, Serban D. Porumbescu, and John D. Owens. Supporting Unified Shader Specialization by Co-opting C++ Features. Proceedings of the ACM on Computer Graphics and Interactive Techniques, 5(3):25:1–25:17, July 2022. [ bib | DOI | ACM DL | http ]

Muhammad Osama, Serban D. Porumbescu, and John D. Owens. Essentials of Parallel Graph Analytics. In Proceedings of the Workshop on Graphs, Architectures, Programming, and Learning, GrAPL 2022, pages 314–317, May 2022. [ bib | DOI | code | http ]

Carl Yang, Aydın Buluç, and John D. Owens. GraphBLAST: A High-Performance Linear Algebra-based Graph Framework on the GPU. ACM Transactions on Mathematical Software, 48(1):1:1–1:51, February 2022. Editors' Pick for Notable Papers, ACM TOMS, 2024. [ bib | DOI | http ]

Zhongyi Lin, Evangelos Georganas, and John D. Owens. Towards Flexible and Compiler-friendly Layer Fusion for CNNs on Multi-core CPUs. In Euro-Par 2021: Proceedings of the 27th International European Conference on Parallel and Distributed Computing, September 2021. [ bib | DOI | http ]

Weitang Liu, Xiaoyun Wang, John D. Owens, and Yixuan Li. Energy-based Out-of-distribution Detection. In Advances in Neural Information Processing Systems, volume 33 of NeurIPS 2020, December 2020. [ bib | code | .html ]

Muhammad A. Awad, Saman Ashkiani, Serban D. Porumbescu, and John D. Owens. Dynamic Graphs on the GPU. In Proceedings of the 34th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2020, pages 739–748, May 2020. [ bib | DOI | http ]

Ahmed Abdelkader, Chandrajit L. Bajaj, Mohamed S. Ebeida, Ahmed H. Mahmoud, Scott A. Mitchell, John D. Owens, and Ahmad A. Rushdi. VoroCrust: Voronoi Meshing Without Clipping. ACM Transactions on Graphics, 39(3):23:1–23:16, May 2020. [ bib | DOI | ACM DL | http ]

Kerry A. Seitz, Jr., T. Foley, Serban D. Porumbescu, and John D. Owens. Staged Metaprogramming for Shader System Development. ACM Transactions on Graphics, 38(6):202:1–202:15, November 2019. [ bib | DOI | ACM DL | http ]

Benjamin Brock, Yuxin Chen, Jiakun Yan, John D. Owens, Aydın Buluç, and Katherine Yelick. RDMA vs. RPC for Implementing Distributed Data Structures. In Proceedings of the IEEE/ACM 9th Workshop on Irregular Applications: Architectures and Algorithms, IA³ 2019, pages 17–22, November 2019. [ bib | DOI | http ]

Leyuan Wang and John D. Owens. Fast BFS-Based Triangle Counting on GPUs. In Proceedings of the IEEE High Performance Extreme Computing Conference, HPEC '19, September 2019. 2019 GraphChallenge Finalist. [ bib | DOI | http ]

Xiaoyun Wang, Zhongyi Lin, Carl Yang, and John D. Owens. Accelerating DNN Inference with GraphBLAS and the GPU. In Proceedings of the IEEE High Performance Extreme Computing Conference, HPEC '19, September 2019. 2019 GraphChallenge Student Innovation Award. [ bib | DOI | http ]

Zhongyi Lin, Matthew Yih, Jeffrey M. Ota, John D. Owens, and Pınar Muyan-Özçelik. Benchmarking Deep Learning Frameworks and Investigating FPGA Deployment for Traffic Sign Classification and Detection. IEEE Transactions on Intelligent Vehicles, 4(3):385–395, September 2019. [ bib | DOI | code | http ]

Muhammad Osama, Minh Truong, Carl Yang, Aydın Buluç, and John D. Owens. Graph Coloring on the GPU. In Proceedings of the Workshop on Graphs, Architectures, Programming, and Learning, GrAPL 2019, pages 231–240, May 2019. [ bib | DOI | code | http ]

Muhammad A. Awad, Saman Ashkiani, Rob Johnson, Martín Farach-Colton, and John D. Owens. Engineering a High-Performance GPU B-Tree. In Proceedings of the 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2019, pages 145–157, February 2019. [ bib | DOI | code | ACM DL | http ]

Matthew Yih, Jeffrey M. Ota, John D. Owens, and Pınar Muyan-Özçelik. FPGA versus GPU for Speed-Limit-Sign Recognition. In Proceedings of the 21st IEEE International Conference on Intelligent Transportation Systems, ITSC 2018, pages 843–850, November 2018. [ bib | DOI | code | http ]

Carl Yang, Aydın Buluç, and John D. Owens. Implementing Push-Pull Efficiently in GraphBLAS. In Proceedings of the International Conference on Parallel Processing, ICPP 2018, pages 89:1–89:11, August 2018. [ bib | DOI | code | ACM DL | http ]

Carl Yang, Aydın Buluç, and John D. Owens. Design Principles for Sparse Matrix Multiplication on the GPU. In Marco Aldinucci, Luca Padovani, and Massimo Torquati, editors, Euro-Par 2018: Proceedings of the 24th International European Conference on Parallel and Distributed Computing, pages 672–687, August 2018. Distinguished Paper and Best Artifact Award. [ bib | DOI | code | http ]

John D. Owens. Technical Perspective: Graphs, Betweenness Centrality, and the GPU. Communications of the ACM, 61(8):84, August 2018. [ bib | DOI | ACM DL | http ]

Zhongyi Lin, Jeffrey M. Ota, John D. Owens, and Pınar Muyan-Özçelik. Benchmarking Deep Learning Frameworks with FPGA-suitable Models on a Traffic Sign Dataset. In Proceedings of the 2018 IEEE Intelligent Vehicles Symposium, IV '18, pages 1197–1203, June 2018. [ bib | DOI | http ]

Ahmed Abdelkader, Chandrajit L. Bajaj, Mohamed S. Ebeida, Ahmed H. Mahmoud, Scott A. Mitchell, John D. Owens, and Ahmad A. Rushdi. VoroCrust Illustrated: Theory and Challenges (Multimedia Exposition). In Bettina Speckmann and Csaba D. Tóth, editors, 34th International Symposium on Computational Geometry (SoCG 2018), volume 99 of Leibniz International Proceedings in Informatics (LIPIcs), pages 77:1–77:4, Dagstuhl, Germany, June 2018. Schloss Dagstuhl—Leibniz-Zentrum für Informatik. [ bib | DOI | http ]

Ahmed Abdelkader, Chandrajit L. Bajaj, Mohamed S. Ebeida, Ahmed H. Mahmoud, Scott A. Mitchell, John D. Owens, and Ahmad Rushdi. Sampling Conditions for Conforming Voronoi Meshing by the VoroCrust Algorithm. In Bettina Speckmann and Csaba D. Tóth, editors, 34th International Symposium on Computational Geometry (SoCG 2018), volume 99 of Leibniz International Proceedings in Informatics (LIPIcs), pages 1:1–1:16, Dagstuhl, Germany, June 2018. Schloss Dagstuhl—Leibniz-Zentrum für Informatik. [ bib | DOI | http ]

Yuechao Pan, Roger Pearce, and John D. Owens. Scalable Breadth-First Search on a GPU Cluster. In Proceedings of the 32nd IEEE International Parallel and Distributed Processing Symposium, IPDPS 2018, pages 1090–1101, May 2018. [ bib | DOI | http ]

Afton Geil, Martín Farach-Colton, and John D. Owens. Quotient Filters: Approximate Membership Queries on the GPU. In Proceedings of the 32nd IEEE International Parallel and Distributed Processing Symposium, IPDPS 2018, pages 451–462, May 2018. [ bib | DOI | http ]

Saman Ashkiani, Shengren Li, Martín Farach-Colton, Nina Amenta, and John D. Owens. GPU LSM: A Dynamic Dictionary Data Structure for the GPU. In Proceedings of the 32nd IEEE International Parallel and Distributed Processing Symposium, IPDPS 2018, pages 430–440, May 2018. [ bib | DOI | http ]

Saman Ashkiani, Martín Farach-Colton, and John D. Owens. A Dynamic Hash Table for the GPU. In Proceedings of the 32nd IEEE International Parallel and Distributed Processing Symposium, IPDPS 2018, pages 419–429, May 2018. [ bib | DOI | code | http ]

Yangzihao Wang, Yuechao Pan, Andrew Davidson, Yuduo Wu, Carl Yang, Leyuan Wang, Muhammad Osama, Chenshan Yuan, Weitang Liu, Andy T. Riffel, and John D. Owens. Gunrock: GPU Graph Analytics. ACM Transactions on Parallel Computing, 4(1):3:1–3:49, August 2017. [ bib | DOI | code | ACM DL | http ]

Pınar Muyan-Özçelik and John D. Owens. Methods for Multitasking among Real-time Embedded Compute Tasks Running on the GPU. Concurrency and Computation: Practice and Experience, 29(15):e4118:1–e4118:14, August 2017. [ bib | DOI ]

Saman Ashkiani, Andrew A. Davidson, Ulrich Meyer, and John D. Owens. GPU Multisplit: an extended study of a parallel algorithm. ACM Transactions on Parallel Computing, 4(1):2:1–2:44, August 2017. [ bib | DOI | code | ACM DL | http ]

Ahmed Abdelkader, Ahmed H. Mahmoud, Ahmad A. Rushdi, Scott A. Mitchell, John D. Owens, and Mohamed S. Ebeida. A Constrained Resampling Strategy for Mesh Improvement. Computer Graphics Forum, 36(5):189–201, July 2017. Proceedings of the Symposium on Geometry Processing. [ bib | DOI | code | http ]

Yangzihao Wang, Sean Baxter, and John D. Owens. Mini-Gunrock: A Lightweight Graph Analytics Framework on the GPU. In Graph Algorithms Building Blocks, GABB 2017, pages 616–626, May 2017. [ bib | DOI | code | http ]

Yuechao Pan, Yangzihao Wang, Yuduo Wu, Carl Yang, and John D. Owens. Multi-GPU Graph Analytics. In Proceedings of the 31st IEEE International Parallel and Distributed Processing Symposium, IPDPS 2017, pages 479–490, May/June 2017. [ bib | DOI | code | http ]

David Luebke and John Owens. Pixels at Scale: High-Performance Computer Graphics and Vision. In Frontiers of Engineering: Reports on Leading-Edge Engineering from the 2016 Symposium, pages 3–5. The National Academies Press, 2017. [ bib | DOI ]

Jonathan Y. Kemal, Roger L. Davis, and John D. Owens. Multidisciplinary simulation acceleration using multiple shared memory graphical processing units. International Journal of High Performance Computing Applications, 30(4):486–508, November 2016. [ bib | DOI | http ]

Ross K. Gegan, Vishal Ahuja, John D. Owens, and Dipak Ghosal. Real-Time GPU-based Timing Channel Detection using Entropy. In Proceedings of the IEEE Conference on Communications and Network Security, CNS 2016, pages 296–305, October 2016. [ bib | DOI | http ]

Jeremy Kepner, Peter Aaltonen, David Bader, Aydın Buluç, Franz Franchetti, John Gilbert, Dylan Hutchison, Manoj Kumar, Andrew Lumsdaine, Henning Meyerhenke, Scott McMillan, Jose Moreira, John D. Owens, Carl Yang, Marcin Zalewski, and Timothy Mattson. Mathematical Foundations of the GraphBLAS. In Proceedings of the IEEE High Performance Extreme Computing Conference, September 2016. [ bib | DOI | http ]

Leyuan Wang, Sean Baxter, and John D. Owens. Fast Parallel Skew and Prefix-Doubling Suffix Array Construction on the GPU. Concurrency and Computation: Practice and Experience, 28(12):3466–3484, 25 August 2016. [ bib | DOI | http ]

Saman Ashkiani, Nina Amenta, and John D. Owens. Parallel Approaches to the String Matching Problem on the GPU. In Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2016, pages 275–285, July 2016. [ bib | DOI | ACM DL | http ]

Mohamed S. Ebeida, Ahmad A. Rushdi, Muhammad A. Awad, Ahmed H. Mahmoud, Dong-Ming Yan, Shawn A. English, John D. Owens, Chandrajit L. Bajaj, and Scott A. Mitchell. Disk Density Tuning of a Maximal Random Packing. Computer Graphics Forum, 35(5):259–269, June 2016. Proceedings of the Symposium on Geometry Processing. [ bib | DOI | .pdf ]

Leyuan Wang, Yangzihao Wang, Carl Yang, and John D. Owens. A Comparative Study on Exact Triangle Counting Algorithms on the GPU. In Proceedings of the 1st High Performance Graph Processing Workshop, HPGP '16, pages 1–8, May 2016. [ bib | DOI | ACM DL | http ]

Yangzihao Wang, Andrew Davidson, Yuechao Pan, Yuduo Wu, Andy Riffel, and John D. Owens. Gunrock: A High-Performance Graph Processing Library on the GPU. In Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2016, pages 11:1–11:12, March 2016. Distinguished Paper. [ bib | DOI | code | ACM DL | http ]

Pınar Muyan-Özçelik and John D. Owens. Multitasking Real-time Embedded GPU Computing Tasks. In Proceedings of the 7th International Workshop on Programming Models and Applications for Multicores and Manycores, PMAM 2016, pages 78–87, March 2016. [ bib | DOI | ACM DL | http ]

Saman Ashkiani, Andrew A. Davidson, Ulrich Meyer, and John D. Owens. GPU Multisplit. In Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2016, pages 12:1–12:13, March 2016. [ bib | DOI | code | ACM DL | http ]

Yuduo Wu, Yangzihao Wang, Yuechao Pan, Carl Yang, and John D. Owens. Performance Characterization of High-Level Programming Models for GPU Graph Analytics. In IEEE International Symposium on Workload Characterization, IISWC-2015, pages 66–75, October 2015. Best Paper finalist. [ bib | DOI | http ]

Mikhail M. Shashkov, Jason Mak, Shawn Recker, Connie Nguyen, John Owens, and Kenneth I. Joy. Efficient Dense Reconstruction Using Geometry and Image Consistency Constraints. In Proceedings of the IEEE Applied Imagery Pattern Recognition Workshop, AIPR 2015, October 2015. [ bib | DOI | http ]

Leyuan Wang, Sean Baxter, and John D. Owens. Fast Parallel Suffix Array on the GPU. In Euro-Par 2015: Proceedings of the 21st International European Conference on Parallel and Distributed Computing, volume 9233 of Lecture Notes in Computer Science, pages 573–587. Springer, August 2015. Distinguished Paper. [ bib | DOI | http ]

Anjul Patney, Stanley Tzeng, Kerry A. Seitz, Jr., and John D. Owens. Piko: A Framework for Authoring Programmable Graphics Pipelines. ACM Transactions on Graphics, 34(4):147:1–147:13, August 2015. [ bib | DOI | ACM DL | http ]

Carl Yang, Yangzihao Wang, and John D. Owens. Fast Sparse Matrix and Sparse Vector Multiplication Algorithm on the GPU. In Graph Algorithms Building Blocks, GABB 2015, pages 841–847, May 2015. [ bib | DOI | http ]

Thomas Weber, Michael Wimmer, and John D. Owens. Parallel Reyes-style Adaptive Subdivision with Bounded Memory Usage. In Proceedings of the ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games, i3D 2015, pages 39–45, February/March 2015. [ bib | DOI | code | ACM DL | http ]

Jonathan Kemal, Roger L. Davis, and John D. Owens. Multidisciplinary Simulation Acceleration using Multiple Shared-Memory Graphical Processing Units. In AIAA Infotech @ Aerospace, AIAA Science and Technology Forum, January 2015. [ bib | DOI | http ]

Jason Mak, Mauricio Hess-Flores, Shawn Recker, John D. Owens, and Kenneth I. Joy. A Comparative Study of Recent GPU-Accelerated Multi-View Sequential Reconstruction Triangulation Methods for Large-Scale Scenes. In C. V. Jawahar and Shiguang Shan, editors, Big Data in 3D Computer Vision (Computer Vision—ACCV 2014 Workshops), volume 9008 of Lecture Notes in Computer Science, pages 254–269. Springer International Publishing, November 2014. [ bib | DOI | http ]

Mohamed Ebeida, Scott Mitchell, Anjul Patney, Andrew Davidson, Stanley Tzeng, Muhammad Awad, Ahmed Mahmoud, and John D. Owens. Exercises in High-Dimensional Sampling: Maximal Poisson-disk Sampling and k-d Darts. In Janine Bennett, Fabien Vivodtzev, and Valerio Pascucci, editors, Topological and Statistical Methods for Complex Data – Tackling Large-Scale, High-Dimensional, and Multivariate Data Sets, pages 221–238. Springer, November 2014. [ bib | DOI | http ]

Afton Geil, Yangzihao Wang, and John D. Owens. WTF, GPU! Computing Twitter's Who-To-Follow on the GPU. In Proceedings of the Second ACM Conference on Online Social Networks, COSN '14, pages 63–68, October 2014. [ bib | DOI | ACM DL | http ]

Andrew Davidson, Sean Baxter, Michael Garland, and John D. Owens. Work-Efficient Parallel GPU Methods for Single-Source Shortest Paths. In Proceedings of the 28th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2014, pages 349–359, May 2014. [ bib | DOI | http ]

Jason Mak, Mauricio Hess-Flores, Shawn Recker, John D. Owens, and Kenneth I. Joy. GPU-Accelerated and Efficient Multi-View Triangulation for Scene Reconstruction. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision, WACV '14, pages 61–68, March 2014. [ bib | DOI | http ]

Mohamed S. Ebeida, Anjul Patney, Scott A. Mitchell, Keith R. Dalbey, Andrew A. Davidson, and John D. Owens. k-d Darts: Sampling by k-Dimensional Flat Searches. ACM Transactions on Graphics, 33(1):3:1–3:16, January 2014. [ bib | DOI | ACM DL | http ]

Mohamed S. Ebeida, Ahmed H. Mahmoud, Muhammad A. Awad, Mohammed A. Mohammed, Scott A. Mitchell, Alex Rand, and John D. Owens. Sifted Disks. Computer Graphics Forum, 32(2):509–518, May 2013. [ bib | DOI | .pdf ]

Stanley Tzeng, Brandon Lloyd, and John D. Owens. A GPU Task-Parallel Model with Dependency Resolution. IEEE Computer, 45(8):34–41, August 2012. [ bib | DOI | http ]

Stanley Tzeng, Anjul Patney, Andrew Davidson, Mohamed S. Ebeida, Scott A. Mitchell, and John D. Owens. High-Quality Parallel Depth-of-Field Using Line Samples. In Proceedings of High Performance Graphics, HPG '12, pages 23–31, June 2012. [ bib | DOI | http ]

Shengren Li, Lance Simons, Jagadeesh Bhaskar Pakaravoor, Fatemeh Abbasinejad, John D. Owens, and Nina Amenta. kANN on the GPU with Shifted Sorting. In Proceedings of High Performance Graphics, HPG '12, pages 39–47, June 2012. [ bib | DOI | http ]

Ritesh A. Patel, Yao Zhang, Jason Mak, and John D. Owens. Parallel Lossless Data Compression on the GPU. In Proceedings of Innovative Parallel Computing, InPar '12, May 2012. [ bib | DOI | http ]

Kshitij Gupta, Jeff A. Stuart, and John D. Owens. A Study of Persistent Threads Style GPU Programming for GPGPU Workloads. In Proceedings of Innovative Parallel Computing, InPar '12, May 2012. [ bib | DOI | http ]

Mohamed S. Ebeida, Scott A. Mitchell, Anjul Patney, Andrew A. Davidson, and John D. Owens. A Simple Algorithm for Maximal Poisson-Disk Sampling in High Dimensions. Computer Graphics Forum, 31(2):785–794, May 2012. [ bib | DOI | http ]

Andrew Davidson, David Tarjan, Michael Garland, and John D. Owens. Efficient Parallel Merge Sort for Fixed and Variable Length Keys. In Proceedings of Innovative Parallel Computing, InPar '12, May 2012. [ bib | DOI | http ]

Andrew Davidson and John Owens. Toward Techniques for Auto-tuning GPU Algorithms. In Kristján Jónasson, editor, Applied Parallel and Scientific Computing, volume 7134 of Lecture Notes in Computer Science, pages 110–119. Springer Berlin / Heidelberg, February 2012. [ bib | DOI ]

Yao Zhang, John Ludd Recker, Robert Ulichney, Ingeborg Tastl, and John D. Owens. Plane-dependent Error Diffusion on a GPU. In Proceedings of SPIE: IS&T/SPIE Electronic Imaging 2012 / Parallel Processing for Imaging Applications II, volume 8295B, pages 8295B–59:1–10, January 2012. [ bib | DOI | http ]

Mohamed S. Ebeida, Anjul Patney, John D. Owens, and Eric Mestreau. Isotropic conforming refinement of quadrilateral and hexahedral meshes using two-refinement templates. International Journal for Numerical Methods in Engineering, 88(10):974–985, 9 December 2011. [ bib | DOI | http ]

Kshitij Gupta and John D. Owens. Compute & Memory Optimizations for High-Quality Speech Recognition on Low-End GPU Processors. In Proceedings of the International Conference on High Performance Computing, HiPC 2011, December 2011. [ bib | DOI | http ]

Yao Zhang, Jonathan Cohen, Andrew A. Davidson, and John D. Owens. A Hybrid Method for Solving Tridiagonal Systems on the GPU. In Wen-mei W. Hwu, editor, GPU Computing Gems, volume 2, chapter 11, pages 117–132. Morgan Kaufmann, October 2011. [ bib | DOI | http ]

Jeff A. Stuart, Pavan Balaji, and John D. Owens. Extending MPI to Accelerators. In Proceedings of the First Workshop on Architectures and Systems for Big Data, ASBD 2011, pages 19–23, October 2011. [ bib | DOI | ACM DL | http ]

Mark Silberstein, Assaf Schuster, and John D. Owens. Applying Software-Managed Caching and CPU/GPU Task Scheduling for Accelerating Dynamic Workloads. In Wen-mei W. Hwu, editor, GPU Computing Gems, volume 2, chapter 36, pages 501–517. Morgan Kaufmann, October 2011. [ bib | DOI ]

Mohamed S. Ebeida, Scott A. Mitchell, Andrew A. Davidson, Anjul Patney, Patrick M. Knupp, and John D. Owens. Efficient and Good Delaunay Meshes From Random Points. In Proceedings of the SIAM Conference on Geometric and Physical Modeling, GD/SPM11, pages 1506–1515, October 2011. [ bib | DOI | http ]

Dan A. Alcantara, Vasily Volkov, Shubhabrata Sengupta, Michael Mitzenmacher, John D. Owens, and Nina Amenta. Building an Efficient Hash Table on the GPU. In Wen-mei W. Hwu, editor, GPU Computing Gems, volume 2, chapter 4, pages 39–53. Morgan Kaufmann, October 2011. [ bib | DOI ]

Everett H. Phillips, Yao Zhang, Roger L. Davis, and John D. Owens. Acceleration of 2-D Compressible Flow Solvers with Graphics Processing Unit Clusters. Journal of Aerospace Computing, Information, and Communication, 8(8):237–249, August 2011. [ bib | DOI | http ]

John Jenkins, Isha Arkatkar, John D. Owens, Alok Choudhary, and Nagiza F. Samatova. Lessons Learned from Exploring the Backtracking Paradigm on the GPU. In Euro-Par 2011: Proceedings of the 17th International European Conference on Parallel and Distributed Computing, volume 6853 of Lecture Notes in Computer Science, pages 425–437. Springer, August/September 2011. [ bib | DOI | http ]

Jeff A. Stuart, Michael Cox, and John D. Owens. GPU-to-CPU Callbacks. In Euro-Par 2010 Workshops: Proceedings of the Third Workshop on UnConventional High Performance Computing (UCHPC 2010), volume 6586 of Lecture Notes in Computer Science, pages 365–372. Springer, July 2011. [ bib | DOI | http ]

Mohamed S. Ebeida, Anjul Patney, Scott A. Mitchell, Andrew Davidson, Patrick M. Knupp, and John D. Owens. Efficient Maximal Poisson-Disk Sampling. ACM Transactions on Graphics, 30(4):49:1–49:12, July 2011. [ bib | DOI | ACM DL | http ]

Christopher P. Stone, Earl P. N. Duque, Yao Zhang, David Car, John D. Owens, and Roger L. Davis. GPGPU parallel algorithms for structured-grid CFD codes. In Proceedings of the 20th AIAA Computational Fluid Dynamics Conference, number 2011-3221, June 2011. [ bib | DOI | http ]

Vladimir Glavtchev, Pınar Muyan-Özçelik, Jeffrey M. Ota, and John D. Owens. Feature-Based Speed Limit Sign Detection Using a Graphics Processing Unit. In Proceedings of the 2011 IEEE Intelligent Vehicles Symposium, IV '11, pages 195–200, June 2011. [ bib | DOI | http ]

Jeff A. Stuart and John D. Owens. Multi-GPU MapReduce on GPU Clusters. In Proceedings of the 25th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2011, pages 1068–1079, May 2011. [ bib | DOI | http ]

Andrew Davidson, Yao Zhang, and John D. Owens. An Auto-tuned Method for Solving Large Tridiagonal Systems on the GPU. In Proceedings of the 25th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2011, pages 956–965, May 2011. [ bib | DOI | http ]

Andrew Davidson and John D. Owens. Register Packing for Cyclic Reduction: A Case Study. In Proceedings of the Fourth Workshop on General Purpose Processing on Graphics Processing Units, GPGPU-4, pages 4:1–4:6, March 2011. [ bib | DOI | ACM DL | http ]

Yao Zhang and John D. Owens. A Quantitative Performance Analysis Model for GPU Architectures. In Proceedings of the 17th IEEE International Symposium on High-Performance Computer Architecture, HPCA-17, pages 382–393, February 2011. [ bib | DOI | http ]

Pınar Muyan-Özçelik, Vladimir Glavtchev, Jeffrey M. Ota, and John D. Owens. Real-Time Speed-Limit-Sign Recognition on an Embedded System Using a GPU. In Wen-mei W. Hwu, editor, GPU Computing Gems, volume 1, chapter 32, pages 497–516. Morgan Kaufmann, February 2011. [ bib | DOI | http ]

Yao Zhang, John Ludd Recker, Robert Ulichney, Giordano B. Beretta, Ingeborg Tastl, I-Jong Lin, and John D. Owens. A Parallel Error Diffusion Implementation on a GPU. In Proceedings of SPIE: IS&T/SPIE Electronic Imaging 2011 / Parallel Processing for Imaging Applications, volume 7872, pages 78720K:1–9, January 2011. [ bib | DOI | http ]

Shubhabrata Sengupta, Mark Harris, Michael Garland, and John D. Owens. Efficient Parallel Scan Algorithms for many-core GPUs. In Jakub Kurzak, David A. Bader, and Jack Dongarra, editors, Scientific Computing with Multicore and Accelerators, Chapman & Hall/CRC Computational Science, chapter 19, pages 413–442. Taylor & Francis, January 2011. [ bib | DOI | http ]

Pınar Muyan-Özçelik, Vladimir Glavtchev, Jeffrey M. Ota, and John D. Owens. A Template-Based Approach for Real-Time Speed-Limit-Sign Recognition on an Embedded System using GPU Computing. In Michael Goesele, Stefan Roth, Arjan Kuijper, Bernt Schiele, and Konrad Schindler, editors, DAGM 2010: Proceedings of the 32nd Annual Symposium of the German Association for Pattern Recognition, volume 6376 of Lecture Notes in Computer Science, pages 162–171. Springer, September 2010. [ bib | DOI | http ]

Stanley Tzeng, Anjul Patney, and John D. Owens. Task Management for Irregular-Parallel Workloads on the GPU. In Proceedings of High Performance Graphics, HPG '10, pages 29–37, June 2010. 2019 High Performance Graphics Test of Time Award for the most influential paper from HPG's 2010 predecessor conferences. [ bib | DOI | http ]

Jeff A. Stuart, Cheng-Kai Chen, Kwan-Liu Ma, and John D. Owens. Multi-GPU Volume Rendering using MapReduce. In Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing / The First International Workshop on MapReduce and its Applications, HPDC '10 / MAPREDUCE '10, pages 841–848, June 2010. [ bib | DOI | ACM DL | http ]

Everett H. Phillips, Roger L. Davis, and John D. Owens. Unsteady Turbulent Simulations on a Cluster of Graphics Processors. In Proceedings of the 40th AIAA Fluid Dynamics Conference, number AIAA 2010-5036, June 2010. [ bib | DOI | http ]

Anjul Patney, Stanley Tzeng, and John D. Owens. Fragment-Parallel Composite and Filter. Computer Graphics Forum (Proceedings of the Eurographics Symposium on Rendering), 29(4):1251–1258, June 2010. [ bib | DOI | http ]

Andrew Davidson and John D. Owens. Toward Techniques for Auto-Tuning GPU Algorithms. In State of the Art in Scientific and Parallel Computing, Para 2010, June 2010. [ bib | http ]

Yao Zhang, Jonathan Cohen, and John D. Owens. Fast Tridiagonal Solvers on the GPU. In Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2010, pages 127–136, January 2010. [ bib | DOI | ACM DL | http ]

Kshitij Gupta and John D. Owens. Three-Layer Optimizations for Fast GMM Computations on GPU-like Parallel Processors. In Proceedings of the IEEE Workshop on Automatic Speech Recognition & Understanding, ASRU 2009, pages 146–151, December 2009. [ bib | DOI | http ]

Dan A. Alcantara, Andrei Sharf, Fatemeh Abbasinejad, Shubhabrata Sengupta, Michael Mitzenmacher, John D. Owens, and Nina Amenta. Real-Time Parallel Hashing on the GPU. ACM Transactions on Graphics, 28(5):154:1–154:9, December 2009. [ bib | DOI | ACM DL | http ]

Anjul Patney, Mohamed S. Ebeida, and John D. Owens. Parallel View-Dependent Tessellation of Catmull-Clark Subdivision Surfaces. In Proceedings of High Performance Graphics, HPG '09, pages 99–108, August 2009. [ bib | DOI | ACM DL | http ]

Luke J. Gosink, Kesheng Wu, E. Wes Bethel, John D. Owens, and Kenneth I. Joy. Data Parallel Bin-Based Indexing for Answering Queries on Multi-Core Architectures. In Proceedings of the 21st International Conference on Scientific and Statistical Database Management, volume 5566 of Lecture Notes in Computer Science, pages 110–129. Springer, June 2009. [ bib | DOI | http ]

Jeff A. Stuart and John D. Owens. Message Passing on Data-Parallel Architectures. In Proceedings of the 23rd IEEE International Parallel and Distributed Processing Symposium, IPDPS 2009, May 2009. [ bib | DOI | http ]

Brian Budge, Tony Bernardin, Jeff A. Stuart, Shubhabrata Sengupta, Kenneth I. Joy, and John D. Owens. Out-of-core Data Management for Path Tracing on Hybrid Resources. Computer Graphics Forum (Proceedings of Eurographics 2009), 28(2):385–396, April 2009. [ bib | DOI | http ]

Everett H. Phillips, Yao Zhang, Roger L. Davis, and John D. Owens. Rapid Aerodynamic Performance Prediction on a Cluster of Graphics Processing Units. In Proceedings of the 47th AIAA Aerospace Sciences Meeting, number AIAA 2009-565, January 2009. [ bib | DOI | http ]

Anjul Patney and John D. Owens. Real-Time Reyes-Style Adaptive Surface Subdivision. ACM Transactions on Graphics, 27(5):143:1–143:8, December 2008. [ bib | DOI | ACM DL | http ]

Sanjiv S. Samant, Junyi Xia, Pınar Muyan-Özçelik, and John D. Owens. High performance computing for deformable image registration: Towards a new paradigm in adaptive radiotherapy. Medical Physics, 35(8):3546–3553, August 2008. [ bib | DOI ]

Pınar Muyan-Özçelik, John D. Owens, Junyi Xia, and Sanjiv S. Samant. Fast Deformable Registration on the GPU: A CUDA Implementation of Demons. In Proceedings of the 2008 International Conference on Computational Science and Its Applications (First Technical Session on UnConventional High Performance Computing), UCHPC '08, pages 223–233, July 2008. [ bib | DOI | http ]

Mark Silberstein, Assaf Schuster, Dan Geiger, Anjul Patney, and John D. Owens. Efficient Computation of Sum-products on GPUs Through Software-Managed Cache. In Proceedings of the 22nd ACM International Conference on Supercomputing, ICS '08, pages 309–318, June 2008. [ bib | DOI | ACM DL | http ]

John D. Owens, Mike Houston, David Luebke, Simon Green, John E. Stone, and James C. Phillips. GPU Computing. Proceedings of the IEEE, 96(5):879–899, May 2008. [ bib | DOI | http ]

Adam Moerschell and John D. Owens. Distributed Texture Memory in a Multi-GPU Environment. Computer Graphics Forum, 27(1):130–151, March 2008. [ bib | DOI | http ]

Aaron E. Lefohn, Shubhabrata Sengupta, and John D. Owens. Resolution-Matched Shadow Maps. ACM Transactions on Graphics, 26(4):20:1–20:17, October 2007. [ bib | DOI | ACM DL | http ]

John D. Owens, William J. Dally, Ron Ho, D. N. Jayasimha, Stephen W. Keckler, and Li-Shiuan Peh. Research Challenges for On-Chip Interconnection Networks. IEEE Micro, 27(5):96–108, September/October 2007. [ bib | DOI | .html ]

Shubhabrata Sengupta, Mark Harris, Yao Zhang, and John D. Owens. Scan Primitives for GPU Computing. In Proceedings of the 22nd ACM SIGGRAPH/EUROGRAPHICS Symposium on Graphics Hardware, GH '07, pages 97–106, August 2007. Best Paper Award. 2017 High Performance Graphics Test of Time Award for the most influential paper from HPG's 2007–08 predecessor conferences. [ bib | DOI | http ]

Mark Harris, Shubhabrata Sengupta, and John D. Owens. Parallel Prefix Sum (Scan) with CUDA. In Hubert Nguyen, editor, GPU Gems 3, chapter 39, pages 851–876. Addison Wesley, August 2007. [ bib | http ]

John D. Owens. Towards Multi-GPU Support for Visualization. Journal of Physics: Conference Series, 78:012055 (5pp), June 2007. [ bib | DOI | http ]

Kwan-Liu Ma, Robert Ross, Jian Huang, Greg Humphreys, Nelson Max, Kenneth Moreland, John D. Owens, and Han-Wei Shen. Ultra-Scale Visualization: Research and Education. Journal of Physics: Conference Series, 78:012088 (6pp), June 2007. [ bib | DOI | http ]

John D. Owens, David Luebke, Naga Govindaraju, Mark Harris, Jens Krüger, Aaron E. Lefohn, and Tim Purcell. A Survey of General-Purpose Computation on Graphics Hardware. Computer Graphics Forum, 26(1):80–113, March 2007. [ bib | DOI | http ]

John D. Owens. The Installation and Use of OpenType Fonts in LaTeX. TUGboat: Communications of the TeX Users Group, 27(2):112–118, December 2006. [ bib | http ]

Adam Moerschell and John D. Owens. Distributed Texture Memory in a Multi-GPU Environment. In Proceedings of the 21st ACM SIGGRAPH/EUROGRAPHICS Symposium on Graphics Hardware, GH '06, pages 31–38, September 2006. [ bib | DOI | ACM DL | http ]

Leo Szumel and John D. Owens. The Virtual Pheromone Communication Primitive. In Phillip B. Gibbons, Tarek Abdelzaher, James Aspnes, and Ramesh Rao, editors, Proceedings of the Second IEEE International Conference on Distributed Computing in Sensor Systems, volume 4026 of Lecture Notes in Computer Science, pages 135–149. Springer, June 2006. [ bib | DOI | http ]

Shubhabrata Sengupta, Aaron E. Lefohn, and John D. Owens. A Work-Efficient Step-Efficient Prefix Sum Algorithm. In Proceedings of the 2006 Workshop on Edge Computing Using New Commodity Architectures, pages D–26–27, May 2006. [ bib | http ]

Aaron E. Lefohn, Shubhabrata Sengupta, Joe Kniss, Robert Strzodka, and John D. Owens. Glift: Generic Data Structures for the GPU. In Proceedings of the 2006 Workshop on Edge Computing Using New Commodity Architectures, pages D–15–16, May 2006. [ bib | http ]

Sung W. Park, Lars Linsen, Oliver Kreylos, John D. Owens, and Bernd Hamann. Discrete Sibson Interpolation. IEEE Transactions on Visualization and Computer Graphics, 12(2):243–253, March/April 2006. [ bib | DOI | http ]

Aaron E. Lefohn, Joe Kniss, Robert Strzodka, Shubhabrata Sengupta, and John D. Owens. Glift: Generic, Efficient, Random-Access GPU Data Structures. ACM Transactions on Graphics, 25(1):60–99, January 2006. [ bib | DOI | ACM DL | http ]

Sung Park, Lars Linsen, Oliver Kreylos, John D. Owens, and Bernd Hamann. A Framework for Real-Time Volume Visualization of Streaming Scattered Data. In Proceedings of the Tenth International Fall Workshop on Vision, Modeling, and Visualization, VMV 2005, pages 225–232, November 2005. [ bib | http ]

John D. Owens, David Luebke, Naga Govindaraju, Mark Harris, Jens Krüger, Aaron E. Lefohn, and Tim Purcell. A Survey of General-Purpose Computation on Graphics Hardware. In Eurographics 2005, State of the Art Reports, pages 21–51, August 2005. [ bib | http ]

Aaron Lefohn, Shubhabrata Sengupta, Joe Kniss, Robert Strzodka, and John D. Owens. Dynamic Adaptive Shadow Maps on Graphics Hardware. In Technical Sketches Program, ACM SIGGRAPH, August 2005. [ bib | DOI | ACM DL | http ]

Joe Kniss, Aaron Lefohn, Shubhabrata Sengupta, Robert Strzodka, and John D. Owens. Octree Textures on Graphics Hardware. In Technical Sketches Program, ACM SIGGRAPH, August 2005. [ bib | DOI | ACM DL | http ]

Leo Szumel, Jason LeBrun, and John D. Owens. Towards a Mobile Agent Framework for Sensor Networks. In Proceedings of the Second IEEE Workshop on Embedded Networked Sensors, EmNetS-II, pages 79–87, May 2005. [ bib | DOI | .html ]

John Owens. Streaming Architectures and Technology Trends. In Matt Pharr, editor, GPU Gems 2, chapter 29, pages 457–470. Addison Wesley, March 2005. [ bib | http ]

Aaron Lefohn, Joe Kniss, and John Owens. Implementing Efficient Parallel Data Structures on GPUs. In Matt Pharr, editor, GPU Gems 2, chapter 33, pages 521–545. Addison Wesley, March 2005. [ bib | http ]

Andrew T. Riffel, Aaron E. Lefohn, Kiril Vidimce, Mark Leone, and John D. Owens. Mio: Fast Multipass Partitioning via Priority-Based Instruction Scheduling. In Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Conference on Graphics Hardware, GH '04, pages 35–44, August 2004. [ bib | DOI | ACM DL | http ]

Ujval J. Kapasi, Scott Rixner, William J. Dally, Brucek Khailany, Jung Ho Ahn, Peter Mattson, and John D. Owens. Programmable Stream Processors. IEEE Computer, 36(8):54–62, August 2003. [ bib | DOI | http ]

Brucek Khailany, William J. Dally, Scott Rixner, Ujval J. Kapasi, John D. Owens, and Brian Towles. Exploring the VLSI Scalability of Stream Processors. In Proceedings of the Ninth Annual International Symposium on High-Performance Computer Architecture, HPCA-9, pages 153–164, February 2003. [ bib | DOI | http ]

Ben Serebrin, John D. Owens, Brucek Khailany, Peter Mattson, Ujval J. Kapasi, Chen H. Chen, Jinyung Namkoong, Stephen P. Crago, Scott Rixner, and William J. Dally. A Stream Processor Development Platform. In Proceedings of the IEEE International Conference on Computer Design, ICCD 2002, pages 303–308, Freiburg, Germany, September 2002. [ bib | DOI | .pdf ]

John D. Owens, Ujval J. Kapasi, Peter Mattson, Brian Towles, Ben Serebrin, Scott Rixner, and William J. Dally. Media Processing Applications on the Imagine Stream Processor. In Proceedings of the IEEE International Conference on Computer Design, ICCD 2002, pages 295–302, Freiburg, Germany, September 2002. [ bib | DOI | http ]

John D. Owens, Brucek Khailany, Brian Towles, and William J. Dally. Comparing Reyes and OpenGL on a Stream Architecture. In Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Conference on Graphics Hardware, GH '02, pages 47–56, September 2002. [ bib | http ]

Ujval J. Kapasi, William J. Dally, Brucek Khailany, John D. Owens, and Scott Rixner. The Imagine Stream Processor. In Proceedings of the IEEE International Conference on Computer Design, ICCD 2002, pages 282–288, Freiburg, Germany, September 2002. [ bib | DOI | http ]

Ujval J. Kapasi, Peter Mattson, William J. Dally, John D. Owens, and Brian Towles. Stream Scheduling. In Proceedings of the 3rd Workshop on Media and Streaming Processors, pages 101–106, Austin, TX, 2 December 2001. [ bib | http ]

Brucek Khailany, William J. Dally, Ujval J. Kapasi, Peter Mattson, Jinyung Namkoong, John D. Owens, Brian Towles, Andrew Chang, and Scott Rixner. Imagine: Media Processing with Streams. IEEE Micro, 21(2):35–46, March/April 2001. [ bib | DOI | http ]

Ujval J. Kapasi, William J. Dally, Scott Rixner, Peter R. Mattson, John D. Owens, and Brucek Khailany. Efficient Conditional Operations for Data-parallel Architectures. In Proceedings of the 33rd Annual ACM/IEEE International Symposium on Microarchitecture, MICRO-33, pages 159–170, December 2000. [ bib | DOI | ACM DL | http ]

Peter Mattson, William J. Dally, Scott Rixner, Ujval J. Kapasi, and John D. Owens. Communication Scheduling. In Proceedings of the Ninth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS-IX, pages 82–92, November 2000. [ bib | DOI | ACM DL | .pdf ]

John D. Owens, William J. Dally, Ujval J. Kapasi, Scott Rixner, Peter Mattson, and Ben Mowery. Polygon Rendering on a Stream Architecture. In Proceedings of the ACM SIGGRAPH/Eurographics Workshop on Graphics Hardware, HWWS '00, pages 23–32, August 2000. [ bib | DOI | ACM DL | http ]

Brucek Khailany, William J. Dally, Scott Rixner, Ujval J. Kapasi, Peter Mattson, Jin Namkoong, John D. Owens, and Brian Towles. Imagine: Signal and Image Processing Using Streams. In Hot Chips 12, August 2000. [ bib | http ]

Scott Rixner, William J. Dally, Ujval J. Kapasi, Peter Mattson, and John D. Owens. Memory Access Scheduling. In Proceedings of the 27th International Symposium on Computer Architecture, ISCA-2000, pages 128–138, June 2000. [ bib | DOI | ACM DL | .pdf ]

Scott Rixner, William J. Dally, Brucek Khailany, Peter Mattson, Ujval Kapasi, and John D. Owens. Register Organization for Media Processing. In Proceedings of the Sixth Annual International Symposium on High-Performance Computer Architecture, HPCA-6, pages 375–386, January 2000. [ bib | DOI | .pdf ]

Scott Rixner, William J. Dally, Ujval J. Kapasi, Brucek Khailany, Abelardo Lopez-Lagunas, Peter Mattson, and John D. Owens. A Bandwidth-Efficient Architecture for Media Processing. In Proceedings of the 31st Annual ACM/IEEE International Symposium on Microarchitecture, MICRO-31, pages 3–13, December 1998. [ bib | DOI | .pdf ]

Non-Refereed Publications

Radoyeh Shojaei, Predrag Djurdjevic, Mostafa El-Khamy, James Goel, Kasper Mecklenburg, John Owens, Pınar Muyan-Özçelik, Tom St. John, Jinho Suh, and Arjun Suresh. MLPerf Automotive. CoRR, abs/2510.27065(2510.27065v1), October 2025. [ bib | arXiv | http ]

Toluwanimi O. Odemuyiwa, Joel S. Emer, and John D. Owens. The EDGE Language: Extended General Einsums for Graph Algorithms. CoRR, abs/2404.11591(2404.11591v1), April 2024. [ bib | arXiv ]

Cameron Shinn, Collin McCarthy, Saurav Muralidharan, Muhammad Osama, and John D. Owens. The Sparsity Roofline: Understanding the Hardware Limits of Sparse Neural Networks. CoRR, abs/2310.00496(2310.00496v2), September 2023. [ bib | arXiv ]

Matthew Drescher, Muhammad A. Awad, Serban D. Porumbescu, and John D. Owens. BOBA: A Parallel Lightweight Graph Reordering Algorithm with Heavyweight Implications. CoRR, abs/2306.10410(2306.10410v2), June 2023. [ bib | arXiv ]

Muhammad Osama, Duane Merrill, Cris Cecka, Michael Garland, and John D. Owens. Stream-K: Work-centric Parallel Decomposition for Dense Matrix-Matrix Multiplication on the GPU. CoRR, abs/2301.03598(2301.03598v1), January 2023. [ bib | arXiv ]

Muhammad A. Awad, Saman Ashkiani, Serban D. Porumbescu, Martín Farach-Colton, and John D. Owens. Better GPU Hash Tables. CoRR, abs/2108.07232(2108.07232v3), August 2021. [ bib | arXiv | code ]

Leyuan Wang and John D. Owens. Fast Gunrock Subgraph Matching (GSM) on GPUs. CoRR, abs/2003.01527(2003.01527v1), March 2020. [ bib | arXiv ]

Weitang Liu, Lifeng Wei, James Sharpnack, and John D. Owens. Unsupervised Object Segmentation with Explicit Localization Module. CoRR, abs/1911.09228(1911.09228v1), November 2019. [ bib | arXiv ]

Weitang Liu, Emad Barsoum, and John D. Owens. Object Localization and Motion Transfer learning with Capsules. CoRR, abs/1805.07706(1805.07706v1), May 2018. [ bib | arXiv ]

Kerry A. Seitz, Jr., Alex Kennedy, Owen Ransom, Bassam A. Younis, and John D. Owens. A GPU Implementation for Two-Dimensional Shallow Water Modeling. CoRR, abs/1309.1230(1309.1230v1), September 2013. [ bib | arXiv ]

Stanley Tzeng and John D. Owens. Finding Convex Hulls Using Quickhull on the GPU. CoRR, abs/1201.2936(1201.2936v1), January 2012. [ bib | arXiv ]

Jeff A. Stuart and John D. Owens. Efficient Synchronization Primitives for GPUs. CoRR, abs/1110.4623(1110.4623v1), October 2011. [ bib | arXiv ]

Luke J. Gosink, Kesheng Wu, E. Wes Bethel, John D. Owens, and Kenneth I. Joy. Bin-Hash Indexing: A Parallel Method For Fast Query Processing. Technical Report LBNL-729E, Lawrence Berkeley National Laboratory, 20 August 2008. [ bib | .pdf ]

Michael Kass, Aaron Lefohn, and John Owens. Interactive Depth of Field Using Simulated Diffusion on a GPU. Technical Report #06-01, Pixar Animation Studios, January 2006. http://graphics.pixar.com/library/DepthOfField. [ bib | http ]

John D. Owens, Shubhabrata Sengupta, and Daniel Horn. Assessment of Graphic Processing Units (GPUs) for Department of Defense (DoD) Digital Signal Processing (DSP) Applications. Technical Report ECE-CE-2005-3, Department of Electrical and Computer Engineering, University of California, Davis, October 2005. http://www.ece.ucdavis.edu/cerl/techreports/2005-3/. [ bib | http ]

John D. Owens. GPUs tapped for general computing. EE Times, 13 December 2004. http://www.eet.com/news/latest/showArticle.jhtml?articleID=55300884. [ bib | http ]

John D. Owens. On The Scalability of Sensor Network Routing and Compression Algorithms. Technical Report ECE-CE-2004-1, Computer Engineering Research Laboratory, University of California, Davis, 2004. http://www.ece.ucdavis.edu/cerl/techreports/2004-1/. [ bib | http ]

Leo Szumel and John D. Owens. On the Feasibility of the UC Davis Metanet. Technical Report ECE-CE-2003-2, Computer Engineering Research Laboratory, University of California, Davis, 2003. http://www.ece.ucdavis.edu/cerl/techreports/2003-2/. [ bib | http ]

John D. Owens. Computer Graphics on a Stream Architecture. PhD thesis, Stanford University, November 2002. [ bib | http ]
