
John Owens's calculated h-index is 61. This page was automatically generated on 2025-09-20.

Citations | Key | Title
3413 | Owens:2007:ASO | A Survey of General-Purpose Computation on Graphics Hardware
3165 | Owens:2008:GC | GPU Computing
1895 | Liu:2020:EOD | Energy-based Out-of-distribution Detection
1390 | Rixner:2000:MAS | Memory Access Scheduling
1216 | Harris:2007:PPS | Parallel Prefix Sum (Scan) with CUDA
869 | Sengupta:2007:SPF | Scan Primitives for GPU Computing
722 | Wang:2016:GAH | Gunrock: A High-Performance Graph Processing Library on the GPU
660 | Owens:2007:RCF | Research Challenges for On-Chip Interconnection Networks
504 | Khailany:2001:IMP | Imagine: Media Processing with Streams
450 | Kapasi:2003:PSP | Programmable Stream Processors
414 | Zhang:2011:AQP | A Quantitative Performance Analysis Model for GPU Architectures
411 | Rixner:2000:ROF | Register Organization for Media Processing
369 | Kapasi:2002:TIS | The Imagine Stream Processor
362 | Rixner:1998:ABA | A Bandwidth-Efficient Architecture for Media Processing
351 | Zhang:2010:FTS | Fast Tridiagonal Solvers on the GPU
336 | Kepner:2016:MFO | Mathematical Foundations of the GraphBLAS
315 | Gupta:2012:ASO | A Study of Persistent Threads Style GPU Programming for GPGPU Workloads
284 | Stuart:2011:MMO | Multi-GPU MapReduce on GPU Clusters
280 | Alcantara:2009:RPH | Real-Time Parallel Hashing on the GPU
267 | Davidson:2014:WPG | Work-Efficient Parallel GPU Methods for Single-Source Shortest Paths
228 | Lefohn:2006:GGE | Glift: Generic, Efficient, Random-Access GPU Data Structures
181 | Wang:2017:GGG | Gunrock: GPU Graph Analytics
180 | Owens:2005:SAA | Streaming Architectures and Technology Trends
176 | Tzeng:2010:TMF | Task Management for Irregular-Parallel Workloads on the GPU
172 | Muyan-Ozcelik:2008:FDR | Fast Deformable Registration on the GPU: A CUDA Implementation of Demons
163 | Yang:2018:DPF | Design Principles for Sparse Matrix Multiplication on the GPU
159 | Park:2006:DSI | Discrete Sibson Interpolation
155 | Silberstein:2008:ECO | Efficient Computation of Sum-products on GPUs Through Software-Managed Cache
150 | Patel:2012:PLD | Parallel Lossless Data Compression on the GPU
149 | Samant:2008:HPC | High performance computing for deformable image registration: Towards a new paradigm in adaptive radiotherapy
146 | Yang:2022:GAH | GraphBLAST: A High-Performance Linear Algebra-based Graph Framework on the GPU
145 | Kapasi:2000:ECO | Efficient Conditional Operations for Data-parallel Architectures
143 | Kass:2006:IDO | Interactive Depth of Field Using Simulated Diffusion on a GPU
140 | Ebeida:2011:EMP | Efficient Maximal Poisson-Disk Sampling
137 | Ebeida:2012:ASA | A Simple Algorithm for Maximal Poisson-Disk Sampling in High Dimensions
128 | Owens:2002:MPA | Media Processing Applications on the Imagine Stream Processor
125 | Davidson:2011:AAM | An Auto-tuned Method for Solving Large Tridiagonal Systems on the GPU
125 | Sengupta:2006:AWS | A Work-Efficient Step-Efficient Prefix Sum Algorithm
113 | Stuart:2009:MPO | Message Passing on Data-Parallel Architectures
103 | Phillips:2009:RAP | Rapid Aerodynamic Performance Prediction on a Cluster of Graphics Processing Units
102 | Davidson:2012:EPM | Efficient Parallel Merge Sort for Fixed and Variable Length Keys
101 | Lefohn:2007:RSM | Resolution-Matched Shadow Maps
97 | Owens:2000:PRO | Polygon Rendering on a Stream Architecture
94 | Ashkiani:2018:ADH | A Dynamic Hash Table for the GPU
93 | Pan:2017:MGA | Multi-GPU Graph Analytics
87 | Alcantara:2011:BAE | Building an Efficient Hash Table on the GPU
85 | Stuart:2010:MVR | Multi-GPU Volume Rendering using MapReduce
76 | Kapasi:2001:SS | Stream Scheduling
76 | Stuart:2011:ESP | Efficient Synchronization Primitives for GPUs
75 | Budge:2009:ODM | Out-of-core Data Management for Path Tracing on Hybrid Resources
75 | Patney:2008:RRA | Real-Time Reyes-Style Adaptive Surface Subdivision
75 | Khailany:2003:ETV | Exploring the VLSI Scalability of Stream Processors
72 | Wang:2016:ACS | A Comparative Study on Exact Triangle Counting Algorithms on the GPU
71 | Awad:2019:EAH | Engineering a High-Performance GPU B-Tree
71 | Davidson:2011:RPF | Register Packing for Cyclic Reduction: A Case Study
70 | Mattson:2000:CS | Communication Scheduling
69 | Jenkins:2011:LLF | Lessons Learned from Exploring the Backtracking Paradigm on the GPU
66 | Davidson:2012:TTF | Toward Techniques for Auto-tuning GPU Algorithms
63 | Patney:2009:PVT | Parallel View-Dependent Tessellation of Catmull-Clark Subdivision Surfaces
62 | Owens:2002:CGO | Computer Graphics on a Stream Architecture
61 | Abdelkader:2020:VVM | VoroCrust: Voronoi Meshing Without Clipping

59 | Moerschell:2008:DTM | Distributed Texture Memory in a Multi-GPU Environment
57 | Lefohn:2005:IEP | Implementing Efficient Parallel Data Structures on GPUs
56 | Szumel:2005:TAM | Towards a Mobile Agent Framework for Sensor Networks
50 | Yang:2018:IPE | Implementing Push-Pull Efficiently in GraphBLAS
48 | Tzeng:2012:AGT | A GPU Task-Parallel Model with Dependency Resolution
47 | Yang:2015:FSM | Fast Sparse Matrix and Sparse Vector Multiplication Algorithm on the GPU
44 | Ebeida:2011:EAG | Efficient and Good Delaunay Meshes From Random Points
42 | Awad:2020:DGO | Dynamic Graphs on the GPU
42 | Ebeida:2011:ICR | Isotropic conforming refinement of quadrilateral and hexahedral meshes using two-refinement templates
41 | Owens:2002:CRA | Comparing Reyes and OpenGL on a Stream Architecture
40 | Wu:2015:PCO | Performance Characterization of High-Level Programming Models for GPU Graph Analytics
40 | Riffel:2004:MFM | Mio: Fast Multipass Partitioning via Priority-Based Instruction Scheduling
38 | Lin:2019:BDL | Benchmarking Deep Learning Frameworks and Investigating FPGA Deployment for Traffic Sign Classification and Detection
37 | Geil:2018:QFA | Quotient Filters: Approximate Membership Queries on the GPU
37 | Lefohn:2005:DAS | Dynamic Adaptive Shadow Maps on Graphics Hardware
33 | Osama:2023:SWP:poster | Stream-K: Work-Centric Parallel Decomposition for Dense Matrix-Matrix Multiplication on the GPU
33 | Stuart:2010:GC | GPU-to-CPU Callbacks
33 | Osama:2023:SWP | Stream-K: Work-centric Parallel Decomposition for Dense Matrix-Matrix Multiplication on the GPU
32 | Ashkiani:2018:GLA | GPU LSM: A Dynamic Dictionary Data Structure for the GPU
32 | Stone:2011:GPA | GPGPU parallel algorithms for structured-grid CFD codes
32 | Stuart:2011:EMT | Extending MPI to Accelerators
31 | Zhang:2011:AHM | A Hybrid Method for Solving Tridiagonal Systems on the GPU
31 | Owens:2005:AOG | Assessment of Graphic Processing Units (GPUs) for Department of Defense (DoD) Digital Signal Processing (DSP) Applications
30 | Osama:2019:GCO | Graph Coloring on the GPU
30 | Tzeng:2012:FCH | Finding Convex Hulls Using Quickhull on the GPU
28 | Glavtchev:2011:FSL | Feature-Based Speed Limit Sign Detection Using a Graphics Processing Unit
28 | Kniss:2005:OTO | Octree Textures on Graphics Hardware
26 | Ashkiani:2016:GM | GPU Multisplit
26 | Patney:2015:PAF | Piko: A Framework for Authoring Programmable Graphics Pipelines
25 | Lin:2022:BAP | Building a Performance Model for Deep Learning Recommendation Model Training on GPUs
25 | Gosink:2009:DPB | Data Parallel Bin-Based Indexing for Answering Queries on Multi-Core Architectures
24 | Ashkiani:2017:GMA | GPU Multisplit: an extended study of a parallel algorithm
24 | Gupta:2009:TOF | Three-Layer Optimizations for Fast GMM Computations on GPU-like Parallel Processors
24 | Wang:2020:FGS | Fast Gunrock Subgraph Matching (GSM) on GPUs
23 | Odemuyiwa:2023:ASD | Accelerating Sparse Data Orchestration via Dynamic Reflexive Tiling
23 | Zhang:2011:APE | A Parallel Error Diffusion Implementation on a GPU
23 | Patney:2010:FCA | Fragment-Parallel Composite and Filter
22 | Wang:2015:FSA | Fast Parallel Suffix Array on the GPU
22 | Phillips:2010:UTS | Unsteady Turbulent Simulations on a Cluster of Graphics Processors
22 | Park:2005:AFF | A Framework for Real-Time Volume Visualization of Streaming Scattered Data
21 | Wang:2016:FPS | Fast Parallel Skew and Prefix-Doubling Suffix Array Construction on the GPU
20 | Chen:2022:SIP | Scalable Irregular Parallelism with GPUs: Getting CPUs Out of the Way
20 | Ebeida:2014:KDS | k-d Darts: Sampling by k-Dimensional Flat Searches
20 | Ma:2007:UVR | Ultra-Scale Visualization: Research and Education
20 | Serebrin:2002:ASP | A Stream Processor Development Platform
20 | Gosink:2008:BIA | Bin-Hash Indexing: A Parallel Method For Fast Query Processing
19 | Mahmoud:2021:RAG | RXMesh: A GPU Mesh Data Structure
19 | Abdelkader:2018:SCF | Sampling Conditions for Conforming Voronoi Meshing by the VoroCrust Algorithm
19 | Muyan-Ozcelik:2010:ATA | A Template-Based Approach for Real-Time Speed-Limit-Sign Recognition on an Embedded System using GPU Computing
18 | Awad:2023:AAI | Analyzing and Implementing GPU Hash Tables
18 | Abdelkader:2017:ACR | A Constrained Resampling Strategy for Mesh Improvement
18 | Muyan-Ozcelik:2011:RSR | Real-Time Speed-Limit-Sign Recognition on an Embedded System Using a GPU
17 | Osama:2022:EOP | Essentials of Parallel Graph Analytics
17 | Wang:2019:ADI | Accelerating DNN Inference with GraphBLAS and the GPU
17 | Pan:2018:SBS | Scalable Breadth-First Search on a GPU Cluster
16 | Chen:2022:AAT | Atos: A Task-Parallel GPU Scheduler for Graph Analytics
16 | Gupta:2011:CAM | Compute & Memory Optimizations for High-Quality Speech Recognition on Low-End GPU Processors
16 | Szumel:2006:TVP | The Virtual Pheromone Communication Primitive
16 | Khailany:2000:ISA | Imagine: Signal and Image Processing Using Streams
14 | Awad:2022:AGM | A GPU Multiversion B-Tree
12 | Yih:2018:FVG | FPGA versus GPU for Speed-Limit-Sign Recognition
12 | Muyan-Ozcelik:2016:MRE | Multitasking Real-time Embedded GPU Computing Tasks
12 | Geil:2014:WGC | WTF, GPU! Computing Twitter's Who-To-Follow on the GPU
12 | Ebeida:2013:SD | Sifted Disks
12 | Zhang:2012:PDE | Plane-dependent Error Diffusion on a GPU
11 | Osama:2023:APM | A Programming Model for GPU Load Balancing
11 | Seitz:2019:SMF | Staged Metaprogramming for Shader System Development
10 | Seitz:2022:SUS | Supporting Unified Shader Specialization by Co-opting C++ Features
10 | Ebeida:2016:DDT | Disk Density Tuning of a Maximal Random Packing
9 | Wang:2019:FBT | Fast BFS-Based Triangle Counting on GPUs
9 | Ashkiani:2016:PAT | Parallel Approaches to the String Matching Problem on the GPU
8 | Liu:2018:OLA | Object Localization and Motion Transfer learning with Capsules
7 | Owens:2007:TMS | Towards Multi-GPU Support for Visualization
6 | Owens:2004:GTF | GPUs tapped for general computing
5 | Abdelkader:2018:VIT | VoroCrust Illustrated: Theory and Challenges (Multimedia Exposition)
5 | Weber:2015:PRA | Parallel Reyes-style Adaptive Subdivision with Bounded Memory Usage
5 | Ebeida:2014:EIH | Exercises in High-Dimensional Sampling: Maximal Poisson-disk Sampling and k-d Darts
4 | Lin:2018:BDL | Benchmarking Deep Learning Frameworks with FPGA-suitable Models on a Traffic Sign Dataset
4 | Mak:2014:GAE | GPU-Accelerated and Efficient Multi-View Triangulation for Scene Reconstruction
4 | Phillips:2011:AO2 | Acceleration of 2-D Compressible Flow Solvers with Graphics Processing Unit Clusters
4 | Odemuyiwa:2024:TEL | The EDGE Language: Extended General Einsums for Graph Algorithms
4 | Awad:2021:BGH | Better GPU Hash Tables
3 | Lin:2025:TUP | Towards Universal Performance Modeling for Machine Learning Training on Multi-GPU Platforms
3 | Brock:2019:RVR | RDMA vs. RPC for Implementing Distributed Data Structures
3 | Muyan-Ozcelik:2017:MFM | Methods for Multitasking among Real-time Embedded Compute Tasks Running on the GPU
3 | Gegan:2016:RGT | Real-Time GPU-based Timing Channel Detection using Entropy
2 | Owens:2018:TPG | Technical Perspective: Graphs, Betweenness Centrality, and the GPU
2 | Wang:2017:MAL | Mini-Gunrock: A Lightweight Graph Analytics Framework on the GPU
2 | Kemal:2016:MSA | Multidisciplinary simulation acceleration using multiple shared memory graphical processing units
2 | Silberstein:2011:ASC | Applying Software-Managed Caching and CPU/GPU Task Scheduling for Accelerating Dynamic Workloads
2 | Drescher:2023:BAP | Boba: A parallel lightweight graph reordering algorithm with heavyweight implications
2 | Shinn:2023:TSR | The Sparsity Roofline: Understanding the Hardware Limits of Sparse Neural Networks
2 | Seitz:2013:AGI | A GPU Implementation for Two-Dimensional Shallow Water Modeling
2 | Owens:2004:OTS | On The Scalability of Sensor Network Routing and Compression Algorithms
2 | Szumel:2003:OTF | On the Feasibility of the UC Davis Metanet
1 | Geil:2023:MCE | Maximum Clique Enumeration on the GPU
1 | Wapman:2023:HCA | Harmonic CUDA: Asynchronous Programming on GPUs
1 | Owens:2006:TIA | The Installation and Use of OpenType Fonts in LaTeX
1 | Liu:2019:UOS | Unsupervised Object Segmentation with Explicit Localization Module
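
The h-index reported at the top of the page is, under the usual (Hirsch) definition, the largest h such that h of the listed papers each have at least h citations. The following is a minimal Python sketch of that calculation, assuming the page's generator uses this standard definition; the function name `h_index` is illustrative and not taken from the generator itself.

```python
def h_index(citations: list[int]) -> int:
    """Return the largest h such that at least h papers have >= h citations each."""
    # Rank papers from most to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        # The paper at position `rank` must have at least `rank` citations.
        if count >= rank:
            h = rank
        else:
            # Counts only decrease from here on, so no larger h is possible.
            break
    return h
```

Applied to the counts listed above, the 61st entry (Abdelkader:2020:VVM) has 61 citations and the 62nd (Moerschell:2008:DTM) has 59, so the function returns 61, matching the reported value.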

---