Zeeshan Zia
Microsoft Studio A
15291 NE 40th St
Redmond, WA 98052
zeeshan.zia@microsoft

Dr. Zeeshan Zia researches computer vision and deep learning solutions at Microsoft. His core expertise lies in applied machine learning, 3D object localization, and Simultaneous Localization And Mapping (SLAM), and he is particularly interested in exploring how data-driven techniques can contribute to robust 3D perception. Previously, he worked on 3D perception for autonomous vehicles at an industrial research lab in Silicon Valley. He completed his graduate studies in computer vision and machine learning at TU Munich and ETH Zurich, then spent two years as a postdoc at Imperial College London.

Curriculum Vitae

External Links

Industry

NEC Laboratories America
Cupertino, CA
Researcher
2015-2017

Qualcomm Research
Vienna, Austria
Research Intern
Summer 2013

Siemens Corp. Technologies
Munich, Germany
Engineering Intern
Summer 2008

SUPARCO
Karachi, Pakistan
Engineering Intern
2006

Academia

Postdoc
Imperial College London
London, UK
2014-2015

PhD
Swiss Federal Institute of Technology
Zurich, Switzerland
2009-2013

MS
Munich University of Technology
Munich, Germany
2007-2009

Selected Recent Publications

  • M.F. Salem, Q.H. Tran, M.Z. Zia, P. Vernaza, M. Chandraker. Hierarchical Metric Learning and Matching for 2D and 3D Geometric Correspondences. arXiv (ECCV submission), 2018. [Conference]
    Interest point descriptors have fueled progress on almost every problem in computer vision. Recent advances in deep neural networks have enabled task-specific learned descriptors that outperform hand-crafted descriptors on many problems. We demonstrate that commonly used metric learning approaches do not optimally leverage the feature hierarchies learned in a Convolutional Neural Network (CNN), especially when applied to the task of geometric feature matching. While a metric loss applied to the deepest layer of a CNN is often expected to yield ideal features irrespective of the task, in fact the growing receptive field as well as striding effects cause shallower features to be better at high-precision matching tasks. We leverage this insight, together with explicit supervision at multiple levels of the feature hierarchy for better regularization, to learn more effective descriptors in the context of geometric matching tasks. Further, we propose to use activation maps at different layers of a CNN as an effective and principled replacement for the multi-resolution image pyramids often used for matching tasks. We propose concrete CNN architectures employing these ideas, and evaluate them on multiple datasets for 2D and 3D geometric matching as well as optical flow, demonstrating state-of-the-art results and generalization across datasets.
    @inproceedings{salem18arxiv,
     author = {M.F. Salem and Q.H. Tran and M.Z. Zia and P. Vernaza and M. Chandraker},
     title = {Hierarchical Metric Learning and Matching for 2D and 3D Geometric Correspondences},
     booktitle = {arXiv (1803.07231)},
     year = {2018}
    }
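    The core idea of the abstract, applying a metric loss at several levels of the feature hierarchy instead of only at the deepest layer, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation; the descriptors, weights, and margin are illustrative stand-ins.

    ```python
    import numpy as np

    def contrastive_loss(desc_a, desc_b, is_match, margin=1.0):
        """Margin-based contrastive loss for one descriptor pair."""
        d = np.linalg.norm(desc_a - desc_b)
        if is_match:
            return d ** 2                     # pull matching pairs together
        return max(0.0, margin - d) ** 2      # push non-matches beyond the margin

    def hierarchical_metric_loss(feats_a, feats_b, is_match, weights=None):
        """Sum contrastive losses over descriptors drawn from several CNN
        layers (shallow to deep), rather than supervising only the deepest."""
        if weights is None:
            weights = [1.0] * len(feats_a)
        return sum(w * contrastive_loss(fa, fb, is_match)
                   for w, fa, fb in zip(weights, feats_a, feats_b))

    # Toy example: a "shallow" and a "deep" descriptor for a matching pair.
    rng = np.random.default_rng(0)
    shallow_a = rng.standard_normal(64); shallow_b = shallow_a + 0.01
    deep_a = rng.standard_normal(8);     deep_b = deep_a + 0.01
    loss = hierarchical_metric_loss([shallow_a, deep_a],
                                    [shallow_b, deep_b], is_match=True)
    ```

    In the paper's setting, the per-level supervision also acts as a regularizer, and the shallower descriptors carry the high-precision localization signal that the deepest layer loses to striding and large receptive fields.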
  • C. Li, M.Z. Zia, X. Yu, G. Hager, M. Chandraker. Deep Supervision with Intermediate Concepts. arXiv (TPAMI submission), 2018. [Journal]
    Recent data-driven approaches to scene interpretation predominantly pose inference as an end-to-end black-box mapping, commonly performed by a Convolutional Neural Network (CNN). However, decades of work on perceptual organization in both human and machine vision suggest that there are often intermediate representations that are intrinsic to an inference task, and which provide essential structure to improve generalization. In this work, we explore an approach for injecting prior domain structure into neural network training by supervising hidden layers of a CNN with intermediate concepts that normally are not observed in practice. We formulate a probabilistic framework which formalizes these notions and predicts improved generalization via this deep supervision method. One advantage of this approach is that we are able to train only from synthetic CAD renderings of cluttered scenes, where concept values can be extracted, but apply the results to real images. Our implementation achieves state-of-the-art performance on 2D/3D keypoint localization and image classification on real image benchmarks, including KITTI, PASCAL VOC, PASCAL3D+, IKEA, and CIFAR100. We provide additional evidence that our approach outperforms alternative forms of supervision, such as multi-task networks.
    @inproceedings{li2018tpami,
     author = {C. Li and Q.H. Tran and M.Z. Zia and G. Hager and M. Chandraker},
     title = {Deep Supervision with Intermediate Concepts.},
     booktitle = {arXiv (1801.03399)},
     year = {2018}
    }
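    The deep-supervision objective described above, a main-task loss augmented with auxiliary losses on intermediate-concept predictions read off hidden layers, can be sketched in NumPy. This is an illustrative sketch, not the paper's code; the concept heads, targets, and weighting are hypothetical.

    ```python
    import numpy as np

    def cross_entropy(probs, target_idx):
        """Negative log-likelihood of the target class."""
        return -np.log(probs[target_idx] + 1e-12)

    def deeply_supervised_loss(final_probs, final_target,
                               concept_preds, concept_targets, aux_weight=0.5):
        """Main-task classification loss plus weighted auxiliary regression
        losses on intermediate concepts (e.g. keypoint visibility, pose)
        predicted from hidden layers of the network."""
        loss = cross_entropy(final_probs, final_target)
        for pred, target in zip(concept_preds, concept_targets):
            loss += aux_weight * np.mean((pred - target) ** 2)
        return loss

    # Toy example: a 3-class task with two intermediate concept heads.
    final_probs = np.array([0.1, 0.8, 0.1])
    concept_preds = [np.array([0.4, 0.6]), np.array([1.0, 0.0, 0.5])]
    concept_targets = [np.array([0.5, 0.5]), np.array([1.0, 0.0, 0.0])]
    total = deeply_supervised_loss(final_probs, 1, concept_preds, concept_targets)
    ```

    Because the concept values are only needed at training time, they can be extracted from synthetic CAD renderings while the trained network is applied to real images, which is the transfer setting the abstract describes.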