• Anime Sketch Colorization and Style Transfer

    In this project, I explore how to colorize a sketch in the style of another colored image.

  • NeX Novel View Synthesis

    Novel view synthesis aims to generate a visual scene representation from just a sparse set of images and has a wide variety of uses in VR/AR. In this blog, we will explore a new approach to this problem called NeX.

  • NeRF Exploration

    The focus of this project is to explore various approaches to generating 3D scenes with Neural Radiance Fields. We utilize NeRF and its variants to model Royce Hall and generate novel viewpoints from prior images.

  • Enhanced Self-Driving with Vision-Based Models

    Self-driving is a hot topic for deep vision applications, but vision-based urban driving is hard. Many methods for learning to drive have been proposed over the past several years. In this work, we focus on reproducing “Learning to Drive from a World on Rails” and on addressing drawbacks of the method, such as its high pedestrian collision rate. We will also utilize lidar data (which is preinstalled on most cars with some self-driving capability) to achieve better benchmark results.

  • Text Guided Art Generation

    Our project investigates a text-guided image generation model and an art painting model, and connects the two to create artworks from text inputs alone. Some artists have already applied AI/deep learning in their art creation (e.g., VQGAN-CLIP), while the development of diffusion models and transformers may provide more stable and human-like output. In this post, we discuss how we utilized VQGAN + CLIP and Paint Transformer for the whole generation pipeline and present Colab demos; a minimal sketch of the CLIP-guidance step follows.
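
    To make the text-image coupling concrete, here is a minimal, hypothetical sketch of how CLIP scores a candidate image against a prompt; VQGAN+CLIP ascends this score by backpropagating it into the generator's latents. The file name and prompt are illustrative placeholders, not part of our pipeline.

      import torch
      import clip  # OpenAI CLIP: https://github.com/openai/CLIP
      from PIL import Image

      device = 'cuda' if torch.cuda.is_available() else 'cpu'
      model, preprocess = clip.load('ViT-B/32', device=device)

      # Placeholder inputs: a candidate image from the generator and a text prompt.
      image = preprocess(Image.open('candidate.png')).unsqueeze(0).to(device)
      text = clip.tokenize(['an oil painting of a sunset over the ocean']).to(device)

      # Scoring step only (the full VQGAN+CLIP loop runs this with gradients enabled).
      with torch.no_grad():
          similarity = torch.cosine_similarity(model.encode_image(image),
                                               model.encode_text(text))
      print(similarity.item())  # higher means the image better matches the prompt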

  • Interactive Environment, Embodied AI

    My project investigates current trends in Embodied AI development and research, reports on methods of using virtual environments to enable reinforcement learning, and guides the reader through setting up an interactive environment on their own computer using iGibson.

  • MetaFormer and DETR

    Our previous topic was egocentric pose estimation; however, due to the complications of setting up a physical-environment GUI on Google Cloud and the incompleteness of the official repository, after discussing with Prof. Zhou we decided to switch our topic to MetaFormer and DETR.

    As we know, DETR is a recent popular algorithm developed by Facebook (Meta) that performs object detection without using anchor boxes, which were one of the core features of networks like Faster R-CNN and Mask R-CNN, significantly reducing the complexity of the training pipeline. The core of the DETR network is its transformer encoder and decoder, both of which utilize multi-head attention modules. While reading papers, we came across one that discusses the effect of the overall structure of the transformer (MetaFormer), and we decided to combine these two interesting papers.
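
    Since multi-head attention is the core of DETR's encoder and decoder, here is a minimal, hypothetical sketch of an encoder-style layer built on PyTorch's nn.MultiheadAttention; the dimensions are illustrative assumptions rather than DETR's exact configuration.

      import torch
      import torch.nn as nn

      class EncoderBlock(nn.Module):
          """One encoder-style layer: self-attention followed by a feed-forward net."""
          def __init__(self, d_model=256, nhead=8, dim_ff=2048):
              super().__init__()
              self.attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
              self.ff = nn.Sequential(nn.Linear(d_model, dim_ff), nn.ReLU(),
                                      nn.Linear(dim_ff, d_model))
              self.norm1 = nn.LayerNorm(d_model)
              self.norm2 = nn.LayerNorm(d_model)

          def forward(self, x, pos):
              q = k = x + pos  # DETR adds positional encodings to queries and keys
              x = self.norm1(x + self.attn(q, k, x)[0])
              return self.norm2(x + self.ff(x))

      tokens = torch.randn(2, 100, 256)  # flattened CNN feature map: (batch, HW, d_model)
      pos = torch.randn(2, 100, 256)     # positional encoding (learned or sinusoidal)
      print(EncoderBlock()(tokens, pos).shape)  # torch.Size([2, 100, 256])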

  • DeepFake Generation

    This is the blog that records and explains technical details for Chenda Duan and Zhengtong Liu’s CS188 DLCV project. We investigate some novel and powerful methods for two topics in DeepFake generation: image animation and image-to-image translation. For better understanding, and for fun, we have created a demo. You may need to create a copy and modify the path in the deepFake_demo.ipynb file.

  • Graph Convolution Networks for fusion of RGB-D images

    3D point cloud understanding is critical for many robotics applications in unstructured environments. Point cloud data can be obtained directly (e.g., LIDAR) or indirectly through depth maps (stereo cameras, depth from defocus, etc.); however, efficiently merging the information gained from point clouds with 2D image features and textures is an open problem. Graph convolutional networks can exploit geometric information in the scene that is difficult for 2D image recognition approaches to reason about, and we test how these features can improve classification models.
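
    As a concrete reference point, below is a minimal sketch of one graph convolution layer in the Kipf & Welling style, one hypothetical way to propagate fused point features over a neighborhood graph; the feature sizes and distance threshold are illustrative assumptions, not our model.

      import torch
      import torch.nn as nn

      class GraphConv(nn.Module):
          """X' = relu(D^-1/2 (A + I) D^-1/2 X W), the Kipf & Welling propagation rule."""
          def __init__(self, in_dim, out_dim):
              super().__init__()
              self.lin = nn.Linear(in_dim, out_dim)

          def forward(self, x, adj):
              a_hat = adj + torch.eye(adj.size(0))        # add self-loops
              d_inv_sqrt = torch.diag(a_hat.sum(dim=1).pow(-0.5))
              a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt    # symmetric normalization
              return torch.relu(self.lin(a_norm @ x))

      # 1024 points, each carrying xyz plus RGB features sampled from the image
      feats = torch.randn(1024, 6)
      adj = (torch.cdist(feats[:, :3], feats[:, :3]) < 0.5).float()  # radius graph
      adj.fill_diagonal_(0)                               # self-loops are added above
      print(GraphConv(6, 64)(feats, adj).shape)           # torch.Size([1024, 64])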

  • Pose Estimation

    Pose estimation is the use of machine learning to estimate the pose of a person or animal from an image or a video by examining the spatial locations of key body joints. Our project consists of two main parts: (1) this article, detailing pose estimation background and modeling techniques, and (2) an interactive Google Colaboratory notebook demonstrating pose estimation in action.
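
    Many pose estimators output one confidence heatmap per joint and read each joint's location off the heatmap's argmax. Below is a minimal, generic sketch of that decoding step; the joint count and map sizes are illustrative assumptions, not tied to a specific model.

      import torch

      def decode_keypoints(heatmaps):
          """heatmaps: (num_joints, H, W) confidence maps -> (num_joints, 2) pixel coords."""
          num_joints, h, w = heatmaps.shape
          flat_idx = heatmaps.view(num_joints, -1).argmax(dim=1)
          xs = flat_idx % w
          ys = torch.div(flat_idx, w, rounding_mode='floor')
          return torch.stack([xs, ys], dim=1)

      heatmaps = torch.rand(17, 64, 48)  # e.g., 17 COCO-style joints
      print(decode_keypoints(heatmaps))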

  • Trajectory Prediction

    Behaviour prediction in dynamic, multi-agent systems is an important problem in the context of self-driving cars. In this blog, we investigate a few different approaches to tackling this multifaceted problem and reproduce the work of Gao et al. by implementing VectorNet in PyTorch.

  • An Introduction to Medical Image Segmentation

    Medical image segmentation serves as the backbone of medical image processing in today’s world. To account for the variability in medical imaging, medical image segmentation detects boundaries within 2D and 3D images to identify crucial features and the sizes of objects within them. This has tremendously assisted research, diagnosis, and computer-assisted surgery in the medical field. With the rise of deep learning algorithms, medical image segmentation has seen an increase in accuracy and performance, leading to incredible new innovations within the medical field.

  • MMDetection

    In this article, we investigate state-of-the-art object detection algorithms and their implementations in MMDetection. We write an up-to-date guide, based on our experience, for setting up and running MMDetection on a structured COCO-2017 dataset in Google Colab. We analyse and classify the key component structure (detection head/neck/backbone) of MMDetection models. We then reproduce the object detection task for popular models on mainstream benchmarks, including the COCO-2017 dataset, and construct a detailed interactive table. Next, we run signature object detection algorithms on real-life photos and videos taken on the UCLA campus and provide error analysis. Finally, we perform basic adversarial attacks on our models using representative photos and evaluate their effects. A minimal inference sketch appears below the spotlight video.

    Video 1. Spotlight Overview.
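
    As a taste of the workflow, here is a minimal sketch of MMDetection's high-level inference API (mmdet 2.x); the config name is one shipped with the repository, while the checkpoint and image paths are illustrative placeholders.

      from mmdet.apis import init_detector, inference_detector

      config = 'configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py'
      checkpoint = 'checkpoints/faster_rcnn_r50_fpn_1x_coco.pth'  # placeholder local path

      model = init_detector(config, checkpoint, device='cuda:0')
      # result: per-class lists of detected boxes as [x1, y1, x2, y2, score]
      result = inference_detector(model, 'demo.jpg')
      model.show_result('demo.jpg', result, out_file='result.jpg')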

  • Deepfake Detection

  • Video Colorization

    Historical videos like old movies were all black and white before the invention of color cameras. Have you ever wondered what the good old times looked like in color? We will attempt to colorize old videos with the power of deep neural networks.

  • Generative Adversarial Networks with Transformers

    While Vision Transformers have caught quite some attention in the community, it remains to be explored how such powerful models could be used to build powerful GANs. Based on recent progress in studying Transformers’ position encoding systems, we want to explore the possibility of building a vision-oriented transformer block that is simple and lightweight yet effective for stable training of transformer GANs.

  • Depth From Stereo Vision

    Our project explores the use of deep learning to infer depth from two side-by-side camera images. This is done by determining which pixels in each image correspond to the same object (a process known as stereo matching) and then calculating the disparity between corresponding pixels, from which depth can be computed given information about the placement of the cameras (see the sketch below). While classical vision-based solutions to stereo matching exist, deep learning can produce better results.
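
    To make the geometry concrete, here is a minimal sketch of the triangulation step for a rectified stereo pair, where depth is inversely proportional to disparity; the numbers are illustrative, roughly KITTI-like values rather than our rig's calibration.

      def depth_from_disparity(disparity_px, focal_px, baseline_m):
          """Rectified, parallel-camera rig: depth = f * B / d."""
          return focal_px * baseline_m / disparity_px

      # ~11.8 m for an illustrative focal length and baseline
      print(depth_from_disparity(disparity_px=32.0, focal_px=700.0, baseline_m=0.54))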

  • Tunable GANs

    Today, generative adversarial networks (GANs) make up most state-of-the-art deepfake portrait generators. However, most of these deep learning networks generate completely random images; the user has little to no control over the output, and it is difficult to determine how the images are produced. This gap limits the utility of modern deepfake technology for potential users like special effects teams. We will therefore investigate how to make GAN-based deepfake generators tunable and understandable so that users can edit different aspects of generated images in real time. We then examine some of the applications of tunable GANs.

  • Medical Image Segmentation

    The use of deep learning methods in medical image analysis has contributed to the rise of new fast and reliable techniques for the diagnosis, evaluation, and quantification of human diseases. We will study and discuss various robust deep learning frameworks used in medical image segmentation, such as PDV-Net and ResUNet++. Furthermore, we will implement our own neural network to explore and demonstrate the capabilities of deep learning in this medical field.
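
    As one concrete building block, here is a minimal sketch of the Dice loss widely used to train segmentation networks; it is a generic illustration under standard assumptions, not code from PDV-Net or ResUNet++.

      import torch

      def dice_loss(pred, target, eps=1e-6):
          """pred: (N, H, W) foreground probabilities; target: (N, H, W) binary masks."""
          inter = (pred * target).sum(dim=(1, 2))
          union = pred.sum(dim=(1, 2)) + target.sum(dim=(1, 2))
          return 1 - ((2 * inter + eps) / (union + eps)).mean()

      pred = torch.sigmoid(torch.randn(4, 128, 128))           # dummy predictions
      target = (torch.rand(4, 128, 128) > 0.5).float()         # dummy ground truth
      print(dice_loss(pred, target).item())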

  • Sheet Music Recognition

    Sheet music recognition is a difficult task. Zaragoza et al. devised a method for recognizing monophonic scores (a single staff). We extend this functionality to piano sheet music with multiple staves (treble and bass). https://github.com/NingWang1729/tf-end-to-end

  • Text Guided Image Generation

    Text-guided image generation is an important milestone for both natural language processing and computer vision. It seeks to use natural language prompts to generate new images or edit existing ones. Recently, diffusion models have been shown to produce better results than GANs for text-guided image generation. In this article, we examine GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models.

  • An Overview of Deep Learning for Curious People (Sample post)

    Starting earlier this year, I developed a strong curiosity about deep learning and spent some time reading about the field. To document what I’ve learned and to provide some interesting pointers to people with similar interests, I wrote this overview of deep learning models and their applications.