• Anime Sketch Colorization and Style Transfer

    In this project, I explore how to colorize a sketch in the style of another colored image.

  • NeX Novel View Synthesis

    Novel view synthesis aims to generate a visual scene representation from just a sparse set of images and has a wide variety of uses in VR/AR. In this blog, we will explore a new approach to this problem called NeX.

  • NeRF Exploration

    The focus of this project is to explore various approaches to generating 3D scenes with Neural Radiance Fields. We utilize NeRF and its variants to model Royce Hall and generate novel viewpoints from prior images.

  • Enhanced Self-Driving with Vision-Based Models

    Self-driving is a hot topic for deep vision applications, but vision-based urban driving is hard. Many methods for learning to drive have been proposed over the past several years. In this work, we focus on reproducing “Learning to Drive from a World on Rails” and on addressing drawbacks of the method, such as its high pedestrian collision rate. We will also utilize lidar data (which is preinstalled on most cars with some self-driving capability) to achieve better benchmark results.

  • Text Guided Art Generation

    Our project investigates a text-guided image generation model and an art painting model, and connects the two to create artworks from text inputs alone. Some artists have already applied AI/deep learning in their art creation (e.g., VQGAN-CLIP), while the development of diffusion models and transformers may provide more stable and human-like output. In this post, we discuss how we utilized VQGAN + CLIP and Paint Transformer for the whole generation pipeline and present Colab demos; a minimal sketch of the CLIP-guidance step follows.
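
    To make the text-image coupling concrete, here is a minimal, hypothetical sketch of how CLIP scores a candidate image against a prompt; VQGAN+CLIP ascends this score by backpropagating it into the generator's latents. The file name and prompt are illustrative placeholders, not part of our pipeline.

      import torch
      import clip  # OpenAI CLIP: https://github.com/openai/CLIP
      from PIL import Image

      device = 'cuda' if torch.cuda.is_available() else 'cpu'
      model, preprocess = clip.load('ViT-B/32', device=device)

      # Placeholder inputs: a candidate image from the generator and a text prompt.
      image = preprocess(Image.open('candidate.png')).unsqueeze(0).to(device)
      text = clip.tokenize(['an oil painting of a sunset over the ocean']).to(device)

      # Scoring step only (the full VQGAN+CLIP loop runs this with gradients enabled).
      with torch.no_grad():
          similarity = torch.cosine_similarity(model.encode_image(image),
                                               model.encode_text(text))
      print(similarity.item())  # higher means the image better matches the prompt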

  • Interactive Environment, Embodied AI

    My project investigates current trends in Embodied AI development and research, reports on methods of using virtual environments to enable reinforcement learning, and guides the reader through setting up an interactive environment on their own computer using iGibson.

  • MetaFormer and DETR

    Our previous topic was egocentric pose estimation; however, due to the complications of setting up a physical-environment GUI on Google Cloud and the incompleteness of the official repository, after discussing with Prof. Zhou we decided to switch our topic to MetaFormer and DETR.

    As we know, DETR is a recent popular algorithm developed by Facebook (Meta) that performs object detection without using anchor boxes, which were one of the core features of networks like Faster R-CNN and Mask R-CNN, significantly reducing the complexity of the training pipeline. The core of the DETR network is its transformer encoder and decoder, both of which utilize multi-head attention modules. While reading papers, we came across one that discusses the effect of the overall structure of the transformer (MetaFormer), and we decided to combine these two interesting papers.
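
    Since multi-head attention is the core of DETR's encoder and decoder, here is a minimal, hypothetical sketch of an encoder-style layer built on PyTorch's nn.MultiheadAttention; the dimensions are illustrative assumptions rather than DETR's exact configuration.

      import torch
      import torch.nn as nn

      class EncoderBlock(nn.Module):
          """One encoder-style layer: self-attention followed by a feed-forward net."""
          def __init__(self, d_model=256, nhead=8, dim_ff=2048):
              super().__init__()
              self.attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
              self.ff = nn.Sequential(nn.Linear(d_model, dim_ff), nn.ReLU(),
                                      nn.Linear(dim_ff, d_model))
              self.norm1 = nn.LayerNorm(d_model)
              self.norm2 = nn.LayerNorm(d_model)

          def forward(self, x, pos):
              q = k = x + pos  # DETR adds positional encodings to queries and keys
              x = self.norm1(x + self.attn(q, k, x)[0])
              return self.norm2(x + self.ff(x))

      tokens = torch.randn(2, 100, 256)  # flattened CNN feature map: (batch, HW, d_model)
      pos = torch.randn(2, 100, 256)     # positional encoding (learned or sinusoidal)
      print(EncoderBlock()(tokens, pos).shape)  # torch.Size([2, 100, 256])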

  • DeepFake Generation

    This is the blog that records and explains technical details for Chenda Duan and Zhengtong Liu’s CS188 DLCV project. We investigate some novel and powerful methods for two topics in DeepFake generation: image animation and image-to-image translation. For better understanding, and for fun, we have created a demo. You may need to create a copy and modify the path in the deepFake_demo.ipynb file.

  • Graph Convolution Networks for fusion of RGB-D images

    3D point cloud understanding is critical for many robotics applications in unstructured environments. Point cloud data can be obtained directly (e.g., LIDAR) or indirectly through depth maps (stereo cameras, depth from defocus, etc.); however, efficiently merging the information gained from point clouds with 2D image features and textures is an open problem. Graph convolutional networks can exploit geometric information in the scene that is difficult for 2D image recognition approaches to reason about, and we test how these features can improve classification models.
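
    As a concrete reference point, below is a minimal sketch of one graph convolution layer in the Kipf & Welling style, one hypothetical way to propagate fused point features over a neighborhood graph; the feature sizes and distance threshold are illustrative assumptions, not our model.

      import torch
      import torch.nn as nn

      class GraphConv(nn.Module):
          """X' = relu(D^-1/2 (A + I) D^-1/2 X W), the Kipf & Welling propagation rule."""
          def __init__(self, in_dim, out_dim):
              super().__init__()
              self.lin = nn.Linear(in_dim, out_dim)

          def forward(self, x, adj):
              a_hat = adj + torch.eye(adj.size(0))        # add self-loops
              d_inv_sqrt = torch.diag(a_hat.sum(dim=1).pow(-0.5))
              a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt    # symmetric normalization
              return torch.relu(self.lin(a_norm @ x))

      # 1024 points, each carrying xyz plus RGB features sampled from the image
      feats = torch.randn(1024, 6)
      adj = (torch.cdist(feats[:, :3], feats[:, :3]) < 0.5).float()  # radius graph
      adj.fill_diagonal_(0)                               # self-loops are added above
      print(GraphConv(6, 64)(feats, adj).shape)           # torch.Size([1024, 64])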

  • Pose Estimation

    Pose estimation is the use of machine learning to estimate the pose of a person or animal from an image or a video by examining the spatial locations of key body joints. Our project consists of two main parts: (1) this article, detailing pose estimation background and modeling techniques, and (2) an interactive Google Colaboratory notebook demonstrating pose estimation in action.
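
    Many pose estimators output one confidence heatmap per joint and read each joint's location off the heatmap's argmax. Below is a minimal, generic sketch of that decoding step; the joint count and map sizes are illustrative assumptions, not tied to a specific model.

      import torch

      def decode_keypoints(heatmaps):
          """heatmaps: (num_joints, H, W) confidence maps -> (num_joints, 2) pixel coords."""
          num_joints, h, w = heatmaps.shape
          flat_idx = heatmaps.view(num_joints, -1).argmax(dim=1)
          xs = flat_idx % w
          ys = torch.div(flat_idx, w, rounding_mode='floor')
          return torch.stack([xs, ys], dim=1)

      heatmaps = torch.rand(17, 64, 48)  # e.g., 17 COCO-style joints
      print(decode_keypoints(heatmaps))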

  • Trajectory Prediction

    Behaviour prediction in dynamic, multi-agent systems is an important problem in the context of self-driving cars. In this blog, we investigate a few different approaches to tackling this multifaceted problem and reproduce the work of Gao et al. by implementing VectorNet in PyTorch.

  • An Introduction to Medical Image Segmentation

    Medical image segmentation serves as the backbone of medical image processing in today’s world. To account for the variability in medical imaging, medical image segmentation detects boundaries within 2D and 3D images to identify crucial features and the sizes of objects within them. This has tremendously assisted research, diagnosis, and computer-assisted surgery in the medical field. With the rise of deep learning algorithms, medical image segmentation has seen an increase in accuracy and performance, leading to incredible new innovations within the medical field.

  • MMDetection

    In this article, we investigate state-of-the-art object detection algorithms and their implementations in MMDetection. We write an up-to-date guide, based on our experience, for setting up and running MMDetection on a structured COCO-2017 dataset in Google Colab. We analyse and classify the key component structure (detection head/neck/backbone) of MMDetection models. We then reproduce the object detection task for popular models on mainstream benchmarks, including the COCO-2017 dataset, and construct a detailed interactive table. Next, we run signature object detection algorithms on real-life photos and videos taken on the UCLA campus and provide error analysis. Finally, we perform basic adversarial attacks on our models using representative photos and evaluate their effects. A minimal inference sketch appears below the spotlight video.

    Video 1. Spotlight Overview.
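
    As a taste of the workflow, here is a minimal sketch of MMDetection's high-level inference API (mmdet 2.x); the config name is one shipped with the repository, while the checkpoint and image paths are illustrative placeholders.

      from mmdet.apis import init_detector, inference_detector

      config = 'configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py'
      checkpoint = 'checkpoints/faster_rcnn_r50_fpn_1x_coco.pth'  # placeholder local path

      model = init_detector(config, checkpoint, device='cuda:0')
      # result: per-class lists of detected boxes as [x1, y1, x2, y2, score]
      result = inference_detector(model, 'demo.jpg')
      model.show_result('demo.jpg', result, out_file='result.jpg')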

  • Deepfake Detection

  • Video Colorization

    Historical videos like old movies were all black and white before the invention of color cameras. Have you ever wondered what the good old times looked like in color? We will attempt to colorize old videos with the power of deep neural networks.

  • Generative Adversarial Networks with Transformers

    While Vision Transformers have caught quite some attention in the community, it remains to be explored how such powerful models could be used to build powerful GANs. Based on recent progress in studying Transformers’ position encoding systems, we want to explore the possibility of building a vision-oriented transformer block that is simple and lightweight yet effective for stable training of transformer GANs.

  • Depth From Stereo Vision

    Our project explores the use of deep learning to infer depth from two side-by-side camera images. This is done by determining which pixels in each image correspond to the same object (a process known as stereo matching) and then calculating the disparity between corresponding pixels, from which depth can be computed given information about the placement of the cameras (see the sketch below). While classical vision-based solutions to stereo matching exist, deep learning can produce better results.
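
    To make the geometry concrete, here is a minimal sketch of the triangulation step for a rectified stereo pair, where depth is inversely proportional to disparity; the numbers are illustrative, roughly KITTI-like values rather than our rig's calibration.

      def depth_from_disparity(disparity_px, focal_px, baseline_m):
          """Rectified, parallel-camera rig: depth = f * B / d."""
          return focal_px * baseline_m / disparity_px

      # ~11.8 m for an illustrative focal length and baseline
      print(depth_from_disparity(disparity_px=32.0, focal_px=700.0, baseline_m=0.54))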

  • Tunable GANs

    Today, generative adversarial networks (GANs) make up most state-of-the-art deepfake portrait generators. However, most of these deep learning networks generate completely random images; the user has little to no control over the output, and it is difficult to determine how the images are produced. This gap limits the utility of modern deepfake technology for potential users like special effects teams. We will therefore investigate how to make GAN-based deepfake generators tunable and understandable so that users can edit different aspects of generated images in real time. We then examine some of the applications of tunable GANs.

  • Medical Image Segmentation

    The use of deep learning methods in medical image analysis has contributed to the rise of new fast and reliable techniques for the diagnosis, evaluation, and quantification of human diseases. We will study and discuss various robust deep learning frameworks used in medical image segmentation, such as PDV-Net and ResUNet++. Furthermore, we will implement our own neural network to explore and demonstrate the capabilities of deep learning in this medical field.
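
    As one concrete building block, here is a minimal sketch of the Dice loss widely used to train segmentation networks; it is a generic illustration under standard assumptions, not code from PDV-Net or ResUNet++.

      import torch

      def dice_loss(pred, target, eps=1e-6):
          """pred: (N, H, W) foreground probabilities; target: (N, H, W) binary masks."""
          inter = (pred * target).sum(dim=(1, 2))
          union = pred.sum(dim=(1, 2)) + target.sum(dim=(1, 2))
          return 1 - ((2 * inter + eps) / (union + eps)).mean()

      pred = torch.sigmoid(torch.randn(4, 128, 128))           # dummy predictions
      target = (torch.rand(4, 128, 128) > 0.5).float()         # dummy ground truth
      print(dice_loss(pred, target).item())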

  • Sheet Music Recognition

    Sheet music recognition is a difficult task. Zaragoza et al. devised a method for recognizing monophonic scores (a single staff). We extend this functionality to piano sheet music with multiple staves (treble and bass). https://github.com/NingWang1729/tf-end-to-end

  • Text Guided Image Generation

    Text-guided image generation is an important milestone for both natural language processing and computer vision. It seeks to use natural language prompts to generate new images or edit existing ones. Recently, diffusion models have been shown to produce better results than GANs for text-guided image generation. In this article, we examine GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models.

  • An Overview of Deep Learning for Curious People (Sample post)

    Starting earlier this year, I developed a strong curiosity about deep learning and spent some time reading about the field. To document what I’ve learned and to provide some interesting pointers to people with similar interests, I wrote this overview of deep learning models and their applications.