• Contrastive Language–Image Pre-training Benchmarks and Explainability

    The manual labelling of high-quality datasets for training Computer Vision models remains one of the most time-consuming tasks in Computer Vision research. Consequently, alternative methods of extracting label information from existing images and visual data are an area of focus for recent research. In this blog we explore one of the state-of-the-art approaches to pre-training image classification models, namely CLIP (Contrastive Language–Image Pre-training). Extracting latent labels from images already paired with text widely available on the internet is a promising way to fast-track the training of Computer Vision models using text and image encoders. These models demonstrate the power of contrastive pre-training to perform well at “zero-shot” classification, i.e. classifying images the model has never been trained on or seen before.
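
    As a rough illustration of the zero-shot setup described above, the sketch below scores an image against a set of candidate text labels by comparing the embeddings from CLIP's image and text encoders. It assumes OpenAI's open-source `clip` package; the image path and label set are placeholders, not from the original post.

    ```python
    # Minimal zero-shot classification sketch using OpenAI's open-source CLIP package.
    # "photo.jpg" and the label list are illustrative placeholders.
    import torch
    import clip
    from PIL import Image

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load("ViT-B/32", device=device)

    image = preprocess(Image.open("photo.jpg")).unsqueeze(0).to(device)
    labels = ["a photo of a dog", "a photo of a cat", "a photo of a car"]
    text = clip.tokenize(labels).to(device)

    with torch.no_grad():
        image_features = model.encode_image(image)
        text_features = model.encode_text(text)
        # Cosine similarity between the image embedding and each label embedding
        image_features /= image_features.norm(dim=-1, keepdim=True)
        text_features /= text_features.norm(dim=-1, keepdim=True)
        probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

    print(dict(zip(labels, probs[0].tolist())))
    ```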

  • Galaxy Detection/Classification with Computer Vision

    In this blog, I will share my experience in using a machine learning model (based on YOLO) that detects and classifies galaxies from public datasets from the Sloan Digital Sky Survey (SDSS) and Galaxy Zoo while taking CS 188: Deep Learning for Computer Vision at UCLA.

  • Object Detection Algorithms in MMDetection

    Topic: Object Detection Algorithms. This project uses MMDetection to compare object detection algorithms from the YOLO and Fast R-CNN families of models.

  • Generating Images with Diffusion Models

    This project focuses on generating images from text using diffusion models.

  • Mushroom Classification Project Proposal

    Mushrooms can be delicious, symbols in popular culture, and found nearly anywhere, but dealing with them safely can be tricky: while many look the same to the untrained eye, some mushrooms are extremely poisonous. Here, we attempted to classify 1394 types of mushrooms, with as few as 3 images available for some species, using various deep learning methods.

  • An Introduction to NeRFs

    This project is an introduction to the Neural Radiance Field (NeRF) model for generating novel views of 3D scenes. It explains what a NeRF is, how it works, and provides a complete example of how to build the original NeRF architecture using PyTorch in Google Colab.

  • Skin Lesion Classification

    It is crucial to catch skin cancer at an early stage. This can be done using image classification techniques. Specifically, this post aims to explore using different methods to handle class imbalance while classifying skin lesions using the HAM10000 dataset.
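
    One common way to handle the class imbalance mentioned above is to reweight the loss by inverse class frequency. The snippet below is a minimal PyTorch sketch of that idea; the class counts are made-up placeholders, not the real HAM10000 statistics, and this is only one of the methods such a project might compare.

    ```python
    # Minimal sketch: inverse-frequency class weights for an imbalanced 7-class problem.
    # The counts below are illustrative placeholders, not actual HAM10000 class sizes.
    import torch
    import torch.nn as nn

    class_counts = torch.tensor([4000.0, 600.0, 500.0, 300.0, 200.0, 100.0, 80.0])
    weights = class_counts.sum() / (len(class_counts) * class_counts)
    criterion = nn.CrossEntropyLoss(weight=weights)

    logits = torch.randn(8, 7)           # batch of 8 predictions over 7 lesion classes
    targets = torch.randint(0, 7, (8,))  # random ground-truth labels for the sketch
    loss = criterion(logits, targets)
    ```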

  • Trajectory Prediction

    Trajectory prediction for pedestrians and vehicles uses information about subjects' previous locations to predict where they will be in the future. It is important in contexts such as autonomous driving, but it can be complex, since subjects may respond unpredictably in a real-time environment. We examine two approaches to trajectory prediction, a stepwise goal approach with SGNet [1] [2] and a graph approach with PGP [3] [4], while also briefly examining a third model, Trajectron++ [5] [6], as a comparison. We work with the ETH / UCY (obtained through the Trajectron++ repository [6]) and nuScenes [7] datasets in these studies.

  • NeRF Models

    This project delves into the cutting-edge technology behind image-generative AIs, particularly Neural Radiance Fields (NeRF). We will investigate the research and methodologies behind NeRF and its applications in 3D scene reconstruction and rendering. Additionally, we will provide a hands-on, toy implementation using PyTorch and Google Colab.

  • Video Grounding

    Topic: Video Grounding

  • Melanoma and Skin Cancer Detection

    Skin cancer is one of the most prevalent cancers in the US and is often misdiagnosed or underdiagnosed globally. Benign and malignant lesions often appear visually similar and are only distinguishable after intensive tests. However, deep learning models have proven to be extremely accurate in the classification of melanoma and other skin cancers. Recently, convolutional neural network (CNN) models such as ResNet-50 have achieved over 85% classification accuracy on the ISIC binary melanoma classification datasets, and ensemble classifiers have reached over 95% accuracy. In this project, we explore the relevant high-performing CNN models and their efficacy when used for skin cancer classification. We run various experiments on these models to explore performance-related differences and potential issues with currently available datasets. We find that pre-trained CNNs are already quite robust in their ability to categorize the seven skin lesion categories of the ISIC 2018 dataset. However, the specific errors of the networks depend significantly on the data, demonstrating potential input-related biases and issues with the data currently available.
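
    To make the setup concrete, a minimal sketch of adapting a pretrained ResNet-50 to seven lesion categories might look like the following. This is an illustrative torchvision recipe under assumed defaults, not the exact configuration used in the project.

    ```python
    # Illustrative sketch: fine-tune a pretrained ResNet-50 for 7 skin-lesion classes.
    # Hyperparameters and the dummy batch are placeholders for demonstration only.
    import torch
    import torch.nn as nn
    from torchvision import models

    model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
    model.fc = nn.Linear(model.fc.in_features, 7)  # replace the head for 7 classes

    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    criterion = nn.CrossEntropyLoss()

    # One dummy training step to show the shape of the training loop.
    images = torch.randn(4, 3, 224, 224)
    labels = torch.randint(0, 7, (4,))
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    ```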

  • Exploring The Effectiveness Of Classification Models In Autonomous Driving Simulators Through Imitation Learning

    In this blog, we explore the effectiveness of supervised learning (classification) models that are trained by imitation learning in autonomous driving simulators.

  • Object Detection Algorithms

    This project explores different R-CNN-based deep learning algorithms used for object detection. We analyze the performance of 5 different R-CNN-based models for transfer learning on a new dataset, using OpenMMLab’s MMDetection toolbox to fine-tune these 5 models, which were pretrained on COCO, on the Aerial Maritime target dataset.

  • Human Action Recognition using Pose Estimation and Transformers

    This post is a review of different models for pose estimation and their applications to human action recognition (HAR) when combined with Transformers.

  • Adversarial Examples for DNNs

    DNN-driven image recognition has been used in many real-world scenarios, such as detecting objects or people on the road. However, DNNs can be vulnerable to adversarial examples (AEs), which are crafted by attackers and can mislead the model into predicting incorrect outputs while being hardly distinguishable by human eyes. This blog aims to introduce how to generate AEs and how to defend against these attacks (a minimal attack sketch follows the demo link below).

    YouTube link of the demo: https://youtu.be/3G267xLYzPU
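
    As a concrete example of generating an AE, the fast gradient sign method (FGSM) perturbs an input along the sign of the loss gradient with respect to the pixels. Below is a minimal PyTorch sketch of this idea; the model, input tensor, target label, and epsilon are placeholders, and the blog's own attacks may differ.

    ```python
    # Minimal FGSM sketch: perturb an input along the sign of the loss gradient.
    # The model, image tensor, label, and epsilon are illustrative placeholders.
    import torch
    import torch.nn as nn
    from torchvision import models

    model = models.resnet18(weights="IMAGENET1K_V1").eval()
    image = torch.rand(1, 3, 224, 224, requires_grad=True)  # placeholder input
    label = torch.tensor([207])                              # placeholder target class

    loss = nn.CrossEntropyLoss()(model(image), label)
    loss.backward()

    epsilon = 8 / 255
    adv_image = (image + epsilon * image.grad.sign()).clamp(0, 1).detach()
    ```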

  • Anime ViT-StyleGAN2

    A generative adversarial network (GAN) is a type of generative neural network capable of various tasks such as image creation, super-resolution, and image classification [1]. This project will explore the use of GAN models in the specific domain of anime characters and try to find improvements over current GAN models by using more advanced discriminators.

  • Trajectory Prediction in Autonomous Vehicles

    In recent years, the computer vision community has become more involved in autonomous vehicles. With ever-growing hardware support for modeling more complicated interactions between agents and street objects, trajectory prediction of traffic flows is yielding more promising results. The introduction of the Transformer has also galvanized the deep learning community, and our team will explore the application of Transformers to trajectory prediction.

  • Text Guided Image Generation

    My project investigates GLIDE, a state-of-the-art text-to-image diffusion model by OpenAI that uses classifier-free guidance. I investigated how a filtered version of this model performed by giving it different kinds of text input and tuning different hyperparameters to observe the generated output. I found that while GLIDE is definitely quite powerful at generating images, there are still quite a number of issues, including bias, image quality, novel images, and image types.

  • Deep Learning-Based Single Image Super-Resolution Techniques

    Image super-resolution is a process used to upscale low-resolution images to higher-resolution images while preserving texture and semantic data. We will outline how state-of-the-art techniques have evolved over the last decade and compare each model to its predecessor. We will also show PyTorch implementations for some of the described models.

  • Human Activity Classification Using Transformer Encoders

  • Driving Simulators

    Self-driving is a hot topic in deep learning for vision. One way of training driving models is imitation learning. In this work, we focus on reproducing the findings from “End-to-end Driving via Conditional Imitation Learning.” To do so, we utilize CARLA, a driving simulator, and emulate the models created in that paper using their provided dataset.

  • Analysis of Panoptic Image Segmentation Performance

    Panoptic segmentation is a type of image segmentation that unifies instance and semantic segmentation. In this project we compare three different panoptic segmentation models: Panoptic FPN, MaskFormer, and Mask2Former. This includes descriptions of their architectures and objective and subjective analyses of the models. We evaluated each model on the COCO validation dataset with MMDetection, compared differences in model segmentation, and visualized the attention of MaskFormer and Mask2Former with Detectron2 and HuggingFace.

  • Object Detection

    In this project, we wish to dive deep and examine the YOLO v7 (You Only Look Once) object detection model while exploring any possible improvements. YOLO v7 is one of the fastest object detection models in the world today which provides highly accurate real-time object detection. During our examination, we will inspect the layers and algorithms within YOLO v7, test the pre-trained YOLO v7 model provided by the YOLO team, and train and test YOLO v7 against different datasets.

  • Panoptic Scene Graph Generation and Panoptic Segmentation

    In this post, we’ll explore what panoptic scene graph generation and panoptic segmentation are, their implementation, and potential applications.

  • Facial Detection and Recognition


  • GAN Network Application towards Face Restoration

    Analyzing the Application of Generative Adversarial Networks For Face Restoration

  • Object Detection and Classification of Cars by Make using MMDetection

  • The Influence of Deep Learning in Dance

    Below is our setup for part 1…

  • Visual Physical Reasoning

    Our project centers on the topic of visual physical reasoning. Essentially, visual physical reasoning deals with whether computers can answer “common sense” queries about an image, such as “how many chairs are at the table?” or “where is the red cube in relation to the purple ball?”

  • Image Similarity

    Image Similarity has important applications in areas such as retrieval, classification, change detection, quality evaluation and registration. Here, we aim to improve the performance of models to pair images based on their similarities.

  • Exploring CLIP

    Abstract: This project explores CLIP (Contrastive Language-Image Pre-training), a pre-training technique that jointly trains image and text encoders to map images and text into the same feature space. This project reproduces CLIP’s zero-shot ability and the performance of a linear classifier trained on CLIP’s features. It also proposes a method to initialize few-shot classifiers, addressing the previously observed problem that few-shot classifiers can underperform the zero-shot classifier. It is found that there is a special type of overfitting in few-shot classification, which poses a significant challenge to the concept of few-shot classification. This project also trains an image generation model and a semantic segmentation model on CLIP’s features, which gives us insight into how much information CLIP’s features contain.
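
    For context, a linear probe on frozen CLIP features can be sketched as below: encode images with the frozen image encoder, then fit an ordinary logistic-regression classifier on those embeddings. The data loading here is a placeholder, and this is only the standard linear-probe baseline, not the project's proposed few-shot initialization.

    ```python
    # Minimal linear-probe sketch: fit logistic regression on frozen CLIP image features.
    # The random tensors stand in for a real labelled training split.
    import clip
    import numpy as np
    import torch
    from sklearn.linear_model import LogisticRegression

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load("ViT-B/32", device=device)

    def extract_features(images):
        """Encode a batch of preprocessed images with the frozen CLIP image encoder."""
        with torch.no_grad():
            feats = model.encode_image(images.to(device))
        return feats.cpu().numpy()

    train_images = torch.randn(32, 3, 224, 224)           # placeholder images
    train_labels = np.random.randint(0, 10, size=32)      # placeholder labels

    classifier = LogisticRegression(max_iter=1000)
    classifier.fit(extract_features(train_images), train_labels)
    ```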

  • Language Representation for Computer Vision

    In this blog I will investigate natural language representations as they are used in computer vision. A foray into the predominant language architecture, Transformers, will be linked to tasks in image captioning and art generation.

  • Art Generation using Cycle GANS

  • Action Localization

  • CLIP and Some Applications

    CLIP (Contrastive Language-Image Pre-training) is a neural network model developed by OpenAI that combines the strengths of both computer vision and natural language processing to improve image recognition and object classification. CLIP has been shown to achieve state-of-the-art results on a wide range of image recognition benchmarks, and it has been used in various applications such as image captioning and image search. I found an open-source SimpleCLIP model online and conducted some experiments to reconstruct the model. Using ideas and methods from some papers, I attempted to address its overfitting issue.

  • Pose Estimation

    Pose estimation research has been growing rapidly and recent advances have allowed us to accurately detect the various joints in the human body from just a photo. Convolutional Neural Networks have been the medium for obtaining high performance models and in this post we explore the novel model HRNet. Applications of pose estimation include identifying classes of actions undertaken by individuals, their poses, and animation.

  • Dataset Analysis for Deepfake Detection Using Mesonet

    Abstract

    Deepfake detection models and algorithms are the future in preventing malicious attacks by entities that wish to use deepfakes to circumvent modern security measures, sway the opinions of groups of people, or simply deceive others. This analysis of a deepfake detection model can potentially bolster our confidence in future detection models. At the core of every model’s effectiveness is the data used to train and test it. Good data (generally) creates better models that are more trustworthy for deployment in real applications. In this study, the MesoNet model is used for training, testing, and evaluation. Of the three datasets we investigated, the model trained on Dataset 1 (found in Dataset Control) performed the best. We then chose various properties of an image to analyze in order to compare the three datasets: RGB colors, contrast, brightness, and entropy. After comparing the mean and standard deviation across the image properties, we found that the mean values from Dataset 1 fall between those of the other two datasets, and that Dataset 1 has higher variance in 2 of the 4 image properties observed.

  • Text Guided Image Generation

    Text-to-image models are machine learning models which take as input a natural language description and produce an image matching the given description. We will explore the architecture and design of one such model, Stable Diffusion, and explore extensions of Stable Diffusion through experiments with prompt-to-prompt editing and model finetuning.
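
    For orientation, a minimal text-to-image generation sketch with Hugging Face’s `diffusers` library is shown below; the checkpoint, prompt, and sampling settings are illustrative assumptions, not necessarily what the post itself uses, and a CUDA GPU is assumed.

    ```python
    # Minimal Stable Diffusion text-to-image sketch using the diffusers library.
    # Checkpoint, prompt, and sampler settings are illustrative placeholders.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    )
    pipe = pipe.to("cuda")  # assumes a CUDA GPU is available

    prompt = "a watercolor painting of a fox in a snowy forest"
    image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
    image.save("fox.png")
    ```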

  • Deep Fake Generation

    The use of deep learning methods in deepfake generation has contributed to the rise of fake images, which raises some very serious ethical dilemmas. We will look at two different ways to generate deepfake pictures and videos, and will then focus on Image-to-Image Translation. CycleGAN and StarGAN are two models we will be studying to create deepfake images using Image-to-Image Translation.

  • Generating Images with Diffusion Models

    This project explores the latest technology behind image-generative AIs such as DALL-E 2 and Imagen. Specifically, we’ll be going over the research and techniques behind Diffusion Models, and a toy implementation in PyTorch/Colab.

  • Deepfake Generation

    This blog details Shrea Chari and Dhakshin Suriakannu’s project for the course CS 188: Computer Vision. This project discusses the topic of Deepfake Generation and takes a deep dive into three of the most popular models: Cycle-GAN, StarGAN and StyleGAN.

  • DDIM Inversion and Latent Space Manipulation

    We explore the inversion and latent space manipulation of diffusion models, particularly the denoising diffusion implicit model (DDIM), a variant of the denoising diffusion probabilistic model (DDPM) with deterministic (and acceleratable) sampling and thus a meaningful mapping from the latent space \(\mathcal{Z}\) to the image space \(\mathcal{X}\). We implement and compare optimization-based, learning-based, and hybrid inversion methods adapted from GAN inversion, and we find that optimization-based methods work well, but learning-based and hybrid methods run into obstacles fundamental to diffusion models. We also perform latent space interpolation to show that the DDIM latent space is continuous and meaningful, just like that of GANs. Lastly, we apply GAN semantic feature editing methods to DDIMs, visualizing binary attribute decision boundaries to showcase the unique interpretability of the diffusion latent space.
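
    For reference, the deterministic DDIM update that makes this inversion well defined can be written (in the DDIM paper’s notation, where \(\alpha_t\) denotes the cumulative product of the noise schedule and \(\epsilon_\theta\) the learned noise predictor, with the stochastic term \(\sigma_t\) set to 0) as:

    \[
    x_{t-1} = \sqrt{\alpha_{t-1}}\left(\frac{x_t - \sqrt{1-\alpha_t}\,\epsilon_\theta(x_t, t)}{\sqrt{\alpha_t}}\right) + \sqrt{1-\alpha_{t-1}}\,\epsilon_\theta(x_t, t)
    \]

    Because this step is deterministic, it can be run backwards to map an image to a latent \(x_T\), which is what enables the inversion and latent manipulation experiments described above.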

  • Investigation on the Robustness of CLIP

    Topic: CLIP: Text-image joint embedding

  • Multi View Stereo (MVS)

    This post provides an introduction to Multi View Stereo (MVS) and presents two deep learning-based algorithms for MVS reconstruction.

  • Deepfake Generation

    This article investigates the topic of DeepFake Generation, comparing two models, a GAN and a CNN, and analyzing their similarities and differences in the context of Faceswap. The study was conducted by implementing the DeepFaceLab model and comparing it with the Faceswap CNN model. The hypothesis is that the GAN will perform better, since it is a generative model that specializes in image generation, whereas CNNs are designed for image processing tasks such as object recognition and segmentation. The results of the study shed light on the effectiveness of these models for DeepFake Generation.

  • Object Detection

  • Deepfake Detection

    Detecting synthetic media has been an ongoing concern over the recent years due to the increasing amount of deepfakes on the internet. In this project, we will explore the different methods and algorithms that are used in deepfake detection.

  • Semantic Image Editing

    Image generation has the potential to greatly streamline creative processes for professional artists. While the current capabilities of image generators are impressive, these models aren’t able to produce good results without fail, and slightly tweaking text prompts can often result in wildly different images. Semantic image editing has the potential to alleviate this problem by letting users of image generation models slightly tweak results in order to achieve higher-quality end results. This project will explore and compare some technologies currently available for semantic image editing.

  • Medical Imaging

    Medical imaging analysis has long been a target for deep learning methods that help with diagnosing, evaluating, and quantifying medical diseases. In this study, we implement Logistic Regression and ResNet18 models for medical image classification, training them to find brain tumors in brain MRI images. Furthermore, we use our own implementation of LogisticRegression and fine-tune our ResNet18 model to better fit our needs.

  • Pose Estimation

    Introduction: This project explores the latest technology behind pose estimation. Pose estimation uses machine learning to estimate the pose of a person or animal by looking at key joint positions in the body. This project consists of 2 parts: this blog post, and a demonstration of pose estimation.