  • Super-resolution via diffusion methods

    Super-resolution enhances image resolution from low to high, and modern techniques such as convolutional neural networks and diffusion models like SR3 have significantly improved image detail and quality. This post explores a simplified implementation of SR3. View the code [Here].
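
    As a rough illustration of the idea behind SR3, the sketch below shows a single denoising-diffusion training step in which the noise predictor is conditioned on a bicubically upsampled low-resolution image via channel-wise concatenation. The module name, network size, and noise schedule are illustrative assumptions rather than the code from the post (the real SR3 also conditions on the noise level).

    ```python
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SRDenoiser(nn.Module):
        """Toy noise predictor: sees the noisy HR image concatenated with the upsampled LR image."""
        def __init__(self, channels=3, hidden=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(2 * channels, hidden, 3, padding=1), nn.ReLU(),
                nn.Conv2d(hidden, hidden, 3, padding=1), nn.ReLU(),
                nn.Conv2d(hidden, channels, 3, padding=1),
            )

        def forward(self, x_noisy, lr_up):
            return self.net(torch.cat([x_noisy, lr_up], dim=1))

    def sr3_training_step(model, hr, lr, alphas_cumprod):
        """One training step: predict the noise added to the HR image, conditioned on the LR input."""
        b = hr.shape[0]
        t = torch.randint(0, len(alphas_cumprod), (b,))
        a_bar = alphas_cumprod[t].view(b, 1, 1, 1)
        noise = torch.randn_like(hr)
        x_noisy = a_bar.sqrt() * hr + (1 - a_bar).sqrt() * noise   # forward diffusion q(x_t | x_0)
        lr_up = F.interpolate(lr, size=hr.shape[-2:], mode="bicubic", align_corners=False)
        return F.mse_loss(model(x_noisy, lr_up), noise)

    # Usage sketch
    betas = torch.linspace(1e-4, 0.02, 1000)
    alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)
    hr, lr = torch.randn(2, 3, 64, 64), torch.randn(2, 3, 16, 16)
    loss = sr3_training_step(SRDenoiser(), hr, lr, alphas_cumprod)
    ```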

  • NeRFs for Synthesizing Novel Views of 3D Scenes

    In the domain of generative images, NeRFs are a powerful tool for generating novel views of 3D scenes with an extremely high degree of detail. Here, we will review the basics of Neural Radiance Field (NeRF) models, look at how they can be used to generate novel views, and investigate an optimization to the original design, the KiloNeRF model, to see how we can improve on some of its shortcomings.
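
    Before diving in, the following minimal sketch shows two pieces the post builds on: the positional encoding NeRF applies to its inputs and the discrete volume-rendering sum used to composite samples along a ray. Shapes and function names are assumptions for illustration, not the post's actual code.

    ```python
    import torch

    def positional_encoding(x, num_freqs=10):
        """NeRF-style encoding: map each coordinate to sin/cos features at increasing frequencies."""
        freqs = (2.0 ** torch.arange(num_freqs)) * torch.pi
        features = [x]
        for f in freqs:
            features += [torch.sin(f * x), torch.cos(f * x)]
        return torch.cat(features, dim=-1)

    def volume_render(densities, colors, deltas):
        """Discrete NeRF rendering: alpha-composite per-sample colors along each ray."""
        alphas = 1.0 - torch.exp(-densities * deltas)               # opacity of each sample
        trans = torch.cumprod(torch.cat([torch.ones_like(alphas[..., :1]),
                                         1.0 - alphas + 1e-10], dim=-1), dim=-1)[..., :-1]
        weights = trans * alphas                                    # contribution of each sample
        return (weights.unsqueeze(-1) * colors).sum(dim=-2)         # expected color per ray

    # Usage sketch: 4 rays, 64 samples per ray
    rgb = volume_render(torch.rand(4, 64), torch.rand(4, 64, 3), torch.full((4, 64), 0.01))
    ```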

  • Deep Learning Based Image-to-Image Translation Techniques

    The goal of this project is to explore and understand the problem of image-to-image translation. Two approaches addressing this topic will be analyzed: CycleGAN and FreeControl. An implementation of CycleGAN is also discussed.
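
    To make the CycleGAN discussion concrete, here is a minimal sketch of the cycle-consistency term that, together with the adversarial losses, drives training. The generators G (X to Y) and F (Y to X) are stand-ins, and the weight of 10 follows the common default rather than anything specific to this project.

    ```python
    import torch
    import torch.nn as nn

    def cycle_consistency_loss(G, F, real_x, real_y, lam=10.0):
        """L_cyc = ||F(G(x)) - x||_1 + ||G(F(y)) - y||_1, scaled by lambda."""
        l1 = nn.L1Loss()
        forward_cycle = l1(F(G(real_x)), real_x)    # x -> G(x) -> F(G(x)) should return to x
        backward_cycle = l1(G(F(real_y)), real_y)   # y -> F(y) -> G(F(y)) should return to y
        return lam * (forward_cycle + backward_cycle)

    # Usage sketch with trivial 1x1-conv "generators" just to exercise the function
    G, F = nn.Conv2d(3, 3, 1), nn.Conv2d(3, 3, 1)
    loss = cycle_consistency_loss(G, F, torch.randn(2, 3, 128, 128), torch.randn(2, 3, 128, 128))
    ```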

  • Exploring CLIP for Zero-Shot Classification

    In this blog article, we'll delve into CLIP (Contrastive Language-Image Pre-training), focusing primarily on its application in zero-shot classification. Rather than examining a pre-trained version, we will train CLIP ourselves, crafting the core components of the model and employing a distinct, smaller dataset for training. We'll also introduce custom loss functions and dissect specific elements of CLIP, such as its contrastive loss mechanism. This article aims to examine the architecture and its implications, making minor adjustments to better grasp what drives its effectiveness.
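
    As a preview of the contrastive loss mechanism discussed in the article, the sketch below shows the symmetric InfoNCE objective CLIP optimizes over a batch of paired image and text embeddings; the temperature value and tensor shapes are illustrative assumptions.

    ```python
    import torch
    import torch.nn.functional as F

    def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
        """Symmetric contrastive loss: matching image/text pairs sit on the diagonal
        of the similarity matrix and are treated as the correct class."""
        image_emb = F.normalize(image_emb, dim=-1)
        text_emb = F.normalize(text_emb, dim=-1)
        logits = image_emb @ text_emb.t() / temperature          # (B, B) cosine similarities
        targets = torch.arange(logits.shape[0], device=logits.device)
        loss_i2t = F.cross_entropy(logits, targets)              # match each image to its caption
        loss_t2i = F.cross_entropy(logits.t(), targets)          # match each caption to its image
        return (loss_i2t + loss_t2i) / 2

    # Usage sketch: a batch of 8 paired 512-dimensional embeddings
    loss = clip_contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))
    ```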

  • Pose Estimation

    The problem of Human Pose Estimation is widely applicable in computer vision—almost any task involving human interaction could benefit from pose estimation. As such, we explore the techniques and developments in this field by discussing three works relevant and reflective of these advances.

  • Human Mesh Recovery

    Human Mesh Recovery (HMR) is a computer vision task that involves reconstructing a detailed 3D mesh model of the human body from 2D images or videos. In the 2D realm, we have seen solutions that extract 2D keypoints; HMR aims to go a step further by capturing the shape and pose of the human body. This post focuses on WHAM (Reconstructing World-grounded Humans with Accurate 3D Motion).

  • Visuomotor Policy Learning

    In visuomotor policy learning, an agent learns to excel at a sequential decision-making task involving visual inputs and motor control. Two important applications include autonomous driving and robotics control.

  • A Comparison of Point Cloud 3D Object Detection Methods

    3D object detection is an important task that is critical to many current problems. It has numerous applications in automotive technology, including obstacle avoidance and autonomous driving. Another valuable application is medical imaging, specifically brain tumor segmentation. Our paper explores recent advances in 3D object detection using point clouds, acknowledging that work in this area is less mature than 2D object detection. We analyze performance and model design, evaluating the prominent 3D object detection models VoxelNet, PointRCNN, SE-SSN, and GLENet on the widely used KITTI dataset as a common benchmark. In our examination, we found that VoxelNet, as one of the earlier models in 3D object detection, performed worse than the later advancements; PointRCNN performs next best, then SE-SSN, and then GLENet. With each development, we discuss how differences in design decisions and architecture contribute to improved average precision and inference times. These advancements in 3D object detection show considerable promise for future computer vision applications.

  • Text-to-Video Generation

    In this paper, we will discuss diffusion-based video generation models. We will first do a preliminary exploration of diffusion, then extend it to video generation by examining Video Diffusion Models by Jonathan Ho et al. We will then follow this with a refinement of video diffusion models by conducting a deep dive into Imagen Video, a high-definition video generation model developed by researchers at Google. Through this paper, we aim to provide an overview of diffusion-based video generation, as well as rigorously cover a high-definition refinement of the basic video diffusion model.

  • 3D Scene Representation from Sparse 2D Images

    In this report, we first introduce the topic of 3D scene representation as a whole. We briefly go over classical approaches and explain some of the common issues preventing them from rendering a high-quality scene. Then, we discuss three deep learning based approaches in more depth, taking Neural Radiance Fields (NeRFs) as our starting point. Instant NGP improves upon the original NeRF paper by introducing a hash encoding of the MLP inputs that speeds up training by at least 200x. Zero-1-to-3 combines NeRF and diffusion to offer extraordinary zero-shot scene synthesis abilities. Lastly, we give the most attention to 3D Gaussian Splatting, which represents the entire scene as a set of 3D Gaussians, enabling efficient training and real-time scene rendering.
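
    As a taste of the hash-encoding idea behind Instant NGP, the sketch below hashes integer grid coordinates into a small learnable feature table. It is deliberately simplified to a single resolution level with no trilinear interpolation, and the table size, primes, and resolution are illustrative assumptions.

    ```python
    import torch

    def hash_encode(points, table, resolution):
        """Single-level spatial hash: snap points in [0, 1]^3 to a grid and look up features."""
        coords = (points * resolution).long()                      # integer cell coordinates
        primes = (1, 2654435761, 805459861)                        # per-dimension hashing primes
        h = (coords[..., 0] * primes[0]) ^ (coords[..., 1] * primes[1]) ^ (coords[..., 2] * primes[2])
        return table[h % table.shape[0]]                           # learnable feature vectors

    # Usage sketch: 2^14-entry table with 2 features per entry, fed into a small MLP afterwards
    table = torch.nn.Parameter(torch.randn(2 ** 14, 2) * 1e-4)
    features = hash_encode(torch.rand(1024, 3), table, resolution=128)   # (1024, 2)
    ```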

  • U-Net for Biomedical Image Segmentation

    This report dives into biomedical image segmentation using U-Net and 3D U-Net. U-Net revolutionized the field with its unique U-shaped architecture designed for precise semantic segmentation, while 3D U-Net tackled volumetric images, improving efficiency and accuracy in 3D segmentation compared to the regular U-Net.
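
    To make the U-shaped architecture concrete, here is a minimal one-level sketch of the encoder-bottleneck-decoder pattern with a skip connection; the channel counts are illustrative, and 3D U-Net follows the same pattern with Conv3d/MaxPool3d in place of the 2D operations.

    ```python
    import torch
    import torch.nn as nn

    def conv_block(in_ch, out_ch):
        """Two 3x3 convolutions with ReLU, the basic U-Net building block."""
        return nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        )

    class TinyUNet(nn.Module):
        """One-level U-Net: encoder, bottleneck, and decoder joined by a skip connection."""
        def __init__(self, in_ch=1, num_classes=2):
            super().__init__()
            self.enc = conv_block(in_ch, 64)
            self.down = nn.MaxPool2d(2)
            self.bottleneck = conv_block(64, 128)
            self.up = nn.ConvTranspose2d(128, 64, 2, stride=2)
            self.dec = conv_block(128, 64)            # 128 = 64 (skip) + 64 (upsampled)
            self.head = nn.Conv2d(64, num_classes, 1)

        def forward(self, x):
            e = self.enc(x)                                    # high-resolution features
            b = self.bottleneck(self.down(e))                  # low-resolution context
            d = self.dec(torch.cat([e, self.up(b)], dim=1))    # skip connection restores detail
            return self.head(d)                                # per-pixel class logits

    # Usage sketch
    logits = TinyUNet()(torch.randn(1, 1, 128, 128))   # (1, 2, 128, 128)
    ```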

  • Polyp Segmentation for Colorectal Cancer

  • Deep Learning for Prostate Segmentation

    We aim to analyze how we can use deep learning techniques for prostate image segmentation. Prostate cancer is the second most common form of cancer for men worldwide and the fifth leading cause of death for men globally [3]. However, this is a statistic that can be considerably changed with early-stage detection. In fact, the five-year survival rate approaches 100% when the cancer is caught early. To this end, we explore how we can use existing deep learning architectures to help with prostate image segmentation to catch early prostate cancer in patients.

  • Multimodal Vision-Language Models: Applications to LaTeX OCR

    LaTeX is widely utilized in scientific and mathematical fields. Our objective was to develop a pipeline capable of transforming handwritten equations into LaTeX code. To achieve this, we devised a two-step model. First, we employed an R-CNN model to delineate bounding boxes around equations on a standard ruled piece of paper, using a custom dataset we generated. We then passed these selected regions into a TrOCR model pre-trained on Im2LaTeX-100k, a dataset comprising rendered LaTeX images. We further fine-tuned the model on a handwritten mathematical expressions dataset from Kaggle, a collection of the CROHME handwritten mathematical expression recognition competition datasets from three years [6] [7] [8]. Our model generated the ground-truth LaTeX exactly for 4 of the 8 hand-drawn examples we produced; for the remaining 4, it produced LaTeX similar to the ground truth, albeit with minor errors.
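
    The recognition stage of such a pipeline can be sketched with the Hugging Face transformers API as below. The detection stage and our fine-tuned weights are not shown; the checkpoint name here is a generic handwritten TrOCR model used as a stand-in, so treat the exact model id and box format as assumptions.

    ```python
    from PIL import Image
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    # Stand-in checkpoint; the report instead fine-tunes a model pre-trained on Im2LaTeX-100k.
    processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")
    model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-handwritten")

    def region_to_latex(page: Image.Image, box):
        """Crop one detected equation region (left, top, right, bottom) and decode it with TrOCR."""
        crop = page.crop(box).convert("RGB")
        pixel_values = processor(images=crop, return_tensors="pt").pixel_values
        generated_ids = model.generate(pixel_values, max_new_tokens=128)
        return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

    # Usage sketch: boxes would come from the R-CNN detection stage
    # latex = region_to_latex(Image.open("page.jpg"), (50, 80, 400, 160))
    ```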

  • Navigating the Future: A Comparative Analysis of Trajectory Prediction Models

    Trajectory prediction is a challenging task due to the multimodal nature of human behavior and the complexity of multi-agent systems. In this technical report we explore three machine learning approaches that aim to tackle these challenges: Social GAN, Social-STGCNN, and EvolveGraph. Social GAN uses a variety loss to generate diverse trajectories and a pooling module to model subtle social cues. Social-STGCNN models social interactions explicitly through a graphical structure. EvolveGraph establishes a framework for forecasting the evolution of the interaction graph. We compare the advantages and disadvantages of these approaches at the end of the report.
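
    For reference, Social GAN's variety (best-of-k) loss can be written in a few lines: sample k candidate futures per agent and back-propagate only through the one closest to the ground truth, which encourages the generator to cover multiple plausible modes. Shapes and the sample count are illustrative assumptions.

    ```python
    import torch

    def variety_loss(pred_trajectories, gt_trajectory):
        """Best-of-k L2 loss. pred_trajectories: (k, T, 2) sampled futures; gt_trajectory: (T, 2)."""
        errors = ((pred_trajectories - gt_trajectory.unsqueeze(0)) ** 2).sum(dim=-1).mean(dim=-1)  # (k,)
        return errors.min()          # penalize only the closest sample

    # Usage sketch: k = 20 samples of a 12-step future
    loss = variety_loss(torch.randn(20, 12, 2), torch.randn(12, 2))
    ```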

  • Road Agent Behavior and Trajectory Prediction

    In this paper, we review deep learning models capable of predicting the future actions of various road actors.

  • Final Report - Hand Gesture Recognition

    Hand gesture recognition aims to identify hand gestures in the context of space and time. It is a subset of computer vision whose goal is to learn and classify hand gestures.

  • Image Inpainting

    Image inpainting is the task of reconstructing missing regions in an image, which is important in many computer vision applications. Some of these applications include restoration of damaged artwork, object removal, and image compression.

  • Novel View Synthesis

    In this article, we take a deep dive into the problem of novel view synthesis and discuss three different representative methods. First, we examine the classical method of light field rendering, which involves learning a 4D representation of a scene and applying multiple compression techniques. We then jump into Neural Radiance Fields (NeRFs), a modern deep learning approach that has achieved much better visual clarity and memory usage by implicitly modeling scene representations with a neural network. Our discussion of NeRFs’ drawbacks will lead us to the emergence of 3D Gaussian Splatting (3D-GS), an effective rasterization technique that explicitly encodes a scene with thousands of anisotropic Gaussians.

  • Text to Image Generation

  • Image Style Translation

    Image style translation is the process of converting an image from one style to another. We will explore two deep learning approaches to image style translation: CycleGAN and Stable Diffusion, and compare their performance on the task of converting realistic images to Monet-style paintings.

  • Trajectory Prediction

    Estimating the future positions of people or objects (agents) is a crucial capability that can serve diverse applications. These range from self-driving cars, as pursued by Waymo and Tesla, to Hawk-Eye systems that determine in-or-out calls in tennis, to analyzing crowd behavior to ensure no human crush occurs. Knowing the past, present, and, most importantly, future trajectory of agents is clearly an important tool.

  • Conditional Control of Text-to-Image Diffusion: ControlNet and FreeControl

    In the realm of artificial intelligence, text-to-image (T2I) generation has become a focal point of research. ControlNet stands out by offering users precise spatial control over T2I diffusion models. On the other hand, FreeControl represents a paradigm shift by granting users unparalleled control over T2I generation without the need for extensive training. This blog aims to provide an in-depth comparison between ControlNet and FreeControl.

  • Monocular Depth Estimation

    This document discusses the CS 188 report on various image generation methods, including traditional GANs, conditional GANs, and the Pix2Pix method. It explains the concepts, loss functions, architectures, and performance of each method. The document also covers the implementation details, training process, and post-processing techniques used in the project. The results and discussion highlight the strengths and limitations of the Pix2Pix method in generating realistic images.

  • Depth Estimation

    Depth estimation is the computer vision task of calculating the distance from a camera to objects in a scene. It has become increasingly important, with a wide variety of applications in many fields, including autonomous vehicles, medicine, and human-computer interaction. Recent deep learning methods have enhanced the accuracy, robustness, and efficiency of depth estimation in both images and videos. We discuss two different deep learning approaches to depth estimation: an unsupervised CNN and Depth Anything. We compare and contrast these approaches and expand on the existing code by combining it with other effective architectures to further enhance depth estimation capabilities.

  • Human Pose Estimation

    Human pose estimation (HPE), a pivotal task in computer vision, seeks to deduce the configuration of human body parts from images or sequences of video frames. In this post, I will examine two approaches to human pose estimation: MMPose, the most common library for HPE, and OpenPifPaf, a more lightweight, efficient pose estimation model.

  • Object Tracking

    Object tracking has been a prevalent computer vision task for many years, with many different deep learning approaches. In this report, we will explore the inner workings of two different approaches, DeepSORT for multiple object tracking and SiamRPN++ for single object tracking, comparing and contrasting their capabilities. We also briefly look at ODTrack, a more recent tracking algorithm. Finally, we include a short demo of DeepSORT with YOLOv8.
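
    A demo of this kind might be wired up roughly as follows, pairing a YOLOv8 detector with the deep-sort-realtime tracker. The package names, call signatures, and detection format are assumptions to verify against each library's documentation, not code taken from the report.

    ```python
    import cv2
    from ultralytics import YOLO
    from deep_sort_realtime.deepsort_tracker import DeepSort

    detector = YOLO("yolov8n.pt")         # lightweight YOLOv8 detector
    tracker = DeepSort(max_age=30)        # DeepSORT tracker with appearance features

    cap = cv2.VideoCapture("input.mp4")
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        result = detector(frame, verbose=False)[0]
        detections = []
        for box in result.boxes:
            x1, y1, x2, y2 = box.xyxy[0].tolist()
            # deep-sort-realtime expects ([left, top, width, height], confidence, class)
            detections.append(([x1, y1, x2 - x1, y2 - y1], float(box.conf[0]), int(box.cls[0])))
        for track in tracker.update_tracks(detections, frame=frame):
            if not track.is_confirmed():
                continue
            l, t, r, b = map(int, track.to_ltrb())
            cv2.rectangle(frame, (l, t), (r, b), (0, 255, 0), 2)
            cv2.putText(frame, f"ID {track.track_id}", (l, t - 5),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
        cv2.imshow("DeepSORT + YOLOv8", frame)
        if cv2.waitKey(1) == 27:          # Esc to quit
            break
    cap.release()
    cv2.destroyAllWindows()
    ```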

  • Trajectory Prediction

    Trajectory prediction plays a crucial role in the autonomous vehicle (AV) pipeline. In this report we’ll explore two different machine learning approaches to trajectory prediction for AVs: a Conv-based architecture which uses rasterized semantic maps, and a GNN-based architecture which uses a vector-based representation of the scene.

  • UCLA Human Pose Estimation + Trajectory Prediction for AVs

    This report explores the topic of Human Pose Estimation - specifically its history, approaches, and application to the realm of Autonomous Driving. We discuss some of the overarching approaches taken when tackling HPE and then go in depth on some exciting developments in the space. We also connect our research to work done in Trajectory Prediction, also with a focus on autonomous driving.

  • Deep Neural Networks for Facial Recognition

    Facial recognition is the technology of identifying human beings by analyzing their faces in pictures, video footage, or in real time. Facial recognition remained a difficult problem for computer vision until recently. The introduction of deep learning techniques that can learn from large collections of face data and analyze rich, complex images of faces has made the task far more tractable, enabling new systems that are efficient and have since become even better than human vision at facial recognition.

  • 3D Model Generation through Machine Learning

    Ever since the advent of 3D graphics, computer graphics engineers have been looking for better, more streamlined ways to create models (data representing a 3D object), historically a tedious and long process. The goal of this post is to trace the history of 3D model generation through AI vision algorithms and suggest a potential method for not just generating 3D models but pre-preparing them for use in animation and game development.

    To be clear, we will be discussing how AI is leveraged to generate models in a format usable by the wider 3D community. We will not be discussing algorithms that represent 3D data in a form unusable by traditional animation software such as Blender, Maya, or Unity (e.g., unconverted NeRF data).

  • Computer Vision for Brain Tumor Detection

    In our exploration of brain tumor classification and segmentation, we examine both 2D and 3D convolutional neural networks. Brain tumor detection is a paramount issue: brain cancer claims hundreds of thousands of lives each year, and early detection from MRI scans is integral to prevention and to aiding cancer treatment. Brain tumor classification is a machine learning problem that researchers have attacked in many ways, as we elaborate on below.

  • Putting a Name to a Face – Deep Learning for Facial Classification

    Our report examines the chronological development and mechanisms behind the evolution of deep learning architectures used for facial recognition, as well as industry approaches and applications.

  • Fine-tuning Stable Diffusion with Dreambooth

    Stable Diffusion is an extremely powerful text-to-image model; however, it struggles with generating images of specific subjects. We address this by exploring the state-of-the-art fine-tuning method DreamBooth to evaluate its ability to create images with custom faces, as well as its ability to replicate custom environments.

  • Natural Image Classification

    Natural image classification involves developing and training machine learning models capable of accurately assigning labels to images depicting real-world objects, landscapes, and scenes at a scale decipherable to humans. In this project, we cover traditional approaches as well as deep learning approaches to tackle this problem.

  • Galaxy Morphology

    In this blog, we take a look at how deep learning has impacted the astronomy community through deep representation learning. It is striking that transferring representations learned on one dataset to unseen related tasks is more effective than learning everything from scratch. For astronomers, these tasks include identifying galaxies with similar morphology to a query galaxy, detecting interesting anomalies, and adapting models to new tasks with only a small number of newly labeled galaxies. In this post, we share our own results and insights about the paper “Practical Galaxy Morphology Tools from Deep Supervised Representation Learning” and its contributions, which remain meaningful for understanding neural networks today.

  • Anomaly Detection for Semantic Segmentation

    When semantic segmentation models are used for safety-critical applications such as autonomous vehicles, they have to handle unusual objects that do not fall into any of the predefined categories they were trained on. We discuss two state-of-the-art methods that attempt to detect and localize such anomalies, one using kNN and the other using generative normalizing flow models.
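
    The kNN-based idea can be sketched generically: embed pixels or patches with a segmentation backbone, then score each test embedding by its distance to the nearest in-distribution training embeddings. This is the general recipe rather than either paper's exact scoring function; shapes and the threshold are illustrative assumptions.

    ```python
    import torch

    def knn_anomaly_score(test_embeddings, train_embeddings, k=5):
        """Mean distance to the k nearest in-distribution embeddings; larger means more anomalous."""
        dists = torch.cdist(test_embeddings, train_embeddings)    # (N_test, N_train)
        knn_dists, _ = dists.topk(k, dim=-1, largest=False)       # k smallest distances per query
        return knn_dists.mean(dim=-1)                             # (N_test,)

    # Usage sketch: 256-dimensional per-patch embeddings
    scores = knn_anomaly_score(torch.randn(100, 256), torch.randn(5000, 256))
    anomalous = scores > scores.mean() + 2 * scores.std()         # illustrative threshold
    ```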

  • 3D Bounding Box Estimation Using Deep Learning and Geometry

    This blog delves into techniques for estimating 3D bounding boxes from monocular images, examining common datasets and evaluating three prominent methods. Additionally, it investigates enhancing the Deep3DBox model’s performance by incorporating temporal and stereo information, assessing the scalability of its geometric insights.

  • Mitigating Biases in Computer Vision

    In this blog post, we investigate the issue of biases and model leakage in computer vision, specifically in classification tasks. We discuss traditional and deep learning approaches to prevent biases in classification models.

  • Image Generation

    In the realm of image generation, deep learning technologies have revolutionized traditional methods, enabling advanced applications like text-to-image generation, style transfer, and image translation with unprecedented effectiveness. This blog highlights the transformative capabilities of deep learning for generating high-quality images. We will compare and contrast Generative Adversarial Networks (GANs) and diffusion models and show the strengths and applications of each.

  • Text Guided Image Editing using Diffusion

    In this study, we explore the advancements in image generation, particularly focusing on DiffEdit, an innovative approach leveraging text-conditioned diffusion models for semantic image editing (Couairon et al., 2022 [1]). Semantic image editing aims to modify images in response to textual prompts, enabling precise and context-aware alterations. A thorough comparison between DiffEdit and various traditional and deep learning-based methodologies has been conducted, highlighting its consistency and effectiveness in semantic editing tasks. Additionally, we introduce an interactive framework that integrates DiffEdit with BLIP and other text-to-image models to create a comprehensive end-to-end generation and editing pipeline. Moreover, we delve into a novel technique for text-guided mask generation within DiffEdit, proposing a method for object segmentation based solely on textual queries. We emphasize the importance of integrating AI safety and ethical considerations, ensuring our advancements are both technologically groundbreaking and responsibly implemented.
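
    The text-guided mask generation idea can be sketched as follows: noise the image and compare the denoiser's noise predictions under the reference prompt versus the query prompt, since they disagree most where the edit should happen. The noising step, normalization, and denoiser interface below are simplified assumptions, not DiffEdit's exact procedure.

    ```python
    import torch

    def diffedit_mask(denoiser, image_latent, ref_emb, query_emb, timestep, threshold=0.5, n=10):
        """Rough DiffEdit-style mask: average |eps_ref - eps_query| over several noise draws.
        `denoiser(x_t, t, cond)` stands in for a text-conditioned diffusion U-Net."""
        diffs = []
        for _ in range(n):
            x_t = image_latent + torch.randn_like(image_latent)           # simplified noising step
            eps_ref = denoiser(x_t, timestep, ref_emb)
            eps_query = denoiser(x_t, timestep, query_emb)
            diffs.append((eps_ref - eps_query).abs().mean(dim=1, keepdim=True))  # channel average
        diff = torch.stack(diffs).mean(dim=0)
        diff = (diff - diff.min()) / (diff.max() - diff.min() + 1e-8)     # normalize to [0, 1]
        return (diff > threshold).float()                                 # binary edit mask
    ```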

  • Exploring different techniques for Facial Expression Recognition (FER)

    Facial expression recognition (FER) is a pivotal task in computer vision with applications spanning from human-computer interaction to affective computing. In this project, we conduct a comparative analysis of three prominent model architectures for FER: a feature decomposition and feature reconstruction network (FDRL), a cross-fusion transformer-based network (POSTERV2), and YOLOv5. These architectures represent diverse approaches to leveraging deep learning techniques for facial expression analysis. We evaluate the performance of each model architecture on RAF-DB, which encompasses a wide range of facial expressions in various contexts. The evaluation metrics include accuracy and number of parameters, which give comprehensive insights into the models’ ability to recognize facial expressions accurately. Our evaluations show that the POSTERV2 model outperforms the other models in terms of accuracy. We also present a demonstration of the YOLOv5 model running on a webcam and train a custom model to recognize “awake” and “sleep” expressions. Our findings provide valuable insights into the strengths and limitations of different model architectures for FER, which can guide the selection of appropriate models for specific applications.

  • Vision-Language Alignment in Large Vision-Language Models: Advances and Challenges

    One recent advancement and trend at the intersection of computer vision and natural language processing is the development of large vision-language models like GPT-4V. Aligning vision and language therefore becomes increasingly crucial for models to develop higher-level joint understanding and reasoning capabilities over multi-modal information. We explore the progress and limitations of vision-language models by first giving an introduction to CLIP (Contrastive Language-Image Pre-training), an important work connecting text and images that has influenced many state-of-the-art vision-language models today. We then discuss models influenced by CLIP, and some limitations they share, which point to promising directions for future exploration.

  • The ViViT Model for Video Classification

  • Image Generation

    In this project I will provide a brief overview of the history of generative modeling with respect to computer vision and dive into an application of the transformer architecture to the field.

  • A Deep Dive Comparison of 3D Object Detection Methodologies

    This report delves into 3D object detection, a major topic in computer vision. The demand for sophisticated perception systems is especially relevant given recent developments in autonomous vehicles, robotics, and augmented reality. The ability to detect and localize objects in three-dimensional space has become increasingly important due to the high precision and accuracy these applications require, especially in the case of autonomous vehicles, where lives are at stake. This report provides an overview of 3D object detection, explaining many different methods in depth. We go over three vastly different approaches, point-based, voxel-based, and transformer-based methods, aiming to explain and compare the performance, strengths and weaknesses, and architecture of each.

  • Facial-Action-Detection

  • Mitigating Spurious Correlations in Deep Learning

    Spurious correlations pose a major challenge in training deep learning models that can generalize well to arbitrary real life data. Models often exploit easy-to-learn features that are correlated with prediction(s) in the training set but not semantically meaningful in the real world. In this post, we provide an overview of the spurious correlation problem by characterizing it mathematically, discuss some of its causes, and describe three key methods for mitigating such correlations: GroupDRO, GEORGE, and SPARE, along with some empirical results for each method.
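
    For concreteness, the GroupDRO objective mentioned above can be sketched in a few lines: keep a running weight per group, up-weight groups whose current loss is high, and minimize the weighted (approximately worst-group) loss. The step size, group count, and bookkeeping are illustrative assumptions rather than the reference implementation.

    ```python
    import torch
    import torch.nn.functional as F

    def group_dro_loss(logits, labels, group_ids, group_weights, step_size=0.01):
        """Weighted worst-group objective with an exponentiated-gradient update on group weights."""
        per_sample = F.cross_entropy(logits, labels, reduction="none")
        group_losses = torch.stack([
            per_sample[group_ids == g].mean() if (group_ids == g).any() else per_sample.new_zeros(())
            for g in range(len(group_weights))
        ])
        with torch.no_grad():                                   # weights are updated, not differentiated
            group_weights *= torch.exp(step_size * group_losses)
            group_weights /= group_weights.sum()
        return (group_weights * group_losses).sum()

    # Usage sketch: 4 groups; group_weights persists across training steps
    group_weights = torch.ones(4) / 4
    loss = group_dro_loss(torch.randn(16, 2), torch.randint(0, 2, (16,)),
                          torch.randint(0, 4, (16,)), group_weights)
    ```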