---
layout: home
---
# Jose Costa Pereira

Hello and a warm welcome to my webpage! I'm a Senior Research Scientist at Huawei R&D in London, UK. Presumably, you found yourself here because you want to know more about my research interests. Below you can find a brief overview; feel free to reach out!

{%- include social.html -%}

 

At Huawei I work in the Noah's Ark Lab, a group in the R&D division where the focus is heavily placed on computational photography. In particular, I'm interested in image-to-image restoration techniques and in (no-reference) image quality metrics from a perceptual standpoint. Many solutions have been proposed to address these problems, some of them with great success. But the problem of designing a "no-reference image quality metric" -- also known as NR-IQA or blind IQA (B-IQA) -- remains largely unsolved for images in the wild. As one can imagine, it is very difficult to come up with a model that mimics the human visual system (HVS) in its opinions about image quality. And this is exactly the ultimate goal of Image Quality Assessment (IQA).
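
To make the "no-reference" setting concrete, below is a minimal sketch of a crude NR-IQA proxy: the variance of the Laplacian as a sharpness score, computed from the distorted image alone. It assumes OpenCV is installed, and the file names in the usage comment are hypothetical; real NR-IQA models are learned from human opinion scores and are far more faithful to the HVS than this toy.

```python
import cv2  # OpenCV; assumed available via `pip install opencv-python`

def sharpness_score(path: str) -> float:
    """Crude no-reference quality proxy: variance of the Laplacian.

    Higher values suggest a sharper image. There is no pristine
    reference image anywhere -- that is the "NR" in NR-IQA.
    """
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    if img is None:
        raise FileNotFoundError(path)
    # The second derivative responds strongly to edges; blur suppresses it.
    lap = cv2.Laplacian(img, cv2.CV_64F)
    return float(lap.var())

# Hypothetical usage: compare two captures of the same scene.
# print(sharpness_score("shot_a.png"), sharpness_score("shot_b.png"))
```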

But I'm also interested in other topics. These include, of course, Generative AI with the ubiquitous Transformer architecture and its attention mechanism, Stable Diffusion for state-of-the-art generative models of realistic images (and videos), and any other theme that contributes to the development of (more) intelligent systems: from the automation of everyday tasks (e.g. copilot-style assistants) to expert-level domain knowledge (e.g. medical diagnosis). At INESCTEC, within the VCMI group, I collaborated with students and other researchers on the development of new CADx/CADe (Computer-Aided Diagnosis/Detection) tools for breast cancer screening. Given the success of deep learning frameworks in many visual recognition tasks, such tools are now used extensively in medical imaging applications.
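
For readers unfamiliar with the attention mechanism mentioned above, here is a minimal sketch of scaled dot-product attention, the core operation of the Transformer, in plain NumPy. The shapes and variable names are illustrative assumptions, not tied to any particular codebase.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.

    Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v).
    """
    d_k = Q.shape[-1]
    # Similarity of every query to every key, scaled so the softmax
    # stays well-behaved as d_k grows.
    scores = Q @ K.T / np.sqrt(d_k)
    # Row-wise softmax: each query gets a distribution over the keys.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # The output is a convex combination of the value vectors.
    return weights @ V

# Toy example: 3 tokens, model dimension 4, used as self-attention.
rng = np.random.default_rng(0)
x = rng.standard_normal((3, 4))
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (3, 4)
```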

Before this modern age of (convolutional) neural networks, it was somewhat difficult to come up with machine-friendly descriptors that were representative of an image -- what is known today as an image embedding. This is in clear contrast with text snippets, where a simple bag-of-words descriptor (with a decent vocabulary size and some smart pre-processing) would do the trick. A big breakthrough came with the paper "ImageNet Classification with Deep Convolutional Neural Networks" by Krizhevsky et al., presented at NIPS 2012. They showed the community the benefits of using a convolutional neural network -- trained on a lot of data -- to solve classical computer vision problems. At the time I was a bit reluctant to adopt neural networks. I certainly saw them as good "feature extractors", but still lacking the semantic interpretation intrinsic to text feature vectors. Today, the quest for a unified multimodal representation of text, images, audio, and video is more active than ever. I find this to be a very interesting topic of research; a "good" (i.e. descriptive/accurate/robust) representation of each modality is essential for any task performed by an intelligent system.
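
To make the comparison concrete, here is a minimal sketch of such a bag-of-words descriptor using only the Python standard library; the toy vocabulary and the (very naive) pre-processing are illustrative assumptions.

```python
from collections import Counter

def bag_of_words(text: str, vocabulary: list[str]) -> list[int]:
    """Represent `text` as raw counts over a fixed vocabulary.

    Word order is discarded entirely -- hence a "bag" of words.
    """
    # Minimal pre-processing: lowercase and strip trailing punctuation.
    tokens = [t.strip(".,!?;:") for t in text.lower().split()]
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]

# Toy vocabulary and snippet (illustrative only).
vocab = ["image", "quality", "neural", "network", "text"]
print(bag_of_words("Neural network quality beats neural guesswork.", vocab))
# -> [0, 1, 2, 1, 0]
```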

You can have a look at my scholar profile for more details on my publications. And please reach out if you find any of these topics interesting. My email is below.