【CS-part1】New submissions for Mon, 6 May 24 #1391

Open
Yukeaaa opened this issue May 7, 2024 · 0 comments

Keyword: volume render

There is no result

Keyword: volumetric render

There is no result

Keyword: remote render

There is no result

Keyword: hybrid render

There is no result

Keyword: raycast

There is no result

Keyword: medical imaging

There is no result

Keyword: medical visualization

There is no result

Keyword: interactive volume

There is no result

Keyword: rendering

Requirements-driven Slicing of Simulink Models Using LLMs

  • Authors: Dipeeka Luitel, Shiva Nejati, Mehrdad Sabetzadeh
  • Subjects: Software Engineering (cs.SE)
  • Arxiv link: https://arxiv.org/abs/2405.01695
  • Pdf link: https://arxiv.org/pdf/2405.01695
  • Abstract
    Model slicing is a useful technique for identifying a subset of a larger model that is relevant to fulfilling a given requirement. Notable applications of slicing include reducing inspection effort when checking design adequacy to meet requirements of interest and when conducting change impact analysis. In this paper, we present a method based on large language models (LLMs) for extracting model slices from graphical Simulink models. Our approach converts a Simulink model into a textual representation, uses an LLM to identify the necessary Simulink blocks for satisfying a specific requirement, and constructs a sound model slice that incorporates the blocks identified by the LLM. We explore how different levels of granularity (verbosity) in transforming Simulink models into textual representations, as well as the strategy used to prompt the LLM, impact the accuracy of the generated slices. Our preliminary findings suggest that prompts created by textual representations that retain the syntax and semantics of Simulink blocks while omitting visual rendering information of Simulink models yield the most accurate slices. Furthermore, the chain-of-thought and zero-shot prompting strategies result in the largest number of accurate model slices produced by our approach.

HoloGS: Instant Depth-based 3D Gaussian Splatting with Microsoft HoloLens 2

  • Authors: Miriam Jäger, Theodor Kapler, Michael Feßenbecker, Felix Birkelbach, Markus Hillemann, Boris Jutzi
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2405.02005
  • Pdf link: https://arxiv.org/pdf/2405.02005
  • Abstract
    In the fields of photogrammetry, computer vision and computer graphics, the task of neural 3D scene reconstruction has led to the exploration of various techniques. Among these, 3D Gaussian Splatting stands out for its explicit representation of scenes using 3D Gaussians, making it appealing for tasks like 3D point cloud extraction and surface reconstruction. Motivated by its potential, we address the domain of 3D scene reconstruction, aiming to leverage the capabilities of the Microsoft HoloLens 2 for instant 3D Gaussian Splatting. We present HoloGS, a novel workflow utilizing HoloLens sensor data, which bypasses the need for pre-processing steps like Structure from Motion by instantly accessing the required input data i.e. the images, camera poses and the point cloud from depth sensing. We provide comprehensive investigations, including the training process and the rendering quality, assessed through the Peak Signal-to-Noise Ratio, and the geometric 3D accuracy of the densified point cloud from Gaussian centers, measured by Chamfer Distance. We evaluate our approach on two self-captured scenes: An outdoor scene of a cultural heritage statue and an indoor scene of a fine-structured plant. Our results show that the HoloLens data, including RGB images, corresponding camera poses, and depth sensing based point clouds to initialize the Gaussians, are suitable as input for 3D Gaussian Splatting.
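
The two metrics named in the HoloGS abstract can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' evaluation code. The function names, the flat pixel-list image format, and the choice of the symmetric mean-distance variant of Chamfer Distance are assumptions made for the example; the paper may use a different variant.

```python
import math


def psnr(reference, rendered, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB between two equally sized flat pixel lists."""
    if len(reference) != len(rendered) or not reference:
        raise ValueError("images must be non-empty and the same size")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(reference, rendered)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer distance (assumed variant): mean nearest-neighbour
    Euclidean distance from A to B plus the same from B to A."""
    def one_way(src, dst):
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(points_a, points_b) + one_way(points_b, points_a)
```

The brute-force nearest-neighbour search is quadratic in the number of points, so a spatial index such as a k-d tree would be the usual choice for real point clouds.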

WateRF: Robust Watermarks in Radiance Fields for Protection of Copyrights

  • Authors: Youngdong Jang, Dong In Lee, MinHyuk Jang, Jong Wook Kim, Feng Yang, Sangpil Kim
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
  • Arxiv link: https://arxiv.org/abs/2405.02066
  • Pdf link: https://arxiv.org/pdf/2405.02066
  • Abstract
    The advances in the Neural Radiance Fields (NeRF) research offer extensive applications in diverse domains, but protecting their copyrights has not yet been researched in depth. Recently, NeRF watermarking has been considered one of the pivotal solutions for safely deploying NeRF-based 3D representations. However, existing methods are designed to apply only to implicit or explicit NeRF representations. In this work, we introduce an innovative watermarking method that can be employed in both representations of NeRF. This is achieved by fine-tuning NeRF to embed binary messages in the rendering process. In detail, we propose utilizing the discrete wavelet transform in the NeRF space for watermarking. Furthermore, we adopt a deferred back-propagation technique and introduce a combination with the patch-wise loss to improve rendering quality and bit accuracy with minimum trade-offs. We evaluate our method in three different aspects: capacity, invisibility, and robustness of the embedded watermarks in the 2D-rendered images. Our method achieves state-of-the-art performance with faster training speed over the compared state-of-the-art methods.

DreamScene4D: Dynamic Multi-Object Scene Generation from Monocular Videos

  • Authors: Wen-Hsuan Chu, Lei Ke, Katerina Fragkiadaki
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2405.02280
  • Pdf link: https://arxiv.org/pdf/2405.02280
  • Abstract
    Existing VLMs can track in-the-wild 2D video objects while current generative models provide powerful visual priors for synthesizing novel views for the highly under-constrained 2D-to-3D object lifting. Building upon this exciting progress, we present DreamScene4D, the first approach that can generate three-dimensional dynamic scenes of multiple objects from monocular in-the-wild videos with large object motion across occlusions and novel viewpoints. Our key insight is to design a "decompose-then-recompose" scheme to factorize both the whole video scene and each object's 3D motion. We first decompose the video scene by using open-vocabulary mask trackers and an adapted image diffusion model to segment, track, and amodally complete the objects and background in the video. Each object track is mapped to a set of 3D Gaussians that deform and move in space and time. We also factorize the observed motion into multiple components to handle fast motion. The camera motion can be inferred by re-rendering the background to match the video frames. For the object motion, we first model the object-centric deformation of the objects by leveraging rendering losses and multi-view generative priors in an object-centric frame, then optimize object-centric to world-frame transformations by comparing the rendered outputs against the perceived pixel and optical flow. Finally, we recompose the background and objects and optimize for relative object scales using monocular depth prediction guidance. We show extensive results on the challenging DAVIS, Kubric, and self-captured videos, detail some limitations, and provide future directions. Besides 4D scene generation, our results show that DreamScene4D enables accurate 2D point motion tracking by projecting the inferred 3D trajectories to 2D, while never explicitly trained to do so.

Keyword: cinematic rendering

There is no result

Keyword: volume data

There is no result

Keyword: remote visualization

There is no result

Keyword: direct volume rendering

There is no result

Keyword: mobile device

Deep Learning Inference on Heterogeneous Mobile Processors: Potentials and Pitfalls

  • Authors: Sicong Liu, Wentao Zhou, Zimu Zhou, Bin Guo, Minfan Wang, Cheng Fang, Zheng Lin, Zhiwen Yu
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2405.01851
  • Pdf link: https://arxiv.org/pdf/2405.01851
  • Abstract
    There is a growing demand to deploy computation-intensive deep learning (DL) models on resource-constrained mobile devices for real-time intelligent applications. Equipped with a variety of processing units such as CPUs, GPUs, and NPUs, the mobile devices hold potential to accelerate DL inference via parallel execution across heterogeneous processors. Various efficient parallel methods have been explored to optimize computation distribution, achieve load balance, and minimize communication cost across processors. Yet their practical effectiveness in the dynamic and diverse real-world mobile environment is less explored. This paper presents a holistic empirical study to assess the capabilities and challenges associated with parallel DL inference on heterogeneous mobile processors. Through carefully designed experiments covering various DL models, mobile software/hardware environments, workload patterns, and resource availability, we identify limitations of existing techniques and highlight opportunities for cross-level optimization.

Keyword: transfer function

There is no result

Keyword: retrieval

Uncovering Deceptive Tendencies in Language Models: A Simulated Company AI Assistant

  • Authors: Olli Järviniemi, Evan Hubinger
  • Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2405.01576
  • Pdf link: https://arxiv.org/pdf/2405.01576
  • Abstract
    We study the tendency of AI systems to deceive by constructing a realistic simulation setting of a company AI assistant. The simulated company employees provide tasks for the assistant to complete, these tasks spanning writing assistance, information retrieval and programming. We then introduce situations where the model might be inclined to behave deceptively, while taking care to not instruct or otherwise pressure the model to do so. Across different scenarios, we find that Claude 3 Opus 1) complies with a task of mass-generating comments to influence public perception of the company, later deceiving humans about it having done so, 2) lies to auditors when asked questions, and 3) strategically pretends to be less capable than it is during capability evaluations. Our work demonstrates that even models trained to be helpful, harmless and honest sometimes behave deceptively in realistic scenarios, without notable external pressure to do so.

Tabular Embedding Model (TEM): Finetuning Embedding Models For Tabular RAG Applications

  • Authors: Sujit Khanna, Shishir Subedi
  • Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR)
  • Arxiv link: https://arxiv.org/abs/2405.01585
  • Pdf link: https://arxiv.org/pdf/2405.01585
  • Abstract
    In recent times Large Language Models have exhibited tremendous capabilities, especially in the areas of mathematics, code generation and general-purpose reasoning. However, for specialized domains, especially in applications that require parsing and analyzing large chunks of numeric or tabular data, even state-of-the-art (SOTA) models struggle. In this paper, we introduce a new approach to solving domain-specific tabular data analysis tasks by presenting a unique RAG workflow that mitigates the scalability issues of existing tabular LLM solutions. Specifically, we present Tabular Embedding Model (TEM), a novel approach to fine-tune embedding models for tabular Retrieval-Augmentation Generation (RAG) applications. Embedding models form a crucial component in the RAG workflow and even current SOTA embedding models struggle as they are predominantly trained on textual datasets and thus underperform in scenarios involving complex tabular data. The evaluation results showcase that our approach not only outperforms current SOTA embedding models in this domain but also does so with a notably smaller and more efficient model structure.

Question Suggestion for Conversational Shopping Assistants Using Product Metadata

  • Authors: Nikhita Vedula, Oleg Rokhlenko, Shervin Malmasi
  • Subjects: Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2405.01738
  • Pdf link: https://arxiv.org/pdf/2405.01738
  • Abstract
    Digital assistants have become ubiquitous in e-commerce applications, following the recent advancements in Information Retrieval (IR), Natural Language Processing (NLP) and Generative Artificial Intelligence (AI). However, customers are often unsure or unaware of how to effectively converse with these assistants to meet their shopping needs. In this work, we emphasize the importance of providing customers a fast, easy to use, and natural way to interact with conversational shopping assistants. We propose a framework that employs Large Language Models (LLMs) to automatically generate contextual, useful, answerable, fluent and diverse questions about products, via in-context learning and supervised fine-tuning. Recommending these questions to customers as helpful suggestions or hints to both start and continue a conversation can result in a smoother and faster shopping experience with reduced conversation overhead and friction. We perform extensive offline evaluations, and discuss in detail about potential customer impact, and the type, length and latency of our generated product questions if incorporated into a real-world shopping assistant.

Towards Neural Synthesis for SMT-Assisted Proof-Oriented Programming

  • Authors: Saikat Chakraborty, Gabriel Ebner, Siddharth Bhat, Sarah Fakhoury, Sakina Fatima, Shuvendu Lahiri, Nikhil Swamy
  • Subjects: Programming Languages (cs.PL); Artificial Intelligence (cs.AI); Software Engineering (cs.SE)
  • Arxiv link: https://arxiv.org/abs/2405.01787
  • Pdf link: https://arxiv.org/pdf/2405.01787
  • Abstract
    Proof-oriented programs mix computational content with proofs of program correctness. However, the human effort involved in programming and proving is still substantial, despite the use of Satisfiability Modulo Theories (SMT) solvers to automate proofs in languages such as F*. Seeking to spur research on using AI to automate the construction of proof-oriented programs, we curate a dataset of 600K lines of open-source F* programs and proofs, including software used in production systems ranging from Windows and Linux, to Python and Firefox. Our dataset includes around 32K top-level F* definitions, each representing a type-directed program and proof synthesis problem -- producing a definition given a formal specification expressed as an F* type. We provide a program-fragment checker that queries F* to check the correctness of candidate solutions. We believe this is the largest corpus of SMT-assisted program proofs coupled with a reproducible program-fragment checker. Grounded in this dataset, we investigate the use of AI to synthesize programs and their proofs in F*, with promising results. Our main finding is that the performance of fine-tuned smaller language models (such as Phi-2 or StarCoder) compares favorably with large language models (such as GPT-4), at a much lower computational cost. We also identify various type-based retrieval augmentation techniques and find that they boost performance significantly. With detailed error analysis and case studies, we identify potential strengths and weaknesses of models and techniques and suggest directions for future improvements.

TOPICAL: TOPIC Pages AutomagicaLly

  • Authors: John Giorgi, Amanpreet Singh, Doug Downey, Sergey Feldman, Lucy Lu Wang
  • Subjects: Computation and Language (cs.CL); Digital Libraries (cs.DL); Information Retrieval (cs.IR)
  • Arxiv link: https://arxiv.org/abs/2405.01796
  • Pdf link: https://arxiv.org/pdf/2405.01796
  • Abstract
    Topic pages aggregate useful information about an entity or concept into a single succinct and accessible article. Automated creation of topic pages would enable their rapid curation as information resources, providing an alternative to traditional web search. While most prior work has focused on generating topic pages about biographical entities, in this work, we develop a completely automated process to generate high-quality topic pages for scientific entities, with a focus on biomedical concepts. We release TOPICAL, a web app and associated open-source code, comprising a model pipeline combining retrieval, clustering, and prompting, that makes it easy for anyone to generate topic pages for a wide variety of biomedical entities on demand. In a human evaluation of 150 diverse topic pages generated using TOPICAL, we find that the vast majority were considered relevant, accurate, and coherent, with correct supporting citations. We make all code publicly available and host a free-to-use web app at: https://s2-topical.apps.allenai.org

SUKHSANDESH: An Avatar Therapeutic Question Answering Platform for Sexual Education in Rural India

  • Authors: Salam Michael Singh, Shubhmoy Kumar Garg, Amitesh Misra, Aaditeshwar Seth, Tanmoy Chakraborty
  • Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY)
  • Arxiv link: https://arxiv.org/abs/2405.01858
  • Pdf link: https://arxiv.org/pdf/2405.01858
  • Abstract
    Sexual education aims to foster a healthy lifestyle in terms of emotional, mental and social well-being. In countries like India, where adolescents form the largest demographic group, they face significant vulnerabilities concerning sexual health. Unfortunately, sexual education is often stigmatized, creating barriers to providing essential counseling and information to this at-risk population. Consequently, issues such as early pregnancy, unsafe abortions, sexually transmitted infections, and sexual violence become prevalent. Our current proposal aims to provide a safe and trustworthy platform for sexual education to the vulnerable rural Indian population, thereby fostering the healthy and overall growth of the nation. In this regard, we strive towards designing SUKHSANDESH, a multi-staged AI-based Question Answering platform for sexual education tailored to rural India, adhering to safety guardrails and regional language support. By utilizing information retrieval techniques and large language models, SUKHSANDESH will deliver effective responses to user queries. We also propose to anonymise the dataset to mitigate safety measures and set AI guardrails against any harmful or unwanted response generation. Moreover, an innovative feature of our proposal involves integrating "avatar therapy" with SUKHSANDESH. This feature will convert AI-generated responses into real-time audio delivered by an animated avatar speaking regional Indian languages. This approach aims to foster empathy and connection, which is particularly beneficial for individuals with limited literacy skills. Partnering with Gram Vaani, an industry leader, we will deploy SUKHSANDESH to address sexual education needs in rural India.

Incorporating External Knowledge and Goal Guidance for LLM-based Conversational Recommender Systems

  • Authors: Chuang Li, Yang Deng, Hengchang Hu, Min-Yen Kan, Haizhou Li
  • Subjects: Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2405.01868
  • Pdf link: https://arxiv.org/pdf/2405.01868
  • Abstract
    This paper aims to efficiently enable large language models (LLMs) to use external knowledge and goal guidance in conversational recommender system (CRS) tasks. Advanced LLMs (e.g., ChatGPT) are limited in domain-specific CRS tasks for 1) generating grounded responses with recommendation-oriented knowledge, or 2) proactively leading the conversations through different dialogue goals. In this work, we first analyze those limitations through a comprehensive evaluation, showing the necessity of external knowledge and goal guidance which contribute significantly to the recommendation accuracy and language quality. In light of this finding, we propose a novel ChatCRS framework to decompose the complex CRS task into several sub-tasks through the implementation of 1) a knowledge retrieval agent using a tool-augmented approach to reason over external Knowledge Bases and 2) a goal-planning agent for dialogue goal prediction. Experimental results on two multi-goal CRS datasets reveal that ChatCRS sets new state-of-the-art benchmarks, improving language quality of informativeness by 17% and proactivity by 27%, and achieving a tenfold enhancement in recommendation accuracy.

Semi-Parametric Retrieval via Binary Token Index

  • Authors: Jiawei Zhou, Li Dong, Furu Wei, Lei Chen
  • Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
  • Arxiv link: https://arxiv.org/abs/2405.01924
  • Pdf link: https://arxiv.org/pdf/2405.01924
  • Abstract
    The landscape of information retrieval has broadened from search services to a critical component in various advanced applications, where indexing efficiency, cost-effectiveness, and freshness are increasingly important yet remain less explored. To address these demands, we introduce Semi-parametric Vocabulary Disentangled Retrieval (SVDR). SVDR is a novel semi-parametric retrieval framework that supports two types of indexes: an embedding-based index for high effectiveness, akin to existing neural retrieval methods; and a binary token index that allows for quick and cost-effective setup, resembling traditional term-based retrieval. In our evaluation on three open-domain question answering benchmarks with the entire Wikipedia as the retrieval corpus, SVDR consistently demonstrates superiority. It achieves a 3% higher top-1 retrieval accuracy compared to the dense retriever DPR when using an embedding-based index and a 9% higher top-1 accuracy compared to BM25 when using a binary token index. Specifically, the adoption of a binary token index reduces index preparation time from 30 GPU hours to just 2 CPU hours and storage size from 31 GB to 2 GB, achieving a 90% reduction compared to an embedding-based index.
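
To make the idea of a binary token index concrete, here is a minimal sketch of a presence-only inverted index. It is not SVDR itself: the whitespace tokenizer, the function names, and the plain overlap-count scoring are all assumptions made for illustration, whereas the abstract describes a semi-parametric framework whose scoring is not specified here.

```python
from collections import defaultdict


def tokenize(text):
    """Hypothetical tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def build_binary_index(docs):
    """Map each token to the set of document ids containing it (presence only, no counts)."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for token in set(tokenize(text)):
            index[token].add(doc_id)
    return index


def search(index, query, top_k=1):
    """Rank documents by how many distinct query tokens they contain."""
    scores = defaultdict(int)
    for token in set(tokenize(query)):
        for doc_id in index.get(token, ()):
            scores[doc_id] += 1
    # Highest overlap first; ties broken by lower document id.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in ranked[:top_k]]
```

Building such an index needs only tokenization, with no model forward pass, which is consistent with the abstract's point that the binary token index is quick and cheap to set up.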

Comparative Analysis of Retrieval Systems in the Real World

  • Authors: Dmytro Mozolevskyi, Waseem AlShikh
  • Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2405.02048
  • Pdf link: https://arxiv.org/pdf/2405.02048
  • Abstract
    This research paper presents a comprehensive analysis of integrating advanced language models with search and retrieval systems in the fields of information retrieval and natural language processing. The objective is to evaluate and compare various state-of-the-art methods based on their performance in terms of accuracy and efficiency. The analysis explores different combinations of technologies, including Azure Cognitive Search Retriever with GPT-4, Pinecone's Canopy framework, Langchain with Pinecone and different language models (OpenAI, Cohere), LlamaIndex with Weaviate Vector Store's hybrid search, Google's RAG implementation on Cloud VertexAI-Search, Amazon SageMaker's RAG, and a novel approach called KG-FID Retrieval. The motivation for this analysis arises from the increasing demand for robust and responsive question-answering systems in various domains. The RobustQA metric is used to evaluate the performance of these systems under diverse paraphrasing of questions. The report aims to provide insights into the strengths and weaknesses of each method, facilitating informed decisions in the deployment and development of AI-driven search and retrieval systems.

REASONS: A benchmark for REtrieval and Automated citationS Of scieNtific Sentences using Public and Proprietary LLMs

  • Authors: Deepa Tilwani, Yash Saxena, Ali Mohammadi, Edward Raff, Amit Sheth, Srinivasan Parthasarathy, Manas Gaur
  • Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
  • Arxiv link: https://arxiv.org/abs/2405.02228
  • Pdf link: https://arxiv.org/pdf/2405.02228
  • Abstract
    Automatic citation generation for sentences in a document or report is paramount for intelligence analysts, cybersecurity, news agencies, and education personnel. In this research, we investigate whether large language models (LLMs) are capable of generating references based on two forms of sentence queries: (a) Direct Queries, LLMs are asked to provide author names of the given research article, and (b) Indirect Queries, LLMs are asked to provide the title of a mentioned article when given a sentence from a different article. To demonstrate where LLM stands in this task, we introduce a large dataset called REASONS comprising abstracts of the 12 most popular domains of scientific research on arXiv. From around 20K research articles, we make the following deductions on public and proprietary LLMs: (a) State-of-the-art, often called anthropomorphic GPT-4 and GPT-3.5, suffers from high pass percentage (PP) to minimize the hallucination rate (HR). When tested with Perplexity.ai (7B), they unexpectedly made more errors; (b) Augmenting relevant metadata lowered the PP and gave the lowest HR; (c) Advance retrieval-augmented generation (RAG) using Mistral demonstrates consistent and robust citation support on indirect queries and matched performance to GPT-3.5 and GPT-4. The HR across all domains and models decreased by an average of 41.93% and the PP was reduced to 0% in most cases. In terms of generation quality, the average F1 Score and BLEU were 68.09% and 57.51%, respectively; (d) Testing with adversarial samples showed that LLMs, including the Advance RAG Mistral, struggle to understand context, but the extent of this issue was small in Mistral and GPT-4-Preview. Our study contributes valuable insights into the reliability of RAG for automated citation generation tasks.

Keyword: video retrieval

There is no result

Keyword: mobile

Digital Twin-Empowered Task Assignment in Aerial MEC Network: A Resource Coalition Cooperation Approach with Generative Model

  • Authors: Xin Tang, Qian Chen, Rong Yu, Xiaohuan Li
  • Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2405.01555
  • Pdf link: https://arxiv.org/pdf/2405.01555
  • Abstract
    To meet the demands for ubiquitous communication and temporary edge computing in 6G networks, aerial mobile edge computing (MEC) networks have been envisioned as a new paradigm. However, dynamic user requests pose challenges for task assignment strategies. Most of the existing research assumes that the strategy is deployed on ground-based stations or UAVs, which will be ineffective in an environment lacking infrastructure and continuous energy supply. Moreover, the resource mutual exclusion problem of dynamic task assignment has not been effectively solved. Toward this end, we introduce the digital twin (DT) into the aerial MEC network to study the resource coalition cooperation approach with the generative model (GM), which provides a preliminary coalition structure for the coalition game. Specifically, we propose a novel network framework that is composed of an application plane, a physical plane, and a virtual plane. After that, the task assignment problem is simplified to convex optimization programming with linear constraints. And then, we also propose a resource coalition cooperation approach that is based on a transferable utility (TU) coalition game to obtain an approximate optimal solution. Numerical results confirm the effectiveness of our proposed approach in terms of energy consumption and utilization of resources.

Rapid Mobile App Development for Generative AI Agents on MIT App Inventor

  • Authors: Jaida Gao, Calab Su, Etai Miller, Kevin Lu, Yu Meng
  • Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
  • Arxiv link: https://arxiv.org/abs/2405.01561
  • Pdf link: https://arxiv.org/pdf/2405.01561
  • Abstract
    The evolution of Artificial Intelligence (AI) stands as a pivotal force shaping our society, finding applications across diverse domains such as education, sustainability, and safety. Leveraging AI within mobile applications makes it easily accessible to the public, catalyzing its transformative potential. In this paper, we present a methodology for the rapid development of AI agent applications using the development platform provided by MIT App Inventor. To demonstrate its efficacy, we share the development journey of three distinct mobile applications: SynchroNet for fostering sustainable communities; ProductiviTeams for addressing procrastination; and iHELP for enhancing community safety. All three applications seamlessly integrate a spectrum of generative AI features, leveraging OpenAI APIs. Furthermore, we offer insights gleaned from overcoming challenges in integrating diverse tools and AI functionalities, aiming to inspire young developers to join our efforts in building practical AI agent applications.

Convert any android device into a programmable IoT device with the help of IoT Everywhere Framework

  • Authors: Vishnu Joshi
  • Subjects: Software Engineering (cs.SE)
  • Arxiv link: https://arxiv.org/abs/2405.01568
  • Pdf link: https://arxiv.org/pdf/2405.01568
  • Abstract
    The world around us is transforming as the field of the Internet of Things is taking over the world faster than we thought. Everyone in the tech industry is building wonderful things with the help of IoT. Smartwatches, smart coffee machines, smart televisions, and smart homes are some examples. Building IoT sensor modules with sensors that connect to the internet can be very intimidating for people who have just stepped into the field. Quality components and microcontrollers can be costly too. Such components include a proximity sensor, humidity sensor, air pressure sensor, accelerometer, gyroscope, flashlight, microphone, speaker, GSM module, Wi-Fi module, Bluetooth modules, and many more. But to program these, we need to know Java or Kotlin and mobile application development. With the use of the IoT Everywhere framework and Origin programming language, one can convert any Android smartphone into an IoT device. This helps students of electrical engineering grasp the idea of programming. Since it provides a lot of abstraction through simple function calls, it can help introduce programming to school students, it helps students who are fascinated by IoT and want to learn the basics of interfacing components or sensors, and it helps students who have no access to an actual personal computer learn to program.

Q-learning-based Opportunistic Communication for Real-time Mobile Air Quality Monitoring Systems

  • Authors: Trung Thanh Nguyen, Truong Thao Nguyen, Dinh Tuan Anh Nguyen, Thanh Hung Nguyen, Phi Le Nguyen
  • Subjects: Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2405.01609
  • Pdf link: https://arxiv.org/pdf/2405.01609
  • Abstract
    We focus on real-time air quality monitoring systems that rely on devices installed on automobiles in this research. We investigate an opportunistic communication model in which devices can send the measured data directly to the air quality server through a 4G communication channel or via Wi-Fi to adjacent devices or the so-called Road Side Units deployed along the road. We aim to reduce 4G costs while assuring data latency, where the data latency is defined as the amount of time it takes for data to reach the server. We propose an offloading scheme that leverages Q-learning to accomplish the purpose. The experiment results show that our offloading method significantly cuts down around 40-50% of the 4G communication cost while keeping the latency of 99.5% packets smaller than the required threshold.
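
The abstract does not give the state, action, or reward design of the offloading scheme, so the following is only a generic tabular Q-learning sketch. The action names `send_4g` and `relay_wifi`, the string state labels, the reward values, and the hyperparameters are all hypothetical placeholders, not taken from the paper.

```python
ACTIONS = ("send_4g", "relay_wifi")  # hypothetical action names


def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new Q-value.

    Q(s, a) <- Q(s, a) + alpha * (reward + gamma * max_a' Q(s', a') - Q(s, a))
    Unseen (state, action) pairs are treated as having a Q-value of 0.
    """
    best_next = max(q_table.get((next_state, a), 0.0) for a in ACTIONS)
    old = q_table.get((state, action), 0.0)
    new = old + alpha * (reward + gamma * best_next - old)
    q_table[(state, action)] = new
    return new


def greedy_action(q_table, state):
    """Pick the action with the highest current Q-value (first action wins ties)."""
    return max(ACTIONS, key=lambda a: q_table.get((state, a), 0.0))
```

In a setting like the one described, a reward could penalize 4G usage and late packets, so that the learned policy prefers Wi-Fi relaying whenever the latency threshold can still be met; that reward shape is an assumption here.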

New design of smooth PSO-IPF navigator with kinematic constraints

  • Authors: Mahsa Mohaghegh, Hedieh Jafarpourdavatgar, Samaneh Alsadat Saeedinia
  • Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2405.01794
  • Pdf link: https://arxiv.org/pdf/2405.01794
  • Abstract
    Robotic applications across industries demand advanced navigation for safe and smooth movement. Smooth path planning is crucial for mobile robots to ensure stable and efficient navigation, as it minimizes jerky movements and enhances overall performance. Achieving this requires smooth collision-free paths. Partial Swarm Optimization (PSO) and Potential Field (PF) are notable path-planning techniques; however, they may struggle to produce smooth paths due to their inherent algorithms, potentially leading to suboptimal robot motion and increased energy consumption. In addition, while PSO efficiently explores solution spaces, it generates long paths and has limited global search. In contrast, PF methods offer concise paths but struggle with distant targets or obstacles. To address this, we propose Smoothed Partial Swarm Optimization with Improved Potential Field (SPSO-IPF), which combines both approaches and is capable of generating a smooth and safe path. Our research demonstrates SPSO-IPF's superiority, proving its effectiveness in static and dynamic environments compared to a mere PSO or a mere PF approach.
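
For readers unfamiliar with potential-field planning, the classic attractive/repulsive force computation can be sketched as follows. This is the textbook formulation, not the improved potential field proposed in the paper; the gains `k_att` and `k_rep` and the obstacle influence radius are assumed values.

```python
import math


def potential_field_force(pos, goal, obstacles, k_att=1.0, k_rep=1.0, influence=2.0):
    """Return the 2D force (fx, fy) acting on a point robot at `pos`.

    Attractive term: pulls linearly toward the goal.
    Repulsive term: pushes away from each obstacle closer than `influence`.
    """
    fx = k_att * (goal[0] - pos[0])
    fy = k_att * (goal[1] - pos[1])
    for ox, oy in obstacles:
        dx, dy = pos[0] - ox, pos[1] - oy
        dist = math.hypot(dx, dy)
        # Obstacles outside the influence radius contribute nothing.
        if 0.0 < dist <= influence:
            magnitude = k_rep * (1.0 / dist - 1.0 / influence) / dist**2
            fx += magnitude * dx / dist
            fy += magnitude * dy / dist
    return fx, fy


# Goal straight ahead on the x-axis, one obstacle between robot and goal.
force = potential_field_force((0.0, 0.0), (3.0, 0.0), [(1.0, 0.0)])
```

Following this force by plain gradient descent is what tends to produce the local minima and non-smooth paths that motivate hybrid approaches.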

Deep Learning Inference on Heterogeneous Mobile Processors: Potentials and Pitfalls

  • Authors: Sicong Liu, Wentao Zhou, Zimu Zhou, Bin Guo, Minfan Wang, Cheng Fang, Zheng Lin, Zhiwen Yu
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2405.01851
  • Pdf link: https://arxiv.org/pdf/2405.01851
  • Abstract
    There is a growing demand to deploy computation-intensive deep learning (DL) models on resource-constrained mobile devices for real-time intelligent applications. Equipped with a variety of processing units such as CPUs, GPUs, and NPUs, the mobile devices hold potential to accelerate DL inference via parallel execution across heterogeneous processors. Various efficient parallel methods have been explored to optimize computation distribution, achieve load balance, and minimize communication cost across processors. Yet their practical effectiveness in the dynamic and diverse real-world mobile environment is less explored. This paper presents a holistic empirical study to assess the capabilities and challenges associated with parallel DL inference on heterogeneous mobile processors. Through carefully designed experiments covering various DL models, mobile software/hardware environments, workload patterns, and resource availability, we identify limitations of existing techniques and highlight opportunities for cross-level optimization.

Reinforcement Learning control strategies for Electric Vehicles and Renewable energy sources Virtual Power Plants

  • Authors: Francesco Maldonato, Izgh Hadachi
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2405.01889
  • Pdf link: https://arxiv.org/pdf/2405.01889
  • Abstract
    The increasing demand for direct electric energy in the grid is also tied to the increase of Electric Vehicle (EV) usage in cities, which will eventually replace combustion engine vehicles entirely. Nevertheless, this large amount of required energy, which is stored in the EV batteries, is not always used and can constitute a virtual power plant on its own. Bidirectional EVs equipped with batteries connected to the grid can therefore charge or discharge energy depending on public needs, producing a smart shift of energy where and when needed. EVs employed as mobile storage devices can add resilience and supply/demand balance benefits to specific loads, in many cases as part of a Microgrid (MG). Depending on the direction of the energy transfer, EVs can provide backup power to households through vehicle-to-house (V2H) charging, or store unused renewable power through renewable-to-vehicle (RE2V) charging. V2H and RE2V solutions can complement renewable power sources like solar photovoltaic (PV) panels and wind turbines (WT), which fluctuate over time, increasing self-consumption and autarky. The concept of distributed energy resources (DERs) is becoming more and more present and requires new solutions for the integration of multiple complementary resources with variable supply over time. The development of these ideas is coupled with the growth of new AI techniques that will potentially be the managing core of such systems. Machine learning techniques can model the energy grid environment in such a flexible way that constant optimization is possible. This working principle introduces the wider concept of an interconnected, shared, decentralized grid of energy. This research on Reinforcement Learning control strategies for Electric Vehicles and Renewable energy sources Virtual Power Plants focuses on providing solutions for such energy supply optimization models.

Optimizing Robot Dispersion on Grids: with and without Fault Tolerance

  • Authors: Rik Banerjee, Manish Kumar, Anisur Rahaman Molla
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
  • Arxiv link: https://arxiv.org/abs/2405.02002
  • Pdf link: https://arxiv.org/pdf/2405.02002
  • Abstract
    The introduction and study of dispersing mobile robots across the nodes of an anonymous graph have recently gained traction and have been explored within various graph classes and settings. While an optimal dispersion solution was established for oriented grids [Kshemkalyani et al., WALCOM 2020], a significant unresolved question pertains to whether achieving optimal dispersion is feasible on an unoriented grid. This paper investigates the dispersion problem on unoriented grids, considering both non-faulty and faulty robots. The challenge posed by unoriented grids lies in the absence of a clear sense of direction for a single robot moving between nodes, as opposed to the straightforward navigation of oriented grids. We present three deterministic algorithms tailored to our robot model. The first and second algorithms deal with the dispersion of faulty and non-faulty robots, ensuring both time and memory optimization in oriented and unoriented grids, respectively. Faulty robots are prone to crashing at any time, causing permanent failure. In both settings, we achieve dispersion in $O(\sqrt{n})$ rounds while requiring $O(\log n)$ bits of memory per robot. The third algorithm tackles faulty robots prone to crash faults in an unoriented grid. In this scenario, our algorithm operates within $O(\sqrt{n} \log n)$ time and uses $O(\sqrt{n} \log n)$ bits of memory per robot. The robots need to know the value of $n$ for termination.

Accurate Pose Prediction on Signed Distance Fields for Mobile Ground Robots in Rough Terrain

  • Authors: Martin Oehler, Oskar von Stryk
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2405.02121
  • Pdf link: https://arxiv.org/pdf/2405.02121
  • Abstract
    Autonomous locomotion for mobile ground robots in unstructured environments such as waypoint navigation or flipper control requires a sufficiently accurate prediction of the robot-terrain interaction. Heuristics like occupancy grids or traversability maps are widely used but limit actions available to robots with active flippers as joint positions are not taken into account. We present a novel iterative geometric method to predict the 3D pose of mobile ground robots with active flippers on uneven ground with high accuracy and online planning capabilities. This is achieved by utilizing the ability of signed distance fields to represent surfaces with sub-voxel accuracy. The effectiveness of the presented approach is demonstrated on two different tracked robots in simulation and on a real platform. Compared to a tracking system as ground truth, our method predicts the robot position and orientation with an average accuracy of 3.11 cm and 3.91°, outperforming a recent heightmap-based approach. The implementation is made available as an open-source ROS package.
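
The sub-voxel accuracy of signed distance fields comes from interpolating the stored distances between voxel centres. A minimal sketch of trilinear interpolation on a voxel grid is shown below; the nested-list grid layout and the function name `sdf_trilinear` are assumptions for illustration and are not taken from the authors' ROS package.

```python
import math


def sdf_trilinear(grid, x, y, z):
    """Trilinearly interpolate grid[i][j][k] at continuous voxel coordinates."""

    def base_index(coord, size):
        # Lower corner of the enclosing cell, clamped so index + 1 stays valid.
        return min(max(int(math.floor(coord)), 0), size - 2)

    i0 = base_index(x, len(grid))
    j0 = base_index(y, len(grid[0]))
    k0 = base_index(z, len(grid[0][0]))
    tx, ty, tz = x - i0, y - j0, z - k0

    value = 0.0
    for di in (0, 1):
        for dj in (0, 1):
            for dk in (0, 1):
                weight = (
                    (tx if di else 1.0 - tx)
                    * (ty if dj else 1.0 - ty)
                    * (tz if dk else 1.0 - tz)
                )
                value += weight * grid[i0 + di][j0 + dj][k0 + dk]
    return value


# 2x2x2 grid storing the signed distance to the plane z = 0.5 (voxel units).
grid = [[[k - 0.5 for k in range(2)] for _ in range(2)] for _ in range(2)]
distance = sdf_trilinear(grid, 0.3, 0.7, 0.75)
```

The zero crossing of the interpolated field locates the surface between voxel centres, which is what allows contact points to be resolved more finely than the grid resolution.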

Keyword: smartphone

Convert any android device into a programmable IoT device with the help of IoT Everywhere Framework

  • Authors: Vishnu Joshi
  • Subjects: Software Engineering (cs.SE)
  • Arxiv link: https://arxiv.org/abs/2405.01568
  • Pdf link: https://arxiv.org/pdf/2405.01568
  • Abstract
    The world around us is transforming as the field of the Internet of Things is taking over the world faster than we thought. Everyone in the tech industry is building wonderful things with the help of IoT. Smartwatches, smart coffee machines, smart televisions, and smart homes are some examples. Building IoT sensor modules with sensors that connect to the internet can be very intimidating for people who have just stepped into the field. Quality components and microcontrollers can be costly too. Components such as proximity sensor, humidity sensor, air pressure sensor, accelerometer, gyroscope, flashlight, microphone, speaker, GSM module, Wi-Fi module, Bluetooth module, and many more. But to program these we need to know Java or Kotlin and mobile application development. With the use of the IoT Everywhere framework and the Origin programming language, one can convert any Android smartphone into an IoT device. This helps students of electrical engineering grasp the idea of programming, since it provides a lot of abstraction through simple function calls. It can help introduce programming to school students, it helps students who are fascinated by IoT and want to learn the basics of interfacing components or sensors, and it helps students who have no access to a personal computer learn to program.

Self-Supervised Learning for Real-World Super-Resolution from Dual and Multiple Zoomed Observations

  • Authors: Zhilu Zhang, Ruohao Wang, Hongzhi Zhang, Wangmeng Zuo
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2405.02171
  • Pdf link: https://arxiv.org/pdf/2405.02171
  • Abstract
    In this paper, we consider two challenging issues in reference-based super-resolution (RefSR) for smartphones: (i) how to choose a proper reference image, and (ii) how to learn RefSR in a self-supervised manner. In particular, we propose a novel self-supervised learning approach for real-world RefSR from observations at dual and multiple camera zooms. Firstly, considering the popularity of multiple cameras in modern smartphones, the more zoomed (telephoto) image can be naturally leveraged as the reference to guide the super-resolution (SR) of the lesser zoomed (ultra-wide) image, which gives us a chance to learn a deep network that performs SR from the dual zoomed observations (DZSR). Secondly, for self-supervised learning of DZSR, we take the telephoto image instead of an additional high-resolution image as the supervision information, and select a center patch from it as the reference to super-resolve the corresponding ultra-wide image patch. To mitigate the effect of the misalignment between the ultra-wide low-resolution (LR) patch and the telephoto ground-truth (GT) image during training, we first adopt patch-based optical flow alignment and then design an auxiliary-LR to guide the deforming of the warped LR features. To generate visually pleasing results, we present a local overlapped sliced Wasserstein loss to better represent the perceptual difference between GT and output in the feature space. During testing, DZSR can be directly deployed to super-resolve the whole ultra-wide image with the reference of the telephoto image. In addition, we further take multiple zoomed observations to explore self-supervised RefSR, and present a progressive fusion scheme for the effective utilization of reference images. Experiments show that our methods achieve better quantitative and qualitative performance than state-of-the-art methods. Codes are available at https://github.com/cszhilu1998/SelfDZSR_PlusPlus.

Keyword: medical volume data

There is no result

@Yukeaaa Yukeaaa self-assigned this May 7, 2024