Selected Publications


2024

  1. Affordance-based Robot Manipulation with Flow Matching
    Fan Zhang and Michael Gienger.

    arXiv, 2024

    We present a framework for assistive robot manipulation that focuses on two fundamental challenges: first, efficiently adapting large-scale models to downstream scene affordance understanding tasks, especially in daily living scenarios where gathering multi-task data involving humans requires strenuous effort; second, effectively learning robot trajectories by grounding the visual affordance model. We tackle the first challenge with a parameter-efficient prompt tuning method that prepends learnable text prompts to the frozen vision model to predict manipulation affordances in multi-task scenarios. We then propose to learn robot trajectories guided by affordances with supervised flow matching, which represents a robot visuomotor policy as a conditional process that flows random waypoints to desired robot trajectories. Finally, we introduce a real-world dataset with 10 tasks across Activities of Daily Living to test our framework. Our extensive evaluation highlights that the proposed prompt tuning method for learning manipulation affordances with language prompts achieves competitive performance, and even outperforms other finetuning protocols across data scales, while remaining parameter-efficient. Learning multi-task robot trajectories with a single flow matching policy also leads to consistently better performance than alternative behavior cloning methods, especially given multimodal robot action distributions. Our framework seamlessly unifies affordance model learning and trajectory generation with flow matching for robot manipulation.
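    The flow matching idea in this abstract can be illustrated with a toy numerical sketch. This is not the paper's implementation: the analytic conditional velocity field stands in for a trained visuomotor network, and all variable names are illustrative.

    ```python
    import numpy as np

    def fm_training_pair(x0, x1, t):
        """Linear interpolation path x_t and the velocity target that a
        flow matching model is regressed onto (rectified-flow-style path)."""
        xt = (1.0 - t) * x0 + t * x1
        v_target = x1 - x0
        return xt, v_target

    def euler_integrate(v_fn, x0, steps=10):
        """Flow random waypoints x0 toward a trajectory by Euler integration."""
        x, dt = x0.copy(), 1.0 / steps
        for k in range(steps):
            x = x + dt * v_fn(x, k * dt)
        return x

    rng = np.random.default_rng(0)
    x0 = rng.normal(size=(5, 2))                               # random waypoints
    x1 = np.linspace(0.0, 1.0, 5)[:, None].repeat(2, axis=1)   # desired trajectory

    # Analytic conditional velocity, standing in for a trained network.
    v_fn = lambda x, t: (x1 - x) / (1.0 - t)
    traj = euler_integrate(v_fn, x0)
    assert np.allclose(traj, x1)   # the flow transports waypoints onto the trajectory
    ```

    Integrating the (here analytic) velocity field deterministically carries arbitrary start waypoints onto the target trajectory, which is the conditional process the abstract describes.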

2023

  1. Visual-Tactile Learning of Garment Unfolding for Robot-Assisted Dressing
    Fan Zhang and Yiannis Demiris.

    IEEE Robotics and Automation Letters (RA-L), 2023

    Assistive robots have the potential to support disabled and elderly people in daily dressing activities. An intermediate stage of dressing is to manipulate the garment from a crumpled initial state to an unfolded configuration that facilitates robust dressing. Applying quasi-static grasping actions with vision feedback on garment unfolding usually suffers from occluded grasping points. In this work, we propose a dynamic manipulation strategy: tracing the garment edge until the hidden corner is revealed. We introduce a model-based approach, where a deep visual-tactile predictive model iteratively learns to perform servoing from raw sensor data. The predictive model is formalized as a Conditional Variational Autoencoder with contrastive optimization, which jointly learns underlying visual-tactile latent representations, a latent garment dynamics model, and future predictions of garment states. Two cost functions are explored: the visual cost, defined by garment corner positions, guarantees that the gripper moves towards the corner, while the tactile cost, defined by garment edge poses, prevents the garment from falling out of the gripper. The experimental results demonstrate the improvement of our contrastive visual-tactile model predictive control over single-modality sensing and baseline model learning techniques. The proposed method enables a robot to unfold back-opening hospital gowns and perform upper-body dressing.
  2. Contrastive Self-Supervised Learning for Automated Multi-Modal Dance Performance Assessment
    Yun Zhong, Fan Zhang, and Yiannis Demiris.

    ICASSP, 2023

    A fundamental challenge of analyzing human motion is to effectively represent human movements both spatially and temporally. We propose a contrastive self-supervised strategy to tackle this challenge. In particular, we focus on dancing, which involves a high level of physical and intellectual abilities. Firstly, we deploy Graph and Residual Neural Networks with a Siamese architecture to represent the dance motion and music features, respectively. Secondly, we apply the InfoNCE loss to contrastively embed the high-dimensional multimedia signals onto the latent space without label supervision. Finally, our proposed framework is evaluated on a multi-modal Dance-Music-Level dataset composed of various dance motions, music, genres, and choreographies with dancers of different expertise levels. Experimental results demonstrate the robustness and improvements of our proposed method over 3 baselines and 6 ablation studies across tasks of dance genre classification, choreography classification, and dancer expertise level assessment.
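    The InfoNCE objective used here can be sketched numerically. This is a generic illustration, not the paper's training code: the orthogonal identity rows below are stand-ins for learned (motion, music) embedding pairs, and the temperature value is an assumption.

    ```python
    import numpy as np

    def info_nce(z_a, z_b, temperature=0.1):
        """InfoNCE loss for a batch of paired embeddings (e.g. motion, music).
        Matched pairs sit on the diagonal of the similarity matrix; every
        other entry in a row acts as a negative."""
        z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
        z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
        logits = (z_a @ z_b.T) / temperature            # scaled cosine similarities
        logits -= logits.max(axis=1, keepdims=True)     # numerical stability
        log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    # Orthogonal stand-ins for four (motion, music) embedding pairs.
    z = np.eye(4)
    aligned = info_nce(z, z)                        # matched pairs: near-zero loss
    shuffled = info_nce(z, np.roll(z, 1, axis=0))   # mismatched pairs: large loss
    assert aligned < shuffled
    ```

    Minimizing this loss pulls each motion embedding toward its own music clip and pushes it away from the other clips in the batch, which is what aligns the two modalities without labels.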

2022

  1. Learning Garment Manipulation Policies towards Robot-Assisted Dressing
    Fan Zhang and Yiannis Demiris.

    Science Robotics, 2022

    Assistive robots have the potential to support people with disabilities in a variety of activities of daily living such as dressing. People who have completely lost their upper limb movement functionality may benefit from robot-assisted dressing, which involves complex deformable garment manipulation. Here we report a dressing pipeline intended for these people, and experimentally validate it on a medical training manikin. The pipeline comprises the robot grasping a hospital gown hung on a rail, fully unfolding the gown, navigating around a bed, and lifting up the user’s arms in sequence to finally dress the user. To automate this pipeline, we address two fundamental challenges: first, learning manipulation policies to bring the garment from an uncertain state into a configuration that facilitates robust dressing; second, transferring the deformable object manipulation policies learned in simulation to the real world to leverage cost-effective data generation. We tackle the first challenge by proposing an active pre-grasp manipulation approach that learns to isolate the garment grasping area prior to grasping. The approach combines prehensile and non-prehensile actions, and thus alleviates grasping-only behavioral uncertainties. For the second challenge, we bridge the sim-to-real gap of deformable object policy transfer by approximating the simulator to real-world garment physics. A contrastive neural network is introduced to compare pairs of real and simulated garment observations, measure their physical similarity, and account for simulator parameter inaccuracies. The proposed method enables a dual-arm robot to put back-opening hospital gowns onto a medical manikin with a success rate of over 90%.

2020

  1. Learning Grasping Points for Garment Manipulation in Robot-Assisted Dressing
    Fan Zhang and Yiannis Demiris.

    IEEE International Conference on Robotics and Automation (ICRA), 2020

    Assistive robots have the potential to provide tremendous support for disabled and elderly people in their daily dressing activities. Recent studies on robot-assisted dressing usually simplify the setup of the initial robot configuration by manually attaching the garments to the robot end-effector and positioning them close to the user's arm. A fundamental challenge in automating such a process for robots is computing suitable grasping points on garments that facilitate robotic manipulation. In this paper, we address this problem by introducing a supervised deep neural network to locate a pre-defined grasping point on the garment, using depth images for their invariance to color and texture. To reduce the amount of real data required, which is costly to collect, we leverage the power of simulation to produce large amounts of labeled data. The network is jointly trained with synthetic datasets of depth images and a limited amount of real data. We introduce a robot-assisted dressing system that combines the grasping point prediction method with a grasping and manipulation strategy that takes grasping orientation computation and robot-garment collision avoidance into account. The experimental results demonstrate that our method is capable of yielding accurate grasping point estimations. The proposed dressing system enables the Baxter robot to autonomously grasp a hospital gown hung on a rail, bring it close to the user, and successfully dress the upper body.

2019

  1. Probabilistic Real-Time User Posture Tracking for Personalized Robot-Assisted Dressing
    Fan Zhang, Antoine Cully, and Yiannis Demiris.

    IEEE Transactions on Robotics (T-RO), 2019

    Robotic solutions to dressing assistance have the potential to provide tremendous support for elderly and disabled people. However, unexpected user movements may lead to dressing failures or even pose a risk to the user. Tracking such user movements with vision sensors is challenging due to severe visual occlusions created by the robot and clothes. We propose a probabilistic tracking method using Bayesian networks in latent spaces, which fuses robot end-effector positions and force information to enable camera-less and real-time estimation of the user postures during dressing. The latent spaces are created before dressing by modeling the user movements with a Gaussian Process Latent Variable Model, taking the user's movement limitations into account. We introduce a robot-assisted dressing system that combines our tracking method with hierarchical multi-task control to minimize the force between the user and the robot. The experimental results demonstrate the robustness and accuracy of our tracking method. The proposed method enables the Baxter robot to provide personalized dressing assistance in putting on a sleeveless jacket for users with (simulated) upper-body impairments.

2017

  1. Personalized Robot-Assisted Dressing using User Modeling in Latent Spaces
    Fan Zhang, Antoine Cully, and Yiannis Demiris.

    IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2017

    Robots have the potential to provide tremendous support to disabled and elderly people in their everyday tasks, such as dressing. Many recent studies on robotic dressing assistance view dressing as a trajectory planning problem. However, the user movements during the dressing process are rarely taken into account, which often leads to failures of the planned trajectory and may put the user at risk. The main difficulty of taking user movements into account is caused by severe occlusions created by the robot, the user, and the clothes during the dressing process, which prevent vision sensors from accurately detecting the postures of the user in real time. In this paper, we address this problem by introducing an approach that allows the robot to automatically adapt its motion according to the force applied on the robot's gripper caused by user movements. There are two main contributions introduced in this paper: 1) the use of a hierarchical multi-task control strategy to automatically adapt the robot motion and minimize the force applied between the user and the robot caused by user movements; 2) the online update of the dressing trajectory based on the user movement limitations modeled with the Gaussian Process Latent Variable Model in a latent space, and the density information extracted from such latent space. The combination of these two contributions leads to a personalized dressing assistance that can cope with unpredicted user movements during the dressing while constantly minimizing the force that the robot may apply on the user. The experimental results demonstrate that the proposed method allows the Baxter humanoid robot to provide personalized dressing assistance for human users with simulated upper-body impairments.
  2. Preoperative Planning for the Multi-Arm Surgical Robot using PSO-GP-based Performance Optimization
    Fan Zhang, Zhiyuan Yan, and Zhijiang Du.

    IEEE International Conference on Robotics and Automation (ICRA), 2017

    For robotically-assisted minimally invasive surgery, preoperative planning is essential for assisting surgeons in preparing the intervention and deciding the best access to the surgical site. Many recent studies in preoperative planning have focused on the pose selection of the robot and the port placement. However, as such techniques cannot evaluate the performance of multi-arm cooperation, their applications are constrained in real practice with multi-arm surgical robots. In this paper, the surgical workspace is divided and the subspaces are assigned different weights to reflect the internal differences within the surgical workspace. We propose three metrics to evaluate the performance of the multi-arm surgical robot: the Global Isotropy Index (GII) to measure the dexterity of a single robot arm; the Cooperation Capability Index (CCI) to reflect the performance of the multi-arm cooperation; and the Minimum Distance Index (MDI) to describe the collision avoidance of the robotic arms. We also propose a combination of Particle Swarm Optimization (PSO) and Gaussian Process (GP) to locate the port placement and robot positioning. The proposed integrated PSO-GP-based optimization strategy is implemented on a three-arm surgical robot. Two sets of experiments are carried out to validate our method. The results demonstrate that the performance optimization strategy based on PSO-GP is capable of guiding surgeons to plan an intervention with the multi-arm surgical robot.
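    The PSO component of this pipeline can be sketched in a few lines. This is a generic minimal particle swarm optimizer, not the paper's PSO-GP implementation: the quadratic `cost` below is a hypothetical stand-in for the weighted GII/CCI/MDI objective, and no GP surrogate is included.

    ```python
    import numpy as np

    def pso(f, lo, hi, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
        """Minimal particle swarm optimizer minimizing f over the box [lo, hi]."""
        rng = np.random.default_rng(seed)
        x = rng.uniform(lo, hi, size=(n_particles, lo.size))
        v = np.zeros_like(x)
        pbest, pbest_f = x.copy(), np.array([f(p) for p in x])
        gbest = pbest[pbest_f.argmin()].copy()
        for _ in range(iters):
            r1, r2 = rng.random((2, *x.shape))
            # Pull each particle toward its personal best and the swarm best.
            v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
            x = np.clip(x + v, lo, hi)
            fx = np.array([f(p) for p in x])
            better = fx < pbest_f
            pbest[better], pbest_f[better] = x[better], fx[better]
            gbest = pbest[pbest_f.argmin()].copy()
        return gbest, pbest_f.min()

    # Hypothetical smooth placement cost with its optimum at (0.3, -0.2).
    cost = lambda p: float(np.sum((p - np.array([0.3, -0.2])) ** 2))
    best, best_f = pso(cost, np.array([-1.0, -1.0]), np.array([1.0, 1.0]))
    assert best_f < 1e-3
    ```

    In the paper's setting the decision variables would be the port placement and robot base pose, and the objective would combine the three proposed indices rather than this toy quadratic.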

2016

  1. Preoperative Setup Planning for Robotic Surgery based on a Simulation Platform and Gaussian Process
    Fan Zhang, Zhiyuan Yan, and Zhijiang Du.

    IEEE International Conference on Mechatronics and Automation (ICMA), 2016
    Best Student Paper Award

    For robotically-assisted minimally invasive surgery, preoperative planning is essential for assisting surgeons in preparing the intervention and deciding the best access to the surgical site. Many recent studies in preoperative planning cannot evaluate the performance of multi-arm cooperation, and thus their applications are constrained in real practice with multi-arm surgical robots. In this paper, we establish a simulation platform for a three-arm surgical robot. We propose two objective functions, the Global Isotropy Index (GII) and the Cooperation Capability Index (CCI), to reflect the dexterity of a robot arm and the performance of the multi-arm cooperation, respectively. We also propose to use Gaussian Process Regression (GPR) to locate the optimal port placement and robot positioning. Simulation experiments are carried out to validate our method. The results demonstrate that our proposed performance optimization strategy based on GPR is capable of guiding surgeons to plan an intervention with the multi-arm surgical robot.

2015

  1. An Under-Actuated Manipulation Controller based on Workspace Analysis and Gaussian Processes
    Fan Zhang, Yanyu Su, Xiang Zhang, Wei Dong, and Zhijiang Du.

    IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2015

    Kinematic modelling has been applied to many controllers of under-actuated manipulators. Most of these studies assume that the control process is conducted within the workspace. However, as such a kinematic model cannot describe situations where stable grasping is violated in the real environment, these controllers may fail unexpectedly. In this paper, we propose a combination of kinematics-based Workspace Analysis (WA) and Gaussian Process Classification (GPC) to model the success rates of control actions in the theoretical workspace. We also use Gaussian Process Regression (GPR) to model the residual between the prediction of the WA and the ground truth data. We then incorporate this integrated model, Gaussian Processes enhanced Workspace Analysis (GP-WA), into an optimal controller. The optimal controller is implemented on a planar under-actuated gripper with two three-phalanx fingers. Two sets of simulation experiments are carried out to validate our method. The results demonstrate that the optimal manipulation controller based on GP-WA achieves high control accuracy for manipulating a wide range of objects.