TacEva: A Performance Evaluation Framework for Vision-Based Tactile Sensors

1Imperial-X Initiative, Imperial College London, 2Waseda University,
3King's College London, 4Queen Mary University of London

Equal Contribution, *Corresponding author
Research framework diagram

We propose a comprehensive framework for evaluating vision-based tactile sensors (VBTSs) that systematically compares their design properties and sensing performance. In this paper, we demonstrate the framework on four representative VBTSs (ViTacTip, MagicTac, GelSight, GelSightWM).

Abstract

Vision-based tactile sensors (VBTSs) are widely used in robotic tasks because of their high spatial resolution and relatively low manufacturing cost. However, variations in their sensing mechanisms, structural dimensions, and other parameters lead to significant performance disparities between the VBTSs currently in use. This makes it challenging to optimize VBTSs for specific tasks, as both the initial choice and subsequent fine-tuning are hindered by the lack of standardized metrics. To address this issue, we present TacEva, a comprehensive evaluation framework for the quantitative analysis of VBTS performance. We define a set of performance metrics that capture and quantify the key characteristics displayed in typical application scenarios. For each metric, we design an experimental pipeline that provides a structured procedure for performance quantification. We then apply this evaluation approach to multiple VBTSs with distinct sensing mechanisms. The results show that the proposed framework yields a thorough evaluation of each design and provides quantitative indicators for each performance dimension. This enables researchers to pre-select the most appropriate VBTS on a task-by-task basis, and also offers performance-guided insights for optimizing VBTS design.

Existing VBTS Work and Their Evaluation Methods

For a detailed explanation of the mechanisms, refer to this comprehensive survey. VBTSs are categorized according to their underlying sensing principles: (i) the Intensity Mapping Method (IMM), which infers contact geometry and pressure from spatial variations in reflected light intensity; (ii) the Marker Displacement Method (MDM), which detects surface deformation by tracking the displacement of embedded markers under force; and (iii) the Modality Fusion Method (MFM), which employs a transparent skin to enable multimodal perception.

Standard Performance

Calibration Process

VBTS calibration setup diagram

Definition. Two sequential steps with the sensor on a robot: (1) Surface geometry via first-contact mapping with a 10 mm spherical indenter; (2) Force/position mapping from synchronized images and 6‑axis F/T labels across randomized normal + shear stimuli.

Protocol. Probe the surface on a grid (≈0.1 mm steps) until contact (threshold ≈0.02 N), then indent to safe depths per device while adding small x–y displacements. Train a common ResNet‑18 baseline (70/20/10 split) to regress $(P_x, P_y, P_z, F_x, F_y, F_z)$. Report MAE, $R^2$, and sMAPE: $$\text{sMAPE} = \frac{1}{n} \sum_{i=1}^{n} \frac{|y_i - \hat{y}_i|}{\frac{|y_i| + |\hat{y}_i|}{2} + \epsilon} \times 100\%$$
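The three reported metrics can be computed per output channel as in the minimal sketch below. It follows the sMAPE formula above; the function name `regression_metrics` and the default value of `eps` are illustrative assumptions, not taken from the TacEva code.

```python
import numpy as np

def regression_metrics(y_true, y_pred, eps=1e-8):
    """Return (MAE, R^2, sMAPE in percent) for one output channel.

    eps is the small constant from the sMAPE denominator; its value
    here is an assumption made for illustration.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    abs_err = np.abs(y_true - y_pred)

    mae = abs_err.mean()

    # Coefficient of determination; undefined when y_true is constant.
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot

    # Symmetric MAPE: error relative to the mean magnitude of label and prediction.
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2.0 + eps
    smape = np.mean(abs_err / denom) * 100.0
    return mae, r2, smape

# Example with made-up labels and predictions for a single channel (e.g. F_z).
mae, r2, smape = regression_metrics([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
```

In practice the function would be called once for each of the six regressed channels on the held-out test split.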

Calibration error results

Analysis. ViTacTip minimizes absolute force errors; GelSight variants excel in Pz; marker‑free GelSightWM is strong in Fz/Pz but weaker in Fxy; MagicTac is competitive in Pxy yet noisier in Fz.

Spatial Resolution

Spatial resolution evaluation results showing accuracy vs tolerance for different VBTS

Definition. Ability to distinguish closely spaced features. We report accuracy as a function of tolerance $\epsilon$ using a grating‑classification task:

$$\text{SR}(\epsilon) = \frac{1}{N} \sum_{i=1}^{N} \mathbf{1}[|\hat{r}_i - r_i| \leq \epsilon]$$

Protocol. 3D‑printed dot/line gratings (≈0.05–2.0 mm). 100 presses per sample with randomized yaw. Train classifier; sweep ε.
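The tolerance sweep can be sketched as follows, assuming the classifier output has already been mapped back to a grating spacing in millimetres. The function name and the example values are hypothetical.

```python
import numpy as np

def spatial_resolution_accuracy(r_true, r_pred, tolerances):
    """SR(eps): fraction of presses whose predicted spacing is within eps of the label."""
    r_true = np.asarray(r_true, dtype=float)
    r_pred = np.asarray(r_pred, dtype=float)
    err = np.abs(r_pred - r_true)
    # One accuracy value per tolerance in the sweep.
    return np.array([(err <= eps).mean() for eps in tolerances])

# Made-up labels and predictions (mm), for illustration only.
r_true = [0.05, 0.5, 1.0, 2.0]
r_pred = [0.05, 0.6, 1.0, 1.5]
sr_curve = spatial_resolution_accuracy(r_true, r_pred, tolerances=[0.0, 0.2, 0.5])
```

Plotting `sr_curve` against the tolerances gives an accuracy-versus-tolerance curve of the kind shown in the figure.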

Analysis. Above ≈5 mm, all sensors are near‑perfect. At 0.05 mm, GelSight and GelSightWM reach ≈99%, MagicTac ≈98%, and ViTacTip ≈80%, reflecting differences in gel stiffness, geometry, and effective pixel density.

Dot and line grating samples, with spacing from 0 mm (flat) to 2 mm, are used to determine the minimum resolvable feature size. All four sensors were benchmarked on these samples.

Spatial Resolution Test Samples

Dot and line samples from 0.05 mm to 2 mm spacing; examples are shown below.

Dot 0.05 mm
Dot 2 mm
Line 0.05 mm
Line 2 mm

Sensitivity

Sensitivity analysis showing normal compliance and uniformity maps for different VBTS

Definition. Normal compliance: $S = \Delta z / F$ (mm/N). Uniformity (0–1): $U = 1 / (1 + \sigma/|\mu|)$ from binned sensitivity means.

Protocol. Reuse calibration data; bin by (x,y); compute mean S per bin to form maps; aggregate μ, σ for U.
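A minimal sketch of the binning and uniformity computation is given below. The grid size `n_bins`, the equal-width binning, and the use of the population standard deviation are assumptions; samples with near-zero force are assumed to have been filtered out beforehand.

```python
import numpy as np

def sensitivity_map(x, y, dz, force, n_bins=10):
    """Mean normal compliance S = dz / F (mm/N) in each (x, y) bin; NaN where empty."""
    x, y, dz, force = (np.asarray(a, dtype=float) for a in (x, y, dz, force))
    s = dz / force  # assumes force != 0 for every sample
    x_edges = np.linspace(x.min(), x.max(), n_bins + 1)
    y_edges = np.linspace(y.min(), y.max(), n_bins + 1)
    # digitize is 1-based; shift to 0-based and clip the maximum into the last bin.
    ix = np.clip(np.digitize(x, x_edges) - 1, 0, n_bins - 1)
    iy = np.clip(np.digitize(y, y_edges) - 1, 0, n_bins - 1)
    total = np.zeros((n_bins, n_bins))
    count = np.zeros((n_bins, n_bins))
    np.add.at(total, (ix, iy), s)
    np.add.at(count, (ix, iy), 1)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(count > 0, total / count, np.nan)

def uniformity(s_map):
    """U = 1 / (1 + sigma / |mu|) over the non-empty bins of a sensitivity map."""
    vals = s_map[~np.isnan(s_map)]
    mu, sigma = vals.mean(), vals.std()
    return 1.0 / (1.0 + sigma / abs(mu))

# Tiny made-up data set: four probes at the corners of a unit square.
s_map = sensitivity_map([0, 0, 1, 1], [0, 1, 0, 1], [1, 1, 1, 1], [2, 2, 2, 2], n_bins=2)
u = uniformity(s_map)
```

With identical compliance in every bin, σ is zero and U equals 1; larger spread across bins pushes U toward 0.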

Analysis. ViTacTip is most sensitive but less uniform (edge‑enhanced S); GelSight/MagicTac are stiffer with higher U.

Robustness

Spatial Robustness

Spatial robustness evaluation showing error stability across location and depth

Definition. Stability of error across location and depth. Compute MAE per radial bin and per depth bin; robustness (lower is better):

$$R_{\text{spatial},c} = \frac{1}{2} \left[ \text{STD}(\{m^{\text{dist}}_b\}) + \text{STD}(\{m^{\text{depth}}_d\}) \right]$$

Protocol. Collect a held‑out grid (≈1.6k points) with the same probing pattern; evaluate by bins over normalized radius/depth.
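The per-channel score above can be sketched as follows. The sketch assumes radius and depth are already normalized to [0, 1], uses equal-width bins, and takes STD as the population standard deviation; the bin count is an illustrative choice.

```python
import numpy as np

def binned_mae(abs_err, coord, n_bins):
    """MAE inside equal-width bins of a coordinate normalized to [0, 1]; empty bins are skipped."""
    idx = np.clip((np.asarray(coord, dtype=float) * n_bins).astype(int), 0, n_bins - 1)
    return np.array([abs_err[idx == b].mean() for b in range(n_bins) if np.any(idx == b)])

def spatial_robustness(y_true, y_pred, radius, depth, n_bins=5):
    """R_spatial for one channel: mean of the STD of radial-bin MAEs and depth-bin MAEs."""
    abs_err = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    m_dist = binned_mae(abs_err, radius, n_bins)
    m_depth = binned_mae(abs_err, depth, n_bins)
    return 0.5 * (m_dist.std() + m_depth.std())

# Made-up example: error grows toward the edge but is independent of depth.
r_spatial = spatial_robustness(
    y_true=[0.0, 0.0, 0.0, 0.0],
    y_pred=[1.0, 1.0, 3.0, 3.0],
    radius=[0.1, 0.1, 0.9, 0.9],
    depth=[0.1, 0.9, 0.1, 0.9],
    n_bins=2,
)
```

In this example the radial-bin MAEs are 1 and 3 (STD 1) while both depth bins have MAE 2 (STD 0), so the score is 0.5.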

Analysis. ViTacTip holds force errors flat across the surface; planar gels show edge growth (notably in Fz and Pxy). Depth improves Pxy after shallow contact.

Lighting Robustness

Lighting robustness experimental setup and results
Lighting robustness comparison table

Definition. Sensitivity of prediction error to illumination changes (transparent/semi‑transparent devices). Example metric:

$$R_{\text{light}} = \frac{|\frac{I_c}{I_o} - 1|}{|\frac{I_c}{I_o} - 1| + |\frac{\text{MAE}_c}{\text{MAE}_o} - 1|}$$

Protocol. Test under four scenes (diffuse/point/mixed; varying intensity). Compare to training‑light baseline using mean grayscale intensity.
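A small sketch of the example metric follows. It assumes that the subscript $o$ denotes the original (training) lighting and $c$ the changed scene, which the text above does not state explicitly; returning NaN when neither intensity nor error changes is also an assumption, since the ratio is undefined in that case.

```python
def lighting_robustness(i_changed, i_orig, mae_changed, mae_orig):
    """R_light = |I_c/I_o - 1| / (|I_c/I_o - 1| + |MAE_c/MAE_o - 1|)."""
    rel_intensity = abs(i_changed / i_orig - 1.0)  # relative change in mean grayscale intensity
    rel_error = abs(mae_changed / mae_orig - 1.0)  # relative change in prediction error
    if rel_intensity + rel_error == 0.0:
        return float("nan")  # no change in either quantity: ratio undefined
    return rel_intensity / (rel_intensity + rel_error)

# Made-up numbers: intensity rises by 50% and MAE rises by 50%.
r_light = lighting_robustness(i_changed=150.0, i_orig=100.0, mae_changed=0.75, mae_orig=0.5)
```

Under this formula the score lies in [0, 1]: it approaches 1 when the error barely moves relative to the intensity change and approaches 0 when a small intensity change produces a large change in error.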

Analysis. ViTacTip's errors grow under bright point sources; MagicTac's intensities shift less but error variance can rise due to grid interactions with external light.

Repeatability

Repeatability analysis showing per-channel variability across multiple sensor measurements

Definition. Across $N$ repeats at $K$ points and $D$ depths, per‑channel variability (lower is better):

$$\text{Rep}_c = \frac{1}{KD} \sum_{k,d} \text{STD}(\hat{c}_{k,d,1..N})$$

Protocol. K≈100 random points, step 0.1 mm to max depth; N=10 repeats per (point,depth).
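With the predictions for one channel arranged as an array of shape (K, D, N), the score reduces to a standard deviation over repeats followed by a mean over points and depths. The array layout and the use of the population standard deviation are assumptions made for this sketch.

```python
import numpy as np

def repeatability(preds):
    """Rep_c: STD over the N repeats (last axis), averaged over K points and D depths."""
    preds = np.asarray(preds, dtype=float)
    return preds.std(axis=-1).mean()

# Made-up predictions: K=1 point, D=2 depths, N=2 repeats.
rep = repeatability([[[1.0, 1.0], [0.0, 2.0]]])
```

Here the first depth has identical repeats (STD 0) and the second has STD 1, giving a score of 0.5.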

Analysis. ViTacTip is most repeatable for forces and competitive for positions; GelSight is strongest in Pz; MagicTac is intermediate for position and higher variance for force.

Additional Analysis

Inter‑sensor Variability

Inter-sensor variability analysis

Compare reconstructed surfaces across units of the same type via rigid alignment and nearest‑neighbor distances inside the common hull; the mean absolute surface gap summarizes manufacturing consistency.
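The distance step can be sketched as below, under two simplifying assumptions: the two point clouds have already been rigidly aligned, and the overlap of their x–y bounding boxes stands in for the common hull. Both simplifications, and the brute-force neighbor search, are illustrative choices rather than the procedure used in the paper.

```python
import numpy as np

def mean_surface_gap(points_a, points_b):
    """Mean nearest-neighbor distance from surface A to surface B inside their x-y overlap.

    points_a, points_b: arrays of shape (n, 3), assumed already rigidly aligned.
    """
    a = np.asarray(points_a, dtype=float)
    b = np.asarray(points_b, dtype=float)
    # Overlap of the x-y bounding boxes (a simplification of the common hull).
    lo = np.maximum(a[:, :2].min(axis=0), b[:, :2].min(axis=0))
    hi = np.minimum(a[:, :2].max(axis=0), b[:, :2].max(axis=0))
    inside = np.all((a[:, :2] >= lo) & (a[:, :2] <= hi), axis=1)
    a_in = a[inside]
    # Brute-force pairwise distances; adequate for small clouds.
    dists = np.linalg.norm(a_in[:, None, :] - b[None, :, :], axis=2)
    return dists.min(axis=1).mean()

# Made-up surfaces (mm): the third point of A lies outside the overlap and is ignored.
gap = mean_surface_gap(
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [5.0, 5.0, 0.0]],
    [[0.0, 0.0, 0.1], [1.0, 0.0, 0.3]],
)
```

For larger reconstructions a spatial index would replace the pairwise distance matrix, which grows with the product of the two cloud sizes.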

Hysteresis

Hysteresis analysis

Quantify the area between load/unload F–Δz curves (trapezoidal rule) over the overlap range at multiple surface points; ViTacTip shows measurable, spatially varying hysteresis, while GelSight variants/MagicTac show no clear hysteresis under our protocol.
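One way to compute the enclosed area at a single surface point is sketched below: both branches are resampled onto a shared Δz grid covering their overlap, and the absolute force gap is integrated with the trapezoidal rule. The resampling step, the grid size, and the function name are assumptions.

```python
import numpy as np

def hysteresis_area(dz_load, f_load, dz_unload, f_unload, n_samples=201):
    """Area (N*mm) between load and unload F-dz curves over their common dz range."""
    dz_load, f_load = np.asarray(dz_load, dtype=float), np.asarray(f_load, dtype=float)
    dz_unload, f_unload = np.asarray(dz_unload, dtype=float), np.asarray(f_unload, dtype=float)
    # np.interp needs increasing x, and the unload branch is recorded with decreasing depth.
    i, j = np.argsort(dz_load), np.argsort(dz_unload)
    dz_load, f_load = dz_load[i], f_load[i]
    dz_unload, f_unload = dz_unload[j], f_unload[j]
    # Restrict to the depth range covered by both branches.
    lo = max(dz_load[0], dz_unload[0])
    hi = min(dz_load[-1], dz_unload[-1])
    grid = np.linspace(lo, hi, n_samples)
    gap = np.abs(np.interp(grid, dz_load, f_load) - np.interp(grid, dz_unload, f_unload))
    # Trapezoidal rule written out explicitly.
    return np.sum(0.5 * (gap[1:] + gap[:-1]) * np.diff(grid))

# Made-up curves: unloading force lags the loading force at mid depth.
area = hysteresis_area(
    dz_load=[0.0, 1.0, 2.0], f_load=[0.0, 1.0, 2.0],
    dz_unload=[2.0, 1.0, 0.0], f_unload=[2.0, 0.5, 0.0],
)
```

In this example the gap is a triangle of base 2 mm and height 0.5 N, so the area is approximately 0.5 N·mm. Repeating the computation at several surface points gives the spatial variation described above.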

Summary of VBTS Evaluation

Summary comparison of VBTS performance across all evaluation metrics

Selection guide:

ViTacTip: best for low‑force, deep/soft contacts and force repeatability; sensitive to lighting and weaker in ultra‑fine resolution.

MagicTac: fast, strong planar localization; force estimates noisier; control lighting when possible.

GelSight: highest camera resolution and stable depth (Pz); modest frame rate and edge effects.

GelSightWM: practical choice when shear is secondary; robust Pz/Fz without markers.
