Project

Generative AI Research

EditID v1/v2 — Training-Free Editable Identity Customization

Two-stage framework for personalized text-to-image generation on Flux. v1 introduces a training-free identity-feature decoupling scheme that breaks the zero-sum trade-off between identity fidelity and prompt editability. v2 adds a data-lubrication mechanism that further improves data efficiency. On the self-built IBench evaluation system, EditID achieves state-of-the-art identity preservation and editability at the same time.

Training-Free Identity Injection for Personalized Generation

Family of training-free methods that inject reference identity into text-to-image diffusion without per-subject fine-tuning. DVI disentangles semantic and visual identity components; FlexID modulates injection intent across spatial regions; Inject Where It Matters adapts injection to spatially-relevant tokens; Dual-Channel Attention Guidance refines control under multi-condition prompts.

Image Editing on Flow-based Diffusion Transformers

A series of training-free editing methods over MMDiT / Flux architectures, exploring how attention routing, temporal-channel modulation, and semantics-aware region isolation give precise edit control without retraining. Includes AdaEdit (flow-based image editing), Edit Spillover (a probe for whether editing models understand world relations), AttnRouter (per-category attention routing on MMDiT), Edit Fidelity Field (region isolation for scene text editing), and PhysEdit (physically-consistent region-aware edits).

Diffusion Transformer Inference Acceleration

Inference framework and per-method accelerations for production diffusion / video models. TypemovieInfer is a unified consumer-GPU runtime combining Para-Attention parallelism, KV cache, and FP8 quantization, delivering ~4x speed-up on Wan2.1-14B-720p. LayerCache exploits layer-wise velocity heterogeneity in flow matching. Frequency-Aware Caching gives error-bounded caching for DiT generation. FastUSP is a multi-level collaborative acceleration framework for distributed inference.
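The caching methods above share one general idea: skip recomputing a layer when its input has barely changed since the last computed denoising step. The sketch below illustrates only that generic idea with a relative-change threshold. It is not the LayerCache or Frequency-Aware Caching algorithm, and `CachedLayer`, `relative_change`, and the `threshold` parameter are hypothetical names chosen for illustration.

```python
import math
from typing import Callable, List, Optional

Vector = List[float]


def relative_change(current: Vector, reference: Vector) -> float:
    """L2 distance between the two vectors, relative to the reference norm."""
    diff = math.sqrt(sum((c - r) ** 2 for c, r in zip(current, reference)))
    norm = math.sqrt(sum(r * r for r in reference))
    return diff / (norm + 1e-12)


class CachedLayer:
    """Wraps a layer and reuses its last output while the input stays close
    to the input of the last real computation (hypothetical illustration)."""

    def __init__(self, fn: Callable[[Vector], Vector], threshold: float) -> None:
        self.fn = fn
        self.threshold = threshold  # assumed tolerance, tuned per layer
        self.last_input: Optional[Vector] = None
        self.last_output: Optional[Vector] = None
        self.compute_calls = 0

    def __call__(self, x: Vector) -> Vector:
        if (
            self.last_input is not None
            and self.last_output is not None
            and relative_change(x, self.last_input) < self.threshold
        ):
            return self.last_output  # cache hit: skip the layer
        # Cache miss: run the layer and remember this input/output pair.
        self.compute_calls += 1
        self.last_input = list(x)
        self.last_output = self.fn(x)
        return self.last_output


if __name__ == "__main__":
    layer = CachedLayer(lambda x: [2.0 * v for v in x], threshold=0.05)
    for step_input in ([1.0, 1.0], [1.01, 1.0], [2.0, 2.0]):
        layer(step_input)
    print("layer evaluations:", layer.compute_calls)
```

Comparing against the input of the last real computation, rather than the previous step, keeps small per-step drifts from accumulating unnoticed.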

Hyperspectral Image Classification

Hyperspectral Image Classification — 8-year Research Line

Long-running research line on deep architectures for hyperspectral remote sensing imagery, covering 3D-CNN, dense connections, dynamic group convolution, selective kernels, KAN, Mamba-Transformer, dynamic snake, and wavelet receptive fields. The lead paper alone (Multi-scale Dense Networks, IEEE TGRS 2019) has 200+ Google Scholar citations; the series spans IEEE TGRS, JSTARS, JARS, Remote Sensing Letters, Spectroscopy Letters, Arabian J. Sci. & Eng., International J. of Image and Data Fusion, and Journal of Image and Graphics (中国图象图形学报).

Suning AIGC Platform

Suning AIGC Platform

Provides AIGC services built on diffusion models and LLMs, including image and video generation. Covers generation of model photos, product photos, posters, and anime avatars; identity-controlled generation; marketing short-video generation by combination fission; lip-sync digital humans for e-commerce live streaming; face swapping; and script generation for marketing voice-overs.

E-Commerce Inpainting with Mask Guidance in ControlNet

Generating e-commerce images is a long-standing core need: the model must fill in the missing background while preserving the foreground product. This work addresses overcompletion, where diffusion-based inpainting fails to maintain product features and over-rebuilds the main product, with two solutions: (1) an inpainting model fine-tuned with instance masks and (2) a training-free mask-guidance approach that introduces refined product masks as constraints when combining ControlNet with UNet.
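One plausible way to read "masks as constraints when combining ControlNet with UNet" is a per-pixel gate on the ControlNet residual before it is added to the UNet feature map. The sketch below shows that reading on tiny nested lists. It is an assumption for illustration rather than the method as implemented, and `mask_guided_merge` and its arguments are hypothetical names.

```python
from typing import List

Grid = List[List[float]]


def mask_guided_merge(unet_feat: Grid, control_residual: Grid, product_mask: Grid) -> Grid:
    """Add the ControlNet residual to the UNet feature, weighted per pixel
    by the product mask (1.0 = apply the residual fully, 0.0 = ignore it)."""
    if not (len(unet_feat) == len(control_residual) == len(product_mask)):
        raise ValueError("all grids must have the same height")
    merged: Grid = []
    for u_row, c_row, m_row in zip(unet_feat, control_residual, product_mask):
        if not (len(u_row) == len(c_row) == len(m_row)):
            raise ValueError("all grids must have the same width")
        merged.append([u + m * c for u, c, m in zip(u_row, c_row, m_row)])
    return merged


if __name__ == "__main__":
    unet_feat = [[1.0, 1.0], [1.0, 1.0]]
    control_residual = [[0.5, 0.5], [0.5, 0.5]]
    # Assumed refined product mask: the product occupies the left column only.
    product_mask = [[1.0, 0.0], [1.0, 0.0]]
    print(mask_guided_merge(unet_feat, control_residual, product_mask))
```

In a real pipeline the mask would be resized to each feature resolution, and soft values between 0 and 1 could be used near the product boundary.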

Training-Free Style-Consistent Image Synthesis with Condition & Mask Guidance

Training-free framework for style-consistent e-commerce image generation. Operates at the QKV level inside self- and cross-attention: shared KV amplifies similarity in cross-attention, and attention maps are used to generate mask guidance that steers style-consistent generation while preserving the product's main composition.
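The shared-KV idea can be sketched as ordinary scaled dot-product attention in which the target image's queries attend over its own keys and values concatenated with those of a reference image. This is a minimal pure-Python sketch under that assumption; the function names are hypothetical, and the framework's exact sharing scheme and mask generation are not reproduced here.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(scores: List[float]) -> List[float]:
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention over row-vector tokens."""
    scale = math.sqrt(len(keys[0]))
    out: Matrix = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        out.append([
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ])
    return out


def shared_kv_attention(q_target: Matrix, k_target: Matrix, v_target: Matrix,
                        k_ref: Matrix, v_ref: Matrix) -> Matrix:
    """Target queries attend to both target and reference tokens, so
    reference style features can flow into the target output."""
    return attention(q_target, k_target + k_ref, v_target + v_ref)


if __name__ == "__main__":
    result = shared_kv_attention(
        q_target=[[1.0, 0.0]],
        k_target=[[1.0, 0.0]], v_target=[[0.0]],
        k_ref=[[0.0, 1.0]], v_ref=[[2.0]],
    )
    print(result)
```

Concatenating keys and values leaves the query side untouched, which is why this kind of sharing needs no retraining.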

Intelligent Creative Platform

Iwogh Platform (木牛流马)

Iwogh is Suning's internal creative design platform, built around three core modules (intelligent parsing, intelligent creation, and intelligent optimization) plus a set of real-time creative tools. It is one of China's earliest intelligent creative production platforms, benchmarked against Alibaba Luban (鹿班) and JD Linglong (羚珑).

Intelligent Parsing

Automated framework for parsing creative materials (banners, posters, designer manuscripts) into structured design semantics. Comprises material recognition, preprocess, smartname, and label layers — using detection (Cascade RCNN, GFL), layer-level filtering, intelligent naming, and multi-level tagging. Significantly boosts downstream intelligent creation and creative optimization in Suning's production scenarios, lifting creative material exposure, circulation, and click-through rates.

Smartbanner

Intelligent banner design framework that balances creative freedom against design rules. With only product, copy and size as inputs, Smartbanner's planner / actuator / adjuster / generator pipeline synthesizes high-freedom, design-compliant banners. Deployed at production scale, lifting CTR by 30%, designer efficiency by 500%, and synthesizing hundreds of millions of images annually.

ADCT — Dynamic Creative Optimization under Sparse/Ambiguous Samples

Two-stage cascade for ad-creative CTR estimation under sparse and ambiguous samples. Stage 1: autoco-based ranking + a transformer-based rerank trained with rank-distillation soft labels to extract creative order knowledge and link ambiguous samples to positive/negative pairs. Stage 2: a bandit selects from Stage 1's top-N for live serving. Online A/B testing shows +10% CTR vs baseline.
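Stage 2 can be pictured as a bandit that explores among the top-N creatives handed over by Stage 1. The description does not say which bandit algorithm is used, so the sketch below assumes Beta-Bernoulli Thompson sampling purely for illustration. `top_n`, `ThompsonBandit`, and the example scores are hypothetical.

```python
import random
from typing import Dict, List


def top_n(stage1_scores: Dict[str, float], n: int) -> List[str]:
    """Keep the n creatives with the highest Stage 1 scores."""
    return sorted(stage1_scores, key=stage1_scores.get, reverse=True)[:n]


class ThompsonBandit:
    """Beta-Bernoulli Thompson sampling over a fixed set of creatives
    (an assumed choice of bandit, not confirmed by the description)."""

    def __init__(self, creative_ids: List[str], seed: int = 0) -> None:
        self.rng = random.Random(seed)
        # [alpha, beta] per creative, starting from a uniform Beta(1, 1) prior.
        self.stats: Dict[str, List[float]] = {cid: [1.0, 1.0] for cid in creative_ids}

    def select(self) -> str:
        """Sample a plausible CTR per creative and serve the best sample."""
        draws = {
            cid: self.rng.betavariate(alpha, beta)
            for cid, (alpha, beta) in self.stats.items()
        }
        return max(draws, key=draws.get)

    def update(self, creative_id: str, clicked: bool) -> None:
        """Record one impression: a click raises alpha, a non-click raises beta."""
        self.stats[creative_id][0 if clicked else 1] += 1.0


if __name__ == "__main__":
    # Hypothetical Stage 1 rerank scores.
    candidates = top_n({"c1": 0.12, "c2": 0.30, "c3": 0.05, "c4": 0.22}, n=3)
    bandit = ThompsonBandit(candidates, seed=42)
    served = bandit.select()
    bandit.update(served, clicked=True)
    print("candidates:", candidates, "served:", served)
```

Restricting the bandit to Stage 1's top-N keeps online exploration cheap, since click feedback is spent only on creatives the ranker already considers promising.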

PS Tamper Detection

Three-step pipeline (feature-assist, audit-point localization, tamper recognition) for document Photoshop-tamper detection with graded output (tampered / suspected / untampered). Uses EXIF + binary-stream + noise feature assistance, detection frameworks for localization, and a dual-path dual-stream (RGB + ELA) recognition network with self-correlation percentile pooling and NetVLAD fusion. Accuracy 0.804 on internal benchmarks; saved Suning RMB 3M+/year.