Code
AIGC
- Summary
    - Awesome-Chinese-Stable-Diffusion: Focuses on base models for Chinese Stable Diffusion.
 
- LLM
    - EcommerceLLM: An e-commerce LLM fine-tuned from qwen1.5 and llama3.
    - EcommerceLLMQwen2.5: A Qwen2.5-series e-commerce large language model fine-tuned on e-commerce data.
    - MiniLLaMA3: A mini version of Llama 3 covering the entire pipeline: from-scratch (0 to 1) data construction, tokenizer training, pre-training (PT), supervised fine-tuning (SFT), and reinforcement learning from human feedback (RLHF).
    - ECOMCPM: A Chinese pretrained language model similar to GPT-2, trained on e-commerce data from the ‘What’s Worth Buying’ website.
    - UniFlow: A unified service covering large language models, multimodal models, and a FLUX generation interface, with FastAPI-based, configurable deployment.
    - Medical_R1: deepseek-r1 fine-tuned on medical data.
    - EcommerceLLMQwen3: A Qwen3-series e-commerce large language model fine-tuned (SFT) on e-commerce data.
 
- LMM
    - XrayLLaVA: An X-ray large multimodal model fine-tuned from LLaVA (the llava1d6-mistral-7b-instruct model) on 4 V100 GPUs. LLaVA is among the most popular methodologies and architectures for large multimodal language models, so fine-tuning it helps evaluate and compare the potential of training such models for vertical scenarios.
    - XrayQwenVL: An X-ray large multimodal model fine-tuned from QwenVL (the qwenvl-chat model) on 4 V100 GPUs.
    - XrayQwen2VL: Qwen2-VL fine-tuned on the open-source Xray dataset. The trained LoRA weights have been released for academic research. For inference, the original qwen2-vl-7b-instruct weights must be loaded separately; the LoRA weights can then be merged using LLaMA-Factory’s LoRA merge function. LLaMA-Factory 0.9.0 was used for fine-tuning in this experiment.
    - EcommerceOCRBench: A larger-scale OCR benchmark dataset for multimodal large language models in e-commerce, modeled after OCRBench.
    - OCRPaliGemma: A multimodal large language model with a focus on OCR text detection.
    - XrayLLama3.2Vision: An X-ray large multimodal model fine-tuned from llama3.2-vision (the llama3_2-11b-vision-instruct model) on 4 A800 GPUs.
 
- SD
    - HOME-CLIP: A ChineseCLIP model fine-tuned on home decoration and furniture data crawled from Visual China.
    - HOME-DALLE1: A DALL-E 1 model for Chinese home decoration and furniture scenes.
    - controlnet_aux_add: An add-on to Hugging Face’s controlnet_aux library that adds ControlNet preprocessors not present in controlnet_aux.
    - EcommerceSD: Image generation for e-commerce scenarios, including model generation and inpainting.
    - MaskControlnet: A ControlNet-based generative model conditioned on masks, trained on a massive dataset of e-commerce cutout images (saliency map detection data).
    - ChatAce: Image editing based on FLUX ACE++, mainly for character-consistent editing.
    - ChatFlux: A web UI based on ChatDiT that supports generating images through conversation.
    - Typemovie-ParaAttention: An enhanced version of ParaAttention, designed to accelerate Diffusion Transformer (DiT) model inference with context parallelism, dynamic caching, and a new high-performance SageAttention backend.
    - EditIDv2: TypeMovie’s EditIDv2 preserves character identity consistency in complex text-to-image generation, using minimal data for enhanced semantic editing, as shown in IBench tests.
    - IBench: The image evaluation system used in EditID.
    - RealtimeFlux: The first model enabling real-time FLUX-based sketch-to-image generation, akin to Jimeng’s Smart Canvas and Krea.ai’s real-time rendering, built on Nunchaku and FLUX.
 
- Video generation
    - EcommerceVideoDataset: A dataset primarily used for creating e-commerce animations.
 
- Digital Human
    - Wav2lipAll: Training a lip-driven virtual digital human based on wav2lip, including the data processing procedures. Models are provided at resolutions 96x96, 192x192, 192x288, and 288x288.
    - TalkingFace: A training set for 2D virtual digital human projects such as wav2lip and GeneFace++.
 
ComfyUI extensions and Stable Diffusion WebUI extensions
- ComfyUI_AliControlnetInpainting
- ComfyUI_CompareModelWeights
- ComfyUI_Diffusers
- ComfyUI_MasaCtrl
- ComfyUI_VisualAttentionMap
- ComfyUI_SelfGuidance
- ComfyUI_CrossImageAttention
- ComfyUI_Style_Aligned
- ComfyUI_M3Net
- ComfyUI_VideoEditing
- ComfyUI_InternVL2
- ComfyUI_LLaSM
- ComfyUI_Qwen3Omni
- ComfyUI_Gemma3
- ComfyUI_BatchPrompt
- ComfyUI_KimiVL
- ComfyUI_FluxLayerDiffuse
- ComfyUI_QWQ32B
- ComfyUI_FluxAttentionMask
- ComfyUI_Moonlight
- ComfyUI_DeepSeekVL2
- ComfyUI_ChatGen
- ComfyUI_1Prompt1Story
- ComfyUI_Cogview4
- ComfyUI_FluxClipWeight
- ComfyUI_FluxCustomId
- sd_webui_ZeST
- sd_webui_instantid
- sd_webui_prompt_translator_architecture
- sd_webui_musetalk
- sd_webui_tokenize_anything
- sd_webui_ootdiffusion
- sd_webui_animate_anything
- sd_webui_powerpaint
- sd_webui_outpainting
- sd_webui_matting
- sd_webui_reatime_lcm_canvas
- sd_webui_beautifulprompt
- sd_webui_lama
- sd_webui_sghm
CV and Creatives
- CV
    - Camera_blur_detection: Detects regions in camera-captured photos and determines whether they are blurred. Written in C++ (VS2019), using FastDeploy for multi-platform deployment.
    - mmdetection_add: Adds implemented object detection algorithms to mmdetection, including EfficientDet, YOLOv4/v5, etc.
    - mmclassification_add: Adds implemented classification algorithms to mmcls, including GhostNet, etc.
    - Answer_card_identification: An answer sheet recognition project for intelligent grading.
    - mmsynth: text_render reorganized in mm format.
 
- Creatives
    - TextErasing: A text erasing algorithm: Alibaba’s self-supervised text erasing with controllable image synthesis. Two versions will be provided.
    - AllRank: A learning-to-rank framework for the re-ranking stage of the recall / coarse ranking / fine ranking / re-ranking pipeline. Previously used mainly for dynamic creative optimization, re-ranking on features that include images.
    - mmgeneration_add: GANs and other traditional image generation algorithms.
    - Xiaobao: VideoClip, a video editing application.
 
Deployment Acceleration
- KuaiZai: Project code for multi-platform deployment.
- PlateRec: License plate recognition based on PaddleOCR and ONNX Runtime, written in C++.
- Yolov5_rknnlite2: YOLOv5 pedestrian detection deployed on RK3588 with RKNNLite2.
Hyperspectral classification
- DGCNet-for-HSI
- FSKNet-for-HSI
- STNet-for-HSI
- WCNet-for-HSI
- SGDSCNet-for-HSI
- MVNet-for-HSI
- KANet-for-HSI
- EKGNet-for-HSI
- DACNet-for-HSI
- mmhyperspectral