Code
AIGC
- Summary
- Awesome-Chinese-Stable-Diffusion :A collection of base models for Chinese Stable Diffusion.
- LLM
- EcommerceLLM :An e-commerce LLM fine-tuned from Qwen1.5 and Llama 3.
- EcommerceLLMQwen2.5 :A Qwen2.5 series e-commerce large language model fine-tuned on e-commerce data.
- MiniLLaMA3 :A mini version of Llama 3 built from scratch (0 to 1), covering the entire pipeline: data construction, tokenizer training, pre-training (PT), supervised fine-tuning (SFT), and reinforcement learning from human feedback (RLHF).
- ECOMCPM :A GPT-2-like Chinese pretrained language model, trained on e-commerce data from the ‘What’s Worth Buying’ website.
- LMM
- XrayLLaVA :An X-ray large multi-modal model, fine-tuned from the llava1d6-mistral-7b-instruct model on 4 V100 GPUs. LLaVA is among the most popular methodologies and architectures for large multi-modal models, so fine-tuning it helps evaluate and compare the potential of training large multi-modal models in vertical scenarios.
- XrayQwenVL :An X-ray large multi-modal model, fine-tuned from the qwenvl-chat model on 4 V100 GPUs.
- XrayQwen2VL :Qwen2-VL fine-tuned on the open-source X-ray dataset. The trained LoRA weights have been released for academic research. For inference, the original qwen2-vl-7b-instruct weights must be loaded separately; the LoRA weights can then be merged into them using LLaMA-Factory’s LoRA merge function. LLaMA-Factory 0.9.0 was used for fine-tuning in this experiment.
- EcommerceOCRBench :A larger-scale OCR benchmark dataset for multimodal large language models in e-commerce, modeled after OCRBench.
- OCRPaliGemma :A multimodal large language model with a focus on OCR text detection.
- SD
- HOME-CLIP :A ChineseCLIP model fine-tuned on home decoration and furniture data crawled from Visual China.
- HOME-DALLE1 :DALL-E 1 model for Chinese home decoration and furniture scenes.
- controlnet_aux_add :An add-on to Hugging Face’s controlnet_aux library that adds ControlNet preprocessors not present in controlnet_aux.
- EcommerceSD :Image generation for e-commerce scenarios, including model generation and inpainting.
- MaskControlnet :A ControlNet-based generative model conditioned on masks, trained on a massive dataset of e-commerce cutout images (saliency map detection data).
- Video generation
- EcommerceVideoDataset :A dataset primarily used for creating e-commerce animations.
- Digital Human
- Wav2lipAll :Trains a lip-driven virtual digital human based on wav2lip, including the data processing procedures. The models cover the sizes 96x96, 192x192, 192x288, and 288x288.
- TalkingFace :A training set for 2D virtual digital human projects such as wav2lip and GeneFace++.
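
The XrayQwen2VL entry above mentions merging released LoRA weights into separately loaded base weights. As a rough illustration of what such a merge computes, here is a minimal sketch of the standard LoRA formulation, W' = W + (alpha / r) * B A, on plain Python lists. It is not LLaMA-Factory's implementation, and every name in it (`matmul`, `merge_lora`, `alpha`, `rank`) is a hypothetical choice made for this example.

```python
# Illustrative LoRA merge on plain Python lists (no ML framework needed).
# Assumption: the standard LoRA update W' = W + (alpha / rank) * (B @ A).
# All function and variable names are hypothetical, chosen for this sketch.

def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def merge_lora(base, lora_b, lora_a, alpha, rank):
    """Return base + (alpha / rank) * (lora_b @ lora_a) as a new matrix."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(base, delta)]

# Toy example: a 2x2 base weight with a rank-1 adapter.
base = [[1.0, 0.0], [0.0, 1.0]]
lora_b = [[1.0], [2.0]]   # shape (2, rank)
lora_a = [[0.5, 0.25]]    # shape (rank, 2)
merged = merge_lora(base, lora_b, lora_a, alpha=2, rank=1)
print(merged)
```

After merging, the adapter is no longer needed at inference time, because its low-rank update has been folded into the base weight matrix.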
Comfyui-extension and Stable-diffusion-webui-extension
- ComfyUI_AliControlnetInpainting
- ComfyUI_CompareModelWeights
- ComfyUI_Diffusers
- ComfyUI_MasaCtrl
- ComfyUI_VisualAttentionMap
- ComfyUI_SelfGuidance
- ComfyUI_CrossImageAttention
- ComfyUI_Style_Aligned
- ComfyUI_M3Net
- ComfyUI_VideoEditing
- ComfyUI_InternVL2
- ComfyUI_LLaSM
- sd_webui_ZeST
- sd_webui_instantid
- sd_webui_prompt_translator_architecture
- sd_webui_musetalk
- sd_webui_tokenize_anything
- sd_webui_ootdiffusion
- sd_webui_animate_anything
- sd_webui_powerpaint
- sd_webui_outpainting
- sd_webui_matting
- sd_webui_reatime_lcm_canvas
- sd_webui_beautifulprompt
- sd_webui_lama
- sd_webui_sghm
CV and Creatives
- CV
- Camera_blur_detection :Detects regions in camera-captured photos and determines whether they are blurred. Written in C++ (VS2019), using FastDeploy for multi-platform deployment.
- mmdetection_add :Adds object detection algorithms to mmdetection, including EfficientDet, YOLOv4/v5, etc.
- mmclassification_add :Adds classification algorithms to mmcls, including GhostNet, etc.
- Answer_card_identification :An answer sheet recognition project for intelligent grading.
- mmsynth :text_render reorganized in mm format.
- Creatives
- TextErasing :A text erasing algorithm based on Alibaba’s self-supervised text erasing with controllable image synthesis. Two versions will be provided.
- AllRank :A learning-to-rank framework serving as the re-ranking module in the recall / coarse ranking / fine ranking / re-ranking pipeline. It was previously used mainly in dynamic creative optimization to re-rank features, including images.
- mmgeneration_add :GAN and other traditional image generation algorithms.
- Xiaobao :VideoClip, a video editing application.
Deployment Acceleration
- KuaiZai :Project code mainly for multi-platform deployment.
- PlateRec :License plate recognition based on PaddleOCR and ONNX Runtime, written in C++.
- Yolov5_rknnlite2 :YOLOv5 pedestrian detection, deployed on RK3588 with RKNN Lite2.