Code

AIGC

  • Summary
  • LLM
    • EcommerceLLM: An e-commerce LLM fine-tuned from Qwen1.5 and Llama 3.
    • EcommerceLLMQwen2.5: A Qwen2.5 series e-commerce large language model fine-tuned on e-commerce data.
    • MiniLLaMA3: A mini version of Llama 3 built from scratch, covering the full pipeline: data construction, tokenizer training, pre-training (PT), supervised fine-tuning (SFT), and reinforcement learning from human feedback (RLHF).
    • ECOMCPM: A Chinese pretrained language model similar to GPT-2, trained on e-commerce data from the ‘What’s Worth Buying’ website.
    • UniFlow: A unified interface for large language models, multimodal models, and FLUX image generation, deployed with FastAPI as configurable services.
    • Medical_R1: DeepSeek-R1 fine-tuned on medical data.
    • EcommerceLLMQwen3: A Qwen3 series e-commerce large language model obtained by supervised fine-tuning (SFT) on e-commerce data.
  • LMM
    • XrayLLaVA: An Xray large multimodal model fine-tuned from LLaVA on 4 V100 GPUs, starting from the llava1d6-mistral-7b-instruct model. LLaVA is one of the most popular methods and architectures for large multimodal models, so fine-tuning it helps us evaluate and compare the potential of training large multimodal models in vertical domains.
    • XrayQwenVL: An Xray large multimodal model fine-tuned from QwenVL on 4 V100 GPUs, starting from the qwenvl-chat model.
    • XrayQwen2VL: Fine-tuned from Qwen2-VL on the open-source Xray dataset using LLaMA-Factory 0.9.0. The trained LoRA weights are released for academic research. For inference, load the original qwen2-vl-7b-instruct weights separately and merge the LoRA weights with LLaMA-Factory’s LoRA merge function.
    • EcommerceOCRBench: A larger-scale OCR benchmark dataset for multimodal large language models in e-commerce, modeled after OCRBench.
    • OCRPaliGemma: A multimodal large language model focused on OCR text detection.
    • XrayLLama3.2Vision: An Xray large multimodal model fine-tuned from Llama 3.2 Vision on 4 A800 GPUs, starting from the llama3_2-11b-vision-instruct model.
  • SD
    • HOME-CLIP: A ChineseCLIP model fine-tuned on home decoration and furniture data crawled from Visual China.
    • HOME-DALLE1: A DALL-E 1 model for Chinese home decoration and furniture scenes.
    • controlnet_aux_add: Auxiliary functions for ControlNet; an add-on library to Hugging Face’s controlnet_aux that adds preprocessors missing from controlnet_aux.
    • EcommerceSD: Image generation for e-commerce scenarios, including model generation and inpainting.
    • MaskControlnet: A ControlNet-based generative model conditioned on masks, trained on a massive dataset of e-commerce cutout images (saliency detection data).
    • ChatAce: Image editing based on FLUX ACE++, mainly for character-consistent editing.
    • ChatFlux: A web UI based on ChatDiT that supports generating images through conversation.
    • Typemovie-ParaAttention: TypeMovie-ParaAttention is an enhanced version of ParaAttention, designed to accelerate Diffusion Transformer (DiT) inference with context parallelism, dynamic caching, and a new high-performance SageAttention backend.
    • EditIDv2: Typemovie’s EditIDv2 maintains character identity consistency in complex text-to-image generation and improves semantic editing with minimal data, as shown in IBench tests.
    • IBench: The image evaluation system used in EditID.
    • RealtimeFlux: The first model to enable real-time FLUX-based sketch-to-image generation, similar to Ji Meng’s Smart Canvas and Krea.ai’s real-time rendering, built on Nunchaku and FLUX.
  • Video generation
  • Digital Human
    • Wav2lipAll: Trains a lip-sync-driven virtual digital human based on Wav2Lip, including the data processing pipeline. Models are provided at 96x96, 192x192, 192x288, and 288x288 resolutions.
    • TalkingFace: A training suite for 2D virtual digital human projects similar to Wav2Lip and GeneFace++.

Comfyui-extension and Stable-diffusion-webui-extension

CV and Creatives

  • CV
    • Camera_blur_detection: Detects regions in photos captured by a camera and determines whether they are blurred; written in C++ (VS2019) and deployed across platforms with FastDeploy.
    • mmdetection_add: Additional object detection algorithms implemented for mmdetection, including EfficientDet and YOLOv4/v5.
    • mmclassification_add: Additional classification algorithms implemented for mmcls, including GhostNet.
    • Answer_card_identification: An answer sheet recognition project for intelligent grading.
    • mmsynth: text_render reorganized in the mm (OpenMMLab) format.
  • Creatives
    • TextErasing: Text erasing based on Alibaba’s self-supervised text erasing algorithm with controllable image synthesis; two versions will be provided.
    • AllRank: A learning-to-rank framework serving as the re-ranking stage of the recall, coarse ranking, fine ranking, and re-ranking pipeline; previously used mainly in dynamic creative optimization to re-rank creatives using features that include images.
    • mmgeneration_add: GANs and other traditional image generation algorithms.
    • Xiaobao: VideoClip, a video editing application.

Deployment Acceleration

  • KuaiZai: Project code for multi-platform deployment.
  • PlateRec: License plate recognition based on PaddleOCR, deployed with ONNX Runtime in C++.
  • Yolov5_rknnlite2: YOLOv5 pedestrian detection deployed on the RK3588 with RKNNLite2.

Hyperspectral classification

Learning
