3.2 Gen AI: Models & Architecture
Sun, 26 Apr 26
Course Structure & Feedback
Curriculum designed by the Michigan team in collaboration with Simplilearn
Instructors deliver pre-designed content, cannot modify curriculum
Feedback collection encouraged for curriculum team review
Student feedback drives potential content updates
Industry Context & Complexity
YouTube tutorials alone are insufficient preparation for production environments
Production deployment requires understanding of:
Platform integration
Security considerations
Scalability requirements
Cost optimization
Class accommodates varying experience levels (beginners to advanced)
Natural Language Processing Fundamentals
NLP enables machines to understand and generate human language, whether written (text) or spoken (speech in audio or video)
Core capabilities:
Natural Language Understanding (NLU)
Natural Language Generation (NLG)
Applications: translation, speech recognition, sentiment analysis
Three model categories:
Rule-based systems (if/else logic)
Deep learning models (RNN, CNN)
Statistical models (Markov chains)
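As a rough illustration of the rule-based category, here is a minimal sketch of an if/else sentiment classifier. The word lists and the function name `rule_based_sentiment` are made up for this example and do not come from the course material.

```python
# Hypothetical keyword lists; a real rule-based system would use far larger ones.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "terrible", "hate", "poor"}

def rule_based_sentiment(text):
    """Classify text by counting positive and negative keywords."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    elif score < 0:
        return "negative"
    return "neutral"

print(rule_based_sentiment("I love this great course"))
print(rule_based_sentiment("terrible audio"))
print(rule_based_sentiment("not good"))
```

The last call returns "positive" because the rules only see the keyword "good" and have no notion of negation, which hints at why hand-written rules tend not to scale to real language.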
Markov Chains & N-grams
A first-order Markov chain predicts the next word based solely on the previous word
Limitations:
Short-term memory only
Cannot capture long-term dependencies
Lack semantic understanding
N-gram models extend this by conditioning on the previous n-1 words:
Unigram (n=1): uses no previous words, only overall word frequency
Bigram (n=2): relies on 1 previous word
Trigram (n=3): relies on 2 previous words
Historical applications: speech recognition, spelling correction, early chatbots
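The Markov-chain idea above (predict the next word from the previous word alone) can be sketched by counting word pairs in a corpus. This is a minimal sketch, not a production model: the toy corpus and the helper names `train_markov` and `predict_next` are assumptions made for illustration.

```python
from collections import Counter, defaultdict

def train_markov(corpus):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Toy corpus, made up for illustration.
corpus = [
    "the cat sat on the mat",
    "the cat chased the mouse",
    "the dog sat on the rug",
]
model = train_markov(corpus)
print(predict_next(model, "the"))    # cat
print(predict_next(model, "sat"))    # on
print(predict_next(model, "zebra"))  # None
```

Because the prediction depends only on the single preceding word, the model always proposes "cat" after "the" no matter what came earlier in the sentence, which is the short-term memory limitation listed above.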
Large Language Models Architecture
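Model the joint probability of an entire token sequence, factored into next-token probabilities conditioned on all preceding tokens
Attend to the full context at once, rather than only the last one or two words as in Markov chains (text is still generated one token at a time)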
Context window determines maximum input size
Gemini 1.5 Pro: 1 million tokens (~1,500 pages)
Key advantages over traditional models:
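Parallel processing of all input tokens (unlike RNNs, which process one token at a time)
Self-attention mechanism to relate every word to every other word in the context
Positional encoding to preserve word order, which attention alone does not see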
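To make self-attention and positional encoding more concrete, here is a simplified pure-Python sketch. It assumes the sinusoidal encoding and scaled dot-product attention from the original transformer design, and it sets queries, keys and values equal to the input (no learned weight matrices), so it shows the mechanics only. The toy embedding values are invented.

```python
import math

def positional_encoding(seq_len, d_model):
    """Sinusoidal position vectors: even dimensions use sin, odd dimensions use cos."""
    pe = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            angle = pos / (10000 ** ((2 * (i // 2)) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    """Scaled dot-product attention with Q = K = V = x (simplifying assumption)."""
    d = len(x[0])
    outputs, weights = [], []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(w[j] * x[j][c] for j in range(len(x))) for c in range(d)])
    return outputs, weights

# Toy 4-dimensional embeddings for a 3-token sentence (invented values).
embeddings = [[0.1, 0.3, 0.0, 0.2], [0.4, 0.1, 0.2, 0.0], [0.0, 0.2, 0.5, 0.1]]
pe = positional_encoding(3, 4)
x = [[e + p for e, p in zip(emb, pos)] for emb, pos in zip(embeddings, pe)]
outputs, weights = self_attention(x)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of `weights` sums to 1 and says how much one token draws on every token in the sequence. All rows can be computed independently, which is where the parallelism comes from.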
LLM Training Process
Data preparation & corpus cleaning
Tokenization (character, word, or sub-word based)
Embedding creation (vector representations)
Training a transformer neural network on the embedded tokens
Fine-tuning vs pre-training distinction:
Pre-training: trains on a large general corpus to create a foundation model
Fine-tuning: adapts a pre-trained model to specific domains/use cases
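The three tokenization granularities mentioned above can be compared with a short sketch. The sub-word splitter below uses a greedy longest-match against a tiny hand-written vocabulary, which is a simplification: real tokenizers learn their vocabulary from data. `VOCAB` and all function names here are hypothetical.

```python
def char_tokenize(text):
    """Character-level: every character is a token."""
    return list(text)

def word_tokenize(text):
    """Word-level: split on whitespace."""
    return text.split()

def subword_tokenize(word, vocab):
    """Greedy longest-match split; falls back to single characters."""
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            # Take the longest known piece, or a single character as a last resort.
            if word[i:j] in vocab or j == i + 1:
                tokens.append(word[i:j])
                i = j
                break
    return tokens

# Hypothetical sub-word vocabulary, hand-written for illustration.
VOCAB = {"token", "ization", "un", "break", "able"}

print(char_tokenize("token"))                   # ['t', 'o', 'k', 'e', 'n']
print(word_tokenize("tokenization is useful"))  # ['tokenization', 'is', 'useful']
print(subword_tokenize("tokenization", VOCAB))  # ['token', 'ization']
print(subword_tokenize("unbreakable", VOCAB))   # ['un', 'break', 'able']
```

Sub-word splitting sits between the other two: the vocabulary stays small, yet unseen words can still be represented from known pieces.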
Model Comparison Demo Results
Same mathematical word problem given to multiple LLMs
Inconsistent responses across models:
ChatGPT: “insufficient information” then “0 students”
DeepSeek: “0 students”
Claude Sonnet 4.6: “52 students”
Copilot: varied responses based on model version
Demonstrates the probabilistic nature of LLM output and the challenge of relying on it in production
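One source of this variability is that models sample the next token from a probability distribution instead of always picking the top choice. The sketch below illustrates temperature-scaled sampling. The scores in `logits` are invented and do not come from any of the models in the demo, and `sample_with_temperature` is a hypothetical helper.

```python
import math
import random

def sample_with_temperature(logits, temperature, rng):
    """Sample one key from `logits` after a temperature-scaled softmax."""
    scaled = {tok: score / temperature for tok, score in logits.items()}
    m = max(scaled.values())
    exps = {tok: math.exp(s - m) for tok, s in scaled.items()}
    total = sum(exps.values())
    tokens = list(exps)
    probs = [exps[t] / total for t in tokens]
    return rng.choices(tokens, weights=probs, k=1)[0]

# Invented scores for three candidate answers; not taken from a real model.
logits = {"answer_a": 2.0, "answer_b": 1.6, "answer_c": 1.2}

rng = random.Random(0)  # fixed seed so the run is repeatable
warm = [sample_with_temperature(logits, 1.0, rng) for _ in range(1000)]
cold = [sample_with_temperature(logits, 0.01, rng) for _ in range(200)]

print({tok: warm.count(tok) for tok in logits})
print({tok: cold.count(tok) for tok in logits})
```

At temperature 1.0 all three answers appear across repeated runs, while at a very low temperature the top-scoring answer dominates. Lowering temperature reduces variation but does not make a wrong top answer right.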
Production Considerations
Security: avoid sharing PII/enterprise data with public models
Evaluation: need ground truth datasets for testing
Cost optimization: balance model capability with expense
Latency requirements: thinking (reasoning) models are slower but often more accurate on multi-step problems
Model selection factors:
Task specificity (general vs specialized)
Context length requirements
Licensing (open source vs proprietary)
Infrastructure needs for deployment
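The evaluation point above can be sketched as a small harness that scores a model against a ground truth dataset. This is a minimal sketch assuming exact-match scoring. `fake_model` is a stand-in with canned answers (one deliberately wrong), not a real LLM call, and the dataset is invented.

```python
def normalize(answer):
    """Ignore case and surrounding whitespace when comparing answers."""
    return answer.strip().lower()

def evaluate(model_fn, dataset):
    """Return exact-match accuracy and the list of failed cases."""
    correct = 0
    failures = []
    for prompt, expected in dataset:
        got = model_fn(prompt)
        if normalize(got) == normalize(expected):
            correct += 1
        else:
            failures.append((prompt, expected, got))
    return correct / len(dataset), failures

# Hypothetical ground truth set of (prompt, expected answer) pairs.
GROUND_TRUTH = [
    ("What is 2 + 2?", "4"),
    ("What is 10 / 2?", "5"),
    ("Capital of France?", "Paris"),
]

def fake_model(prompt):
    """Stand-in for a real model call; returns canned answers."""
    canned = {
        "What is 2 + 2?": "4",
        "What is 10 / 2?": "2",          # deliberately wrong
        "Capital of France?": " paris ",
    }
    return canned.get(prompt, "")

accuracy, failures = evaluate(fake_model, GROUND_TRUTH)
print(f"accuracy: {accuracy:.2f}")
for prompt, expected, got in failures:
    print(f"FAILED {prompt!r}: expected {expected!r}, got {got!r}")
```

Swapping `fake_model` for a function that calls each candidate model would let the same dataset compare them on accuracy. Latency and cost per call could be recorded in the same loop.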