AI Newsletter #007 (2023/07/03 – 2023/07/09)

Big Company

Google: Plans to Train Its AI with the Entire Public Internet
Google has updated its privacy policy to allow it to use publicly available information to train its AI models, such as Google Translate, Bard, and Cloud AI capabilities. This means that anything posted online, including on Google products, can be used to improve its AI tools. Google is now using large language models (LLMs) to power its new AI projects.

OpenAI: ChatGPT’s Explosive Growth Shows First Decline in Traffic Since Launch
ChatGPT, a popular AI chatbot, experienced a decline in website traffic and unique visitors for the first time in June, according to analytics firm Similarweb. This is likely due to the novelty of the chatbot wearing off and an increased demand for generative AI with real-time information.

OpenAI: Makes GPT-4 Generally Available
OpenAI has released GPT-4, its latest text-generating model, through its API. GPT-4 can generate text (including code) and accept image and text inputs, and performs at a human level on various benchmarks. OpenAI is also making its DALL-E 2 and Whisper APIs generally available, and plans to deprecate old models. In the future, OpenAI will allow developers to fine-tune GPT-4 and GPT-3.5 Turbo with their own data.

OpenAI: Code Interpreter Comes to All ChatGPT Plus Users
OpenAI has released its Code Interpreter plug-in to ChatGPT Plus subscribers, allowing users to generate charts and maps, create data visualizations, analyze music playlists, create interactive HTML files, clean datasets, and extract color palettes from images.

New Product / Product Updates

MindOS: Create Autonomous AI Agents for Your Professional Tasks
MindOS is the professional agent version of Character AI. With MindOS, you can create your own agents or use ones created by others. MindOS lets you provide external documents or links to create a knowledge base for the agent. Unlike other AI agent projects that can take hours to converge, MindOS provides you with an answer in seconds.

LangChain: Lets You See What Your Agents are Thinking
Streamlit, an AI app development platform, has been integrated into LangChain, a top toolkit for LLM integration. This integration gives developers a deeper look into the ‘thought processes’ of their AI agents.

OpenAI: ChatGPT Users Abused Web Browsing Feature So OpenAI Has Turned It Off
OpenAI has disabled its “Browse with Bing” feature on ChatGPT due to security concerns, but some GPT-4 plugins still allow users to ask the chatbot to read URLs and PDFs, causing frustration among ChatGPT Plus users.


Research

This week, we introduce three research works aimed at extending the context length of large language models.

LongNet: Scaling Transformers to 1,000,000,000 Tokens
Scaling sequence length has become a critical demand in the era of large language models. However, existing methods struggle with either computational complexity or model expressivity, restricting the maximum sequence length.

The authors introduce LongNet, a Transformer variant that can scale sequence length to more than 1 billion tokens without sacrificing performance on shorter sequences. Specifically, they propose dilated attention, which expands the attentive field exponentially as the distance grows.

LongNet has significant advantages: 1) it has linear computational complexity and a logarithmic dependency between any two tokens; 2) it can serve as a distributed trainer for extremely long sequences; 3) its dilated attention is a drop-in replacement for standard attention and can be seamlessly integrated with existing Transformer-based optimizations.
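To make the sparsification pattern concrete, here is a toy sketch in Python. The function name, segment length, and dilation rate below are illustrative assumptions, not the authors' implementation; the actual LongNet mixes several (segment length, dilation) pairs with geometrically increasing values, which is what yields the linear overall complexity.

```python
def dilated_indices(seq_len, segment_len, dilation):
    """Toy sketch of dilated attention's sparsification (not the paper's code).

    The sequence is split into segments of length `segment_len`; within each
    segment, only every `dilation`-th position participates in attention, so
    per-segment cost drops from O(segment_len^2) toward
    O((segment_len / dilation)^2).
    """
    kept = []
    for start in range(0, seq_len, segment_len):
        end = min(start + segment_len, seq_len)
        kept.append(list(range(start, end, dilation)))
    return kept

# 16 tokens, segments of 8, dilation 2: each segment keeps 4 positions.
groups = dilated_indices(16, 8, 2)
```

In the full method, short segments use small (or no) dilation to keep local detail, while longer segments use larger dilation for coarse long-range coverage.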

Experimental results demonstrate that LongNet yields strong performance on both long-sequence modeling and general language tasks. This work opens up new possibilities for modeling very long sequences, e.g., treating a whole corpus or even the entire Internet as a sequence. The code is available at

Extending Context Window of Large Language Models Via Position Interpolation
Position Interpolation (PI) extends the context window of RoPE-based pretrained LLMs, such as LLaMA models from 7B to 65B, to up to 32,768 tokens with minimal fine-tuning (within 1,000 steps), while demonstrating strong empirical results on tasks that require long context, including passkey retrieval, language modeling, and long-document summarization.

Meanwhile, the extended model by Position Interpolation preserves quality relatively well on tasks within its original context window.

The key idea of Position Interpolation is to down-scale the position indices so that the maximum position index matches the context-window limit used during pre-training.

Models extended via Position Interpolation retain their original architecture and can reuse most pre-existing optimization and infrastructure.
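The down-scaling step itself is a one-liner; the sketch below is illustrative rather than the paper's code, and the concrete numbers (a 2,048-token pretrained window extended to 8,192 tokens) are assumed for the example. In a real model, RoPE is then evaluated at these scaled, generally non-integer positions instead of the raw indices.

```python
import numpy as np

def interpolate_positions(positions, pretrained_max, extended_max):
    """Position Interpolation sketch: linearly down-scale position indices so
    that positions in [0, extended_max) map into the pre-training range
    [0, pretrained_max)."""
    return positions * (pretrained_max / extended_max)

# Illustrative: extend a 2,048-token pretrained window to 8,192 tokens.
pos = np.arange(8192, dtype=np.float64)
scaled = interpolate_positions(pos, 2048, 8192)  # every index shrinks by 4x
```

Because every scaled index stays within the range seen during pre-training, the model never has to extrapolate to unseen position values, which is why only light fine-tuning is needed.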

Focused Transformer: Contrastive Training for Context Scaling
The Focused Transformer (FoT) stems from the simple idea of endowing an attention layer with access to an external memory of (key, value) pairs. Yet, as the number of documents increases, the proportion of relevant keys to irrelevant ones decreases, leading the model to focus more on the irrelevant keys.

The authors identify a significant challenge, dubbed the distraction issue, where keys linked to different semantic values might overlap, making them hard to distinguish.

To tackle this problem, the authors introduce FoT, a technique that employs a training process inspired by contrastive learning. This novel approach enhances the structure of the (key, value) space, enabling an extension of the context length.
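As a rough illustration, such a training objective can be sketched as a generic InfoNCE-style contrastive loss over keys. This is a deliberate simplification under our own assumptions (function name, temperature, and the choice of negatives are all illustrative), not the exact FoT objective:

```python
import numpy as np

def contrastive_key_loss(query, pos_key, neg_keys, temperature=0.1):
    """Generic InfoNCE-style loss sketch (not the exact FoT objective).

    The query is pulled toward its semantically relevant key (`pos_key`)
    and pushed away from keys drawn from unrelated documents (`neg_keys`),
    making (key, value) entries easier to tell apart as the external
    memory grows -- the paper's 'distraction issue'.
    """
    sims = np.array([query @ pos_key] + [query @ k for k in neg_keys])
    sims = sims / temperature
    sims -= sims.max()                        # numerical stability
    probs = np.exp(sims) / np.exp(sims).sum()
    return -np.log(probs[0])                  # low when the positive wins
```

The loss is small when the query's relevant key dominates the softmax over all candidates, and large when an irrelevant key scores higher.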

The resulting models, named LongLLaMA and fine-tuned from 3B and 7B OpenLLaMA checkpoints, exhibit advancements on tasks requiring a long context. The authors further illustrate that LongLLaMA models adeptly manage a 256k context length for passkey retrieval.

Want to receive TechNavi Newsletter in email?