AI Newsletter #003 (2023/06/05-2023/06/11)

Big Companies

Apple: AI in WWDC

At WWDC 2023, Apple showcased the integration of AI into its upcoming products, including iOS, iPadOS, macOS, AirPods, and the Apple Vision Pro augmented reality headset. Here are the major AI-powered features:

1. In iOS 17, the keyboard leverages a Transformer language model, which is state-of-the-art for word prediction, making autocorrect more accurate than ever.
2. In iOS 17, the new app Journal uses on-device machine learning to create personalized suggestions of moments to inspire your writing.
3. The AirPods will use machine learning to adjust to the user’s environment.
4. watchOS 10 introduces a new way to enjoy the watch face. Turn the Digital Crown to reveal widgets in a Smart Stack, which uses machine learning to show you relevant information right when you need it.
5. In order to enable users to see each other during video communication, Apple Vision Pro uses an advanced encoder-decoder neural network to create a digital persona that represents your facial and hand movements.

Watch WWDC 2023 at:

Google: A Big Week of Partnerships

Google Cloud announced three major strategic partnerships this week with Priceline, Mayo Clinic, and Salesforce.

Google and Priceline have partnered to provide customers with AI-powered tools to help them find hotels in New York near attractions. The AI technology will be able to generate text as if it were written by a human and extract accurate information from existing data.
Read more at:

Google Cloud and Mayo Clinic have partnered to use AI technologies in healthcare, providing medical professionals with quick access to patient information. This is the first step in a larger collaboration between the two organizations to develop AI applications for healthcare.
Read more at:

Google Cloud and Salesforce have announced a strategic partnership to help businesses leverage data and AI. The partnership includes products and services such as Google’s BigQuery tooling, Salesforce’s Data Cloud, and Google’s Vertex AI.
Read more at:

Zoom and Cisco: Adding AI Features to Video Conferences

On June 5th, Zoom introduced two new generative AI-powered Zoom IQ features: Zoom Meeting Summary and Zoom Team Chat Compose. Two days later, Cisco announced its plan for “intelligent meeting recaps”. Microsoft Teams and Salesforce’s Slack had already added GPT-powered features to their products.

Read more at:


Funding Rounds This Week

AI startup Cohere raises funds from Nvidia, valued at $2.2 billion
Cohere, an AI foundation model company based in Toronto, has raised $270 million in a funding round from investors such as Nvidia, Oracle and Salesforce Ventures, and is now valued at $2.2 billion.

Read more at:


Interesting Product Updates

Personalized Experience: The New Trend of AI Chatbot?

According to leaked information, ChatGPT will get two new features: profiles and file uploading. Profiles are expected to give users a personalized chat experience, and file uploading is expected to let ChatGPT receive and analyze files.

Perplexity also announced that it will add a profile feature.

Read more at:

Magic AI’s LTM-1: an LLM with a 5,000,000 token context window

Magic is a code-generation platform, and it is releasing LTM-1, which enables context windows 50x larger than those of transformers. This means Magic can see an entire code repository at once.

Product Page:


Interesting Projects


OpenDan is an open-source personal AI OS. It provides a runtime environment for various AI modules as well as protocols for interoperability between them. With OpenDan, users can securely collaborate with various AI modules using their private data to create powerful personal AI agents, such as a butler, lawyer, doctor, teacher, assistant, girlfriend or boyfriend.

This video explains well what OpenDan is and the difference between OpenDan and ChatGPT:

The project page:

Mr. Ranedeer AI Tutor

With just a paragraph as a prompt, GPT-4 can become an omnipotent tutor. The Mr. Ranedeer project has gathered more than 13k stars on GitHub. With this project, the learning subject, depth of knowledge, and communication style can all be adjusted, allowing AI to help you learn any subject, 24/7, without losing patience.

It’s worth noting that the author behind this project is a 17-year-old high school student.

Project Page:


Research of The Week

Fine-Tuning Language Models with Just Forward Passes

Fine-tuning language models has demonstrated outstanding performance in various downstream tasks. However, the parameter size of these massive models often reaches billions or even hundreds of billions, requiring a significant amount of memory to train. Additionally, traditional backpropagation methods exhibit slow optimization in such large-scale models.

The authors of this paper propose MeZO, a memory-efficient zeroth-order optimizer. By improving the classical Zeroth-Order Stochastic Gradient Descent (ZO-SGD) method to operate in-place, this approach achieves the same memory usage as the inference stage, making fine-tuning language models more efficient. For instance, using a single A100 80GB GPU, MeZO can train a model with 30 billion parameters, while traditional backpropagation methods can only train models with 2.7 billion parameters under the same budget.
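The in-place idea can be illustrated with a toy example. The sketch below is not the authors' code: the function names, the hyperparameters, and the quadratic loss standing in for a language model are all assumptions made for illustration. It shows the key trick as described above: the random perturbation is never stored, only its seed, so the same noise can be regenerated to perturb, restore, and update the parameters using nothing but forward passes.

```python
import random


def perturb(params, scale, seed):
    """Add scale * z to params in place; z ~ N(0, 1) is regenerated from seed."""
    rng = random.Random(seed)
    for i in range(len(params)):
        params[i] += scale * rng.gauss(0.0, 1.0)


def mezo_step(params, loss_fn, lr=0.01, eps=1e-3, seed=0):
    """One zeroth-order SGD step: two forward passes, no stored gradient or noise."""
    perturb(params, +eps, seed)            # params = theta + eps * z
    loss_plus = loss_fn(params)
    perturb(params, -2.0 * eps, seed)      # params = theta - eps * z
    loss_minus = loss_fn(params)
    perturb(params, +eps, seed)            # restore params = theta
    # Scalar estimate of the gradient projected onto the direction z.
    projected_grad = (loss_plus - loss_minus) / (2.0 * eps)
    # Update: theta <- theta - lr * projected_grad * z, reusing the same seed.
    perturb(params, -lr * projected_grad, seed)
    return projected_grad


def quadratic_loss(params):
    """Hypothetical stand-in for a model's loss; minimized at all params == 3."""
    return sum((p - 3.0) ** 2 for p in params)


def run_demo(steps=2000):
    """Run repeated steps on the toy loss and return (initial, final) loss."""
    params = [0.0, 0.0, 0.0, 0.0]
    start = quadratic_loss(params)
    for step in range(steps):
        mezo_step(params, quadratic_loss, lr=0.01, eps=1e-3, seed=step)
    return start, quadratic_loss(params)


if __name__ == "__main__":
    start, final = run_demo()
    print("loss before:", start, "loss after:", final)
```

Because only a seed and two scalar losses are kept between passes, the extra memory beyond the parameters themselves is constant, which is the property that lets this style of optimizer match inference-time memory usage.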

As shown in the figure below, the authors ran experiments on the OPT-13B model. Despite using only about 1/12 of the memory of standard fine-tuning with backpropagation, MeZO outperformed zero-shot prompting and in-context learning (ICL) on seven tasks.

Read the full paper at:

White-Box Transformers via Sparse Rate Reduction

In this paper, the researchers argue that the objective of representation learning is to compress and transform the distribution of data (such as token sets) towards a mixture of low-dimensional Gaussian distributions supported on incoherent subspaces. The quality of the final representation can be measured by a unified objective function called sparse rate reduction.

From this perspective, popular deep network models like Transformer can naturally be seen as realizing iterative schemes to progressively optimize this objective.

Specifically, the research findings indicate that standard Transformer blocks can be derived from alternating optimization of complementary parts of this objective: the multi-head self-attention operator can be viewed as a gradient descent step that compresses the token set by minimizing its lossy coding rate, while the subsequent multi-layer perceptron attempts to sparsify the representation of tokens.
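For readers who want to see the shape of the objective, it can be sketched roughly as below. This is our reading of the paper's formulation rather than a verbatim quote, and the notation is an assumption: Z is the d × N matrix of token representations, U_1, ..., U_K are bases of K subspaces of dimension p, ε is a quantization precision, and λ weighs the sparsity term.

```latex
% Sparse rate reduction (sketch): expand the whole token set, compress it
% against the subspaces U_k, and keep the representation sparse.
\max_{f}\; \mathbb{E}_{Z = f(X)}\!\left[ R(Z) \;-\; R^{c}(Z; U_{[K]}) \;-\; \lambda \lVert Z \rVert_{0} \right]

% Lossy coding rate of the token set Z \in \mathbb{R}^{d \times N}:
R(Z) = \frac{1}{2} \log\det\!\left( I + \frac{d}{N \varepsilon^{2}}\, Z Z^{\top} \right)

% Coding rate of the tokens projected onto the K subspaces:
R^{c}(Z; U_{[K]}) = \sum_{k=1}^{K} \frac{1}{2} \log\det\!\left( I + \frac{p}{N \varepsilon^{2}}\, (U_{k}^{\top} Z)(U_{k}^{\top} Z)^{\top} \right)
```

In this reading, the self-attention step corresponds to decreasing the R^c term, and the MLP-like step corresponds to the sparsity penalty on Z.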

This leads to a family of white-box transformer-like deep network architectures which are mathematically fully interpretable. Despite their simplicity, experiments show that these networks indeed learn to optimize the designed objective: they compress and sparsify representations of large-scale real-world vision datasets such as ImageNet, and achieve performance very close to thoroughly engineered transformers such as ViT.

Find the paper at:

StyleDrop: Text-to-Image Generation in Any Style

Text-to-image models trained on large sets of image and text pairs have enabled the creation of rich and diverse images encompassing many genres and themes. Much effort has gone into “prompt engineering” to convey the style of the image, such as adding “Van Gogh”, “anime” or “steampunk” to the prompt. However, a wide range of styles are simply hard to describe in text form, due to the nuances of color schemes, illumination and other characteristics.

In this paper, the authors introduce StyleDrop, which enables a significantly higher level of stylized text-to-image synthesis, using as few as one image as an example of a given style. Experiments show that StyleDrop achieves unprecedented accuracy and fidelity in stylized image synthesis.

Read the full paper at:

Want to receive the TechNavi Newsletter by email?