Recently, I was assigned to build a RAG system to mitigate LLM hallucination on proprietary documents. As project resources, I had pre-trained BERT/RoBERTa models and an API to a gte model that serves as an encoding service, so I didn't need to build the system from scratch. Here is what I learned from this project: the obstacles I hit and the decisions I made.
RAG, a bird’s-eye view.
Let’s quickly review RAG for those who aren’t acquainted with it yet. If you already know it, feel free to skip this section.
RAG, Retrieval-Augmented Generation, is a practical approach to alleviating the hallucination problem of LLMs. Hallucination refers to the situation where an LLM cannot respond to a user’s input correctly, and may even concoct misinformation in a confident tone. This usually happens when the LLM meets a question that is not covered by its training data. RAG addresses this by furnishing the LLM with relevant information, retrieved from a document store and embedded into the user’s prompt as context. The approach was first formulated by Lewis et al. (2020), a team whose members are from Facebook, UCL, and NYU. The name indicates that the system comprises three components (or processes): Retrieval, Augmentation, and Generation. Get each part working and you have an LLM that can speak the jargon of your proprietary documents.
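To make the three stages concrete, here is a minimal sketch in Python. The `encode` and `generate` functions are hypothetical placeholders (not real library calls) standing in for an embedding service, such as the gte API mentioned above, and an LLM call; the retrieval step is plain cosine similarity over document embeddings.

```python
import numpy as np

# Hypothetical stand-ins: wire these to your own encoding service and LLM.
def encode(text: str) -> np.ndarray:
    """Placeholder: return an embedding vector for `text`."""
    raise NotImplementedError("connect to your encoding service")

def generate(prompt: str) -> str:
    """Placeholder: return the LLM's completion for `prompt`."""
    raise NotImplementedError("connect to your LLM")

def rag_answer(question: str, docs: list[str], top_k: int = 3) -> str:
    # 1. Retrieval: rank documents by cosine similarity to the question.
    q = encode(question)
    scored = []
    for doc in docs:
        d = encode(doc)
        score = float(np.dot(q, d) / (np.linalg.norm(q) * np.linalg.norm(d)))
        scored.append((score, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    context = "\n\n".join(doc for _, doc in scored[:top_k])

    # 2. Augmentation: embed the retrieved passages into the prompt.
    prompt = (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

    # 3. Generation: let the LLM answer with the augmented prompt.
    return generate(prompt)
```

In a production system the per-query encoding loop would of course be replaced by a pre-built vector index, but the three stages stay the same.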