Python Notebook Setup
I used this workflow to summarize roughly twenty posts from my old blog.
The responses from OpenAI were slow (summarizing this much text is a lot of work), and LangChain's default timeout is ten minutes, so I ended up running the workflow in batches of five articles at a time.
Grouping similar articles together improved the quality of the responses, and I was very pleased with the intermediate steps: the final output text was good, but the intermediate steps contained a level of detail I didn't expect.
- LangChain and its prerequisites installed. See the LangChain installation docs.
- A folder containing the `.txt` files you want to summarize.
- An API key from OpenAI
- A file named `constants.py` to store the API key.
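The `constants.py` file only needs to hold the key. The variable name `APIKEY` below is an assumption; use whatever name your loading code expects.

```python
# constants.py
# Keeps the OpenAI API key out of the notebook itself.
# The variable name APIKEY is a placeholder; match it in your loading code.
APIKEY = "sk-..."  # your OpenAI API key
```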
Load the API key and documents.
The articles are stored as plain `.txt` files, one per post.
Define the map prompt and store it in a map_chain so we can run it later.
Do the same for the reduce_chain.
This last block does the following:
- Initialize chains for combining, mapping, and reducing documents using LangChain modules.
- Split the input documents into manageable chunks using a character-based text splitter.
- Execute the map-reduce chain on the split documents and retrieve the output text and intermediate steps.
For brevity, here are the intermediate steps for the first document only.
This modular workflow is easy to adapt and extend: modify the prompts to suit your research needs, or explore other document types by swapping in different document loaders.