Suppose we are given a stream of characters and we need to find the first non-repeating character from the stream. We need to tell the first non-repeating character in O(1) time at any moment in python for data science.
We will use a DLL (doubly linked list) for efficiently getting the first non-repeating character from the stream. DLL will contain all the non-repeating characters in order.
We will maintain two arrays. One array will maintain the characters which we have already visited two or more times; we will call it repeated. The other array consists of pointers to linked list nodes, we will call it DLL. Both the arrays will be of equal size, i.e., the size of the alphabet, which is 256.
Steps for finding the character –
- First, we will create an empty DLL. We will also create two arrays in DLL and repeated of size 256. repeated[x] will be true in case x is repeated two or more times. In DLL[x] we have pointers to a DLL node in case the character x is present in DLL otherwise, it will be NULL.
- We will next initialize all the entries of inDLL as Null and repeated as false.
- To obtain the first non-repeating character, we need to return the character at the head of the DLL.
- We need to take the following steps in order to process a new character ‘x’ in a stream.
- In case the repeated[x] is true, we need to ignore the character.
- If repeated[x] is false and inDLL[x] is NULL, we will append x to DLL and then store the address of new DLL node in the inDLL[x].
- In case repeated[x] is false and inDLL[x] is not NULL, we will get the DLL node of x by using inDLL[x] and removing the node. We will also mark inDLL[x] as NULL and repeated[x] as true.
In case we maintain a tail pointer, we will append a new node to DLL is O(1). If we remove a node from DLL also it will be O(1). So, in both the operations addition of new character and finding the first noon-repeating character will take O(1) time.
Below we will see the implementation of this in Python.