Extract a value with the help of Beautiful Soup


I want to fetch the stock price from the website http://www.bseindia.com/. For example, the stock price appears as "S&P BSE :25,489.57", and I want to fetch only the numeric part, as "25489.57".

This is the code I have written so far. It fetches the entire div in which this amount appears, but not the amount itself.

Below is the code:

from bs4 import BeautifulSoup
from urllib.request import urlopen

page = "http://www.bseindia.com"

html_page = urlopen(page)

html_text = html_page.read()
soup = BeautifulSoup(html_text,"html.parser")
divtag = soup.find_all("div",{"class":"sensexquotearea"})
for oye in divtag:
    tdidTags = oye.find_all("div", {"class": "sensexvalue2"})

    for tag in tdidTags:
        tdTags = tag.find_all("div",{"class":"newsensexvaluearea"})
        for newtag in tdTags:
            tdnewtags = newtag.find_all("div",{"class":"sensextext"})
            for rakesh in tdnewtags:
                tdtdsp1 = rakesh.find_all("div",{"id":"tdsp"})
                for texts in tdtdsp1:
                    print(texts)  # prints the whole <div>, not just the amount

I had a look at what happens when that page loads the information, and I was able to simulate in Python what the JavaScript is doing.

I found that it references a page called IndexMovers.aspx?ln=en, located at http://www.bseindia.com/Msource/IndexMovers.aspx?ln=en.

It looks like this page is a comma-separated list of values. First comes the name, next comes the price, and then a few other fields you don't care about.

To simulate this in Python, we request the page, split it on the commas, then step through the list six values at a time, adding each sixth value (the name) and the value right after it (the price) to a new list called stockInformation.

Now we can just loop through stockInformation and get the name using item[0] and the price with item[1].

import requests

newUrl = "http://www.bseindia.com/Msource/IndexMovers.aspx?ln=en"
response = requests.get(newUrl).text
commaItems = response.split(",")

#create list of stocks, each one containing information
#index 0 is the name, index 1 is the price
#the last item is not included because for some reason it has no price info on indexMovers page
stockInformation = []
for i, item in enumerate(commaItems[:-1]):
    if i % 6 == 0:
        newList = [item, commaItems[i+1]]
        stockInformation.append(newList)

#print each item and its price from your list
for item in stockInformation:
    print(item[0], "has a price of", item[1])

This prints out:

S&P BSE SENSEX has a price of 25489.57
SENSEX#S&P BSE 100 has a price of 7944.50
BSE-100#S&P BSE 200 has a price of 3315.87
BSE-200#S&P BSE MidCap has a price of 11156.07
MIDCAP#S&P BSE SmallCap has a price of 11113.30
SMLCAP#S&P BSE 500 has a price of 10399.54
BSE-500#S&P BSE GREENEX has a price of 2234.30
GREENX#S&P BSE CARBONEX has a price of 1283.85
CARBON#S&P BSE India Infrastructure Index has a price of 152.35
INFRA#S&P BSE CPSE has a price of 1190.25
CPSE#S&P BSE IPO has a price of 3038.32
#and many more... (total of 40 items)

These clearly match the values shown on the page.
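In the printed output, every name after the first carries a short code and a "#" in front of it (for example "SENSEX#S&P BSE 100"), which appears to be left over from the previous record. Here is a minimal sketch of stripping that prefix, assuming the "#" always separates the leftover code from the index name. The helper clean_name is hypothetical and not part of the original code, and the sample pairs are copied from the output shown.

```python
def clean_name(raw_name):
    """Return the text after the last '#', or the whole name if there is none."""
    # Assumption: anything before '#' belongs to the previous record.
    return raw_name.rsplit("#", 1)[-1].strip()


# Sample [name, price] pairs copied from the printed output
sample = [
    ["S&P BSE SENSEX", "25489.57"],
    ["SENSEX#S&P BSE 100", "7944.50"],
    ["BSE-100#S&P BSE 200", "3315.87"],
]

cleaned = [[clean_name(name), price] for name, price in sample]
for name, price in cleaned:
    print(name, "has a price of", price)
```

The same call could be applied to item[0] inside the printing loop if the prefix turns out to be consistent across the whole response.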

So there you have it: you can simulate exactly what the JavaScript on that page does to load the information. In fact, you now have even more information than was shown on the page, and the request is faster because we download just the data, not all the extraneous HTML.
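If you need the price as a number rather than a string, for example when starting from text such as "S&P BSE :25,489.57" as in the question, a sketch along these lines may help. It assumes the price is the last run of digits, optional commas, and an optional decimal part in the text. The helper name extract_price is hypothetical.

```python
import re


def extract_price(text):
    """Return the last number found in text as a float, ignoring thousands separators."""
    # Assumption: prices look like 25,489.57 or 7944.50 (commas optional).
    matches = re.findall(r"\d[\d,]*(?:\.\d+)?", text)
    if not matches:
        raise ValueError("no number found in %r" % text)
    return float(matches[-1].replace(",", ""))


print(extract_price("S&P BSE :25,489.57"))
print(extract_price("7944.50"))
```

Taking the last match rather than the first is a deliberate choice here, since some index names (such as "S&P BSE 100") contain digits of their own.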