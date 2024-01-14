SAP BusinessObjects is a widely used business intelligence platform for turning raw data into actionable insights. Its interface covers most needs, but accessing and managing its content programmatically offers even greater flexibility. This blog post shows, step by step, how to retrieve the list of documents from SAP BusinessObjects with Python.

Why This Matters

After years of operation, an SAP BusinessObjects installation can accumulate a cluttered mess of documents and folders. Cleaning up this chaos is crucial for data teams. Using Python to retrieve details such as the path, last modified date, and status of every document gives you the inventory you need to decide what to clean up.

Python Solution

Part 1: Authentication

To authenticate, replace the placeholder values for "username", "password", and "localhost" with your own configuration details.

import requests
import pandas as pd
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# define the login request parameters
username = 'username'
password = 'password'
localhost = 'localhost'

auth_type = 'secEnterprise'
login_url = 'http://{}:6405/biprws/logon/long'.format(localhost)
# escape the credentials so characters such as & or < do not break the XML body
login_data = (
    '<attrs xmlns="http://www.sap.com/rws/bip">'
    f'<attr name="userName" type="string">{escape(username)}</attr>'
    f'<attr name="password" type="string">{escape(password)}</attr>'
    '<attr name="auth" type="string" possibilities="secEnterprise,secLDAP,secWinAD,secSAPR3">'
    f'{auth_type}</attr></attrs>'
)
login_headers = {'Content-Type': 'application/xml', 'Accept': 'application/xml'}

# send the login request and stop early if the server rejects it
login_response = requests.post(login_url, headers=login_headers, data=login_data)
login_response.raise_for_status()

# parse the XML response and retrieve the logonToken
root = ET.fromstring(login_response.text)
logon_token = root.find('.//{http://www.sap.com/rws/bip}attr[@name="logonToken"]').text
# ask for XML explicitly, since the later steps parse XML responses
api_headers = {'Content-Type': 'application/xml', 'Accept': 'application/xml', 'X-SAP-LogonToken': logon_token}

This code handles the initial authentication and establishes an authenticated session with the server. It configures the user credentials, server details, and authentication type, then sends a POST request to the login URL. The XML response from the server is parsed to extract the crucial logonToken. This token is then used to build the headers for subsequent API requests, ensuring authenticated access to SAP BusinessObjects.

Part 2: Data Retrieval and DataFrame Creation

Previewing Retrieved Data: First Document's Name

Before retrieving everything from SAP BusinessObjects, a peek at the returned information reveals its structure. This Python snippet fetches the document list from the server. If you run the code, it prints the name of the first document.

url = "http://{}:6405/biprws/raylight/v1/documents/".format(localhost)
# pass the headers by keyword; the second positional argument of requests.get is params
response = requests.get(url, headers=api_headers)
root = ET.fromstring(response.text)

# the third child element of each document entry holds its name
first_docu_key = root.findall('document')[0][2].tag
first_docu_item = root.findall('document')[0][2].text
print(first_docu_key, ":", first_docu_item)

Data Transformation Functions: Transform to DataFrame

The Python functions get_dataframe_from_response and get_all_dataframe work together to simplify SAP BusinessObjects data retrieval. The first function transforms the XML data into a structured pandas DataFrame that captures the document attributes. The second function handles cases where the number of documents exceeds a single request's limit: it requests pages of 50 documents and concatenates the resulting DataFrames. Together, these functions streamline the conversion of XML to a DataFrame and provide an easy way to handle a large number of documents.

def get_dataframe_from_response(response):
    # Parse the XML data
    root = ET.fromstring(response.text)
    # Extract the data into a list of dictionaries
    res = []
    for item in root.findall('document'):
        doc_dict = {}
        # item.iter() visits the document element itself and all of its descendants
        for elem in item.iter():
            if elem.text is not None:
                doc_dict[elem.tag] = elem.text
        res.append(doc_dict)
    # Convert the list of dictionaries to a pandas dataframe
    df = pd.DataFrame(res)
    return df

def get_all_dataframe(url, page_size=50, max_pages=50):
    documents = []
    # request one page at a time until an empty page comes back (or max_pages is reached)
    for i in range(max_pages):
        offset = i * page_size
        url_offset = url + "?offset={}&limit={}".format(offset, page_size)
        response = requests.get(url_offset, headers=api_headers)
        df = get_dataframe_from_response(response=response)
        if df.empty:
            break
        documents.append(df)
    # pd.concat raises ValueError on an empty list, so handle the no-documents case
    if not documents:
        return pd.DataFrame()
    return pd.concat(documents, axis=0, ignore_index=True)

With these in place, retrieving detailed information about SAP BusinessObjects documents takes a single line of Python code. Call the get_all_dataframe function, and the resulting df_documents DataFrame provides a straightforward overview of the document attributes.

url = "http://{}:6405/biprws/raylight/v1/documents/".format(localhost)
df_documents = get_all_dataframe(url=url)
print(df_documents.head())

Showcasing df_documents

Here is a glimpse of the DataFrame structure:

document  id     cuid                name              folderId  description
          10283  AfZQen_U5hGgHqB8    Revenue Report    10782     NaN
          12012  AUgbex_JocxFfvSFw   Sales Report      11931     NaN
          12435  AaGqyXfPrFIuC1Eac   Cost Report       11965     NaN
          11232  ATvl8iD_ii2HdxkKEY  Inventory Report  11038     NaN
          11023  cyslJAAy.JAJBB13hE  Finance Report    11021     NaN

Part 3: Document Details Extraction

If you need additional details such as the document's path, last updated time, scheduling status, size, and refresh status, use the following functions. They fetch the specified details for each document in the df_documents DataFrame, providing a more comprehensive overview of each entry.

def get_document_details(document_id):
    # fetch the full detail record of one document as JSON
    url = 'http://{}:6405/biprws/raylight/v1/documents/{}'.format(localhost, document_id)
    res = requests.get(url, headers={
        "Accept": "application/json",
        "Content-Type": "application/json",
        "X-SAP-LogonToken": logon_token
    }).json()
    return res['document']

def get_more_information_from_documents(df):
    details = ['path', 'updated', 'scheduled', 'size', 'refreshOnOpen']
    # work on a copy so the input DataFrame is left unchanged
    df = df.copy()
    # one request per document; every detail column is filled from the same response
    records = [get_document_details(document_id) for document_id in df['id'].values]
    for detail in details:
        df[detail] = [record[detail] for record in records]
    return df

df_documents_more_info = get_more_information_from_documents(df_documents)

Showcasing df_documents_more_info

document  id     cuid                name              folderId  description  path                      updated                   scheduled  size    refreshOnOpen
          10283  AfZQen_U5hGgHqB8    Revenue Report    10782     NaN          Public Folders/Test       2023-06-04T08:24:23.461Z  false      64613   true
          12012  AUgbex_JocxFfvSFw   Sales Report      11931     NaN          Public Folders/Test       2023-06-04T08:30:17.907Z  false      64481   true
          12435  AaGqyXfPrFIuC1Eac   Cost Report       11965     NaN          Public Folders/Test       2020-06-22T02:06:55.858Z  false      65471   true
          11232  ATvl8iD_ii2HdxkKEY  Inventory Report  11038     NaN          Public Folders/Test       2023-07-17T08:06:38.444Z  false      171294  true
          11023  cyslJAAy.JAJBB13hE  Finance Report    11021     NaN          Public Folders/Test/Test  2023-07-08T03:04:05.241Z  false      168952  true

Thank you for taking the time to explore data-related insights with me.
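
As a postscript on the cleanup goal from the start of this post: the updated column can be used to flag documents that have not been modified since some cutoff date. The snippet below is only a minimal sketch of that idea. It assumes the timestamps follow the format shown in the rows above, and the helper names parse_updated and find_stale_documents, as well as the cutoff date, are hypothetical and not part of SAP BusinessObjects. It works on plain dictionaries so it runs without a server; with pandas, rows in this shape could be obtained via df_documents_more_info.to_dict("records").

```python
from datetime import datetime, timezone

def parse_updated(value):
    """Parse a timestamp such as '2023-06-04T08:24:23.461Z' into a UTC datetime."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)

def find_stale_documents(records, cutoff):
    """Return the records whose 'updated' timestamp is earlier than the cutoff."""
    return [r for r in records if parse_updated(r["updated"]) < cutoff]

# Sample rows copied from the output shown above (hypothetical stand-in for real data)
records = [
    {"id": "10283", "name": "Revenue Report", "updated": "2023-06-04T08:24:23.461Z"},
    {"id": "12435", "name": "Cost Report", "updated": "2020-06-22T02:06:55.858Z"},
    {"id": "11232", "name": "Inventory Report", "updated": "2023-07-17T08:06:38.444Z"},
]

# Hypothetical cutoff: anything not touched since the start of 2022 is a candidate
cutoff = datetime(2022, 1, 1, tzinfo=timezone.utc)
stale = find_stale_documents(records, cutoff)
print([r["name"] for r in stale])
```

A list like this is only a starting point for review; whether a document is safe to remove still depends on who uses it and whether it is scheduled.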