Production Level Knowledge & Tips

  1. 未分類
  2. 7 view

[Haskell][XML][CompleteSuggestionAPI]Parse result xml of Google Complete Suggestion API with Haskell

Target

Try to parse result XML of Google Complete Suggestion API with Haskell.

Environment

  • OS
    • Linux 2.6.32-279.el6.x86_64 #1 SMP Fri Jun 22 12:19:21 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
  • GHC
    • 7.4.1

Investigate way to achieve this

Access to the API

We can access the API like following URL.
Just HTTP GET is okay.
http://suggestqueries.google.com/complete/search?hl=ja&ie=utf_8&oe=utf_8&client=toolbar&q=[search words]
Network.HTTP is useful to do HTTP GET with Haskell.
import Control.Monad.Trans (liftIO)
import Network.HTTP

main = do
  resultxml <- liftIO $ (Network.HTTP.simpleHTTP (Network.HTTP.getRequest ("http://suggestqueries.google.com/complete/search?hl=ja&ie=utf_8&oe=utf_8&client=toolbar&q=" ++ urlEncode("haskell cabal"))) >>= getResponseBody)
  putStrLn resultxml
This code use following methods.
Prelude> import Network.HTTP
Prelude Network.HTTP> :t Network.HTTP.simpleHTTP
Network.HTTP.simpleHTTP
  :: HStream ty =>
     Request ty -> IO (Network.Stream.Result (Response ty))

Prelude Network.HTTP> :t Network.HTTP.getRequest
Network.HTTP.getRequest :: String -> Request_String

Prelude Network.HTTP> :t getResponseBody
getResponseBody :: Network.Stream.Result (Response ty) -> IO ty

Prelude Network.HTTP> :t urlEncode
urlEncode :: String -> String
The code’s result is following.
<?xml version="1.0"?>
<toplevel>
    <CompleteSuggestion>
        <suggestion data="haskell cabal" />
    </CompleteSuggestion>
    <CompleteSuggestion>
        <suggestion data="haskell cabal sandbox" />
    </CompleteSuggestion>
    <CompleteSuggestion>
        <suggestion data="haskell cabal install" />
    </CompleteSuggestion>
    <CompleteSuggestion>
        <suggestion data="haskell cabal ã¢ãããã¼ã" />
    </CompleteSuggestion>
    <CompleteSuggestion>
        <suggestion data="haskell cabal windows" />
    </CompleteSuggestion>
    <CompleteSuggestion>
        <suggestion data="haskell cabal ä¾å­é¢ä¿" />
    </CompleteSuggestion>
    <CompleteSuggestion>
        <suggestion data="haskell cabal ä½¿ã                                                                                                                                                                                          æ¹" />
    </CompleteSuggestion>
    <CompleteSuggestion>
        <suggestion data="haskell cabal ã¢ã³ã¤ã³ã¹ãã¼ã«" />
    </CompleteSuggestion>
    <CompleteSuggestion>
        <suggestion data="haskell cabal proxy" />
    </CompleteSuggestion>
    <CompleteSuggestion>
        <suggestion data="haskell cabal github" />
    </CompleteSuggestion>
</toplevel>
Well, it’s garbled…
Following way to fix it is simple.
Just using Codec.Binary.UTF8.String.
import Control.Monad.Trans (liftIO)
import Network.HTTP
import Codec.Binary.UTF8.String as S

main = do
  resultxml <- liftIO $ (Network.HTTP.simpleHTTP (Network.HTTP.getRequest ("http://suggestqueries.google.com/complete/search?hl=ja&ie=utf_8&oe=utf_8&client=toolbar&q=" ++ urlEncode("haskell cabal"))) >>= getResponseBody)
  putStrLn $ S.decodeString resultxml
“decodeString” function’s type is following.
Prelude> import Codec.Binary.UTF8.String as S
Prelude S> :t S.decodeString
S.decodeString :: String -> String
The result is following.
<?xml version="1.0"?>
<toplevel>
    <CompleteSuggestion>
        <suggestion data="haskell cabal" />
    </CompleteSuggestion>
    <CompleteSuggestion>
        <suggestion data="haskell cabal sandbox" />
    </CompleteSuggestion>
    <CompleteSuggestion>
        <suggestion data="haskell cabal install" />
    </CompleteSuggestion>
    <CompleteSuggestion>
        <suggestion data="haskell cabal アップデート" />
    </CompleteSuggestion>
    <CompleteSuggestion>
        <suggestion data="haskell cabal windows" />
    </CompleteSuggestion>
    <CompleteSuggestion>
        <suggestion data="haskell cabal 依存関係" />
    </CompleteSuggestion>
    <CompleteSuggestion>
        <suggestion data="haskell cabal 使い方" />
    </CompleteSuggestion>
    <CompleteSuggestion>
        <suggestion data="haskell cabal アンインストール" />
    </CompleteSuggestion>
    <CompleteSuggestion>
        <suggestion data="haskell cabal proxy" />
    </CompleteSuggestion>
    <CompleteSuggestion>
        <suggestion data="haskell cabal github" />
    </CompleteSuggestion>
</toplevel>
HTTP GET part was done.

Parse XML

For simplicity, I’ll think following simple XML doc.
<?xml version="1.0"?>
<toplevel>
    <CompleteSuggestion>
        <suggestion data="haskell cabal" />
    </CompleteSuggestion>
</toplevel>
In this part, I’ll use Text.XML.Light module.
It provides simple methods. To begin with, I’ll pick up “CompleteSuggestion” element with following code.
import Text.XML.Light

main = do
  case parseXMLDoc "<?xml version=\"1.0\"?><toplevel><CompleteSuggestion><suggestion data=\"haskell cabal\" /></CompleteSuggestion></toplevel>" of
    Nothing   -> putStrLn "Error"
    Just root -> putStrLn $ show $ findChildren (unqual "CompleteSuggestion") root
This codes using following methods.
Prelude> import Text.XML.Light
Prelude Text.XML.Light> :t parseXMLDoc
parseXMLDoc
  :: Text.XML.Light.Lexer.XmlSource s => s -> Maybe Element
Prelude Text.XML.Light> parseXMLDoc "<?xml version=\"1.0\"?><toplevel><CompleteSuggestion><suggestion data=\"haskell cabal\" /></CompleteSuggestion></toplevel>"

Just (
  Element {
    elName = QName {
      qName = "toplevel",
      qURI = Nothing,
      qPrefix = Nothing
    },
    elAttribs = [],
    elContent = [
      Elem (
        Element {
          elName = QName {
            qName = "CompleteSuggestion",
            qURI = Nothing,
            qPrefix = Nothing
          },
          elAttribs = [],
          elContent = [
            Elem (
              Element {
                elName = QName {
                  qName = "suggestion", qURI = Nothing, qPrefix = Nothing
                },
                elAttribs = [
                  Attr {
                    attrKey = QName {
                      qName = "data",
                      qURI = Nothing,
                      qPrefix = Nothing
                    },
                    attrVal = "haskell cabal"
                  }
                ],
                elContent = [],
                elLine = Just 1
              }
            )
          ],
          elLine = Just 1
        }
      )
    ],
    elLine = Just 1
  }
)

Prelude Text.XML.Light> :t unqual
unqual :: String -> QName
Prelude Text.XML.Light> unqual "CompleteSuggestion"

QName {qName = "CompleteSuggestion", qURI = Nothing, qPrefix = Nothing}

Prelude Text.XML.Light> :t findChildren
findChildren :: QName -> Element -> [Element]
The code’s result is following.
[
  Element {
    elName = QName {
      qName = "CompleteSuggestion",
      qURI = Nothing,
      qPrefix = Nothing
    },
    elAttribs = [],
    elContent = [
      Elem (
        Element {
          elName = QName {
            qName = "suggestion",
            qURI = Nothing, qPrefix = Nothing
          },
          elAttribs = [
            Attr {
              attrKey = QName {
                qName = "data",
                qURI = Nothing,
                qPrefix = Nothing
              },
              attrVal = "haskell cabal"
            }
          ],
          elContent = [],
          elLine = Just 1
        }
      )
    ],
    elLine = Just 1
  }
]
Looks success to pick up “CompleteSuggestion” element. Next, picking up “suggestion” element with following code.
import Text.XML.Light

main = do
  case parseXMLDoc "<?xml version=\"1.0\"?><toplevel><CompleteSuggestion><suggestion data=\"haskell cabal\" /></CompleteSuggestion></toplevel>" of
    Nothing   -> putStrLn "Error"
    Just root -> putStrLn $ show $ findChild (unqual "suggestion") (findChildren (unqual "CompleteSuggestion") root !! 0)
“findChild” function’s type is following.
Prelude Text.XML.Light> :t findChild
findChild :: QName -> Element -> Maybe Element
The code’s result is following.
Just (
  Element {
    elName = QName {
      qName = "suggestion",
      qURI = Nothing,
      qPrefix = Nothing
    },
    elAttribs = [
      Attr {
        attrKey = QName {
          qName = "data",
          qURI = Nothing,
          qPrefix = Nothing
        },
        attrVal = "haskell cabal"
      }
    ],
    elContent = [],
    elLine = Just 1
  }
)

Looks success to pick up “suggestion” element. And then, I’ll get attributes of “suggestion” element.
import Text.XML.Light

main = do
  case parseXMLDoc "<?xml version=\"1.0\"?><toplevel><CompleteSuggestion><suggestion data=\"haskell cabal\" /></CompleteSuggestion></toplevel>" of
    Nothing   -> putStrLn "Error"
    Just root -> putStrLn $ show $ case findChild (unqual "suggestion") (findChildren (unqual "CompleteSuggestion") root !! 0) of
      Nothing   -> []
      Just elem -> elAttribs elem
New function “elAttribs” type is following.
Prelude Text.XML.Light> :t elAttribs
elAttribs :: Element -> [Attr]
Result is following.
[
  Attr {
    attrKey = QName {
      qName = "data",
      qURI = Nothing,
      qPrefix = Nothing
    },
    attrVal = "haskell cabal"
  }
]

OK. Looks good. Last step of parsing investigation, Getting “attrVal”.
main = do
  case parseXMLDoc "<?xml version=\"1.0\"?><toplevel><CompleteSuggestion><suggestion data=\"haskell cabal\" /></CompleteSuggestion></toplevel>" of
    Nothing   -> putStrLn "Error"
    Just root -> putStrLn $ show $ case findChild (unqual "suggestion") (findChildren (unqual "CompleteSuggestion") root !! 0) of
      Nothing   -> ""
      Just elem ->
        let
          [Attr key val] = elAttribs elem
        in
          val
The result is following.
"haskell cabal"

Good!

Complete Code

Actually, “CompleteSuggestion” element is not one.
So map function is needed like following.
import Control.Monad.Trans (liftIO)
import Network.HTTP
import Codec.Binary.UTF8.String as S
import Data.List
import Text.XML.Light

getText :: Element -> String -> String
getText parent childName =
  case findChild (unqual childName) parent of
    Nothing -> ""
    Just child ->
      let
        [Attr key val] = elAttribs child
      in
        val

parseCompleteSuggestionTag :: Element -> String
parseCompleteSuggestionTag element = getText element "suggestion"

main = do
  resultxml <- liftIO $ (Network.HTTP.simpleHTTP (Network.HTTP.getRequest ("http://suggestqueries.google.com/complete/search?hl=ja&ie=utf_8&oe=utf_8&client=toolbar&q=" ++ urlEncode("haskell cabal"))) >>= getResponseBody)
  case parseXMLDoc resultxml of
    Nothing -> putStrLn "parse error"
    Just root ->
      putStrLn ("[\"" ++ (Data.List.intercalate "\",\"" $ map S.decodeString $ map (\a -> parseCompleteSuggestionTag a) (findChildren (unqual "CompleteSuggestion") root)) ++ "\"]")
The result is following.
[
 "haskell cabal",
 "haskell cabal sandbox",
 "haskell cabal install",
 "haskell cabal アップデート",
 "haskell cabal windows",
 "haskell cabal 依存関係",
 "haskell cabal 使い方",
 "haskell cabal アンインストール",
 "haskell cabal proxy",
 "haskell cabal github"
]

未分類 recent post

  1. Install sbt 1.0.0 and run sample template

  2. Visualization of Neural Network and its Train…

  3. [Machine Learning]Created docker image includ…

  4. [Node.js]How to write batch script with Node.…

  5. [Hive]Get top n items for each category

関連記事

Comment

  1. No comments yet.

  1. No trackbacks yet.

PAGE TOP