Target
Try to parse result XML of Google Complete Suggestion API with Haskell.Environment
- OS
- Linux 2.6.32-279.el6.x86_64 #1 SMP Fri Jun 22 12:19:21 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
- GHC
- 7.4.1
Investigate way to achieve this
Access to the API
We can access the API like following URL.Just HTTP GET is okay.
Network.HTTP is useful to do HTTP GET with Haskell.http://suggestqueries.google.com/complete/search?hl=ja&ie=utf_8&oe=utf_8&client=toolbar&q=[search words]
This code use following methods.import Control.Monad.Trans (liftIO) import Network.HTTP main = do resultxml <- liftIO $ (Network.HTTP.simpleHTTP (Network.HTTP.getRequest ("http://suggestqueries.google.com/complete/search?hl=ja&ie=utf_8&oe=utf_8&client=toolbar&q=" ++ urlEncode("haskell cabal"))) >>= getResponseBody) putStrLn resultxml
The code’s result is following.Prelude> import Network.HTTP Prelude Network.HTTP> :t Network.HTTP.simpleHTTP Network.HTTP.simpleHTTP :: HStream ty => Request ty -> IO (Network.Stream.Result (Response ty)) Prelude Network.HTTP> :t Network.HTTP.getRequest Network.HTTP.getRequest :: String -> Request_String Prelude Network.HTTP> :t getResponseBody getResponseBody :: Network.Stream.Result (Response ty) -> IO ty Prelude Network.HTTP> :t urlEncode urlEncode :: String -> String
Well, it’s garbled…<?xml version="1.0"?> <toplevel> <CompleteSuggestion> <suggestion data="haskell cabal" /> </CompleteSuggestion> <CompleteSuggestion> <suggestion data="haskell cabal sandbox" /> </CompleteSuggestion> <CompleteSuggestion> <suggestion data="haskell cabal install" /> </CompleteSuggestion> <CompleteSuggestion> <suggestion data="haskell cabal ã¢ãããã¼ã" /> </CompleteSuggestion> <CompleteSuggestion> <suggestion data="haskell cabal windows" /> </CompleteSuggestion> <CompleteSuggestion> <suggestion data="haskell cabal ä¾åé¢ä¿" /> </CompleteSuggestion> <CompleteSuggestion> <suggestion data="haskell cabal 使ã æ¹" /> </CompleteSuggestion> <CompleteSuggestion> <suggestion data="haskell cabal ã¢ã³ã¤ã³ã¹ãã¼ã«" /> </CompleteSuggestion> <CompleteSuggestion> <suggestion data="haskell cabal proxy" /> </CompleteSuggestion> <CompleteSuggestion> <suggestion data="haskell cabal github" /> </CompleteSuggestion> </toplevel>
Following way to fix it is simple.
Just using Codec.Binary.UTF8.String.
“decodeString” function’s type is following.import Control.Monad.Trans (liftIO) import Network.HTTP import Codec.Binary.UTF8.String as S main = do resultxml <- liftIO $ (Network.HTTP.simpleHTTP (Network.HTTP.getRequest ("http://suggestqueries.google.com/complete/search?hl=ja&ie=utf_8&oe=utf_8&client=toolbar&q=" ++ urlEncode("haskell cabal"))) >>= getResponseBody) putStrLn $ S.decodeString resultxml
The result is following.Prelude> import Codec.Binary.UTF8.String as S Prelude S> :t S.decodeString S.decodeString :: String -> String
HTTP GET part was done.<?xml version="1.0"?> <toplevel> <CompleteSuggestion> <suggestion data="haskell cabal" /> </CompleteSuggestion> <CompleteSuggestion> <suggestion data="haskell cabal sandbox" /> </CompleteSuggestion> <CompleteSuggestion> <suggestion data="haskell cabal install" /> </CompleteSuggestion> <CompleteSuggestion> <suggestion data="haskell cabal アップデート" /> </CompleteSuggestion> <CompleteSuggestion> <suggestion data="haskell cabal windows" /> </CompleteSuggestion> <CompleteSuggestion> <suggestion data="haskell cabal 依存関係" /> </CompleteSuggestion> <CompleteSuggestion> <suggestion data="haskell cabal 使い方" /> </CompleteSuggestion> <CompleteSuggestion> <suggestion data="haskell cabal アンインストール" /> </CompleteSuggestion> <CompleteSuggestion> <suggestion data="haskell cabal proxy" /> </CompleteSuggestion> <CompleteSuggestion> <suggestion data="haskell cabal github" /> </CompleteSuggestion> </toplevel>
Parse XML
For simplicity, I’ll think following simple XML doc.In this part, I’ll use Text.XML.Light module.<?xml version="1.0"?> <toplevel> <CompleteSuggestion> <suggestion data="haskell cabal" /> </CompleteSuggestion> </toplevel>
It provides simple methods. To begin with, I’ll pick up “CompleteSuggestion” element with following code.
This codes using following methods.import Text.XML.Light main = do case parseXMLDoc "<?xml version=\"1.0\"?><toplevel><CompleteSuggestion><suggestion data=\"haskell cabal\" /></CompleteSuggestion></toplevel>" of Nothing -> putStrLn "Error" Just root -> putStrLn $ show $ findChildren (unqual "CompleteSuggestion") root
The code’s result is following.Prelude> import Text.XML.Light Prelude Text.XML.Light> :t parseXMLDoc parseXMLDoc :: Text.XML.Light.Lexer.XmlSource s => s -> Maybe Element Prelude Text.XML.Light> parseXMLDoc "<?xml version=\"1.0\"?><toplevel><CompleteSuggestion><suggestion data=\"haskell cabal\" /></CompleteSuggestion></toplevel>" Just ( Element { elName = QName { qName = "toplevel", qURI = Nothing, qPrefix = Nothing }, elAttribs = [], elContent = [ Elem ( Element { elName = QName { qName = "CompleteSuggestion", qURI = Nothing, qPrefix = Nothing }, elAttribs = [], elContent = [ Elem ( Element { elName = QName { qName = "suggestion", qURI = Nothing, qPrefix = Nothing }, elAttribs = [ Attr { attrKey = QName { qName = "data", qURI = Nothing, qPrefix = Nothing }, attrVal = "haskell cabal" } ], elContent = [], elLine = Just 1 } ) ], elLine = Just 1 } ) ], elLine = Just 1 } ) Prelude Text.XML.Light> :t unqual unqual :: String -> QName Prelude Text.XML.Light> unqual "CompleteSuggestion" QName {qName = "CompleteSuggestion", qURI = Nothing, qPrefix = Nothing} Prelude Text.XML.Light> :t findChildren findChildren :: QName -> Element -> [Element]
Looks success to pick up “CompleteSuggestion” element. Next, picking up “suggestion” element with following code.[ Element { elName = QName { qName = "CompleteSuggestion", qURI = Nothing, qPrefix = Nothing }, elAttribs = [], elContent = [ Elem ( Element { elName = QName { qName = "suggestion", qURI = Nothing, qPrefix = Nothing }, elAttribs = [ Attr { attrKey = QName { qName = "data", qURI = Nothing, qPrefix = Nothing }, attrVal = "haskell cabal" } ], elContent = [], elLine = Just 1 } ) ], elLine = Just 1 } ]
“findChild” function’s type is following.import Text.XML.Light main = do case parseXMLDoc "<?xml version=\"1.0\"?><toplevel><CompleteSuggestion><suggestion data=\"haskell cabal\" /></CompleteSuggestion></toplevel>" of Nothing -> putStrLn "Error" Just root -> putStrLn $ show $ findChild (unqual "suggestion") (findChildren (unqual "CompleteSuggestion") root !! 0)
The code’s result is following.Prelude Text.XML.Light> :t findChild findChild :: QName -> Element -> Maybe Element
Just ( Element { elName = QName { qName = "suggestion", qURI = Nothing, qPrefix = Nothing }, elAttribs = [ Attr { attrKey = QName { qName = "data", qURI = Nothing, qPrefix = Nothing }, attrVal = "haskell cabal" } ], elContent = [], elLine = Just 1 } )
Looks success to pick up “suggestion” element. And then, I’ll get attributes of “suggestion” element.
New function “elAttribs” type is following.import Text.XML.Light main = do case parseXMLDoc "<?xml version=\"1.0\"?><toplevel><CompleteSuggestion><suggestion data=\"haskell cabal\" /></CompleteSuggestion></toplevel>" of Nothing -> putStrLn "Error" Just root -> putStrLn $ show $ case findChild (unqual "suggestion") (findChildren (unqual "CompleteSuggestion") root !! 0) of Nothing -> [] Just elem -> elAttribs elem
Result is following.Prelude Text.XML.Light> :t elAttribs elAttribs :: Element -> [Attr]
[ Attr { attrKey = QName { qName = "data", qURI = Nothing, qPrefix = Nothing }, attrVal = "haskell cabal" } ]
OK. Looks good. Last step of parsing investigation, Getting “attrVal”.
The result is following.main = do case parseXMLDoc "<?xml version=\"1.0\"?><toplevel><CompleteSuggestion><suggestion data=\"haskell cabal\" /></CompleteSuggestion></toplevel>" of Nothing -> putStrLn "Error" Just root -> putStrLn $ show $ case findChild (unqual "suggestion") (findChildren (unqual "CompleteSuggestion") root !! 0) of Nothing -> "" Just elem -> let [Attr key val] = elAttribs elem in val
"haskell cabal"
Good!
Complete Code
Actually, “CompleteSuggestion” element is not one.So map function is needed like following.
The result is following.import Control.Monad.Trans (liftIO) import Network.HTTP import Codec.Binary.UTF8.String as S import Data.List import Text.XML.Light getText :: Element -> String -> String getText parent childName = case findChild (unqual childName) parent of Nothing -> "" Just child -> let [Attr key val] = elAttribs child in val parseCompleteSuggestionTag :: Element -> String parseCompleteSuggestionTag element = getText element "suggestion" main = do resultxml <- liftIO $ (Network.HTTP.simpleHTTP (Network.HTTP.getRequest ("http://suggestqueries.google.com/complete/search?hl=ja&ie=utf_8&oe=utf_8&client=toolbar&q=" ++ urlEncode("haskell cabal"))) >>= getResponseBody) case parseXMLDoc resultxml of Nothing -> putStrLn "parse error" Just root -> putStrLn ("[\"" ++ (Data.List.intercalate "\",\"" $ map S.decodeString $ map (\a -> parseCompleteSuggestionTag a) (findChildren (unqual "CompleteSuggestion") root)) ++ "\"]")
[ "haskell cabal", "haskell cabal sandbox", "haskell cabal install", "haskell cabal アップデート", "haskell cabal windows", "haskell cabal 依存関係", "haskell cabal 使い方", "haskell cabal アンインストール", "haskell cabal proxy", "haskell cabal github" ]
No comments yet.